Hi @Valve, everyone,
I've just finished my research on fingerprinting and I have a couple of thoughts to share with you. I believe you may find them a useful contribution. I will simplify the story, as it is going to be quite long anyway.
A couple of months ago, I developed a similar fingerprinting solution and collected many deterministic samples to analyze the "usefulness" of fingerprints. I was trying to answer the question of which features should be fingerprinted, and in which way, to provide the highest entropy and ensure stability, while keeping in mind the overall execution time and the code length. Here are some thoughts/questions/observations:
1) First of all, what is the point of having fingerprints from the "has the user tampered with" family? I see a logical hole here: creating artificial fingerprints out of existing ones doesn't increase diversity, it just costs code length and execution time. If it's not clear what I mean by artificial, let's take an example: we collect the screen.availHeight and screen.height properties as one fingerprint, and we create a second fingerprint that tells whether availHeight > height. If a user tampered with the setting, the first fingerprint will already make the final fingerprint different from the others; adding an extra flag will not improve uniqueness, as it is just duplicate information. (Paradoxically, users who try to hide their identity by setting strange values make themselves easily 'fingerprintable', beauty of this world ;))
2) I believe ad-block detection should be disabled by default, the same way as Flash font detection. In some browsers, add-ons are not enabled in private mode (unless the user enables them), so the fingerprint is often different when it shouldn't be.
3) Concerning execution time: especially when JS font fingerprinting with the extended list is enabled, the overall time gets quite heavy. I have noticed it affects the user experience in some cases, e.g. some scripts responsible for scrolling get starved. I know there is no easy way to solve the issue, as we cannot use Web Workers, but at least the problem could be addressed or the overall execution time decreased. I have implemented a naive solution for my script, but I didn't really have time to check whether it makes much of a difference.
4) Font analysis with JS. A pretty awesome fingerprint, but really time-consuming and unstable (I observed the highest number of changes for this fingerprint) if you collect too many fonts. The entropy I achieved with a set of 100 optimal fonts was almost identical to the entropy of all 800 collected values. Obviously, it would get better with a bigger dataset, but my conclusion is still clear: it is really worth limiting the number of fonts (not to a random set but to the most representative one). Execution time improves significantly, and so does stability.
5) @Valve, you were considering employing a User-Agent parsing library to improve stability. For me, UA scored the 2nd highest instability, so it is crucial to do something about it. Yet the best solution, imho, is to simply trim the unstable part of the string, which is in fact the browser version (engine and OS are quite OK). Parsing with a library, apart from being more expensive, would skip some information that could be useful.
6) I have found that the canvas, screen dimensions and WebGL fingerprints need particular care, again due to their high instability. Drawing a "smile" icon in the canvas fingerprint proved to make it really unstable, unfortunately. On the other hand, drawing simple text is really stable and provides decent information. I know it may sound like going backwards in evolution, but there are so many aspects to consider... Surprisingly, using the Arial font instead of a fake (fallback) one gave better results, even though the latter scored a higher number of unique and distinct values. A lot to play with.
7) In general, I believe more focus should be devoted to the stability of fp2.js. More and more fingerprints are being added, and that doesn't necessarily make the script better. People are quite concerned about how often the fingerprint changes; keeping it stable is essential for their use cases. I haven't made any formal analysis of this particular script, but having employed most of its fingerprinting methods, I can say the stability leaves a lot of room for improvement. Well, it must be hard to develop this script without any deterministic data about its efficiency. I am wondering if anyone is collecting and analyzing such data?
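To make point 1 concrete, here is a minimal sketch showing that a flag derived from values we already collect adds no entropy. The sample data and the names `entropyBits`, `plain` and `withFlag` are made up for illustration; they are not part of fp2.js.

```javascript
// Shannon entropy (in bits) of a list of fingerprint strings.
function entropyBits(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  let bits = 0;
  for (const c of counts.values()) {
    const p = c / values.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Hypothetical samples, one object per visitor.
const samples = [
  { height: 1080, availHeight: 1040 },
  { height: 1080, availHeight: 1040 },
  { height: 900, availHeight: 877 },
  { height: 768, availHeight: 2000 }, // tampered: availHeight > height
];

// Fingerprint built from the raw values only.
const plain = samples.map(s => `${s.height}x${s.availHeight}`);
// Same fingerprint plus the derived "tampered" flag.
const withFlag = samples.map(
  s => `${s.height}x${s.availHeight}|${s.availHeight > s.height}`
);

console.log(entropyBits(plain), entropyBits(withFlag));
```

Both calls give the same number, because the flag is a deterministic function of the two raw values: it can never split a group of visitors that the raw values did not already split.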
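For point 3, one possible way to reduce the starvation without Web Workers is to split the font checks into small batches and yield to the event loop between them. This is only a sketch of the general idea: `toBatches`, `detectFontsInBatches`, the batch size and the `checkFont` callback are all hypothetical names, and I have not measured how much it helps in practice.

```javascript
// Split a list into consecutive batches of at most `size` items.
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `checkFont` over all fonts, yielding to the event loop after each batch
// so that scroll handlers and other queued tasks get a chance to run.
// `checkFont` is a hypothetical stand-in for the real font detection routine.
async function detectFontsInBatches(fonts, checkFont, batchSize = 20) {
  const detected = [];
  for (const batch of toBatches(fonts, batchSize)) {
    for (const font of batch) {
      if (checkFont(font)) detected.push(font);
    }
    // Yield: the rest of the loop resumes in a later task.
    await new Promise(resolve => setTimeout(resolve, 0));
  }
  return detected;
}
```

The total work is the same (slightly more, in fact), but it is no longer one long blocking task, so the trade-off is a later final result in exchange for a more responsive page.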
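For point 5, trimming the browser version from the UA string could look roughly like this. The token list is my assumption and certainly incomplete (for example, it does not touch the `rv:` part that Firefox puts in the platform section); `trimBrowserVersion` is a hypothetical helper, not an existing fp2.js function.

```javascript
// Product tokens whose version number tends to change with every browser update.
// Assumed list for illustration only.
const BROWSER_TOKENS = ['Chrome', 'Firefox', 'Version', 'OPR', 'Edge'];

// Drop the "/x.y.z" version after the listed browser tokens,
// leaving the engine and OS parts of the string untouched.
function trimBrowserVersion(ua) {
  const pattern = new RegExp(`\\b(${BROWSER_TOKENS.join('|')})/[\\d.]+`, 'g');
  return ua.replace(pattern, '$1');
}

const before =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36';
const after =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36';

// Two versions of the same browser on the same OS now give the same string.
console.log(trimBrowserVersion(before) === trimBrowserVersion(after));
```

Compared with a full parsing library this keeps everything else in the string (engine, OS, architecture), which is exactly the information I would not want to lose.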
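For point 7, the kind of stability data I have in mind can be computed with something as simple as the sketch below. It assumes we have a ground-truth `deviceId` for each visit (e.g. from a cookie) and the raw value of each fingerprint component; the record shape and the name `changeRates` are assumptions for illustration.

```javascript
// visits: array of { deviceId, components: { name: value, ... } },
// in chronological order.
// Returns, per component, the fraction of consecutive visits of the same
// device in which the component's value changed.
function changeRates(visits) {
  const last = new Map();  // deviceId -> components from the previous visit
  const changes = {};      // component -> number of observed changes
  const comparisons = {};  // component -> number of comparisons made
  for (const { deviceId, components } of visits) {
    const prev = last.get(deviceId);
    if (prev) {
      for (const key of Object.keys(components)) {
        if (!(key in prev)) continue;
        comparisons[key] = (comparisons[key] || 0) + 1;
        if (prev[key] !== components[key]) {
          changes[key] = (changes[key] || 0) + 1;
        }
      }
    }
    last.set(deviceId, components);
  }
  const rates = {};
  for (const key of Object.keys(comparisons)) {
    rates[key] = (changes[key] || 0) / comparisons[key];
  }
  return rates;
}

// Hypothetical data: device "a" updated its browser, device "b" did not.
const rates = changeRates([
  { deviceId: 'a', components: { userAgent: 'A1', screen: '1080' } },
  { deviceId: 'b', components: { userAgent: 'B', screen: '900' } },
  { deviceId: 'a', components: { userAgent: 'A2', screen: '1080' } },
  { deviceId: 'b', components: { userAgent: 'B', screen: '900' } },
]);
console.log(rates);
```

Ranking components by this rate, next to their entropy, is what lets you decide which fingerprints are worth their cost.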
Well, it's much longer than I intended, sorry! :)
Let me know what you think...
fingerprintjs2 within my TypeScript app, but it seems the module loader can't find the fingerprintjs2 module. My sample app is on GitHub: https://github.com/melloc01/ts-library-starter - there are some instructions to get the app running. Thank you in advance = ] @Valve