Hello all. I've just recently found this chat, but I've been involved in device fingerprinting for 6+ months now. My team and I recently did some work on determining whether a mobile device is a real device or an emulator. I'd be happy to share my results if anyone is interested. It's not perfect, but it basically relies on the speed of canvas rendering (how many drawing operations complete in a given time). Even a low-quality laptop running the emulator outperformed many smartphones; only the most expensive (Android) phones gave comparable results. With a quality desktop (dedicated graphics card, i7, etc.) the difference isn't even close.
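To make the idea concrete, here is a minimal sketch of the timing check. In a real page you would time actual canvas drawing calls (`ctx.fillText`, `ctx.arc`, `canvas.toDataURL`); a CPU-bound loop stands in here so the snippet is self-contained, and the threshold is purely illustrative, not a calibrated value.

```javascript
// Hypothetical sketch: flag a device as "emulator-like" when a
// rendering-style workload finishes suspiciously fast. The math loop
// below is a stand-in for real canvas draw calls.
function timeWorkload(iterations) {
  const start = Date.now();
  let acc = 0;
  for (let i = 0; i < iterations; i++) {
    acc += Math.sin(i) * Math.cos(i / 3); // stand-in for draw operations
  }
  const elapsed = Date.now() - start;
  return { elapsed, acc }; // returning acc keeps the loop from being optimized away
}

// Threshold is illustrative: a desktop running an emulator tends to
// finish far faster than all but the most expensive phones.
function looksLikeEmulator(elapsedMs, thresholdMs) {
  return elapsedMs < thresholdMs;
}
```

In practice you would calibrate the threshold against measurements from known-real devices rather than hardcoding it.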
I wanted to ask if anyone has experience with TCP/IP fingerprinting? I believe ThreatMetrix uses it. From a few days of research I've found you can determine a few things from the data, such as the OS. I am working on a solution to beat proxies (other than a database of IPs).
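For anyone unfamiliar with the technique, here is a toy sketch of the kind of passive signal involved. The heuristics below (initial TTL 64 vs. 128, common window sizes) are widely known rules of thumb in the p0f tradition, not ThreatMetrix's actual logic, and real tools match far richer signatures.

```javascript
// Illustrative passive OS guess from TCP/IP header fields.
// The values are common rules of thumb, not an authoritative database.
function guessOs(ttl, windowSize) {
  // Observed TTL = initial TTL minus hop count, so round up to the
  // nearest common initial value (64, 128, 255).
  const initialTtl = [64, 128, 255].find(t => ttl <= t) || 255;
  if (initialTtl === 128) return 'Windows';
  if (initialTtl === 255) return 'Solaris/Cisco (likely)';
  // Initial TTL 64 covers Linux, macOS, Android, iOS; the TCP window
  // size can narrow it down a little.
  return windowSize === 65535 ? 'macOS/iOS (likely)' : 'Linux/Android (likely)';
}
```

The proxy angle: a proxy terminates the TCP connection, so these fields describe the proxy host's OS, and a mismatch against the OS claimed in the browser's User-Agent is the interesting signal.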
Hi @Valve, everyone,
I've just finished my research on fingerprinting and have a couple of thoughts to share with you. I believe you may find them a useful contribution. I will simplify the story, as it is going to be quite long anyway.
A couple of months ago, I developed a similar fingerprinting solution and collected many deterministic samples to analyze the "usefulness" of individual fingerprints. I was trying to answer the question of which features should be fingerprinted, and in which way, to provide the highest entropy and ensure stability, while keeping the overall execution time and code length in mind. Here are some thoughts/questions/observations:
1) First of all, what is the point of having fingerprints from the "has the user tampered with" family? I see a logical hole here: creating artificial fingerprints out of existing ones doesn't increase diversity, it just consumes code length and execution time. If it's not clear what I mean by artificial, here is an example: we collect the screen.availHeight and screen.height properties as one fingerprint, then create a second fingerprint telling us whether availHeight > height. If a user tampered with the setting, the first fingerprint already makes the final fingerprint different from the others; adding the extra flag doesn't improve uniqueness because it's just duplicate information. (Paradoxically, users who try to hide their identity by setting strange values make themselves more easily fingerprintable. Beauty of this world ;))
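The redundancy argument above can be shown in a few lines. This is a toy demonstration (the join is a stand-in for a real hash): a derived flag is a pure function of values we already collect, so it can never separate two fingerprints that the raw values don't already separate.

```javascript
// Toy illustration: a derived "tampered" flag adds no entropy, because
// any tampering already changes the raw components it is derived from.
function fingerprint(components) {
  return components.join('||'); // stand-in for a real hash function
}

const normal = { availHeight: 1040, height: 1080 };
const tampered = { availHeight: 1200, height: 1080 }; // user faked a value

// Without the derived flag, the fingerprints already differ:
const a = fingerprint([normal.availHeight, normal.height]);
const b = fingerprint([tampered.availHeight, tampered.height]);

// The derived flag (availHeight > height) is computed from the same
// inputs, so adding it cannot create any new distinction.
const flagA = normal.availHeight > normal.height;   // false
const flagB = tampered.availHeight > tampered.height; // true
const a2 = fingerprint([normal.availHeight, normal.height, flagA]);
const b2 = fingerprint([tampered.availHeight, tampered.height, flagB]);
```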
2) I believe ad-block detection should be disabled by default, the same way as Flash font detection. In some browsers, add-ons are not enabled in private mode (unless the user enables them), so the fingerprint is often different when it shouldn't be.
3) Concerning execution time: especially when JS font fingerprinting with the extended list is enabled, the overall time gets quite heavy. I have noticed it affects the user experience in some cases, e.g. scripts responsible for scrolling get starved. I know there is no easy way to solve this, as we cannot use WebWorkers, but at least the problem could be addressed, or the overall execution time decreased. I implemented a naive solution for my script, but I haven't had time to check whether it makes much of a difference.
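One naive mitigation (a sketch of the idea, not the author's actual code): split the long synchronous job into batches and yield to the event loop between them with `setTimeout(0)`, so scroll handlers and painting get a chance to run. `workFn` here is a placeholder for the real per-font measurement.

```javascript
// Split a list into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run workFn over items one batch at a time, yielding to the event
// loop between batches so UI work (scrolling, painting) isn't starved.
function processInBatches(items, batchSize, workFn, done) {
  const batches = chunk(items, batchSize);
  const results = [];
  function next(i) {
    if (i >= batches.length) return done(results);
    for (const item of batches[i]) results.push(workFn(item));
    setTimeout(() => next(i + 1), 0); // yield before the next batch
  }
  next(0);
}
```

This trades total runtime (slightly longer) for responsiveness; in newer browsers `requestIdleCallback` would be an even gentler scheduler for the same pattern.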
4) Font analysis with JS: a pretty awesome fingerprint, but really time-consuming and unstable if you collect too many fonts (this fingerprint showed the highest number of changes in my data). The entropy I achieved with a set of 100 optimal fonts was almost identical to what I got from all 800 collected fonts. Obviously it would get better with a bigger dataset, but my conclusion still stands: it is really worth limiting the number of fonts (not a random subset, but the most representative one). Execution time improves significantly, and so does stability.
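One way the "most representative" subset could be chosen offline (my sketch of the approach, with an assumed data shape): from collected samples mapping each font name to the values observed across devices, rank fonts by Shannon entropy and keep only the top N most discriminating ones.

```javascript
// Shannon entropy (in bits) of a list of observed values.
function shannonEntropy(values) {
  const counts = {};
  for (const v of values) counts[v] = (counts[v] || 0) + 1;
  let h = 0;
  for (const k in counts) {
    const p = counts[k] / values.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// samples: { fontName: [observed value per device, ...], ... }
// Returns the n fonts whose presence/metrics vary the most.
function topFonts(samples, n) {
  return Object.keys(samples)
    .map(font => ({ font, h: shannonEntropy(samples[font]) }))
    .sort((a, b) => b.h - a.h)
    .slice(0, n)
    .map(x => x.font);
}
```

A font present on every device (entropy 0) contributes nothing and only costs measurement time, which is exactly why trimming from 800 to ~100 fonts loses so little.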
5) @Valve, you were considering using a User-Agent parsing library to improve stability. UA scored the second-highest instability in my data, so it is crucial to do something about it. Yet the best solution, imho, is simply to trim the unstable parts from the string, which is in fact the browser version (engine and OS are quite ok). Parsing with a library, besides being more expensive, would discard some information that could be useful.
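A minimal sketch of the trimming idea: reduce every `Name/version` token to its major version so routine browser updates don't change the fingerprint, while OS tokens (which carry no slash) pass through untouched. The regex is a simple heuristic of mine, not an exhaustive UA grammar; you might keep engine versions fuller if they prove stable in your data.

```javascript
// Blank out fast-moving minor/patch digits in "Name/version" tokens:
// "Chrome/91.0.4472.124" -> "Chrome/91", "Firefox/89.0" -> "Firefox/89".
// Tokens without a slash ("Windows NT 10.0") are left intact.
function trimUa(ua) {
  return ua.replace(/(\/\d+)[\d.]*/g, '$1');
}
```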
6) I have found that canvas, screen-dimension and WebGL fingerprints need particular care, again due to high instability. Drawing a "smile" icon in the canvas fingerprint proved to make it really unstable, unfortunately. On the other hand, drawing simple text is really stable and provides decent information. I know it may sound like going backwards in evolution, but there are so many aspects to consider... Surprisingly, using the Arial font instead of a fake (fallback) one gave better results, even though the latter scored a higher number of unique and distinct values. A lot to play with.
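A sketch of the stable variant described above: draw plain text in a fixed, widely available font and condense the pixels to one number. The canvas calls are browser-only (the function bails out elsewhere); the hash shown is plain 32-bit FNV-1a so the condensing step is concrete, not whatever fp2.js actually uses.

```javascript
// Plain 32-bit FNV-1a hash of a string (deterministic, fast).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply by FNV prime
  }
  return h >>> 0;
}

// Browser-only: render simple text in Arial and hash the pixel data.
// Returns null outside the DOM (e.g. under Node).
function canvasTextFingerprint() {
  if (typeof document === 'undefined') return null;
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  ctx.textBaseline = 'top';
  ctx.font = '14px Arial';              // real font proved more stable here
  ctx.fillText('mmmwwwlli10Oo', 2, 2);  // simple text, no "smile" icon
  return fnv1a(canvas.toDataURL());
}
```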
7) In general, I believe more focus should be devoted to the stability of fp2.js. More and more fingerprints are being added, and that doesn't necessarily make the script better. People are quite concerned about how often the fingerprint changes; keeping it stable is essential for the use cases. I haven't made any official analysis of this particular script, but having employed most of its fingerprinting methods, I can say the stability leaves plenty of room for improvement. Well, it must be hard to develop this script without any deterministic data about its efficiency. I wonder if anyone is collecting and analyzing such data?
Well, it's much longer than I intended, sorry! :)
Let me know what you think...