A Bit of Anonymity


December 23, 2018

There’s an interesting paradox that arises when you try to protect your anonymity online. Let’s say that you want to avoid online tracking, so you set the “Do Not Track” (DNT) bit and off you go. Anyone who respects Do Not Track will leave you alone, but anyone who does not is suddenly able to track you much more easily. Paradoxically, cutting out tracking entirely from ethical advertisers who respect DNT has made your targeted advertising situation worse overall.

In order to understand why that is, we need to lay a bit of foundation. Our assumed goal is to minimize the worst targeted advertising. The more accurately advertisers can target you, the worse it is for you because you are more likely to be affected by the ads. Being vaguely identifiable to a lot of advertisers means sometimes an advertiser might get lucky and show you something which really resonates with you. Being specifically identifiable to a few advertisers means being exposed to advertisements that are specifically tailored to be persuasive to you on a fairly regular basis.

The problem is in the way that you are tracked online. Advertisers can keep track of your browsing in several ways. Perhaps the simplest is to ask you to identify yourself. This is done by sending your browser a cookie (a small piece of arbitrary data, stored by the browser) for their website. Then, whenever a browser connects, they check whether it sends a cookie back, and that cookie can then be used to identify you. If you don’t want to be identified, it’s as simple as not sending the cookie. However, there’s an incentive to track people even when they do not want to be tracked, so if you don’t play nice with the cookie they move on to more sophisticated methods.
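To make the cookie handshake concrete, here is a minimal sketch of the server side in Python, using only the standard library. The cookie name `visitor_id` and the function `identify` are invented for illustration; real ad networks will do this differently.

```python
from http.cookies import SimpleCookie
import uuid

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, chosen for this sketch


def identify(cookie_header):
    """Return (visitor_id, set_cookie_header_or_None) for one request.

    cookie_header is the raw value of the request's Cookie header,
    or None if the browser sent no cookies.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if COOKIE_NAME in cookie:
        # Returning browser: it volunteered the ID it was handed earlier.
        return cookie[COOKIE_NAME].value, None
    # New (or cookie-refusing) browser: mint a fresh ID and ask it to store it.
    visitor_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = visitor_id
    return visitor_id, reply.output(header="Set-Cookie:")


first_id, header = identify(None)                            # first visit
again_id, header2 = identify(f"{COOKIE_NAME}={first_id}")    # cookie sent back
stranger_id, _ = identify(None)                              # cookie withheld
```

A browser that withholds the cookie simply looks like a brand-new visitor every time, which is why trackers that want to follow you anyway need something else to go on.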

Every time you connect to a website, you send all sorts of information. Just to initiate the connection, you need to provide your IP address. Otherwise, the web server wouldn’t even be able to send you back the page you wanted to see. You provide something called a user agent string, which identifies your web browser and operating system (among other things) so that the website can do things like provide you with download links to software versions that will work on your computer. Once you’re on the page, JavaScript needs access to your screen resolution and the fonts that you have installed so that the content on the page can be displayed properly. The page can check whether you have a webcam installed, and what features it supports. The page can check whether you have Flash or HTML5 support.

Sometimes information is leaked unintentionally as well. For example, web browsers color hyperlinks differently depending on whether or not you’ve visited the page they link to. This is extremely helpful if you are trying to navigate a page with a lot of links, and you are trying to keep track of what you have seen and what you haven’t. It turned out that this coloring information was discoverable by the page itself, which led to websites being essentially able to query your browser history by rendering a bunch of links to popular websites and checking how each one was colored.

Individually, each of these features is shared by a large number of people. Pretty much everyone has approximately the same set of fonts installed, and uses one of a fairly modest set of browser versions. Taken in aggregate, we call all of these signals a “browser fingerprint,” and together they can be used to identify you quite effectively, often uniquely. Where Do Not Track comes in is that it is an additional signal that can be used as part of a fingerprint. Since most people don’t use the Do Not Track bit, setting it makes you more trackable, since the tracker can rule out any of their profiles that don’t have that bit set while trying to identify you.
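The "ruling out profiles" step can be sketched in a few lines of Python. Everything here is invented for illustration: the `Profile` fields, the six sample profiles, and the `candidates` helper are assumptions, not a description of how any real tracker stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    browser: str
    screen: str
    dnt: bool  # whether this profile has the Do Not Track bit set


# Made-up profiles a tracker might already hold.
PROFILES = [
    Profile("A", "Chrome", "1920x1080", False),
    Profile("B", "Chrome", "1920x1080", False),
    Profile("C", "Chrome", "1920x1080", True),
    Profile("D", "Firefox", "1920x1080", False),
    Profile("E", "Chrome", "1366x768", False),
    Profile("F", "Firefox", "1366x768", True),
]


def candidates(profiles, **observed):
    """Keep only the profiles consistent with every observed signal."""
    return [
        p for p in profiles
        if all(getattr(p, key) == value for key, value in observed.items())
    ]


without_dnt = candidates(PROFILES, browser="Chrome", screen="1920x1080")
with_dnt = candidates(PROFILES, browser="Chrome", screen="1920x1080", dnt=True)
print(len(without_dnt), len(with_dnt))
```

In this toy data set, browser and screen size alone leave three candidates; adding the Do Not Track bit narrows it to one.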

We can do some more formal analysis of how much more trackable it makes us by thinking about our anonymity as a proportion of the total population. There are approximately 7 billion people on the planet. Around half of them[1] are online, but those that are often use multiple devices, so we can call it a wash and say there are 7 billion browsing devices to track. The smallest unit of information is a single bit, which can take a value of either 0 or 1. If we provide each person on the planet with their own ID number, we would need $\log_2(7{,}000{,}000{,}000) \approx 32.7$, or 33 whole bits, to do so. As a standard measure of how much anonymity a particular piece of information strips away, we can think about how many bits of that ID number it could determine. If half the planet uses Firefox, and the other half uses Chrome, then knowing which browser a person is using halves the pool of people they could be, which strips away exactly 1 bit. Doing something really outside of the norm, like having a bunch of rare, curated fonts or a nonstandard size browser window, will blow away essentially all of your bits of anonymity, since the pool of people who also have your same setup is so small. Using a version of a browser that is too old, or too up to date, might only cost a fraction of a bit.
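The arithmetic is easy to check. The sketch below uses the article's round figure of 7 billion devices; the function names `bits_to_identify` and `pool_after` are made up for this example.

```python
import math

POPULATION = 7_000_000_000  # the article's round figure for browsing devices


def bits_to_identify(population):
    """Bits needed to give every member of the population a distinct ID."""
    return math.log2(population)


def pool_after(population, bits_revealed):
    """Size of the crowd you still blend into after revealing some bits."""
    return population / 2 ** bits_revealed


total_bits = bits_to_identify(POPULATION)
print(round(total_bits, 1), math.ceil(total_bits))  # 32.7 33
print(pool_after(POPULATION, 1))                    # 3500000000.0
```

Each bit revealed halves the pool, so after all 33 bits are gone the pool has shrunk to a single device.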

Why not just compare percentages directly, instead of converting into bits first? The benefit we get from the bits of anonymity metric is that we can compare the privacy implications of different actions across different spaces. If you are in a small community, you have far fewer bits of anonymity to spend, and so something that reduces you by a couple of bits really hurts. Being the one person on a niche community forum who uses an obscure browser will uniquely identify you, but you can still work out how many bits it costs by using numbers pulled from the global population.
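As a rough sketch of that comparison: treat the size of the community as a budget of bits, and the cost of a trait as the bits computed from its global share. The forum size of 500 and the 0.1% browser share below are hypothetical numbers, and the sketch assumes the trait is about as common inside the community as it is globally.

```python
import math


def budget_bits(community_size):
    """Bits of anonymity available when hiding among community_size people."""
    return math.log2(community_size)


def cost_bits(global_share):
    """Bits revealed by a trait held by global_share of all users."""
    return math.log2(1 / global_share)


def bits_left(community_size, global_share):
    # Clamp at zero: once the budget is spent, you are uniquely identified.
    return max(0.0, budget_bits(community_size) - cost_bits(global_share))


OBSCURE_BROWSER_SHARE = 0.001  # hypothetical: 0.1% of users worldwide

for name, size in [("whole internet", 7_000_000_000), ("niche forum", 500)]:
    print(name, round(bits_left(size, OBSCURE_BROWSER_SHARE), 1))
```

The same roughly 10-bit cost leaves plenty of cover across the whole internet, but more than exhausts the roughly 9 bits available on a 500-member forum.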

We can use this metric to compare decisions that we might make related to our online anonymity as well. If 90% of users don’t use Do Not Track, then setting it puts you in the remaining 10% and costs $\log_2(1/0.1) \approx 3.3$ bits of anonymity. If you’re using desktop Chrome, that costs about 0.74 bits, since Chrome has around 60% desktop market share. Safari has 17%, so that’s about 2.6 bits.
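These figures all come from the same formula: a trait shared by a fraction p of users costs log2(1/p) bits. A short Python sketch, using the market-share figures quoted above (the helper name `anonymity_cost` is invented here):

```python
import math


def anonymity_cost(share):
    """Bits revealed by a trait shared by the given fraction of users."""
    return math.log2(1 / share)


# Shares as quoted in the article; they are rough figures, not measurements.
SHARES = {
    "Do Not Track set": 0.10,
    "desktop Chrome": 0.60,
    "desktop Safari": 0.17,
}

for trait, share in SHARES.items():
    print(f"{trait}: {anonymity_cost(share):.2f} bits")
```

The rarer the trait, the more it costs: the common choice (Chrome) is nearly free, while the minority choices each cost a few bits.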

If you truly want to be anonymous online, then you have two options. The first is to have an entirely default configuration: the most standard browser size and version, the most general IP address you can use (which probably means a popular VPN), no extra fonts, and a clean, unmodified version of the browser with no history or cookies. The other option is to hide yourself so well that you cannot be fingerprinted in the first place. The risk is that if you don’t get it exactly right, whatever unusual configuration choices you’ve made will uniquely identify you, stripping away every last bit of anonymity you have. Unless you’re very confident in the success of your scheme for hiding from advertisers, think about how many bits of anonymity it strips away if it goes wrong, and whether that’s more bits than you’d lose just by using a default setup.

[1] https://en.wikipedia.org/wiki/Global_Internet_usage