14 July, 2022

cover image - an array of oven baked actual cookies Photo by RUMEYSA AYDIN on Unsplash

GDPR defaced my website and other stories

tl;dr: When GDPR and similar laws came out, every website had a knee-jerk reaction and added a cookie banner just in case. I went in search of a clear-cut answer to the question "can I run analytics without defacing my website with annoying consent banners?" - the answer I found was a murky yes (this is not legal advice). While the "cookie-less" solutions offered by some Google Analytics alternatives don't, strictly speaking, need cookie consent, 3rd-party analytics doesn't fall under GDPR's "legitimate interest" either. The underlying issue is that cookies are only one way to track users online. The French DPA recently waived the need for consent banners for certain analytics SaaS. Although this doesn't exactly settle the issue, it sets an important precedent.

Back in 2019, when Plausible.io first launched as a privacy-focused alternative to Google Analytics, I jumped on it almost immediately. The predatory nature of big tech and surveillance-based advertising was something I wanted to get away from. On the plus side, the fact that Plausible is cookie-less meant that you didn't need to deface your website with the now infamous cookie-consent banners.

plausible analytics cookie consent disclaimer stating that they don't use cookies and don't need consent Plausible blog stating you don't need to ask for user consent.

How it works:

In their data policy, you can see how they anonymously count views for a website while avoiding duplicates for returning users. They create a unique identifier for each user by hashing user data (the IP address and the browser user-agent) together with the website domain and a random daily_salt. In principle, this prevents linking user behaviour (what pages the user visited, etc.) to an actual user.

hash(daily_salt + website_domain + ip_address + user_agent)
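To make the formula concrete, here is a minimal sketch in Python. It is not Plausible's actual implementation: SHA-256 as the hash function, the salt length, and the function names `new_daily_salt` and `visitor_id` are all assumptions made for illustration.

```python
import hashlib
import secrets


def new_daily_salt() -> str:
    """Generate a fresh random salt (assumed to be rotated once a day)."""
    return secrets.token_hex(16)


def visitor_id(daily_salt: str, website_domain: str,
               ip_address: str, user_agent: str) -> str:
    """Compute hash(daily_salt + website_domain + ip_address + user_agent).

    SHA-256 is an assumption; the formula above only says "hash".
    """
    payload = daily_salt + website_domain + ip_address + user_agent
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


salt_today = new_daily_salt()
salt_tomorrow = new_daily_salt()
ua = "Mozilla/5.0 (X11; Linux x86_64)"

# Same visitor, same day: identical identifier, so repeat views are not double counted.
same_day_a = visitor_id(salt_today, "example.com", "203.0.113.7", ua)
same_day_b = visitor_id(salt_today, "example.com", "203.0.113.7", ua)

# Same visitor, next day: the salt has changed, so the identifier no longer matches.
next_day = visitor_id(salt_tomorrow, "example.com", "203.0.113.7", ua)
```

Note that the raw IP address and user-agent are still inputs to the function: the server has to see them before it can hash them.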

My story:

First, I wanted to use it for my personal projects. I had a vision of building sustainable commercial projects while still respecting user privacy and good UX, so things like Plausible for analytics and Ethical Ads for monetization were in my sights. This is the approach I took for Passa à Primeira, for instance.

Additionally, as I was approached to do client work through Codecadre, I felt the need to dig a bit deeper into these services and inform myself about GDPR. If I was going to advise a customer to drop the cookie consent banner because these technologies were "GDPR safe", I felt I really had to know what I was talking about. The last thing I wanted was to send the owner of a small business down a path where they would later face a fine from some European data authority.

That investigation led me to this discussion on Hacker News following a post which, at first, does read like an attack. It does, however, raise some valid points that are left unanswered.

The issue:

As outlined above, the user's IP address is being collected and processed. Even if there's a hashing step before anything is stored, the fact of the matter is that the IP is collected (in fact, this happens at the protocol level, simply because there is an HTTP connection to the analytics server). Recital 30 explicitly lists IP addresses among the online identifiers that can be associated with a natural person.

This is what one of the founders, Uku Täht, had to say:

Thanks for clearing this up. The general data points and metrics we store are not personal data. IP address is the only piece of data that we touch that is considered PII under some regulations including GDPR. The IP address is fully anonymized by hashing it together with a daily changing salt. Old salts are deleted so as to prevent re-identification. According to GDPR Recital 26, anonymized data does not fall within the GDPR at all because data is no longer considered “personal data” following anonymization: 'The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.'

Quoting Recital 26:

"This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes."

Uku claims that because the data is stored anonymously, it's no longer considered personal information. However, Recital 26 frames things in terms of processing, not just storage. Looking at the anonymization algorithm above, users' personal data is clearly being handled "in the daylight" before it is hashed. Making this about stored anonymous data does not address the point. You could easily argue that personal identifiers are being sent to a third party and processed without user consent.

My interpretation of user consent in GDPR is that consent banners are there to let users decide whether they are OK with the way their personal data is being handled by 3rd-party systems. Usually this tracking is done through a cookie, but you can still have tracking even if no cookies are involved. The fact that Plausible is privacy-focused should be a reason for the user to accept, not for consent to be assumed. Plausible decided to implement good privacy practices, but the same user data could just as well have been collected by a less conscientious actor.

I asked Plausible about this on GitHub and they gave me an "it's best to consult with your legal team". They then stated they had a legal team consulting with them (which I have no reason to doubt), but pointed me to a product review online. As far as I could tell, however, this last source had no one with a legal background behind it.

One of their main selling points is their SaaS being "no cookie consent necessary". If you're going to make claims on the basis of European privacy laws, you need to back them up with reasons based on those laws. It might be the case that this form of 3rd-party data collection is in the clear for European legislators (more on that later), but if you're getting paying customers with these claims, you're going to have to substantiate them.

The case of Matomo:

Matomo is another privacy-focused Google Analytics alternative. Although avoiding cookies is not the main use case, their website also shows how to configure the service without cookies.

In the case of Matomo, there's a precedent set by the French data protection authority (CNIL) that allows even the cookie version to be used without tracking consent, as long as a set of conditions is met.

CNIL also published a similar analysis for a bunch of analytics services that, if correctly configured, can be used without tracking consent.

My two cents:

With any of the systems we've seen, IPs and other personal data from users will always be sent to the 3rd-party servers, unless you're running your own analytics service, that is. Until someone comes up with a clever zero-knowledge system that aggregates views without personal data ever touching the server, there won't be a hard guarantee that user privacy is respected. The only things keeping user data safe are these companies' commitment to it and the threat of fines.

I do think both companies have built solid subscription-based businesses ($1 million ARR in the case of Plausible, for instance) around not selling user data to ad networks, which I find is an example to follow. The future of content monetization on the internet necessarily has to pass through this type of revenue model, not just because of GDPR, but because of overall good privacy practices.

The French DPA seems to be OK with privacy-focused analytics not needing cookie/tracking consent banners, as there's no user data/behaviours being stored. Honestly, if it's OK for the French DPA, it's OK for me, but you decide - "can I run analytics without needing to deface my website with annoying consent banners?"

Postscript

Browser fingerprinting:

Using IP and browser data like the user-agent for uniquely identifying users across multiple requests is called browser fingerprinting, and is a technique with a bad rep.

One of the problems of making marketing cookies difficult to set is that now websites will come up with ways of tracking users in alternative, more creative ways. This is the point the creator of cookies highlights here.

Anonymized (hashed) IPs are easy to brute-force given the limited number of unique addresses in IPv4. Check this discussion on Stack Overflow.

In Plausible's case, they still use a daily salt and the user-agent, which changes things.
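To illustrate the brute-force point, here is a minimal sketch, assuming an unsalted SHA-256 hash and a search limited to one /16 network so that it runs in a moment. The function names and the salt string are hypothetical and are not taken from any real analytics service.

```python
import hashlib
import ipaddress
from typing import Optional


def unsalted_hash(ip: str) -> str:
    """Hash an IP address on its own, with no salt (the weak scheme)."""
    return hashlib.sha256(ip.encode("utf-8")).hexdigest()


def brute_force_ip(target_hash: str, network: str) -> Optional[str]:
    """Hash every address in `network` until one matches `target_hash`."""
    for candidate in ipaddress.ip_network(network):
        if unsalted_hash(str(candidate)) == target_hash:
            return str(candidate)
    return None


# An "anonymized" IP stored without a salt is recovered by simple enumeration.
leaked = unsalted_hash("198.51.100.23")
recovered = brute_force_ip(leaked, "198.51.0.0/16")

# With a salt the attacker does not know, the same enumeration finds nothing.
salted = hashlib.sha256(("hypothetical-daily-salt" + "198.51.100.23").encode("utf-8")).hexdigest()
not_found = brute_force_ip(salted, "198.51.0.0/16")
```

The whole IPv4 space is about 4.3 billion addresses, so the same loop scales to a full search with modest hardware. The salt only protects the data while it stays secret and is actually deleted afterwards; anyone holding the salt can run the same enumeration.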

Also, check how easy it is to be uniquely identified using browser fingerprinting.

This is my result:

Yes! You are unique among the 670659 fingerprints in our entire dataset.

Legitimate interest vs consent

Another way of framing this data transfer to a 3rd-party service would be for it to fall under legitimate interest (Article 6).

Check this checklist from the British Information Commissioner's Office to help you decide what falls under legitimate interest. It seems clear to me that analytics doesn't fall under this category. The last word, however, would belong to a European data protection authority.

More info