Criteo, a French surveillance marketing giant

Criteo, an advertising giant whose business model is based on your surveillance

Little known to the general public, Criteo is a French success story that managed to become the world leader in retargeting. According to LinkedIn, the company has more than 3000 employees; it has also been listed on NASDAQ since October 2013.

How does Criteo work? All I need to do is visit a partner e-commerce site (example: Fnac), then a media site (example: Lemonde.fr), to be bombarded with ads showing previously viewed products as well as suggestions:

example

Retargeting ads such as those offered by Criteo are among the most intrusive on the web; they make you realize that your purchasing habits hold no secrets from advertisers. And if we look at the typology of advertising players, Criteo is one of the least privacy-friendly players:

Criteo is an ad network: the more it knows about you, the more money it makes.
The data Criteo collects is highly personal: the products you view and the things you buy.
Unlike an e-retailer such as Amazon, which has a storefront and a direct relationship with its customers, Criteo operates in the shadows. You do not know you are being tracked, and you did not choose it.

Revenues put at risk by web browsers

So you might say, Criteo is an old-school ad company, one that grew when mobile wasn't yet dominant, when retargeting was solely based on third-party cookies.

Even if you do not use an adblocker, there is a good chance that your browser is already blocking Criteo tracking: Safari with Intelligent Tracking Prevention, Firefox, Brave and even Edge have recently taken strong action against trackers. Chrome is lagging far behind, but it will ban third-party cookies in less than 2 years. And you can always decide to delete third-party cookies in your browser settings, to start fresh against adtech companies.

These browser measures protect you against invasive tracking by adtech companies, which can no longer correctly identify you, as Criteo explains to its investors (slide 7):

Criteo Browsers

The protections put in place by browsers are excellent news for users. They represent an existential threat for Criteo, so its R&D is investing in workaround solutions.

A history of bypassing browser protections

Criteo doesn't say this in its investor presentation, but it has experience circumventing browser protections. If we go back to the first release of Safari Intelligent Tracking Prevention in September 2017, here is Criteo's communication to investors in November 2017:

Apple’s Intelligent Tracking Prevention feature, or ITP, was released on mobile on September 19, 2017. We believe our solution for Safari users currently allows us to mitigate about half of the potential impact from ITP. In the third quarter, ITP had a minimal net negative impact on our Revenue ex-TAC of less than $1 million. Given our expectations of the roll out of Apple’s iOS11 and our coverage of Safari users, we expect ITP to have a net negative impact on our Revenue ex-TAC in the fourth quarter of between 8% and 10% relative to our base case projections for the quarter. We will continue to improve and deploy our solution for Safari users over the coming quarters.

Criteo had already implemented a workaround (explained in this article):

What we’ve developed is a privacy-friendly solution, which is reliant on a [non-cookie] identifier that allows the transfer of information between websites and our servers

Note the "privacy-friendly" (!). Apple, however, reacted quickly: in early December 2017 it shipped an ITP update that rendered Criteo's bypass inoperable. Here, then, is Criteo's new communication to investors in December 2017:

Earlier this month, Apple launched a new version of its mobile operating system, iOS 11.2, which disables the solution that some companies in the advertising ecosystem, including Criteo, currently use to reach Safari users. As a result, we believe the projected 9%-13% ITP net negative impact on Criteo’s 2018 Revenue ex-TAC relative to our pre-ITP base case projections, communicated on November 1, 2017, is no longer valid. We are focused on developing an alternative sustainable solution for the long term, built on our best-in-class user privacy standards, aligning the interests of Apple users, publishers and advertisers. This solution is still under development and its effectiveness cannot be assessed at this early stage. Should it not mitigate any ITP impact, we believe the ITP net negative impact on Criteo’s 2018 Revenue ex-TAC, relative to our pre-ITP base case projections, would become approximately 22%.

Here we should give Apple credit for regularly improving its ITP solution (the cat-and-mouse game has continued since 2017), with ITP now at version 2.3. It is striking, though, that Criteo has never been penalized for these workarounds, and that it discusses them so shamelessly.

Regulations to better protect users' privacy

In addition, with regulations evolving to better protect users' privacy (GDPR and ePrivacy in Europe, CCPA in California), Criteo finds itself under increasing pressure: Privacy International filed a complaint against Criteo, Quantcast and Tapad in November 2018 for violation of the GDPR, and the CNIL opened an investigation into the complaint against Criteo in March 2020. Criteo also notes these regulations in its “Identification” deck for investors (slide 8), without dwelling on the legal risks:

Regulation

The stock market is not fooled: while Criteo is still valued at 600 million euros, its share price has fallen sharply over the last 5 years:

Criteo Nasdaq

Criteo has understood these two trends (technical via browsers and regulatory): it is now turning to far more insidious and pervasive surveillance via the “Criteo Shopper Graph,” while speaking out of both sides of its mouth on privacy.

Criteo Shopper Graph: the massive database of personal data leaked to Criteo

Again in its “Identification” presentation for investors (slide 19), Criteo spells out its goal: to identify you without the need for third-party cookies. Today, 50% of its business is reportedly still dependent on third-party cookies:

Third-party cookies

On its website, Criteo highlights “Criteo Shopper Graph”, its user database. How many people are in this database? Here are some figures put forward by Criteo to present the “Criteo Shopper Graph”:

75% of online shoppers worldwide.
More than 1.9 billion monthly active consumers.
More than 120 different buying signals.
35 billion browsing or purchasing events tracked daily.
800 billion dollars in annual e-commerce sales.
1.5 billion multi-device identifiers (Criteo recognizes you on several of your devices).
4.5 billion products.

By default, Criteo does not combine the personal data collected from one of its clients with that of its other clients. But if a client wants to access the Shopper Graph, it must agree to this pooling. For example: if Fnac wants to access the Shopper Graph to improve the profitability of its advertising campaigns, it will have to accept that Criteo combines your personal data collected on the Fnac site with your personal data collected from other Criteo clients that have signed up for the Shopper Graph.

And the offer works well: Criteo reports that 75% of its clients take part:

Criteo Shopper Graph participation

How does Criteo build this massive database of personal data? Criteo offers explanations on its website: the Shopper Graph aggregates 3 data sources:

Criteo Shopper Graph

The Identity Graph allows Criteo to recognize you on all your devices.
The Interest Map allows Criteo to know your different purchase intentions.
Measurement Data allows Criteo to collect details of your various purchases.

Let's take a closer look at the Identity Graph, Criteo's tracking engine.

The Identity Graph, or how Criteo recognizes you on all your devices

If we start from user browsing, the first step is to identify you when you browse e-commerce sites or apps. Criteo collects the different identifiers of the same user depending on the device used, and can then link them to determine that they belong to one and the same person:

Identifiers

We see here that Criteo collects several types of identifiers and associates them with the Criteo id to better track you:

A cookie identifier for each of your web or mobile browsers.
A CRM id for each client, when you are logged in with that client.
A mobile identifier (IDFA on Apple, Android Advertising Id on Android).
The hash of your email address (a unique fingerprint that does not let Criteo recover the email address itself), when the client or Criteo partner passes the information along.

This tracking is particularly invasive and disrespectful of your privacy because Criteo manages to link your different identifiers ("Graph"), and some of these identifiers are difficult (mobile identifiers) or almost impossible (CRM id, hash of your email address) to reset. One example among others: the Fnac application leaks the hash of your email address to Criteo, and does not provide any way to opt out of this tracking.
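To make the "Graph" idea concrete, here is a minimal sketch of how identifiers observed together can be merged into a single profile. It is only an illustration using a classic union-find structure: the class name, the identifier strings and the linking events are all hypothetical, and nothing here is taken from Criteo's actual implementation.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph: identifiers seen together end up in one profile."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Register unknown identifiers as their own root.
        self._parent.setdefault(ident, ident)
        # Walk up to the root, shortening the path as we go.
        while self._parent[ident] != ident:
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a, b):
        """Record that identifiers a and b were observed in the same session."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        return self._find(a) == self._find(b)

    def profiles(self):
        groups = defaultdict(set)
        for ident in list(self._parent):
            groups[self._find(ident)].add(ident)
        return list(groups.values())


# Hypothetical events: each login or app launch reveals two identifiers at once.
graph = IdentityGraph()
graph.link("cookie:phone", "email_hash:h1")
graph.link("idfa:phone-app", "email_hash:h1")
graph.link("cookie:laptop", "email_hash:h1")
graph.link("cookie:other-user", "crm:shop-42")

print(graph.same_person("cookie:phone", "cookie:laptop"))      # True
print(graph.same_person("cookie:phone", "cookie:other-user"))  # False
```

Resetting the phone cookie would not help in this sketch: the next login would link the new cookie to the same email hash, and therefore to the same profile.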

Once Criteo has recognized you, the second step is to collect all your purchase intentions (the Interest Map):

Purchase intentions

Regardless of the device or e-commerce site you visit, Criteo will collect:

The products you viewed.
The products you added to your cart.
The products you bought.

Criteo customers can also transmit their own customer list to Criteo, via email addresses or user identifiers such as IDFA, Android Advertising ID or Criteo Id. Criteo then indicates the percentage of users in the list who already belong to the Shopper Graph.
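The "percentage of users already in the Shopper Graph" can be pictured as a simple set intersection on hashed emails. The sketch below is an assumption about how such a match rate could be computed, not a description of Criteo's system: the function names, the choice of SHA-256, the normalization step and the sample addresses are all hypothetical.

```python
import hashlib


def email_hash(email: str) -> str:
    # Assumption: addresses are trimmed and lowercased before hashing.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_rate(customer_emails, known_hashes) -> float:
    """Share of a customer list whose hashed email is already known."""
    hashes = {email_hash(e) for e in customer_emails}
    if not hashes:
        return 0.0
    return len(hashes & known_hashes) / len(hashes)


# Hypothetical data: hashes the ad network already holds, and a client's CRM list.
known = {email_hash("alice@example.com"), email_hash("bob@example.com")}
crm_list = [
    "Alice@Example.com ",
    "carol@example.com",
    "bob@example.com",
    "dave@example.com",
]

print(f"{match_rate(crm_list, known):.0%}")  # 50%
```

Note that the client never needs to see the other party's raw email addresses for the match to work: identical addresses produce identical hashes.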

Then, the 3rd step is to recognize you so it can target you with those notorious intrusive ads, when you browse a website or media app:

Editor - advertising

Here we can note that Criteo has preferential partnerships with many media outlets, which often gives it preferential access to advertising inventory (see in particular its Direct Bidder) and also lets it obtain permanent identifiers (email addresses, logins). For media outlets with which Criteo has no preferential partnership, Criteo buys via RTB (programmatic) but has to pay a tax to the SSPs (intermediaries).

The 4th step is to measure whether you click on the ad (Criteo pays publishers for each ad display, and is paid by the advertiser not for the purchase but for the click on the ad), then to measure whether you buy from the advertiser (Measurement Data). But tracking does not stop at your "online" behavior: Criteo can also collect your "offline" purchases if its clients pass that information to it:

Offline purchases
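A quick aside on the economics of this 4th step, since Criteo pays publishers per display but is paid by advertisers per click. The difference between the two is what its investor communications call Revenue ex-TAC (revenue excluding traffic acquisition costs). The worked example below is a sketch with entirely made-up figures; the function name and every number are my own assumptions.

```python
def revenue_ex_tac(impressions: int, cpm: float, click_rate: float, cpc: float) -> float:
    """Revenue earned on clicks minus what was paid to publishers for displays."""
    traffic_acquisition_cost = impressions / 1000 * cpm  # paid per thousand displays
    revenue = impressions * click_rate * cpc             # earned per click
    return revenue - traffic_acquisition_cost


# All figures below are hypothetical, chosen only to make the arithmetic easy.
margin = revenue_ex_tac(impressions=1_000_000, cpm=1.0, click_rate=0.005, cpc=0.50)
print(round(margin, 2))  # 1500.0
```

With these numbers, one million displays cost 1,000 and generate 5,000 clicks worth 2,500. The margin depends entirely on showing ads to people likely to click, which is why recognizing you matters so much to the business.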

Criteo does not mention it in its presentation of the Shopper Graph, but its Identification presentation for investors tells us that it also collects your personal data via partnerships (slide 27):

Data sources

As you can see, Criteo reports partnerships with Liveramp, Oracle and publishers in order to retrieve identification data. You can learn more about the partnership with Liveramp via this commercial video. If we look at its website, Criteo also mentions other partnerships allowing it to associate your different identifiers:

Criteo partner identifiers

Criteo leaks the identifiers of its Shopper Graph to its customers

Criteo doesn't just track 75% of the world's buyers; it also lets the clients of its Shopper Graph (those who agree to share your personal data) retrieve the user identifiers from this graph (the Criteo Ids) for their own accounts, free of charge. Here are the options Criteo offers for leaking personal data to its clients:

Share Criteo Id

To recap:

On your smartphone, you saw a lamp on the website of e-retailer ABC.
You go back to e-retailer ABC's website, but this time on your laptop.

Criteo recognizes you, even though you never logged in to e-retailer ABC's website, because you have already logged in to other Criteo partner sites, on both your smartphone and your laptop.

And Criteo lets e-retailer ABC recognize you as well.

Criteo once again bypasses your browser's protections... and introduces a security vulnerability

Except for Chrome, browsers take steps to protect you from trackers. Many internet users also install adblockers. We have already seen that Criteo has long experience circumventing the measures taken by browsers, and continues to act in this direction so it can keep tracking you despite your wishes.

One of its latest initiatives? Encourage its advertiser and publisher clients to delegate a subdomain to Criteo by setting up a CNAME (read the detailed article by Romain Cointepas, co-founder of NextDNS, an app that effectively fights this tracking). The CNAME lets you specify that a subdomain is an alias for another domain.

It is an old technique used by certain analytics tools, and it is currently enjoying renewed interest, particularly from French companies such as Eulerian, AT Internet and now Criteo. Here is Criteo's documentation for advertisers and publishers on setting up a CNAME. And here are examples of customers who have implemented this delegation:

The domain name xgctpf.allocine.fr is an alias to dnsdelegation.io, which itself points towards gum.criteo.com
The domain name ddhhbh.alfaromeo.fr is also an alias to dnsdelegation.io which also points to gum.criteo.com

Note the subdomains with random character strings: this lets Criteo track you covertly, and it makes it hard for adblockers to keep their lists of domains to block up to date (the publisher can easily decide to change the CNAME from one week to the next). Firefox allows extensions to do CNAME resolution, which allows uBlock Origin on Firefox to properly block these calls, but other browsers do not allow this.
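To illustrate what "CNAME resolution" buys a blocker, here is a minimal sketch of the idea: follow the alias chain and test every name along the way against a blocklist. The records are hard-coded from the two examples above (modelled here as two alias hops) rather than fetched from DNS, and the blocklist and function names are hypothetical. This is not how uBlock Origin is actually written.

```python
# CNAME records taken from the two examples in this article; a real tool would
# query DNS instead of using a hard-coded table.
CNAME_RECORDS = {
    "xgctpf.allocine.fr": "dnsdelegation.io",
    "ddhhbh.alfaromeo.fr": "dnsdelegation.io",
    "dnsdelegation.io": "gum.criteo.com",
}

# Hypothetical blocklist of tracker domains.
BLOCKED_DOMAINS = {"criteo.com"}


def cname_chain(hostname, records, max_hops=10):
    """Follow CNAME aliases from hostname, returning every name visited."""
    chain = [hostname]
    while chain[-1] in records and len(chain) <= max_hops:
        chain.append(records[chain[-1]])
    return chain


def is_blocked(hostname, records=CNAME_RECORDS, blocked=BLOCKED_DOMAINS):
    """True if the hostname, or anything it is an alias for, is on the blocklist."""
    for name in cname_chain(hostname, records):
        if any(name == d or name.endswith("." + d) for d in blocked):
            return True
    return False


print(cname_chain("xgctpf.allocine.fr", CNAME_RECORDS))
# ['xgctpf.allocine.fr', 'dnsdelegation.io', 'gum.criteo.com']
print(is_blocked("xgctpf.allocine.fr"))  # True
print(is_blocked("www.allocine.fr"))     # False
```

A blocker that only looks at the first name in the chain sees a harmless-looking allocine.fr subdomain. Only by following the aliases does the criteo.com destination become visible, whatever random string the publisher picks next week.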

Criteo thus gains access to most users with an adblocker (the CNAME issue is a hot topic in the adblocker communities) or a browser configured to block third-party cookies, which lets it:

Measure all visits and conversions on the advertiser's site.
Retarget you on one of your other devices (if you don't have an adblocker on all your devices and the advertiser sends Criteo a permanent identifier such as your email address).
Theoretically identify you across different sites via fingerprinting techniques: with your IP address, or even with additional information collected from your browser. I say theoretically because I have no proof that Criteo uses fingerprinting techniques.

One more word on fingerprinting: Criteo is clearly interested in the subject, as shown by this article from the Criteo R&D blog, which details a research article on IP address based tracking:

For the past few years, web browsers have increasingly limited the persistence of identifiers (cookies), making user tracking more difficult. A revealing example is Safari’s Intelligent Tracking Prevention. This paper presents a clever way to overcome the lack of persistent identifiers without infringing on user privacy, that is without using browser fingerprinting. It consists of using community detection in the Device Graph to detect stable cohorts (person or household level grouping). It is then possible to find the IP addresses that are associated with the cohort over time and thus defining a persistent ID based on these IP addresses. This technique is called Graph backfilling. This technique reaches its limits when many people use the same IP or in the case of dynamic IPs. This is why it works like a charm in the US, but is more difficult to apply in China.

IP tracking
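To give a rough intuition of what the quoted paragraph describes, here is a deliberately naive sketch: learn which IP addresses are stably associated with a single cohort, then use the IP alone to attribute a visit that carries no cookie. This is my own simplification, not the algorithm of the research paper, and I repeat that I have no proof Criteo uses it in production. The function name, the threshold and the observations are hypothetical.

```python
from collections import defaultdict


def stable_ip_owners(history, min_days=3):
    """Map each IP to a cohort, keeping only IPs seen with a single cohort
    on at least min_days distinct days."""
    days_seen = defaultdict(lambda: defaultdict(set))  # ip -> cohort -> {days}
    for cohort, ip, day in history:
        days_seen[ip][cohort].add(day)

    owners = {}
    for ip, cohorts in days_seen.items():
        if len(cohorts) == 1:  # shared IPs are ambiguous, so skip them
            cohort, days = next(iter(cohorts.items()))
            if len(days) >= min_days:
                owners[ip] = cohort
    return owners


# Hypothetical observations: (cohort, ip, day). Addresses are documentation ranges.
history = [
    ("household-A", "203.0.113.7", 1),
    ("household-A", "203.0.113.7", 2),
    ("household-A", "203.0.113.7", 3),
    ("household-B", "198.51.100.9", 1),
    ("household-C", "198.51.100.9", 2),
    ("household-B", "198.51.100.9", 3),
]

owners = stable_ip_owners(history)
# A visit with no cookie at all can still be attributed through its IP address.
print(owners.get("203.0.113.7"))   # household-A
print(owners.get("198.51.100.9"))  # None
```

The second address is shared by two cohorts, so the sketch refuses to attribute it. This is the same limit the quote mentions for shared or dynamic IPs.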

CNAME is one of the techniques (along with logins and emails) that allows Criteo to brag to investors (slide 15) that it has “1st party” access to websites visited by users (there is no need for third-party cookies to collect your personal data):

Criteo 1st party

Here is the email sent by Criteo to its customers and partners (via f_to_k):

Email Criteo CNAME

This email seems innocuous but the CNAME technique is much less so. It introduces a security vulnerability if the partner site has not taken precautions: Criteo can then read the cookies placed on the partner site's domain. Let's study an example by browsing allocine.fr with Safari and the Charles Proxy tool:

Criteo Allocine Cookies

As we can see, xgctpf.allocine.fr (aka Criteo) collects the cookies placed on allocine.fr (which are not meant for it):

Identifier cookies placed by Google Analytics: _ga, _gat, _gid, _gads
A Facebook ID cookie: _fbp
Cookies storing your geolocation: geocode, geolevel1, geolevel2, geolevel3
Your Allocine authentication cookies (I was logged in): ACAUTH, ACCT, ACID, GraphToken

Through the various "stolen" identifiers, Criteo can enrich its Shopper Graph, but that is not the most serious part: through your authentication cookies, Criteo can log in to your own Allociné account! Here are the steps to verify this:

Retrieve authentication cookies from Allocine via Safari and Charles Proxy.
Go to allocine.fr from Chrome, make sure you are logged out.
Use the Chrome extension EditThisCookie to create your authentication cookies.
Refresh the page, and you're logged in!

Criteo connects

Bonus: Allociné does not offer a secure version of its website, so Criteo is not the only one that can intercept your authentication cookies. Your ISP and the machines sitting between your device and Allociné's servers can also log in as you.

This major security flaw is probably present on many partner sites because Criteo has already managed to convince more than 10,000 partners to install a CNAME (Allociné was a benign example; most partners are e-commerce sites that can hold much more sensitive information, such as your credit card).

Read this conversation on the r/adops subreddit: adding a CNAME is far from trivial, contrary to what the Criteo email suggests. How can this information leak be avoided while still using the Criteo CNAME? By preventing subdomains from reading the cookies of the main domain (see the Mozilla documentation, “Cookie Scope”).
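The cookie scope rule can be summarized in a few lines. The sketch below is a simplified model of how a browser decides whether to attach a cookie to a request, based on whether the cookie was set with a Domain attribute. It ignores paths, the Secure flag, IP addresses and public suffixes, and the function name is my own.

```python
def cookie_is_sent(request_host: str, cookie_domain: str, host_only: bool) -> bool:
    """Simplified browser rule deciding whether a cookie goes to a given host.

    host_only=True models a cookie set WITHOUT a Domain attribute: it is returned
    only to the exact host that set it. host_only=False models a cookie set with
    Domain=<cookie_domain>: it is returned to that domain and all its subdomains.
    """
    request_host = request_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    if host_only:
        return request_host == cookie_domain
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


tracker_host = "xgctpf.allocine.fr"  # the delegated subdomain seen earlier

# Cookie set with Domain=allocine.fr: sent to every subdomain, tracker included.
print(cookie_is_sent(tracker_host, "allocine.fr", host_only=False))     # True
# Cookie set by www.allocine.fr without a Domain attribute: host-only.
print(cookie_is_sent(tracker_host, "www.allocine.fr", host_only=True))  # False
```

In other words, a site that delegates a subdomain should make sure its session cookies are host-only, so that they are never sent to the delegated host.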

Nevertheless, Criteo still operates with complete impunity: the CNAME technique was the subject of a Journal du Net article in which Criteo, ID5 and Prisma Media justify the CNAME practice. Prisma Media's programmatic manager even accuses browsers of overstepping their role by wanting to protect internet users' privacy:

It is frankly problematic that a browser decides what is fair or not by claiming to protect users' privacy, when that is the role of consent management. If the person does not give their consent, of course we do not place a cookie.

For Criteo too, the browser should not "make decisions" on behalf of the internet user, who must be able to consent (or not) to advertising surveillance. But how does Criteo respond to the need to obtain user consent before tracking them?

For collecting consent, Criteo relies on partner advertisers and publishers

The Criteo page explaining the use of personal data gives a good idea of the extent of the personal data collected and how it is used. The legal basis put forward by Criteo to justify this capture of your personal data while complying with the GDPR is consent. But for Criteo, obtaining consent is the sole responsibility of its partners, advertisers and publishers:

Criteo's processing operations comply with current regulations in countries requiring user consent for the use of cookies or any other similar technology. This consent is collected on the websites and mobile applications of Advertisers and Publishers.

Criteo emphasizes this point in its privacy policy:

Note that the use of Criteo technologies is governed by the privacy policies published on the websites and mobile applications of our partners. They are required to provide complete and appropriate information and, to the extent required by law, to obtain your consent before disclosing any personal data about you.

Criteo even indicates that it contractually requires its partner advertisers and publishers to obtain user consent before implementing trackers:

Criteo contractually requires that Advertisers and Publishers respect its general Editorial Charter and its Charter dedicated to partners, as well as the various regulations in force on the protection of personal data, in particular the GDPR. By using Criteo services, they undertake, to the extent that regulations require it: [...] to obtain the consent of users before implementing cookies or other similar technologies, for the purpose of serving personalized ads.

However, Criteo does nothing to enforce this contract, as can be seen with the example of Fnac. Obtaining genuine consent from users before it could identify them would amount to Criteo shutting up shop, hence its doublespeak.

Criteo twists the definition of consent

Criteo is no stranger to doublespeak. So on the one hand it will explain to users in its privacy policy that it respects the notion of consent and contractually asks its partners to apply it. On the other hand, it tells its partners that they do not need to obtain explicit consent from users.

In its "Criteo Privacy Guidelines for Customers and Publisher Partners", Criteo advises its clients on the information clauses to include in their privacy policies. Again, Criteo tells customers that they have an obligation to obtain consent from users within the EU. But what does consent mean for Criteo? The definition is closer to a simple information obligation:

In particular, according to EU laws, the request for consent is considered valid when: Users are informed about the use of cookies and technologies other than cookies by Criteo for the purpose of offering targeted advertising when giving their consent.

Criteo even goes so far as to suggest exploiting the loophole left open by the CNIL, which still allows almost the entire French web to treat continued browsing (a simple scroll or a click through to a new page) as consent:

Suggested cookie notice for countries where consent is required By continuing to browse our site, you accept the use of cookies and non-cookie technologies to provide you with personalized content and advertising across the sites.

As the European Data Protection Board (an independent European body whose objectives are to ensure the consistent application of the GDPR and to promote cooperation between EU data protection authorities) points out, consent must be clear, affirmative and unambiguous. Its latest guidelines were published on May 4; they state, for example, that scrolling does not constitute consent:

scroll GDPR consent

In an article titled "GDPR: Criteo is ready to take on the challenge", Criteo details its vision of consent, considering that it does not need to obtain explicit consent from the user:

The GDPR establishes a clear distinction between unambiguous consent and explicit consent. Explicit consent implies an express choice on the part of the user. This applies, for example, to the collection of sensitive data such as race, religion, sexual orientation, political affiliation and health. In contrast, online tracking devices (e.g. cookies) are categorized as simple personal data. Also, according to the new regulation, an express opt-in is not required with regard to classic retargeting cookies which do not collect sensitive data.

Except that this behavior violates the GDPR, which requires clear, affirmative and unambiguous consent from the user. The CNIL must also move forward to bring its doctrine into line with the GDPR: it started the process in July 2019, but is now using the coronavirus crisis as an excuse to pause this necessary adaptation.

Criteo even wrote a GDPR white paper detailing its fallacious arguments. One thing remains clear: Criteo could not survive if it relied on genuine user consent. It therefore feels obliged to twist the definition of consent.

Criteo's lies in its privacy policy

In its privacy policy, Criteo indicates that it does not receive personal data:

No personal data (surname, first name, postal address, unencrypted email address or other) is communicated to us.

This is false: Criteo customers can send your email address to Criteo in plain text, as indicated on this support page to “create an audience”:

A CRM email address file containing full addresses, email addresses encrypted by MD5 or SHA256 hash of MD5 (full addresses>MD5>SHA256).

If the client sends email addresses in the clear, Criteo says it encrypts them before storing them, so you simply have to take its word for it.
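For the curious, here is what the "full addresses>MD5>SHA256" chain from the support page could look like. This is a sketch under assumptions: the normalization (trim and lowercase) and the choice to apply SHA-256 to the hexadecimal MD5 string rather than to the raw bytes are my guesses, and the function name is hypothetical.

```python
import hashlib


def md5_then_sha256(email: str) -> str:
    # Assumption: the address is trimmed and lowercased first; the exact
    # normalization expected by the ad platform is not given in the article.
    normalized = email.strip().lower().encode("utf-8")
    md5_hex = hashlib.md5(normalized).hexdigest()
    # Assumption: SHA-256 is applied to the hexadecimal MD5 string.
    return hashlib.sha256(md5_hex.encode("ascii")).hexdigest()


first = md5_then_sha256("jane.doe@example.com")
second = md5_then_sha256("  Jane.Doe@Example.com")

print(len(first))       # 64
print(first == second)  # True
```

The important property is that hashing is deterministic: the same address always yields the same value, on every site and every device. What Criteo calls "encrypted" is therefore a stable identifier that can be matched from one source to another.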

Still in its privacy policy, Criteo admits that the data it claims is not personal (pseudonymous data) is indeed considered personal data in the European Union and California:

However, this information is considered personal data under the EU General Data Protection Regulation (GDPR) as well as the California Consumer Privacy Act (CCPA).

Another lie: in the “Criteo Commitments” section, we can read:

Criteo ads do not in any way involve collecting the following data: [...] persistent identifiers, such as identifiers of the devices you use (UDID, MAC address, etc.)

Except that the hash of your email address is indeed a persistent identifier. Also, this "commitment" is in contradiction with the “Online identification at Criteo” deck for investors. On slide 16, Criteo indicates that 96% of the “identities” (= users) in its “Identity Graph” contain at least one persistent identifier:

Identity Graph Criteo - persistent identifiers

On slide 22, Criteo also states that it uses third parties to obtain more persistent identifiers, and thus to stop depending on cookies:

persistent identifier partners

Criteo's duplicity is blatant: on one side, a commitment to users not to collect persistent identifiers "under any circumstances"; on the other, facilitating the collection of these persistent identifiers from partner advertisers and publishers (as a reminder, the support page showing how to send Criteo a list of emails) and communicating these persistent identifiers to investors.

Disabling Criteo services on mobile apps does not work

As seen with the example of Fnac on iOS, Criteo continues to collect a hash of your email address, even when you have disabled ad tracking:

Limit advertising tracking

Yet its page "Disable Criteo services on mobile apps" indicates:

Criteo withdrawal consent

Here we find a double lie from Criteo: my email address (or even a hash of it) is a persistent identifier, yet Criteo collects it (first lie), even if I disable ad tracking (second lie).

Repeated and documented abuses, but still no sanction

Criteo's unethical practices have long been documented and denounced. Here are a few examples:

Privacy International's complaint against Criteo, Quantcast and Tapad for violation of the GDPR dates from November 2018; it took 16 months for the CNIL to react and start its investigation.
The CNAME technique is just one of several that Criteo has developed to circumvent the protections put in place by browsers; the EFF has documented Criteo's previous attempts to circumvent Safari ITP.
These techniques were detailed by Gotham City Research LLC in a dedicated report.
Another report from Gotham City Research LLC denounces widespread fraud on the inventory managed by Criteo. Note that these reports were not disinterested, since Gotham openly bet against Criteo's share price; the fact remains that the practices denounced were proven.
Criteo is part of the Acceptable Ads committee, an initiative created by Adblock Plus, which allows Criteo to display advertisements even if you have installed Adblock Plus or other adblockers supporting the initiative (these advertisements are "adapted": they take up a little less space than traditional Criteo advertisements). This is shocking because Criteo ads are particularly intrusive, but “Acceptable Ads” is “only” concerned with visual pollution.

However, nothing happens. Given the history of the CNIL, it is reasonable to doubt that there will be any real sanctions against a French digital champion with significant economic and political clout, as demonstrated by Bruno Le Maire's visit for the 1st anniversary of Criteo's artificial intelligence lab, last October. For Bruno Le Maire, Criteo is "one of the great French successes of the last 15 years":

Tweet Bruno Le Maire - Criteo

What to do then? Unfortunately, while waiting for real political ambition, the solutions remain individual and technical: use an adblocker such as uBlock Origin combined with Firefox on the web (or other privacy-friendly browsers such as Brave and Safari), use apps such as DNSCloak, Adguard or NextDNS on iOS, or even install Pi-hole on a Raspberry Pi if you enjoy technical projects.