Analytics and advertising tools, what risks for your privacy?

How marketing companies that monitor you work and what options you have to opt out of being tracked

Published by Pixel de Tracking on May 10, 2020

When you browse the web or use a mobile application, you are tracked by many companies. Here is a (partial) overview of the different analytics and advertising tools and their consequences for your privacy.

Analytics tools

These tools aim to provide usage statistics for a website or application; for example, they allow you to track:

  • Indicators such as the number of visitors, sessions, page views or conversions.
  • The pages and screens viewed.
  • Traffic sources (direct, search engines, social networks, other websites).
  • Visitor characteristics: region, device, browser, screen size, etc.
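
As a rough illustration of the indicators listed above, here is a minimal Python sketch that aggregates a list of raw "hits" into visitors, sessions, page views and traffic sources. The `Hit` fields and the sample data are hypothetical and do not reflect the data model of any real analytics tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    visitor_id: str  # pseudonym of the visitor (hypothetical field)
    session_id: str  # session number for that visitor
    page: str        # page or screen viewed
    source: str      # "direct", "search", "social" or "website"


def summarize(hits):
    """Aggregate chronologically ordered hits into basic usage indicators."""
    first_hit_per_session = {}
    for hit in hits:
        # Keep only the first hit of each session: it tells how the visit started.
        first_hit_per_session.setdefault((hit.visitor_id, hit.session_id), hit)
    return {
        "visitors": len({hit.visitor_id for hit in hits}),
        "sessions": len(first_hit_per_session),
        "page_views": len(hits),
        "pages": Counter(hit.page for hit in hits),
        "sources": Counter(h.source for h in first_hit_per_session.values()),
    }


# Made-up sample data.
hits = [
    Hit("v1", "s1", "/home", "search"),
    Hit("v1", "s1", "/article", "search"),
    Hit("v2", "s1", "/home", "direct"),
    Hit("v1", "s2", "/home", "social"),
]
report = summarize(hits)
```

Even this tiny report only needs a pseudonym per visitor, not a name or an email address.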

On the web, analytics tools working with 1st party cookies

Most analytics tools compartmentalize data by customer. For example, your navigation path on L'Équipe is of no use for producing statistics for the Le Monde site. On the web, this separation is technically enforced by the use of 1st party cookies: you are tracked via a pseudonym placed on the client's domain (example: lequipe.fr), and this pseudonym cannot technically be read by another domain (example: lemonde.fr). Here are examples of tools using 1st party cookies:

  • Google Analytics: Google's analytics tool is present on most websites and many applications. By default, it works via 1st party cookies.
  • Adobe Analytics: via the acquisition of Omniture in 2009, Adobe offers an analytics tool widely used by large enterprises.
  • AT Internet: French analytics tool, still quite popular on French media sites.
  • Matomo: formerly Piwik, an open source analytics tool that can be self-hosted (as on this blog).
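
The separation enforced by 1st party cookies can be sketched with a toy model in Python. This is a simplification made for illustration: the `Browser` class and the `visitor_id` cookie name are hypothetical, and real browsers apply more detailed scoping rules (subdomains, paths, expiry).

```python
import uuid


class Browser:
    """Toy cookie store: one separate jar per domain."""

    def __init__(self):
        self.cookies = {}  # domain -> {cookie name: value}

    def visit(self, site_domain):
        """A 1st party analytics tool stores its pseudonym under the visited domain."""
        jar = self.cookies.setdefault(site_domain, {})
        jar.setdefault("visitor_id", uuid.uuid4().hex)
        return jar["visitor_id"]

    def read_cookie(self, requesting_domain, name):
        """A domain can only read the cookies stored in its own jar."""
        return self.cookies.get(requesting_domain, {}).get(name)


browser = Browser()
id_lequipe = browser.visit("lequipe.fr")
id_lemonde = browser.visit("lemonde.fr")
```

The two sites end up with two unrelated pseudonyms, so neither can link your visits to the other, while a return visit to the same site reuses the same pseudonym.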

1st party cookies have the “advantage” of being more durable than 3rd party cookies. In order to protect the privacy of their users, browsers are increasingly blocking 3rd party cookies (Safari, Firefox and Brave lead the way; Chrome comes last but has announced that it will block 3rd party cookies within 2 years).

Please note that when you log in to a website, you lose your anonymity: the site can potentially combine your on-site journey, regardless of the device you use, with CRM data already recorded about you (your subscription, your purchases, etc.). Without sending any personal data directly, some sites simply send your customer ID to the analytics tool, then export the raw data from the analytics tool to their Business Intelligence tool for subsequent analysis.
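
To make this concrete, here is a minimal Python sketch of such a join between an analytics export and CRM records. The field names, the customer ID format and the data are all made up for illustration.

```python
# Hypothetical CRM records, keyed by customer ID.
crm = {"C-1001": {"subscription": "premium", "purchases": 3}}

# Hypothetical raw export from the analytics tool: no name, no email,
# only the customer ID that the site sent once the user was logged in.
analytics_export = [
    {"customer_id": "C-1001", "device": "mobile", "page": "/offers"},
    {"customer_id": "C-1001", "device": "desktop", "page": "/account"},
    {"customer_id": None, "device": "desktop", "page": "/home"},  # not logged in
]


def enrich(rows, crm_records):
    """Attach CRM attributes to every analytics row that carries a known customer ID."""
    enriched = []
    for row in rows:
        record = crm_records.get(row["customer_id"])
        if record is not None:
            enriched.append({**row, **record})
    return enriched


joined = enrich(analytics_export, crm)
```

The customer ID alone is enough to tie the mobile and desktop visits to the same subscriber.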

How to avoid this tracking? You can install an adblocker such as uBlock Origin.

On the web, other analytics tools work with 3rd party cookies

However, some web analytics tools can track you across multiple sites via 3rd party cookies: your pseudonym is placed on the domain of the analytics tool, so the tool can read it regardless of the site you visit. Obviously the impact on your privacy is worse: the tool is then capable of profiling you via your browsing on each of the websites where it is installed. We can cite as examples:

  • Google Analytics: Google's free analytics tool is present on most websites and many applications. By default on the web, Google Analytics installs 1st party cookies, but offers an opt-in option for its customers to activate 3rd party cookies (on the doubleclick.net domain), in addition to 1st party cookies. This option allows the client website to enable certain advertising features such as remarketing, but also to obtain aggregated information on the visitor profile (demographics and interests). Activating this feature obviously allows Google to profile you even better.
  • Quantcast: this advertising company offers a free analytics tool for publishers. This tool allows publishers to better understand their audience, but above all allows Quantcast to enrich its database of user profiles.
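
The difference with 3rd party cookies can be sketched in Python as follows. This is a toy model under stated assumptions: the `ThirdPartyTracker` class, the `uid` cookie name and the `tracker.example` domain are hypothetical.

```python
import uuid
from collections import defaultdict


class ThirdPartyTracker:
    """Toy tracker embedded on several sites, with a cookie on its own domain."""

    def __init__(self, domain):
        self.domain = domain
        self.profiles = defaultdict(list)  # pseudonym -> sites visited

    def handle_request(self, browser_cookies, visited_site):
        # The cookie lives in the tracker's own jar, so the same pseudonym
        # comes back whichever site embeds the tracker.
        jar = browser_cookies.setdefault(self.domain, {})
        pseudonym = jar.setdefault("uid", uuid.uuid4().hex)
        self.profiles[pseudonym].append(visited_site)
        return pseudonym


browser_cookies = {}  # domain -> {cookie name: value}
tracker = ThirdPartyTracker("tracker.example")
uid_first = tracker.handle_request(browser_cookies, "lequipe.fr")
uid_second = tracker.handle_request(browser_cookies, "lemonde.fr")
```

Both visits are recorded under a single pseudonym, which is what makes cross-site profiling possible.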

The Google Analytics audience building tool allows advertisers to define very precise targets to then retarget them:

[Image: Google Analytics audience builder]

Here too, an adblocker such as uBlock Origin will protect you.

Analytics tools on Apps

The degree of monitoring by analytics tools is higher on Apps. First of all, these analytics tools access more persistent user identifiers that are available to all Apps: in particular Apple's IDFA and Google's AAID. Users can deactivate these identifiers, but the options are well hidden (and the monitoring does not stop there: other "1st party" identifiers then take over). In comparison, 3rd party cookies are increasingly blocked by browsers and by privacy extensions.

Also, analytics tools dedicated to Apps (or that started out by offering their services to Apps) offer features to track users individually, which is still rarely the case for tools specialized in web analytics (in general these tools are older and lag behind in terms of features). In addition, these tools often collect non-anonymized personal data such as your name or your email address, which remains rare with web analytics tools. For example, here is how Mixpanel sells its solution on its site:

[Image: screenshot from the Mixpanel website]

These companies can therefore combine a lot of information about you. Some of them specialize in Analytics and have no commercial reason to combine your personal data from different Apps, but other companies also provide advertising services and thus allow themselves to combine your personal data. Here are examples identified during application testing on this blog:

  • Mixpanel: pure analytics tool that started with Apps.
  • Amplitude: another pure analytics tool that started with Apps.
  • Adjust: analytics tool for Apps also offering attribution (knowing which advertising campaigns are effective), fraud prevention, audience segmentation and retargeting.
  • AppsFlyer: another analytics tool for Apps offering a multitude of services such as marketing analysis, fraud prevention or attribution.

Protecting yourself becomes more complicated here, with Apps rarely giving you control over these tools. You will need to use apps such as DNSCloak, Adguard or NextDNS on iOS.
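
The general idea behind blocking by hostname can be sketched in a few lines of Python. This is only an illustration of blocklist matching, not a description of how any of the apps above is implemented; the blocklist entries are hypothetical.

```python
# Hypothetical blocklist of tracker domains.
BLOCKLIST = {"tracker.example", "ads.example"}


def is_blocked(hostname, blocklist=BLOCKLIST):
    """Return True if the hostname or any of its parent domains is on the blocklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "api.tracker.example", then "tracker.example", then "example".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

With this sketch, `is_blocked("api.tracker.example")` is true because a parent domain is listed, while `is_blocked("news.example")` is false. A filter of this kind refuses to resolve or connect to blocked hostnames, whichever App makes the request.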

Advertising solutions

Since advertising is very often based on your behavior, these solutions do not respect your privacy. The risk nevertheless differs depending on the type of company and its purpose. Here is a diagram from the Ad Ops Insider site (more details on that site) summarizing the exchanges between the different actors involved in delivering an advertisement to you:

[Image: diagram of the RTB exchanges, from Ad Ops Insider]

Now let's look at the different tools involved, and their implications for your privacy.

Tools operating on behalf of a publisher or advertiser

These tools do not need to combine your behavioral data from multiple customers to work well. Here are the main tools on the publisher side (the website on which the advertisements are displayed):

  • The publisher adserver: the “conductor”, the tool that decides which advertising campaigns to display when you visit a media site, and measures their delivery on behalf of the publisher. It has to arbitrate between direct sales (advertising campaigns sold directly by the publisher, often with a fixed number of ad impressions) and indirect sales (advertising that the publisher does not control, but delegates to ad-networks and SSPs). The publisher adserver does not need to know your web browsing to function correctly, "just" your behavior on the publisher's site.
  • The SSP (Supply-Side Platform) or ad-exchange: the programmatic marketplace, whose role is to auction off the publisher's advertising opportunities. It is connected to numerous DSPs (programmatic buying platforms operated by advertisers) and ad-networks (intermediaries that have often developed their own programmatic buying platforms). It does not need to know your behavior on the web to work properly either. Note that the publisher often puts several SSPs in competition using a mechanism called “Header Bidding”.
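
The arbitration described above can be sketched as follows in Python. This is a deliberately simplified model: comparing a direct campaign's price with the best SSP bid is an assumption made for illustration (real adservers also handle priorities and pacing), and the names `DirectCampaign` and `pick_ad` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DirectCampaign:
    name: str
    remaining_impressions: int  # impressions still owed to the advertiser
    value_cpm: float            # price agreed in the direct sale


def pick_ad(direct_campaigns, ssp_bids):
    """Choose between direct campaigns and the best bid among competing SSPs.

    ssp_bids maps an SSP name to its best bid (CPM) for this impression.
    """
    best_ssp, best_bid = max(
        ssp_bids.items(), key=lambda item: item[1], default=(None, 0.0)
    )
    eligible = [c for c in direct_campaigns if c.remaining_impressions > 0]
    best_direct = max(eligible, key=lambda c: c.value_cpm, default=None)
    if best_direct is not None and best_direct.value_cpm >= best_bid:
        best_direct.remaining_impressions -= 1
        return ("direct", best_direct.name)
    if best_ssp is not None:
        return ("indirect", best_ssp)
    return ("none", None)


campaigns = [DirectCampaign("brand_campaign", remaining_impressions=1, value_cpm=2.0)]
first = pick_ad(campaigns, {"ssp_a": 1.5, "ssp_b": 1.8})
second = pick_ad(campaigns, {"ssp_a": 1.5, "ssp_b": 1.8})
```

In this example the first impression goes to the direct campaign, and once its single impression is used up, the second goes to the highest-bidding SSP. Nothing in this decision requires knowing your browsing on other sites.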

Some solutions combine both a publisher adserver and an SSP: we find in particular Google Ad Manager (dominant player), AppNexus (renamed Xandr since its buyout by AT&T), Freewheel (bought by Comcast) or Smart AdServer (French player). Many solutions only offer an SSP.

On the advertiser side, here are the main tools:

  • The advertiser adserver: the tool in charge of delivering advertising and measuring its effectiveness on behalf of the advertiser. It measures all of the advertiser's campaigns: direct and indirect purchases (via ad-networks and DSPs). The advertiser adserver does not need to know your web browsing to function correctly, "just" to remember your different interactions with the advertiser's advertisements.
  • The DSP (Demand-Side Platform): the programmatic buying platform, whose role is to buy advertising on behalf of the advertiser, on the right sites, for the right target (the most relevant users) and at the right price. This tool does not need to know your behavior on the web to work correctly, but it will be able to bid more intelligently if it knows your history with the advertiser. Note that the advertiser can use several DSPs in order to put them in competition.

Some solutions combine both an advertiser adserver and a DSP: here again we find Google via Display & Video 360 (dominant player), but also Adform. Many solutions only offer a DSP.

Publishers and advertisers may also use DMPs (Data Management Platforms). These tools allow them to collect your browsing data, combine it with your personal data from a CRM (subscriptions, purchases, etc.) and transfer it if necessary to their advertising tools. Example applications:

  • A publisher will be able to sell an advertising campaign to Sony PlayStation, targeted at subscribers to its video games newsletter. The publisher's DMP collects subscribers to the video game newsletter, then transfers this "target" to the publisher's server, which will allow it to broadcast the advertising campaign to the right target.
  • An e-commerce site wants to exclude people who have already installed its application from its advertising campaign promoting the application. The advertiser's DMP collects the profiles of users who have installed the application, then transfers this "target" to the advertiser's DSP, which will allow it to exclude the target from the advertising campaign.
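
Both use cases boil down to set operations on user identifiers, as in this minimal Python sketch. The user IDs and the `build_target` helper are hypothetical.

```python
# Made-up user identifiers collected by a DMP.
newsletter_subscribers = {"u1", "u2", "u3"}  # publisher side
app_installers = {"u2", "u4"}                # advertiser side
all_known_users = {"u1", "u2", "u3", "u4", "u5"}


def build_target(include, exclude=frozenset()):
    """Build the list of user IDs to hand over to the adserver or the DSP."""
    return set(include) - set(exclude)


# Use case 1: target the subscribers of the video game newsletter.
publisher_target = build_target(newsletter_subscribers)

# Use case 2: promote the application only to users who have not installed it.
advertiser_target = build_target(all_known_users, exclude=app_installers)
```

In both cases, a list of identifiers describing what you did leaves the DMP and reaches another advertising tool.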

These targeted advertising campaigns automatically result in a leak of your personal data to these adtech players.

The top companies offering DMPs are marketing giants such as Oracle, Salesforce or Adobe. These companies offer many other marketing tools such as CRMs, and are thus able to cover most of an advertiser's customer management needs.

Programmatic, or the widespread leak of your personal data

In theory, these tools do not need to track you with a single pseudonym across the entire web or across all Apps in order to function: tracking within the scope of their client would be enough, just as for analytics tools using 1st party cookies. This is nevertheless what they do (via 3rd party cookies), and it is in particular what makes programmatic buying work.

The DSPs and ad-networks that buy advertising space programmatically “need” to know you in order to bid intelligently. However, they do not have direct access to your device: they are called by the SSPs, which do. On Apps, this is not a problem because the SSPs send your advertising identifier (Apple's IDFA, Google's AAID).

On the web, you do not have a unique identifier shared by all the sites you visit, so the SSPs must synchronize your identifier with the connected DSPs. For example, DSP 1, which recognizes you via the identifier "123", learns that you have the identifier "xyz" with SSP A, which allows it to recognize you when SSP A sends it an advertising opportunity. If you want to go deeper, the cookie synchronization mechanism is very well explained on the Ad Ops Insider website, from which the diagram below comes:

[Image: cookie synchronization diagram, from Ad Ops Insider]
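
The result of this synchronization can be pictured as a match table kept by the DSP. The Python sketch below reuses the example identifiers "123" and "xyz"; the `MatchingDsp` class and its methods are hypothetical and only model the outcome of the sync, not the redirects that carry it.

```python
class MatchingDsp:
    """Toy DSP that keeps a match table filled by cookie synchronization."""

    def __init__(self):
        self.match_table = {}  # (SSP name, SSP user ID) -> DSP user ID

    def sync(self, ssp_name, ssp_user_id, dsp_user_id):
        """Called when a synchronization pixel tells the DSP the SSP's ID for a user."""
        self.match_table[(ssp_name, ssp_user_id)] = dsp_user_id

    def recognize(self, ssp_name, ssp_user_id):
        """Called when an SSP sends an advertising opportunity with its own ID."""
        return self.match_table.get((ssp_name, ssp_user_id))


dsp_1 = MatchingDsp()
dsp_1.sync("SSP A", "xyz", "123")
```

After the sync, `dsp_1.recognize("SSP A", "xyz")` returns "123", while an identifier from an SSP that never synchronized stays unknown to the DSP.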

Let's summarize the leak of your personal data:

  • On Apps, SSPs leak your personal data to numerous DSPs and ad-networks (sometimes hundreds) without prior synchronization of identifiers. The leak of your personal data is entirely hidden (it happens between the servers of the SSPs and the servers of the DSPs): you will only be able to see the advertisement of the DSP that won the auction, but each of the DSPs called will have been able to enrich your user profile.
  • On the web, since the SSPs have to synchronize your identifiers with the connected DSPs beforehand, you can see these “ID synchronization pixels” going by, and they cause additional delays.
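
The first point can be sketched as a toy auction in Python: every DSP that is called sees your advertising identifier, even though only one of them wins. The `ListeningDsp` class, the fixed bids and the identifier value are hypothetical.

```python
class ListeningDsp:
    """Toy DSP that records every bid request it receives."""

    def __init__(self, name, fixed_bid):
        self.name = name
        self.fixed_bid = fixed_bid
        self.profiles = {}  # advertising ID -> list of observed contexts

    def bid(self, advertising_id, context):
        # The profile is enriched whether or not this DSP wins the auction.
        self.profiles.setdefault(advertising_id, []).append(context)
        return self.fixed_bid


def run_auction(advertising_id, context, dsps):
    """The SSP sends the opportunity to every connected DSP and keeps the best bid."""
    bids = {dsp.name: dsp.bid(advertising_id, context) for dsp in dsps}
    return max(bids, key=bids.get)


dsps = [
    ListeningDsp("dsp_1", 1.0),
    ListeningDsp("dsp_2", 2.5),
    ListeningDsp("dsp_3", 0.5),
]
winner = run_auction("ad-id-0000", {"app": "news"}, dsps)
```

Only the winner's advertisement is displayed, but all three DSPs have recorded that this identifier was seen in this App.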

These leaks of your personal data are not limited to interactions between SSPs and DSPs; that is a simplification. They also occur with other players in the advertising chain, such as (non-exhaustive list):

  • Solutions that detect fraud (advertising attracts mafias because it is a lucrative market; a significant share of the advertisements delivered are never seen by humans but only by bots).
  • Viewability measurement solutions (unscrupulous publishers like to put advertisements at the bottom of the page, where you will never see them).
  • Solutions that sell user data: for example, social media sharing tools such as ShareThis collect your browsing data for resale.
  • Attribution solutions, which will measure each of your advertising interactions to evaluate which advertising campaigns are the most effective.

On this subject, you can read the elements of Brave's complaint against Google and the IAB (Interactive Advertising Bureau, the pressure group for adtech companies) regarding the violation of the GDPR by RTB (Real-Time Bidding: programmatic advertising). The complaint was filed in September 2018, and the ICO (the UK data protection authority) has put the investigation on hold due to the coronavirus, so do not expect a quick outcome.

Here is an overview of the main players (again, non-exhaustive list) involved in the advertising chain:

[Image: adtech Lumascape]

What can you do? If you follow the recommendations of these actors, you can install opt-out cookies for each of them, which is not very practical. Also, these actors co-constructed the Transparency & Consent Framework (TCF), a protocol for transmitting information regarding your consent. But the TCF does not work correctly:

  • As we have already seen, the consent banners on which the TCF is based use Dark Patterns to make refusing tracking difficult, not to mention that they do not work properly.
  • The TCF is a communication protocol between adtech players; nothing obliges them to respect the signal they receive.
  • In particular, withholding your consent does not prevent these actors from collecting your personal data or even profiling you. Some of them consider that they only have to deactivate personalized advertising.
  • Controls, and therefore sanctions, are almost non-existent; the advertising industry claims that self-regulation is enough.

Consequences of this ultra-complex ecosystem, where anything goes:

  • Your personal data leaks to hundreds of different tools, with no real control possible.
  • Publishers receive only half of the money spent by advertisers, with middlemen each taking a commission.

[Image: waterfall chart of programmatic spend]

Study on programmatic transparency. Note that the study cannot explain 15% of the money spent.
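
As a quick worked example of these two figures, here is a small Python calculation. It assumes that the unexplained 15% is counted inside the half that never reaches the publisher; the function name and the rounded percentages are illustrative.

```python
def split_spend(advertiser_spend, publisher_pct=50, unexplained_pct=15):
    """Split an advertiser's spend using the rounded shares quoted above.

    Assumption: the unexplained share is part of what does not reach the publisher.
    """
    publisher = advertiser_spend * publisher_pct / 100
    unexplained = advertiser_spend * unexplained_pct / 100
    identified_fees = advertiser_spend - publisher - unexplained
    return publisher, unexplained, identified_fees


publisher, unexplained, identified_fees = split_spend(100)
```

Under that assumption, for every 100 spent by an advertiser, about 50 reaches the publisher, 15 cannot be traced, and the remaining 35 corresponds to identified intermediary fees.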

Ad-networks, intermediaries working with both publishers and advertisers

SSPs and DSPs are tools that publishers and advertisers have control over (“self-service” tools):

  • The commission is fixed in the contract (between 5% and 15% in general).
  • The configuration of the SSP is the responsibility of the publisher (minimum sales price, advertisers accepted, advertising formats accepted, preferential agreements for certain brands).
  • The setting of advertising campaigns is the responsibility of the advertiser (or the agency operating the DSP, subject to validation by the advertiser): choice of distribution sites, targeting, advertising formats or bidding strategy to achieve the objective.

Ad-networks, on the other hand, do not leave control to the advertiser or publisher:

  • The publisher has minimal control via its SSP if the ad-network buys programmatically.
  • The advertiser cannot decide in advance on which sites or applications its ads will run.
  • It often does not have access to detailed reporting on its advertising campaign.
  • It does not choose its bidding strategy itself, but delegates decisions to the ad-network.
  • In return, the effort required is minimal.
  • The ad-network commission is often opaque but easily rises to 30% (Google AdSense) or even 50% (Criteo).
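
To compare orders of magnitude, here is a small Python calculation of what is left for the publisher under the commission rates quoted above. Applying the commissions one after the other, and picking 10% for both the DSP and the SSP, are assumptions made for illustration; the other intermediaries mentioned earlier are ignored.

```python
def publisher_revenue(spend, commission_pcts):
    """Apply each intermediary's commission in turn to what remains of the spend."""
    remaining = spend
    for pct in commission_pcts:
        remaining -= remaining * pct / 100
    return remaining


self_service = publisher_revenue(100, [10, 10])   # one DSP, then one SSP, 10% each
ad_network_low = publisher_revenue(100, [30])     # ad-network taking 30%
ad_network_high = publisher_revenue(100, [50])    # ad-network taking 50%
```

With these illustrative rates, the publisher keeps about 81 out of 100 through self-service tools, against 70 or 50 through an ad-network.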

Why then go through an ad-network? For 2 main reasons:

  • The campaign will be less costly to operate (no need for complex settings on a DSP).
  • The results will often be better (these ad-networks monitor you extensively, and your personal data allows them to be more effective).

One might believe that ad networks are in the minority compared to DSPs and SSPs, operated directly by advertisers and publishers, but this is not the case:

  • On the web, Google AdSense represents a considerable part of publishers' revenues.
  • Still on the web, intermediaries such as Criteo also have a very significant weight. They buy programmatically but can also buy directly from publishers to avoid the SSP commission.
  • On Apps, Google and Facebook ad networks are very powerful: Google Admob and Facebook Audience Network.
  • Still on Apps, programmatic has more difficulty establishing itself because advertising formats are often customized, and fit less easily into the mold of programmatic standardization. Ad networks are still very powerful.

For your privacy, these ad networks are a disaster because in order to earn more money, they have to profile you better. Here is “their” virtuous circle:

  • Capture of your personal data via the distribution of targeted advertisements (or simply by “listening” to advertising opportunities on programmatic).
  • For some (Google, Facebook, Twitter, Pinterest, LinkedIn), additional capture of your personal data via essential B2C services (search engine, social networks, professional network, etc.).
  • For some (Google, Facebook, Quantcast, etc), additional capture of your personal data via analytics tools.
  • Improvements to “profiling” and “pricing” algorithms via the mass of personal data collected, and via measuring the performance of advertising campaigns.
  • As the effectiveness of advertising campaigns improves, advertisers are willing to spend more money.
  • Publishers increase their revenues and are ready to open up more of their advertising inventory.
  • Even broader capture of your personal data.

In this little game, the following companies are doing very well but leave you almost no control, since their business models conflict with respect for your privacy:

  • Google: the Mountain View giant knows everything about your aspirations, which benefits its ad-network, dominant on the web (Google AdSense) and very well established on Apps (Google AdMob). Your control over this capture of your personal data is very limited: Google does not allow you to refuse the collection of your personal data, only to refuse personalized advertising and the association of your personal data with your Google profile.
  • Facebook: the Menlo Park giant also knows you intimately, which allows its ad-network Facebook Audience Network to work very well on Apps. Facebook gives you no control over its collection of your personal data.
  • Criteo: the French adtech giant, world leader in retargeting (ads that follow you everywhere after you have viewed a product), does not allow you to refuse collection, only to refuse personalized advertising.

What to do? A complaint from Privacy International was filed against Criteo, Quantcast and Tapad in November 2018, and the CNIL started its investigation of Criteo in March 2020, so do not expect a quick outcome.

The only solution today remains technical, and therefore not accessible to all users: installing an adblocker such as uBlock Origin on the web, or apps such as DNSCloak, Adguard or NextDNS on iOS.