Taming the Cookie Monster
Last time, we examined how easily visitors to university and college websites can find links to privacy statements and notices. We also assessed how clearly sites describe the types of information disclosure statements they provide.
This time, we present our findings from reviewing cookie use and disclosures on 2,736 higher education websites in Australia, Canada, Ireland, New Zealand, the United Kingdom and the United States.
What We Did
We recently upgraded our QA service to capture and report cookies set as visitors traverse websites. However, for the current exercise, we narrowed the scope to:
- cookies loaded on home pages; and
- first-party cookies set by the institutional domain, as these cookies are under a college or university’s direct control.
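The first-party scoping rule above comes down to a domain check. The sketch below is minimal and assumes the institution's domain (here the hypothetical `example.edu`) is already known. A production tool would need to derive the registrable domain from the Public Suffix List, which this sketch does not do. `is_first_party` is an illustrative helper and is not taken from our QA service.

```python
def is_first_party(cookie_domain: str, institution_domain: str) -> bool:
    """Return True if a cookie's Domain falls under the institutional domain.

    A leading dot (".example.edu") is ignored, and subdomains such as
    "www.example.edu" are treated as first-party.
    """
    cookie = cookie_domain.lower().lstrip(".")
    institution = institution_domain.lower().lstrip(".")
    return cookie == institution or cookie.endswith("." + institution)


# Hypothetical cookies observed on a home page, as (name, domain) pairs.
observed = [
    ("_ga", ".example.edu"),
    ("JSESSIONID", "www.example.edu"),
    ("IDE", ".doubleclick.net"),
    ("tracker", "notexample.edu"),  # shares a suffix but is a different domain
]

first_party = [name for name, domain in observed if is_first_party(domain, "example.edu")]
print(first_party)  # ['_ga', 'JSESSIONID']
```

The check matches on `"." + institution` rather than on a plain suffix. Without the dot, `notexample.edu` would be wrongly counted as first-party.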
In practice, most institutions use third-party software to provide additional functionality – from social media sharing to video replay – and these applications load additional cookies as they run. We highlighted the potential data privacy issues of third-party applications in two blog posts: How to Stop Tags Choking University Website Performance and How to Maintain Tag Control & Privacy on University & College Websites.
What We Found
All 2,736 sites we tested load one or more first-party cookies when a visitor lands on the home page. About 2% of the sites load a single cookie. The average site loads six, while the most ‘extreme’ sites load 20 or more. Overall, 95% of the sites examined load 12 or fewer cookies.
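Summary figures like these can be computed directly from a list of per-site cookie counts. The sketch below is minimal and uses ten made-up counts rather than our study data. `summarise` and its keys are hypothetical names, and the 12-cookie threshold mirrors the figure quoted above.

```python
from statistics import mean


def summarise(counts: list[int], threshold: int = 12) -> dict[str, float]:
    """Summarise per-site first-party cookie counts (one integer per home page)."""
    n = len(counts)
    return {
        "sites": n,
        "total_cookies": sum(counts),
        "share_single": sum(1 for c in counts if c == 1) / n,
        "mean": mean(counts),
        "max": max(counts),
        "share_at_or_below": sum(1 for c in counts if c <= threshold) / n,
    }


# Hypothetical counts for ten sites (not our study data).
counts = [1, 3, 3, 4, 5, 5, 6, 8, 12, 21]
print(summarise(counts))
```

Applied to the full data set, the same arithmetic gives the average quoted above: 16,320 cookies across 2,736 home pages is about 5.96 cookies per site.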
Figure 1: Frequency distribution of the number of first-party cookies loaded on the home page of 2,736 university and college websites.
Figure 1 clearly shows a cluster of sites loading 3, 4 or 5 cookies. This sub-group represents just over half (51.9%) of all the websites examined. Its three components are the most frequently encountered cookie combinations in our review:
- 3 cookies – almost exclusively the three cookies set by the newer analytics.js version of Google Analytics (_ga, _gat and _gid)
- 4 cookies – the same three analytics.js Google Analytics cookies plus a cookie to handle a session connection to the web server
- 5 cookies – typically the cookies set by the older ga.js version of Google Analytics (__utma, __utmb and __utmc), a session cookie and a cookie set by DoubleClick for advertising.
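Recognising these combinations programmatically is mostly a matter of matching cookie names. The analytics.js (Universal Analytics) library sets _ga, _gid and _gat. The older ga.js library sets __utma, __utmb, __utmc and __utmz. The sketch below is minimal and assumes the cookie names for a home page have already been collected. The helper names are hypothetical, and treating suffixed `_gat_…` names as `_gat` is a simplifying assumption.

```python
# Cookie names used by the two Google Analytics libraries.
ANALYTICS_JS = {"_ga", "_gid", "_gat"}            # analytics.js (Universal Analytics)
GA_JS = {"__utma", "__utmb", "__utmc", "__utmz"}  # classic ga.js


def _normalise(cookie_names: set[str]) -> set[str]:
    # Assumption: some deployments suffix the throttle cookie ("_gat_<suffix>");
    # fold those back to "_gat" so they match the analytics.js set.
    return {"_gat" if n.startswith("_gat_") else n for n in cookie_names}


def ga_versions(cookie_names: set[str]) -> set[str]:
    """Return which Google Analytics libraries a set of cookie names points to."""
    names = _normalise(cookie_names)
    found = set()
    if names & ANALYTICS_JS:
        found.add("analytics.js")
    if names & GA_JS:
        found.add("ga.js")
    return found


def is_ga_only_trio(cookie_names: set[str]) -> bool:
    """True for the common three-cookie home page: exactly _ga, _gid and _gat."""
    return _normalise(cookie_names) == ANALYTICS_JS


print(ga_versions({"_ga", "_gid", "_gat", "JSESSIONID"}))        # {'analytics.js'}
print(ga_versions({"__utma", "__utmb", "__utmc", "PHPSESSID"}))  # {'ga.js'}
print(is_ga_only_trio({"_ga", "_gid", "_gat"}))                  # True
```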
The histogram summarises data for 16,320 first-party cookies set on 2,736 website home pages. In practice, the cookies have three different sources:
- The smallest cookie source is “Resource Headers”, in other words cookies set when loading a “resource” (for example an image or style sheet) on a page. These are cookies set by content management systems, networking hardware, servers and the like. This source represents about 2% of all the cookies being set.
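- A larger source of cookies is “Page Headers”, in other words cookies set by the server responding to the requested page. These cookies are similar in kind to those set by “Resource Headers” and also originate from content management systems, networking hardware, servers and the like. This source accounts for about 8% of all the cookies being set.
- By far the largest source is “Script”, in other words cookies set by JavaScript running on the page, such as Google Analytics. This source accounts for the remaining 90% or so of the cookies being set.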
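One way to attribute each cookie to one of these sources is to compare the Set-Cookie headers a crawler records with the cookies present once the page has finished loading. The sketch below is minimal and assumes such a crawler, for example a headless browser, supplies three inputs: the page's headers, the resource headers and the final cookie jar. The labels match the column names in Table 1. The function names and sample values are hypothetical. A cookie first set by a header and later rewritten by a script is attributed to the header here.

```python
def cookie_name(set_cookie: str) -> str:
    """Extract the cookie name from one Set-Cookie header value."""
    return set_cookie.split(";", 1)[0].split("=", 1)[0].strip()


def attribute_sources(page_headers: list[str],
                      resource_headers: list[str],
                      final_jar: list[str]) -> dict[str, str]:
    """Label each first-party cookie in the final jar with its likely source.

    page_headers:     Set-Cookie values on the HTML page response
    resource_headers: Set-Cookie values on image, stylesheet and script responses
    final_jar:        names of first-party cookies present after the page loads
    """
    from_page = {cookie_name(h) for h in page_headers}
    from_resources = {cookie_name(h) for h in resource_headers}
    sources = {}
    for name in final_jar:
        if name in from_page:
            sources[name] = "Page Header"
        elif name in from_resources:
            sources[name] = "Resource Header"
        else:
            # No HTTP response set it, so it was most likely written by JavaScript.
            sources[name] = "Script"
    return sources


# Hypothetical observations for one home page.
labels = attribute_sources(
    page_headers=["JSESSIONID=abc123; Path=/; HttpOnly"],
    resource_headers=["BIGipServerweb_pool=12345.0.0; path=/"],
    final_jar=["JSESSIONID", "BIGipServerweb_pool", "_ga", "_gid", "_gat"],
)
print(labels)
```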
Table 1 summarises the overall number of cookies and the number of unique cookies identified for each category.
| Resource Header | Page Header | Script | Total |
Table 1: Total number of first-party cookies recorded on the home pages of 2,736 higher education websites by cookie source.
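A table of this shape, giving total and unique cookies per source, can be built from per-site records in a few lines. The sketch below is minimal. It assumes each observation is a `(site, cookie_name, source)` record and that "unique" means a unique cookie name. `tabulate` and the sample records are hypothetical and do not reproduce the study's figures.

```python
from collections import Counter, defaultdict

SOURCES = ("Resource Header", "Page Header", "Script")


def tabulate(records: list[tuple[str, str, str]]) -> dict[str, tuple[int, int]]:
    """Return (total cookies, unique cookie names) per source, plus a Total row.

    Each record is (site, cookie_name, source); uniqueness is by name only.
    """
    totals = Counter()
    names = defaultdict(set)
    for _site, name, source in records:
        totals[source] += 1
        names[source].add(name)
    table = {src: (totals[src], len(names[src])) for src in SOURCES}
    # A name seen under several sources is counted once in the overall unique total.
    table["Total"] = (sum(totals.values()), len(set().union(*names.values())))
    return table


# Hypothetical records for two sites.
records = [
    ("a.edu", "_ga", "Script"),
    ("a.edu", "_gid", "Script"),
    ("a.edu", "JSESSIONID", "Page Header"),
    ("b.edu", "_ga", "Script"),
    ("b.edu", "BIGipServerweb_pool", "Resource Header"),
    ("b.edu", "JSESSIONID", "Page Header"),
]
print(tabulate(records))
```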
Based on our development work and testing, we hypothesise that cookies set by page and resource headers tend to be stable, because they are usually set for technical reasons (session control, network access, logins to university services). As a result, they are relatively easy to catalogue and manage. However, across the sites reviewed in this study, these two categories represent only 10% of the first-party cookies present on university and college home pages.
And, just to be clear, the cookies discussed in this article are only first-party cookies, those directly under institutional control. Including third-party cookies set by social media sharing, advertising and other third-party applications pushes the total number of cookies being loaded considerably higher.
Management and Privacy Implications
There are two management issues arising from our study.
The first is knowing which cookies are actually being loaded across a site. The second is providing cookie information to site visitors in disclosure statements that are complete and accurate, which is hard to achieve without accurate lists of the cookies actually loaded.
We did not review 2,736 privacy notices for their cookie disclosures, but we have spot-checked a sample that illustrates three main approaches to disclosure:
- Generic advice about cookies, why they are used and what may be done with the data collected. One example site taking this approach loads four cookies: three are Google Analytics cookies and a fourth records that a visitor has acknowledged the cookie disclosure.
- An intermediate approach disclosing cookie use and offering generic advice about managing cookies through browser controls and opt-out schemes. One example site taking this approach loads a total of 23 first-party cookies on its home page.
- Detailed disclosure itemising the cookies being loaded and their purpose. One example site taking this approach lists a total of 32 cookies in its schedule (dating from April 2016), but omits five additional Google Analytics, DoubleClick and Visual Website Optimizer cookies that are currently loaded on the home page. It is hard to keep these lists accurate.
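Keeping a detailed schedule accurate lends itself to a routine check that compares the published list with what a crawl actually observes. The sketch below is minimal and assumes both are available as sets of cookie names. `audit_schedule` and the cookie names are hypothetical. A cookie missing from the home page may still be set elsewhere on the site, so "not observed" is a prompt to check, not to delete.

```python
def audit_schedule(disclosed: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare a published cookie schedule with cookies observed on a crawl."""
    return {
        # Loaded on the site but missing from the schedule: a disclosure gap.
        "undisclosed": observed - disclosed,
        # Listed in the schedule but not seen on this crawl: possibly stale,
        # or set on pages the crawl did not visit.
        "not_observed": disclosed - observed,
    }


# Hypothetical schedule and home page observations.
schedule = {"_ga", "_gid", "JSESSIONID", "cookie_notice_accepted"}
home_page = {"_ga", "_gid", "_gat", "JSESSIONID", "_vwo_uuid"}

report = audit_schedule(schedule, home_page)
print(sorted(report["undisclosed"]))   # ['_gat', '_vwo_uuid']
print(sorted(report["not_observed"]))  # ['cookie_notice_accepted']
```

Running a check like this after each crawl gives a regularly refreshed list of gaps rather than a schedule that drifts out of date.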
Whichever approach is preferred for disclosure, a detailed and regularly updated list of all the first- and third-party cookies being loaded across a site should be available for risk mitigation, site performance and operational efficiency.