Taming the Cookie Monster on Higher Education Websites

Last time we examined how easily university and college website visitors can link to privacy statements and notices. We also identified how clearly sites describe the types of information disclosure statements they provide.

One important disclosure that visitors should be able to access readily is a statement addressing cookie use. College and university websites use cookies extensively to record visitor details, set preferences and personalise site interactions.

This time, we present our findings from reviewing cookie use and disclosures on 2,736 higher education websites in Australia, Canada, Ireland, New Zealand, the United Kingdom and the United States.

What We Did

We recently upgraded our QA service to capture and report cookies set as visitors traverse websites. For the current exercise, however, we narrowed the scope to:

  • cookies loaded on home pages; and
  • first-party cookies loaded by the institutional domain, as these cookies are under a college or university’s direct control.

In practice, most institutions use third-party software to provide additional functionality – from social media sharing to video replay – and these applications load additional cookies as they run. We highlighted the potential data privacy issues of third-party applications in two earlier blog posts: How to Stop Tags Choking University Website Performance and How to Maintain Tag Control & Privacy on University & College Websites.

What We Found

Summary

All 2,736 sites we tested load one or more first-party cookies when a visitor lands on the home page. About 2% of the sites load a single cookie. The average site loads six and the most ‘extreme’ sites load 20 or more. Overall, 95% of the sites examined load 12 or fewer cookies.

Figure 1: Frequency distribution of the number of first-party cookies loaded on the home page of 2,736 university and college websites.

Figure 1 clearly shows a cluster of sites loading 3, 4 or 5 cookies – this sub-group represents just over 50% (51.9%) of all the websites examined. The three components of this sub-group represent the most frequently encountered cookie combinations in our review:

  • 3 cookies – almost exclusively the three cookies loaded by the current version of Google Analytics (_ga, _gat and _gid)
  • 4 cookies – the three current Google Analytics cookies plus a cookie to handle a session connection to the web server
  • 5 cookies – typically the older Google Analytics implementation cookies (__utma, __utmb, __utmc), a session cookie and a cookie set by DoubleClick for advertising.

The Details

The histogram summarises data for 16,320 first-party cookies set on 2,736 website home pages. In practice, the cookies have three different sources:

  • The smallest cookie source is “Resource Headers”, in other words cookies set when loading a “resource” (for example an image or style sheet) on a page. These are cookies set by content management systems, networking hardware, servers and the like. This source represents about 2% of all the cookies being set.
  • A larger source of cookies is “Page Headers”, in other words cookies set by the server responding to the requested page. These types of cookie overlap with those set by “Resource Headers” and also originate from content management systems, networking hardware, servers and the like. This source accounts for about 8% of all the cookies being set.
  • The largest source of cookies in our survey is home page JavaScript execution. This source represents 90% of all the cookies loaded and covers cookies set by Google Analytics, A/B testing applications and chat applications, as well as cookies set to record language preferences, permission to set cookies or JavaScript execution.
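To make the difference between header-set and script-set cookies concrete, here is a minimal Python sketch of reading cookie names from `Set-Cookie` response headers. The header values and the `cookies_from_headers` helper are hypothetical and purely illustrative; this is not how our QA service works. Cookies set by JavaScript never appear in these headers, so they can only be observed by loading the page in a browser and letting the scripts run.

```python
from http.cookies import SimpleCookie


def cookies_from_headers(set_cookie_values):
    """Return the cookie names found in a list of Set-Cookie header values."""
    names = []
    for value in set_cookie_values:
        jar = SimpleCookie()
        jar.load(value)           # parses one "name=value; attr=..." string
        names.extend(jar.keys())  # attributes such as Path are not cookie names
    return names


# Hypothetical headers, as a page response or a resource response might send them.
page_headers = [
    "PHPSESSID=abc123; Path=/; HttpOnly",
    "lb_route=node2; Path=/",
]
print(cookies_from_headers(page_headers))
```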

Table 1 summarises the overall number of cookies and the number of unique cookies identified for each category.

                  Resource Header   Page Header   Script    Total
All Cookies                   326         1,252   14,742   16,320
Unique Cookies                166           416    2,065    2,647

Table 1: Total number of first-party cookies recorded on the home pages of 2,736 higher education websites by cookie source.
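The percentages quoted for each source, and the average of roughly six cookies per site, follow directly from the counts in Table 1. A short Python sketch of the arithmetic (the variable names are ours, chosen for illustration):

```python
# Counts taken from the "All Cookies" row of Table 1.
all_cookies = {"Resource Header": 326, "Page Header": 1252, "Script": 14742}
sites = 2736

total = sum(all_cookies.values())

# Share of all first-party cookies contributed by each source, in percent.
shares = {source: round(100 * n / total, 1) for source, n in all_cookies.items()}

# Mean number of first-party cookies per home page.
mean_per_site = round(total / sites, 2)

print(total)
print(shares)
print(mean_per_site)
```

This gives shares of about 2%, 8% and 90% and a mean just under six cookies per home page.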

Based on our development work and testing, we hypothesise that cookies set by page and resource headers tend to be stable, as they are usually set for technical reasons (session control, network access, logins to university services). As a result, they are relatively easy to catalogue and manage. Based on the sites reviewed in this study, these categories of cookies represent only 10% of the first-party cookies present on university and college home pages.

The larger management task comes from the cookies loaded by JavaScript, and this is a genuine management challenge. Above, we noted that just over 50% of sites have between three and five first-party cookies, mostly related to Google Analytics. However, a large proportion of the sites with six or seven first-party cookies have this number because they have old and current versions of Google Analytics running simultaneously – for no obvious benefit.
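Two Google Analytics implementations running side by side can be spotted from cookie names alone. The sketch below is a rough illustration, not our production check: the function names are hypothetical, and the prefixes are assumptions based on Google's documented cookie names (`__utm*` for the ga.js library; `_ga`, `_gid` and `_gat` for analytics.js).

```python
# Cookie-name prefixes for the two Google Analytics libraries (assumed prefixes).
GA_JS = ("__utm",)                      # ga.js: __utma, __utmb, __utmc, ...
ANALYTICS_JS = ("_ga", "_gid", "_gat")  # analytics.js: _ga, _gid, _gat


def ga_families(cookie_names):
    """Return the set of Google Analytics libraries implied by the cookie names."""
    found = set()
    for name in cookie_names:
        if name.startswith(GA_JS):
            found.add("ga.js")
        elif name.startswith(ANALYTICS_JS):
            found.add("analytics.js")
    return found


def runs_both_families(cookie_names):
    """True when cookies from both libraries are present on the same page."""
    return ga_families(cookie_names) == {"ga.js", "analytics.js"}


# Hypothetical seven-cookie home page: both libraries plus a session cookie.
home_page = ["__utma", "__utmb", "__utmc", "_ga", "_gid", "_gat", "PHPSESSID"]
print(runs_both_families(home_page))
```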

Managing cookies is complicated by their multiple sources and by the different individuals or organisations behind them. Networking specialists may wish to load cookies to assist with load balancing and server access, and marketing staff want to follow up visits via re-marketing campaigns. Developers, meanwhile, implement the JavaScript they were asked to add without necessarily appreciating the potential data privacy implications.

And, just to be clear, the cookies discussed in this article are only first-party cookies, those directly under institutional control. Including third-party cookies set by social media sharing, advertising and other third-party applications pushes the total number of cookies being loaded much higher.

In fact, the highest cookie ‘payload’ we observed was 220 cookies once all third-party JavaScript had completed execution. That’s quite a management headache.

Management and Privacy Implications

There are two management issues arising from our study.

The first is the task of simply cataloguing the cookies being loaded on a website and recording the origin of those cookies. Discovering and recording the cookies in use allows an assessment of whether each cookie is needed and, in the case of JavaScript cookies, whether the script that sets it is required.
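As a rough idea of what such a catalogue might hold, here is a minimal Python sketch. The `CookieRecord` fields and the three sample entries are hypothetical; a real catalogue would be populated from a crawl and would probably record more detail (expiry, domain, owner).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CookieRecord:
    name: str
    source: str   # "resource header", "page header" or "script"
    set_by: str   # system or script responsible for the cookie
    needed: bool  # outcome of the review: keep or retire


# Hypothetical entries for illustration only.
catalogue = [
    CookieRecord("_ga", "script", "Google Analytics", True),
    CookieRecord("__utma", "script", "Google Analytics (second tag)", False),
    CookieRecord("PHPSESSID", "page header", "web server session", True),
]

# How many cookies come from each source, and which are candidates for removal.
by_source = Counter(record.source for record in catalogue)
to_retire = [record.name for record in catalogue if not record.needed]

print(dict(by_source))
print(to_retire)
```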

The second task is providing cookie information to site visitors in disclosure statements that are complete and accurate. This is hard to achieve without accurate lists of the cookies actually loaded on a site.
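Once both lists exist, checking a disclosure statement against reality is a simple set comparison. A minimal sketch, assuming the disclosed and observed cookie names are already available as plain lists (the `audit_disclosure` helper and the sample names are hypothetical):

```python
def audit_disclosure(disclosed, observed):
    """Compare the cookies a notice lists with the cookies a page actually sets."""
    disclosed, observed = set(disclosed), set(observed)
    return {
        "undisclosed": sorted(observed - disclosed),  # loaded but not listed
        "stale": sorted(disclosed - observed),        # listed but no longer loaded
    }


# Hypothetical example: the notice predates a newer analytics tag.
notice = ["PHPSESSID", "__utma", "__utmb", "lang"]
home_page = ["PHPSESSID", "__utma", "__utmb", "_ga", "_gid"]
print(audit_disclosure(notice, home_page))
```

Running a comparison like this on a schedule would flag both omissions and entries that can be dropped from the notice.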

We did not review 2,736 privacy notices for their cookie disclosures, but we have spot-checked a sample that illustrates three main approaches to disclosure:

  • Generic advice about cookies, why they are used and what may be done with the data that is collected. The site we checked loads four cookies: three are Google Analytics cookies and a fourth records that a visitor has acknowledged the cookie disclosure.

  • An intermediate approach disclosing cookie use and offering generic advice about managing cookies through browser controls and opt-out schemes. The site we checked loads a total of 23 first-party cookies on its home page.

  • Detailed disclosure itemising the cookies being loaded and their purpose. The site we checked lists a total of 32 cookies in its schedule (dating from April 2016), but omits five additional Google Analytics, DoubleClick and Visual Website Optimizer cookies that are currently loaded on the home page. It is hard to keep these lists accurate.

Conclusion

For managing privacy compliance, providing less detailed but accurate statements seems more tractable than attempting to keep detailed lists up to date when marketing staff or developers can readily implement new JavaScript code that changes the cookies actually being loaded.

Whichever approach is preferred for disclosure, a detailed and regularly updated list of all the first- and third-party cookies being loaded across a site should be available for risk mitigation, site performance and operational efficiency.

 

