The Website Content Epidemic
In our previous blog post we proposed a model for higher education web governance that takes its lead from providing the best visitor experience. We further argued that the best visitor experience is multidimensional: it requires technical infrastructure that is fit for purpose, a site that responds rapidly and displays content well on different devices, and content that is current, organised and accessible.
In that post we reviewed some of the factors influencing the speed at which pages load and we reported typical results for higher education websites, using the websites of Canada’s U15 universities as a test group.
In this post, we examine content organisation. A good visitor experience is one in which visitors can find relevant and up-to-date content and actually access it. Maintaining well-organised content is an important issue, because higher education websites are large, they accumulate many different types of content and some of it (much of it?) isn’t reviewed as frequently as it should be to determine if it is still relevant.
The objective should be to provide the ‘right’ content in an appropriate format, and to make it easy to find.
There is a tendency to take a technical view when maintaining websites and to classify content by file type rather than by its usefulness to a site visitor. Taking the visitor's perspective makes it easier to provide the right content in the right format, so an alternative approach is to group the content files accessible on a website into six categories:
- Documents – a mix of file types, but dominated by Adobe PDF files
- Media files – a variety of audio, images and video files, both on- and off-site
- Presentations – Microsoft PowerPoint presentation files
- Social Media – links to social media networks and associated content
- Spreadsheets & other data files – spreadsheet files and other data formats
- Technical files supporting site or page functionality – critical to the visitor experience, but not accessed, in the conventional sense, by most visitors
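To make the grouping concrete, here is a minimal Python sketch that assigns a file URL to one of the six groupings. It assumes classification by file extension (as the charts below do), plus a host check for social media links; the extension sets in `GROUPINGS` and the hosts in `SOCIAL_HOSTS` are illustrative assumptions, not the exact lists behind our counts.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical host list for the Social Media grouping (an assumption, not exhaustive).
SOCIAL_HOSTS = {"facebook.com", "twitter.com", "linkedin.com", "instagram.com"}

# Hypothetical extension sets for the other groupings; illustrative only.
GROUPINGS = {
    "Documents": {".pdf", ".doc", ".docx", ".txt", ".ps", ".rtf"},
    "Media files": {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4"},
    "Presentations": {".ppt", ".pptx", ".pps"},
    "Spreadsheets & other data files": {".xls", ".xlsx", ".csv"},
    "Technical files": {".css", ".js"},
}


def classify(url: str) -> str:
    """Return the visitor-oriented grouping for a URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Links to social networks are grouped by host rather than by extension.
    if any(host == h or host.endswith("." + h) for h in SOCIAL_HOSTS):
        return "Social Media"
    # Everything else is grouped by the (case-insensitive) file extension.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    for group, extensions in GROUPINGS.items():
        if suffix in extensions:
            return group
    return "Unclassified"


def tally(urls):
    """Count how many URLs fall into each grouping."""
    return Counter(classify(u) for u in urls)
```

Because the query string is dropped by `urlparse`, a link such as `intro.pptx?dl=1` is still recognised as a presentation.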
In this post we review the documents, presentations and spreadsheets that are typically accessible via higher education websites. In a previous post we looked at social media connections and a future post will examine the proliferation of media files on higher education websites.
The PDF Plague
As we had used data from the U15 university websites in our previous study, we used the same group for this research exercise. The fifteen universities have a total of just over 31.4 million publicly available indexed pages: roughly two million pages per site. We performed our file counts on 8 June 2016; the totals will have changed by the time you read this.
Between the fifteen sites we found 1.8 million documents, presentations and spreadsheets: 97.4% being documents and the balance presentation and spreadsheet files.
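As a quick consistency check, the short Python calculation below combines the rounded totals quoted in this post with the presentation and spreadsheet counts from Charts 2 and 3. Because the 1.8 million total is rounded, the results are approximate.

```python
# Figures as published in this post (rounded where the post rounds them).
SITES = 15
INDEXED_PAGES = 31_400_000
ALL_FILES = 1_800_000           # documents + presentations + spreadsheets (rounded)
PRESENTATIONS = 34_774          # Chart 2
SPREADSHEETS_AND_DATA = 12_104  # Chart 3


def per_site(total: int, sites: int = SITES) -> float:
    """Average count per institution."""
    return total / sites


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


non_documents = PRESENTATIONS + SPREADSHEETS_AND_DATA
print(f"Pages per site: {per_site(INDEXED_PAGES):,.0f}")                 # 2,093,333
print(f"Non-document share: {share(non_documents, ALL_FILES):.1f}%")     # 2.6%
print(f"Implied document share: {100 - share(non_documents, ALL_FILES):.1f}%")  # 97.4%
```

The implied document share matches the 97.4% reported above.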
The sheer volume of content helps to explain why it is difficult to manage and to ensure that users can readily find it. The audiences are many: sample checks show that the files range from policy documents, financial records and budget estimates to experimental results, presentations by advocacy groups and student CVs.
When material is not published in formats suited to visitors, the problems compound: pages often lack proper titles or descriptions, links to files simply say ‘Click here to download’, and files that must be downloaded rather than opened in a browser may be in formats that aren’t mobile friendly.
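Some of these symptoms can be detected automatically. Here is a minimal sketch, using only Python's standard `html.parser`, that flags links to downloadable files whose text is generic (such as "Click here to download") or empty. The extension list and the set of generic phrases are assumptions to adapt locally.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed lists: which extensions count as downloads, and which link texts are too generic.
DOWNLOAD_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx", ".ps", ".txt"}
GENERIC_TEXT = {"click here", "click here to download", "download", "here", "more"}


class LinkAudit(HTMLParser):
    """Collect links to downloadable files whose visible text is generic or empty."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the open <a>, if it points at a downloadable file
        self._text = []     # text fragments seen inside that <a>
        self.flagged = []   # (href, link text) pairs that need better wording

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if PurePosixPath(urlparse(href).path).suffix.lower() in DOWNLOAD_EXTENSIONS:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace before comparing against the generic phrases.
            text = " ".join("".join(self._text).split())
            if not text or text.lower().rstrip(".") in GENERIC_TEXT:
                self.flagged.append((self._href, text))
            self._href = None


def audit(html: str):
    """Return the flagged (href, text) pairs found in one page of HTML."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.flagged
```

Links to ordinary pages are ignored, so the output focuses on file downloads.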
There is a PDF epidemic. Almost 90% of all the documents accessible in the public areas of the U15 university websites are PDFs.
That amounts to an average of just over 100,000 PDFs per institution, scattered over the hundreds of ‘sub-sites’ that exist at each one. We did not review the specific formats (PDF versus PDF/UA), ages or other attributes of these files, but this is clearly a task each institution should carry out to ensure the content remains current and relevant.
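As a first pass at the format review, the sketch below reads a PDF's header to get its version and looks for the `pdfuaid:part` property, which PDF/UA files declare in their XMP metadata. This is a crude byte search, assumed adequate only for triage: it will miss the marker if the metadata stream is compressed, and a declaration is not proof of conformance, which still needs a proper validator. The `inspect_folder` helper assumes a local copy of the files.

```python
import re
from pathlib import Path

PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")
# PDF/UA files identify themselves via the pdfuaid:part XMP property.
UA_MARKER = re.compile(rb"pdfuaid:part")


def inspect_pdf(data: bytes) -> dict:
    """Heuristically report the PDF version and whether a PDF/UA claim is present."""
    header = PDF_HEADER.search(data[:1024])
    return {
        "is_pdf": header is not None,
        "version": header.group(1).decode() if header else None,
        # Only finds uncompressed metadata; absence does not prove non-conformance.
        "claims_pdf_ua": bool(UA_MARKER.search(data)),
    }


def inspect_folder(folder: str):
    """Yield (path, report) for every .pdf file under `folder`."""
    for path in sorted(Path(folder).rglob("*.pdf")):
        yield path, inspect_pdf(path.read_bytes())
```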
It would also be instructive to know how frequently these documents are accessed. As most of these sites have Google Analytics installed, it would be straightforward to set up an event to record accesses or downloads. A review aimed at de-cluttering might also consider whether the content needs to be in PDF format at all or would be better published as HTML.
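Where server access logs are available, a comparable count can be made without Google Analytics. This sketch swaps in log analysis for the event-based approach: it assumes logs in the Common or Combined Log Format and counts successful (HTTP 200) GET requests for PDF paths. Partial-content (206) and cached (304) responses are ignored, so treat the totals as approximate.

```python
import re
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches the request line and status code in a Common/Combined Log Format entry.
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')


def pdf_downloads(lines):
    """Count successful GET requests per PDF path (query strings are ignored)."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match or match.group(2) != "200":
            continue
        path = urlparse(match.group(1)).path
        if PurePosixPath(path).suffix.lower() == ".pdf":
            counts[path] += 1
    return counts
```

Files that never appear in the counts over a reasonable period are natural candidates for the de-cluttering review.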
As with all website maintenance, the devil is in the details.
Chart 1: The relative proportion of the document files, as identified by the file extension (e.g. .txt), found on the publicly accessible websites for Canada's U15 universities. In total we counted 1.7 million document files.
While recovering from the PDF epidemic should dominate maintenance activities, we note that these sites collectively have over 80,000 plain text files (.txt), about 70,000 Microsoft Word documents (.doc & .docx) and 30,000 PostScript (.ps) files accessible to visitors.
For each of these file types, it makes sense to ask how well the format serves visitors. For example, text files will display in a browser, but they have no search engine friendly titles or descriptions and, lacking any HTML formatting, can be very difficult for anyone using a screen reader to navigate.
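One remedy for plain text files is to republish them as minimal HTML pages with a proper title and description. A sketch, assuming paragraphs are separated by blank lines and that the title and description are supplied by whoever reviews the file; `text_to_html` is a hypothetical helper and `lang="en"` is an assumption.

```python
from html import escape


def text_to_html(text: str, title: str, description: str) -> str:
    """Wrap plain text in a minimal HTML page with a title, description and heading."""
    normalised = text.replace("\r\n", "\n")
    # Treat blank-line-separated blocks as paragraphs.
    paragraphs = [p.strip() for p in normalised.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description)}">\n'
        "</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n{body}\n</body>\n</html>\n"
    )
```

The `<h1>` gives screen reader users a landmark to start from, and the title and description give search engines something better than a bare file name.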
Word documents generally force a download and may or may not be accessible on mobile devices; one also questions the prudence of distributing files that can be edited.
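Whether a file opens in the browser or forces a download is largely determined by the HTTP response headers. The sketch below makes a rough prediction from `Content-Disposition` and `Content-Type`; the set of types treated as viewable inline is an assumption, and actual behaviour varies by browser and device, especially on mobile. Some servers do not answer `HEAD` requests correctly, in which case `check_url` may fail.

```python
from urllib.request import Request, urlopen

# Assumption: content types most browsers can display without extra software.
INLINE_TYPES = {"text/html", "text/plain", "application/pdf", "image/png", "image/jpeg", "image/gif"}


def likely_forces_download(headers) -> bool:
    """Predict whether a response will be downloaded rather than shown inline."""
    disposition = (headers.get("Content-Disposition") or "").lower()
    if disposition.startswith("attachment"):
        return True
    # Ignore parameters such as "; charset=utf-8".
    content_type = (headers.get("Content-Type") or "").split(";")[0].strip().lower()
    return content_type not in INLINE_TYPES


def check_url(url: str) -> bool:
    """Fetch only the headers for `url` and apply the prediction."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return likely_forces_download(response.headers)
```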
While most of us believe we are trapped in a world of PowerPoint presentations, the total number of PowerPoint files is just 2% of the number of PDFs – although many of the PDFs may turn out to be ‘printed’ versions of presentations.
The number of ‘raw’ or editable presentation files publicly accessible across the fifteen websites is just under 35,000 or about 2,300 files per website.
Chart 2: The relative proportion of the presentation format files, as identified by the file extension (e.g. .ppt), found on the publicly accessible websites for Canada's U15 universities. In total we counted 34,774 presentation files.
PowerPoint files also generally force a download rather than opening in a browser and may not be accessible on mobile devices; again, one questions the prudence of distributing files that can be edited.
There is a reasonable case for converting these files into PDF documents, or at least for sharing the content via a presentation-sharing service.
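Conversion can be scripted in bulk. A minimal sketch, assuming LibreOffice is installed and on the `PATH`, that calls its headless `--convert-to pdf` mode for each presentation; the helper names are hypothetical. The resulting PDFs still need an accessibility review, since an automated conversion does not guarantee well-tagged output.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(source: Path, outdir: Path, binary: str = "soffice") -> list[str]:
    """Build the LibreOffice command line that converts one file to PDF."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(source)]


def convert_to_pdf(source: Path, outdir: Path) -> Path:
    """Convert one presentation to PDF and return the expected output path."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice (soffice) was not found on the PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_command(source, outdir, binary), check=True)
    return outdir / f"{source.stem}.pdf"


def convert_folder(folder: Path, outdir: Path) -> list[Path]:
    """Convert every .ppt/.pptx file under `folder` into `outdir`."""
    sources = sorted(p for p in folder.rglob("*") if p.suffix.lower() in {".ppt", ".pptx"})
    return [convert_to_pdf(p, outdir) for p in sources]
```

Because every output lands in one folder, files with the same name in different sub-sites would overwrite each other; a real run would need to preserve the folder structure.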
Spreadsheets and Data Files
Numerical data files represent less than 1% of the total number of document, presentation and spreadsheet files accessible on the U15 websites: about 12,000 files in all. They come from a mix of sources, ranging from budget and financial planning documents through experimental results to mapping data.
Chart 3: The relative proportion of the spreadsheet and other data format files, as identified by the file extension (e.g. .xls) found on the publicly accessible websites for Canada's U15 universities. In total we counted 12,104 files.
Microsoft Excel files need to be downloaded rather than opened in a browser and may not be accessible on mobile devices; once more, one questions the prudence of distributing files that can be edited. A similar situation applies to the other data file types we encountered.
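For tabular data that visitors mainly need to read, one alternative is to publish it as an HTML table that opens in any browser. A sketch, assuming the data has already been exported to CSV (reading .xls or .xlsx files directly would need a third-party library, which this leaves out) and that the first row holds column headings:

```python
import csv
import io
from html import escape


def csv_to_html_table(csv_text: str, caption: str) -> str:
    """Render CSV text as an HTML table with a caption and column headers."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has no rows")
    header, *data = rows
    # scope="col" tells screen readers which cells are column headings.
    head = "".join(f'<th scope="col">{escape(cell)}</th>' for cell in header)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
        for row in data
    )
    return (
        f"<table>\n<caption>{escape(caption)}</caption>\n"
        f"<thead><tr>{head}</tr></thead>\n<tbody>\n{body}\n</tbody>\n</table>"
    )
```

Offering the CSV as a download alongside the table keeps the data available to visitors who want to analyse it, without making that the only way to see it.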
Providing the best visitor experience means taking the visitor's point of view and understanding what they can reasonably expect when they search for and locate content on a website. With the exception of PDF documents, most of the other formats need to be downloaded and require separate software to open: is that a reasonable expectation?
It is not clear whether the different file formats used to present content to site visitors are the product of conscious choice or of accident. At the very least, there is a case for putting policies in place to ensure that visitors can access the content they need without having to deal with formats that may not be mobile friendly or that require third-party software.
Don’t have accurate and current information on all the websites you own? Not able to monitor and check each website’s content quality and risk status? Let’s talk about how we can help.