How to Give Up PDFs and Improve Your Higher Education Website’s User Experience
Websites have become higher education’s most important communications medium, allowing a wide range of audiences to use online services and to find information.
They typically have a tragic flaw: large collections of PDF-formatted files (PDFs). And we mean large. In mid-2016 we checked Canada’s U15 universities and found 1.7 million PDFs, or just over 100,000 per institution (see Higher Education’s Website Content Epidemic).
Unless universities and colleges use tightly integrated workflows to manufacture their PDFs, the results are web governance headaches and poorer user experiences than if they had relied on HTML-based web pages.
A solution addressing both concerns is PDF pruning: converting PDFs to HTML as needed, while retaining PDFs where they are the ‘right’ answer.
Why Are PDF Files a Problem?
PDFs’ shortcomings as a universal communications medium are a combination of web governance and user experience concerns.
For the purposes of this article, web governance is the set of policies an institution adopts to support its websites’ various objectives. It means not only having defined processes and procedures, but also oversight to ensure they are followed.
Content creators often use PDFs to remain independent: they retain control over content design and creation, and can even deliberately deviate from centrally agreed branding.
The same rugged individualism can creep into controlling and versioning source documents so that uploaded PDF content can, over time, become irrelevant, inaccurate or even legally risky.
Once uploaded, PDFs are opaque to their creators. Their embedded web page links can break without anyone being aware, potentially leaving users lost or confused.
PDFs are also opaque to marketing, communications and other groups that use web analytics to record and understand user behaviour. It is difficult to track user interactions within a PDF and to measure how well the document meets its communications objectives.
When processes are still paper-based, or when content will be printed, PDFs are a good solution. A virtue of PDFs is that they are formatted to be printed rather than rendered on screens, although many people will have struggled to get non-standard page sizes to print correctly.
Decentralised PDF ‘manufacturing’ is the norm in higher education, and it can be difficult for UX, web or digital teams to provide appropriate guidance to far-flung creators of PDFs.
The governance concerns can be largely allayed by reducing the number of PDFs, while simultaneously ensuring that, when they are the right choice, they are produced with great care.
Accessibility and User Experience
Higher education marketing and communications has embraced writing for the web along with plain English for effective messaging and web page interactions.
Good HTML-based content has a meticulous hierarchy, facilitating user ‘scanning’ of content. PDFs do not support this type of interaction and often slow comprehension.
On the other hand, PDFs’ best use cases are for difficult content: lengthy reports, academic papers and other complex information. We believe website visitors are better served when these types of PDF content also exist in HTML, allowing quicker interpretation of whether the content is relevant to the end user.
Well-developed techniques make HTML content accessible to most web users, further strengthening its potential for clearly conveying information. The same techniques exist for PDFs, but in practice there are relatively few PDF/UA-compliant documents on higher education websites.
Microsoft Word provides accessibility options that carry over into the PDFs it exports. Grackledocs exists specifically to produce PDF/UA-compliant files from G Suite sources, but higher ed uptake is low.
PDFs are normally generated from third-party applications. In practice, source documents become separated from their uploaded PDFs, and few organisations have PDF editing software to update or correct content. HTML content, by contrast, is easily and readily updated.
PDFs can also produce quirky user experiences. Depending on the browser, a PDF may automatically download (to be opened by a third-party application), open in the same window as the clicked link, or open in a new tab. All of these behaviours diverge from leading UX practice.
Perhaps PDFs’ greatest sin, in a world of mobile-first design, is that they aren’t inherently responsive. They simply don’t display as well as HTML pages on mobile devices, often requiring pinching or swiping to resize content for comfortable reading.
What’s the Solution?
Our working hypothesis is that higher education websites exist to give audiences accurate, understandable answers to their questions.
We need to understand when PDFs meet end user needs and when those needs are better met by HTML pages. Our earlier analysis strongly suggests that PDFs are the preferred solution in only a minority of cases.
If we were building websites from scratch, it would be relatively easy to distinguish the two cases. In the real world, pages and PDFs already exist. To discover how many PDFs may be needed, we can ‘survey’ the relevant websites, recording details of each PDF we find.
It turns out that discovery is highly automatable and, when combined with an accurate web estate ‘map’ or registry, surveys can be staged to make data analysis more tractable.
Site scanning generates metadata that identifies each PDF’s size, number of pages, web page location and accessibility, along with other characteristics such as source software, creation date and content title.
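As a rough illustration of what one scanning step might look like, the sketch below finds PDF links in a page's HTML and records a few file-level details for each. It uses only the Python standard library. The names `PdfLinkFinder`, `PdfRecord` and `describe_pdf` are hypothetical, and the sketch assumes PDFs are linked with a `.pdf` extension. Page counts, accessibility tagging, source software and creation dates are not covered here, since reading them would need a PDF parsing library.

```python
import hashlib
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlparse


class PdfLinkFinder(HTMLParser):
    """Collect absolute URLs of <a> links whose path ends in .pdf."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.pdf_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.page_url, href)
        # Test the path only, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


@dataclass
class PdfRecord:
    pdf_url: str
    found_on: str               # web page location of the link
    size_bytes: int
    sha256: str                 # content fingerprint, useful for spotting duplicates
    pdf_version: Optional[str]  # e.g. "1.7", or None if the header is absent


def describe_pdf(pdf_url: str, found_on: str, data: bytes) -> PdfRecord:
    """Build a survey record from the raw bytes of a downloaded file."""
    version = None
    # PDF files begin with a header of the form "%PDF-x.y".
    if data.startswith(b"%PDF-"):
        version = data[5:8].decode("ascii", errors="replace")
    return PdfRecord(
        pdf_url=pdf_url,
        found_on=found_on,
        size_bytes=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        pdf_version=version,
    )
```

A crawler would call `PdfLinkFinder(page_url).feed(html)` for each page and pass each downloaded file to `describe_pdf`. PDFs served from URLs without a `.pdf` extension would be missed by this simple check; inspecting the response's content type is one possible refinement.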
Surveys reveal duplicate files, missing files (no file associated with a link) and lists of PDFs that need further analysis to determine their fate. One approach divides PDFs into:
- Keep – PDF is the preferred solution and meets editorial and accessibility standards;
- Update – an updated PDF is the preferred solution and could meet editorial and accessibility standards if revised;
- Retire – HTML content would be better. Sometimes, new content will have to be produced as part of an overall content update. Most of the time, the links and associated PDFs can be readily pruned.
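The sorting described above could be sketched roughly as follows. This is an illustration only: the `SurveyedPdf` record and its fields are hypothetical, duplicates are assumed to be files with identical content hashes, and the two boolean fields stand in for editorial and accessibility judgements that people, not code, would normally make.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SurveyedPdf:
    url: str
    sha256: Optional[str]    # None when the link had no file behind it
    preferred_as_pdf: bool   # reviewer's call: is PDF the right format?
    meets_standards: bool    # reviewer's call: editorial and accessibility checks


def find_duplicates(records: List[SurveyedPdf]) -> Dict[str, List[str]]:
    """Group URLs by content hash and keep groups with more than one URL."""
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if record.sha256 is not None:
            by_hash[record.sha256].append(record.url)
    return {h: urls for h, urls in by_hash.items() if len(urls) > 1}


def find_missing(records: List[SurveyedPdf]) -> List[str]:
    """Return links that had no file associated with them."""
    return [record.url for record in records if record.sha256 is None]


def triage(record: SurveyedPdf) -> str:
    """Assign one of the three categories: Keep, Update or Retire."""
    if not record.preferred_as_pdf:
        return "Retire"
    return "Keep" if record.meets_standards else "Update"
```

Missing files would normally be handled before triage, since there is nothing to keep or update when a link has no file behind it.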
Devolved content production means that the number of PDFs on a website can keep growing unless they are pruned. Periodically re-running surveys highlights new concerns and confirms that earlier changes have actually been implemented.
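One simple way to compare a repeat survey with an earlier one is to treat each survey as a set of PDF URLs. The sketch below assumes exactly that; the function names are hypothetical, and a real comparison might also look at content hashes to catch files that changed in place.

```python
from typing import Dict, Set


def compare_surveys(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Summarise how the set of PDF URLs changed between two surveys."""
    return {
        "new": current - previous,            # appeared since the last survey
        "removed": previous - current,        # pruned or otherwise gone
        "still_present": previous & current,  # unchanged between surveys
    }


def unfinished_retirements(retire_list: Set[str], current: Set[str]) -> Set[str]:
    """Return PDFs marked Retire earlier that the new survey still finds."""
    return retire_list & current
```

Anything returned by `unfinished_retirements` points to a pruning decision that was agreed but not yet carried out.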