You Are Responsible for All Breakages!
A frequently highlighted issue in managing complex, content-rich websites is finding and fixing broken links.
Broken links prevent site visitors from accessing relevant content and expose them to '404 error' messages in the process. They also reduce the effectiveness of searches and degrade the perceived quality of a site and the reliability of its information. Too many 404 errors can even have SEO implications and affect Google and Bing search engine rankings. These are all good reasons to minimise the number of broken links on a site.
Breakages arise when content is revised, updated or re-configured, whether by multiple content creators or a central editor. When changes are implemented quickly, it is easy to introduce links that don’t work, and it can be hard to track them down and repair them.
How many broken links are there on a ‘typical’ higher education website and how much of a problem are breakages, in practice?
In this week’s blog, we discuss an exercise we recently completed to analyse the number of broken links present on the pages of 55 UK university websites.
Starting at each site's home page, we scanned up to the first 1,000 pages per site, checking the links and resources on each page to get solid data about the scope and scale of broken links. Overall, our crawler examined 54,124 pages and tested 166,252 links.
Of the 55 sites scanned, 11% (six sites) had no broken links. These sites did have broken resources needed to render certain pages, but all of their content was accessible and searchable via Google/Bing. We note that two of these sites have installed a site content auditing tool, and it appears to be doing the job it is intended to do.
The remaining sites all had broken links (and broken resources) to varying degrees. Our preferred statistic for the prevalence of breakages is the number of broken links per 100 web pages.
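As a minimal sketch of how this statistic can be calculated, the Python below computes broken links per 100 pages for each site and then averages the per-site rates. The function name and the three example sites are hypothetical, and the figures are invented for illustration rather than taken from the survey. Averaging per-site rates, rather than pooling all pages, is also an assumption about how the headline average was derived.

```python
def broken_links_per_100_pages(broken_links: int, pages_scanned: int) -> float:
    """Return the number of broken links per 100 scanned pages."""
    if pages_scanned <= 0:
        raise ValueError("pages_scanned must be positive")
    return 100.0 * broken_links / pages_scanned


# Hypothetical scan results: site label -> (broken links found, pages scanned).
# These numbers are made up for illustration; they are not survey data.
example_scans = {
    "site-a": (0, 1000),
    "site-b": (42, 1000),
    "site-c": (110, 500),
}

# Per-site rate, then the unweighted mean of those rates.
rates = {
    site: broken_links_per_100_pages(broken, pages)
    for site, (broken, pages) in example_scans.items()
}
average_rate = sum(rates.values()) / len(rates)

for site, rate in rates.items():
    print(f"{site}: {rate:.2f} broken links per 100 pages")
print(f"average: {average_rate:.2f} broken links per 100 pages")
```

Normalising by pages scanned makes sites of different sizes comparable, which matters here because not every site yielded the full 1,000 pages.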
For the 55 websites we analysed, the average number of broken links is 5.92 per 100 pages, with individual sites ranging from zero to 31.43 per 100 pages. The graph illustrates the distribution of broken links per 100 pages across the sample group. Just under 60% of the sites have five or fewer broken links for every 100 pages, and most of the remaining sites (38%) have between six and 20 broken links per 100 pages. A further 2% of the sites have more than 20 broken links per 100 pages.
Figure 1: Distribution of the broken link rates found for the first 1,000 pages scanned on 55 UK university websites. Most (58%) of the websites had five or fewer broken links for each 100 web pages. The total number of pages scanned was 54,124 with 166,252 associated links. Data gathered mid-January 2017.
At 0.8 per 100 pages, the average prevalence of broken resources is much lower than that of broken links. Broken resources are, however, more insidious than broken links and harder to track down and resolve across the hundreds or even thousands of pages on a website.
We looked at whether there is any relationship between the relative complexity of a web page, as measured by the number of links on the page, and the prevalence of broken links. Across the sample of 55 university websites, pages had an average of about three (3.07) links each.
Our regression analysis of the data found no meaningful correlation: the R² value for complexity plotted against broken links per 100 pages was 0.04.
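For readers who want to run a similar check on their own scan data, here is a rough sketch of how an R² value can be computed for a simple one-variable linear regression, where R² equals the square of the Pearson correlation coefficient. The function name is hypothetical and the data points are invented for illustration. This is not the analysis code used for the survey.

```python
def r_squared(xs, ys):
    """R^2 of a simple linear regression of ys on xs.

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return (sxy * sxy) / (sxx * syy)


# Invented example: average links per page vs broken links per 100 pages.
links_per_page = [1.0, 2.0, 3.0, 4.0]
broken_per_100 = [6.0, 2.0, 2.0, 6.0]

print(f"R^2 = {r_squared(links_per_page, broken_per_100):.2f}")
```

An R² close to zero, as in this made-up example, means the number of links per page explains almost none of the variation in the broken link rate.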
The one number to take away from this blog post is that the average UK university website in our sample has 5.92 broken links per 100 web pages. Put another way, we would expect visitors to these sites to encounter, on average, roughly one broken link for every 17 pages they view.
Apply that metric to some of the larger university and college websites:
- University of Oxford: 6,160,000 pages – implies 364,672 broken links
- California Institute of Technology: 2,100,000 pages – implies 124,320 broken links
- Stanford University: 13,800,000 pages – implies 816,960 broken links
- University of Cambridge: 3,670,000 pages – implies 217,264 broken links
- Massachusetts Institute of Technology: 16,700,000 pages – implies 988,640 broken links
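The arithmetic behind these figures is a straight linear extrapolation, which the short Python sketch below reproduces. The page counts are the ones quoted in the list above; the function and variable names are illustrative. It assumes that the rate measured on the first 1,000 pages of the sampled UK sites holds across every page of each site, which may not be true in practice.

```python
BROKEN_LINKS_PER_100_PAGES = 5.92  # sample average reported in this post

# Page counts as quoted in the list above.
page_counts = {
    "University of Oxford": 6_160_000,
    "California Institute of Technology": 2_100_000,
    "Stanford University": 13_800_000,
    "University of Cambridge": 3_670_000,
    "Massachusetts Institute of Technology": 16_700_000,
}


def implied_broken_links(pages, rate_per_100=BROKEN_LINKS_PER_100_PAGES):
    """Linearly extrapolate a per-100-pages rate to a whole site."""
    return round(pages * rate_per_100 / 100)


implied = {name: implied_broken_links(pages) for name, pages in page_counts.items()}

# The same rate expressed as pages viewed per broken link (about 17).
pages_per_broken_link = 100 / BROKEN_LINKS_PER_100_PAGES

for name, count in implied.items():
    print(f"{name}: {count:,} implied broken links")
print(f"About one broken link per {pages_per_broken_link:.1f} pages")
```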
Broken links prevent site visitors from accessing relevant content and expose them to 404 error messages that they really shouldn’t see. Visitor engagement rates on most websites are in the single-digit percentages, so asking visitors to report broken links is unlikely to be an effective way of identifying the problem.
A systematic approach is needed. Manually testing links can work for very small sites, and a Google search will turn up a number of online tools for testing individual pages.
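To give a feel for what a single-page check involves, here is a minimal sketch using only the Python standard library. It is not the crawler used for this survey. All function and class names are hypothetical, the treatment of any status of 400 or above as "broken" is an assumption, and a real checker would also need to handle redirects, rate limiting and servers that reject HEAD requests.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return unique absolute http(s) links found in html, in page order."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        # Skip mailto:, tel:, javascript: and similar non-web schemes.
        if urlparse(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)
    return links


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed."""
    request = Request(url, method="HEAD", headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # e.g. 404
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def find_broken_links(html, base_url, get_status=fetch_status):
    """Return (link, status) pairs for links that fail or return >= 400."""
    broken = []
    for link in extract_links(html, base_url):
        status = get_status(link)
        if status is None or status >= 400:
            broken.append((link, status))
    return broken
```

Passing the status lookup in as `get_status` keeps the link extraction testable without touching the network, and makes it easy to swap in a GET request for servers that do not support HEAD.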
To be consistent with the professionalism applied elsewhere to college and university site development, design and maintenance, verification needs to be conducted regularly, comprehensively and systematically.
In practice, scanning pages for broken links is just one of a battery of useful tests that should be carried out regularly across a site to ensure that all content renders as intended and is accessible via search, referrals from other sites and social media.
Realistically, scanning large websites needs to be automated. It should be linked to verification of the XML sitemap and robots.txt settings, and scan results should be written to a database so that period-to-period comparisons can be used to monitor subsequent remediation efforts.
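As a rough illustration of the database step, the sketch below stores link check results in SQLite (via Python's built-in `sqlite3` module) and compares two scan dates. The table layout, function names and example URLs are all hypothetical, and the rule that a missing status or a status of 400 or above counts as broken is an assumption.

```python
import sqlite3


def open_results_db(path=":memory:"):
    """Open (or create) a database with one row per checked link per scan."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scan_results (
               scan_date TEXT NOT NULL,
               page_url  TEXT NOT NULL,
               link_url  TEXT NOT NULL,
               status    INTEGER,
               PRIMARY KEY (scan_date, page_url, link_url)
           )"""
    )
    return conn


def record_scan(conn, scan_date, results):
    """Store results, an iterable of (page_url, link_url, status) tuples."""
    conn.executemany(
        "INSERT OR REPLACE INTO scan_results VALUES (?, ?, ?, ?)",
        [(scan_date, page, link, status) for page, link, status in results],
    )
    conn.commit()


def broken_links(conn, scan_date):
    """Return the set of (page_url, link_url) pairs that were broken in a scan."""
    rows = conn.execute(
        "SELECT page_url, link_url FROM scan_results "
        "WHERE scan_date = ? AND (status IS NULL OR status >= 400)",
        (scan_date,),
    )
    return set(rows.fetchall())


def compare_scans(conn, earlier, later):
    """Compare the broken links recorded in two scans."""
    before = broken_links(conn, earlier)
    after = broken_links(conn, later)
    return {
        # No longer broken: either repaired or removed from the page.
        "resolved": before - after,
        "new": after - before,
        "outstanding": before & after,
    }
```

A usage example with invented URLs:

```python
conn = open_results_db()
page = "https://example.ac.uk/courses"
record_scan(conn, "2017-01-15", [(page, "https://example.ac.uk/old", 404),
                                 (page, "https://example.ac.uk/fees", 404)])
record_scan(conn, "2017-02-15", [(page, "https://example.ac.uk/old", 200),
                                 (page, "https://example.ac.uk/fees", 404),
                                 (page, "https://example.ac.uk/staff", 500)])
print(compare_scans(conn, "2017-01-15", "2017-02-15"))
```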