Search Engine Optimisation (or Optimization, if you prefer) comprises a substantial body of knowledge with many sub-topics and associated subject areas. One excellent resource covering most SEO elements is Moz's Beginner's Guide to SEO - we highly recommend reading it.
In the Moz guide, chapter 4 covers The Basics of Search Engine Friendly Design and Development. The chapter runs through indexable content, crawlable links, keywords, URL structures, canonical content and on-page optimisation. Our focus is on the latter, illustrating the extent of the potential issues and providing guidance on resolving them. For the purposes of this post we will refer to the topic as on-site optimisation because, in our work, we look at entire sites rather than individual pages.
The four on-site SEO basics we will review are:
- Title Elements (Tags)
- Meta Description Elements (Tags)
- Meta Keywords
- Canonical Content
Please excuse a slightly pedantic explanation about tags and elements. In HTML, elements are the ‘concept’ of parts of a document. According to the HTML specifications, tags are the markup, written between angle brackets, that marks where an element starts and ends. In common parlance tags and elements are interchangeable. But, as we read and implement specifications, we’ll stick with the official definition (whenever we remember).
In reiterating these SEO best practices we’ve gone to considerable lengths to determine current practice in the field, to ensure that our recommendations address material issues found on college and university websites. In preparation for this post we examined the first 1,000 web pages and associated links for 97 UK university websites, starting our review from the main home page and stopping after we had crawled 1,000 pages of the site.
Our conclusions about the relative prevalence of issues associated with the four SEO basics covered in this post are based on the data from the 97 websites.
Title Elements
Title elements provide search engines (and users) with a concise description of the content that will be found on a web page. The text between the opening and closing title tags is displayed as the clickable first line in search results. Google displays approximately 55 characters of the title in search results before truncating it.
Page titles also appear as text in web browser tabs, aiding users in finding specific pages among many open tabs, and are used as the default text to populate social media posts when sharing page content, in the absence of other structured mark up on the page.
The title element appears in a web page as follows:
<title>Name of University or College</title>
Google has a number of practical suggestions for effective page titles on websites:
- Make sure every page on your site has a title;
- Use distinct, unique titles for each page on your site;
- Ensure that page titles are descriptive and concise; and,
- Brand your titles, but concisely and conservatively.
How closely do websites adhere to Google’s recommendations? We checked the title elements on 97,000 UK university website pages and carried out the following tests:
Are there any pages on the website that are missing the title element? Of the 97 sites tested, 34% were missing title elements on one or more pages. In most cases missing titles only occurred on a handful of (albeit hard-to-locate) pages. Two sites had missing titles on 50% of the pages scanned.
Are there any pages on the website on which the title text is duplicated? Ninety-eight percent of the sites had pages with duplicate titles. Just over 50% of the sites had one quarter or more of the 1,000 pages scanned with duplicate titles. Given all these sites use content management systems to generate web pages, tuning the CMS would eliminate most of these occurrences.
Are there any pages on the website on which the title element content exceeds 55 characters in length? Ninety-seven percent of the sites tested had pages on which the title text exceeds 55 characters. About 10% of the offending sites had excess title length on three quarters of their pages, while about 25% of the offending sites had fewer than one tenth of their pages affected. Overall, only 3% of the sites conformed to SEO best practice. It takes a degree of diligence to track down offending pages and optimise the title text.
Are there any pages on the website that have multiple title elements? Of the 97 sites tested, just 8% had pages with multiple title elements that would give Google and Bing pause for thought. One site in particular appeared to have a CMS configuration issue that is generating multiple title elements on most pages. Overall, multiple title elements are not a major issue.
We note that only one website ‘sailed’ through the title tests without incurring any errors.
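The four title tests described above can be approximated with a short script. The following is a minimal sketch rather than the tooling used for our survey: the function name audit_titles, the pages dictionary (URL mapped to HTML source) and the TITLE_LIMIT constant are all assumptions made for illustration, and only the Python standard library is used.

```python
from collections import Counter
from html.parser import HTMLParser

TITLE_LIMIT = 55  # approximate display limit quoted in this post


class TitleCollector(HTMLParser):
    """Collects the text of every <title> element found in a document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the open title.
        if self._in_title:
            self.titles[-1] += data


def audit_titles(pages):
    """pages maps URL -> HTML source. Returns issue name -> list of URLs."""
    issues = {"missing": [], "multiple": [], "too_long": [], "duplicate": []}
    first_titles = {}
    for url, html in pages.items():
        parser = TitleCollector()
        parser.feed(html)
        parser.close()
        titles = [t.strip() for t in parser.titles if t.strip()]
        if not titles:
            issues["missing"].append(url)
            continue
        if len(titles) > 1:
            issues["multiple"].append(url)
        if len(titles[0]) > TITLE_LIMIT:
            issues["too_long"].append(url)
        first_titles[url] = titles[0]
    # A title is a duplicate when more than one page shares it.
    counts = Counter(first_titles.values())
    issues["duplicate"] = [u for u, t in first_titles.items() if counts[t] > 1]
    return issues
```

Note that this sketch counts every title element in the markup, including any inside inline SVG, so a production check would probably restrict itself to the document head.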
All the university websites we examined were generated by content management systems of varying vintages. CMSs are generally capable of generating unique page titles, although it takes content editor intervention to ensure that the page title (and descriptions) best fit the associated content.
Meta Description Elements
The text included in the content attribute of the meta description element provides users (and potentially, search engines) with a relevant précis of the content found on a page. The description is normally displayed as the second line of search engine results and should be about 155 characters.
We qualified the use of the description element in the previous paragraph because Google may choose to use a snippet of text from the page if it deems the text to be more relevant to the page topic than the description text or in the absence of description text. However, for most sites it makes sense to include appropriate description text.
We note that descriptions have no impact on search rankings, but they can increase the click through from search results by expanding on the search headline and being highly relevant to the topic being searched.
Google also has suggestions for effective page descriptions on websites:
- Make sure that every page has a meta description;
- Differentiate the descriptions for different pages;
- Make sure descriptions are truly descriptive; and,
- Programmatically generate descriptions, as needed.
How closely do websites adhere to Google’s description recommendations? We checked the meta description element attributes across 97,000 UK university website pages and carried out two tests:
Are there any pages on the website that are missing descriptions? Of the 97 sites tested, 93% were missing descriptions on one or more pages. Forty percent of the sites tested had descriptions missing from 50% or more of the pages tested. Only seven sites had descriptions on all the pages tested.
Are there any pages on the website on which the description is duplicated? This turns out to be much less of a problem. Although 89% of sites have duplicate descriptions, for about 45% of these sites the problem occurs on fewer than 10% of the pages. However, a small cluster of nine sites had duplicate descriptions on more than 50% of the web pages audited.
Given that all the sites use content management systems to generate web pages, the CMSs can be configured to warn of content being published without a description. However, external verification is generally required to identify duplication or near duplication.
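As a rough illustration of what such external verification might look like, the sketch below pulls the meta description from a page and flags pairs of pages whose descriptions are identical or nearly so. The function names, the 0.9 similarity threshold and the use of difflib's SequenceMatcher are our assumptions for the example, not a description of any particular audit tool.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from itertools import combinations


class DescriptionFinder(HTMLParser):
    """Records the content of the first <meta name="description"> element."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name == "description" and self.description is None:
            self.description = (attrs.get("content") or "").strip()


def get_description(html):
    """Return the description text, or None if it is missing or empty."""
    finder = DescriptionFinder()
    finder.feed(html)
    finder.close()
    return finder.description or None


def near_duplicates(descriptions, threshold=0.9):
    """descriptions maps URL -> text. Returns URL pairs at or above threshold."""
    pairs = []
    for (u1, d1), (u2, d2) in combinations(sorted(descriptions.items()), 2):
        ratio = SequenceMatcher(None, d1.lower(), d2.lower()).ratio()
        if ratio >= threshold:
            pairs.append((u1, u2))
    return pairs
```

The pairwise comparison grows quadratically with the number of pages, which is tolerable for a 1,000-page sample but would need a smarter approach on a full site crawl.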
To illustrate the effect of these issues, here's a live example of a search engine result reflecting a title that is too long and duplicate descriptions on two web pages:
Meta Keywords
The meta keyword description is no longer relevant for SEO purposes. There is a minor debate over whether including terms in this meta tag has a negative impact, but search engines have become more sophisticated and ignore terms placed in this tag.
So, why did we test for it? Our research strongly suggests that the occurrence of pages with keyword meta descriptions is a proxy for content that is neglected or rarely updated. And, on large, complex, content-rich websites, it is easy for content to be ignored.
As a result, running a simple test of identifying pages with keyword definitions can highlight material that could be archived or deleted. The hypothesis that this material can be removed can be further tested by downloading a ranked list of pages visited from Google Analytics and matching the lists by URL in a spreadsheet.
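The spreadsheet matching step can also be scripted. The sketch below assumes an analytics export saved as CSV with two columns named page and pageviews; those column names, the max_views cut-off and the function name are hypothetical, and a real Google Analytics export may need its headers and number formatting adjusted first.

```python
import csv
import io
from urllib.parse import urlsplit


def low_traffic_keyword_pages(keyword_urls, analytics_csv, max_views=10):
    """Return (path, views) for keyword-bearing pages with few visits.

    keyword_urls: URLs of pages found to contain meta keywords.
    analytics_csv: CSV text with the assumed columns 'page' and 'pageviews'.
    """
    views = {}
    for row in csv.DictReader(io.StringIO(analytics_csv)):
        views[row["page"]] = int(row["pageviews"])

    result = []
    for url in keyword_urls:
        path = urlsplit(url).path or "/"
        count = views.get(path, 0)  # pages absent from the export count as zero
        if count <= max_views:
            result.append((path, count))
    # Least-visited pages first: the strongest candidates for archiving.
    return sorted(result, key=lambda item: item[1])
```

Pages that carry meta keywords and also sit at the bottom of the traffic list are the ones worth reviewing first for archiving or deletion.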
Google clearly states that the meta keyword description is not used, so there’s no compelling reason to include one. However, CMSs and CMS plugins sometimes insert author names or blog category descriptions into a meta keyword description and this 'trivial usage' likely does no harm.
How closely do websites adhere to Google’s no keyword recommendation? We conducted a single test for keywords:
Are there any pages on the website that include meta keyword descriptions? About 1/3rd of sites have a few keywords present, and by a few we mean fewer than 10 across 1,000 pages. Another 1/3rd have a few pages (10+) to about 30% of the site’s pages with keywords. And, the final third have keywords on 1/3rd to all of the website pages.
Our general recommendation would be to leave the meta keyword description empty: these keywords do nothing for Google ranking and only alert competing colleges and universities to the keywords for which your institution wishes to be found or ranked. On our CMS it is a user option to omit keywords.
Canonical Content
Most college and university websites use a content management system to generate web pages, which may result in the same content being accessed through several different URLs. For example, a blog may list its posts by author or by category, as well as a simple listing of the entries. A CMS will typically generate category or author URLs to access the original posts. Defining a canonical (definitive or preferred) URL allows Google, Bing and other search engines to correctly assign link and ranking signals to the preferred or definitive version of the content. This, in turn, improves the likelihood that the content will be found.
Ambiguity over the definitive website version can also arise as a result of the way in which the web server has been configured. Without the use of canonicalisation Google would see the following as four different and distinct websites (because they are):
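A common way for four versions of one site to arise is the combination of HTTP or HTTPS with a host name that does or does not start with www. The sketch below works on that assumption, using the placeholder domain example.ac.uk; both function names are hypothetical. It lists the four variants and shows how any of them could be mapped onto a single preferred origin.

```python
from itertools import product
from urllib.parse import urlsplit, urlunsplit


def site_variants(domain):
    """List the scheme and www/non-www combinations for one domain."""
    bare = domain[4:] if domain.startswith("www.") else domain
    hosts = (bare, "www." + bare)
    return [f"{scheme}://{host}/" for scheme, host in product(("http", "https"), hosts)]


def to_preferred(url, preferred="https://www.example.ac.uk"):
    """Rewrite a URL onto the preferred scheme and host.

    Assumes url already belongs to the same site; the path and query
    are kept and any fragment is dropped.
    """
    target = urlsplit(preferred)
    parts = urlsplit(url)
    return urlunsplit((target.scheme, target.netloc, parts.path or "/", parts.query, ""))
```

In practice the same mapping is usually enforced with server-side redirects as well as canonical references, so that visitors and crawlers both end up on the preferred version.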
How can this issue be remedied when it arises? Add a link element with the attribute rel="canonical" to the head element of the relevant pages, which is readily handled by most CMSs.
Google and most of the SEO sources we consulted in preparing this post state that the canonical reference should be absolute and not relative. This is a departure from the usual practice of using relative references for internal links on websites.
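To make the distinction concrete, the following sketch checks whether a canonical reference is absolute and, if it is not, resolves it against the URL of the page on which it appears. The helper names and the example.ac.uk URLs are illustrative assumptions; urljoin and urlsplit come from the Python standard library.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href):
    """True only when the reference carries both a scheme and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


def absolute_canonical(page_url, href):
    """Resolve a relative canonical reference against the page's own URL."""
    return urljoin(page_url, href)


page = "https://www.example.ac.uk/blog/author/jane"
print(is_absolute("/blog/post-1"))               # False
print(absolute_canonical(page, "/blog/post-1"))  # https://www.example.ac.uk/blog/post-1
```

A protocol-relative reference such as //www.example.ac.uk/blog/post-1 has a host but no scheme, so this sketch treats it as not absolute.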
How much of a problem is this in practice? In our review of canonical content, we tested the websites as follows:
Are there any pages on the website that are missing the rel="canonical" attribute? Of the 97 sites tested 58% did not have the rel="canonical" attribute on one or more pages. A subset of 15% of sites had not implemented the rel="canonical" attribute over more than 85% of the website pages tested.
Are there any pages on the website that have multiple rel="canonical" attributes? Of the 97 sites tested 2% had more than one rel="canonical" attribute on one or more pages. This is not a major problem, save for the one site that had multiple rel="canonical" attributes on almost 60% of the pages tested.
Are there any pages on the website where there is a ‘scheme’ mismatch with the rel="canonical" attribute: that is, the site is set up as HTTPS, but the rel="canonical" reference is set as HTTP, or vice versa? Of the 97 sites tested, 32% had rel="canonical" attributes in place that exhibited a scheme mismatch. However, as websites transition to HTTPS in 2017, this issue should diminish.
Overall only 4% of the 97 sites we tested came through the three tests without error.
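The three canonical tests can be expressed in a few lines of code. This is a minimal sketch under our own assumptions (the class and function names and the issue labels are invented for the example) and it examines one page at a time, given the URL from which the page was fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def check_canonical(page_url, html):
    """Return a list of issue labels for one page; empty means no issue found."""
    collector = CanonicalCollector()
    collector.feed(html)
    collector.close()
    if not collector.hrefs:
        return ["missing"]
    problems = []
    if len(collector.hrefs) > 1:
        problems.append("multiple")
    page_scheme = urlsplit(page_url).scheme
    for href in collector.hrefs:
        # Resolve relative references first so a scheme is always available.
        if urlsplit(urljoin(page_url, href)).scheme != page_scheme:
            problems.append("scheme_mismatch")
            break
    return problems
```

Run across every crawled page, the per-page results can then be tallied to give site-level percentages of the kind reported above.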
Despite the capabilities of content management systems, websites can still drift away from best practice and miss out on the benefits of basic SEO techniques.
It is prudent to scan sites regularly – not once every two years – to identify SEO and other issues, determine the root cause and correct them before they materially impact website performance, content effectiveness and the overall visitor experience.