Web Estate Registry

Autonomous website creation and maintenance can mean that the number of sites, their condition and the risks they pose are all unknown.
Reduce risk, restore order and make sites more effective by using our web estate registry service to find, log and monitor all of your websites.

University and college websites typically develop organically, producing estates of hundreds or even thousands of autonomous websites, supported with disparate levels of resources and expertise. As a result, many institutions are unable to answer the following types of questions:

GDPR

With the EU's General Data Protection Regulation coming into force, how many of our websites will be affected? How many of our sites and on which pages do we use forms to gather personal data?

Internal Audit

Our internal audit group needs a list of all our websites for a value-for-money study of hosting service usage. Do we have such a list? In fact, how many websites do we have? How many are hosted internally versus externally?

Accessibility

Our websites must be accessible. In implementing institution-wide accessibility, how many sites would be involved? How many content management systems would be affected? How many pages?

Security & Privacy

We want all of our institution's websites to offer secure HTTPS connections. How many of our sites does this apply to? What web servers do we currently use? Are our HTTPS sites using certificates from our preferred supplier?

While the content and technical set-up of individual websites create risk exposures, the risks increase materially as the number of sites grows and knowledge about each site remains limited.

We can help you answer three basic questions to assess your web estate's risk profile:

  1. Exactly how many websites do we own?
  2. Who maintains each of these sites?
  3. What underlying applications and technical infrastructure do they use?

Web Estate Risk Exposures

Web estates typically expose higher education institutions to three risk areas:

Financial risks as a result of:

  • inaccurate, out-of-date, inaccessible, inappropriate and low quality content
  • non-mobile friendly and non-responsive sites
  • unclear branding
  • redundant/legacy technology
  • duplicated hosting contracts
  • cost inefficiencies and potential revenue losses

Legal & Regulatory risks caused by:

  • unclear content copyrights
  • unenforced data privacy policies
  • uncontrolled cookie use and incomplete policy implementation
  • unmanaged social media implementations

Security risks resulting from:

  • unpatched content management systems and web servers
  • insecure website connections
  • untested site back-ups
  • incomplete HTTPS implementations
  • on-page security issues

Our web estate registry service identifies all of a web estate's sites, evaluating and recording critical data about each site.

Assessing Web Estate Risk Exposures

Our three-step process identifies all the sites within a web estate, then evaluates and records critical data about each site.

Survey

A highly automated survey explores and discovers the full scope and scale of an institution's websites by:

  • finding core web servers and content management systems (CMS)
  • iteratively scanning to uncover further sites within the estate

The survey output is a comprehensive list of the websites in a higher education institution's web estate.

Evaluation

Sites identified by a survey are systematically tested to collect data about:

  • technologies - security measures implemented, web server configuration and set-up, content management system(s)
  • site configuration - cookies, metadata characteristics, policy and privacy links and page counts

Evaluations capture and record site-level data to assess and understand potential risk exposures.

Registration

The survey and audit data, in turn, populate a Web Estate Registry to:

  • deliver a central database of an institution's websites and critical data about each site
  • provide the key data to explore, identify and evaluate potential risk exposures

Periodically re-running surveys and audits keeps data current and reliable.



Web Estate Surveys

Surveys identify all the websites that higher education institutions own or are legally or technically responsible for managing.

Where to Start?

Higher education institutions typically possess partial website lists and can poll the organisation to add further candidate sites. The resulting composite site list can be used to seed a comprehensive automated survey.

In addition to setting initial conditions, a survey needs sensible boundaries to prevent page and link scanning that yields no useful information. When planning a survey, a few basic questions need to be answered:

  • Which IP address ranges are relevant?
  • Which domains should be examined?
  • Should the survey cover public-facing and internal websites?
  • How should we distinguish 'services' sharing a web server from a website?
  • Which 'well-known' domains can be safely ignored as not relevant to the exercise?
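The answers to these questions can be captured as a small scope definition that the survey consults before following a link. The following is a minimal sketch only: the `SurveyScope` class, its field names and the example values in the comments are hypothetical and are not part of the service described here.

```python
import ipaddress
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class SurveyScope:
    """Hypothetical container for the survey boundary answers."""
    ip_ranges: list = field(default_factory=list)        # e.g. ["192.0.2.0/24"]
    domains: list = field(default_factory=list)          # e.g. ["example.ac.uk"]
    ignored_domains: list = field(default_factory=list)  # 'well-known' domains to skip

    @staticmethod
    def _matches(host, suffixes):
        # True if host equals a listed domain or is a subdomain of one.
        host = host.lower().rstrip(".")
        return any(host == s or host.endswith("." + s) for s in suffixes)

    def url_in_scope(self, url):
        """A URL is in scope if its host is in a surveyed domain and not ignored."""
        host = urlsplit(url).hostname or ""
        if not host or self._matches(host, self.ignored_domains):
            return False
        return self._matches(host, self.domains)

    def ip_in_scope(self, address):
        """An address is in scope if it falls inside any configured range."""
        ip = ipaddress.ip_address(address)
        return any(ip in ipaddress.ip_network(r) for r in self.ip_ranges)
```

Keeping the ignore list separate from the domain list means that links to well-known external services can be discarded early, before any page on them is fetched.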

When to Stop?

The longest survey phase involves systematically checking the seed list's URLs and examining every link and page on the relevant sites to identify connections to other candidate sites.

In practice, limiting scanning to 10,000 to 15,000 pages on a website or ignoring pages for which a server responds with a date/time stamp older than five years can shorten the time needed to complete this exercise.
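These two limits can be expressed as a single check applied before each page is scanned. The sketch below is illustrative and rests on assumptions: it takes the date/time stamp from an HTTP `Last-Modified` header value, uses the lower end of the page range, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_PAGES_PER_SITE = 10_000              # lower end of the 10,000 to 15,000 range
MAX_PAGE_AGE = timedelta(days=5 * 365)   # roughly five years


def should_scan(pages_scanned, last_modified, now=None):
    """Return True if a page should still be scanned under both limits.

    pages_scanned: pages already scanned on this site.
    last_modified: Last-Modified header value, or None if the server sent none.
    """
    if pages_scanned >= MAX_PAGES_PER_SITE:
        return False
    if not last_modified:
        return True  # no timestamp available, so scan the page
    try:
        stamp = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True  # unparseable timestamp, so scan the page
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - stamp <= MAX_PAGE_AGE
```

Pages with a missing or unreadable timestamp are scanned rather than skipped, on the assumption that an unnecessary scan costs less than a missed site.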

The scanning exercise produces a very large list of URLs, which is then filtered down to a shortlist of candidate sites for addition to a Web Estate Registry.

Scanning and site identification continue until the URL analysis shows that no new servers or sites are being identified.
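This stopping rule amounts to repeating the scan until a round finds nothing new. The sketch below illustrates the idea under simplifying assumptions: each distinct host name is treated as a candidate site, `fetch_links` is a hypothetical callable that returns the URLs found on a host, and `max_rounds` is an added safety cap that the text above does not mention.

```python
from urllib.parse import urlsplit


def discover_sites(seed_urls, fetch_links, max_rounds=50):
    """Scan round by round until a round yields no previously unseen host."""
    known = set()
    frontier = {urlsplit(u).hostname for u in seed_urls if urlsplit(u).hostname}
    rounds = 0
    while frontier and rounds < max_rounds:
        known |= frontier
        found = set()
        for host in sorted(frontier):
            for link in fetch_links(host):   # hypothetical link extractor
                linked_host = urlsplit(link).hostname
                if linked_host:
                    found.add(linked_host)
        frontier = found - known             # empty once nothing new turns up
        rounds += 1
    return known
```

In a real survey the newly found hosts would also be checked against the survey boundaries before joining the next round.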

Evaluating Web Estates

Surveys initiate an automated data collection and website evaluation process. The common data elements collected and catalogued for each site facilitate risk evaluation and the resolution of specific exposures.

Website Data Collection

A site evaluation collects two types of data for each website: data about the underlying technology infrastructure and data about the state of the website implementation. 

For the latter, data could be collected for every page on every website, but this approach is more likely to obscure than illuminate. In practice, for risk identification, it is likely sufficient to use website implementation data for each website’s home page.

The underlying technology infrastructure data collected covers the security measures implemented, web server configuration and set-up, and content management system(s).

The following types of ‘fundamental’ data about each website’s home page implementation can be collected for subsequent evaluation and analysis:

  • Page metadata – title/description and other ‘tags’.
  • Implementation of JavaScript to provide analytics or page rendering.
  • Cookie use [reporting when first set / cleared].
  • Presence of links to specific types of pages (policy statements) or documents.
  • Accessibility as measured against WCAG 2.0.
  • Total counts of scanned pages for each site (recorded during the survey).
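Several of the home page data elements listed above can be read directly from the page's HTML. The sketch below uses Python's standard `html.parser` module to pull out the title, the description tag, script sources and policy links. It is an illustration only: the class name, the function name and the `POLICY_WORDS` keyword list are assumptions, and cookie behaviour and WCAG checks are outside its scope.

```python
from html.parser import HTMLParser

# Assumed keywords for spotting policy links; adjust to the institution.
POLICY_WORDS = ("privacy", "cookie", "accessibility", "terms")


class HomePageFacts(HTMLParser):
    """Collects title, description, script sources and policy links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.script_sources = []
        self.policy_links = []
        self._in_title = False
        self._link_href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "script" and attrs.get("src"):
            self.script_sources.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self._link_href = attrs["href"]
            self._link_text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._link_href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._link_href is not None:
            text = "".join(self._link_text).lower()
            href = self._link_href.lower()
            if any(word in text or word in href for word in POLICY_WORDS):
                self.policy_links.append(self._link_href)
            self._link_href = None


def collect_home_page_facts(html):
    """Parse a home page's HTML and return the collected facts as a dict."""
    parser = HomePageFacts()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "script_sources": parser.script_sources,
        "policy_links": parser.policy_links,
    }
```

The returned dictionary has one entry per data element, so it can be stored against the site's record without further processing.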

The individual website evaluation data populates a Web Estate Registry, where it can be combined with other classification and user-determined data elements.

Web Estate Registry

A web estate registry holds website-specific data collected from surveys and audits. The aggregate data catalogues all websites to help assess and evaluate the specific risks each website poses and to identify overall web estate trends.

Analysis and Data

Our risk matrix identifies many of the potential exposures arising from operating a website and those that specifically affect university and college websites.

In practice, surveys and audits should be re-run periodically to ensure the registry holds current data for each website.

The registry database also holds data that cannot necessarily be collected automatically, such as details of the individuals responsible for each site’s maintenance or user-defined classification and categorisation of specific types of website.

Reporting

The registry data supports analysis of the full range of risks that can arise from owning and operating large numbers of autonomously run websites:

  • Financial: the registry can help identify redundant/legacy technology and duplicated hosting contracts, point to potential cost inefficiencies and aid in identifying revenue losses
  • Legal & Regulatory: the registry can help address issues caused by unclear content copyrights, unenforced data privacy policies, unmanaged cookie use or incomplete policy implementation
  • Security: in establishing the current technology infrastructure status, the registry aids in minimising risks from unpatched content management systems and web servers, insecure connections and untested site back-ups
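As an illustration of how such reports might be drawn from a registry, the sketch below stores a few fields per site in an in-memory SQLite database and runs three simple queries. The schema, column names and function names are assumptions made for this example and do not describe the actual registry design.

```python
import sqlite3

# Hypothetical schema: a mix of automatically collected and manually entered fields.
SCHEMA = """
CREATE TABLE sites (
    url         TEXT PRIMARY KEY,
    cms         TEXT,     -- collected automatically; NULL if not detected
    cms_version TEXT,
    https       INTEGER,  -- 1 if the site offers HTTPS, otherwise 0
    hosting     TEXT,     -- 'internal' or 'external'
    maintainer  TEXT      -- entered manually; NULL if unknown
)
"""


def build_registry(rows):
    """Create an in-memory registry and load (url, cms, ...) tuples into it."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO sites VALUES (?, ?, ?, ?, ?, ?)", rows)
    return db


def sites_without_https(db):
    """Security report: sites that do not offer HTTPS."""
    query = "SELECT url FROM sites WHERE https = 0 ORDER BY url"
    return [row[0] for row in db.execute(query)]


def hosting_counts(db):
    """Financial report: number of sites per hosting arrangement."""
    query = "SELECT hosting, COUNT(*) FROM sites GROUP BY hosting"
    return dict(db.execute(query))


def unowned_sites(db):
    """Governance report: sites with no recorded maintainer."""
    query = "SELECT url FROM sites WHERE maintainer IS NULL ORDER BY url"
    return [row[0] for row in db.execute(query)]
```

Each report is a single query over one table, which is why holding both collected and manually entered data in the same record is convenient.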

Contacts

North America
+1 416 464 9771
 
Europe
+44 203 290 3575
 

Locations

North America
Toronto | Canada
 
Europe
Edinburgh | UK