Web Estate Registry
University and college websites typically develop organically, producing estates of hundreds or even thousands of autonomous websites supported by widely varying levels of resource and expertise. As a result, many institutions cannot answer basic questions about the sites they own.
Security & Privacy
While the content and technical set-up of any individual website create risk exposures, those risks increase materially as the number of sites grows and knowledge about each site remains limited.
We can help you answer three basic questions to assess your web estate's risk profile:
- Exactly how many websites do we own?
- Who maintains each of these sites?
- What underlying applications and technical infrastructure do they use?
Web Estate Risk Exposures
Financial risks as a result of:
- inaccurate, out-of-date, inaccessible, inappropriate and low-quality content
- sites that are not mobile-friendly or responsive
- unclear branding
- redundant/legacy technology
- duplicated hosting contracts
- cost inefficiencies and potential revenue losses
Legal & Regulatory risks caused by:
- unclear content copyrights
- unenforced data privacy policies
- uncontrolled cookie use and incomplete policy implementation
- unmanaged social media implementations
Security risks resulting from:
- unpatched content management systems and web servers
- insecure website connections
- untested site back-ups
- incomplete HTTPS implementations
- on-page security issues
Our web estate registry service identifies all of a web estate's sites, evaluating and recording critical data about each site.
Assessing Web Estate Risk Exposures
Our three-step process identifies all the sites within a web estate, then evaluates and records critical data about each site.
A highly automated survey explores and discovers the full scope and scale of an institution's websites by:
- finding core web servers and content management systems (CMS)
- iteratively scanning to uncover further sites within the estate
The survey output is a comprehensive list of the websites in a higher education institution's web estate.
Sites identified by a survey are systematically tested to collect data about:
- technologies - security measures implemented, web server configuration and set-up, content management system(s)
- site configuration - cookies, metadata characteristics, policy and privacy links and page counts
Evaluations capture and record site-level data to assess and understand potential risk exposures.
The survey and audit data, in turn, populates a Web Estate Registry to:
- deliver a central database of an institution's websites and critical data about each site
- provide the key data to explore, identify and evaluate potential risk exposures
Periodically re-running surveys and audits keeps data current and reliable.
Web Estate Surveys
Where to Start?
Higher education institutions typically possess partial website lists and can poll the organisation to add further candidate sites. The resulting composite site list can be used to seed a comprehensive automated survey.
In addition to its initial conditions, a survey needs sensible boundaries to prevent page and link scanning that does not yield useful information. In planning a survey, a few basic questions need to be answered:
- Which IP address ranges are relevant?
- Which domains should be examined?
- Should the survey cover public-facing and internal websites?
- How should 'services' that share a web server be distinguished from a website?
- Which 'well-known' domains can be safely ignored as not relevant to the exercise?
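The answers to these questions can be captured as a small scope filter that every discovered URL passes through. The following is a minimal sketch only: the domain names, the IP range and the ignore list are illustrative placeholders, and the function names are hypothetical rather than part of any existing tool.

```python
import ipaddress
from typing import Iterable, Optional
from urllib.parse import urlparse

# Illustrative placeholders only; a real survey would supply its own values.
IN_SCOPE_DOMAINS = {"example.ac.uk"}
IN_SCOPE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
IGNORED_DOMAINS = {"facebook.com", "twitter.com", "youtube.com"}


def _matches(host: str, domains: Iterable[str]) -> bool:
    """True if host equals, or is a subdomain of, any listed domain."""
    return any(host == d or host.endswith("." + d) for d in domains)


def in_scope(url: str, resolved_ip: Optional[str] = None) -> bool:
    """Decide whether a discovered URL belongs in the survey."""
    host = (urlparse(url).hostname or "").lower()
    if not host or _matches(host, IGNORED_DOMAINS):
        return False  # no hostname, or a well-known domain to ignore
    if _matches(host, IN_SCOPE_DOMAINS):
        return True
    if resolved_ip is not None:
        # Fall back to the IP ranges the institution says are relevant.
        addr = ipaddress.ip_address(resolved_ip)
        return any(addr in net for net in IN_SCOPE_NETWORKS)
    return False
```

Checking the ignore list first keeps well-known external domains out of the scan even if they happen to resolve inside a relevant IP range.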
When to Stop?
The longest survey phase involves systematically checking the seed list's URLs and examining every link and page on the relevant sites to identify connections to other candidate sites.
In practice, capping scanning at 10,000 to 15,000 pages per website, or ignoring pages for which the server reports a date/time stamp older than five years, can shorten the time needed to complete this exercise.
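These two limits could be expressed as a simple check applied before each page is scanned. This sketch assumes the server's date/time stamp is the HTTP Last-Modified response header, which the text does not specify; the constant and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MAX_PAGES_PER_SITE = 10_000             # lower end of the range quoted above
MAX_PAGE_AGE = timedelta(days=5 * 365)  # approximately five years


def is_stale(last_modified: Optional[str], now: datetime) -> bool:
    """True if a Last-Modified header value is older than MAX_PAGE_AGE."""
    if not last_modified:
        return False  # no stamp reported: do not treat the page as stale
    try:
        stamp = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # unparseable header: keep the page
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return now - stamp > MAX_PAGE_AGE


def should_scan(pages_scanned: int, last_modified: Optional[str],
                now: datetime) -> bool:
    """Apply both limits: the per-site page cap and the staleness cut-off."""
    return pages_scanned < MAX_PAGES_PER_SITE and not is_stale(last_modified, now)
```

Pages with a missing or malformed stamp are kept rather than skipped, on the assumption that scanning one extra page costs less than missing a link to an unknown site.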
The scanning exercise delivers a very large list of URLs, which is then analysed to yield a shortlist of candidate sites for addition to a Web Estate Registry.
Scanning and site identification continue until the URL analysis shows that no new servers or sites are being identified.
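The stopping rule amounts to scanning until the queue of unvisited in-scope URLs is empty. The sketch below illustrates the idea under two simplifying assumptions: a "site" is treated as a distinct hostname, and link extraction and scope checking are supplied by the caller as the hypothetical callables `fetch_links` and `in_scope`.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set
from urllib.parse import urlparse


def discover_sites(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    in_scope: Callable[[str], bool],
    max_pages_per_site: int = 10_000,
) -> Set[str]:
    """Breadth-first scan from the seed list; returns in-scope hostnames found."""
    sites: Set[str] = set()
    seen: Set[str] = set()
    pages_per_site: Dict[str, int] = {}
    queue = deque(seeds)
    while queue:  # the loop ends once no unvisited URLs remain
        url = queue.popleft()
        host = (urlparse(url).hostname or "").lower()
        if url in seen or not host or not in_scope(url):
            continue
        seen.add(url)
        if pages_per_site.get(host, 0) >= max_pages_per_site:
            continue  # page cap reached for this site
        pages_per_site[host] = pages_per_site.get(host, 0) + 1
        sites.add(host)
        queue.extend(fetch_links(url))
    return sites
```

Passing link extraction in as a function lets the same loop be exercised against a small in-memory link graph before it is pointed at live servers.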
Evaluating Web Estates
Website Data Collection
A site evaluation collects two types of data for each website: data about the underlying technology infrastructure and data about the state of the website implementation.
For the latter, data could be collected for every page on every website, but this approach is more likely to obscure than illuminate. In practice, for risk identification, it is likely sufficient to use website implementation data for each website’s home page.
The type of underlying technology infrastructure data being collected includes:
- Protocol/scheme in use (HTTP vs. HTTPS).
- Web server technology reported by the server (operating system, serving software and supporting technologies, e.g. ASP.NET or PHP etc.).
- Web server content management system (CMS) as reported by the CMS and by independent analysis.
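As an illustration, much of this data can be read from a single home page response. The sketch below assumes the response headers and HTML body have already been fetched. It covers only what the server and CMS report about themselves (the Server and X-Powered-By headers and a generator meta tag), not the independent analysis mentioned above, and the function name is hypothetical.

```python
import re
from typing import Dict, Mapping, Optional
from urllib.parse import urlparse

# Simplification: matches only tags where name= appears before content=.
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def technology_profile(
    url: str, headers: Mapping[str, str], html: str
) -> Dict[str, Optional[str]]:
    """Collect self-reported technology data for one website."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    match = GENERATOR_RE.search(html)
    return {
        "scheme": urlparse(url).scheme,             # "http" or "https"
        "server": lowered.get("server"),            # serving software, if disclosed
        "powered_by": lowered.get("x-powered-by"),  # e.g. PHP or ASP.NET, if disclosed
        "cms_reported": match.group(1) if match else None,
    }
```

Many servers suppress or falsify these values, so an empty field means "not reported", not "not present".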
The following type of ‘fundamental’ data about each website’s home page implementation can be collected for subsequent evaluation and analysis:
- Page metadata – title/description and other ‘tags’.
- Cookie use [reporting when first set / cleared].
- Presence of links to specific types of pages (policy statements) or documents.
- Accessibility, measured against WCAG 2.0.
- Total count of scanned pages for each site (recorded during the survey).
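Some of these items can be extracted from the home page HTML alone. The following sketch uses Python's standard `html.parser` module to pull out the title, the meta description and links that look like policy pages. The keyword list is an illustrative assumption, and cookie use and accessibility checks are out of its scope because they need more than static HTML.

```python
from html.parser import HTMLParser

# Illustrative keywords for spotting policy links; adjust to local naming.
POLICY_KEYWORDS = ("privacy", "cookie", "accessibility", "copyright")


class HomePageAudit(HTMLParser):
    """Collects title, meta description and policy-like links from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.policy_links = {}
        self._in_title = False
        self._href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._link_text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            text = " ".join(self._link_text).lower()
            for keyword in POLICY_KEYWORDS:
                if keyword in text or keyword in self._href.lower():
                    # Keep the first link found for each policy type.
                    self.policy_links.setdefault(keyword, self._href)
            self._href = None


def audit_home_page(html: str) -> dict:
    parser = HomePageAudit()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "policy_links": parser.policy_links,
    }
```

A missing key in `policy_links` flags a site whose home page has no recognisable link to that type of policy, which is a prompt for manual review, not proof that the policy is absent.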
The individual website evaluation data populates a Web Estate Registry, where it can be combined with other classification and user-determined data elements.
Web Estate Registry
Analysis and Data
Our risk matrix identifies many of the potential exposures arising from operating a website and those that specifically affect university and college websites.
In practice, surveys and audits should be re-run periodically to ensure the registry holds current data for each website.
The registry database also holds data that cannot necessarily be collected automatically, such as details of the individuals responsible for each site’s maintenance or user-defined classification and categorisation of specific types of website.
The registry data supports analysis of the full range of risks that can arise from owning and operating large numbers of autonomously run websites:
- Financial: the registry can help identify redundant/legacy technology and duplicated hosting contracts, point to potential cost inefficiencies and aid in identifying revenue losses
- Legal & Regulatory: the registry can help address issues caused by unclear content copyrights, unenforced data privacy policies, unmanaged cookie use or incomplete policy implementation
- Security: by establishing the current status of the technology infrastructure, the registry helps minimise risks from unpatched content management systems and web servers, insecure connections and untested site back-ups
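To make the idea of a registry concrete, the sketch below stores one row per site in a SQLite database and runs a simple risk query. The table layout, column names and risk criteria are illustrative assumptions, not the schema of any actual registry product.

```python
import sqlite3

# Hypothetical schema: automatically collected fields plus manually entered ones.
SCHEMA = """
CREATE TABLE IF NOT EXISTS site (
    url              TEXT PRIMARY KEY,
    scheme           TEXT NOT NULL,
    server           TEXT,
    cms              TEXT,
    has_privacy_link INTEGER NOT NULL DEFAULT 0,
    page_count       INTEGER,
    maintainer       TEXT,
    last_audited     TEXT
)
"""


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a registry database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def sites_at_risk(conn: sqlite3.Connection) -> list:
    """Sites served over plain HTTP, lacking a privacy link, or with no maintainer."""
    rows = conn.execute(
        "SELECT url FROM site "
        "WHERE scheme = 'http' OR has_privacy_link = 0 OR maintainer IS NULL "
        "ORDER BY url"
    )
    return [row[0] for row in rows]
```

For example, after loading audit results, `sites_at_risk(conn)` returns the URLs that warrant follow-up. Re-running surveys and audits would update the same rows and refresh `last_audited`.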