Web Estate Registry
Higher education institutions typically let websites develop organically, leading to web estates of hundreds or thousands of autonomous sites: estates in which the precise number of sites is unknown, individual site ownership is unclear and there is no effective digital oversight.
Our service discovers every website, logging key site data to create a comprehensive web estate registry. It then continuously monitors sites on a user-defined schedule.
The registry provides accurate, timely data for digital governance and oversight, and makes digital marketing and communications more effective.
This data lets higher education institutions answer wider questions such as:
Security & Privacy
Starting from an initial seed list of URLs, our scanning software intelligently follows every link to discover all the websites and microsites in an institution’s web estate.
As each site is identified, the scanning process captures the following base data:
- technologies - security measures implemented, web server configuration and set-up, content management system(s)
- site configuration - cookies, policy and privacy links and page counts
- content - content types used and metadata
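To illustrate the discovery step, here is a minimal sketch of a link-following crawl that collects the hostnames making up an estate. It uses only the Python standard library; the `fetch` callable, the `.example.edu` domain suffix and the in-memory page map are illustrative assumptions standing in for live HTTP requests, not the product's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def discover_sites(seed_urls, fetch, allowed_suffix=".example.edu"):
    """Breadth-first crawl from seed URLs; returns the set of hostnames found.

    `fetch` is any callable returning a page's HTML for a URL (hypothetical --
    a real scanner would issue HTTP requests, respect robots.txt, etc.).
    """
    seen_urls, hosts, queue = set(), set(), list(seed_urls)
    while queue:
        url = queue.pop(0)
        if url in seen_urls:
            continue
        seen_urls.add(url)
        host = urlparse(url).hostname or ""
        if not host.endswith(allowed_suffix):
            continue  # stay inside the institution's estate
        hosts.add(host)
        extractor = LinkExtractor()
        extractor.feed(fetch(url))
        queue.extend(urljoin(url, link) for link in extractor.links)
    return hosts

# Tiny in-memory "web" standing in for live pages:
pages = {
    "https://www.example.edu/": '<a href="https://micro.example.edu/">micro</a>',
    "https://micro.example.edu/": '<a href="/about">about</a>',
    "https://micro.example.edu/about": "",
}
found = discover_sites(["https://www.example.edu/"], lambda u: pages.get(u, ""))
```

Here `found` contains both the seed host and the microsite it links to, showing how a seed list grows into a full site inventory.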
The Web Estate Registry holds:
- a central, single-source-of-the-truth database of all of an institution's websites, content owners and critical site data
- the key information to explore, identify and evaluate website enhancement and risk minimization opportunities
Ongoing Web Estate Monitoring
Web Estate Registry with Multiple Registers
A registry of all the sites within a web estate can be arranged as a set of sub-registers reflecting a university or college’s reporting needs. The system reports the total number of sub-registers and the number of websites in each sub-register.
Sample Register and Websites
Registry and sub-register data can be filtered, searched and queried to answer institution-wide, or site-specific, questions. For example: which content management systems do our sites use? Which sites still need to upgrade to HTTPS? What Facebook or Twitter accounts do our sites use?
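The first two example questions can be answered with straightforward queries over a registry table. A minimal sketch using an in-memory SQLite database follows; the schema, column names and sample sites are assumptions for illustration, not the registry's real data model.

```python
import sqlite3

# Hypothetical minimal registry schema; field names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sites (
    url TEXT, sub_register TEXT, cms TEXT, scheme TEXT)""")
conn.executemany("INSERT INTO sites VALUES (?, ?, ?, ?)", [
    ("www.example.edu", "Central", "Drupal", "https"),
    ("alumni.example.edu", "Advancement", "WordPress", "https"),
    ("physics.example.edu", "Departments", "WordPress", "http"),
])

# Which content management systems do our sites use?
cms_counts = dict(conn.execute(
    "SELECT cms, COUNT(*) FROM sites GROUP BY cms"))

# Which sites still need to upgrade to HTTPS?
needs_https = [row[0] for row in conn.execute(
    "SELECT url FROM sites WHERE scheme = 'http'")]
```

With the sample data above, `cms_counts` tallies sites per CMS and `needs_https` lists the one site still served over plain HTTP.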
Website Summary - More Detail Accessed via Top Right HTML/PDF Icons
Where to Start?
Institutions can often poll website owners and use existing lists to seed a comprehensive automated discovery exercise.
As well as needing somewhere to start, discovery needs careful planning to determine:
- Which IP address ranges are relevant?
- Which domains should be examined?
- Should the exercise apply to internal websites as well as public-facing ones?
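These scope decisions can be encoded as a simple in-scope test applied to every discovered URL. The sketch below is one possible shape for such a rule, using only the Python standard library; the address range, domain suffixes and `intranet.` convention for internal sites are illustrative assumptions.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative scope: these ranges, domains and the internal flag are assumptions.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ALLOWED_DOMAINS = (".example.edu", ".example.ac.uk")
INCLUDE_INTERNAL = False

def in_scope(url, resolved_ip=None):
    """Decide whether a discovered URL belongs to the discovery exercise."""
    host = urlparse(url).hostname or ""
    # Internal sites are skipped unless the exercise explicitly includes them.
    if host.startswith("intranet.") and not INCLUDE_INTERNAL:
        return False
    if any(host.endswith(d) for d in ALLOWED_DOMAINS):
        return True
    # Sites on other domains may still belong to the estate if they are
    # hosted inside the institution's IP address ranges.
    if resolved_ip is not None:
        ip = ipaddress.ip_address(resolved_ip)
        return any(ip in net for net in ALLOWED_NETWORKS)
    return False
```

Answering the three planning questions up front turns them into data (networks, domains, an internal/external flag) rather than ad hoc judgements during the scan.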
When to Stop?
Discovery means systematically checking every page link to uncover other relevant sites.
Scanning and site identification is iterative, continuing until no new servers or sites are identified.
In practice, scans can often be limited to thousands of pages, selectively inspecting server timestamps to focus on recent content and thus shortening the discovery phase.
A scanning exercise delivers a candidate list of URLs that can be filtered to a list of sites for loading to a registry.
Data collection acquires data about each website's underlying technology infrastructure, the website implementation and relevant page content.
Up-to-date page content and site configuration information lets you understand:
- Web page metadata – titles, descriptions and other elements
- Cookies being used
- Whether privacy, accessibility and other policy statements are present
- Accessibility, as assessed against WCAG 2.0
- Total counts of scanned pages for each site (recorded during the survey)
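The first three items above amount to a small per-page audit: pull out the title and meta description, and note whether policy pages are linked. A minimal sketch with Python's built-in HTML parser follows; the keyword list used to spot policy links is an illustrative assumption.

```python
from html.parser import HTMLParser

class PageAudit(HTMLParser):
    """Record title, meta description and policy-page links from one HTML page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.policy_links = []
        self._in_title = False
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")
        elif tag == "a":
            href = attrs.get("href", "")
            # Assumed heuristic: policy pages are recognised by their URL path.
            if any(word in href for word in ("privacy", "accessibility", "cookies")):
                self.policy_links.append(href)
    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.title += data

audit = PageAudit()
audit.feed("""<html><head><title>Physics</title>
<meta name="description" content="Department of Physics"></head>
<body><a href="/privacy">Privacy</a></body></html>""")
```

Run over every scanned page, results like these roll up into the per-site metadata, cookie and policy-statement fields held in the registry.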
This allows marketing, communications, content editors and developers to identify and respond to user experience and related concerns as needed.
With the current, accurate data collected for each website, technical staff will know:
- Protocols/schemes in use (HTTP vs. HTTPS)
- Web server technology reported by the server
- Web content management system (CMS), both as reported by the CMS itself and as determined by independent analysis
They can then change, modify and update systems and servers as appropriate.
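As a sketch of how the first two items can be derived, the summary below reads the scheme from the URL and the server technology from response headers recorded during a scan. The header names shown are standard HTTP headers, but the function and field names are illustrative assumptions; no live request is made here.

```python
from urllib.parse import urlparse

def summarise_site(url, headers):
    """Summarise protocol and server technology from a site's recorded
    response headers (captured during a scan, not fetched live here)."""
    return {
        "scheme": urlparse(url).scheme,                 # "http" vs. "https"
        "server": headers.get("Server", "unknown"),     # as reported by the server
        "powered_by": headers.get("X-Powered-By", ""),  # often hints at the CMS stack
    }

summary = summarise_site(
    "http://physics.example.edu/",
    {"Server": "Apache/2.4.57", "X-Powered-By": "PHP/8.1"},
)
```

Note that headers are self-reported and can be suppressed or spoofed, which is why the CMS determination above is cross-checked by independent analysis.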
Each site's data is automatically updated on a user-defined schedule.
Analysis can be carried out across sites or within groups of sites to identify and resolve systemic issues, identify common risk exposures, support web governance initiatives, and check compliance with branding and other marketing and communications policies.
The web estate registry database can be queried to:
- generate status reports, for example all sites using a specific version of WordPress
- produce risk exposure reports covering financial, legal, regulatory and security risks
- highlight content quality, user experience or other issues affecting digital marketing and communications campaigns.
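The first report type, for example, reduces to a parameterised query over the registry. A minimal sketch with an in-memory SQLite table follows; the schema, column names and sample versions are assumptions for illustration only.

```python
import sqlite3

# Illustrative schema and data; column names are assumptions, not the product's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (url TEXT, cms TEXT, cms_version TEXT)")
conn.executemany("INSERT INTO sites VALUES (?, ?, ?)", [
    ("blog.example.edu", "WordPress", "5.9"),
    ("news.example.edu", "WordPress", "6.4"),
    ("www.example.edu", "Drupal", "10.1"),
])

# Status report: all sites running a specific version of WordPress.
report = [row[0] for row in conn.execute(
    "SELECT url FROM sites WHERE cms = 'WordPress' AND cms_version = ?",
    ("5.9",))]
```

A report like this is most useful when a vulnerability advisory names a specific CMS version: the registry immediately lists every affected site and, via its ownership data, whom to contact.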