Breakout1B

Version User Scope of changes
Jul 2 2008, 12:21 PM EDT (current) Anonymous 8 words added, 10 words deleted
Jul 2 2008, 12:19 PM EDT Anonymous 412 words added

Technical Barriers to Web Resource Preservation

The strategic importance of institutional Web sites is now widely recognised but their preservation often remains low priority. Why is this? What are the barriers that hinder those involved from embedding Web resource preservation strategies into their working practice?

Full contemporaneous notes are available on Google Docs.
What are the main technical challenges related to Web resource preservation?

  • data structure
  • External dependencies
  • Database driven content
  • Identify content to archive
  • Format decay
  • How long to maintain for?
  • Copyright
  • Maintenance
  • Interoperability of standards
  • numerous servers, sub domains
  • storage
  • recreation of user experience
  • dynamic content

Identify the top three challenges.

  • URIs - organisation of web site
  • Deep Web
  • External Services

Identify solutions to these challenges.
  • Redirects
  • Tools
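
One way to make redirects manageable is to keep them as a single table that can be checked, rather than as scattered server rules. The sketch below is only an illustration under that assumption: the paths in REDIRECTS and the function name resolve are hypothetical, and a real site would load the table from its own server or CMS configuration.

```python
def resolve(old_path, redirects, max_hops=10):
    """Follow a chain of redirects and return the final path.

    Paths with no redirect entry are returned unchanged.
    """
    seen = set()
    path = old_path
    while path in redirects:
        # Stop on a loop (path already visited) or an overlong chain.
        if path in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or overlong chain at {path!r}")
        seen.add(path)
        path = redirects[path]
    return path


# Hypothetical mapping from retired paths to their replacements.
REDIRECTS = {
    "/dept/history/": "/schools/humanities/history/",
    "/schools/humanities/history/": "/humanities/history/",
}

print(resolve("/dept/history/", REDIRECTS))
```

Resolving every entry in the table before deployment catches loops and long chains that accumulate when a site is reorganised more than once.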

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Further notes are given below (based on notes held on Google docs).

ORGANISATION OF WEBSITE, URIs etc:

  • Subdomain setup for preservation; CMS unable to cope
  • Multiple sites (subdomains) on same server not cross linked
  • Mirroring/spidering tools can't find them all (HTTrack takes a domain as its starting point; without links it can't find other subdomains)
  • Resource discovery
  • Preservation/persistence of URLs.
  • Managing redirects is difficult with the tools available
  • Versioning (not supported in URIs)
  • Do we need more formal policy about URI management allocation?
  • Website that doesn't reflect organisational structure is not good for management, navigation, customers etc. - but n.b. organisation structure may change.
  • "Architectural approach" to managing URI schemes etc. (but not bound to transient organisational structures - “logical” structure)
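
Because a spider that starts from one domain cannot reach subdomains that nothing links to, one possible workaround is to give it an explicit seed for every known host. The sketch below assumes a list of known URLs is available from some other source (for example server logs or DNS records); the function name crawl_seeds and the example.ac.uk hosts are hypothetical.

```python
from urllib.parse import urlsplit


def crawl_seeds(known_urls, domain):
    """Return one seed URL per distinct host under `domain`, sorted by host."""
    domain = domain.lower()
    suffix = "." + domain
    seeds = {}
    for url in known_urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Keep the domain itself and any of its subdomains; skip everything else.
        if host == domain or host.endswith(suffix):
            seeds.setdefault(host, f"{parts.scheme}://{host}/")
    return [seeds[host] for host in sorted(seeds)]


# Hypothetical URLs gathered from logs; only the hosts matter here.
known = [
    "http://www.example.ac.uk/about",
    "http://conf2008.example.ac.uk/programme",
    "http://www.example.ac.uk/contact",
    "http://other.org/",
]
for seed in crawl_seeds(known, "example.ac.uk"):
    print(seed)
```

Each seed can then be passed to the mirroring tool as a separate starting point, so that unlinked subdomains are captured as well.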


STORAGE SPACE:

  • Storage space issue with automated archiving where pages changed frequently.
  • (We need to define/determine how CMS tools behave when archiving/versioning.)
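
The storage concern can be reduced if a capture is stored only when the page has actually changed. The sketch below shows that idea using a content hash; it is not a description of how any particular CMS or archiving tool behaves, and the class name SnapshotStore is hypothetical.

```python
import hashlib


class SnapshotStore:
    """Keep a version of a page only when its content has changed."""

    def __init__(self):
        self.blobs = {}    # content hash -> page bytes (each stored once)
        self.history = {}  # URL -> list of (timestamp, content hash)

    def add(self, url, timestamp, content):
        """Record a capture; return True if a new version was stored."""
        digest = hashlib.sha256(content).hexdigest()
        versions = self.history.setdefault(url, [])
        if versions and versions[-1][1] == digest:
            return False  # unchanged since the last capture: store nothing
        self.blobs.setdefault(digest, content)
        versions.append((timestamp, digest))
        return True


store = SnapshotStore()
store.add("/news/", "2008-07-01", b"<p>Open day</p>")
store.add("/news/", "2008-07-02", b"<p>Open day</p>")       # identical, skipped
store.add("/news/", "2008-07-03", b"<p>Open day moved</p>")
print(len(store.history["/news/"]))
```

Pages that embed a timestamp or other changing boilerplate will hash differently on every capture, so in practice the content may need normalising before hashing.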


DEEP/DYNAMIC WEBSITES:

  • Spidering not always appropriate for db-driven websites. How to do database snapshots meaningfully and practically?
  • How to recreate user experience?
  • PERSONALISATION of web sites/experiences: how to capture that.
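
Where spidering is not appropriate, one option is to snapshot the database itself as a plain-text SQL script, which is less tied to a particular software version than a binary file. The sketch below assumes an SQLite database purely for illustration, and the pages table is hypothetical; a real CMS would use its own database's dump tools. Note that this preserves the underlying data only, not the user experience.

```python
import sqlite3


def snapshot_sql(conn):
    """Return the whole database as a plain-text SQL script."""
    return "\n".join(conn.iterdump())


def restore(sql_text):
    """Rebuild an in-memory database from a script produced by snapshot_sql."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


# Hypothetical content table standing in for a CMS database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE pages (path TEXT, body TEXT)")
src.execute("INSERT INTO pages VALUES ('/news/1', 'Open day announced')")
src.commit()

dump = snapshot_sql(src)
copy = restore(dump)
print(copy.execute("SELECT path, body FROM pages").fetchall())
```

Restoring the script into a fresh database, as the last lines do, is a simple check that the snapshot is usable before it is filed away.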


OWNERSHIP:

  • "Temporary" websites. Don't know who or what school in university is responsible


WHAT TO KEEP:

  • HOW to decide when keeping the user experience is important; or when underlying information is enough?
  • HOW to preserve when external services used (e.g. Google Maps, Facebook)
  • HR and recruitment, if external services are used. (Contract with external service should reflect institutional policy.)
  • Streaming media?
  • ReSTORE: web resources must be mature and complete (closed? finished?) before archiving
  • IPR issues around taking resources and putting them in a new domain.
  • ReSTORE is a bit like a "Home for orphaned websites" (cp. UKWAC?)
  • Quality control - appraisal
  • Distinction between archives and records. Organisation would have specific rules about location etc of archive


WHY KEEP?

  • What's the purpose of preservation? Will dictate different solutions. ReSTORE, UKWAC can archive "finished" things. Other needs may be dynamic.
  • Example: Scholarship application deadline contested. CMS history proved applicant was wrong.



PRESERVATION MANAGEMENT AND PLANNING

  • CURATION of resource needs to be actively managed and funded (will PoWR guidance include details of costs and implications?)
  • (Retention period: 5-10 years typically: keeping h/w and s/w to maintain resources over that period ought to be possible.)
  • Maybe we do have to preserve the equipment/browser/SW? Is it viable?
  • These things need to be built in from the beginning. "Funding for project website..." doesn't include preservation planning.
  • Records retention policy may make no reference to web.
  • How do we distinguish between retention periods for different web pages? (Automatically)
  • Archive.org useful - but beware of depending on it completely.
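
One possible way to distinguish retention periods automatically is to attach them to URL path prefixes, so that each page inherits a period from its place in the site. The sketch below assumes that approach; the prefixes and year values in RULES are illustrative only (chosen within the 5-10 year range mentioned above), and the function names are hypothetical.

```python
from datetime import date

# Hypothetical rules: (path prefix, retention in years). The first match wins,
# so more specific prefixes must come before general ones.
RULES = [
    ("/prospectus/", 10),
    ("/news/", 5),
    ("/", 5),
]


def retention_years(path, rules=RULES):
    """Return the retention period for a path from the first matching rule."""
    for prefix, years in rules:
        if path.startswith(prefix):
            return years
    raise ValueError(f"no retention rule matches {path!r}")


def review_date(path, published, rules=RULES):
    """Date on which the page becomes due for review or disposal."""
    target_year = published.year + retention_years(path, rules)
    try:
        return published.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return published.replace(year=target_year, day=28)


print(review_date("/prospectus/2008/", date(2008, 7, 2)))
```

This only works if the URI scheme is stable and meaningful, which ties retention back to the URI management questions raised earlier in these notes.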