A Sustainable, Large-Scale, Minimal Approach to Accessing Web Archives


Gregory Wiedeman
University Archivist
University at Albany, SUNY

@gregwiedeman

github.com/UAlbanyArchives

Web Archiving at UAlbany

  • Archive-It partner since 2012
  • Primary goal: preserve permanent university records
  • Became my responsibility in April 2015
  • www.albany.edu
    • subdomains
  • www.ualbanysports.com
  • www.albanystudentpress.net

Web Archiving at UAlbany

  • Began collecting from outside organizations this year
  • Collecting areas: New York State politics, labor, capital punishment
    • New York Civil Liberties Union
    • Environmental Advocates of New York
    • WAMC (NPR station)
    • New York State Business Council
    • Civil Service Employees Association (CSEA)
    • Senator Kirsten Gillibrand
    • Senator Chuck Schumer

Public Access

  • Effectively none
  • Records that need to be discoverable

Sustainable Approach to Large-Scale Access

  • Integration with traditional collections
  • A single, format-neutral access system
  • Arrangement by use and content, not format

Relationship with Traditional Collections

  • One-to-one: one fonds to one Archive-It collection
    • New York Civil Liberties Union
    • Environmental Advocates of New York
  • Many-to-one: many fonds to one Archive-It collection
    • Campus offices
      • Office of the Provost
      • Office of Facilities Management
    • Academic departments
      • Department of Africana Studies
      • College of Engineering and Applied Sciences
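The two relationship types can be modeled as a simple mapping from each fonds to the Archive-It collection that holds its web records. The sketch below is illustrative only: the collection names are placeholders, and putting every campus office and department into a single collection is an assumption made for the example, not a statement of how UAlbany's collections are actually divided.

```python
from collections import defaultdict

# Hypothetical fonds -> Archive-It collection mapping. Collection names are
# placeholders; grouping all campus units into one collection is an assumption.
FONDS_TO_AIT = {
    "New York Civil Liberties Union": "NYCLU (placeholder)",
    "Environmental Advocates of New York": "Environmental Advocates (placeholder)",
    "Office of the Provost": "University records (placeholder)",
    "Office of Facilities Management": "University records (placeholder)",
    "Department of Africana Studies": "University records (placeholder)",
    "College of Engineering and Applied Sciences": "University records (placeholder)",
}


def fonds_by_collection(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a fonds -> collection mapping into collection -> [fonds]."""
    grouped = defaultdict(list)
    for fonds, collection in mapping.items():
        grouped[collection].append(fonds)
    return dict(grouped)


def relationship(fonds_list: list[str]) -> str:
    """Label a collection by how many fonds point at it."""
    return "one-to-one" if len(fonds_list) == 1 else "many-to-one"


if __name__ == "__main__":
    for collection, fonds_list in fonds_by_collection(FONDS_TO_AIT).items():
        print(f"{collection}: {relationship(fonds_list)} ({len(fonds_list)} fonds)")
```

Keeping the mapping keyed by fonds means each traditional collection points to exactly one web collection, while the inverted view shows which web collections are shared.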

Integration with Collection Management

  • Old, custom CMS database for collection management
    • No API or other way to reuse its data
  • Plan to move to ArchivesSpace in fall 2016
  • But access needs exist now

Future: ArchivesSpace Integration

Access

  • Archival public access systems are terrible
  • Currently testing a new system

Needs

  • API for provenance information
    • Seeds
    • Scoping rules
    • Crawl limits
  • API for search
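To make the provenance need concrete, here is a minimal sketch of the kind of record such an API might return for one collection: its seeds, scoping rules, and crawl limits, serialized as JSON. Everything here is an assumption for illustration. The field names, the "block"/"include" rule shape, the simplified scope test, and the limit value are hypothetical and do not reflect Archive-It's actual data model or scoping logic.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ScopingRule:
    # Hypothetical shape: "block" excludes URLs containing the pattern,
    # "include" brings URLs containing it into scope.
    action: str
    pattern: str


@dataclass
class CrawlLimits:
    # Hypothetical limits; None means no limit was set for the crawl.
    max_documents: Optional[int] = None
    max_duration_seconds: Optional[int] = None


@dataclass
class ProvenanceRecord:
    collection: str
    seeds: List[str]
    scoping_rules: List[ScopingRule] = field(default_factory=list)
    limits: CrawlLimits = field(default_factory=CrawlLimits)

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses and lists of them.
        return json.dumps(asdict(self), indent=2)

    def in_scope(self, url: str) -> bool:
        # Simplified scope test (an assumption, not Archive-It's logic):
        # block rules win; otherwise a URL is in scope if it starts with
        # a seed or contains an include pattern.
        if any(r.action == "block" and r.pattern in url for r in self.scoping_rules):
            return False
        if any(url.startswith(seed) for seed in self.seeds):
            return True
        return any(r.action == "include" and r.pattern in url for r in self.scoping_rules)


record = ProvenanceRecord(
    collection="University records (placeholder)",
    seeds=["http://www.albany.edu/"],
    scoping_rules=[
        ScopingRule("block", "/calendar/"),   # hypothetical exclusion
        ScopingRule("include", "albany.edu"),  # pulls in subdomains
    ],
    limits=CrawlLimits(max_duration_seconds=3 * 24 * 60 * 60),  # illustrative value only
)

if __name__ == "__main__":
    print(record.to_json())
```

A flat, JSON-serializable record like this could be exposed next to each fonds' description, so researchers can see what a crawl was meant to capture and what it deliberately left out.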