Born-Digital Records in Practice at UAlbany
Gregory Wiedeman
University Archivist
Some Background
M.E. Grenander Special Collections & Archives
University at Albany, SUNY
- 4 permanent archivists, 1 on grant funding
- Department Head
- Supervisory Archivist (Manuscripts and Front Desk)
- Curator of Digital Collections
- University Archivist
- 3-4 Graduate students
- Undergraduate work study students
Overview of University Archives
- University Records
- Office of the President
- University Senate
- University Council
- Office of the Provost
- Graduate & Undergraduate Education
- Records of Schools and Colleges
- Records of Academic Departments
- Web Archives
- Student Groups and Manuscripts
- Student Association
- Albany Student Press
- Faculty and Alumni Papers
Collecting Background
- Formal records management program until cut in 1990s
- Very effective collecting in 1970s
- Records Management “distributed” among offices
- Permanent Records should come to me
- Large paper backlog
- Establishment of extensible processing practices
What does this have to do with Born-Digital Records?
- Consensus that disk imaging is the most effective way to preserve born-digital records
- This poses real problems for public records
A Bit about File Systems
- File Allocation Table (FAT)
- New Technology File System (NTFS)
- HFS Plus
- ext4
- exFAT
How File Systems Work
- Not designed for preservation
- Designed for efficiency
- Quick Retrieval
Inefficient Storage
Efficient Storage
File Systems are designed for Efficiency
- Files are abstractions
- Pointers to where data is stored
- Unix systems use inode pointers
- NTFS uses $MFT (Master File Table)
https://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/
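The idea that a file name is only a pointer to a record can be sketched in Python. This is a minimal illustration, not part of the original slides: the helper `same_file_record` and the file names are hypothetical. It creates a hard link, so two names point at one underlying record, and compares the device and inode numbers reported by `os.stat`.

```python
import os
import tempfile


def same_file_record(path_a, path_b):
    """Return True if both names resolve to the same file-system record."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "minutes.txt")
    alias = os.path.join(workdir, "minutes-alias.txt")

    with open(original, "w") as handle:
        handle.write("University Senate minutes")

    # A hard link adds a second name for the same stored data.
    os.link(original, alias)

    linked = same_file_record(original, alias)
    link_count = os.stat(original).st_nlink  # number of names pointing here
```

Deleting one of the two names would leave the data reachable through the other, which is why "the file" is better understood as the record than as the name.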
Where is a File?
Where do Files Live?
NTFS on Windows
Brian Carrier, File System Forensic Analysis (Addison-Wesley, 2005), p. 283
$MFT (Master File Table)
Carrier, p. 280
$MFT (Master File Table)
Carrier, p. 295
$MFT Timestamps Demo
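A rough way to see why timestamps matter during transfer, assuming only the Python standard library: the sketch below sets a known modified time on a test file, then copies it twice. `shutil.copy` writes a new modified time, while `shutil.copy2` attempts to carry the original over. The file names, the fixed date, and the helper `modified_time` are illustrative assumptions, and this reads timestamps through the operating system rather than from the $MFT directly.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone

FIXED_TIME = 946684800  # 2000-01-01T00:00:00Z, an arbitrary example date


def modified_time(path):
    """Return the last-modified time of a file as a UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "report.txt")
    with open(source, "w") as handle:
        handle.write("Annual report")

    # Pretend the record was last modified in 2000.
    os.utime(source, (FIXED_TIME, FIXED_TIME))

    plain = shutil.copy(source, os.path.join(workdir, "plain-copy.txt"))
    preserved = shutil.copy2(source, os.path.join(workdir, "kept-copy.txt"))

    original_mtime = modified_time(source)
    plain_mtime = modified_time(plain)          # time of the copy
    preserved_mtime = modified_time(preserved)  # carried over from source
```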
Deleting Files
Efficiency in Sectors
Carrier, p. 188
Slack Space
Carrier, p. 188
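The arithmetic behind slack space can be worked through in a few lines. This is a simplified sketch: it assumes storage is allocated in fixed-size clusters of 4096 bytes (a common default, but an assumption here) and ignores cases such as very small files stored inside the file system's own records. The function name `slack_bytes` is hypothetical.

```python
def slack_bytes(file_size, cluster_size=4096):
    """Bytes allocated to a file but not used by its content."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("sizes must be non-negative and cluster size positive")
    # Round up to whole clusters using integer arithmetic.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size - file_size


# A 10,000-byte file needs 3 clusters (12,288 bytes), leaving 2,288 unused.
example_slack = slack_bytes(10000)
```

Whatever was previously written in those unused bytes may still be there, which is how a disk image can carry data the records creator never intended to transfer.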
Imaging the Disk
Disk Images
- Unix disk dump utility (dd)
- FTK Imager
- Guymager
- ImgBurn
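At their core, these tools copy every byte from a source into an image file and record a checksum. The sketch below shows that idea only. It uses an ordinary temporary file as a stand-in for a device, and the function names `image_source` and `sha256_of` are hypothetical. Real imaging tools also deal with write blocking, read errors, and forensic container formats, none of which is attempted here.

```python
import hashlib
import os
import tempfile


def image_source(source_path, image_path, chunk_size=4096):
    """Copy every byte of source into image; return the SHA-256 of the data read."""
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            image.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=4096):
    """Compute the SHA-256 of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "stand-in-device.bin")
    image = os.path.join(workdir, "disk.img")

    with open(source, "wb") as handle:
        handle.write(os.urandom(100000))  # placeholder for real media

    checksum_at_capture = image_source(source, image)
    checksum_of_image = sha256_of(image)
    image_size = os.path.getsize(image)
```

Comparing the checksum taken while reading with one computed from the finished image is a basic fixity check that the copy is bit-for-bit complete.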
What Can We Learn from a Disk Image?
- File metadata
- Timestamps
- Logs and Journals
- Registries
- Deleted Files not overwritten
- Data in Slack Space
- Corrupted Data
So, What does all this mean?
- Disk imaging keeps all the bits
- More than what you can see on your computer
- Great for manuscripts
- David Baldus Papers
- Maurice Hinchey Papers
Overview of University Archives
- University Records
- Office of the President
- University Senate
- University Council
- Office of the Provost
- Graduate & Undergraduate Education
- Records of Schools and Colleges
- Records of Academic Departments
- Web Archives
- Student Groups and Manuscripts
- Student Association
- Albany Student Press
- Faculty and Alumni Papers
Records Created by the University are Public Records
University Records are subject to FOIL
Born-Digital Records Collecting in Practice
- Files dispersed around the University
- Local Computers
- Network Shares
- Cloud Storage
ANTS issues
- Difficult to get records creators to commit
- Training
- Required wider commitment
- Availability of network shares
- Maintenance
- Library and packaging issues
- Really an authentication issue
Transfer Scripts
- Network Folder Share
- Archives and creator both have access
- Python script run on task scheduler
- Weekly checks for new files
- CSV log files of files transferred
- Creates XML accession metadata file
- Runs createSIP.py command line tool
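The steps above might look roughly like the following. This is a hedged sketch, not the actual UAlbany script: every function name, the CSV columns, and the XML element names are assumptions made for illustration. It stops before the packaging step, since the interface of createSIP.py is not described here, and scheduling is left to the operating system's task scheduler.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def already_logged(log_path):
    """Return the set of relative paths recorded in earlier runs."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, newline="") as handle:
        return {row["path"] for row in csv.DictReader(handle)}


def find_new_files(share_dir, log_path):
    """List files on the share that do not yet appear in the CSV log."""
    seen = already_logged(log_path)
    found = []
    for folder, _subdirs, names in os.walk(share_dir):
        for name in names:
            relative = os.path.relpath(os.path.join(folder, name), share_dir)
            if relative not in seen:
                found.append(relative)
    return sorted(found)


def append_to_log(log_path, share_dir, new_files):
    """Record each transferred file, its size, and when it was logged."""
    is_new_log = not os.path.exists(log_path)
    with open(log_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "bytes", "logged"])
        if is_new_log:
            writer.writeheader()
        stamp = datetime.now(timezone.utc).isoformat()
        for relative in new_files:
            size = os.path.getsize(os.path.join(share_dir, relative))
            writer.writerow({"path": relative, "bytes": size, "logged": stamp})


def accession_xml(new_files, creator):
    """Build a small accession metadata document (hypothetical schema)."""
    root = ET.Element("accession", creator=creator)
    for relative in new_files:
        ET.SubElement(root, "file").text = relative
    return ET.tostring(root, encoding="unicode")


with tempfile.TemporaryDirectory() as workdir:
    share = os.path.join(workdir, "share")
    os.makedirs(share)
    log = os.path.join(workdir, "transfers.csv")  # kept outside the share

    with open(os.path.join(share, "agenda.docx"), "w") as handle:
        handle.write("draft agenda")

    first_run = find_new_files(share, log)
    append_to_log(log, share, first_run)
    metadata = accession_xml(first_run, "University Senate")

    second_run = find_new_files(share, log)  # nothing new the following week
```

Keeping the log outside the shared folder avoids the script treating its own log as a new record to transfer.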
createSIP.py demo
What is a SIP?
- Different interpretations of SIP, AIP, DIP
Into ArchivesSpace
Born-Digital Photography
Maintenance
- Scripts break over time
- No more ad-hoc approaches
- No more XML data stores
- Network of well-maintained interoperable tools
- ArchivesSpace Migration
Espy Project
Fedora 4 Data Model
Portland Common Data Model
Hydra::Works
Other Projects
Questions