How to Metadata
for Digital Libraries
How Computers See Metadata
- Code
- 100010010111010101010110000100111101010101010101
- Layers of Abstraction
- Uppercase "A":
- ASCII Binary: 01000001
- ASCII Hexadecimal: 41
- Unicode: U+0041
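The three views of uppercase "A" listed above can be reproduced in a few lines. This is a minimal sketch in Python (the choice of language is an assumption; the slides do not name one):

```python
# Three views of the same character, matching the list above.
char = "A"
code_point = ord(char)                    # the character's integer value

binary = format(code_point, "08b")        # 8-digit binary string
hexadecimal = format(code_point, "02X")   # 2-digit uppercase hexadecimal
unicode_notation = f"U+{code_point:04X}"  # conventional code point notation

print(binary, hexadecimal, unicode_notation)
```

Each line is the same number written a different way, which is the point of the "layers of abstraction" idea.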
Decision 1: Use Software or Text?
Metadata in Software: Databases
Strengths:
- Data Entry Tools
- Often include structure, standards, and controls
- User may not need to understand underlying structure
- Easier to secure
- Fast
- Can do almost anything
Weaknesses:
- Can be a preservation risk, depending on the software
- Software may conflict with your needs over time
- Entry costs
Database Management System (DBMS)
A computer program for storing information
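As an illustration of a DBMS storing information and enforcing a rule, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is not named in the slides, and the table, column names, and sample record are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,   -- control: every item must have a title
        date_created TEXT
    )
    """
)
conn.execute(
    "INSERT INTO items (title, date_created) VALUES (?, ?)",
    ("Sample photograph", "1968"),
)

# The database itself rejects a record that breaks the rule.
try:
    conn.execute("INSERT INTO items (title) VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT title, date_created FROM items").fetchall()
print(rows, rejected)
```

The person entering data never has to see the table definition, yet the structure and the control are both applied.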
Database Software
Proprietary:
- Oracle
- Microsoft SQL Server
- IBM DB2
Open Source:
Incorporated within larger software applications
Metadata as Text: Serialization
XML, CSV, JSON, etc.
Strengths:
- Good for preservation
- Linking different software systems
- Can be human-readable
Weaknesses:
- No built-in controls
- Generally slow
Serialization: Metadata Outside of Systems
Generally stored as text
Relies on character encoding standards
- Computer must understand: 0110101000001001001111001001010010101010100100000111110
XML
eXtensible Markup Language
CSV
Comma Separated Values
JSON
JavaScript Object Notation
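To compare the three serializations side by side, the sketch below writes one record as JSON, CSV, and XML using only Python's standard library. The record and its field names are hypothetical and follow no particular standard.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names are illustrative only.
record = {"title": "Sample photograph", "creator": "Unknown", "date": "1968"}

# JSON: key-value pairs
as_json = json.dumps(record)

# CSV: a header row followed by one row per record
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
writer.writeheader()
writer.writerow(record)
as_csv = buffer.getvalue()

# XML: one child element per field
root = ET.Element("record")
for field, value in record.items():
    ET.SubElement(root, field).text = value
as_xml = ET.tostring(root, encoding="unicode")

print(as_json, as_csv, as_xml, sep="\n")
```

All three outputs are plain text, so any of them can be opened in a text editor or handed to another system.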
Metadata as Text: Character Encoding
How computers read text
- ASCII (American Standard Code for Information Interchange)
ASCII
Unicode
- Aims to cover the characters of every writing system
- Not itself an encoding: it assigns each character a "code point", which an encoding then turns into bytes
- UTF-8, UTF-16, UTF-32
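A short sketch of how the same code point becomes different bytes under each encoding. The little-endian variants (`utf-16-le`, `utf-32-le`) are used here only to keep Python from prepending a byte order mark; that choice is an assumption made for readability.

```python
encodings = ("utf-8", "utf-16-le", "utf-32-le")

# Number of bytes each character needs under each encoding.
sizes = {}
for char in ("A", "é"):
    sizes[char] = {enc: len(char.encode(enc)) for enc in encodings}
    for enc in encodings:
        print(repr(char), enc, char.encode(enc).hex())

print(sizes)
```

UTF-8 stores "A" in one byte (identical to ASCII) but needs two for "é", while UTF-32 always uses four.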
False Choice
Software and Serializations play well together
Roles of Librarians and Archivists:
- Often work with data outside of systems
- Migration, mapping, clean-up
- Often work with metadata as text
Decision 2: What Tools to Use?
Hand Tools
Text editors:
- Notepad
- TextEdit
- Notepad++
- TextWrangler
- Sublime Text
- Integrated Development Environments (IDEs)
Spreadsheets:
- Microsoft Excel
- Google Sheets
- iWork Numbers
- LibreOffice Calc
Enables basic programming
Power Tools
Data-Specific Tools:
- XSLT for XML
- XQuery for XML
- SQL for databases
- SPARQL for RDF
Data Cleanup:
Programming Languages:
- Python
- Ruby
- JavaScript
- Bash scripting
- C#, Java, etc.
Version Control:
These are your best friends
Exercise 1: Create Metadata
Create metadata using the tool of your choice.
Metadata Needs Structure
- Enables computers to understand
- Necessary to do anything other than display
- Migrate to new systems or formats
Metadata Structures
Tables
Serializations:
Structure Standards:
Trees (Hierarchical)
Serializations:
Structure Standards:
Metadata Standards
- Often expressed as XML schemas
- Designed to be hierarchical or tabular
- Make data independent of software
- Promote interoperability, do not ensure it
- More sustainable than local solutions
- Encourage common understandings
Structure Standards
- DC (Dublin Core) - descriptive
- MODS (Metadata Object Description Schema) - descriptive
- METS (Metadata Encoding and Transmission Standard) - structural
- EAD (Encoded Archival Description) - descriptive
- PREMIS (Preservation Metadata: Implementation Strategies) - administrative
- XMP (Extensible Metadata Platform) - technical
Exercise 2: Metadata Map for Migration
Migrate your metadata from Simple Dublin Core to MODS.
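One possible starting point for this exercise is a crosswalk: a table saying which MODS element each Dublin Core element becomes. The sketch below covers only four elements, and the target chosen for each (for example `dateIssued` rather than `dateCreated` for dates) is a judgment call, not the only valid mapping. The function name `dc_to_mods` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Crosswalk: Dublin Core element -> path of nested MODS elements.
# Partial and illustrative; a real migration needs every element mapped.
CROSSWALK = {
    "title": ["titleInfo", "title"],
    "creator": ["name", "namePart"],
    "date": ["originInfo", "dateIssued"],
    "subject": ["subject", "topic"],
}

def dc_to_mods(dc_record):
    """Build a MODS element from a dict of Simple Dublin Core values."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for dc_element, value in dc_record.items():
        path = CROSSWALK.get(dc_element)
        if path is None:
            continue  # unmapped elements are skipped in this sketch
        parent = mods
        for tag in path:
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        parent.text = value
    return mods

dc = {"title": "Sample photograph", "creator": "Unknown", "date": "1968"}
mods = dc_to_mods(dc)
print(ET.tostring(mods, encoding="unicode"))
```

Note that the flat Dublin Core record becomes a tree in MODS: each value ends up nested inside a wrapper element.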
Content Standards
- DACS (Describing Archives, a Content Standard)
- AACR2 (Anglo-American Cataloguing Rules, Second Edition)
- RDA (Resource Description and Access)
- CCO (Cataloging Cultural Objects)
Exercise 3: Use Content Standards
Apply the "Using Dublin Core" content standard to your metadata.
Standards and Controls
Metadata Standards:
- Content Standards, Structure Standards
- Dublin Core, METS, EAD, MODS, PREMIS, etc.
- Guidelines, made to be broken
- "plural standards?"
Metadata Controls:
- Best Practices
- Validation
- Tools that give feedback
- Data Models
Metadata Controls
Force compliance with standards
Make sure complex systems do not break
- Database rules
- XML Schema validation
- XML rule-based validation (e.g. Schematron)
- Best practices
- Quality Assurance procedures
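Outside a database or an XML schema, a control can be as simple as a script that gives feedback on each record. The following is a minimal sketch; the required fields and the date rule are hypothetical local rules, not taken from any standard named above.

```python
import re

REQUIRED_FIELDS = ("title", "date")  # hypothetical local rule
# Accepts YYYY, YYYY-MM, or YYYY-MM-DD
DATE_PATTERN = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")

def validate(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    date = record.get("date", "").strip()
    if date and not DATE_PATTERN.match(date):
        problems.append(f"date is not YYYY, YYYY-MM, or YYYY-MM-DD: {date}")
    return problems

good = validate({"title": "Sample photograph", "date": "1968"})
bad = validate({"title": "", "date": "May 1968"})
print(good)
print(bad)
```

Returning a list of messages rather than stopping at the first error lets the person fixing the record see everything that is wrong at once.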
Names
University at Albany, SUNY
State University of New York at Albany
SUNY Albany
SUNYA
UAlbany
SUNY/A
State University College at Albany
SUNY A
State University of New York at Albany
New York State College for Teachers
SUNY at Albany
State College for Teachers
NYSCT
New York State Normal College
State Normal College
State Normal School
UAlbany, SUNY
Controlled Vocabularies
Authorities
- Library of Congress Subject Headings (LCSH)
- Library of Congress Name Authority File (LCNAF)
- Getty Art & Architecture Thesaurus (AAT)
- Local Thesauri
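A local thesaurus can start as a lookup from variant forms to one preferred form. The sketch below uses some of the name variants from the "Names" slide. Treating "University at Albany, SUNY" as the preferred form is an assumption made for illustration, and the historical names (for example "State Normal School") are deliberately left out, since whether to merge predecessor names is a cataloging decision.

```python
# Hypothetical local authority: variant forms that map to one preferred form.
PREFERRED = "University at Albany, SUNY"
VARIANTS = [
    "State University of New York at Albany",
    "SUNY Albany",
    "SUNYA",
    "UAlbany",
    "SUNY/A",
    "SUNY at Albany",
    "UAlbany, SUNY",
]

def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())

AUTHORITY = {normalize(v): PREFERRED for v in VARIANTS + [PREFERRED]}

def reconcile(name):
    """Return the preferred form, or None if the name needs human review."""
    return AUTHORITY.get(normalize(name))

print(reconcile("  sunya "))
print(reconcile("State Normal School"))
```

Returning `None` for unknown names, rather than guessing, keeps a person in the loop for anything the list does not cover.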
Exercise 4: Reconcile Your Data
Use appropriate authorities to reconcile your metadata
Data Types
- boolean: "true" or "false", "yes" or "no", "0" or "1"
- integer: "1", "63", "232234835482957205927059270597204"
- float: "43.5", "0.0", "1314.24242"
- string: "This is my string."
- Also strings: "4348928983671", "true"
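Text serializations store everything as characters, so a program has to convert values to the type it needs. A small sketch of that conversion follows; the helper `parse_boolean` is hypothetical and accepts only the spellings listed above.

```python
def parse_boolean(text):
    """Interpret the boolean spellings listed above; raise on anything else."""
    lowered = text.strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")

values = {
    "integer": int("232234835482957205927059270597204"),
    "float": float("43.5"),
    "boolean": parse_boolean("yes"),
    "string": "4348928983671",  # stays a string unless explicitly converted
}
for name, value in values.items():
    print(name, type(value).__name__, value)

# A string of digits is not equal to the number it spells out.
same = "4348928983671" == 4348928983671
print(same)
```

This is why an identifier such as a barcode is usually best left as a string: nobody needs to do arithmetic on it, and converting it can strip leading zeros.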
Data Models
- Part of Systems Analysis and Design
- Basis for complex information systems
- Standardize how data elements relate to one another
- Three types:
- Conceptual
- Logical
- Physical
Portland Common Data Model (PCDM)
- Used as basis for Library Repository Software
- Project Hydra
Messy Metadata
Does not conform to structure standards
Messy Metadata
Does not conform to content standard, inconsistencies in language and format
Messy Metadata
Wrong character encoding
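The garbled characters that result from a wrong encoding can be reproduced deliberately. In this sketch, text is written as UTF-8 and then read back as Latin-1; the sample word is arbitrary.

```python
original = "Café"
stored = original.encode("utf-8")   # bytes written by one system

wrong = stored.decode("latin-1")    # another system assumes the wrong encoding
right = stored.decode("utf-8")      # the correct encoding restores the text

print(wrong)
print(right)

# If no bytes were lost, reversing the mistake recovers the original.
repaired = wrong.encode("latin-1").decode("utf-8")
print(repaired)
```

The underlying bytes never changed; only the interpretation did, which is why this kind of mess is often repairable.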
Messy Metadata
Invalid serialization
Messy Metadata
Ensure clean metadata at creation
Quality Assurance practices:
- Training
- Documentation
- Sampling or other formal checks
Controls in addition to standards
- Tools that restrict metadata creation
We are humans, we break things
Exercise 5: Keep Your Data Clean
Develop some best practices and automated controls for your metadata.
Description: an Art
Ofness
- Objective description of what can be observed about an item.
- What is explicitly and inarguably shown?
- "A picture of an unidentified woman who is at a podium, making a gesture."
Aboutness
- Subjective information that an item may suggest or represent
- What does it mean? What is important about it?
Description: an Art
"A picture of an unidentified woman who is at a podium, making a gesture either of quieting a group of spectators or of leading them in song. The event is connected to the Vietnam Teach-In, a form of protest against U.S. involvement in Vietnam held in 1968."
Description: an Art
Layers, Location
- What is the central focus of the image?
- What is in the background?
- "Lowell Thomas and First World Fliers posing in front of plane."
Description: an Art
Exercise 6: Describing Different Formats
How to Metadata
- Know your data's structure
- Know what you want to do to your data
- Use the tool that fits the task
- Think about long-term ends
- Think in computer logic
- Document: version control
- Steal ideas
- Make a safe zone
- Break things
Metadata and Professionalism
Metadata at SCALE
- Metadata work is monotonous, minute...boring
- Let the computer do the work
Make Metadata Fast: Indexing
- Capture most relevant metadata
- No need to be human readable
- Quick and easy retrieval
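One common way to make retrieval fast is an inverted index, which maps each word to the records that contain it. The slides do not say which kind of index is meant, so this is an illustrative assumption. Records 1 and 2 are adapted from descriptions elsewhere in these slides; record 3 is made up.

```python
from collections import defaultdict

records = {
    1: "Lowell Thomas and First World Fliers posing in front of plane",
    2: "Unidentified woman at a podium",
    3: "Plane on runway",
}

# Build the index once: word -> set of record ids containing that word.
index = defaultdict(set)
for record_id, description in records.items():
    for word in description.lower().split():
        index[word].add(record_id)

def search(term):
    """Look the term up directly instead of scanning every record."""
    return sorted(index.get(term.lower(), set()))

print(search("plane"))
print(search("podium"))
print(search("missing"))
```

The index itself is not meant to be read by people; it exists so a lookup does not have to reread every description.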
Semanticism
Making metadata machine-actionable
- Triples
- Subject-Predicate-Object
Semanticism
Subject
- What is it?
- Website
- <Website Identifier from Authority>
Predicate
- What property does it have?
- Author
- <Author Identifier from Authority>
Object
- What is the property's value?
- Name of the author
- <Some Person>
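A sketch of the website and author example as subject-predicate-object triples, using plain Python tuples rather than an RDF library. The `example.org` identifiers are hypothetical stand-ins for identifiers from an authority; the predicates are Schema.org properties.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "object" in RDF terms

# Hypothetical identifiers standing in for authority identifiers.
WEBSITE = "https://example.org/website/1"
PERSON = "https://example.org/person/42"

triples = [
    Triple(WEBSITE, "https://schema.org/author", PERSON),
    Triple(PERSON, "https://schema.org/name", "Some Person"),
]

def objects(subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]

# Follow two statements: website -> author -> author's name.
author = objects(WEBSITE, "https://schema.org/author")[0]
author_name = objects(author, "https://schema.org/name")[0]
print(author_name)
```

Because the author is an identifier rather than a typed-in name, a program can follow it to other statements, which is what makes the metadata machine-actionable.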
Metadata in Practice
Schema.org
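Schema.org descriptions are often published as JSON-LD embedded in a web page. The sketch below builds one for the same website and author example; the site name, URL, and author are hypothetical.

```python
import json

# Hypothetical website description using Schema.org types and properties.
description = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "Example Digital Library",
    "url": "https://example.org/",
    "author": {"@type": "Person", "name": "Some Person"},
}

json_ld = json.dumps(description, indent=2)
print(json_ld)
```

The output would typically be placed in a `<script type="application/ld+json">` element in the page so that search engines and other software can read it.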