How to Metadata

for Digital Libraries

How Computers See Metadata

  • Code
  • 100010010111010101010110000100111101010101010101
  • Layers of Abstraction
  • Uppercase "A":
    • ASCII Binary: 01000001
    • ASCII Hexidecimal: 41
    • Unicode: U+0041

Decision 1: Use Software or Text?

Metadata in Software: Databases


  • Data Entry Tools
  • Often include structure, standards, and controls
  • User may not need to understand underlying structure
  • Easier to secure
  • Fast
  • Can do almost anything


  • Depending on software can be a preservation risk
  • Software may conflict with your needs over time
  • Entry costs

Database Management System (DBMS)

A computer program for storing information

Database Management System (DBMS)

A computer program for storing information

Database Software


  • Oracle
  • Microsoft SQL Server
  • IBM DB2

Open Source:

  • MySQL
  • PostgreSQL
  • SQLite


  • Relational concept
  • SQL


  • Flexibility
  • Big data

Incorporated within larger software applications

Metadata as Text: Serialization

XML, CSV, JSON, etc.


  • Good for preservation
  • Linking different software systems
  • Can be Human-readable


  • No built-in controls
  • Generally slow

Serialization: Metadata Outside of Systems

Generally stored as text

  • XML
  • CSV, TSV
  • JSON

Relies on character encoding standards

  • Computer must understand: 0110101000001001001111001001010010101010100100000111110


eXtensible Markup Language


Comma Separated Values


JavaScript Object Notation

Metadata as Text: Character Encoding

How computers read text

  • ASCII (American Standard Code for Information Interchange)



  • Can display any known character
  • Not really an encoding, encoded "code points"
  • UTF-8, UTF-16, UTF-32

False Choice

Software and Serializations play well together

Roles of Librarians and Archivists:

  • Often work with data outside of systems
  • Migration, mapping, clean-up
  • Often work with metadata as text

Decision 2: What Tools to Use?

Hand Tools

Text editors:

  • Notepad
  • Text Edit
  • Notepad++
  • TextWrangler
  • Sublime Text
  • Integrated Development Environments (IDEs)


  • Microsoft Excel
  • Google Sheets
  • iWork Numbers
  • LibreOffice Calc

Enables basic programming

Power Tools

Data-Specific Tools:

  • XSLT for XML
  • XQuery for XML
  • SQL for databases
  • SPARQL for RDF

Data Cleanup:

  • OpenRefine

Programming Languages:

  • Python
  • Ruby
  • JavaScript
  • Bash scripting
  • C#, Java, etc.

Version Control:

  • Git
  • GitHub

These are your best friends

Exercise 1: Create Metadata

Exercise 1: Create Metadata

Create metadata using the tool of your choice.

Decision 3: Structure

Metadata Needs Structure

  • Enables computers to understand
  • Necessary to do anything other than display
  • Migrate to new systems or formats

Not This:


Or This:

Metadata Structures



  • CSV, XML, JSON, etc.

Structure Standards:

  • DublinCore, MODS
Trees (Hierarchical)



Structure Standards:


Metadata Standards

  • Often expressed as XML schemas
  • Designed hierarchical or tabular
  • Make data independent of software
  • Promote interoperability, do not ensure it
  • More sustainable than local solutions
  • Encourage common understandings

Structure Standards

  • DC (Dublin Core) - descriptive
  • MODS (Metadata Object Description Schema) - descriptive
  • METS (Metadata Encoding and Transmission Standard) - structural
  • EAD (Encoded Archival Description) - descriptive
  • PREMIS (Preservation Metadata: Implementation Strategies) - administrative
  • XMP (Extensible Metadata Platform) - technical

Exercise 2: Metadata Map for Migration

Exercise 2: Metadata Map for Migration

Migrate your metadata from Simple Dublin Core to MODS.

Content Standards

  • DACS (Describing Archives, a Content Standard)
  • AACR2 (Anglo-American Cataloging Rules)
  • RDA (Resource Description and Access)
  • CCO (Cataloging Cultural Objects)

Exercise 3: Use Content Standards

Exercise 3: Use Content Standards

Apply the "Using Dublin Core" content standard to your metadata.

Decision 4: Controls

Standards and Controls

Metadata Standards:

  • Content Standards, Structure Standards
  • DublinCore, METS, EAD, MODS, PREMIS etc.
  • Guidelines, made to be broken
  • "plural standards?"

Metadata Controls:

  • Best Practices
  • Validation
  • Tools that give feedback
  • Data Models

Metadata Controls

Force compliance with standards

Make sure complex systems do not break

  • Database rules
  • XML Schema validation
  • XML rule-based validation (e.g. Schematron)
  • Best practices
  • Quality Assurance procedures


University at Albany, SUNY

  • State University of New York at Albany
  • SUNY Albany
  • UAlbany
  • SUNY/A
  • State University College at Albany
  • SUNY A
  • State University of New York at Albany
  • New York State College for Teachers
  • SUNY at Albany
  • State College for Teachers
  • New York State Normal College
  • State Normal College
  • State Normal School
  • UAlbany, SUNY
  • Controlled Vocabularies


    • Library of Congress Subject Headings (LCSH)
    • Library of Congress Name Authority File (LCNAF)
    • Getty Art & Architecture Thesaurus (AAT)
    • Local Thesauri

    Exercise 4: Reconcile Your Data

    Exercise 4: Reconcile Your Data

    Use appropriate authorities to reconcile your metadata




    Data Types

    • boolean: "true" or "false", "yes" or "no", "0" or "1"
    • integer: "1", "63", "232234835482957205927059270597204"
    • float: "43.5", "0.0", "1314.24242"
    • string: "This is my string."
    • Also strings: "4348928983671", "true"

    Data Models

    • Part of Systems Analysis and Design
    • Basis for complex information systems
    • Standardize how data elements relate to one another
    • Three types:
      • Conceptual
      • Logical
      • Physical

    Portland Common Data Model (PCDM)

    • Used as basis for Library Repository Software
    • Project Hydra

    Messy Metadata

    Does not conform to structure standards

    Messy Metadata

    Does not conform to content standard, inconsistencies in language and format

    Messy Metadata

    Wrong character encoding

    Messy Metadata

    Wrong character encoding

    Messy Metadata

    Invalid serialization

    Messy Metadata

    Ensure clean metadata at creation

    Quality Assurance practices:

    • Training
    • Documentation
    • Sampling or other formal checks

    Controls in addition to standards

    • Tools that restrict metadata creation

    We are humans, we break things

    Exercise 5 Keep You Data Clean

    Exercise 5: Keep You Data Clean

    Develop some best practices and automated controls for your metadata.

    Description: an Art


    • Objective description of what can be observed about an item.
    • What is explicitly and inarguably shown?
    • "A picture of an unidentified woman who is at a podium, making a gesture."


    • Subjective information that an item may suggest or represent
    • What does it mean? What is important about it?

    Description: an Art

    "A picture of an unidentified woman who is at a podium, making a gesture either of quieting a group of spectators or of leading them in song. The event is connected to the Vietnam Teach-In, a form of protest against U.S. involvement in Vietnam held in 1968."

    Description: an Art

    Layers, Location

    • What is the central focus of the image?
    • What is in the background?
    • "Lowell Thomas and First World Fliers posing in front of plane."

    Description: an Art

    Exercise 6: Describing Different Formats

    Exercise 6: Describing Different Formats



    How to Metadata

    • Know your data's structure
    • Know what you want to do to your data
    • Use the tool that fits the task
    • Think about long-term ends
    • Think in computer logic
    • Document: version control
    • Steal ideas
    • Make a safe zone
    • Break things

    Metadata and Professionalism

    Metadata at


    • Metadata work is monotonous, minute...boring
    • Let the computer do the work

    Make Metadata Fast: Indexing

    • Capture most relevent metadata
    • No need to be human readable
    • Quick and easy retrieval


    Making metadata machine-actionable

    • Triples
    • Subject-Predicate-Object



    • What is it?
    • Website
    • <Website Identifier from Authority>


    • What property does it have?
    • Author
    • <Author Identifier from Authority>


    • The Property is?
    • Name of the author
    • <Some Person>

    Metadata in Practice

    Metadata in Practice