The Ultimate Archival Technologies Cheat Sheet: Preserve Data for Generations

Introduction

Archival technologies encompass the methods, systems, and practices used to preserve information for long-term access and retrieval. They ensure data integrity, authenticity, and accessibility across generations, despite technological changes and physical degradation. With the exponential growth of digital data, effective archival strategies are essential for organizations, institutions, and individuals to maintain valuable information while balancing accessibility, security, and cost-effectiveness.

Core Archival Storage Media

Physical Media Types

Media Type | Lifespan | Storage Capacity | Access Speed | Best Use Cases | Notable Limitations
--- | --- | --- | --- | --- | ---
Archival Paper | 100+ years | N/A | Manual | Legal documents, historical records | Bulky, susceptible to environmental damage
Microfilm/Microfiche | 500+ years (controlled storage) | Thousands of pages per roll | Manual | Newspapers, periodicals | Requires special readers, no digital features
Magnetic Tape (LTO) | 15-30 years | 18 TB native, up to 45 TB compressed (LTO-9) | Slow, sequential | Large data backups, media archives | Slow access, requires migration to newer generations
Optical Media (M-DISC) | 100+ years | 25-100 GB | Moderate | Small archives, personal documents | Limited capacity, becoming obsolete
Hard Disk Drives | 3-5 years | 2-20 TB | Fast | Active archives, frequent access | Mechanical failure, power requirements
Solid State Drives | 5-10 years | 1-16 TB | Very fast | Working archives, frequent access | Higher cost, vulnerable to write wear
DNA Storage | Thousands of years (projected) | Potentially exabytes | Very slow | Ultra-long-term, critical data | Experimental, extremely high cost

Cloud Storage Classes

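Storage Tier | Typical Retrieval Time | Relative Cost | Designed Durability | Best Use Cases
--- | --- | --- | --- | ---
Hot Storage | Milliseconds | Highest | Typically 99.999999999% (11 nines) at major providers | Frequently accessed archives
Cool Storage | Milliseconds | Moderate | Typically 11 nines | Semi-active archives
Cold Storage | Milliseconds to hours, depending on provider | Low | Typically 11 nines | Rarely accessed archives
Archive Storage | Often hours; varies by provider | Lowest | Typically 11 nines | Long-term preservation
Glacier (Amazon S3 Glacier classes) | Milliseconds to about 48 hours, depending on class and retrieval option | Very low | 11 nines | Deep archives, compliance data

Durability figures are the providers' design targets and generally do not differ between tiers; what changes from tier to tier is cost, availability, and retrieval latency.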

Archival File Formats

Document Formats

  • PDF/A: ISO-standardized version of PDF for long-term archiving
    • PDF/A-1: Basic compliance (PDF 1.4)
    • PDF/A-2: JPEG2000, transparency, embedded PDF/A files (PDF 1.7)
    • PDF/A-3: Embedded files of any format (PDF 1.7)
    • PDF/A-4: Based on PDF 2.0

Image Formats

  • TIFF: Lossless, high quality, metadata support
  • JPEG2000: Wavelet-based compression, lossless option
  • PNG: Lossless compression, transparency support
  • DNG: Digital Negative, raw image preservation

Audio/Video Formats

  • FLAC: Lossless audio compression
  • BWF: Broadcast Wave Format with preservation metadata
  • FFV1: Lossless video encoding
  • MKV/Matroska: Container format for video, audio, subtitles

Data Formats

  • XML: Structured, self-describing text format
  • CSV: Simple tabular data format
  • JSON: Lightweight data interchange format
  • SIARD: Software Independent Archiving of Relational Databases, an archival format for SQL databases

Archival Systems and Approaches

OAIS Reference Model

Open Archival Information System – ISO 14721 standard framework

Key Components:

  1. Ingest: Accepting and preparing data for storage
  2. Archival Storage: Preserving data long-term
  3. Data Management: Maintaining descriptive metadata
  4. Administration: Managing day-to-day operations
  5. Preservation Planning: Ensuring future accessibility
  6. Access: Providing materials to users

Information Packages:

  • SIP: Submission Information Package (received from producers)
  • AIP: Archival Information Package (stored in the archive)
  • DIP: Dissemination Information Package (delivered to consumers)
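The hand-off from SIP to AIP to DIP can be modelled in a few lines. This is a conceptual sketch only: OAIS specifies concepts rather than code, and the class fields, the hypothetical `ingest` and `access` helpers, and the choice of SHA-256 for fixity are illustrative assumptions, not any repository's real data model.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical, simplified OAIS packages; real AIPs carry far richer metadata.


@dataclass
class SIP:
    """Submission Information Package: content plus the producer's metadata."""
    producer: str
    files: dict[str, bytes]  # filename -> content
    descriptive: dict[str, str] = field(default_factory=dict)


@dataclass
class AIP:
    """Archival Information Package: content plus metadata the archive adds."""
    identifier: str
    files: dict[str, bytes]
    descriptive: dict[str, str]
    fixity: dict[str, str]  # filename -> SHA-256 hex digest
    provenance: list[str]


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    identifier: str
    files: dict[str, bytes]
    descriptive: dict[str, str]


def ingest(sip: SIP, identifier: str) -> AIP:
    """Ingest: accept a SIP, record fixity values, and note provenance."""
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    return AIP(
        identifier=identifier,
        files=dict(sip.files),
        descriptive=dict(sip.descriptive),
        fixity=fixity,
        provenance=[f"ingested from {sip.producer}"],
    )


def access(aip: AIP) -> DIP:
    """Access: verify fixity, then deliver content without internal preservation data."""
    for name, data in aip.files.items():
        if hashlib.sha256(data).hexdigest() != aip.fixity[name]:
            raise ValueError(f"fixity mismatch for {name}")
    return DIP(aip.identifier, dict(aip.files), dict(aip.descriptive))
```

For example, `access(ingest(SIP("Records Office", {"minutes.txt": b"..."}), "aip-0001"))` yields a DIP, while any change to the stored bytes makes `access` refuse to deliver.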

Digital Preservation Strategies

Strategy | Description | Advantages | Disadvantages
--- | --- | --- | ---
Bit-level Preservation | Maintaining exact digital objects | Original integrity | Format obsolescence
Migration | Converting to newer formats | Maintains accessibility | Potential data loss
Emulation | Recreating original environments | Preserves experience | Complex, resource-intensive
Normalization | Converting to standard formats | Simplifies management | May lose native features
Encapsulation | Bundling content with metadata | Self-contained | Size, complexity
Replication | Multiple copies in different locations | Protection from disasters | Synchronization challenges

Metadata Standards for Archives

Descriptive Metadata

  • Dublin Core: 15 basic elements for resource description
  • MARC/MARC21: Machine-Readable Cataloging for library materials
  • EAD: Encoded Archival Description for finding aids
  • MODS: Metadata Object Description Schema (simplified MARC)
  • EAC-CPF: Encoded Archival Context for Corporate Bodies, Persons, and Families (describes the creators of records)

Preservation Metadata

  • PREMIS: Preservation Metadata Implementation Strategies
  • METS: Metadata Encoding and Transmission Standard

Technical Metadata

  • MIX: Metadata for Images in XML, the NISO/Library of Congress schema for technical metadata of digital still images (ANSI/NISO Z39.87)
  • AudioMD: Audio Technical Metadata
  • VideoMD: Video Technical Metadata
  • TextMD: Technical Metadata for Text
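Dublin Core, the simplest of the descriptive schemes above, is commonly exchanged as XML inside the `oai_dc` wrapper used by OAI-PMH. A minimal sketch with Python's standard library; the function name and the input shape (element name mapped to a list of values) are illustrative choices, and a production record would usually also carry an `xsi:schemaLocation` attribute.

```python
import xml.etree.ElementTree as ET

# Namespaces for the OAI-PMH "oai_dc" container and the Dublin Core 1.1 elements.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize a simple Dublin Core record (hypothetical helper).

    Every Dublin Core element is optional and repeatable, so each
    element name maps to a list of values.
    """
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown element names keeps records within the 15-element set; richer description belongs in schemes such as MODS or EAD.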

Archival Processing Workflow

Acquisition and Appraisal

  1. Collection Development: Establishing scope and criteria
  2. Appraisal: Determining archival value
  3. Accessioning: Formally accepting materials
  4. Rights Management: Addressing intellectual property
  5. Deed of Gift: Documenting transfer of ownership

Processing and Description

  1. Arrangement: Organizing materials logically
  2. Description: Creating finding aids and metadata
  3. Conservation: Physical preservation treatments
  4. Digitization: Converting analog to digital formats
  5. Quality Control: Validating digital objects

Storage and Preservation

  1. Fixity Checking: Validating integrity (checksums)
  2. Format Validation: Verifying format conformance
  3. Metadata Extraction: Capturing technical information
  4. Storage Management: Allocation to appropriate media
  5. Preservation Monitoring: Regular status checks
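Fixity checking usually means recording a checksum manifest at ingest and recomputing it on a schedule. A minimal sketch, assuming SHA-256 and a manifest held as a Python dict (real systems persist it, for example in a BagIt manifest or PREMIS fixity record); the function names are hypothetical.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time so large files need not fit in memory


def sha256_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, streamed in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative POSIX path."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files on disk with a stored manifest and report every discrepancy."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "altered": sorted(
            name for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

Reporting missing, unexpected, and altered files separately matters because each calls for a different response: restore from a replica, investigate an unauthorized addition, or repair a corrupted copy.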

Access and Use

  1. Discovery Systems: Searchable interfaces
  2. Rights Enforcement: Access restrictions
  3. Reference Services: User assistance
  4. Usage Analytics: Tracking utilization
  5. Content Delivery: Providing access copies

Common Challenges and Solutions

Challenge: Format Obsolescence

Solutions:

  • Implement format migration schedules
  • Use open, standardized formats
  • Consult and contribute to format registries (e.g., PRONOM)
  • Preserve original software when possible
  • Document format specifications
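Format registries such as PRONOM record byte signatures that identification tools like DROID match against files. The toy matcher below illustrates the idea with a handful of well-known leading "magic numbers". The signature list and function names are illustrative assumptions: it does not distinguish format versions or validate conformance the way DROID and JHOVE do.

```python
from pathlib import Path

# A few well-known signatures found at offset 0. Illustrative only: PRONOM
# signatures also use offsets, variable byte sequences, and end-of-file markers.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"fLaC", "FLAC"),
    (b"\x1a\x45\xdf\xa3", "Matroska/WebM (EBML)"),
]


def identify_bytes(header: bytes) -> str:
    """Return the first format whose signature matches the start of header."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: Path) -> str:
    """Read just enough of a file to test every signature (hypothetical helper)."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with path.open("rb") as fh:
        return identify_bytes(fh.read(longest))
```

Identifying by content rather than file extension is the point: a renamed or extension-less file is still recognized, which is what makes registry-based risk assessment possible.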

Challenge: Bit Rot and Media Degradation

Solutions:

  • Implement regular fixity checks
  • Use error-correcting storage systems
  • Schedule media refresh cycles
  • Implement geographic replication
  • Use self-healing storage technologies (ZFS)
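Fixity checks and replication work together: when one copy fails its check, it is replaced from a copy that still passes. The sketch below assumes each replica can be read as bytes and that a stored SHA-256 value may be available; the function name and the strict-majority fallback are illustrative choices, not how any particular system repairs data (ZFS, for instance, does this per block using its own checksums and redundancy).

```python
import hashlib
from collections import Counter
from typing import Optional


def repair_replicas(
    replicas: dict[str, bytes], expected_sha256: Optional[str] = None
) -> tuple[dict[str, bytes], list[str]]:
    """Find damaged copies and overwrite them with a known-good copy (hypothetical helper).

    A stored checksum, if given, decides which copies are good; otherwise
    a strict majority of identical copies is trusted.
    """
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    if expected_sha256 is None:
        digest, votes = Counter(digests.values()).most_common(1)[0]
        if votes * 2 <= len(replicas):
            raise RuntimeError("no majority among replicas; manual review needed")
        expected_sha256 = digest
    good = [site for site, d in digests.items() if d == expected_sha256]
    if not good:
        raise RuntimeError("no replica matches the expected checksum")
    source = replicas[good[0]]
    damaged = sorted(site for site, d in digests.items() if d != expected_sha256)
    repaired = {site: (source if site in damaged else data) for site, data in replicas.items()}
    return repaired, damaged
```

Preferring a stored checksum over a vote guards against the case where most copies were damaged the same way, for example by one faulty replication run.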

Challenge: Scale and Cost

Solutions:

  • Implement tiered storage strategies
  • Adopt risk-based preservation approaches
  • Consider collaborative preservation networks
  • Automate routine preservation tasks
  • Implement retention policies
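A tiered strategy needs a rule for when content moves between tiers. The sketch below assumes placement is driven only by days since last access; the thresholds and tier names are hypothetical, and a real policy would also weigh retrieval fees, minimum storage durations, and preservation risk.

```python
from datetime import date
from typing import Optional

# Hypothetical policy: maximum idle days allowed in each tier, hottest first.
# Thresholds are illustrative; derive real ones from access statistics and pricing.
TIER_POLICY: list[tuple[int, str]] = [
    (30, "hot"),
    (180, "cool"),
    (365, "cold"),
]
DEFAULT_TIER = "archive"


def choose_tier(last_access: date, today: Optional[date] = None) -> str:
    """Return the hottest tier whose idle-time limit the object is still within.

    Anything idle longer than every limit goes to the archive tier.
    """
    today = today or date.today()
    idle_days = (today - last_access).days
    for max_idle, tier in TIER_POLICY:
        if idle_days <= max_idle:
            return tier
    return DEFAULT_TIER
```

Cloud providers also offer built-in lifecycle rules for moving objects between tiers, though these are often based on object age rather than last access.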

Challenge: Authenticity and Chain of Custody

Solutions:

  • Implement digital signatures
  • Maintain comprehensive audit logs
  • Use blockchain or distributed ledger technologies
  • Document preservation actions
  • Follow strict custody protocols
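A lightweight way to make an audit log tamper-evident, borrowing the core idea behind blockchain ledgers without any distributed consensus, is to chain each entry's hash to the previous one. A minimal sketch; the field names and the `append_event`/`verify_chain` helpers are hypothetical. A hash chain only detects tampering: it does not prove who wrote an entry, which is what digital signatures add.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_event(log: list[dict], action: str, obj: str, agent: str) -> dict:
    """Append a preservation event whose hash also covers the previous entry's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "object": obj,
        "agent": agent,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash in order.

    Editing, deleting, or reordering earlier entries breaks the chain; truncating
    the newest entries is only caught if the latest hash is also stored elsewhere.
    """
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Publishing or escrowing the latest hash periodically, for instance with a partner institution, closes the truncation gap noted in the docstring.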

Best Practices and Standards

Institutional Framework

  • Develop formal preservation policies
  • Establish governance structures
  • Secure sustainable funding models
  • Conduct regular risk assessments
  • Obtain certification (e.g., CoreTrustSeal, ISO 16363)

Technical Implementation

  • Implement at least three geographically distributed copies
  • Use at least two different storage technologies
  • Perform regular integrity checking
  • Maintain comprehensive metadata
  • Document all preservation actions
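The copy-count and technology-diversity rules above are easy to audit automatically against a storage inventory. A minimal sketch; the `Copy` record, its fields, and the reading of "geographically distributed" as one copy per distinct region (the `min_regions` default) are assumptions to adjust to local policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of an object (hypothetical inventory record)."""
    location: str    # e.g. a data-centre or building identifier
    region: str      # geographic region used to judge separation
    technology: str  # e.g. "LTO tape", "disk", "cloud object storage"


def check_copies(
    copies: list[Copy], min_copies: int = 3, min_technologies: int = 2, min_regions: int = 3
) -> list[str]:
    """Return policy violations for one object; an empty list means it passes."""
    problems = []
    if len(copies) < min_copies:
        problems.append(f"only {len(copies)} copies (need {min_copies})")
    technologies = {c.technology for c in copies}
    if len(technologies) < min_technologies:
        problems.append(f"only {len(technologies)} storage technologies (need {min_technologies})")
    regions = {c.region for c in copies}
    if len(regions) < min_regions:
        problems.append(f"only {len(regions)} distinct regions (need {min_regions})")
    return problems
```

Running such a check across the whole inventory turns the policy into a report of specific objects that need another copy, rather than a one-off design review.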

Legal and Ethical Considerations

  • Address copyright and intellectual property
  • Respect privacy and confidentiality
  • Consider cultural sensitivities
  • Follow regional data protection laws
  • Establish clear access policies

Tools and Technologies

Digital Repository Software

  • Archivematica: Open-source digital preservation system
  • Preservica: Commercial digital preservation platform
  • DSpace: Open-source repository software
  • Fedora Commons: Flexible repository architecture
  • LOCKSS: “Lots of Copies Keep Stuff Safe” distributed preservation

File Format Tools

  • DROID: File format identification
  • JHOVE: Format validation and characterization
  • ExifTool: Metadata extraction and manipulation
  • FFmpeg: Audio/video transcoding
  • ImageMagick: Image processing and conversion

Storage Management

  • BagIt: Packaging standard for digital content
  • iRODS: Rule-oriented data management
  • ZFS: Self-healing file system
  • Ceph: Distributed storage system
  • WORM Storage: Write Once Read Many technologies
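BagIt (RFC 8493) packages content as a directory holding a `bagit.txt` declaration, a `data/` payload folder, and checksum manifests. The sketch below writes and checks only the required parts, assuming SHA-256 and ordinary file names; the `make_bag` and `validate_bag` helpers are hypothetical, and established implementations such as the Library of Congress's bagit-python are preferable for real work.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def _encode(relpath: str) -> str:
    # RFC 8493 requires %, CR and LF in manifest paths to be percent-encoded.
    return relpath.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def make_bag(source: Path, bag_dir: Path) -> None:
    """Copy source's files into bag_dir/data and write the required tag files."""
    data_dir = bag_dir / "data"
    shutil.copytree(source, data_dir)  # fails if bag_dir/data already exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = [
        f"{_sha256(p)} {_encode(p.relative_to(bag_dir).as_posix())}\n"
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def validate_bag(bag_dir: Path) -> bool:
    """Check that the payload matches the manifest exactly, file for file."""
    listed = {}
    text = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in text.split("\n"):
        if line:
            checksum, relpath = line.split(" ", 1)
            listed[relpath] = checksum
    on_disk = {
        _encode(p.relative_to(bag_dir).as_posix()): p
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    return set(listed) == set(on_disk) and all(
        _sha256(on_disk[rel]) == checksum for rel, checksum in listed.items()
    )
```

Because the manifest travels inside the bag, a receiving archive can confirm that a transfer arrived complete and unaltered before accessioning it.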

Resources for Further Learning

Standards Organizations

  • International Organization for Standardization (ISO)
  • Library of Congress Digital Preservation
  • Digital Preservation Coalition (DPC)
  • National Digital Stewardship Alliance (NDSA)
  • Open Preservation Foundation (OPF)

Training and Education

  • Digital Preservation Management Workshop
  • Society of American Archivists (SAA) courses
  • Digital POWRR (Preserving digital Objects With Restricted Resources)
  • Library Juice Academy digital preservation courses
  • Certified Archive, Records and Information Specialist (CARIS)

Publications and Websites

  • International Journal of Digital Curation
  • D-Lib Magazine archives
  • Digital Preservation Coalition’s “Digital Preservation Handbook”
  • NDSA Levels of Digital Preservation
  • Library of Congress Digital Preservation blog

Remember that effective archival practice requires a balance of policy, process, and technology, along with ongoing commitment to preservation principles. The most successful preservation strategies are those that can evolve with changing technologies while maintaining the integrity and accessibility of archived materials.
