IBM/KB Long-term Preservation Study
Introducing the IBM KB Long-term Preservation Report Series
The National Library of the Netherlands (KB, Koninklijke Bibliotheek) is faced with the problem of preserving large numbers of digital documents for the long term. These documents come from two sources: media published directly in digital form and digitized paper documents. In 2000, the KB and IBM started building an electronic deposit system (Digital Information Archiving System or DIAS), the technical core of the infrastructure for the e-Deposit for the Netherlands.
From the beginning it was clear that this project could not rely on out-of-the-box solutions alone, because at that time no solution readily addressed both large-volume, durable storage and the requirements of long-term preservation. An IBM / KB Long-term Preservation Study (LTP Study) was therefore initiated as part of the overall project of developing a deposit system.
The primary objective of the LTP Study was to investigate the functionality required for the long-term preservation (hundreds of years) of the digital information stored in DIAS. This study has resulted in six reports: one overview report and five specific reports, each one addressing an important aspect of long-term preservation in its own right.
Titles of the IBM / KB Long-term Preservation Study Report Series:
Number 1: The Long-Term Preservation Study of the DNEP Project - an Overview of the Results
This report explains the reasons and objectives behind defining the LTP Study as part of the overall project to implement an electronic deposit system. It also provides a quick and general overview of all the study results, which are then elaborated on in more detail in the other published reports.
Download PDF (722 KB)
Number 2: Authenticity in a Digital Environment
Authenticity acquires a new meaning in a digital context. Normally objects are physical and their physical characteristics are the main source for defining authenticity. Moreover, authenticity is not a single concept, but involves different aspects that can be associated with an object:
- A traceable path from the object's origin to its current ownership;
- Measures and techniques for safeguarding against and/or recognizing modifications;
- Techniques for establishing the use of original materials.
The problem with digital objects is that they are, in fact, purely conceptual objects: a digital object only becomes perceptible when it is interpreted (rendered) by executing it in a specific IT infrastructure (hardware and software). This report focuses on defining a framework for stating what is actually meant when one speaks of an authentic digital object.
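The second aspect listed above, recognizing modifications, is often approached at the bitstream level with a fixity checksum. The following is a minimal sketch under that assumption; the choice of SHA-256 and the names `fixity_digest` and `is_unmodified` are illustrative and are not taken from the report.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Return a hex digest that serves as a fingerprint of the bitstream."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, recorded_digest: str) -> bool:
    """Compare the current bitstream against the digest recorded at ingest."""
    return fixity_digest(data) == recorded_digest


# Hypothetical bitstream; a real deposit system would read the stored file.
original = b"%PDF-1.4 example bitstream"
recorded = fixity_digest(original)

# Any change to the bytes, however small, yields a different digest.
tampered = original.replace(b"1.4", b"1.5")

print(is_unmodified(original, recorded))
print(is_unmodified(tampered, recorded))
```

A check of this kind only shows that the stored bits are unchanged. It says nothing about whether the object, once rendered in some future environment, still conveys what the original did, which is the harder question raised by treating digital objects as conceptual objects.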
Download PDF (387 KB)
Number 3: Preservation Requirements in a Deposit System
The initial DIAS release only provides basic functionality for preserving and rendering the stored digital objects for the long term. One of the primary responsibilities of the LTP Study is to define the functional requirements of the Preservation Subsystem, which is scheduled for development later. This report identifies requirements of the DIAS Preservation Subsystem so as to provide the services and functions for monitoring the technical environment associated with the digital objects stored in DIAS.
The Preservation Subsystem can be summarized by the following three objectives:
- Identifying digital objects that are in danger of becoming inaccessible because of changes in technology;
- Implementing the activities associated with technical preservation;
- Supplying the requisite technical metadata in order to generate / validate the environments needed during digital object delivery.
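The first objective above can be illustrated with a small sketch. It assumes, purely for illustration, that the technical metadata maps each file format to the rendering environments known to display it, and that an object is at risk once none of those environments is still available. The class `StoredObject`, the format and environment names, and the function `objects_at_risk` are all hypothetical and do not describe the DIAS design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    object_id: str
    file_format: str


# Hypothetical technical metadata: format -> environments able to render it.
RENDERING_ENVIRONMENTS = {
    "format-A": ["viewer-1 on platform-X", "viewer-2 on platform-Y"],
    "format-B": ["viewer-3 on platform-Z"],
}

# Hypothetical set of environments that are still supported today.
AVAILABLE = {"viewer-1 on platform-X"}


def objects_at_risk(objects, environments, available):
    """Return the objects for which no known rendering environment remains."""
    at_risk = []
    for obj in objects:
        candidates = environments.get(obj.file_format, [])
        if not any(env in available for env in candidates):
            at_risk.append(obj)
    return at_risk


holdings = [
    StoredObject("obj-1", "format-A"),
    StoredObject("obj-2", "format-B"),
    StoredObject("obj-3", "format-C"),  # format unknown to the metadata
]

flagged = objects_at_risk(holdings, RENDERING_ENVIRONMENTS, AVAILABLE)
print([obj.object_id for obj in flagged])
```

In this sketch an object whose format is missing from the metadata is also flagged, on the assumption that an unknown format is a risk in itself.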
Download PDF (609 KB)
Number 4: The UVC: a Method for Preserving Digital Documents - Proof of Concept
At IBM Research in Almaden, Raymond Lorie was already working on a combined emulation / migration approach, called a Universal Virtual Computer (UVC), for preserving a certain class of digital objects.
The main idea is to archive, along with the data file, a program P that decodes the data and returns the information to a future client according to a logical view. The logical view of the data is simple and self-contained enough to be interpreted without any specific software or hardware. Program P is written for a Universal Virtual Computer (UVC) that is general, yet basic enough to remain relevant in the future. Given the simplicity of the UVC, it should be relatively easy to write an emulator for it on a real machine of that time. The emulated machine will run program P and return all data in an easy-to-understand logical view.
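The division of labour described here (a simple machine, an archived program P, and a logical view as output) can be sketched with a toy example. Everything in it is an assumption made for illustration: the seven-instruction machine is invented and is not Lorie's UVC architecture, the run-length-encoded data file is hypothetical, and the logical view is modelled as a plain list of (tag, value) pairs.

```python
def run_toy_machine(program, data):
    """Emulate a deliberately tiny virtual machine and return its output."""
    regs = {"r0": 0, "r1": 0}
    pc, pos, out = 0, 0, []
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "READ":      # next data byte into a register, -1 at end of data
            regs[args[0]] = data[pos] if pos < len(data) else -1
            pos += 1
        elif op == "DEC":     # decrement a register
            regs[args[0]] -= 1
        elif op == "JMP":     # unconditional jump
            pc = args[0]
        elif op == "JZ":      # jump if the register is zero
            if regs[args[0]] == 0:
                pc = args[1]
        elif op == "JNEG":    # jump if the register is negative
            if regs[args[0]] < 0:
                pc = args[1]
        elif op == "EMIT":    # append a (tag, value) element to the logical view
            out.append((args[0], regs[args[1]]))
        elif op == "HALT":
            return out
        else:
            raise ValueError(f"unknown instruction: {op}")


# Program P, archived next to the data: decodes (count, value) byte pairs.
PROGRAM_P = [
    ("READ", "r0"),           # 0: run length
    ("JNEG", "r0", 7),        # 1: end of data, stop
    ("READ", "r1"),           # 2: value to repeat
    ("JZ", "r0", 0),          # 3: run finished, read the next pair
    ("EMIT", "pixel", "r1"),  # 4: one element of the logical view
    ("DEC", "r0"),            # 5
    ("JMP", 3),               # 6
    ("HALT",),                # 7
]

# Hypothetical archived data file: three pixels of 255, then two of 0.
archived_data = bytes([3, 255, 2, 0])

logical_view = run_toy_machine(PROGRAM_P, archived_data)
print(logical_view)
```

The point of the sketch is that only `run_toy_machine` would need to be rewritten for a future platform. Program P and the data file would be kept unchanged, and the future client would only need to understand the tagged elements that come out.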
The LTP Study conducted a proof of concept with the KB to test the UVC approach in a library environment. The PDF format was selected because it is the primary data format for electronic publications to be stored in DIAS.
Download PDF (380 KB)
Number 5: Managing Media Migration in a Deposit System
Storage technology obsolescence makes media migration a necessity. Data has to be copied from one storage medium to another on a regular basis. However, the fact that storage technology becomes obsolete is not the only trigger for rewriting previously stored digital objects. All storage media degrade over time and have to be rewritten either on the same medium (refreshing) or on another medium (migration).
Ordinarily, media refreshment / migration would be a straightforward process. However, the large amounts of storage associated with an electronic deposit system introduce certain volume-specific requirements. Most electronic deposit systems define their storage capacity needs in several TeraBytes (10^12 Bytes). Take a deposit system with 100 TeraBytes of information stored on tape, for example. Let's assume that you want to migrate all this information to an optical storage medium. Current optical storage media have a capacity of around 5 GigaBytes and a write speed of around 4 MegaBytes/second. A quick calculation shows that a complete migration to optical storage would take about 290 days of continuous writing (100 TeraBytes / 4 MegaBytes per second)!
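The quick calculation can be reproduced as follows. This is a rough sketch that uses decimal units and the figures quoted above, and it ignores media swaps, verification passes and read speed. The function names and the ten-drive scenario are illustrative assumptions, not figures from the report.

```python
TB = 10**12  # bytes in a decimal TeraByte
GB = 10**9
MB = 10**6
SECONDS_PER_DAY = 86_400


def migration_days(total_bytes, write_bytes_per_second, drives=1):
    """Days of continuous writing, assuming the drives work fully in parallel."""
    seconds = total_bytes / (write_bytes_per_second * drives)
    return seconds / SECONDS_PER_DAY


def media_needed(total_bytes, medium_bytes):
    """Number of media required, rounded up (integer ceiling division)."""
    return -(-total_bytes // medium_bytes)


single_drive = migration_days(100 * TB, 4 * MB)
ten_drives = migration_days(100 * TB, 4 * MB, drives=10)  # hypothetical setup
discs = media_needed(100 * TB, 5 * GB)

print(f"{single_drive:.1f} days on one drive")
print(f"{ten_drives:.1f} days on ten drives")
print(f"{discs} optical media")
```

With these inputs a single drive needs roughly 289 days and the migration fills 20,000 media, which shows why writing in parallel matters at this scale.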
This report describes the actions to be taken to manage media migration / refreshment effectively within an electronic deposit system, focussing specifically on the media migration issues within DIAS. Potential additional capacity required for media migration might be created by redundancy and parallelism.
Download PDF (355 KB)
Number 6: Archiving Web Publications
More and more web publications are becoming a primary source of information and will thus be stored as digital objects in DIAS. Web publications have specific characteristics and requirements that DIAS must meet if they are to be archived successfully.
This report investigates the issues and requirements introduced by archiving Web publications and their potential impact on DIAS.
Download PDF (389 KB)
Requires Adobe® Acrobat® Reader®.