Holy Grail: The Infinite Archive
To preserve our knowledge base and cultures, we must find a way to save digital content for future generations
The first great attempt to permanently preserve all of recorded human thought, the legendary Library of Alexandria, Egypt, lasted several hundred years before going up in a giant inferno of crackling papyrus. Now, a couple of millennia later, the archivist's worst fear is not fire but rather the ravages inflicted by perpetually changing file formats for documents, audio, and video.
Librarians and computer scientists from Cambridge, Mass., to, yes, Alexandria, Egypt, are working together to make the ephemeral permanent. Key projects include the Massachusetts Institute of Technology's DSpace digital asset management system, which aims to create an institutional repository that will include digitized versions of lecture notes, videos, papers, and data sets: in short, everything produced by faculty and staff. Another is the U.S. Library of Congress's US $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP), which is developing a standard way for institutions to preserve their digital archives. And in San Francisco, independent digital librarian Brewster Kahle is attempting to preserve the content of the Web on his Internet Archive (http://www.archive.org), for which he's enlisted the aid of the Bibliotheca Alexandrina, in Alexandria, which hosts one copy of the archive (two others are in San Francisco).
"Storing bits for 100 years is easier than preserving content for 10," says Clay Shirky, an adjunct professor at New York University and a consultant to the Library of Congress. "It does us no good to store things for 100 years if format drift means our grandchildren can't read them."
There are two approaches to digital archiving: emulation and migration. With emulation, people move architectures backward in time by writing software that mimics, say, an Intel 286 chip. Using that software lets you run the old software the chip ran. With migration, on the other hand, archivists take documents created in an obsolete format and simply convert them to the latest format. When another, newer format comes along, the documents will be converted again, ad infinitum.
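The migration approach can be pictured as a chain of converters, each one lifting a document from an obsolete format to its successor. The Python sketch below is only an illustration under stated assumptions: the format names ("v1", "v2", "v3"), the document fields, and the converter functions are all hypothetical, invented here, and do not describe any tool used by the projects mentioned in this article.

```python
from typing import Callable, Dict

# A document is modeled as a plain dictionary; "format" names its file format.
# All format names and fields here are hypothetical, for illustration only.
Document = Dict[str, str]


def v1_to_v2(doc: Document) -> Document:
    # Hypothetical change: v1 kept the content under "text"; v2 calls it "body".
    return {"format": "v2", "body": doc["text"]}


def v2_to_v3(doc: Document) -> Document:
    # Hypothetical change: v3 adds an explicit title, taken from the first line.
    body = doc["body"]
    title = body.split("\n", 1)[0]
    return {"format": "v3", "title": title, "body": body}


# Each obsolete format maps to the converter that produces its successor.
# When a newer format comes along, one more entry is added to this table.
MIGRATIONS: Dict[str, Callable[[Document], Document]] = {
    "v1": v1_to_v2,
    "v2": v2_to_v3,
}


def migrate(doc: Document) -> Document:
    """Convert a document step by step until its format has no successor."""
    while doc["format"] in MIGRATIONS:
        doc = MIGRATIONS[doc["format"]](doc)
    return doc


old = {"format": "v1", "text": "Holy Grail\nThe Infinite Archive"}
new = migrate(old)
print(new["format"], "-", new["title"])
```

The point of the table is that each converter only has to know about two adjacent formats, so extending the archive to a future format means writing one new converter rather than revisiting every old one.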
In the long run, digital preservation cannot be done on an ad hoc basis. That's why in 2000, the United States Congress authorized the NDIIPP, led by the Library of Congress in conjunction with various public institutions and private companies, to come up with a national strategy to preserve digital content for future generations.
International cooperation is vital to the quest for the infinite archive, too. Just this past fall, the Library of Congress joined with the national libraries of 11 other countries, including Australia, Britain, Canada, Finland, France, Iceland, and Sweden, to form a consortium to develop tools to capture and share digital content.
According to Laura Campbell, the library's associate librarian for strategic initiatives, the consortium will begin by developing specifications for crawler tools and will work toward "a general framework within which we can all work and ultimately manage digital content."