Holy Grail: The Infinite Archive

To preserve our knowledge base and cultures, we must find a way to save digital content for future generations

Photo: CORBIS-BETTMANN

The first great attempt to permanently preserve all of recorded human thought—the legendary Library of Alexandria, Egypt—lasted several hundred years before going up in a giant inferno of crackling papyrus. Now, a couple of millennia later, the archivist’s worst fear is not fire but rather the ravages inflicted by perpetually changing file formats for documents, audio, and video.

Librarians and computer scientists from Cambridge, Mass., to, yes, Alexandria, Egypt, are working together to make the ephemeral permanent. Key projects include the Massachusetts Institute of Technology’s DSpace digital asset management system, which aims to create an institutional repository that will include digitized versions of lecture notes, videos, papers, and data sets: in short, everything produced by faculty and staff. Another is the U.S. Library of Congress’ US $100 million National Digital Information Infrastructure and Preservation Program (NDIIPP), which is developing a standard way for institutions to preserve their digital archives. And in San Francisco, independent digital librarian Brewster Kahle is attempting to preserve the content of the Web on his Internet Archive (http://www.archive.org). He has enlisted the aid of the Bibliotheca Alexandrina, in Alexandria, which hosts one of the archive’s three copies; the other two are in San Francisco.

“Storing bits for 100 years is easier than preserving content for 10,” says Clay Shirky, an adjunct professor at New York University and a consultant to the Library of Congress. “It does us no good to store things for 100 years if format drift means our grandchildren can’t read them.”

There are two approaches to digital archiving: emulation and migration. With emulation, programmers carry an old architecture forward in time by writing software that mimics an obsolete processor, say, an Intel 286 chip, on today’s hardware. That emulator can then run, unchanged, the old programs the chip once ran. With migration, on the other hand, archivists take documents created in an obsolete format and simply convert them to the latest format. When another, newer format comes along, the documents are converted again, ad infinitum.
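To make the migration strategy concrete, here is a minimal Python sketch, assuming a hypothetical registry that maps each obsolete format to a converter for its successor. The format names (fmt-1 through fmt-3), the Document type, and the conversion rules are all invented for illustration; real migrations involve binary formats and lossy conversions that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Document:
    fmt: str      # hypothetical format identifier
    content: str  # payload, simplified to text for this sketch


# Hypothetical registry: each entry upgrades one format to its successor.
# A format with no entry is treated as the newest one available.
CONVERTERS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    # fmt-1 -> fmt-2: normalize old-style CRLF line endings to LF
    "fmt-1": ("fmt-2", lambda s: s.replace("\r\n", "\n")),
    # fmt-2 -> fmt-3: wrap the text in a simple (invented) markup envelope
    "fmt-2": ("fmt-3", lambda s: f"<doc>{s}</doc>"),
}


def migrate(doc: Document) -> Document:
    """Apply converters one step at a time until no newer format exists."""
    seen = set()
    while doc.fmt in CONVERTERS:
        if doc.fmt in seen:  # guard against a misconfigured, cyclic registry
            raise ValueError(f"conversion cycle at {doc.fmt}")
        seen.add(doc.fmt)
        target, convert = CONVERTERS[doc.fmt]
        doc = Document(target, convert(doc.content))
    return doc


if __name__ == "__main__":
    old = Document("fmt-1", "line one\r\nline two")
    new = migrate(old)
    print(new.fmt, repr(new.content))  # fmt-3 '<doc>line one\nline two</doc>'
```

When a still newer format appears, an archivist would register one more converter (for example, from fmt-3 to a hypothetical fmt-4) and rerun migrate over the collection, which is the endless conversion cycle the paragraph describes.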

In the long run, digital preservation cannot be done on an ad hoc basis. That’s why in 2000, the United States Congress authorized the NDIIPP, led by the Library of Congress in conjunction with various public institutions and private companies, to come up with a national strategy to preserve digital content for future generations.

International cooperation is vital to the quest for the infinite archive, too. Just this past fall, the Library of Congress joined with the national libraries of 11 other countries, including Australia, Britain, Canada, Finland, France, Iceland, and Sweden, to form a consortium to develop tools to capture and share digital content.

According to Laura Campbell, the library’s associate librarian for strategic initiatives, the consortium will begin by developing specifications for crawler tools and “will work toward a general framework within which we can all work and ultimately manage digital content.”
