Every weekday at 5:00 a.m., a nondescript gray van rolls down the underground service road beneath the French National Library, in Paris, and arrives at a svelte glass skyscraper soaring above the bustling Seine River. Here, at the Tower of the Times, the van delivers a tiny but astoundingly rich snapshot of life in a country that takes its cultural heritage very seriously.
The van has been stuffed willy-nilly with two copies each of some 3000 periodicals printed recently in France that are being sent to the library for preservation. One morning last November, the haul includes the dailies Le Monde and L'Humanité, of course, and also the union newspaper Le Travailleur. Among the other lexical artifacts dutifully funneled from the van up into the tower are a booklet of classified advertisements, a concert flyer, several religious pamphlets, Busty Beauties magazine, and a community newsletter from Bonnes (population 330) announcing a town raffle for three hams, six bottles of wine, and a yogurt-making machine.
"We have a lot of so-called crap, and we're happy about that," says Gildas Illien, an archivist at the library. His colleagues in other countries might turn up their noses at hard-core porn, advertisements, and obscure newsletters, but not Illien. "In a hundred years, what's totally irrelevant or dirty today will end up becoming of extreme interest to historians," he declares.
The Tower of the Times, where Illien works, is one of four spires, each composed of two perpendicular wings resembling the pages of an open book, that make up France's newly modernized national library. The archivists here aren't after just printed material; they're preserving the electronic, too. In fact, it's Illien's daunting task to archive French Web sites—all of them, in all their evanescent, constantly changing, and multimedia splendor.
Since the ancient Sumerians compiled the first collections of inscribed clay tablets, many peoples have attempted to preserve documents, ephemera, and even the flotsam of their political, economic, and social tides. But perhaps no nation today tackles this endeavor as thoroughly as France, one of the few countries in which archivists have the legal right to copy and save virtual documents without fear of a copyright suit. Five centuries ago, King Francis I ordered book publishers to donate copies of their work to posterity. That legal deposit law, as it is known, has expanded over the years to include maps, music scores, periodicals, photographs, sound recordings, posters, motion pictures, television broadcasts, computer software, and finally, in 2006, the World Wide Web.
French archivists are still grappling with that most recent mandate. The Web, of course, is unlike any other publishing platform—not simply because it is amorphous and immeasurably large but because its "documents" are boundless. Nowadays, an "online publication" is barely recognizable as a publication in any traditional sense; it exists in a perpetual state of being updated, and it cannot be considered complete in the absence of everything else it's hyperlinked to. Unlike books and newspapers, which have discernible titles, authors, beginnings, and ends, the Internet is utterly nonstandardized.
The task of preserving what's put online has proved, to no one's surprise, monumental. And it's only getting more so as the Internet expands, as Web sites become more dynamic, and as concern grows over online privacy. Increasingly, much of what people put online is being diffused across social networks and distributed through personalized apps on smartphones and tablet computers. The classic Web site, it seems, is already starting to slide toward obsolescence. "I'm convinced the Web as we know it will be gone in a few years' time," Illien says. "What we're doing in this library is trying to capture a trace of it." But even that requires engineers to build a new, more sophisticated generation of software robots, known as crawlers, to trawl the Web's vast and varied content.
Illien sees himself as a steward of an ancient tradition, but he also believes he is helping pioneer a revolution in the way society documents what it does and how it thinks. He points out that since the end of the 19th century, the French National Library has been storing sales catalogs from big department stores, including the famous Galeries Lafayette. "Today," he says, "this exceptional collection…is the best record we have of how people dressed back then and who was buying what." One day, he insists, the archives of eBay will be just as valuable. Capturing them, however, is a task very different from anything archivists have done before.
The Web is regularly accessed and modified by as many as 2 billion people, in every country on Earth. It's a wild bazaar of scripting languages, file formats, media players, search interfaces, hidden databases, pay walls, pop-up advertisements, untraceable comments, public broadcasts, private conversations, and applications that can be navigated in an infinite number of ways. Finding and capturing even a substantial portion of it all would require development teams and computing resources as large as, or probably larger than, Google's.