Out-of-copyright materials in NYPL digital collections are now available as high-resolution downloads. No permission required, no hoops to jump through: just go forth and reuse!
As I mentioned in my February 2013 column, “Balancing Act,” the belief that our life offline is separate from our life online has been denounced as digital dualism. But there’s less of a debate when it comes to differentiating between analog objects and digital data. Yes, the print and electronic copies of the same book contain the same words, but it’s obvious to most people (and, increasingly, to researchers) that the two reading experiences are quite different.
We need to understand such differences because the world is going to see a lot more digital data in the near future. This includes born-digital data, which is originally created in an electronic format, as well as born-analog data, which starts life as a physical object and then is reborn digital. A great example of this digitization came earlier this year when the New York Public Library announced that it was making more than 180,000 digitized items available to anyone with an Internet connection, no questions asked.
That librarians would turn themselves into digital curators is no surprise, since as analog curators for the past few centuries they have been constantly bumping into the physical constraints of storage space and material decay. One approach is to get rid of stuff, and librarians and archivists employ a pleasing variety of terms related to the removal of unwanted or duplicate material from their collections: Weeding and culling generally refer to the removal of individual items, while purging, screening, and stripping are most often used for the removal of multiple related items. But the main problem with physical materials is that they possess what archivists call, poetically, inherent vice: the tendency for something to deteriorate over time because of some fault in the material itself (for example, the presence of lignin in cheap paper, which causes the paper to yellow) or the way the material reacts with its surroundings (for instance, the fact that bugs eat some books because they’re attracted to the mold that grows in damp paper).
The digitization of analog materials can solve these problems, and engineers are constantly trying to find faster ways to turn atoms into bits. For now, though, we mostly have to rely on the skills of scanops (scanner operators) to generate those bits, although on their less skilled days those operators end up scanning their own body parts, such as fingers and hands, a phenomenon known as Google hands. Some companies are applying the principles of crowdsourcing and gamification to the digitizing realm, creating leisure activities that let users contribute to the process. (I would be remiss if I didn’t mention the opposite process: turning digital Web documents and data into books and zines, a genre called the printed Web.)
Ideally, digitized data is online (readily available), but it might end up either offline (not available) or nearline (only indirectly available). It can also end up in dark archives (which are inaccessible to the public), dim archives (which are usually inaccessible but can be made accessible), or light archives (another term for those that are fully accessible).
Having digitized some data, the archivist now faces a new problem: the eventual obsolescence of the data structures or media used to store the data, necessitating a format migration (or a media migration) to something newer. Copying the data without changing the format or media type is called refreshing.
There is a large cottage industry of life coaches and self-appointed gurus who recommend, with varying degrees of urgency and stridency, that we become digital dualists and spend less time online. Fulminations against digitization are harder to find, and that’s just as well, since, with enlightened institutions such as the New York Public Library leading the way, having digital access to books, photos, and other analog materials can only be a good thing. Try to ignore the fingers.
This article appears in the April 2016 print issue as “Curating the Digital Age.”
About the Author
Regular contributor Paul McFedries tracks how language constantly evolves in response to new technologies in IEEE Spectrum’s “Technically Speaking” column. (McFedries’s website Word Spy chronicles neologisms more generally.) In June, he’ll be releasing a collection of his Spectrum columns, also called Technically Speaking, through his publishing company, Word Spy Press.