When Russia’s invasion of Ukraine began, the Internet Archive launched several efforts to capture the Ukrainian Internet. Its archivists started a high-volume crawl through hundreds of thousands of websites ending in “.ua.” They selected specific sites to archive as completely as possible, including government, education, and library sites. And they targeted journalism, particularly Ukrainian news sites and aggregators. The organization has also been supporting others working to save Ukraine’s digital resources, including SUCHO (Saving Ukrainian Cultural Heritage Online) and the Archive Team.
Mark Graham, director of the Internet Archive’s Wayback Machine, explained this dive into Ukraine’s Internet and how it differs from the Wayback Machine’s usual approach to preserving digital history.
Weren’t you archiving Ukrainian websites already?
Mark Graham: We do archive a lot of material. We add more than a billion URLs a day to the Wayback Machine. But the Web is big—bigger than our efforts can fully address—and we can’t archive everything. We incorporate many different approaches. For example, we work with partners that feed us URLs—we almost instantly pick up every URL cited in Wikipedia. We have partnered with Cloudflare. And we work with domain experts who submit Google Sheets of URLs. What is different now is that we are marshaling internal resources and working with others to bring a special focus to Ukrainian digital material.
The question that is driving us is: How do we use the resources we have to archive as much relevant, significant, and historically useful material as we can? The process is both opportunistic and directed and, of course, subjective and constantly changing.
A couple of years ago, I decided to focus on a couple of dozen countries including Ukraine and Russia, archiving more than 300 news sites from Ukraine and 900 news sites from Russia every day. So we had a good start.
What has changed now?
Graham: We are going deeper and wider. Wider, meaning that in addition to more websites—especially news, government, academic sites—we are looking for specific Web pages, like news articles, photos of cultural sites, and descriptions of museum artifacts.
Going deeper means that we take a specific site, say, one related to a museum in Ukraine, and attempt to get every single URL that makes up that website. Previously, we may not have archived them all.
To do this, we are working with experts. We support the efforts of people who have organized together, like SUCHO, which has more than 1,000 volunteers curating materials to be archived. They are cataloging the URLs they see as most relevant and submitting those to us, and they are directly archiving specific URLs that they identify using our “Save Page Now” tool. (They are using other platforms and tools as well; we aren’t the only game in town.)
We also see subscribers of our Archive-It service turning their attention to Ukraine. These include about 1,000 colleges, museums, and other organizations that pay us for Web-archiving support. Many of those are involved in the effort to archive Ukrainian material.
What about Russia?
Graham: To me, Ukraine is a big part of the story, but another big part is indeed Russia. I’m trying to focus on Russian news. One of the biggest questions of our time is going to be trying to understand how we came to have someone acting like a dictator, with ironclad control over the media in their country, who is able to make a state perspective the dominant understanding of the people. It’s like what happened in Germany in the ’30s and ’40s.
What digital resources are you talking about besides websites?
Graham: Images, video, radio, television…we are working on archiving TikTok, and YouTube, and even Telegram. And we are archiving data sets—e.g., the kind of files people move around in Google Drive.
What can people do to help?
Graham: Well of course money always helps; we fill up several 16-terabyte drives every day. But I don’t want to focus on that. What’s important right now is that we work together to capture as much as we can of material that certainly is at risk and may very well be destroyed. The buildings, the servers—these may be bombed. And once it’s gone, you can’t get it back. It’s a race against time. But together we can help to preserve digital culture that might otherwise be lost.
To help by becoming a sponsor, you can donate to the Internet Archive.
Another way to help, according to a letter sent by the Internet Archive to its patrons, “is by using the Wayback Machine to preserve websites you may be concerned about. With the Save Page Now feature, anyone can submit URLs to be archived; if you’re logged in with an Internet Archive account, you can also select ‘Outlinks’ to capture any pages that link to the page you’ve selected. And if you have the Wayback Machine browser extension, you can take a snapshot without having to leave the page: here’s the add-on for Chrome, for Safari, for Firefox, and for Microsoft Edge. If you see something, save something!”
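For readers who keep a list of at-risk pages, the submission step described in the letter can be sketched in a few lines of Python. This is only an illustration, not the Internet Archive's own tooling: it assumes that Save Page Now accepts a page address appended to the `https://web.archive.org/save/` prefix, the function names are hypothetical, and `example.ua` is a placeholder domain. The sketch only builds the addresses to open in a browser. It sends no requests and does not cover the logged-in “Outlinks” option.

```python
# Assumption: Save Page Now accepts a page address appended to this prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(target: str) -> str:
    """Return the Save Page Now address for a single page (hypothetical helper)."""
    target = target.strip()
    if not target:
        raise ValueError("empty URL")
    # Assumption: default to https when the list entry has no scheme.
    if not target.startswith(("http://", "https://")):
        target = "https://" + target
    return SAVE_PREFIX + target


def save_page_now_urls(targets):
    """Build addresses for a list of pages, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for target in targets:
        url = save_page_now_url(target)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    # Placeholder pages; replace with the sites you are concerned about.
    for url in save_page_now_urls(["example.ua", "https://example.ua/news"]):
        print(url)
```

Opening each printed address in a browser should have the same effect as pasting the page into the Save Page Now form, under the assumption stated above.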
Tekla S. Perry is a senior editor at IEEE Spectrum. Based in Palo Alto, Calif., she's been covering the people, companies, and technology that make Silicon Valley a special place for more than 40 years. An IEEE member, she holds a bachelor's degree in journalism from Michigan State University.