How the Wayback Machine Is Saving Digital Ukraine

The Internet Archive’s Mark Graham explains the rush to protect Ukrainian digital resources, and some Russian ones, too


When Russia’s invasion of Ukraine began, the Internet Archive launched several efforts to capture the Ukrainian Internet. Its archivists started a high-volume crawl through hundreds of thousands of websites ending in “.ua.” They selected specific sites to archive as completely as possible, including government, education, and library sites. And they targeted journalism, particularly Ukrainian news sites and aggregators. The organization has also been supporting others working to save Ukraine’s digital resources, including SUCHO (Saving Ukrainian Cultural Heritage Online) and the Archive Team.

Mark Graham, director of the Internet Archive’s Wayback Machine, explained this dive into Ukraine’s Internet and how it differs from the Wayback Machine’s usual approach to preserving digital history.

Weren’t you archiving Ukrainian websites already?

Mark Graham: We do archive a lot of material. We add more than a billion URLs a day to the Wayback Machine. But the Web is big—bigger than our efforts can fully address—and we can’t archive everything. We incorporate many different approaches. For example, we work with partners that feed us URLs—we almost instantly pick up every URL cited in Wikipedia. We have partnered with Cloudflare. And we work with domain experts who submit Google Sheets of URLs. What is different now is that we are marshaling internal resources and working with others to bring a special focus to Ukrainian digital material.

The question that is driving us is: How do we use the resources we have to archive as much relevant, significant, and historically useful material as we can? The process is both opportunistic and directed and, of course, subjective and constantly changing.

A couple of years ago, I decided to focus on a couple of dozen countries including Ukraine and Russia, archiving more than 300 news sites from Ukraine and 900 news sites from Russia every day. So we had a good start.

What has changed now?

Graham: We are going deeper and wider. Wider, meaning that in addition to more websites—especially news, government, academic sites—we are looking for specific Web pages, like news articles, photos of cultural sites, and descriptions of museum artifacts.


Going deeper means that we take a specific site (say, one related to a museum in Ukraine) and attempt to get every single URL that makes up that website. Previously, we may not have necessarily archived them all.

To do this, we are working with experts. We support the efforts of people who have organized together, like SUCHO, which has more than 1,000 volunteers curating materials to be archived. They are cataloging the URLs they see as most relevant and submitting those to us, and they are also archiving specific URLs directly with our “Save Page Now” tool. (They are using other platforms and tools as well; we aren’t the only game in town.)

We also see subscribers of our Archive-It service turning their attention to Ukraine. These include about 1,000 colleges, museums, and other organizations that pay us for Web-archiving support. Many of those are involved in the effort to archive Ukrainian material.

What about Russia?

Graham: To me, Ukraine is a big part of the story, but another big part is indeed Russia. I’m trying to focus on Russian news. One of the biggest questions of our time is going to be trying to understand how we came to have someone acting like a dictator with an ironclad control over the media in their country who is able to have a state perspective that became the dominant understanding of the people. It’s like what happened in Germany in the ’30s and ’40s.

What digital resources are you talking about besides websites?

Graham: Images, video, radio, television…we are working on archiving TikTok, and YouTube, and even Telegram. And we are archiving data sets—e.g., the kind of files people move around in Google Drive.

What can people do to help?

Graham: Well of course money always helps; we fill up several 16-terabyte drives every day. But I don’t want to focus on that. What’s important right now is that we work together to capture as much as we can of material that certainly is at risk and may very well be destroyed. The buildings, the servers—these may be bombed. And once it’s gone, you can’t get it back. It’s a race against time. But together we can help to preserve digital culture that might otherwise be lost.

To help by becoming a sponsor, you can donate to the Internet Archive.

Another way to help, according to a letter sent by the Internet Archive to its patrons, “is by using the Wayback Machine to preserve websites you may be concerned about. With the Save Page Now feature, anyone can submit URLs to be archived; if you’re logged in with an Internet Archive account, you can also select ‘Outlinks’ to capture any pages that link to the page you’ve selected. And if you have the Wayback Machine browser extension, you can take a snapshot without having to leave the page; here’s the add-on for Chrome, for Safari, for Firefox, and for Microsoft Edge. If you see something, save something!”
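For readers who would rather script a submission than click through the site, here is a minimal sketch in Python. It is not taken from the Internet Archive’s letter. It assumes the publicly reachable `https://web.archive.org/save/` address prefix accepts a plain request without logging in, and the helper names `build_save_url` and `request_capture` are hypothetical ones chosen for illustration.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumption: the public Save Page Now prefix; the page to capture is appended to it.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the Save Page Now address for an absolute http(s) URL."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    return SAVE_ENDPOINT + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture one page and return the HTTP status.

    This performs a real network request, so call it sparingly.
    """
    req = Request(
        build_save_url(page_url),
        headers={"User-Agent": "save-page-sketch/0.1"},  # hypothetical client name
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling `request_capture("https://example.com/")` would submit a single page. Anyone submitting long lists of URLs should pause between requests, since the service may throttle heavy use.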

The Conversation (1)
Benjamin Goulart, 22 Apr 2022

Ok, that's all nice, but I don't think they've even done a very good job saving most other major sites that everyone in the world used, like the comment sections of CNN.com, YahooNews.com, and Amazon.com that have since been deleted, for instance. Does Internet Archive even have an archive of most of YahooGroups? I look for the amazing wavemusic.com gear forum or the old firefly.com, and it's mostly the front pages and some buttons they stored, which is pretty much useless.

Why Functional Programming Should Be the Future of Software Development

It’s hard to learn, but your code will produce fewer nasty surprises


You’d expect the longest and most costly phase in the lifecycle of a software product to be the initial development of the system, when all those great features are first imagined and then created. In fact, the hardest part comes later, during the maintenance phase. That’s when programmers pay the price for the shortcuts they took during development.

So why did they take shortcuts? Maybe they didn’t realize that they were cutting any corners. Only when their code was deployed and exercised by a lot of users did its hidden flaws come to light. And maybe the developers were rushed. Time-to-market pressures would almost guarantee that their software will contain more bugs than it would otherwise.
