A View to the Cloud

What really happens when your data is stored on far-off servers in distant data centers

Illustration: Francesco Muzzi/Story TK

We live in a world that’s awash in information. Way back in 2011, an IBM study estimated that nearly 3 quintillion—that’s a 3 with 18 zeros after it—bytes of data were being generated every single day. We’re well past that mark now, given the doubling in the number of Internet users since 2011, the powerful rise of social media and machine learning, and the explosive growth in mobile computing, streaming services, and Internet of Things devices. Indeed, according to the latest Cisco Global Cloud Index, some 220,000 quintillion bytes—or if you prefer, 220 zettabytes—were generated “by all people, machines, and things” in 2016, on track to reach nearly 850 ZB in 2021.

Much of that data is considered ephemeral, and so it isn’t stored. But even a tiny fraction of a huge number can still be impressively large. When it comes to data, Cisco estimates that 1.8 ZB was stored in 2016, a volume that will quadruple to 7.2 ZB in 2021.

Our brains can’t really comprehend something as large as a zettabyte, but maybe this mental image will help: If each megabyte occupied the space of the period at the end of this sentence, then 1.8 ZB would cover about 460 square kilometers, or an area about eight times the size of Manhattan.
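That mental image is easy to sanity-check with a little arithmetic (Python; the roughly 0.25-square-millimeter area of a printed period is an assumed figure for illustration):

```python
# Sanity-check the "eight Manhattans" image: one printed period covers
# roughly 0.25 mm^2 (an assumed figure), and 1.8 ZB = 1.8e15 megabytes.
ZB_IN_MB = 1e15                  # 1 zettabyte = 10^15 megabytes
period_mm2 = 0.25                # assumed area of a printed period

area_mm2 = 1.8 * ZB_IN_MB * period_mm2
area_km2 = area_mm2 / 1e12       # 1 km^2 = 10^12 mm^2
print(round(area_km2))           # ~450 km^2

manhattan_km2 = 59               # land area of Manhattan
print(round(area_km2 / manhattan_km2))  # ~8 Manhattans
```

The result lands within a few percent of the figure in the text, so the comparison holds up.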

Of course, an actual zettabyte of data doesn’t occupy any space at all—data is an abstract concept. Storing data, on the other hand, does take space, as well as materials, energy, and sophisticated hardware and software. We need a reliable way to store those many 0s and 1s of data so that we can retrieve them later on, whether that’s an hour from now or five years. And if the information is in some way valuable—whether it’s a digitized family history of interest mainly to a small circle of people, or a film library of great cultural significance—the data may need to be archived more or less indefinitely.

Projected global data-center storage capacity, 2016 to 2021

Source: Cisco Global Cloud Index

The grand challenge of data storage was hard enough when the rate of accumulation was much lower and nearly all of the data was stored on our own devices. These days, however, we’re sending off more data to “the cloud”—that (forgive the pun) nebulous term for the remote data centers operated by the likes of Amazon Web Services, Google Cloud, IBM Cloud, and Microsoft Azure. Businesses and government agencies are increasingly transferring more of their workloads—not just peripheral functions but also mission-critical work—to the cloud. Consumers, who make up a growing segment of cloud users, are turning to the cloud because it allows them to access content and services on any device wherever they go.

And yet, despite our growing reliance on the cloud, how many of us have a clear picture of how the cloud operates or, perhaps more important, how our data is stored? Even if it isn’t your job to understand such things, the fact remains that your life relies, in more ways than you probably know, on the very basic process of storing 0s and 1s. The infographics below offer a step-by-step guide to storing data, both locally and remotely, as well as a more detailed look into the mechanics of cloud storage.

I. The Basics of Data Storage

Storing Data Locally on a Solid-State Drive

Step 1: Clicking the “Save” icon in a program invokes firmware that determines where on the drive the data will be stored.

Step 2: Inside the drive, data is physically located in different blocks. A quirk of the flash memory used in solid-state drives is that a write can only change individual bits from 1 to 0, never from 0 to 1. So before data is written to a block, all of its bits are first set to 1, erasing any previous data. Then the 0s are written, creating the correct pattern of 1s and 0s.

Step 3: Another quirk of flash memory is that it’s prone to corrupting stored bits, and this corruption tends to affect clusters of bits located close together. Error-correcting codes can compensate for only a certain number of corrupted bits per byte, so each bit in a byte of data is stored in a different block, to minimize the likelihood of multiple bits in a given byte being corrupted.

Step 4: Because erasing a block is slow, each time a portion of the data on a block is updated, the updated data is written to an empty part of the block, if possible, and the original data is marked as invalid. Eventually, though, the block must be erased to allow new data to be written to it.

Step 5: Reading back data often introduces errors, but error-correcting codes can locate the errors in a block and correct them, provided that the block has only one or two errors. If the block has more errors than the code can handle, they can’t be corrected, and the block is deemed unreadable. Errors tend to occur in bursts and may be caused by stray electromagnetic fields—for example, a phone ringing or a motor turning on. Errors also arise from imperfections in the storage medium.
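The program-and-erase behavior described in Steps 2 through 4 can be captured in a toy model (Python; the block size and function names are illustrative, not any real drive’s firmware):

```python
# Toy model of NAND-flash write semantics: erasing a block sets every
# bit to 1; programming can only flip bits from 1 to 0, never back.

BLOCK_BITS = 16  # illustrative block size

def erase(block):
    """Erasing is the only operation that turns 0s back into 1s."""
    return [1] * len(block)

def program(block, data):
    """Programming can only clear bits (1 -> 0), modeled as AND."""
    return [b & d for b, d in zip(block, data)]

block = erase([0] * BLOCK_BITS)           # fresh block: all 1s
pattern = [1, 0, 1, 1, 0, 0, 1, 0] * 2
block = program(block, pattern)           # write: the 0s are programmed in
assert block == pattern

# Overwriting in place fails wherever a bit would need to go 0 -> 1...
new_pattern = [0, 1] * 8
rewritten = program(block, new_pattern)
assert rewritten != new_pattern           # stuck-at-0 bits corrupt the data

# ...so the block must be erased first, as Step 4 describes.
block = program(erase(block), new_pattern)
assert block == new_pattern
```

This is why drives prefer to write updates elsewhere and mark old data invalid, deferring the slow erase for as long as possible.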

Storing Data Remotely in the Cloud

Step 1: Data is saved locally, in the form of blocks.

Step 2: Data is transmitted over the Internet to a data center [see below, “A Deeper Dive Into the Cloud”].

Step 3: For redundancy, data is stored on at least two hard-disk drives or two solid-state drives (which may not be in the same data center), following the same basic method described above for local storage.

Step 4: Later that day, if the data center follows best practices, the data is backed up onto magnetic tape. Not all data centers use tape backup, however.

Step 5: Reading back data often introduces errors, but error-correcting codes can locate the errors in a block and correct them, provided that the block has only one or two errors. If the block has more errors than the code can handle, they can’t be corrected, and the block is deemed unreadable. Errors tend to occur in bursts and may be caused by stray electromagnetic fields—for example, a phone ringing or a motor turning on. Errors also arise from imperfections in the storage medium.
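To see how a code can both locate and fix a flipped bit, here is a minimal Hamming(7,4) sketch (Python); real drives use far stronger codes, such as BCH or LDPC, but the principle is the same:

```python
# Hamming(7,4): 4 data bits are protected by 3 parity bits; any single
# flipped bit in the 7-bit codeword can be located and corrected.

def encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

def correct(c):
    # Each parity check covers the positions whose index has that bit
    # set; together the checks spell out the position of a single error.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]        # positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]        # positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]        # positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3            # 0 means no error detected
    if pos:
        c[pos - 1] ^= 1                   # flip the bad bit back
    return c

word = encode([1, 0, 1, 1])
word[4] ^= 1                               # corrupt one bit in "storage"
assert correct(word) == encode([1, 0, 1, 1])  # single-bit error repaired
```

Flip two bits, though, and the syndrome points at the wrong position, which is why a block with too many errors is simply declared unreadable.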

II. A Deeper Dive Into the Cloud

Though most data that gets stored is still retained locally, a growing number of people and devices are sending an ever-greater share of their data to remote data centers.

Data from multiple users moves over the Internet to a cloud data center, which is connected to the outside world by optical fiber or, in some cases, by gigabit-per-second satellite links.

The cloud data center is basically a warehouse for data, with multiple racks of specialized storage computers called database servers.

There are three basic types of cloud data storage: hard-disk drives, solid-state drives, and magnetic tape cartridges, which have the following features:

    Feature                     Magnetic Tape    Hard-Disk Drive    Solid-State Drive
    Access time (read/write)    10–60 seconds    7 milliseconds     50 ns read / 1,000 ns write
    Capacity                    12 terabytes     8 terabytes        2 terabytes
    Data persistence            10–30 years      3–6 years          8–10 years
    Read/write cycles           Indefinite       Indefinite         1,000
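The spread in access times is easier to appreciate as ratios (Python; the figures are taken directly from the table above, using the low end of tape’s range):

```python
# Access-time spread across the three media, using the table's figures.
access_s = {
    "tape": 10,          # low end of the 10-60 s range
    "hdd": 7e-3,         # 7 milliseconds
    "ssd_read": 50e-9,   # 50 nanoseconds
}

# First-byte latency of tape and HDD relative to an SSD read:
print(f"tape / ssd: {access_s['tape'] / access_s['ssd_read']:.0e}x")  # 2e+08x
print(f"hdd  / ssd: {access_s['hdd'] / access_s['ssd_read']:.0e}x")   # 1e+05x
```

Eight orders of magnitude separate tape from flash, which is why tape serves as deep backup while SSDs serve live traffic.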
Cloud security: Cloud data centers protect data using encryption and firewalls, and most centers offer multiple layers of each. Opting for the highest level of encryption and firewall protection will, of course, increase the time it takes to store and retrieve data.

Perpendicular magnetic recording allows for data densities of about 3 gigabits per square millimeter.

Magnetic Storage vs. Solid State

Hard-disk drives and magnetic tape store data by magnetizing particles that coat their surfaces. The amount of data that can be stored in a given space—the density, that is—is a function of the size of the smallest magnetized area that the recording head can create. Perpendicular recording can store about 3 gigabits per square millimeter. Newer drives based on heat-assisted magnetic recording and microwave-assisted magnetic recording boast even higher densities. Flash memory in solid-state drives uses a single transistor for each bit. Data centers are replacing HDDs with SSDs, which cost four to five times as much but are several orders of magnitude faster. On the outside, a solid-state drive may look similar to an HDD, but inside you’ll find a printed circuit board studded with flash memory chips.
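To get a feel for what 3 gigabits per square millimeter buys, here is a rough capacity estimate for one side of a 3.5-inch platter (Python; the usable-band radii are approximate assumptions, not any particular drive’s geometry):

```python
import math

# Rough capacity of one recording surface at perpendicular-recording
# density. The platter radii are approximate, for illustration only.
density_gbit_mm2 = 3
outer_r_mm, inner_r_mm = 47, 15   # assumed usable band of a 3.5" platter

area_mm2 = math.pi * (outer_r_mm**2 - inner_r_mm**2)
capacity_tbyte = density_gbit_mm2 * area_mm2 / 8 / 1000  # Gbit -> TB
print(f"~{capacity_tbyte:.1f} TB per surface")           # ~2.3 TB
```

A few double-sided platters at that density puts a drive in the 8-terabyte class listed in the table above, so the figure is consistent.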

About the Author

Barry M. Lunt is a professor of information technology at Brigham Young University, in Provo, Utah. 
