The Future of Big Data: Distilling Less Knowledge Per Bit

Until recently, the word data didn’t require a modifier. But we passed a watershed moment when we started referring to big data. Apparently, that wasn’t a sufficient description for some chunks of data, because people grasped for bolder terms, such as humongous data. Sadly, now, it appears that we have run out of appropriate adjectives. And yet data keeps getting bigger and bigger.

So instead of mentioning data, people have begun waving their hands and talking vaguely about the “cloud.” This seems to be the perfect metaphor—a mystical vapor hanging over Earth, occasionally raining information on the parched recipients below. It is both unknowable and all-knowing. It answers all questions, if only we know how to interpret those answers.

This evolution brings to mind two images. The first is from the current scientific hypothesis that all of the information in a black hole resides in the event horizon that surrounds it. This is like the idea of the cloud, while on Earth below, the practical reality of the cloud manifests in proliferating server farms. These farms bring the second image to mind: Douglas Adams’s city-size supercomputer, Deep Thought, from the classic novel (and radio play and TV show and movie) The Hitchhiker’s Guide to the Galaxy.

With these imaginary end states in mind, I wonder: Where is all this headed? Will data increase indefinitely, or is there some point of diminishing returns? Is there such a thing as enough data—or possibly too much?

There is a popular saying that “data is the new oil.” While I think this is an imperfect metaphor, it is true that both oil and data require refining to be useful. I’m mindful of the information pyramid described in T.S. Eliot’s poem “The Rock”: “Where is the wisdom we have lost in knowledge? / Where is the knowledge we have lost in information?”

For the purposes of our discussion, let’s say that data is composed of 1s and 0s, information is the words and images encoded by data, and knowledge is what we glean or learn from that information. The critical refining is between information and knowledge. In the refining of oil, the ratio of the useful final product to the starting amount of crude is not a function of the amount of crude. Not so with information: The more crude information we have to deal with, the less knowledge we want to produce per bit. Otherwise, big data will simply overwhelm us as it continues to grow. What we want is the small knowledge that we obtain from the big information. As the data set gets bigger, the job gets harder. The catch, however, is that unless the big information is big enough, it may not contain the small signal that we are searching for.
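To put rough numbers on that ratio, here is a minimal sketch. It assumes, purely for illustration, that the knowledge worth extracting grows only logarithmically with the amount of raw information. The logarithmic model and the function names `knowledge_bits` and `knowledge_per_bit` are hypothetical, not measurements or anything taken from the literature.

```python
import math


def knowledge_bits(raw_bits: float) -> float:
    """Hypothetical model: useful knowledge grows like log2 of the raw data size."""
    return math.log2(raw_bits)


def knowledge_per_bit(raw_bits: float) -> float:
    """Ratio of distilled knowledge to raw information under the assumed model."""
    return knowledge_bits(raw_bits) / raw_bits


if __name__ == "__main__":
    # Each step multiplies the raw data by about a thousand.
    for exponent in (10, 20, 30, 40):
        raw = 2.0 ** exponent
        ratio = knowledge_per_bit(raw)
        print(f"2^{exponent} bits of information: {ratio:.3e} knowledge per bit")
```

Under this assumed model the total knowledge still rises as the data grows, but the yield per bit shrinks at every step, which is the sense in which we want to distill less knowledge per bit.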

Knowledge inevitably increases, so data has to increase even faster. Fortunately, storage technology seems capable of coping without turning Earth into a giant disk drive, but the crunch will be on the artificial intelligence and algorithms that turn data into knowledge. We have come a long way since 1948, when Claude Shannon, in his classic paper on information theory, could simply ignore the knowledge problem by writing: “Frequently the messages have meaning.... These semantic aspects of communication are irrelevant to the engineering problem.”

I’m also mindful of the propensity of drawers, closets, and hard drives to eventually become filled with useless junk. I sometimes blame this on the second law of thermodynamics, which states that entropy—that is, disorder—always increases. Perhaps this will ultimately be true of the cloud. Old, useless information accumulates, and it’s too much work to purge it. Moreover, who’s to say what is useless and what is not? Everything is in there, but everything is too much. Entropy is maximized, and the data ultimately becomes, as Shakespeare put it, full of sound and fury, signifying nothing.
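The entropy in that paragraph is the thermodynamic kind, used as a metaphor. Shannon's information-theoretic entropy offers a loose parallel, sketched below under stated assumptions. The helper `entropy_bits_per_byte` is a hypothetical name, and it looks only at how often each byte value occurs, so it says nothing about meaning or usefulness.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    repetitive = b"a" * 256          # one byte value repeated: fully predictable
    spread_out = bytes(range(256))   # every byte value exactly once: uniform frequencies
    print(entropy_bits_per_byte(repetitive))
    print(entropy_bits_per_byte(spread_out))
```

The repetitive input scores 0 bits per byte and the uniform one scores 8, the maximum for byte data. By this frequency-only measure, a store that has drifted toward the maximum offers no statistical regularity to exploit, which is one way to read "signifying nothing."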

This article appears in the May 2017 print issue as “The Counterintuitive Cloud.”
