Data Divination

To get better at forecasting big political events, we need better data, sharper reporting, a clearer read on the numbers, and a more penetrating portrait of on-the-ground realities.

The results of recent votes, particularly the U.S. presidential election and the United Kingdom’s referendum on leaving the European Union (better known as Brexit), left many surprised. In both cases, the postdecision lament went something like this: “In this age of big data, how could the pollsters and pundits have been so wrong in their predictions?”

I’m just a language guy, so I don’t pretend to have an answer, although surely it’s an open question whether polls, with their sample sizes in the low thousands, count as “big” data. (Polling statistics probably fall more under the rubric of medium data.) Perhaps that was the problem: If preelection and prereferendum analyses could have accessed data points in the millions, might the results have been less surprising? Or perhaps what’s needed isn’t big data on its own but an approach that takes advantage of the many new types of data that are available.
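To put rough numbers on the "medium data" point, here is a minimal sketch. It assumes a simple random sample and the textbook 95 percent margin-of-error formula for a proportion; neither assumption comes from the column, and real polls rely on weighting that this ignores. The function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p,
    assuming a simple random sample of size n (an idealization)."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare poll-sized samples with a hypothetical million-point sample.
for n in (1_000, 3_000, 1_000_000):
    moe_points = margin_of_error(n) * 100
    print(f"n={n:>9,}: +/- {moe_points:.2f} percentage points")
```

Under these assumptions, the sampling error drops from roughly 3 points at 1,000 respondents to about 0.1 point at a million. That only addresses random sampling error, though: a larger sample does nothing to fix one that is unrepresentative or respondents who misreport their intentions.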

For example, fast data refers to data that requires near-instantaneous access or analysis or that is relevant for only a very short time. It’s an example of hot data, which is used constantly, so it must be easily and quickly accessible. On the opposite side of the information coin we have slow data, which accumulates over a relatively long time, meaning that at some point it might become long data, which extends back in time hundreds of years. It’s an example of cold data, which is used relatively infrequently, so it can be less readily available. Whether fast or slow, hot or cold, the information isn’t much use to anyone if it’s dirty data, which is incomplete, inconsistent, or just plain wrong.
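As a rough illustration of these labels, the sketch below tags a record as hot or cold by how often it is read and flags it as dirty if a value is missing or implausible. Everything here is an assumption made for illustration: the `Record` type, the `HOT_THRESHOLD` cutoff, and the rule that a negative value counts as "plain wrong" are hypothetical, not standard definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    name: str
    reads_per_day: float      # how often the data is used
    value: Optional[float]    # None stands in for a missing measurement

HOT_THRESHOLD = 100.0  # assumed cutoff; there is no standard value

def temperature(rec: Record) -> str:
    """Label a record 'hot' if it is read often, otherwise 'cold'."""
    return "hot" if rec.reads_per_day >= HOT_THRESHOLD else "cold"

def is_dirty(rec: Record) -> bool:
    """Flag incomplete data (missing value) or plainly wrong data
    (here, a negative number where only non-negative values make sense)."""
    return rec.value is None or rec.value < 0

records = [
    Record("live poll tracker", 5000, 48.2),
    Record("1960 precinct archive", 0.1, None),
]
for rec in records:
    print(rec.name, temperature(rec), "dirty" if is_dirty(rec) else "clean")
```

In practice the cutoff and the validity rules would depend on the data set; the point is only that temperature describes how data is used, while dirtiness describes whether it can be trusted.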

In the same way that dark matter is an unseen but very large part of the cosmos (some estimates peg dark matter at roughly 27 percent of the total mass-energy of the universe), dark data represents the unseen but very large part of the data that most corporations collect and store. It’s “dark” because corporations don’t use it for analysis, insight, or decision making. Some of it is transient data, such as unused sensor data or temporary network routing information, and some is live data, such as a user’s (changing) GPS coordinates. Occasionally, this data can produce perishable insights: valuable information that has a very short shelf life (such as when you detect that your roaming user is wandering past one of your brick-and-mortar storefronts). The opposite is target-rich data. When tagged, processed, and analyzed, this data offers its owner valuable, long-term insights.

Perhaps there’s a way to synthesize the big picture and the small, that is, to somehow combine big data with our own contributions to it: small data, which arises out of our everyday actions. We’d have to navigate some dangers. For example, we’d need to ensure that our data does not become cubed data, where a third party shares our data with another third party and it becomes impossible to predict where it ultimately ends up or how it will be used or interpreted. We’d also need some assurance that third parties practice responsible data, meaning that they use and share information sensitively and humanely.

A more promising approach might be one that uses thick data, which combines quantitative and qualitative analysis. The pundits could take a cue from narrative medicine, which uses the story of a patient’s illness combined with traditional medical practices as a way of understanding, diagnosing, and treating the illness. Instead of making guesses about what people will do (for example, that rural voters would stay home on Election Day or that people who told pollsters they’d vote “Leave” would do the opposite in the voting booth), pundits could actually talk to people and listen to their stories rather than just looking at the numbers. Call it narrative data.




