What's in a Blockchain? With New Tools, Anyone Can Find Out

Services available through Google's BigQuery platform make it easy to search and analyze blockchain transactions

An illustration of links in a chain with binary code on them.
Image: iStockphoto

As the intrigue of blockchains settles to a quiet simmer, it's time to ask: How far has the technology advanced? Hundreds of new coin species have been minted. Billions of dollars have been raised. And tales have been spun of an encroaching technological revolution. 

So, what has it changed? Who is using cryptocurrencies, and how often? How many coins are there? How are they being used? How secure are they? Which networks are thriving and which have withered?

These are questions best asked of data. However, for a technology that promises to bring transparency to the business of moving money, blockchain networks are remarkably opaque. On blockchains like Ethereum and Bitcoin, the complete historical records of financial transactions are out there for anyone to see. But getting your hands on an up-to-date copy of those records is far from trivial, and so is running any kind of analysis on them.

Recently, a handful of new projects have set out to make it much easier to access and query blockchain data. And by doing so, they may shed light on how far cryptocurrency projects have come and how far they still have to go.

Google is certainly the biggest player to enter the blockchain search field. This month, the company announced that it has made available, through its BigQuery cloud platform, the full data sets from eight of the most active blockchain networks: Bitcoin, Bitcoin Cash, Ethereum, Ethereum Classic, Zcash, Dash, Litecoin, and Dogecoin.

The transactional data for each of these cryptocurrencies is already public, but Google is now providing it in a form that is easily accessible to data scientists.

In the past, researchers who wanted a glimpse into these blockchains first had to spin up a node on the peer-to-peer network that runs each one, then use that connection to download and parse the raw data passed on to them by other nodes in the network.
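
To give a sense of what "parsing raw data" involves, here is a minimal Python sketch that decodes a single 80-byte Bitcoin block header, the kind of binary record a node hands over. The bytes are the header of Bitcoin's genesis block. The function name `parse_block_header` and the dictionary keys are illustrative choices, not part of any tool mentioned in this article, and a real pipeline would also have to decode every transaction inside each block.

```python
import hashlib
import struct

# Raw 80-byte header of Bitcoin's genesis block, in the byte order nodes use.
GENESIS_HEADER = bytes.fromhex(
    "01000000"    # version
    + "00" * 32   # previous block hash (all zeros: there is no earlier block)
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"  # timestamp (Unix seconds)
    + "ffff001d"  # compact difficulty target
    + "1dac2b7c"  # nonce
)


def parse_block_header(raw):
    """Decode an 80-byte Bitcoin block header into a dict (illustrative helper)."""
    if len(raw) != 80:
        raise ValueError("a Bitcoin block header is exactly 80 bytes")
    # All integer fields are little-endian; the two hashes are 32 raw bytes each.
    version, prev_hash, merkle_root, timestamp, bits, nonce = struct.unpack(
        "<i32s32sIII", raw
    )
    # The block hash is a double SHA-256 of the header, shown byte-reversed.
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    return {
        "version": version,
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "timestamp": timestamp,
        "bits": bits,
        "nonce": nonce,
        "block_hash": digest[::-1].hex(),
    }


header = parse_block_header(GENESIS_HEADER)
```

Even this small step requires knowing the field layout and byte order, which hints at why maintaining such tooling for an entire chain is a burden.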

“For your average data scientist, they don't have the time to run their own node or write their own tools to parse that node to get that data out…and even if they did do that, they'd have to do it every day, just to get the latest data,” explains Yaz Khoury, the director of developer relations for the Ethereum Classic Cooperative, a nonprofit that funds development and outreach in support of the Ethereum Classic network. “They shouldn't have to suffer by setting up all that data engineering infrastructure.”

For those lacking the time and resources to download their own copy of a blockchain, there is also the option of browsing a service called an explorer, a primitive search engine that publishes block data online. Multiple explorers are now available for all the major cryptocurrencies, but they come with their own restrictions. On these websites, the data is not presented in a form that is easy to analyze. And though some provide charts visualizing the most basic economic trends, the insights they offer are mostly only of interest to the website owners. 

Google is now positioning itself as the place to go if you want to run an analysis without the hassle. By accessing BigQuery, researchers get remote access to blockchains structured as a relational database and updated daily on Google's cloud. With the data in this form, it is possible to run an analysis using Structured Query Language (SQL), a domain-specific language commonly used by data scientists.
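
The kind of question that becomes easy once a blockchain sits in relational tables can be sketched with a few lines of SQL. The example below is a stand-in, not Google's service: it uses Python's built-in `sqlite3` module and a small in-memory table so that it runs anywhere. The table name `transactions`, its columns, and the sample rows are all assumptions made for illustration and do not reflect the schema of the BigQuery data sets.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical transactions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        "tx_hash TEXT PRIMARY KEY, block_date TEXT, value_coins REAL)"
    )
    # Made-up sample rows; real data would come from the hosted data set.
    rows = [
        ("a1", "2019-01-01", 0.5),
        ("a2", "2019-01-01", 1.5),
        ("a3", "2019-01-02", 2.0),
    ]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    return conn


# How many transactions happened each day, and how much value moved?
DAILY_ACTIVITY_SQL = """
SELECT block_date,
       COUNT(*)         AS tx_count,
       SUM(value_coins) AS total_value
FROM transactions
GROUP BY block_date
ORDER BY block_date
"""


def daily_activity(conn):
    """Run the aggregate query and return (date, count, total) tuples."""
    return conn.execute(DAILY_ACTIVITY_SQL).fetchall()


activity = daily_activity(build_demo_db())
```

A query of this shape answers "how often is the network used?" in one statement, with no node to run and no raw blocks to parse.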

“We converted the blockchain into a database that you can query. That opens it up to a lot of people who never would have touched the blockchain as a structure,” says Khoury, who collaborated with Google to bring the Ethereum Classic blockchain data to BigQuery.

With cryptocurrencies like Bitcoin and Litecoin, in which the main function of the network is simply to move value around, this may be enough. However, analysis gets a lot trickier with more complex blockchains like Ethereum.

In addition to standard transactions, blockchains like Ethereum also run smart contracts, code that remotely executes complex applications, called Dapps. However, before any analysis can be performed on these functions, the applications must be decompiled to their source code, a service that BigQuery does not provide.

Developers are now making such tools available outside of the BigQuery platform. Tomasz Kolinko, a developer working on the Ethereum blockchain, has created his own decompiler, called Eveem, which he has been using to load data from contracts back onto BigQuery, where it can then be used for basic analysis.

In this way, BigQuery is functioning as a repository for sharing data beyond what Google itself has to offer.

Kolinko says that combining decompiled data from Ethereum with BigQuery's search capabilities will especially empower researchers who want to inspect the security of the Ethereum network.

This year, he has used the two tools in tandem to search for security bugs known to exist in certain contracts and measure their prevalence across the entire network. When the results are pushed into the cloud, he says, it is much more likely that auditors will find vulnerabilities before they cause massive damage to users. 

“If there are many more eyes looking at this data…perhaps we can find the contracts that are affected before they gain in size,” says Kolinko.
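
Measuring the prevalence of a known bug, as described above, boils down to scanning decompiled contract code for a telltale pattern and counting the matches. The Python sketch below shows that idea only. The contract addresses, the decompiled text, and the `unchecked_send(` pattern are all hypothetical placeholders; they do not represent Eveem's output format or any real vulnerability signature.

```python
import re

# Hypothetical decompiled contracts, keyed by a made-up address.
CONTRACTS = {
    "0xaaa": "def withdraw(): require(balance[caller] > 0); safe_send(caller)",
    "0xbbb": "def withdraw(): unchecked_send(caller, balance[caller])",
    "0xccc": "def ping(): return 1",
}

# Placeholder signature standing in for a known-bad code pattern.
BUG_PATTERN = re.compile(r"unchecked_send\(")


def measure_prevalence(contracts, pattern):
    """Return the sorted addresses whose code matches, and their share of the total."""
    flagged = sorted(
        address for address, code in contracts.items() if pattern.search(code)
    )
    share = len(flagged) / len(contracts) if contracts else 0.0
    return flagged, share


flagged, share = measure_prevalence(CONTRACTS, BUG_PATTERN)
```

At network scale the same filter would be expressed as a query over a shared table, which is what makes publishing the decompiled data useful to other auditors.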

BigQuery, however, is only practical for inspecting data on public blockchains. During the blockchain frenzy of the last two years, much of the innovation has come from private blockchains, networks where participation is restricted to a pool of vetted users.

While you won't find any of these blockchain data sets on Google's BigQuery platform, another company, Hacera, is pushing to make them at least partially transparent.

In a project called Unbounded, Hacera provides a registry where the administrators of private blockchains can list their networks along with a description of what functions they provide. Administrators of private blockchains can also use Unbounded, which is itself a blockchain built on Hyperledger’s Fabric, to selectively publish details about their networks, pushing operating data into the public that would otherwise be visible only to a closed group of participants.

There are many reasons to build a private blockchain. Most are run by businesses that have regulatory obligations to keep their client data out of public view. With Hacera, these companies can choose to publish portions of their data, such as the total transaction volume or the number of participants on their network. Doing so provides some indication of their rate of adoption without running afoul of regulators.

According to Jonathan Levi, the founder and CEO of Hacera, even this small level of transparency will help people in the industry get a better sense of which technologies are available and how they are functioning. In the long run, it may even inspire collaboration, which has been one of the central goals of blockchain enthusiasts from the beginning.

“At the moment, everyone is just trying to create another chain,” says Levi. “We're trying to say: let's use what's out there.”

IEEE Spectrum