Cloud storage services usually charge clients based on how much data they store. But charging users only when they actually access that data may be a more cost-effective approach, a new study finds.
Internet-scale web applications—the kind that run on servers across the globe and may handle millions of users—are increasingly relying on services that store data in the cloud. This helps applications deal with huge amounts of data. Facebook, for example, generates 4 petabytes (4 million gigabytes) of data every day.
Cloud storage services such as Amazon S3, Google Cloud Storage, and OpenStack Swift are now the first choice for these storage-intensive web applications. To reduce the delay or latency that these applications experience between requesting and receiving data, web applications also often use cloud caching systems such as Redis and Memcached that cache data in main memory—components that computers use to store information for immediate use. Main memory shuffles data more quickly than conventional storage options such as hard disks or flash memory.
Since main memory is expensive, cloud caching is typically only used for small pieces of data, or "objects," ranging in size from a few bytes to a few kilobytes. Larger objects (measuring megabytes to gigabytes) are generally thought to consume too much memory and bandwidth for cloud caching to be efficient.
However, previous research found that in cluster computing, caching large objects can prove effective and beneficial. Ao Wang, a computer scientist at George Mason University in Fairfax, Virginia, and his colleagues wanted to see if large object caching might also work for web applications.
Wang’s team analyzed records from two IBM data centers that provided cloud storage. They found that 30 percent of large objects stored in the data centers were accessed at least 10 times during the course of the study, which lasted 75 days. The most popular objects were accessed more than 100 times and roughly 40 percent of large objects were reused within an hour after they were last accessed. These findings suggested that caching large objects in main memory could boost the performance of cloud services without overwhelming them.
As such, Wang and his colleagues sought to develop a new cloud caching model that can address large objects in a cost-effective manner. They envisioned a cloud storage service that charges users only when a cached object is accessed, instead of charging for memory capacity on an hourly basis, whether cached objects are accessed or not.
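The difference between the two billing models can be sketched with some simple arithmetic. The rates below are hypothetical, chosen only to illustrate the contrast the researchers describe, not actual cloud prices:

```python
# Illustrative cost comparison (hypothetical prices, not any provider's
# actual rates): a capacity-based cache bills for reserved memory every
# hour, while a pay-per-use cache bills only when an object is accessed.

HOURLY_RATE_PER_GB = 0.02   # hypothetical $/GB-hour for reserved cache memory
PRICE_PER_ACCESS = 0.0001   # hypothetical $ charged per object access

def capacity_cost(cached_gb: float, hours: float) -> float:
    """Cost of keeping `cached_gb` of memory reserved for `hours`,
    whether or not the cached objects are ever read."""
    return cached_gb * hours * HOURLY_RATE_PER_GB

def pay_per_use_cost(accesses: int) -> float:
    """Cost when the service charges only per access."""
    return accesses * PRICE_PER_ACCESS

# A 1-gigabyte object cached for a week (168 hours) but read only 50 times:
reserved = capacity_cost(cached_gb=1.0, hours=168)  # 1.0 * 168 * 0.02 = 3.36
per_use = pay_per_use_cost(accesses=50)             # 50 * 0.0001 = 0.005
```

For rarely accessed large objects, the per-access model only bills for the handful of reads, which is why it can be dramatically cheaper than paying to reserve memory around the clock.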
The scientists developed a new caching technique called InfiniCache that can support a pay-per-use cloud storage service. When tested on Amazon Web Services’ (AWS) Lambda computing service, InfiniCache achieved at least a 100-fold improvement in latency compared to Amazon S3 in roughly 60 percent of requests for objects larger than 10 megabytes.
In addition, the scientists found InfiniCache performed comparably to the AWS ElastiCache cloud caching service. But when it came to large objects, InfiniCache cost customers roughly one-thirtieth to one-ninetieth as much as ElastiCache, Wang says: "Our price is really cheaper compared with existing costs."
A key challenge InfiniCache faced was that cloud service providers may reclaim memory at any time, risking the loss of cached data. To solve this problem, InfiniCache implements a data backup mechanism in which cached objects synchronize with clones of themselves, minimizing the chance that reclaimed memory causes data loss.
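The backup idea described above can be sketched as a toy in-memory cache that writes each object to two independent nodes, so a clone survives if the provider reclaims one node's memory. The class names and structure here are illustrative assumptions, not InfiniCache's actual implementation:

```python
# Toy sketch of clone-based backup: every cached object is written to a
# primary node and a backup node, so reclaiming one node's memory does
# not lose the data. (Purely illustrative; not the real InfiniCache code.)

class CacheNode:
    """One in-memory cache node whose memory the provider may reclaim."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

    def reclaim(self):
        """Simulate the cloud provider reclaiming this node's memory."""
        self.store.clear()

class ReplicatedCache:
    """Cache front end that keeps a synchronized clone of every object."""
    def __init__(self, primary, backup):
        self.primary, self.backup = primary, backup

    def put(self, key, value):
        # Write through to both nodes so the clone stays synchronized.
        self.primary.put(key, value)
        self.backup.put(key, value)

    def get(self, key):
        # Fall back to the clone if the primary lost the object.
        value = self.primary.get(key)
        if value is None:
            value = self.backup.get(key)
            if value is not None:
                self.primary.put(key, value)  # repopulate the primary
        return value

cache = ReplicatedCache(CacheNode(), CacheNode())
cache.put("video.mp4", b"large object bytes")
cache.primary.reclaim()  # provider takes back the primary's memory...
assert cache.get("video.mp4") == b"large object bytes"  # ...the clone survives
```

Keeping clones costs extra memory, but it trades that overhead for resilience against the provider's unpredictable reclamation, which is the crux of running a cache on pay-per-use infrastructure.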
In the future, the researchers will analyze how well InfiniCache may deal with small objects, and explore porting it over to other platforms such as Google Cloud Functions. They detailed their findings on 27 February at the USENIX FAST conference in Santa Clara, California.
Charles Q. Choi is a science reporter who contributes regularly to IEEE Spectrum. He has written for Scientific American, The New York Times, Wired, and Science, among others.