Sometime next year, if all goes as planned, the largest scientific instrument ever built will come to life in a labyrinthine underground complex in Switzerland, near Geneva. Buried more than 100 meters down, the Large Hadron Collider (LHC) will send two beams of protons in opposite directions around a 27-kilometer-long circular tunnel. The beams, whizzing at nearly the speed of light, will collide head-on, producing a shower of subatomic fragments that scientists expect will include exotic, never-before-seen particles that could change our fundamental knowledge of the universe.
That's the hope, anyway. Researchers at the European Organization for Nuclear Research (CERN), which will operate the LHC, know that spotting the elusive bits of matter they are looking for will be a daunting task. To find them, the researchers will have to sift through a colossal haystack of collision data: the LHC is expected to spew out some 15 million gigabytes a year, which on average is more than enough to fill six standard DVDs every minute.
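That figure is easy to verify with a back-of-the-envelope calculation. The short sketch below assumes a 4.7-gigabyte single-layer DVD and averages the data rate over a full calendar year; both assumptions are ours, not CERN's.

    # Rough check: how many standard DVDs per minute would
    # 15 million gigabytes a year fill?
    DATA_PER_YEAR_GB = 15_000_000      # LHC estimate: roughly 15 petabytes a year
    DVD_CAPACITY_GB = 4.7              # assumed single-layer DVD capacity
    MINUTES_PER_YEAR = 365 * 24 * 60   # averaging over the whole year

    gb_per_minute = DATA_PER_YEAR_GB / MINUTES_PER_YEAR
    dvds_per_minute = gb_per_minute / DVD_CAPACITY_GB
    print(f"{gb_per_minute:.1f} GB per minute, about {dvds_per_minute:.1f} DVDs per minute")
    # Prints roughly 28.5 GB per minute, or about 6 DVDs per minute.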
Storing and analyzing the mountain of data, it turns out, is a task that no supercomputer in the world can handle. So while the LHC team rushes to finish the mammoth subterranean machine, above ground another group of physicists and computer scientists has been solving a problem of its own: assembling a computing infrastructure able to handle the LHC's data deluge. Their solution? A vast collection of high-powered computer systems scattered across nearly 200 research centers around the world, networked and configured to function as a single parallel processing system. This type of infrastructure is known as a computing grid.
Computing grids emerged in the late 1990s as an alternative to traditional supercomputers to solve certain problems demanding powerful number crunching and access to larger amounts of distributed data. The idea was that with sufficiently fast networks and the right software, multiple and geographically dispersed research groups could pool their computing and data management resources into a unified system capable of tackling problems that would be out of reach for any of them alone. Such grids, those early researchers hoped, would do for serious computing power what electricity grids did for electricity: make it available everywhere. Just plug your PC into a computing grid and you'd have instant access to supercomputing power at an affordable cost.
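To make the pooling idea concrete, here is a minimal sketch of the pattern a grid exploits: a large job is split into independent work units and farmed out to whatever machines are available, wherever they happen to be. The site names and the use of Python's standard multiprocessing module are stand-ins of our own; real grid middleware also handles scheduling, data placement, and authentication across institutions.

    from multiprocessing import Pool

    # Hypothetical pool of compute sites; a real grid spans hundreds of centers.
    SITES = ["cern-ch", "fnal-us", "ral-uk", "in2p3-fr"]

    def analyze(work_unit):
        """Stand-in for the real task: each work unit (say, a batch of
        collision events) can be processed independently of the others."""
        events = work_unit["events"]
        return sum(e % 7 for e in events)   # dummy computation

    if __name__ == "__main__":
        # Split the whole dataset into independent work units...
        work_units = [{"events": range(i, i + 1000)} for i in range(0, 10_000, 1000)]
        # ...and process them in parallel, much as a grid scheduler would
        # dispatch them to idle machines at different sites.
        with Pool(processes=len(SITES)) as pool:
            partial_results = pool.map(analyze, work_units)
        print("combined result:", sum(partial_results))

Because the work units share no data while they run, adding more machines (or more sites) speeds up the job almost linearly, which is exactly the property that makes collision analysis such a good fit for a grid.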
We are not quite there yet. Today, although grids have sprung up all over the place, most of them are specialized systems available to only a small cadre of researchers in fields such as high-energy physics, genome research, and earthquake monitoring. How, then, can we turn grids into an everyday research tool that can energize a wider range of scientific and technical pursuits?
That is the question CERN and its partner universities, research agencies, and companies—most of them in Europe but some in the United States, Asia, and Latin America—hope to answer by building on the experience of the LHC grid to create a massive global grid infrastructure. Led by CERN, the group wants to transform this new global grid into a tool capable of solving a great variety of problems in science, engineering, and industry.
The initiative, funded by the European Union, is called Enabling Grids for E-sciencE (EGEE). Behind the awkward acronym lies an ambitious effort [see illustration, "Going Global"]. The EGEE grid now combines the processing power of more than 20 000 CPUs, a storage capacity of about 5 million GB—growing rapidly in anticipation of the LHC data—and a global network connecting some 200 sites in such places as Paris, Moscow, Taipei, and Chicago. The grid is already crunching test data for the LHC experiments [see sidebar, "The Big Data Bang"] and also for dozens of applications in such areas as astrophysics, medical imaging, bioinformatics, climate studies, oil and gas exploration, pharmaceutical research, and financial forecasting. It's now the world's largest general-purpose scientific computing grid, and it's getting bigger every month.