16 July 2008: As both computer chips and supercomputers grow more powerful by linking together more and more processors, they risk wasting money, energy, and time by sending data among those processors over inefficient routes. The share of time a supercomputer spends shuttling data around varies dramatically but typically falls between 10 percent and 30 percent. Now one Stanford engineer says he and his colleagues at supercomputer maker Cray have the most efficient scheme yet for directing all that traffic, an architecture he calls the “flattened butterfly.”
According to William Dally, chairman of Stanford’s computer science department, the flattened butterfly cuts the cost of building a supercomputer in half, compared with the cost of using a standard architecture known as a Clos network. Dally’s simulations show that in multicore microprocessors, the flattened butterfly can increase data throughput by up to 50 percent over a standard mesh network, reduce power consumption by 38 percent, and cut latency—the time a data packet spends waiting to be forwarded—by 28 percent.
The flattened butterfly is an update of an architecture known as a butterfly, which has been around since the 1960s. The name comes from the pattern of inverted triangles created by the interconnections, which looks like butterfly wings. Dally flattens the butterfly by combining columns of routers and linking each router to more processors. The new configuration halves the number of router-to-router connections. Data can now travel from any processor to any other in fewer hops, even though the physical route may be longer, and that cuts latency considerably.
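To make the "fewer hops" claim concrete, here is a minimal sketch in Python. It rests on assumptions the article does not spell out: that each router can be addressed by a short string of base-k digits (one digit per dimension), and that routers differing in exactly one digit are directly linked. Under those assumptions, the minimal hop count is simply the number of digits that differ. The names `router_digits` and `min_router_hops` are hypothetical, chosen only for illustration.

```python
def router_digits(router_id, k, dims):
    """Split a router id into `dims` base-k digits, one per dimension."""
    digits = []
    for _ in range(dims):
        digits.append(router_id % k)
        router_id //= k
    return digits


def min_router_hops(src, dst, k, dims):
    """Minimal router-to-router hops under the assumed layout.

    Assumption: routers that differ in a single digit are directly
    connected, so each differing digit costs exactly one hop.
    """
    src_digits = router_digits(src, k, dims)
    dst_digits = router_digits(dst, k, dims)
    return sum(1 for s, d in zip(src_digits, dst_digits) if s != d)
```

With a hypothetical 16-router layout (`k=4`, `dims=2`), no pair of routers is more than two hops apart in this model, however far apart the routers sit physically.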
The original butterfly moved data by the most efficient route, but it couldn’t handle a conflict between two packets trying to use the same connection at the same time. The Clos network overcame that problem by having all the packets overshoot their destination, going to a more distant router and then hopping back to the right location. Unfortunately, most of the time the overshoot isn’t necessary. “The problem with the Clos network is that it takes twice as many hops as you really need,” Dally says.
The flattened butterfly adaptively senses congestion and overshoots only when it needs to. Dally compares it to deciding which road to take when driving from San Jose to Palo Alto. If an online map tells him there’s a lot of traffic on the shorter Route 101, he’ll take the longer Route 280. If traffic is the same, he’ll choose the shorter way. It’s this adaptive routing that makes the flattened butterfly work.
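The road analogy can be sketched as a small decision rule. This is an illustration only, not the routing algorithm Dally and Cray actually use: the cost estimate (hops multiplied by queue depth plus one) and the name `choose_route` are assumptions made here to show how a router might weigh a short, congested path against a longer, clearer one.

```python
def choose_route(minimal_hops, minimal_queue, detour_hops, detour_queue):
    """Pick between the direct route and an overshooting detour.

    Assumed cost model (hypothetical): each hop costs roughly the number
    of packets already queued on that route, plus one for the packet itself.
    Ties go to the minimal route, matching "if traffic is the same,
    choose the shorter way."
    """
    minimal_cost = minimal_hops * (minimal_queue + 1)
    detour_cost = detour_hops * (detour_queue + 1)
    return "minimal" if minimal_cost <= detour_cost else "detour"
```

For example, `choose_route(1, 10, 2, 1)` picks the detour because the direct link is badly backed up, while `choose_route(1, 5, 2, 5)` stays on the minimal route because congestion is equal on both.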