As the old adage goes, a picture is worth a thousand words. But when you're talking about a picture of the Internet, it's worth a whole lot more (about 100 trillion words according to Google's director of research, Peter Norvig). Creating a complete map of the Internet is a goal that has remained largely elusive because of the ever-growing, highly complex nature of the Net. Now, through the efforts of researchers in Israel--from Bar-Ilan University, in Ramat Gan, Hebrew University of Jerusalem, and Tel-Aviv University--the fuzzy image of the Internet has just gotten a bit clearer. The Israeli team recently published a paper in the Proceedings of the National Academy of Science in which they construct a new, more accurate picture of the Internet using a combination of graph-theory analysis and distributed computing.
A network like the Internet is composed of various nodes, or devices, such as computers, routers, PDAs, and so on, that are linked by one or more physical or virtual paths. To determine the structure of the Internet, researchers had to map out all these nodes and links and their relationships to one another. The main problem previous Internet cartographers faced was insufficient information. Data about network structure had been acquired through a limited number of observation points--computers using a software tool called traceroute to determine the path taken by data packets as they moved from source to destination through a network. The trouble was there were too few of these data points to generate a complete picture--a task comparable to a handful of people walking around and trying to map out the entire world.
In order to overcome this limitation, a team of researchers at Tel Aviv University headed by Yuval Shavitt developed the Distributed Internet Measurement and Simulation (DIMES) project. DIMES gathered network structure information through a technique known as distributed computing. Volunteers download an agent program that runs in the background onto their personal computers--in a similar fashion to SETI@homeor Folding@home--gathering network data without interfering with system processes. This data, consisting of information related to 20 000 network nodes, is then processed by using a technique of graph theory called k -shell decomposition.
According to Shai Carmi of Bar-Ilan University, who performed much of the data analysis, k -shell decomposition breaks up the nodes in the network into nested hierarchical shells. It starts by finding all the nodes having only one connection. Those nodes are removed from the network and assigned the outermost shell, the 1-shell. Next, the nodes having two or fewer remaining connections are removed to the 2-shell. The process continues until all the nodes have been indexed. Using this technique, the team categorized the various shells as belonging to the network's crust, core, or nucleus--the best-connected nodes.
The nucleus consisted of more than one type of node. It included nodes like ATT Worldnet, having more than 2500 direct connections, and nodes like Google, which links to only 50 other nodes. The reason Google is in the nucleus is that it connects almost exclusively to the best-connected nodes in the Internet.
The size of the dots on the map identifies how many connections they make, while their k -shell index is indicated by their color and position.