This is part of IEEE Spectrum's special R&D report: They Might Be Giants: Seeds of a Tech Turnaround.
Remember software agents? Our own little software robots, they were supposed to represent us on the World Wide Web, which was already shaping up to be more information than any unaided human could sift through. Agents were going to know all about our needs, likes, and interests. They would forage every night for news and information, book our business travel for us, even do the preliminary research for our next management report.
It never happened. These robots were hard to build--too hard, actually. After all, Web pages are designed for human consumption. Words have meaning, indeed, multiple meanings: Is a particular document on "banking" about saving money or turning an airplane? The cues we use to derive meaning--position on the page, context, graphics, and other nontext elements--were beyond any software agent's ken. And some of the best information on the Web was hidden in databases that agents couldn't enter.
Now committees of researchers from around the globe are attacking the problem from the other direction. They want to make the Web more homogeneous, more data-like, more amenable to computer understanding--and then agents won't have to be so bright. In other words, if Web pages could contain their own semantics--if we had a Semantic Web--software agents wouldn't need to know the meanings behind the words.
But in the meantime, the Web has continued to grow. By the late 1990s, the leading search engine of the day, AltaVista, could index only 30 percent of the Web. Searches often missed the most salient documents, and the ranking of hits with respect to search terms was poor. Just in time, along came Google with a better indexing engine and vastly better relevance ranking.
While Google can match the Web's astonishing growth, can it keep up with the expectations of its users? Someone who today says to a search engine, in effect, "Find some good documents on compound fractures of the ankle" will soon want to ask what he or she really wants to know: "Who are the best orthopedic surgeons near where I live, and are they included in my medical coverage?"
That sort of query can never be asked of an HTML-based Web. If we couldn't build intelligent software agents to navigate a simplistic Web, can we really build intelligence into the 3 billion or 10 billion documents that make up the Web?
While that sounds like moving the mountain to Mohammed, to Tim Berners-Lee, the inventor of the World Wide Web, it's not out of the question. The first step, getting a fulcrum under the mountain and lifting it, is already well under way. That fulcrum is the extensible markup language (XML). A sort of HTML-on-steroids, this coding system isolates, under the hood, the dozens or even hundreds of data elements a Web page might contain. Right now, HTML coding serves mostly to control the appearance and arrangement of the text and images on a Web page, so only a few elements are tagged, such as <title> and <b>. With new XML tags such as <price>, a software agent could, for example, comparison shop across different Web sites or update an account ledger after an e-purchase.
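To make the contrast concrete, here is a sketch of how a bookseller's page might carry the same fact in each form; the tag names are illustrative only, not drawn from any particular standard:

   <!-- HTML: the price is just styled text; an agent sees only formatting -->
   <p><b>The Semantic Web</b> $29.95</p>

   <!-- XML: the same fact, labeled by what it means rather than how it looks -->
   <book>
      <title>The Semantic Web</title>
      <price currency="USD">29.95</price>
   </book>

An agent told to look for <price> elements could then pull comparable figures from many vendors' pages without having to understand the surrounding prose.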