Special R&D Report
Remember software agents? Our own little software robots, they were
supposed to represent us on the World Wide Web, which was
already shaping up to be more information than any unaided
human could sift through.
Agents were going to know all about our needs, likes, and
interests. They would forage every night for news and information,
book our business travel for us, even do the preliminary research
for our next management report.
It never happened. These robots were hard to build—too hard, actually.
After all, Web pages are designed for human consumption. Words
have meaning, indeed, multiple meanings: Is a particular document
on "banking" about saving money or turning an airplane? The
cues we use to derive meaning—position on the page, context,
graphics, and other nontext elements—were beyond any
software agent's ken. And some of the best information on
the Web was hidden in databases that agents couldn't enter.
Now committees of researchers from around the globe are attacking the problem
from the other direction. They want to make the Web more homogeneous,
more data-like, more amenable to computer understanding—and
then agents won't have to be so bright. In other words, if
Web pages could contain their own semantics—if we had
a Semantic Web—software agents wouldn't need to know
the meanings behind the words.
But in the meantime, the Web has continued to grow. By the late 1990s,
the leading search engine of the day, AltaVista, could index
only 30 percent of the Web. Searches often missed the most
salient documents, and the ranking of hits with respect to
search terms was poor. Just in time, along came Google with
a better indexing engine and vastly better relevance ranking.
While Google can match the Web's astonishing growth, can it keep
up with the expectations of its users? Someone who today says
to a search engine, in effect, "Find some good documents on
compound fractures of the ankle" will soon want to ask what
he or she really wants to know: "Who are the best orthopedic
surgeons near where I live, and are they included in my medical
coverage?"
That sort of query can never be asked of an HTML-based Web. If
we couldn't build intelligent software agents to navigate
a simplistic Web, can we really build intelligence into the
3 billion or 10 billion documents that make up the Web?
While that sounds like moving the mountain to Mohammed, to Tim Berners-Lee,
the inventor of the World Wide Web, it's not out of the question.
The first step is to get a fulcrum under the mountain and
lift it, and it is well under way. That fulcrum is the Extensible
Markup Language (XML). A sort of HTML-on-steroids, this coding
system isolates, under the hood, the dozens or even hundreds
of data elements a Web page might contain. Right now, HTML
coding serves mostly to control the appearance and arrangement
of the text and images on a Web page, so that only a few elements
are tagged, such as <title> and <b>. With new XML tags
such as <price>, a software agent might be able to
comparison shop across different Web sites, or update an
account ledger after an e-purchase.
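
To make the comparison-shopping idea concrete, here is a minimal sketch in Python of what such an agent might do once prices are tagged. Everything specific in it is an assumption made for illustration: the shop names, the <product> and <name> tags, and the currency attribute are hypothetical, and real sites would need to agree on a shared vocabulary before anything like this could work.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragments, standing in for what two shops might publish.
# The tag names and site names are invented for this example.
SHOP_PAGES = {
    "shop-a.example": (
        "<product><name>Ankle brace</name>"
        "<price currency='USD'>24.95</price></product>"
    ),
    "shop-b.example": (
        "<product><name>Ankle brace</name>"
        "<price currency='USD'>19.50</price></product>"
    ),
}


def cheapest_offer(pages):
    """Return (site, price) for the lowest <price> found across the pages."""
    offers = []
    for site, xml_text in pages.items():
        root = ET.fromstring(xml_text)
        price_el = root.find("price")  # direct child of <product>
        if price_el is None or price_el.text is None:
            continue  # skip pages that carry no usable price tag
        offers.append((site, float(price_el.text)))
    # min() raises ValueError if no page had a price; a real agent
    # would have to handle that case.
    return min(offers, key=lambda offer: offer[1])


best_site, best_price = cheapest_offer(SHOP_PAGES)
print(best_site, best_price)
```

The agent never has to guess which number on the page is the price, because the tag says so. That is the whole appeal: the intelligence moves out of the agent and into the markup.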