April 3, 1998
Study Finds Search Engines Cover Only a Small Portion of the Web
By THOMAS E. WEBER
Staff Reporter of THE WALL STREET JOURNAL
Take a massive phone book, and tear out most of the pages. What you have is a lot like the listings provided by Web search engines.
Even the most thorough search engine manages to find only about a third of the pages on the Web, according to a new study published Friday in the journal Science. And other popular search sites cover 10% or less of the electronic universe. That leaves page after page floating out there, somewhere, unreachable by anyone lacking the specific Web address.
"I don't think people realize how little coverage of the Web the search engines provide," says C. Lee Giles, co-author of the study and a scientist at NEC Corp.'s research lab in Princeton, N.J. "I was quite surprised."
With good reason. Search engines are best known for turning up too much information, not too little -- often responding to simple queries with tens of thousands of Web pages. Now the study raises the sobering prospect that the one page you need might not be among those thousands and there may be no way to find it.
If that's bad news, here is worse: As millions of pages are added to the Web each year, finding a specific piece of information will only get tougher. Unless search tools are improved, some users may get frustrated and give up. That could call into question not only the business of search engines -- among the hottest stocks in the on-line industry -- but also the continued prominence of the Web itself.
HotBot, which emerged as the most comprehensive search engine among the six included in the study, indexes only about 34% of the estimated 320 million pages on the Web. The worst of the six: Lycos, with a paltry 3% coverage.
Those findings are riling search-engine executives -- especially the ones whose sites came in low in the rankings.
"Quite frankly, I don't give these kinds of reports a lot of credence. Our focus is not on quantity, it's on quality," says Rajive Mathur, senior product manager at Lycos Inc. Graham Spencer, chief technology officer at Excite Inc., takes a similar tack. "It's becoming less and less meaningful to measure the size of the Web."
David Pritchard, marketing director for search services at Wired Ventures Inc.'s HotBot, said: "We're the largest index out there -- there are no surprises for us in this report."
To understand the NEC researchers' findings, you need to know how search engines work. Using automated software robots, search engines follow links across the Web, calling up pages wherever they can find them. Once a page has been summoned, the search engine automatically indexes some or all of the words on the page. Then, when a Web surfer punches in words to search, the search engine looks up those words in its index and calls up the appropriate Web-page addresses. The engine doesn't actually go out on the Web each time a search is requested.
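The crawl, index and look-up steps described above can be sketched in a few lines of Python. This is only a toy illustration, not any search company's actual software: the miniature "Web" below, its example addresses and the function names crawl, build_index and search are all invented for the purpose. It also shows why a page that nothing links to never makes it into the index.

```python
from collections import defaultdict, deque

# Hypothetical miniature "Web": address -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("search engines index the web", ["a.example/news"]),
    "a.example/news": ("web news and stock quotes", []),
    "b.example/lost": ("rare web document", []),  # no page links here
}

def crawl(web, seeds):
    """Follow links breadth-first from the seed addresses; return pages reached."""
    seen, queue = set(), deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or url not in web:
            continue
        seen.add(url)
        queue.extend(web[url][1])  # schedule every outgoing link
    return seen

def build_index(web, urls):
    """Map each word to the set of crawled pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in web[url][0].lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Look the query words up in the index only; the Web itself is not revisited."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())  # keep pages containing every word
    return hits

crawled = crawl(TOY_WEB, ["a.example/home"])
index = build_index(TOY_WEB, crawled)
print(sorted(search(index, "web")))   # ['a.example/home', 'a.example/news']
print(sorted(search(index, "rare")))  # [] because the unlinked page was never crawled
```

The third page exists on the toy Web, but because no crawled page links to it, a search for its contents comes back empty.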
In the NEC study, Mr. Giles and an NEC colleague, Steve Lawrence, performed hundreds of searches, duplicating them on each search site. They then looked to see where the results overlapped and where they didn't, and from that they extrapolated the overall size of the Web. To figure out each engine's coverage area, they simply compared the total number of pages with how many pages each engine indexes, a figure the search services usually disclose.
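One simple way to turn overlap into a size estimate is sketched below. It is an assumption-laden illustration, not necessarily the researchers' exact procedure: it treats two engines as independent samples of the Web, so the share of one engine's results that the other also returns approximates the other's coverage. The function name estimate_web_size and the sample data are hypothetical.

```python
def estimate_web_size(results_a, results_b, index_size_a):
    """Estimate total Web size from the overlap between two engines' results.

    Assumes engines A and B sample the Web independently, so the share of
    B's results that A also returns approximates A's coverage of the Web.
    Returns (estimated_web_size, estimated_coverage_of_a).
    """
    a, b = set(results_a), set(results_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between result sets; cannot estimate")
    coverage_a = overlap / len(b)
    # Web size = index size of A / coverage of A, rearranged to avoid rounding.
    web_size = index_size_a * len(b) / overlap
    return web_size, coverage_a

# Hypothetical data: A returns 4 pages, B returns 8, and 2 appear in both.
engine_a = ["p1", "p2", "p3", "p4"]
engine_b = ["p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"]
size, coverage = estimate_web_size(engine_a, engine_b, index_size_a=100)
print(size, coverage)  # 400.0 0.25
```

In this made-up case, engine A finds a quarter of what engine B finds, so an index of 100 pages would imply a Web of roughly 400. The estimate is only as good as the independence assumption; engines that favor the same popular pages would overlap more and make the Web look smaller than it is.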
Can search engines do better? The researchers say it is probably impossible to index the entire Web. And without more sophisticated tools to rank search results, such a gargantuan database wouldn't be much help.
The very futility of building giant databases will likely result in a trend toward smaller, specialized search sites, they say. For instance, HotBot now runs a special search engine called NewsBot that constantly patrols a fixed set of news-related sites. By concentrating on specific tasks, these search engines could be smarter and more thorough within their own field.
Some services avoid the entire issue by only providing highlights of the Web. Yahoo! Inc., one of the most widely used directories, isn't a search engine at all -- it employs human researchers to sift through sites and list them under appropriate categories instead of indexing every page in the electronic universe.
For now, though, users hunting down an elusive piece of information would do well to follow one piece of advice from the researchers: Don't limit yourself to a single search site. "Using multiple search engines is one method of doing a more thorough search, especially if you're searching for less popular documents," says NEC's Mr. Lawrence.
In fact, when the researchers ran a search through all six sites they studied, they found on average 3 1/2 times as many results as the average search using just one site. But even then, the combined search covers less than 60% of the Web pages that the researchers believe are out there.
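Combining engines amounts to merging their result lists and dropping duplicates. The sketch below shows the idea with made-up data; the function name combined_search and the engine labels are hypothetical, and a real tool would also have to query each site and reconcile differently formatted addresses.

```python
def combined_search(results_by_engine):
    """Merge result lists from several engines, keeping each address once."""
    merged, seen = [], set()
    for urls in results_by_engine.values():
        for url in urls:
            if url not in seen:  # skip pages another engine already returned
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical results for the same query on two engines.
results = {
    "engine_one": ["a.example/1", "a.example/2"],
    "engine_two": ["a.example/2", "b.example/3"],
}
print(combined_search(results))  # ['a.example/1', 'a.example/2', 'b.example/3']
```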
Users would also do well to construct narrow, focused search queries, the researchers say. "In practice a lot of users tend to search on just one or two words," Mr. Lawrence says. That kind of broad search will probably call up thousands of entries with little assurance that the information you're looking for will be anywhere near the top.
One Web surfer who doesn't plan to follow their advice is Louis Monier, technical director for the AltaVista search engine owned by Digital Equipment Corp. Mr. Monier claims he never needs to use any search engine but his own. "I usually find what I need," he says. "I usually find too much."
Search Engines and Directories Mentioned in This Article:
HotBot
Lycos
Excite
Yahoo!
NewsBot
AltaVista
Copyright © 1998 Dow Jones & Company, Inc. All Rights Reserved.