The Ultimate Search Engine

By J. Nicholas Hoover

The search engine, that little browser tool into which you type a phrase, hit Enter, and hope for the best, is notoriously inefficient, often returning millions of off-the-mark URLs. People search for 11 minutes on average before finding what they’re looking for, and half abandon searches without getting that far, according to Microsoft. By Gartner’s estimate, half of potential Web sales are lost because visitors simply can’t find what they want.

Google, Microsoft, Yahoo, and dozens of search specialists, including those catering to business customers, are racing to develop next-generation technologies that do a better job of getting people the information they seek. With emerging tools, people will no longer have to dumb down their queries with the pidgin language understood by first-generation search engines. They’ll be able to ask questions in English and other languages–or pose no question at all and automatically receive results based on their earlier queries or the applications they’re using.

The results users do get will include audio and video files, PowerPoint slides and other infographics, and structured data–all in one stream of results culled from the Web, PCs, and company databases. Over time, image searches will even detect information in the image itself, rather than by parsing metadata.

Powerset, Hakia, and other companies are developing search engines that apply linguistics–the science of language–to interpret questions, analyze Web content, and, as necessary, refine results through interaction with users. Hakia CEO Riza Berkan envisions search engines becoming “knowledgeable creatures in the future if we teach them how to talk and how to understand.”

Semantic search engines parse language much like an English student does, using dictionaries and thesauri to interpret the meaning of words and link them using common rules of syntax and sentence structure. The sentence “IBM bought Tivoli for $743 million in 1996” includes concepts such as buying, buyer, subject of buy, year of buy, and purchase price.
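To make the IBM example concrete, here is a minimal sketch of pulling those concepts out of the sentence. It is not how Hakia or Powerset work: it assumes a single hand-written pattern for one sentence shape, whereas the engines described here rely on dictionaries, thesauri, and syntax rules. The names `PURCHASE_PATTERN`, `SCALES`, and `extract_purchase` are hypothetical.

```python
import re

# Hypothetical pattern covering one sentence shape only:
# "<buyer> bought <target> for $<amount> [million|billion] in <year>"
PURCHASE_PATTERN = re.compile(
    r"(?P<buyer>[A-Z][\w&.\- ]*?) bought (?P<target>[A-Z][\w&.\- ]*?)"
    r" for \$(?P<amount>[\d.,]+)(?: (?P<scale>million|billion))?"
    r" in (?P<year>\d{4})"
)

# Multipliers for the optional scale word; None means no scale word was present.
SCALES = {None: 1, "million": 10**6, "billion": 10**9}


def extract_purchase(sentence):
    """Return the buying concepts found in the sentence, or None if it does not fit."""
    match = PURCHASE_PATTERN.search(sentence)
    if match is None:
        return None
    amount = float(match.group("amount").replace(",", ""))
    return {
        "action": "buy",
        "buyer": match.group("buyer"),
        "target": match.group("target"),
        "price_usd": amount * SCALES[match.group("scale")],
        "year": int(match.group("year")),
    }


frame = extract_purchase("IBM bought Tivoli for $743 million in 1996")
print(frame)
```

Once a sentence is stored as a structure like this, a question such as "Who bought Tivoli?" can be answered by looking up the buyer field rather than by matching keywords.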

For now, the process is aided by human beings who apply language rules and define categories to narrow searches, though Hakia’s search engine can use language cues to find rough meaning in concepts it doesn’t yet understand. “If it was fully automated, we would claim we have invented a human being,” Berkan says. Web search engines like Google and Yahoo employ linguists, too, though they’re not as far along with semantic search as Hakia or Powerset. Google’s search engine can spell check and returns synonyms and variations of words, but it doesn’t always answer questions accurately.

The technology of enterprise search company Autonomy powers the Federal Preservation Institute’s Historic Preservation Learning Portal, a gateway to documents on preservation rules and methods. The institute uses semantic search to help nonexperts find information. “This allows them to ask in plain language questions that do not have the technical lingo that keywords may have,” says Constance Ramirez, the institute’s director. For example, a site visitor may ask about the preservation of red tile roofs in California. “It’s really fascinating to see all the kinds of things that come back as relevant,” says Ramirez.

IBM is working on specialized text analysis in fields such as health care and government. Customers use its OmniFind Analytics search engine to determine nuances like sentiment–whether a document reflects negatively or positively on a subject–and define and relate specialized words, concepts, and proper nouns used inside a company.

QUERYLESS SEARCH
Serendipity is an amazing teacher. Search engines under development will be able to conduct searches on your behalf, based on your previous queries and without being prompted. Or they might search in the background, using the context in a Word document or Excel spreadsheet to serve up related information. Apple’s iTunes program does something like that now, displaying related music at the iTunes store when a listener plays a track from the hard drive. Getting that right isn’t easy. “Serendipity is the hardest thing to do for search,” IDC analyst Susan Feldman says. It’s computationally intensive, and designing the interface isn’t easy, she says.

MediaRiver developed a downloadable search tool, called Watson, that used information in a Web browser or PC application to search the Web and return results without a user-initiated query. It was a great product, but not a great business, says MediaRiver CEO Al Wasserberger. Instead, Watson got a second life in MediaRiver’s ClickSurge widgets, which determine important concepts on a Web page and embed relevant links elsewhere on the page. A similar product, Blinkx’s Pico, has been relegated to the back pages of its Web site while Blinkx focuses on video search.

Still, queryless search has a promising future. Google and Yahoo have long offered alerts, letting users subscribe to searches, then get e-mails when new results show up. Browser toolbar buttons such as StumbleUpon and Google Dice use Web history to send users to recommended sites with the click of a mouse. Yahoo’s Y!Q service and Mozilla Firefox both include the capability to highlight a word or phrase on a Web page, then launch a search with a right click.

Yahoo’s offering gives more weight to the context of the page where a search originated. So a search on “Florida Gators” coming from a page on college football wouldn’t return results about reptiles in the Everglades.
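The idea of weighting a query by the page it came from can be sketched in a few lines. This is an illustration only, not Yahoo's method: it assumes a tiny hand-built table of cue words per meaning (`SENSES`) and simply counts how many cue words appear on the originating page. The table and the function name `pick_sense` are hypothetical.

```python
import re

# Hypothetical sense inventory: each meaning of a query lists words that tend to
# appear on pages about that meaning.
SENSES = {
    "florida gators": {
        "college football team": {"football", "college", "team", "season", "coach", "stadium"},
        "alligators": {"reptile", "reptiles", "everglades", "swamp", "wildlife", "habitat"},
    }
}


def pick_sense(query, page_text):
    """Pick the meaning whose cue words overlap most with the originating page."""
    context = set(re.findall(r"[a-z]+", page_text.lower()))
    senses = SENSES.get(query.lower(), {})
    if not senses:
        return None
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no overlap at all, the page gives no basis for choosing a meaning.
    return best if scores[best] > 0 else None


page = "College football rankings: every team and coach this season"
print(pick_sense("Florida Gators", page))
```

A production system would presumably learn the cue words from data and blend this score with ordinary relevance ranking instead of choosing a single meaning outright.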

PERSONALIZATION
The term “civil war” conjures different things to different people–a defining point in American history, strife in Liberia, a song by Guns N’ Roses. The more a search engine knows about the searcher, the more it can make educated guesses about the searcher’s intent.

Google’s personalized iGoogle pages represent the company’s fastest-growing product by number of new users. Google learns what users want and pushes it to them through RSS feeds and “gadgets.” Alternatively, users can set up a Recommendations tab on Google’s home page that pre-populates with information based on their previous searches.

Users with Google accounts have the opportunity to save every search they’ve ever made. That leads to what Sep Kamvar, Google’s technical lead for personalization, calls “query disambiguation.” For example, if someone’s interested in computers and regularly searches for “Apple,” she’s more interested in the company than the fruit. Archived data gives Google the power to make recommendations through a browser toolbar button, an iGoogle tab, or the Web History page.

The wealth of search-related information being stored in Google’s databases has opened up privacy concerns, which have kept competitors like Yahoo from diving deeper into personalization. Google argues that transparency in how it uses historical search data is the key to avoiding user backlash. “If we use something that you search, we want to show it to you and allow you to change it,” Kamvar says.

Personalization can work in business environments, too. For example, administrators of Vivisimo’s search products can assign higher value to HR documents for recruiters than, say, salespeople. “One of the advantages on an intranet is that people don’t need to be anonymous,” says Mike Moran, a distinguished engineer and product manager for IBM’s OmniFind search platform, which comes in four editions: enterprise, analytics, one that adds contextual links to results, and a free version co-sponsored by Yahoo. Graeme McCracken, COO of the search unit at publisher Reed Business, notes that personalization works best for frequent users, not the occasional Web site visitor.

SOCIAL SKILLS
From the earliest days of the Web, search has had a social aspect. Yahoo started as a list of links to sites the company’s founders thought were interesting. Google’s PageRank algorithm is based partially on the number of links pointing to a page from others on the Web. With the onset of Web 2.0, search engines are pushing social search further with concepts like social bookmarking and tagging as well as shared searches and search systems that get better as more people use them. Part of Yahoo’s strategy will be to differentiate by way of social features, says Yahoo search VP Tim Mayer. Yahoo Answers–human answers to Web queries–recently began showing up alongside regular search results. Yahoo’s acquisition of social bookmarking site Del.icio.us points to more possibilities, such as making social bookmarking a standard feature on Yahoo.

Microsoft, which already has a shared search feature called Collections that lets people share annotated maps, is looking at ways to implement visual, user-generated “tag clouds” on a Web scale, says Satya Nadella, senior VP of search and advertising. Enterprise search company Vivisimo is testing a feature that lets employees tag, rate, categorize, and annotate search results. Connectbeam sells tagging and social bookmarking technology as a layer on top of other enterprise search products.

Tag clouds and social bookmarks have their limitations. Too many tags lower search reliability, while too few can result in vast buckets of related information, says Autonomy CEO Mike Lynch. Google senior engineer Matt Cutts, who heads up the company’s Web spam team, says tagging and social bookmarking are massive targets for spammers and search engine optimization abusers.

Nevertheless, Google’s forging ahead with social search. On iGoogle, “magic tabs” present a menu of gadgets and feeds deemed relevant to a search query–the word “travel,” for example–based on the tabs other Googlers have created. “I love this algorithm because it gives you gadgets that don’t have the word ‘travel’ in them but are clearly useful,” Kamvar says.

Collarity takes the concept a step further with “collaborative filtering” in its Relevance Engine, used by FoxNews.com. If someone searches “Iraq” on the site, the engine gives a list of recommended links and ads based on what other people have done after searching for Iraq. However, only the users who spend a lot of time on Iraq news have their browsing habits entered into the recommendation engine because they represent people with a high level of interest and, presumably, more knowledge. “We think what’s important is to find the people in the room with the most intelligence reflecting the question you have,” says Rob Rustad, Collarity’s director of marketing.

RESULTS ORIENTED
“Who said an edit box and 10 blue links is what search is?” asks Microsoft’s Nadella. It’s a good question, but one that becomes less relevant in the new world of search. Search results are being expressed in new ways, from automated clustering and categorization to actual answers to questions. Type “Seattle traffic” in Microsoft’s Live Search and a map pops up with highways color coded by how fast traffic is moving. Likewise, type “Abraham Lincoln’s birthday” in Google and the first result shows the actual date–Feb. 12, 1809–above a list of related URLs.

Vivisimo, which also runs a consumer search engine called Clusty, reads through the text of Web pages and creates categories on the fly from the top 200 returned documents by using semantic understanding. Vivisimo’s Clustering Engine determines that concepts such as “pretty” and “gorgeous” are related, then groups results into categories based on such commonalities. “Themes help people contextualize data and give them some kind of framework for how information is organized,” says Rebecca Thompson, the company’s VP of marketing.
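As a rough illustration of grouping results by related concepts, the sketch below maps words such as "pretty" and "gorgeous" to a shared label and buckets documents under it. It is a deliberately simplified stand-in and not Vivisimo's Clustering Engine: it assumes a small hand-written synonym table (`SYNONYMS`), whereas the product is described as deriving categories on the fly from the returned documents.

```python
import re
from collections import defaultdict

# Hypothetical synonym table: words that should land in the same category.
SYNONYMS = {
    "pretty": "attractive",
    "gorgeous": "attractive",
    "beautiful": "attractive",
    "cheap": "affordable",
    "inexpensive": "affordable",
}


def cluster(documents):
    """Group documents by the concepts their words map to."""
    clusters = defaultdict(list)
    for doc in documents:
        words = re.findall(r"[a-z]+", doc.lower())
        concepts = {SYNONYMS[word] for word in words if word in SYNONYMS}
        # Documents with no known concept fall into a catch-all bucket.
        for concept in concepts or {"other"}:
            clusters[concept].append(doc)
    return dict(clusters)


results = [
    "Pretty beaches in Florida",
    "Gorgeous mountain views",
    "Cheap flights to Seattle",
    "Seattle traffic",
]
print(cluster(results))
```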

Computer-generated clustering is especially important in business environments, where users can’t rely on how popular a site is to provide a sense of relevance. Like Vivisimo, Endeca performs automatic categorization, using “guided navigation,” based on the theory that people aren’t always searching for something specific but instead are looking to discover something they don’t explicitly know how to ask for.

Home Depot’s Endeca-powered Web site shows how that works in practice. A search for “fridge” generates buckets like category, price, and brand, each of which can be narrowed. The categories are populated based on metadata about each item. “The future vision is where information summarizes out to the way you want it to look,” says Matt Eichner, Endeca’s VP of strategic development and marketing.
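The buckets in the "fridge" example can be sketched as counts computed from item metadata, plus a filter that narrows the list when a shopper picks a value. This is a minimal sketch, not Endeca's implementation; the catalog, the price bands, and the names `PRODUCTS`, `price_band`, `facet_counts`, and `narrow` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical catalog entries returned for a search on "fridge".
PRODUCTS = [
    {"name": "Model A", "category": "Top-freezer", "brand": "Acme", "price": 499},
    {"name": "Model B", "category": "French-door", "brand": "Acme", "price": 1499},
    {"name": "Model C", "category": "French-door", "brand": "Borealis", "price": 899},
    {"name": "Model D", "category": "Side-by-side", "brand": "Borealis", "price": 1099},
]


def price_band(price):
    """Map a price to a display bucket (the band edges are arbitrary)."""
    if price < 500:
        return "Under $500"
    if price < 1000:
        return "$500-$999"
    return "$1,000 and up"


def facet_value(item, facet):
    """Read the bucket an item falls into for a given facet."""
    return price_band(item["price"]) if facet == "price" else item[facet]


def facet_counts(items, facets=("category", "brand", "price")):
    """Count how many items fall into each bucket of each facet."""
    counts = defaultdict(Counter)
    for item in items:
        for facet in facets:
            counts[facet][facet_value(item, facet)] += 1
    return {facet: dict(counter) for facet, counter in counts.items()}


def narrow(items, **selected):
    """Keep only the items matching every selected facet value."""
    return [
        item for item in items
        if all(facet_value(item, facet) == value for facet, value in selected.items())
    ]


print(facet_counts(PRODUCTS))
print(narrow(PRODUCTS, brand="Acme", category="French-door"))
```

Recomputing `facet_counts` on the narrowed list gives the updated buckets a shopper would see after each selection.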

Factiva searches use technology from Fast Search & Transfer to find everything published on blogs and media sites about a brand, categorize coverage as favorable or unfavorable, quantify it, and plot a line graph that shows how perceptions change over time.

Another early example of using a search engine to gather new knowledge is Google Trends, a Google Labs project that will show searchers, say, that interest in Lake Tahoe and skiing spikes about the same time. “What if computers could understand more about the world?” Cutts asks. “If you solve that, you can really understand more about what people are searching for.”

MULTIFACETED
Today’s Web search engines can sift through HTML files, PDFs, Office files, and audio, video, and image metadata. Tomorrow’s engines will search images, audio, and video directly–without relying on metadata–and include them with other results. “You’re not going to see separate systems for audio, video, and text,” says Autonomy CEO Lynch.

Google’s universal search is an early start in this direction, though the relevance models for different data types don’t always gel. Other signs of progress: Autonomy’s technology can detect scene changes and divvy video into searchable chapters. And Autonomy, Sonic Foundry, and Nexidia can all search the voice tracks of video or audio.

Like.com, which sells clothing and accessories, is an example of where image search is heading. A Likeness Search feature at the site gives users sliding scales to designate preferences for color, shape, and pattern. Microsoft and Google have both developed technology that can search for faces.

Still, image search is far from being on an equal footing with text, says IBM’s Moran. People will be adding text tags to images and videos for a long time before search engines get good at looking at pictures and describing them in words.

Yet search innovations keep coming–driven largely by necessity. As petabyte after petabyte of information accumulates on the Web and in corporate databases, the tools for finding what we need must change, too.

Source: InformationWeek
