The Future of Search
Part Eight: We Really Can Get To Better Search

Thursday, May 1st, 2008

By Dr. Eric Glover, Searchme’s Classification Architect. Eric is responsible for the design and implementation of Searchme’s categories feature, a seemingly simple tool that springs from an exciting area of artificial intelligence (AI) research and development.

At Searchme, we’re very aware of the difficulties in building a web-scale automatic classification system that is fast and accurate and maps to a deep, dynamic ontology. In fact, in the previous post we discussed why building one is almost impossible.

However, as you may have guessed by now, at Searchme we are using a dynamic ontology to create our “categories” feature. Please feel free to give it a try – pick a query, choose a category, and decide for yourself how accurate our classifiers are.

Here’s how we are able to do what many have long considered impossible:

First, we define our own ontology. This means we can easily adapt it to match both how users search the Web and what works best from a categorization standpoint. Simply put, if a category doesn’t work, we can change the rules of the game by picking a slightly different definition – one that produces fewer errors.
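One way to picture “changing the rules of the game” is a rule that folds any category whose classifier is too error-prone into its broader parent. The sketch below assumes the ontology is a simple child-to-parent map and that a per-category error rate has been measured on held-out data; the `coarsen` function, the category names, the error rates, and the 10% threshold are all hypothetical, and the post does not say how Searchme actually redefines its categories.

```python
from typing import Dict, Optional


def coarsen(parent: Dict[str, Optional[str]],
            error_rate: Dict[str, float],
            max_error: float) -> Dict[str, str]:
    """Map every category to itself or to the nearest ancestor that is
    accurate enough to report (hypothetical coarsening rule).

    parent maps each category to its parent; top-level categories map to None.
    Categories missing from error_rate are treated as error-free.
    """
    def resolve(cat: str) -> str:
        # Walk up the tree while this category is too error-prone
        # and a broader parent still exists to fall back to.
        while error_rate.get(cat, 0.0) > max_error and parent[cat] is not None:
            cat = parent[cat]
        return cat

    return {cat: resolve(cat) for cat in parent}


# Hypothetical ontology fragment and held-out error rates.
ontology = {
    "Sports": None,
    "Sports/Golf": "Sports",
    "Sports/Golf/Scoring": "Sports/Golf",
    "Animals": None,
    "Animals/Birds": "Animals",
    "Animals/Birds/Eagles": "Animals/Birds",
}
errors = {"Sports/Golf/Scoring": 0.30, "Sports/Golf": 0.05, "Animals/Birds/Eagles": 0.04}

mapping = coarsen(ontology, errors, max_error=0.10)
print(mapping["Sports/Golf/Scoring"])   # Sports/Golf
print(mapping["Animals/Birds/Eagles"])  # Animals/Birds/Eagles
```

Because the ontology is owned in-house, a fix like this is a data change rather than a new taxonomy: the weak “Scoring” node simply reports as “Golf” until a better definition is found.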

Second, we use complex models for classification (non-linear SVMs, or support vector machines), as well as richer features (not limited to bag of words). This richer feature set reduces the chance that a document containing a few golf terms will be classified as “golf”. A simple linear model assigns the word “eagle” a fixed weight, independent of context, so it makes more misclassifications than a non-linear model. Non-linear classifiers, by contrast, can learn subtler concepts: for example, that “eagle” together with “flying” makes “eagle” count against golf, whereas “eagle” together with “birdie” makes it count toward golf.
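The “eagle” example can be made concrete with a toy kernel classifier. As a stand-in for a non-linear SVM, the sketch below uses a kernel perceptron, a simpler kernel method that fits in a few lines with no libraries. Like an SVM it scores a document as a weighted sum of kernel values against training documents. With the polynomial kernel (1 + a·b)², the expansion contains products of word pairs such as eagle × birdie, which is what lets the effect of “eagle” depend on its neighbors. With degree 1 the model is linear, so adding “eagle” always shifts the score by the same amount. The vocabulary, training documents, and function names are hypothetical.

```python
from typing import Callable, List, Set

VOCAB = ["eagle", "birdie", "flying"]  # hypothetical toy vocabulary


def vectorize(words: Set[str]) -> List[int]:
    """Binary bag-of-words vector over VOCAB."""
    return [1 if w in words else 0 for w in VOCAB]


def poly_kernel(degree: int) -> Callable[[List[int], List[int]], int]:
    """K(a, b) = (1 + a.b) ** degree; degree 1 gives a linear model with a bias."""
    def k(a: List[int], b: List[int]) -> int:
        return (1 + sum(x * y for x, y in zip(a, b))) ** degree
    return k


def train(docs, labels, kernel, max_epochs=100):
    """Kernel perceptron: alphas[i] counts the mistakes made on training doc i."""
    xs = [vectorize(d) for d in docs]
    alphas = [0] * len(xs)

    def score(x: List[int]) -> int:
        return sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, labels, xs))

    for _ in range(max_epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, labels)):
            if y * score(x) <= 0:  # wrong side of (or on) the decision boundary
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training doc is classified correctly
            break
    return lambda words: score(vectorize(words))


def eagle_effect(model, context: Set[str]) -> int:
    """How much adding 'eagle' to a document changes its golf score."""
    return model(context | {"eagle"}) - model(context)


docs = [{"eagle", "birdie"}, {"eagle", "flying"}, {"birdie"}, {"flying"}]
labels = [1, -1, 1, -1]  # +1 = golf, -1 = not golf

linear = train(docs, labels, poly_kernel(1))
quadratic = train(docs, labels, poly_kernel(2))

print(eagle_effect(linear, {"birdie"}), eagle_effect(linear, {"flying"}))        # 0 0
print(eagle_effect(quadratic, {"birdie"}), eagle_effect(quadratic, {"flying"}))  # 2 -2
```

The linear model gives “eagle” the same contribution in every context (here zero, since “birdie” and “flying” alone separate the toy data). The quadratic model makes “eagle” push toward golf next to “birdie” and away from golf next to “flying”. A production SVM would add margin maximization and far richer features, but the role of the kernel is the same.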

Third, we’ve incorporated technologies for rapid training. These technologies reduce the amount of data and human effort required to train a classifier, keeping it at a manageable level without sacrificing final accuracy.
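The post does not name the rapid-training technologies it uses. One widely used way to reduce labeling effort is active learning with uncertainty sampling: instead of labeling documents at random, a person labels the documents the current classifier is least sure about, which are those whose scores lie closest to the decision boundary. The minimal sketch below assumes a classifier that outputs a signed score (positive means in-category). The `pick_for_labeling` function, the toy weights, and the document pool are hypothetical.

```python
from typing import Callable, Dict, List


def pick_for_labeling(unlabeled: List[str],
                      score: Callable[[str], float],
                      budget: int) -> List[str]:
    """Return the `budget` documents whose scores are closest to 0,
    i.e. the ones the current model is least certain about."""
    return sorted(unlabeled, key=lambda doc: abs(score(doc)))[:budget]


# Hypothetical weights from a partially trained "golf" classifier.
WEIGHTS: Dict[str, float] = {"birdie": 2.0, "putt": 1.5, "flying": -2.0, "eagle": 0.1}


def toy_score(doc: str) -> float:
    """Linear score: sum of the weights of the words in the document."""
    return sum(WEIGHTS.get(word, 0.0) for word in doc.split())


pool = ["birdie putt", "eagle", "flying eagle", "eagle birdie", "nest"]
picked = pick_for_labeling(pool, toy_score, budget=2)
print(picked)  # ['nest', 'eagle']
```

In an active-learning loop, the picked documents are labeled by a person, the classifier is retrained, and the selection repeats. Each round spends human effort where the model is weakest rather than on documents it already handles, such as “birdie putt”.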

All of these factors are integrated into our core production system, which we designed from the ground up with the future of search in mind. Using the ideas of dynamic ontologies, we can be agile when new categories are needed or definitions change, and with our rapid training capabilities, we can adjust in weeks or months as opposed to years.

Conclusion – Part Nine: How We’re Making Search Better Today