The Future of Search
Part Eight: We Really Can Get To Better Search

By Dr. Eric Glover, Searchme’s Classification Architect. Eric is responsible for the design and implementation of Searchme’s categories feature, a seemingly simple tool that springs from an exciting area of artificial intelligence (AI) research and development.

At Searchme, we’re very aware of the difficulties in building a web-scale automatic classification system that is fast and accurate and maps to a deep, dynamic ontology. In fact, in the previous post, we discussed how it was almost impossible.

However, as you may have guessed by now, at Searchme we are using a dynamic ontology to create our “categories” feature. Please feel free to give it a try – pick some query, choose a category, and decide for yourself how accurate our classifiers feel.

Here’s how we are able to do what many have long considered impossible:

First, we define our own ontology. This means we can easily adapt it to better match how users search the Web, as well as match what works best from a categorization standpoint. Simply put, if a category doesn’t work, we can change the rules of the game by picking a slightly different definition – one that would have fewer errors.

Second, we use complex models for classification (non-linear SVMs), as well as more complex features (not limited to bag of words). This richer set of features reduces the chance that a document with a few golf terms will be considered “golf”. A simple linear model assigns a fixed weight to the word “eagle”, independent of context, which leads to more misclassifications than a non-linear model would make. Non-linear classifiers, by contrast, let us learn subtler concepts: “eagle” together with “flying” makes “eagle” negative with respect to golf, while “eagle” together with “birdie” makes “eagle” positive for golf.
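To make the “eagle” example concrete, here is a small Python sketch. It is not our production classifier: the weights are hand-set and hypothetical, and the pairwise scorer only stands in for the kind of word interactions a non-linear model (for instance, an SVM with a degree-2 polynomial kernel) can capture. The function names are made up for illustration.

```python
from itertools import combinations

# Hypothetical, hand-set weights for illustration only (not learned values).
LINEAR_WEIGHTS = {"eagle": 1.0, "birdie": 1.0, "flying": -1.0}

# Hypothetical weights on word pairs, standing in for the interactions
# a non-linear classifier can represent.
PAIR_WEIGHTS = {
    frozenset(("eagle", "birdie")): 2.0,   # "eagle" next to "birdie" suggests golf
    frozenset(("eagle", "flying")): -2.0,  # "eagle" next to "flying" suggests the bird
}


def linear_score(words):
    """Golf score as a plain sum of per-word weights."""
    return sum(LINEAR_WEIGHTS.get(w, 0.0) for w in set(words))


def pairwise_score(words):
    """Golf score as a sum of weights over co-occurring word pairs."""
    vocab = sorted(set(words))
    return sum(PAIR_WEIGHTS.get(frozenset(p), 0.0) for p in combinations(vocab, 2))


def eagle_contribution(score_fn, context):
    """How much adding the word "eagle" changes the score in a given context."""
    return score_fn(context + ["eagle"]) - score_fn(context)


if __name__ == "__main__":
    for context in (["birdie"], ["flying"]):
        print(context,
              "linear:", eagle_contribution(linear_score, context),
              "pairwise:", eagle_contribution(pairwise_score, context))
```

In the linear scorer, adding “eagle” shifts the score by the same amount whatever else is on the page. In the pairwise scorer, the same word pushes the score up next to “birdie” and down next to “flying”, which is the context-dependent behavior described above.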

Third, we’ve incorporated technologies for rapid training. These technologies reduce the amount of data and human effort required to train a classifier, keeping it at a manageable level without sacrificing final accuracy.

All of these factors are integrated into our core production system, which we designed from the ground up with the future of search in mind. Using the ideas of dynamic ontologies, we can be agile when new categories are needed or definitions change, and with our rapid training capabilities, we can adjust in weeks or months as opposed to years.

Conclusion – Part Nine: How We’re Making Search Better Today

2 Responses to “The Future of Search
Part Eight: We Really Can Get To Better Search”

  1. Arow Says:

    WOW! “Categories” - I’m liking it! Probably the best search result I’ve ever had when looking for places to eat in Chicago - there are so many that it really does require digital assistance to pick one at times. Now I have a digital tool that can *actually* help! There are so many restaurants in Chicago that different search sites are really necessary - One for fine dining - one for happy hours - I found both of these in the Restaurant Category just searching ‘Chicago.’

    I think you can go one better, though. How about offering a ‘miscategorized page’ link or button on ‘category’ results to help tweak the algos - ain’t nothin better than pure user data, right?

    Thanks for pointing the ‘categories’ function out - I could’ve overlooked it if you hadn’t.

    It’s very well considered that results keep appearing as I keep browsing, too! It’s nice to not have to load a page just for more results. Great work here!

    Thanks!

  2. syboule Says:

    superb search engine …
    bravo
