Microsoft has just open-sourced SPTAG, a key algorithm behind Bing search that enables Bing to quickly return search results to users.
Only a few years ago, web search was simple: users entered a few keywords and browsed the results page. Today, the same user may take a picture on a phone and drop it into the search box, or ask a smart assistant a question without touching the device at all. They may also type a question and expect an actual answer instead of a page listing possible answers.
SPTAG (Space Partition Tree And Graph) is a distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index construction, search, and a distributed online service toolkit for large-scale vector search scenarios. With the SPTAG algorithm at the core of the open source Python library, Bing is able to search billions of pieces of information in milliseconds.
Of course, vector search itself is not a new idea. What Microsoft has done is apply this concept to deep learning models.
First, the team takes a pre-trained model and encodes the data into vectors, where each vector represents a word or pixel. Vector indexes are then generated using the new SPTAG library. When a query arrives, the deep learning model converts the text or image into a vector, and the library then finds the most relevant vectors in the index.
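The encode, index, and query steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPTAG's API or Bing's method: the `embed` function is a hypothetical toy stand-in for a deep learning encoder (it hashes character trigrams into a small vector), and `search` does an exact brute-force scan by cosine similarity, whereas SPTAG uses tree and graph structures to find approximate neighbors without scanning every vector. All function names here are made up for the example.

```python
import math

def embed(text, dim=16):
    """Toy encoder (hypothetical stand-in for a pre-trained model).

    Hashes character trigrams into a fixed-size vector and normalizes it
    to unit length, so a dot product equals cosine similarity.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # Deterministic bucket (avoids Python's randomized str hash).
        bucket = sum(ord(c) * (j + 1) for j, c in enumerate(trigram)) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(documents):
    """Encode every document once, ahead of query time."""
    return [(doc, embed(doc)) for doc in documents]

def search(index, query, k=1):
    """Exact nearest-neighbor search: score every vector, keep the top k.

    An ANN library replaces this linear scan with an index structure
    that visits only a small fraction of the vectors.
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, v)), doc) for doc, v in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

documents = [
    "how tall is the eiffel tower",
    "best pasta recipes for dinner",
    "identify a plant from a photo",
]
index = build_index(documents)
results = search(index, "how tall is the eiffel tower", k=2)
```

The linear scan costs one similarity computation per indexed vector for every query, which is why an index over billions of vectors needs an approximate structure to answer in milliseconds.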
Microsoft states that the SPTAG library has cataloged more than 150 billion pieces of data to date, including single words, characters, web code snippets, and full queries.
“Bing processes billions of documents every day. The idea now is to represent these entries as vectors and search this huge index of more than 100 billion vectors to find the most relevant results in 5 milliseconds.”
The Bing team expects that the open-source SPTAG library can be used to build applications that recognize languages from audio clips, or services that let users take pictures of plants and identify their genus and species.
The library is now open source and provides all the tools needed to build and search these distributed vector indexes.