Similarity Search
A similarity search engine lets you perform "fuzzy" data matching and pattern recognition tasks such as classification and clustering. It is a kind of "NoSQL database" in which you search by a proximity criterion: instead of retrieving exact key matches, you define a function that measures how similar two objects are, and the database returns the k objects closest to a given query. This fundamental operation opens up a wide array of possibilities.
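To make the core operation concrete, here is a minimal Python sketch of a k-nearest-objects query driven by a user-defined function. It is a brute-force scan for illustration only and says nothing about how our engine is implemented; the names k_nearest and euclidean and the sample points are assumptions made up for this example.

```python
import heapq
import math


def euclidean(a, b):
    """Example distance function: smaller values mean more similar objects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, objects, distance, k):
    """Return the k objects closest to `query` as (distance, object) pairs.

    Brute force: every object is compared with the query, so the cost
    grows linearly with the size of the collection.
    """
    scored = ((distance(query, obj), obj) for obj in objects)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


# Illustrative data only (hypothetical 2-D points).
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
result = k_nearest((1.0, 1.0), points, euclidean, k=2)
nearest_objects = [obj for _, obj in result]
```

Any function that scores a pair of objects can be passed in place of euclidean, for example an edit distance for strings. A dedicated engine exists to avoid the full scan that this sketch performs on every query.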
We provide the world's fastest similarity search engine and the know-how necessary to help you exploit the exciting possibilities that this technology brings. We have benchmarked our engine against major techniques and several experts have reviewed our results.
Application Domains
Similarity search (also known as nearest neighbor search) is a common operation that can be applied in virtually every data science project. Two broad application categories exist:
"Fuzzy" Matching
Similarity search data structures let you retrieve objects even when your query does not exactly match a database record. When you search complex information such as images, audio, factory test data, and stock prices, inherent variations will prevent you from finding an exact match. This is not a problem for our engine, because it allows fuzzy matches. We provide a breakthrough technology that lets you search for similar data in very large data sets. For example, our R-01 engine can answer a query in 20 milliseconds on a data set of 120 million objects.
Pattern Recognition / Prediction
By examining historical data, a similarity engine can help you classify or predict events based on objects that are similar to your query. For example:
- Illegal monetary transactions: by searching for previously known illegal events, it is possible to identify whether a transaction is potentially inappropriate.
- Product quality assessment: based on historical measurements, it is possible to predict whether a product is going to fail within a certain amount of time. Our technology supports high-dimensional spaces, so many properties can be analyzed at the same time.
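One common way to turn similar historical objects into a prediction is a majority vote among the k nearest labeled records. The Python sketch below illustrates that idea under stated assumptions: the records, feature choices, labels, and the name knn_classify are all hypothetical, and the brute-force scan stands in for a real similarity engine.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, labeled_history, k=3):
    """Predict a label for `query` by majority vote among its k nearest
    historical records. Each record is a (features, label) pair."""
    nearest = heapq.nsmallest(
        k, labeled_history, key=lambda record: euclidean(query, record[0])
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up records: (amount in thousands, transfers in the last hour).
history = [
    ((0.2, 1.0), "legitimate"),
    ((0.5, 2.0), "legitimate"),
    ((0.3, 1.0), "legitimate"),
    ((9.5, 14.0), "illegal"),
    ((8.7, 12.0), "illegal"),
    ((9.9, 15.0), "illegal"),
]

suspicious = knn_classify((9.0, 13.0), history, k=3)
ordinary = knn_classify((0.4, 1.0), history, k=3)
```

The same pattern applies to product quality assessment: the feature vector would hold historical measurements and the label would record whether the product failed. In practice, features on different scales usually need to be normalized before distances are meaningful.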