simMachines provides a state-of-the-art similarity engine that outperforms other approaches by a large margin in terms of speed and precision while providing the justification behind each prediction. The advances we have made allow similarity to be used in applications previously believed to be unfeasible, paving the way for a variety of unique use cases.
Our technology can use any distance function – metric or non-metric. This allows us to handle data in its native form, which significantly improves accuracy and lets us easily handle both structured and unstructured data.
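To make the idea of a pluggable distance function concrete, here is a minimal Python sketch of a brute-force nearest-neighbor search that accepts any callable as its distance. It is an illustration only and does not represent the simMachines engine or its API; the names `k_nearest`, `manhattan`, and `squared_euclidean` are hypothetical. Manhattan distance is a metric, while squared Euclidean distance is a common non-metric example because it violates the triangle inequality.

```python
import heapq
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def k_nearest(
    query: Vector,
    items: Iterable[Vector],
    distance: Callable[[Vector, Vector], float],
    k: int = 1,
) -> List[Tuple[float, Vector]]:
    """Return the k items closest to `query` as (distance, item) pairs, nearest first.

    The search itself knows nothing about the distance: any callable that
    maps two items to a number can be plugged in.
    """
    scored = ((distance(query, item), item) for item in items)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance: a true metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared L2 distance: non-metric, since it can violate the triangle inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Illustrative data only.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (5.0, 0.5)]
query = (1.0, 0.0)

nearest_l1 = k_nearest(query, points, manhattan, k=2)
nearest_sq = k_nearest(query, points, squared_euclidean, k=2)
```

A linear scan like this works with any distance but costs time proportional to the database size, which is the cost a similarity index is designed to avoid.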
We use a dynamic dimension reduction technique to identify the variables required to make an accurate prediction. As a result, our engine is able to provide the input variables that most strongly influenced the prediction, providing transparency to the machine learning process. Our tools can be used by themselves or as part of an ensemble, where we can bring transparency to other machine learning models.
In the following experiment, our engine (R-01) is compared to the permutation index (Perm). We created a database of up to 120 million strings of 20 characters each and, using the Hamming distance, measured the time our index takes to answer a query as the database size increases. The results speak for themselves:
Our engine is capable of answering queries at very high speeds in large databases, with query time remaining virtually constant regardless of size or dimensionality. For more information, please read the white paper that benchmarks our engine against top similarity search data structures.
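For readers who want a feel for the experimental setup described above, the following Python sketch computes the Hamming distance between 20-character strings and times a naive linear scan as the database grows. It is a baseline for intuition only, not the R-01 engine or the permutation index. The lowercase alphabet, the random seeds, the database sizes, and the helper names (`hamming`, `random_strings`, `linear_scan`, `time_query`) are all assumptions made for illustration.

```python
import random
import string
import time
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def random_strings(n: int, length: int = 20, seed: int = 0) -> List[str]:
    """Generate n random strings (assumed alphabet: lowercase ASCII letters)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=length)) for _ in range(n)]


def linear_scan(query: str, database: List[str]) -> str:
    """Brute-force nearest neighbor: compare the query against every entry."""
    return min(database, key=lambda s: hamming(query, s))


def time_query(n: int) -> float:
    """Seconds taken by one linear-scan query over a database of n strings."""
    database = random_strings(n)
    query = random_strings(1, seed=1)[0]
    start = time.perf_counter()
    linear_scan(query, database)
    return time.perf_counter() - start


# Small sizes chosen so the sketch runs quickly; the experiment above goes to 120 million.
timings: Dict[int, float] = {n: time_query(n) for n in (1_000, 10_000, 100_000)}
for n, seconds in timings.items():
    print(f"{n:>7} strings: {seconds:.4f} s per query")
```

With a linear scan, query time is expected to grow roughly in proportion to the number of strings, which gives a reference point for the near-constant query time reported for the index.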