Multiple Database Search

With the deluge of data in the new era, many organizations face a “silo effect”: data is stored on different platforms, resulting in standalone databases (silos) with no means of linking information between them. This is especially true in healthcare and in any customer-facing industry, where a patient’s or customer’s details are entered into several databases (often with typographical errors), so the same record is duplicated with missing or inconsistent information.

simMachines has addressed this problem by creating an advanced search algorithm that accepts databases of various shapes and layouts as inputs and still matches records (record linkage) for the patient or customer being searched for. In addition to cleartext matching, we can perform these searches on confidential data in compliance with the privacy standards mandated by the relevant organizations and government agencies.

Database Comparison Tool Demo:

Privacy of sensitive and confidential data is a prime requirement in industries such as healthcare, insurance, and government. Various privacy laws (such as HIPAA, FCRA, and ECPA) have been enacted to prohibit the disclosure or misuse of such information.

The safety of our customers’ data is essential to us, and we at simMachines have developed machine learning techniques that allow fast and flexible searches on such sensitive data without risking disclosure. Building on our proprietary querying algorithm, we can also perform classification and clustering while adhering to the privacy laws in place.

The following demo illustrates our database comparison tool’s ability to sift through medical data and find a specific person’s record in different databases, looking past errors and typographical mistakes in their particulars across platforms, while ensuring that the search is performed in a HIPAA-compliant fashion. This is achieved by converting each record into a fixed-size binary string. The conversion is one-way and is designed so that the sensitive information cannot be recreated or retrieved from the string.
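To make the idea of a fixed-size, one-way binary encoding more concrete, here is a minimal sketch in Python. It is not the simMachines conversion, which is proprietary and not described on this page. It assumes a different, well-known style of encoding: each character bigram of a record is passed through a keyed hash (HMAC-SHA-256) and sets one bit in a fixed-width bit string, so records that differ by a typo tend to share most of their bits. The names `encode_record`, `NUM_BITS`, and `SECRET_KEY`, the 256-bit width, and the sample records are all hypothetical, and nothing in this sketch by itself establishes HIPAA compliance.

```python
import hashlib
import hmac

NUM_BITS = 256            # assumed width; the real size is not stated in this article
SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def bigrams(text):
    """Return the set of character bigrams of a lowercased, space-padded string."""
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode_record(fields, num_bits=NUM_BITS, key=SECRET_KEY):
    """Map a record (a list of field strings) to a fixed-size string of 0s and 1s."""
    bits = 0
    for field in fields:
        for gram in bigrams(field):
            # The keyed hash hides the bigram; only one bit position survives.
            digest = hmac.new(key, gram.encode("utf-8"), hashlib.sha256).digest()
            position = int.from_bytes(digest[:8], "big") % num_bits
            bits |= 1 << position
    return format(bits, f"0{num_bits}b")


# Illustrative, made-up records: the second contains a typo in the name.
original = encode_record(["John Smith", "1980-02-14"])
with_typo = encode_record(["Jon Smith", "1980-02-14"])
differing_bits = sum(a != b for a, b in zip(original, with_typo))
print(len(original), differing_bits)
```

Every record maps to a string of the same length regardless of how many fields it has, which is what allows records from differently shaped databases to be compared bit by bit.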

Please note that we display the original text in the query results only for the purposes of this demo; in the real application, such information would not be revealed. Moreover, we have disabled the ability to search for arbitrary text in this demo. To start the demo, please select a predefined search term (record) from the drop-down list shown. The record will be converted to bits, and the 10 closest records matching the search will be used to predict its class. Matched records are highlighted in green.

Please visit the link at the bottom of the page to access the demo. Select a sample record from the drop-down list and click “Search”.

[Image: simMachines Sample Record Search]


The top 10 matching records across the other databases will be extracted and displayed in tabular form as shown below. Please note that the matching records do not all contain exactly the same information as the record searched for, since some have erroneous data or missing values. However, our database comparison software is able to see through those errors and find a perfect match for the patient being searched for. (All searches shown in this demo are performed in a HIPAA-compliant manner.)

[Image: simMachines Multiple Database Query]

The methodology behind this demo is that each record is stored as a binary string together with its human-readable class. Our proprietary similarity engine computes the distance between pairs of “encoded” records; if two binary strings are “similar”, it is inferred that the records they represent are also alike. Even though the user cannot access the original data, the query results still include the justification for the predicted record class.
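The search step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual similarity engine is proprietary, so the sketch substitutes plain Hamming distance (the number of differing bit positions) and a brute-force nearest-neighbor scan with a majority vote. The function names `hamming` and `predict_class`, the 8-bit strings, and the class labels are hypothetical; the demo itself uses the 10 closest records, while the toy example uses 3.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_class(query, encoded_records, k=10):
    """Return (predicted_class, neighbors) for an encoded query.

    encoded_records is a list of (bit_string, class_label) pairs. The k
    nearest pairs are returned alongside the prediction so they can serve
    as the justification for it.
    """
    neighbors = sorted(encoded_records, key=lambda rec: hamming(query, rec[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    predicted = votes.most_common(1)[0][0]
    return predicted, neighbors


# Toy data: short made-up bit strings stand in for real encoded records.
records = [
    ("11110000", "patient_A"),
    ("11100000", "patient_A"),
    ("11110001", "patient_A"),
    ("00001111", "patient_B"),
    ("00011111", "patient_B"),
]
label, evidence = predict_class("11110010", records, k=3)
print(label)  # patient_A
```

Returning the neighbors together with the label mirrors the point made above: the prediction can be justified by the matched encoded records without exposing the original text.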

Learn more about working with simMachines