Ramiel is a similarity search engine that provides a type of query not available in traditional databases: it finds the k closest elements under an arbitrary similarity criterion. Our technology is the fastest available, and Ramiel combines this performance with ease of use. The engine is a natural choice for anyone who needs to handle large datasets, the goldmines of our time.
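To make the query type concrete, here is a minimal brute-force sketch of a "closest k elements" search with a pluggable similarity criterion. It is not Ramiel's algorithm or API: the names `k_nearest` and `jaccard_distance` are hypothetical, and the exhaustive scan only illustrates what the query returns, not how the engine computes it.

```python
import heapq
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def k_nearest(
    query: T,
    items: Iterable[T],
    distance: Callable[[T, T], float],
    k: int,
) -> List[Tuple[float, T]]:
    """Return the k items closest to `query`, closest first.

    Brute force: every item is compared against the query once.
    """
    scored = ((distance(query, item), item) for item in items)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


def jaccard_distance(a: set, b: set) -> float:
    """One possible similarity criterion: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    people = [{"clerk", "married"}, {"clerk", "married", "hs-grad"}, {"sales"}]
    for dist, person in k_nearest({"clerk", "married"}, people, jaccard_distance, k=2):
        print(round(dist, 3), sorted(person))
```

Any function that takes two objects and returns a number can be passed as `distance`, which is what "arbitrary similarity criterion" means in practice.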
|NOMINAL||String values or what you would consider an ENUM.|
|MULTI_ENGLISH||Free-form text. The text is in English.|
|MULTI_SPANISH||Multiple NOMINAL values separated by spaces. The text is in Spanish.|
|MULTI_JAPANESE||Multiple NOMINAL values separated by spaces. The text is in Japanese.|
|ITEM_SET||A series of values with weights. (Formatted as item1:weight1;item2:weight2;
|IGNORE||The column shall be ignored by the program.|
|ID||Your internal object ID, useful when returning results.|
|META||You can store arbitrary information in a string. Currently the meta data is a reserved
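As an illustration of the ITEM_SET layout in the table above, the following sketch parses and formats a cell of `item:weight` pairs separated by semicolons. The function names are hypothetical, and two points are assumptions rather than documented behavior: that weights are numeric, and that an optional trailing semicolon is acceptable.

```python
from typing import Dict


def parse_item_set(cell: str) -> Dict[str, float]:
    """Parse 'item1:weight1;item2:weight2' into {item: weight}.

    Assumption: weights are numeric. Empty segments (for example
    after a trailing ';') are skipped.
    """
    weights: Dict[str, float] = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the last ':' so item names may themselves contain ':'.
        item, _, weight = pair.rpartition(":")
        if not item:
            raise ValueError(f"expected 'item:weight', got {pair!r}")
        weights[item] = float(weight)
    return weights


def format_item_set(weights: Dict[str, float]) -> str:
    """Inverse of parse_item_set: build an ITEM_SET cell."""
    return ";".join(f"{item}:{weight}" for item, weight in weights.items())


if __name__ == "__main__":
    print(parse_item_set("milk:2;bread:0.5;"))
    print(format_item_set({"milk": 2.0, "bread": 0.5}))
```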
The file should follow these specifications:
- The file must be in TSV (tab-separated values) format, with ‘\t’ as the field separator.
- The file must have a header with the same column names that were specified.
- All files in the folder are read, so every one of them must follow this format.
- TSV quote character = ‘"’
- TSV line end = ‘\n’
- TSV escape character = ‘\’
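The specifications above map closely onto the dialect options of Python's standard `csv` module. The sketch below writes and reads a file with those settings. The helper names are hypothetical, and it assumes that embedded quote characters are escaped with a backslash rather than doubled, which the specification does not state explicitly.

```python
import csv
import io
from typing import Iterable, List, Sequence, TextIO

# Dialect options taken from the specification list; doublequote=False
# (escape embedded quotes with '\') is an assumption.
TSV_OPTIONS = dict(
    delimiter="\t",
    quotechar='"',
    escapechar="\\",
    doublequote=False,
    lineterminator="\n",
    quoting=csv.QUOTE_MINIMAL,
)


def write_tsv(stream: TextIO, header: Sequence[str], rows: Iterable[Sequence[str]]) -> None:
    """Write a header line followed by the data rows."""
    writer = csv.writer(stream, **TSV_OPTIONS)
    writer.writerow(header)
    writer.writerows(rows)


def read_tsv(stream: TextIO) -> List[List[str]]:
    """Read every row back, header included."""
    return list(csv.reader(stream, **TSV_OPTIONS))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_tsv(buffer, ["id", "age", "occupation"], [["1", "39", "Adm-clerical"]])
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""` so that Python does not translate the `\n` line ending on some platforms.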
This file contains information about people, such as age, occupation, education, and marital status, among others. The data was collected by a bank to determine whether a person earns more or less than 50k per year. The table shows a few lines of the file.
This file has 16 specific columns. The first column is the person's ID. The second column is the person's age. The third column specifies the type of work the person does. The fourth column specifies the person's education level. The fifth column is the person's marital status. The sixth column specifies the occupation. The seventh column shows the person's relationship within the family. The eighth column specifies the person's race. The ninth column specifies the sex. The tenth column specifies the hours per week worked by the person. The eleventh column specifies the person's country, and the last column shows the person's earnings for the year.
The purpose of this tutorial is to find people with similar characteristics.
Our cloud suggests the following columns and types. These are only a recommendation; you can choose a different type for each column depending on your interests:
Angel Parameters Specification:
These are the parameters needed to create an angel:
|Storage Units||Specify the angel unit size reserved for creation.|
|Parallelism||Specify the number of replications for the angel that you want to create.|
|Ramiel K||Specify the number of results for the nearest neighbor search.|
|Pivots||The number of primary search points in the engine.|
|Probability||Minimum accepted probability for the results. Any result with a lower probability is discarded.|
|Accepted Error||Accepted search error between the distance calculated by the engine and the real distance.|
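To show how two of these parameters shape a result list, the sketch below applies a Ramiel K limit and a Probability threshold to a set of candidate results. It is only an illustration: the `Result` class and `select_results` function are hypothetical, not part of Ramiel, and the Accepted Error parameter is not modeled because the table does not say how the error is measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """Hypothetical shape of one search result."""
    object_id: str
    distance: float
    probability: float


def select_results(candidates: List[Result], k: int, min_probability: float) -> List[Result]:
    """Drop results below the probability threshold, then keep the k closest."""
    kept = [r for r in candidates if r.probability >= min_probability]
    kept.sort(key=lambda r: r.distance)
    return kept[:k]


if __name__ == "__main__":
    candidates = [
        Result("p1", 0.10, 0.95),
        Result("p2", 0.05, 0.40),  # closest, but below the threshold
        Result("p3", 0.30, 0.90),
        Result("p4", 0.20, 0.85),
    ]
    for result in select_results(candidates, k=2, min_probability=0.8):
        print(result.object_id, result.distance, result.probability)
```

In this sketch a larger K returns more neighbors per query, while a higher Probability threshold can return fewer than K results when too few candidates qualify.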