Ramiel Tutorial (People’s Interests)

In this tutorial we are going to create a similarity search system that finds people with similar interests and characteristics. If you want to follow the tutorial step by step, you can download the example data file from this link.

Angel Description

Ramiel is a similarity search engine that provides a type of query not available in traditional databases: it can search for the k closest elements according to an arbitrary similarity criterion. Our technology is the fastest available, and Ramiel combines outstanding performance with ease of use. This makes our engine a natural choice for those who need to handle large datasets, the goldmines of our time.
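To make the idea of a "k closest elements" query concrete, here is a minimal brute-force sketch in Python. It is not Ramiel's implementation or API: `k_nearest` is a hypothetical helper that compares every object with the query using whatever distance function you pass in, which is presumably the cost a dedicated engine is meant to avoid on large datasets.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def k_nearest(
    query: T,
    items: Iterable[Tuple[Hashable, T]],
    k: int,
    distance: Callable[[T, T], float],
) -> List[Tuple[Hashable, float]]:
    """Return the k (id, distance) pairs closest to `query`, nearest first.

    Brute force (hypothetical helper): every object is compared with the query once.
    """
    scored = ((distance(query, obj), obj_id) for obj_id, obj in items)
    # key= compares only the distances, so ids never need to be orderable.
    best = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    return [(obj_id, dist) for dist, obj_id in best]


if __name__ == "__main__":
    # IDs and ages taken from the example file; the criterion is the absolute age difference.
    ages = [("94862", 54.0), ("80793", 75.0), ("75027", 21.0), ("69035", 33.0)]
    print(k_nearest(30.0, ages, 2, lambda a, b: abs(a - b)))
```

Running it prints `[('69035', 3.0), ('75027', 9.0)]`. Swapping the lambda for another function changes the similarity criterion without touching the search itself.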


File Specification:

REAL: Real (numeric) values.
NOMINAL: String values, or what you would consider an ENUM.
MULTI_ENGLISH: Free-form text in English.
MULTI_SPANISH: Multiple NOMINAL values separated by spaces. The text is in Spanish.
IGNORE: The column is ignored by the program.
ID: Your internal object ID, useful when returning results.
META: Arbitrary information stored in a string. Currently META is a reserved column type.
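One way to picture these types is as a conversion rule applied to each raw cell. The sketch below is hypothetical (the `ColumnType` enum and `convert` function are not part of Ramiel), and it assumes that multi-valued text is simply split on whitespace; how the engine actually tokenizes English or Spanish text is not described here.

```python
from enum import Enum
from typing import List, Union


class ColumnType(Enum):
    REAL = "REAL"
    NOMINAL = "NOMINAL"
    MULTI_ENGLISH = "MULTI_ENGLISH"
    MULTI_SPANISH = "MULTI_SPANISH"
    IGNORE = "IGNORE"
    ID = "ID"
    META = "META"


Value = Union[float, str, List[str], None]


def convert(raw: str, col_type: ColumnType) -> Value:
    """Turn one raw TSV cell into a Python value according to its column type."""
    if col_type is ColumnType.REAL:
        return float(raw)
    if col_type in (ColumnType.MULTI_ENGLISH, ColumnType.MULTI_SPANISH):
        # Assumption: multi-valued text is split on whitespace into tokens.
        return raw.split()
    if col_type is ColumnType.IGNORE:
        return None  # the column is dropped
    # ID, NOMINAL and META cells are kept as plain strings.
    return raw
```

For example, `convert("54", ColumnType.REAL)` returns `54.0`.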


The files must follow these specifications:

  • The file must be in TSV (tab-separated values) format, with fields separated by a tab character ('\t').
  • The file must start with a header containing the same column names that were specified.
  • All files in the folder are read, so every one of them must follow this format.
  • TSV quote character: '"'
  • TSV line end: '\n'
  • TSV escape character: '\'
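These settings map directly onto the options of Python's standard `csv` module. The sketch below reads every file in a folder with that dialect and checks the header; `read_rows`, `read_folder`, the strict field-count check and the UTF-8 encoding are illustrative assumptions, not part of Ramiel. Note that `csv.reader` always treats '\n' (and '\r') as a line end, so the line terminator needs no reader option.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def read_rows(lines: Iterable[str], expected_header: List[str]) -> Iterator[Dict[str, str]]:
    """Parse TSV lines with the dialect described above and yield one dict per row."""
    reader = csv.reader(
        lines,
        delimiter="\t",   # tab-separated values
        quotechar='"',    # fields may be wrapped in double quotes
        escapechar="\\",  # a backslash makes the next character literal
    )
    header = next(reader, None)
    if header != expected_header:
        raise ValueError(f"unexpected header: {header!r}")
    for row in reader:
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} fields, expected {len(header)}: {row!r}")
        yield dict(zip(header, row))


def read_folder(folder: Path, expected_header: List[str]) -> Iterator[Dict[str, str]]:
    """Read every file in `folder`; all of them must share the same format."""
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        # newline="" lets the csv module handle line ends inside quoted fields;
        # UTF-8 is an assumed encoding.
        with path.open(newline="", encoding="utf-8") as handle:
            yield from read_rows(handle, expected_header)
```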

Example:

This file contains information about people, such as their age, education and occupation. The table shows the first fifteen rows of the file.

IDENTIFICATION AGE EDUCATION OCCUPATION FRUITS
94862 54 Bachelors Sales Cape Gooseberry Pepino Wax jambu Water Apple Dead Man’s Fingers Indian prune Damson plum
80793 75 1st-4th Machine-op-inspct Pewa Duku Imbe Yellow Granadilla Voavanga Mamoncillo Barbados Cherry Atemoya Indian almond
75027 21 11th Handlers-cleaners Black Mulberry Mamey Yellow Mombin Vanilla Pulasan Jaboticaba Huito Maypop Wax jambu
55847 16 Masters Adm-clerical Agave Canistel Malabar plum Rough Shell Macadamia Ceylon gooseberry Strawberry Pear
69035 33 10th Protective-serv Naranjilla Genip Durian Noni Mangosteen Pupunha
74878 67 Assoc-voc Machine-op-inspct Sapodilla Acerola Wood Apple African cherry orange Guava Nance Sea Grape Huito
28219 68 Doctorate Craft-repair Jelly Plum Bilimbi Madrono CamuCamu Damson plum Nagami Kumquat Kiwifruit Pecan Tangerine
41853 58 Doctorate Priv-house-serv Key lime Avocado Tangerine Indian gooseberry Summer squash Melon pear Coconut Tangerine Betel Nut
40302 89 5th-6th Adm-clerical Huito Pepino Guanabana Chenet Youngberry Mango Pupunha
64184 77 Doctorate Transport-moving Lucuma Oil Palm White Sapote Ilama Biribi
63402 84 Bachelors Armed-Forces Sweet Granadilla Soursop Canistel Mountain Soursop Winged Bean Bignay Kei apple Grumichama Kepel fruit
60921 81 Prof-school Craft-repair Jelly Plum Soursop Maypop Carambola Barbadine Purple Mombin
60388 69 1st-4th Transport-moving Bignay Ice Cream Bean Chayote Chempedak Rose Apple Surinam Cherry Abiu Hairless rambutan Avocado Kepel fruit
53179 59 10th Craft-repair Kwai Muk Breadnut Pummelo Vanilla Agave Otaheite gooseberry Spanish lime Chupa-Chupa
88517 42 Some-college Exec-managerial Nagami Kumquat Indian jujube Watermelon Purple Guava Salak Rough Shell Macadamia Capulin Cherry Yellow Granadilla Grapes

This file has five columns. The first column is the person’s ID, the second is the person’s age, the third specifies the person’s education level, the fourth is the occupation, and the fifth names some fruits that the person likes.

The purpose of this tutorial is to find people with similar interests and characteristics.


Column Specs:

Our cloud suggests the following column types. This is only a recommendation; you can choose a different type for each column depending on your needs:

IDENTIFICATION ID
AGE REAL
EDUCATION NOMINAL
OCCUPATION NOMINAL
FRUITS MULTI_ENGLISH
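The column type decides how two values can be compared. The sketch below shows one hypothetical way to turn this column spec into a distance between two people: normalized absolute difference for REAL, exact match for NOMINAL, and Jaccard distance over the set of words for MULTI_ENGLISH, averaged over the columns. The age scale of 100, the equal weights and splitting fruit names into words are assumptions for illustration; Ramiel's actual similarity function is not described in this tutorial.

```python
from typing import Dict, List, Union

Value = Union[float, str, List[str]]
Record = Dict[str, Value]

# Column spec from the table above; the ID column only labels results, so it is not compared.
SPEC: Dict[str, str] = {
    "AGE": "REAL",
    "EDUCATION": "NOMINAL",
    "OCCUPATION": "NOMINAL",
    "FRUITS": "MULTI_ENGLISH",
}

AGE_SCALE = 100.0  # hypothetical normalization so the REAL term lands in [0, 1]


def column_distance(a: Value, b: Value, col_type: str) -> float:
    """Distance in [0, 1] between two values of the same column type."""
    if col_type == "REAL":
        return min(abs(float(a) - float(b)) / AGE_SCALE, 1.0)
    if col_type == "NOMINAL":
        return 0.0 if a == b else 1.0
    if col_type == "MULTI_ENGLISH":
        # Jaccard distance: 1 - |A & B| / |A | B| over the sets of words.
        sa, sb = set(a), set(b)
        union = sa | sb
        return 1.0 - len(sa & sb) / len(union) if union else 0.0
    raise ValueError(f"unsupported column type: {col_type}")


def person_distance(p: Record, q: Record) -> float:
    """Unweighted mean of the per-column distances."""
    return sum(column_distance(p[c], q[c], t) for c, t in SPEC.items()) / len(SPEC)
```

For two people aged 54 and 59 with different EDUCATION, the same OCCUPATION and one shared fruit word out of three distinct ones, the distance is (0.05 + 1 + 0 + 2/3) / 4 ≈ 0.429.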

Angel Parameters Specification:

These are the parameters needed to create an angel:

Storage Units: The amount of storage (in units) reserved for the angel.
Parallelism: The number of replicas of the angel to create.
Ramiel K: The number of results (k) returned by the nearest-neighbor search.
Pivots: The number of pivots, the primary search points used by the engine.
Probability: The minimum accepted probability for a result; any result with a lower probability is discarded.
Accepted Error: The accepted error between the distance calculated by the engine and the real distance.
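The Ramiel K and Pivots parameters can be illustrated with a classic pivot-based filtering scheme for metric spaces, in the spirit of AESA/LAESA. This is a generic textbook technique, not a description of Ramiel's internals: distances from every object to a few pivots are precomputed, and the triangle inequality turns them into lower bounds that let the search skip objects that cannot be among the K nearest. The sketch is exact, so it does not model the Probability or Accepted Error parameters, which suggest that Ramiel also trades some accuracy for speed. `PivotIndex` and its methods are hypothetical names, and the distance must be a true metric for the pruning to be safe.

```python
import heapq
from typing import Callable, Dict, Generic, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class PivotIndex(Generic[T]):
    """Exact k-nearest-neighbour search with pivot-based pruning (metric distance required)."""

    def __init__(
        self,
        items: Dict[Hashable, T],
        pivots: Sequence[T],
        distance: Callable[[T, T], float],
    ) -> None:
        self.items = items
        self.pivots = list(pivots)
        self.distance = distance
        # Precomputed once: distance from every object to every pivot.
        self.table: Dict[Hashable, List[float]] = {
            obj_id: [distance(obj, p) for p in self.pivots] for obj_id, obj in items.items()
        }

    def search(self, query: T, k: int) -> List[Tuple[Hashable, float]]:
        """Return the k closest (id, distance) pairs, nearest first."""
        if k <= 0:
            return []
        q_to_pivots = [self.distance(query, p) for p in self.pivots]
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
        # so the largest gap is a lower bound on d(q, o) that needs no new distance call.
        bounds = {
            obj_id: max((abs(qp - op) for qp, op in zip(q_to_pivots, row)), default=0.0)
            for obj_id, row in self.table.items()
        }
        # Max-heap of the best k so far, stored as (-distance, tiebreak, id).
        heap: List[Tuple[float, int, Hashable]] = []
        for rank, obj_id in enumerate(sorted(bounds, key=bounds.__getitem__)):
            if len(heap) == k and bounds[obj_id] >= -heap[0][0]:
                break  # candidates are sorted by bound, so no later one can be closer
            d = self.distance(query, self.items[obj_id])
            if len(heap) < k:
                heapq.heappush(heap, (-d, rank, obj_id))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, rank, obj_id))
        return [(obj_id, -neg_d) for neg_d, _, obj_id in sorted(heap, reverse=True)]
```

Visiting candidates in order of their lower bound lets the loop stop as soon as the bound reaches the current K-th distance. More pivots give tighter bounds, at the cost of more precomputation and memory.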

  • Create Folder

    • Click on “Create Folder” to create a container for the CSV, TSV or JSON files that our similarity engine will search.

    • Provide the folder name and click on “Create Folder”.

    • Once the folder is created you will return to a folder list view.

  • Upload File(s)

    • In the folder that you created, click on “Upload File” to open the upload modal.

    • After choosing your files click on “Upload File”.

    • You can see the progress bar while the file is being uploaded.

    • Once the files are uploaded you will return to a folder list view.

  • Create your Angel

    • Go to the “Create Angel” section to choose the angel that works for your project.

    • For this example we are going to create a Ramiel. Click “Create” on the Ramiel image.

    • The next step is to choose the folder containing the files that you want to use to train the angel. When you choose the folder, you can see a preview of the files.

    • If you want to change the type of any column, choose an option from the list of types. Ramiel requires a column of type ID.

      Then click on “Next” to continue the creation.

    • The next step is to fill the Ramiel parameters (default parameters will work fine) and to choose the name for your angel.

      Click on “Create” to start the creation of the angel.

    • Once creation has started, a table shows your current angels and the progress of each creation. When the state is “running”, the angel is ready to answer queries.

  • Query Your Angel

    • To query your angel you have two options, Execute Query and Batch Query; both can be accessed from the “Your Angels” screen.

    • Execute Query

      Provide the values for the object that you want to query and then click on “Execute Query”.

    • Another option is to choose a folder containing the files that you want to use to query the angel.

    • Then you can click on “Fill Query Fields” to choose a row from the file to fill the query object. In this example we fill the fields with the fifth row shown in the preview of the data.

    • Now you only need to click on “Execute Query” to obtain your result. The result contains a list of the most similar objects identified by their IDs and showing their respective distances to the query object.

    • Batch Query

      First choose a folder containing the files that you want to use to query the angel, then click on “Execute Query”. This creates a batch process.

    • Once the batch query has started, a table shows your current batch files and the progress of each execution. When the state is “completed”, the batch is ready for download.
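The batch workflow above is driven from the web interface. If you want to reproduce a similar batch locally, for example to post-process results, the sketch below runs a search for every query row and writes one tab-separated line per result. `run_batch`, the `search` callable and the QUERY_ID/RANK/RESULT_ID/DISTANCE layout are assumptions for illustration; they are not Ramiel's API or the format of its downloadable batch results.

```python
import csv
import io
from typing import Callable, Dict, Hashable, Iterable, List, TextIO, Tuple

# A search function maps one query row to (result id, distance) pairs, nearest first.
SearchFn = Callable[[Dict[str, str]], List[Tuple[Hashable, float]]]


def run_batch(queries: Iterable[Dict[str, str]], id_column: str, search: SearchFn, out: TextIO) -> int:
    """Write one TSV line per result for every query row; return the number of result lines."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["QUERY_ID", "RANK", "RESULT_ID", "DISTANCE"])
    written = 0
    for row in queries:
        for rank, (result_id, distance) in enumerate(search(row), start=1):
            writer.writerow([row[id_column], rank, result_id, f"{distance:.6g}"])
            written += 1
    return written


if __name__ == "__main__":
    # Stand-in search that always returns the same two neighbours (hypothetical).
    def fake_search(row: Dict[str, str]) -> List[Tuple[Hashable, float]]:
        return [("80793", 0.5), ("75027", 1.25)]

    buffer = io.StringIO()
    run_batch([{"IDENTIFICATION": "94862"}], "IDENTIFICATION", fake_search, buffer)
    print(buffer.getvalue(), end="")
```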

Manfred Calvo