A classifier is a machine learning algorithm that receives an object and attaches “tags” or “labels” to it. Leliel is a classifier built on top of the Ramiel similarity engine and provides unmatched performance and accountability features.
|ID||A mandatory field which uniquely identifies each object.|
|CLASS||A mandatory field which specifies the field to be classified.|
|NOMINAL||Values that do not bear a quantitative relationship with each other (i.e., strings and numbers which represent non-numerical information).|
|MULTI_PLAIN||Multiple NOMINAL values separated by spaces. Not language-specific.|
|MULTI_ENGLISH||Multiple NOMINAL values separated by spaces. The text is in English.|
|MULTI_SPANISH||Multiple NOMINAL values separated by spaces. The text is in Spanish.|
|MULTI_JAPANESE||Multiple NOMINAL values separated by spaces. The text is in Japanese.|
|ITEM_SET||A series of values with weights, formatted as item1:weight1;item2:weight2;...|
|IGNORE||The column shall be ignored by the program.|
|META||This column is for metadata and shall be ignored by the program, but information will be retained in the output.|
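As an illustration of the ITEM_SET format, here is a minimal Python sketch that turns one cell into a mapping from item to weight. It assumes pairs are separated by `;`, that the item and its weight are separated by the last `:` in each pair, and that weights are numeric. The function name `parse_item_set` is hypothetical and is not part of Leliel.

```python
from typing import Dict


def parse_item_set(cell: str) -> Dict[str, float]:
    """Parse 'item1:weight1;item2:weight2;' into {item: weight}.

    Assumption: weights are numeric and a trailing ';' is allowed.
    """
    weights: Dict[str, float] = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            # Skip the empty piece produced by a trailing ';'.
            continue
        # Split on the last ':' so that items may themselves contain ':'.
        item, _, weight = pair.rpartition(":")
        if not item:
            raise ValueError(f"missing item or weight in {pair!r}")
        weights[item] = float(weight)
    return weights
```

For example, `parse_item_set("gun:0.7;knife:0.3;")` yields a dictionary with the two items and their weights.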
The input file should meet the following specifications:
- The file must be in TSV (tab-separated values) format, with ‘\t’ as the delimiter.
- The file must have a header row whose column names match the ones that were specified.
- All files in the folder are read, so every file must follow this format.
- TSV Quote Character = ‘"’
- TSV Line End = ‘\n’
- TSV Escape Character = ‘\’
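To check a file against these specifications before uploading it, the settings can be reproduced with Python's standard `csv` module. The sketch below is only an illustration: the helper names `read_tsv` and `read_folder`, the UTF-8 encoding, and the header names `ID`, `OFFENSE` and `METHOD` are assumptions, not part of the product.

```python
import csv
import io
from pathlib import Path


def read_tsv(stream):
    """Read one TSV stream with the delimiter, quote and escape characters listed above."""
    reader = csv.DictReader(
        stream,
        delimiter="\t",   # tab-separated values
        quotechar='"',    # TSV Quote Character
        escapechar="\\",  # TSV Escape Character
    )
    return list(reader)


def read_folder(folder):
    """Read every file in a folder; all of them must share the same format."""
    rows = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # newline="" lets the csv module handle the '\n' line ends itself.
            # Assumption: the files are UTF-8 encoded.
            with path.open(newline="", encoding="utf-8") as handle:
                rows.extend(read_tsv(handle))
    return rows


# Hypothetical header names; the data values are taken from the sample file.
sample = (
    "ID\tOFFENSE\tMETHOD\n"
    "12000041\tROBBERY\tKNIFE\n"
    '12000056\t"ASSAULT W/DANGEROUS WEAPON"\tOTHERS\n'
)
rows = read_tsv(io.StringIO(sample))
```

Each row comes back as a dictionary keyed by the header names, so a file whose header does not match the specified column names is easy to spot.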
This file contains data about crime incidents that took place in the year 2012. The first fifteen lines of the file are shown below.
|05079932||8/28/2012 12:00:00 AM||SEX ABUSE||OTHERS||1A||FOURTH||2||003200 1|
|07156111||9/18/2012 12:00:00 AM||HOMICIDE||GUN||8A||SEVENTH||28||007504 1|
|08075756||9/21/2012 12:00:00 AM||HOMICIDE||GUN||8A||SEVENTH||28||007503 1|
|08254628||10/6/2012 12:00:00 AM||SEX ABUSE||OTHERS||2A||SECOND||5||005600 3|
|09074624||4/25/2012 12:00:00 AM||SEX ABUSE||OTHERS||6C||FIRST||25||010600 2|
|10123633||2/29/2012 12:00:00 AM||SEX ABUSE||OTHERS||6C||FIFTH||25||010600 1|
|10146732||6/8/2012 12:00:00 AM||SEX ABUSE||OTHERS||3C||SECOND||15||000501 2|
|11102619||5/14/2012 12:00:00 AM||HOMICIDE||GUN||7F||SIXTH||32||007703 3|
|11141272||6/25/2012 12:00:00 AM||HOMICIDE||OTHERS||8B||SEVENTH||36||007502 2|
|11142230||8/23/2012 12:00:00 AM||SEX ABUSE||OTHERS||7F||SIXTH||32||007703 1|
|11158196||1/5/2012 12:00:00 AM||HOMICIDE||OTHERS||7D||SIXTH||29||009601 1|
|11190860||1/1/2012 12:00:00 AM||HOMICIDE||OTHERS||7D||SIXTH||30||007803 1|
|12000005||1/1/2012 12:10:00 AM||THEFT F/AUTO||OTHERS||1A||THIRD||2||003200 3|
|12000041||1/1/2012 12:58:00 AM||ROBBERY||KNIFE||5D||FIFTH||23||008803 1|
|12000056||1/1/2012 12:20:00 AM||ASSAULT W/DANGEROUS WEAPON||OTHERS||6D||FIRST||27||007200 1|
This file has eight columns. The first column is the crime ID. The second column is the date and time at which the crime was reported. The third column specifies the offense that was committed. The fourth column describes the method used to commit the crime, and the remaining columns contain information about the location in the city where it happened.
The purpose of this tutorial is to predict the offense that was committed.
Our cloud suggests a type for each column, but this is only a recommendation; you can choose a different type for each column depending on your interests.
Angel Parameters Specification:
The following parameters are needed to create an angel:
|Storage Units||Specify the angel unit size reserved for creation.|
|Parallelism||Specify the number of replications for the angel that you want to create.|
|Ramiel K||Specify the number of results for the nearest neighbor search.|
|Pivots||The number of primary search points in the engine.|
|Probability||Minimum accepted probability for the results. Any result with a lower probability will be discarded.|
|Accepted Error||Accepted search error between the distance calculated by the engine and the real distance.|
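To give a feel for how Ramiel K and Probability might interact, here is a minimal sketch of generic k-nearest-neighbor voting in Python. It is not Leliel's confirmed implementation: the function `classify`, the use of vote fractions as probabilities, and the distances in the example are all assumptions made for illustration.

```python
from collections import Counter


def classify(neighbors, k, min_probability):
    """Vote among the k nearest neighbors and drop low-probability labels.

    neighbors: list of (distance, label) pairs from a similarity search.
    k: number of nearest neighbors to keep (the role of Ramiel K).
    min_probability: labels below this vote share are discarded
                     (the role of the Probability parameter).
    """
    # Keep only the k closest neighbors.
    nearest = sorted(neighbors, key=lambda pair: pair[0])[:k]
    if not nearest:
        return {}
    # Assumption: a label's probability is its share of the k votes.
    votes = Counter(label for _, label in nearest)
    total = len(nearest)
    probabilities = {label: count / total for label, count in votes.items()}
    return {
        label: p for label, p in probabilities.items() if p >= min_probability
    }


# Invented distances; labels are offenses that appear in the sample file.
example = [
    (0.1, "HOMICIDE"),
    (0.2, "HOMICIDE"),
    (0.3, "ROBBERY"),
    (0.4, "HOMICIDE"),
    (0.9, "THEFT F/AUTO"),
]
result = classify(example, k=4, min_probability=0.5)
```

With `k=4`, three of the four nearest neighbors are labeled HOMICIDE, so that label gets a vote share of 0.75 and is kept, while ROBBERY, at 0.25, falls below the 0.5 threshold and is discarded.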