A classifier is a machine learning algorithm that receives an object and attaches “tags” or “labels” to it. Leliel is a classifier built on top of the Ramiel similarity engine and provides unmatched performance and accountability features. Also with this version of the classifier you can execute a list of experiments with differents parameters to found the parameters that increase the prediction score.
|NOMINAL||String values or what you would consider an ENUM.|
|MULTI_ENGLISH||Free form text. The text is english language.|
|MULTI_SPANISH||Free form text. The text is spanish language.|
|IGNORE||The column shall be ignored by the program.|
|ID||Your internal object ID, useful when returning results.|
|META||You can store arbitrary information in a string. Currently the meta data is a reserved
|CLASS||Column that will be classified, we only accept one class by instance.|
The file should consider these specifications:
- The file should be in tsv format, which is a tab separated values = ‘\t’
- We expect to have a header with the same column names that were specified.
- We are going to read all files in the folder, all of them should follow this format.
- TSV Quote Character = ‘ ” ‘
- TSV Line End = ‘\n’
- TSV Escape Character= ‘\’
This file contains data about people information, such as age, occupation, education, marital-status among others. This data was collected by a bank to know if a person earns more or less than 50k per year. The table shows a few lines of the file.
This file has 16 specific columns. The first column is the person ID. The second column is the person age. The third column specify the type of work that the person has. The fourth column specify the education grade of the person. The fifth column is the person marital status. The sixth column specify the ocupation. The seventh column shows the relationship within of the family. The eighth specify the person race. The ninth specify the sex. The tenth column specify the hours per week worked by the person. The eleventh column specify the country of the person and the last column show the earns of the person in the year.
The purpose of this tutorial is predict if a person earns more or less than 50k per year.
Our cloud shows the next specs columns and types, but this is a recommendation, you can choose a different types for each column depending on your interests:
Angel Parameters Specification:
These are the parameters needed for the angel creation:
|Storage Units||Specify the angel unit size reserved for creation.|
|Parallelism||Specify the number of replications for the angel that you want to create.|
|Ramiel K||Specify the number of results for the nearest neighbor search.|
|Pivots||The number of primary search points in the engine.|
|Probability||Minimum accepted probability for the results, any result with lower probability will be discarded.|
|Accepted Error||Accepted search error from the distance calculated by the engine and the real distance.|