Leliel Tutorial (Criminal Incidents)

In this tutorial we are going to create a classifier that predicts the offense committed in a crime based on previous crimes. If you want to follow the tutorial step by step, you can download the example data file from this link.

Angel Description

A classifier is a machine learning algorithm that receives an object and attaches “tags” or “labels” to it. Leliel is a classifier built on top of the Ramiel similarity engine and provides unmatched performance and accountability features.


File Specification:

ID              A mandatory field that uniquely identifies each object.
CLASS           A mandatory field that holds the class (label) to be predicted.
REAL            Numerical values.
NOMINAL         Values that do not bear a quantitative relationship with each other (i.e., strings and numbers that represent non-numerical information).
MULTI_PLAIN     Multiple NOMINAL values separated by spaces. Not language specific.
MULTI_ENGLISH   Multiple NOMINAL values separated by spaces. The text is in English.
MULTI_SPANISH   Multiple NOMINAL values separated by spaces. The text is in Spanish.
MULTI_JAPANESE  Multiple NOMINAL values separated by spaces. The text is in Japanese.
ITEM_SET        A series of values with weights, formatted as item1:weight1;item2:weight2;item3:weight3.
IGNORE          The column is ignored by the program.
META            The column holds metadata; it is ignored by the program, but its information is retained in the output.
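
The ITEM_SET layout can be illustrated with a short Python sketch. The helper names parse_item_set and format_item_set are hypothetical and not part of the product; the sketch only assumes the item:weight;item:weight layout described above and that weights are plain numbers.

```python
def parse_item_set(cell: str) -> dict:
    """Turn 'item1:weight1;item2:weight2' into {item: weight}."""
    items = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty cell or a trailing ';'
        # Split on the last ':' so item names may themselves contain ':'.
        name, _, weight = pair.rpartition(":")
        if not name:
            raise ValueError(f"expected item:weight, got {pair!r}")
        items[name] = float(weight)
    return items


def format_item_set(items: dict) -> str:
    """Inverse of parse_item_set: build the ITEM_SET cell text."""
    return ";".join(f"{name}:{weight:g}" for name, weight in items.items())
```

A helper like this can be used to check ITEM_SET cells before uploading a file.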

 

The file should follow these specifications:

  • The file must be in TSV (tab-separated values) format, with ‘\t’ as the field separator.
  • The file must have a header row with the same column names that were specified.
  • All files in the folder are read, so all of them must follow this format.
  • TSV Quote Character = ‘"’
  • TSV Line End = ‘\n’
  • TSV Escape Character = ‘\’
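
A file with these settings can be produced with Python's standard csv module. The following is a minimal sketch rather than an official tool: the function names are hypothetical, and doublequote=False is an assumption made so that a quote inside a value is written with the escape character.

```python
import csv
import io

# Separator, quote, escape and line end come from the list above.
# doublequote=False is an assumption: quotes inside values become \" not "".
TSV_FORMAT = {
    "delimiter": "\t",
    "quotechar": '"',
    "escapechar": "\\",
    "lineterminator": "\n",
    "doublequote": False,
}

HEADER = ["CCN", "REPORTDATETIME", "OFFENSE", "METHOD", "ANC",
          "DISTRICT", "NEIGHBORHOODCLUSTER", "BLOCK_GROUP"]


def write_tsv(stream, header, rows):
    """Write a header row followed by the data rows."""
    writer = csv.writer(stream, quoting=csv.QUOTE_MINIMAL, **TSV_FORMAT)
    writer.writerow(header)
    writer.writerows(rows)


def read_tsv(stream):
    """Read the data back as a list of dicts keyed by the header names."""
    return list(csv.DictReader(stream, **TSV_FORMAT))


# Round trip with one row taken from the example file.
buffer = io.StringIO()
write_tsv(buffer, HEADER, [["12000041", "1/1/2012 12:58:00 AM", "ROBBERY",
                            "KNIFE", "5D", "FIFTH", "23", "008803 1"]])
buffer.seek(0)
rows = read_tsv(buffer)
print(rows[0]["OFFENSE"])  # ROBBERY
```

When writing to a real file instead of an in-memory buffer, open it with newline="" so that the csv module controls the line endings.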

Example:

This file contains data about crime incidents that took place in the year 2012. The header and the first fifteen rows of the file are shown below.

CCN       REPORTDATETIME         OFFENSE                     METHOD  ANC  DISTRICT  NEIGHBORHOODCLUSTER  BLOCK_GROUP
05079932  8/28/2012 12:00:00 AM  SEX ABUSE                   OTHERS  1A   FOURTH    2                    003200 1
07156111  9/18/2012 12:00:00 AM  HOMICIDE                    GUN     8A   SEVENTH   28                   007504 1
08075756  9/21/2012 12:00:00 AM  HOMICIDE                    GUN     8A   SEVENTH   28                   007503 1
08254628  10/6/2012 12:00:00 AM  SEX ABUSE                   OTHERS  2A   SECOND    5                    005600 3
09074624  4/25/2012 12:00:00 AM  SEX ABUSE                   OTHERS  6C   FIRST     25                   010600 2
10123633  2/29/2012 12:00:00 AM  SEX ABUSE                   OTHERS  6C   FIFTH     25                   010600 1
10146732  6/8/2012 12:00:00 AM   SEX ABUSE                   OTHERS  3C   SECOND    15                   000501 2
11102619  5/14/2012 12:00:00 AM  HOMICIDE                    GUN     7F   SIXTH     32                   007703 3
11141272  6/25/2012 12:00:00 AM  HOMICIDE                    OTHERS  8B   SEVENTH   36                   007502 2
11142230  8/23/2012 12:00:00 AM  SEX ABUSE                   OTHERS  7F   SIXTH     32                   007703 1
11158196  1/5/2012 12:00:00 AM   HOMICIDE                    OTHERS  7D   SIXTH     29                   009601 1
11190860  1/1/2012 12:00:00 AM   HOMICIDE                    OTHERS  7D   SIXTH     30                   007803 1
12000005  1/1/2012 12:10:00 AM   THEFT F/AUTO                OTHERS  1A   THIRD     2                    003200 3
12000041  1/1/2012 12:58:00 AM   ROBBERY                     KNIFE   5D   FIFTH     23                   008803 1
12000056  1/1/2012 12:20:00 AM   ASSAULT W/DANGEROUS WEAPON  OTHERS  6D   FIRST     27                   007200 1

This file has eight columns. The first column is the crime ID. The second column is the time at which the crime was reported. The third column specifies the offense that the criminal committed. The fourth column describes the method the criminal used to commit the crime, and the remaining columns contain information about the location in the city where it happened.

The purpose of this tutorial is to predict which offense the criminal committed.


Columns Specs:

Our cloud suggests the following column types. This is only a recommendation; you can choose a different type for each column depending on your interests:

CCN                  ID
REPORTDATETIME       IGNORE
OFFENSE              CLASS
METHOD               NOMINAL
ANC                  NOMINAL
DISTRICT             NOMINAL
NEIGHBORHOODCLUSTER  NOMINAL
BLOCK_GROUP          NOMINAL
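
Before uploading, the column assignment can be sanity-checked locally. The sketch below is only an illustration in Python: check_column_types is a hypothetical helper, not part of the product, and the ID column is named CCN as in the file header. It checks that every header column has a known type and that an ID column and a CLASS column are both present, which Leliel requires.

```python
# Types listed in the file specification.
KNOWN_TYPES = {"ID", "CLASS", "REAL", "NOMINAL", "MULTI_PLAIN",
               "MULTI_ENGLISH", "MULTI_SPANISH", "MULTI_JAPANESE",
               "ITEM_SET", "IGNORE", "META"}

HEADER = ["CCN", "REPORTDATETIME", "OFFENSE", "METHOD", "ANC",
          "DISTRICT", "NEIGHBORHOODCLUSTER", "BLOCK_GROUP"]

# The recommended assignment for the example file.
COLUMN_TYPES = {
    "CCN": "ID",
    "REPORTDATETIME": "IGNORE",
    "OFFENSE": "CLASS",
    "METHOD": "NOMINAL",
    "ANC": "NOMINAL",
    "DISTRICT": "NOMINAL",
    "NEIGHBORHOODCLUSTER": "NOMINAL",
    "BLOCK_GROUP": "NOMINAL",
}


def check_column_types(header, column_types):
    """Return a list of problems; an empty list means the assignment looks fine."""
    problems = []
    for name in header:
        if name not in column_types:
            problems.append(f"column {name} has no type")
        elif column_types[name] not in KNOWN_TYPES:
            problems.append(f"column {name} has unknown type {column_types[name]}")
    assigned = [column_types[name] for name in header if name in column_types]
    for required in ("ID", "CLASS"):
        if required not in assigned:
            problems.append(f"no column has type {required}")
    return problems


print(check_column_types(HEADER, COLUMN_TYPES))  # []
```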

Angel Parameters Specification:

These are the parameters needed for the angel creation:

Storage Units   The angel unit size reserved for creation.
Parallelism     The number of replicas of the angel that you want to create.
Ramiel K        The number of results returned by the nearest neighbor search.
Pivots          The number of primary search points in the engine.
Probability     The minimum accepted probability for the results; any result with a lower probability is discarded.
Accepted Error  The accepted search error between the distance calculated by the engine and the real distance.
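
To give an intuition for Ramiel K and Probability, here is a toy nearest-neighbor vote in Python. It is not Leliel's or Ramiel's implementation: the distance function, the function names and the default values are assumptions for illustration, and the search is exact brute force, so Pivots and Accepted Error have no counterpart in it.

```python
from collections import Counter


def mismatch_distance(a, b):
    """Count the positions where two tuples of nominal values differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(query, training, k=3, min_probability=0.2):
    """Vote among the k nearest training rows.

    training is a list of (features, label) pairs. Returns (label, probability)
    pairs, most likely first, dropping labels below min_probability.
    """
    nearest = sorted(training, key=lambda row: mismatch_distance(query, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    total = sum(votes.values())
    results = [(label, count / total) for label, count in votes.most_common()]
    return [(label, p) for label, p in results if p >= min_probability]


# (METHOD, ANC, DISTRICT) -> OFFENSE, taken from rows of the example file.
TRAINING = [
    (("GUN", "8A", "SEVENTH"), "HOMICIDE"),
    (("GUN", "8A", "SEVENTH"), "HOMICIDE"),
    (("OTHERS", "2A", "SECOND"), "SEX ABUSE"),
    (("OTHERS", "6C", "FIRST"), "SEX ABUSE"),
    (("GUN", "7F", "SIXTH"), "HOMICIDE"),
    (("OTHERS", "8B", "SEVENTH"), "HOMICIDE"),
    (("KNIFE", "5D", "FIFTH"), "ROBBERY"),
]

print(classify(("GUN", "8B", "SEVENTH"), TRAINING, k=3))  # [('HOMICIDE', 1.0)]
```

In this sketch a larger k lets more neighbors vote, and a higher min_probability discards classes that only a few of them support.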

  • Create Folder

    • Click on “Create Folder” to create a container for your csv, tsv or json files that our similarity engine will search.

    • Provide the folder name and click on “Create Folder”.

    • Once the folder is created you will return to a folder list view.

  • Upload File(s)

    • In the folder that you created, click on “Upload File” to open the upload dialog.

    • After choosing your files click on “Upload File”.

    • You can see the progress bar while the file is being uploaded.

    • Once the files are uploaded you will return to a folder list view.

  • Create your Angel

    • Go to the “Create Angel” section to choose the angel that works for your project.

    • For this example we are going to create a Leliel. Click “Create” on the Leliel image.

    • The next step is choosing the folder containing the files that you want to use to train the angel. When you choose the folder you can see a preview of the files.

    • If you want to change the type of any column, you can do so by choosing an option from the list of types. Leliel requires one column with type ID and one column with type CLASS.

      Then click on “Next” to continue the creation.

    • The next step is to fill in the Leliel parameters (the default parameters will work fine) and to choose a name for your angel.

      Click on “Create” to start the creation of the angel.

    • Once the creation has started, you can see a table with your current angels and the progress of the creation. When the state is “running”, the angel is ready to answer queries.

  • Query Your Angel

    • There are two ways to query: Execute Query and Batch Query. Both options can be accessed from the “Your Angels” screen.

    • Execute Query

      Provide the values for the object that you want to query and then click on “Execute Query”.

    • Another option is to choose a folder containing the files that you want to use to query the angel.

    • Then you can click on “Fill Query Fields” to choose a row from the file to fill the query object. In this example we fill the fields with the fifth row shown in the preview of the data.

    • Now you only need to click on “Execute Query” to obtain your result. In the query results, the first value is the class, the second value is a score (higher is better), the third value is the confidence of the result, and the last value specifies the probability that the class is the correct one among the results (the query can return more than one class).

    • Batch Query

      First choose a folder containing the files that you want to use to query the angel, then click on “Execute Query”. This creates a batch process.

    • Once the batch query has started, you can see a table with your current batch files and the progress of the execution. When the state is “completed”, the batch is ready for download.
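
The four values of a query result described above can be handled in code as a small record. The Python sketch below is hypothetical: the ClassResult type, the best_class helper and the sample numbers are made up for illustration and do not reflect the actual output format of the service.

```python
from typing import NamedTuple, Optional, Sequence


class ClassResult(NamedTuple):
    label: str          # the predicted class
    score: float        # higher is better
    confidence: float   # confidence of the result
    probability: float  # chance this class is the correct one among the results


def best_class(results: Sequence[ClassResult]) -> Optional[ClassResult]:
    """Pick the result with the highest probability, or None if there are none."""
    if not results:
        return None
    return max(results, key=lambda r: r.probability)


# Made-up numbers, only to show the shape of a multi-class answer.
results = [
    ClassResult("HOMICIDE", 2.0, 0.8, 0.75),
    ClassResult("ROBBERY", 1.0, 0.8, 0.25),
]
print(best_class(results).label)  # HOMICIDE
```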

Manfred Calvo