Create a new folder for this tutorial and upload the data. If you are unfamiliar with this process, please see the Platform Navigation Tutorial.
From the “Create Angel” window, select Sandalphon, then choose the folder in which the data is saved.
Once the folder is selected, you will see the first few rows of the data. Directly above the data are the column headers and a column type selection dropdown.
Next we tell Sandalphon about the structure of our data. This ensures that similarities are identified in the correct way. Sandalphon accepts the following specifications:
|Specification||Description|
|ID||A field which uniquely identifies each object.|
|NOMINAL||Values that do not bear a quantitative relationship with each other (i.e., strings and numbers which represent non-numerical information).|
|MULTI_PLAIN||Multiple NOMINAL values separated by spaces. Non-language specific.|
|MULTI_ENGLISH||Multiple NOMINAL values separated by spaces. The text is English language.|
|MULTI_SPANISH||Multiple NOMINAL values separated by spaces. The text is Spanish language.|
|MULTI_JAPANESE||Multiple NOMINAL values separated by spaces. The text is Japanese language.|
|ITEM_SET||A series of values with weights. (Formatted as item1:weight1;item2:weight2;
|IGNORE||The column shall be ignored by the program.|
|META||This column is for metadata and shall be ignored by the program, but information will be retained in the output.|
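To make the ITEM_SET format concrete, here is a minimal Python sketch that parses a cell written as `item1:weight1;item2:weight2;` into a mapping of items to weights. It follows only the portion of the format shown in the table; the function name `parse_item_set` is hypothetical, and treating weights as floating-point numbers is an assumption, not documented platform behavior.

```python
def parse_item_set(cell):
    """Parse 'item1:weight1;item2:weight2;' into {item: weight}.

    Assumption: weights are numeric and pairs are separated by ';'.
    """
    items = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:  # tolerate a trailing ';' or an empty cell
            continue
        # Split on the last ':' so an item name may itself contain a colon.
        name, _, weight = pair.rpartition(":")
        if not name:
            raise ValueError(f"expected 'item:weight', got {pair!r}")
        items[name] = float(weight)  # raises ValueError if not numeric
    return items


print(parse_item_set("gun:0.7;knife:0.3;"))
```

Checking a column locally in this way before upload can catch cells that are missing a weight or a separator.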
Giving Sandalphon the correct specifications to work with is critical for ensuring that you get ideal results. Let’s look at a sample of the data and use it to assign the correct specifications.
|CCN||REPORTDATETIME||OFFENSE||METHOD||ANC||DISTRICT||NEIGHBORHOODCLUSTER||BLOCK_GROUP|
|05079932||8/28/2012 12:00:00 AM||SEX ABUSE||OTHERS||1A||FOURTH||2||003200 1|
|07156111||9/18/2012 12:00:00 AM||HOMICIDE||GUN||8A||SEVENTH||28||007504 1|
|08075756||9/21/2012 12:00:00 AM||HOMICIDE||GUN||8A||SEVENTH||28||007503 1|
|08254628||10/6/2012 12:00:00 AM||SEX ABUSE||OTHERS||2A||SECOND||5||005600 3|
|09074624||4/25/2012 12:00:00 AM||SEX ABUSE||OTHERS||6C||FIRST||25||010600 2|
|10123633||2/29/2012 12:00:00 AM||SEX ABUSE||OTHERS||6C||FIFTH||25||010600 1|
|10146732||6/8/2012 12:00:00 AM||SEX ABUSE||OTHERS||3C||SECOND||15||000501 2|
|11102619||5/14/2012 12:00:00 AM||HOMICIDE||GUN||7F||SIXTH||32||007703 3|
|11141272||6/25/2012 12:00:00 AM||HOMICIDE||OTHERS||8B||SEVENTH||36||007502 2|
|11142230||8/23/2012 12:00:00 AM||SEX ABUSE||OTHERS||7F||SIXTH||32||007703 1|
|11158196||1/5/2012 12:00:00 AM||HOMICIDE||OTHERS||7D||SIXTH||29||009601 1|
|11190860||1/1/2012 12:00:00 AM||HOMICIDE||OTHERS||7D||SIXTH||30||007803 1|
|12000005||1/1/2012 12:10:00 AM||THEFT F/AUTO||OTHERS||1A||THIRD||2||003200 3|
|12000041||1/1/2012 12:58:00 AM||ROBBERY||KNIFE||5D||FIFTH||23||008803 1|
|12000056||1/1/2012 12:20:00 AM||ASSAULT W/DANGEROUS WEAPON||OTHERS||6D||FIRST||27||007200 1|
This file contains data about criminal incidents reported in Washington, DC over a two-year period. The table above shows the first fifteen lines of the file.
Each row of this file corresponds to a criminal incident. Each entry comprises eight columns:
- 1. CCN Criminal Complaint Number. It is a unique numerical code assigned to the event. As we are creating a visualization, we do not need to assign an ID field, so we will set CCN to IGNORE.
- 2. REPORTDATETIME The date and time. As we are visualizing the location, method, and type of crime, we will IGNORE this field.
- 3. OFFENSE The type of crime which was committed. This is a single string (text) value which represents information of a classification, so we will label it NOMINAL.
- 4. METHOD What type of weapon (if any) was used in the crime. Like OFFENSE, this is NOMINAL.
- 5. ANC The Advisory Neighborhood Commission zone in which the crime occurred. We will use DISTRICT as our means of grouping crimes by their location, but this would be an acceptable alternative. IGNORE.
- 6. DISTRICT The district in which the crime occurred. This is a single string (text) value which represents information of a classification, so we will label it NOMINAL.
- 7. NEIGHBORHOODCLUSTER Another method of classifying the location of the crime. IGNORE.
- 8. BLOCK_GROUP Another method of classifying the location of the crime. IGNORE.
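The assignments above can be summarized as a simple mapping from column name to specification. The platform itself is configured through the dropdowns described earlier; the Python sketch below is only a local illustration of which fields remain once the IGNORE columns are dropped. The names `COLUMN_SPECS`, `active_columns`, and `project` are hypothetical, and the sketch assumes the file is comma-separated with no header row, which the tutorial does not state.

```python
import csv
import io

# Specification chosen for each column in the walkthrough above.
COLUMN_SPECS = {
    "CCN": "IGNORE",
    "REPORTDATETIME": "IGNORE",
    "OFFENSE": "NOMINAL",
    "METHOD": "NOMINAL",
    "ANC": "IGNORE",
    "DISTRICT": "NOMINAL",
    "NEIGHBORHOODCLUSTER": "IGNORE",
    "BLOCK_GROUP": "IGNORE",
}


def active_columns(specs):
    """Names of the columns that are not set to IGNORE, in file order."""
    return [name for name, spec in specs.items() if spec != "IGNORE"]


def project(row, specs):
    """Keep only the fields of one row that are not ignored."""
    return {name: row[name] for name in active_columns(specs)}


# First row of the sample table, written as a comma-separated line (assumed layout).
SAMPLE = "05079932,8/28/2012 12:00:00 AM,SEX ABUSE,OTHERS,1A,FOURTH,2,003200 1\n"

reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=list(COLUMN_SPECS))
first = project(next(reader), COLUMN_SPECS)
print(first)  # {'OFFENSE': 'SEX ABUSE', 'METHOD': 'OTHERS', 'DISTRICT': 'FOURTH'}
```

With these settings, each incident is compared on three fields only: OFFENSE, METHOD, and DISTRICT.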
Angel Specifications and Creation
Angels also accept specifications, which can modify their behavior and access to server resources. For the purposes of this tutorial, and for most circumstances, the default parameters are fine, but let’s go over the options.
|Storage Units||Specifies the amount of memory devoted by the server to this Angel. Larger files or Angels with stricter search parameters may require additional memory. Each unit is 2 GB. (Range 1 to 6, default 1)|
|Parallelism||Specifies the number of servers redundantly running the Angel. (Default 2)|
|Ramiel K||Specifies the number of results returned by the nearest neighbor search. (Default 10)|
|Pivots||The number of primary search points in the engine. (Range 256 to 1024, default 256)|
|Probability||Minimum accepted probability for the results. Any result with a lower probability will be discarded. (Range 0 to 1, default .95)|
|Accepted Error||Maximum accepted difference in distance between returned objects and the query object. (Minimum 1, default 1.2)|
|Sandalphon Range||Maximum accepted distance between the center of a cluster and a given element. (Range 0 to 1, default .2)|
|Sandalphon Iterations||Number of passes taken by Sandalphon to identify new cluster centers. (Minimum 1, default 3)|
|Sandalphon Percentage||Proportion of elements used to identify cluster centers for each pass. (Range 0 to 1, default .5)|
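As a quick reference, the defaults and ranges in the table can be captured in a small Python sketch that flags out-of-range values before they are typed into the form. The class `AngelSettings` and its field names are hypothetical and are not part of any simMachines API; only the defaults and ranges come from the table. Parallelism and Ramiel K are not checked because the table gives no range for them.

```python
from dataclasses import dataclass


@dataclass
class AngelSettings:
    """Hypothetical container mirroring the options table above."""
    storage_units: int = 1              # 2 GB each, range 1 to 6
    parallelism: int = 2                # no range given in the table
    ramiel_k: int = 10                  # no range given in the table
    pivots: int = 256                   # range 256 to 1024
    probability: float = 0.95           # range 0 to 1
    accepted_error: float = 1.2         # minimum 1
    sandalphon_range: float = 0.2       # range 0 to 1
    sandalphon_iterations: int = 3      # minimum 1
    sandalphon_percentage: float = 0.5  # range 0 to 1

    def problems(self):
        """Return one message for each value outside its documented range."""
        checks = [
            ("storage_units", 1 <= self.storage_units <= 6, "must be between 1 and 6"),
            ("pivots", 256 <= self.pivots <= 1024, "must be between 256 and 1024"),
            ("probability", 0 <= self.probability <= 1, "must be between 0 and 1"),
            ("accepted_error", self.accepted_error >= 1, "must be at least 1"),
            ("sandalphon_range", 0 <= self.sandalphon_range <= 1, "must be between 0 and 1"),
            ("sandalphon_iterations", self.sandalphon_iterations >= 1, "must be at least 1"),
            ("sandalphon_percentage", 0 <= self.sandalphon_percentage <= 1, "must be between 0 and 1"),
        ]
        return [f"{name} {message}" for name, ok, message in checks if not ok]

    def memory_gb(self):
        """Memory implied by the storage units, at 2 GB per unit."""
        return self.storage_units * 2


print(AngelSettings().problems())                 # [] for the defaults
print(AngelSettings(storage_units=7).problems())  # one message
```

For example, three storage units correspond to 6 GB of memory, and a value of 7 is rejected because the documented maximum is 6.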
Once all desired specifications have been entered and you have given your Angel a name, click “Create”.
You will be taken to the “Your Angels” page where you can see the status of the Angels that you have created. Once your Angel’s status is “COMPLETED” it is ready to generate visualizations. Depending on the file size it may take a few seconds to a few minutes for an Angel to complete initialization.
You now have a functional Sandalphon Angel which has identified clusters in the data and is ready to create visualizations. Let’s try it out!
Sandalphon outputs information in two ways: as a table with each data point assigned to a cluster, which can be viewed through the simMachines platform or downloaded as a CSV file, or as a visualization depicting the identified clusters. Both options are accessed from the “Angel Actions” dropdown on the “Your Angels” screen.
Downloaded results provide all data points with a “clusterID” field denoting their assigned cluster.
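Once downloaded, the results can be summarized per cluster with a few lines of Python. The sketch below counts the values of one field within each “clusterID”. It assumes the download is a comma-separated file with a header row that names the original columns alongside “clusterID”; that layout is an assumption. The rows in `EXAMPLE` are illustrative only, and their cluster assignments are invented, not real Sandalphon output.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_clusters(csv_text, field):
    """Count the values of `field` within each cluster."""
    summary = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        summary[row["clusterID"]][row[field]] += 1
    return summary


# Illustrative rows; the clusterID values here are made up for the example.
EXAMPLE = """OFFENSE,METHOD,DISTRICT,clusterID
HOMICIDE,GUN,SEVENTH,0
HOMICIDE,GUN,SEVENTH,0
SEX ABUSE,OTHERS,SECOND,1
"""

for cluster_id, counts in summarize_clusters(EXAMPLE, "OFFENSE").items():
    print(cluster_id, dict(counts))
```

To run this against a real download, read the file's contents into a string and pass it in place of `EXAMPLE`.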
To view existing visualizations, or create new ones, select “Visualizations” from the “Angel Actions” dropdown on the “Your Angels” screen.
A visualization is produced when a Sandalphon Angel is first generated. This visualization includes all data and clusters. Additional visualizations can be created using only selected fields, or with varied visualization settings.
New visualizations can be produced at the bottom of the page. Columns can be turned off by toggling “USE” to “IGNORE”.
Click “View Visualization” to see a visualization of the data. There are three views: Circular Graph, Sunburst, and Cluster List.
Each outermost segment of a Circular Graph visualization represents a cluster, composed of the elements of that segment and of the segments nearer the center to which it is attached.
Here we see that there is a large cluster of thefts without a knife or gun occurring in district two. We can mouse over each cluster to see its details, or click on it to be brought to a chart detailing the cluster’s elements. Let’s select a cluster with overlapping elements (these will be the ones extending into additional rings).
This graph shows a cluster of sex abuse crimes occurring primarily in the seventh district without a weapon, but also shows that there are shared traits with crimes committed with knives and with crimes in the first district.
At the bottom of the page is a list of the elements in the cluster, each with a measure of its distance from the center of the cluster on a scale of 0 to 1.
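To give a feel for a distance on a 0 to 1 scale, the Python sketch below uses a simple stand-in measure for NOMINAL fields: the fraction of fields on which an element differs from the cluster center. This is not Sandalphon's actual distance function, which the tutorial does not describe; the function names are hypothetical, and the default threshold of 0.2 merely echoes the Sandalphon Range default.

```python
def mismatch_distance(element, center):
    """Fraction of fields whose values differ (0 = identical, 1 = all differ).

    A stand-in measure for illustration, not the platform's own distance.
    """
    if element.keys() != center.keys():
        raise ValueError("element and center must have the same fields")
    if not element:
        return 0.0
    differing = sum(1 for key in element if element[key] != center[key])
    return differing / len(element)


def within_range(element, center, max_distance=0.2):
    """True if the element lies within max_distance of the center."""
    return mismatch_distance(element, center) <= max_distance


center = {"OFFENSE": "HOMICIDE", "METHOD": "GUN", "DISTRICT": "SEVENTH"}
element = {"OFFENSE": "HOMICIDE", "METHOD": "OTHERS", "DISTRICT": "SEVENTH"}

print(mismatch_distance(element, center))  # one of three fields differs
print(within_range(element, center))
```

Under this stand-in measure, an element that differs from the center in one of three fields has a distance of one third, so it falls outside a range of 0.2.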
A Sunburst Graph is similar in form to the Circular Graph, but has the ability to zoom into a keyword and show which clusters are associated with it. To switch to a Sunburst Graph, click “Sunburst Graph” on the top right of the “Visualization” page.
As with the Circular Graph, the outermost ring of the Sunburst Graph represents the clusters.
Click one of the inner rings to see only the clusters associated with that keyword.
A list of all clusters can be seen by clicking the “All Clusters” button at the top right of the page. Selecting “View” on a listed cluster will bring you to the same chart accessed by clicking a cluster in the visualization.