Gaghiel Tutorial (Social Tags in Web Pages)

In this tutorial we build a recommendation system that recommends similar tags for a page, based on the tags that users have previously annotated. To follow the tutorial step by step, download the example data file from this link.

Angel Description

Our recommendation system Gaghiel is based on the Ramiel similarity search engine, and therefore we are able to include several nearest neighbor predictors in the recommendation ensemble. A new level of recommendations is possible!


File Specification:

Our Gaghiel engine expects TSV files with three types of columns. These columns are:

ID        ID columns represent the key of an object.
ITEMS     A list of nominal values separated by ‘;’.
WEIGHTS   The weight for each item, separated by ‘;’.

The file must meet these specifications:

  • The file should be in TSV (tab-separated values) format; the column separator is the tab character, ‘\t’.
  • The items column and the weights column each hold a concatenated list of values. Both lists must have the same length.
  • The separator within these lists is a semicolon, ‘;’.
  • We do not expect to have a header.
  • The IDs can be nominal or numeric values.
  • Item values could be nominal or numeric values.
  • The weights must be numeric values (we are going to parse them as double).
  • All files in the folder are read, so every file must follow this format.
  • TSV quote character: ‘"’.
  • TSV line end: ‘\n’.
  • TSV escape character: ‘\’.
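
Before uploading, it can help to check a file against the specifications above. The following is a minimal sketch in Python using only the standard library; the function names parse_row and read_records are made up for this illustration and are not part of Gaghiel.

```python
import csv
import io


def parse_row(row):
    """Turn one 3-column row into (object_id, [(item, weight), ...])."""
    if len(row) != 3:
        raise ValueError(f"expected 3 columns, got {len(row)}")
    object_id, items_field, weights_field = row
    items = items_field.split(";")
    # Weights must be numeric; float() raises ValueError otherwise.
    weights = [float(w) for w in weights_field.split(";")]
    if len(items) != len(weights):
        raise ValueError(
            f"{object_id}: {len(items)} items but {len(weights)} weights"
        )
    return object_id, list(zip(items, weights))


def read_records(stream):
    """Read header-less TSV text with the quote and escape characters listed above."""
    reader = csv.reader(stream, delimiter="\t", quotechar='"', escapechar="\\")
    return [parse_row(row) for row in reader if row]


# One row from the example file, written with real tab separators.
sample = "http://lisahistory.net/wordpress/\tteaching;web2.0;blogs\t4;2;4\n"
records = read_records(io.StringIO(sample))
print(records)
```

A row with a non-numeric weight, or with lists of different lengths, raises a ValueError, so problems can be fixed before the file reaches the engine.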

Example:

This file contains 144,574 unique URLs, all of them with their corresponding social tags. This set of documents is annotated with 67,104 different tags.

ID ITEMS WEIGHTS
http://lists.osafoundation.org/pipermail/dev/2003-March/000479.html standards;python;programming;style;guidelines;conventions;reference;coding 11;27;14;8;2;3;4;7
http://linuxappfinder.com/alternatives ubuntu,;application;alternative;opensource 2;2;2;13
http://live.gnome.org/Tomboy tomboy;software;gnome;notes;wiki;linux 5;3;4;2;3;3
http://lisahistory.net/wordpress/ teaching;web2.0;blogs 4;2;4
http://literature-map.com/ flash;reading;writing;authors;author;tools;search;library;recommendations;culture 11;35;10;23;62;11;19;90;48;23
http://littlemewhatever.deviantart.com/ deviantart;inspiration;photography 8;2;5
http://livemobile.blogspot.com/ treo;phone;services;cell;blog;free;cellphone;apps;pda;java 5;23;3;4;8;4;2;15;8;3
http://listverse.com/bizarre/top-10-most-bizarre-videos/ cool;animation;videos;movies;art;list;bizarre;fun;strange;cinema;youtube;humor;weird 7;2;24;10;3;4;20;5;6;3;2;4;20
http://listverse.com/entertainment/top-15-indie-films/ cool;film;movies;lists 2;5;7;2
http://livedocs.adobe.com/flex/2/langref/ flash;development;as3;framework;software;programming;library;actionscript;documentation;api 60;18;56;30;172;96;36;71;24;54
http://litemind.com/getting-to-yes/ summary;book;lifehacks;negotiation;toread;litemind;mindmap;business 2;3;4;6;2;3;4;4
http://lip.sourceforge.net/ctreemap.html programming;tree;layout;treemaps;visualisation;information;linux;interface;geometry;maps;java;design 15;5;3;6;5;10;5;17;6;4;4;23
http://livedocs.adobe.com/labs/air/1/jslr/ javascript;apollo;documentation;api;adobe;air;flex;reference 17;5;6;3;10;20;8;12
http://lisa.sourceforge.net/ agents;clos;rete;opensource;common-lisp;code;agent;lisp;prolog;system;ai;software;programming;lisa 29;11;13;6;6;5;6;91;3;3;50;25;30;18
http://linuxwireless.org/en/users/Drivers/b43#devicefirmware drivers;firmware;wifi;wireless;broadcom;linksys;driver;bcm43xx;ubuntu;linux;b43 7;3;13;13;9;2;2;8;8;16;2
http://lionet.info/asn1c/ opensource;free;software;tools;programming;encoding;api;compilers;c/c++;asn.1;c;xml;c++;compiler;asn1 21;4;14;3;18;2;4;3;15;38;14;5;4;24;12
http://linuxbeans.blogspot.com/2007/10/image-handling-in-seam-apps-part-i-db.html work;seam;tutorial;image;jboss 2;14;8;4;6
http://linuxdevices.com/news/NS5410498949.html tiny;linux;embedded 2;5;2

 

This file has three columns. The first column is the URL of the page. The second column is a semicolon-separated list of the social tags attached to the page. The last column is a semicolon-separated list that gives, for each tag, the number of users who annotated the page with that tag.

Note: the lists in the second and last columns must have the same length, because every item in the second column (tags) needs an associated weight in the last column (number of users who annotated the tag).
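
To make the pairing concrete, the sketch below matches each tag with its user count by position and ranks the tags of one example row. It is only an illustration of how the two columns relate; the helper name rank_tags is an assumption and nothing here is part of the Gaghiel engine.

```python
def rank_tags(items_field, weights_field):
    """Pair tags with their counts by position and sort by count, descending."""
    tags = items_field.split(";")
    counts = [float(c) for c in weights_field.split(";")]
    if len(tags) != len(counts):
        raise ValueError("items and weights must have the same length")
    total = sum(counts)
    pairs = sorted(zip(tags, counts), key=lambda pair: -pair[1])
    # Each entry is (tag, count, share of all annotations on the page).
    return [(tag, count, count / total) for tag, count in pairs]


# Columns 2 and 3 of the deviantart row from the example file.
ranking = rank_tags("deviantart;inspiration;photography", "8;2;5")
for tag, count, share in ranking:
    print(f"{tag}: {count:.0f} users ({share:.0%} of annotations)")
```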

The purpose of this tutorial is to recommend similar tags to a page based on tags previously annotated by the users.


Angel Parameters Specification:

These are the parameters needed for the angel creation:

Storage Units         The angel unit size reserved for creation.
Parallelism           The number of replications for the angel that you want to create.
Ramiel K              The number of results for the nearest neighbor search.
Pivots                The number of primary search points in the engine.
Probability           Minimum accepted probability for the results; any result with a lower probability is discarded.
Accepted Error        Accepted difference between the distance calculated by the engine and the real distance.
Top Recommendations   The number of recommendations that Gaghiel is going to return.
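
To give an intuition for Ramiel K and Top Recommendations, here is a toy nearest-neighbor tag recommender in Python. It is not Gaghiel's or Ramiel's actual algorithm: the cosine similarity, the scoring rule, and the names cosine and recommend are all assumptions chosen for illustration, and Pivots, Probability and Accepted Error are not modeled. The pages are three rows taken from the example file.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse tag -> weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, pages, k, top_n):
    """Toy recommender: score the tags of the k pages most similar to the query."""
    # k plays the role of "Ramiel K": how many neighbors are kept.
    neighbors = sorted(
        pages.values(), key=lambda page: cosine(query, page), reverse=True
    )[:k]
    scores = defaultdict(float)
    for page in neighbors:
        similarity = cosine(query, page)
        for tag, weight in page.items():
            if tag not in query:  # only suggest tags the query lacks
                scores[tag] += similarity * weight
    # top_n plays the role of "Top Recommendations"; ties are broken by name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


pages = {
    "http://live.gnome.org/Tomboy": {
        "tomboy": 5, "software": 3, "gnome": 4, "notes": 2, "wiki": 3, "linux": 3,
    },
    "http://linuxdevices.com/news/NS5410498949.html": {
        "tiny": 2, "linux": 5, "embedded": 2,
    },
    "http://lisahistory.net/wordpress/": {"teaching": 4, "web2.0": 2, "blogs": 4},
}
suggestions = recommend({"linux": 1.0}, pages, k=2, top_n=3)
print(suggestions)
```

With k=2, the page about teaching is never consulted because it shares no tag with the query, which is the effect the nearest-neighbor step is meant to have.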

  • Create Folder

    • Click on “Create Folder” to create a container for your csv, tsv or json files that our similarity engine will search.

    • Provide the folder name and click on “Create Folder”.

    • Once the folder is created you will return to a folder list view.

  • Upload File(s)

    • In the folder that you created, click on “Upload File” to open the upload dialog.

    • After choosing your files click on “Upload File”.

    • You can see the progress bar while the file is being uploaded.

    • Once the files are uploaded you will return to a folder list view.

  • Create your Angel

    • Go to the “Create Angel” section to choose the angel that works for your project.

    • For this example we are going to create a Gaghiel. Click “Create” on the Gaghiel image.

    • The next step is to choose the folder containing the files that you want to use to train the angel. Choose the folder to see a preview of the files.

    • Then fill in the Gaghiel parameters (the default parameters will work fine) and choose a name for your angel.

      Click on “Create” to start the creation of the angel.

    • Once the creation has started, you can see a table with your current angels and the progress of the creation. When the state is running, the angel is ready to answer queries.

  • Query Your Angel

    • You can query the angel in two ways, Execute Query and Batch Query. Both options can be accessed from the “Your Angels” screen.

    • Execute Query

      Provide the values for the object that you want to query (items and weights separated by ‘ ; ‘), then click on “Execute Query”.

    • Another option is to use an object from a file to fill the query parameters. For this, choose a folder containing the files that you want to use to query the angel, then click on “Fill Query Fields” and “Execute Query” to obtain your result.

    • Batch Query

      First choose a folder containing the files that you want to use to query the angel, then click on “Execute Query”. This creates a batch process.

    • Once the batch query has started, you can see a table with your current batch files and the progress of the execution. When the state is completed, the batch is ready for download.
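
The query fields and batch files can also be prepared programmatically. The sketch below builds the items and weights strings for Execute Query and writes rows for a batch file. It assumes that batch files use the same three-column TSV format as the training files; the function names to_query_fields and write_batch and the ID "query-1" are made up for this example.

```python
import csv
import io


def to_query_fields(tag_weights):
    """Return the (items, weights) strings for a list of (tag, weight) pairs."""
    for tag, _ in tag_weights:
        if ";" in tag or "\t" in tag:
            raise ValueError(f"tag {tag!r} contains a separator character")
    items = ";".join(tag for tag, _ in tag_weights)
    weights = ";".join(str(weight) for _, weight in tag_weights)
    return items, weights


def write_batch(rows, stream):
    """Write (object_id, tag_weights) rows as header-less TSV lines."""
    writer = csv.writer(
        stream, delimiter="\t", quotechar='"', escapechar="\\", lineterminator="\n"
    )
    for object_id, tag_weights in rows:
        writer.writerow([object_id, *to_query_fields(tag_weights)])


# Values to paste into the items and weights fields of Execute Query.
items, weights = to_query_fields([("linux", 5), ("embedded", 2)])
print(items, weights)

# The same object as one line of a batch file (hypothetical ID).
buffer = io.StringIO()
write_batch([("query-1", [("linux", 5), ("embedded", 2)])], buffer)
batch_text = buffer.getvalue()
print(repr(batch_text))
```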

Erick Alpizar