
rajivsam/Starspace


Starspace Embeddings for Online Retail

Data encountered in business and socio-economic settings often have categorical attributes, such as gender or stock keeping unit (SKU). Unlike continuous quantities such as income or height, these attributes cannot be used directly in a geometric sense. You cannot plot them on a graph, for example. Many methods have been developed over the years to map such attributes to numeric representations. The Starspace library from Facebook is one of them: it learns numeric representations (embeddings) for these attributes. In this example, the Starspace library is used to determine numeric representations for data from an online retail store: https://archive.ics.uci.edu/ml/datasets/online+retail

Starspace provides several models for learning these representations. Many of the canonical examples come from natural language processing applications. This example instead uses Starspace to learn an embedding for retail data. Documentation for the Starspace use cases is available at: https://github.com/facebookresearch/StarSpace

In this example, we use the PageSpace user/page embedding model to determine representations from the invoice data. The example provided in the Starspace documentation learns representations for the content pages consumed by users, where pages consist of words. In the retail example, the items purchased by a user play the role of the pages, and Starspace can be used to determine a numeric (latent) representation for each item. The representation for a user is then simply the average of the representations of the items associated with him/her. The steps involved in determining a user representation are as follows:
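Written as a formula (the notation is ours, not taken from Starspace): if $P_u$ is the list of items purchased by user $u$ and $\mathbf{e}_i$ is the embedding learned for item $i$, the user representation is the mean of the item embeddings,

$$
\mathbf{u} = \frac{1}{|P_u|} \sum_{i \in P_u} \mathbf{e}_i
$$

so every user vector lives in the same space as the item vectors and has the same dimension.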

  1. Preprocess the data to remove records with missing data and miscellaneous charges, such as bank charges, associated with a purchase. We only want the items purchased by a user to capture his/her representation. Artifacts of the purchase, such as bank charges incurred, are not relevant to that representation.
  2. Explore the cleaned data and profile how customers purchase items in this online store. The notebook explore_data.ipynb provides these details.
  3. Generate the embeddings for the items using Starspace. To do this, we first need to prepare the data for use with Starspace. The page embedding model requires an input file with one line per user, where each line is a space-separated list of the items for that user. The notebook preprocessing_model_1_retail.ipynb is used to generate this input file.
  4. Determine the representation for each user. A user is represented by the average of all the items from his or her purchases: we look up the item representation (obtained in the previous step) for each of the user's purchases and average them. The notebook generate_user_representations.ipynb provides the details of the process.
  5. After computing the user representations, we can visualize them using a technique such as t-SNE. This is also shown in generate_user_representations.ipynb.
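The steps above (cleaning, building the per-user input lines, and averaging item vectors) can be sketched in plain Python. This is a minimal illustration, not the code from the notebooks. It assumes each record is a dict with the UCI Online Retail columns `CustomerID` and `StockCode`. The set `NON_ITEM_CODES` is a hypothetical, illustrative list of non-product codes that should be checked against the actual data. The embedding parser assumes a text file with one `token v1 v2 ...` line per item, like the `.tsv` file Starspace writes next to the binary model. The two-dimensional vectors in the usage lines are made-up toy values, not real Starspace output.

```python
from collections import defaultdict

# Hypothetical, illustrative set of non-product stock codes (bank charges,
# postage, manual entries, discounts); verify against the real data.
NON_ITEM_CODES = {"BANK CHARGES", "POST", "M", "D"}


def clean_records(records):
    """Step 1: drop rows with missing fields and non-item charges."""
    cleaned = []
    for rec in records:
        customer, code = rec.get("CustomerID"), rec.get("StockCode")
        if not customer or not code:
            continue  # missing data
        if code in NON_ITEM_CODES:
            continue  # artifact of the purchase, not an item
        cleaned.append((str(customer), str(code)))
    return cleaned


def build_starspace_lines(pairs):
    """Step 3: map each customer to one space-separated line of items."""
    items_by_customer = defaultdict(list)
    for customer, code in pairs:
        items_by_customer[customer].append(code)
    return {c: " ".join(items) for c, items in items_by_customer.items()}


def load_embeddings(lines):
    """Parse 'token v1 v2 ...' lines (tab- or space-separated)."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def user_vector(items, vectors):
    """Step 4: average the vectors of the items a customer bought.

    Repeated purchases are counted once per occurrence; items without an
    embedding are ignored. Returns None if no item has an embedding.
    """
    known = [vectors[i] for i in items if i in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


rows = [
    {"CustomerID": "17850", "StockCode": "85123A"},
    {"CustomerID": "17850", "StockCode": "71053"},
    {"CustomerID": "17850", "StockCode": "POST"},  # dropped: postage
    {"CustomerID": "", "StockCode": "84406B"},     # dropped: no customer
    {"CustomerID": "13047", "StockCode": "71053"},
]
lines = build_starspace_lines(clean_records(rows))
emb = load_embeddings(["85123A\t1.0\t0.0", "71053\t0.0\t1.0"])  # toy vectors
print(user_vector(lines["17850"].split(), emb))  # [0.5, 0.5]
```

Writing `lines.values()` one per line to a file gives the Starspace training input. If Starspace was run with a label prefix, the tokens in its embedding file carry that prefix and must be matched accordingly. For step 5, the resulting user vectors can be passed to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE` and its `fit_transform` method.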
