The Movie Recommender BoosterPack provides a deployable Sample Solution that lets users observe and study how a Collaborative Filtering model recommends items to users based on their past behaviour. The purpose of this document is to describe the Sample Solution and how it demonstrates the use of advanced technologies.
Any acronyms and terms identified with Capitalized Italics in this document are defined in the Glossary.
In today’s world, users face an overwhelming array of options in practically any online business they interact with, whether they are choosing a book, a TV show, a movie, a new electronic device, or even groceries. When users have tens, hundreds, or even thousands of options to choose from, it is essential that businesses provide tools to help them discover products of interest quickly.
Automatic recommendation systems powered by machine learning aim to solve this problem and have become an essential feature for content providers and retail sites. Recommender Systems use information such as users’ demographics, behaviour, product information, or product ratings provided by users to predict what each user will be most interested in at a particular time. Without recommendations, the user experience is greatly degraded: users are forced to spend too much time searching for items of interest and are more likely to abandon the task.
This Sample Solution showcases how TensorFlow and TensorRT can be used to build a Collaborative Filtering model for movie recommendations that runs on a GPU.
The solution is an end-to-end deep learning collaborative filtering recommendation system built from user data.
Collaborative Filtering is a widely used approach to implement recommender systems. Collaborative filtering methods are based on users’ behaviours, activities, or preferences and predict what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it can rely only on observed user behaviour, without requiring extra information from the user or the product, making it easily transferable to different business applications where that information may not be available.
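As a toy illustration of the “similarity to other users” idea (this is not code from the Sample Solution, and the ratings are made up), cosine similarity between users’ rating vectors captures how alike their observed behaviour is:

```python
import numpy as np

# Toy user-by-movie rating matrix (rows = users, 0.0 = unrated).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],  # user 0
    [4.0, 5.0, 0.0, 2.0],  # user 1: tastes similar to user 0
    [1.0, 0.0, 5.0, 4.0],  # user 2: different tastes
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 = identical taste)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_01 = cosine_similarity(ratings[0], ratings[1])
sim_02 = cosine_similarity(ratings[0], ratings[2])
print(sim_01 > sim_02)  # user 1 is the better source of recommendations for user 0
```

A real collaborative filtering system scales this idea to millions of users; the Sample Solution replaces the explicit similarity computation with a learned neural model, described next.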
The solution includes the creation of a Deep Learning model developed in Python and TensorFlow and its deployment to a production-like environment leveraging the NVIDIA TensorRT library and TensorRT Inference Server. Python is a widely used language in machine learning projects, for both beginners and experienced practitioners. TensorFlow and TensorRT are available as part of the DAIR Cloud infrastructure and allow the development of medium- to large-scale, high-performance deep learning models.
The Deep Learning model is a Multilayer Perceptron, as proposed in Neural Collaborative Filtering (He et al. 2017). The recommendation task is framed as a binary classification problem: a Neural Network is trained using movies watched and rated by users as positive examples and unwatched movies as negative examples.
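For illustration only (the Sample Solution’s actual architecture, hyperparameters, and weights will differ), the MLP variant of Neural Collaborative Filtering concatenates a learned user embedding and item embedding and feeds them through fully connected layers ending in a sigmoid. A minimal NumPy sketch of the forward pass, with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the real model's hyperparameters may differ.
n_users, n_items, emb_dim = 100, 50, 8

# Embedding tables and MLP weights (learned during training; random here).
user_emb = rng.normal(size=(n_users, emb_dim))
item_emb = rng.normal(size=(n_items, emb_dim))
W1, b1 = 0.1 * rng.normal(size=(2 * emb_dim, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.normal(size=(16, 1)), np.zeros(1)

def predict(user_id, item_id):
    """Forward pass: estimated probability that the user interacts with the item."""
    x = np.concatenate([user_emb[user_id], item_emb[item_id]])  # joined embeddings
    h = np.maximum(x @ W1 + b1, 0.0)                            # ReLU hidden layer
    logit = (h @ W2 + b2)[0]
    return 1.0 / (1.0 + np.exp(-logit))                         # sigmoid output

score = predict(3, 7)
print(score)  # a probability-like score in (0, 1)
```

At serving time the model scores candidate movies for a user and the highest-scoring unwatched ones are returned as recommendations.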
The model is trained using the MovieLens dataset, which contains movie information and user ratings, is publicly available for non-commercial purposes, and is commonly used in tutorials and research projects.
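The positive/negative example construction described above can be sketched as follows; the ratings here are toy data in a MovieLens-like shape, not the actual dataset files:

```python
import random

random.seed(42)

# Toy ratings in MovieLens-like form: (user_id, movie_id, rating).
ratings = [(0, 10, 4.0), (0, 11, 5.0), (1, 10, 3.0)]
all_movies = {10, 11, 12, 13}

def build_examples(ratings, all_movies, negatives_per_positive=1):
    """Positives from rated movies; negatives sampled from unwatched ones."""
    watched = {}
    for user, movie, _ in ratings:
        watched.setdefault(user, set()).add(movie)
    examples = []
    for user, movie, _ in ratings:
        examples.append((user, movie, 1))  # watched and rated -> positive
        unwatched = list(all_movies - watched[user])
        k = min(negatives_per_positive, len(unwatched))
        for neg in random.sample(unwatched, k):
            examples.append((user, neg, 0))  # unwatched -> negative
    return examples

examples = build_examples(ratings, all_movies)
```

Production pipelines typically refine this with more careful negative sampling (e.g. several negatives per positive, avoiding duplicates), but the labelling idea is the same.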
The diagram below illustrates the structure of the Sample Solution.
The solution consists of an orchestrator that runs on the DAIR Cloud platform to coordinate deployments and two Python sub-systems in the DAIR Cloud: an offline pipeline and an online service. The offline component implements the experimentation phase of a machine learning project: it generates a machine learning model from the data. The online sub-system is the real-time service that makes recommendations to users by making predictions from the model. The offline component is not executed or queried by users in real time; it may run (possibly several times) before the online service is launched, and periodically afterwards to create new, improved models as new data or model types become available. The online component includes a separate client that queries the prediction service, showcasing how a larger system could easily integrate the Sample Solution as a component.
The table below is a summary of the significant components of the Sample Solution:
|Orchestrator||Set of scripts to coordinate the deployment of the rest of the components into the DAIR Cloud service.|
|Data processing pipeline||Scripts to collect data from an external data source and process it to make it usable by the Model Trainer (clean-up, formatting, etc.).|
|Model Trainer||TensorFlow scripts to create, train and evaluate the machine learning model given the preprocessed data. The outputs are a model file and reports about the model performance.|
|Model Exporter||Transform and export the model generated by the Trainer into a TensorRT model.|
|GPU Inference Server||Online TensorRT server to make predictions given inputs using a trained model leveraging a GPU.|
|Light Client||A command-line script that acts as a sample client making online queries to the GPU inference server. Used by users of the recommender system.|
This section guides you through a demonstration of a Recommendation System. Using a Recommendation System is compelling because it provides value to users by narrowing the search of items to those that they could be more interested in.
The demonstration will illustrate how to get recommendations of movies for users that have previously watched and rated other movies.
If you’re a DAIR participant, you can deploy the Sample Solution by following the instructions below for deploying a new instance of the Solution. To complete these steps, you must have the following pre-requisites:
- an account on the CMS,
- an SSH key associated with your account profile AND possession of the private key,
- your account profile configured with Linux Account settings as shown below.
To deploy the Sample Solution, we will create a new GPU instance. First, log into the CMS, navigate through the menus to Provisioning -> Apps and then click the +ADD button.
On the NEW APP – TEMPLATE dialog, select MOVIEREC BLUEPRINT and click the NEXT button.
On the NEW APP – SETUP screen, type a name for your app and select the option DAIR-ATIR GPUs for the fields GROUP and DEFAULT CLOUD. To continue, click NEXT.
On the NEW APP – CONFIGURE dialog, navigate to the OpenStack row on the left menu to configure the instance. Many fields are already filled by the Blueprint. Just type a name for the instance, make sure the cloud is selected to DAIR-ATIR GPUs and select any IP from the list for the field FLOATING IP. Click NEXT.
Finally, review the summary shown in the NEW APP – REVIEW screen and click NEXT.
The CMS will now start the deployment of the Sample Solution, which takes about 6-9 minutes to complete. During deployment, the status icon for the application displays as a rocket ship; when deployment is complete, it changes to a green play button icon.
Finally, if you wish to get a more detailed view of the deployment process or find the floating IP for the app, navigate to Provisioning -> Instances and select the instance that is being deployed. You can view details of the deployment sequence in the HISTORY tab at the bottom of the page. And you can see the floating IP in the LOCATION field of the instance summary table.
Once the deployment is complete, you are ready to use the application. To do so, you need the IP of the app. You can find it on the instance detail page as described in the last step of the previous section.
First, log into the instance through SSH. If you need assistance, you can follow the DAIR Linux Technical Guide.
Once you are logged in, attach to the Docker container named trtclient by running the following command:
sudo docker attach trtclient
If it seems to hang, just press Enter again. You are now in the Docker container that holds the client command-line application. To start the client, run the following command:
The command-line application will show instructions on how to use it. Just type a number (for example, 0) and hit Enter to see movies recommended for a user. Instead of showing information about the user, the program shows some example movies that the user has watched and rated before.
If you want to exit the trtclient container without killing it, press CTRL-p CTRL-q. This detaches from the container so you can attach to it again later.
When done studying the Sample Solution, you need to terminate the application and release the consumed cloud resources. To do this, log out of the instance and follow the steps below.
Log into the CMS. Navigate to Provisioning -> Apps page and select the application you created. On the application details page, click the DELETE button and confirm in any subsequent dialogs.
While the CMS is deleting the application, the instance’s status will be shown as a garbage can on the Provisioning->Instances page. This step typically takes less than 1 minute to complete.
This section describes considerations for usage and adaptation of the Sample Solution.
The Sample Solution is deployed to a TensorRT inference server, but the TensorFlow model could be run directly in production without exporting it or running a TensorRT inference server. However, TensorRT is generally more efficient on GPUs, so predictions are typically faster than with other alternatives.
Moreover, in this example, for simplicity, both the client and the inference server run on the same host. But typically, they would run on different hosts exposing the appropriate ports between them.
An alternative to TensorFlow for building and training the model is PyTorch, another popular Python framework for machine learning that is also compatible with the TensorRT inference server. There are many comparisons online on when to choose one or the other, including Awni Hannun: PyTorch or TensorFlow?, TensorFlow or PyTorch: The Force is Strong with Which One?, PyTorch vs. TensorFlow — Spotting the Difference, and The Battle: TensorFlow vs Pytorch.
The Sample Solution relies on a dataset that, in this case, is a single text file. There are several considerations regarding the data: the pieces of code that rely on it, how to extend it, and best practices.
The two main components that directly use the dataset are the data pipeline, which loads it into memory for training, and the client, which maps predictions to actual movie names. Those are the pieces that would need updating if the data architecture changed. For example, if the data were stored in a database, the data pipeline would need to either dump the database to a file before running the current pipeline or query the database directly from the code. On the client side, the implementation could load the data into a hash map from a file or a database, or could query a database on every request.
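As an example, a hash-map-based client lookup might look like the following sketch. The pipe-separated line format is an assumption modelled on the MovieLens 100k item file and is not necessarily what the Sample Solution itself parses:

```python
import io

# Hypothetical MovieLens-style item lines: "movie_id|title|..." per movie.
sample = io.StringIO(
    "1|Toy Story (1995)|...\n"
    "2|GoldenEye (1995)|...\n"
)

def load_titles(lines):
    """Build an in-memory movie_id -> title map for the client to consult."""
    titles = {}
    for line in lines:
        movie_id, title = line.split("|")[:2]
        titles[int(movie_id)] = title
    return titles

titles = load_titles(sample)
print(titles[1])  # Toy Story (1995)
```

Swapping the file for a database would only change `load_titles` (or replace it with a per-request query); the rest of the client stays the same.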
To extend the data and incorporate new users or movies to the solution, it is necessary to retrain the model. A production solution would incorporate periodic retraining of the model and would provide personalized recommendations only after users have made enough use of the system to collect the data. Typically, similar projects incorporate generalized recommendations of popular items before providing personalized ones.
Finally, since the data may contain users’ information, developers implementing a similar solution should follow standard practices for critical data management. For instance, the training component only needs identifiers, not the specific user or item information. However, at prediction time, the client would need to identify a user (for example, via a login), translate that to an ID, and do the same for movies. Therefore, similar solutions should apply standard industry best practices in the components that collect and store data from users, export data for model training, and query data to translate IDs to plain text.
Once deployed, there is a minor risk that bad actors could gain access to the Sample Solution environment and modify it to mount a cyber attack (for example, perform a DoS attack). In order to mitigate the risk, the deployment scripts follow DAIR security ‘Best Practices’, such as:
- firewall rules restricting access to all ports except SSH port 22 on the deployed instance.
- access authorization controls that allow only authenticated DAIR participants to deploy and access an instance of the Sample Solution.
In addition, to limit security risks please follow the recommendations below:
- use security controls ‘as deployed’ without modification
- when finished using the reference solution, proceed to terminate the app (see section about termination earlier in this document).
Finally, as a stand-alone application, the reference solution does not directly consume network or storage resources while running. As such, those resources do not need any explicit control procedures.
The inference server can be queried by clients using HTTP or gRPC (Google Remote Procedure Call) protocols. There are not any specific networking considerations to highlight.
This Sample Solution uses a stateless model. That means that the same model can be deployed to many inference servers, implementing a standard highly scalable architecture where many requests can be sent in parallel and a load balancer spreads them through the inference servers.
The TensorRT inference server provides a health check API that indicates whether the server is able to respond to inference requests. This allows the inference server to be included like any regular host in a highly available architecture, where a load balancer can use the health check to stop forwarding requests to an unhealthy host, replace it, or start a new one.
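A minimal sketch of this pattern, with the health check stubbed as a dictionary (a real load balancer would instead call each inference server's health check API over the network):

```python
import itertools

# Stubbed health status; a real balancer would query each inference
# server's health check endpoint to decide where to forward requests.
health = {"server-a": True, "server-b": False, "server-c": True}

servers = ["server-a", "server-b", "server-c"]
ring = itertools.cycle(servers)

def pick_server():
    """Round-robin over the servers, skipping any that fail the health check."""
    for _ in range(len(servers)):
        server = next(ring)
        if health[server]:
            return server
    raise RuntimeError("no healthy inference servers available")

chosen = [pick_server() for _ in range(4)]
print(chosen)  # requests alternate between the two healthy servers
```

Because the model is stateless, any healthy server can answer any request, which is what makes this simple round-robin scheme valid.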
The Sample Solution only provides a simple command-line interface meant to showcase the backend. The UI would depend on the specific application and it is out of the scope of this example.
The code is regular Python, organized in a modular manner, and includes extensive code comments. Thus, developers can easily extend it to create custom solutions.
This solution requires a single GPU instance in DAIR, whose equivalent value is approximately $100 / month in a public cloud.
All the libraries used in this sample solution are open source, and the Movie Recommender code itself is open source as well. The MovieLens dataset is available for non-commercial use under certain conditions. See detailed licensing information below. If you plan to use, modify, extend or distribute any components of this Sample Solution or its libraries, you must consider conformance to the terms of the licenses:
- TensorFlow license
- CUDA license
- TensorRT Inference Server license
- Movie Recommender Sample Solution license
- MovieLens 100k license
- MovieLens 1M license
- MovieLens 20M license
The source code for the solution is available to DAIR participants at: https://code.cloud.canarie.ca:3000/carlamb/MovieRecommender. Please refer to the README.md file for instructions on how to clone and use the repository.
|API||Application Programming Interface.|
|CMS||Cloud Management System.|
|Collaborative Filtering||Collaborative Filtering is a technique used by Recommender Systems to make automatic predictions about the interests of a user by collecting preferences or ratings from many users (collaborating).|
|CUDA||Compute Unified Device Architecture. It is a parallel computing platform and programming model from NVIDIA.|
|DAIR||Digital Accelerator for Innovation and Research. This document refers to the DAIR Pilot released in Fall 2019.|
|Deep Learning||Deep learning is part of a broader family of machine learning methods based on artificial neural networks.|
|GPU||Graphics processing unit. A hardware component with high performance for parallel processing.|
|Recommender System||A recommender system or a recommendation system is a Machine Learning model that seeks to predict the “rating” or “preference” a user would give to an item. They provide suggestions of relevant items to users.|