Train a Simple Recommendation Engine using Azure Machine Learning Designer

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Hi, everyone! I am Paschal Alaemezie, a Gold Microsoft Learn Student Ambassador and a student at the Federal University of Technology, Owerri (FUTO). I am interested in artificial intelligence, software engineering, and emerging technologies, and in applying what I learn from them to write about and build cool solutions to the challenges we face. Feel free to connect with me on LinkedIn and GitHub or follow me on X (Twitter).

Have you ever logged in to a popular online store before buying anything there? Did you notice that most of the recommended items were similar to the ones you had just viewed, or matched your demographics? Or have you watched a video for a while on a popular streaming platform and then noticed that similar videos were recommended to you? This is the work of recommendation engines, which modern businesses use to make their platforms more interactive and to deliver satisfying user experiences.

Movies, restaurants, books, shoes, and songs are examples of items that might be recommended to users. A user is any entity with item preferences: a person, a group of people, or any other type of entity you can imagine.

In this article, we will train a simple recommendation engine using Azure Machine Learning Designer, the graphical, drag-and-drop interface of Azure Machine Learning. To follow along, you will need an Azure subscription. In my next article series on AI, I will show you how you can build amazing solutions using the new Azure AI Studio.

If you are a student, you can use your university or school email to sign up for a free Azure for Students account and start building on the Azure cloud with a free $100 Azure credit.

Approaches to Building Recommendation Engines

There are three main approaches to building recommendation engines:

The content-based approach: Recommendations are based on the similarity of users or items. Users can be described by properties such as age or gender. Items can be described by properties such as the author or the manufacturer. Typical examples of content-based recommendation systems can be found in online stores.
The collaborative filtering approach: This approach uses only the identifiers of the users and the items, together with a matrix of ratings given by the users to the items. The main source of information about a user is the list of items they have rated and their similarity to other users who have rated the same items. The SVD recommender module in Azure Machine Learning Designer, which is based on the Singular Value Decomposition algorithm, takes exactly these inputs (user identifiers, item identifiers, and ratings) and is a typical example of a collaborative filtering recommender.
The hybrid approach: This approach combines the content-based and collaborative filtering approaches so that it can serve both users who have ratings and cold-start users (new users who have no ratings yet). Its benefit is that it draws on the strengths of both kinds of recommender to produce a combined recommendation. An example of a hybrid online recommendation engine is Azure AI Personalizer, which uses reinforcement learning to help you create optimized user experiences and add real-time relevance to product recommendations.
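To make the collaborative filtering idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. The users, movies, and ratings are hypothetical toy data, and the technique shown (cosine similarity over co-rated items, then a similarity-weighted average of other users' ratings) is a simple neighbourhood method, not the SVD algorithm used later in this article:

```python
from math import sqrt
from typing import Optional

# Hypothetical toy ratings: user -> {movie: rating}
ratings = {
    "alice": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
    "bob":   {"Movie A": 4, "Movie B": 2, "Movie D": 5},
    "carol": {"Movie B": 5, "Movie C": 1, "Movie D": 2},
}


def cosine_similarity(u: dict, v: dict) -> float:
    """Cosine similarity computed only over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def predict_rating(user: str, movie: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[movie]
        den += abs(sim)
    return num / den if den else None  # None: nobody else rated this movie


print(predict_rating("alice", "Movie D"))
```

Alice's predicted rating for Movie D lands closer to Bob's 5 than to Carol's 2, because Alice's past ratings agree more closely with Bob's. Real systems usually mean-center ratings first; plain cosine similarity is used here only to keep the sketch short.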

Activities

We will use the Train SVD Recommender module available in Azure Machine Learning Designer to train a movie recommendation engine. We will adopt the collaborative filtering approach: the model learns from a collection of ratings that users gave to a subset of a catalogue of movies. We use two open datasets available in Azure Machine Learning Designer: the Movie Ratings dataset, joined on the movie identifier with the IMDB Movie Titles dataset.

We will both train the engine and score new data to demonstrate the different modes in which a recommender can be used and evaluated. The trained model will predict the rating a user would give to movies they have not yet seen, so we can recommend the movies that user is most likely to enjoy. This is a no-code approach to training recommendation engines with Azure Machine Learning Designer.

Activity 1: Create a New Training Pipeline

Step 1: Setting up your Azure Machine Learning workspace

In the Azure portal, click on Create a resource.

Search for Azure Machine Learning and select it. In the Azure Machine Learning window, click on Create, and select New workspace.

Step 2: In the Basics section:

For the Resource details:

Select your Subscription from the drop-down menu.
Select your Resource group. If you have an existing resource group, select it from the drop-down menu. Otherwise, click Create new to create a new resource group, and then click OK.

For the Workspace details:

Workspace name: Provide any name of your choice, for example, Movie-recommender. The name you choose should be unique in the resource group.
Region: select any region of your choice.
Then, click on Review + create.

When your workspace passes the validation process, click on Create.

When your deployment is completed, click on Go to resource if you want to view your resource.

Step 3: Open Pipeline Authoring Editor

In the Azure portal, open the available machine learning workspace that you provisioned.

In the workspace, scroll down to the Launch studio button and click it. Azure Machine Learning studio will open in a new tab in your web browser.

From the studio, select Designer in the navigation pane on the left-hand side. This opens the Designer environment, where you can create a new pipeline if none exists yet.

In the Designer environment, select the Classic prebuilt tab, and then click Create a new pipeline using classic prebuilt components. This opens the visual pipeline authoring editor.

Step 4: Add Sample Datasets

In the left navigation pane of the Authoring editor, click the Asset library and go to the Component section. Under Component, click on Sample data.

In Sample data, scroll down to Movie Ratings and IMDB Movie Titles, and drag and drop both datasets onto the canvas.

Step 5: Join the two datasets on Movie ID

Close the Sample data drop-down menu. From the Data Transformation section in the left navigation pane, select the Join Data prebuilt module, and drag and drop it onto the canvas.

Connect the output of the Movie Ratings module to the first input of the Join Data module.
Connect the output of the IMDB Movie Titles module to the second input of the Join Data module.

Select the Join Data module. Click the navigation button at the upper right of the canvas to open the Join Data module window.

Select the Edit column link to open the Join key columns for the left dataset editor. Select the MovieId column in the Enter column name field and click Save.

Select the Edit column link to open the Join key columns for the right dataset editor. Select the Movie ID column in the Enter column name field and click Save. Then, close the Join Data window.
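Conceptually, the Join Data step behaves like a database join. Here is a minimal pandas sketch of the same operation, assuming the module is left at an inner join; the column names mirror the join keys above, but the rows are hypothetical stand-ins for the two sample datasets:

```python
import pandas as pd

# Hypothetical stand-ins for the Movie Ratings and IMDB Movie Titles datasets
movie_ratings = pd.DataFrame({
    "UserId": [1, 1, 2],
    "MovieId": [10, 20, 10],
    "Rating": [8, 6, 9],
})
imdb_titles = pd.DataFrame({
    "Movie ID": [10, 20, 30],
    "Movie Name": ["Movie A", "Movie B", "Movie C"],
})

# Inner join: keep only ratings whose MovieId matches a Movie ID in the titles table
joined = pd.merge(
    movie_ratings,
    imdb_titles,
    how="inner",
    left_on="MovieId",
    right_on="Movie ID",
)
print(joined)
```

Movie C has no ratings, so an inner join drops it, and every remaining rating now carries a readable Movie Name.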

Step 6: Select Columns UserId, Movie Name, and Rating using a Python script

From the Python Language section in the left navigation, select the Execute Python Script prebuilt module. Drag and drop the selected module onto the canvas. Then, connect the Join Data output to the input of the Execute Python Script module.

Select Edit code to open the Python script editor, clear the existing code, and enter a short azureml_main function that returns only the UserId, Movie Name, and Rating columns from the joined dataset. Indent the lines of the function body (everything after the def line), as Python requires.
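A minimal version of such a script, assuming the standard azureml_main entry point that the Execute Python Script module calls with up to two input DataFrames and that must return a DataFrame wrapped in a sequence such as a tuple (hence the trailing comma on the return line). The if __name__ block only lets you try the function locally on a hypothetical sample row and is not needed inside the Designer:

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Keep the user, item, and rating columns, in that order
    selected = dataframe1[["UserId", "Movie Name", "Rating"]]
    # Return a one-element tuple containing the DataFrame
    return selected,


if __name__ == "__main__":
    # Hypothetical row shaped like the output of the Join Data step
    sample = pd.DataFrame({
        "UserId": [1], "MovieId": [10], "Rating": [8],
        "Movie ID": [10], "Movie Name": ["Movie A"],
    })
    (result,) = azureml_main(sample)
    print(result.columns.tolist())
```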

Step 7: Remove duplicate rows with the same Movie Name and UserId

From the Data Transformation section in the left navigation pane, select the Remove Duplicate Rows prebuilt module from the drop-down menu, and drag and drop the selected module onto the canvas.

Connect the first output of the Execute Python Script to the input of the Remove Duplicate Rows module.

Click the navigation button at the upper right of the canvas to open the Remove Duplicate Rows module window. Then select the Edit column link to open the column selector.

Enter the key columns used to identify duplicate rows: Movie Name and UserId. Then click Save. Rows that share the same Movie Name and UserId are treated as duplicates, and only one of them is kept.
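In pandas terms, this step resembles drop_duplicates with the key columns passed as the subset. A sketch on hypothetical rows, assuming the module keeps the first duplicate row (its Retain first duplicate row option):

```python
import pandas as pd

# Hypothetical rows: user 1 rated Movie A twice
ratings = pd.DataFrame({
    "UserId": [1, 1, 2],
    "Movie Name": ["Movie A", "Movie A", "Movie A"],
    "Rating": [8, 7, 9],
})

# Rows are duplicates when both Movie Name and UserId match; keep the first one
deduped = ratings.drop_duplicates(subset=["Movie Name", "UserId"], keep="first")
print(deduped)
```

Removing these duplicates ensures each user-movie pair contributes a single rating to training.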

Step 8: Split the dataset into a training set (0.5) and a test set (0.5)

From the Data Transformation section in the left navigation pane, select the Split Data prebuilt module, drag and drop it onto the canvas, and connect the output of the Remove Duplicate Rows module to the input of the Split Data module.

Click the navigation button at the upper right of the canvas to open the Split Data module window, and ensure that Fraction of rows in the first output dataset is set to 0.5.
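A 50/50 random row split can be sketched in pandas as follows. The rows are hypothetical, and random_state=0 is an arbitrary seed chosen here so the split is reproducible:

```python
import pandas as pd

data = pd.DataFrame({
    "UserId": [1, 1, 2, 2, 3, 3],
    "Movie Name": ["A", "B", "A", "C", "B", "C"],
    "Rating": [8, 6, 9, 7, 5, 4],
})

# First output (training set): a random half of the rows
train = data.sample(frac=0.5, random_state=0)
# Second output (test set): every row not picked for training
test = data.drop(train.index)
print(len(train), len(test))
```

Because rows are assigned at random, some users or movies in the test set may never appear in the training set. The model has learned nothing specific about them, which is the cold-start problem mentioned earlier.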

Step 9: Initialize Recommendation Module

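From the Recommendation section in the left navigation pane, select the Train SVD Recommender prebuilt module, and drag and drop it onto the canvas. Then, connect the first output of the Split Data module (the training set) to the input of the Train SVD Recommender module. The module expects user-item-rating triples, with user identifiers in the first column, item identifiers in the second, and ratings in the third, which is why the Python script in Step 6 selected UserId, Movie Name, and Rating in that order.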

Click the navigation button at the upper right of the canvas to open the Train SVD Recommender module window, and set the following options:
Number of factors: 200. This is the number of latent factors the recommender learns for each user and each movie.
Number of recommendation algorithm iterations: 30. This number indicates how many times the algorithm processes the input data. The default value is 30.
Learning rate: 0.001. The learning rate defines the step size for learning.
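The Train SVD Recommender hides its training loop, but the role of the three settings above can be illustrated with a minimal sketch of SVD-style matrix factorization (often called Funk SVD) trained by stochastic gradient descent. This is an illustration under stated assumptions, not the module's actual implementation: the function names train_mf and training_rmse, the global-mean baseline, the reg regularization term, and the random initialization are all choices made for this sketch. It is pure Python, so it is only suitable for toy data:

```python
import random
from math import sqrt


def train_mf(triples, n_factors=200, n_iterations=30, learning_rate=0.001,
             reg=0.02, seed=0):
    """Fit user and item factor vectors to (user, item, rating) triples with SGD.

    Defaults mirror the module settings above; `reg` (an L2 penalty) and the
    random initialization are assumptions of this sketch.
    """
    rng = random.Random(seed)
    mu = sum(t[2] for t in triples) / len(triples)  # global mean rating
    users = list(dict.fromkeys(u for u, _, _ in triples))  # unique, in order
    items = list(dict.fromkeys(i for _, i, _ in triples))
    # One small random vector of n_factors numbers per user and per item
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for i in items}

    order = list(triples)
    for _ in range(n_iterations):  # one pass over the data per iteration
        rng.shuffle(order)
        for u, i, r in order:
            pu, qi = P[u], Q[i]
            err = r - (mu + sum(a * b for a, b in zip(pu, qi)))
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                # Move each factor along the error gradient, scaled by the learning rate
                pu[f] += learning_rate * (err * qif - reg * puf)
                qi[f] += learning_rate * (err * puf - reg * qif)
    return mu, P, Q


def training_rmse(model, triples):
    """Root mean squared error of the model on the triples it was trained on."""
    mu, P, Q = model
    sq = [(r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))) ** 2
          for u, i, r in triples]
    return sqrt(sum(sq) / len(sq))


# Hypothetical toy data; fewer factors and a larger learning rate so it fits quickly
triples = [("u1", "Movie A", 5.0), ("u1", "Movie B", 3.0), ("u2", "Movie A", 4.0),
           ("u2", "Movie C", 1.0), ("u3", "Movie B", 4.0), ("u3", "Movie C", 2.0)]
model = train_mf(triples, n_factors=5, n_iterations=500, learning_rate=0.05)
print(training_rmse(model, triples))
```

With 200 factors, each user and each movie is represented by a vector of 200 numbers, and in this sketch a predicted rating is the global mean plus the dot product of the two vectors. A small learning rate such as 0.001 makes each update gentle, so the number of iterations determines how far training can progress.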

Step 10: Select Columns UserId, Movie Name from the test set

From the Data Transformation section in the left navigation pane, select the Select Columns in Dataset prebuilt module, and drag and drop it onto the canvas. Then, connect the second output of the Split Data module to the input of the Select Columns in Dataset module.

Click the navigation button at the upper right of the canvas to open the Select Columns in Dataset module window. Select the Edit column link to open the Select columns editor.

Enter the columns to include in the output dataset: UserId and Movie Name. Then click Save. The Rating column is left out because these are the user-movie pairs whose ratings the model will predict.

Step 11: Configure the Score SVD Recommender

From the Recommendation section in the left navigation pane, select the Score SVD Recommender prebuilt module, and drag and drop it onto the canvas.

Connect the output of the Train SVD Recommender module to the first input of the Score SVD Recommender module, which is the Trained SVD recommendation input.
Connect the output of the Select Columns in Dataset module to the second input of the Score SVD Recommender module, which is the Dataset to score input.

Open the Score SVD Recommender module window by clicking the navigation button at the upper right of the canvas. Set Recommender prediction kind to Rating Prediction. This option requires no other parameters.
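With Rating Prediction selected, scoring amounts to computing a predicted rating for every user-movie pair in the dataset to score. A minimal sketch, assuming a factorization model of the kind described in Step 9; the factor values, the GLOBAL_MEAN constant, the function name, and the fallback for unseen users or movies are hypothetical choices for illustration, not documented module behavior:

```python
# Hypothetical learned model: global mean plus two latent factors per user and movie
GLOBAL_MEAN = 3.0
user_factors = {"u1": [1.0, 0.5], "u2": [-0.5, 1.0]}
item_factors = {"Movie A": [0.8, 0.2], "Movie B": [-0.4, 0.6]}


def score_rating_prediction(pairs):
    """Return (user, movie, predicted_rating) for each (user, movie) pair."""
    scored = []
    for user, movie in pairs:
        if user in user_factors and movie in item_factors:
            # Predicted rating: mean plus dot product of user and movie vectors
            dot = sum(a * b for a, b in zip(user_factors[user], item_factors[movie]))
            pred = GLOBAL_MEAN + dot
        else:
            pred = GLOBAL_MEAN  # unseen user or movie: fall back to the mean
        scored.append((user, movie, pred))
    return scored


for row in score_rating_prediction([("u1", "Movie A"), ("u2", "Movie B"), ("u3", "Movie A")]):
    print(row)
```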

Step 12: Set Up the Evaluate Recommender Module

From the Recommendation section in the left navigation pane, select the Evaluate Recommender prebuilt module and drag and drop the selected module onto the canvas.

Connect the output of the Score SVD Recommender module to the second input of the Evaluate Recommender module, which is the Scored dataset input.
Connect the second output of the Split Data module (the test set) to the first input of the Evaluate Recommender module, which is the Test dataset input.

Activity 2: Submit Training Pipeline

In the Authoring editor, ensure that you have AutoSave enabled. Then click on Configure & Submit at the upper right-hand side of your screen.

In the Set up pipeline job window, in the Basics section, click Create new under Experiment name. Type a name for your new experiment and click Next at the bottom of the screen.

In the Inputs & outputs section, click the Next button at the bottom of the screen.

In the Runtime settings section, skip Default compute. Under Select compute type, select Compute instance from the drop-down menu. Then, under Select Azure ML compute instance, click Create Azure ML compute instance. The Create compute instance window opens.

In the Create compute instance window, type your compute name in the Compute name field. Then, select CPU under Virtual machine type.

While writing this article, I had to select a virtual machine before the Compute name field became available; you may or may not encounter this issue. I selected the Standard_D2_v2 virtual machine for this training. After that, click Review + Create at the bottom of the screen, which takes you back to the Runtime settings window.

Back in the Runtime settings window, under Select Azure ML compute instance, select the compute instance you created. Here, I selected the movie instance from the drop-down menu. Note that a newly created compute instance takes some time to be provisioned and to appear in the drop-down menu. Go to Advanced settings and ensure that the Continue on step failure box is checked. Then, click Review + Submit at the bottom of the screen.

At the Review + Submit section, ensure that your provided details are correct. Then, click the Submit button at the end of the screen.

Activity 3: Visualize Scoring Results

Step 1: When your pipeline has been submitted and model training has completed, go to Jobs under Assets in the left navigation pane and click the name of your completed pipeline.

Step 2: Visualize the Scored dataset

Go to the Score SVD Recommender module on the canvas and right-click on it. Select Preview data and click on Scored dataset.

Observe the predicted values under the column Rating.

Step 3: Visualize the Evaluation Results

Go to the Evaluate Recommender module on the canvas and right-click on it. Select Preview data and click on Metric.

Evaluate the model's performance by reviewing the evaluation metrics, such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Lower values mean the predicted ratings are closer to the actual ratings in the test set.
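Both metrics compare each predicted rating with the actual rating from the test set: MAE is the average absolute difference, and RMSE is the square root of the average squared difference, so it penalizes large errors more heavily. A minimal sketch of that computation, assuming scored and test rows are matched on the (user, movie) pair, as the Evaluate Recommender wiring in Step 12 implies; the function name and data are hypothetical:

```python
from math import sqrt


def evaluate_ratings(test_rows, scored_rows):
    """Return (MAE, RMSE) for predictions matched to test rows on (user, movie).

    Both arguments are lists of (user, movie, rating) tuples.
    """
    actual = {(u, m): r for u, m, r in test_rows}
    errors = [actual[(u, m)] - p for u, m, p in scored_rows if (u, m) in actual]
    if not errors:
        raise ValueError("no overlapping (user, movie) pairs to evaluate")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


test_rows = [("u1", "Movie A", 4.0), ("u2", "Movie B", 2.0)]
scored_rows = [("u1", "Movie A", 3.0), ("u2", "Movie B", 4.0)]
mae, rmse = evaluate_ratings(test_rows, scored_rows)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")  # prints MAE=1.500  RMSE=1.581
```

Here the errors are 1 and 2 rating points, so MAE is 1.5, while RMSE is about 1.58 because squaring gives the larger miss more weight.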

Conclusion

In our modern world, recommendation engines play a significant role in enhancing user experiences. These algorithms analyze user data to predict and suggest personalized content, from product recommendations in online stores to movie suggestions on streaming platforms. By adopting a data-driven approach and leveraging machine learning, businesses can create tailored experiences that resonate with users, ultimately driving engagement and retention.

Furthermore, training a recommendation engine with Azure Machine Learning Designer streamlines the process and opens up a world of possibilities for personalized user experiences. As we harness the power of Azure’s tools to refine our models, engaging with communities and resources that foster growth and innovation is equally important.

Whether you are an enthusiast or a professional, you can use these resources to stay informed and inspired as you embark on your AI journey:

Microsoft AI Discord Community is a dynamic space to discuss and share AI-related insights.
Global AI Community offers a platform to connect with peers worldwide.
Azure Samples provides practical code examples to enhance your projects.
Microsoft AI Show delivers the latest updates in AI technology.

If you seek deeper insights, “Mastering Azure Machine Learning” by Christoph Körner and Marcel Alsdorf provides valuable guidance on building robust recommendation systems within the Azure platform.
