The recommendations engine is written in Python and can be represented by the flow chart below.
Each block of the flow chart is described in detail in the sections that follow.
The CSV exporter block is written in PHP. It reads the user, content, and interaction (user-content) data for each tenant from the Totara database. The data is then saved as CSV files in the directory specified by the Site Administrator on the Recommender engine page (Quick-access menu > Plugins > Machine learning settings > Recommender engine) when the machine learning plugins are enabled.
The user and content data files contain metadata about the users and the content respectively, while the interactions data records whether a user has interacted with the content positively or only leisurely, and the time when the interaction happened.
The user metadata consists of:
The content metadata consists of:
The interactions data consists of:
This block is a Python class that reads the CSV files one tenant at a time and pipes them on for further processing.
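A minimal sketch of such a reader, assuming hypothetical file names (users.csv, content.csv, interactions.csv) and a per-tenant sub-directory; the plugin's actual class and file layout may differ:

```python
from pathlib import Path

import pandas as pd


class CsvReader:
    """Reads the exported CSV files for one tenant and pipes them onward."""

    def __init__(self, data_dir: str):
        # data_dir is the directory configured by the Site Administrator.
        self.data_dir = Path(data_dir)

    def read_tenant(self, tenant_id: int) -> dict:
        # Hypothetical file names; the actual export layout may differ.
        tenant_dir = self.data_dir / str(tenant_id)
        return {
            "users": pd.read_csv(tenant_dir / "users.csv"),
            "content": pd.read_csv(tenant_dir / "content.csv"),
            "interactions": pd.read_csv(tenant_dir / "interactions.csv"),
        }
```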
This is a decision block that is set by the Site Administrator from the Recommender engine page (Quick-access menu > Plugins > Machine learning settings > Recommender engine) when the machine learning plugins are enabled. The Site Administrator can choose one of the following modes: matrix factorisation, partial hybrid, or full hybrid.
Depending on the recommendation mode selected by the Site Administrator, one of the data processors transforms the data into a form that the subsequent process can consume. Each data processor outputs sparse matrices, so that memory is used efficiently.
This block ignores the user and content data and transforms only the interactions data into a format that the subsequent modules can consume.
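A sketch of what this transformation might produce, using SciPy's sparse matrix types; the index and weight values are illustrative, not the plugin's actual schema:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative values; in the real pipeline these indices come from the
# interactions CSV after mapping user/content IDs to matrix positions.
user_idx = np.array([0, 0, 1, 2])
item_idx = np.array([1, 3, 0, 2])
weight = np.ones(4)  # e.g. 1 per recorded interaction

# Only the non-zero entries are stored, which keeps memory usage low even
# when the full user x item matrix would be very large.
interactions = coo_matrix((weight, (user_idx, item_idx)), shape=(3, 4))
```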
This block uses the user and content metadata as well as the interactions data, transforming them for consumption by the subsequent process. It ignores only the free-text fields of the user data (city/town and profile description) and the content data (text description).
This block utilises all the fields in the user, content, and interactions data, including the free-text fields of the user and content data. The free-text fields are passed through a Natural Language Processing pipeline, where the text is cleaned, lemmatised (where possible), and converted to a matrix of TF-IDF features. The data sets are then transformed into a form that the subsequent process can consume.
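A sketch of the TF-IDF step under stated assumptions: scikit-learn's TfidfVectorizer stands in for whatever tooling the engine actually uses, and the cleaning and lemmatisation steps are only indicated in comments:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy free-text fields standing in for content descriptions.
descriptions = [
    "Introduction to workplace safety",
    "Advanced safety procedures for the workplace",
]

# Tokenisation and lowercasing are handled by the vectoriser; in the engine,
# cleaning and lemmatisation (where possible) happen before this step.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(descriptions)  # sparse matrix of TF-IDF features
```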
Depending on the recommendation mode selected by the Site Administrator, either the matrix factorisation approach (a sub-class of collaborative filtering) or the content-based filtering approach is used to build the machine learning model for recommendations.
The model is built using the standard matrix factorisation approach if the Site Administrator has chosen the matrix factorisation mode. During this process the model hyper-parameters, the latent dimension and the number of epochs, are tuned using the users' past interactions with the content. The final model is then built using the tuned hyper-parameters and passed on to the next stage.
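A sketch of what such a tuning loop could look like. It assumes the LightFM library (which this documentation names only for the hybrid modes) is also used for plain matrix factorisation, and the candidate hyper-parameter values and toy interactions data are invented for illustration:

```python
import numpy as np
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score
from scipy.sparse import coo_matrix

# Toy interactions standing in for the data processor's output: unique
# user-item pairs, each with an interaction weight of 1.
rng = np.random.default_rng(0)
n_users, n_items, n_obs = 50, 40, 400
pairs = rng.choice(n_users * n_items, size=n_obs, replace=False)
interactions = coo_matrix(
    (np.ones(n_obs), (pairs // n_items, pairs % n_items)),
    shape=(n_users, n_items),
)

# Hold out part of the past interactions to score each candidate setting.
train, test = random_train_test_split(interactions, test_percentage=0.2)

best_params, best_score = None, -1.0
for no_components in (16, 32, 64):   # candidate latent dimensions
    for epochs in (10, 20, 30):      # candidate epoch counts
        model = LightFM(no_components=no_components)
        model.fit(train, epochs=epochs)
        score = auc_score(model, test, train_interactions=train).mean()
        if score > best_score:
            best_params, best_score = (no_components, epochs), score

# Rebuild the final model on the full interactions data with the tuned values.
no_components, epochs = best_params
final_model = LightFM(no_components=no_components).fit(interactions, epochs=epochs)
```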
If the Site Administrator selects the partial hybrid or full hybrid mode, the content-based filtering algorithm is used to build the model. The data input for this algorithm includes the users' and items' metadata. The modelling algorithm is implemented via the LightFM library and is described in Maciej Kula (2015). Again, the hyper-parameters are tuned using the users' past interactions with the content and the provided user and content metadata, after which the final model is built using the tuned hyper-parameters. Note that this algorithm accepts data from either the Partial data processor or the Full data processor block, which means it can also accept and use the Natural Language Processed data.
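A sketch of fitting such a hybrid model with LightFM. The toy identity feature matrices stand in for the processed user and content metadata (and, in full hybrid mode, the TF-IDF features); passing them alongside the interactions is what makes the model hybrid rather than purely collaborative:

```python
import numpy as np
from lightfm import LightFM
from scipy.sparse import coo_matrix, identity

# Toy data standing in for the processed outputs of the data processor blocks.
rng = np.random.default_rng(0)
n_users, n_items = 50, 40
pairs = rng.choice(n_users * n_items, size=400, replace=False)
interactions = coo_matrix(
    (np.ones(400), (pairs // n_items, pairs % n_items)), shape=(n_users, n_items)
)
user_features = identity(n_users, format="csr")  # stand-in for user metadata features
item_features = identity(n_items, format="csr")  # stand-in for content metadata features

model = LightFM(no_components=32, loss="warp")
model.fit(
    interactions,
    user_features=user_features,
    item_features=item_features,
    epochs=20,
)
```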
Depending on the Site Administrator's settings at Quick-access menu > Plugins > Machine learning settings > Recommender engine, this block uses one of the models built in the previous block to predict:
The number of similar content items produced for each item of content is determined by the Site Administrator's settings at Quick-access menu > Plugins > Machine learning settings > Recommender engine. The content is sorted in descending order by its cosine similarity score with the given content.
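A sketch of this ranking, assuming the per-item representations are dense vectors such as LightFM's item_embeddings attribute; the random vectors here are only a stand-in:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for the model's per-item representations;
# shape is (number of items, latent dimension).
rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(40, 32))

similarity = cosine_similarity(item_vectors)  # pairwise cosine scores

item_id = 0
n_similar = 5  # in the engine this count comes from the Site Administrator's settings
# Sort in descending order of similarity, skipping the item itself.
most_similar = np.argsort(-similarity[item_id])[1 : n_similar + 1]
```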
The number of recommended content items for each user is likewise determined by the Site Administrator's settings. The recommended content for each user is sorted in descending order by prediction score. The prediction scores (or rankings) are not themselves interpretable; they are simply a means of ranking the items. The order of the prediction scores does matter, though: content with a higher prediction score is more likely to be of interest to the user than content with a lower score.
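A sketch of producing a ranked list for one user, again assuming LightFM and toy data; the raw score values are meaningless in isolation, which is why only their ordering is kept:

```python
import numpy as np
from lightfm import LightFM
from scipy.sparse import coo_matrix

# Toy interactions, as in the earlier sketches.
rng = np.random.default_rng(0)
n_users, n_items = 50, 40
pairs = rng.choice(n_users * n_items, size=400, replace=False)
interactions = coo_matrix(
    (np.ones(400), (pairs // n_items, pairs % n_items)), shape=(n_users, n_items)
)
model = LightFM(no_components=32).fit(interactions, epochs=10)

user_id = 3
scores = model.predict(user_id, np.arange(n_items, dtype=np.int32))  # one raw score per item
# Only the ordering of the scores matters, so sort descending and keep the top N.
n_recommend = 10  # in the engine this count comes from the Site Administrator's settings
recommended = np.argsort(-scores)[:n_recommend]
```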
Both outputs from the Obtain recommendations block (the list of similar content for each item of content, and the list of recommended content for each user) are written as CSV files by this block. The files are written to the same directory, set by the Site Administrator, where the input CSV files were stored.
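A minimal sketch of the writing step, with a hypothetical output file name and a placeholder for the configured directory:

```python
import pandas as pd

# Placeholder for the directory configured by the Site Administrator.
data_dir = "path/to/ml/data"

# Hypothetical column layout for the per-user recommendations output.
user_recommendations = pd.DataFrame(
    {"user_id": [3, 3, 3], "item_id": [12, 7, 25], "rank": [1, 2, 3]}
)
user_recommendations.to_csv(f"{data_dir}/user_recommendations.csv", index=False)
```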
This block reads the CSV files written by the CSV writer block and loads the data into the relevant Totara database tables for each tenant.
© Copyright 2020 Totara Learning Solutions. All rights reserved.