As of Totara 17 the recommender service (ml_recommender) is now deprecated. Please refer to the Machine Learning Service documentation to find out more about how Totara recommendations work.

What is the machine learning subsystem?

The machine learning (ML) subsystem provides a space for ML and artificial intelligence (AI) enrichment within the broader context of a running Totara instance.  

What is the recommender engine?

Totara Engage comes with a built-in implicit-feedback hybrid recommendations engine, LightFM, which is implemented in Cython. The recommendations engine combines collaborative filtering (i.e. interactions that users have with content) and content-based filtering (i.e. metadata of the content itself) to make personalised recommendations to users. Additionally, the recommendations engine suggests related content for each piece of Totara Engage content.

The recommendations engine itself does not talk directly with Totara Core, or with the database. Instead, data required by the recommendations engine is prepared by Totara Core and exported into a collection of CSV files (user data, content data and user-item interactions data). This data is read into and processed by the recommendations engine before the resulting recommendations are output in CSV format for automatic upload into the Totara database.

To allow for the differences between the many running instances of Totara Engage, the recommendations system has been designed to cover as many operational realities as possible. Sites with generously provisioned web servers (in terms of cores and RAM) can run the Full Hybrid mode, which uses basic natural language processing (NLP) to enrich the content features presented to the hybrid model, making it possible to provide fine-grained recommendations and suggestions. Installations running on more modest infrastructure may need to run Partial Hybrid mode (no NLP, but still including some content item metadata elements) or the base Matrix Factorisation (MF) mode. The main drawback of the base mode is its inability to make recommendations for users or items that have not been seen in the interactions data.

How the recommender engine works

User interactions with Totara Engage content (e.g. views, shares, likes, ratings) are tracked through event observers. The interactions data is exported to CSV, along with user and content data. These files serve as input to the recommendations engine. A host system scheduled task (i.e. server cron, not Totara cron) will manage the three-step process:

  1. Extract and pre-format user, content and interaction data into CSV files for the recommendation system.
  2. A Python script performs NLP to identify the strongest content features, submits the data to the LightFM model for processing and, finally, outputs items-to-user recommendations and related-items suggestions as CSV files.
  3. Upload of recommendations and suggestions into Totara Engage tables.

In a typical run, the Python script managing the machine learning process will perform the following steps:

  • Apply NLP to determine the most significant words per content piece, with respect to the corpus, by TF-IDF calculations (Term Frequency, Inverse Document Frequency)
  • Append the resultant vectors to the prepared document meta-data vectors
  • Split the interactions data into training and test sets
  • Calculate optimum hyper-parameter settings for the model
  • Train and fit the model
  • Compute recommendations per user and output these to a CSV file
  • Compute related content for each piece of content and output these to a CSV file
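The TF-IDF step above can be sketched in plain Python. This is a minimal illustration using only the standard library; the function name, toy corpus and scoring details are assumptions for demonstration, not the actual ml_recommender code:

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Score each word of each document against the whole corpus.

    corpus: a list of documents, each a list of lower-cased words.
    Returns one {word: score} dict per document.
    """
    n_docs = len(corpus)
    # Document frequency: how many documents contain each word at least once
    df = Counter(word for doc in corpus for word in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores

corpus = [
    ["totara", "engage", "playlist"],
    ["totara", "core", "cron"],
]
scores = tf_idf(corpus)
# "playlist" is distinctive to the first document, while "totara"
# appears in every document (IDF = log(2/2) = 0), so "playlist" scores higher.
assert scores[0]["playlist"] > scores[0]["totara"]
```

Words that are frequent in one document but rare across the corpus receive the highest scores; these are the "most significant words per content piece" appended to the item feature vectors.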

Trending content, another aspect of the recommendations system, is identified and updated within Totara Engage by a Totara cron scheduled task.

Recommendations and suggestions are presented to users in the front end via configurable blocks and tabbed panel displays. Front-end components retrieve the desired recommendations from the Totara database via GraphQL queries.

DB tables

ml_recommender_interactions: Tracks user interactions with content (e.g. view, like, share, comment)
ml_recommender_users: A list of content items that have been computed to be pertinent to the target user
ml_recommender_items: A list of content items that have been computed to be related, or similar, to the target item
ml_recommender_trending: A list of recently active content pieces, regularly updated by Totara cron

Main API - recommender plugin

There are many ways in which to implement machine learning (ML), whether for recommendation purposes or other forms of analysis. The current system design aspires to provide maximum flexibility while maintaining a relatively simple interface.

Exported data

Description of the data files exported by Totara for ML processing.

Interactions data (user_interactions.csv) columns

user_id (user identifier): The user who interacted with the content.
item_id (item identifier): The content item that was interacted with.
rating: In this implementation the rating is a consolidated implicit score of 1 (indicating an implied positive interaction) or 0 (not a positive interaction).
timestamp: Unix epoch timestamp of the most recent interaction by this user on this item of content.
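As a sketch of consuming this file, the interactions export can be read with the standard library csv module. The header spellings and sample values below are assumptions for illustration; confirm them against a real export before relying on them:

```python
import csv
import io

# A toy export in the documented column layout (header names assumed).
sample = io.StringIO(
    "user_id,item_id,rating,timestamp\n"
    "12,301,1,1672531200\n"
    "12,305,0,1672534800\n"
)

positives = []
for row in csv.DictReader(sample):
    # The rating is a consolidated implicit score: 1 = positive interaction.
    if int(row["rating"]) == 1:
        positives.append((int(row["user_id"]), int(row["item_id"])))

print(positives)  # [(12, 301)]
```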

Content item data (item_data.csv) columns

item_id (item identifier): The content item.

Variable number of columns representing content item metadata, including:

  • One-hot encoded vector of current system-wide topics (1 indicates topic is set for this item)
  • List of words comprising the title, description and/or textual content of the item

User data (user_data.csv) columns

user identifier: The user.

Variable number of columns representing user metadata:

  • Currently only the user's selected preferred language identifier

Imported data

Description of the data files output by the recommendations engine for import into the Totara database. If a custom processing engine is used instead of the default (./extensions/ml_recommender/python/ml_recommender.py) then care must be taken that the outputs are in the expected format, to ensure accurate upload into Totara.

Items recommended for users (i2u.csv)

uid: The user identifier.
iid: A recommended item identifier.
ranking: The relative ranking score for the item (the value has no meaning beyond sequencing, so it can span a range of negative or positive values).

Items related to items (i2i.csv)

target_iid: The item identifier.
similar_iid: Identifier of an item computed to be related to, or suggested for, the target item.
ranking: The relative ranking score for the item (the value has no meaning beyond sequencing, so it can span a range of negative or positive values).

Namespaces and classes

ml_recommender\loader\*: Provides lists of recommendations for articles, workspaces and playlists.
ml_recommender\local\export\*: Selects, formats and outputs data for submission to machine learning.
ml_recommender\local\import\*: Imports recommendations for users and content items.
ml_recommender\observer\interaction_observer: Logs registered user/item interactions (view, share, like, rating, comment).
ml_recommender\webapi\resolver\query\*: GraphQL query data providers for front-end components.
ml_recommender\webapi\resolver\type\*: GraphQL data definitions for front-end queries.

How to apply alternative modelling or processing of exported data

The default model (./extensions/ml_recommender/python/ml_recommender.py) may be replaced by any other machine learning (ML) implementation, provided the outputs are of the expected form. To develop your own model, it is recommended to take a copy of the standard export data and develop the custom modelling offline. Once you are satisfied with the outputs, implementation requires:

  1. Copy your custom code to ./extensions/ml_recommender/my_programming_lang/my_custom_dir (e.g. ./extensions/ml_recommender/python/custom/customscript.py).
  2. Modify the host execution script to run your custom code instead of the default code.

If desired, the existing ML script may be executed as normal and additional outputs from custom code can be appended to the output files for uploading into Totara.
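Whatever modelling a custom script performs, its final step must emit the import files described above. The sketch below shows only that output stage; the function name, sample rankings and the presence of a header row are assumptions, so check the default ml_recommender.py output before relying on the exact layout:

```python
import csv

def write_recommendations(i2u_rows, i2i_rows, out_dir="."):
    """Write recommendations in the layout the importer expects.

    i2u_rows: iterable of (uid, iid, ranking) tuples.
    i2i_rows: iterable of (target_iid, similar_iid, ranking) tuples.
    """
    with open(f"{out_dir}/i2u.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["uid", "iid", "ranking"])
        writer.writerows(i2u_rows)
    with open(f"{out_dir}/i2i.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["target_iid", "similar_iid", "ranking"])
        writer.writerows(i2i_rows)

# Ranking values only need to order the items; their absolute
# values carry no meaning, so negative scores are acceptable.
write_recommendations(
    i2u_rows=[(12, 301, 0.92), (12, 305, -0.4)],
    i2i_rows=[(301, 305, 1.7)],
)
```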

Best practices

Develop custom modelling offline with data exported from a live Totara instance - it is best if data is reasonably plentiful and as uncontrived as possible.

Do not use too many user or item features: an excess can lead to overfitting and result in very long processing times.

Apply feature normalisation wherever possible to minimise the range of values that features can assume (another potential cause of overfitting, or of failure to converge). For example, large continuous values should be transformed into one-hot encoded categorical data. If we were to include the interaction time of day as an extra feature in custom code, we should categorise the timestamp data (see the code snippet below) rather than submit the large continuous timestamp value as-is:

def _time_of_day(timestamp):
    # Hour of the day (0-23) from a Unix epoch timestamp (UTC)
    hour = (timestamp % 86400) // 3600
    if hour < 6:
        time_of_day = 'predawn'
    elif hour < 12:
        time_of_day = 'morning'
    elif hour < 18:
        time_of_day = 'afternoon'
    else:
        time_of_day = 'evening'

    return time_of_day
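The resulting category can then be one-hot encoded into a fixed set of feature columns. A minimal sketch (the category order and function name are arbitrary choices for illustration):

```python
CATEGORIES = ('predawn', 'morning', 'afternoon', 'evening')

def one_hot_time_of_day(label):
    # One column per category; exactly one column is set to 1.
    return [1 if label == cat else 0 for cat in CATEGORIES]

print(one_hot_time_of_day('morning'))  # [0, 1, 0, 0]
```

Each interaction then contributes four small binary features instead of one large continuous timestamp, which keeps feature values in a uniform range for the model.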