Recommender engine

What is the recommender engine?

Totara Engage comes with a built-in implicit-feedback hybrid recommender engine based on LightFM, which is implemented in Cython. The recommender engine utilises collaborative filtering (i.e. interactions that users have with content) and content filtering (i.e. metadata of the content itself) to make personalised recommendations to users. Additionally, the recommender engine suggests related content for each piece of Totara Engage content.

The recommender engine itself does not interface directly with Totara Core, or with the database. Instead, data required by the recommender engine is prepared by Totara Core and exported into a collection of CSV files (user data, content data and user-item interactions data). This data is read into and processed by the recommender engine before the resulting recommendations can be made available in the Totara instance.

To allow for the differences between the many running instances of Totara Engage, the recommender engine has been designed to cover as many operational realities as possible. Sites with generously provisioned web servers (in terms of CPU cores and RAM) can run the full hybrid mode, which uses basic natural language processing (NLP) to enrich the content features presented to the hybrid model, making fine-grained recommendations and suggestions possible. Installations running on more modest infrastructure may need to run partial hybrid mode (which does not use NLP, but still includes some content item metadata elements), or the base matrix factorisation (MF) mode. The main drawback of the base mode is that it cannot suggest items to users if those items have not been seen in the interactions data.
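The practical difference between the modes can be illustrated with a simplified LightFM-style scoring model. In this model an item's latent representation is the sum of the latent vectors of its features, and a preference score is the dot product of the user and item representations. The sketch below is a toy under stated assumptions, not the engine's code. The feature names, the two-dimensional vectors and the `item_embedding`/`score` helpers are all hypothetical, and bias terms are omitted. In base MF mode an item's only feature is its own identifier, so an item missing from the interactions data has no learned vector. In hybrid mode the same item can still be represented through its metadata.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical learned latent vectors for individual features (2 dimensions for brevity).
FEATURE_VECTORS: Dict[str, Vector] = {
    "item:42": [0.9, 0.1],       # identity feature of an item seen in the interactions data
    "topic:safety": [0.4, 0.6],  # metadata feature (topic)
    "word:ladder": [0.2, 0.3],   # metadata feature (word from title/description)
}


def item_embedding(features: List[str]) -> Vector:
    """Sum the latent vectors of an item's known features; unknown features contribute nothing."""
    total = [0.0, 0.0]
    for name in features:
        vec = FEATURE_VECTORS.get(name)
        if vec is None:
            continue  # no vector was learned for this feature
        total = [t + v for t, v in zip(total, vec)]
    return total


def score(user_vec: Vector, item_vec: Vector) -> float:
    """Predicted preference: dot product of user and item representations (biases omitted)."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


user = [1.0, 0.5]

# Base MF mode: a new item has only its identity feature, which was never learned.
new_item_mf = item_embedding(["item:99"])
# Hybrid mode: the same new item is represented through its metadata features.
new_item_hybrid = item_embedding(["item:99", "topic:safety", "word:ladder"])

print(score(user, new_item_mf))      # 0.0
print(score(user, new_item_hybrid))  # approximately 1.05
```

The zero score in MF mode is the cold-start limitation described above. Adding metadata features is what lets the hybrid modes rank unseen content.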

How the recommender engine works

User interactions with Totara Engage content (e.g. views, shares, likes, ratings) are tracked through event observers. The interactions data is exported to CSV files, along with user and content data, and these files serve as input to the recommender engine. A scheduled task on the host system (i.e. server cron, not Totara cron) manages the extraction and preformatting of user, content and interaction data into CSV files for the recommender engine.
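The following sketch shows one way individual tracked events might be consolidated into the one-row-per-user-item form used by the interactions export. It keeps the most recent timestamp for each user-item pair and sets an implicit rating of 1 if any event counts as positive. The event names, the `consolidate` helper and the rule for what counts as positive are assumptions for illustration only. The actual consolidation is performed by Totara Core and may differ.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw event record: (user_id, item_id, event_name, unix_timestamp).
Event = Tuple[int, int, str, int]

# Assumption: which event types imply a positive interaction.
POSITIVE_EVENTS = {"like", "share", "comment", "rating"}


def consolidate(events: Iterable[Event]) -> List[Tuple[int, int, int, int]]:
    """Collapse raw events into (user_id, item_id, rating, timestamp) rows."""
    rows: Dict[Tuple[int, int], List[int]] = {}
    for user_id, item_id, name, ts in events:
        rating, latest = rows.get((user_id, item_id), [0, 0])
        if name in POSITIVE_EVENTS:
            rating = 1  # any positive event marks the pair as a positive interaction
        rows[(user_id, item_id)] = [rating, max(latest, ts)]
    # Sort by (user_id, item_id) for deterministic output order.
    return [(u, i, r, t) for (u, i), (r, t) in sorted(rows.items())]


events = [
    (1, 10, "view", 1_600_000_000),
    (1, 10, "like", 1_600_000_500),
    (2, 10, "view", 1_600_001_000),
]
print(consolidate(events))
# [(1, 10, 1, 1600000500), (2, 10, 0, 1600001000)]
```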

In a typical run, the recommender engine will perform the following steps:

  • Apply NLP to determine the most significant words in each content item, relative to the whole corpus, using TF-IDF (term frequency-inverse document frequency) calculations
  • Append the resultant vectors to the prepared document metadata vectors
  • Split the interactions data into training and test sets
  • Calculate optimum hyper-parameter settings for the model
  • Train and fit the model
  • Compute recommendations per user
  • Compute related content for each piece of content
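The first step above can be sketched in plain Python as a minimal TF-IDF calculation. The sketch assumes each content item has already been reduced to a non-empty list of lowercase words. The `top_terms` helper is hypothetical, and so is the weighting, which uses raw-count term frequency with an unsmoothed log(N/df) inverse document frequency. The engine itself may use a library implementation with different weighting and smoothing.

```python
import math
from collections import Counter
from typing import Dict, List


def top_terms(corpus: Dict[str, List[str]], k: int = 2) -> Dict[str, List[str]]:
    """Return the k highest TF-IDF words for each (non-empty) document in the corpus."""
    n_docs = len(corpus)
    # Document frequency: number of documents containing each word at least once.
    doc_freq = Counter(word for words in corpus.values() for word in set(words))
    result = {}
    for item_id, words in corpus.items():
        tf = Counter(words)
        scores = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[item_id] = ranked[:k]
    return result


corpus = {
    "article_1": ["ladder", "safety", "ladder", "training"],
    "article_2": ["fire", "safety", "training"],
    "article_3": ["fire", "drill", "evacuation"],
}
print(top_terms(corpus))
# {'article_1': ['ladder', 'safety'], 'article_2': ['fire', 'safety'], 'article_3': ['drill', 'evacuation']}
```

Words that appear in many documents (such as "safety" here) receive a low inverse document frequency. Words concentrated in one document (such as "ladder") rank highest.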

Trending content, another aspect of the recommender engine, is identified and updated within Totara Engage by a Totara cron scheduled task.

Recommendations and suggestions are presented to users in the front end via configurable blocks and tabbed panel displays. Front-end components retrieve the desired recommendations from the Totara database via GraphQL queries.

Main API - recommender plugin

There are many ways in which to implement machine learning (ML), whether for recommendation purposes or other forms of analysis. The current system design aspires to provide maximum flexibility while maintaining a relatively simple interface.

Exported data

The following tables describe the CSV files that Totara exports for ML processing.

Interactions data (user_interactions.csv) columns

Column | Description
user_id (user identifier) | User who interacted with content.
item_id (item identifier) | The content item that was interacted with.
rating | In this implementation the rating is a consolidated implicit score of 1 (indicating an implied positive interaction) or 0 (not a positive interaction).
timestamp | Unix epoch timestamp of the most recent interaction by this user with this content item.
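The following is a minimal sketch of loading the interactions export with Python's standard `csv` module. It assumes the file has a header row using the column names above and that all four values are integers; neither assumption is confirmed here. `load_interactions` and `Interaction` are hypothetical names, and an in-memory sample stands in for user_interactions.csv.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Interaction(NamedTuple):
    user_id: int
    item_id: int
    rating: int     # implicit score: 1 = positive interaction, 0 = not positive
    timestamp: int  # Unix epoch seconds of the most recent interaction


def load_interactions(handle: TextIO) -> List[Interaction]:
    """Parse interaction rows, assuming a header row with the documented column names."""
    reader = csv.DictReader(handle)
    return [
        Interaction(
            user_id=int(row["user_id"]),
            item_id=int(row["item_id"]),
            rating=int(row["rating"]),
            timestamp=int(row["timestamp"]),
        )
        for row in reader
    ]


sample = io.StringIO(
    "user_id,item_id,rating,timestamp\n"
    "3,17,1,1650000000\n"
    "3,21,0,1650003600\n"
)
rows = load_interactions(sample)
positives = [r for r in rows if r.rating == 1]
print(positives)
```

With a real export you would pass an open file instead, e.g. `with open("user_interactions.csv", newline="") as f: rows = load_interactions(f)`.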

Content item data (item_data.csv) columns

Column | Description
item_id (item identifier) | The content item.

Variable number of columns representing content item metadata, including:

  • One-hot encoded vector of current system-wide topics (1 indicates topic is set for this item)
  • List of words comprising the title, description and/or textual content of the item
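The sketch below illustrates how a content item row might be assembled. It one-hot encodes the item's topics against the current system-wide topic list and then appends the item's words. The column order, the space-separated word list and the `item_row` helper are assumptions made for illustration; the actual export layout may differ.

```python
from typing import List


def item_row(item_id: int, item_topics: List[str], all_topics: List[str], words: List[str]) -> List[str]:
    """Build one hypothetical item_data.csv row: id, one-hot topic flags, then words."""
    topic_set = set(item_topics)
    # 1 marks a topic that is set for this item, 0 a topic that is not.
    one_hot = ["1" if topic in topic_set else "0" for topic in all_topics]
    return [str(item_id)] + one_hot + [" ".join(words)]


all_topics = ["compliance", "leadership", "safety"]
print(item_row(17, ["safety"], all_topics, ["ladder", "safety", "basics"]))
# ['17', '0', '0', '1', 'ladder safety basics']
```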

User data (user_data.csv) columns

Column | Description
user identifier | The user.

Variable number of columns representing user metadata:

  • Currently only the user-selected preferred language identifier

Namespaces and classes

Classes | Purpose
ml_recommender\loader\* | Provide a list of recommendations for articles, workspaces and playlists.
ml_recommender\local\export\* | Select, format and output data for submission to machine learning.
ml_recommender\local\import\* | Import recommendations for users and content items (deprecated).
ml_recommender\observer\interaction_observer | Log registered user-item interactions (view, share, like, rating, comment).
ml_recommender\webapi\resolver\query\* | GraphQL query data providers for front-end components.
ml_recommender\webapi\resolver\type\* | GraphQL data definitions for front-end queries.

How to apply alternative modelling, or processing, of exported data

(To be decided upon)

The default model (./extensions/ml_service/service/recommender/train_recommender.py) may be replaced by any other ML implementation, provided its outputs are in the expected form. To develop your own model, we recommend taking a copy of the standard export data and developing the custom modelling offline. Once you are satisfied with the outputs, implementation requires the following steps:

  1. Copy your custom code to ./extensions/ml_service/my_custom_dir (e.g. ./extensions/ml_recommender/python/custom/customscript.py).
  2. Modify the host execution script to run your custom code instead of the default code.

If desired, the existing ML script may be executed as normal and additional outputs from custom code can be appended to the output files for uploading into Totara.

Best practices

We recommend the following when working with the recommender engine:

  • Develop custom modelling offline with data exported from a live Totara instance; ideally the data should be reasonably plentiful and as uncontrived as possible.
  • Avoid using too many user or item features, as this can lead to overfitting and very long processing times.
  • Apply feature normalisation wherever possible to limit the range of values that features can take, since wide ranges are another potential cause of overfitting or failure to converge. For example, large continuous values should be transformed into one-hot encoded categorical data. To include the interaction time of day as an extra feature in custom code, categorise the timestamp data (see the code snippet below) rather than submitting the raw large-integer timestamp as is.
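def _time_of_day(timestamp):
    """Return a coarse time-of-day category for a Unix epoch timestamp (UTC)."""
    # Seconds since midnight UTC, converted to a whole hour in the range 0-23.
    hour = (int(timestamp) % 86400) // 3600
    if hour < 6:
        time_of_day = 'predawn'
    elif hour < 12:
        time_of_day = 'morning'
    elif hour < 18:
        time_of_day = 'afternoon'
    else:
        time_of_day = 'evening'

    return time_of_day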
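Once timestamps have been bucketed, each category can be expanded into one-hot columns, following the normalisation advice above. This sketch is self-contained, so it repeats a UTC hour-bucketing step in order to run on its own. The `tod_` column names and the `one_hot_time_of_day` helper are illustrative assumptions, not part of the default model.

```python
from typing import Dict

CATEGORIES = ("predawn", "morning", "afternoon", "evening")


def one_hot_time_of_day(timestamp: int) -> Dict[str, int]:
    """Map a Unix timestamp to one-hot time-of-day columns (hours taken in UTC)."""
    hour = (timestamp % 86400) // 3600  # whole hour, 0-23
    # Six-hour buckets: 0-5 predawn, 6-11 morning, 12-17 afternoon, 18-23 evening.
    category = CATEGORIES[hour // 6]
    return {f"tod_{name}": int(name == category) for name in CATEGORIES}


# 1650000000 is 2022-04-15 05:20:00 UTC.
print(one_hot_time_of_day(1650000000))
# {'tod_predawn': 1, 'tod_morning': 0, 'tod_afternoon': 0, 'tod_evening': 0}
```

Exactly one column is set to 1 for any timestamp. A large integer is therefore replaced by four bounded 0/1 features.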

© Copyright 2022 Totara Learning Solutions. All rights reserved.