Unfortunately, PHP is not well suited to machine learning and neural network tasks. There are many reasons for this, but in practice most ML software is written in Python or on the Java/.NET stacks. We have selected Python 3 for our machine learning stack, and further development will also be based on Python.
This means that, in addition to the standard requirements for a Totara installation, the server that processes ML data must have Python 3 installed along with all its dependencies.
The recommender engine can run on both Windows and Linux hosts that meet the following requirements:
- Python 3.6
- Python pip package installer
- CPU with multiple cores recommended
- 4GB RAM or more recommended
- GCC compiler (required to compile LightFM's Cython code)
- Either the same host that runs Totara, or a different host that shares a volume with the Totara instance
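The software requirements above can be checked with a short script run on the intended ML host. The following is a minimal sketch, not part of Totara: the function name check_environment is hypothetical, the 3.6 minimum is taken from the list above (assumed to mean "3.6 or later"), and the script looks for binaries named pip3/pip and gcc on the PATH (on Windows the compiler may be installed under a different name). It does not check RAM, which would need platform-specific calls.

```python
import os
import shutil
import sys

# Minimum version taken from the requirements list (assumed to mean "3.6 or later").
MIN_PYTHON = (3, 6)


def check_environment(min_python=MIN_PYTHON):
    """Return {requirement: (ok, detail)} for the host this interpreter runs on."""
    results = {}
    version = tuple(sys.version_info[:2])
    results["python"] = (version >= tuple(min_python), "%d.%d" % version)
    # pip may be installed as pip3 or pip depending on the distribution.
    pip = shutil.which("pip3") or shutil.which("pip")
    results["pip"] = (pip is not None, pip or "not found on PATH")
    # GCC is needed to compile LightFM's Cython code.
    gcc = shutil.which("gcc")
    results["gcc"] = (gcc is not None, gcc or "not found on PATH")
    # Multiple cores are recommended rather than required.
    cores = os.cpu_count() or 1
    results["multiple cores"] = (cores > 1, "%d logical CPUs" % cores)
    return results


if __name__ == "__main__":
    for name, (ok, detail) in check_environment().items():
        print("%-15s %-4s %s" % (name, "OK" if ok else "WARN", detail))
```

Run it with the same python3 binary that cron will use, so that the version check reflects the interpreter that actually runs the recommender.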
Hardware requirements depend widely on the amount and nature of the data in your instance, but we have prepared an indicative performance benchmark on two different data sets, run on two different AWS instances, which can be found below.
Installation on Debian-based Linux distributions:
Installation on Red Hat-based distributions:
The dependencies are listed in extensions/ml_recommender/python/requirements.txt and can be installed using pip3:
Make sure that the system user who will run the Python script has access to the installed dependencies (either install them system-wide, install them as that user, or use virtualenv).
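To confirm that the dependencies are visible to the user that will run the script, a sketch like the following compares requirements.txt against the packages installed for the current interpreter. It is a hypothetical helper, not shipped with Totara. It uses importlib.metadata, which is in the standard library from Python 3.8; on 3.6 or 3.7 the importlib_metadata backport from PyPI offers an equivalent API (import it as metadata instead). The requirement-line parsing is deliberately simple and skips pip options such as -r.

```python
import re
from importlib import metadata  # Python 3.8+; on 3.6/3.7 use the importlib_metadata backport

# A distribution name at the start of a requirement line, e.g. "numpy" in "numpy>=1.18".
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text):
    """Extract distribution names from requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(text):
    """Return the names from requirements.txt that are not installed for this interpreter."""
    missing = []
    for name in requirement_names(text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run it as the cron user (or inside the virtualenv), passing the contents of extensions/ml_recommender/python/requirements.txt; an empty list means every listed distribution was found.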
System cron tasks (processing)
The next step is installing the cron tasks. The current implementation works in three steps, each of which is run by a separate script:
- Export data (php server/ml/recommender/cli/export_data.php): run only when data needs to be processed (e.g. once a day).
- Process data with Python (eval "$(php server/ml/recommender/cli/recommender_command.php)"): can be run every 5 minutes; it checks whether the data has already been processed and exits if so.
- Import data back to the server (php server/ml/recommender/cli/import_recommendations.php): can be run every 5 minutes; it checks whether the data has already been imported and exits if so.
Run all of these scripts as a user that has full access to the selected data folder.
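When everything runs on one host, the three steps simply run in order. The sketch below illustrates that sequence in Python; it is not the prepared script Totara provides, and it rests on assumptions: php is on the PATH, the Totara root is the hypothetical /var/www/totara, and recommender_command.php prints a single command made of quoted arguments with no shell operators (like the sample Python command on this page), so shlex.split can stand in for eval.

```python
import shlex
import subprocess

# Hypothetical values: adjust to your PHP binary and Totara root directory.
PHP = "php"
TOTARA_ROOT = "/var/www/totara"
CLI = "server/ml/recommender/cli"


def run_pipeline(php=PHP, cwd=TOTARA_ROOT, run=subprocess.run):
    """Run export, Python processing and import in order; check=True stops at the first failure."""
    # Step 1: export data from Totara into the shared data folder.
    run([php, CLI + "/export_data.php"], cwd=cwd, check=True)
    # Step 2: recommender_command.php prints the Python command; execute that command
    # (the shell equivalent of eval). Assumes plain quoted arguments, no shell operators.
    printed = run([php, CLI + "/recommender_command.php"], cwd=cwd, check=True,
                  stdout=subprocess.PIPE, universal_newlines=True)
    run(shlex.split(printed.stdout), cwd=cwd, check=True)
    # Step 3: import the generated recommendations back into Totara.
    run([php, CLI + "/import_recommendations.php"], cwd=cwd, check=True)
```

Passing run as a parameter keeps the sequence testable; in production the default subprocess.run is used, and check=True raises CalledProcessError so a failed step prevents the later ones from running.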
App server and ML server are same host
If Python is configured on the same host, simply run the prepared script:
App server and ML server are different hosts
In this case, the whole process must be executed as three separate cron tasks. The ML server needs at least the extensions folder of the Totara distribution, which contains the required Python script.
For ease of maintenance, it is advisable to mirror the folder structure of the Totara distribution and the data folder exactly on both hosts.
This option increases complexity, so instead of using separate hosts, consider installing a full Totara instance on the ML host and connecting it to a read-only database replica (slave) that is not exposed via the load balancer and serves only as an ML processor. Use the method below only when the ML server must not have access to the Totara database or must not have a Totara instance installed.
On a Totara host:
On Python host:
Python execution CLI command: obtain it by running the following in the Totara root directory and keep the output aside:
Python command: it will look similar to the following (see the parameter descriptions below):
'/usr/bin/python3.7' '/var/www/totara/src/work/reorg/extensions/ml_recommender/python/ml_recommender.py' --query 'hybrid' ...
The recommender_command.php script emits the full command required to run the Python script for the recommendations process, according to the current configuration on the settings page.
If the Python host mounts the shared volume at a different path, adjust the --data-path parameter accordingly.
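The adjustment can be scripted rather than made by hand. The hypothetical helper below, a sketch only, rewrites the value following --data-path in the emitted command (it also handles a --data-path=value form, in case the command uses it) and re-quotes every argument with shlex.quote so the result can be pasted into a crontab.

```python
import shlex


def adjust_data_path(command, new_path):
    """Return command with the --data-path value replaced by new_path, re-quoted for a shell."""
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg == "--data-path" and i + 1 < len(args):
            args[i + 1] = new_path  # separate-argument form: --data-path '/path'
        elif arg.startswith("--data-path="):
            args[i] = "--data-path=" + new_path  # single-argument form: --data-path=/path
    # shlex.quote leaves safe strings unquoted and single-quotes anything else.
    return " ".join(shlex.quote(a) for a in args)
```

For example, adjust_data_path(command, "/mnt/totara_data/recommender/data"), where /mnt/totara_data is a hypothetical mount point of the shared volume on the Python host.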
Example cron entry:
Generally, running the export process once a day should be adequate. However, the optimum run frequency depends on many factors, for example how active users on the site are and how much new content is created daily.
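As a sketch of what the cron entries might look like, the hypothetical helper below builds user-crontab lines (for crontab -e, so there is no user column) following the frequencies described on this page: export once a day, processing and import every 5 minutes. The 02:00 export time is an arbitrary assumption. Cron treats an unescaped % in a command as a newline, so the helper escapes any % in the Python command.

```python
import shlex

# Assumed schedule following the frequencies described on this page.
DAILY = "0 2 * * *"          # hypothetical choice: once a day at 02:00
EVERY_5_MIN = "*/5 * * * *"
CLI = "server/ml/recommender/cli"


def crontab_lines(role, totara_root, python_command="", php="php"):
    """Build user-crontab lines for role 'same' (one host), 'totara' or 'python' (separate hosts)."""
    cd = "cd %s &&" % shlex.quote(totara_root)
    export = "%s %s %s %s/export_data.php" % (DAILY, cd, php, CLI)
    import_ = "%s %s %s %s/import_recommendations.php" % (EVERY_5_MIN, cd, php, CLI)
    if role == "totara":
        return [export, import_]
    if role == "python":
        if not python_command:
            raise ValueError("python_command is required on the Python host")
        # cron turns an unescaped % into a newline, so escape it.
        return ["%s %s" % (EVERY_5_MIN, python_command.replace("%", "\\%"))]
    if role == "same":
        # eval runs the command that recommender_command.php prints.
        process = '%s %s eval "$(%s %s/recommender_command.php)"' % (EVERY_5_MIN, cd, php, CLI)
        return [export, process, import_]
    raise ValueError("unknown role: %r" % (role,))
```

Install the generated lines in the crontab of the user that has full access to the data folder, for example with crontab -u <user> -e.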
To enable recommender system checks, add the following line to your site's config.php file:
Log in to the Totara instance and navigate to the settings page:
The Python 3 binary path is used only to generate the executable command string (to be run via eval). Python is not run from the PHP environment.
Parameters of the ml_recommender.py script
|Description||Type||Input||Default|
|Full Hybrid - Meta-data & Content: utilises content data, item meta-data and user-item interaction data (longest processing time, highest granularity); Partial Hybrid: utilises item meta-data and user-item interaction data; Matrix Factorisation: utilises only user-item interaction data (shortest processing time, lowest granularity)||TEXT||DROPDOWN||Full Hybrid|
|Number of cores/threads that may be used by the recommendation library (should be fewer than the number of physical cores)||NUMERIC||DROPDOWN||1|
|User result count: number of item-to-user recommendations to return||NUMERIC||DROPDOWN||10|
|Item result count: number of item-to-item recommendations to return||NUMERIC||DROPDOWN||10|
|Path to exported data files||TEXT||FILESYSTEM PATH||/totara_data_root/recommender/data|
|Period of user-item interactions to limit recommendations to, e.g. previous week, previous 2 weeks, previous 4 weeks||TEXT||DROPDOWN||??? weeks, months|
|Location of the python3 executable on the system||TEXT||FILESYSTEM PATH||Blank (the administrator must install Python and enter its location)|
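Two of the settings above carry constraints that are easy to check before saving: the thread count should be fewer than the number of physical cores, and the python3 path must point to an executable. The sketch below is a hypothetical validation helper, not Totara's own settings code. Note that os.cpu_count() reports logical CPUs, which can exceed physical cores on hyper-threaded machines, so passing the physical core count explicitly is more accurate.

```python
import os


def check_settings(threads, data_path, python_path, physical_cores=None):
    """Return a list of warnings for the recommender settings; an empty list means no problems found."""
    warnings = []
    # Fallback: os.cpu_count() counts logical CPUs, which may exceed physical cores.
    cores = physical_cores or os.cpu_count() or 1
    if threads < 1:
        warnings.append("threads must be at least 1")
    elif cores > 1 and threads >= cores:
        # On a single-core host one thread is the only option, so no warning is raised there.
        warnings.append("threads (%d) should be fewer than the %d physical cores" % (threads, cores))
    if not os.path.isdir(data_path):
        warnings.append("data path does not exist: %s" % data_path)
    if not python_path:
        warnings.append("python3 executable path is blank")
    elif not (os.path.isfile(python_path) and os.access(python_path, os.X_OK)):
        warnings.append("python3 executable not found or not executable: %s" % python_path)
    return warnings


def suggested_threads(physical_cores):
    """Leave one physical core free for other work, but use at least one thread."""
    return max(1, physical_cores - 1)
```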
The following indicative tests were run using only user-item interaction data processed through collaborative filtering (matrix factorisation). Memory requirements and run times increase when extra features (i.e. user and item meta-data and/or content meta-data) are processed via content-based filtering, as in the hybrid processing modes.
Data sets used
|AWS instance||Cores||Memory||Run time (100K interactions)||Run time (25M interactions)|
|t3.medium||2||4GB||1 min 20 sec||crashed (out of memory)|
|t3.xlarge||4||16GB||1 min 0 sec||10 hrs 35 min|