By default, MySQL uses a case- and accent-insensitive collation for full text search (FTS). You can specify a different collation here, for example:
$CFG->dboptions['ftslanguage'] = 'utf8_unicode_ci';
$CFG->dboptions['ftslanguage'] = 'utf8mb4_0900_as_ci';
$CFG->dboptions['ftslanguage'] = 'utf8mb4_de_pb_0900_ai_ci';
MySQL does not support Japanese and other languages that use very short words with no spaces between them. Enable the following setting to get basic, experimental support for these languages:
$CFG->dboptions['fts3bworkaround'] = true;
After any of these changes, repopulate the FTS tables and rebuild the indexes by running:
Minimum search term length
The default MySQL minimum search term length can be changed by editing the MySQL configuration file:
Stop words are a dictionary of words that are excluded from both the index and the search query. They depend on the language used during indexing and searching, and on whether a dictionary is present in the database installation. For example, if a user searches for "Hotels in Wellington", the system excludes the word "in" from the search.
More details on MySQL stop words can be found in the MySQL help documentation.
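The effect of stop words on a query can be sketched in a few lines of PHP. This is an illustration only, not how MySQL implements it: the stop word list and the function name `remove_stop_words()` are hypothetical, and the real list comes from the database installation.

```php
<?php
// Hypothetical stop word list, for illustration only; MySQL ships its own defaults.
const EXAMPLE_STOP_WORDS = ['a', 'in', 'of', 'the'];

/**
 * Split a query into lower-case words and drop any that are stop words.
 */
function remove_stop_words(string $query, array $stopwords = EXAMPLE_STOP_WORDS): array {
    // Lower-case the query, then split it on runs of white space.
    $words = preg_split('/\s+/u', mb_strtolower(trim($query), 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);
    // Keep only the words that are not in the stop word list.
    $kept = array_filter($words, function (string $word) use ($stopwords): bool {
        return !in_array($word, $stopwords, true);
    });
    // Re-index so the result is a plain list.
    return array_values($kept);
}

print_r(remove_stop_words('Hotels in Wellington'));
```

With the example list above, only "hotels" and "wellington" are left to be searched for.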
Ngram is a built-in MySQL full-text parser that determines the beginning and end of words using white space and particular letter sequences. It is usually enabled by default, although this may depend on the distribution. It can extract two or more word stems from a compound word. For example, in German, the word 'Fußballweltmeisterschaft' can be found by searching for 'meister', 'schaft' and so on. A full-text index built without this plugin cannot split the word into smaller pieces, so an SQL query looking for a keyword like 'meister' will not find the record.
For some ideographic languages, a normal full-text index is limited in what it can search for because there are no delimiters between words. This is where ngram comes in handy. An n-gram is a contiguous sequence of N characters taken from a sequence of text, and the main function of the ngram full-text parser is to tokenise text into such sequences.
In some cases Ngram searches too eagerly and returns false positives (although these are ranked low and appear at the end of the list), so you should confirm that the search behaviour suits your needs after enabling Ngram support.
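The idea of n-gram tokenising, and why it can produce false positives, can be sketched as follows. This is a simplified model under stated assumptions, not MySQL's actual matching or ranking logic: the function names `ngrams()` and `ngram_match()` are hypothetical, and a token size of 2 is assumed (the default value of MySQL's `ngram_token_size`).

```php
<?php
/**
 * Return every contiguous sequence of $n characters in $text.
 * Multibyte-safe, so 'ß' counts as a single character.
 */
function ngrams(string $text, int $n = 2): array {
    $tokens = [];
    $length = mb_strlen($text, 'UTF-8');
    for ($i = 0; $i + $n <= $length; $i++) {
        $tokens[] = mb_substr($text, $i, $n, 'UTF-8');
    }
    return $tokens;
}

/**
 * Hypothetical matcher: the term matches when every one of its
 * n-grams also occurs somewhere in the indexed word.
 */
function ngram_match(string $term, string $word, int $n = 2): bool {
    $indexed = ngrams(mb_strtolower($word, 'UTF-8'), $n);
    $wanted = ngrams(mb_strtolower($term, 'UTF-8'), $n);
    // Terms shorter than $n produce no tokens and never match.
    return $wanted !== [] && array_diff($wanted, $indexed) === [];
}

var_dump(ngram_match('meister', 'Fußballweltmeisterschaft')); // Found inside the compound word.
var_dump(ngram_match('alt', 'Fußballweltmeisterschaft'));     // Also matches: 'al' and 'lt' both occur.
```

In this sketch 'alt' matches even though it is not part of the word, because its tokens come from 'ball' and 'welt' separately. This is the kind of overly eager result described above.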
To enable Ngram, change the following setting in config.php:
$CFG->dboptions['ftsngram'] = true;
and then run:
This setting only has an effect in MySQL.
By default, MySQL supports diacritics, but the behaviour depends on the collation and language defined in $CFG->dboptions['ftslanguage']. For example, when the collation is "utf8_general_ci", a search query containing either 'första' or 'forsta' will return a record containing the word 'första'.
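A rough way to picture a case- and accent-insensitive comparison is to fold both strings to a common form before comparing them. The sketch below assumes a deliberately tiny, hypothetical folding table (`EXAMPLE_ACCENT_MAP`) and hypothetical function names. Real collations are defined by MySQL and cover far more characters and rules.

```php
<?php
// Hypothetical folding table for illustration; a real collation covers many more characters.
const EXAMPLE_ACCENT_MAP = ['å' => 'a', 'ä' => 'a', 'ö' => 'o', 'é' => 'e', 'ü' => 'u'];

/**
 * Lower-case the text and replace accented letters with their base letters.
 */
function fold(string $text): string {
    return strtr(mb_strtolower($text, 'UTF-8'), EXAMPLE_ACCENT_MAP);
}

/**
 * Compare two strings the way a case- and accent-insensitive collation roughly would.
 */
function accent_insensitive_equals(string $a, string $b): bool {
    return fold($a) === fold($b);
}

var_dump(accent_insensitive_equals('första', 'forsta'));
```

Under an accent-sensitive collation the two spellings would instead be treated as different words.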