
- MySQL - Home
- MySQL - Introduction
- MySQL - Features
- MySQL - Versions
- MySQL - Variables
- MySQL - Installation
- MySQL - Administration
- MySQL - PHP Syntax
- MySQL - Node.js Syntax
- MySQL - Java Syntax
- MySQL - Python Syntax
- MySQL - Connection
- MySQL - Workbench
- MySQL Databases
- MySQL - Create Database
- MySQL - Drop Database
- MySQL - Select Database
- MySQL - Show Database
- MySQL - Copy Database
- MySQL - Database Export
- MySQL - Database Import
- MySQL - Database Info
- MySQL Users
- MySQL - Create Users
- MySQL - Drop Users
- MySQL - Show Users
- MySQL - Change Password
- MySQL - Grant Privileges
- MySQL - Show Privileges
- MySQL - Revoke Privileges
- MySQL - Lock User Account
- MySQL - Unlock User Account
- MySQL Tables
- MySQL - Create Tables
- MySQL - Show Tables
- MySQL - Alter Tables
- MySQL - Rename Tables
- MySQL - Clone Tables
- MySQL - Truncate Tables
- MySQL - Temporary Tables
- MySQL - Repair Tables
- MySQL - Describe Tables
- MySQL - Add/Delete Columns
- MySQL - Show Columns
- MySQL - Rename Columns
- MySQL - Table Locking
- MySQL - Drop Tables
- MySQL - Derived Tables
- MySQL Queries
- MySQL - Queries
- MySQL - Constraints
- MySQL - Insert Query
- MySQL - Select Query
- MySQL - Update Query
- MySQL - Delete Query
- MySQL - Replace Query
- MySQL - Insert Ignore
- MySQL - Insert on Duplicate Key Update
- MySQL - Insert Into Select
- MySQL Indexes
- MySQL - Indexes
- MySQL - Create Index
- MySQL - Drop Index
- MySQL - Show Indexes
- MySQL - Unique Index
- MySQL - Clustered Index
- MySQL - Non-Clustered Index
- MySQL Operators and Clauses
- MySQL - Where Clause
- MySQL - Limit Clause
- MySQL - Distinct Clause
- MySQL - Order By Clause
- MySQL - Group By Clause
- MySQL - Having Clause
- MySQL - AND Operator
- MySQL - OR Operator
- MySQL - Like Operator
- MySQL - IN Operator
- MySQL - ANY Operator
- MySQL - EXISTS Operator
- MySQL - NOT Operator
- MySQL - NOT EQUAL Operator
- MySQL - IS NULL Operator
- MySQL - IS NOT NULL Operator
- MySQL - Between Operator
- MySQL - UNION Operator
- MySQL - UNION vs UNION ALL
- MySQL - MINUS Operator
- MySQL - INTERSECT Operator
- MySQL - INTERVAL Operator
- MySQL Joins
- MySQL - Using Joins
- MySQL - Inner Join
- MySQL - Left Join
- MySQL - Right Join
- MySQL - Cross Join
- MySQL - Full Join
- MySQL - Self Join
- MySQL - Delete Join
- MySQL - Update Join
- MySQL - Union vs Join
- MySQL Keys
- MySQL - Unique Key
- MySQL - Primary Key
- MySQL - Foreign Key
- MySQL - Composite Key
- MySQL - Alternate Key
- MySQL Triggers
- MySQL - Triggers
- MySQL - Create Trigger
- MySQL - Show Trigger
- MySQL - Drop Trigger
- MySQL - Before Insert Trigger
- MySQL - After Insert Trigger
- MySQL - Before Update Trigger
- MySQL - After Update Trigger
- MySQL - Before Delete Trigger
- MySQL - After Delete Trigger
- MySQL Data Types
- MySQL - Data Types
- MySQL - VARCHAR
- MySQL - BOOLEAN
- MySQL - ENUM
- MySQL - DECIMAL
- MySQL - INT
- MySQL - FLOAT
- MySQL - BIT
- MySQL - TINYINT
- MySQL - BLOB
- MySQL - SET
- MySQL Regular Expressions
- MySQL - Regular Expressions
- MySQL - RLIKE Operator
- MySQL - NOT LIKE Operator
- MySQL - NOT REGEXP Operator
- MySQL - regexp_instr() Function
- MySQL - regexp_like() Function
- MySQL - regexp_replace() Function
- MySQL - regexp_substr() Function
- MySQL Fulltext Search
- MySQL - Fulltext Search
- MySQL - Natural Language Fulltext Search
- MySQL - Boolean Fulltext Search
- MySQL - Query Expansion Fulltext Search
- MySQL - ngram Fulltext Parser
- MySQL Functions & Operators
- MySQL - Date and Time Functions
- MySQL - Arithmetic Operators
- MySQL - Numeric Functions
- MySQL - String Functions
- MySQL - Aggregate Functions
- MySQL Misc Concepts
- MySQL - NULL Values
- MySQL - Transactions
- MySQL - Using Sequences
- MySQL - Handling Duplicates
- MySQL - SQL Injection
- MySQL - SubQuery
- MySQL - Comments
- MySQL - Check Constraints
- MySQL - Storage Engines
- MySQL - Export Table into CSV File
- MySQL - Import CSV File into Database
- MySQL - UUID
- MySQL - Common Table Expressions
- MySQL - On Delete Cascade
- MySQL - Upsert
- MySQL - Horizontal Partitioning
- MySQL - Vertical Partitioning
- MySQL - Cursor
- MySQL - Stored Functions
- MySQL - Signal
- MySQL - Resignal
- MySQL - Character Set
- MySQL - Collation
- MySQL - Wildcards
- MySQL - Alias
- MySQL - ROLLUP
- MySQL - Today Date
- MySQL - Literals
- MySQL - Stored Procedure
- MySQL - Explain
- MySQL - JSON
- MySQL - Standard Deviation
- MySQL - Find Duplicate Records
- MySQL - Delete Duplicate Records
- MySQL - Select Random Records
- MySQL - Show Processlist
- MySQL - Change Column Type
- MySQL - Reset Auto-Increment
- MySQL - Coalesce() Function
MySQL - ngram Full-Text Parser
Usually in Full-Text searching, the built-in MySQL Full-Text parser considers the white spaces between words as delimiters. This determines where the words actually begin and end, to make the search simpler. However, this is only simple for languages that use spaces to separate words.
Several ideographic languages like Chinese, Japanese and Korean languages do not use word delimiters. To support full-text searches in languages like these, an ngram parser is used. This parser is supported by both InnoDB and MyISAM storage engines.
The ngram Full-Text Parser
An ngram is a continuous sequence of 'n' characters from a given sequence of text. The ngram parser divides a sequence of text into tokens as a contiguous sequence of n characters.
For example, consider the text 'Tutorial' and observe how it is tokenized by the ngram parser −
n=1: 'T', 'u', 't', 'o', 'r', 'i', 'a', 'l' n=2: 'Tu', 'ut', 'to' 'or', 'ri', 'ia' 'al' n=3: 'Tut', 'uto', 'tor', 'ori', 'ria', 'ial' n=4: 'Tuto', 'utor', 'tori', 'oria', 'rial' n=5: 'Tutor', 'utori', 'toria', 'orial' n=6: 'Tutori', 'utoria', 'torial' n=7: 'Tutoria', 'utorial' n=8: 'Tutorial'
The ngram full-text parser is a built-in server plugin. As with other built-in server plug-ins, it is automatically loaded when the server is started.
Configuring ngram Token Size
To change the token size, from its default size 2, use the ngram_token_size configuration option. The range of ngram values is from 1 to 10. But to increase the speed of search queries, use smallers token sizes; as smaller token sizes allow faster searches with smaller full-text search indexes.
Because ngram_token_size is a read-only variable, you can only set its value using two options:
Setting the --ngram_token_size in startup string:
mysqld --ngram_token_size=1
Setting ngram_token_size in configuration file 'my.cnf':
[mysqld] ngram_token_size=1
Creating FULLTEXT Index Using ngram Parser
A FULLTEXT index can be created on columns of a table using the FULLTEXT keyword. This is used with CREATE TABLE, ALTER TABLE or CREATE INDEX SQL statements; you just have to specify 'WITH PARSER ngram'. Following is the syntax −
CREATE TABLE table_name ( column_name1 datatype, column_name2 datatype, column_name3 datatype, ... FULLTEXT (column_name(s)) WITH PARSER NGRAM ) ENGINE=INNODB CHARACTER SET UTF8mb4;
Example
In this example, we are creating a FULLTEXT index using the CREATE TABLE statement as follows −
CREATE TABLE blog ( ID INT AUTO_INCREMENT NOT NULL, TITLE VARCHAR(255), DESCRIPTION TEXT, FULLTEXT ( TITLE, DESCRIPTION ) WITH PARSER NGRAM, PRIMARY KEY(id) ) ENGINE=INNODB CHARACTER SET UTF8MB4; SET NAMES UTF8MB4;
Now, insert data (in any ideographic language) into this table created −
INSERT INTO BLOG VALUES (NULL, '', ''), (NULL, '', '');
To check how the text is tokenized, execute the following statements −
SET GLOBAL innodb_ft_aux_table = "customers/blog"; SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;
ngram Parser Space Handling
Any whitespace character is eliminated in the ngram parser when parsing. For instance, consider the following TEXT with token size 2 −
"ab cd" is parsed to "ab", "cd"
"a bc" is parsed to "bc"
ngram Parser Stop word Handling
Apart from the whitespace character, MySQL has a stop word list consisting of various that are considered to be stopwords. If the parser encounters any word in the text present in the stopword list, the word is excluded from the index.
ngram Parser Phrase Search
Normal Phrase searches are converted to ngram phrase searches. For example, The search phrase "abc" is converted to "ab bc", which returns documents containing "abc" and "ab bc"; and the search phrase "abc def" is converted to "ab bc de ef", which returns documents containing "abc def" and "ab bc de ef". A document that contains "abcdef" is not returned.
ngram Parser Term Search
For natural language mode search, the search term is converted to a union of ngram terms. For example, the string "abc" (assuming ngram_token_size=2) is converted to "ab bc". Given two documents, one containing "ab" and the other containing "abc", the search term "ab bc" matches both documents.
For boolean mode search, the search term is converted to an ngram phrase search. For example, the string 'abc' (assuming ngram_token_size=2) is converted to '"ab bc"'. Given two documents, one containing 'ab' and the other containing 'abc', the search phrase '"ab bc"' only matches the document containing 'abc'.
ngram Parser Wildcard Search
Because an ngram FULLTEXT index contains only ngrams, and does not contain information about the beginning of terms, wildcard searches may return unexpected results. The following behaviors apply to wildcard searches using ngram FULLTEXT search indexes:
If the prefix term of a wildcard search is shorter than ngram token size, the query returns all indexed rows that contain ngram tokens starting with the prefix term. For example, assuming ngram_token_size=2, a search on "a*" returns all rows starting with "a".
If the prefix term of a wildcard search is longer than ngram token size, the prefix term is converted to an ngram phrase and the wildcard operator is ignored. For example, assuming ngram_token_size=2, an "abc*" wildcard search is converted to "ab bc".
ngram Full-Text Parser Using a Client Program
We can also perform ngram full-text parser operation using the client program.
Syntax
To perform the ngram fulltext parser through a PHP programe, we need to execute the "Create" statement using the mysqli function query() as follows −
$sql = "CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), DESCRIPTION TEXT, FULLTEXT ( title, DESCRIPTION ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4"; $mysqli->query($sql);
To perform the ngram fulltext parser through a JavaScript program, we need to execute the "Create" statement using the query() function of mysql2 library as follows −
sql = `CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), DESCRIPTION TEXT, FULLTEXT ( title, DESCRIPTION ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4`; con.query(sql);
To perform the ngram fulltext parser through a Java program, we need to execute the "Create" statement using the JDBC function execute() as follows −
String sql = "CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT," + " FULLTEXT ( title, description ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4"; statement.execute(sql);
To perform the ngram fulltext parser through a python program, we need to execute the "Create" statement using the execute() function of the MySQL Connector/Python as follows −
create_table_query = 'CREATE TABLE blog(ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT, FULLTEXT (title, description) WITH PARSER NGRAM, PRIMARY KEY(id)) ENGINE=INNODB CHARACTER SET UTF8MB4' cursorObj.execute(queryexpansionfulltext_search)
Example
Following are the programs −
$dbhost = "localhost"; $dbuser = "root"; $dbpass = "password"; $dbname = "TUTORIALS"; $mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname); if ($mysqli->connect_errno) { printf("Connect failed: %s
", $mysqli->connect_error); exit(); } // printf('Connected successfully.
'); /*CREATE Table*/ $sql = "CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT, FULLTEXT ( title, description ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4"; $result = $mysqli->query($sql); if ($result) { printf("Table created successfully...!\n"); } //insert data $q = "INSERT INTO blog (id, title, description) VALUES (NULL, '', ''), (NULL, '', '')"; if ($res = $mysqli->query($q)) { printf("Data inserted successfully...!\n"); } //we will use the below statement to see how the ngram tokenizes the data: $setglobal = "SET GLOBAL innodb_ft_aux_table = 'TUTORIALS/blog'"; if ($mysqli->query($setglobal)) { echo "global innodb_ft_aux_table set...!"; } $s = "SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position "; if ($r = $mysqli->query($s)) { print_r($r); } //display data (ngram parser phrase search); $query = "SELECT * FROM blog WHERE MATCH (title, description) AGAINST ('')"; if ($r = $mysqli->query($query)) { printf("Table Records: \n"); while ($row = $r->fetch_assoc()) { printf( "ID: %d, Title: %s, Descriptions: %s", $row["id"], $row["title"], $row["description"] ); printf("\n"); } } else { printf("Failed"); } $mysqli->close();
Output
The output obtained is as shown below −
global innodb_ft_aux_table set...!mysqli_result Object ( [current_field] => 0 [field_count] => 6 [lengths] => [num_rows] => 62 [type] => 0 ) Table Records: ID: 1, Title: , Descriptions: ID: 3, Title: , Descriptions:
var mysql = require("mysql2"); var con = mysql.createConnection({ host: "localhost", user: "root", password: "password", }); //Connecting to MySQL con.connect(function (err) { if (err) throw err; // console.log("Connected successfully...!"); // console.log("--------------------------"); sql = "USE TUTORIALS"; con.query(sql); //create a table... sql = `CREATE TABLE blog (ID INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT, FULLTEXT ( title, description ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4`; con.query(sql); //insert data sql = `INSERT INTO blog (id, title, description) VALUES (NULL, '', ''), (NULL, '', '')`; con.query(sql); //we will use the below statement to see how the ngram tokenizes the data: sql = "SET GLOBAL innodb_ft_aux_table = 'TUTORIALS/blog'"; con.query(sql); //display the table details; sql = `SELECT * FROM blog WHERE MATCH (title, description) AGAINST ('')`; con.query(sql, function (err, result) { if (err) throw err; console.log(result); }); });
Output
The output obtained is as shown below −
[ { id: 1, title: '', description: '' } ]
import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; public class NgRamFSearch { public static void main(String[] args) { String url = "jdbc:mysql://localhost:3306/TUTORIALS"; String username = "root"; String password = "password"; try { Class.forName("com.mysql.cj.jdbc.Driver"); Connection connection = DriverManager.getConnection(url, username, password); Statement statement = connection.createStatement(); System.out.println("Connected successfully...!"); //creating a table that takes fulltext column with parser ngram...! String sql = "CREATE TABLE blog (id INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT," + " FULLTEXT ( title, description ) WITH PARSER NGRAM, PRIMARY KEY(id) )ENGINE=INNODB CHARACTER SET UTF8MB4"; statement.execute(sql); //System.out.println("Table created successfully...!"); //inserting data to the table String insert = "INSERT INTO blog (id, title, description) VALUES (NULL, '', '')," + " (NULL, '', '')"; statement.execute(insert); //System.out.println("Data inserted successfully...!"); //we will use the below statement to see how the ngram tokenizes the data: String set_global = "SET GLOBAL innodb_ft_aux_table = 'TUTORIALS/blog'"; statement.execute(set_global); ResultSet resultSet = statement.executeQuery("SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position "); System.out.println("Information schema order by Id...!"); while (resultSet.next()){ System.out.println(resultSet.getString(1)+" "+resultSet.getString(2)); } //displaying the data...! String query = "SELECT * FROM blog WHERE MATCH (title, description) AGAINST ('')"; ResultSet resultSet1 = statement.executeQuery(query); System.out.println("table records:"); while (resultSet1.next()){ System.out.println(resultSet1.getString(1)+" "+resultSet1.getString(2)+ " "+resultSet1.getString(3)); } connection.close(); } catch (Exception e) { System.out.println(e); } } }
Output
The output obtained is as shown below −
Connected successfully...! Information schema order by Id...! 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 2 2 2 3 3 3 3 3 3 3 3 table records: 1
import mysql.connector # Establishing the connection connection = mysql.connector.connect( host='localhost', user='root', password='password', database='tut' ) # Creating a cursor object cursorObj = connection.cursor() # Create the blog table with NGRAM full-text parser create_table_query = ''' CREATE TABLE blog ( id INT AUTO_INCREMENT NOT NULL, title VARCHAR(255), description TEXT, FULLTEXT (title, description) WITH PARSER NGRAM, PRIMARY KEY(id) ) ENGINE=INNODB CHARACTER SET UTF8MB4; ''' cursorObj.execute(create_table_query) print("Table 'blog' is created successfully!") # Set the character set to UTF8MB4 set_charset_query = "SET NAMES UTF8MB4;" cursorObj.execute(set_charset_query) print("Character set is set to UTF8MB4.") # Insert data into the blog table data_to_insert = [ ('', ''), ('', '') ] insert_query = "INSERT INTO blog (title, description) VALUES (%s, %s)" cursorObj.executemany(insert_query, data_to_insert) connection.commit() print("Data inserted into the 'blog' table.") # Query the INNODB_FT_INDEX_CACHE table to get the full-text index information query_index_cache_query = "SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;" cursorObj.execute(query_index_cache_query) results = cursorObj.fetchall() print("Results of INNODB_FT_INDEX_CACHE table:") for row in results: print(row) # Close the cursor and connection cursorObj.close() connection.close()
Output
The output obtained is as shown below −
Table 'blog' is created successfully! Character set is set to UTF8MB4. Data inserted into the 'blog' table. Results of INNODB_FT_INDEX_CACHE table: