We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
August 13, 2024
Simple Trick for Lightning-Fast LIKE and ILIKE Queries
EMIR VATRIC
đ Hi there!
Improve Search Performance with Trigram Indexes in PostgreSQL!
TLDR;
- Enable pg_trgm extension
- Add Trigram Index on columns that you are doing a full-text search
class AddTrigramIndex < ActiveRecord::Migration[6.0]
disable_ddl_transaction!
def change
enable_extension :pg_trgm
add_index :table, :column, opclass: :gin_trgm_ops, using: :gin,
algorithm: :concurrently, name: âindex_trgmâ
end
end
And that is it! Now you can leave and be happy that you have made a massive improvement or you can read on and actually understand what happened!
Introduction
What is the best part of software development? Optimizing! Optimizing code, optimizing queries, optimizing data models, fixing google speed score whenever something like this lands in your lap you should be over the moon! You will never do so little (usually) and achieve such a big change for the end user!
Now here is what happened, some time ago you added a search feature to your dashboard, the app was small so you didnât care about performance.
def search(query)
Post.where("body ILIKE (?)", "%#{query}%")
end
At the time of writing, this was fine there were not too many posts and a sequential scan was quick enough, but fast forward to today this query is too slow and we need to fix it.
What is a Trigram Index?
A trigram index is a type of inverted index. Trigram indexes are used to efficiently search for strings in large tables without providing an exact search term.
Trigram indexes make substring and similarity match efficient by indexing the unique trigrams of a string.
From the definition we are left with two questions:
- What is a trigram?
- What is an inverted index?
A trigram is a group of three consecutive characters in a string.
SELECT show_trgm('trigram');
show_trgm
-------------------------------
{" t"," tr","am ",gra,igr,ram,rig,tri}
Generalized Inverted Index or GIN for short is an index type designed to deal with data types that are subdividable. An index is going to store a key and a reference where that value occurs.
Another way of explaining GIN indexes is that it is like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for âcompensation accelerometersâ in this example:
All of the substrings
are going to be pointing to the post record. Once we make a query trying to match a keyword Trigram Index
database is going to split a keyword into trigrams and do Bitmap Index Scan matching them to the trigrams in the index.gram
SELECT show_trgm('gram');
-----------------------
{" g"," gr", "am ", gra, ram}
-------------match--match-match
I am sure that there is much more happening than this but as I understand this is the gist of it!
Trigram indexes are much more powerful depending of the comparison used:
Trigram indexes on STRING
columns support the following comparison operators:
- equality
- pattern matching (case-sensitive)
- pattern matching (case-insensitive)
- similarity matching
There are a few cons to all of this and one of them is slower writes on large tables!
Trigram index in action
I didnât want to leave the subject without a quick example, I quickly created 40k users in the database and run a simple ILIKE query on the users' email.
SELECT "users".* FROM "users" WHERE users.email ILIKE ('%emir%');
Even on a dataset this small and as simple as it gets email column difference is more than noticeable. On larger datasets, larger columns ( think article body ) or more complex queries where we try to search in multiple fields results are going to be much more significant!
Conclusion
Enabling the pg_trgm extension and adding a trigram index can be a simple and effective way to improve the performance of LIKE and ILIKE queries in PostgreSQL. By taking advantage of the specialized data structure and algorithms provided by trigram indexes, you can ensure that your queries run quickly and efficiently, providing a better experience for your users.