View All Posts

August 13, 2024

Simple Trick for Lightning-Fast LIKE and ILIKE Queries

Emir Vatric image

EMIR VATRIC

👋 Hi there!

Improve Search Performance with Trigram Indexes in PostgreSQL!

TLDR;

  1. Enable pg_trgm extension
  2. Add Trigram Index on columns that you are doing a full-text search
class AddTrigramIndex < ActiveRecord::Migration[6.0]
 disable_ddl_transaction!

 def change
   enable_extension :pg_trgm

   add_index :table, :column, opclass: :gin_trgm_ops, using: :gin, 
             algorithm: :concurrently, name: ‘index_trgm’
 end
end

And that is it! Now you can leave and be happy that you have made a massive improvement or you can read on and actually understand what happened!


Introduction

What is the best part of software development? Optimizing! Optimizing code, optimizing queries, optimizing data models, fixing google speed score whenever something like this lands in your lap you should be over the moon! You will never do so little (usually) and achieve such a big change for the end user!

One obligatory cheesy meme
One obligatory cheesy meme

Now here is what happened, some time ago you added a search feature to your dashboard, the app was small so you didn’t care about performance.

def search(query)
  Post.where("body ILIKE (?)", "%#{query}%")
end

At the time of writing, this was fine there were not too many posts and a sequential scan was quick enough, but fast forward to today this query is too slow and we need to fix it.


What is a Trigram Index?

A trigram index is a type of inverted index. Trigram indexes are used to efficiently search for strings in large tables without providing an exact search term.

Trigram indexes make substring and similarity match efficient by indexing the unique trigrams of a string.

From the definition we are left with two questions:

  1. What is a trigram?
  2. What is an inverted index?

A trigram is a group of three consecutive characters in a string.

SELECT show_trgm('trigram');
               show_trgm
    -------------------------------
{"  t"," tr","am ",gra,igr,ram,rig,tri}

Generalized Inverted Index or GIN for short is an index type designed to deal with data types that are subdividable. An index is going to store a key and a reference where that value occurs.

Another way of explaining GIN indexes is that it is like the table of contents in a book, where the heap pointers (to the actual table) are the page numbers. Multiple entries can be combined to yield a specific result, like the search for “compensation accelerometers” in this example:

presentation by Oleg Bartunov and Alexander Korotkov
presentation by Oleg Bartunov and Alexander Korotkov

All of the substrings Trigram Index are going to be pointing to the post record. Once we make a query trying to match a keyword gram database is going to split a keyword into trigrams and do Bitmap Index Scan matching them to the trigrams in the index.

SELECT show_trgm('gram');

-----------------------

{"  g"," gr", "am ", gra, ram}
-------------match--match-match

I am sure that there is much more happening than this but as I understand this is the gist of it!

Trigram indexes are much more powerful depending of the comparison used:

Trigram indexes on STRING columns support the following comparison operators:

  1. equality
  2. pattern matching (case-sensitive)
  3. pattern matching (case-insensitive)
  4. similarity matching

There are a few cons to all of this and one of them is slower writes on large tables!


Trigram index in action

I didn’t want to leave the subject without a quick example, I quickly created 40k users in the database and run a simple ILIKE query on the users' email.

SELECT "users".* FROM "users" WHERE users.email ILIKE ('%emir%');
query plan without index, database doing a sequential scan, total time: ~28.5 ms
query plan without index, database doing a sequential scan, total time: ~28.5 ms
query plan uses index, total time: ~1.3 ms
query plan uses index, total time: ~1.3 ms

Even on a dataset this small and as simple as it gets email column difference is more than noticeable. On larger datasets, larger columns ( think article body ) or more complex queries where we try to search in multiple fields results are going to be much more significant!


Conclusion

Enabling the pg_trgm extension and adding a trigram index can be a simple and effective way to improve the performance of LIKE and ILIKE queries in PostgreSQL. By taking advantage of the specialized data structure and algorithms provided by trigram indexes, you can ensure that your queries run quickly and efficiently, providing a better experience for your users.


References

Big thank you to the pganalyze team and their awesome blog!