1 of 6

Optimization Strategies

Discover effective optimization strategies for MariaDB Server queries. This section provides a variety of techniques and approaches to enhance query performance and overall database efficiency.

Duplicate Weedout Strategy

DuplicateWeedout is an execution strategy for Semi-join subqueries.

The idea

The idea is to run the semi-join (a query with uses WHERE X IN (SELECT Y FROM ...)) as if it were a regular inner join, and then eliminate the duplicate record combinations using a temporary table.

Suppose, you have a query where you're looking for countries which have more than 33% percent of their population in one big city:

SELECT * 
FROM Country 
WHERE 
   Country.code IN (SELECT City.Country
                    FROM City 
                    WHERE 
                      City.Population > 0.33 * Country.Population AND 
                      City.Population > 1*1000*1000);

First, we run a regular inner join between the City and Country tables:

The Inner join produces duplicates. We have Germany three times, because it has three big cities. Now, lets put DuplicateWeedout into the picture:

Here one can see that a temporary table with a primary key was used to avoid producing multiple records with 'Germany'.

DuplicateWeedout in action

The Start temporary and End temporary from the last diagram are shown in the EXPLAIN output:

EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where City.Population > 0.33 * Country.Population 
   AND City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        TABLE: City
         type: RANGE
possible_keys: Population,Country
          KEY: Population
      key_len: 4
          ref: NULL
         ROWS: 238
        Extra: USING INDEX CONDITION; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        TABLE: Country
         type: eq_ref
possible_keys: PRIMARY
          KEY: PRIMARY
      key_len: 3
          ref: world.City.Country
         ROWS: 1
        Extra: USING WHERE; End temporary
2 rows in set (0.00 sec)

This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically much cheaper since the temporary table is in-memory).

If we run the same query with semi-join optimizations disabled, we'll get:

EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where City.Population > 0.33 * Country.Population 
    AND City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        TABLE: Country
         type: ALL
possible_keys: NULL
          KEY: NULL
      key_len: NULL
          ref: NULL
         ROWS: 239
        Extra: USING WHERE
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        TABLE: City
         type: index_subquery
possible_keys: Population,Country
          KEY: Country
      key_len: 3
          ref: func
         ROWS: 18
        Extra: USING WHERE
2 rows in set (0.00 sec)

This plan will read (239 + 239*18) = 4541 rows, which is much slower.

Factsheet

DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN.
The strategy can handle correlated subqueries.
But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.
DuplicateWeedout allows the optimizer to freely mix a subquery's tables and the parent select's tables.
There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with SET @@optimizer_switch='optimizer_semijoin=off' command.

FirstMatch Strategy

FirstMatch is an execution strategy for Semi-join subqueries.

The idea

It is very similar to how IN/EXISTS subqueries were executed in MySQL 5.x.

Let's take the usual example of a search for countries with big cities:

SELECT * FROM Country 
WHERE Country.code IN (SELECT City.Country 
                       FROM City 
                       WHERE City.Population > 1*1000*1000)
      AND Country.continent='Europe'

Suppose, our execution plan is to find countries in Europe, and then, for each found country, check if it has any big cities. Regular inner join execution will look as follows:

Since Germany has two big cities (in this diagram), it will be put into the query output twice. This is not correct, SELECT ... FROM Country should not produce the same country record twice. The FirstMatch strategy avoids the production of duplicates by short-cutting execution as soon as the first genuine match is found:

Note that the short-cutting has to take place after "Using where" has been applied. It would have been wrong to short-cut after we found Trier.

FirstMatch in action

The EXPLAIN for the above query will look as follows:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where City.Population > 1*1000*1000)
    AND Country.continent='Europe';
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| id | select_type | table   | type | possible_keys      | key       | key_len | ref                | rows | Extra                            |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
|  1 | PRIMARY     | Country | ref  | PRIMARY,continent  | continent | 17      | const              |   60 | Using index condition            |
|  1 | PRIMARY     | City    | ref  | Population,Country | Country   | 3       | world.Country.Code |   18 | Using where; FirstMatch(Country) |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
2 rows in set (0.00 sec)

FirstMatch(Country) in the Extra column means that as soon as we have produced one matching record combination, short-cut the execution and jump back to the Country table.

FirstMatch's query plan is very similar to one you would get in MySQL:

MySQL [world]> EXPLAIN SELECT * FROM Country  WHERE Country.code IN 
  (select City.Country from City where City.Population > 1*1000*1000) 
   AND Country.continent='Europe';
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| id | select_type        | table   | type           | possible_keys      | key       | key_len | ref   | rows | Extra                              |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
|  1 | PRIMARY            | Country | ref            | continent          | continent | 17      | const |   60 | Using index condition; Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country   | 3       | func  |   18 | Using where                        |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
2 rows in set (0.01 sec)

and these two particular query plans will execute in the same time.

Difference between FirstMatch and IN->EXISTS

The general idea behind the FirstMatch strategy is the same as the one behind the IN->EXISTS transformation, however, FirstMatch has several advantages:

Equality propagation works across semi-join bounds, but not subquery bounds. Therefore, converting a subquery to semi-join and using FirstMatch can still give a better execution plan. (TODO example)
There is only one way to apply the IN->EXISTS strategy and MySQL will do it unconditionally. With FirstMatch, the optimizer can make a choice between whether it should run the FirstMatch strategy as soon as all tables used in the subquery are in the join prefix, or at some later point in time. (TODO: example)

FirstMatch factsheet

The FirstMatch strategy works by executing the subquery and short-cutting its execution as soon as the first match is found.
This means, subquery tables must be after all of the parent select's tables that are referred from the subquery predicate.
EXPLAIN shows FirstMatch as "FirstMatch(tableN)".
The strategy can handle correlated subqueries.
But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.
Use of the FirstMatch strategy is controlled with the firstmatch=on|off flag in the optimizer_switch variable.

Improvements to ORDER BY Optimization

Available tuning for ORDER BY with small LIMIT

In 2024, fix for MDEV-34720 has added optimizer_join_limit_pref_ratio optimization into MariaDB starting from 10.6. It allows one to enable extra optimization for ORDER BY with small LIMIT.

Older optimizations

MariaDB brought several improvements to the ORDER BY optimizer.

The fixes were made as a response to complaints by MariaDB customers, so they fix real-world optimization problems. The fixes are a bit hard to describe (as the ORDER BY optimizer is complicated), but here's a short description:

The ORDER BY optimizer:

Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402).
- This also fixes MySQL Bug#12113.
Always uses “range” and (not full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657).
Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (MDEV-6384, MDEV-465).
- This change also fixes MySQL Bug#36817.
Takes full advantage of InnoDB’s Extended Keys feature when checking if filesort() can be skipped (MDEV-6796).

Extra optimizations

The ORDER BY optimizer takes multiple-equalities into account (MDEV-8989). This optimization is enabled by default.

Comparison with MySQL 5.7

In MySQL 5.7 changelog, one can find this passage:

Make switching of index due to small limit cost-based (WL#6986) : We have made the decision in make_join_select() of whether to switch to a new index in order to support "ORDER BY ... LIMIT N" cost-based. This work fixes Bug#73837.

MariaDB is not using Oracle's fix (we believe make_join_select is not the right place to do ORDER BY optimization), but the effect is the same.

LooseScan Strategy

LooseScan is an execution strategy for Semi-join subqueries.

The idea

We will demonstrate the LooseScan strategy by example. Suppose, we're looking for countries that have satellites. We can get them using the following query (for the sake of simplicity we ignore satellites that are owned by consortiums of multiple countries):

SELECT * FROM Country  
WHERE 
  Country.code IN (SELECT country_code FROM Satellite)

Suppose, there is an index on Satellite.country_code. If we use that index, we will get satellites in the order of their owner country:

The LooseScan strategy doesn't really need ordering, what it needs is grouping. In the above figure, satellites are grouped by country. For instance, all satellites owned by Australia come together, without being mixed with satellites of other countries. This makes it easy to select just one satellite from each group, which you can join with its country and get a list of countries without duplicates:

LooseScan in action

The EXPLAIN output for the above query looks as follows:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select country_code from Satellite);
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| id | select_type | table     | type   | possible_keys | key          | key_len | ref                          | rows | Extra                               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
|  1 | PRIMARY     | Satellite | index  | country_code  | country_code | 9       | NULL                         |  932 | Using where; Using index; LooseScan |
|  1 | PRIMARY     | Country   | eq_ref | PRIMARY       | PRIMARY      | 3       | world.Satellite.country_code |    1 | Using index condition               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+

Factsheet

LooseScan avoids the production of duplicate record combinations by putting the subquery table first and using its index to select one record from multiple duplicates
Hence, in order for LooseScan to be applicable, the subquery should look like:

expr IN (SELECT tbl.keypart1 FROM tbl ...)

expr IN (SELECT tbl.keypart2 FROM tbl WHERE tbl.keypart1=const AND ...)

LooseScan can handle correlated subqueries
LooseScan can be switched off by setting the loosescan=off flag in the optimizer_switch variable.

_{This page is licensed: CC BY-SA / Gnu FDL}

Semi-join Materialization Strategy

Semi-join Materialization is a special kind of subquery materialization used for Semi-join subqueries. It actually includes two strategies:

Materialization/lookup
Materialization/scan

The idea

Consider a query that finds countries in Europe which have big cities:

SELECT * FROM Country 
WHERE Country.code IN (SELECT City.Country 
                       FROM City 
                       WHERE City.Population > 7*1000*1000)
      AND Country.continent='Europe'

The subquery is uncorrelated, that is, we can run it independently of the upper query. The idea of semi-join materialization is to do just that, and fill a temporary table with possible values of the City.country field of big cities, and then do a join with countries in Europe:

The join can be done in two directions:

From the materialized table to countries in Europe
From countries in Europe to the materialized table

The first way involves doing a full scan on the materialized table, so we call it "Materialization-scan".

If you run a join from Countries to the materialized table, the cheapest way to find a match in the materialized table is to make a lookup on its primary key (it has one: we used it to remove duplicates). Because of that, we call the strategy "Materialization-lookup".

Semi-join materialization in action

Materialization-Scan

If we chose to look for cities with a population greater than 7 million, the optimizer will use Materialization-Scan and EXPLAIN will show this:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
|  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)

Here, you can see:

There are still two SELECTs (look for columns with id=1 and id=2)
The second select (with id=2) has select_type=MATERIALIZED. This means it will be executed and its results will be stored in a temporary table with a unique key over all columns. The unique key is there to prevent the table from containing any duplicate records.
The first select received the table name subquery2. This is the table that we got as a result of the materialization of the select with id=2.

The optimizer chose to do a full scan over the materialized table, so this is an example of a use of the Materialization-Scan strategy.

As for execution costs, we're going to read 15 rows from table City, write 15 rows to materialized table, read them back (the optimizer assumes there won't be any duplicates), and then do 15 eq_ref accesses to table Country. In total, we'll do 45 reads and 15 writes.

By comparison, if you run the EXPLAIN with semi-join optimizations disabled, you'll get this:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
| id | select_type        | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra                              |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
|  1 | PRIMARY            | Country | ALL   | NULL               | NULL       | NULL    | NULL |  239 | Using where                        |
|  2 | DEPENDENT SUBQUERY | City    | range | Population,Country | Population | 4       | NULL |   15 | Using index condition; Using where |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+

...which is a plan to do (239 + 239*15) = 3824 table reads.

Materialization-Lookup

Let's modify the query slightly and look for countries which have cities with a population over one millon (instead of seven):

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
|  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)

The EXPLAIN output is similar to the one which used Materialization-scan, except that:

the <subquery2> table is accessed with the eq_ref access method
the access uses an index named distinct_key

This means that the optimizer is planning to do index lookups into the materialized table. In other words, we're going to use the Materialization-lookup strategy.

With optimizer_switch='semijoin=off,materialization=off'), one will get this EXPLAIN:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type           | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | Country | ALL            | NULL               | NULL    | NULL    | NULL |  239 | Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country | 3       | func |   18 | Using where |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+

One can see that both plans will do a full scan on the Country table. For the second step, MariaDB will fill the materialized table (238 rows read from table City and written to the temporary table) and then do a unique key lookup for each record in table Country, which works out to 238 unique key lookups. In total, the second step will cost (239+238) = 477 reads and 238 temp.table writes.

Execution of the latter (DEPENDENT SUBQUERY) plan reads 18 rows using an index on City.Country for each record it receives for table Country. This works out to a cost of (18*239) = 4302 reads. Had there been fewer subquery invocations, this plan would have been better than the one with Materialization. By the way, MariaDB has an option to use such a query plan, too (see FirstMatch Strategy), but it did not choose it.

Subqueries with grouping

MariaDB is able to use Semi-join materialization strategy when the subquery has grouping (other semi-join strategies are not applicable in this case).

This allows for efficient execution of queries that search for the best/last element in a certain group.

For example, let's find cities that have the biggest population on their continent:

EXPLAIN 
SELECT * FROM City 
WHERE City.Population IN (SELECT max(City.Population) FROM City, Country 
                          WHERE City.Country=Country.Code 
                          GROUP BY Continent)
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
| id   | select_type  | table       | type | possible_keys | key        | key_len | ref                              | rows | Extra           |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL       | NULL    | NULL                             |  239 |                 |
|    1 | PRIMARY      | City        | ref  | Population    | Population | 4       | <subquery2>.max(City.Population) |    1 |                 |
|    2 | MATERIALIZED | Country     | ALL  | PRIMARY       | NULL       | NULL    | NULL                             |  239 | Using temporary |
|    2 | MATERIALIZED | City        | ref  | Country       | Country    | 3       | world.Country.Code               |   18 |                 |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
4 rows in set (0.00 sec)

the cities are:

+------+-------------------+---------+------------+
| ID   | Name              | Country | Population |
+------+-------------------+---------+------------+
| 1024 | Mumbai (Bombay)   | IND     |   10500000 |
| 3580 | Moscow            | RUS     |    8389200 |
| 2454 | Macao             | MAC     |     437500 |
|  608 | Cairo             | EGY     |    6789479 |
| 2515 | Ciudad de México | MEX     |    8591309 |
|  206 | São Paulo        | BRA     |    9968485 |
|  130 | Sydney            | AUS     |    3276207 |
+------+-------------------+---------+------------+

Factsheet

Semi-join materialization

Can be used for uncorrelated IN-subqueries. The subselect may use grouping and/or aggregate functions.
Is shown in EXPLAIN as type=MATERIALIZED for the subquery, and a line withtable=<subqueryN> in the parent subquery.
Is enabled when one has both materialization=on and semijoin=on in the optimizer_switch variable.
The materialization=on|off flag is shared with Non-semijoin materialization.

_{This page is licensed: CC BY-SA / Gnu FDL}

Optimization Strategies

Duplicate Weedout Strategy

The idea

DuplicateWeedout in action

Factsheet

See Also

FirstMatch Strategy

The idea

FirstMatch in action

Difference between FirstMatch and IN->EXISTS

FirstMatch factsheet

See Also

Improvements to ORDER BY Optimization

Available tuning for ORDER BY with small LIMIT

Older optimizations

Extra optimizations

Comparison with MySQL 5.7

See Also

LooseScan Strategy

The idea

LooseScan in action

Factsheet

Semi-join Materialization Strategy

The idea

Semi-join materialization in action

Materialization-Scan

Materialization-Lookup

Subqueries with grouping

Factsheet