
Optimizing Queries

Optimize your MariaDB Server queries for peak performance. This section provides techniques for writing efficient SQL, understanding query execution plans, and leveraging indexes effectively to speed up query execution.

Aborting Statements that Exceed a Certain Time to Execute

Overview

MariaDB 10.1.1 introduced the max_statement_time system variable. When set to a non-zero value, any query taking longer than this time in seconds is aborted. The default is zero, meaning no limit is applied. An aborted query has no effect on any larger transaction or connection context. The variable is of type double, so sub-second timeouts can be specified; for example, a value of 0.01 sets a 10 millisecond timeout.

The value can be set globally or per session, as well as per user or per query (see below). Replicas are not affected by this variable; however, from MariaDB 10.10 there is slave_max_statement_time, which serves the same purpose on replicas only.
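For example, to limit the current session to 5 seconds per statement, or to apply the same limit server-wide (a minimal sketch; the values are arbitrary, and changing the global value requires sufficient privileges):

SET SESSION max_statement_time = 5;    -- only this connection
SET GLOBAL  max_statement_time = 5;    -- all new connections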

An associated status variable, max_statement_time_exceeded, stores the number of queries that have exceeded the execution time specified by max_statement_time, and a MAX_STATEMENT_TIME_EXCEEDED column was added to the CLIENT_STATISTICS and USER_STATISTICS Information Schema tables.
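To see how often the limit has been hit, the status variable can be queried directly, for example:

SHOW GLOBAL STATUS LIKE 'max_statement_time_exceeded';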

The feature was based upon a patch by Davi Arnaut.

User max_statement_time

max_statement_time can be stored per user with the GRANT ... MAX_STATEMENT_TIME syntax.
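For example, the following sketch (using a hypothetical account reporter and database db1; adjust the privilege list to your needs) limits that account's statements to 10 seconds:

GRANT SELECT ON db1.* TO 'reporter'@'%' WITH MAX_STATEMENT_TIME 10;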

Per-query max_statement_time

By using max_statement_time in conjunction with SET STATEMENT, it is possible to limit the execution time of individual queries. For example:

SET STATEMENT max_statement_time=100 FOR 
  SELECT field1 FROM table_name ORDER BY field1;

Note that, unlike some MySQL versions, MariaDB does not accept a MAX_STATEMENT_TIME clause inside the query itself (see the Differences section below); use SET STATEMENT as shown above to limit an individual query.

Limitations

  • max_statement_time does not work in embedded servers.

  • max_statement_time does not work for COMMIT statements in a Galera cluster (see MDEV-18673 for discussion).

Differences Between the MariaDB and MySQL Implementations

MySQL 5.7.4 introduced similar functionality, but the MariaDB implementation differs in a number of ways.

  • The MySQL version of max_statement_time (max_execution_time) is defined in milliseconds, not seconds.

  • MySQL's implementation can only kill SELECTs, while MariaDB's can kill any queries (excluding stored procedures).

  • MariaDB only introduced the max_statement_time_exceeded status variable, while MySQL also introduced a number of other variables which were not seen as necessary in MariaDB.

  • The SELECT MAX_STATEMENT_TIME = N ... syntax is not valid in MariaDB. In MariaDB one should use SET STATEMENT MAX_STATEMENT_TIME=N FOR....

See Also

  • Query limits and timeouts

  • lock_wait_timeout variable

This page is licensed: CC BY-SA / Gnu FDL

Big DELETEs

The problem

How to DELETE lots of rows from a large table? Here is an example of purging items older than 30 days:

DELETE FROM tbl WHERE 
  ts < CURRENT_DATE() - INTERVAL 30 DAY

If there are millions of rows in the table, this statement may take minutes, maybe hours.

Any suggestions on how to speed this up?

Why it is a problem

  • MyISAM will lock the table during the entire operation, so nothing else can be done with the table.

  • InnoDB won't lock the table, but it will chew up a lot of resources, leading to sluggishness.

  • InnoDB has to write the undo information to its transaction logs; this significantly increases the I/O required.

  • Replication, being asynchronous, will effectively be delayed (on Slaves) while the DELETE is running.

InnoDB and undo

To be ready for a crash, a transactional engine such as InnoDB will record what it is doing to a log file. To make that somewhat less costly, the log file is sequentially written. If the log files you have (there are usually 2) fill up because the delete is really big, then the undo information spills into the actual data blocks, leading to even more I/O.

Deleting in chunks avoids some of this excess overhead.

Limited benchmarking of total delete elapsed time shows two observations:

  • Total delete time approximately doubles above some 'chunk' size (as opposed to below that threshold). I do not have a formula relating the log file size with the threshold cutoff.

  • Chunk size below several hundred rows is slower. This is probably because the overhead of starting/ending each chunk dominates the timing.

Solutions

  • PARTITION -- Requires some careful setup, but is excellent for purging a time-based series.

  • DELETE in chunks -- Carefully walk through the table N rows at a time.

PARTITION

The idea here is to have a sliding window of partitions. Let's say you need to purge news articles after 30 days. The "partition key" would be the datetime (or timestamp) that is to be used for purging, and the PARTITIONs would be "range". Every night, a cron job would come along and build a new partition for the next day, and drop the oldest partition.

Dropping a partition is essentially instantaneous, much faster than deleting that many rows. However, you must design the table so that the entire partition can be dropped. That is, you cannot have some items living longer than others.

PARTITION tables have a lot of restrictions, some of which are rather weird. You can either have no UNIQUE (or PRIMARY) key on the table, or every UNIQUE key must include the partition key. In this use case, the partition key is the datetime. It should not be the first part of the PRIMARY KEY (if you have a PRIMARY KEY).

You can PARTITION InnoDB or MyISAM tables.

Since two news articles could have the same timestamp, you cannot assume the partition key is sufficient for uniqueness of the PRIMARY KEY, so you need to find something else to help with that.
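As an illustration only (table, column, and partition names are hypothetical, and the partition list is abbreviated), a daily sliding window might look like this:

CREATE TABLE news (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  dt DATETIME NOT NULL,
  title VARCHAR(200) NOT NULL,
  PRIMARY KEY (id, dt)        -- partition key included, but not first
)
PARTITION BY RANGE (TO_DAYS(dt)) (
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
  PARTITION pFuture   VALUES LESS THAN MAXVALUE
);

-- Nightly cron job: drop the oldest day, carve tomorrow out of pFuture
ALTER TABLE news DROP PARTITION p20240101;
ALTER TABLE news REORGANIZE PARTITION pFuture INTO (
  PARTITION p20240103 VALUES LESS THAN (TO_DAYS('2024-01-04')),
  PARTITION pFuture   VALUES LESS THAN MAXVALUE
);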

Reference implementation for Partition maintenance

MariaDB docs on PARTITION

Deleting in chunks

Although the discussion in this section talks about DELETE, it can be used for any other "chunking", such as, say, UPDATE, or SELECT plus some complex processing.

(This discussion applies to both MyISAM and InnoDB.)

When deleting in chunks, be sure to avoid doing a table scan. The code below is good at that; it scans no more than 1001 rows in any one query. (The 1000 is tunable.)

Assuming you have news articles that need to be purged, and you have a schema something like

CREATE TABLE tbl (
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      ts TIMESTAMP,
      ...
      PRIMARY KEY(id)
);
Then, this pseudo-code is a good way to delete the rows older than 30 days:

@a = 0
   LOOP
      DELETE FROM tbl
         WHERE id BETWEEN @a AND @a+999
           AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
      SET @a = @a + 1000
      sleep 1  -- be a nice guy
   UNTIL end of table

Notes (Most of these caveats will be covered later):

  • It uses the PK instead of the secondary key. This gives much better locality of disk hits, especially for InnoDB.

  • You could (should?) do something to avoid walking through recent days but doing nothing. Caution -- the code for this could be costly.

  • The 1000 should be tweaked so that the DELETE usually takes under, say, one second.

  • No INDEX on ts is needed. (This helps INSERTs a little.)

  • If your PRIMARY KEY is compound, the code gets messier.

  • This code will not work without a numeric PRIMARY or UNIQUE key.

  • Read on, we'll develop messier code to deal with most of these caveats.
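Before moving on to those, here is one way the simple loop above could be written as a MariaDB stored procedure -- a minimal sketch, assuming the tbl schema shown earlier:

DELIMITER //
CREATE PROCEDURE purge_old_rows()
BEGIN
  DECLARE a INT UNSIGNED DEFAULT 0;
  DECLARE max_id INT UNSIGNED;
  SELECT MAX(id) INTO max_id FROM tbl;
  WHILE a <= max_id DO
    DELETE FROM tbl
       WHERE id BETWEEN a AND a + 999
         AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
    SET a = a + 1000;
    DO SLEEP(1);       -- be a nice guy
  END WHILE;
END //
DELIMITER ;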

If there are big gaps in id values (and there will be after the first purge), then

@a = SELECT MIN(id) FROM tbl
   LOOP
      SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
      If @z is null
         exit LOOP  -- last chunk
      DELETE FROM tbl
         WHERE id >= @a
           AND id <  @z
           AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
      SET @a = @z
      sleep 1  -- be a nice guy, especially in replication
   ENDLOOP
   # Last chunk:
   DELETE FROM tbl
      WHERE id >= @a
        AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)

That code works whether id is numeric or character, and it mostly works even if id is not UNIQUE. With a non-unique key, the risk is that you could be caught in a loop whenever @z==@a. That can be detected and fixed thus:

...
      SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
      If @z == @a
         SELECT @z := id FROM tbl WHERE id > @a ORDER BY id LIMIT 1
   ...

The drawback is that there could be more than 1000 items with a single id. In most practical cases, that is unlikely.

If you do not have a primary (or unique) key defined on the table, and you have an INDEX on ts, then consider

LOOP
      DELETE FROM tbl
         WHERE ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
         ORDER BY ts   -- to use the index, and to make it deterministic
         LIMIT 1000
   UNTIL no rows deleted

This technique is NOT recommended because the LIMIT leads to a warning on replication about it being non-deterministic (discussed below).

InnoDB chunking recommendation

  • Have a 'reasonable' size for innodb_log_file_size.

  • Use AUTOCOMMIT=1 for the session doing the deletions.

  • Pick about 1000 rows for the chunk size.

  • Adjust the row count down if asynchronous replication (Statement Based) causes too much delay on the Slaves or hogs the table too much.

Iterating through a compound key

To perform the chunked deletes recommended above, you need a way to walk through the PRIMARY KEY. This can be difficult if the PK has more than one column in it.

To efficiently do a compound 'greater than':

Assume that you left off at ($g, $s) (and have handled that row):

INDEX(Genus, species)
   SELECT/DELETE ...
      WHERE Genus >= '$g' AND ( species  > '$s' OR Genus > '$g' )
      ORDER BY Genus, species
      LIMIT ...

Addendum: The above AND/OR works well in older versions of MySQL; this works better in MariaDB and newer versions of MySQL:

WHERE ( Genus = '$g' AND species  > '$s' ) OR Genus > '$g'

A caution about using @variables for strings. If, instead of '$g', you use @g, you need to be careful to make sure that @g has the same CHARACTER SET and COLLATION as Genus, else there could be a charset/collation conversion on the fly that prevents the use of the INDEX. Using the INDEX is vital for performance. It may require a COLLATE clause on SET NAMES and/or the @g in the SELECT.
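For example, assuming Genus uses utf8mb4 with utf8mb4_general_ci (an assumption for illustration), the variable can be given a matching charset and collation like this:

SET NAMES utf8mb4 COLLATE utf8mb4_general_ci;
SET @g = 'Felis';     -- the literal, and hence @g, now match Genus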

Reclaiming the disk space

This is costly. (Switch to the PARTITION solution if practical.)

MyISAM leaves gaps in the table (.MYD file); OPTIMIZE TABLE will reclaim the freed space after a big delete. But it may take a long time and lock the table.

InnoDB is block-structured, organized in a BTree on the PRIMARY KEY. An isolated deleted row leaves a block less full. A lot of deleted rows can lead to coalescing of adjacent blocks. (Blocks are normally 16KB - see innodb_page_size.)

In InnoDB, there is no practical way to reclaim the freed space from ibdata1, other than to reuse the freed blocks eventually.

The only option with innodb_file_per_table = 0 is to dump ALL tables, remove ibdata*, restart, and reload. That is rarely worth the effort and time.

InnoDB, even with innodb_file_per_table = 1, won't give space back to the OS, but at least it is only one table to rebuild with. In this case, something like this should work:

CREATE TABLE new LIKE main;
   INSERT INTO new SELECT * FROM main;  -- This could take a long time
   RENAME TABLE main TO old, new TO main;   -- Atomic swap
   DROP TABLE old;   -- Space freed up here

You do need enough disk space for both copies. You must not write to the table during the process.

Deleting more than half a table

The following technique can be used for any combination of

  • Deleting a large portion of the table more efficiently

  • Adding PARTITIONing

  • Converting to innodb_file_per_table = ON

  • Defragmenting

This can be done by chunking, or (if practical) all at once:

-- Optional:  SET GLOBAL innodb_file_per_table = ON;
   CREATE TABLE New LIKE Main;
   -- Optional:  ALTER TABLE New ADD PARTITION BY RANGE ...;
   -- Do this INSERT..SELECT all at once, or with chunking:
   INSERT INTO New
      SELECT * FROM Main
         WHERE ...;  -- just the rows you want to keep
   RENAME TABLE Main TO Old, New TO Main;
   DROP TABLE Old;   -- Space freed up here

Notes:

  • You do need enough disk space for both copies.

  • You must not write to the table during the process. (Changes to Main may not be reflected in New.)

Non-deterministic replication

Any UPDATE, DELETE, etc with LIMIT that is replicated to slaves (via Statement Based Replication) may cause inconsistencies between the Master and Slaves. This is because the actual order of the records discovered for updating/deleting may be different on the slave, thereby leading to a different subset being modified. To be safe, add ORDER BY to such statements. Moreover, be sure the ORDER BY is deterministic -- that is, the fields/expressions in the ORDER BY are unique.

An example of an ORDER BY that does not quite work: Assume there are multiple rows for each 'date':

DELETE FROM tbl ORDER BY date LIMIT 111

Given that id is the PRIMARY KEY (or UNIQUE), this will be safe:

DELETE FROM tbl ORDER BY date, id LIMIT 111

Unfortunately, even with the ORDER BY, MySQL has a deficiency that leads to a bogus warning in mysqld.err. See Spurious "Statement is not safe to log in statement format." warnings

Some of the above code avoids this spurious warning by doing

SELECT @z := ... LIMIT 1000,1;  -- not replicated
   DELETE ... BETWEEN @a AND @z;   -- deterministic

That pair of statements guarantees no more than 1000 rows are touched, not the whole table.

Replication and KILL

If you KILL a DELETE (or any? query) on the master in the middle of its execution, what will be replicated?

If it is InnoDB, the query should be rolled back. (Exceptions??)

In MyISAM, rows are DELETEd as the statement is executed, and there is no provision for ROLLBACK. Some of the rows will be deleted, some won't. You probably have no clue of how much was deleted. In a single server, simply run the delete again. The delete is put into the binlog, but with error 1317. Since replication is supposed to keep the master and slave in sync, and since it has no clue of how to do that, replication stops and waits for manual intervention. In an HA (High Availability) system using replication, this is a minor disaster. Meanwhile, you need to go to each slave and verify that it is stuck for this reason, then do

SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
   START SLAVE;

Then (presumably) re-executing the DELETE will finish the aborted task.

(That is yet another reason to move all your tables from MyISAM to InnoDB.)

SBR vs RBR; Galera

TBD -- "Row Based Replication" may impact this discussion.

Postlog

The tips in this document apply to MySQL, MariaDB, and Percona.

See also

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: deletebig

This page is licensed: CC BY-SA / Gnu FDL

Charset Narrowing Optimization

The Charset Narrowing optimization handles equality comparisons like:

utf8mb3_key_column=utf8mb4_expression

It enables the optimizer to construct ref access to utf8mb3_key_column based on this equality. The optimization supports comparisons of columns that use utf8mb3_general_ci to expressions that use utf8mb4_general_ci.

The optimization was introduced in MariaDB 10.6.16, MariaDB 10.10.7, MariaDB 10.11.6, MariaDB 11.0.4, MariaDB 11.1.3 and MariaDB 11.2.2, where it is OFF by default. From MariaDB 11.7, it is ON by default.

Description

MariaDB supports both the UTF8MB3 and UTF8MB4 character sets. It is possible to construct join queries that compare values in UTF8MB3 to UTF8MB4.

Suppose we have the table users, which uses UTF8MB4:

CREATE TABLE users (
  user_name_mb4 VARCHAR(100) COLLATE utf8mb4_general_ci,
  ...
);

and table orders that uses UTF8MB3:

CREATE TABLE orders (
  user_name_mb3 VARCHAR(100) COLLATE utf8mb3_general_ci,
  ...,
  INDEX idx1(user_name_mb3)
);

One can join users to orders on user_name:

SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;

Internally the optimizer will handle the equality by converting the UTF8MB3 value into UTF8MB4 and then doing the comparison. One can see the call to CONVERT in EXPLAIN FORMAT=JSON output or Optimizer Trace:

CONVERT(orders.user_name_mb3 USING utf8mb4) = users.user_name_mb4

This produces the expected result but the query optimizer is not able to use the index over orders.user_name_mb3 to find matches for values of users.user_name_mb4.

The EXPLAIN of the above query looks like this:

EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
+------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
| id   | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra                                           |
+------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+
|    1 | SIMPLE      | users  | ALL  | NULL          | NULL | NULL    | NULL | 1000  |                                                 |
|    1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 10330 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+--------+------+---------------+------+---------+------+-------+-------------------------------------------------+

The Charset Narrowing optimization enables the optimizer to perform the comparison between UTF8MB3 and UTF8MB4 values by "narrowing" the value in UTF8MB4 to UTF8MB3. The CONVERT call is no longer needed, and the optimizer is able to use the equality to construct ref access:

SET optimizer_switch='cset_narrowing=ON';

EXPLAIN SELECT * FROM orders, users WHERE orders.user_name_mb3=users.user_name_mb4;
+------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
| id   | select_type | table  | type | possible_keys | key  | key_len | ref                 | rows | Extra                 |
+------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+
|    1 | SIMPLE      | users  | ALL  | NULL          | NULL | NULL    | NULL                | 1000 | Using where           |
|    1 | SIMPLE      | orders | ref  | idx1          | idx1 | 303     | users.user_name_mb4 | 1    | Using index condition |
+------+-------------+--------+------+---------------+------+---------+---------------------+------+-----------------------+

Controlling the Optimization

The optimization is controlled by an optimizer_switch flag. Specify:

SET optimizer_switch='cset_narrowing=ON';

to enable the optimization.
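To enable it for all new connections instead, and to check whether the flag is currently on, something like this can be used (changing the global value requires sufficient privileges):

SET GLOBAL optimizer_switch = 'cset_narrowing=ON';
SELECT @@optimizer_switch LIKE '%cset_narrowing=on%';   -- 1 means enabled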

References

  • MDEV-32113: utf8mb3_key_col=utf8mb4_value cannot be used for ref access

  • Blog post: Making “tbl.utf8mb3_key_column=utf8mb4_expr” sargable

This page is licensed: CC BY-SA / Gnu FDL

Data Sampling: Techniques for Efficiently Finding a Random Row

Fetching random rows from a table (beyond ORDER BY RAND())

The problem

One would like to do "SELECT ... ORDER BY RAND() LIMIT 10" to get 10 rows at random. But this is slow. The optimizer does

  • Fetch all the rows -- this is costly

  • Append RAND() to the rows

  • Sort the rows -- also costly

  • Pick the first 10.

All the algorithms given below are "fast", but most introduce flaws:

  • Bias -- some rows are more likely to be fetched than others.

  • Repetitions -- If two random sets contain the same row, they are likely to contain other dups.

  • Sometimes failing to fetch the desired number of rows.

"Fast" means avoiding reading all the rows. There are many techniques that require a full table scan, or at least an index scan. They are not acceptable for this list. There is even a technique that averages half a scan; it is relegated to a footnote.

Metrics

Here's a way to measure performance without having a big table.

FLUSH STATUS;
    SELECT ...;
    SHOW SESSION STATUS LIKE 'Handler%';

If some of the "Handler" numbers look like the number of rows in the table, then there was a table scan.

None of the queries presented here need a full table (or index) scan. Each has a time proportional to the number of rows returned.

Virtually all published algorithms involve a table scan. The previously published version of this blog had, embarrassingly, several algorithms that had table scans.

Sometimes the scan can be avoided via a subquery. For example, the first of these will do a table scan; the second will not.

SELECT *  FROM RandTest AS a
  WHERE id = FLOOR(@min + (@max - @min + 1) * RAND());  -- BAD: table scan
SELECT *
 FROM RandTest AS a
 JOIN (
   SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id -- Good; single eval.
      ) b  USING (id);

Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned

  • Requirement: AUTO_INCREMENT id

  • Requirement: No gaps in id

SELECT r.*
      FROM (
          SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
              FROM (
                  SELECT MIN(id) AS min_id,
                         MAX(id) AS max_id
                      FROM RandTest
                   ) AS mm
           ) AS init
      JOIN  RandTest AS r  ON r.id = init.id;

(Of course, you might be able to simplify this. For example, min_id is likely to be 1. Or precalculate limits into @min and @max.)

Case: Consecutive AUTO_INCREMENT without gaps, 10 rows

  • Requirement: AUTO_INCREMENT id

  • Requirement: No gaps in id

  • Flaw: Sometimes delivers fewer than 10 rows

-- First select is one-time:
  SELECT @min := MIN(id),
         @max := MAX(id)
      FROM RandTest;
  SELECT DISTINCT *
      FROM RandTest AS a
      JOIN (
          SELECT FLOOR(@min + (@max - @min + 1) * RAND()) AS id
              FROM RandTest
              LIMIT 11    -- more than 10 (to compensate for dups)
           ) b  USING (id)
      LIMIT 10;           -- the desired number of rows

The FLOOR expression could lead to duplicates, hence the inflated inner LIMIT. There could (rarely) be so many duplicates that the inflated LIMIT leads to fewer than the desired 10 different rows. One approach to that Flaw is to rerun the query if it delivers too few rows.

A variant:

SELECT r.*
      FROM (
          SELECT FLOOR(mm.min_id + (mm.max_id - mm.min_id + 1) * RAND()) AS id
              FROM (
                  SELECT MIN(id) AS min_id,
                         MAX(id) AS max_id
                      FROM RandTest
                   ) AS mm
              JOIN ( SELECT id dummy FROM RandTest LIMIT 11 ) z
           ) AS init
      JOIN  RandTest AS r  ON r.id = init.id
      LIMIT 10;

Again, ugly but fast, regardless of table size.

Case: AUTO_INCREMENT with gaps, 1 or more rows returned

  • Requirement: AUTO_INCREMENT, possibly with gaps due to DELETEs, etc

  • Flaw: Only semi-random (rows do not have an equal chance of being picked), but it does partially compensate for the gaps

  • Flaw: The first and last few rows of the table are less likely to be delivered.

This gets 50 "consecutive" ids (possibly with gaps), then delivers a random 10 of them.

-- First select is one-time:
SELECT @min := MIN(id),
       @max := MAX(id)
    FROM RandTest;
SELECT a.*
    FROM RandTest a
    JOIN ( SELECT id FROM
            ( SELECT id
                FROM ( SELECT @min + (@max - @min + 1 - 50) * RAND() 
                  AS start FROM DUAL ) AS init
                JOIN RandTest y
                WHERE    y.id > init.start
                ORDER BY y.id
                LIMIT 50         -- Inflated to deal with gaps
            ) z ORDER BY RAND()
           LIMIT 10              -- number of rows desired (change to 1 if looking for a single row)
         ) r ON a.id = r.id;

Yes, it is complex, but yes, it is fast, regardless of the table size.

Case: Extra FLOAT column for randomizing

(Unfinished: need to check these.)

Assuming rnd is a FLOAT (or DOUBLE) populated with RAND() and INDEXed:

  • Requirement: extra, indexed, FLOAT column

  • Flaw: Fetches 10 adjacent rows (according to rnd), hence not good randomness

  • Flaw: Near 'end' of table, can't find 10 rows.

SELECT r.*
      FROM ( SELECT RAND() AS start FROM DUAL ) init
      JOIN RandTest r
      WHERE r.rnd >= init.start
      ORDER BY r.rnd
      LIMIT 10;
These two variants attempt to resolve the end-of-table flaw:

SELECT r.*
      FROM ( SELECT RAND() * ( SELECT rnd
                        FROM RandTest
                        ORDER BY rnd DESC
                        LIMIT 10,1 ) AS start
           ) AS init
      JOIN RandTest r
      WHERE r.rnd > init.start
      ORDER BY r.rnd
      LIMIT 10;


  SELECT @start := RAND(),
         @cutoff := CAST(1.1 * 10 + 5 AS DECIMAL(20,8)) / TABLE_ROWS
      FROM information_schema.TABLES
      WHERE TABLE_SCHEMA = 'dbname'
        AND TABLE_NAME = 'RandTest'; -- 0.0030
  SELECT d.*
      FROM (
          SELECT a.id
              FROM RandTest a
              WHERE rnd BETWEEN @start AND @start + @cutoff
           ) sample
      JOIN RandTest d USING (id)
      ORDER BY rand()
      LIMIT 10;

Case: UUID or MD5 column

  • Requirement: UUID/GUID/MD5/SHA1 column exists and is indexed.

  • Similar code/benefits/flaws to AUTO_INCREMENT with gaps.

  • Needs 6 random HEX digits:

RIGHT( HEX( (1<<24) * (1+RAND()) ), 6)

can be used as a start for adapting a gapped AUTO_INCREMENT case. If the field is BINARY instead of hex, then

UNHEX(RIGHT( HEX( (1<<24) * (1+RAND()) ), 6))
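As a rough sketch (not from the original article), the gapped AUTO_INCREMENT pattern above might be adapted to an indexed hex uuid column like this, assuming the column holds hex text with a case-insensitive collation; it shares the same flaws as the gapped-id case:

SELECT a.*
    FROM RandTest a
    JOIN ( SELECT uuid FROM
            ( SELECT uuid
                FROM RandTest
                WHERE uuid >= RIGHT( HEX( (1<<24) * (1+RAND()) ), 6)
                ORDER BY uuid
                LIMIT 50          -- inflated to deal with clustering
            ) z ORDER BY RAND()
           LIMIT 10
         ) r ON a.uuid = r.uuid;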

See also

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: random

This page is licensed: CC BY-SA / Gnu FDL

Data Warehousing High Speed Ingestion

The problem

You are ingesting lots of data. Performance is bottlenecked in the INSERT area.

This will be couched in terms of Data Warehousing, with a huge Fact table and Summary (aggregation) tables.

Overview of solution

  • Have a separate staging table.

  • Inserts go into Staging.

  • Normalization and Summarization reads Staging, not Fact.

  • After normalizing, the data is copied from Staging to Fact.

Staging is one (or more) tables in which the data lives only long enough to be handed off to Normalization, Summary, and the Fact tables.

Since we are probably talking about a billion-row table, shrinking the width of the Fact table by normalizing (as mentioned here) is important. Changing an INT to a MEDIUMINT will save a GB. Replacing a string by an id (normalizing) saves many GB. This helps disk space and cacheability, hence speed.

Injection speed

Some variations:

  • Big dump of data once an hour, versus continual stream of records.

  • The input stream could be single-threaded or multi-threaded.

  • You might have 3rd party software tying your hands.

Generally the fastest injection rate can be achieved by "staging" the INSERTs in some way, then batch processing the staged records. This blog discusses various techniques for staging and batch processing.

Normalization

Let's say your Input has a VARCHAR host_name column, but you need to turn that into a smaller MEDIUMINT host_id in the Fact table. The "Normalization" table, as I call it, looks something like

CREATE TABLE Hosts (
    host_id  MEDIUMINT UNSIGNED  NOT NULL AUTO_INCREMENT,
    host_name VARCHAR(99) NOT NULL,
    PRIMARY KEY (host_id),      -- for mapping one direction
    INDEX(host_name, host_id)   -- for mapping the other direction
) ENGINE=InnoDB;                -- InnoDB works best for Many:Many mapping table

Here's how you can use Staging as an efficient way to achieve the swap from name to id.

Staging has two fields (for this normalization example):

host_name VARCHAR(99) NOT NULL,     -- Comes from the insertion process
    host_id  MEDIUMINT UNSIGNED  NULL,  -- NULL to start with; see code below

Meanwhile, the Fact table has:

host_id  MEDIUMINT UNSIGNED NOT NULL,

SQL #1 (of 2):

# This should not be in the main transaction, and it should be done with autocommit = ON
    # In fact, it could lead to strange errors if this were part
    #    of the main transaction and it ROLLBACKed.
    INSERT IGNORE INTO Hosts (host_name)
        SELECT DISTINCT s.host_name
            FROM Staging AS s
            LEFT JOIN Hosts AS n  ON n.host_name = s.host_name
            WHERE n.host_id IS NULL;

By isolating this as its own transaction, we get it finished in a hurry, thereby minimizing blocking. By saying IGNORE, we don't care if other threads are 'simultaneously' inserting the same host_names.

There is a subtle reason for the LEFT JOIN. If, instead, it were INSERT IGNORE..SELECT DISTINCT, then the INSERT would preallocate auto_increment ids for as many rows as the SELECT provides. This is very likely to "burn" a lot of ids, thereby leading to overflowing MEDIUMINT unnecessarily. The LEFT JOIN leads to finding just the new ids that are needed (except for the rare possibility of a 'simultaneous' insert by another thread). More rationale: Mapping table

SQL #2:

# Also not in the main transaction, and it should be with autocommit = ON
    # This multi-table UPDATE sets the ids in Staging:
    UPDATE   Hosts AS n
        JOIN Staging AS s  ON s.host_name = n.host_name
        SET s.host_id = n.host_id;

This gets the IDs, whether already existing, set by another thread, or set by SQL #1.

If the size of Staging changes depending on the busy versus idle times of the day, this pair of SQL statements has another comforting feature. The more rows in Staging, the more efficient the SQL runs, thereby helping compensate for the "busy" times.

The companion Data Warehouse article folds SQL #2 into the INSERT INTO Fact. But you may need host_id for further normalization steps and/or Summarization steps, so this explicit UPDATE shown here is often better.
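For completeness, the final copy step then needs no JOIN at all -- a minimal sketch, where ts and bytes stand in for whatever other columns your Fact table has:

INSERT INTO Fact (host_id, ts, bytes)
    SELECT host_id, ts, bytes
        FROM Staging;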

Flip-flop staging

The simple way to stage is to ingest for a while, then batch-process what is in Staging. But that leads to new records piling up waiting to be staged. To avoid that issue, have 2 processes:

  • one process (or set of processes) for INSERTing into Staging;

  • one process (or set of processes) to do the batch processing (normalization, summarization).

To keep the processes from stepping on each other, we have a pair of staging tables:

  • Staging is being INSERTed into;

  • StageProcess is the one being processed for normalization, summarization, and moving to the Fact table. A separate process does the processing, then swaps the tables:

DROP   TABLE StageProcess;
    CREATE TABLE StageProcess LIKE Staging;
    RENAME TABLE Staging TO tmp, StageProcess TO Staging, tmp TO StageProcess;

This may not seem like the shortest way to do it, but has these features:

  • The DROP + CREATE might be faster than TRUNCATE, which is the desired effect.

  • The RENAME is atomic, so the INSERT process(es) never find that Staging is missing.

A variant on the 2-table flip-flop is to have a separate Staging table for each Insertion process. The Processing process would run around to each Staging in turn.

A variant on that would be to have a separate processing process for each Insertion process.

The choice depends on which is faster (insertion or processing). There are tradeoffs; a single processing thread avoids some locks, but lacks some parallelism.

Engine choice

Fact table -- InnoDB, if for no other reason than that a system crash would not need a REPAIR TABLE. (REPAIRing a billion-row MyISAM table can take hours or days.)

Normalization tables -- InnoDB, primarily because it can be done efficiently with 2 indexes, whereas, MyISAM would need 4 to achieve the same efficiency.

Staging -- Lots of options here.

  • If you have multiple Inserters and a single Staging table, InnoDB is desirable due to row-level, not table-level, locking.

  • MEMORY may be the fastest and it avoids I/O. This is good for a single staging table.

  • For multiple Inserters, a separate Staging table for each Inserter is desired.

  • For multiple Inserters into a single Staging table, InnoDB may be faster. (MEMORY does table-level locking.)

  • With one non-InnoDB Staging table per Inserter, using an explicit LOCK TABLE avoids repeated implicit locks on each INSERT.

  • But, if you are doing LOCK TABLE and the Processing thread is separate, an UNLOCK is necessary periodically to let the RENAME grab the table.

  • "Batch INSERTs" (100-1000 rows per SQL) eliminates much of the issues of the above bullet items.

Confused? Lost? There are enough variations in applications that make it impractical to predict what is best. Or, simply good enough. Your ingestion rate may be low enough that you don't hit the brick walls that I am helping you avoid.

Should you do "CREATE TEMPORARY TABLE"? Probably not. Consider Staging as part of the data flow, not to be DROPped.

Summarization

This is mostly covered here: Summary Tables. Summarize from the Staging table instead of the Fact table.

Replication Issues

Row Based Replication (RBR) is probably the best option.

The following allows you to keep more of the Ingestion process in the Master, thereby not bogging down the Slave(s) with writes to the Staging table.

  • RBR

  • Staging is in a separate database

  • That database is not replicated (binlog-ignore-db on Master)

  • In the Processing steps, USE that database, reach into the main db via syntax like "MainDb.Hosts". (Otherwise, the binlog-ignore-db does the wrong thing.)

That way

  • Writes to Staging are not replicated.

  • Normalization sends only the few updates to the normalization tables.

  • Summarization sends only the updates to the summary tables.

  • Flip-flop does not replicate the DROP, CREATE or RENAME.

Sharding

You could possibly spread the data you are trying to ingest across multiple machines in a predictable way (sharding on hash, range, etc). Running "reports" on a sharded Fact table is a challenge unto itself. On the other hand, Summary Tables rarely get too big to manage on a single machine.

For now, Sharding is beyond the scope of this blog.

Push me vs pull me

I have implicitly assumed the data is being pushed into the database. If, instead, you are "pulling" data from some source(s), then there are some different considerations.

Case 1: An hourly upload; run via cron

  1. Grab the upload, parse it

  2. Put it into the Staging table

  3. Normalize -- each SQL in its own transaction (autocommit)

  4. BEGIN

  5. Summarize

  6. Copy from Staging to Fact.

  7. COMMIT

If you need parallelism in Summarization, you will have to sacrifice the transactional integrity of steps 4-7.

Caution: If these steps add up to more than an hour, you are in deep dodo.

Case 2: You are polling for the data

It is probably reasonable to have multiple processes doing this, so the steps below are explicit about locking.

  1. Create a Staging table for this polling processor. Loop:

  2. With some locked mechanism, decide which 'thing' to poll.

  3. Poll for the data, pull it in, parse it. (Potentially polling and parsing are significantly costly)

  4. Put it into the process-specific Staging table

  5. Normalize -- each SQL in its own transaction (autocommit)

  6. BEGIN

  7. Summarize

  8. Copy from Staging to Fact.

  9. COMMIT

  10. Declare that you are finished with this 'thing' (see step 1) EndLoop.

innodb_log_file_size should be larger than the change in the status variable Innodb_os_log_written across the BEGIN...COMMIT transaction (for either Case).
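A simple way to measure that delta (a sketch; wrap it around one representative batch):

SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';   -- note the value
-- ... run one BEGIN ... COMMIT batch ...
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';   -- the difference is the bytes logged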

See also

  • companion Data Warehouse blog

  • companion Summary Table blog

  • a forum thread that prodded me into writing this blog

  • StackExchange discussion

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: staging_table

This page is licensed: CC BY-SA / Gnu FDL

Data Warehousing Summary Tables

Preface

This document discusses the creation and maintenance of "Summary Tables". It is a companion to the document on Data Warehousing Techniques.

The basic terminology ("Fact Table", "Normalization", etc) is covered in that document.

Summary tables for data warehouse "reports"

Summary tables are a performance necessity for large tables. MariaDB and MySQL do not provide any automated way to create such, so I am providing techniques here.

(Other vendors provide something similar with "materialized views".)

When you have millions or billions of rows, it takes a long time to summarize the data to present counts, totals, averages, etc, in a size that is readily digestible by humans. By computing and saving subtotals as the data comes in, one can make "reports" run much faster. (I have seen 10x to 1000x speedups.) The subtotals go into a "summary table". This document guides you on efficiency in both creating and using such tables.

General structure of a summary table

A summary table includes two sets of columns:

  • Main KEY: date + some dimension(s)

  • Subtotals: COUNT(*), SUM(...), ...; but not AVG()

The "date" might be a DATE (a 3-byte native datatype), or an hour, or some other time interval. A 3-byte MEDIUMINT UNSIGNED 'hour' can be derived from a DATETIME or TIMESTAMP via

FLOOR(UNIX_TIMESTAMP(dt) / 3600)   -- DATETIME/TIMESTAMP -> hour number
   FROM_UNIXTIME(hour * 3600)         -- hour number -> DATETIME

The "dimensions" (a DW term) are some of the columns of the "Fact" table. Examples: Country, Make, Product, Category, Host Non-dimension examples: Sales, Quantity, TimeSpent

There would be one or more indexes, usually starting with some dimensions and ending with the date field. By ending with the date, one can efficiently get a range of days/weeks/etc. even when each row summarizes only one day.

There will typically be a "few" summary tables. Often one summary table can serve multiple purposes sufficiently efficiently.

As a rule of thumb, a summary table will have one-tenth the number of rows as the Fact table. (This number is very loose.)

Example

Let's talk about a large chain of car dealerships. The Fact table has all the sales with columns such as datetime, salesman_id, city, price, customer_id, make, model, model_year. One Summary table might focus on sales:

PRIMARY KEY(city, datetime),
   Aggregations: ct, sum_price
   
   # Core of INSERT..SELECT:
   DATE(datetime) AS DATE, city, COUNT(*) AS ct, SUM(price) AS sum_price
   
   # Reporting average price for last month, broken down by city:
   SELECT city,
          SUM(sum_price) / SUM(ct) AS 'AveragePrice'
      FROM SalesSummary
      WHERE datetime BETWEEN ...
      GROUP BY city;
   
   # Monthly sales, nationwide, from the same summary table:
   SELECT MONTH(datetime) AS 'Month',
          SUM(ct)         AS 'TotalSalesCount',
          SUM(sum_price)  AS 'TotalDollars'
      FROM SalesSummary
      WHERE datetime BETWEEN ...
      GROUP BY MONTH(datetime);
   # This might benefit from a secondary INDEX(datetime)

When to augment the summary table(s)?

"Augment" in this section means to add new rows into the summary table or increment the counts in existing rows.

Plan A: "While inserting" rows into the Fact table, augment the summary table(s). This is simple, and workable for a smaller DW database (under 10 Fact table rows per second). For larger DW databases, Plan A likely to be too costly to be practical.

Plan B: "Periodically", via cron or an EVENT.

Plan C: "As needed". That is, when someone asks for a report, the code first updates the summary tables that will be needed.

Plan D: "Hybrid" of B and C. C, by itself, can led to long delays for the report. By also doing B, those delays can be kept low.

Plan E: (This is not advised.) "Rebuild" the entire summary table from the entire Fact table. The cost of this is prohibitive for large tables. However, Plan E may be needed when you decide to change the columns of a Summary Table, or discover a flaw in the computations. To lessen the impact of an entire rebuild, adapt the chunking techniques in Deleting in chunks.

Plan F: "Staging table". This is primarily for very high speed ingestion. It is mentioned briefly in this blog, and discussed more thoroughly in the companion blog: High Speed Ingestion

Summarizing while Inserting (one row at a time)

INSERT INTO Fact ...;
    INSERT INTO Summary (..., ct, foo, ...) VALUES (..., 1, foo, ...)
        ON DUPLICATE KEY UPDATE ct = ct+1, sum_foo = sum_foo + VALUES(foo), ...;

IODKU (Insert On Duplicate Key Update) will update an existing row or create a new row. It knows which to do based on the Summary table's PRIMARY KEY.

Caution: This approach is costly, and will not scale to an ingestion rate of over, say, 10 rows per second (Or maybe 50/second on SSDs). More discussion later.

Summarizing periodically vs as-needed

If your reports need to be up-to-the-second, you need "as needed" or "hybrid". If your reports have less urgency (eg, weekly reports that don't include 'today'), then "periodically" might be best.

For daily summaries, augmenting the summary tables could be done right after midnight. But beware of data coming in "late".

For both "periodic" and "as needed", you need a definitive way of keeping track of where you "left off".

Case 1: You insert into the Fact table first and it has an AUTO_INCREMENT id: Grab MAX(id) as the upper bound for summarizing and put it either into some other secure place (an extra table), or put it into the row(s) in the Summary table as you insert them. (Caveat: AUTO_INCREMENT ids do not work well in multi-master, including Galera, setups.)
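One possible shape for that "extra table" bookkeeping -- a sketch only; the Checkpoints table is hypothetical, and the Summary/Fact columns (dy, foo, ct, blah_total, dt, blah) follow the examples used elsewhere in this article:

CREATE TABLE Checkpoints (
    job      VARCHAR(30) NOT NULL PRIMARY KEY,
    last_id  BIGINT UNSIGNED NOT NULL
);

SELECT @max_id := MAX(id) FROM Fact;

INSERT INTO Summary (dy, foo, ct, blah_total)
    SELECT DATE(dt), foo, COUNT(*), SUM(blah)
        FROM Fact
        WHERE id >  COALESCE((SELECT last_id FROM Checkpoints WHERE job = 'daily_summary'), 0)
          AND id <= @max_id
        GROUP BY 1, 2
    ON DUPLICATE KEY UPDATE
        ct = ct + VALUES(ct),
        blah_total = blah_total + VALUES(blah_total);

INSERT INTO Checkpoints (job, last_id) VALUES ('daily_summary', @max_id)
    ON DUPLICATE KEY UPDATE last_id = @max_id;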

Case 2: If you are using a 'staging' table, there is no issue. (More on staging tables later.)

Summarizing while batch inserting

This applies to multi-row (batch) INSERT and LOAD DATA.

The Fact table needs an AUTO_INCREMENT id, and you need to be able to find the exact range of ids inserted. (This may be impractical in any multi-master setup.)

Then perform bulk summarization using

FROM Fact
   WHERE id BETWEEN min_id AND max_id

Summarizing when using a staging table

Load the data (via INSERTs or LOAD DATA) en masse into a "staging table". Then perform batch summarization from the Staging table, and batch copy from the Staging table to the Fact table. Note that the Staging table is handy for batching "normalization" during ingestion. See also High Speed Ingestion.

Summary table: PK or not?

Let's say your summary table has a DATE, dy, and a dimension, foo. The question is: Should (foo, dy) be the PRIMARY KEY? Or a non-UNIQUE index?

Case 1: PRIMARY KEY (foo, dy) and summarization is in lock step with, say, changes in dy.

This case is clean and simple -- until you get to endcases. How will you handle the case of data arriving 'late'? Maybe you will need to recalculate some chunks of data? If so, how?

Case 2: (foo, dy) is a non-UNIQUE INDEX.

This case is clean and simple, but it can clutter the summary table because multiple rows can occur for a given (foo, dy) pair. The report will always have to SUM() up values because it cannot assume there is only one row, even when it is reporting on a single foo for a single dy. This forced-SUM is not really bad -- you should do it anyway; that way all your reports are written with one pattern.

Case 3: PRIMARY KEY (foo, dy) and summarization can happen anytime.

Since you should be using InnoDB, there needs to be an explicit PRIMARY KEY. One approach when you do not have a 'natural' PK is this:

id INT UNSIGNED AUTO_INCREMENT NOT NULL,
   ...
   PRIMARY KEY(foo, dy, id),  -- `id` added to make unique
   INDEX(id)                  -- sufficient to keep AUTO_INCREMENT happy

This case pushes the complexity onto the summarization by doing an IODKU.

Advice? Avoid Case 1; too messy. Case 2 is ok if the extra rows are not too common. Case 3 may be the closest to "one size fits all".

Averages, etc.

When summarizing, include COUNT(*) AS ct and SUM(foo) AS sum_foo. When reporting, the "average" is computed as SUM(sum_foo) / SUM(ct). That is mathematically correct.

Exception... Let's say you are looking at weather temperatures. And your monitoring station gets the temp periodically, but unreliably. That is, the number of readings for a day varies. Further, you decide that the easiest way to compensate for the inconsistency is to do something like: Compute the avg temp for each day, then average those across the month (or other timeframe).

Formula for Standard Deviation:

SQRT( SUM(sum_foo2)/SUM(ct) - POWER(SUM(sum_foo)/SUM(ct), 2) )

Where sum_foo2 is SUM(foo * foo) from the summary table. sum_foo and sum_foo2 should be FLOAT. FLOAT gives you about 7 significant digits, which is more than enough for things like average and standard deviation. FLOAT occupies 4 bytes. DOUBLE would give you more precision, but occupies 8 bytes. INT and BIGINT are not practical because they may lead to complaints about overflow.

Staging table

The idea here is to first load a set of Fact records into a "staging table", with the following characteristics (at least):

  • The table is repeatedly populated and truncated

  • Inserts could be individual or batched, and from one or many clients

  • SELECTs will be table scans, so no indexes needed

  • Inserting will be fast (InnoDB may be the fastest)

  • Normalization can be done in bulk, hence efficiently

  • Copying to the Fact table will be fast

  • Summarization can be done in bulk, hence efficiently

  • "Bursty" ingestion is smoothed by this process

  • Flip-flop a pair of Staging tables

If you have bulk inserts (Batch INSERT or LOAD DATA) then consider doing the normalization and summarization immediately after each bulk insert.

More details: High Speed Ingestion

Extreme design

Here is a more complex way to design the system, with the goal of even more scaling.

  • Use master-slave setup: ingest into master; report from slave(s).

  • Feed ingestion through a staging table (as described above)

  • Single-source of data: ENGINE=MEMORY; multiple sources: InnoDB

  • binlog_format = ROW

  • Use binlog_ignore_db to avoid replicating staging -- necessitating putting it in a separate database.

  • Do the summarization from Staging

  • Load Fact via INSERT INTO Fact ... SELECT FROM Staging ...

Explanation and comments:

  • ROW + ignore_db avoids replicating Staging, yet replicates the INSERTs based on it. Hence, it lightens the write load on the Slaves

  • If using MEMORY, remember that it is volatile -- recover from a crash by starting the ingestion over.

  • To aid with debugging, TRUNCATE or re-CREATE Staging at the start of the next cycle.

  • Staging needs no indexes -- all operations read all rows from it.

Stats on the system that this 'extreme design' came from: Fact Table: 450GB, 100M rows/day (batch of 4M/hour), 60 day retention (60+24 partitions), 75B/row, 7 summary tables, under 10 minutes to ingest and summarize the hourly batch. The INSERT..SELECT handled over 20K rows/sec going into the Fact table. Spinning drives (not SSD) with RAID-10.

"Left Off"

One technique involves summarizing some of the data, then recording where you "left off", so that next time, you can start there. There are some subtle issues with "left off" that you should be cautious of.

If you use a DATETIME or TIMESTAMP as "left off", beware of multiple rows with the same value.

  • Plan A: Use a compound "left off" (eg, TIMESTAMP + ID). This is messy, error prone, etc.

  • Plan B: WHERE ts >= $left_off AND ts < $max_ts -- avoids dups, but has other problems (below)

  • Separate threads could COMMIT TIMESTAMPs out of order.

If you use an AUTO_INCREMENT as "left off" beware of:

  • In InnoDB, separate threads could COMMIT ids in the 'wrong' order.

  • Multi-master (including Galera and InnoDB Cluster), could lead to ordering issues.

So, nothing works, at least not in a multi-threaded environment?

If you can live with an occasional hiccup (skipped record), then maybe this is 'not a problem' for you.

The "Flip-Flop Staging" is a safe alternative, optionally combined with the "Extreme Design".

Flip-flop staging

If you have many threads simultaneously INSERTing into one staging table, then here is an efficient way to handle a large load: Have a process that flips that staging table with another, identical, staging table, and performs bulk normalization, Fact insertion, and bulk summarization.

The flipping step uses a fast, atomic, RENAME.

Here is a sketch of the code:

# Prep for flip:
    CREATE TABLE new LIKE Staging;

    # Swap (flip) Staging tables:
    RENAME TABLE Staging TO old, new TO Staging;

    # Normalize new `foo`s:
    # (autocommit = 1)
    INSERT IGNORE INTO Foos SELECT foo FROM old LEFT JOIN Foos ...

    # Prep for possible deadlocks, etc
    WHILE...
    START TRANSACTION;

    # Add to Fact:
    INSERT INTO Fact ... FROM old JOIN Foos ...

    # Summarize:
    INSERT INTO Summary ... FROM old ... GROUP BY ...

    COMMIT;
    end-WHILE

    # Cleanup:
    DROP TABLE old;

Meanwhile, ingestion can continue writing to Staging. The ingestion INSERTs will conflict with the RENAME, but will be resolved gracefully and silently and quickly.

How fast should you flip-flop? Probably the best scheme is to

  • Have a job that flip-flops in a tight loop (no delay, or a small delay, between iterations), and

  • Have a CRON that serves only as a "keep-alive" to restart the job if it dies.

If Staging is 'big', an iteration will take longer, but run more efficiently. Hence, it is self-regulating.

In a Galera (or InnoDB Cluster?) environment, each node could be receiving input. If you can afford to lose a few rows, have Staging be a non-replicated MEMORY table. Otherwise, have one Staging per node and make it InnoDB; it will be more secure, but slower and not without problems. In particular, if a node dies completely, you somehow need to process its Staging table.

Multiple summary tables

  • Look at the reports you will need.

  • Design a summary table for each.

  • Then look at the summary tables -- you are likely to find some similarities.

  • Merge similar ones.

To look at what a report needs, look at the WHERE clause that would provide the data. Some examples follow, assuming data about service records for automobiles; the GROUP BY gives a clue of what each report might be about.

  1. WHERE make = ? AND model_year = ? GROUP BY service_date, service_type

  2. WHERE make = ? AND model = ? GROUP BY service_date, service_type

  3. WHERE service_type = ? GROUP BY make, model, service_date

  4. WHERE service_date between ? and ? GROUP BY make, model, model_year

You need to allow for 'ad hoc' queries? Well, look at all the ad hoc queries -- they all have a date range, plus nail down one or two other things. (I rarely see something as ugly as '%CL%' for nailing down another dimension.) So, start by thinking of date plus one or two other dimensions as the 'key' into a new summary table. Then comes the question of what data might be desired -- counts, sums, etc. Eventually you have a small set of summary tables. Then build a front end to allow them to pick only from those possibilities. It should encourage use of the existing summary tables, not be truly 'open ended'.

Later, another 'requirement' may surface. So, build another summary table. Of course, it may take a day to initially populate it.

Games on summary tables

Does one ever need to summarize a summary table? Yes, but only in extreme situations. Usually a 'weekly' report can be derived from a 'daily' summary table; building a separate weekly summary table not being worth the effort.

Would one ever PARTITION a Summary Table? Yes, in extreme situations, such as the table being large, and

  • Need to purge old data (unlikely), or

  • 'Recent' data is usually requested, and the index(es) fail to prevent table scans (rare). ("Partition pruning" to the rescue.)

See also

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: summarytables


This page is licensed: CC BY-SA / Gnu FDL

Data Warehousing Techniques

Preface

This document discusses techniques for improving performance for data-warehouse-like tables in MariaDB and MySQL.

  • How to load large tables.

  • Normalization.

  • Developing 'summary tables' to make 'reports' efficient.

  • Purging old data.

Details on summary tables is covered in the companion document: Summary Tables.

Terminology

This list mirrors "Data Warehouse" terminology.

  • Fact table -- The one huge table with the 'raw' data.

  • Summary table -- a redundant table of summarized data that could be recomputed from the Fact table; used for efficiency

  • Dimension -- columns that identify aspects of the dataset (region, country, user, SKU, zipcode, ...)

  • Normalization table (dimension table) -- mapping between strings and ids; used for space and speed.

  • Normalization -- The process of building the mapping ('New York City' <-> 123)

Fact table

Techniques that should be applied to the huge Fact table.

  • id INT/BIGINT UNSIGNED NOT NULL AUTO_INCREMENT

  • PRIMARY KEY (id)

  • Probably no other INDEXes

  • Accessed only via id

  • All VARCHARs are "normalized"; ids are stored instead

  • ENGINE = InnoDB

  • All "reports" use summary tables, not the Fact table

  • Summary tables may be populated from ranges of id (other techniques described below)

There are exceptions where the Fact table must be accessed to retrieve multiple rows. However, you should minimize the number of INDEXes on the table because they are likely to be costly on INSERT.

Why keep the Fact table?

Once you have built the Summary table(s), there is not much need for the Fact table. One option that you should seriously consider is to not have a Fact table. Or, at least, you could purge old data from it sooner than you purge the Summary tables. Maybe even keep the Summary tables forever.

Case 1: You need to find the raw data involved in some event. But how will you find those row(s)? This is where a secondary index may be required.

If a secondary index is bigger than can be cached in RAM, and if the column(s) being indexed is random, then each row inserted may cause a disk hit to update the index. This limits insert speed to something like 100 rows per second (on ordinary disks). Multiple random indexes slow down insertion further. RAID striping and/or SSDs speed up insertion. Write caching helps, but only for bursts.

Case 2: You need some event, but you did not plan ahead with the optimal INDEX. Well, if the data is PARTITIONed on date, then even if you have only a rough idea of when the event occurred, "partition pruning" will keep the query from being too terribly slow.

Case 3: Over time, the application is likely to need new 'reports', which may lead to a new Summary table. At this point, it would be handy to scan through the old data to fill up the new table.

Case 4: You find a flaw in the summarization, and need to rebuild an existing Summary table.

Cases 3 and 4 both need the "raw" data. But they don't necessarily need the data sitting in a database table. It could be in the pre-database format (such as log files). So, consider not building the Fact table, but simply keep the raw data, compressed, on some file system.

Batching the load of the Fact table

When talking about billions of rows in the Fact table, it is essentially mandatory that you "batch" the inserts. There are two main ways:

  • INSERT INTO Fact (.,.,.) VALUES (.,.,.), (.,.,.), ...; -- "Batch insert"

  • LOAD DATA ...;

A third way is to INSERT or LOAD into a Staging table, then

  • INSERT INTO Fact SELECT * FROM Staging; This INSERT..SELECT allows you to do other things, such as normalization. More later.

Batched INSERT Statement

Chunk size should usually be 100-1000 rows.

  • 100-1000 rows per INSERT will run about 10 times as fast as single-row INSERTs.

  • Beyond 100, you may be interfering with replication and SELECTs.

  • Beyond 1000, you are into diminishing returns -- virtually no further performance gains.

  • Don't go past, say, 1MB for the constructed INSERT statement. This deals with packet sizes, etc. (1MB is unlikely to be hit for a Fact table.) Decide whether your application should lean toward the 100 or the 1000.

If your data is coming in continually, and you are adding a batching layer, let's do some math. Compute your ingestion rate -- R rows per second.

  • If R < 10 (= 1M/day = 300M/year) -- single-row INSERTs would probably work fine (that is, batching is optional)

  • If R < 100 (3B records per year) -- secondary indexes on Fact table may be ok

  • If R < 1000 (100M records/day) -- avoid secondary indexes on Fact table.

  • If R > 1000 -- Batching may not work. Decide how long (S seconds) you can stall loading the data in order to collect a batch of rows.

  • If S < 0.1s -- May not be able to keep up

If batching seems viable, then design the batching layer to gather for S seconds or 100-1000 rows, whichever comes first.

(Note: Similar math applies to rapid UPDATEs of a table.)

Normalization (Dimension) table

Normalization is important in Data Warehouse applications because it significantly cuts down on the disk footprint and improves performance. There are other reasons for normalizing, but space is the important one for DW.

Here is a typical pattern for a Dimension table:

CREATE TABLE Emails (
        email_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- don't make bigger than needed
        email VARCHAR(...) NOT NULL,
        PRIMARY KEY (email),  -- for looking up one way
        INDEX(email_id)  -- for looking up the other way (UNIQUE is not needed)
    ) ENGINE = InnoDB;  -- to get clustering

Notes:

  • MEDIUMINT is 3 bytes with UNSIGNED range of 0..16M; pick SMALLINT, INT, etc, based on a conservative estimate of how many 'foo's you will eventually have.

  • datatype sizes

  • There may be more than one VARCHAR in the table. Example: For cities, you might have City and Country.

  • InnoDB is better than MyISAM because of the way the two keys are structured.

  • The secondary key is effectively (email_id, email), hence 'covering' for certain queries.

  • It is OK to not specify an AUTO_INCREMENT to be UNIQUE.

Batched normalization

I bring this up as a separate topic because of some of the subtle issues that can happen.

You may be tempted to do

INSERT IGNORE INTO Foos (foo)
        SELECT DISTINCT foo FROM Staging;  -- not wise

It has the problem of "burning" AUTO_INCREMENT ids. This is because MariaDB pre-allocates ids before getting to "IGNORE". That could rapidly increase the AUTO_INCREMENT values beyond what you expected.

Better is this...

INSERT IGNORE INTO Foos (foo)
        SELECT DISTINCT Staging.foo
            FROM Staging
            LEFT JOIN Foos ON Foos.foo = Staging.foo
            WHERE Foos.foo_id IS NULL;

Notes:

  • The LEFT JOIN .. IS NULL finds the foos that are not yet in Foos.

  • This INSERT..SELECT must not be done inside the transaction with the rest of the processing. Otherwise, you add to deadlock risks, leading to burned ids.

  • IGNORE is used in case you are doing the INSERT from multiple processes simultaneously.

Once that INSERT is done, this will find all the foo_ids it needs:

INSERT INTO Fact (..., foo_id, ...)
        SELECT ..., Foos.foo_id, ...
            FROM Staging
            JOIN Foos ON Foos.foo = Staging.foo;

An advantage of "Batched Normalization" is that you can summarize directly from the Staging table. Two approaches:

Case 1: PRIMARY KEY (dy, foo) and summarization is in lock step with, say, changes in dy.

  • This approach can have troubles if new data arrives after you have summarized the day's data.

INSERT INTO Summary (dy, foo, ct, blah_total)
        SELECT  DATE(dt) AS dy, foo,
                COUNT(*) AS ct, SUM(blah) AS blah_total
            FROM Staging
            GROUP BY 1, 2;

Case 2: (dy, foo) is a non-UNIQUE INDEX.

  • Same code as Case 1.

  • By having the index be non-UNIQUE, delayed data simply shows up as extra rows.

  • You need to take care to avoid summarizing the data twice. (The id on the Fact table may be a good tool for that.)

Case 3: PRIMARY KEY (dy, foo) and summarization can happen anytime.

INSERT INTO Summary (dy, foo, ct, blah_total)
        SELECT  DATE(dt) AS dy, foo,
                COUNT(*) AS ct, SUM(blah) AS blah_total
            FROM Staging
            GROUP BY 1, 2
        ON DUPLICATE KEY UPDATE
            ct = ct + VALUE(ct),
            blah_total = blah_total + VALUE(blah_total);

Too many choices?

This document lists a number of ways to do things. Your situation may lead to one approach being more/less acceptable. But, if you are thinking "Just tell me what to do!", then here:

  • Batch load the raw data into a temporary table (Staging).

  • Normalize from Staging -- use code in Case 3.

  • INSERT .. SELECT to move the data from Staging into the Fact table

  • Summarize from Staging to Summary table(s) via IODKU (Insert ... On Duplicate Key Update).

  • Drop the Staging

Those techniques should perform well and scale well in most cases. As you develop your situation, you may discover why I described alternative solutions.
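A skeleton of that recipe, reusing the hypothetical Staging, Foos, Fact and Summary names from this article (the file name, columns and datatypes are placeholders to adapt to your schema):

-- 1. Batch load the raw data into Staging
LOAD DATA INFILE 'todays_batch.csv' INTO TABLE Staging;

-- 2. Normalize: add any new foo values to the dimension table
INSERT IGNORE INTO Foos (foo)
    SELECT DISTINCT Staging.foo
        FROM Staging
        LEFT JOIN Foos ON Foos.foo = Staging.foo
        WHERE Foos.foo_id IS NULL;

-- 3. Move the rows into the Fact table, swapping the VARCHAR for its id
INSERT INTO Fact (dt, foo_id, blah)
    SELECT Staging.dt, Foos.foo_id, Staging.blah
        FROM Staging
        JOIN Foos ON Foos.foo = Staging.foo;

-- 4. Summarize directly from Staging via IODKU (Case 3 above)
INSERT INTO Summary (dy, foo, ct, blah_total)
    SELECT DATE(dt) AS dy, foo,
           COUNT(*) AS ct, SUM(blah) AS blah_total
        FROM Staging
        GROUP BY 1, 2
    ON DUPLICATE KEY UPDATE
        ct = ct + VALUE(ct),
        blah_total = blah_total + VALUE(blah_total);

-- 5. Drop (or TRUNCATE) the Staging table, ready for the next batch
DROP TABLE Staging;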

Purging old data

Typically the Fact table is PARTITION BY RANGE (10-60 ranges of days/weeks/etc) and needs purging (DROP PARTITION) periodically. This discusses a safe/clean way to design the partitioning and do the DROPs: Purging PARTITIONs
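For example, assuming the Fact table is partitioned by RANGE on TO_DAYS(dt) (the partition names here are hypothetical), the periodic purge plus creation of the next partition could look like:

ALTER TABLE Fact DROP PARTITION p20230101;           -- throw away the oldest range
ALTER TABLE Fact REORGANIZE PARTITION pFuture INTO   -- carve the next range out of the catch-all partition
    ( PARTITION p20230214 VALUES LESS THAN (TO_DAYS('2023-02-15')),
      PARTITION pFuture   VALUES LESS THAN MAXVALUE );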

Master / slave

For "read scaling", backup, and failover, use master-slave replication or something fancier. Do ingestion only on a single active master; it replicate to the slave(s). Generate reports on the slave(s).

Sharding

"Sharding" is the splitting of data across multiple servers. (In contrast, replication and Galera have the same data on all servers, requiring all data to be written to all servers.)

With the non-sharding techniques described here, terabyte(s) of data can be handled by a single machine. Tens of terabytes probably requires sharding.

Sharding is beyond the scope of this document.

How fast? How big?

With the techniques described here, you may be able to achieve the following performance numbers. I say "may" because every data warehouse situation is different, and you may require performance-hurting deviations from what I describe here. I give multiple options for some aspects; these may cover some of your deviations.

One big performance killer is UUID/GUID keys. Since they are very 'random', updates of them (at scale) are limited to 1 row = 1 disk hit. Plain disks can handle only 100 hits/second. RAID and/or SSD can increase that to something like 1000 hits/sec. Huge amounts of RAM (for caching the random index) are a costly solution. It is possible to turn type-1 UUIDs into roughly-chronological keys, thereby mitigating the performance problems if the UUIDs are written/read with some chronological clustering. UUID discussion

Hardware, etc:

  • Single SATA drive: 100 IOPs (Input/Output operations per second)

  • RAID with N physical drives -- 100*N IOPs (roughly)

  • SSD -- 5 times as fast as rotating media (in this context)

  • Batch INSERT -- 100-1000 rows is 10 times as fast as INSERTing 1 row at a time (see above)

  • Purge "old" data -- Do not use DELETE or TRUNCATE, design so you can use DROP PARTITION (see above)

  • Think of each INDEX (except the PRIMARY KEY on InnoDB) as a separate table

  • Consider access patterns of each table/index: random vs at-the-end vs something in between

"Count the disk hits" -- back-of-envelope performance analysis

  • Random accesses to a table/index -- count each as a disk hit.

  • At-the-end accesses (INSERT chronologically or with AUTO_INCREMENT; range SELECT) -- count as zero hits.

  • In between (hot/popular ids, etc) -- count as something in between

  • For INSERTs, do the analysis on each index; add them up.

  • For SELECTs, do the analysis on the one index used, plus the table. (Use of 2 indexes is rare.)

Insert cost, based on datatype of first column in an index:

  • AUTO_INCREMENT -- essentially 0 IOPs

  • DATETIME, TIMESTAMP -- essentially 0 for 'current' times

  • UUID/GUID -- 1 per insert (terrible)

  • Others -- depends on their patterns

SELECT cost gets a little tricky:

  • Range on PRIMARY KEY -- think of it as getting 100 rows per disk hit.

  • IN on PRIMARY KEY -- 1 disk hit per item in IN

  • "=" -- 1 hit (for 1 row)

  • Secondary key -- First compute the hits for the index, then...

  • Think of each row as needing 1 disk hit.

  • However, if the rows are likely to be 'near' each other (based on the PRIMARY KEY), then it could be < 1 disk hit/row.

More on Count the Disk Hits
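A quick worked example: an INSERT into a Fact table whose PRIMARY KEY is AUTO_INCREMENT (essentially 0 IOPs) but which also has one random secondary index such as a UUID (about 1 IOP per row) costs roughly 1 disk hit per row; on a single SATA drive at about 100 IOPs, that caps ingestion near 100 rows per second, which is why the advice above is to avoid such indexes on the Fact table.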

How fast?

Look at your data; compute raw rows per second (or hour or day or year). There are about 30M seconds in a year; 86,400 seconds per day. Inserting 30 rows per second becomes a billion rows per year.

10 rows per second is about all you can expect from an ordinary machine (after allowing for various overheads). If you have less than that, you don't have many worries, but still you should probably create Summary tables. If more than 10/sec, then batching, etc, becomes vital. Even on spiffy hardware, 100/sec is about all you can expect without utilizing the techniques here.

Not so fast?

Let's say your insert rate is only one-tenth of your disk IOPs (eg, 10 rows/sec vs 100 IOPs). Also, let's say your data is not "bursty"; that is, the data comes in somewhat smoothly throughout the day.

Note that 10 rows/sec (300M/year) implies maybe 30GB for data + indexes + normalization tables + summary tables for 1 year. I would call this "not so big".

Still, the normalization and summarization are important. Normalization keeps the data from being, say, twice as big. Summarization speeds up the reports by orders of magnitude.

Let's design and analyse a "simple ingestion scheme" for 10 rows/second, without 'batching'.

# Normalize:
    $foo_id = SELECT foo_id FROM Foos WHERE foo = $foo;
    IF NO $foo_id, THEN
        INSERT IGNORE INTO Foos ...

    # Inserts:
    BEGIN;
        INSERT INTO Fact ...;
        INSERT INTO Summary ... ON DUPLICATE KEY UPDATE ...;
    COMMIT;
    # (plus code to deal with errors on INSERTs or COMMIT)

Depending on the number and randomness of your indexes, etc, 10 Fact rows may (or may not) take less than 100 IOPs.

Also, note that as the data grows over time, random indexes will become less and less likely to be cached. That is, even if it runs fine with 1 year's worth of data, it may be in trouble with 2 years' worth.

For those reasons, I started this discussion with a wide margin (10 rows versus 100 IOPs).

References

  • sec. 3.3.2: Dimensional Model and "Star schema"

  • Summary Tables

See also

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: datawarehouse

This page is licensed: CC BY-SA / Gnu FDL

DISTINCT removal in aggregate functions

Basics

One can use the DISTINCT keyword to de-duplicate the arguments of an aggregate function. For example:

SELECT COUNT(DISTINCT col1) FROM tbl1;

In order to compute this, MariaDB has to collect the values of col1 and remove the duplicates. This may be computationally expensive.

After the fix for MDEV-30660 (available from MariaDB 10.5.25, MariaDB 10.6.18, MariaDB 10.11.8, MariaDB 11.0.6, MariaDB 11.1.5, MariaDB 11.2.4, MariaDB 11.4.2), the optimizer can detect certain cases when argument of aggregate function will not have duplicates and so de-duplication can be skipped.

When one can skip de-duplication

A basic example: if we're doing a select from one table, then the values of primary_key are already distinct:

SELECT aggregate_func(DISTINCT tbl.primary_key, ...) FROM tbl;

If the SELECT has other constant tables, that's also ok, as they will not create duplicates.

The next step: a part of the primary key can be "bound" by the GROUP BY clause. Consider a query:

SELECT aggregate_func(DISTINCT t1.pk1, ...) FROM t1 GROUP BY t1.pk2;

Suppose the table has PRIMARY KEY(pk1, pk2). Grouping by pk2 fixes the value of pk2 within each group. Then, the values of pk1 must be unique within each group, and de-duplication is not necessary.
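A self-contained illustration of that case (table and column names are placeholders):

CREATE TABLE t1 (
    pk1 INT NOT NULL,
    pk2 INT NOT NULL,
    PRIMARY KEY (pk1, pk2)
);

-- Within each pk2 group the pk1 values cannot repeat,
-- so the optimizer may use the "simple" aggregator instead of "distinct":
SELECT COUNT(DISTINCT pk1) FROM t1 GROUP BY pk2;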

Observability

EXPLAIN or EXPLAIN FORMAT=JSON do not show any details about how aggregate functions are computed. One has to look at the Optimizer Trace. Search for aggregator_type:

When de-duplication is necessary, it will show:

{
            "prepare_sum_aggregators": {
              "function": "count(distinct t1.col1)",
              "aggregator_type": "distinct"
            }
          }

When de-duplication is not necessary, it will show:

{
            "prepare_sum_aggregators": {
              "function": "count(distinct t1.pk1)",
              "aggregator_type": "simple"
            }
          }

This page is licensed: CC BY-SA / Gnu FDL

Equality propagation optimization

Basic idea

Consider a query with a WHERE clause:

WHERE col1=col2 AND ...

the WHERE clause will compute to true only if col1=col2. This means that in the rest of the WHERE clause occurrences of col1 can be substituted with col2 (with some limitations which are discussed in the next section). This allows the optimizer to infer additional restrictions.

For example:

WHERE col1=col2 AND col1=123

allows the optimizer to infer a new equality: col2=123

WHERE col1=col2 AND col1 < 10

allows it to infer that col2<10.

Identity and comparison substitution

There are some limitations to where one can do the substitution, though.

The first and obvious example is the string datatype and collations. Most commonly-used collations in SQL are "case-insensitive", that is 'A'='a'. Also, most collations have a "PAD SPACE" attribute, which means that comparison ignores the spaces at the end of the value, 'a'='a '.

Now, consider a query:

INSERT INTO t1 (col1, col2) VALUES ('ab', 'ab   ');
SELECT * FROM t1 WHERE col1=col2 AND LENGTH(col1)=2

Here, col1=col2, the values are "equal". At the same time LENGTH(col1)=2, while LENGTH(col2)=4, which means one can't perform the substitution for the argument of LENGTH(...).

It's not only collations. There are similar phenomena when equality compares columns of different datatypes. The exact criteria for when they happen are rather convoluted.

The take-away is: sometimes, X=Y does not mean that one can replace any reference to X with Y. What one CAN do is still replace the occurrence in the comparisons <, >, >=, <=, etc.

This is how we get two kinds of substitution:

  • Identity substitution: X=Y, and any occurrence of X can be replaced with Y.

  • Comparison substitution: X=Y, and an occurrence of X in a comparison (X<Z) can be replaced with Y (Y<Z).

Place in query optimization

(A draft description): Let's look at how Equality Propagation is integrated with the rest of the query optimization process.

  • First, multiple-equalities are built (TODO example from optimizer trace)

    • If multiple-equality includes a constant, fields are substituted with a constant if possible.

  • From this point, all optimizations like range optimization, ref access, etc make use of multiple equalities: when they see a reference to tableX.columnY somewhere, they also look at all the columns that tableX.columnY is equal to.

  • After the join order is picked, the optimizer walks through the WHERE clause and substitutes each field reference with the "best" one - the one that can be checked as soon as possible.

    • Then, the parts of the WHERE condition are attached to the tables where they can be checked.

Interplay with ORDER BY optimization

Consider a query:

SELECT ... FROM ... WHERE col1=col2 ORDER BY col2

Suppose there is an INDEX(col1). The MariaDB optimizer is able to figure out that it can use an index on col1 (or sort by the value of col1) in order to resolve ORDER BY col2.

Optimizer trace

Look at these elements:

  • condition_processing

  • attaching_conditions_to_tables
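If you have not captured an optimizer trace before, a minimal recipe (t1, col1 and col2 are placeholders; this assumes a MariaDB version where the optimizer trace is available) is:

SET optimizer_trace='enabled=on';
SELECT * FROM t1 WHERE col1=col2 AND col1=123;   -- the statement to analyse
SELECT * FROM information_schema.OPTIMIZER_TRACE\G
SET optimizer_trace='enabled=off';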

More details

Equality propagation doesn't just happen at the top of the WHERE clause. It is done "at all levels" where a level is:

  • A top level of the WHERE clause.

  • If the WHERE clause has an OR clause, each branch of the OR clause.

  • The top level of any ON expression

  • (the same as above about OR-levels)

This page is licensed: CC BY-SA / Gnu FDL

Filesort with Small LIMIT Optimization

Optimization Description

When a query has an ORDER BY ... LIMIT n clause and n is sufficiently small, the optimizer will use a priority queue for sorting. Before the optimization's porting to MariaDB 10.0, the alternative was, roughly speaking, to sort the entire output and then pick only the first n rows.

NOTE: The problem of choosing which index to use for query with ORDER BY ... LIMIT is a different problem, see optimizer_join_limit_pref_ratio-optimization.

Optimization Visibility in MariaDB

There are two ways to check whether filesort has used a priority queue.

Status Variable

The first way is to check the Sort_priority_queue_sorts status variable. It shows the number of times that sorting was done through a priority queue. (The total number of times sorting was done is the sum of Sort_range and Sort_scan.)
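For example, one can compare the counter before and after running a query (here, the same query as in the slow log example below):

SHOW SESSION STATUS LIKE 'Sort_priority_queue_sorts';
SELECT * FROM t1 WHERE col1 BETWEEN 10 AND 20 ORDER BY col2 LIMIT 100;
SHOW SESSION STATUS LIKE 'Sort_priority_queue_sorts';

If the counter increased, the sort was done through a priority queue.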

Slow Query Log

The second way is to check the slow query log. When one uses Extended statistics in the slow query log and specifies log_slow_verbosity=query_plan, slow query log entries look like this

# Time: 140714 18:30:39
# User@Host: root[root] @ localhost []
# Thread_id: 3  Schema: test  QC_hit: No
# Query_time: 0.053857  Lock_time: 0.000188  Rows_sent: 11  Rows_examined: 100011
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: Yes
SET timestamp=1405348239;
select * from t1 where col1 between 10 and 20 order by col2 limit 100;

Note the "Priority_queue: Yes" on the last comment line. (pt-query-digest is able to parse slow query logs with the Priority_queue field)

As for EXPLAIN, it will give no indication whether filesort uses priority queue or the generic quicksort and merge algorithm. Using filesort will be shown in both cases, by both MariaDB and MySQL.

See Also

  • LIMIT Optimization page in the MySQL 5.6 manual (search for "priority queue").

  • MySQL WorkLog entry, WL#1393

  • MDEV-415, MDEV-6430

This page is licensed: CC BY-SA / Gnu FDL

FORCE INDEX

Description

Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).

FORCE INDEX works by only considering the given indexes (like with USE INDEX), but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.

FORCE INDEX cannot force an ignored index to be used - it will be treated as if it doesn't exist.

Example

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";

This produces:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	range	Name	Name	35	NULL	4079	Using where

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

See Also

  • Index Hints: How to Force Query Plans for more details

  • USE INDEX

  • IGNORE INDEX

  • Ignored Indexes

This page is licensed: CC BY-SA / Gnu FDL

Groupwise Max in MariaDB

The problem

You want to find the largest row in each group of rows. An example is looking for the largest city in each state. While it is easy to find the MAX(population) ... GROUP BY state, it is hard to find the name of the city associated with that population. Alas, MySQL and MariaDB do not have any syntax to provide the solution directly.

This article is under construction, mostly for cleanup. The content is reasonably accurate during construction.

The article presents two "good" solutions. They differ in ways that make neither of them 'perfect'; you should try both and weigh the pros and cons.

Also, a few "bad" solutions will be presented, together with why they were rejected.

The MySQL manual gives 3 solutions; only the "Uncorrelated" one is "good", the other two are "bad".

Sample data

To show how the various coding attempts work, I have devised this simple task: Find the largest city in each Canadian province. Here's a sample of the source data (5493 rows):

+------------------+----------------+------------+
| province         | city           | population |
+------------------+----------------+------------+
| Saskatchewan     | Rosetown       |       2309 |
| British Columbia | Chilliwack     |      51942 |
| Nova Scotia      | Yarmouth       |       7500 |
| Alberta          | Grande Prairie |      41463 |
| Quebec           | Sorel          |      33591 |
| Ontario          | Moose Factory  |       2060 |
| Ontario          | Bracebridge    |       8238 |
| British Columbia | Nanaimo        |      84906 |
| Manitoba         | Neepawa        |       3151 |
| Alberta          | Grimshaw       |       2560 |
| Saskatchewan     | Carnduff       |        950 |
...

Here's the desired output (13 rows):

+---------------------------+---------------+------------+
| province                  | city          | population |
+---------------------------+---------------+------------+
| Alberta                   | Calgary       |     968475 |
| British Columbia          | Vancouver     |    1837970 |
| Manitoba                  | Winnipeg      |     632069 |
| New Brunswick             | Saint John    |      87857 |
| Newfoundland and Labrador | Corner Brook  |      18693 |
| Northwest Territories     | Yellowknife   |      15866 |
| Nova Scotia               | Halifax       |     266012 |
| Nunavut                   | Iqaluit       |       6124 |
| Ontario                   | Toronto       |    4612187 |
| Prince Edward Island      | Charlottetown |      42403 |
| Quebec                    | Montreal      |    3268513 |
| Saskatchewan              | Saskatoon     |     198957 |
| Yukon                     | Whitehorse    |      19616 |
+---------------------------+---------------+------------+

Duplicate max

One thing to consider is whether you want -- or do not want -- to see multiple rows for tied winners. For the dataset being used here, that would imply that the two largest cities in a province had identical populations. For this case, a duplicate would be unlikely. But there are many groupwise-max use cases where duplicates are likely.

The two best algorithms differ in whether they show duplicates.

Using an uncorrelated subquery

Characteristics:

  • Superior performance or medium performance

  • It will show duplicates

  • Needs an extra index

  • Probably requires 5.6

  • If all goes well, it will run in O(M) where M is the number of output rows.

An 'uncorrelated subquery':

SELECT  c1.province, c1.city, c1.population
    FROM  Canada AS c1
    JOIN
      ( SELECT  province, MAX(population) AS population
            FROM  Canada
            GROUP BY  province
      ) AS c2 USING (province, population)
    ORDER BY c1.province;

But this also 'requires' an extra index: INDEX(province, population). In addition, MySQL has not always been able to use that index effectively, hence the "requires 5.6". (I am not sure of the actual version.)

Without that extra index, you would need 5.6, which has the ability to create indexes for subqueries. This is indicated by <auto_key0> in the EXPLAIN. Even so, the performance is worse with the auto-generated index than with the manually generated one.

With neither the extra index, nor 5.6, this 'solution' would belong in 'The Duds' because it would run in O(N*N) time.

Using @variables

Characteristics:

  • Good performance

  • Does not show duplicates (picks one to show)

  • Consistent O(N) run time (N = number of input rows)

  • Only one scan of the data

SELECT
        province, city, population   -- The desired columns
    FROM
      ( SELECT  @prev := '' ) init
    JOIN
      ( SELECT  province != @prev AS first,  -- `province` is the 'GROUP BY'
                @prev := province,           -- The 'GROUP BY'
                province, city, population   -- Also the desired columns
            FROM  Canada           -- The table
            ORDER BY
                province,          -- The 'GROUP BY'
                population DESC    -- ASC for MIN(population), DESC for MAX
      ) x
    WHERE  first
    ORDER BY  province;     -- Whatever you like

For your application, change the lines with comments.

The duds

'Correlated subquery' (from MySQL doc):

SELECT  province, city, population
    FROM  Canada AS c1
    WHERE  population =
      ( SELECT  MAX(c2.population)
            FROM  Canada AS c2
            WHERE  c2.province= c1.province
      )
    ORDER BY  province;

O(N*N) (that is, terrible) performance

LEFT JOIN (from MySQL doc):

SELECT  c1.province, c1.city, c1.population
    FROM  Canada AS c1
    LEFT JOIN  Canada AS c2 ON c2.province = c1.province
      AND  c2.population > c1.population
    WHERE  c2.province IS NULL
    ORDER BY province;

Medium performance (2N-3N, depending on join_buffer_size).

To put O(N*N) time in perspective: it will take about one second to do a groupwise-max on a few thousand rows; a million rows could take hours.

Top-N in each group

This is a variant on "groupwise-max" wherein you desire the largest (or smallest) N items in each group. Do these substitutions for your use case:

  • province --> your 'GROUP BY'

  • Canada --> your table

  • 3 --> how many of each group to show

  • population --> your numeric field for determining "Top-N"

  • city --> more field(s) you want to show

  • Change the SELECT and ORDER BY if you desire

  • DESC to get the 'largest'; ASC for the 'smallest'

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province   ASC,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

Output:

+---------------------------+------+------------------+------------+
| province                  | n    | city             | population |
+---------------------------+------+------------------+------------+
| Alberta                   |    1 | Calgary          |     968475 |
| Alberta                   |    2 | Edmonton         |     822319 |
| Alberta                   |    3 | Red Deer         |      73595 |
| British Columbia          |    1 | Vancouver        |    1837970 |
| British Columbia          |    2 | Victoria         |     289625 |
| British Columbia          |    3 | Abbotsford       |     151685 |
| Manitoba                  |    1 | ...

The performance of this is O(N), actually about 3N, where N is the number of source rows.

EXPLAIN EXTENDED gives

+----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table      | type   | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using filesort |
|  1 | PRIMARY     | <derived3> | ALL    | NULL          | NULL | NULL    | NULL | 5484 |   100.00 | Using where    |
|  3 | DERIVED     | Canada     | ALL    | NULL          | NULL | NULL    | NULL | 5484 |   100.00 | Using filesort |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
+----+-------------+------------+--------+---------------+------+---------+------+------+----------+----------------+

Explanation, shown in the same order as the EXPLAIN, but numbered chronologically:

  3. Get the subquery id=2 (init)

  4. Scan the output from subquery id=3 (x)

  2. Subquery id=3 -- the table scan of Canada

  1. Subquery id=2 -- init for simply initializing the two @variables

Yes, it took two sorts, though probably in RAM.

Main Handler values:

| Handler_read_rnd           | 39    |
| Handler_read_rnd_next      | 10971 |
| Handler_write              | 5485  |  -- #rows in Canada (+1)

Top-n in each group, take II

This variant is faster than the previous, but depends on city being unique across the dataset. (from openark.org)

SELECT  province, city, population
        FROM  Canada
        JOIN
          ( SELECT  GROUP_CONCAT(top_in_province) AS top_cities
                FROM
                  ( SELECT  SUBSTRING_INDEX(
                                   GROUP_CONCAT(city ORDER BY  population DESC),
                            ',', 3) AS top_in_province
                        FROM  Canada
                        GROUP BY  province
                  ) AS x
          ) AS y
        WHERE  FIND_IN_SET(city, top_cities)
        ORDER BY  province, population DESC;

Output. Note how there can be more than 3 cities per province:

| Alberta                   | Calgary          |     968475 |
| Alberta                   | Edmonton         |     822319 |
| Alberta                   | Red Deer         |      73595 |
| British Columbia          | Vancouver        |    1837970 |
| British Columbia          | Victoria         |     289625 |
| British Columbia          | Abbotsford       |     151685 |
| British Columbia          | Sydney           |          0 | -- Nova Scotia's second largest is Sydney
| Manitoba                  | Winnipeg         |     632069 |

Main Handler values:

| Handler_read_next          | 5484  | -- table size
| Handler_read_rnd_next      | 5500  | -- table size + number of provinces
| Handler_write              | 14    | -- number of provinces (+1)

Top-n using MyISAM

(This does not need your table to be MyISAM, but it does need a MyISAM tmp table, because MyISAM allows AUTO_INCREMENT to be the second column of a two-column PRIMARY KEY, so the numbering restarts for each province.) See the previous section for what changes to make for your use case.

-- build tmp table to get numbering
    -- (Assumes auto_increment_increment = 1)
    CREATE TEMPORARY TABLE t (
        nth MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
        PRIMARY KEY(province, nth)
    ) ENGINE=MyISAM
        SELECT province, NULL AS nth, city, population
            FROM Canada
            ORDER BY population DESC;
    -- Output the biggest 3 cities in each province:
    SELECT province, nth, city, population
        FROM t
        WHERE nth <= 3
        ORDER BY province, nth;

+---------------------------+-----+------------------+------------+
| province                  | nth | city             | population |
+---------------------------+-----+------------------+------------+
| Alberta                   |   1 | Calgary          |     968475 |
| Alberta                   |   2 | Edmonton         |     822319 |
| Alberta                   |   3 | Red Deer         |      73595 |
| British Columbia          |   1 | Vancouver        |    1837970 |
| British Columbia          |   2 | Victoria         |     289625 |
| British Columbia          |   3 | Abbotsford       |     151685 |
| Manitoba                  |  ...

SELECT FOR CREATE:
+----+-------------+--------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | Canada | ALL  | NULL          | NULL | NULL    | NULL | 5484 | Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------+
Other SELECT:
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | t     | index | NULL          | PRIMARY | 104     | NULL |   22 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

The main handler values (total of all operations):

| Handler_read_rnd_next      | 10970 |
| Handler_write              | 5484  |  -- number of rows in Canada (write tmp table)

Both "Top-n" formulations probably take about the same amount of time.

Windowing functions

Hot off the press from Percona Live... MariaDB 10.2 has "windowing functions", which make "groupwise max" much more straightforward.

The code:

TBD
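A minimal sketch of the window-function approach (assuming MariaDB 10.2 or later and the same Canada table) could be:

SELECT province, city, population
    FROM
      ( SELECT province, city, population,
               ROW_NUMBER() OVER (PARTITION BY province
                                  ORDER BY population DESC) AS n
            FROM Canada
      ) x
    WHERE n = 1
    ORDER BY province;

Change n = 1 to n <= 3 for the Top-N variant. Like the @variables method, ROW_NUMBER() picks a single winner per group; RANK() would instead show ties as duplicates.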

Postlog

Developed and first posted: Feb, 2015; Added MyISAM approach: July, 2015; Openark's method added: Apr, 2016; Windowing: Apr, 2016

I did not include the technique(s) using GROUP_CONCAT. They are useful in some situations with small datasets. They can be found in the references below.

See also

  • This has some of these algorithms, plus some others: Peter Brawley's blog

  • Jan Kneschke's blog from 2007

  • StackOverflow discussion of 'Uncorrelated'

  • Other references: Inner ORDER BY thrown away

  • Adding a large LIMIT to a subquery may make things work. Why ORDER BY in subquery is ignored

  • StackOverflow thread

  • row_number(), rank(), dense_rank()

  • Percentile blog: http://rpbouman.blogspot.de/2008/07/calculating-nth-percentile-in-mysql.html

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: groupwise_max

This page is licensed: CC BY-SA / Gnu FDL

GUID/UUID Performance

The problem

GUIDs/UUIDs (Globally/Universally Unique Identifiers) are very random. Therefore, INSERTing into an index means jumping around a lot. Once the index is too big to be cached, most INSERTs involve a disk hit. Even on a beefy system, this limits you to a few hundred INSERTs per second.

See also: MariaDB's UUID function.

The problem this article addresses is mostly eliminated in MySQL 8.0 with the advent of the function UUID_TO_BIN(str, swap_flag).

Why it is a problem

A 'standard' GUID/UUID is composed of the time, machine identification and some other stuff. The combination should be unique, even without coordination between different computers that could be generating UUIDs simultaneously.

The top part of the GUID/UUID is the bottom part of the current time. The top part is the primary part of what would be used for placing the value in an ordered list (INDEX). This cycles in about 7.16 minutes.

Some math... If the index is small enough to be cached in RAM, each insert into the index is CPU only, with the writes being delayed and batched. If the index is 20 times as big as can be cached, then 19 out of 20 inserts will be a cache miss. (This math applies to any "random" index.)

Second problem

36 characters is bulky. If you are using that as a PRIMARY KEY in InnoDB and you have secondary keys, remember that each secondary key has an implicit copy of the PK, thereby making it bulky.

It is tempting to declare the UUID VARCHAR(36). And, since you are probably thinking globally, you have CHARACTER SET utf8 (or utf8mb4). For utf8:

  • 2 - Overhead for VAR

  • 36 - chars

  • 3 (or 4) bytes per character for utf8 (or utf8mb4)

So, max length = 2+3*36 = 110 (or 2+4*36 = 146) bytes. For temp tables, 108 (or 144) bytes are actually used if a MEMORY table is used.

To compress

  • utf8 is unnecessary (ascii would do); but this is obviated by the next two steps

  • Toss dashes

  • UNHEX

Now it will fit in 16 bytes: BINARY(16).
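A hypothetical table using this compact form (matching the t used in the examples below) might be declared as:

CREATE TABLE t (
    uuid BINARY(16) NOT NULL,
    -- ... the rest of the columns ...
    PRIMARY KEY (uuid)
) ENGINE = InnoDB;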

Combining the problems and crafting a solution

But first, a caveat. This solution only works for "Time based" / "Version 1" UUIDs. They are recognizable by the "1" at the beginning of the third clump.

The manual's sample: 6ccd780c-baba-1026-9564-0040f4311e29 . A more current value (after a few years): 49ea2de3-17a2-11e2-8346-001eecac3efa . Notice how the 3rd part has slowly changed over time? Let's rearrange the data, thus:

1026-baba-6ccd780c-9564-0040f4311e29
      11e2-17a2-49ea2de3-8346-001eecac3efa
      11e2-17ac-106762a5-8346-001eecac3efa -- after a few more minutes

Now we have a number that increases nicely over time. Multiple sources won't be quite in time order, but they will be close. The "hot" spot for inserting into an INDEX(uuid) will be rather narrow, thereby making it quite cacheable and efficient.

If your SELECTs tend to be for "recent" uuids, then they, too, will be easily cached. If, on the other hand, your SELECTs often reach for old uuids, they will be random and not well cached. Still, improving the INSERTs will help the system overall.

Code to do it

Let's make Stored Functions to do the messy work of the two actions:

  • Rearrange fields

  • Convert to/from BINARY(16)

DELIMITER //

    CREATE FUNCTION UuidToBin(_uuid BINARY(36))
        RETURNS BINARY(16)
        LANGUAGE SQL  DETERMINISTIC  CONTAINS SQL  SQL SECURITY INVOKER
    RETURN
        UNHEX(CONCAT(
            SUBSTR(_uuid, 15, 4),
            SUBSTR(_uuid, 10, 4),
            SUBSTR(_uuid,  1, 8),
            SUBSTR(_uuid, 20, 4),
            SUBSTR(_uuid, 25) ));
    //
    CREATE FUNCTION UuidFromBin(_bin BINARY(16))
        RETURNS BINARY(36)
        LANGUAGE SQL  DETERMINISTIC  CONTAINS SQL  SQL SECURITY INVOKER
    RETURN
        LCASE(CONCAT_WS('-',
            HEX(SUBSTR(_bin,  5, 4)),
            HEX(SUBSTR(_bin,  3, 2)),
            HEX(SUBSTR(_bin,  1, 2)),
            HEX(SUBSTR(_bin,  9, 2)),
            HEX(SUBSTR(_bin, 11))
                 ));

    //
    DELIMITER ;

Then you would do things like

-- Letting MySQL create the UUID:
    INSERT INTO t (uuid, ...) VALUES (UuidToBin(UUID()), ...);

    -- Creating the UUID elsewhere:
    INSERT INTO t (uuid, ...) VALUES (UuidToBin(?), ...);

    -- Retrieving (point query using uuid):
    SELECT ... FROM t WHERE uuid = UuidToBin(?);

    -- Retrieving (other):
    SELECT UuidFromBin(uuid), ... FROM t ...;

Do not flip the WHERE; this will be inefficient because it won't use INDEX(uuid):

WHERE UuidFromBin(uuid) = '1026-baba-6ccd780c-9564-0040f4311e29' -- NO

TokuDB

TokuDB has been deprecated by its upstream maintainer. It is disabled from MariaDB 10.5 and has been removed in MariaDB 10.6 - MDEV-19780. We recommend MyRocks as a long-term migration path.

TokuDB is a viable engine if you must have UUIDs (even non-type-1) in a huge table. TokuDB is available in MariaDB as a 'standard' engine, making the barrier to entry very low. There are a small number of differences between InnoDB and TokuDB; I will not go into them here.

Tokudb, with its “fractal” indexing strategy, builds the indexes in stages. In contrast, InnoDB inserts index entries “immediately” — actually that indexing is buffered by most of the size of the buffer_pool. To elaborate…

When adding a record to an InnoDB table, here are (roughly) the steps performed to write the data (and PK) and secondary indexes to disk. (I leave out logging, provision for rollback, etc.) First the PRIMARY KEY and data:

  • Check for UNIQUEness constraints

  • Fetch the BTree block (normally 16KB) that should contain the row (based on the PRIMARY KEY).

  • Insert the row (overflow typically occurs 1% of the time; this leads to a block split).

  • Leave the page “dirty” in the buffer_pool, hoping that more rows are added before it is bumped out of cache. Note that for AUTO_INCREMENT and TIMESTAMP-based PKs, the “last” block in the data will be updated repeatedly before splitting; hence, this delayed write adds greatly to the efficiency. OTOH, a UUID will be very random; when the table is big enough, the block will almost always be flushed before a second insert occurs in that block. This is the inefficiency in UUIDs.

Now for any secondary keys:

  • All the steps are the same, since an index is essentially a "table" except that the "data" is a copy of the PRIMARY KEY.

  • UNIQUEness must be checked immediately — cannot delay the read.

  • There are (I think) some other "delays" that avoid some I/O.

Tokudb, on the other hand, does something like

  • Write data/index partially sorted records to disk before finding out exactly where it belongs.

  • In the background, combine these partially digested blocks. Repeat as needed.

  • Eventually move the info into the real table/indexes.

If you are familiar with how sort-merge works, consider the parallels to Tokudb. Each "sort" does some work of ordering things; each "merge" is quite efficient.

To summarize:

  • In the extreme (data/index much larger than buffer_pool), InnoDB must read-modify-write one 16KB disk block for each UUID entry.

  • Tokudb makes each I/O "count" by merging several UUIDs for each disk block. (Yeah, Toku rereads blocks, but it comes out ahead in the long run.)

  • Tokudb excels when the table is really big, which implies high ingestion rate.

Wrapup

This shows three things for speeding up usage of GUIDs/UUIDs:

  • Shrink footprint (Smaller -> more cacheable -> faster).

  • Rearrange uuid to make a "hot spot" to improve cachability.

  • Use TokuDB (MyRocks shares some architectural traits which may also be beneficial in handling UUIDs, but this is hypothetical and hasn't been tested)

Note that the benefit of the "hot spot" is only partial:

  • Chronologically ordered (or approximately ordered) INSERTs benefit; random ones don't.

  • SELECTs/UPDATEs by "recent" uuids benefit; old ones don't benefit.

Postlog

Thanks to Trey for some of the ideas here.

The tips in this document apply to MySQL, MariaDB, and Percona.

Written Oct, 2012. Added TokuDB, Jan, 2015.

See Also

  • UUID data type

  • Detailed discussion of UUID indexing

  • Graphical display of the random nature of UUID on PRIMARY KEY

  • Benchmarks, etc, by Karthik Appigatla

  • More details on the clock

  • Percona benchmarks

  • NHibernate can generate sequential GUIDs , but it seems to be backwards.

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: uuid

This page is licensed: CC BY-SA / Gnu FDL

hash_join_cardinality optimizer_switch Flag

MariaDB starting with 10.6.13

The hash_join_cardinality optimizer_switch flag was added in MariaDB 11.0.2, MariaDB 10.11.3, MariaDB 10.10.4, MariaDB 10.9.6, MariaDB 10.8.8 and MariaDB 10.6.13.

In MySQL and MariaDB, the output cardinality of a part of a query has historically been tied to the used access method(s). This is different from the approach used in database textbooks. There, the cardinality of "x JOIN y" is the same regardless of which access methods are used to compute it.

Example

Consider a query joining customers with their orders:

SELECT * 
FROM
  customer, orders, ...
WHERE 
  customer.id = orders.customer_id AND ...

Suppose, table orders has an index IDX on orders.customer_id.

If the query plan is using this index to fetch orders for each customer, the optimizer will use index statistics from IDX to estimate the number of rows in the customer-joined-with-orders.

On the other hand, if the optimizer considers a query plan that joins customer with orders without use of indexes, it will ignore the customer.id = orders.customer_id equality completely and will compute the output cardinality as if customer was cross-joined with orders.

Hash Join

MariaDB supports Block Hash Join. It is not enabled by default; one needs to set join_cache_level to 3 or greater to enable it.

Before MDEV-30812, query optimization for Block Hash Join worked as described in the above example: it assumed that the join operation is a cross join.

MDEV-30812 introduces a new optimizer_switch flag, hash_join_cardinality. In MariaDB versions before 11.0, it is off by default.

If one sets it to ON, the optimizer will make use of column histograms when computing the cardinality of hash join operation output.
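For example, to experiment with it in a version where it is off by default:

SET join_cache_level = 3;    -- allow Block Hash Join (see above)
SET optimizer_switch = 'hash_join_cardinality=on';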

One can see the computation in the Optimizer Trace, search for hash_join_cardinality.

This page is licensed: CC BY-SA / Gnu FDL

How to Quickly Insert Data Into MariaDB

This article describes different techniques for inserting data quickly into MariaDB.

Background

When inserting new data into MariaDB, the things that take time are (in order of importance):

  • Syncing data to disk (as part of the end of transactions)

  • Adding new keys. The larger the index, the more time it takes to keep keys updated.

  • Checking against foreign keys (if they exist).

  • Adding rows to the storage engine.

  • Sending data to the server.

The following describes the different techniques (again, in order of importance) you can use to quickly insert data into a table.

Disabling Keys

You can temporarily disable updating of non unique indexes. This is mostly useful when there are zero (or very few) rows in the table into which you are inserting data.

ALTER TABLE table_name DISABLE KEYS;
BEGIN;
... inserting data with INSERT or LOAD DATA ...
COMMIT;
ALTER TABLE table_name ENABLE KEYS;

In many storage engines (at least MyISAM and Aria), ENABLE KEYS works by scanning through the row data and collecting keys, sorting them, and then creating the index blocks. This is an order of magnitude faster than creating the index one row at a time and it also uses less key buffer memory.

Note: When you insert into an empty table with INSERT or LOAD DATA, MariaDB automatically does a DISABLE KEYS before and an ENABLE KEYS afterwards.

When inserting big amounts of data, integrity checks are considerably time-consuming. It is possible to disable the UNIQUE indexes and the foreign key checks using the unique_checks and the foreign_key_checks system variables:

SET @@session.unique_checks = 0;
SET @@session.foreign_key_checks = 0;

For InnoDB tables, the AUTO_INCREMENT lock mode can be temporarily set to 2, which is the fastest setting:

SET @@global.innodb_autoinc_lock_mode = 2;

Also, if the table has INSERT triggers or PERSISTENT columns, you may want to drop them, insert all data, and recreate them.

Loading Text Files

The fastest way to insert data into MariaDB is through the LOAD DATA INFILE command.

The simplest form of the command is:

LOAD DATA INFILE 'file_name' INTO TABLE table_name;

You can also read a file locally on the machine where the client is running by using:

LOAD DATA LOCAL INFILE 'file_name' INTO TABLE table_name;

This is not as fast as reading the file on the server side, but the difference is not that big.

LOAD DATA INFILE is very fast because:

  1. there is no parsing of SQL.

  2. data is read in big blocks.

  3. if the table is empty at the beginning of the operation, all non unique indexes are disabled during the operation.

  4. the engine is told to cache rows first and then insert them in big blocks (At least MyISAM and Aria support this).

  5. for empty tables, some transactional engines (like Aria) do not log the inserted data in the transaction log because one can rollback the operation by just doing a TRUNCATE on the table.

Because of the above speed advantages there are many cases, when you need to insert many rows at a time, where it may be faster to create a file locally, add the rows there, and then use LOAD DATA INFILE to load them; compared to using INSERT to insert the rows.

You will also get progress reporting for LOAD DATA INFILE.

mariadb-import

You can import many files in parallel with mariadb-import (mysqlimport before MariaDB 10.5). For example:

mariadb-import --use-threads=10 database text-file-name [text-file-name...]

Internally mariadb-import uses LOAD DATA INFILE to read in the data.

Inserting Data with INSERT Statements

Using Big Transactions

When doing many inserts in a row, you should wrap them with BEGIN / COMMIT to avoid doing a full transaction (which includes a disk sync) for every row. For example, doing a begin/commit every 1000 inserts will speed up your inserts by almost 1000 times.

BEGIN;
INSERT ...
INSERT ...
COMMIT;
BEGIN;
INSERT ...
INSERT ...
COMMIT;
...

The reason why you may want to have many BEGIN/COMMIT blocks instead of just one is that the former will use up less transaction log space.

Multi-Value Inserts

You can insert many rows at once with multi-value row inserts:

INSERT INTO table_name VALUES(1,"row 1"),(2, "row 2"),...;

The limit for how much data you can have in one statement is controlled by the max_allowed_packet server variable.

Inserting Data Into Several Tables at Once

If you need to insert data into several tables at once, the best way to do so is to enable multi-row statements and send many inserts to the server at once:

INSERT INTO table_name_1 (auto_increment_key, data) VALUES (NULL,"row 1");
INSERT INTO table_name_2 (auto_increment, reference, data) VALUES (NULL, LAST_INSERT_ID(), "row 2");

LAST_INSERT_ID() is a function that returns the last auto_increment value inserted.

By default, the command line mariadb client will send the above as multiple statements.

To test this in the mariadb client you have to do:

delimiter ;;
SELECT 1; SELECT 2;;
delimiter ;

Note: for multi-query statements to work, your client must specify the CLIENT_MULTI_STATEMENTS flag to mysql_real_connect().

Server Variables That Can be Used to Tune Insert Speed

Option	Description
innodb_buffer_pool_size	Increase this if you have many indexes in InnoDB/XtraDB tables
key_buffer_size	Increase this if you have many indexes in MyISAM tables
max_allowed_packet	Increase this to allow bigger multi-insert statements
read_buffer_size	Read block size when reading a file with LOAD DATA

See Server System Variables for the full list of server variables.

This page is licensed: CC BY-SA / Gnu FDL

IGNORE INDEX

Syntax

IGNORE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])

Description

You can tell the optimizer to not consider a particular index with the IGNORE INDEX option.

The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.

Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

Example

This is used after the table name in the FROM clause:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";

This produces:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	ref	CountryCode	CountryCode	3	const	14	Using where

See Also

  • See Index Hints: How to Force Query Plans for more details

  • USE INDEX

  • FORCE INDEX

  • Ignored Indexes

This page is licensed: CC BY-SA / Gnu FDL

Index Condition Pushdown

Index Condition Pushdown is an optimization that is applied for access methods that access table data through indexes: range, ref, eq_ref, ref_or_null, and Batched Key Access.

The idea is to check part of the WHERE condition that refers to index fields (we call it Pushed Index Condition) as soon as we've accessed the index. If the Pushed Index Condition is not satisfied, we won't need to read the whole table record.

Index Condition Pushdown is on by default. To disable it, set its optimizer_switch flag like so:

SET optimizer_switch='index_condition_pushdown=off'

When Index Condition Pushdown is used, EXPLAIN will show "Using index condition":

MariaDB [test]> EXPLAIN SELECT * FROM tbl WHERE key_col1 BETWEEN 10 AND 11 AND key_col2 LIKE '%foo%';
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+
|  1 | SIMPLE      | tbl   | range | key_col1      | key_col1 | 5       | NULL |    2 | Using index condition |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------+

The Idea Behind Index Condition Pushdown

In disk-based storage engines, making an index lookup is done in two steps, like shown on the picture:

index-access-2phases

Index Condition Pushdown optimization tries to cut down the number of full record reads by checking whether index records satisfy part of the WHERE condition that can be checked for them:

index-access-with-icp

How much speed will be gained depends on

  • How many records will be filtered out

  • How expensive it was to read them

The former depends on the query and the dataset. The latter is generally bigger when table records are on disk and/or are big, especially when they have blobs.

Example Speedup

I used DBT-3 benchmark data, with scale factor=1. Since the benchmark defines very few indexes, we've added a multi-column index (index condition pushdown is usually useful with multi-column indexes: the first component(s) is what index access is done for; the subsequent ones hold columns that we read and check conditions on).

ALTER TABLE lineitem ADD INDEX s_r (l_shipdate, l_receiptdate);

The query was to find big (l_quantity > 40) orders that were made in January 1993 that took more than 25 days to ship:

SELECT count(*) FROM lineitem
WHERE
  l_shipdate BETWEEN '1993-01-01' AND '1993-02-01' AND
  datediff(l_receiptdate,l_shipdate) > 25 AND
  l_quantity > 40;

EXPLAIN without Index Condition Pushdown:

-+----------+-------+----------------------+-----+---------+------+--------+-------------+
 | table    | type | possible_keys         | key | key_len | ref | rows    | Extra       |
-+----------+-------+----------------------+-----+---------+------+--------+-------------+
 | lineitem | range | s_r                  | s_r | 4       | NULL | 152064 | Using where |
-+----------+-------+----------------------+-----+---------+------+--------+-------------+

with Index Condition Pushdown:

-+-----------+-------+---------------+-----+---------+------+--------+------------------------------------+
 | table     | type | possible_keys | key | key_len | ref | rows     | Extra                              |
-+-----------+-------+---------------+-----+---------+------+--------+------------------------------------+
 | lineitem | range | s_r            | s_r | 4       | NULL | 152064 | Using index condition; Using where |
-+-----------+-------+---------------+-----+---------+------+--------+------------------------------------+

The speedup was:

  • Cold buffer pool: from 5 min down to 1 min

  • Hot buffer pool: from 0.19 sec down to 0.07 sec

Status Variables

There are two server status variables:

Variable name	Meaning
Handler_icp_attempts	Number of times pushed index condition was checked.
Handler_icp_match	Number of times the condition was matched.

That way, the value Handler_icp_attempts - Handler_icp_match shows the number records that the server did not have to read because of Index Condition Pushdown.
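For example, one can compare the two counters before and after running the tbl query from above:

SHOW SESSION STATUS LIKE 'Handler_icp%';
SELECT * FROM tbl WHERE key_col1 BETWEEN 10 AND 11 AND key_col2 LIKE '%foo%';
SHOW SESSION STATUS LIKE 'Handler_icp%';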

Limitations

  • Currently, virtual column indexes can't be used for index condition pushdown. Instead, a generated column can be declared STORED. Then, index condition pushdown will be possible.

  • Index Condition Pushdown can't be used with backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query which can be handled by using a backward-ordered index scan, it will disable Index Condition Pushdown.

Partitioned Tables

Index condition pushdown support for partitioned tables was added in MariaDB 11.5.

This page is licensed: CC BY-SA / Gnu FDL

Index Hints: How to Force Query Plans

The optimizer is largely cost-based and will try to choose the optimal plan for any query. However in some cases it does not have enough information to choose a perfect plan and in these cases you may have to provide hints to force the optimizer to use another plan.

You can examine the query plan for a SELECT by writing EXPLAIN before the statement. SHOW EXPLAIN shows the output of a running query. In some cases, its output can be closer to reality than EXPLAIN.

For the following queries, we will use the world database for the examples.

Setting up the World Example Database

Download it from world.sql.gz

Install it with:

mariadb-admin create world
zcat world.sql.gz | ../client/mysql world

or

mariadb-admin create world
gunzip world.sql.gz
../client/mysql world < world.sql

Forcing Join Order

You can force the join order by using STRAIGHT_JOIN either in the SELECT or JOIN part.

The simplest way to force the join order is to put the tables in the correct order in the FROM clause and use SELECT STRAIGHT_JOIN like so:

SELECT STRAIGHT_JOIN SUM(City.Population) FROM Country,City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";

If you only want to force the join order for a few tables, use STRAIGHT_JOIN in the FROM clause. When this is done, only tables connected with STRAIGHT_JOIN will have their order forced. For example:

SELECT SUM(City.Population) FROM Country STRAIGHT_JOIN City WHERE
City.CountryCode=Country.Code AND Country.HeadOfState="Volodymyr Zelenskyy";

In both of the above cases Country will be scanned first and for each matching country (one in this case) all rows in City will be checked for a match. As there is only one matching country this will be faster than the original query.

The output of EXPLAIN for the above cases is:

+----+-------------+---------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------------------------------------------+
|  1 | SIMPLE      | Country | ALL  | PRIMARY       | NULL | NULL    | NULL |  239 | Using where                                     |
|  1 | SIMPLE      | City    | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using where; Using join buffer (flat, BNL join) |
+----+-------------+---------+------+---------------+------+---------+------+------+-------------------------------------------------+

This is one of the few cases where ALL is ok, as the scan of the Country table will only find one matching row.

Forcing Usage of a Specific Index for the WHERE Clause

In some cases the optimizer may choose a non-optimal index or it may choose to not use an index at all, even if some index could theoretically be used.

In these cases you have the option to either tell the optimizer to only use a limited set of indexes, ignore one or more indexes, or force the usage of some particular index.

USE INDEX: Use a Limited Set of Indexes

You can limit which indexes are considered with the USE INDEX option.

USE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.

USE INDEX is used after the table name in the FROM clause.

Example:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";

This produces:

+----+-------------+-------+------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key         | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+-------------+---------+-------+------+-------------+
|  1 | SIMPLE      | City  | ref  | CountryCode   | CountryCode | 3       | const |   14 | Using where |
+----+-------------+-------+------+---------------+-------------+---------+-------+------+-------------+

If we had not used USE INDEX, the Name index would have been in possible_keys.

IGNORE INDEX: Don't Use a Particular Index

You can tell the optimizer to not consider some particular index with the IGNORE INDEX option.

IGNORE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

This is used after the table name in the FROM clause:

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City IGNORE INDEX (Name)
WHERE name="Helsingborg" AND countrycode="SWE";

This produces:

+----+-------------+-------+------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key         | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+-------------+---------+-------+------+-------------+
|  1 | SIMPLE      | City  | ref  | CountryCode   | CountryCode | 3       | const |   14 | Using where |
+----+-------------+-------+------+---------------+-------------+---------+-------+------+-------------+

The benefit of using IGNORE INDEX instead of USE INDEX is that it will not disable a new index which you may add later.

Also see Ignored Indexes for an option to specify in the index definition that indexes should be ignored.

FORCE INDEX: Forcing an Index

Forcing an index to be used is mostly useful when the optimizer decides to do a table scan even if you know that using an index would be better. (The optimizer could decide to do a table scan even if there is an available index when it believes that most or all rows will match and it can avoid the overhead of using the index).

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,CountryCode FROM City FORCE INDEX (Name)
WHERE name>="A" AND CountryCode >="A";

This produces:

+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | City  | range | Name          | Name | 35      | NULL | 4079 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+

FORCE INDEX works by only considering the given indexes (like with USE INDEX), but in addition it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.
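
For example, assuming the Name and CountryCode indexes created in the earlier examples, a prefix of the index name should be accepted as long as it matches only one index:

EXPLAIN SELECT Name FROM City FORCE INDEX (Cou)  -- "Cou" unambiguously names CountryCode
WHERE name="Helsingborg" AND countrycode="SWE";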

Forcing an Index to be Used for ORDER BY or GROUP BY

The optimizer will try to use indexes to resolve ORDER BY and GROUP BY.

You can use USE INDEX, IGNORE INDEX and FORCE INDEX as in the WHERE clause above to ensure that a specific index is used:

USE INDEX [{FOR {JOIN|ORDER BY|GROUP BY}}] ([index_list])

This is used after the table name in the FROM clause.

Example:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT Name,Count(*) FROM City
FORCE INDEX FOR GROUP BY (Name)
WHERE population >= 10000000 GROUP BY Name;

This produces:

+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | City  | index | NULL          | Name | 35      | NULL | 4079 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+

Without the FORCE INDEX option we would have 'Using where; Using temporary; Using filesort' in the 'Extra' column, which means that the optimizer would create a temporary table and sort it.

Help the Optimizer Optimize GROUP BY and ORDER BY

The optimizer uses several strategies to optimize GROUP BY and ORDER BY:

  • Resolve with an index:

    • Scan the table in index order and output data as we go. (This only works if the ORDER BY / GROUP BY can be resolved by an index after constant propagation is done).

  • Filesort:

    • Scan the table to be sorted and collect the sort keys in a temporary file.

    • Sort the keys + reference to row (with filesort)

    • Scan the table in sorted order

  • Use a temporary table for ORDER BY:

    • Create a temporary (in memory) table for the 'to-be-sorted' data. (If this gets bigger than max_heap_table_size or contains blobs then an Aria or MyISAM disk based table will be used)

    • Sort the keys + reference to row (with filesort)

    • Scan the table in sorted order

A temporary table will always be used if the fields which will be sorted are not from the first table in the JOIN order.

  • Use a temporary table for GROUP BY:

    • Create a temporary table to hold the GROUP BY result with an index that matches the GROUP BY fields.

    • Produce a result row

    • If a row with the GROUP BY key exists in the temporary table, add the new result row to it. If not, create a new row.

    • Before sending the results to the user, sort the rows with filesort to get the results in the GROUP BY order.

Forcing/Disallowing Temporary Tables to be Used for GROUP BY:

Using an in-memory table (as described above) is usually the fastest option for GROUP BY if the result set is small. It is not optimal if the result set is very big. You can tell the optimizer this by using SELECT SQL_SMALL_RESULT or SELECT SQL_BIG_RESULT.

For example:

EXPLAIN SELECT SQL_SMALL_RESULT Name,Count(*) AS Cities 
FROM City GROUP BY Name HAVING Cities > 2;

produces:

+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | City  | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

while:

EXPLAIN SELECT SQL_BIG_RESULT Name,Count(*) AS Cities 
FROM City GROUP BY Name HAVING Cities > 2;

produces:

+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | City  | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+

The difference is that with SQL_SMALL_RESULT a temporary table is used.

Forcing Usage of Temporary Tables

In some cases you may want to force the use of a temporary table for the result to free up the table/row locks for the used tables as quickly as possible.

You can do this with the SQL_BUFFER_RESULT option:

CREATE INDEX Name ON City (Name);
EXPLAIN SELECT SQL_BUFFER_RESULT Name,Count(*) AS Cities FROM City
GROUP BY Name HAVING Cities > 2;

This produces:

+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                        |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+
|  1 | SIMPLE      | City  | index | NULL          | Name | 35      | NULL | 4079 | Using index; Using temporary |
+----+-------------+-------+-------+---------------+------+---------+------+------+------------------------------+

Without SQL_BUFFER_RESULT, the above query would not use a temporary table for the result set.

Optimizer Switch

In MariaDB 5.3 we added an optimizer switch which allows you to specify which algorithms will be considered when optimizing a query.

See the optimizer section for more information about the different algorithms which are used.

See Also

  • FORCE INDEX

  • USE INDEX

  • IGNORE INDEX

  • GROUP BY

  • Ignored Indexes

This page is licensed: CC BY-SA / Gnu FDL

index_merge sort_intersection

Prior to MariaDB 5.3, the index_merge access method supported union, sort-union, and intersection operations. Starting from MariaDB 5.3, the sort-intersection operation is also supported. This allows the use of index_merge in a broader number of cases.

This feature is disabled by default. To enable it, turn on the optimizer switch index_merge_sort_intersection like so:

SET optimizer_switch='index_merge_sort_intersection=on'

Limitations of index_merge/intersection

Prior to MariaDB 5.3, the index_merge access method had one intersection strategy called intersection. That strategy can only be used when the merged index scans produce rowid-ordered streams. In practice this means that an intersection could only be constructed from equality (=) conditions.

For example, the following query will use intersection:

MySQL [ontime]> EXPLAIN SELECT AVG(arrdelay) FROM ontime WHERE depdel15=1 AND OriginState ='CA';
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
|id|select_type|table |type       |possible_keys       |key                 |key_len|ref |rows |Extra                                            |
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+
| 1|SIMPLE     |ontime|index_merge|OriginState,DepDel15|OriginState,DepDel15|3,5    |NULL|76952|Using intersect(OriginState,DepDel15);Using where|
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+-------------------------------------------------+

but if you replace OriginState ='CA' with OriginState IN ('CA', 'GB') (which matches the same number of records), then intersection is not usable anymore:

MySQL [ontime]> explain select avg(arrdelay) from ontime where depdel15=1 and OriginState IN ('CA', 'GB');
+--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
|id|select_type|table |type|possible_keys       |key     |key_len|ref  |rows |Extra      |
+--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+
| 1|SIMPLE     |ontime|ref |OriginState,DepDel15|DepDel15|5      |const|36926|Using where|
+--+-----------+------+----+--------------------+--------+-------+-----+-----+-----------+

The latter query also ran about five times slower (from 2.2 to 10.8 seconds) in our experiments.

How index_merge/sort_intersection improves the situation

In MariaDB 5.3, when index_merge_sort_intersection is enabled, index_merge intersection plans can be constructed from non-equality conditions:

MySQL [ontime]> explain select avg(arrdelay) from ontime where depdel15=1 and OriginState IN ('CA', 'GB');
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
|id|select_type|table |type       |possible_keys       |key                 |key_len|ref |rows |Extra                                                   |
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+
| 1|SIMPLE     |ontime|index_merge|OriginState,DepDel15|DepDel15,OriginState|5,3    |NULL|60754|Using sort_intersect(DepDel15,OriginState); Using where |
+--+-----------+------+-----------+--------------------+--------------------+-------+----+-----+--------------------------------------------------------+

In our tests, this query ran in 3.2 seconds, which is not as good as the case with two equalities, but still much better than 10.8 seconds we were getting without sort_intersect.

The sort_intersect strategy has higher overhead than intersect but is able to handle a broader set of WHERE conditions.

(Figure: intersect vs. sort_intersect performance comparison)

When to Use

index_merge/sort_intersection works best on tables with lots of records and where intersections are sufficiently large (but still small enough to make a full table scan overkill).

The benefit is expected to be bigger for I/O-bound loads.

This page is licensed: CC BY-SA / Gnu FDL

LIMIT ROWS EXAMINED

Syntax

SELECT ... FROM ... WHERE ...
[group_clause] [order_clause]
LIMIT [[OFFSET,] row_count] ROWS EXAMINED rows_limit;

Similar to the parameters of LIMIT, rows_limit can be either a prepared statement parameter or a stored program parameter.
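
Since rows_limit accepts a prepared-statement parameter, a minimal sketch (t1 is a placeholder table name) could look like this:

PREPARE ps FROM 'SELECT * FROM t1 LIMIT 10 ROWS EXAMINED ?';
SET @max_rows_examined = 10000;
EXECUTE ps USING @max_rows_examined;
DEALLOCATE PREPARE ps;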

Description

The purpose of this optimization is to provide the means to terminate the execution of SELECT statements which examine too many rows, and thus use too many resources. This is achieved through an extension of the LIMIT clause: LIMIT ROWS EXAMINED number_of_rows. Whenever possible, the semantics of LIMIT ROWS EXAMINED is the same as that of normal LIMIT (for instance for aggregate functions).

The LIMIT ROWS EXAMINED clause is taken into account by the query engine only during query execution. Thus the clause is ignored in the following cases:

  • If a query is EXPLAIN-ed.

  • During query optimization.

  • During auxiliary operations such as writing to system tables (e.g. logs).

The clause is not applicable to DELETE or UPDATE statements, and if used in those statements produces a syntax error.

The effects of this clause are as follows:

  • The server counts the number of read, inserted, modified, and deleted rows during query execution. This takes into account the use of temporary tables, and sorting for intermediate query operations.

  • Once the counter exceeds the value specified in the LIMIT ROWS EXAMINED clause, query execution is terminated as soon as possible.

  • The effects of terminating the query because of LIMIT ROWS EXAMINED are as follows:

    • The result of the query is a subset of the complete query, depending on when the query engine detected that the limit was reached. The result may be empty if no result rows could be computed before reaching the limit.

    • A warning is generated of the form: "Query execution was interrupted. The query examined at least 100 rows, which exceeds LIMIT ROWS EXAMINED (20). The query result may be incomplete."

    • If query processing was interrupted during filesort, an error is returned in addition to the warning.

    • If a UNION was interrupted during execution of one of its queries, the last step of the UNION is still executed in order to produce a partial result.

    • Depending on the join and other execution strategies used for a query, the same query may produce no result at all, or a different subset of the complete result when terminated due to LIMIT ROWS EXAMINED.

    • If the query contains a GROUP BY clause, the last group where the limit was reached will be discarded.

The LIMIT ROWS EXAMINED clause cannot be specified on a per-subquery basis. There can be only one LIMIT ROWS EXAMINED clause for the whole SELECT statement. If a SELECT statement contains several subqueries with LIMIT ROWS EXAMINED, the one that is parsed last is taken into account.

Examples

A simple example of the clause is:

SELECT * FROM t1, t2 LIMIT 10 ROWS EXAMINED 10000;

The LIMIT ROWS EXAMINED clause is global for the whole statement.

If a composite query (such as UNION, or query with derived tables or with subqueries) contains more than one LIMIT ROWS EXAMINED, the last one parsed is taken into account. In this manner either the last or the outermost one is taken into account. For instance, in the query:

SELECT * FROM t1
WHERE c1 IN (SELECT * FROM t2 WHERE c2 > ' ' LIMIT ROWS EXAMINED 0)
LIMIT ROWS EXAMINED 11;

The limit that is taken into account is 11, not 0.

This page is licensed: CC BY-SA / Gnu FDL

MariaDB 5.3 Optimizer Debugging

MariaDB 5.3 has an optimizer debugging patch. The patch is pushed into:

lp:maria-captains/maria/5.3-optimizer-debugging

The patch is wrapped in #ifdef, but there is a #define straight in mysql_priv.h so simply compiling that tree should produce a binary with optimizer debugging enabled.

The patch adds two system variables:

  • @@debug_optimizer_prefer_join_prefix

  • @@debug_optimizer_dupsweedout_penalized

The variables are present as both session and global variables, and are also settable via the server command line.

debug_optimizer_prefer_join_prefix

If this variable is non-NULL, it is assumed to specify a join prefix as a comma-separated list of table aliases:

SET debug_optimizer_prefer_join_prefix='tbl1,tbl2,tbl3';

The optimizer will try its best to build a join plan which matches the specified join prefix. It does this by comparing join prefixes it is considering with @@debug_optimizer_prefer_join_prefix, and multiplying cost by a million if the plan doesn't match the prefix.

As a result, you can more-or-less control the join order. For example, let's take this query:

MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
|  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
|  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
|  1 | SIMPLE      | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
3 rows in set (0.00 sec)

and request a join order of C,A,B:

MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten B, ten C;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
|  1 | SIMPLE      | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                                    |
|  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
|  1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
3 rows in set (0.00 sec)

We got it.

Note that this is still a best-effort approach:

  • you won't be successful in forcing join orders which the optimizer considers invalid (e.g. for "t1 LEFT JOIN t2" you won't be able to get a join order of t2,t1).

  • The optimizer does various plan pruning and may discard the requested join order before it has a chance to find out that it is a million-times cheaper than any other.

Semi-joins

It is possible to force the join order of joins plus semi-joins. This may cause a different strategy to be used:

MariaDB [test]> SET debug_optimizer_prefer_join_prefix=NULL;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                      |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
|  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 |                            |
|  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where                |
|  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; FirstMatch(A) |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------------------+
3 rows in set (0.00 sec)

MariaDB [test]> SET debug_optimizer_prefer_join_prefix='C,A,B';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);
+----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                           |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
|  1 | PRIMARY     | C     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Start temporary                                 |
|  1 | PRIMARY     | A     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; Using join buffer (flat, BNL join) |
|  1 | PRIMARY     | B     | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using where; End temporary                      |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------------------------------------------+
3 rows in set (0.00 sec)

Semi-join materialization is a somewhat special case, because "join prefix" is not exactly what you see in the EXPLAIN output. For semi-join materialization:

  • don't put "<subqueryN>" into @@debug_optimizer_prefer_join_prefix

  • instead, put all of the materialization tables into the place where you want the <subqueryN> line.

  • Attempts to control the join order inside the materialization nest will be unsuccessful. Example: we want A-C-B-AA:

MariaDB [test]> SET debug_optimizer_prefer_join_prefix='A,C,B,AA';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> EXPLAIN SELECT * FROM ten A, ten AA WHERE A.a IN (SELECT B.a FROM ten B, ten C);
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
| id | select_type | table       | type   | possible_keys | key          | key_len | ref  | rows | Extra                              |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
|  1 | PRIMARY     | A           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
|  1 | PRIMARY     | <subquery2> | eq_ref | distinct_key  | distinct_key | 5       | func |    1 |                                    |
|  1 | PRIMARY     | AA          | ALL    | NULL          | NULL         | NULL    | NULL |   10 | Using join buffer (flat, BNL join) |
|  2 | SUBQUERY    | B           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
|  2 | SUBQUERY    | C           | ALL    | NULL          | NULL         | NULL    | NULL |   10 |                                    |
+----+-------------+-------------+--------+---------------+--------------+---------+------+------+------------------------------------+
5 rows in set (0.00 sec)

but we get A-B-C-AA.

debug_optimizer_dupsweedout_penalized

There are four semi-join execution strategies:

  1. FirstMatch

  2. Materialization

  3. LooseScan

  4. DuplicateWeedout

The first three strategies have flags in @@optimizer_switch that can be used to disable them. The DuplicateWeedout strategy does not have a flag. This was done for a reason, as that strategy is the catch-all strategy and it can handle all kinds of subqueries, in all kinds of join orders. (We're slowly moving to the point where it will be possible to run with FirstMatch enabled and everything else disabled but we are not there yet.)

Since DuplicateWeedout cannot be disabled, there are cases where it "gets in the way" by being chosen over the strategy you need. This is what debug_optimizer_dupsweedout_penalized is for. If you set:

MariaDB [test]> SET debug_optimizer_dupsweedout_penalized=TRUE;

...the costs of query plans that use DuplicateWeedout will be multiplied by a million. This doesn't mean that you will get rid of DuplicateWeedout — due to Bug #898747 it is still possible to have DuplicateWeedout used even if a cheaper plan exists. A partial remedy to this is to run with

MariaDB [test]> SET optimizer_prune_level=0;

It is possible to use both debug_optimizer_dupsweedout_penalized and debug_optimizer_prefer_join_prefix at the same time. This should give you the desired strategy and join order.
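
Putting the pieces together, a session in a build from the debugging tree described above might look like this (the debug_* variables only exist in that tree):

SET debug_optimizer_prefer_join_prefix='C,A,B';  -- request join order C,A,B
SET debug_optimizer_dupsweedout_penalized=TRUE;  -- penalize DuplicateWeedout plans
SET optimizer_prune_level=0;                     -- reduce plan pruning, as noted above
EXPLAIN SELECT * FROM ten A WHERE a IN (SELECT B.a FROM ten B, ten C WHERE C.a + A.a < 4);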

Further reading

  • See mysql-test/t/debug_optimizer.test (in the MariaDB source code) for examples

This page is licensed: CC BY-SA / Gnu FDL

not_null_range_scan Optimization

The NOT NULL range scan optimization enables the optimizer to construct range scans from NOT NULL conditions that it was able to infer from the WHERE clause.

The optimization appeared in MariaDB 10.5.0. It is not enabled by default; one needs to set an optimizer_switch flag to enable it.

Description

A basic (but slightly artificial) example:

CREATE TABLE items (
  price  DECIMAL(8,2),
  weight DECIMAL(8,2),
  ...
  INDEX(weight)
);
-- Find items that cost more than 1000 $currency_units per kg:
SET optimizer_switch='not_null_range_scan=ON';
EXPLAIN
SELECT * FROM items WHERE items.price > items.weight / 1000;

The WHERE condition in this form cannot be used for range scans. However, one can infer that it will reject rows that have NULL for weight. That is, the optimizer can infer an additional condition that

weight IS NOT NULL

and pass it to the range optimizer. The range optimizer can, in turn, evaluate whether it makes sense to construct range access from the condition:

+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key    | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
|    1 | SIMPLE      | items | range | NULL          | weight | 5       | NULL | 1    | Using where |
+------+-------------+-------+-------+---------------+--------+---------+------+------+-------------+

Here's another example that's more complex but is based on a real-world query. Consider a join query

-- Find orders that were returned
SELECT * FROM current_orders AS O, order_returns AS RET
WHERE 
  O.return_id= RET.id;

Here, the optimizer can infer the condition "return_id IS NOT NULL". If most of the orders are not returned (and so have NULL for return_id), one can use range access to scan only those orders that had a return.
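
A minimal schema sketch for this example (all names and types are assumptions, not taken from a real application); the index on return_id is what the inferred "return_id IS NOT NULL" condition can use:

CREATE TABLE order_returns (
  id     INT PRIMARY KEY,
  reason VARCHAR(100)
);
CREATE TABLE current_orders (
  id            INT PRIMARY KEY,
  customer_name VARCHAR(100),
  return_id     INT NULL,        -- NULL for orders that were not returned
  INDEX (return_id)
);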

Controlling the Optimization

The optimization is not enabled by default. One can enable it like so:

SET optimizer_switch='not_null_range_scan=ON';

Optimizer Trace

TODO.

See Also

  • MDEV-15777 - JIRA bug report which resulted in the optimization

  • NULL Filtering Optimization is a related optimization in MySQL and MariaDB. It uses inferred NOT NULL conditions to perform filtering (but not index access)

This page is licensed: CC BY-SA / Gnu FDL

optimizer_switch

optimizer_switch is a server variable that one can use to enable/disable specific optimizations.

Syntax

To set or unset the various optimizations, use the following syntax:

SET [GLOBAL|SESSION] optimizer_switch='cmd[,cmd]...';

The cmd takes the following format:

Syntax                    | Description
--------------------------|------------------------------------------------------
default                   | Reset all optimizations to their default values.
optimization_name=default | Set the specified optimization to its default value.
optimization_name=on      | Enable the specified optimization.
optimization_name=off     | Disable the specified optimization.

There is no need to list all flags - only those that are specified in the command will be affected.
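
For example, the following changes only the two named flags for the current session; every other flag keeps its current value:

SET SESSION optimizer_switch='index_merge_sort_intersection=on,mrr=off';
SELECT @@optimizer_switch;   -- inspect the resulting full setting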

Available Flags

Below is a list of all optimizer_switch flags available in MariaDB:

Flag and MariaDB default                      | Supported in MariaDB since
----------------------------------------------|------------------------------------------------------------------
condition_pushdown_for_derived=on             | MariaDB 10.2.2
condition_pushdown_for_subquery=on            | MariaDB 10.4
condition_pushdown_from_having=on             | MariaDB 10.4.3
cset_narrowing=on/off                         | MariaDB 10.6.16, MariaDB 10.11.6, MariaDB 11.0.4, MariaDB 11.1.3 and MariaDB 11.2.2
derived_merge=on                              | MariaDB 5.3
derived_with_keys=on                          | MariaDB 5.3
default                                       | MariaDB 5.1
duplicateweedout=on                           | MariaDB 11.8
engine_condition_pushdown=off                 | MariaDB 5.5 (deprecated in MariaDB 10.1, removed in MariaDB 11.3)
exists_to_in=on                               | MariaDB 10.0
extended_keys=on                              | MariaDB 5.5.21
firstmatch=on                                 | MariaDB 5.3
index_condition_pushdown=on                   | MariaDB 5.3
hash_join_cardinality=off                     | MariaDB 10.6.13 (MDEV-30812)
index_merge=on                                | MariaDB 5.1
index_merge_intersection=on                   | MariaDB 5.1
index_merge_sort_intersection=off             | MariaDB 5.3
index_merge_sort_union=on                     | MariaDB 5.1
index_merge_union=on                          | MariaDB 5.1
in_to_exists=on                               | MariaDB 5.3
join_cache_bka=on                             | MariaDB 5.3
join_cache_hashed=on                          | MariaDB 5.3
join_cache_incremental=on                     | MariaDB 5.3
loosescan=on                                  | MariaDB 5.3
materialization=on (semi-join, non-semi-join) | MariaDB 5.3
mrr=off                                       | MariaDB 5.3
mrr_cost_based=off                            | MariaDB 5.3
mrr_sort_keys=off                             | MariaDB 5.3
not_null_range_scan=off                       | MariaDB 10.5
optimize_join_buffer_size=on                  | MariaDB 5.3
orderby_uses_equalities=on                    | MariaDB 10.1.15
outer_join_with_cache=on                      | MariaDB 5.3
partial_match_rowid_merge=on                  | MariaDB 5.3
partial_match_table_scan=on                   | MariaDB 5.3
rowid_filter=on                               | MariaDB 10.4.3
sargable_casefold=on                          | MariaDB 11.3.0
semijoin=on                                   | MariaDB 5.3
semijoin_with_cache=on                        | MariaDB 5.3
split_materialized=on[1]                      | MariaDB 10.3.4
subquery_cache=on                             | MariaDB 5.3
table_elimination=on                          | MariaDB 5.1

  [1] Replaced split_grouping_derived, which was introduced in MariaDB 10.3.1.

Defaults

From version
Default optimizer_switch setting

MariaDB 11.8.0

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, duplicateweedout=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on

MariaDB 11.7.0

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=on, sargable_casefold=on

MariaDB 11.3.1

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off, sargable_casefold=on

MariaDB 10.6.16, MariaDB 10.11.6, MariaDB 11.0.4, MariaDB 11.1.3 and MariaDB 11.2.2

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on, condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on, cset_narrowing=off

MariaDB 11.0.2

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=on

MariaDB 10.6.13, MariaDB 10.11.3

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off, hash_join_cardinality=off

MariaDB 10.5.0

index_merge=on, index_merge_union=on, index_merge_sort_union=on, index_merge_intersection=on, index_merge_sort_intersection=off, engine_condition_pushdown=off, index_condition_pushdown=on, derived_merge=on, derived_with_keys=on, firstmatch=on, loosescan=on, materialization=on, in_to_exists=on, semijoin=on, partial_match_rowid_merge=on, partial_match_table_scan=on, subquery_cache=on, mrr=off, mrr_cost_based=off, mrr_sort_keys=off, outer_join_with_cache=on, semijoin_with_cache=on, join_cache_incremental=on, join_cache_hashed=on, join_cache_bka=on, optimize_join_buffer_size=on, table_elimination=on, extended_keys=on, exists_to_in=on, orderby_uses_equalities=on, condition_pushdown_for_derived=on, split_materialized=on, condition_pushdown_for_subquery=on, rowid_filter=on,condition_pushdown_from_having=on, not_null_range_scan=off

See Also

  • Quickly finding optimizer_switch values that are on or off

  • The optimizer converts certain big IN predicates into IN subqueries

  • optimizer_adjust_secondary_key_cost

  • Optimizer hints in SELECT

This page is licensed: CC BY-SA / Gnu FDL

optimizer_adjust_secondary_key_costs

optimizer_adjust_secondary_key_costs

  • Description: Gives the user the ability to affect how the costs for secondary keys using ref are calculated in the few cases when MariaDB 10.6 up to MariaDB 10.11 makes a sub-optimal choice when optimizing ref access, either for key lookups or GROUP BY. ref, as used by EXPLAIN, means that the optimizer is using key-lookup on one value to find the matching rows from a table. Unused from MariaDB 11.0. In MariaDB 10.6.18 the variable was changed from a number to a set of strings and disable_forced_index_in_group_by (value 4) was added.

  • Scope: Global, Session

  • Dynamic: Yes

  • Data Type: set

  • Default Value: fix_reuse_range_for_ref, fix_card_multiplier

  • Range: 0 to 63, or any combination of adjust_secondary_key_cost, disable_max_seek, disable_forced_index_in_group_by, fix_innodb_cardinality, fix_reuse_range_for_ref and fix_card_multiplier

  • Introduced: MariaDB 10.6.17, MariaDB 10.11.7

MariaDB [securedb]> select @@version;
+-----------------------------------+
| @@version                         |
+-----------------------------------+
| 10.6.20-16-MariaDB-enterprise-log |
+-----------------------------------+
1 row in set (0.001 sec)

MariaDB [securedb]> select @@optimizer_adjust_secondary_key_costs;
+---------------------------------------------+
| @@optimizer_adjust_secondary_key_costs     |
+---------------------------------------------+
| fix_reuse_range_for_ref,fix_card_multiplier |
+---------------------------------------------+

MariaDB starting with 11.0

optimizer_adjust_secondary_key_costs will be obsolete starting from MariaDB 11.0 as the new optimizer in 11.0 does not have max_seek optimization and is already using cost based choices for index usage with GROUP BY.

The value for optimizer_adjust_secondary_key_costs is one or more of the following:

adjust_secondary_key_cost (added in MariaDB 10.6.17)

  • Old behavior: ref costs are limited by max_seeks.

  • Change when used: the secondary key costs for ref are updated to be at least five times the clustered primary key costs if a clustered primary key exists.

disable_max_seek (added in MariaDB 10.6.17)

  • Old behavior: the ref cost on secondary keys is limited to max_seek = min('number of expected rows'/10, scan_time*3).

  • Change when used: disables the 'max_seek' optimization and does a slight adjustment of filter cost.

disable_forced_index_in_group_by (added in MariaDB 10.6.18)

  • Old behavior: a rule-based choice is used when deciding whether to use an index to resolve GROUP BY.

  • Change when used: the choice is now cost based.

fix_innodb_cardinality (added in MariaDB 10.6.19)

  • Old behavior: by default InnoDB doubles the cardinality of indexes in an effort to force index usage over table scans. This can cause the optimizer to create sub-optimal plans for ranges or index entries that cover a big part of the table.

  • Change when used: removes the doubling of cardinality in InnoDB. fix_innodb_cardinality is recommended to be used only as a server startup option, as it is enabled for a table at first usage. See MDEV-34664 for details.

fix_reuse_range_for_ref (added in MariaDB 10.6.20)

  • Old behavior: the number of estimated rows for 'ref' did not always match the costs from the range optimizer.

  • Change when used: use the cost from the range optimizer for 'ref' if all used key parts are constants. The old code did not always do this.

fix_card_multiplier (added in MariaDB 10.6.20)

  • Old behavior: the calculated index selectivity can be bigger than 1.0 if index statistics are not up to date. This causes MariaDB to believe that there are more rows in the table than there really are, which can lead to wrong plans.

  • Change when used: ensures that the calculated index selectivity is never bigger than 1.0. This option is on by default.

One can set all options with:

SET @@optimizer_adjust_secondary_key_costs='all';
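
It is also possible to set only specific flags; for example (global scope shown, session scope works the same way):

SET GLOBAL optimizer_adjust_secondary_key_costs='disable_max_seek,fix_innodb_cardinality';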

Explanations of the old behavior in MariaDB 10.x

The reason for the max_seek optimization was originally to ensure that MariaDB would use a key instead of a table scan. This works well for a lot of queries, but can cause problems when a table scan is a better choice, such as when one would have to scan more than 1/4 of the rows in the table (in which case a table scan is better).

See Also

  • The optimizer_switch system variable.

This page is licensed: CC BY-SA / Gnu FDL

optimizer_join_limit_pref_ratio Optimization

Basics

This optimization is off (0) by default. When it is enabled, MariaDB also considers a join order that can shorten query execution time by exploiting the ORDER BY ... LIMIT n clause. For small values of n, this may improve performance considerably.

Set optimizer_join_limit_pref_ratio to a non-zero value to enable this option (higher values are more conservative; the recommended value is 100), or set it to 0 (the default) to disable it.
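
For example, to enable it for the current session with the recommended ratio, and to turn it off again:

SET SESSION optimizer_join_limit_pref_ratio=100;  -- prefer the LIMIT-aware join order only for a promised 100x speedup
SET SESSION optimizer_join_limit_pref_ratio=0;    -- back to the default (disabled)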

Detailed description

Problem setting

By default, the MariaDB optimizer picks a join order without considering the ORDER BY ... LIMIT clause, when present.

For example, consider a query looking at latest 10 orders together with customers who made them:

SELECT *
FROM
  customer, orders
WHERE
  customer.name = orders.customer_name
ORDER BY
  orders.date DESC
LIMIT 10;

The two possible plans are customer->orders:

+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
| id   | select_type | table    | type | possible_keys | key           | key_len | ref           | rows | Extra                                        |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+
|    1 | SIMPLE      | customer | ALL  | name          | NULL          | NULL    | NULL          | 9623 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | orders   | ref  | customer_name | customer_name | 103     | customer.name | 1    |                                              |
+------+-------------+----------+------+---------------+---------------+---------+---------------+------+----------------------------------------------+

and orders->customer:

+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
| id   | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows | Extra       |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+
|    1 | SIMPLE      | orders   | index | customer_name | order_date | 4       | NULL                 | 10   | Using where |
|    1 | SIMPLE      | customer | ref   | name          | name       | 103     | orders.customer_name | 1    |             |
+------+-------------+----------+-------+---------------+------------+---------+----------------------+------+-------------+

The customer->orders plan computes a join between all customers and orders, saves that result into a temporary table, and then uses filesort to get the 10 most recent orders. This query plan doesn't benefit from the fact that just 10 orders are needed.

However, in contrast, the orders->customer plan uses an index to read rows in the ORDER BY order. The query can stop execution once it finds 10 order-and-customer combinations. This is much faster than computing the entire join. With this optimization enabled, the optimizer can pick that plan and leverage ORDER BY ... LIMIT to stop early, once it has the 10 combinations.

Plans with LIMIT shortcuts are difficult to estimate

It is fundamentally difficult to produce a reliable estimate for ORDER BY ... LIMIT shortcuts. Let's take an example from the previous section to see why. This query searches for last 10 orders that were shipped by air:

SELECT *
FROM
  customer, orders
WHERE
  customer.name = orders.customer_name
  AND orders.shipping_method = 'Airplane'
ORDER BY
  orders.date DESC
LIMIT 10;

Suppose we know beforehand that 50% of orders are shipped by air. Assuming there is no correlation between date and shipping method, the orders->customer plan will need to scan about 20 orders before finding 10 that are shipped by air. But if there is a correlation, then we may need to scan up to (total_orders*0.5 + 10) orders before we find the first 10 that are shipped by air. Scanning about 50% of all orders can be expensive.

This situation worsens when the query has constructs whose selectivity is not known. For example, suppose the WHERE condition was

orders.shipping_method LIKE '%Airplane%'

In this case, we can't reliably say whether we will be able to stop after scanning #LIMIT rows, or whether we will need to enumerate all rows before finding #LIMIT matches.

Providing guidelines to the optimizer

Due to these challenges, the optimization is not enabled by default.

When running a mostly OLTP workload such that query WHERE conditions have suitable indexes or are not very selective, then any ORDER BY ... LIMIT queries will typically find matching rows quickly. In this case, it makes sense to give the following guidance to the optimizer:

Do consider the query plan using LIMIT short-cutting 
and prefer it if it promises at least X times speedup.

The value of X is given to the optimizer via optimizer_join_limit_pref_ratio setting. Higher values carry less risk. The recommended value is 100: prefer the LIMIT join order if it promises at least 100x speedup.

References

  • MDEV-34720 introduces optimizer_join_limit_pref_ratio optimization

  • MDEV-18079 is about future development that would make the optimizer handle such cases without user guidance.

This page is licensed: CC BY-SA / Gnu FDL

Optimizing for "Latest News"-style Queries

The problem space

Let's say you have "news articles" (rows in a table) and want a web page showing the latest ten articles about a particular topic.

Variants on "topic":

  • Category

  • Tag

  • Provider (of news article)

  • Manufacturer (of item for sale)

  • Ticker (financial stock)

Variants on "news article"

  • Item for sale

  • Blog comment

  • Blog thread

Variants on "latest"

  • Publication date (unix_timestamp)

  • Most popular (keep the count)

  • Most emailed (keep the count)

  • Manual ranking (1..10 -- 'top ten')

Variants on "10" - there is nothing sacred about "10" in this discussion.

The performance issues

Currently you have a table (or a column) that relates the topic to the article. The SELECT statement to find the latest 10 articles has grown in complexity, and performance is poor. You have focused on what index to add, but nothing seems to work.

  • If there are multiple topics for each article, you need a many-to-many table.

  • You have a flag "is_deleted" that needs filtering on.

  • You want to "paginate" the list (ten articles per page, for as many pages as necessary).

The solution

First, let me give you the solution, then I will elaborate on why it works well.

  • One new table called, say, Lists (a possible definition is sketched after this list).

  • Lists has exactly 3 columns: topic, article_id, sequence

  • Lists has exactly 2 indexes: PRIMARY KEY(topic, sequence, article_id), INDEX(article_id)

  • Only viewable articles are in Lists. (This avoids the filtering on "is_deleted", etc)

  • Lists is InnoDB. (This gets "clustering".)

  • "sequence" is typically the date of the article, but could be some other ordering.

  • "topic" should probably be normalized, but that is not critical to this discussion.

  • "article_id" is a link to the bulky row in another table(s) that provide all the details about the article.

The queries

Find the latest 10 articles for a topic:

SELECT  a.*
    FROM  Articles a
    JOIN  Lists s ON s.article_id = a.article_id
    WHERE  s.topic = ?
    ORDER BY  s.sequence DESC
    LIMIT  10;

You must not have any WHERE condition touching columns in Articles.

When you mark an article for deletion; you must remove it from Lists:

DELETE  FROM  Lists
    WHERE  article_id = ?;

I emphasize "must" because flags and other filtering is often the root of performance issues.

Why it works

By now, you may have discovered why it works.

The big goal is to minimize the disk hits. Let's itemize how few disk hits are needed. When finding the latest articles with 'normal' code, you will probably find that it is doing significant scans of the Articles table, failing to quickly home in on the 10 rows you want. With this design, there is only one extra disk hit:

  • 1 disk hit: 10 adjacent, narrow, rows in Lists -- probably in a single "block".

  • 10 disk hits: The 10 articles. (These hits are unavoidable, but may be cached.) The PRIMARY KEY, and using InnoDB, makes these quite efficient.

OK, you pay a small price for this elsewhere, when an article has to be removed from Lists:

  • 1 disk hit: INDEX(article_id) - finding a few ids

  • A few more disk hits to DELETE rows from Lists. This is a small price to pay -- and you are not paying it while the user is waiting for the page to render.

See also

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: lists

This page is licensed: CC BY-SA / Gnu FDL

Pagination Optimization

The Desire

You have a website with news articles, or a blog, or some other thing with a list of things that might be too long for a single page. So, you decide to break it into chunks of, say, 10 items and provide a [Next] button to go the next "page".

You spot OFFSET and LIMIT in MariaDB and decide that is the obvious way to do it.

SELECT  *
        FROM  items
        WHERE  messy_filtering
        ORDER BY  date DESC
        LIMIT  $N  OFFSET  $M

Note that the problem requirement needs a [Next] link on each page so that the user can 'page' through the data. He does not really need "GoTo Page #". Jump to the [First] or [Last] page may be useful.

The Problem

All is well -- until you have 50,000 items in a list. And someone tries to walk through all 5000 pages. That 'someone' could be a search engine crawler.

Where's the problem? Performance. Your web page is doing "SELECT ... LIMIT 10 OFFSET 49990" (or the equivalent "LIMIT 49990,10"). MariaDB has to find all 50,000 rows, step over the first 49,990, then deliver the 10 for that distant page.

If it is a crawler ('spider') that read all the pages, then it actually touched about 125,000,000 items to read all 5,000 pages.

Reading the entire table, just to get a distant page, can be so much I/O that it can cause timeouts on the web page. Or it can interfere with other activity, causing other things to be slow.

Other Bugs

In addition to a performance problem, ...

  • If an item is inserted or deleted between the time you look at one page and the next, you could miss an item, or see an item duplicated.

  • The pages are not easily bookmarked or sent to someone else because the contents shift over time.

  • The WHERE clause and the ORDER BY may even make it so that all 50,000 items have to be read, just to find the 10 items for page 1!

What to Do?

Hardware? No, that's just a bandaid. The data will continue to grow and even the new hardware won't handle it.

Better INDEX? No. You must get away from reading the entire table to get the 5000th page.

Build another table saying where the pages start? Get real! That would be a maintenance nightmare, and expensive.

Bottom line: Don't use OFFSET; instead remember where you "left off".

# First page (latest 10 items):
    SELECT ... WHERE ... ORDER BY id DESC LIMIT 10
# Next page (second 10):
    SELECT ... WHERE ... AND id < $left_off ORDER BY id DESC LIMIT 10

With INDEX(id), this suddenly becomes very efficient.

Implementation -- Getting Rid of OFFSET

You are probably doing this now: ORDER BY datetime DESC LIMIT 49990,10 You probably have some unique id on the table. This can probably be used for "left off".

Currently, the [Next] button probably has a url something like ?topic=xyz&page=4999&limit=10 The 'topic' (or 'tag' or 'provider' or 'user' or etc) says which set of items are being displayed. The product of page*limit gives the OFFSET. (The "limit=10" might be in the url, or might be hard-coded; this choice is not relevant to this discussion.)

The new variant would be ?topic=xyz&id=12345&limit=10. (Note: the 12345 is not computable from 4999.) By using INDEX(topic, id) you can efficiently say

WHERE topic = 'xyz'
      AND id >= 1234
    ORDER BY id
    LIMIT 10

That will hit only 10 rows. This is a huge improvement for later pages. Now for more details.

Implementation -- "Left Off"

What if there are exactly 10 rows left when you display the current page? It would make the UI nice if you grayed out the [Next] button, wouldn't it. (Or you could suppress the button all together.)

How to do that? Instead of LIMIT 10, use LIMIT 11. That will give you the 10 items needed for the current page, plus an indication of whether there is another page. And the id for that page.
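
A sketch of that query (the table and column names are illustrative): fetch one extra row to learn whether a next page exists.

SELECT id, title
  FROM items
 WHERE topic = 'xyz'
   AND id >= $left_off
 ORDER BY id
 LIMIT 11;
-- Display the first 10 rows; if an 11th row came back, its id becomes
-- the "left off" value for the [Next] link.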

So, take the 11th id for the [Next] button: <a href=?topic=xyz&id=$id11&limit=10>Next</a>

Implementation -- Links Beyond [Next]

Let's extend the 11 trick to also find the next 5 pages and build links for them.

Plan A is to say LIMIT 51. If you are on page 12, that would give you links for pages 13 (using 11th id) through pages 17 (51st).

Plan B is to do two queries, one to get the 10 items for the current page, the other to get the next 41 ids (LIMIT 10, 41) for the next 5 pages.

Which plan to pick? It depends on many things, so benchmark.

A Reasonable Set of Links

Reaching forward and backward by 5 pages is not too much work. It would take two separate queries to find the ids in both directions. Also, having links that take you to the First and Last pages would be easy to do. No id is needed; they can be something like

<a href=?topic=xyz&id=FIRST&limit=10>First</a>
    <a href=?topic=xyz&id=LAST&limit=10>Last</a>

The UI would recognize those, then generate a SELECT with something like

WHERE topic = 'xyz'
    ORDER BY id ASC -- ASC for First; DESC for Last
    LIMIT 10

The last items would be delivered in reverse order. Either deal with that in the UI, or make the SELECT more complex:

( SELECT ...
        WHERE topic = 'xyz'
        ORDER BY id DESC
        LIMIT 10
    ) ORDER BY id ASC

Let's say you are on page 12 of lots of pages. It could show these links:

[First] ... [7] [8] [9] [10] [11] 12 [13] [14] [15] [16] [17] ... [Last]

where the ellipsis is really used. Some end cases:

# Page one of three:
    First [2] [3]
# Page one of many:
    First [2] [3] [4] [5] ... [Last]
# Page two of many:
    [First] 2 [3] [4] [5] ... [Last]
# If you jump to the Last page, you don't know what page number it is.
# So, the best you can do is perhaps:
#    [First] ... [Prev] Last

Why it Works

The goal is to touch only the relevant rows, not all the rows leading up to the desired rows. This is nicely achieved, except for building links to the "next 5 pages". That may (or may not) be efficiently resolved by the simple SELECT id, discussed above. The reason that may not be efficient deals with the WHERE clause.

Let's discuss the optimal and suboptimal indexes.

For this discussion, I am assuming

  • The datetime field might have duplicates -- this can cause troubles

  • The id field is unique

  • The id field is close enough to datetime-ordered to be used instead of datetime.

Very efficient -- it does all the work in the index:

INDEX(topic, id)
    WHERE topic = 'xyz'
      AND id >= 876
    ORDER BY id ASC
    LIMIT 10,41

That will hit 51 consecutive index entries, 0 data rows.

Inefficient -- it must reach into the data:

    INDEX(topic, id)
    WHERE topic = 'xyz'
      AND id >= 876
      AND is_deleted = 0
    ORDER BY id ASC
    LIMIT 10,41

That will hit at least 51 consecutive index entries, plus at least 51 randomly located data rows.

Efficient -- back to the previous degree of efficiency:

INDEX(topic, is_deleted, id)
    WHERE topic = 'xyz'
      AND id >= 876
      AND is_deleted = 0
    ORDER BY id ASC
    LIMIT 10,41

Note how all the '=' parts of the WHERE come first; then comes both the '>=' and 'ORDER BY', both on id. This means that the INDEX can be used for all the WHERE, plus the ORDER BY.

"Items 11-20 Out of 12345"

You lose the "out of" except when the count is small. Instead, say something like

Items 11-20 out of Many

Alternatively... Only a few searches will have too many items to count. Keep another table with the search criteria and a count. This count can be computed daily (or hourly) by some background script. When discovering that the topic is a busy one, look it up in the table to get

Items 11-20 out of about 49,000

The background script would round the count off.
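
A minimal sketch of such a counts table and its refresh (all names here are hypothetical; 'tbl' stands for the table being searched):

CREATE TABLE search_counts (
    topic      VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  INT UNSIGNED NOT NULL,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                         ON UPDATE CURRENT_TIMESTAMP
);

-- The background script would refresh one topic roughly like this,
-- then round the count before displaying it:
REPLACE INTO search_counts (topic, row_count)
SELECT 'xyz', COUNT(*) FROM tbl WHERE topic = 'xyz';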

The quick way to get an estimated number of rows for an InnoDB table is

SELECT  table_rows
        FROM  information_schema.TABLES
        WHERE  TABLE_SCHEMA = 'database_name'
          AND  TABLE_NAME = 'table_name'

However, it does not allow for the WHERE clause that you probably have.
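
Another rough option is the optimizer's own row estimate from EXPLAIN; with a suitable index it reflects at least part of the WHERE clause, though the number is only an approximation (names again hypothetical):

EXPLAIN SELECT id FROM tbl WHERE topic = 'xyz';
-- Look at the "rows" column of the output and round it for display.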

Complex WHERE, or JOIN

If the search criteria cannot be confined to an INDEX in a single table, this technique is doomed. I have another paper that discusses "Lists", which solves that (with extra development work), and even improves on what is discussed here.

How Much Faster?

This depends on

  • How many rows (total)

  • Whether the WHERE clause prevented the efficient use of the ORDER BY

  • Whether the data is bigger than the cache. This last one kicks in when building one page requires reading more data from disk than can be cached. At that point, the problem goes from being CPU-bound to being I/O-bound. This is likely to suddenly slow down the loading of a page by a factor of 10.

What is Lost

  • Cannot "jump to Page N", for an arbitrary N. Why do you want to do that?

  • Walking backward from the end does not know the page numbers.

  • The code is more complex.

Postlog

Designed about 2007; posted 2012.

See Also

  • A forum discussion

  • Luke calls it "seek method" or "keyset pagination"

  • More by Luke

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: pagination

This page is licensed: CC BY-SA / Gnu FDL

Pivoting in MariaDB

The problem

You want to "pivot" the data so that a linear list of values with two keys becomes a spreadsheet-like array. See examples, below.

A solution

The best solution is probably to do it in some form of client code (PHP, etc). MySQL and MariaDB do not have a syntax for SELECT that will do the work for you. The code provided here uses a stored procedure to generate code to pivot the data, and then runs the code.

You can edit the SQL generated by the stored procedure to tweak the output in a variety of ways. Or you can tweak the stored procedure to generate what you would prefer.

Reference code for solution

'Source' this into the mysql commandline tool:

DELIMITER //
DROP   PROCEDURE IF EXISTS Pivot //
CREATE PROCEDURE Pivot(
    IN tbl_name VARCHAR(99),       -- table name (or db.tbl)
    IN base_cols VARCHAR(99),      -- column(s) on the left, separated by commas
    IN pivot_col VARCHAR(64),      -- name of column to put across the top
    IN tally_col VARCHAR(64),      -- name of column to SUM up
    IN where_clause VARCHAR(99),   -- empty string or "WHERE ..."
    IN order_by VARCHAR(99)        -- empty string or "ORDER BY ..."; usually the base_cols
    )
    DETERMINISTIC
    SQL SECURITY INVOKER
BEGIN
    -- Find the distinct values
    -- Build the SUM()s
    SET @subq = CONCAT('SELECT DISTINCT ', pivot_col, ' AS val ',
                    ' FROM ', tbl_name, ' ', where_clause, ' ORDER BY 1');
    -- select @subq;

    SET @cc1 = "CONCAT('SUM(IF(&p = ', &v, ', &t, 0)) AS ', &v)";
    SET @cc2 = REPLACE(@cc1, '&p', pivot_col);
    SET @cc3 = REPLACE(@cc2, '&t', tally_col);
    -- select @cc2, @cc3;
    SET @qval = CONCAT("'\"', val, '\"'");
    -- select @qval;
    SET @cc4 = REPLACE(@cc3, '&v', @qval);
    -- select @cc4;

    SET SESSION group_concat_max_len = 10000;   -- just in case
    SET @stmt = CONCAT(
            'SELECT  GROUP_CONCAT(', @cc4, ' SEPARATOR ",\n")  INTO @sums',
            ' FROM ( ', @subq, ' ) AS top');
     select @stmt;
    PREPARE _sql FROM @stmt;
    EXECUTE _sql;                      -- Intermediate step: build SQL for columns
    DEALLOCATE PREPARE _sql;
    -- Construct the query and perform it
    SET @stmt2 = CONCAT(
            'SELECT ',
                base_cols, ',\n',
                @sums,
                ',\n SUM(', tally_col, ') AS Total'
            '\n FROM ', tbl_name, ' ',
            where_clause,
            ' GROUP BY ', base_cols,
            '\n WITH ROLLUP',
            '\n', order_by
        );
    select @stmt2;                    -- The statement that generates the result
    PREPARE _sql FROM @stmt2;
    EXECUTE _sql;                     -- The resulting pivot table output
    DEALLOCATE PREPARE _sql;
    -- For debugging / tweaking, SELECT the various @variables after CALLing.
END;
//
DELIMITER ;

Then do a CALL, like in the examples, below.

Variants

I thought about having several extra options for variations, but decided that would be too messy. Instead, here are instructions for implementing the variations, either by capturing the SELECT that was output by the Stored Procedure, or by modifying the SP, itself.

  • The data is strings (not numeric) -- Remove "SUM" (but keep the expression); remove the SUM...AS TOTAL line.

  • If you want blank output instead of 0 -- Currently the code says "SUM(IF(... 0))"; change the 0 to NULL, then wrap the SUM: IFNULL(SUM(...), ''). Note that this will distinguish between a zero total (showing '0') and no data (blank). (See the sketch just after this list.)

  • Fancier output -- Use PHP/VB/Java/etc.

  • No Totals at the bottom -- Remove the WITH ROLLUP line from the SELECT.

  • No Total for each row -- Remove the SUM...AS Total line from the SELECT.

  • Change the order of the columns -- Modify the ORDER BY 1 ('1' meaning first column) in the SELECT DISTINCT in the SP.

  • Example: ORDER BY FIND_IN_SET(DAYOFWEEK(...), 'Sun,Mon,Tue,Wed,Thu,Fri,Sat')
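
For instance, the "blank output instead of 0" variation above turns one of the generated SUM columns (this one is taken from Example 1 below) into something like the following sketch:

-- As generated by the stored procedure:
SUM(IF(5*FLOOR(lat/5) = "15", population, 0)) AS "15",
-- Edited for blank-instead-of-zero output:
IFNULL(SUM(IF(5*FLOOR(lat/5) = "15", population, NULL)), '') AS "15",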

Notes about "base_cols":

  • Multiple columns on the left, such as an ID and its meaning -- This is already handled by allowing base_cols to be a comma-separated list like 'id, meaning'

  • You cannot call the SP with "foo AS 'blah'" in hopes of changing the labels, but you could edit the SELECT to achieve that goal.

Notes about the "Totals":

  • If "base_cols" is more than one column, WITH ROLLUP will produce subtotals as well as a grand total.

  • NULL shows up in the Totals row in the "base_cols" column; this can be changed via something like IFNULL(..., 'Totals').

Example 1 - Population vs Latitude in US

-- Sample input:
+-------+----------------------+---------+------------+
| state | city                 | lat     | population |
+-------+----------------------+---------+------------+
| AK    | Anchorage            | 61.2181 |     276263 |
| AK    | Juneau               | 58.3019 |      31796 |
| WA    | Monroe               | 47.8556 |      15554 |
| WA    | Spanaway             | 47.1042 |      25045 |
| PR    | Arecibo              | 18.4744 |      49189 |
| MT    | Kalispell            | 48.1958 |      18018 |
| AL    | Anniston             | 33.6597 |      23423 |
| AL    | Scottsboro           | 34.6722 |      14737 |
| HI    | Kaneohe              | 21.4181 |      35424 |
| PR    | Candelaria           | 18.4061 |      17632 |
...

-- Call the Stored Procedure:
CALL Pivot('World.US', 'state', '5*FLOOR(lat/5)', 'population', '', '');

-- SQL generated by the SP:
SELECT state,
SUM(IF(5*FLOOR(lat/5) = "15", population, 0)) AS "15",
SUM(IF(5*FLOOR(lat/5) = "20", population, 0)) AS "20",
SUM(IF(5*FLOOR(lat/5) = "25", population, 0)) AS "25",
SUM(IF(5*FLOOR(lat/5) = "30", population, 0)) AS "30",
SUM(IF(5*FLOOR(lat/5) = "35", population, 0)) AS "35",
SUM(IF(5*FLOOR(lat/5) = "40", population, 0)) AS "40",
SUM(IF(5*FLOOR(lat/5) = "45", population, 0)) AS "45",
SUM(IF(5*FLOOR(lat/5) = "55", population, 0)) AS "55",
SUM(IF(5*FLOOR(lat/5) = "60", population, 0)) AS "60",
SUM(IF(5*FLOOR(lat/5) = "70", population, 0)) AS "70",
 SUM(population) AS Total
 FROM World.US  GROUP BY state
 WITH ROLLUP

-- Output from that SQL (also comes out of the SP):
+-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
| state | 15      | 20     | 25       | 30       | 35       | 40       | 45      | 55    | 60     | 70   | Total     |
+-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+
| AK    |       0 |      0 |        0 |        0 |        0 |        0 |       0 | 60607 | 360765 | 4336 |    425708 |
| AL    |       0 |      0 |        0 |  1995225 |        0 |        0 |       0 |     0 |      0 |    0 |   1995225 |
| AR    |       0 |      0 |        0 |   595537 |   617361 |        0 |       0 |     0 |      0 |    0 |   1212898 |
| AZ    |       0 |      0 |        0 |  4708346 |   129989 |        0 |       0 |     0 |      0 |    0 |   4838335 |
...
| FL    |       0 |  34706 |  9096223 |  1440916 |        0 |        0 |       0 |     0 |      0 |    0 |  10571845 |
| GA    |       0 |      0 |        0 |  2823939 |        0 |        0 |       0 |     0 |      0 |    0 |   2823939 |
| HI    |   43050 | 752983 |        0 |        0 |        0 |        0 |       0 |     0 |      0 |    0 |    796033 |
...
| WY    |       0 |      0 |        0 |        0 |        0 |   277480 |       0 |     0 |      0 |    0 |    277480 |
| NULL  | 1792991 | 787689 | 16227033 | 44213344 | 47460670 | 61110822 | 7105143 | 60607 | 360765 | 4336 | 179123400 |
+-------+---------+--------+----------+----------+----------+----------+---------+-------+--------+------+-----------+

Notice how Alaska (AK) has populations in high latitudes and Hawaii (HI) in low latitudes.

Example 2 - Home Solar Power Generation

This gives the power (KWh) generated by hour and month for 2012.

-- Sample input:
+---------------------+------+
| ts                  | enwh |
+---------------------+------+
| 2012-06-06 11:00:00 |  523 |
| 2012-06-06 11:05:00 |  526 |
| 2012-06-06 11:10:00 |  529 |
| 2012-06-06 11:15:00 |  533 |
| 2012-06-06 11:20:00 |  537 |
| 2012-06-06 11:25:00 |  540 |
| 2012-06-06 11:30:00 |  542 |
| 2012-06-06 11:35:00 |  543 |
Note that it is a reading in watts for each 5 minutes.
So, summing is needed to get the breakdown by month and hour.

-- Invoke the SP:
CALL Pivot('details',    -- Table
           'MONTH(ts)',  -- `base_cols`, to put on left; SUM up over the month
           'HOUR(ts)',   -- `pivot_col` to put across the top; SUM up entries across the hour
           'enwh/1000',  -- The data -- watts converted to KWh
           "WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year",  -- Limit to one year
           '');          -- assumes that the months stay in order

-- The SQL generated:
SELECT MONTH(ts),
SUM(IF(HOUR(ts) = "5", enwh/1000, 0)) AS "5",
SUM(IF(HOUR(ts) = "6", enwh/1000, 0)) AS "6",
SUM(IF(HOUR(ts) = "7", enwh/1000, 0)) AS "7",
SUM(IF(HOUR(ts) = "8", enwh/1000, 0)) AS "8",
SUM(IF(HOUR(ts) = "9", enwh/1000, 0)) AS "9",
SUM(IF(HOUR(ts) = "10", enwh/1000, 0)) AS "10",
SUM(IF(HOUR(ts) = "11", enwh/1000, 0)) AS "11",
SUM(IF(HOUR(ts) = "12", enwh/1000, 0)) AS "12",
SUM(IF(HOUR(ts) = "13", enwh/1000, 0)) AS "13",
SUM(IF(HOUR(ts) = "14", enwh/1000, 0)) AS "14",
SUM(IF(HOUR(ts) = "15", enwh/1000, 0)) AS "15",
SUM(IF(HOUR(ts) = "16", enwh/1000, 0)) AS "16",
SUM(IF(HOUR(ts) = "17", enwh/1000, 0)) AS "17",
SUM(IF(HOUR(ts) = "18", enwh/1000, 0)) AS "18",
SUM(IF(HOUR(ts) = "19", enwh/1000, 0)) AS "19",
SUM(IF(HOUR(ts) = "20", enwh/1000, 0)) AS "20",
 SUM(enwh/1000) AS Total
 FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year GROUP BY MONTH(ts)
 WITH ROLLUP

-- That generated decimal places that I did not like:
| MONTH(ts) | 5      | 6       | 7        | 8        | 9         | 10        | 11        | 12        | 13        | 14        | 15        | 16       | 17       | 18       | 19      | 20     | Total      |
+-----------+--------+---------+----------+----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+----------+----------+----------+---------+--------+------------+
|         1 | 0.0000 |  0.0000 |   1.8510 |  21.1620 |   52.3190 |   73.0420 |   89.3220 |   97.0190 |   88.9720 |   75.4970 |   50.9270 |  12.5130 |   0.5990 |   0.0000 |  0.0000 | 0.0000 |   563.2230 |
|         2 | 0.0000 |  0.0460 |   5.9560 |  35.6330 |   72.4710 |   96.5130 |  112.7770 |  126.0850 |  117.1540 |   96.7160 |   72.5900 |  33.6230 |   4.7650 |   0.0040 |  0.0000 | 0.0000 |   774.3330 |

Other variations made the math go wrong. (Note that there is no CAST to FLOAT.)

While I was at it, I gave an alias to change "MONTH(ts)" to just "Month".

So, I edited the SQL to this and ran it:

SELECT MONTH(ts) AS 'Month',
ROUND(SUM(IF(HOUR(ts) = "5", enwh, 0))/1000) AS "5",
...
ROUND(SUM(IF(HOUR(ts) = "20", enwh, 0))/1000) AS "20",
 ROUND(SUM(enwh)/1000) AS Total
 FROM details WHERE ts >= '2012-01-01' AND ts < '2012-01-01' + INTERVAL 1 year
 GROUP BY MONTH(ts)
 WITH ROLLUP;

-- Which gave cleaner output:

+-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
| Month | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   | 13   | 14   | 15   | 16   | 17   | 18   | 19   | 20   | Total |
+-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+
|     1 |    0 |    0 |    2 |   21 |   52 |   73 |   89 |   97 |   89 |   75 |   51 |   13 |    1 |    0 |    0 |    0 |   563 |
|     2 |    0 |    0 |    6 |   36 |   72 |   97 |  113 |  126 |  117 |   97 |   73 |   34 |    5 |    0 |    0 |    0 |   774 |
|     3 |    0 |    0 |    9 |   46 |   75 |  105 |  121 |  122 |  128 |  126 |  105 |   71 |   33 |   10 |    0 |    0 |   952 |
|     4 |    0 |    1 |   14 |   63 |  111 |  146 |  171 |  179 |  177 |  158 |  141 |  105 |   65 |   26 |    3 |    0 |  1360 |
|     5 |    0 |    4 |   21 |   78 |  128 |  162 |  185 |  199 |  196 |  187 |  166 |  130 |   81 |   36 |    8 |    0 |  1581 |
|     6 |    0 |    4 |   17 |   71 |  132 |  163 |  182 |  191 |  193 |  182 |  161 |  132 |   89 |   43 |   10 |    1 |  1572 |
|     7 |    0 |    3 |   17 |   57 |  121 |  160 |  185 |  197 |  199 |  189 |  168 |  137 |   92 |   44 |   11 |    1 |  1581 |
|     8 |    0 |    1 |   11 |   48 |  104 |  149 |  171 |  183 |  187 |  179 |  156 |  121 |   76 |   32 |    5 |    0 |  1421 |
|     9 |    0 |    0 |    6 |   32 |   77 |  127 |  151 |  160 |  159 |  148 |  124 |   93 |   47 |   12 |    1 |    0 |  1137 |
|    10 |    0 |    0 |    1 |   16 |   54 |   85 |  107 |  115 |  119 |  106 |   85 |   56 |   17 |    2 |    0 |    0 |   763 |
|    11 |    0 |    0 |    5 |   30 |   57 |   70 |   84 |   83 |   76 |   64 |   35 |    8 |    1 |    0 |    0 |    0 |   512 |
|    12 |    0 |    0 |    2 |   17 |   39 |   54 |   67 |   75 |   64 |   58 |   31 |    4 |    0 |    0 |    0 |    0 |   411 |
|  NULL |    0 |   13 |  112 |  516 | 1023 | 1392 | 1628 | 1728 | 1703 | 1570 | 1294 |  902 |  506 |  203 |   38 |    2 | 12629 |
+-------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+-------+

Midday in the summer is the best time for solar panels, as you would expect. 1-2pm in July was the best.

Postlog

Posted, Feb. 2015

See Also

  • Brawley's notes

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: pivot

This page is licensed: CC BY-SA / Gnu FDL

Query Limits and Timeouts

This article describes the different methods MariaDB provides to limit/timeout a query:

LIMIT

SELECT ... LIMIT row_count
OR
SELECT ... LIMIT offset, row_count
OR
SELECT ... LIMIT row_count OFFSET offset

The LIMIT clause restricts the number of returned rows.

LIMIT ROWS EXAMINED

SELECT ... LIMIT ROWS EXAMINED rows_limit;

Stops the query after 'rows_limit' number of rows have been examined.
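
For example (table and column names are illustrative):

-- Return at most 10 rows, but stop once 10,000 rows have been examined;
-- a warning is raised and the result may be incomplete:
SELECT * FROM tbl WHERE topic = 'xyz' LIMIT 10 ROWS EXAMINED 10000;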

sql_safe_updates

If the sql_safe_updates variable is set, one can't execute an UPDATE or DELETE statement unless one specifies a key constraint in the WHERE clause or provides a LIMIT clause (or both).

SET @@SQL_SAFE_UPDATES=1
UPDATE tbl_name SET not_key_column=val;
-> ERROR 1175 (HY000): You are using safe update mode 
  and you tried to update a table without a WHERE that uses a KEY column

sql_select_limit

sql_select_limit acts as an automatic LIMIT row_count to any SELECT query.

SET @@SQL_SELECT_LIMIT=1000
SELECT * FROM big_table;

The above is the same as:

SELECT * from big_table LIMIT 1000;

max_join_size

If the max_join_size variable (also called sql_max_join_size) is set, then it will limit any SELECT statements that probably need to examine more than MAX_JOIN_SIZE rows.

SET @@MAX_JOIN_SIZE=1000;
SELECT count(null_column) FROM big_table;
->ERROR 1104 (42000): The SELECT would examine more than MAX_JOIN_SIZE rows;
  check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay

max_statement_time

If the max_statement_time variable is set, any query (excluding stored procedures) taking longer than the value of max_statement_time (specified in seconds) to execute will be aborted. This can be set globally, by session, as well as per user and per query. See Aborting statements that take longer than a certain time to execute.

See Also

  • WAIT and NOWAIT

  • Aborting statements that take longer than a certain time to execute

  • lock_wait_timeout variable

This page is licensed: CC BY-SA / Gnu FDL

Rollup Unique User Counts

The Problem

The normal way to count "Unique Users" is to take large log files, sort by userid, dedup, and count. This requires a rather large amount of processing. Furthermore, the count derived cannot be rolled up. That is, daily counts cannot be added to get weekly counts -- some users will be counted multiple times.

So, the problem is to store the counts in such a way as to allow rolling up.

The solution

Let's think about what we can do with a hash of the userid. The hash could map to a bit in a bit string. A BIT_COUNT of the bit string would give the number of 1-bits, representing the number of users. But that bit string would have to be huge. What if we could use shorter bit strings? Then different userids would be folded into the same bit. Let's assume we can solve that.

Meanwhile, what about the rollup? The daily bit strings can be OR'd together to get a similar bit string for the week.
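
A minimal sketch of that rollup in SQL, assuming the bit string is split into BIGINT chunks (MariaDB's bit-wise operators work on BIGINT, as noted in the Postlog below) and stored in a hypothetical daily_bits table:

CREATE TABLE daily_bits (
    dt       DATE NOT NULL,
    chunk_no SMALLINT UNSIGNED NOT NULL,       -- which 64-bit slice of the string
    bits     BIGINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (dt, chunk_no)
);

-- OR the daily strings together for one week, then count the 1-bits:
SELECT SUM(BIT_COUNT(weekly.bits)) AS bits_set
FROM ( SELECT chunk_no, BIT_OR(bits) AS bits
       FROM daily_bits
       WHERE dt >= '2024-01-01' AND dt < '2024-01-01' + INTERVAL 7 DAY
       GROUP BY chunk_no ) AS weekly;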

We have now figured out how to do the rollup, but have created another problem -- the counts are too low.

Inflating the BIT_COUNT

A sufficiently random hash (eg MD5) will fold userids into the same bits with a predictable frequency. We need to figure this out, and work backwards. That is, given that X percent of the bits are set, we need a formula that says approximately how many userids were used to get those bits.

I simulated the problem by generating random hashes and calculated the number of bits that would be set. Then, with the help of Eureqa software, I derived the formula:

Y = 0.5456*X + 0.6543*tan(1.39*X*X*X)

How good is it?

The formula is reasonably precise. It is usually within 1% of the correct value; rarely off by 2%.

Of course, if virtually all the bits are set, the formula can't be very precise. Hence, you need to plan to have the bit strings big enough to handle the expected number of Uniques. In practice, you can use less than 1 bit per Unique. This would be a huge space savings over trying to save all the userids.

Another suggestion... If you are rolling up over a big span of time (eg hourly -> monthly), the bit strings must all be the same length, and the monthly string must be big enough to handle the expected count. This is likely to lead to very sparse hourly bit strings. Hence, it may be prudent to compress the hourly strings.

Postlog

Invented Nov, 2013; published Apr, 2014

Future: Rick is working on actual code (Sep, 2016). It is complicated by bit-wise operations being limited to BIGINT. However, with MySQL 8.0 (freshly released), the desired bit-wise operations can be applied to BLOB, greatly simplifying my code. I hope to publish the pre-8.0 code soon; 8.0 code later.

See Also

Rick James graciously allowed us to use this article in the documentation.

Rick James' site has other useful tips, how-tos, optimizations, and debugging tips.

Original source: uniques

This page is licensed: CC BY-SA / Gnu FDL

Rowid Filtering Optimization

The target use case for rowid filtering is as follows:

  • a table uses ref access on index IDX1

  • but it also has a fairly restrictive range predicate on another index IDX2.

In this case, it is advantageous to:

  • Do an index-only scan on index IDX2 and collect rowids of index records into a data structure that allows filtering (let's call it $FILTER).

  • When doing ref access on IDX1, check $FILTER before reading the full record.

Example

Consider a query

SELECT ...
FROM orders JOIN lineitem ON o_orderkey=l_orderkey
WHERE
  l_shipdate BETWEEN '1997-01-01' AND '1997-01-31' AND
  o_totalprice between 200000 and 230000;

Suppose the condition on l_shipdate is very restrictive, which means lineitem table should go first in the join order. Then, the optimizer can use o_orderkey=l_orderkey equality to do an index lookup to get the order the line item is from. On the other hand o_totalprice between ... can also be rather selective.

With filtering, the query plan would be:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineitem
         type: range
possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity
          key: i_l_shipdate
      key_len: 4
          ref: NULL
         rows: 98
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: eq_ref|filter
possible_keys: PRIMARY,i_o_totalprice
          key: PRIMARY|i_o_totalprice
      key_len: 4|9
          ref: dbt3_s001.lineitem.l_orderkey
         rows: 1 (5%)
        Extra: Using where; Using rowid filter

Note that table orders has "Using rowid filter" in the Extra column. The type column has "|filter", and the key column shows the index that is used to construct the filter. The rows column shows the expected filter selectivity; here it is 5%.

ANALYZE FORMAT=JSON output for table orders will show

"table": {
      "table_name": "orders",
      "access_type": "eq_ref",
      "possible_keys": ["PRIMARY", "i_o_totalprice"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["o_orderkey"],
      "ref": ["dbt3_s001.lineitem.l_orderkey"],
      "rowid_filter": {
        "range": {
          "key": "i_o_totalprice",
          "used_key_parts": ["o_totalprice"]
        },
        "rows": 69,
        "selectivity_pct": 4.6,
        "r_rows": 71,
        "r_selectivity_pct": 10.417,
        "r_buffer_size": 53,
        "r_filling_time_ms": 0.0716
      }

Note the rowid_filter element. It has a range element inside it. selectivity_pct is the expected selectivity, accompanied by the r_selectivity_pct showing the actual observed selectivity.

Details

  • The optimizer makes a cost-based decision about when the filter should be used.

  • The filter data structure is currently an ordered array of rowids. (A Bloom filter would be better here and will probably be introduced in future versions.)

  • The optimization needs to be supported by the storage engine. At the moment, it is supported by InnoDB and MyISAM. It is not supported in partitioned tables.

Limitations

  • Rowid Filtering can't be used with a backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query and decides to handle it with a backward-ordered index scan, it will disable Rowid Filtering.

Control

Rowid filtering can be switched on/off using rowid_filter flag in the optimizer_switch variable. By default, the optimization is enabled.
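
For example, to disable it for the current session and then restore the default:

SET SESSION optimizer_switch='rowid_filter=off';
-- ... EXPLAIN or run the query without rowid filtering ...
SET SESSION optimizer_switch='rowid_filter=on';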

This page is licensed: CC BY-SA / Gnu FDL

Sargable DATE and YEAR

Starting from MariaDB 11.1, conditions in the form

YEAR(indexed_date_col) CMP const_value
DATE(indexed_date_col) CMP const_value

are sargable, provided that

  • CMP is any of =, <=>, <, <=, >, >= .

  • indexed_date_col has a type of DATE, DATETIME or TIMESTAMP and is a part of some index.

One can swap the left and right hand sides of the equality: const_value CMP {DATE|YEAR}(indexed_date_col) is also handled.

Sargable here means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or perform partition pruning.

Implementation

Internally, the optimizer rewrites the condition to an equivalent condition which doesn't use YEAR or DATE functions.

For example, YEAR(date_col)=2023 is rewritten into date_col between '2023-01-01' and '2023-12-31'.

Similarly, DATE(datetime_col) <= '2023-06-01' is rewritten into datetime_col <= '2023-06-01 23:59:59'.
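
A minimal illustration, using a hypothetical table with an indexed DATE column:

CREATE TABLE t_dates (
    d DATE,
    KEY (d)
);

-- From 11.1 on, the optimizer can service this with a range scan on the index,
-- as if the condition had been written d BETWEEN '2023-01-01' AND '2023-12-31':
EXPLAIN SELECT * FROM t_dates WHERE YEAR(d) = 2023;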

Controlling the Optimization

The optimization is always ON; there is no Optimizer Switch flag to control it.

Optimizer Trace

The rewrite is logged as date_conds_into_sargable transformation. Example:

{
            "transformation": "date_conds_into_sargable",
            "before": "cast(t1.datetime_col as date) <= '2023-06-01'",
            "after": "t1.datetime_col <= '2023-06-01 23:59:59'"
          },

References

  • MDEV-8320: Allow index usage for DATE(datetime_column) = const

This page is licensed: CC BY-SA / Gnu FDL

Sargable UPPER

Starting from MariaDB 11.3, expressions in the form

UPPER(key_col) = expr
UPPER(key_col) IN (constant-list)

are sargable if key_col uses either the utf8mb3_general_ci or utf8mb4_general_ci collation.

UCASE is a synonym for UPPER so is covered as well.

Sargable means that the optimizer is able to use such conditions to construct access methods, estimate their selectivity, or perform partition pruning.

Example

CREATE TABLE t1 (
  key1 VARCHAR(32) COLLATE utf8mb4_general_ci,
  ...
  KEY(key1)
);
EXPLAIN SELECT * FROM t1 WHERE UPPER(key1)='ABC'
+------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | t1    | ref  | key1          | key1 | 131     | const | 1    | Using where; Using index |
+------+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+

Note that ref access is used.

An example with join:

EXPLAIN SELECT * FROM t0,t1 WHERE upper(t1.key1)=t0.col;
+------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref         | rows | Extra       |
+------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+
|    1 | SIMPLE      | t0    | ALL  | NULL          | NULL | NULL    | NULL        | 10   | Using where |
|    1 | SIMPLE      | t1    | ref  | key1          | key1 | 131     | test.t0.col | 1    | Using index |
+------+-------------+-------+------+---------------+------+---------+-------------+------+-------------+

Here, the optimizer was able to construct ref access.

Controlling the Optimization

The optimizer_switch variable has the flag sargable_casefold to turn the optimization on and off. The default is ON.
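
For example, to turn it off for the current session:

SET SESSION optimizer_switch='sargable_casefold=off';

With the flag off, the queries above can no longer use ref access on key1.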

Optimizer Trace

The optimization is implemented as a rewrite for a query's WHERE/ON conditions. It uses the sargable_casefold_removal object name in the trace:

"join_optimization": {
        "select_id": 1,
        "steps": [
          {
            "sargable_casefold_removal": {
              "before": "ucase(t1.key1) = t0.col",
              "after": "t1.key1 = t0.col"
            }
          },

References

  • MDEV-31496: Make optimizer handle UCASE(varchar_col)=...

  • An analog for LCASE is not possible. See MDEV-31955: Make optimizer handle LCASE(varchar_col)=... for details.

This page is licensed: CC BY-SA / Gnu FDL

USE INDEX

You can limit which indexes are considered with the USE INDEX option.

Syntax

USE INDEX [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])

Description

The default is 'FOR JOIN', which means that the hint only affects how the WHERE clause is optimized.

USE INDEX is used after the table name in the FROM clause.

USE INDEX cannot use an ignored index - it will be treated as if it doesn't exist.

Index Prefixes

When using index hints (USE, FORCE or IGNORE INDEX), the index name value can also be an unambiguous prefix of an index name.

Example

CREATE INDEX Name ON City (Name);
CREATE INDEX CountryCode ON City (Countrycode);
EXPLAIN SELECT Name FROM City USE INDEX (CountryCode)
WHERE name="Helsingborg" AND countrycode="SWE";

This produces:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	City	ref	CountryCode	CountryCode	3	const	14	Using where

If we had not used USE INDEX, the Name index would have been in possible keys.
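
Since unambiguous index-name prefixes are accepted (see "Index Prefixes" above), the same hint could also be written as follows; here 'Cou' matches only the CountryCode index:

EXPLAIN SELECT Name FROM City USE INDEX (Cou)
WHERE name="Helsingborg" AND countrycode="SWE";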

See Also

  • Index Hints: How to Force Query Plans for more details

  • IGNORE INDEX

  • FORCE INDEX

  • Ignored Indexes

This page is licensed: CC BY-SA / Gnu FDL

Virtual Column Support in the Optimizer

MariaDB starting with 11.8

Starting from MariaDB 11.8, the optimizer can recognize use of indexed virtual column expressions in the WHERE clause and use them to construct range and ref(const) accesses.

Motivating Example

Suppose one has a table with data in JSON:

CREATE TABLE t1 (json_data JSON);
INSERT INTO t1 VALUES('{"column1": 1234}'); 
INSERT INTO t1 ...

In order to do efficient queries over data in JSON, one can add a virtual column and an index on it:

ALTER TABLE t1
  ADD COLUMN vcol1 INT AS (cast(json_value(json_data, '$.column1') AS INTEGER)),
  ADD INDEX(vcol1);

Before MariaDB 11.8, one had to use vcol1 in the WHERE clause. Now, one can use the virtual column expression, too:

-- This will use the index before 11.8:
EXPLAIN SELECT * FROM t1 WHERE vcol1=100;
-- Starting from 11.8, this will use the index, too:
EXPLAIN SELECT * FROM t1 
WHERE cast(json_value(json_data, '$.column1') AS INTEGER)=100;
+------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id   | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+------+-------------+-------+------+---------------+-------+---------+-------+------+-------+
|    1 | SIMPLE      | t1    | ref  | vcol1         | vcol1 | 5       | const | 1    |       |
+------+-------------+-------+------+---------------+-------+---------+-------+------+-------+

General Considerations

  • In MariaDB, one has to create a virtual column and then create an index over it. Other databases allow creating an index directly over an expression: create index on t1((col1+col2)). This is not yet supported in MariaDB (MDEV-35853).

  • The WHERE clause must use the exact same expression as in the virtual column definition.

  • The optimization is implemented in a similar way to MySQL: the optimizer finds potentially useful occurrences of vcol_expr in the WHERE clause and replaces them with vcol_name.

  • In the optimizer trace, the rewrites are shown like so:

"virtual_column_substitution": {
              "condition": "WHERE",
              "resulting_condition": "t1.vcol1 = 100"
            }

Accessing JSON fields

Cast the value to the desired type

SQL is a strongly-typed language while JSON is weakly-typed. This means one must specify the desired datatype when accessing JSON data from SQL. In the above example, we declared vcol1 as INT and then used CAST(... AS INTEGER), both in the ALTER TABLE and in the WHERE clause of the SELECT query:

ALTER TABLE t1
  ADD COLUMN vcol1 INT AS (CAST(json_value(json_data, '$.column1') AS INTEGER)) ...
SELECT ...  WHERE ... CAST(json_value(json_data, '$.column1') AS INTEGER) ...;

Specify collation for strings

When extracting string values, CAST is not necessary, as JSON_VALUE returns strings.

However, one must take into account collations. If one declares the column as JSON:

CREATE TABLE t1 ( 
  json_data JSON 
  ...

the collation of json_data will be utf8mb4_bin. The collation of JSON_VALUE(json_data, ...) will also be utf8mb4_bin.

Most use cases require a more commonly-used collation. It is possible to achieve that using the COLLATE clause:

ALTER TABLE t1
  ADD col1 VARCHAR(100) COLLATE utf8mb4_uca1400_ai_ci AS
  (json_value(js1, '$.string_column') collate utf8mb4_uca1400_ai_ci),
  ADD INDEX(col1);
...
SELECT  ... 
WHERE
  json_value(js1, '$.string_column') COLLATE utf8mb4_uca1400_ai_ci='string-value';

References

  • MDEV-35616: Add basic optimizer support for virtual columns

This page is licensed: CC BY-SA / Gnu FDL

Optimization Strategies

Discover effective optimization strategies for MariaDB Server queries. This section provides a variety of techniques and approaches to enhance query performance and overall database efficiency.

Duplicate Weedout Strategy

DuplicateWeedout is an execution strategy for Semi-join subqueries.

The idea

The idea is to run the semi-join (a query which uses WHERE X IN (SELECT Y FROM ...)) as if it were a regular inner join, and then eliminate the duplicate record combinations using a temporary table.

Suppose, you have a query where you're looking for countries which have more than 33% percent of their population in one big city:

SELECT * 
FROM Country 
WHERE 
   Country.code IN (SELECT City.Country
                    FROM City 
                    WHERE 
                      City.Population > 0.33 * Country.Population AND 
                      City.Population > 1*1000*1000);

First, we run a regular inner join between the City and Country tables:

[diagram: duplicate-weedout-inner-join]

The Inner join produces duplicates. We have Germany three times, because it has three big cities. Now, lets put DuplicateWeedout into the picture:

[diagram: duplicate-weedout-diagram]

Here one can see that a temporary table with a primary key was used to avoid producing multiple records with 'Germany'.

DuplicateWeedout in action

The Start temporary and End temporary from the last diagram are shown in the EXPLAIN output:

EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where City.Population > 0.33 * Country.Population 
   AND City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: City
         type: range
possible_keys: Population,Country
          key: Population
      key_len: 4
          ref: NULL
         rows: 238
        Extra: Using index condition; Start temporary
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.City.Country
         rows: 1
        Extra: Using where; End temporary
2 rows in set (0.00 sec)

This query will read 238 rows from the City table, and for each of them will make a primary key lookup in the Country table, which gives another 238 rows. This gives a total of 476 rows, and you need to add 238 lookups in the temporary table (which are typically much cheaper since the temporary table is in-memory).

If we run the same query with semi-join optimizations disabled, we'll get:

EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where City.Population > 0.33 * Country.Population 
    AND City.Population > 1*1000*1000)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Country
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 239
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: City
         type: index_subquery
possible_keys: Population,Country
          key: Country
      key_len: 3
          ref: func
         rows: 18
        Extra: Using where
2 rows in set (0.00 sec)

This plan will read (239 + 239*18) = 4541 rows, which is much slower.

Factsheet

  • DuplicateWeedout is shown as "Start temporary/End temporary" in EXPLAIN.

  • The strategy can handle correlated subqueries.

  • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.

  • DuplicateWeedout allows the optimizer to freely mix a subquery's tables and the parent select's tables.

  • There is no separate @@optimizer_switch flag for DuplicateWeedout. The strategy can be disabled by switching off all semi-join optimizations with the SET @@optimizer_switch='semijoin=off' command.

See Also

  • Subquery Optimizations Map

This page is licensed: CC BY-SA / Gnu FDL

FirstMatch Strategy

FirstMatch is an execution strategy for Semi-join subqueries.

The idea

It is very similar to how IN/EXISTS subqueries were executed in MySQL 5.x.

Let's take the usual example of a search for countries with big cities:

SELECT * FROM Country 
WHERE Country.code IN (SELECT City.Country 
                       FROM City 
                       WHERE City.Population > 1*1000*1000)
      AND Country.continent='Europe'

Suppose, our execution plan is to find countries in Europe, and then, for each found country, check if it has any big cities. Regular inner join execution will look as follows:

[diagram: firstmatch-inner-join]

Since Germany has two big cities (in this diagram), it will be put into the query output twice. This is not correct: SELECT ... FROM Country should not produce the same country record twice. The FirstMatch strategy avoids the production of duplicates by short-cutting execution as soon as the first genuine match is found:

[diagram: firstmatch-firstmatch]

Note that the short-cutting has to take place after "Using where" has been applied. It would have been wrong to short-cut after we found Trier.

FirstMatch in action

The EXPLAIN for the above query will look as follows:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where City.Population > 1*1000*1000)
    AND Country.continent='Europe';
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
| id | select_type | table   | type | possible_keys      | key       | key_len | ref                | rows | Extra                            |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
|  1 | PRIMARY     | Country | ref  | PRIMARY,continent  | continent | 17      | const              |   60 | Using index condition            |
|  1 | PRIMARY     | City    | ref  | Population,Country | Country   | 3       | world.Country.Code |   18 | Using where; FirstMatch(Country) |
+----+-------------+---------+------+--------------------+-----------+---------+--------------------+------+----------------------------------+
2 rows in set (0.00 sec)

FirstMatch(Country) in the Extra column means that as soon as we have produced one matching record combination, short-cut the execution and jump back to the Country table.

FirstMatch's query plan is very similar to one you would get in MySQL:

MySQL [world]> EXPLAIN SELECT * FROM Country  WHERE Country.code IN 
  (select City.Country from City where City.Population > 1*1000*1000) 
   AND Country.continent='Europe';
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
| id | select_type        | table   | type           | possible_keys      | key       | key_len | ref   | rows | Extra                              |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
|  1 | PRIMARY            | Country | ref            | continent          | continent | 17      | const |   60 | Using index condition; Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country   | 3       | func  |   18 | Using where                        |
+----+--------------------+---------+----------------+--------------------+-----------+---------+-------+------+------------------------------------+
2 rows in set (0.01 sec)

and these two particular query plans will execute in the same time.

Difference between FirstMatch and IN->EXISTS

The general idea behind the FirstMatch strategy is the same as the one behind the IN->EXISTS transformation, however, FirstMatch has several advantages:

  • Equality propagation works across semi-join bounds, but not subquery bounds. Therefore, converting a subquery to semi-join and using FirstMatch can still give a better execution plan. (TODO example)

  • There is only one way to apply the IN->EXISTS strategy and MySQL will do it unconditionally. With FirstMatch, the optimizer can make a choice between whether it should run the FirstMatch strategy as soon as all tables used in the subquery are in the join prefix, or at some later point in time. (TODO: example)

FirstMatch factsheet

  • The FirstMatch strategy works by executing the subquery and short-cutting its execution as soon as the first match is found.

  • This means, subquery tables must be after all of the parent select's tables that are referred from the subquery predicate.

  • EXPLAIN shows FirstMatch as "FirstMatch(tableN)".

  • The strategy can handle correlated subqueries.

  • But it cannot be applied if the subquery has meaningful GROUP BY and/or aggregate functions.

  • Use of the FirstMatch strategy is controlled with the firstmatch=on|off flag in the optimizer_switch variable.

See Also

  • Semi-join subquery optimizations

In-depth material:

  • WL#3750: initial specification for FirstMatch

This page is licensed: CC BY-SA / Gnu FDL

Improvements to ORDER BY Optimization

Available tuning for ORDER BY with small LIMIT

  • In 2024, the fix for MDEV-34720 added the optimizer_join_limit_pref_ratio optimization to MariaDB, starting from 10.6. It allows one to enable an extra optimization for ORDER BY with a small LIMIT, for example as sketched below.
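
A minimal sketch of enabling it for a session (the ratio value shown is only illustrative; 0, the default, leaves the extra optimization off):

SET SESSION optimizer_join_limit_pref_ratio=100;
-- Subsequent ORDER BY ... LIMIT queries in this session may now prefer a plan
-- that uses an index on the ORDER BY column and short-cuts on the small LIMIT.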

Older optimizations

MariaDB brought several improvements to the ORDER BY optimizer.

The fixes were made as a response to complaints by MariaDB customers, so they fix real-world optimization problems. The fixes are a bit hard to describe (as the ORDER BY optimizer is complicated), but here's a short description:

The ORDER BY optimizer:

  • Doesn’t make stupid choices when several multi-part keys and potential range accesses are present (MDEV-6402).

    • This also fixes MySQL Bug#12113.

  • Always uses “range” (and not a full “index” scan) when it switches to an index to satisfy ORDER BY … LIMIT (MDEV-6657).

  • Tries hard to be smart and use cost/number of records estimates from other parts of the optimizer (MDEV-6384, MDEV-465).

    • This change also fixes MySQL Bug#36817.

  • Takes full advantage of InnoDB’s Extended Keys feature when checking if filesort() can be skipped (MDEV-6796).

Extra optimizations

  • The ORDER BY optimizer takes multiple-equalities into account (MDEV-8989). This optimization is enabled by default.

Comparison with MySQL 5.7

In MySQL 5.7 changelog, one can find this passage:

Make switching of index due to small limit cost-based (WL#6986) : We have made the decision in make_join_select() of whether to switch to a new index in order to support "ORDER BY ... LIMIT N" cost-based. This work fixes Bug#73837.

MariaDB is not using Oracle's fix (we believe make_join_select is not the right place to do ORDER BY optimization), but the effect is the same.

See Also

  • Blog post MariaDB 10.1: Better query optimization for ORDER BY … LIMIT

This page is licensed: CC BY-SA / Gnu FDL

LooseScan Strategy

LooseScan is an execution strategy for Semi-join subqueries.

The idea

We will demonstrate the LooseScan strategy by example. Suppose, we're looking for countries that have satellites. We can get them using the following query (for the sake of simplicity we ignore satellites that are owned by consortiums of multiple countries):

SELECT * FROM Country  
WHERE 
  Country.code IN (SELECT country_code FROM Satellite)

Suppose, there is an index on Satellite.country_code. If we use that index, we will get satellites in the order of their owner country:

[diagram: loosescan-satellites-ordered-r2]

The LooseScan strategy doesn't really need ordering, what it needs is grouping. In the above figure, satellites are grouped by country. For instance, all satellites owned by Australia come together, without being mixed with satellites of other countries. This makes it easy to select just one satellite from each group, which you can join with its country and get a list of countries without duplicates:

[diagram: loosescan-diagram-no-where]

LooseScan in action

The EXPLAIN output for the above query looks as follows:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select country_code from Satellite);
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
| id | select_type | table     | type   | possible_keys | key          | key_len | ref                          | rows | Extra                               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+
|  1 | PRIMARY     | Satellite | index  | country_code  | country_code | 9       | NULL                         |  932 | Using where; Using index; LooseScan |
|  1 | PRIMARY     | Country   | eq_ref | PRIMARY       | PRIMARY      | 3       | world.Satellite.country_code |    1 | Using index condition               |
+----+-------------+-----------+--------+---------------+--------------+---------+------------------------------+------+-------------------------------------+

Factsheet

  • LooseScan avoids the production of duplicate record combinations by putting the subquery table first and using its index to select one record from multiple duplicates

  • Hence, in order for LooseScan to be applicable, the subquery should look like:

expr IN (SELECT tbl.keypart1 FROM tbl ...)

or

expr IN (SELECT tbl.keypart2 FROM tbl WHERE tbl.keypart1=const AND ...)
  • LooseScan can handle correlated subqueries

  • LooseScan can be switched off by setting the loosescan=off flag in the optimizer_switch variable.

This page is licensed: CC BY-SA / Gnu FDL

Semi-join Materialization Strategy

Semi-join Materialization is a special kind of subquery materialization used for Semi-join subqueries. It actually includes two strategies:

  • Materialization/lookup

  • Materialization/scan

The idea

Consider a query that finds countries in Europe which have big cities:

SELECT * FROM Country 
WHERE Country.code IN (SELECT City.Country 
                       FROM City 
                       WHERE City.Population > 7*1000*1000)
      AND Country.continent='Europe'

The subquery is uncorrelated, that is, we can run it independently of the upper query. The idea of semi-join materialization is to do just that, and fill a temporary table with possible values of the City.country field of big cities, and then do a join with countries in Europe:

[diagram: sj-materialization1]

The join can be done in two directions:

  1. From the materialized table to countries in Europe

  2. From countries in Europe to the materialized table

The first way involves doing a full scan on the materialized table, so we call it "Materialization-scan".

If you run a join from Countries to the materialized table, the cheapest way to find a match in the materialized table is to make a lookup on its primary key (it has one: we used it to remove duplicates). Because of that, we call the strategy "Materialization-lookup".

Semi-join materialization in action

Materialization-Scan

If we chose to look for cities with a population greater than 7 million, the optimizer will use Materialization-Scan and EXPLAIN will show this:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
|  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)

Here, you can see:

  • There are still two SELECTs (look for columns with id=1 and id=2)

  • The second select (with id=2) has select_type=MATERIALIZED. This means it will be executed and its results will be stored in a temporary table with a unique key over all columns. The unique key is there to prevent the table from containing any duplicate records.

  • The first select received the table name subquery2. This is the table that we got as a result of the materialization of the select with id=2.

The optimizer chose to do a full scan over the materialized table, so this is an example of a use of the Materialization-Scan strategy.

As for execution costs, we're going to read 15 rows from table City, write 15 rows to materialized table, read them back (the optimizer assumes there won't be any duplicates), and then do 15 eq_ref accesses to table Country. In total, we'll do 45 reads and 15 writes.

By comparison, if you run the EXPLAIN with semi-join optimizations disabled, you'll get this:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
| id | select_type        | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra                              |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
|  1 | PRIMARY            | Country | ALL   | NULL               | NULL       | NULL    | NULL |  239 | Using where                        |
|  2 | DEPENDENT SUBQUERY | City    | range | Population,Country | Population | 4       | NULL |   15 | Using index condition; Using where |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+

...which is a plan to do (239 + 239*15) = 3824 table reads.

Materialization-Lookup

Let's modify the query slightly and look for countries which have cities with a population over one million (instead of seven):

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
|  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)

The EXPLAIN output is similar to the one which used Materialization-scan, except that:

  • the <subquery2> table is accessed with the eq_ref access method

  • the access uses an index named distinct_key

This means that the optimizer is planning to do index lookups into the materialized table. In other words, we're going to use the Materialization-lookup strategy.

With optimizer_switch='semijoin=off,materialization=off', one will get this EXPLAIN:

MariaDB [world]> EXPLAIN SELECT * FROM Country WHERE Country.code IN 
  (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type           | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | Country | ALL            | NULL               | NULL    | NULL    | NULL |  239 | Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country | 3       | func |   18 | Using where |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+

One can see that both plans will do a full scan on the Country table. For the second step, MariaDB will fill the materialized table (238 rows read from table City and written to the temporary table) and then do a unique key lookup for each record in table Country, which works out to 238 unique key lookups. In total, the second step will cost (239+238) = 477 reads and 238 temp.table writes.

Execution of the latter (DEPENDENT SUBQUERY) plan reads 18 rows using an index on City.Country for each record it receives for table Country. This works out to a cost of (18*239) = 4302 reads. Had there been fewer subquery invocations, this plan would have been better than the one with Materialization. By the way, MariaDB has an option to use such a query plan, too (see FirstMatch Strategy), but it did not choose it.

Subqueries with grouping

MariaDB is able to use Semi-join materialization strategy when the subquery has grouping (other semi-join strategies are not applicable in this case).

This allows for efficient execution of queries that search for the best/last element in a certain group.

For example, let's find cities that have the biggest population on their continent:

EXPLAIN 
SELECT * FROM City 
WHERE City.Population IN (SELECT max(City.Population) FROM City, Country 
                          WHERE City.Country=Country.Code 
                          GROUP BY Continent)
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
| id   | select_type  | table       | type | possible_keys | key        | key_len | ref                              | rows | Extra           |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL       | NULL    | NULL                             |  239 |                 |
|    1 | PRIMARY      | City        | ref  | Population    | Population | 4       | <subquery2>.max(City.Population) |    1 |                 |
|    2 | MATERIALIZED | Country     | ALL  | PRIMARY       | NULL       | NULL    | NULL                             |  239 | Using temporary |
|    2 | MATERIALIZED | City        | ref  | Country       | Country    | 3       | world.Country.Code               |   18 |                 |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
4 rows in set (0.00 sec)

the cities are:

+------+-------------------+---------+------------+
| ID   | Name              | Country | Population |
+------+-------------------+---------+------------+
| 1024 | Mumbai (Bombay)   | IND     |   10500000 |
| 3580 | Moscow            | RUS     |    8389200 |
| 2454 | Macao             | MAC     |     437500 |
|  608 | Cairo             | EGY     |    6789479 |
| 2515 | Ciudad de México | MEX     |    8591309 |
|  206 | São Paulo        | BRA     |    9968485 |
|  130 | Sydney            | AUS     |    3276207 |
+------+-------------------+---------+------------+

Factsheet

Semi-join materialization

  • Can be used for uncorrelated IN-subqueries. The subselect may use grouping and/or aggregate functions.

  • Is shown in EXPLAIN as type=MATERIALIZED for the subquery, and a line with table=<subqueryN> in the parent select.

  • Is enabled when one has both materialization=on and semijoin=on in the optimizer_switch variable (see the example below).

  • The materialization=on|off flag is shared with Non-semijoin materialization.
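For example, one can verify that both flags are enabled (they are ON by default) with a quick check like this:

SELECT @@optimizer_switch;
SET optimizer_switch='materialization=on,semijoin=on';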

This page is licensed: CC BY-SA / Gnu FDL

Optimizations for Derived Tables

Optimize derived tables in MariaDB Server queries. This section provides techniques and strategies to improve the performance of subqueries and complex joins, enhancing overall query efficiency.

Condition Pushdown into Derived Table Optimization

If a query uses a derived table (or a view), the first action the query optimizer will attempt is to apply the derived-table-merge optimization and merge the derived table into its parent select. However, that optimization is only applicable when the select inside the derived table has a join as its top-level operation. If it uses GROUP BY, DISTINCT, or window functions, the derived-table-merge optimization is not applicable.

In that case, the Condition Pushdown optimization is applicable.

Introduction to Condition Pushdown

Consider an example

CREATE view OCT_TOTALS AS
SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE  order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY customer_id;

SELECT * FROM OCT_TOTALS WHERE customer_id=1

The naive way to execute the above is to

  1. Compute the OCT_TOTALS contents (for all customers).

  2. Then, select the line with customer_id=1.

This is obviously inefficient: if there are 1000 customers, one will be doing up to 1000 times more work than necessary.

However, the optimizer can take the condition customer_id=1 and push it down into the OCT_TOTALS view.

Instead, the optimizer can apply the condition inside the view, so that the grouping in OCT_TOTALS is computed only for the customer of interest rather than for all customers.
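Conceptually, the pushed-down query behaves like the following hand-rewritten statement (shown here only as a sketch of the idea, not the optimizer's literal output):

SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
  order_date BETWEEN '2017-10-01' AND '2017-10-31'
  AND customer_id=1
GROUP BY customer_id;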

Condition Pushdown Properties

  • Condition Pushdown has been available since MariaDB 10.2.

  • The Jira task for it was MDEV-9197.

  • The optimization is enabled by default. One can disable it by setting the @@optimizer_switch flag condition_pushdown_for_derived to OFF, as shown below.
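For example (a minimal sketch):

SET optimizer_switch='condition_pushdown_for_derived=off';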

See Also

  • Condition Pushdown through Window Functions (since MariaDB 10.3)

  • Condition Pushdown into IN Subqueries (since MariaDB 10.4)

This page is licensed: CC BY-SA / Gnu FDL

Derived Table Merge Optimization

Background

Users of "big" database systems are used to using FROM subqueries as a way to structure their queries. For example, if one's first thought was to select cities with population greater than 10,000 people, and then that from these cities to select those that are located in Germany, one could write this SQL:

SELECT * 
FROM 
  (SELECT * FROM City WHERE Population > 10*1000) AS big_city
WHERE 
  big_city.Country='DEU'

For MySQL, using such syntax was taboo. If you run EXPLAIN for this query, you can see why:

mysql> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
  AS big_city WHERE big_city.Country='DEU' ;
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL | 4068 | Using where |
|  2 | DERIVED     | City       | ALL  | Population    | NULL | NULL    | NULL | 4079 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.60 sec)

It plans to do the following actions:

derived-inefficent

From left to right:

  1. Execute the subquery: (SELECT * FROM City WHERE Population > 1*1000), exactly as it was written in the query.

  2. Put result of the subquery into a temporary table.

  3. Read back, and apply a WHERE condition from the upper select, big_city.Country='DEU'

Executing a subquery like this is very inefficient, because the highly selective condition from the parent select (Country='DEU') is not used when scanning the base table City. We read too many records from the City table, and then we have to write them into a temporary table and read them back again, before finally filtering them out.

Derived table merge in action

If one runs this query in MariaDB/MySQL 5.6, they get this:

MariaDB [world]> EXPLAIN SELECT * FROM (SELECT * FROM City WHERE Population > 1*1000) 
  AS big_city WHERE big_city.Country='DEU';
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
| id | select_type | table | type | possible_keys      | key     | key_len | ref   | rows | Extra                              |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
|  1 | SIMPLE      | City  | ref  | Population,Country | Country | 3       | const |   90 | Using index condition; Using where |
+----+-------------+-------+------+--------------------+---------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)

From the above, one can see that:

  1. The output has only one line. This means that the subquery has been merged into the top-level SELECT.

  2. Table City is accessed through an index on the Country column. Apparently, the Country='DEU' condition was used to construct ref access on the table.

  3. The query will read about 90 rows, which is a big improvement over the 4079 row reads plus 4068 temporary table reads/writes we had before.

Factsheet

  • Derived tables (subqueries in the FROM clause) can be merged into their parent select when they have no grouping, aggregates, or ORDER BY ... LIMIT clauses. These requirements are the same as requirements for VIEWs to allow algorithm=merge.

  • The optimization is enabled by default. It can be disabled with:

SET @@optimizer_switch='derived_merge=OFF'
  • Versions of MySQL and MariaDB which do not have support for this optimization will execute subqueries even when running EXPLAIN. This can result in a well-known problem (see e.g. MySQL Bug #44802) of EXPLAIN statements taking a very long time. Starting from MariaDB 5.3 and MySQL 5.6, EXPLAIN commands execute instantly, regardless of the derived_merge setting.

See Also

  • FAQ entry: Why is ORDER BY in a FROM subquery ignored?

This page is licensed: CC BY-SA / Gnu FDL

Derived Table with Key Optimization

The idea

If a derived table cannot be merged into its parent SELECT, it will be materialized in a temporary table, and the parent select will then treat it as a regular base table.

Before MariaDB 5.3/MySQL 5.6, the temporary table would never have any indexes, and the only way to read records from it would be a full table scan. Starting from the mentioned versions of the server, the optimizer has an option to create an index and use it for joins with other tables.

Example

Consider a query: we want to find countries in Europe that have more than one million people living in cities. This is accomplished with the following query:

SELECT * 
FROM
   Country, 
   (select 
       sum(City.Population) AS urban_population, 
       City.Country 
    FROM City 
    GROUP BY City.Country 
    HAVING 
    urban_population > 1*1000*1000
   ) AS cities_in_country
WHERE 
  Country.Code=cities_in_country.Country AND Country.Continent='Europe';

The EXPLAIN output for it will show:

+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
| id | select_type | table      | type | possible_keys     | key       | key_len | ref                | rows | Extra                           |
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+
|  1 | PRIMARY     | Country    | ref  | PRIMARY,continent | continent | 17      | const              |   60 | Using index condition           |
|  1 | PRIMARY     | <derived2> | ref  | key0              | key0      | 3       | world.Country.Code |   17 |                                 |
|  2 | DERIVED     | City       | ALL  | NULL              | NULL      | NULL    | NULL               | 4079 | Using temporary; Using filesort |
+----+-------------+------------+------+-------------------+-----------+---------+--------------------+------+---------------------------------+

One can see here that

  • table <derived2> is accessed through key0.

  • ref column shows world.Country.Code

  • if we look that up in the original query, we find the equality that was used to construct ref access: Country.Code=cities_in_country.Country.

Factsheet

  • The idea of "derived table with key" optimization is to let the materialized derived table have one key which is used for joins with other tables.

  • The optimization is applied when the derived table cannot be merged into its parent SELECT

    • which happens when the derived table doesn't meet the criteria for a mergeable VIEW

  • The optimization is ON by default; it can be switched off like so:

SET optimizer_switch='derived_with_keys=off'

See Also

  • Optimizing Subqueries in the FROM Clause in MySQL 5.6 manual

  • What is MariaDB 5.3

  • Subquery Optimizations Map

This page is licensed: CC BY-SA / Gnu FDL

Lateral Derived Optimization

MariaDB supports the Lateral Derived optimization, also referred to as "Split Grouping Optimization" or "Split Materialized Optimization" in some sources.

Description

The optimization's use case is

  • The query uses a derived table (or a VIEW, or a non-recursive CTE)

  • The derived table/View/CTE has a GROUP BY operation as its top-level operation

  • The query only needs data from a few GROUP BY groups

An example of this: consider a VIEW that computes totals for each customer in October:

CREATE view OCT_TOTALS AS
SELECT
  customer_id,
  SUM(amount) AS TOTAL_AMT
FROM orders
WHERE
  order_date BETWEEN '2017-10-01' AND '2017-10-31'
GROUP BY
  customer_id;

And a query that does a join with the customer table to get October totals for 'Customer#1' and 'Customer#2':

SELECT *
FROM
  customer, OCT_TOTALS
WHERE
  customer.customer_id=OCT_TOTALS.customer_id AND
  customer.customer_name IN ('Customer#1', 'Customer#2')

Before Lateral Derived optimization, MariaDB would execute the query as follows:

  1. Materialize the view OCT_TOTALS. This essentially computes OCT_TOTALS for all customers.

  2. Join it with table customer.

The EXPLAIN would look like so:

+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key       | key_len | ref                       | rows  | Extra                    |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+
|    1 | PRIMARY     | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2     | Using where; Using index |
|    1 | PRIMARY     | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 36    |                          |
|    2 | DERIVED     | orders     | index | NULL          | o_cust_id | 4       | NULL                      | 36738 | Using where              |
+------+-------------+------------+-------+---------------+-----------+---------+---------------------------+-------+--------------------------+

It is obvious that Step #1 is very inefficient: we compute totals for all customers in the database, while we will only need them for two customers. (If there are 1000 customers, we are doing 500x more work than needed here)

Lateral Derived optimization addresses this case. It turns the computation of OCT_TOTALS into what SQL Standard refers to as "LATERAL subquery": a subquery that may have dependencies on the outside tables. This allows pushing the equality customer.customer_id=OCT_TOTALS.customer_id down into the derived table/view, where it can be used to limit the computation to compute totals only for the customer of interest.

The query plan will look as follows:

  1. Scan table customer and find customer_id for Customer#1 and Customer#2.

  2. For each customer_id, compute the October totals, for this specific customer.

The EXPLAIN output will look like so:

+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
| id   | select_type     | table      | type  | possible_keys | key       | key_len | ref                       | rows | Extra                    |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+
|    1 | PRIMARY         | customer   | range | PRIMARY,name  | name      | 103     | NULL                      | 2    | Using where; Using index |
|    1 | PRIMARY         | <derived2> | ref   | key0          | key0      | 4       | test.customer.customer_id | 2    |                          |
|    2 | LATERAL DERIVED | orders     | ref   | o_cust_id     | o_cust_id | 4       | test.customer.customer_id | 1    | Using where              |
+------+-----------------+------------+-------+---------------+-----------+---------+---------------------------+------+--------------------------+

Note the line with id=2: select_type is LATERAL DERIVED. And table orders uses ref access referring to customer.customer_id, which is normally not allowed for derived tables.

In EXPLAIN FORMAT=JSON output, the optimization is shown like so:

...
        "table": {
          "table_name": "<derived2>",
          "access_type": "ref",
...
          "materialized": {
            "lateral": 1,

Note the "lateral": 1 member.

Controlling the Optimization

Lateral Derived is enabled by default; the optimizer will make a cost-based decision about whether the optimization should be used.

If you need to disable the optimization, it has an optimizer_switch flag. It can be disabled like so:

SET optimizer_switch='split_materialized=off'

References

  • Jira task: MDEV-13369

  • Commit: b14e2b044b

This page is licensed: CC BY-SA / Gnu FDL

Split Materialized Optimization

This is another name for Lateral Derived Optimization.

This page is licensed: CC BY-SA / Gnu FDL

Statistics for Optimizing Queries

Utilize statistics to optimize queries in MariaDB Server. This section explains how the database uses statistical information to generate efficient query execution plans and improve performance.

Engine-Independent Table Statistics

Introduction

Before MariaDB 10.0, the MySQL/MariaDB optimizer relied on storage engines (e.g. InnoDB) to provide statistics for the query optimizer. This approach worked; however it had some deficiencies:

  • Storage engines provided poor statistics (this was fixed to some degree with the introduction of Persistent Statistics).

  • The statistics were supplied through the MySQL Storage Engine Interface, which puts a lot of restrictions on what kind of data is supplied (for example, there is no way to get any data about value distribution in a non-indexed column)

  • There was little control of the statistics. There was no way to "pin" current statistic values, or provide some values on your own, etc.

Engine-independent table statistics lift these limitations.

  • Statistics are stored in regular tables in the mysql database.

    • it is possible for a DBA to read and update the values.

  • More data is collected/used.

Histogram-based statistics are a subset of engine-independent table statistics that can improve the query plan chosen by the optimizer in certain situations.

Statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.

Use or update of data from these tables is controlled by use_stat_tables variable. Possible values are listed below:

  • 'never': The optimizer doesn't use data from statistics tables.

  • 'complementary': The optimizer uses data from statistics tables if the same kind of data is not provided by the storage engine.

  • 'preferably': Prefer the data from statistics tables; if it's not available there, use the data from the storage engine.

  • 'complementary_for_queries': Same as complementary, but for queries only (to avoid needlessly collecting for ANALYZE TABLE).

  • 'preferably_for_queries': Same as preferably, but for queries only (to avoid needlessly collecting for ANALYZE TABLE). This is the default.

Collecting Statistics with the ANALYZE TABLE Statement

Engine-independent statistics are collected by doing full table and full index scans, and this process can be quite expensive.

The ANALYZE TABLE statement can be used to collect table statistics. However, simply running ANALYZE TABLE table_name does not collect engine-independent (or histogram) statistics by default.

When the ANALYZE TABLE statement is executed, MariaDB makes a call to the table's storage engine, and the storage engine collects its own statistics for the table. The specific behavior depends on the storage engine. For the default InnoDB storage engine, see InnoDB Persistent Statistics for more information.

ANALYZE TABLE may also collect engine-independent statistics for the table. The specific behavior depends on the value of the use_stat_tables system variable. Engine-independent statistics will only be collected if one of the following is true:

  • The use_stat_tables system variable is set to complementary or preferably.

  • The ANALYZE TABLE statement includes the PERSISTENT FOR clause.

The use_stat_tables system variable is set to preferably_for_queries by default. With this value, engine-independent statistics are used by default if available, but they are not collected by default. If you want to use engine-independent statistics with the default configuration, then you will have to collect them by executing the ANALYZE TABLE statement with the PERSISTENT FOR clause. It is recommended to collect engine-independent statistics on an as-needed basis, so typically one will not have engine-independent statistics for all indexes/all columns.

When to collect statistics is very dependent on the dataset. If data changes frequently it may be necessary to collect statistics more frequently, and the benefits may be very noticeable (see This one trick can make MariaDB 30x faster!). If the data distribution is relatively static, the costs of collecting may outweigh any benefits.

Collecting Statistics for Specific Columns or Indexes

The syntax for the ANALYZE TABLE statement has been extended with the PERSISTENT FOR clause. This clause allows one to collect engine-independent statistics only for particular columns or indexes. This clause also allows one to collect engine-independent statistics, regardless of the value of the use_stat_tables system variable. For example:

ANALYZE TABLE table_name PERSISTENT FOR ALL;

Statistics for columns using the BLOB and TEXT data types are not collected. If a column using one of these types is explicitly specified, then a warning is returned.

Examples of Statistics Collection

-- update all engine-independent statistics for all columns and indexes
ANALYZE TABLE tbl PERSISTENT FOR ALL;

-- update specific columns and indexes:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES (idx1,idx2,...);

-- empty lists are allowed:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS (col1,col2,...) INDEXES ();
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES (idx1,idx2,...);

-- the following will only update mysql.table_stats fields:
ANALYZE TABLE tbl PERSISTENT FOR COLUMNS () INDEXES ();

-- when use_stat_tables is set to 'COMPLEMENTARY' or 'PREFERABLY', 
-- a simple ANALYZE TABLE  collects engine-independent statistics for all columns and indexes.
SET SESSION use_stat_tables='COMPLEMENTARY';
ANALYZE TABLE tbl;

Manual Updates to Statistics Tables

Statistics are stored in three tables, mysql.table_stats, mysql.column_stats and mysql.index_stats.

It is possible to update the statistics tables manually. One should modify the table(s) with regular INSERT/UPDATE/DELETE statements. Statistics data will be re-read when the tables are re-opened. One way to force all tables to be re-opened is to issue a FLUSH TABLES command.
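For example, here is a minimal sketch of deleting previously collected statistics for a single table and forcing a re-read; it assumes the db_name and table_name columns of the statistics tables and uses the world.City table from the earlier examples:

DELETE FROM mysql.table_stats  WHERE db_name='world' AND table_name='City';
DELETE FROM mysql.column_stats WHERE db_name='world' AND table_name='City';
DELETE FROM mysql.index_stats  WHERE db_name='world' AND table_name='City';
FLUSH TABLES;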

A few scenarios where one might need to update statistics tables manually:

  • Deleting the statistics. Currently, the ANALYZE TABLE command will collect the statistics, but there is no special command to delete statistics.

  • Running ANALYZE on a different server. To collect engine-independent statistics ANALYZE TABLE does a full table scan, which can put too much load on the server. It is possible to run ANALYZE on the slave, and then take the data from statistics tables on the slave and apply it on the master.

  • In some cases, knowledge of the database allows one to compute statistics manually in a more efficient way than ANALYZE does. One can compute the statistics manually and put them into the database.

See Also

  • Index Statistics

  • InnoDB Persistent Statistics

  • Histogram-based Statistics

  • JSON histograms (mariadb.org blog)

  • Improving MariaDB’s optimizer with better selectivity estimates - Sergey Petrunia - Server Fest 2021 (video)

This page is licensed: CC BY-SA / Gnu FDL

Histogram-Based Statistics

Histogram-based statistics are a mechanism to improve the query plan chosen by the optimizer in certain situations. Before their introduction, all conditions on non-indexed columns were ignored when searching for the best execution plan. Histograms can be collected for both indexed and non-indexed columns, and are made available to the optimizer.

Histogram statistics are stored in the mysql.column_stats table, which stores data for engine-independent table statistics, and so are essentially a subset of engine-independent table statistics.

Histograms are used by default from MariaDB 10.4.3 if they are available. However, histogram statistics are not automatically collected, as collection is expensive, requiring a full table scan. See Collecting Statistics with the ANALYZE TABLE Statement for details.

Consider this example, using the following query:

SELECT * FROM t1,t2 WHERE t1.a=t2.a AND t2.b BETWEEN 1 AND 3;

Let's assume that

  • table t1 contains 100 records

  • table t2 contains 1000 records

  • there is a primary index on t1(a)

  • there is a secondary index on t2(a)

  • there is no index defined on column t2.b

  • the selectivity of the condition t2.b BETWEEN (1,3) is high (~ 1%)

Before histograms were introduced, the optimizer would choose the plan that:

  • accesses t1 using a table scan

  • accesses t2 using index t2(a)

  • checks the condition t2.b BETWEEN 1 AND 3

This plan examines all rows of both tables and performs 100 index look-ups.

With histograms available, the optimizer can choose the following, more efficient plan:

  • accesses table t2 in a table scan

  • checks the condition t2.b BETWEEN 1 AND 3

  • accesses t1 using index t1(a)

This plan also examines all rows from t2, but it performs only 10 look-ups to access 10 rows of table t1.

System Variables

There are a number of system variables that affect histograms.

histogram_size

The histogram_size variable determines the size, in bytes, from 0 to 255, used for a histogram. For histogram_type=SINGLE_PREC_HB this is effectively the number of bins; for histogram_type=DOUBLE_PREC_HB the number of bins is half this value, since each bin takes two bytes. If it is set to 0 (the default for MariaDB 10.4.2 and below), no histograms are created when running an ANALYZE TABLE.

histogram_type

The histogram_type variable determines whether single precision (SINGLE_PREC_HB) or double precision (DOUBLE_PREC_HB) height-balanced histograms are created. From MariaDB 10.4.3, double precision is the default. For MariaDB 10.4.2 and below, single precision is the default.

From MariaDB 10.8, JSON_HB (JSON-format histograms) is also accepted.

optimizer_use_condition_selectivity

The optimizer_use_condition_selectivity variable controls which statistics can be used by the optimizer when looking for the best query execution plan.

  • 1 Use selectivity of predicates as in MariaDB 5.5.

  • 2 Use selectivity of all range predicates supported by indexes.

  • 3 Use selectivity of all range predicates estimated without histogram.

  • 4 Use selectivity of all range predicates estimated with histogram.

  • 5 Additionally use selectivity of certain non-range predicates calculated on record sample.

From MariaDB 10.4.1, the default is 4. Until MariaDB 10.4.0, the default is 1.
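Putting these variables together, collecting and using a histogram might look like the following sketch (it reuses table t2 and its non-indexed column b from the example above):

SET SESSION histogram_size=254;
SET SESSION histogram_type='DOUBLE_PREC_HB';
ANALYZE TABLE t2 PERSISTENT FOR COLUMNS (b) INDEXES ();
-- let the optimizer use the collected histogram for selectivity estimates
SET SESSION optimizer_use_condition_selectivity=4;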

Example

Here is an example of the dramatic impact histogram-based statistics can make. The query is based on DBT3 Benchmark Q20, with 60 million records in the lineitem table.

SELECT SQL_CALC_FOUND_ROWS s_name, s_address FROM 
supplier, nation WHERE 
  s_suppkey IN
    (select ps_suppkey from partsupp where
      ps_partkey IN (SELECT p_partkey FROM part WHERE 
         p_name LIKE 'forest%') AND 
    ps_availqty > 
      (select 0.5 * sum(l_quantity) from lineitem where
        l_partkey = ps_partkey AND l_suppkey = ps_suppkey AND
        l_shipdate >= DATE('1994-01-01') AND
        l_shipdate < DATE('1994-01-01') + INTERVAL '1' year ))
  AND s_nationkey = n_nationkey
  AND n_name = 'CANADA'
  ORDER BY s_name
  LIMIT 10;

First,

SET optimizer_switch='materialization=off,semijoin=off';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation   | ALL   |...| 25   |100.00 | Using where;...
| 1 | PRIMARY | supplier | ref   |...| 1447 |100.00 | Using where; Subq
| 2 | DEP SUBQ| partsupp | idxsq |...| 38   |100.00 | Using where
| 4 | DEP SUBQ| lineitem | ref   |...| 3    |100.00 | Using where
| 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(51.78 sec)

Next, a really bad plan, yet one sometimes chosen:

+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | supplier | ALL   |...|100381|100.00 | Using where; Subq
| 1 | PRIMARY | nation   | ref   |...| 1    |100.00 | Using where
| 2 | DEP SUBQ| partsupp | idxsq |...| 38   |100.00 | Using where
| 4 | DEP SUBQ| lineitem | ref   |...| 3    |100.00 | Using where
| 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(7 min 33.42 sec)

Persistent statistics don't improve matters:

SET use_stat_tables='preferably';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows | filt | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | supplier | ALL   |...|10000 |100.00 | Using where;
| 1 | PRIMARY | nation   | ref   |...| 1    |100.00 | Using where
| 2 | DEP SUBQ| partsupp | idxsq |...| 80   |100.00 | Using where
| 4 | DEP SUBQ| lineitem | ref   |...| 7    |100.00 | Using where
| 3 | DEP SUBQ| part     | unqsb |...| 1    |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(7 min 40.44 sec)

The default flags for optimizer_switch do not help much:

SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows  | filt  | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | supplier | ALL   |...|10000  |100.00 | Using where;
| 1 | PRIMARY | nation   | ref   |...| 1     |100.00 | Using where
| 1 | PRIMARY | <subq2>  | eq_ref|...| 1     |100.00 |
| 2 | MATER   | part     | ALL   |.. |2000000|100.00 | Using where
| 2 | MATER   | partsupp | ref   |...| 4     |100.00 | Using where; Subq
| 4 | DEP SUBQ| lineitem | ref   |...| 7     |100.00 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(5 min 21.44 sec)

Using statistics doesn't help either:

SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
SET optimizer_use_condition_selectivity=4;

+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows  | filt  | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation   | ALL   |...| 25    |4.00   | Using where
| 1 | PRIMARY | supplier | ref   |...| 4000  |100.00 | Using where;
| 1 | PRIMARY | <subq2>  | eq_ref|...| 1     |100.00 |
| 2 | MATER   | part     | ALL   |.. |2000000|1.56   | Using where
| 2 | MATER   | partsupp | ref   |...| 4     |100.00 | Using where; Subq
| 4 | DEP SUBQ| lineitem | ref   |...| 7     | 30.72 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(5 min 22.41 sec)

Now, taking into account the cost of the dependent subquery:

SET optimizer_switch='materialization=DEFAULT,semijoin=DEFAULT';
SET optimizer_use_condition_selectivity=4;
SET optimizer_switch='expensive_pred_static_pushdown=ON';
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows | filt  | Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation   | ALL   |...| 25   | 4.00  | Using where
| 1 | PRIMARY | supplier | ref   |...| 4000 |100.00 | Using where;
| 2 | PRIMARY | partsupp | ref   |...| 80   |100.00 |
| 2 | PRIMARY | part     | eq_ref|...| 1    | 1.56  | where; Subq; FM
| 4 | DEP SUBQ| lineitem | ref   |...| 7    | 30.72 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(49.89 sec)

Finally, using join_buffer as well:

SET optimizer_switch= 'materialization=DEFAULT,semijoin=DEFAULT';
SET optimizer_use_condition_selectivity=4;
SET optimizer_switch='expensive_pred_static_pushdown=ON';
SET join_cache_level=6;
SET optimizer_switch='mrr=ON';
SET optimizer_switch='mrr_sort_keys=ON';
SET join_buffer_size=1024*1024*16;
SET join_buffer_space_limit=1024*1024*32;
+---+-------- +----------+-------+...+------+----------+------------
| id| sel_type| table    | type  |...| rows | filt |  Extra
+---+-------- +----------+-------+...+------+----------+------------
| 1 | PRIMARY | nation   | ALL   |...| 25   | 4.00  | Using where
| 1 | PRIMARY | supplier | ref   |...| 4000 |100.00 | where; BKA
| 2 | PRIMARY | partsupp | ref   |...| 80   |100.00 | BKA
| 2 | PRIMARY | part     | eq_ref|...| 1    | 1.56  | where Sq; FM; BKA
| 4 | DEP SUBQ| lineitem | ref   |...| 7    | 30.72 | Using where
+---+-------- +----------+-------+...+------+----------+------------

10 ROWS IN SET
(35.71 sec)

See Also

  • DECODE_HISTOGRAM()

  • Index Statistics

  • InnoDB Persistent Statistics

  • Engine-independent Statistics

  • JSON Histograms (mariadb.org blog)

  • Improved histograms in MariaDB 10.8 - Sergei Petrunia - FOSDEM 2022 (video)

  • Improving MariaDB’s optimizer with better selectivity estimates - Sergey Petrunia - Server Fest 2021 (video)

This page is licensed: CC BY-SA / Gnu FDL

InnoDB Persistent Statistics

InnoDB statistics are stored on disk and are therefore persistent. Prior to MariaDB 10.0, InnoDB statistics were not stored on disk, meaning that on server restarts the statistics would need to be recalculated, which is both needless computation and a source of inconsistent query plans.

There are a number of variables that control persistent statistics:

  • innodb_stats_persistent - when set (the default) enables InnoDB persistent statistics.

  • innodb_stats_auto_recalc - when set (the default), persistent statistics are automatically recalculated when the table changes significantly (more than 10% of the rows)

  • innodb_stats_persistent_sample_pages - Number of index pages sampled (default 20) when estimating cardinality and statistics for indexed columns. Increasing this value increases index statistics accuracy, but uses more I/O resources when running ANALYZE TABLE.

These settings can be overwritten on a per-table basis by use of the STATS_PERSISTENT, STATS_AUTO_RECALC and STATS_SAMPLE_PAGES clauses in a CREATE TABLE or ALTER TABLE statement.
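For example, the per-table overrides might be applied like this (a sketch; the table name orders is only illustrative):

ALTER TABLE orders
  STATS_PERSISTENT=1, STATS_AUTO_RECALC=0, STATS_SAMPLE_PAGES=50;
-- recalculate the persistent statistics immediately
ANALYZE TABLE orders;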

Details of the statistics are stored in two system tables in the mysql database:

  • innodb_table_stats

  • innodb_index_stats

The ANALYZE TABLE statement can be used to recalculate InnoDB statistics.

The RENAME TABLE statement triggers a reload of the statistics.

MariaDB starting with 10.11.12

Prior to MariaDB 10.11.12, MariaDB 11.4.6 and MariaDB 11.8.2, FLUSH TABLES also caused InnoDB statistics to be reloaded. From MariaDB 10.11.12, MariaDB 11.4.6 and MariaDB 11.8.2, this is no longer the case.

See Also

  • Index Statistics

  • Engine-independent Statistics

  • Histogram-based Statistics

This page is licensed: CC BY-SA / Gnu FDL

Slow Query Log Extended Statistics

Overview

  • Added extra logging to slow log of 'Thread_id, Schema, Query Cache hit, Rows sent and Rows examined'

  • Added optional logging to slow log, through log_slow_verbosity, of query plan statistics

  • Added new session variables log_slow_rate_limit, log_slow_verbosity, log_slow_filter

  • Added log-slow-file as a synonym for 'slow-log-file', as most slow-log variables start with 'log-slow'

  • Added log-slow-time as a synonym for long-query-time.

Session Variables

log_slow_verbosity

You can set the verbosity of what's logged to the slow query log by setting the log_slow_verbosity variable to a combination of the following values:

  • All (From MariaDB 10.6.16)

    • Enable all verbosity options.

  • Query_plan

    • For select queries, log information about the query plan. This includes "Full_scan", "Full_join", "Tmp_table", "Tmp_table_on_disk", "Filesort", "Filesort_on_disk" and number of "Merge_passes during sorting"

  • explain

    • EXPLAIN output is logged in the slow query log. See explain-in-the-slow-query-log for details.

  • Innodb (From MariaDB 10.6.15. Before that this option did nothing)

    • Kept for compatibility. Same as engine.

  • engine (From MariaDB 10.6.15)

    • Writes statistics from the storage engine. This includes:

      • Pages_accessed: Number of pages accessed from the page buffer (innodb-buffer-pool / key cache). Engine: InnoDB

      • Pages_updated: Number of pages updated in memory. Engine: InnoDB

      • Pages_read_time: Milliseconds spent reading pages from storage. Engine: InnoDB

      • Old_rows_read: Number of retrievals of old versions of rows in the engine (versioning). Engine: InnoDB

      • Engine_time: Milliseconds spent inside engine calls (read_row / read_next_row etc). Engine: All

  • Warnings (From MariaDB 10.6.16)

    • Print all errors, warnings and notes related to the statement, up to log_slow_max_warnings lines.

  • full

    • Old shortcut to enable all the verbosity options.

The default value for log_slow_verbosity is '' (empty), to be compatible with MySQL 5.1.

The possible values for log_slow_verbosity are innodb, query_plan, explain, engine, warnings. Multiple options are separated by ','. log_slow_verbosity is not supported when log_output='TABLE'.

In the future we will add more engine statistics and also support for other engines.

log_slow_filter

You can define which queries to log to the slow query log by setting the variable log_slow_filter to a combination of the following values:

  • All (From MariaDB 10.6.16)

    • Enable all filter options. log_slow_filter will be shown as having all options set.

  • admin

    • Log administrative statements (create, optimize, drop etc...)

    • log_slow_admin_statements maps to this option.

  • filesort

    • Log statement if it uses filesort

  • filesort_on_disk

    • Log statement if it uses filesort that needs temporary tables on disk

  • filesort_priority_queue (from MariaDB 10.3.2)

    • Log statement if it uses filesort with priority_queue (filesort can either use disk or priority queue).

  • full_join

    • Log statements that don't use indexes to join tables

  • full_scan

    • Log statements that use full table scans

  • not_using_index (From MariaDB 10.3.1)

    • Logs queries that don't use an index, or that perform a full index scan where the index doesn't limit the number of rows

    • Disregards long_query_time, unlike other options!

    • log_queries_not_using_indexes maps to this option

  • query_cache

    • Log statements that are resolved by the query cache

  • query_cache_miss

    • Log statements that are not resolved by the query cache

  • tmp_table

    • Log statements that use in-memory temporary tables

  • tmp_table_on_disk

    • Log statements that use temporary tables on disk

Multiple options are separated by ','. If you don't specify any options, everything will be logged (same as setting the value to All).

log_slow_rate_limit

The log_slow_rate_limit variable limits logging to the slow query log by not logging every query (only one query per log_slow_rate_limit queries is logged). This is mostly useful when debugging, if you are getting too much information in the slow query log.

Note that, in any case, only queries that take longer than log_slow_time or long_query_time are logged (as before).
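As a combined sketch, the variables described above could be set like this (the values are only illustrative):

SET SESSION log_slow_verbosity='query_plan,explain';
SET SESSION log_slow_filter='full_scan,full_join,tmp_table_on_disk,filesort_on_disk';
-- log roughly one out of every ten matching queries
SET SESSION log_slow_rate_limit=10;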

log_slow_max_warnings

MariaDB starting with 10.6.16

If one enables the warnings option for log_slow_verbosity, all notes and warnings for a slow query will also be added to the slow query log. This is very useful when one has enabled warnings for Notes when an index cannot be used. log_slow_max_warnings limits the number of warnings printed to the slow query log per query. The default value is 10.

Credits

Part of this addition is based on the microslow patch from Percona.

See also

  • Notes when an index cannot be used because of type conversions

This page is licensed: CC BY-SA / Gnu FDL

User Statistics

The User Statistics (userstat) plugin creates the USER_STATISTICS, CLIENT_STATISTICS, the INDEX_STATISTICS, and the TABLE_STATISTICS tables in the INFORMATION_SCHEMA database. As an alternative to these tables, the plugin also adds the SHOW USER_STATISTICS, the SHOW CLIENT_STATISTICS, the SHOW INDEX_STATISTICS, and the SHOW TABLE_STATISTICS statements.

These tables and commands can be used to understand the server activity better and to identify the sources of your database's load.

The plugin also adds the FLUSH USER_STATISTICS, FLUSH CLIENT_STATISTICS, FLUSH INDEX_STATISTICS, and FLUSH TABLE_STATISTICS statements.

The MariaDB implementation of this plugin is based on the userstatv2 patch from Percona and Ourdelta. The original code comes from Google (Mark Callaghan's team) with additional work from Percona, Ourdelta, and Weldon Whipple. The MariaDB implementation provides the same functionality as the userstatv2 patch but a lot of changes have been made to make it faster and to better fit the MariaDB infrastructure.

How it Works

The userstat plugin works by keeping several hash tables in memory. All variables are incremented while the query is running. At the end of each statement the global values are updated.

Enabling the Plugin

By default statistics are not collected. This is to ensure that statistics collection does not cause any extra load on the server unless desired.

Set the userstat=ON system variable in a relevant server option group in an option file to enable the plugin. For example:

[mariadb]
...
userstat = 1

The value can also be changed dynamically. For example:

SET GLOBAL userstat=1;

Using the Plugin

Using the Information Schema Table

The userstat plugin creates the USER_STATISTICS, CLIENT_STATISTICS, the INDEX_STATISTICS, and the TABLE_STATISTICS tables in the INFORMATION_SCHEMA database.

SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS\G
*************************** 1. row ***************************
                  USER: root
     TOTAL_CONNECTIONS: 1
CONCURRENT_CONNECTIONS: 0
        CONNECTED_TIME: 297
             BUSY_TIME: 0.001725
              CPU_TIME: 0.001982
        BYTES_RECEIVED: 388
            BYTES_SENT: 2327
  BINLOG_BYTES_WRITTEN: 0
             ROWS_READ: 0
             ROWS_SENT: 12
          ROWS_DELETED: 0
         ROWS_INSERTED: 13
          ROWS_UPDATED: 0
       SELECT_COMMANDS: 4
       UPDATE_COMMANDS: 0
        OTHER_COMMANDS: 3
   COMMIT_TRANSACTIONS: 0
 ROLLBACK_TRANSACTIONS: 0
    DENIED_CONNECTIONS: 0
      LOST_CONNECTIONS: 0
         ACCESS_DENIED: 0
         EMPTY_QUERIES: 1
SELECT * FROM INFORMATION_SCHEMA.CLIENT_STATISTICS\G
*************************** 1. row ***************************
                CLIENT: localhost
     TOTAL_CONNECTIONS: 3
CONCURRENT_CONNECTIONS: 0
        CONNECTED_TIME: 4883
             BUSY_TIME: 0.009722
              CPU_TIME: 0.0102131
        BYTES_RECEIVED: 841
            BYTES_SENT: 13897
  BINLOG_BYTES_WRITTEN: 0
             ROWS_READ: 0
             ROWS_SENT: 214
          ROWS_DELETED: 0
         ROWS_INSERTED: 207
          ROWS_UPDATED: 0
       SELECT_COMMANDS: 10
       UPDATE_COMMANDS: 0
        OTHER_COMMANDS: 13
   COMMIT_TRANSACTIONS: 0
 ROLLBACK_TRANSACTIONS: 0
    DENIED_CONNECTIONS: 0
      LOST_CONNECTIONS: 0
         ACCESS_DENIED: 0
         EMPTY_QUERIES: 1
1 row in set (0.00 sec)
SELECT * FROM INFORMATION_SCHEMA.INDEX_STATISTICS WHERE TABLE_NAME = "author";
+--------------+------------+------------+-----------+
| TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | ROWS_READ |
+--------------+------------+------------+-----------+
| books        | author     | by_name    |        15 |
+--------------+------------+------------+-----------+
SELECT * FROM INFORMATION_SCHEMA.TABLE_STATISTICS WHERE TABLE_NAME='user';
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| mysql        | user       |         5 |            2 |                      2 |
+--------------+------------+-----------+--------------+------------------------+

Using the SHOW Statements

As an alternative to the INFORMATION_SCHEMA tables, the userstat plugin also adds the SHOW USER_STATISTICS, the SHOW CLIENT_STATISTICS, the SHOW INDEX_STATISTICS, and the SHOW TABLE_STATISTICS statements.

These commands are another way to display the information stored in the information schema tables. WHERE clauses are accepted. LIKE clauses are accepted but ignored.

SHOW USER_STATISTICS
SHOW CLIENT_STATISTICS
SHOW INDEX_STATISTICS
SHOW TABLE_STATISTICS

Flushing Plugin Data

The userstat plugin also adds the FLUSH USER_STATISTICS, FLUSH CLIENT_STATISTICS, FLUSH INDEX_STATISTICS, and FLUSH TABLE_STATISTICS statements, which discard the information stored in the specified information schema table.

FLUSH USER_STATISTICS
FLUSH CLIENT_STATISTICS
FLUSH INDEX_STATISTICS
FLUSH TABLE_STATISTICS

Versions

USER_STATISTICS

  • Version 2.0, Stable: introduced in MariaDB 10.1.18

  • Version 2.0, Gamma: introduced in MariaDB 10.1.1

CLIENT_STATISTICS

  • Version 2.0, Stable: introduced in MariaDB 10.1.13

  • Version 2.0, Gamma: introduced in MariaDB 10.1.1

INDEX_STATISTICS

  • Version 2.0, Stable: introduced in MariaDB 10.1.13

  • Version 2.0, Gamma: introduced in MariaDB 10.1.1

TABLE_STATISTICS

  • Version 2.0, Stable: introduced in MariaDB 10.1.18

  • Version 2.0, Gamma: introduced in MariaDB 10.1.1

System Variables

userstat

  • Description: If set to 1, user statistics will be activated.

  • Commandline: --userstat=1

  • Scope: Global

  • Dynamic: Yes

  • Data Type: boolean

  • Default Value: OFF

Status Variables

User Statistics introduced a number of new status variables:

  • access_denied_errors

  • binlog_bytes_written

  • busy_time (requires userstat to be set to be recorded)

  • cpu_time (requires userstat to be set to be recorded)

  • empty_queries

  • rows_read

  • rows_sent

This page is licensed: CC BY-SA / Gnu FDL

Subquery Optimizations

Optimize subqueries in MariaDB Server for improved performance. This section provides techniques and best practices to ensure your nested queries execute efficiently and enhance overall query speed.

Condition Pushdown Into IN subqueries

This article describes Condition Pushdown into IN subqueries as implemented in MDEV-12387.

optimizer_switch flag name: condition_pushdown_for_subquery.
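Like other optimizer_switch flags, it can be toggled per session, for example:

SET optimizer_switch='condition_pushdown_for_subquery=off';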

This page is licensed: CC BY-SA / Gnu FDL

Conversion of Big IN Predicates Into Subqueries

Starting from MariaDB 10.3, the optimizer converts certain big IN predicates into IN subqueries.

That is, an IN predicate in the form

COLUMN [NOT] IN (const1, const2, .... )

is converted into an equivalent IN-subquery:

column [NOT] IN (select ... from temporary_table)

which opens new opportunities for the query optimizer.

The conversion happens if the following conditions are met:

  • the IN list has more than 1000 elements (One can control it through the in_predicate_conversion_threshold parameter).

  • the [NOT] IN condition is at the top level of the WHERE/ON clause.

Controlling the Optimization

  • The optimization is on by default. MariaDB 10.3.18 (and debug builds prior to that) introduced the in_predicate_conversion_threshold variable. Set to 0 to disable the optimization.
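For example (the values are only illustrative):

-- disable the conversion for the current session
SET SESSION in_predicate_conversion_threshold=0;
-- or raise the threshold so that only very large IN lists are converted
SET SESSION in_predicate_conversion_threshold=5000;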

Benefits of the Optimization

If column is a key-prefix, MariaDB optimizer will process the condition

COLUMN [NOT] IN (const1, const2, .... )

by trying to construct a range access. If the list is large, the analysis may take a lot of memory and CPU time. The problem gets worse when column is a part of a multi-column index and the query has conditions on other parts of the index.

Conversion of IN predicates into subqueries bypasses the range analysis, which means the query optimization phase will use less CPU and memory.

Possible disadvantages of the conversion are:

  • The optimization may convert 'IN LIST elements' key accesses to a table scan (if there is no other usable index for the table)

  • The estimates for the number of rows matching the IN (...) are less precise.

See Also

  • IN operator

Links

MDEV-12176

This page is licensed: CC BY-SA / Gnu FDL

EXISTS-to-IN Optimization

MySQL (including MySQL 5.6) has only one execution strategy for EXISTS subqueries. The strategy is essentially the straightforward, "naive" execution, without any rewrites.

MariaDB 5.3 introduced a rich set of optimizations for IN subqueries. Since then, it makes sense to convert an EXISTS subquery into an IN so that the new optimizations can be used.

EXISTS will be converted into IN in two cases:

  1. Trivially correlated EXISTS subqueries

  2. Semi-join EXISTS

We will now describe these two cases in detail

Trivially-correlated EXISTS subqueries

Often, EXISTS subquery is correlated, but the correlation is trivial. The subquery has form

EXISTS (SELECT ...  FROM ... WHERE outer_col= inner_col AND inner_where)

and "outer_col" is the only place where the subquery refers to outside fields. In this case, the subquery can be re-written into uncorrelated IN:

outer_col IN (SELECT inner_col FROM ... WHERE inner_where)

(NULL values require some special handling, see below). For uncorrelated IN subqueries, MariaDB is able to make a cost-based choice between two execution strategies:

  • IN-to-EXISTS (basically, convert back into EXISTS)

  • Materialization

That is, converting a trivially-correlated EXISTS into an uncorrelated IN gives the query optimizer the option to use the Materialization strategy for the subquery.
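For instance, using the World database tables from the earlier examples, the following trivially-correlated EXISTS subquery (a sketch) can be rewritten into an uncorrelated IN:

SELECT Name FROM Country
WHERE EXISTS (SELECT 1 FROM City
              WHERE City.Country=Country.Code AND City.Population > 1*1000*1000);

-- is rewritten as:

SELECT Name FROM Country
WHERE Code IN (SELECT City.Country FROM City WHERE City.Population > 1*1000*1000);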

Currently, the EXISTS->IN conversion works only for subqueries that are at the top level of the WHERE clause, or under a NOT operation that is directly at the top level of the WHERE clause.

Semi-join EXISTS subqueries

If EXISTS subquery is an AND-part of the WHERE clause:

SELECT ... FROM outer_tables WHERE EXISTS (SELECT ...) AND ...

then it satisfies the main property of semi-join subqueries:

with semi-join subquery, we're only interested in records of outer_tables that have matches in the subquery

The semi-join optimizer offers a rich set of execution strategies for both correlated and uncorrelated subqueries. The set includes the FirstMatch strategy, which is equivalent to how EXISTS subqueries are executed, so we do not lose any opportunities when converting an EXISTS subquery into a semi-join.

In theory, it makes sense to convert all kinds of EXISTS subqueries: convert both correlated and uncorrelated ones, irrespective of whether the subquery has an inner=outer equality.

In practice, the subquery will be converted only if it has inner=outer equality. Both correlated and uncorrelated subqueries are converted.

Handling of NULL values

IN has more complicated NULL semantics than EXISTS: an IN predicate can evaluate to NULL, while NOT EXISTS cannot. To preserve the original meaning, the EXISTS-to-IN conversion adds an IS NOT NULL check before the subquery predicate when required.

Control

The optimization is controlled by the exists_to_in flag in optimizer_switch. Before MariaDB 10.0.12, the optimization was OFF by default. Since MariaDB 10.0.12, it has been ON by default.
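For example, the conversion can be switched off like so:

SET optimizer_switch='exists_to_in=off';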

Limitations

EXISTS-to-IN doesn't handle

  • subqueries that have GROUP BY, aggregate functions, or HAVING clause

  • subqueries that are UNIONs

  • a number of degenerate edge cases

This page is licensed: CC BY-SA / Gnu FDL

Non-semi-join Subquery Optimizations

Certain kinds of IN-subqueries cannot be flattened into semi-joins. These subqueries can be both correlated or non-correlated. In order to provide consistent performance in all cases, MariaDB provides several alternative strategies for these types of subqueries. Whenever several strategies are possible, the optimizer chooses the optimal one based on cost estimates.

The two primary non-semi-join strategies are materialization (also called outside-in materialization) and the in-to-exists transformation. Materialization is applicable only for non-correlated subqueries, while in-to-exists can be used both for correlated and non-correlated subqueries.

Applicability

An IN subquery cannot be flattened into a semi-join in the following cases. The examples below use the World database from the MariaDB regression test suite.

Subquery in a disjunction (OR)

The subquery is located directly or indirectly under an OR operation in the WHERE clause of the outer query.

Query pattern:

SELECT ... FROM ... WHERE (expr1, ..., exprN) [NOT] IN (SELECT ... ) OR expr;

Example:

SELECT Name FROM Country
WHERE (Code IN (SELECT Country FROM City WHERE City.Population > 100000) OR
       Name LIKE 'L%') AND
      surfacearea > 1000000;

Negated subquery predicate (NOT IN)

The subquery predicate itself is negated.

Query pattern:

SELECT ... FROM ... WHERE ... (expr1, ..., exprN) NOT IN (SELECT ... ) ...;

Example:

SELECT Country.Name
FROM Country, CountryLanguage 
WHERE Code NOT IN (SELECT Country FROM CountryLanguage WHERE Language = 'English')
  AND CountryLanguage.Language = 'French'
  AND Code = Country;

Subquery in the SELECT or HAVING clause

The subquery is located in the SELECT or HAVING clauses of the outer query.

Query pattern:

SELECT field1, ..., (SELECT ...)  WHERE ...;
SELECT ...  WHERE ... HAVING (SELECT ...);

Example:

SELECT Name, City.id IN (SELECT capital FROM Country WHERE capital IS NOT NULL) AS is_capital
FROM City
WHERE City.population > 10000000;

Subquery with a UNION

The subquery itself is a UNION, while the IN predicate may be anywhere in the query where IN is allowed.

Query pattern:

... [NOT] IN (SELECT ... UNION SELECT ...)

Example:

SELECT * FROM City WHERE (Name, 91) IN
(SELECT Name, round(Population/1000) FROM City WHERE Country = "IND" AND Population > 2500000
UNION
 SELECT Name, round(Population/1000) FROM City WHERE Country = "IND" AND Population < 100000);

Materialization for non-correlated IN-subqueries

Materialization basics

The basic idea of subquery materialization is to execute the subquery and store its result in an internal temporary table indexed on all its columns. Naturally, this is possible only when the subquery is non-correlated. The IN predicate tests whether its left operand is present in the subquery result. Therefore it is not necessary to store duplicate subquery result rows in the temporary table. Storing only unique subquery rows provides two benefits - the size of the temporary table is smaller, and the index on all its columns can be unique.

If the size of the temporary table is less than the tmp_table_size system variable, the table is a hash-indexed in-memory HEAP table. In the rare cases when the subquery result exceeds this limit, the temporary table is stored on disk in an Aria or MyISAM B-tree indexed table (Aria is the default).

Subquery materialization happens on demand during the first execution of the IN predicate. Once the subquery is materialized, the IN predicate is evaluated very efficiently by index lookups of the outer expression into the unique index of the materialized temporary table. If there is a match, IN is TRUE, otherwise IN is FALSE.

NULL-aware efficient execution

An IN predicate may produce a NULL result if there is a NULL value in either of its arguments. Depending on its location in a query, a NULL predicate value may be equivalent to FALSE. These are the cases when substituting NULL with FALSE would reject exactly the same result rows. A NULL result of IN is indistinguishable from FALSE if the IN predicate is:

  • not negated,

  • not a function argument,

  • inside a WHERE or ON clause.

In all these cases the evaluation of IN is performed as described in the previous paragraph via index lookups into the materialized subquery. In all remaining cases when NULL cannot be substituted with FALSE, it is not possible to use index lookups. This is not a limitation in the server, but a consequence of the NULL semantics in the ANSI SQL standard.

Suppose an IN predicate is evaluated as NULL IN (SELECT not_null_col FROM t1), that is, the left operand of IN is a NULL value, and there are no NULLs in the subquery. In this case the value of IN is neither FALSE nor TRUE. Instead it is NULL. If we were to perform an index lookup with the NULL as a key, such a value would not be found in not_null_col, and the IN predicate would incorrectly produce FALSE.

In general, a NULL value on either side of an IN acts as a "wildcard" that matches any value, and if a match exists, the result of IN is NULL. Consider the following example:

If the left argument of IN is the row: (7, NULL, 9), and the result of the right subquery operand of IN is the table:

(7, 8, 10)
(6, NULL, NULL)
(7, 11, 9)

Then the IN predicate matches the row (7, 11, 9), and the result of IN is NULL. Matches where the differing values on either side of the IN arguments are matched by a NULL in the other IN argument are called partial matches.

In order to efficiently compute the result of an IN predicate in the presence of NULLs, MariaDB implements two special algorithms for partial matching, described below in detail.

  • Rowid-merge partial matching: This technique is used when the number of rows in the subquery result is above a certain limit. The technique creates special indexes on some of the columns of the temporary table, and merges them by alternately scanning each index, thus performing an operation similar to set intersection.

  • Table scan partial matching: This algorithm is used for very small tables, when the overhead of the rowid-merge algorithm is not justifiable. The server simply scans the materialized subquery and checks for partial matches. Since this strategy doesn't need any in-memory buffers, it is also used when there is not enough memory to hold the indexes of the rowid-merge strategy.

Limitations

In principle, the subquery materialization strategy is universal; however, due to some technical limitations in the MariaDB server, there are a few cases when the server cannot apply this optimization.

  • BLOB fields: Either the left operand of an IN predicate refers to a BLOB field, or the subquery selects one or more BLOBs.

  • Incomparable fields TODO

In the above cases, the server reverts to the IN-TO-EXISTS transformation.

The IN-TO-EXISTS transformation

This optimization is the only subquery execution strategy that existed in older versions of MariaDB and MySQL prior to MariaDB 5.3. We have made various changes and fixed a number of bugs in this code as well, but in essence it remains the same.

Performance discussion

Example speedup over MySQL 5.x and MariaDB 5.1/5.2

Depending on the query and data, either of the two strategies described here may result in a plan that is orders of magnitude better (or worse) than the plan chosen by the other strategy. Older versions of MariaDB and any current MySQL version (including MySQL 5.5, and the MySQL 5.6 DMR as of July 2011) implement only the IN-TO-EXISTS transformation. As illustrated below, this strategy is inferior to subquery materialization in many common cases.

Consider the following query over the data of the DBT3 benchmark at scale 10, which finds the ten most expensive parts with line items shipped during January 1997:

SELECT * FROM part
WHERE p_partkey IN
      (SELECT l_partkey FROM lineitem
       WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-02-01')
ORDER BY p_retailprice DESC LIMIT 10;

The times to run this query are as follows:

  • Execution time in MariaDB 5.2/MySQL 5.x (any MySQL): > 1 h The query takes more than one hour (we didn't wait longer), which makes it impractical to use subqueries in such cases. The EXPLAIN below shows that the subquery was transformed into a correlated one, which indicates an IN-TO-EXISTS transformation.

+--+------------------+--------+--------------+-------------------+----+------+---------------------------+
|id|select_type       |table   |type          |key                |ref |rows  |Extra                      |
+--+------------------+--------+--------------+-------------------+----+------+---------------------------+
| 1|PRIMARY           |part    |ALL           |NULL               |NULL|199755|Using where; Using filesort|
| 2|DEPENDENT SUBQUERY|lineitem|index_subquery|i_l_suppkey_partkey|func|    14|Using where                |
+--+------------------+--------+--------------+-------------------+----+------+---------------------------+
  • Execution time in MariaDB 5.3: 43 sec In MariaDB 5.3 it takes less than a minute to run the same query. The EXPLAIN shows that the subquery remains uncorrelated, which is an indication that it is being executed via subquery materialization.

+--+------------+-----------+------+------------------+----+------+-------------------------------+
|id|select_type |table      |type  |key               |ref |rows  |Extra                          |
+--+------------+-----------+------+------------------+----+------+-------------------------------+
| 1|PRIMARY     |part       |ALL   |NULL              |NULL|199755|Using temporary; Using filesort|
| 1|PRIMARY     |<subquery2>|eq_ref|distinct_key      |func|     1|                               |
| 2|MATERIALIZED|lineitem   |range |l_shipdate_partkey|NULL|160060|Using where; Using index       |
+--+------------+-----------+------+------------------+----+------+-------------------------------+

The speedup here is practically infinite, because both MySQL and older MariaDB versions cannot complete the query in any reasonable time.

In order to show the benefits of partial matching we extended the customer table from the DBT3 benchmark with two extra columns:

  • c_pref_nationkey - preferred nation to buy from,

  • c_pref_brand - preferred brand.

Both column names are suffixed with the percentage of NULL values in the column; that is, c_pref_nationkey_05 contains 5% NULL values.

Consider the query "Find all customers that didn't buy from a preferred country and a preferred brand within some date ranges":

SELECT count(*)
FROM customer
WHERE (c_custkey, c_pref_nationkey_05, c_pref_brand_05) NOT IN
  (SELECT o_custkey, s_nationkey, p_brand
   FROM orders, supplier, part, lineitem
   WHERE l_orderkey = o_orderkey AND
         l_suppkey = s_suppkey AND
         l_partkey = p_partkey AND
         p_retailprice < 1200 AND
         l_shipdate >= '1996-04-01' AND l_shipdate < '1996-04-05' AND
         o_orderdate >= '1996-04-01' AND o_orderdate < '1996-04-05');
  • Execution time in MariaDB 5.2/MySQL 5.x (any MySQL): 40 sec

  • Execution time in MariaDB 5.3: 2 sec

The speedup for this query is 20 times.

Performance guidelines

TODO

Optimizer control

In certain cases it may be necessary to override the choice of the optimizer. Typically this is needed for benchmarking or testing purposes, or to mimic the behavior of an older version of the server, or if the optimizer made a poor choice.

All the above strategies can be controlled via the following switches in the optimizer_switch system variable.

  • materialization=on/off In some very special cases, even if materialization was forced, the optimizer may still revert to the IN-TO-EXISTS strategy if materialization is not applicable. In the cases when materialization requires partial matching (because of the presence of NULL values), there are two subordinate switches that control the two partial matching strategies:

    • partial_match_rowid_merge=on/off This switch controls the Rowid-merge strategy. In addition to this switch, the system variable rowid_merge_buff_size controls the maximum memory available to the Rowid-merge strategy.

    • partial_match_table_scan=on/off Controls the alternative partial match strategy that performs matching via a table scan.

  • in_to_exists=on/off This switch controls the IN-TO-EXISTS transformation.

  • tmp_table_size and max_heap_table_size system variables The tmp_table_size system variable sets the upper limit for internal MEMORY temporary tables. If an internal temporary table exceeds this size, it is converted automatically into an Aria or MyISAM table on disk with a B-tree index. Note, however, that a MEMORY table cannot be larger than max_heap_table_size.

The two main optimizer switches, materialization and in_to_exists, cannot both be off at the same time. If both are set to off, the server will issue an error.
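As a rough sketch of overriding the defaults, either for a session or for a single statement (the table and column names below are placeholders):

-- Prefer IN-TO-EXISTS for the rest of this session:
SET SESSION optimizer_switch='materialization=off,in_to_exists=on';

-- Disable only the Rowid-merge partial matching strategy for one statement:
SET STATEMENT optimizer_switch='partial_match_rowid_merge=off' FOR
  SELECT * FROM t1 WHERE (a, b) NOT IN (SELECT x, y FROM t2);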

This page is licensed: CC BY-SA / Gnu FDL

Optimizing GROUP BY and DISTINCT Clauses in Subqueries

A DISTINCT clause and a GROUP BY without a corresponding HAVING clause have no meaning in IN/ALL/ANY/SOME/EXISTS subqueries. The reason is that IN/ALL/ANY/SOME/EXISTS only check whether an outer row satisfies some condition with respect to all or any row in the subquery result. Therefore it doesn't matter if the subquery has duplicate result rows or not - if some condition is true for some row of the subquery, it will be true for all duplicates of that row. Notice that GROUP BY without a corresponding HAVING clause is equivalent to DISTINCT.

MariaDB 5.3 and later versions automatically remove DISTINCT and GROUP BY without HAVING if these clauses appear in an IN/ALL/ANY/SOME/EXISTS subquery. For instance:

SELECT * FROM t1
WHERE t1.a > ALL(SELECT DISTINCT b FROM t2 WHERE t2.c > 100)

is transformed to:

SELECT * FROM t1
WHERE t1.a > ALL(SELECT b FROM t2 WHERE t2.c > 100)

Removing these unnecessary clauses allows the optimizer to find more efficient query plans because it doesn't need to take care of post-processing the subquery result to satisfy DISTINCT / GROUP BY.

This page is licensed: CC BY-SA / Gnu FDL

Semi-join Subquery Optimizations

MariaDB has a set of optimizations specifically targeted at semi-join subqueries.

What is a Semi-Join Subquery

A semi-join subquery has a form of

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

that is, the subquery is an IN-subquery and it is located in the WHERE clause. The most important part here is

with a semi-join subquery, we're only interested in the records of outer_tables that have matches in the subquery

Let's see why this is important. Consider a semi-join subquery:

SELECT * FROM Country 
WHERE 
  Continent='Europe' AND 
  Country.Code IN (SELECT City.country 
                   FROM City 
                   WHERE City.Population>1*1000*1000);

One can execute it "naturally", by starting from countries in Europe and checking if they have populous Cities:

[Diagram: semi-join executed from the outer table (Country) to the inner table (City)]

The semi-join property also allows "backwards" execution: we can start from big cities, and check which countries they are in:

[Diagram: semi-join executed from the inner table (City) to the outer table (Country)]

To contrast, let's change the subquery to be non-semi-join:

SELECT * FROM Country 
WHERE 
   Country.Continent='Europe' AND 
   (Country.Code IN (SELECT City.country 
                   FROM City WHERE City.Population>1*1000*1000) 
    OR Country.SurfaceArea > 100*1000  -- Added this part
   );

It is still possible to start from countries, and then check

  • if a country has any big cities

  • if it has a large surface area:

[Diagram: execution of the non-semi-join subquery, starting from Country]

The opposite city-to-country order is not possible: this is not a semi-join.

Difference from Inner Joins

Semi-join operations are similar to regular relational joins. There is a difference though: with semi-joins, you don't care how many matches an inner table has for an outer row. In the above countries-with-big-cities example, Germany will be returned once, even if it has three cities with populations of more than one million each.

Semi-Join Optimizations in MariaDB

MariaDB uses semi-join optimizations to run IN subqueries. The optimizations are enabled by default. You can disable them all by turning off their optimizer_switch flag like so:

SET optimizer_switch='semijoin=off'

MariaDB has five different semi-join execution strategies:

  • Table pullout optimization

  • FirstMatch execution strategy

  • Semi-join Materialization execution strategy

  • LooseScan execution strategy

  • DuplicateWeedout execution strategy
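The individual strategies also have their own optimizer_switch flags, so they can be switched off one at a time. For example, assuming the flag names firstmatch and loosescan, the FirstMatch and LooseScan strategies can be disabled with:

SET optimizer_switch='firstmatch=off,loosescan=off';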

See Also

  • Subquery Optimizations Map

  • "Observations about subquery use cases" blog post

  • http://en.wikipedia.org/wiki/Semijoin

This page is licensed: CC BY-SA / Gnu FDL

Subquery Cache

The goal of the subquery cache is to optimize the evaluation of correlated subqueries by storing results together with correlation parameters in a cache and avoiding re-execution of the subquery in cases where the result is already in the cache.

Administration

The cache is on by default. One can switch it off using the optimizer_switch subquery_cache setting, like so:

SET optimizer_switch='subquery_cache=off';

The efficiency of the subquery cache is visible in 2 statistical variables:

  • Subquery_cache_hit - Global counter for all subquery cache hits.

  • Subquery_cache_miss - Global counter for all subquery cache misses.

The session variables tmp_table_size and max_heap_table_size influence the size of the in-memory temporary table used for caching. The cache table cannot grow beyond the minimum of these two variables' values (see the Implementation section for details).
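As a quick sketch of checking the counters and toggling the cache:

-- How effective has the cache been since the server started?
SHOW GLOBAL STATUS LIKE 'Subquery_cache%';

-- Turn the cache off for the current session only:
SET SESSION optimizer_switch='subquery_cache=off';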

Visibility

Your usage of the cache is visible in EXPLAIN EXTENDED output (warnings) as "<expr_cache><list of parameters>(cached expression)". For example:

EXPLAIN EXTENDED SELECT * FROM t1 WHERE a IN (SELECT b FROM t2);
+----+--------------------+-------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
+----+--------------------+-------+------+---------------+------+---------+------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

SHOW WARNINGS;
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                    |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | SELECT `test`.`t1`.`a` AS `a` from `test`.`t1` WHERE <expr_cache><`test`.`t1`.`a`>(<in_optimizer>(`test`.`t1`.`a`,<exists>(SELECT 1 FROM `test`.`t2` WHERE (<cache>(`test`.`t1`.`a`) = `test`.`t2`.`b`)))) |
+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

In the example above, the presence of "<expr_cache><test.t1.a>(...)" is how you know you are using the subquery cache.

Implementation

Every subquery cache creates a temporary table where the results and all parameters are stored. It has a unique index over all parameters. First the cache is created as a MEMORY table (if this is impossible, the cache is disabled for that expression). When the table grows to the minimum of tmp_table_size and max_heap_table_size, the hit rate is checked:

  • if the hit rate is really small (<0.2) the cache will be disabled.

  • if the hit rate is moderate (<0.7) the table will be cleaned (all records deleted) to keep it in memory.

  • if the hit rate is high the table will be converted to a disk table (for 5.3.0 it can only be converted to a disk table).

hit rate = hit / (hit + miss)
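For example, with the 149975 hits and 25 misses of Example 1 below, the hit rate is 149975 / (149975 + 25) ≈ 99.98%.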

Performance Impact

Here are some examples that show the performance impact of the subquery cache (these tests were made on a 2.53 GHz Intel Core 2 Duo MacBook Pro with dbt-3 scale 1 data set).

+---------+----------+------------------------+----------+--------+-------+----------+
| example | cache on | cache off              | gain     | hit    | miss  | hit rate |
+---------+----------+------------------------+----------+--------+-------+----------+
| 1       | 1.01sec  | 1 hour 31 min 43.33sec | 5445x    | 149975 | 25    | 99.98%   |
| 2       | 0.21sec  | 1.41sec                | 6.71x    | 6285   | 220   | 96.6%    |
| 3       | 2.54sec  | 2.55sec                | 1.00044x | 151    | 461   | 24.67%   |
| 4       | 1.87sec  | 1.95sec                | 0.96x    | 0      | 23026 | 0%       |
+---------+----------+------------------------+----------+--------+-------+----------+

Example 1

Dataset from DBT-3 benchmark, a query to find customers with balance near top in their nation:

SELECT count(*) FROM customer 
WHERE 
   c_acctbal > 0.8 * (SELECT max(c_acctbal) 
                      FROM customer C 
                      WHERE C.c_nationkey=customer.c_nationkey
                      GROUP BY c_nationkey);

Example 2

DBT-3 benchmark, Query #17

SELECT sum(l_extendedprice) / 7.0 AS avg_yearly 
FROM lineitem, part 
WHERE 
  p_partkey = l_partkey AND 
  p_brand = 'Brand#42' AND p_container = 'JUMBO BAG' AND 
  l_quantity < (SELECT 0.2 * avg(l_quantity) FROM lineitem 
                WHERE l_partkey = p_partkey);

Example 3

DBT-3 benchmark, Query #2

SELECT
        s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
FROM
        part, supplier, partsupp, nation, region
WHERE
        p_partkey = ps_partkey AND s_suppkey = ps_suppkey AND p_size = 33
        AND p_type LIKE '%STEEL' AND s_nationkey = n_nationkey
        AND n_regionkey = r_regionkey AND r_name = 'MIDDLE EAST'
        AND ps_supplycost = (
                SELECT
                        min(ps_supplycost)
                FROM
                        partsupp, supplier, nation, region
                WHERE
                        p_partkey = ps_partkey AND s_suppkey = ps_suppkey
                        AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
                        AND r_name = 'MIDDLE EAST'
        )
ORDER BY
        s_acctbal DESC, n_name, s_name, p_partkey;

Example 4

DBT-3 benchmark, Query #20

SELECT
        s_name, s_address
FROM
        supplier, nation
WHERE
        s_suppkey IN (
                SELECT
                        DISTINCT (ps_suppkey)
                FROM
                        partsupp, part
                WHERE
                        ps_partkey=p_partkey
                        AND p_name LIKE 'indian%'
                        AND ps_availqty > (
                                SELECT
                                        0.5 * sum(l_quantity)
                                FROM
                                        lineitem
                                WHERE
                                        l_partkey = ps_partkey
                                        AND l_suppkey = ps_suppkey
                                        AND l_shipdate >= '1995-01-01'
                                        AND l_shipdate < date_ADD('1995-01-01',INTERVAL 1 year)
                                )
        )
        AND s_nationkey = n_nationkey AND n_name = 'JAPAN'
ORDER BY
        s_name;

See Also

  • Query cache

  • blog post describing impact of subquery cache optimization on queries used by DynamicPageList MediaWiki extension

  • Another use case from the real world: mariadb-subquery-cache-in-real-use-case.html

This page is licensed: CC BY-SA / Gnu FDL

Subquery Optimizations Map

Below is a map showing all types of subqueries allowed in the SQL language, and the optimizer strategies available to handle them.

  • Uncolored areas represent different kinds of subqueries, for example:

    • Subqueries that have form x IN (SELECT ...)

    • Subqueries that are in the FROM clause

    • .. and so forth

  • The size of each uncolored area roughly corresponds to how important (i.e. frequently used) that kind of subquery is. For example, x IN (SELECT ...) queries are the most important, and EXISTS (SELECT ...) are relatively unimportant.

  • Colored areas represent optimizations/execution strategies that are applied to handle various kinds of subqueries.

  • The color of each optimization indicates the version of MySQL/MariaDB in which it became available (see the legend below)

Some things are not on the map:

  • MariaDB doesn't evaluate expensive subqueries when doing optimization (this means EXPLAIN is always fast). MySQL 5.6 has made progress in this regard, but its optimizer will still evaluate certain kinds of subqueries (for example, scalar-context subqueries used in range predicates)

Links to pages about individual optimizations:

  • IN->EXISTS

  • Subquery Caching

  • Semi-join optimizations

    • Table pullout

    • FirstMatch

    • Materialization, +scan, +lookup

    • LooseScan

    • DuplicateWeedout execution strategy

  • Non-semi-join Materialization (including NULL-aware and partial matching)

  • Derived table optimizations

    • Derived table merge

    • Derived table with keys

See also

  • Subquery optimizations in MariaDB 5.3

This page is licensed: CC BY-SA / Gnu FDL

Table Pullout Optimization

Table pullout is an optimization for Semi-join subqueries.

The idea of Table Pullout

Sometimes, a subquery can be re-written as a join. For example:

SELECT *
FROM City 
WHERE City.Country IN (SELECT Country.Code
                       FROM Country 
                       WHERE Country.Population < 100*1000);

If we know that there can be, at most, one country with a given value of Country.Code (we can tell that if we see that table Country has a primary key or unique index over that column), we can re-write this query as:

SELECT City.* 
FROM 
  City, Country 
WHERE
 City.Country=Country.Code AND Country.Population < 100*1000;

Table pullout in action

If one runs EXPLAIN for the above query in MySQL 5.1-5.6 or MariaDB 5.1-5.2, they'll get this plan:

MySQL [world]> EXPLAIN SELECT * FROM City WHERE City.Country IN (SELECT Country.Code FROM Country WHERE Country.Population < 100*1000);
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type            | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | City    | ALL             | NULL               | NULL    | NULL    | NULL | 4079 | Using where |
|  2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3       | func |    1 | Using where |
+----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+
2 rows in set (0.00 sec)

It shows that the optimizer is going to do a full scan on table City, and for each city it will do a lookup in table Country.

If one runs the same query in MariaDB 5.3, they will get this plan:

MariaDB [world]> EXPLAIN SELECT * FROM City WHERE City.Country IN (SELECT Country.Code FROM Country WHERE Country.Population < 100*1000);
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type | table   | type  | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY     | Country | range | PRIMARY,Population | Population | 4       | NULL               |   37 | Using index condition |
|  1 | PRIMARY     | City    | ref   | Country            | Country    | 3       | world.Country.Code |   18 |                       |
+----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+
2 rows in set (0.00 sec)

The interesting parts are:

  • Both tables have select_type=PRIMARY, and id=1 as if they were in one join.

  • The Country table is first, followed by the City table.

Indeed, if one runs EXPLAIN EXTENDED; SHOW WARNINGS, they will see that the subquery is gone and it was replaced with a join:

MariaDB [world]> SHOW warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: SELECT `world`.`City`.`ID` AS `ID`,`world`.`City`.`Name` AS `Name`,
`world`.`City`.`Country` AS `Country`,`world`.`City`.`Population` AS `Population`
FROM `world`.`City` JOIN `world`.`Country`
WHERE ((`world`.`City`.`Country` = `world`.`Country`.`Code`) and
(`world`.`Country`.`Population` < (100 * 1000)))
1 row in set (0.00 sec)

Changing the subquery into a join allows feeding the join to the join optimizer, which can make a choice between two possible join orders:

  1. City -> Country

  2. Country -> City

as opposed to the single choice of

  1. City->Country

which we had before the optimization.

In the above example, the choice produces a better query plan. Without pullout, the query plan with a subquery would read (4079 + 1*4079)=8158 table records. With table pullout, the join plan would read (37 + 37 * 18) = 703 rows. Not all row reads are equal, but generally, reading 10 times fewer table records is faster.

Table pullout fact sheet

  • Table pullout is possible only in semi-join subqueries.

  • Table pullout is based on UNIQUE/PRIMARY key definitions.

  • Doing table pullout does not cut off any possible query plans, so MariaDB will always try to pull out as much as possible.

  • Table pullout is able to pull individual tables out of subqueries to their parent selects. If all tables in a subquery have been pulled out, the subquery (i.e. its semi-join) is removed completely.

  • One common bit of advice for optimizing MySQL has been "If possible, rewrite your subqueries as joins". Table pullout does exactly that, so manual rewrites are no longer necessary.

Controlling table pullout

There is no separate @@optimizer_switch flag for table pullout. Table pullout can be disabled by switching off all semi-join optimizations with the SET @@optimizer_switch='semijoin=off' command.

This page is licensed: CC BY-SA / Gnu FDL

Table Elimination

Learn about table elimination for query optimization in MariaDB Server. This section explains how the optimizer removes unnecessary tables from query plans, improving performance.

Table Elimination External Resources

  • an example of how to do this in MariaDB

This page is licensed: CC BY-SA / Gnu FDL

Table Elimination in MariaDB

When processing a query against the actors view (see "What is Table Elimination?" below), the first thing the MariaDB optimizer does is merge the view definition into the query to obtain:

SELECT ACRAT_rating
FROM
  ac_anchor
  LEFT JOIN ac_name ON ac_anchor.AC_ID=ac_name.AC_ID
  LEFT JOIN ac_dob ON ac_anchor.AC_ID=ac_dob.AC_ID
  LEFT JOIN ac_rating ON (ac_anchor.AC_ID=ac_rating.AC_ID AND
                          ac_rating.ACRAT_fromdate = 
                            (select max(sub.ACRAT_fromdate)
                             FROM ac_rating sub WHERE sub.AC_ID = ac_rating.AC_ID))
WHERE
 ACNAM_name='Gary Oldman'

It's important to realize that the obtained query has a useless part:

  • left join ac_dob on ac_dob.AC_ID=... will produce exactly one matching record:

    • primary key(ac_dob.AC_ID) guarantees that there will be at most one match for any value of ac_anchor.AC_ID,

    • and if there won't be a match, LEFT JOIN will generate a NULL-complemented “row”

  • and we don't care what the matching record is, as table ac_dob is not used anywhere else in the query.

This means that the left join ac_dob on ... part can be removed from the query, and this is what the Table Elimination module does. The detection logic is rather smart; for example, it would be able to remove the left join ac_rating on ... part as well, together with the subquery (in the above example it won't be removed because ac_rating is used in the select list of the query). The Table Elimination module can also handle nested outer joins and multi-table outer joins.

See Also

  • This page is based on the following blog post about table elimination:?p=58

This page is licensed: CC BY-SA / Gnu FDL

Table Elimination in Other Databases

In addition to MariaDB, Table Elimination is found in both Microsoft SQL Server 2005/2008 and Oracle 11g. Of the two, Microsoft SQL Server 2005/2008 seems to have the more advanced implementation. Oracle 11g has been confirmed to use table elimination, but not to the same extent.

To compare the two, we will look at the following query:

SELECT
 A.colA
FROM
 tableA A
LEFT OUTER JOIN
 tableB B
ON
 B.id = A.id;

When using A as the left table, we ensure that the query will return at least as many rows as there are in that table. For rows where the join condition (B.id = A.id) is not met, the selected column (A.colA) will still contain its original value; the unmatched B.* row would contain all NULLs.

However, the result set could actually contain more rows than what is found in tableA if there are duplicates of the column B.id in tableB. If A contains a row [1, "val1"] and B the rows [1, "other1a"],[1, "other1b"] then two rows will match in the join condition. The only way to know what the result will look like is to actually touch both tables during execution.

Instead, let's say the rows in tableB make it possible to place a unique constraint on the column B.id, as is often the case with a primary key. In this situation we know that we will get exactly as many rows as there are in tableA, since joining with tableB cannot introduce any duplicates. Furthermore, as in the example query, if we do not select any columns from tableB, touching that table during execution is unnecessary. We can remove the whole join operation from the execution plan.
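A minimal sketch of such a schema in SQL (the colB column is hypothetical, added only so tableB has some payload): with the primary key on tableB.id, the join can neither add nor remove rows, and since no tableB column is selected, the join can be eliminated.

CREATE TABLE tableA (id INT PRIMARY KEY, colA VARCHAR(100));
CREATE TABLE tableB (id INT PRIMARY KEY, colB VARCHAR(100));

-- No column of tableB is referenced outside the join condition,
-- so the LEFT OUTER JOIN can be dropped from the execution plan.
SELECT A.colA
FROM tableA A
LEFT OUTER JOIN tableB B ON B.id = A.id;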

Both SQL Server 2005/2008 and Oracle 11g deploy table elimination in the case described above. Let us look at a more advanced query, where Oracle fails.

SELECT
 A.colA
FROM
 tableA A
LEFT OUTER JOIN
 tableB B
ON
 B.id = A.id
AND
 B.fromDate = (
   SELECT
     max(sub.fromDate)
   FROM
     tableB sub
   WHERE
     sub.id = A.id
 );

In this example we have added another join condition, which ensures that we only pick the matching row from tableB having the latest fromDate. In this case tableB will contain duplicates of the column B.id, so in order to ensure uniqueness the primary key has to contain the fromDate column as well. In other words the primary key of tableB is (B.id, B.fromDate).

Furthermore, since the subselect ensures that we only pick the latest B.fromDate for a given B.id we know that at most one row will match the join condition. We will again have the situation where joining with tableB cannot affect the number of rows in the result set. Since we do not select any columns from tableB, the whole join operation can be eliminated from the execution plan.

SQL Server 2005/2008 will deploy table elimination in this situation as well. We have not found a way to make Oracle 11g use it for this type of query. Queries like these arise in two situations. Either when you have a denormalized model consisting of a fact table with several related dimension tables, or when you have a highly normalized model where each attribute is stored in its own table. The example with the subselect is common whenever you store historized/versioned data.

See Also

  • This page is based on the following blog post about table elimination:?p=58

This page is licensed: CC BY-SA / Gnu FDL

Table Elimination User Interface

One can check that table elimination is working by looking at the output of EXPLAIN [EXTENDED] and verifying that the eliminated tables no longer appear there:

EXPLAIN SELECT ACRAT_rating FROM actors WHERE ACNAM_name='Gary Oldman';
+----+--------------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
| id | select_type        | table     | type   | possible_keys | key     | key_len | ref                  | rows | Extra       |
+----+--------------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
|  1 | PRIMARY            | ac_anchor | index  | PRIMARY       | PRIMARY | 4       | NULL                 |    2 | Using index |
|  1 | PRIMARY            | ac_name   | eq_ref | PRIMARY       | PRIMARY | 4       | test.ac_anchor.AC_ID |    1 | Using where |
|  1 | PRIMARY            | ac_rating | ref    | PRIMARY       | PRIMARY | 4       | test.ac_anchor.AC_ID |    1 |             |
|  3 | DEPENDENT SUBQUERY | sub       | ref    | PRIMARY       | PRIMARY | 4       | test.ac_rating.AC_ID |    1 | Using index |
+----+--------------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+

Note that the ac_dob table is not in the output. Now let's try getting the birthdate instead:

EXPLAIN SELECT ACDOB_birthdate FROM actors WHERE ACNAM_name='Gary Oldman';
+----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                  | rows | Extra       |
+----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
|  1 | PRIMARY     | ac_anchor | index  | PRIMARY       | PRIMARY | 4       | NULL                 |    2 | Using index |
|  1 | PRIMARY     | ac_name   | eq_ref | PRIMARY       | PRIMARY | 4       | test.ac_anchor.AC_ID |    1 | Using where |
|  1 | PRIMARY     | ac_dob    | eq_ref | PRIMARY       | PRIMARY | 4       | test.ac_anchor.AC_ID |    1 |             |
+----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
3 rows in set (0.01 sec)

The ac_dob table is there, while ac_rating and the subquery are gone. Now, if we just want to check whether an actor with the given name exists:

EXPLAIN SELECT count(*) FROM actors WHERE ACNAM_name='Gary Oldman';
+----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                  | rows | Extra       |
+----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
|  1 | PRIMARY     | ac_anchor | index  | PRIMARY       | PRIMARY | 4       | NULL                 |    2 | Using index |
|  1 | PRIMARY     | ac_name   | eq_ref | PRIMARY       | PRIMARY | 4       | test.ac_anchor.AC_ID |    1 | Using where |
+----+-------------+-----------+--------+---------------+---------+---------+----------------------+------+-------------+
2 rows in set (0.01 sec)

In this case it will eliminate both the ac_dob and ac_rating tables.

Removing tables from a query does not make the query slower, and it does not cut off any optimization opportunities, so table elimination is unconditional and there are no plans to add any kind of query hints for it.

For debugging purposes there is a table_elimination=on|off switch in debug builds of the server.

See Also

  • This page is based on the following blog post about table elimination:?p=58

This page is licensed: CC BY-SA / Gnu FDL

What is Table Elimination?

The basic idea behind table elimination is that sometimes it is possible to resolve a query without even accessing some of the tables that the query refers to. One can invent many kinds of such cases, but in Table Elimination we targeted only a certain class of SQL constructs that one ends up writing when querying highly normalized data.

The sample queries were drawn from “Anchor Modeling”, a database modeling technique which takes normalization to the extreme. The slides at the anchor modeling website have an in-depth explanation of Anchor modeling and its merits, but the part that's important for table elimination can be shown with an example.

Suppose the database stores information about actors, together with their names, birthdays, and ratings, where ratings can change over time:

actor-attrs

According to anchor modeling, each attribute should go into its own table:

  • the 'anchor' table which only has a synthetic primary key:

CREATE TABLE  ac_anchor(AC_ID INT PRIMARY KEY);
  • a table for the 'name' attribute:

CREATE TABLE ac_name(AC_ID INT, ACNAM_name CHAR(N),
                     PRIMARY KEY(AC_ID));
  • a table for the 'birthdate' attribute:

CREATE TABLE ac_dob(AC_ID INT,
                    ACDOB_birthdate DATE,
                    PRIMARY KEY(AC_ID));
  • a table for the 'rating' attribute, which is historized:

CREATE TABLE ac_rating(AC_ID INT,
                       ACRAT_rating INT,
                       ACRAT_fromdate DATE,
                       PRIMARY KEY(AC_ID, ACRAT_fromdate));

With this approach it becomes easy to add/change/remove attributes, but this comes at a cost of added complexity in querying the data: in order to answer the simplest, select-star question of displaying actors and their current ratings one has to write outer joins:

Display actors, with their names and current ratings:

SELECT
  ac_anchor.AC_ID, ACNAM_Name, ACDOB_birthdate, ACRAT_rating
FROM
  ac_anchor
  LEFT JOIN ac_name ON ac_anchor.AC_ID=ac_name.AC_ID
  LEFT JOIN ac_dob ON ac_anchor.AC_ID=ac_dob.AC_ID
  LEFT JOIN ac_rating ON (ac_anchor.AC_ID=ac_rating.AC_ID AND
                          ac_rating.ACRAT_fromdate = 
                            (select max(sub.ACRAT_fromdate)
                             FROM ac_rating sub WHERE sub.AC_ID = ac_rating.AC_ID))

We don't want to write the joins every time we need to access an actor's properties, so we’ll create a view:

CREATE view actors AS
  SELECT  ac_anchor.AC_ID, ACNAM_Name,  ACDOB_birthdate, ACRAT_rating
  FROM <see the SELECT above>

This will allow us to access the data as if it was stored in a regular way:

SELECT ACRAT_rating FROM actors WHERE ACNAM_name='Gary Oldman'

And this is where table elimination will be needed.

See Also

  • This page is based on the following blog post about table elimination:?p=58

This page is licensed: CC BY-SA / Gnu FDL