Rowid Filtering Optimization

The target use case for rowid filtering is as follows:

a table uses ref access on index IDX1
but it also has a fairly restrictive range predicate on another index IDX2.

In this case, it is advantageous to:

Do an index-only scan on index IDX2 and collect rowids of index records into a data structure that allows filtering (let's call it $FILTER).
When doing ref access on IDX1, check $FILTER before reading the full record.

Example

Consider a query

SELECT ...
FROM orders JOIN lineitem ON o_orderkey=l_orderkey
WHERE
  l_shipdate BETWEEN '1997-01-01' AND '1997-01-31' AND
  o_totalprice between 200000 and 230000;

Suppose the condition on l_shipdate is very restrictive, which means lineitem table should go first in the join order. Then, the optimizer can use o_orderkey=l_orderkey equality to do an index lookup to get the order the line item is from. On the other hand o_totalprice between ... can also be rather selective.

With filtering, the query plan would be:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineitem
         type: range
possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity
          key: i_l_shipdate
      key_len: 4
          ref: NULL
         rows: 98
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
         type: eq_ref|filter
possible_keys: PRIMARY,i_o_totalprice
          key: PRIMARY|i_o_totalprice
      key_len: 4|9
          ref: dbt3_s001.lineitem.l_orderkey
         rows: 1 (5%)
        Extra: Using where; Using rowid filter

Note that table orders has "Using rowid filter". The type column has "|filter", the key column shows the index that is used to construct the filter. rows column shows the expected filter selectivity, it is 5%.

ANALYZE FORMAT=JSON output for table orders will show

"table": {
      "table_name": "orders",
      "access_type": "eq_ref",
      "possible_keys": ["PRIMARY", "i_o_totalprice"],
      "key": "PRIMARY",
      "key_length": "4",
      "used_key_parts": ["o_orderkey"],
      "ref": ["dbt3_s001.lineitem.l_orderkey"],
      "rowid_filter": {
        "range": {
          "key": "i_o_totalprice",
          "used_key_parts": ["o_totalprice"]
        },
        "rows": 69,
        "selectivity_pct": 4.6,
        "r_rows": 71,
        "r_selectivity_pct": 10.417,
        "r_buffer_size": 53,
        "r_filling_time_ms": 0.0716
      }

Note the rowid_filter element. It has a range element inside it. selectivity_pct is the expected selectivity, accompanied by the r_selectivity_pct showing the actual observed selectivity.

Details

The optimizer makes a cost-based decision about when the filter should be used.
The filter data structure is currently an ordered array of rowids. (a Bloom filter would be better here and will probably be introduced in the future versions).
The optimization needs to be supported by the storage engine. At the moment, it is supported by InnoDB and MyISAM. It is not supported in partitioned tables.

Limitations

Rowid Filtering can't be used with a backward-ordered index scan. When the optimizer needs to execute an ORDER BY ... DESC query and decides to handle it with a backward-ordered index scan, it will disable Rowid Filtering.

Control

Rowid filtering can be switched on/off using rowid_filter flag in the optimizer_switch variable. By default, the optimization is enabled.

_{This page is licensed: CC BY-SA / Gnu FDL}

PreviousRollup Unique User Counts NextSargable DATE and YEAR

Last updated 2 months ago

Was this helpful?