Filters are powerful modules that intercept and process database traffic in MaxScale. Use them to log, transform, block, or reroute queries to add control, security, and monitoring.
This filter was introduced in MariaDB MaxScale 2.1.
From MaxScale version 2.2.11 onwards, the cache filter is no longer considered experimental. The following changes to the default behaviour have also been made:
The default value of cached_data is now thread_specific (used to be shared).
The default value of selects is now assume_cacheable (used to be verify_cacheable).
The cache filter is a simple cache that is capable of caching the result of SELECTs, so that subsequent identical SELECTs are served directly by MaxScale, without the queries being routed to any server.
By default the cache will be used and populated in the following circumstances:
There is no explicit transaction active, that is, autocommit is used,
there is an explicitly read-only transaction (that is, START TRANSACTION READ ONLY) active, or
there is a transaction active and no statement that modifies the database has been performed.
In practice, the last bullet point basically means that if a transaction has been started with BEGIN, START TRANSACTION or START TRANSACTION READ WRITE, then the cache will be used and populated until the first UPDATE, INSERT or DELETE statement is encountered.
That is, in default mode the cache effectively causes the system to behave as if the isolation level were READ COMMITTED, irrespective of what the isolation level of the backends actually is.
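The following sketch illustrates the default behaviour; the table and column names are purely illustrative:
START TRANSACTION;
SELECT * FROM t1 WHERE id = 1;    -- may be served from and stored to the cache
UPDATE t1 SET a = 2 WHERE id = 1; -- first modifying statement
SELECT * FROM t1 WHERE id = 1;    -- from here on, the cache is no longer used in this transaction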
The default behaviour can be altered using the configuration parameter cache_in_transactions.
By default it is assumed that all SELECT statements are cacheable, which means that even statements like SELECT LOCALTIME are cached. Please check selects for how to change the default behaviour.
The following limitations may be addressed in forthcoming releases:
Resultsets of prepared statements are not cached.
Multi-statements are always sent to the backend and their result is not cached.
The cache is not aware of grants.
The implication is that unless the cache has been explicitly configured to specify whom the caching should apply to, the presence of the cache may provide a user with access to data he should not have access to.
Please read the section Security for more detailed information.
However, from 2.5 onwards it is possible to configure the cache to cache the data of each user separately, which effectively means that there can be no unintended sharing. Please see users for how to change the default behaviour.
information_schema
When invalidation is enabled, SELECTs targeting tables in information_schema are not cached. The reason is that as the content of those tables changes as a side-effect of something else, the cache would not know when to invalidate the cache entries.
Since MaxScale 2.5, the cache is capable of invalidating entries in the cache when a modification (UPDATE, INSERT or DELETE) that may affect those entries is made.
The cache invalidation works on the table-level, that is, a modification made to a particular table will cause all cache entries that refer to that table to be invalidated, irrespective of whether the modification actually has an impact on the cache entries or not. For instance, suppose the result of the following SELECT has been cached
SELECT * FROM t WHERE a=1;
An insert like
INSERT INTO t SET a=42;
will cause the cache entry containing the result of that SELECT to be invalidated even if the INSERT actually does not affect it. Please see invalidate for how to enable the invalidation.
When invalidation has been enabled MaxScale must be able to completely parse a SELECT statement for its results to be stored in the cache. The reason is that in order to be able to invalidate cache entries, MaxScale must know what tables a SELECT statement depends upon. Consequently, if (and only if) invalidation has been enabled and MaxScale fails to parse a statement, the result of that particular statement will not be cached.
When invalidation has been enabled, MaxScale will also parse all UPDATE, INSERT and DELETE statements, in order to find out what tables are modified. If that parsing fails, MaxScale will by default clear the entire cache. The reason is that unless MaxScale can completely parse the statement it cannot know what tables are modified and hence not what cache entries should be invalidated. Consequently, to prevent stale data from being returned, the entire cache is cleared. The default behaviour can be changed using the configuration parameter clear_cache_on_parse_errors.
Note that what threading approach is used has a big impact on the invalidation. Please see Threads, Users and Invalidation for how the threading approach affects the invalidation.
Note also that since the invalidation may not, depending on how the cache has been configured, be visible to all sessions of all users, it is still important to configure a reasonable soft and hard TTL.
The invalidation offered by the MaxScale cache can be said to be of best efforts quality. The reason is that ensuring that the cache in all circumstances reflects the state of the actual database would require that the operations involving the cache and the MariaDB server be synchronized, which would cause an unacceptable overhead.
What best efforts means in this context is best illustrated using an example.
Suppose a client executes the statement SELECT * FROM tbl
and that the result
is cached. Next time that or any other client executes the same statement, the
result is returned from the cache and the MariaDB server will not be accessed
at all.
If a client now executes the statement INSERT INTO tbl VALUES (...), the cached value for the SELECT statement above and all other statements that are dependent upon tbl will be invalidated. That is, the next time someone executes the statement SELECT * FROM tbl the result will again be fetched from the MariaDB server and stored to the cache.
However, suppose some client executes the statement SELECT COUNT(*) FROM tbl
at the same time someone else executes the INSERT ...
statement. A possible
chain of events is as follows:
Timeline 1                                 Timeline 2
Client executes: INSERT ...                Client executes: SELECT COUNT(*) FROM tbl
                                           MaxScale -> DB: SELECT COUNT(*) FROM tbl
MaxScale -> DB: INSERT ...
That is, the SELECT is performed in the database server before the INSERT. However, since the timelines are proceeding independently of each other, the events may be re-ordered as far as the cache is concerned.
Timeline 1                                     Timeline 2
MaxScale -> Cache: Delete invalidated values
                                               MaxScale -> Cache: Store result and invalidation key
That is, the cached value for SELECT COUNT(*) FROM tbl
will reflect the
situation before the insert and will thus not be correct.
The stale result will be returned until the value has reached its time-to-live or its invalidation is caused by some update operation.
The cache is simple to add to any existing service. However, some experimentation may be required in order to find the configuration settings that provide the maximum benefit.
[Cache]
type=filter
module=cache
hard_ttl=30
soft_ttl=20
rules=...
...
[Cached-Routing-Service]
type=service
...
filters=Cache
Each configured cache filter uses a storage of its own. That is, if there are two services, each configured with a specific cache filter, then, even if queries target the very same servers the cached data will not be shared.
Two services can use the same cache filter, but then either the services should use the very same servers or a completely different set of servers, where the used table names are different. Otherwise there can be unintended sharing.
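For instance, a minimal sketch of two services that target the very same servers sharing one cache filter; the service and server names are purely illustrative:
[Cache]
type=filter
module=cache

[Service-A]
type=service
servers=server1,server2
filters=Cache
...

[Service-B]
type=service
servers=server1,server2
filters=Cache
...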
The cache filter has no mandatory parameters but a range of optional ones.
Note that it is advisable to specify max_size
to prevent the cache from
using up all memory there is, in case there is very little overlap among the
queries.
storage
Type: string
Mandatory: No
Dynamic: No
Default: storage_inmemory
The name of the module that provides the storage for the cache. That module will be loaded and provided with the value of storage_options as argument. For instance:
storage=storage_redis
See Storage for what storage modules are available.
storage_options
Type: string
Mandatory: No
Dynamic: No
Default:
A string that is provided verbatim to the storage module specified in storage, when the module is loaded. Note that the needed arguments and their format depend upon the specific module.
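For instance, assuming the Redis storage module is used, the server location could be passed as follows; the address is purely illustrative:
storage=storage_redis
storage_options="server=127.0.0.1:6379"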
hard_ttl
Type: duration
Mandatory: No
Dynamic: No
Default: 0s
(no limit)
Hard time to live; the maximum amount of time the cached
result is used before it is discarded and the result is fetched from the
backend (and cached). See also soft_ttl
below.
hard_ttl=60s
soft_ttl
Type: duration
Mandatory: No
Dynamic: No
Default: 0s
(no limit)
Soft time to live; the amount of time - in seconds - the cached result is
used before it is refreshed from the server. When soft_ttl
has passed, the
result will be refreshed when the first client requests the value.
However, as long as hard_ttl
has not passed, all other clients requesting
the same value will use the result from the cache while it is being fetched
from the backend. That is, as long as soft_ttl
but not hard_ttl
has passed,
even if several clients request the same value at the same time, there will be
just one request to the backend.
soft_ttl=60s
The default value is 0, which means no limit. If the value of soft_ttl is larger than hard_ttl it will be adjusted down to the same value.
max_resultset_rows
Type: count
Mandatory: No
Dynamic: No
Default: 0
(no limit)
Specifies the maximum number of rows a resultset can have in order to be stored in the cache. A resultset larger than this will not be stored.
max_resultset_rows=1000
max_resultset_size
Type: size
Mandatory: No
Dynamic: No
Default: 0
(no limit)
Specifies the maximum size of a resultset, for it to be stored in the cache. A resultset larger than this will not be stored. The size can be specified as described here.
max_resultset_size=128Ki
Note that the value of max_resultset_size
should not be larger than the
value of max_size
.
max_count
Type: count
Mandatory: No
Dynamic: No
Default: 0
(no limit)
The maximum number of items the cache may contain. If the limit has been reached and a new item should be stored, then an older item will be evicted.
Note that if cached_data is thread_specific then this limit will be applied to each cache separately. That is, if a thread specific cache is used, then the total number of cached items is #threads * the value of max_count.
max_count=1000
max_size
Type: size
Mandatory: No
Dynamic: No
Default: 0
(no limit)
The maximum size the cache may occupy. If the limit has been reached and a new item should be stored, then some older item(s) will be evicted to make space. The size can be specified as described here.
Note that if cached_data is thread_specific then this limit will be applied to each cache separately. That is, if a thread specific cache is used, then the total size is #threads * the value of max_size.
max_size=100Mi
rules
Type: path
Mandatory: No
Dynamic: No
Default: ""
(no rules)
Specifies the path of the file where the caching rules are stored. A relative path is interpreted relative to the data directory of MariaDB MaxScale.
rules=/path/to/rules-file
cached_data
Type: enum
Mandatory: No
Dynamic: No
Values: shared, thread_specific
Default: thread_specific
An enumeration option specifying how data is shared between threads. The allowed values are:
shared: The cached data is shared between threads. On the one hand it implies that there will be synchronization between threads, on the other hand that all threads will use data fetched by any thread.
thread_specific: The cached data is specific to a thread. On the one hand it implies that no synchronization is needed between threads, on the other hand that the very same data may be fetched and stored multiple times.
cached_data=shared
Default is thread_specific. See max_count and max_size for what implications changing this setting to shared has.
selects
Type: enum
Mandatory: No
Dynamic: No
Values: assume_cacheable, verify_cacheable
Default: assume_cacheable
An enumeration option specifying what approach the cache should take with
respect to SELECT
statements. The allowed values are:
assume_cacheable: The cache can assume that all SELECT statements, without exceptions, are cacheable.
verify_cacheable: The cache can not assume that all SELECT statements are cacheable, but must verify that.
selects=verify_cacheable
Default is assume_cacheable. In this case, all SELECT statements are assumed to be cacheable and will be parsed only if some specific rule requires that.
If verify_cacheable
is specified, then all SELECT
statements will be
parsed and only those that are safe for caching - e.g. do not call any
non-cacheable functions or access any non-cacheable variables - will be
subject to caching.
If verify_cacheable
has been specified, the cache will not be used in
the following circumstances:
The SELECT uses any of the following functions: BENCHMARK, CONNECTION_ID, CONVERT_TZ, CURDATE, CURRENT_DATE, CURRENT_TIMESTAMP, CURTIME, DATABASE, ENCRYPT, FOUND_ROWS, GET_LOCK, IS_FREE_LOCK, IS_USED_LOCK, LAST_INSERT_ID, LOAD_FILE, LOCALTIME, LOCALTIMESTAMP, MASTER_POS_WAIT, NOW, RAND, RELEASE_LOCK, SESSION_USER, SLEEP, SYSDATE, SYSTEM_USER, UNIX_TIMESTAMP, USER, UUID, UUID_SHORT.
The SELECT accesses any of the following fields: CURRENT_DATE, CURRENT_TIMESTAMP, LOCALTIME, LOCALTIMESTAMP.
The SELECT uses system or user variables.
Note that parsing all SELECT
statements carries a performance
cost. Please read performance for more details.
cache_in_transactions
Type: enum
Mandatory: No
Dynamic: No
Values: never, read_only_transactions, all_transactions
Default: all_transactions
An enumeration option specifying how the cache should behave when there are active transactions:
never: When there is an active transaction, no data will be returned from the cache, but all requests will always be sent to the backend. The cache will be populated inside explicitly read-only transactions. Inside transactions that are not explicitly read-only, the cache will be populated until the first non-SELECT statement.
read_only_transactions: The cache will be used and populated inside explicitly read-only transactions. Inside transactions that are not explicitly read-only, the cache will be populated, but not used until the first non-SELECT statement.
all_transactions: The cache will be used and populated inside explicitly read-only transactions. Inside transactions that are not explicitly read-only, the cache will be used and populated until the first non-SELECT statement.
cache_in_transactions=never
Default is all_transactions.
The values read_only_transactions and all_transactions have roughly the same effect as changing the isolation level of the backend to read_committed.
debug
Type: number
Mandatory: No
Dynamic: No
Default: 0
An integer value, using which the level of debug logging made by the cache can be controlled. The value is actually a bitfield with different bits denoting different logging.
0 (0b00000) No logging is made.
1 (0b00001) A matching rule is logged.
2 (0b00010) A non-matching rule is logged.
4 (0b00100) A decision to use data from the cache is logged.
8 (0b01000) A decision not to use data from the cache is logged.
16 (0b10000) Higher level decisions are logged.
Default is 0. To log everything, give debug a value of 31.
debug=31
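The bit values can also be combined; for instance, to log only the decisions about whether data from the cache is used or not (4 + 8):
debug=12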
enabled
Type: boolean
Mandatory: No
Dynamic: No
Default: true
Specifies whether the cache is initially enabled or disabled.
enabled=false
Default is true.
The value affects the initial state of the MaxScale user variables using which the behaviour of the cache can be modified at runtime. Please see Runtime Configuration for details.
invalidate
Type: enum
Mandatory: No
Dynamic: No
Values: never, current
Default: never
An enumeration option specifying how the cache should invalidate cache entries.
* `never`: No invalidation is performed. This is the default.
* `current`: When a modification is made, entries in the cache used by
the current session are invalidated. Other sessions that use the same
cache will also be affected, but sessions that use another cache will
not.
The effect of current depends upon the value of cached_data. If the value is shared, that is, all threads share the same cache, then the effect of an invalidation is immediately visible to all sessions, as there is just one cache. However, if the value is thread_specific, then an invalidation will affect only the cache that the session happens to be using.
If it is important and sufficient that an application immediately sees a change
that it itself has caused, then a combination of invalidate=current
and cached_data=thread_specific
can be used.
If it is important that an application immediately sees all changes, irrespective
of who has caused them, then a combination of invalidate=current
and cached_data=shared
must be used.
clear_cache_on_parse_errors
Type: boolean
Mandatory: No
Dynamic: No
Default: true
This boolean option specifies how the cache should behave in case of parsing errors when invalidation has been enabled.
true: If the cache fails to parse an UPDATE/INSERT/DELETE statement then all cached data will be cleared.
false: A failure to parse an UPDATE/INSERT/DELETE statement is ignored and no invalidation will take place due to that statement.
The default value is true.
Changing the value to false
may mean that stale data is returned from
the cache, if an UPDATE/INSERT/DELETE cannot be parsed and the statement
affects entries in the cache.
users
Type: enum
Mandatory: No
Dynamic: No
Values: mixed, isolated
Default: mixed
An enumeration option specifying how the cache should cache data for different users.
* `mixed`: The data of different users is stored in the same cache. This is the default and may allow a user to access data he should not have access to.
* `isolated`: Each user has a unique cache and there can be
no unintended sharing.
Note that if isolated has been specified, then each user will conceptually have a cache of his own, each of which is populated independently of the others. That is, if two users make the same query, then the data will be fetched twice and also stored twice. So, an isolated cache will in general use more memory and cause more traffic to the backend compared to a mixed cache.
timeout
Type: duration
Mandatory: No
Dynamic: No
Default: 5s
The timeout used when performing operations to distributed storages such as redis or memcached.
timeout=7000ms
The default value is 5000ms, that is, 5 seconds.
The duration can be specified as explained here.
The cache filter can be configured at runtime by executing SQL commands. If there is more than one cache filter in a service, only the first cache filter will be able to process the variables. The remaining filters will not see them and thus configuring them at runtime is not possible.
@maxscale.cache.populate
Using the variable @maxscale.cache.populate it is possible to specify at runtime whether the cache should be populated or not. Its initial value is the value of the configuration parameter enabled. That is, by default the value is true.
The purpose of this variable is to make it possible for an application to decide statement by statement whether the cache should be populated.
SET @maxscale.cache.populate=true;
SELECT a, b FROM tbl;
SET @maxscale.cache.populate=false;
SELECT a, b FROM tbl;
In the example above, the first SELECT will always be sent to the server and the result will be cached, provided the actual cache rules specify that it should be. The second SELECT may be served from the cache, depending on the value of @maxscale.cache.use (and the cache rules).
The value of @maxscale.cache.populate
can be queried
SELECT @maxscale.cache.populate;
but only after it has been explicitly set once.
@maxscale.cache.use
Using the variable @maxscale.cache.use it is possible to specify at runtime whether the cache should be used or not. Its initial value is the value of the configuration parameter enabled. That is, by default the value is true.
The purpose of this variable is to make it possible for an application to decide statement by statement whether the cache should be used.
SET @maxscale.cache.use=true;
SELECT a, b FROM tbl;
SET @maxscale.cache.use=false;
SELECT a, b FROM tbl;
The first SELECT will be served from the cache, provided the rules specify that the statement should be cached, the cache indeed contains the result and the data is not stale (as specified by the TTL).
If the data is stale, the SELECT will be sent to the server and the cache entry will be updated, irrespective of the value of @maxscale.cache.populate.
If @maxscale.cache.use is true but the result is not found in the cache, and the result is subsequently fetched from the server, the result will not be added to the cache, unless @maxscale.cache.populate is also true.
The value of @maxscale.cache.use
can be queried
SELECT @maxscale.cache.use;
but only after it has explicitly been set once.
@maxscale.cache.soft_ttl
Using the variable @maxscale.cache.soft_ttl it is possible at runtime to specify in seconds what soft ttl should be applied. Its initial value is the value of the configuration parameter soft_ttl. That is, by default the value is 0.
The purpose of this variable is to make it possible for an application to decide statement by statement what soft ttl should be applied.
set @maxscale.cache.soft_ttl=600;
SELECT a, b FROM unimportant;
set @maxscale.cache.soft_ttl=60;
SELECT c, d FROM important;
When data is SELECTed from the unimportant table unimportant, the data will be returned from the cache provided it is no older than 10 minutes, but when data is SELECTed from the important table important, the data will be returned from the cache provided it is no older than 1 minute.
Note that @maxscale.cache.hard_ttl overrules @maxscale.cache.soft_ttl in the sense that if the former is less than the latter, then soft ttl will, when used, be adjusted down to the value of hard ttl.
The value of @maxscale.cache.soft_ttl
can be queried
SELECT @maxscale.cache.soft_ttl;
but only after it has explicitly been set once.
@maxscale.cache.hard_ttl
Using the variable @maxscale.cache.hard_ttl it is possible at runtime to specify in seconds what hard ttl should be applied. Its initial value is the value of the configuration parameter hard_ttl. That is, by default the value is 0.
The purpose of this variable is to make it possible for an application to decide statement by statement what hard ttl should be applied.
Note that as @maxscale.cache.hard_ttl overrules @maxscale.cache.soft_ttl, it is important to ensure that the former is at least as large as the latter and, for best overall performance, that it is larger.
set @maxscale.cache.soft_ttl=600, @maxscale.cache.hard_ttl=610;
SELECT a, b FROM unimportant;
set @maxscale.cache.soft_ttl=60, @maxscale.cache.hard_ttl=65;
SELECT c, d FROM important;
The value of @maxscale.cache.hard_ttl
can be queried
SELECT @maxscale.cache.hard_ttl;
but only after it has explicitly been set once.
Client Driven Caching
With @maxscale.cache.populate and @maxscale.cache.use it is possible to make the caching completely client driven.
Provide no rules
file, which means that all SELECT
statements are
subject to caching and that all users receive data from the cache. Set
the startup mode of the cache to disabled.
[TheCache]
type=filter
module=cache
enabled=false
Now, in order to mark statements that should be cached, set @maxscale.cache.populate to true, and perform those SELECTs.
SET @maxscale.cache.populate=true;
SELECT a, b FROM tbl1;
SELECT c, d FROM tbl2;
SELECT e, f FROM tbl3;
SET @maxscale.cache.populate=false;
Note that those SELECTs must return something in order for the statement to be marked for caching.
After this, the value of @maxscale.cache.use
will decide whether
or not the cache is considered.
SET @maxscale.cache.use=true;
SELECT a, b FROM tbl1;
SET @maxscale.cache.use=false;
With @maxscale.cache.use being true, the cache is considered and the result returned from there, if not stale. If it is stale, the result is fetched from the server and the cached entry is updated.
By setting a very long TTL it is possible to prevent the cache from ever considering an entry to be stale and instead manually cause the cache to be updated when needed.
UPDATE tbl1 SET a = ...;
SET @maxscale.cache.populate=true;
SELECT a, b FROM tbl1;
SET @maxscale.cache.populate=false;
What caching approach is used and how different users are treated has a significant impact on the behaviour of the cache. In the following the implication of different combinations is explained.
cached_data=thread_specific, users=mixed: No thread contention. Data/work duplicated across threads. May cause unintended sharing.
cached_data=thread_specific, users=isolated: No thread contention. Data/work duplicated across threads and users. No unintended sharing. Requires the most amount of memory.
cached_data=shared, users=mixed: Thread contention under high load. No duplicated data/work. May cause unintended sharing. Requires the least amount of memory.
cached_data=shared, users=isolated: Thread contention under high load. Data/work duplicated across users. No unintended sharing.
Invalidation takes place only in the current cache, so how visible the invalidation is depends upon the configuration value of cached_data.
cached_data=thread_specific
The invalidation is visible only to the sessions that are handled by the same worker thread where the invalidation occurred. Sessions of the same or other users that are handled by different worker threads will not see the new value before the TTL causes the value to be refreshed.
cached_data=shared
The invalidation is immediately visible to all sessions of all users.
The caching rules are expressed as a JSON object or as an array of JSON objects.
There are two decisions to be made regarding the caching; in what circumstances should data be stored to the cache and in what circumstances should the data in the cache be used.
Expressed in JSON this looks as follows
{
store: [ ... ],
use: [ ... ]
}
or, in case an array is used, as
[
{
store: [ ... ],
use: [ ... ]
},
{ ... }
]
The store
field specifies in what circumstances data should be stored to
the cache and the use
field specifies in what circumstances the data in
the cache should be used. In both cases, the value is a JSON array containing
objects.
If an array of rule objects is specified, then, when looking for a rule that matches, the store field of each object is evaluated in sequential order until a match is found. Then, the use field of that object is used when deciding whether data in the cache should be used.
By default, if no rules file has been provided or if the store field is missing from the object, the results of all queries will be stored to the cache, subject to the max_resultset_rows and max_resultset_size cache filter parameters.
By providing a store
field in the JSON object, the decision whether to
store the result of a particular query to the cache can be controlled in
a more detailed manner. The decision to cache the results of a query can
depend upon
the database,
the table,
the column, or
the query itself.
Each entry in the store
array is an object containing three fields,
{
"attribute": <string>,
"op": <string>
"value": <string>
}
where,
the attribute can be database, table, column or query,
the op can be =, !=, like or unlike, and
the value a string.
If op is = or != then value is used as a string; if it is like or unlike, then value is interpreted as a pcre2 regular expression.
Note though that if attribute is database, table or column, then the string is interpreted as a name, where a dot . denotes qualification or scoping.
The objects in the store
array are processed in order. If the result
of a comparison is true, no further processing will be made and the
result of the query in question will be stored to the cache.
If the result of the comparison is false, then the next object is processed. The process continues until the array is exhausted. If there is no match, then the result of the query is not stored to the cache.
Note that as the query itself is used as the key, although the following queries
select * from db1.tbl
and
use db1;
select * from tbl
target the same table and produce the same results, they will be cached separately. The same holds for queries like
select * from tbl where a = 2 and b = 3;
and
select * from tbl where b = 3 and a = 2;
as well. Although they conceptually are identical, there will be two cache entries.
Note that if a column has been specified in a rule, then a statement will match irrespective of where that particular column appears. For instance, if a rule specifies that the result of statements referring to the column a should be cached, then the following statement will match
select a from tbl;
and so will
select b from tbl where a > 5;
Qualified Names
When using = or != in the rule object in conjunction with database, table and column, the provided string is interpreted as a name, that is, dot (.) denotes qualification or scope.
In practice that means that if attribute is database
then value may
not contain a dot, if attribute is table
then value may contain one
dot, used for separating the database and table names respectively, and
if attribute is column
then value may contain one or two dots, used
for separating table and column names, or database, table and column names.
Note that if a qualified name is used as a value, then all parts of the name must be available for a match. Currently MariaDB MaxScale may not always be capable of deducing in what table a particular column is. If that is the case, then a value like tbl.field may not necessarily be a match even if the field is field and the table actually is tbl.
Implication of the default database
If the rule concerns the database, then the default database will be considered only if the statement does not refer to a specific database.
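For instance, assuming a store rule with attribute database and value db1 (the names are purely illustrative), the default database matters only when the statement itself does not name a database:
USE db1;
SELECT * FROM tbl;        -- matches, the default database db1 is considered
SELECT * FROM db2.tbl;    -- does not match, the statement explicitly names db2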
Regexp Matching
The string used for matching the regular expression contains as much information as there is available. For instance, in a situation like
use somedb;
select fld from tbl;
the string matched against the regular expression will be somedb.tbl.fld.
Examples
Cache all queries targeting a particular database.
{
"store": [
{
"attribute": "database",
"op": "=",
"value": "db1"
}
]
}
Cache all queries not targeting a particular table
{
"store": [
{
"attribute": "table",
"op": "!=",
"value": "tbl1"
}
]
}
That will exclude queries targeting table tbl1 irrespective of which database it is in. To exclude a table in a particular database, specify the table name using a qualified name.
{
"store": [
{
"attribute": "table",
"op": "!=",
"value": "db1.tbl1"
}
]
}
Cache all queries containing a WHERE clause
{
"store": [
{
"attribute": "query",
"op": "like",
"value": ".*WHERE.*"
}
]
}
Note that this will actually cause all queries that contain WHERE anywhere to be cached.
By default, if no rules file has been provided or if the use field is missing from the object, all users may be returned data from the cache.
By providing a use
field in the JSON object, the decision whether to use
data from the cache can be controlled in a more detailed manner. The decision
to use data from the cache can depend upon
the user.
Each entry in the use
array is an object containing three fields,
{
"attribute": <string>,
"op": <string>
"value": <string>
}
where,
the attribute can be user,
the op can be =, !=, like or unlike, and
the value a string.
If op is = or != then value is interpreted as a MariaDB account string, that is, % indicates a wildcard, but if op is like or unlike it is simply assumed value is a pcre2 regular expression.
For instance, the following are equivalent:
{
"attribute": "user",
"op": "=",
"value": "'bob'@'%'"
}
{
"attribute": "user",
"op": "like",
"value": "bob@.*"
}
Note that if op is = or != then the usual assumptions apply, that is, a value of bob is equivalent with 'bob'@'%'. If like or unlike is used, then no assumptions apply, but the string is used verbatim as a regular expression.
The objects in the use array are processed in order. If the result of a comparison is true, no further processing will be made and the data in the cache will be used, subject to the value of ttl.
If the result of the comparison is false, then the next object is processed. The process continues until the array is exhausted. If there is no match, then data in the cache will not be used.
Note that use
is relevant only if the query is subject to caching,
that is, if all queries are cached or if a query matches a particular
rule in the store
array.
Examples
Use data from the cache for all users except admin (actually 'admin'@'%'), regardless of what host the admin user comes from.
{
"use": [
{
"attribute": "user",
"op": "!=",
"value": "admin"
}
]
}
As the cache is not aware of grants, unless the cache has been explicitly configured to specify whom the caching should apply to, the presence of the cache may provide a user with access to data he should not have access to.
Note that the following applies only if users=mixed
has been configured.
If users=isolated
has been configured, then there can never be any
unintended sharing between users.
Suppose there is a table access
that the user alice has access to,
but the user bob does not. If bob tries to access the table, he will
get an error as reply:
MySQL [testdb]> select * from access;
ERROR 1142 (42000): SELECT command denied to user 'bob'@'localhost' for table 'access'
If we now setup caching for the table, using the simplest possible rules file, bob will get access to data from the table, provided he executes a select identical with one alice has executed.
For instance, suppose the rules look as follows:
{
"store": [
{
"attribute": "table",
"op": "=",
"value": "access"
}
]
}
If alice now queries the table, she will get the result, which also will be cached:
MySQL [testdb]> select * from access;
+------+------+
| a | b |
+------+------+
| 47 | 11 |
+------+------+
If bob now executes the very same query, and the result is still in the cache, it will be returned to him.
MySQL [testdb]> select current_user();
+----------------+
| current_user() |
+----------------+
| bob@127.0.0.1 |
+----------------+
1 row in set (0.00 sec)
MySQL [testdb]> select * from access;
+------+------+
| a | b |
+------+------+
| 47 | 11 |
+------+------+
That can be prevented by explicitly declaring in the rules that the caching should be applied to alice only.
{
"store": [
{
"attribute": "table",
"op": "=",
"value": "access"
}
],
"use": [
{
"attribute": "user",
"op": "=",
"value": "'alice'@'%'"
}
]
}
With these rules in place, bob is again denied access, since queries
targeting the table access
will in his case not be served from the cache.
There are two types of storages that can be used; local and shared.
The only local storage implementation is storage_inmemory that simply stores the cache values in memory. The storage is not persistent and is destroyed when MaxScale terminates. Since the storage exists in the MaxScale process, it is very fast and almost always provides a performance benefit.
Currently there are two shared storages, storage_memcached and storage_redis, that are implemented using memcached and redis respectively.
The shared storages are accessed across the network and consequently it is not self-evident that their use will provide any performance benefit. Namely, irrespective of whether the data is fetched from the cache or from the server there will be a network hop and often that network hop is, as far as the performance goes, what costs the most.
The presence of a shared cache may provide a performance benefit
if the network between MaxScale and the storage server (memcached or Redis) is faster than the network between MaxScale and the database server,
if the used SELECT statements are heavy (that is, take a significant amount of time) to process for the database server, or
if the presence of the cache reduces the overall load of an otherwise overloaded database server.
As a general rule a shared storage should not be used without first assessing its value using a realistic workload.
storage_inmemory
This simple storage module uses the standard memory allocator for storing the cached data.
storage=storage_inmemory
This storage module takes no arguments.
storage_memcached
This storage module uses memcached for storing the cached data.
Multiple MaxScale instances can share the same memcached server and items cached by one MaxScale instance will be used by the other. Note that all MaxScale instances should have exactly the same configuration, as otherwise there can be unintended sharing.
storage=storage_memcached
storage_memcached has the following mandatory arguments:
server, using which the location of the server is specified as host[:port]. If no port is provided, the default Memcached port of 11211 is used.
storage_memcached has the following optional arguments:
max_value_size, using which the maximum size of a cached value is specified. By default, the maximum size of a value stored to memcached is 1MB, but that can be configured to be something else. The value of max_value_size will be used for capping max_resultset_size, that is, unless memcached is configured to allow values larger than 1MB and max_value_size has been set accordingly, only resultsets up to 1MB in size will be cached. The value can be specified as documented here.
Example:
storage_options="server=192.168.1.31:11211, max_value_size=10M"
Limitations
Invalidation is not supported.
Configuration values given to max_size
and max_count
are ignored.
Security
Neither the data in the memcached server nor the traffic between MaxScale and the memcached server is encrypted. Consequently, anybody with access to the memcached server or to the network has access to the cached data.
storage_redis
This storage module uses redis for storing the cached data.
Multiple MaxScale instances can share the same redis server and items cached by one MaxScale instance will be used by the other. Note that all MaxScale instances should have exactly the same configuration, as otherwise there can be unintended sharing.
storage=storage_redis
If storage_redis
cannot connect to the Redis server, caching will silently
be disabled and a connection attempt will be made after a timeout
interval.
If a timeout error occurs during an operation, reconnecting will be attempted after a delay, which will be an increasing multiple of timeout. For example, if timeout is the default 5 seconds, then reconnection attempts will first be made after 10 seconds, then after 15 seconds, then 20 and so on. However, once 60 seconds have been reached, the delay will no longer be increased but will stay at one minute. Note that each time a reconnection attempt is made, unless the reason for the timeout has disappeared, the client will be stalled for timeout seconds.
storage_redis has the following mandatory arguments:
server, using which the location of the server is specified as host[:port]. If no port is provided, the default Redis port of 6379 is used.
Example:
storage_options="server=192.168.1.31:6379"
Note that Redis should be configured with no idle timeout or with a timeout that is very large. Otherwise MaxScale may have to repeatedly connect to Redis, which will hurt both the functionality and the performance.
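For instance, in redis.conf the idle timeout can be disabled entirely; a value of 0, which is also the Redis default, means that idle connections are never closed:
timeout 0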
Limitations
There is no distinction between soft and hard ttl, but only hard ttl is used.
Configuration values given to max_size
and max_count
are ignored.
Invalidation
storage_redis
supports invalidation, but the caveats documented here
are of greater significance since also the communication between the cache and the
cache storage is asynchronous and takes place over the network.
NOTE If invalidation is turned on after caching has been used (in non-invalidation mode), redis must be flushed as otherwise there will be entries in the cache that will not be affected by the invalidation.
$ redis-cli flushall
Security
Neither the data in the redis server nor the traffic between MaxScale and the redis server is encrypted. Consequently, anybody with access to the redis server or to the network has access to the cached data.
In the following we define a cache MyCache that uses the cache storage module storage_inmemory and whose soft ttl is 30 seconds and whose hard ttl is 45 seconds. The cached data is shared between all threads and the maximum size of the cached data is 50 mebibytes. The rules for the cache are in the file cache_rules.json.
[MyCache]
type=filter
module=cache
storage=storage_inmemory
soft_ttl=30
hard_ttl=45
cached_data=shared
max_size=50Mi
rules=cache_rules.json
[MyService]
type=service
...
filters=MyCache
cache_rules.json
The rules specify that the data of the table sbtest
should be cached.
{
"store": [
{
"attribute": "table",
"op": "=",
"value": "sbtest"
}
]
}
When the cache filter was introduced, the most significant factor affecting
the performance of the cache was whether the statements needed to be parsed.
Initially, all statements were parsed in order to exclude SELECT
statements
that use non-cacheable functions, access non-cacheable variables or refer
to system or user variables. Later, the default value of the selects
parameter
was changed to assume_cacheable
, to maximize the default performance.
With the default configuration, the cache itself will not cause the statements
to be parsed. However, even with assume_cacheable
configured, a rule referring
specifically to a database, table or column will still cause the
statement to be parsed.
For instance, a simple rule like
{
"store": [
{
"attribute": "database",
"op": "=",
"value": "db1"
}
]
}
cannot be fulfilled without parsing the statement.
If the rule is instead expressed using a regular expression
{
"store": [
{
"attribute": "query",
"op": "like",
"value": "FROM db1\\..*"
}
]
}
then the statement will not be parsed.
However, when the query classifier cache was introduced, the parsing cost was significantly reduced and currently the cost for parsing and regular expression matching is roughly the same.
In the following is a table with numbers giving a rough picture of the relative cost of different approaches.
In the table, regexp match means that the cacheable statements were picked out using a rule like
{
"attribute": "query",
"op": "unlike",
"value": "FROM nomatch"
}
while exact match means that the cacheable statements were picked out using a rule like
{
"attribute": "database",
"op": "!=",
"value": "nomatch"
}
The exact match rule requires all statements to be parsed.
As the purpose of the test is to illustrate the overhead of different approaches, the rules were formulated so that all SELECT statements would match.
Note that these figures were obtained by running sysbench, MaxScale and the server in the same computer, so they are only indicative.
selects             rules           qps
assume_cacheable    none            100
assume_cacheable    regexp match    83
assume_cacheable    exact match     83
verify_cacheable    none            80
verify_cacheable    regexp match    80
verify_cacheable    exact match     80
For comparison, without caching, the qps is 33.
As can be seen, due to the query classifier cache there is no difference between exact and regex based matching.
For maximum performance:
Arrange the situation so that the default selects=assume_cacheable
can be used, and use no rules.
Otherwise it is mostly a personal preference whether exact or regex based rules are used. However, one should always test with real data and real queries before choosing one over the other.
This page is licensed: CC BY-SA / Gnu FDL
This filter was introduced in MariaDB MaxScale 2.3.0.
The binlogfilter
can be combined with a binlogrouter
service to selectively
replicate the binary log events to slave servers.
The filter uses two settings, match and exclude, to determine which events are replicated. If a binlog event does not match or is excluded, the event is replaced with an empty data event. The empty event is always 35 bytes which translates to a space reduction in most cases.
When statement-based replication is used, any query events that are filtered out are replaced with a SQL comment. This causes the query event to do nothing and thus the event will not modify the contents of the database. The GTID position of the replicating database will still advance which means that downstream servers replicating from it keep functioning correctly.
The filter works with both row based and statement based replication but we recommend using row based replication with the binlogfilter. This guarantees that there are no ambiguities in the event filtering.
match
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
Include queries that match the regex. See the next entry, exclude, for more information.
exclude
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
Exclude queries that match the regex.
If neither match nor exclude are defined, the filter does nothing and all events are replicated. This filter does not accept regular expression options as a separate setting; such settings must be defined in the patterns themselves. See the PCRE2 api documentation for more information.
The two settings are matched against the database and table name concatenated with a period. For example, the string the patterns are matched against for the database test and table t1 is test.t1.
For statement based replication, the pattern is matched against all the tables in the statements. If any of the tables matches the match pattern, the event is replicated. If any of the tables matches the exclude pattern, the event is not replicated.
rewrite_src
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
See the next entry, rewrite_dest, for more information.
rewrite_dest
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
rewrite_src and rewrite_dest control the statement rewriting of the binlogfilter. The rewrite_src setting is a PCRE2 regular expression that is matched against the default database and the SQL of statement based replication events (query events). rewrite_dest is the replacement string which supports the normal PCRE2 backreferences (e.g. the first capture group is $1, the second is $2, etc.).
Both rewrite_src
and rewrite_dest
must be defined to enable statement rewriting.
When statement rewriting is enabled, GTID-based replication must be used. The filter will disallow replication for all slaves that attempt to replicate with traditional file-and-position based replication.
The replacement is done both on the default database as well as on the SQL statement in the query event. This means that great care must be taken when defining the rewriting rules. To prevent accidental modification of the SQL into a form that is no longer valid, use database and table names that never occur in the inserted data and are never used as constant values.
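For instance, a hypothetical sketch that rewrites references to database db1 into db2 in replicated query events; the names are illustrative and, in line with the warning above, should never occur in the inserted data:
[BinlogFilter]
type=filter
module=binlogfilter
rewrite_src=db1
rewrite_dest=db2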
With the following configuration, only events belonging to database customers
are replicated. In addition to this, events for the table orders
are excluded
and thus are not replicated.
[BinlogFilter]
type=filter
module=binlogfilter
match=/customers[.]/
exclude=/[.]orders/
[BinlogServer]
type=service
router=binlogrouter
server_id=33
filters=BinlogFilter
[BinlogListener]
type=listener
service=BinlogServer
port=4000
For more information about the binlogrouter and how to use it, refer to the binlogrouter documentation.
This page is licensed: CC BY-SA / Gnu FDL
This filter was introduced in MariaDB MaxScale 2.1.
The Consistent Critical Read (CCR) filter allows consistent critical reads to be done through MaxScale while still allowing scaleout of non-critical reads.
When the filter detects a statement that would modify the database, it attaches a routing hint to all following statements done by that connection. This routing hint guides the routing module to route the statement to the master server where data is guaranteed to be in an up-to-date state. Writes from one session do not, by default, propagate to other sessions.
Note: This filter does not work with prepared statements. Only text protocol queries are handled by this filter.
The triggering of the filter can be limited further by adding MaxScale supported comments to queries and/or by using regular expressions. The query comments take precedence: if a comment is found it is obeyed even if a regular expression parameter might give a different result. Even a comment cannot cause a SELECT-query to trigger the filter. Such a comment is considered an error and ignored.
The comments must follow the MaxScale hint syntax and the HintFilter needs to be in the filter chain before the CCR-filter. If a query has a MaxScale supported comment line which defines the parameter ccr, that comment is caught by the CCR-filter. Parameter values match and ignore are supported, causing the filter to trigger (match) or not trigger (ignore) on receiving the write query. For example, the query
INSERT INTO departments VALUES ('d1234', 'NewDepartment'); -- maxscale ccr=ignore
would normally cause the filter to trigger, but does not because of the comment. The match-comment typically has no effect, since write queries by default trigger the filter anyway. It can be used to override an ignore-type regular expression that would otherwise prevent triggering.
The CCR filter has no mandatory parameters.
time
Type: duration
Mandatory: No
Dynamic: Yes
Default: 60s
The time window during which queries are routed to the master. The duration can be specified as documented here but the value will always be rounded to the nearest second. If no explicit unit has been specified, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. The default value for this parameter is 60 seconds.
When a data modifying SQL statement is processed, a timer is set to the value of time. Once the timer has elapsed, all statements are routed normally. If a new data modifying SQL statement is processed within the time window, the timer is reset to the value of time.
Enabling this parameter in combination with the count parameter causes both the time window and number of queries to be inspected. If either of the two conditions are met, the query is re-routed to the master.
count
Type: count
Mandatory: No
Dynamic: Yes
Default: 0
The number of SQL statements to route to master after detecting a data modifying SQL statement. This feature is disabled by default.
After processing a data modifying SQL statement, a counter is set to the value of count and all statements are routed to the master. Each executed statement after a data modifying SQL statement causes the counter to be decremented. Once the counter reaches zero, the statements are routed normally. If a new data modifying SQL statement is processed, the counter is reset to the value of count.
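For instance, to route the 10 statements that follow a write to the master, one could sketch:
count=10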
match, ignore
Type: regex
Mandatory: No
Dynamic: No
Default: ""
These regular expression settings control which statements trigger statement re-routing. Only non-SELECT statements are inspected. For CCRFilter, the exclude-parameter is instead named ignore, yet works similarly.
match=.*INSERT.*
ignore=.*UPDATE.*
options=case,extended
options
Type: enum
Mandatory: No
Dynamic: No
Values: ignorecase, case, extended
Default: ignorecase
Regular expression options for match and ignore.
global
Type: boolean
Mandatory: No
Dynamic: Yes
Default: false
global
is a boolean parameter that when enabled causes writes from one
connection to propagate to all other connections. This can be used to work
around cases where one connection writes data and another reads it, expecting
the write done by the other connection to be visible.
This parameter only works with the time parameter. The use of global and count at the same time is not allowed and will be treated as an error.
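For instance, a sketch where a write done by any connection forces all connections to the master for 10 seconds:
time=10s
global=true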
Here is a minimal filter configuration for the CCRFilter which should solve most problems with critical reads after writes.
[CCRFilter]
type=filter
module=ccrfilter
time=5
With this configuration, whenever a connection does a write, all subsequent reads done by that connection will be forced to the master for 5 seconds.
This prevents read scaling until the modifications have been replicated to the slaves. For best performance, the value of time should be slightly greater than the actual replication lag between the master and its slaves. If the number of critical read statements is known, the count parameter could be used to control the number reads that are sent to the master.
This page is licensed: CC BY-SA / Gnu FDL
This filter is deprecated in MariaDB MaxScale 6 and will be removed in MaxScale 22.08.
The Database Firewall filter is used to block queries that match a set of rules. It can be used to prevent harmful queries from reaching the backend database instances or to limit access to the database based on a more flexible set of rules compared to the traditional GRANT-based privilege system. Currently the filter does not support multi-statements.
Note that the firewall filter should be viewed as a best-effort solution intended for protecting against accidental misuse rather than malicious attacks.
The Database Firewall filter only requires minimal configuration in the maxscale.cnf file. The actual rules of the Database Firewall filter are located in a separate text file. The following is an example of a Database Firewall filter configuration in maxscale.cnf.
[DatabaseFirewall]
type=filter
module=dbfwfilter
rules=/home/user/rules.txt
[Firewalled-Routing-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=DatabaseFirewall
The Database Firewall filter has one mandatory parameter, rules.
rules
A path to a file with the rule definitions in it. The file should be readable by the user MariaDB MaxScale is run with. If a relative path is given, the path is interpreted relative to the module configuration directory. The default module configuration directory is /etc/maxscale.modules.d.
action
This parameter is optional and determines what action is taken when a query
matches a rule. The value can be either allow
, which allows all matching
queries to proceed but blocks those that don't match, or block
, which blocks
all matching queries, or ignore
which allows all queries to proceed.
The following statement types will always be allowed through when action is set to allow:
COM_CHANGE_USER: The user is changed for an active connection
COM_FIELD_LIST: Alias for the SHOW TABLES; query
COM_INIT_DB: Alias for USE <db>;
COM_PING: Server is pinged
COM_PROCESS_INFO: Alias for SHOW PROCESSLIST;
COM_PROCESS_KILL: Alias for KILL <id>; query
COM_QUIT: Client closes connection
COM_SET_OPTION: Client multi-statements are being configured
You can have both blacklist and whitelist functionality by configuring one filter with action=allow and another one with action=block. You can then use different rule files with each filter, one for blacklisting and another one for whitelisting. After this you only have to add both of these filters to a service in the following way.
[my-firewall-service]
type=service
servers=server1
router=readconnroute
user=maxuser
password=maxpwd
filters=dbfw-whitelist|dbfw-blacklist
[dbfw-whitelist]
type=filter
module=dbfwfilter
action=allow
rules=/home/user/whitelist-rules.txt
[dbfw-blacklist]
type=filter
module=dbfwfilter
action=block
rules=/home/user/blacklist-rules.txt
If a query is blocked, the filter will return an error to the client with the error number 1141 and an SQL state of HY000.
log_match
Log all queries that match a rule. For the any matching mode, the name of the rule that matched is logged and for other matching modes, the name of the last matching rule is logged. In addition to the rule name, the matched user and the query itself are logged. The log messages are logged at the notice level.
log_no_match
Log all queries that do not match a rule. The matched user and the query are logged. The log messages are logged at the notice level.
treat_string_as_field
This optional parameter specifies how the database firewall should treat strings. If true, they will be handled as fields, which will cause column blocking rules to match even if ANSI_QUOTES has been enabled and " is used instead of backtick.
treat_string_as_field=false
The default value is true.
Note that this may cause a false positive, if a "true" string contains the name of a column to be blocked.
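For instance, assuming the illustrative rule
rule blocksalary match columns salary
and a client session with ANSI_QUOTES enabled, the query
SELECT "salary" FROM employees;
is blocked when treat_string_as_field=true, since the double-quoted token is handled as a field even if it was meant as a string literal.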
treat_string_arg_as_field
This optional parameter specifies how the database firewall should treat strings used as arguments to functions. If true, they will be handled as fields, which will cause function column blocking rules to match even if ANSI_QUOTES has been enabled and " is used instead of backtick.
treat_string_arg_as_field=false
The default value is true.
Note that this may cause a false positive, if a "true" string contains the name of a column to be blocked.
strict
Whether to treat unsupported SQL or multi-statement SQL as an error. This is a
boolean parameter and the default value is true
.
When disabled, SQL that cannot be fully parsed is allowed to pass if the rules do not cause it to be blocked. This can be used to provide a best-effort mode where uncertainty about the SQL is allowed.
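For example, to enable the best-effort mode described above, the parameter could be set in the filter section as follows:
strict=false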
The rules are defined by using the following syntax:
rule NAME match RULE [at_times VALUE...] [on_queries {select|update|insert|delete|grant|revoke|drop|create|alter|use|load}]
Where NAME is the identifier for this rule and RULE is the mandatory rule definition.
Rules are identified by their name and have mandatory parts and optional parts.
You can add comments to the rule files by adding the #
character at
the beginning of the line. Trailing comments are not supported.
The first step of defining a rule is to start with the keyword rule
which
identifies this line of text as a rule. The second token is identified as
the name of the rule. After that the mandatory token match
is required
to mark the start of the actual rule definition.
The rule definition must contain exactly one mandatory rule parameter. It can also contain one of each type of optional rule parameter.
The Database Firewall filter's rules expect a single mandatory parameter for a rule. You can define multiple rules to cover situations where you would like to apply multiple mandatory rules to a query.
wildcard
This rule blocks all queries that use the wildcard character *
.
Example
Use of the wildcard is not allowed:
rule examplerule match wildcard
columns
This rule expects a list of values after the columns
keyword. These values are
interpreted as column names and if a query targets any of these, it is matched.
Example
Deny name and salary columns:
rule examplerule match columns name salary
function
This rule expects a list of values after the function
keyword. These values
are interpreted as function names and if a query uses any of these, it is
matched. The symbolic comparison operators (<
, >
, >=
etc.) are also
considered functions whereas the text versions (NOT
, IS
, IS NOT
etc.) are
not considered functions.
Example
Match queries using the sum and count functions:
rule examplerule match function sum count
not_function
This rule expects a list of values after the not_function
keyword. These values
are interpreted as function names and if a query uses any function other than these,
it is matched. The symbolic comparison operators (<
, >
, >=
etc.) are also
considered functions whereas the text versions (NOT
, IS
, IS NOT
etc.) are
not considered functions.
If the rule is given no values, then the rule will match a query using any function.
Example
Match queries using other functions but the length function:
rule examplerule match not_function length
Match queries using functions:
rule examplerule match not_function
uses_function
This rule expects a list of column names after the keyword. If any of the columns are used with a function, the rule will match. This rule can be used to prevent the use of a column with a function.
Example
Deny function usage with name and address columns:
rule examplerule match uses_function name address
function
and columns
This rule combines the function
and columns
type rules to match if one
of the listed columns uses one of the listed functions. The rule expects
the function
and columns
keywords both followed by a list of values.
Example
Deny use of the sum function with name or address columns:
rule examplerule match function sum columns name address
not_function
and columns
This rule combines the not_function
and columns
type rules to match if
one of the listed columns is used in conjunction with functions other than
the listed ones. The rule expects the not_function
and columns
keywords
both followed by a list of values.
If not_function
is not provided with a list of values, then the rule
matches if any of the columns is used with any function.
Example
Match if any other function but length is used with the name or address columns:
rule examplerule match not_function length columns name address
Match if any function is used with the ssn column:
rule examplerule match not_function columns ssn
regex
This rule blocks all queries matching the regular expression. The regex string expects a PCRE2 syntax regular expression. For more information about PCRE2 syntax, read the PCRE2 documentation. Unlike typical MaxScale regex parameters, the value should be enclosed in single or double quotes, not in /.../. Any compilation options must be included in the pattern itself.
Example
Block selects to accounts:
rule examplerule match regex '.*select.*from.*accounts.*'
limit_queries
This rule has been DEPRECATED. Please use the Throttle Filter instead.
The limit_queries rule expects three parameters. The first parameter is the number of allowed queries during the time period. The second is the time period in seconds and the third is the amount of time in seconds for which the rule is considered active and blocking.
WARNING: Using limit_queries
in action=allow
is not supported.
Example
Over 50 queries within a window of 5 seconds will block for 100 seconds:
rule examplerule match limit_queries 50 5 100
no_where_clause
This rule inspects the query and blocks it if it has no WHERE clause. For
example, this would disallow a DELETE FROM ...
query without a WHERE
clause. This does not prevent wrongful usage of the WHERE
clause e.g. DELETE FROM ... WHERE 1=1
.
Example
Queries must have a where clause:
rule examplerule match no_where_clause
Each mandatory rule accepts one or more optional parameters. These are to be defined after the mandatory part of the rule.
at_times
This rule expects a list of time ranges that define the times when the rule in
question is active. The time formats are expected to be ISO-8601 compliant and
to be separated by a single dash (the - character). For example, to define the
active period of a rule to be 5pm to 7pm, you would include at_times 17:00:00-19:00:00
in the rule definition. The rule uses local time to check if
the rule is active and has a precision of one second.
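For example, a sketch of a rule that forbids use of the wildcard only between 5pm and 7pm (reusing the wildcard rule shown above) could look like:
rule examplerule match wildcard at_times 17:00:00-19:00:00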
on_queries
This limits the rule to be active only on certain types of queries. The possible values are:
select
SELECT statements
insert
INSERT statements
update
UPDATE statements
delete
DELETE statements
grant
All grant operations
revoke
All revoke operations
create
All create operations
alter
All alter operations
drop
All drop operations
use
USE operations
load
LOAD DATA operations
Multiple values can be combined using the pipe character |, e.g. on_queries select|insert|update.
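For example, a sketch of a rule that matches the name and salary columns only in SELECT and UPDATE statements could look like:
rule examplerule match columns name salary on_queries select|update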
The users
directive defines the users to which the rule should be applied.
users NAME... match { any | all | strict_all } rules RULE...
The first keyword is users
, which identifies this line as a user definition
line.
The second component is a list of user names and network addresses in the format user@0.0.0.0. The first part is the user name and the second part is
the network address. You can use the %
character as the wildcard to enable
user name matching from any address or network matching for all users. After the
list of users and networks the keyword match is expected. This means that the
following user definitions are supported:
user@host
user@%
%@host
Partial wildcards, e.g. user@192.%
are not supported.
As MaxScale listens to the IPv6 all address by default, IPv4 addresses will be
mapped into the IPv6 space. This means that the IPv4 address 192.168.0.1
will
show up in MaxScale as ::ffff:192.168.0.1
. Take this into account when
defining the users
directives.
After this, one of the keywords any, all or strict_all is expected. This defines how the rules are matched. If any is used, the query is considered matched as soon as the first rule matches and the rest of the rules are skipped. If the all keyword is used, all rules must match for the query to be considered matched. The strict_all keyword works like all but checks the rules from left to right in the order they were listed; if one of them does not match, the rest of the rules are not checked. This can be useful in situations where you would, for example, combine limit_queries and regex rules. By using strict_all you can have the regex rule first and the limit_queries rule second. This way the rule only matches if the regex rule matches enough times for the limit_queries rule to match.
After the matching part comes the rules keyword after which a list of rule names is expected. This allows reusing of the rules and enables varying levels of query restriction.
If a particular NAME appears on several users
lines, then when an
actual user matches that name, the rules of each line are checked
independently until there is a match for the statement in question. That
is, the rules of each users
line are treated in an OR fashion with
respect to each other.
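As an illustrative sketch (the rule names here are hypothetical), a users line that applies two rules to a specific user from any host and to all users from one host could look like:
users john@% %@192.168.0.1 match any rules examplerule1 examplerule2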
Read Module Commands documentation for details about module commands.
The dbfwfilter supports the following module commands.
rules/reload FILTER [FILE]
Load a new rule file or reload the current rules. New rules are only taken into use if they are successfully loaded and in cases where loading of the rules fail, the old rules remain in use. The FILTER parameter is the filter instance whose rules are reloaded. The FILE argument is an optional path to a rule file and if it is not defined, the current rule file is used.
rules FILTER
Shows the current statistics of the rules. The FILTER parameter is the filter instance to inspect.
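For example, with the DatabaseFirewall filter instance and rule file from the earlier configuration example, these commands could be invoked through maxctrl roughly as follows:
maxctrl call command dbfwfilter rules/reload DatabaseFirewall /home/user/rules.txt
maxctrl call command dbfwfilter rules DatabaseFirewall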
To prevent the excessive use of a database we want to set a limit on the rate of queries. We only want to apply this limit to certain queries that cause unwanted behaviour. To achieve this we can use a regular expression.
First we define the limit on the rate of queries. The first parameter for the rule sets the number of allowed queries to 10 queries and the second parameter sets the rate of sampling to 5 seconds. If a user executes queries faster than this, any further queries that match the regular expression are blocked for 60 seconds.
rule limit_rate_of_queries match limit_queries 10 5 60
rule query_regex match regex '.*select.*from.*user_data.*'
To apply these rules we combine them into a single rule by adding a users
line
to the rule file.
users %@% match all rules limit_rate_of_queries query_regex
We have a table which contains all the managers of a company. We want to prevent accidental deletes from this table where the where clause is missing. This poses a problem: we don't want to require all the delete queries to have a where clause. We only want to prevent the data in the managers table from being deleted without a where clause.
To achieve this, we need two rules. The first rule defines that all delete operations must have a where clause. This rule alone does us no good so we need a second one. The second rule blocks all queries that match a regular expression.
rule safe_delete match no_where_clause on_queries delete
rule managers_table match regex '.*from.*managers.*'
When we combine these two rules we get the result we want. To combine these two rules add the following line to the rule file.
users %@% match all rules safe_delete managers_table
This page is licensed: CC BY-SA / Gnu FDL
This filter adds routing hints to a service. The filter has no parameters.
Note: If a query has more than one comment only the first comment is processed. Always place any MaxScale related comments first before any other comments that might appear in the query.
The client connection will need to have comments enabled. For example, the mariadb and mysql command line clients have comments disabled by default and they need to be enabled by passing the --comments or -c option. Most, if not all, connectors keep all comments intact in executed queries.
# The --comments flag is needed for the command line client
mariadb --comments -u my-user -psecret -e "SELECT @@hostname -- maxscale route to server db1"
For comment types, use either --
(notice the whitespace after the double
hyphen) or #
after the semicolon or /* ... */
before the semicolon.
Inline comment blocks, i.e. /* .. */
, do not require a whitespace character
after the start tag or before the end tag but adding the whitespace is advised.
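For illustration, the same routing hint could be written with any of the supported comment styles (the server name db1 is just an example):
SELECT @@hostname -- maxscale route to server db1
SELECT @@hostname # maxscale route to server db1
SELECT /* maxscale route to server db1 */ @@hostname;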
All hints must start with the maxscale
tag.
-- maxscale <hint body>
The hints have two types, ones that define a server type and others that contain name-value pairs.
These hints will instruct the router to route a query to a certain type of a server.
-- maxscale route to [master | slave | server <server name>]
Route to master
-- maxscale route to master
A master
value in a routing hint will route the query to a master server. This
can be used to direct read queries to a master server for an up-to-date result
with no replication lag.
Route to slave
-- maxscale route to slave
A slave
value will route the query to a slave server. Please note that the
hints will override any decisions taken by the routers which means that it is
possible to force writes to a slave server.
Route to named server
-- maxscale route to server <server name>
A server
value will route the query to a named server. The value of <server name> needs to be the same as the server section name in
maxscale.cnf. If the server is not used by the service, the hint is ignored.
Route to last used server
-- maxscale route to last
A last
value will route the query to the server that processed the last
query. This hint can be used to force certain queries to be grouped to the same
server.
Name-value hints
-- maxscale <param>=<value>
These control the behavior and affect the routing decisions made by the
router. Currently the only accepted parameter is the readwritesplit parameter max_slave_replication_lag. This will route the query to a server with a lower
replication lag than this parameter's value.
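For example, a sketch of a query that should only be routed to a server whose replication lag is below 10 seconds:
SELECT * FROM t1; -- maxscale max_slave_replication_lag=10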
Hints can be either single-use hints, which makes them affect only one query, or named hints, which can be pushed on and off a stack of active hints.
Defining named hints:
-- maxscale <hint name> prepare <hint content>
Pushing a hint onto the stack:
-- maxscale <hint name> begin
Popping the topmost hint off the stack:
-- maxscale end
You can define and activate a hint in a single command using the following:
-- maxscale <hint name> begin <hint content>
You can also push anonymous hints onto the stack which are only used as long as they are on the stack:
-- maxscale begin <hint content>
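As a sketch, a named hint (here called lag_hint, a hypothetical name) could be defined, pushed onto the stack for a group of queries and then popped off again:
-- maxscale lag_hint prepare max_slave_replication_lag=10
-- maxscale lag_hint begin
SELECT * FROM t1;
SELECT * FROM t2;
-- maxscale end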
The hintfilter supports routing hints in prepared statements for both the PREPARE and EXECUTE SQL commands as well as binary protocol prepared statements.
With binary protocol prepared statements, a routing hint in the prepared statement is applied to the execution of the statement but not the preparation of it. The preparation of the statement is routed normally and is sent to all servers.
For example, when the following prepared statement is prepared with the MariaDB
Connector-C function mariadb_stmt_prepare
and then executed with mariadb_stmt_execute, the result is always returned from the master:
SELECT user FROM accounts WHERE id = ? -- maxscale route to master
Support for binary protocol prepared statements was added in MaxScale 6.0 (MXS-2838).
The protocol commands that the routing hints are applied to are:
COM_STMT_EXECUTE
COM_STMT_BULK_EXECUTE
COM_STMT_SEND_LONG_DATA
COM_STMT_FETCH
COM_STMT_RESET
Support for direct execution of prepared statements was added in MaxScale
6.2.0. For example, the MariaDB Connector-C uses direct execution when mariadb_stmt_execute_direct is used.
Text protocol prepared statements (i.e. the PREPARE
and EXECUTE
SQL
commands) behave differently. If a PREPARE
command has a routing hint, it will
be routed according to the routing hint. Any subsequent EXECUTE
command will
not be affected by the routing hint in the PREPARE
statement. This means they
must have their own routing hints.
The following example is the recommended method of executing text protocol prepared statements with hints:
PREPARE my_ps FROM 'SELECT user FROM accounts WHERE id = ?';
EXECUTE my_ps USING 123; -- maxscale route to master
The PREPARE
is routed normally and will be routed to all servers. The EXECUTE
will be routed to the master as a result of it having the route to master
hint.
Routing SELECT queries to master
In this example, MariaDB MaxScale is configured with the readwritesplit router and the hint filter.
[ReadWriteService]
type=service
router=readwritesplit
servers=server1,server2
user=maxuser
password=maxpwd
filters=Hint
[Hint]
type=filter
module=hintfilter
Behind MariaDB MaxScale is a master server and a slave server. If there is replication lag between the master and the slave, read queries sent to the slave might return old data. To guarantee up-to-date data, we can add a routing hint to the query.
INSERT INTO table1 VALUES ("John","Doe",1);
SELECT * from table1; -- maxscale route to master
The first INSERT query will be routed to the master. The following SELECT query would normally be routed to the slave but with the added routing hint it will be routed to the master. This way we can do an INSERT and a SELECT right after it and still get up-to-date data.
This page is licensed: CC BY-SA / Gnu FDL
This filter was introduced in MariaDB MaxScale 2.1.
The insertstream filter converts bulk inserts into CSV data streams that are consumed by the backend server via the LOAD DATA LOCAL INFILE mechanism. This leverages the speed advantage of LOAD DATA LOCAL INFILE over regular inserts while also reducing the overall network traffic by condensing the inserted values into CSV.
Note: This is an experimental filter module
This filter has no parameters.
The filter translates all INSERT statements done inside an explicit transaction into LOAD DATA LOCAL INFILE streams. The file name used in the request will always be maxscale.data.
The following example is translated into a LOAD DATA LOCAL INFILE request followed by two CSV rows.
BEGIN;
INSERT INTO test.t1 VALUES (1, "hello"), (2, "world");
COMMIT;
Multiple inserts to the same table are combined into a single stream. This allows for efficient bulk loading with simple insert statements.
The following example will use only one LOAD DATA LOCAL INFILE request followed by four CSV rows.
BEGIN;
INSERT INTO test.t1 VALUES (1, "hello"), (2, "world");
INSERT INTO test.t1 VALUES (3, "foo"), (4, "bar");
COMMIT;
Non-INSERT statements executed inside the transaction will close the streaming of the data. Avoid interleaving SELECT statements with INSERT statements inside transactions.
The following example has to use two LOAD DATA LOCAL INFILE requests, each followed by two CSV rows.
Note: Avoid doing this!
BEGIN;
INSERT INTO test.t1 VALUES (1, "hello"), (2, "world");
SELECT * FROM test.t1;
INSERT INTO test.t1 VALUES (3, "foo"), (4, "bar");
COMMIT;
The more inserts that are streamed, the more efficient this filter is. The saving in network bandwidth in bytes can be estimated with the following formula:
((20 + t) * n) + (n * (m * 2)) - 108 - t = x
n = Number of INSERT statements
m = Number of values in each insert statement
t = Length of table name
x = Number of bytes saved
Positive values indicate savings in network bandwidth usage.
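As a rough worked example of the formula, assuming 100 INSERT statements (n = 100) with 2 values each (m = 2) into a table whose name is 7 characters long (t = 7):
((20 + 7) * 100) + (100 * (2 * 2)) - 108 - 7 = 2985
so roughly 3 kilobytes of network traffic would be saved.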
The filter has no parameters so it is extremely simple to configure. The following example shows the required filter configuration.
[Insert-Stream]
type=filter
module=insertstream
This page is licensed: CC BY-SA / Gnu FDL
The luafilter is a filter that calls a set of functions in a Lua script.
Read the Lua language documentation for information on how to write Lua scripts.
Note: This module is experimental and must be built from source. The module is deprecated in MaxScale 23.08 and might be removed in a future release.
The luafilter has two parameters. They control which scripts will be called by
the filter. Both parameters are optional but at least one should be defined. If
both global_script
and session_script
are defined, the entry points in both
scripts will be called.
global_script
The global Lua script. The parameter value is a path to a readable Lua script which will be executed.
This script will always be called with the same global Lua state and it can be used to build a global view of the whole service.
session_script
The session level Lua script. The parameter value is a path to a readable Lua script which will be executed once for each session.
Each session will have its own Lua state meaning that each session can have a unique Lua environment. Use this script to do session specific tasks.
The entry points for the Lua script expect the following signatures:
nil createInstance()
- global script only, called when MaxScale is started
The global script will be loaded in this function and executed once on a global level before calling the createInstance function in the Lua script.
nil newSession(string, string)
- new session is created
After the session script is loaded, the newSession function in the Lua scripts is called. The first parameter is the username of the client and the second parameter is the client's network address.
nil closeSession()
- session is closed
The closeSession
function in the Lua scripts will be called.
(nil | bool | string) routeQuery(string)
- query is being routed
The Luafilter calls the routeQuery
functions of both the session and the
global script. The query is passed as a string parameter to the
routeQuery Lua function and the return values of the session specific
function, if any were returned, are interpreted. If the first value is
bool, it is interpreted as a decision whether to route the query or to
send an error packet to the client. If it is a string, the current query
is replaced with the return value and the query will be routed. If nil is
returned, the query is routed normally.
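A minimal sketch of a session-level routeQuery implementation that exercises these return value semantics (the statements it checks for are hypothetical) could look like this:
function routeQuery(query)
    -- Returning false sends an error packet to the client instead of routing the query.
    if query == "SELECT secret FROM t1" then
        return false
    end
    -- Returning a string replaces the query before it is routed.
    if query == "SELECT 42" then
        return "SELECT 43"
    end
    -- Returning nil (or nothing) routes the query unchanged.
    return nil
end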
nil clientReply()
- reply to a query is being routed
This function calls the clientReply
function of the Lua scripts.
string diagnostic()
- global script only, print diagnostic information
This will call the matching diagnostic entry point in the Lua script. If
the Lua function returns a string, it will be printed to the client.
These functions, if found in the script, will be called whenever a call to the matching entry point is made.
Script Template
Here is a script template that can be used to try out the luafilter. Copy it
into a file and add global_script=<path to script>
into the filter
configuration. Make sure the file is readable by the maxscale
user.
function createInstance()
end
function newSession()
end
function closeSession()
end
function routeQuery(query)
end
function clientReply(query)
end
function diagnostic()
end
The luafilter exposes three functions that can be called from the Lua script.
string lua_qc_get_type_mask()
Returns the type of the current query being executed as a string. The values
are the string versions of the query types defined in query_classifier.h, separated by vertical bars (|).
This function can only be called from the routeQuery
entry point.
string lua_qc_get_operation()
Returns the current operation type as a string. The values are defined in
query_classifier.h.
This function can only be called from the routeQuery
entry point.
string lua_get_canonical()
Returns the canonical version of a query by replacing all user-defined constant values with question marks.
This function can only be called from the routeQuery
entry point.
number id_gen()
This function generates unique integers that can be used to distinguish sessions from each other.
Here is a minimal configuration entry for a luafilter definition.
[MyLuaFilter]
type=filter
module=luafilter
global_script=/path/to/script.lua
And here is a script that opens a file in /tmp/
and logs output to it.
f = io.open("/tmp/test.log", "a+")
function createInstance()
f:write("createInstance\n")
end
function newSession(a, b)
f:write("newSession for: " .. a .. "@" .. b .. "\n")
end
function closeSession()
f:write("closeSession\n")
end
function routeQuery(string)
f:write("routeQuery: " .. string .. " -- type: " .. lua_qc_get_type_mask() .. " operation: " .. lua_qc_get_operation() .. "\n")
end
function clientReply()
f:write("clientReply\n")
end
function diagnostic()
f:write("diagnostics\n")
return "Hello from Lua!"
end
This page is licensed: CC BY-SA / Gnu FDL
This filter was introduced in MariaDB MaxScale 2.1.
With the masking filter it is possible to obfuscate the returned value of a particular column.
For instance, suppose there is a table person that, among other columns, contains the column ssn where the social security number of a person is stored.
With the masking filter it is possible to specify that when the ssn field is queried, a masked value is returned unless the user making the query is a specific one. That is, when making the query
> SELECT name, ssn FROM person;
instead of getting the real result, as in
+-------+-------------+
+ name | ssn |
+-------+-------------+
| Alice | 721-07-4426 |
| Bob | 435-22-3267 |
...
the ssn would be masked, as in
+-------+-------------+
+ name | ssn |
+-------+-------------+
| Alice | XXX-XX-XXXX |
| Bob | XXX-XX-XXXX |
...
Note that the masking filter should be viewed as a best-effort solution intended for protecting against accidental misuse rather than malicious attacks.
From MaxScale 2.3 onwards, the masking filter will reject statements that use functions in conjunction with columns that should be masked. Allowing function usage provides a way for circumventing the masking, unless a firewall filter is separately configured and installed.
Please see the configuration parameterprevent_function_usage for how to change the default behaviour.
From MaxScale 2.3.5 onwards, the masking filter will check the definition of user variables and reject statements that define a user variable using a statement that refers to columns that should be masked.
Please see the configuration parametercheck_user_variables for how to change the default behaviour.
From MaxScale 2.3.5 onwards, the masking filter will examine unions and if the second or subsequent SELECT refer to columns that should be masked, the statement will be rejected.
Please see the configuration parametercheck_unions for how to change the default behaviour.
From MaxScale 2.3.5 onwards, the masking filter will examine subqueries and if a subquery refers to columns that should be masked, the statement will be rejected.
Please see the configuration parametercheck_subqueries for how to change the default behaviour.
Note that in order to ensure that it is not possible to get access to masked data, the privileges of the users should be minimized. For instance, if a user can create tables and perform inserts, he or she can execute something like
CREATE TABLE cheat (revealed_ssn TEXT);
INSERT INTO cheat SELECT ssn FROM users;
SELECT revealed_ssn FROM cheat;
to get access to the cleartext version of a masked field ssn
.
From MaxScale 2.3.5 onwards, the masking filter will, if any of the prevent_function_usage, check_user_variables, check_unions or check_subqueries parameters is set to true, block statements that cannot be fully parsed.
Please see the configuration parameterrequire_fully_parsed for how to change the default behaviour.
From MaxScale 2.3.7 onwards, the masking filter will treat any strings
passed to functions as if they were fields. The reason is that as the
MaxScale query classifier is not aware of whether ANSI_QUOTES
is
enabled or not, it is possible to bypass the masking by turning that
option on.
mysql> set @@sql_mode = 'ANSI_QUOTES';
mysql> select concat("ssn") from managers;
Before this change, the content of the field ssn
would have been
returned in clear text even if the column should have been masked.
Note that this change will mean that there may be false positives
if ANSI_QUOTES
is not enabled and a string argument happens to
be the same as the name of a field to be masked.
Please see the configuration parameter treat_string_arg_as_field for how to change the default behaviour.
The masking filter can only be used for masking columns of the following types: BINARY, VARBINARY, CHAR, VARCHAR, BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT, ENUM and SET. If the type of the column is something else, then no masking will be performed.
Currently, the masking filter can only work on packets whose payload is less than 16MB. If the masking filter encounters a packet whose payload is exactly that, thus indicating a situation where the payload is delivered in multiple packets, the value of the parameter large_payload specifies how the masking filter should handle the situation.
The masking filter is taken into use with the following kind of configuration setup.
[Mask-SSN]
type=filter
module=masking
rules=...
[SomeService]
type=service
...
filters=Mask-SSN
The masking filter has one mandatory parameter - rules.
rules
Type: path
Mandatory: Yes
Dynamic: Yes
Specifies the path of the file where the masking rules are stored. A relative path is interpreted relative to the module configuration directory of MariaDB MaxScale. The default module configuration directory is/etc/maxscale.modules.d.
rules=/path/to/rules-file
warn_type_mismatch
Type: enum
Mandatory: No
Dynamic: Yes
Values: never
, always
Default: never
With this optional parameter the masking filter can be instructed to log a warning if a masking rule matches a column that is not of one of the allowed types.
warn_type_mismatch=always
large_payload
Type: enum
Mandatory: No
Dynamic: Yes
Values: ignore
, abort
Default: abort
This optional parameter specifies how the masking filter should treat
payloads larger than 16MB
, that is, payloads that are delivered in
multiple MySQL protocol packets.
The values that can be used are ignore
, which means that columns in
such payloads are not masked, and abort
, which means that if such
payloads are encountered, the client connection is closed. The default
is abort
.
Note that the aborting behaviour is applied only to resultsets that contain columns that should be masked. There are no limitations on resultsets that do not contain such columns.
large_payload=ignore
prevent_function_usage
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
This optional parameter specifies how the masking filter should behave if a column that should be masked, is used in conjunction with some function. As the masking filter works only on the basis of the information in the returned result-set, if the name of a column is not present in the result-set, then the masking filter cannot mask a value. This means that the masking filter basically can be bypassed with a query like:
SELECT CONCAT(masked_column) FROM tbl;
If the value of prevent_function_usage
is true
, then all
statements that contain functions referring to masked columns will
be rejected. As that means that also queries using potentially
harmless functions, such as LENGTH(masked_column)
, are rejected
as well, this feature can be turned off. In that case, the firewall
filter should be setup to allow or reject the use of certain functions.
prevent_function_usage=false
require_fully_parsed
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
This optional parameter specifies how the masking filter should behave in case any of prevent_function_usage, check_user_variables, check_unions or check_subqueries is true and it encounters a statement that cannot be fully parsed.
If true, then statements that cannot be fully parsed (due to a parser limitation) will be blocked.
require_fully_parsed=false
Note that if this parameter is set to false, then prevent_function_usage, check_user_variables, check_unions and check_subqueries are rendered less effective, as it may be possible to bypass the protection they are intended to provide by using a statement that cannot be fully parsed.
treat_string_arg_as_field
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
This optional parameter specifies how the masking filter should treat
strings used as arguments to functions. If true, they will be handled
as fields, which will cause fields to be masked even if ANSI_QUOTES
has
been enabled and "
is used instead of backtick.
treat_string_arg_as_field=false
check_user_variables
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
This optional parameter specifies how the masking filter should behave with respect to user variables. If true, then a statement like
set @a = (select ssn from customer where id = 1);
will be rejected if ssn
is a column that should be masked.
check_user_variables=false
check_unions
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
This optional parameter specifies how the masking filter should behave with respect to UNIONs. If true, then a statement like
SELECT a FROM t1 UNION select b from t2;
will be rejected if b
is a column that should be masked.
check_unions=false
check_subqueries
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
This optional parameter specifies how the masking filter should behave with respect to subqueries. If true, then a statement like
SELECT * FROM (SELECT a as b FROM t1) as t2;
will be rejected if a
is a column that should be masked.
check_subqueries=false
The masking rules are expressed as a JSON object.
The top-level object is expected to contain a key rules
whose
value is an array of rule objects.
{
"rules": [ ... ]
}
Each rule in the rules array is a JSON object, expected to contain the keys replace, with, applies_to and exempted. The first two are obligatory and the latter two are optional.
{
"rules": [
{
"replace": { ... },
"with": { ... },
"applies_to": [ ... ],
"exempted": [ ... ]
}
]
}
replace
The value of this key is an object that specifies the column whose values should be masked. The object must contain the key column and may contain the keys table and database. The value of these keys must be a string.
If only column is specified, then a column with that name matches irrespective of the table and database. If table is specified, then the column matches only if it is in a table with the specified name, and if database is specified, then the column matches only if it is in a database with the specified name.
{
"rules": [
{
"replace": {
"database": "db1",
"table": "person",
"column": "ssn"
},
"with": { ... },
"applies_to": [ ... ],
"exempted": [ ... ]
}
]
}
NOTE If a rule contains a table/database and the resultset does not contain table/database information, it will always be considered a match if the column matches. For instance, given the rule above, if there is a table person2, also containing an ssn field, then a query like
SELECT ssn FROM person2;
will not return masked values, but a query like
SELECT ssn FROM person UNION SELECT ssn FROM person2;
will only return masked values, even if the ssn values from person2 in principle should not be masked. The same effect is
observed even with a nonsensical query like
SELECT ssn FROM person2 UNION SELECT ssn FROM person2;
even if nothing from person2
should be masked. The reason is that
as the resultset contains no table information, the values must be
masked if the column name matches, as otherwise the masking could
easily be circumvented with a query like
SELECT ssn FROM person UNION SELECT ssn FROM person;
The optional key match
makes partial replacement of the original
value possible: only the matched part would be replaced
with the fill character.
The match
value must be a valid pcre2 regular expression.
"replace": {
"column": "ssn",
"match": "(123)"
},
"with": {
"fill": "X#"
}
obfuscate
The obfuscate rule allows the obfuscation of the value by passing it through an obfuscation algorithm. Current solution uses a non-reversible obfuscation approach.
However, note that although it is in principle impossible to obtain the original value from the obfuscated one, if the range of possible original values is limited, it is straightforward to figure out the possible original values by running all possible values through the obfuscation algorithm and then comparing the results.
The minimal configuration is:
"obfuscate": {
"column": "name"
}
Output example for a name field with the value 'remo':
SELECT name from db1.tbl1;
+------+
| name |
+------+
| $-~) |
+------+
with
The value of this key is an object that specifies what the value of the matched
column should be replaced with for the replace
rule. Currently, the object
is expected to contain either the key value
or the key fill
.
The value of both must be a string with length greater than zero.
If both keys are specified, value
takes precedence.
If fill
is not specified, the default X
is used as its value.
If value is specified, then its value is used to replace the actual value verbatim and the length of the specified value must match the actual returned value (from the server) exactly. If the lengths do not match, the value of fill is used to mask the actual value.
When the value of fill
(fill-value) is used for masking the returned value,
the fill-value is used as many times as necessary to match the length of the
return value. If required, only a part of the fill-value may be used in the end
of the mask value to get the lengths to match.
{
"rules": [
{
"replace": {
"column": "ssn"
},
"with": {
"value": "XXX-XX-XXXX"
},
"applies_to": [ ... ],
"exempted": [ ... ]
},
{
"replace": {
"column": "age"
},
"with": {
"fill": "*"
},
"applies_to": [ ... ],
"exempted": [ ... ]
},
{
"replace": {
"column": "creditcard"
},
"with": {
"value": "1234123412341234",
"fill": "0"
},
"applies_to": [ ... ],
"exempted": [ ... ]
}
]
}
applies_to
With this optional key, whose value must be an array of strings,
it can be specified what users the rule is applied to. Each string
should be a MariaDB account string, that is, %
is a wildcard.
{
"rules": [
{
"replace": { ... },
"with": { ... },
"applies_to": [ "'alice'@'host'", "'bob'@'%'" ],
"exempted": [ ... ]
}
]
}
If this key is not specified, then the masking is performed for all
users, except the ones exempted using the key exempted
.
exempted
With this optional key, whose value must be an array of strings,
it can be specified what users the rule is not applied to. Each
string should be a MariaDB account string, that is, %
is a wildcard.
{
"rules": [
{
"replace": { ... },
"with": { ... },
"applies_to": [ ... ],
"exempted": [ "'admin'" ]
}
]
}
Read Module Commands documentation for details about module commands.
The masking filter supports the following module commands.
reload
Reload the rules from the rules file. The new rules are taken into use only if the loading succeeds without any errors.
MaxScale> call command masking reload MyMaskingFilter
MyMaskingFilter
refers to a particular filter section in the
MariaDB MaxScale configuration file.
In the following we configure a masking filter MyMasking that should always log a
warning if a masking rule matches a column that is of a type that cannot be masked,
and that should abort the client connection if a resultset package is larger than
16MB. The rules for the masking filter are in the file masking_rules.json
.
[MyMasking]
type=filter
module=masking
warn_type_mismatch=always
large_payload=abort
rules=masking_rules.json
[MyService]
type=service
...
filters=MyMasking
masking_rules.json
The rules specify that the data of a column whose name is ssn
, should
be replaced with the string 012345-ABCD. If the length of the data is
not exactly the same as the length of the replacement value, then the
data should be replaced with as many X characters as needed.
{
"rules": [
{
"replace": {
"column": "ssn"
},
"with": {
"value": "012345-ABCD",
"fill": "X"
}
}
]
}
This page is licensed: CC BY-SA / Gnu FDL
This filter was introduced in MariaDB MaxScale 2.1.
The Maxrows filter is capable of restricting the amount of rows that a SELECT, a prepared statement or stored procedure could return to the client application.
If a resultset from a backend server has more rows than the configured limit or the resultset size exceeds the configured size, an empty result will be sent to the client.
The Maxrows filter is easy to configure and to add to any existing service.
[MaxRows]
type=filter
module=maxrows
[MaxRows-Routing-Service]
type=service
...
filters=MaxRows
The Maxrows filter has no mandatory parameters. Optional parameters are:
max_resultset_rows
Type: number
Mandatory: No
Dynamic: Yes
Default: (no limit)
Specifies the maximum number of rows a resultset can have in order to be returned to the user.
If a resultset is larger than this an empty result will be sent instead.
max_resultset_rows=1000
max_resultset_size
Type: size
Mandatory: No
Dynamic: Yes
Default: 64Ki
Specifies the maximum size a resultset can have in order to be sent to the client. A resultset larger than this, will not be sent: an empty resultset will be sent instead.
max_resultset_size=128Ki
max_resultset_return
Type: enum
Mandatory: No
Dynamic: Yes
Values: empty
, error
, ok
Default: empty
Specifies what the filter sends to the client when the rows or size limit is hit, possible values:
an empty result set
an error packet with input SQL
an OK packet
Example output with ERR packet:
MariaDB [test]> select * from test.t4;
ERROR 1415 (0A000): Row limit/size exceeded for query: select * from test.t4
debug
Type: number
Mandatory: No
Dynamic: Yes
Default: 0
An integer value, using which the level of debug logging made by the Maxrows filter can be controlled. The value is actually a bitfield with different bits denoting different logging.
0 (0b00000) No logging is made.
1 (0b00001) A decision to handle data from the server is logged.
2 (0b00010) Reaching max_resultset_rows or max_resultset_size is logged.
To log everything, give debug
a value of 3
.
debug=2
Here is an example of filter configuration where the maximum number of returned rows is 10000 and maximum allowed resultset size is 256KB
[MaxRows]
type=filter
module=maxrows
max_resultset_rows=10000
max_resultset_size=256000
This page is licensed: CC BY-SA / Gnu FDL
The namedserverfilter is a MariaDB MaxScale filter module able to route queries to servers based on regular expression (regex) matches. Since it is a filter instead of a router, the NamedServerFilter only sets routing suggestions. It requires a compatible router to be effective. Currently, both readwritesplit and hintrouter take advantage of routing hints in the data packets. This filter uses the PCRE2 library for regular expression matching.
The filter accepts settings in two modes: legacy and indexed. Only one of
the modes may be used for a given filter instance. The legacy mode is meant for
backwards compatibility and allows only one regular expression and one server
name in the configuration. In indexed mode, up to 25 regex-server pairs are
allowed in the form match01 - target01, match02 - target02 and so on.
Also, in indexed mode, the server names (targets) may contain a list of names or
special tags ->master
or ->slave
.
All parameters except the deprecated match
and target
parameters can
be modified at runtime. Any modifications to the filter configuration will
only affect sessions created after the change has completed.
Below is a configuration example for the filter in indexed-mode. The legacy mode is not recommended and may be removed in a future release. In the example, a SELECT on TableOne (match01) results in routing hints to two named servers, while a SELECT on TableTwo is suggested to be routed to the master server of the service. Whether a list of server names is interpreted as a route-to-any or route-to-all is up to the attached router. The HintRouter sees a list as a suggestion to route-to-any. For additional information on hints and how they can also be embedded into SQL-queries, see Hint-Syntax.
[NamedServerFilter]
type=filter
module=namedserverfilter
match01=^Select.*TableOne$
target01=server2,server3
match22=^SELECT.*TableTwo$
target22=->master
[MyService]
type=service
router=readwritesplit
servers=server1,server2,server3
user=myuser
password=mypasswd
filters=NamedServerFilter
NamedServerFilter requires at least one matchXY - targetXY pair.
matchXY
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
matchXY defines a PCRE2 regular expression against which the incoming SQL query is matched. XY must be a number in the range 01 - 25. Each match-setting pairs with a similarly indexed target-setting. If one is defined, the other must be defined as well. If a query matches the pattern, the filter attaches a routing hint defined by the target-setting to the query. The options-parameter affects how the patterns are compiled.
match01=^SELECT
options=case,extended
options
Type: enum
Mandatory: No
Dynamic: Yes
Values: ignorecase
, case
, extended
Default: ignorecase
Regular expression options
for matchXY
.
targetXY
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The hint which is attached to the queries matching the regular expression defined by matchXY. If a compatible router is used in the service the query will be routed accordingly. The target can be one of the following:
a server or service name (adds a HINT_ROUTE_TO_NAMED_SERVER
hint)
a list of server names, comma-separated (adds several HINT_ROUTE_TO_NAMED_SERVER hints)
->master
(adds a HINT_ROUTE_TO_MASTER
hint)
->slave
(adds a HINT_ROUTE_TO_SLAVE
hint)
->all
(adds a HINT_ROUTE_TO_ALL
hint)
The support for service names was added in MaxScale 6.3.2. Older
versions of MaxScale did not accept service names in the target
parameters.
target01=MyServer2
source
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This optional parameter defines an IP address or mask which a connecting client's IP address is matched against. Only sessions whose address matches this setting will have this filter active and performing the regex matching. Traffic from other client IPs is simply left as is and routed straight through.
source=127.0.0.1
Since MaxScale 2.1 it's also possible to use % wildcards:
source=192.%.%.%
source=192.168.%.%
source=192.168.10.%
Note that using source=%
to match any IP is not allowed.
Since MaxScale 2.3 it's also possible to specify multiple addresses separated by comma. Incoming client connections are subsequently checked against each.
source=192.168.21.3,192.168.10.%
user
Type: string
Mandatory: No
Dynamic: Yes
Default: None
This optional parameter defines a username the connecting client username is matched against. Only sessions that are connected using this username will have the match and routing hints applied to them. Traffic from other users is simply left as is and routed straight through.
user=john
The maximum number of accepted match - target pairs is 25.
In the configuration file, the indexed match and target settings may be in any order and may skip numbers. During SQL-query matching, however, the regexes are tested in ascending order: match01, match02, match03 and so on. As soon as a match is found for a given query, the routing hints are written and the packet is forwarded to the next filter or router. Any remaining match regexes are ignored. This means the match - target pairs should be indexed in priority order, or, if priority is not a factor, in order of decreasing match probability.
Binary-mode prepared statements (COM_STMT_PREPARE) are handled by matching the prepared SQL against the match-parameters. If a match is found, the routing hints are attached to any execution of that prepared statement. Text-mode prepared statements are not supported in this way. To divert them, use regular expressions which match the specific "EXECUTE"-query.
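For example, a sketch of a match - target pair that diverts the execution of a hypothetical text-protocol prepared statement named my_ps to a named server:
match05=^EXECUTE\s+my_ps
target05=server2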
This will route all queries matching the regular expression *from *users
to
the server named server2. The filter will ignore character case in queries.
A query like SELECT * FROM users
would be routed to server2, whereas a query
like SELECT * FROM accounts
would be routed according to the normal rules of
the router.
[NamedServerFilter]
type=filter
module=namedserverfilter
match02= *from *users
target02=server2
[MyService]
type=service
router=readwritesplit
servers=server1,server2
user=myuser
password=mypasswd
filters=NamedServerFilter
This page is licensed: CC BY-SA / Gnu FDL
The Query Log All (QLA) filter logs query content. Logs are written to a file in CSV format. Log elements are configurable and include the time submitted and the SQL statement text, among others.
A minimal configuration is below.
[MyLogFilter]
type=filter
module=qlafilter
filebase=/tmp/SqlQueryLog
[MyService]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyLogFilter
The qlafilter
logs can be rotated by executing the maxctrl rotate logs
command. This will cause the log files to be reopened when the next message is
written to the file. This applies to both unified and session type logging.
The QLA filter has one mandatory parameter, filebase
, and a number of optional
parameters. These were introduced in the 1.0 release of MariaDB MaxScale.
filebase
Type: string
Mandatory: Yes
Dynamic: No
The basename of the output file created for each session. A session index is added to the filename for each written session file. For unified log files, .unified is appended.
filebase=/tmp/SqlQueryLog
match
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
Include queries that match the regex.
exclude
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
Exclude queries that match the regex.
options
Type: enum_mask
Mandatory: No
Dynamic: Yes
Values: case
, ignorecase
, extended
Default: case
The extended
option enables PCRE2 extended regular expressions.
user
Type: string
Mandatory: No
Dynamic: Yes
Default: ""
Limit logging to sessions with this user.
source
Type: string
Mandatory: No
Dynamic: Yes
Default: ""
Limit logging to sessions with this client source address.
log_type
Type: enum_mask
Mandatory: No
Dynamic: Yes
Values: session
, unified
, stdout
Default: session
The type of log file to use.
session
Write to session-specific files
unified
Use one file for all sessions
stdout
Same as unified, but to stdout
log_data
Type: enum_mask
Mandatory: No
Dynamic: Yes
Values: service
, session
, date
, user
, reply_time
, total_reply_time
, query
, default_db
, num_rows
, reply_size
, transaction
, transaction_time
, num_warnings
, error_msg
Default: date, user, query
Type of data to log in the log files.
service
Service name
session
Unique session id (ignored for session files)
date
Timestamp
user
User and hostname of client
reply_time
Duration from client query to first server reply
total_reply_time
Duration from client query to last server reply (v6.2)
query
Query
default_db
The default (current) database
num_rows
Number of rows in the result set (v6.2)
reply_size
Number of bytes received from the server (v6.2)
transaction
BEGIN, COMMIT and ROLLBACK (v6.2)
transaction_time
The duration of a transaction (v6.2)
num_warnings
Number of warnings in the server reply (v6.2)
error_msg
Error message from the server (if any) (v6.2)
The durations reply_time and total_reply_time are by default in milliseconds, but can be specified to another unit using duration_unit.
The log entry is written when the last reply from the server is received. Prior to version 6.2 the entry was written when the query was received from the client, or if reply_time was specified, on first reply from the server.
NOTE The error_msg is the raw message from the server. Even if use_canonical_form is set the error message may contain user defined constants. For example:
MariaDB [test]> select secret from T where x password="clear text pwd";
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MariaDB server version for the right syntax to
use near 'password="clear text pwd"' at line 1
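As a configuration sketch, a unified log that records the timestamp, user, query and server reply time in microseconds, flushing after every write, could be set up roughly as follows (the file path is just an example):
[MyLogFilter]
type=filter
module=qlafilter
filebase=/tmp/SqlQueryLog
log_type=unified
log_data=date,user,query,reply_time
duration_unit=us
flush=true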
duration_unit
Type: string
Mandatory: No
Dynamic: Yes
Default: milliseconds
The unit for logging a duration. The unit can be milliseconds
or microseconds
.
The abbreviations ms
for milliseconds and us
for microseconds are also valid.
This option is available as of MaxScale version 6.2.
use_canonical_form
Type: bool
Mandatory: No
Dynamic: Yes
Default: false
When this option is true the canonical form of the query is logged. In the canonical form all user defined constants are replaced with question marks. This option is available as of MaxScale version 6.2.
flush
Type: bool
Mandatory: No
Dynamic: Yes
Default: false
Flush log files after every write.
append
Type: bool
Mandatory: No
Dynamic: Yes
Default: true
Append new entries to the log files instead of overwriting them.
separator
Type: string
Mandatory: No
Dynamic: Yes
Default: ","
Defines the separator string between elements of log entries. The value should be enclosed in quotes.
newline_replacement
Type: string
Mandatory: No
Dynamic: Yes
Default: " "
Default value is " "
(one space). SQL-queries may include line breaks, which, if
printed directly to the log, may break automatic parsing. This parameter defines
what should be written in the place of a newline sequence (\r, \n or \r\n). If
this is set as the empty string, then newlines are not replaced and printed as
is to the output. The value should be enclosed in quotes.
Imagine you have observed an issue with a particular table and you want to determine if there are queries that are accessing that table but not using the primary key of the table. Let's assume the table name is PRODUCTS and the primary key is called PRODUCT_ID. Add a filter with the following definition:
[ProductsSelectLogger]
type=filter
module=qlafilter
match=SELECT.*from.*PRODUCTS .*
exclude=WHERE.*PRODUCT_ID.*
filebase=/var/logs/qla/SelectProducts
[Product-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=ProductsSelectLogger
The result of using this filter with the service used by the application would
be a log file of all select queries querying PRODUCTS without using the
PRODUCT_ID primary key in the predicates of the query. Executing SELECT * FROM PRODUCTS
would log the following into /var/logs/qla/SelectProducts
:
07:12:56.324 7/01/2016, SELECT * FROM PRODUCTS
This page is licensed: CC BY-SA / Gnu FDL
The Regex filter is a filter module for MariaDB MaxScale that is able to rewrite query content using regular expression matches and text substitution. The regular expressions use thePCRE2 syntax.
The PCRE2 library uses a different syntax than POSIX to refer to capture groups in the replacement string. The main difference is the usage of the dollar character instead of the backslash character for references, e.g. $1 instead of \1. For more details about the replacement string differences, please read the Creating a new string with substitutions chapter in the PCRE2 manual.
The following demonstrates a minimal configuration.
[MyRegexFilter]
type=filter
module=regexfilter
match=some string
replace=replacement string
[MyService]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyRegexFilter
The Regex filter has two mandatory parameters: match and replace.
match
Type: regex
Mandatory: Yes
Dynamic: Yes
Defines the text in the SQL statements that is replaced.
match=TYPE[ ]*=
options=case
options
Type: enum
Mandatory: No
Dynamic: Yes
Values: ignorecase
, case
, extended
Default: ignorecase
The options-parameter affects how the patterns are compiled, as usual.
replace
Type: string
Mandatory: Yes
Dynamic: Yes
This is the text that should replace the part of the SQL-query matching the pattern defined in match.
replace=ENGINE =
source
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The optional source parameter defines an address that is used to match against the address from which the client connection to MariaDB MaxScale originates. Only sessions that originate from this address will have the match and replacement applied to them.
source=127.0.0.1
user
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The optional user parameter defines a username that is used to match against the user from which the client connection to MariaDB MaxScale originates. Only sessions that are connected using this username will have the match and replacement applied to them.
user=john
log_file
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The optional log_file parameter defines a log file in which the filter writes all queries that are not matched and matching queries with their replacement queries. All sessions will log to this file so this should only be used for diagnostic purposes.
log_file=/tmp/regexfilter.log
log_trace
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The optional log_trace parameter toggles the logging of non-matching and matching queries with their replacements into the log file on the info level. This is the preferred method of diagnosing the matching of queries since the log level can be changed at runtime. For more details about logging levels and session specific logging, please read theConfiguration Guide.
log_trace=true
MySQL 5.1 used the parameter TYPE = to set the storage engine that should be used for a table. In later versions this changed to be ENGINE =. Imagine you have an application that you can not change for some reason, but you wish to migrate to a newer version of MySQL. The regexfilter can be used to transform the create table statements into the form that could be used by MySQL 5.5
[CreateTableFilter]
type=filter
module=regexfilter
options=ignorecase
match=TYPE\s*=
replace=ENGINE=
[MyService]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=CreateTableFilter
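For illustration, a statement sent by the legacy application (the table definition below is hypothetical) would be rewritten by this filter before reaching the backend:
CREATE TABLE orders (id INT PRIMARY KEY) TYPE = MyISAM;
-- is forwarded to the server as:
CREATE TABLE orders (id INT PRIMARY KEY) ENGINE= MyISAM;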
This page is licensed: CC BY-SA / Gnu FDL
The tee filter is a "plumbing" fitting in the MariaDB MaxScale filter toolkit. It can be used in a filter pipeline of a service to make copies of requests from the client and send the copies to another service within MariaDB MaxScale.
Please Note: Starting with MaxScale 2.2.0, any client that connects to a service which uses a tee filter will require a grant for the loopback address, i.e. 127.0.0.1.
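A sketch of such a grant is shown below; the account name, password and privilege list are illustrative and should mirror the application's existing account:
CREATE USER 'myuser'@'127.0.0.1' IDENTIFIED BY 'mypasswd';
GRANT SELECT, INSERT, UPDATE, DELETE ON *.* TO 'myuser'@'127.0.0.1';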
The configuration block for the TEE filter requires the minimal filter parameters in its section within the MaxScale configuration file. The service to send the duplicates to must be defined.
[DataMartFilter]
type=filter
module=tee
target=DataMart
[Data-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=DataMartFilter
The tee filter requires one parameter, target (or the deprecated service), to define where statements are duplicated, and accepts a number of optional parameters.
target
Type: target
Mandatory: No
Dynamic: Yes
Default: none
The target where the filter will duplicate all queries. The target can be either a service or a server. The duplicate connection that is created to this target will be referred to as the "branch target" in this document.
service
Type: service
Mandatory: No
Dynamic: Yes
Default: none
The service where the filter will duplicate all queries. This parameter is deprecated in favor of the target parameter and will be removed in a future release. The target and service parameters cannot both be defined.
match
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
What queries should be included.
match=/insert.*into.*order*/
exclude
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
What queries should be excluded.
exclude=/select.*from.*t1/
options
Type: enum
Mandatory: No
Dynamic: Yes
Values: ignorecase, case, extended
Default: ignorecase
How regular expressions should be interpreted.
options=case,extended
source
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The optional source parameter defines an address that is used to match against the address from which the client connection to MariaDB MaxScale originates. Only sessions that originate from this address will be replicated.
source=127.0.0.1
user
Type: string
Mandatory: No
Dynamic: Yes
Default: None
The optional user parameter defines a user name that is used to match against the user from which the client connection to MariaDB MaxScale originates. Only sessions that are connected using this username are replicated.
user=john
sync
Type: boolean
Mandatory: No
Dynamic: Yes
Default: false
Enable synchronous routing mode. When configured with sync=true, the filter will queue new queries until the response from both the main and the branch target has been received. This means that for n executed queries, n - 1 queries are guaranteed to be synchronized. Adding one extra statement (e.g. SELECT 1) to a batch of statements guarantees that all previous SQL statements have been successfully executed on both targets.
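For example (the statements and table below are illustrative), appending one trivial statement to a batch creates such a synchronization point:
INSERT INTO orders (order_id, item) VALUES (1, 'a');
INSERT INTO orders (order_id, item) VALUES (2, 'b');
-- Once the response to this statement arrives, both inserts above are
-- guaranteed to have completed on the main and the branch target.
SELECT 1;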
In the synchronous routing mode, a failure of the branch target will cause the client session to be closed.
All statements that are executed on the branch target are done in an
asynchronous manner. This means that when the client receives the response
there is no guarantee that the statement has completed on the branch
target. The sync
feature provides some synchronization guarantees that can
be used to verify successful execution on both targets.
Any errors on the branch target will cause the connection to it to be
closed. If target
is a service, it is up to the router to decide whether the
connection is closed. For direct connections to servers, any network errors
cause the connection to be closed. When the connection is closed, no new
queries will be routed to the branch target.
With sync=true
, a failure of the branch target will cause the whole session
to be closed.
Read the Module Commands documentation for details about module commands.
The tee filter supports the following module commands.
tee disable [FILTER]
This command disables a tee filter instance. A disabled tee filter will not send any queries to the target service.
tee enable [FILTER]
Enable a disabled tee filter. This resumes the sending of queries to the target service.
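Assuming maxctrl is available, these module commands can be invoked as follows, here against the ReplicateOrders filter defined in the example below:
$ maxctrl call command tee disable ReplicateOrders
$ maxctrl call command tee enable ReplicateOrders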
Assume an order processing system that has a table called orders. You also have another database server, the datamart server, that requires all inserts into orders to be replicated to it. Deletes and updates are not, however, required.
Set up a service in MariaDB MaxScale, called Orders, to communicate with the order processing system, with the tee filter applied to it. Also set up a DataMart service to talk to the datamart server. The tee filter uses the DataMart service as its target; adding a match parameter of "insert into orders" then results in all requests being sent to the order processing system, while insert statements that reference the orders table are additionally sent to the datamart server.
[Orders]
type=service
router=readconnroute
servers=server1, server2, server3, server4
user=massi
password=6628C50E07CCE1F0392EDEEB9D1203F3
filters=ReplicateOrders
[ReplicateOrders]
type=filter
module=tee
target=DataMart
match=insert[ ]*into[ ]*orders
[DataMart]
type=service
router=readconnroute
servers=datamartserver
user=massi
password=6628C50E07CCE1F0392EDEEB9D1203F3
filters=QLA-DataMart
[QLA-DataMart]
type=filter
module=qlafilter
options=/var/log/DataMart/InsertsLog
[Orders-Listener]
type=listener
target=Orders
port=4011
[DataMart-Listener]
type=listener
target=DataMart
port=4012
This page is licensed: CC BY-SA / Gnu FDL
This filter was added in MariaDB MaxScale 2.3
The throttle filter replaces and extends the limit_queries functionality of the Database Firewall filter.
The throttle filter is used to limit the maximum query frequency (QPS - queries per second) of a database session to a configurable value. The main use cases are to prevent a rogue session (client side error) and a DoS attack from overloading the system.
The throttling is dynamic. The query frequency is not limited to an absolute value. Depending on the configuration the throttle will allow some amount of high frequency queries, or especially short bursts with no frequency limitation.
[Throttle]
type = filter
module = throttlefilter
max_qps = 500
throttling_duration = 60000
...
[Routing-Service]
type = service
filters = Throttle
This configuration states that the query frequency will be throttled to around 500 qps, and that the time limit a session is allowed to stay at the maximum frequency is 60 seconds. All values involving time are configured in milliseconds. With the basic configuration the throttling will be nearly immediate, i.e. a session will only be allowed very short bursts of high frequency querying.
When a session has been continuously throttled for throttling_duration
milliseconds, or 60 seconds in this example, MaxScale will disconnect the
session.
The two parameters max_qps
and sampling_duration
together define how a
session is throttled.
Suppose max_qps is 400 and sampling_duration is 10 seconds. Since QPS is not an instantaneous measure but is averaged over the sampling duration, over those 10 seconds 10 * 400 = 4000 queries are allowed before throttling kicks in.
With these values, a fresh session can start off at a speed of 2000 qps and maintain that speed for 2 seconds before throttling starts.
If the client continues to query at high speed and throttling_duration is set to 10 seconds, MaxScale will disconnect the session 12 seconds after it started.
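The following configuration sketch matches the numbers used in this example; the values are illustrative only:
# QPS is averaged over the 10 second sampling window, so a fresh session
# may issue roughly 4000 queries before throttling begins.
[Throttle]
type = filter
module = throttlefilter
max_qps = 400
sampling_duration = 10000
throttling_duration = 10000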
max_qps
Type: number
Mandatory: Yes
Dynamic: Yes
Maximum queries per second.
This is the frequency to which a session will be limited over a given time period. QPS is not measured as an instantaneous value but over a configurable sampling duration (see sampling_duration).
throttling_duration
Type: duration
Mandatory: Yes
Dynamic: Yes
This defines how long a session is allowed to be throttled before MaxScale disconnects the session.
sampling_duration
Type: duration
Mandatory: No
Dynamic: Yes
Default: 250ms
Sampling duration defines the window of time over which QPS is measured. This parameter directly affects the amount of time that high frequency queries are allowed before throttling kicks in.
The lower this value is, the stricter the throttling becomes. Conversely, the longer this time is, the longer the bursts of high frequency querying that are allowed.
continuous_duration
Type: duration
Mandatory: No
Dynamic: Yes
Default: 2s
This value defines what continuous throttling means. Continuous throttling
starts as soon as the filter throttles the frequency. Continuous throttling ends
when no throttling has been performed in the past continuous_duration
time.
This page is licensed: CC BY-SA / Gnu FDL
The top filter is a filter module for MariaDB MaxScale that monitors every SQL statement that passes through the filter. It measures the duration of each statement, i.e. the time between the statement being sent and the first result being returned. The top N times are kept, along with the SQL text itself, and a list sorted on the execution times of the queries is written to a file upon closure of the client session.
Example minimal configuration:
[MyLogFilter]
type=filter
module=topfilter
[Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyLogFilter
The top filter has one mandatory parameter, filebase, and a number of optional parameters.
filebase
Type: string
Mandatory: Yes
Dynamic: Yes
The basename of the output file created for each session. The session ID is added to the filename for each file written. This is a mandatory parameter.
filebase=/tmp/SqlQueryLog
The filebase may also be set as a filter option; this mechanism of setting the filebase via the filter option is superseded by the parameter. If both are set, the parameter setting is used and the filter option is ignored.
count
Type: number
Mandatory: No
Dynamic: Yes
Default: 10
The number of SQL statements to store and report upon.
count=30
match
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
Limits the queries logged by the filter.
match=select.*from.*customer.*where
exclude=where
options=case,extended
exclude
Type: regex
Mandatory: No
Dynamic: Yes
Default: None
Limits the queries logged by the filter; queries matching this expression are excluded from the log.
options
Type: enum
Mandatory: No
Dynamic: No
Values: ignorecase, case, extended
Default: case
Regular expression options for match and exclude.
source
Type: string
Mandatory: No
Dynamic: Yes
Default: None
Defines an address that is used to match against the address from which the client connection to MariaDB MaxScale originates. Only sessions that originate from this address will be logged.
source=127.0.0.1
user
Type: string
Mandatory: No
Dynamic: Yes
Default: None
Defines a username that is used to match against the user from which the client connection to MariaDB MaxScale originates. Only sessions that are connected using this username will have results generated.
user=john
You have an order system and believe the updates of the PRODUCTS table are causing some performance issues for the rest of your application. You would like to know which of the many updates in your application is causing the issue.
Add a filter with the following definition:
[ProductsUpdateTop20]
type=filter
module=topfilter
count=20
match=UPDATE.*PRODUCTS.*WHERE
exclude=UPDATE.*PRODUCTS_STOCK.*WHERE
filebase=/var/logs/top/ProductsUpdate
Note the exclude entry; it prevents updates to the PRODUCTS_STOCK table from being included in the report.
One of your application servers is slower than the rest. You believe it is related to database access, but you are not sure what is taking the time.
Add a filter with the following definition:
[SlowAppServer]
type=filter
module=topfilter
count=20
source=192.168.0.32
filebase=/var/logs/top/SlowAppServer
In order to produce a comparison with an unaffected application server you can also add a second filter as a control.
[ControlAppServer]
type=filter
module=topfilter
count=20
source=192.168.0.42
filebase=/var/logs/top/ControlAppServer
In the service definition, add both filters:
[App-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=SlowAppServer | ControlAppServer
You will then have two sets of log files written: one that profiles the top 20 queries of the slow application server and another that gives you the top 20 queries of your control application server. These two sets of files can then be compared to determine what, if anything, is different between the two.
The following is an example report for a number of fictitious queries executed against the employees example database available for MySQL.
-bash-4.1$ cat /var/logs/top/Employees-top-10.137
Top 10 longest running queries in session.
==========================================
Time (sec) | Query
-----------+-----------------------------------------------------------------
22.985 | select sum(salary), year(from_date) from salaries s, (select distinct year(from_date) as y1 from salaries) y where (makedate(y.y1, 1) between s.from_date and s.to_date) group by y.y1
5.304 | select d.dept_name as "Department", y.y1 as "Year", count(*) as "Count" from departments d, dept_emp de, (select distinct year(from_date) as y1 from dept_emp order by 1) y where d.dept_no = de.dept_no and (makedate(y.y1, 1) between de.from_date and de.to_date) group by y.y1, d.dept_name order by 1, 2
2.896 | select year(now()) - year(birth_date) as age, gender, avg(salary) as "Average Salary" from employees e, salaries s where e.emp_no = s.emp_no and ("1988-08-01" between from_date AND to_date) group by year(now()) - year(birth_date), gender order by 1,2
2.160 | select dept_name as "Department", sum(salary) / 12 as "Salary Bill" from employees e, departments d, dept_emp de, salaries s where e.emp_no = de.emp_no and de.dept_no = d.dept_no and ("1988-08-01" between de.from_date AND de.to_date) and ("1988-08-01" between s.from_date AND s.to_date) and s.emp_no = e.emp_no group by dept_name order by 1
0.845 | select dept_name as "Department", avg(year(now()) - year(birth_date)) as "Average Age", gender from employees e, departments d, dept_emp de where e.emp_no = de.emp_no and de.dept_no = d.dept_no and ("1988-08-01" between from_date AND to_date) group by dept_name, gender
0.668 | select year(hire_date) as "Hired", d.dept_name, count(*) as "Count" from employees e, departments d, dept_emp de where de.emp_no = e.emp_no and de.dept_no = d.dept_no group by d.dept_name, year(hire_date)
0.249 | select moves.n_depts As "No. of Departments", count(moves.emp_no) as "No. of Employees" from (select de1.emp_no as emp_no, count(de1.emp_no) as n_depts from dept_emp de1 group by de1.emp_no) as moves group by moves.n_depts order by 1
0.245 | select year(now()) - year(birth_date) as age, gender, count(*) as "Count" from employees group by year(now()) - year(birth_date), gender order by 1,2
0.179 | select year(hire_date) as "Hired", count(*) as "Count" from employees group by year(hire_date)
0.160 | select year(hire_date) - year(birth_date) as "Age", count(*) as Count from employees group by year(hire_date) - year(birth_date) order by 1
-----------+-----------------------------------------------------------------
Session started Wed Jun 18 18:41:03 2014
Connection from 127.0.0.1
Username massi
Total of 24 statements executed.
Total statement execution time 35.701 seconds
Average statement execution time 1.488 seconds
Total connection time 46.500 seconds
-bash-4.1$
This page is licensed: CC BY-SA / Gnu FDL
Note: This module is experimental and must be built from source. The module is deprecated in MaxScale 23.08 and might be removed in a future release.
The Transaction Performance Monitoring (TPM) filter is a filter module for MaxScale that monitors every SQL statement that passes through the filter. The filter groups a series of SQL statements into a transaction by detecting 'commit' or 'rollback' statements. It logs all committed transactions with necessary information, such as timestamp, client, SQL statements, latency, etc., which can be used later for transaction performance analysis.
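For example, a transaction such as the following (table and column names are hypothetical) would be recorded as a single log entry once its COMMIT is seen:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;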
The configuration block for the TPM filter requires the minimal filter options in its section within the maxscale.cnf file, stored in /etc/maxscale.cnf.
[MyLogFilter]
type=filter
module=tpmfilter
[MyService]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=MyLogFilter
The TPM filter does not support any filter options currently.
The TPM filter accepts a number of optional parameters.
The optional filename parameter defines the name of the output file created for performance logging. The default filename is tpm.log.
filename=/tmp/SqlQueryLog
The optional source
parameter defines an address that is used
to match against the address from which the client connection
to MaxScale originates. Only sessions that originate from this
address will be logged.
source=127.0.0.1
The optional user
parameter defines a user name that is used
to match against the user from which the client connection to
MaxScale originates. Only sessions that are connected using
this username are logged.
user=john
The optional delimiter parameter defines a delimiter that is used to distinguish columns in the log. The default delimiter is :::.
delimiter=:::
The optional query_delimiter defines a delimiter that is used to distinguish different SQL statements in a transaction. The default query delimiter is @@@.
query_delimiter=@@@
The optional named_pipe parameter is the path to a named pipe, which the TPM filter uses to communicate with 3rd-party applications (e.g., DBSeer). Logging is enabled when the filter receives the character '1' and disabled when it receives the character '0' from this named pipe. The default named pipe is /tmp/tpmfilter and logging is disabled by default.
named_pipe=/tmp/tpmfilter
For example, the following command enables the logging:
$ echo '1' > /tmp/tpmfilter
Similarly, the following command disables the logging:
$ echo '0' > /tmp/tpmfilter
For each transaction, the TPM filter prints its log in the following format:
<timestamp> | <server_name> | <user_name> | <latency of the transaction> | <latencies of individual statements in the transaction> (delimited by 'query_delimiter') | <actual SQL statements>
You want to log every transaction with its SQL statements and latency for future transaction performance analysis.
Add a filter with the following definition:
[PerformanceLogger]
type=filter
module=tpmfilter
delimiter=:::
query_delimiter=@@@
filename=/var/logs/tpm/perf.log
named_pipe=/tmp/tpmfilter
[Product-Service]
type=service
router=readconnroute
servers=server1
user=myuser
password=mypasswd
filters=PerformanceLogger
After the filter reads the character '1' from its named pipe, the following is an example of the log generated by the TPM filter with the above configuration:
1484086477::::server1::::root::::3::::0.165@@@@0.108@@@@0.102@@@@0.092@@@@0.121@@@@0.122@@@@0.110@@@@2.081::::UPDATE WAREHOUSE SET W_YTD = W_YTD + 3630.48 WHERE W_ID = 2 @@@@SELECT W_STREET_1, W_STREET_2, W_CITY, W_STATE, W_ZIP, W_NAME FROM WAREHOUSE WHERE W_ID = 2@@@@UPDATE DISTRICT SET D_YTD = D_YTD + 3630.48 WHERE D_W_ID = 2 AND D_ID = 9@@@@SELECT D_STREET_1, D_STREET_2, D_CITY, D_STATE, D_ZIP, D_NAME FROM DISTRICT WHERE D_W_ID = 2 AND D_ID = 9@@@@SELECT C_FIRST, C_MIDDLE, C_LAST, C_STREET_1, C_STREET_2, C_CITY, C_STATE, C_ZIP, C_PHONE, C_CREDIT, C_CREDIT_LIM, C_DISCOUNT, C_BALANCE, C_YTD_PAYMENT, C_PAYMENT_CNT, C_SINCE FROM CUSTOMER WHERE C_W_ID = 2 AND C_D_ID = 9 AND C_ID = 1025@@@@UPDATE CUSTOMER SET C_BALANCE = 1007749.25, C_YTD_PAYMENT = 465215.47, C_PAYMENT_CNT = 203 WHERE C_W_ID = 2 AND C_D_ID = 9 AND C_ID = 1025@@@@INSERT INTO HISTORY (H_C_D_ID, H_C_W_ID, H_C_ID, H_D_ID, H_W_ID, H_DATE, H_AMOUNT, H_DATA) VALUES (9,2,1025,9,2,'2017-01-10 17:14:37',3630.48,'locfljbe xtnfqn')
1484086477::::server1::::root::::6::::0.123@@@@0.087@@@@0.091@@@@0.098@@@@0.078@@@@0.106@@@@0.094@@@@0.074@@@@0.089@@@@0.073@@@@0.098@@@@0.073@@@@0.088@@@@0.072@@@@0.087@@@@0.071@@@@0.085@@@@0.078@@@@0.088@@@@0.098@@@@0.081@@@@0.076@@@@0.082@@@@0.073@@@@0.077@@@@0.070@@@@0.105@@@@0.093@@@@0.088@@@@0.089@@@@0.087@@@@0.087@@@@0.086@@@@1.883::::SELECT C_DISCOUNT, C_LAST, C_CREDIT, W_TAX FROM CUSTOMER, WAREHOUSE WHERE W_ID = 2 AND C_W_ID = 2 AND C_D_ID = 10 AND C_ID = 1267@@@@SELECT D_NEXT_O_ID, D_TAX FROM DISTRICT WHERE D_W_ID = 2 AND D_ID = 10 FOR UPDATE@@@@UPDATE DISTRICT SET D_NEXT_O_ID = D_NEXT_O_ID + 1 WHERE D_W_ID = 2 AND D_ID = 10@@@@INSERT INTO OORDER (O_ID, O_D_ID, O_W_ID, O_C_ID, O_ENTRY_D, O_OL_CNT, O_ALL_LOCAL) VALUES (286871, 10, 2, 1267, '2017-01-10 17:14:37', 7, 1)@@@@INSERT INTO NEW_ORDER (NO_O_ID, NO_D_ID, NO_W_ID) VALUES ( 286871, 10, 2)@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 24167@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 24167 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 96982@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 96982 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 40679@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 40679 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 31459@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 31459 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 6143@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 6143 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 12001@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 12001 AND S_W_ID = 2 FOR UPDATE@@@@SELECT I_PRICE, I_NAME , I_DATA FROM ITEM WHERE I_ID = 40407@@@@SELECT S_QUANTITY, S_DATA, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_DIST_05, S_DIST_06, S_DIST_07, S_DIST_08, S_DIST_09, S_DIST_10 FROM STOCK WHERE S_I_ID = 40407 AND S_W_ID = 2 FOR UPDATE@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,1,24167,2,7,348.31998,'btdyjesowlpzjwnmxdcsion')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,2,96982,2,1,4.46,'kudpnktydxbrbxibbsyvdiw')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,3,40679,2,7,528.43,'nhcixumgmosxlwgabvsrcnu')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,4,31459,2,9,341.82,'qbglbdleljyfzdpfbyziiea')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, 
OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,5,6143,2,3,152.67,'tmtnuupaviimdmnvmetmcrc')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,6,12001,2,5,304.3,'ufytqwvkqxtmalhenrssfon')@@@@INSERT INTO ORDER_LINE (OL_O_ID, OL_D_ID, OL_W_ID, OL_NUMBER, OL_I_ID, OL_SUPPLY_W_ID, OL_QUANTITY, OL_AMOUNT, OL_DIST_INFO) VALUES (286871,10,2,7,40407,2,1,30.32,'hvclpfnblxchbyluumetcqn')@@@@UPDATE STOCK SET S_QUANTITY = 65 , S_YTD = S_YTD + 7, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 24167 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 97 , S_YTD = S_YTD + 1, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 96982 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 58 , S_YTD = S_YTD + 7, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 40679 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 28 , S_YTD = S_YTD + 9, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 31459 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 86 , S_YTD = S_YTD + 3, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 6143 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 13 , S_YTD = S_YTD + 5, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 12001 AND S_W_ID = 2@@@@UPDATE STOCK SET S_QUANTITY = 44 , S_YTD = S_YTD + 1, S_ORDER_CNT = S_ORDER_CNT + 1, S_REMOTE_CNT = S_REMOTE_CNT + 0 WHERE S_I_ID = 40407 AND S_W_ID = 2
...
Note that 3 and 6 are the latencies of the two transactions in milliseconds, while 0.165 and 0.123 are the latencies of the first statement of each transaction in milliseconds.
This page is licensed: CC BY-SA / Gnu FDL