1 of 13

MaxScale 22.08 Avrouter

Avrorouter

The avrorouter is a MariaDB 10.0 binary log to Avro file converter. It consumes binary logs from a local directory and transforms them into a set of Avro files. These files can then be queried by clients for various purposes.

MaxScale 22.08 Binlogrouter 2.4

Binlogrouter 2.4

NOTE This is the documentation for the binlogrouter in MaxScale 2.4 and is only provided for reference. The documentation for the binlogrouter in MaxScale 2.5 is provided here.

Introduction

The binlogrouter is a replication protocol proxy module for MariaDB MaxScale. This module allows MariaDB MaxScale to connect to a master server and retrieve binary logs while slave servers can connect to MariaDB MaxScale like they would connect to a normal master server. If the master server goes down, the slave servers can still connect to MariaDB MaxScale and read binary logs. You can switch to a new master server without the slaves noticing that the actual master server has changed. This allows for a more highly available replication setup where replication is high-priority.

Configuration

Mandatory Router Parameters

The binlogrouter requires the user and password parameters. These should be configured according to the .

In addition to these two parameters, the server_id and binlogdir parameters needs to be defined.

Router Parameters

The binlogrouter accepts the following parameters.

Note: Earlier versions of MaxScale supported the configuration of the binlogrouter only via router_options (a the comma-separated list of key-value pairs). As of MaxScale 2.1, all of the router options should be defined as parameters. The values defined in router_options will have priority over the parameters to support legacy configurations. The use of router_options is deprecated.

binlogdir

This parameter controls the location where MariaDB MaxScale stores the binary log files. This is a mandatory parameter.

The binlogdir also contains the cache subdirectory which stores data retrieved from the master during the slave registration phase. The master.ini file also resides in the binlogdir. This file keeps track of the current master configuration and it is updated when a CHANGE MASTER TO query is executed.

From 2.1 onwards, the 'cache' directory is stored in the same location as other user credential caches. This means that with the default options, the user credential cache is stored in/var/cache/maxscale/<Service Name>/<Listener Name>/cache/.

Read the documentation for instructions on how to define a custom location for the user cache.

server_id

MariaDB MaxScale must have a unique server_id. This parameter configures the value of the server_id that MariaDB MaxScale will use when connecting to the master. This is a mandatory parameter.

Older versions of MaxScale allowed the ID to be specified using server-id. This has been deprecated and will be removed in a future release of MariaDB MaxScale.

master_id

The server_id value that MariaDB MaxScale should use to report to the slaves that connect to MariaDB MaxScale.

This may either be the same as the server id of the real master or can be chosen to be different if the slaves need to be aware of the proxy layer. The real master server ID will be used if the option is not set.

Older versions of MaxScale allowed the ID to be specified using master-id. This has been deprecated and will be removed in a future release of MariaDB MaxScale.

uuid

This is used to set the unique UUID that the binlog router uses when it connects to the master server. By default the UUID will be generated.

master_uuid

It is a requirement of replication that each server has a unique UUID value. If this option is not set, binlogrouter will identify itself to the slaves using the UUID of the real master.

master_version

By default, the router will identify itself to the slaves using the server version of the real master. This option allows the router to use a custom version string.

master_hostname

By default, the router will identify itself to the slaves using the hostname of the real master. This option allows the router to use a custom hostname.

slave_hostname

Since MaxScale 2.1.6 the router can optionally identify itself to the master using a custom hostname. The specified hostname can be seen in the master viaSHOW SLAVE HOSTS command. The default is not to send any hostname string during registration.

user

Note: This is option can only be given to the router_options parameter. Use the user parameter of the service instead.

This is the user name that MariaDB MaxScale uses when it connects to the master. This user name must have the rights required for replication as with any other user that a slave uses for replication purposes. If the user parameter is not given in the router options then the same user as is used to retrieve the credential information will be used for the replication connection, i.e. the user in the service entry.

This user is the only one available for MySQL connection to MaxScale Binlog Server for administration when master connection is not done yet.

In MaxScale 2.1, the service user injection is done by the MySQLAuth authenticator module. Read the documentation for more details.

The user that is used for replication must be granted replication privileges on the database server.

password

Note: This is option can only be given to the router_options parameter. Use the password parameter of the service instead.

The password for the user. If the password is not explicitly given then the password in the service entry will be used. For compatibility with other username and password definitions within the MariaDB MaxScale configuration file it is also possible to use the parameter passwd.

heartbeat

This defines the value of the heartbeat interval for the connection to the master. The duration can be specified as documented . If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the parameter is seconds, a value specified in milliseconds will be rejected, even if the duration is longer than a second. The default value for the heartbeat period is every 5 minutes.

MariaDB MaxScale requests the master to ensure that a binlog event is sent at least every heartbeat period. If there are no real binlog events to send the master will sent a special heartbeat event. The current interval value is reported in the diagnostic output.

burstsize

This parameter is used to define the maximum amount of data that will be sent to a slave by MariaDB MaxScale when that slave is lagging behind the master. The default value is 1M.

The burst size can be provided as specified , except that IEC binary prefixes can be used as suffixes only from MaxScale 2.1 onwards. MaxScale 2.0 and earlier only support burstsize defined in bytes.

In this situation the slave is said to be in "catchup mode", this parameter is designed to both prevent flooding of that slave and also to prevent threads within MariaDB MaxScale spending disproportionate amounts of time with slaves that are lagging behind the master.

mariadb10-compatibility

This parameter allows binlogrouter to replicate from a MariaDB 10.0 master server: this parameter is enabled by default since MaxScale 2.2.0. In earlier versions the parameter was disabled by default.

Additionally, since MaxScale 2.2.1, MariaDB 10.x slave servers can connect to binlog server using GTID value instead of binlog name and position.

Example of a MariaDB 10.x slave connection to MaxScale

Note:

Slave servers can connect either with file and pos or GTID.
MaxScale saves all the incoming MariaDB GTIDs (DDLs and DMLs) in a sqlite3 database located in binlogdir (gtid_maps.db). When a slave server connects with a GTID request a lookup is made for the value match and following binlog events will be sent.

transaction_safety

This parameter is used to enable/disable incomplete transactions detection in binlog router. The default value is off.

When MariaDB MaxScale starts an error message may appear if current binlog file is corrupted or an incomplete transaction is found. During normal operations binlog events are not distributed to the slaves until a COMMIT is seen. Set transaction_safety=on to enable detection of incomplete transactions.

send_slave_heartbeat

This defines whether MariaDB MaxScale sends the heartbeat packet to the slave when there are no real binlog events to send. This parameter takes a boolean value and the default value is false. This means that no heartbeat events are sent to slave servers.

If value is set to true the interval value (requested by the slave during registration) is reported in the diagnostic output and the packet is send after the time interval without any event to send.

semisync

This parameter controls whether binlog server could ask Master server to start the Semi-Synchronous replication. This parameter takes a boolean value and the default value is false.

In order to get semi-sync working, the Master server must have the_rpl_semi_sync_master_ plugin installed. The availability of the plugin and the value of the GLOBAL VARIABLE rpl_semi_sync_master_enabled are checked in the Master registration phase: if the plugin is installed in the Master database, the binlog server subsequently requests the semi-sync option.

Note:

the network replication stream from Master has two additional bytes before each binlog event.
the Semi-Sync protocol requires an acknowledge packet to be sent back to Master only when requested: the semi-sync flag will have value of 1. This flag is set only if rpl_semi_sync_master_enabled=1 is set in the Master, otherwise it will always have value of 0 and no ack packet is sent back.

Please note that semi-sync replication is only related to binlog server to Master communication.

ssl_cert_verification_depth

This parameter sets the maximum length of the certificate authority chain that will be accepted. Legal values are positive integers. This applies to SSL connection to master server that could be acivated either by writing options in master.ini or later via a CHANGE MASTER TO command. This parameter cannot be modified at runtime. The default verification depth is 9.

encrypt_binlog

Whether to encrypt binlog files: the default is off.

When set to on the binlog files will be encrypted using specified AES algorithm and the KEY in the specified key file.

Note: binlog encryption must be used while replicating from a MariaDB 10.1 server and serving data to MariaDB 10.x slaves. In order to use binlog encryption the master server MariaDB 10.1 must have encryption active (encrypt-binlog=1 in my.cnf). This is required because both master and maxscale must store encrypted data for a working scenario for Secure data-at-rest. Additionally, as long as Master server doesn't send the StartEncryption event (which contains encryption setup information for the binlog file), there is a position gap between end of FormatDescription event pos and next event start pos. The StartEncryption event size is 36 or 40 (depending on CRC32 being used), so the gap has that size.

MaxScale binlog server adds its own StartEncryption to binlog files consequently the binlog events positions in binlog file are the same as in the master binlog file and there is no position mismatch.

encryption_algorithm

The encryption algorithm, either 'aes_ctr' or 'aes_cbc'. The default is 'aes_cbc'

encryption_key_file

The specified key file must contains lines with following format:

id;HEX(KEY)

Id is the scheme identifier, which must have the value 1 for binlog encryption , the ';' is a separator and HEX(KEY) contains the hex representation of the KEY. The KEY must have exact 16, 24 or 32 bytes size and the selected algorithm (aes_ctr or aes_cbc) with 128, 192 or 256 ciphers will be used.

Note: the key file has the same format as MariaDB 10.1 server so it's possible to use an existing key file (not encrypted) which could contain severalscheme;key values: only key id with value 1 will be parsed, and if not found an error will be reported.

Example key file with multiple keys:

mariadb10_master_gtid

This option allows MaxScale binlog router to register with MariaDB 10.X master using GTID instead of binlog_file name and position in CHANGE MASTER TO admin command. This feature is disabled by default.

The user can set a known GTID or an empty value (in this case the Master server will send events from it's first available binlog file).

Example of MaxScale connection to a MariaDB 10.X Master

If using GTID request then it's no longer possible to use MASTER_LOG_FILE and MASTER_LOG_POS in CHANGE MASTER TO command: an error will be reported.

If this feature is enabled, the transaction_safety option will be automatically enabled. The binlog files will also be stored in a hierarchical directory tree instead of a single directory.

Note:

When the option is On, the connecting slaves can only use GTID request: specifying file and pos will end up in an error sent by MaxScale and replication cannot start.
The GTID request could cause the writing of events in any position of the binlog file, whose name has been sent by the master server before any event. In order to avoid holes in the binlog files, MaxScale will fill all gaps in the binlog files with ignorable events.
It's not possible to specify the GTID _domain_id: the master one is being used for all operations. All slave servers must use the same replication domain as the master server.

master_retry_count

This option sets the maximum number of connection retries when the master server is disconnected or not reachable. Default value is 1000.

connect_retry

The option sets the time interval for a new connection retry to master server. The duration can be specified as documented . If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the parameter is seconds, a value specified in milliseconds will be rejected, even if the duration is longer than a second. The default value is 60 seconds.

A complete example of a service entry for a binlog router service would be as follows.

Using secondary masters

From MaxScale 2.3 onwards it is possible to specify secondary masters that the binlog router can use in case the connection to the default master fails.

Note: This is only supported in a Galera Cluster environment in which:

Wsrep GTID mode is enabled in the cluster.
All of the requirements for wsrep GTID mode are met by the cluster.

Wsrep GTID mode is also imperfect, so this secondary master functionality is only guaranteed to work if GTIDs have not become inconsistent within the cluster.

See for more information.

The initial setup is performed exactly like when there is but one default master.

After the setup of the default master, secondary masters can be configured as follows:

That is, a connection name must be provided and the name must be of the format :N where N is a positive integer. If several secondary masters are specified, they must be numbered consecutively, starting from 2.

All settings that are not explicitly specified are copied from the default master. That is, the following is equivalent with the command above:

If a particular master configuration exists already, then any specified definitions will be changed and unspecified ones will remain unchanged. For instance, the following command would only change the password of :2.

It is not possible to delete a particular secondary master, but ifMASTER_HOST is set on the default master, even if it is set to the same value, then all secondary master configurations are deleted.

When START SLAVE is issued, MaxScale will first attempt to connect to the default master and if that fails, try the secondary masters in order, until a connection can be created. Only if all connection attempts fail, will MaxScale wait as specified with connect_retry, before doing the cycle over again.

Once the binlog router has successfully connected to a server, it will stay connected to that server until the connection breaks or STOP SLAVE is issued.

The configurations of the secondary masters are also stored to themaster.ini in sections whose name include the connection name.

CC BY-SA / Gnu FDL

MaxScale 22.08 HintRouter

HintRouter

HintRouter was introduced in 2.2 and is still beta.

Overview

The HintRouter module is a simple router intended to operate in conjunction with the NamedServerFilter. The router looks at the hints embedded in a packet buffer and attempts to route the packet according to the hint. The user can also set a default action to be taken when a query has no hints or when the hints could not be applied.

If a packet has multiple hints attached, the router will read them in order and attempt routing. Any successful routing ends the process and any further hints are ignored for the packet.

Configuration

The HintRouter is a rather simple router and only accepts a few configuration settings.

This setting defines what happens when a query has no routing hint or applying the routing hint(s) fails. If also the default action fails, the routing will end in error and the session closes. The different values are:

Value

Description

Note that setting default action to anything other than all means that session variable write commands are by default not routed to all backends.

Defines the default backend name if default_action=named. <server-name> must be a valid backend name.

<limit> should be an integer, -1 by default. Defines how many backend slave servers a session should attempt to connect to. Having less slaves defined in the services and/or less successful connections during session creation is not an error. The router will attempt to distribute slaves evenly between sessions by assigning them in a round robin fashion. The session will always try to connect to a master regardless of this setting, although not finding one is not an error.

Negative values activate default mode, in which case this value is set to the number of backends in the service - 1, so that the sessions are connected to all slaves.

If the hints or the default_action point to a named server, this setting is probably best left to default to ensure that the specific server is connected to at session creation. The router will not attempt to connect to additional servers after session creation.

Examples

A minimal configuration doesn't require any parameters as all settings have reasonable defaults.

If packets should be routed to the master server by default and only a few connections are required, the configuration might be as follows.

CC BY-SA / Gnu FDL

MaxScale 22.08 KafkaImporter

KafkaImporter

Overview

The KafkaImporter module reads messages from Kafka and streams them into a MariaDB server. The messages are inserted into a table designated by either the topic name or the message key (see for details). By default the table will be automatically created with the following SQL:

The payload of the message is inserted into the data field from which the id field is calculated. The payload must be a valid JSON object and it must either contain a unique _id field or it must not exist or the value must be a JSON null. This is similar to the MongoDB document format where the _id field is the primary key of the document collection.

If a message is read from Kafka and the insertion into the table fails due to a violation of one of the constraints, the message is ignored. Similarly, messages with duplicate _id value are also ignored: this is done to avoid inserting the same document multiple times whenever the connection to either Kafka or MariaDB is lost.

The limitations on the data can be removed by either creating the table before the KafkaImporter is started, in which case the CREATE TABLE IF NOT EXISTS does nothing, or by altering the structure of the existing table. The minimum requirement that must be met is that the table contains the data field to which string values can be inserted into.

The database server where the data is inserted is chosen from the set of servers available to the service. The first server labeled as the Master with the best rank will be chosen. This means that a monitor must be configured for the MariaDB server where the data is to be inserted.

In MaxScale versions 21.06.18, 22.08.15, 23.02.12, 23.08.8, 24.02.4 and 24.08.2 the _id field is not required to be present. Older versions of MaxScale used the following SQL where the _id field was mandatory:

Required Grants

The user defined by the user parameter of the service must have INSERT andCREATE privileges on all tables that are created.

Parameters

`bootstrap_servers`

Type: string
Mandatory: Yes
Dynamic: Yes

The list of Kafka brokers as a CSV list in host:port format.

`topics`

Type: stringlist
Mandatory: Yes
Dynamic: Yes

The comma separated list of topics to subscribe to.

`batch_size`

Type: count
Mandatory: No
Dynamic: Yes
Default: 100

Maximum number of uncommitted records. The KafkaImporter will buffer records into batches and commit them once either enough records are gathered (controlled by this parameter) or when the KafkaImporter goes idle. Any uncommitted records will be read again if a reconnection to either Kafka or MariaDB occurs.

`kafka_sasl_mechanism`

Type:
Mandatory: No
Dynamic: Yes
Values: PLAIN, SCRAM-SHA-256, SCRAM-SHA-512

SASL mechanism to use. The Kafka broker must be configured with the same authentication scheme.

`kafka_sasl_user`

Type: string
Mandatory: No
Dynamic: Yes
Default: ""

SASL username used for authentication. If this parameter is defined,kafka_sasl_password must also be provided.

`kafka_sasl_password`

Type: string
Mandatory: No
Dynamic: Yes
Default: ""

SASL password for the user. If this parameter is defined, kafka_sasl_user must also be provided.

`kafka_ssl`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Enable SSL for Kafka connections.

`kafka_ssl_ca`

Type: path
Mandatory: No
Dynamic: Yes
Default: ""

SSL Certificate Authority file in PEM format. If this parameter is not defined, the system default CA certificate is used.

`kafka_ssl_cert`

Type: path
Mandatory: No
Dynamic: Yes
Default: ""

SSL public certificate file in PEM format. If this parameter is defined,kafka_ssl_key must also be provided.

`kafka_ssl_key`

Type: path
Mandatory: No
Dynamic: Yes
Default: ""

SSL private key file in PEM format. If this parameter is defined,kafka_ssl_cert must also be provided.

`table_name_in`

Type:
Mandatory: No
Dynamic: Yes
Values: topic, key

The Kafka message part that is used to locate the table to insert the data into.

Enumeration Values:

topic: The topic named is used as the fully qualified table name.
key: The message key is used as the fully qualified table name. If the Kafka message does not have a key, the message is ignored.

For example, all messages with a fully qualified table name of my_db.my_table will be inserted into the table my_table located in the my_db database. If the table or database names have special characters that must be escaped to make them valid identifiers, the name must also contain those escape characters. For example, to insert into a table named my table in the database my database, the name would be:

`timeout`

Type:
Mandatory: No
Dynamic: Yes
Default: 5000ms

Timeout for both Kafka and MariaDB network communication.

`engine`

Type: string
Default: InnoDB
Mandatory: No
Dynamic: Yes

The storage engine used for tables that are created by the KafkaImporter.

This defines the ENGINE table option and must be the name of a valid storage engine in MariaDB. When the storage engine is something other than InnoDB, the table is created without the generated column and the check constraints:

This is done to avoid conflicts where the custom engine does not support all the features that InnoDB supports.

Limitations

The backend servers used by this service must be MariaDB version 10.2 or newer.

CC BY-SA / Gnu FDL

MaxScale 22.08 Mirror

Mirror

MaxScale 22.08 KafkaImporter

KafkaImporter

Overview

Required Grants

The user defined by the user parameter of the service must have INSERT andCREATE privileges on all tables that are created.

Parameters

`bootstrap_servers`

Type: string
Mandatory: Yes
Dynamic: Yes

The list of Kafka brokers as a CSV list in host:port format.

`topics`

Type: stringlist
Mandatory: Yes
Dynamic: Yes

The comma separated list of topics to subscribe to.

`batch_size`

Type: count
Mandatory: No
Dynamic: Yes
Default: 100

`kafka_sasl_mechanism`

Type:
Mandatory: No
Dynamic: Yes
Values: PLAIN, SCRAM-SHA-256, SCRAM-SHA-512

SASL mechanism to use. The Kafka broker must be configured with the same authentication scheme.

`kafka_sasl_user`

Type: string
Mandatory: No
Dynamic: Yes
Default: ""

SASL username used for authentication. If this parameter is defined,kafka_sasl_password must also be provided.

`kafka_sasl_password`

Type: string
Mandatory: No
Dynamic: Yes
Default: ""

SASL password for the user. If this parameter is defined, kafka_sasl_user must also be provided.

`kafka_ssl`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Enable SSL for Kafka connections.

`kafka_ssl_ca`

Type: path
Mandatory: No
Dynamic: Yes
Default: ""

SSL Certificate Authority file in PEM format. If this parameter is not defined, the system default CA certificate is used.

`kafka_ssl_cert`

Type: path
Mandatory: No
Dynamic: Yes
Default: ""

SSL public certificate file in PEM format. If this parameter is defined,kafka_ssl_key must also be provided.

`kafka_ssl_key`

Type: path
Mandatory: No
Dynamic: Yes
Default: ""

SSL private key file in PEM format. If this parameter is defined,kafka_ssl_cert must also be provided.

`table_name_in`

Type:
Mandatory: No
Dynamic: Yes
Values: topic, key

The Kafka message part that is used to locate the table to insert the data into.

Enumeration Values:

topic: The topic named is used as the fully qualified table name.
key: The message key is used as the fully qualified table name. If the Kafka message does not have a key, the message is ignored.

`timeout`

Type:
Mandatory: No
Dynamic: Yes
Default: 5000ms

Timeout for both Kafka and MariaDB network communication.

`engine`

Type: string
Default: InnoDB
Mandatory: No
Dynamic: Yes

The storage engine used for tables that are created by the KafkaImporter.

This is done to avoid conflicts where the custom engine does not support all the features that InnoDB supports.

Limitations

The backend servers used by this service must be MariaDB version 10.2 or newer.

CC BY-SA / Gnu FDL

MaxScale 22.08 Binlogrouter 2.4

Binlogrouter 2.4

NOTE This is the documentation for the binlogrouter in MaxScale 2.4 and is only provided for reference. The documentation for the binlogrouter in MaxScale 2.5 is provided here.

Introduction

Configuration

Mandatory Router Parameters

The binlogrouter requires the user and password parameters. These should be configured according to the .

In addition to these two parameters, the server_id and binlogdir parameters needs to be defined.

Router Parameters

The binlogrouter accepts the following parameters.

binlogdir

This parameter controls the location where MariaDB MaxScale stores the binary log files. This is a mandatory parameter.

Read the documentation for instructions on how to define a custom location for the user cache.

server_id

MariaDB MaxScale must have a unique server_id. This parameter configures the value of the server_id that MariaDB MaxScale will use when connecting to the master. This is a mandatory parameter.

Older versions of MaxScale allowed the ID to be specified using server-id. This has been deprecated and will be removed in a future release of MariaDB MaxScale.

master_id

The server_id value that MariaDB MaxScale should use to report to the slaves that connect to MariaDB MaxScale.

Older versions of MaxScale allowed the ID to be specified using master-id. This has been deprecated and will be removed in a future release of MariaDB MaxScale.

uuid

This is used to set the unique UUID that the binlog router uses when it connects to the master server. By default the UUID will be generated.

master_uuid

It is a requirement of replication that each server has a unique UUID value. If this option is not set, binlogrouter will identify itself to the slaves using the UUID of the real master.

master_version

By default, the router will identify itself to the slaves using the server version of the real master. This option allows the router to use a custom version string.

master_hostname

By default, the router will identify itself to the slaves using the hostname of the real master. This option allows the router to use a custom hostname.

slave_hostname

user

Note: This is option can only be given to the router_options parameter. Use the user parameter of the service instead.

This user is the only one available for MySQL connection to MaxScale Binlog Server for administration when master connection is not done yet.

In MaxScale 2.1, the service user injection is done by the MySQLAuth authenticator module. Read the documentation for more details.

The user that is used for replication must be granted replication privileges on the database server.

password

Note: This is option can only be given to the router_options parameter. Use the password parameter of the service instead.

heartbeat

burstsize

This parameter is used to define the maximum amount of data that will be sent to a slave by MariaDB MaxScale when that slave is lagging behind the master. The default value is 1M.

mariadb10-compatibility

Additionally, since MaxScale 2.2.1, MariaDB 10.x slave servers can connect to binlog server using GTID value instead of binlog name and position.

Example of a MariaDB 10.x slave connection to MaxScale

Note:

Slave servers can connect either with file and pos or GTID.
MaxScale saves all the incoming MariaDB GTIDs (DDLs and DMLs) in a sqlite3 database located in binlogdir (gtid_maps.db). When a slave server connects with a GTID request a lookup is made for the value match and following binlog events will be sent.

transaction_safety

This parameter is used to enable/disable incomplete transactions detection in binlog router. The default value is off.

send_slave_heartbeat

If value is set to true the interval value (requested by the slave during registration) is reported in the diagnostic output and the packet is send after the time interval without any event to send.

semisync

This parameter controls whether binlog server could ask Master server to start the Semi-Synchronous replication. This parameter takes a boolean value and the default value is false.

Note:

the network replication stream from Master has two additional bytes before each binlog event.
the Semi-Sync protocol requires an acknowledge packet to be sent back to Master only when requested: the semi-sync flag will have value of 1. This flag is set only if rpl_semi_sync_master_enabled=1 is set in the Master, otherwise it will always have value of 0 and no ack packet is sent back.

Please note that semi-sync replication is only related to binlog server to Master communication.

ssl_cert_verification_depth

encrypt_binlog

Whether to encrypt binlog files: the default is off.

When set to on the binlog files will be encrypted using specified AES algorithm and the KEY in the specified key file.

encryption_algorithm

The encryption algorithm, either 'aes_ctr' or 'aes_cbc'. The default is 'aes_cbc'

encryption_key_file

The specified key file must contains lines with following format:

id;HEX(KEY)

Example key file with multiple keys:

mariadb10_master_gtid

The user can set a known GTID or an empty value (in this case the Master server will send events from it's first available binlog file).

Example of MaxScale connection to a MariaDB 10.X Master

If using GTID request then it's no longer possible to use MASTER_LOG_FILE and MASTER_LOG_POS in CHANGE MASTER TO command: an error will be reported.

If this feature is enabled, the transaction_safety option will be automatically enabled. The binlog files will also be stored in a hierarchical directory tree instead of a single directory.

Note:

When the option is On, the connecting slaves can only use GTID request: specifying file and pos will end up in an error sent by MaxScale and replication cannot start.
The GTID request could cause the writing of events in any position of the binlog file, whose name has been sent by the master server before any event. In order to avoid holes in the binlog files, MaxScale will fill all gaps in the binlog files with ignorable events.
It's not possible to specify the GTID _domain_id: the master one is being used for all operations. All slave servers must use the same replication domain as the master server.

master_retry_count

This option sets the maximum number of connection retries when the master server is disconnected or not reachable. Default value is 1000.

connect_retry

A complete example of a service entry for a binlog router service would be as follows.

Using secondary masters

From MaxScale 2.3 onwards it is possible to specify secondary masters that the binlog router can use in case the connection to the default master fails.

Note: This is only supported in a Galera Cluster environment in which:

Wsrep GTID mode is enabled in the cluster.
All of the requirements for wsrep GTID mode are met by the cluster.

Wsrep GTID mode is also imperfect, so this secondary master functionality is only guaranteed to work if GTIDs have not become inconsistent within the cluster.

See for more information.

The initial setup is performed exactly like when there is but one default master.

After the setup of the default master, secondary masters can be configured as follows:

All settings that are not explicitly specified are copied from the default master. That is, the following is equivalent with the command above:

Once the binlog router has successfully connected to a server, it will stay connected to that server until the connection breaks or STOP SLAVE is issued.

The configurations of the secondary masters are also stored to themaster.ini in sections whose name include the connection name.

CC BY-SA / Gnu FDL

MaxScale 22.08 Binlogrouter

Binlogrouter

NOTE: The binlog router delivered with 2.5 is completely new and is not 100% backward compatible with the binlog router delivered with earlier versions of MaxScale.

The binlogrouter is a router that acts as a replication proxy for MariaDB master-slave replication. The router connects to a master, retrieves the binary logs and stores them locally. Slave servers can connect to MaxScale like they would connect to a normal master server. If the master server goes down, replication between MaxScale and the slaves can still continue up to the latest point to which the binlogrouter replicated to. The master can be changed without disconnecting the slaves and without them noticing that the master server has changed. This allows for a more highly available replication setup.

In addition to the high availability benefits, the binlogrouter creates only one connection to the master whereas with normal replication each individual slave will create a separate connection. This reduces the amount of work the master database has to do which can be significant if there are a large number of replicating slaves.

Differences Between Old and New Binlogrouter Implementations

The binlogrouter in MaxScale 2.5.0 is a new and improved version of the original binlogrouter found in older MaxScale versions. The new implementation contains most of the features that were in the original binlogrouter but some of them were removed as they were either redundant or not useful.

The major differences between the new and old binlog router are:

The list of servers where the database users for authentication are loaded must be explicitly configured with the cluster, servers ortargets parameter. Alternatively, the users can be read from a file. See for more information.
The old binlog router had both server_id and master_id, the new onlyserver_id.

The documentation for the binlogrouter in MaxScale 2.4 is provided for reference .

Supported SQL Commands

The binlogrouter supports a subset of the SQL constructs that the MariaDB server supports. The following commands are supported:

CHANGE MASTER TO
The binlogrouter supports the same syntax as the MariaDB server but only the following values are allowed:
- MASTER_HOST

NOTE: MASTER_LOG_FILE and MASTER_LOG_POS are not supported as binlogrouter only supports GTID based replication.

STOP SLAVE
Stops replication, same as MariaDB.
START SLAVE
Starts replication, same as MariaDB.

Configuration Parameters

The binlogrouter is configured similarly to how normal routers are configured in MaxScale. It requires at least one listener where clients can connect to and one server from which the database user information can be retrieved. An example configuration can be found in the section of this document.

`datadir`

Type: path
Mandatory: No
Dynamic: No
Default: /var/lib/maxscale/binlogs

Directory where binary log files are stored.

`server_id`

Type: number
Mandatory: No
Dynamic: No
Default: 1234

The server ID that MaxScale uses when connecting to the master and when serving binary logs to the slaves.

`net_timeout`

Type:
Mandatory: No
Dynamic: No
Default: 10s

Network connection and read timeout in seconds for the connection to the master.

`select_master`

Type:
Mandatory: No
Dynamic: No
Default: false

Automatically select the master server to replicate from.

When this feature is enabled, the master which binlogrouter will replicate from will be selected from the servers defined by a monitor cluster=TheMonitor. Alternatively servers can be listed in servers. The servers should be monitored by a monitor. Only servers with the Master status are used. If multiple master servers are available, the first available master server will be used.

If a CHANGE MASTER TO command is received while select_master is on, the command will be honored and select_master turned off until the next reboot. This allows the Monitor to perform failover, and more importantly, switchover. It also allows the user to manually redirect the Binlogrouter. The current master is "sticky", meaning that the same master will be chosen on reboot.

NOTE: Do not use the mariadbmon parameter if the monitor is monitoring a binlogrouter. The binlogrouter does not support all the SQL commands that the monitor will send and the rejoin will fail. This restriction will be lifted in a future version.

The GTID the replication will start from, will be based on the latest replicated GTID. If no GTID has been replicated, the router will start replication from the start. Manual configuration of the GTID can be done by first configuring the replication manually with CHANGE MASTER TO.

`expire_log_duration`

Type:
Mandatory: No
Dynamic: No
Default: 0s

Duration after which a binary log file can be automatically removed.

The duration is measured from the last modification of the log file. Files are purged in the order they were created. The automatic purge works in a similar manner to PURGE BINARY LOGS TO <filename> in that it will stop the purge if an eligible file is in active use, i.e. being read by a slave.

`expire_log_minimum_files`

Type: number
Mandatory: No
Dynamic: No
Default: 2

The minimum number of log files the automatic purge keeps. At least one file is always kept.

`ddl_only`

Type: boolean
Default: false
Dynamic: No

When enabled, only DDL events are written to the binary logs. This means thatCREATE, ALTER and DROP events are written but INSERT, UPDATE andDELETE events are not.

This mode can be used to keep a record of all the schema changes that occur on a database. As only the DDL events are stored, it becomes very easy to set up an empty server with no data in it by simply pointing it at a binlogrouter instance that has ddl_only enabled.

`encryption_key_id`

Type: string
Mandatory: No
Dynamic: No
Default: ""

Encryption key ID used to encrypt the binary logs. If configured, an must also be configured and it must contain the key with the given ID. If the encryption key manager supports versioning, new binary logs will be encrypted using the latest encryption key. Old binlogs will remain encrypted with older key versions and remain readable as long as the key versions used to encrypt them are available.

Once binary log encryption has been enabled, the encryption key ID cannot be changed and the key must remain available to MaxScale in order for replication to work. If an encryption key is not available or the key manager fails to retrieve it, the replication from the currently selected master server will stop. If the replication is restarted manually, the encryption key retrieval is attempted again.

Re-encryption of binlogs using another encryption key is not possible. However, this is possible if the data is replicated to a second MaxScale server that uses a different encryption key. The same approach can also be used to decrypt binlogs.

`encryption_cipher`

Type:
Mandatory: No
Dynamic: No
Values: AES_CBC, AES_CTR, AES_GCM

The encryption cipher to use. The encryption key size also affects which mode is used: only 128, 192 and 256 bit encryption keys are currently supported.

Possible values are:

AES_GCM (default)
.
AES_CBC
.

New installation

Configure and start MaxScale.
If you have not configured select_master=true (automatic master selection), issue a CHANGE MASTER TO command to binlogrouter.

Redirect each slave to replicate from Binlogrouter

Upgrading to version 2.5

Binlogrouter does not read any of the data that a version prior to 2.5 has saved. By default binlogrouter will request the replication stream from the blank state (from the start of time), which is basically meant for new systems. If a system is live, the entire replication data probably does not exist, and if it does, it is not necessary for binlogrouter to read and store all the data.

Before you start

Note that binlogrouter only supports GTID based replication.
Make sure that the configured data directory for the new binlogrouter is different from the old one, or move old data away. See .
If the master contains binlogs from the blank state, and there is a large amount of data, consider purging old binlogs. See .

Deployment

The method described here inflicts the least downtime. Assuming you have configured version 2.5, and it is ready to go:

Redirect each slave that replicates from Binlogrouter to replicate from the master.

Stop the old version of MaxScale, and start the new one. Verify routing functionality.
Issue a CHANGE MASTER TO command, or use .

Run maxctrl list servers. Make sure all your servers are accounted for. Pick the lowest gtid state (e.g. 0-1000-1234,1-1001-5678) on display and issue this command to Binlogrouter:

NOTE: Even with select_master=true you have to set @@global.gtid_slave_pos if any binlog files have been purged on the master. The server will only stream from the start of time if the first binlog file is present. See .

Redirect each slave to replicate from Binlogrouter.

Galera cluster

When replicating from a Galera cluster, must be set to true, and the servers must be monitored by the . Configuring binlogrouter is the same as described above.

The Galera cluster must be configured to use .

The MariaDB version must be 10.5.1 or higher. The required GTID related server settings for MariaDB/Galera to work with Binlogrouter are listed here:

Example

The following is a small configuration file with automatic master selection. With it, the service will accept connections on port 3306.

Limitations

Old-style replication with binlog name and file offset is not supported and the replication must be started by setting up the GTID to replicate from.
Only replication from MariaDB servers (including Galera) is supported.
Old encrypted binary logs are not re-encrypted with newer key versions ()
The MariaDB server where the replication is done from must be configured withbinlog_checksum=CRC32

CC BY-SA / Gnu FDL

MaxScale 22.08 SchemaRouter

SchemaRouter

The SchemaRouter provides an easy and manageable sharding solution by building a single logical database server from multiple separate ones. Each database is shown to the client and queries targeting unique databases are routed to their respective servers. In addition to providing simple database-based sharding, the schemarouter also enables cross-node session variable usage by routing all queries that modify the session to all nodes.

By default the SchemaRouter assumes that each database and table is only located on one server. If it finds the same database or table on multiple servers, it will close the session with the following error:

The exception to this rule are the system tables mysql, information_schema,performance_schema, sys that are never treated as duplicates.

If duplicate tables are expected, use the parameter to controls which duplicate tables are allowed. To disable the duplicate database detection, useignore_tables_regex=.*.

Schemarouter compares table and database names case-insensitively. This means that the tables test.t1 and test.T1 are assumed to refer to the same table.

The main limitation of SchemaRouter is that aside from session variable writes and some specific queries, a query can only target one server. This means that queries which depend on results from multiple servers give incorrect results. See for more information.

From 2.3.0 onwards, SchemaRouter is capable of limited table family sharding.

Changes in Version 6

The auth_all_servers parameter is no longer automatically enabled by the schemarouter. To retain the old behavior that was present in 2.5, explicitly define auth_all_servers=true for all schemarouter services.

Routing Logic

If a command modifies the session state by modifying any session or user variables, the query is routed to all nodes. These statements include SET statements as well as any other statements that modify the behavior of the client.
If a client changes the default database after connecting, either with a USE <db> query or a COM_INIT_DB command, the query is routed to all servers that contain the database. This same logic applies when a client connects with a default database: the default database is set only on servers that actually contain it.
If a query targets one or more tables that the schemarouter has discovered during the database mapping phase, the query is only routed if a server is found that contains all of the tables that the query uses. If no such server is found, the query is routed to the server that was previously used or to the first available backend if none have been used. If a query uses a table but doesn't define the database it is in, it is assumed to be located on the default database of the connection.

This means that all administrative commands, replication related command as well as certain transaction control statements (XA transaction) are routed to the first available server in certain cases. To avoid problems, use routing hints to direct where these statements should go.

Starting with MaxScale 6.4.5, transaction control commands (BEGIN, COMMIT and ROLLBACK) are routed to all nodes. Older versions of MaxScale routed the queries to the first available backend. This means that cross-shard transactions are technically possible but, without external synchronization, the transactions are not guaranteed to be globally consistent.
LOAD DATA LOCAL INFILE commands are routed to the first available server that contains the tables listed in the query.

Custom SQL Commands

To check how databases and tables map to servers, execute the special querySHOW SHARDS. The query does not support any modifiers such as LIKE.

The schemarouter will also intercept the SHOW DATABASES command and generate it based on its internal data. This means that newly created databases will not show up immediately and will only be visible when the cached data has been updated.

Database Mapping

The schemarouter maps each of the servers to know where each database and table is located. As each user has access to a different set of tables and databases, the result is unique to the username and the set of servers that the service uses. These results are cached by the schemarouter. The lifetime of the cached result is controlled by the refresh_interval parameter.

When a server needs to be mapped, the schemarouter will route a query to each of the servers using the client's credentials. While this query is being executed, all other sessions that would otherwise share the cached result will wait for the update to complete. This waiting functionality was added in MaxScale 2.4.19, older versions did not wait for existing updates to finish and would perform parallel database mapping queries.

Configuration

Here is an example configuration of the schemarouter:

The module generates the list of databases based on the servers parameter using the connecting client's credentials. The user and password parameters define the credentials that are used to fetch the authentication data from the database servers. The credentials used only require the same grants as mentioned in the configuration documentation.

The list of databases is built by sending a SHOW DATABASES query to all the servers. This requires the user to have at least USAGE and SELECT grants on the databases that need be sharded.

If you are connecting directly to a database or have different users on some of the servers, you need to get the authentication data from all the servers. You can control this with the auth_all_servers parameter. With this parameter, MariaDB MaxScale forms a union of all the users and their grants from all the servers. By default, the schemarouter will fetch the authentication data from all servers.

For example, if two servers have the database shard and the following rights are granted only on one server, all queries targeting the databaseshard would be routed to the server where the grants were given.

This would in effect allow the user 'john' to only see the database 'shard' on this server. Take notice that these grants are matched against MariaDB MaxScale's hostname instead of the client's hostname. Only user authentication uses the client's hostname and all other grants use MariaDB MaxScale's hostname.

Router Parameters

`ignore_tables`

Type: stringlist
Mandatory: No
Dynamic: Yes
Default: ""

List of full table names (e.g. db1.t1) to ignore when checking for duplicate tables. By default no tables are ignored.

This parameter was once called ignore_databases.

`ignore_tables_regex`

Type:
Mandatory: No
Dynamic: No
Default: ""

A that is matched against database names when checking for duplicate databases. By default no tables are ignored.

The following configuration ignores duplicate tables in the databases db1 and db2, and all tables starting with "t" in db3.

This parameter was once called ignore_databases_regex.

`max_sescmd_history`

This parameter has been moved to in MaxScale 6.0.

`disable_sescmd_history`

This parameter has been moved to in MaxScale 6.0.

`refresh_databases`

Type:
Mandatory: No
Dynamic: No
Default: false

Enable database map refreshing mid-session. These are triggered by a failure to change the database i.e. USE ... queries. This feature is disabled by default.

Before MaxScale 6.2.0, this parameter did nothing. Starting with the 6.2.0 release of MaxScale this parameter now works again but it is disabled by default to retain the same behavior as in older releases.

`refresh_interval`

Type:
Mandatory: No
Dynamic: Yes
Default: 300s

The minimum interval between database map refreshes in seconds. The default value is 300 seconds.

The interval is specified as documented . If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the intervaltimeout is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.

Table Family Sharding

This functionality was introduced in 2.3.0.

If the same database exists on multiple servers, but the database contains different tables in each server, SchemaRouter is capable of routing queries to the right server, depending on which table is being addressed.

As an example, suppose the database db exists on servers server1 and server2, but that the database on server1 contains the table tbl1 and on server2 contains the table tbl2. The query SELECT * FROM db.tbl1 will be routed to server1 and the querySELECT * FROM db.tbl2 will be routed to server2. As in the example queries, the table names must be qualified with the database names for table-level sharding to work. Specifically, the query series below is not supported.

Router Diagnostics

The router_diagnostics output for a schemarouter service contains the following fields.

queries: Number of queries executed through this service.
sescmd_percentage: The percentage of queries that were session commands.
longest_sescmd_chain: The largest amount of session commands executed by one client session.

Limitations

Cross-database queries (e.g. SELECT column FROM database1.table UNION select column FROM database2.table) are not properly supported. Such queries are routed either to the first explicit database in the query, the current database in use or to the first available database, depending on which succeeds.
Without a default database, queries that do not use fully qualified table names and which do not modify the session state (e.g. SELECT * FROM t1) will be routed to the first available server. This includes queries such as explicit transaction commands (BEGIN, COMMIT, ROLLBACK), all non-table CREATE commands (

Examples

is a small tutorial on how to set up a sharded database.

CC BY-SA / Gnu FDL

MaxScale 22.08 KafkaCDC

KafkaCDC

Overview

The KafkaCDC module reads data changes in MariaDB via replication and converts them into JSON objects that are then streamed to a Kafka broker.

DDL events (CREATE TABLE, ALTER TABLE) are streamed as JSON objects in the following format (example created by CREATE TABLE test.t1(id INT)):

The domain, server_id and sequence fields contain the GTID that this event belongs to. The event_number field is the sequence number of events inside the transaction starting from 1. The timestamp field is the UNIX timestamp when the event occurred. The event_type field contains the type of the event, one of:

insert: the event is the data that was added to MariaDB
delete: the event is the data that was removed from MariaDB
update_before: the event contains the data before an update statement modified it

All remaining fields contains data from the table. In the example event this would be the fields id and data.

The sending of these schema objects is optional and can be disabled usingsend_schema=false.

DML events (INSERT, UPDATE, DELETE) are streamed as JSON objects that follow the format specified in the DDL event. The objects are in the following format (example created by INSERT INTO test.t1 VALUES (1)):

The table_name and table_schema fields were added in MaxScale 2.5.3. These contain the table name and schema the event targets.

The router stores table metadata in the MaxScale data directory. The default value is /var/lib/maxscale/<service name>. If data for a table is replicated before a DDL event for it is replicated, the CREATE TABLE will be queried from the master server.

During shutdown, the Kafka event queue is flushed. This can take up to 60 seconds if the network is slow or there are network problems.

Configuration

In order for kafkacdc to work, the binary logging on the source server must be configured to use row-based replication and the row image must be set to full by configuring binlog_format=ROW and binlog_row_image=FULL.
The servers parameter defines the set of servers where the data is replicated from. The replication will be done from the first master server that is found.
The

Parameters

`bootstrap_servers`

Type: string
Mandatory: Yes
Dynamic: No

The list of Kafka brokers to use in host:port format. Multiple values can be separated with commas. This is a mandatory parameter.

`topic`

Type: string
Mandatory: Yes
Dynamic: No

The Kafka topic where the replicated events will be sent. This is a mandatory parameter.

`enable_idempotence`

Type:
Mandatory: No
Dynamic: No
Default: false

Enable idempotent producer mode. This feature requires Kafka version 0.11 or newer to work and is disabled by default.

When enabled, the Kafka producer enters a strict mode which avoids event duplication due to broker outages or other network errors. In HA scenarios where there are more than two MaxScale instances, event duplication can still happen as there is no synchronization between the MaxScale instances.

The Kafka C library,, describes the parameter as follows:

When set to true, the producer will ensure that messages are successfully produced exactly once and in the original produce order. The following configuration properties are adjusted automatically (if not modified by the user) when idempotence is enabled: max.in.flight.requests.per.connection=5 (must be less than or equal to 5), retries=INT32_MAX (must be greater than 0), acks=all, queuing.strategy=fifo.

`timeout`

Type:
Mandatory: No
Dynamic: Yes
Default: 10s

The connection and read timeout for the replication stream.

`gtid`

Type: string
Mandatory: No
Dynamic: No
Default: ""

The initial GTID position from where the replication is started. By default the replication is started from the beginning. The value of this parameter is only used if no previously replicated events with GTID positions can be retrieved from Kafka.

Once the replication has started and a GTID position has been recorded, this parameter will be ignored. To reset the recorded GTID position, delete thecurrent_gtid.txt file located in /var/lib/maxscale/<SERVICE>/ where<SERVICE> is the name of the KafkaCDC service.

`server_id`

Type: number
Mandatory: No
Dynamic: No
Default: 1234

The used when replicating from the master in direct replication mode. The default value is 1234. This parameter was added in MaxScale 2.5.7.

`match`

Type:
Mandatory: No
Dynamic: Yes
Default: ""

Only include data from tables that match this pattern.

For example, if configured with match=accounts[.].*, only data from theaccounts database is sent to Kafka.

The pattern is matched against the combined database and table name separated by a period. This means that the event for the table t1 in the test database would appear as test.t1. The behavior is the same even if the database or the table name contains a period. This means that an event for the test.table table in the my.data database would appear as my.data.test.table.

Here is an example configuration that only sends events for tables from thedb1 database. The accounts and users tables in the db1 database are filtered out using the exclude parameter.

`exclude`

Type:
Mandatory: No
Dynamic: Yes
Default: ""

Exclude data from tables that match this pattern.

For example, if configured with exclude=mydb[.].*, all data from the tables in the mydb database is not sent to Kafka.

The pattern matching works the same way for both of the exclude and match parameters. See for an explanation on how the patterns are matched against the database and table names.

`cooperative_replication`

Type:
Mandatory: No
Dynamic: No
Default: false

Controls whether multiple instances cooperatively replicate from the same cluster. This is a boolean parameter and is disabled by default. It was added in MaxScale 6.0.

When this parameter is enabled and the monitor pointed to by the cluster parameter supports cooperative monitoring (currently only mariadbmon), the replication is only active if the monitor owns the cluster it is monitoring.

Whenever an instance that does not own the cluster gains ownership of the cluster, the replication will continue from the latest GTID that was delivered to Kafka.

This means that multiple MaxScale instances can replicate from the same set of servers and the event is only processed once. This feature does not provide exactly-once semantics for the Kafka event delivery. However, it does provide high-availability for the kafkacdc instances which allows automated failover between multiple MaxScale instances.

`send_schema`

Type:
Mandatory: No
Dynamic: Yes
Default: true

Send JSON schema object into the stream whenever the table schema changes. These events, as described , can be used to detect whenever the format of the data being sent changes.

If this information in these schema change events is not needed or the code that processes the Kafka stream can't handle them, they can be disabled with this parameter.

`read_gtid_from_kafka`

Type:
Mandatory: No
Dynamic: No
Default: true

On startup, the latest GTID is by default read from the Kafka cluster. This makes it possible to recover the replication position stored by another MaxScale. Sometimes this is not desirable and the GTID should only be read from the local file or started anew. Examples of these are when the GTIDs are reset or the replication topology has changed.

`kafka_ssl`

Type:
Mandatory: No
Dynamic: No
Default: false

Enable SSL for Kafka connections. This is a boolean parameter and is disabled by default.

`kafka_ssl_ca`

Type: path
Mandatory: No
Dynamic: No
Default: ""

Path to the certificate authority file in PEM format. If this is not provided, the default system certificates will be used.

`kafka_ssl_cert`

Type: path
Mandatory: No
Dynamic: No
Default: ""

Path to the public certificate in PEM format.

The client must provide a certificate if the Kafka server performs authentication of the client certificates. This feature is enabled by default in Kafka and is controlled by .

If kafka_ssl_cert is provided, kafka_ssl_key must also be provided.

`kafka_ssl_key`

Type: path
Mandatory: No
Dynamic: No
Default: ""

Path to the private key in PEM format.

If kafka_ssl_key is provided, kafka_ssl_cert must also be provided.

`kafka_sasl_user`

Type: string
Mandatory: No
Dynamic: No
Default: ""

Username for SASL authentication.

If kafka_sasl_user is provided, kafka_sasl_password must also be provided.

`kafka_sasl_password`

Type: string
Mandatory: No
Dynamic: No
Default: ""

Password for SASL authentication.

If kafka_sasl_password is provided, kafka_sasl_user must also be provided.

`kafka_sasl_mechanism`

Type:
Mandatory: No
Dynamic: No
Values: PLAIN, SCRAM-SHA-256, SCRAM-SHA-512

The SASL mechanism used. The default value is PLAIN which uses plaintext authentication. It is recommended to enable SSL whenever plaintext authentication is used.

Allowed values are:

PLAIN
SCRAM-SHA-256
SCRAM-SHA-512

The value that should be used depends on the SASL mechanism used by the Kafka broker.

Example Configuration

The following configuration defines the minimal setup for streaming replication events from MariaDB into Kafka as JSON:

Limitations

The KafkaCDC module provides at-least-once semantics for the generated events. This means that each replication event is delivered to kafka at least once but there can be duplicate events in case of failures.

CC BY-SA / Gnu FDL

MaxScale 22.08 Readconnroute

Readconnroute

This document provides an overview of the readconnroute router module and its intended use case scenarios. It also displays all router configuration parameters with their descriptions.

Overview

The readconnroute router provides simple and lightweight load balancing across a set of servers.

Note that *readconnroute balances connections and not statements. When a client connects, the router selects the server that matches the value ofrouter_options and has the least number of connections. Once the connection is opened, it will not be changed for the duration of the session. If the connection between MaxScale and the server breaks, the connection can not be re-established and the client session will be closed. The fact that the server is fixed when the client connects also means that routing hints are ignored.

Connections from other MaxScale instances or connections done directly on a database are not taken into account. Only connections done through the same Maxscale instance are taken into account.

Warning: readconnroute will not prevent writes from being done even if you define router_options=slave. The client application is responsible for making sure that it only performs read-only queries in such cases. readconnroute is simple by design: it selects a server for each client connection and routes all queries there. If something more complex is required, the router is usually the right choice.

Configuration

For more details about the standard service parameters, refer to the .

`router_options`

Type:
Mandatory: No
Dynamic: Yes
Values: master, slave, synced

router_options can contain a comma separated list of valid server roles. These roles are used as the valid types of servers the router will form connections to when new sessions are created.

Examples:

Here is a list of all possible values for the router_options.

Role

Description

If no router_options parameter is configured in the service definition, the router will use the default value of running. This means that it will load balance connections across all running servers defined in the servers parameter of the service.

When a connection is being created and the candidate server is being chosen, the list of servers is processed in from first entry to last. This means that if two servers with equal rank and number of connections are found, the one that's listed first in the servers parameter for the service is chosen.

When using router_options=slave, only servers with the Slave status are used. If there are no servers with the Slave status but there is a Master status, it will be used as the fallback server. Note that the use ofrouter_options=slave does not prevent writes from being done and the client application is responsible for making sure that no writes are done on a Slave server.

`master_accept_reads`

Type:
Mandatory: No
Dynamic: Yes
Default: true

This option can be used to prevent queries from being sent to the current master. If router_options does not contain "master", the readconnroute instance is usually meant for reading. Setting master_accept_reads=false excludes the master from server selection (and thus from receiving reads).

If router_options contains "master", the setting of master_accept_reads has no effect.

By default master_accept_reads=true.

`max_replication_lag`

Type:
Mandatory: No
Dynamic: Yes
Default: 0s

The maximum acceptable replication lag. The value is in seconds and is specified as documented . The default value is 0s, which means that the lag is ignored.

The replication lag of a server must be less than the configured value in order for it to be used for routing. To configure the router to not allow any lag, use the smallest duration larger than 0, that is, max_replication_lag=1s.

Examples

The most common use for the readconnroute is to provide either a read or write port for an application. This provides a more lightweight routing solution than the more complex readwritesplit router but requires the application to be able to use distinct write and read ports.

To configure a read-only service that tolerates master failures, we first need to add a new section in to the configuration file.

Here the router_options designates slaves as the only valid server type. With this configuration, the queries are load balanced across the slave servers.

For more complex examples of the readconnroute router, take a look at the examples in the folder.

Router Diagnostics

The router_diagnostics output for readconnroute has the following fields.

queries: Number of queries executed through this service.

Limitations

Sending of binary data with LOAD DATA LOCAL INFILE is not supported.
The router will never reconnect to the server it initially connected to.

CC BY-SA / Gnu FDL

MaxScale 22.08 Readwritesplit

Readwritesplit

This document provides a short overview of the readwritesplit router module and its intended use case scenarios. It also displays all router configuration parameters with their descriptions. A list of current limitations of the module is included and use examples are provided.

Overview

The readwritesplit router is designed to increase the read-only processing capability of a cluster while maintaining consistency. This is achieved by splitting the query load into read and write queries. Read queries, which do not modify data, are spread across multiple nodes while all write queries will be sent to a single node. For more details on how the load balancing works, refer to and .

The router is designed to be used with a traditional Master-Slave replication cluster. It automatically detects changes in the master server and will use the current master server of the cluster. With a Galera cluster, one can achieve a resilient setup and easy master failover by using one of the Galera nodes as a Write-Master node, where all write queries are routed, and spreading the read load over all the nodes.

Interaction with servers in `Maintenance` and `Draining` state

When a server that readwritesplit uses is put into maintenance mode, any ongoing requests are allowed to finish before the connection is closed. If the server that is put into maintenance mode is a master, open transaction are allowed to complete before the connection is closed. Note that this means neither idle session nor long-running transactions will be closed by readwritesplit. To forcefully close the connections, use the following command:

If a server is put into the Draining state while a connection is open, the connection will be used normally. Whenever a new connection needs to be created, whether that be due to a network error or when a new session being opened, only servers that are neither Draining nor Drained will be used.

Configuration

Readwritesplit router-specific settings are specified in the configuration file of MariaDB MaxScale in its specific section. The section can be freely named but the name is used later as a reference in a listener section.

For more details about the standard service parameters, refer to the .

Starting with 2.3, all router parameters can be configured at runtime. Usemaxctrl alter service to modify them. The changed configuration will only be taken into use by new sessions.

Parameters

`max_slave_connections`

Type: integer
Mandatory: No
Dynamic: Yes
Default: 255

max_slave_connections sets the maximum number of slaves a router session uses at any moment. The default is to use at most 255 slave connections per client connection. In older versions the default was to use all available slaves with no limit.

For MaxScale 2.5.12 and newer, the minimum value is 0.

For MaxScale versions 2.5.11 and older, the minimum value is 1. These versions suffer from a bug () that causes the parameter to accept any values but only function when a value greater than one was given.

Starting with MaxScale 2.5.0, the use of percentage values inmax_slave_connections is deprecated. The support for percentages will be removed in a future release.

For example, if you have configured MaxScale with one master and three slaves and set max_slave_connections=2, for each client connection a connection to the master and two slave connections would be opened. The read query load balancing is then done between these two slaves and writes are sent to the master.

By tuning this parameter, you can control how dynamic the load balancing is at the cost of extra created connections. With a lower value ofmax_slave_connections, less connections per session are created and the set of possible slave servers is smaller. With a higher value inmax_slave_connections, more connections are created which requires more resources but load balancing will almost always give the best single query response time and performance. Longer sessions are less affected by a highmax_slave_connections as the relative cost of opening a connection is lower.

Behavior of max_slave_connections=0

When readwritesplit is configured with max_slave_connections=0, readwritesplit will behave slightly differently in that it will route all reads to the current master server. This is a convenient way to force all of the traffic to go to a single node while still being able to leverage the replay and reconnection features of readwritesplit.

In this mode, the behavior of master_failure_mode=fail_on_write also changes slightly. If the current Master server fails and a read is done when there's no other Master server available, the connection will be closed. This is done to prevent an extra slave connection from being opened that would not be closed if a new Master server would arrive.

`slave_connections`

Type: integer
Mandatory: No
Dynamic: Yes
Default: 255

This parameter controls how many slave connections each new session starts with. The default value is 255 which is the same as the default value ofmax_slave_connections.

In contrast to max_slave_connections, slave_connections serves as a soft limit on how many slave connections are created. The number of slave connections can exceed slave_connections if the load balancing algorithm finds an unconnected slave server better than all other slaves.

Setting this parameter to 1 allows faster connection creation and improved resource usage due to the smaller amount of initial backend connections. It is recommended to use slave_connections=1 when the lifetime of the client connections is short.

`max_slave_replication_lag`

Type:
Mandatory: No
Dynamic: Yes
Default: 0s

Specify how many seconds a slave is allowed to be behind the master. The lag of a slave must be less than the configured value in order for it to be used for routing. If set to 0 (the default value), the feature is disabled.

In MaxScale 2.5.0, the slave lag must be less than max_slave_replication_lag whereas in older versions the slave lag had to be less than or equal tomax_slave_replication_lag. This means that in MaxScale 2.5.0 it is possible to define, with max_slave_replication_lag=1, that all slaves must be up to date in order for them to be used for routing.

Note that this feature does not guarantee that writes done on the master are visible for reads done on the slave. This is mainly due to the method of replication lag measurement. For a feature that guarantees this, refer to .

The lag is specified as documented . If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the lag is seconds, a lag specified in milliseconds will be rejected, even if the duration is longer than a second.

The Readwritesplit-router does not detect the replication lag itself. A monitor such as the MariaDB-monitor for a Master/Slave-cluster is required. This option only affects Master-Slave clusters. Galera clusters do not have a concept of slave lag even if the application of write sets might have lag. When a server is disqualified from routing because of replication lag, a warning is logged. Similarly, when the server has caught up enough to be a valid routing target, another warning is logged. These messages are only logged when a query is being routed and the replication state changes.

`use_sql_variables_in`

Type:
Mandatory: No
Dynamic: Yes
Values: master, all

This parameter controls how SELECT statements that use SQL user variables are handled. Here is an example of such a query that uses it to return an increasing row number for a resultset:

By default MaxScale will route both the SET and SELECT statements to all nodes. Any future reads of the user variables can also be performed on any node.

The possible values for this parameter are:

all (default)
Modifications to user variables inside SELECT statements as well as reads of user variables are routed to all servers. Versions before MaxScale 22.08 returned an error if a user variable was modified inside of a SELECT statement when use_sql_variables_in=all was used. MaxScale 22.08 will instead route the query to all servers and discard the extra results.
master

DML statements, such as INSERT, UPDATE or DELETE, that modify SQL user variables are still treated as writes and are only routed to the master server. For example, after the following query the value of @myid is no longer the same on all servers and the SELECT statement can return different values depending where it ends up being executed:

`connection_keepalive`

Note: This parameter has been moved into the MaxScale core. For the current documentation, read the section in the configuration guide.

Send keepalive pings to backend servers. This feature was introduced in MaxScale 2.2.0. The default value is 300 seconds starting with 2.3.2 and for older versions the feature was disabled by default. This parameter was converted into a service parameter in MaxScale 2.5.0.

`master_reconnection`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Allow the master server to change mid-session. This feature was introduced in MaxScale 2.3.0 and is disabled by default. This feature requires thatdisable_sescmd_history is not used.

When a readwritesplit session starts, it will pick a master server as the current master server of that session. By default, when this master server is lost or changes to another server, the connection will be closed.

When master_reconnection is enabled, readwritesplit can sometimes recover a lost connection to the master server. This largely depends on the value ofmaster_failure_mode.

With master_failure_mode=fail_instantly, the master server is only allowed to change to another server. This change must happen without a loss of the master server.

With master_failure_mode=fail_on_write, the loss of the master server is no longer a fatal error: if a replacement master server appears before any write queries are received, readwritesplit will transparently reconnect to the new master server.

In both cases the change in the master server can only take place ifprune_sescmd_history is enabled or max_sescmd_history has not yet been exceeded and the session does not have an open transaction.

The recommended configuration is to use master_reconnection=true andmaster_failure_mode=fail_on_write. This provides improved fault tolerance without any risk to the consistency of the database.

`slave_selection_criteria`

Type:
Mandatory: No
Dynamic: Yes
Values: LEAST_CURRENT_OPERATIONS, ADAPTIVE_ROUTING, LEAST_BEHIND_MASTER

This option controls how the readwritesplit router chooses the slaves it connects to and how the load balancing is done. The default behavior is to route read queries to the slave server with the lowest amount of ongoing queries i.e.LEAST_CURRENT_OPERATIONS.

All of the load balancing methods use MaxScale's own accounting. Connections and queries done directly on the database and not through MaxScale are not taken into account by readwritesplit. For example, if server A has 100 queries running all of which are routed through MaxScale and server B has 115 queries but only 95 of those were routed through MaxScale, server B is considered a better candidate even if the absolute number of active queries on it is higher. This is because MaxScale only tracks the connections and queries routed through the same process.

The option syntax:

Where <criteria> is one of the following values.

LEAST_CURRENT_OPERATIONS (default), the slave with least active operations
ADAPTIVE_ROUTING, based on server average response times.
LEAST_BEHIND_MASTER, the slave with smallest replication lag

LEAST_CURRENT_OPERATIONS uses the current number of active operations (i.e. SQL queries) as the load balancing metric and it optimizes for maximal query throughput. Each query gets routed to the server with the least active operations which results in faster servers processing more traffic. If two servers have an equal number of active operations, the one that was least recently used is chosen.

ADAPTIVE_ROUTING uses the server response time and current estimated server load as the load balancing metric. The server that is estimated to finish an additional query first is chosen. A modified average response time for each server is continuously updated to allow slow servers at least some traffic and quickly react to changes in server load conditions. If a server has not received any traffic, the network lag to the server as measured by the monitor is used as the proxy of the true response time. This selection criteria is designed for heterogeneous clusters: servers of differing hardware, differing network distances, or when other loads are running on the servers (including a backup). If the servers are queried by other clients than MaxScale, the load caused by them is indirectly taken into account.

LEAST_BEHIND_MASTER uses the measured replication lag as the load balancing metric. This means that servers that are more up-to-date are favored which increases the likelihood of the data being read being up-to-date. However, this is not as effective as causal_reads would be as there's no guarantee that writes done by the same connection will be routed to a server that has replicated those changes. The recommended approach is to useLEAST_CURRENT_OPERATIONS or ADAPTIVE_ROUTING in combination withcausal_reads

NOTE: LEAST_GLOBAL_CONNECTIONS and LEAST_ROUTER_CONNECTIONS should not be used, they are legacy options that exist only for backwards compatibility. Using them will result in skewed load balancing as the algorithm uses a metric that's too coarse (number of connections) to load balance something that's finer (individual SQL queries).

The LEAST_GLOBAL_CONNECTIONS and LEAST_ROUTER_CONNECTIONS use the connections from MariaDB MaxScale to the server, not the amount of connections reported by the server itself.

Starting with MaxScale versions 2.5.29, 6.4.11, 22.08.9, 23.02.5 and 23.08.1, lowercase versions of the values are also accepted. For example,slave_selection_criteria=LEAST_CURRENT_OPERATIONS andslave_selection_criteria=least_current_operations are both accepted as valid values.

`max_sescmd_history`

This parameter has been moved to in MaxScale 6.0.

`disable_sescmd_history`

This parameter has been moved to in MaxScale 6.0.

`prune_sescmd_history`

This parameter has been moved to in MaxScale 6.0.

`master_accept_reads`

Type:
Mandatory: No
Dynamic: Yes
Default: false

master_accept_reads allows the master server to be used for reads. This is a useful option to enable if you are using a small number of servers and wish to use the master for reads as well.

By default, no reads are sent to the master as long as there is a valid slave server available. If no slaves are available, reads are sent to the master regardless of the value of master_accept_reads.

`strict_multi_stmt`

Type:
Mandatory: No
Dynamic: Yes
Default: false

This option is disabled by default since MaxScale 2.2.1. In older versions, this option was enabled by default.

When a client executes a multi-statement query, it will be treated as if it were a DML statement and routed to the master. If the option is enabled, all queries after a multi-statement query will be routed to the master to guarantee a consistent session state.

If the feature is disabled, queries are routed normally after a multi-statement query.

Warning: Enable the strict mode only if you know that the clients will send statements that cause inconsistencies in the session state.

`strict_sp_calls`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Similar to strict_multi_stmt, this option allows all queries after a CALL operation on a stored procedure to be routed to the master. This option is disabled by default and was added in MaxScale 2.1.9.

All warnings and restrictions that apply to strict_multi_stmt also apply tostrict_sp_calls.

`master_failure_mode`

Type:
Mandatory: No
Dynamic: Yes
Values: fail_instantly, fail_on_write, error_on_write

This option controls how the failure of a master server is handled. By default, the router will close the client connection as soon as the master is lost.

The following table describes the values for this option and how they treat the loss of a master server.

Value

Description

These also apply to new sessions created after the master has failed. This means that in fail_on_write or error_on_write mode, connections are accepted as long as slave servers are available.

When configured with fail_on_write or error_on_write, sessions that are idle will not be closed even if all backend connections for that session have failed. This is done in the hopes that before the next query from the idle session arrives, a reconnection to one of the slaves is made. However, this can leave idle connections around unless the client application actively closes them. To prevent this, use the parameter.

Note: If master_failure_mode is set to error_on_write and the connection to the master is lost, by default, clients will not be able to execute write queries without reconnecting to MariaDB MaxScale once a new master is available. If is enabled, the session can recover if one of the slaves is promoted as the master.

`retry_failed_reads`

Type:
Mandatory: No
Dynamic: Yes
Default: true

This option controls whether autocommit selects are retried in case of failure. This option is enabled by default.

When a simple autocommit select is being executed outside of a transaction and the slave server where the query is being executed fails, readwritesplit can retry the read on a replacement server. This makes the failure of a slave transparent to the client.

`delayed_retry`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Retry queries over a period of time. This parameter takes a boolean value, was added in Maxscale 2.3.0 and is disabled by default.

When this feature is enabled, a failure to route a query due to a connection problem will not immediately result in an error. The routing of the query is delayed until either a valid candidate server is available or the retry timeout is reached. If a candidate server becomes available before the timeout is reached, the query is routed normally and no connection error is returned. If no candidates are found and the timeout is exceeded, the router returns to normal behavior and returns an error.

When combined with the master_reconnection parameter, failures of writes done outside of transactions can be hidden from the client connection. This allows a master to be replaced while writes are being sent.

Starting with MaxScale 21.06.18, 22.08.15, 23.02.12, 23.08.8, 24.02.4 and 24.08.1, delayed_retry will no longer attempt to retry a query if it was already sent to the database. If a query is received while a valid target server is not available, the execution of the query is delayed until a valid target is found or the delayed retry timeout is hit. If a query was already sent, it will not be replayed to prevent duplicate execution of statements.

In older versions of MaxScale, duplicate execution of a statement can occur if the connection to the server is lost or the server crashes but the server comes back up before the timeout for the retrying is exceeded. At this point, if the server managed to read the client's statement, it will be executed. For this reason, it is recommended to only enable delayed_retry for older versions of MaxScale when the possibility of duplicate statement execution is an acceptable risk.

`delayed_retry_timeout`

Type:
Mandatory: No
Dynamic: Yes
Default: 10s

The duration to wait until an error is returned to the client whendelayed_retry is enabled. The default value is 10 seconds.

The timeout is specified as documented . If no explicit unit is provided, the value is interpreted as seconds in MaxScale 2.4. In subsequent versions a value without a unit may be rejected. Note that since the granularity of the timeout is seconds, a timeout specified in milliseconds will be rejected, even if the duration is longer than a second.

`transaction_replay`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Replay interrupted transactions. This parameter was added in MaxScale 2.3.0 and is disabled by default. Enabling this parameter enables both delayed_retry andmaster_reconnection and sets master_failure_mode to fail_on_write, thereby overriding any configured values for these parameters.

When the server where the transaction is in progress fails, readwritesplit can migrate the transaction to a replacement server. This can completely hide the failure of a master node without any visible effects to the client.

If no replacement node becomes available, the client connection is closed.

To control how long a transaction replay can take, usetransaction_replay_timeout.

Please refer to the section for a more detailed explanation of what should and should not be done with transaction replay.

`transaction_replay_max_size`

Type:
Mandatory: No
Dynamic: Yes
Default: 1 MiB

The limit on transaction size for transaction replay in bytes. Any transaction that exceeds this limit will not be replayed. The default value is 1 MiB. This limit applies at a session level which means that the total peak memory consumption can be transaction_replay_max_size times the number of client connections.

The amount of memory needed to store a particular transaction will be slightly larger than the length in bytes of the SQL used in the transaction. If the limit is ever exceeded, a message will be logged at the info level.

Starting with MaxScale 6.4.10, the number of times that this limit has been exceeded is shown in maxctrl show service as trx_max_size_exceeded.

Read for more details on size type parameters in MaxScale.

`transaction_replay_attempts`

Type: integer
Mandatory: No
Dynamic: Yes
Default: 5

The upper limit on how many times a transaction replay is attempted before giving up. The default value is 5.

A transaction replay failure can happen if the server where the transaction is being replayed fails while the replay is in progress. In practice this parameter controls how many server and network failures a single transaction replay tolerates. If a transaction is replayed successfully, the counter for failed attempts is reset.

`transaction_replay_timeout`

Type:
Mandatory: No
Dynamic: Yes
Default: 0s

The time how long transactions are attempted for. This feature is disabled by default and was added in MaxScale 6.2.1. To explicitly disable this feature, set the value to 0 seconds.

The timeout is and the value must include a unit for the duration.

When transaction_replay_timeout is enabled, the time a transaction replay can take is controlled solely by this parameter. This is a more convenient and predictable method of controlling how long a transaction replay can be attempted before the connection is closed.

If delayed_retry_timeout is less than transaction_replay_timeout, it is set to the same value.

By default the time how long a transaction can be retried is controlled bydelayed_retry_timeout and transaction_replay_attempts. This can result in a maximum replay time limit of delayed_retry_timeout multiplied bytransaction_replay_attempts, by default this is 50 seconds. The minimum replay time limit can be as low as transaction_replay_attempts seconds (5 seconds by default) in cases where the connection fails after it was created. Usually this happens due to problems like the max_connections limit being hit on the database server.

With the introduction of transaction_replay_timeout, these problems are avoided. Starting with MaxScale 6.2.1, this is the recommended method of controlling the timeouts for transaction replay.

`transaction_replay_retry_on_deadlock`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Enable automatic retrying of transactions that end up in a deadlock. This parameter was added in MaxScale 2.4.6 and the feature is disabled by default. MaxScale versions from 2.4.0 to 2.4.5 always tried to replay deadlocked transactions.

If this feature is enabled and a transaction returns a deadlock error (e.g. SQLSTATE 40001: Deadlock found when trying to get lock; try restarting transaction), the transaction is automatically retried. If the retrying of the transaction results in another deadlock error, it is retried until it either succeeds or a transaction checksum error is encountered.

`transaction_replay_retry_on_mismatch`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Retry transactions that end in checksum mismatch. This parameter was added in MaxScale 6.2.1 is disabled by default.

When enabled, any replayed transactions that end with a checksum mismatch are retried until they either succeeds or one of the transaction replay limits is reached (delayed_retry_timeout, transaction_replay_timeout ortransaction_replay_attempts).

`transaction_replay_checksum`

Type:
Mandatory: No
Dynamic: Yes
Values: full, result_only, no_insert_id

Selects which transaction checksum method is used to verify the result of the replayed transaction.

Note that only transaction_replay_checksum=full is guaranteed to retain the consistency of the replayed transaction.

Possible values are:

full (default)
All responses from the server are included in the checksum. This retains the full consistency guarantee of the replayed transaction as it must match exactly the one that was already returned to the client.
result_only

`optimistic_trx`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Enable optimistic transaction execution. This parameter controls whether normal transactions (i.e. START TRANSACTION or BEGIN) are load balanced across slaves. This feature is disabled by default and enabling it implicitly enablestransaction_replay, delayed_retry and master_reconnection parameters.

When this mode is enabled, all transactions are first attempted on slave servers. If the transaction contains no statements that modify data, it is completed on the slave. If the transaction contains statements that modify data, it is rolled back on the slave server and restarted on the master. The rollback is initiated the moment a data modifying statement is intercepted by readwritesplit so only read-only statements are executed on slave servers.

As with transaction_replay and transactions that are replayed, if the results returned by the master server are not identical to the ones returned by the slave up to the point where the first data modifying statement was executed, the connection is closed. If the execution of ROLLBACK statement on the slave fails, the connection to that slave is closed.

All limitations that apply to transaction_replay also apply tooptimistic_trx.

`causal_reads`

Type:
Mandatory: No
Dynamic: Yes
Values: none, local, global

Enable causal reads. This parameter is disabled by default and was introduced in MaxScale 2.3.0.

If a client connection modifies the database and causal_reads is enabled, any subsequent reads performed on slave servers will be done in a manner that prevents replication lag from affecting the results.

The following table contains a comparison of the modes. Read the for more information on what a sync consists of and why minimizing the number of them is important.

Mode

Level of Causality

Latency

The fast and fast_global modes should only be used when low latency is more important than proper distribution of reads. These modes should only be used when the workload is mostly read-only with only occasional writes. If used with a mixed or a write-heavy workload, the traffic will end up being routed almost exclusively to the master server.

Note: This feature requires MariaDB 10.2.16 or newer to function. In addition to this, the session_track_system_variables parameter must includelast_gtid in its list of tracked system variables.

Note: This feature also enables multi-statement execution of SQL in the protocol. This is equivalent to using allowMultiQueries=true in or using CLIENT_MULTI_STATEMENTS and CLIENT_MULTI_RESULTS in the Connector/C. The Implementation of causal_reads section explains why this is necessary.

The possible values for this parameter are:

none (default)
Read causality is disabled.
local
Writes are locally visible. Writes are guaranteed to be visible only to the connection that does it. Unrelated modifications done by other connections are not visible. This mode improves read scalability at the cost of latency and reduces the overall load placed on the master server without breaking causality guarantees.

Before MaxScale 2.5.0, the causal_reads parameter was a boolean parameter. False values translated to none and true values translated tolocal. The use of boolean parameters is deprecated but still accepted in MaxScale 2.5.0.

Implementation of causal_reads

This feature is based on the MASTER_GTID_WAIT function and the tracking of server-side status variables. By tracking the latest GTID that each statement generates, readwritesplit can then perform a synchronization operation with the help of the MASTER_GTID_WAIT function.

If the slave has not caught up to the master within the configured time, as specified by , it will be retried on the master. In MaxScale 2.3.0 an error was returned to the client when the slave timed out.

The exception to this rule is the fast mode which does not do any synchronization at all. This can be done as any reads that would go to out-of-date servers will be re-routed to the current master.

Normal SQL

A practical example can be given by the following set of SQL commands executed with autocommit=1.

As the statements are not executed inside a transaction, from the load balancer's point of view, the latter statement can be routed to a slave server. The problem with this is that if the value that was inserted on the master has not yet replicated to the server where the SELECT statement is being performed, it can appear as if the value we just inserted is not there.

By prefixing these types of SELECT statements with a command that guarantees consistent results for the reads, read scalability can be improved without sacrificing consistency.

The set of example SQL above will be translated by MaxScale into the following statements.

The SET command will synchronize the slave to a certain logical point in the replication stream (see for more details). If the synchronization fails, the query will not run and it will be retried on the server where the transaction was originally done.

Prepared Statements

Binary protocol prepared statements are handled in a different manner. Instead of adding the synchronization SQL into the original SQL query, it is sent as a separate packet before the prepared statement is executed.

We'll use the same example SQL but use a binary protocol prepared statement for the SELECT:

The SQL that MaxScale executes will be the following:

Both the synchronization query and the execution of the prepared statement are sent at the same time. This is done to remove the need to wait for the result of the synchronization query before routing the execution of the prepared statement. This keeps the performance of causal_reads for prepared statements the same as it is for normal SQL queries.

As a result of this, each time the synchronization query times out, the connection will be killed by the KILL statement and readwritesplit will retry the query on the master. This is done to prevent the execution of the prepared statement that follows the synchronization query from being processed by the MariaDB server.

It is recommend that the session command history is enabled whenever prepared statements are used with causal_reads. This allows new connections to be created whenever a causal read times out.

Starting with MaxScale 2.5.17, a failed causal read inside of a read-only transaction started with START TRANSACTION READ ONLY will return the following error:

Older versions of MaxScale attempted to retry the command on the current master server which would cause the connection to be closed and a warning to be logged.

Limitations of Causal Reads

This feature does not work with Galera or any other non-standard replication mechanisms. As Galera does not update the gtid_slave_pos variable when events are replicated via the Galera library, the function used by MaxScale to synchronize reads will wait until the timeout. With Galera this is not a serious issue as it, by nature, is a mostly-synchronous replication mechanism.
If the combination of the original SQL statement and the modifications added to it by readwritesplit exceed the maximum packet size (16777213 bytes), the causal read will not be attempted and a non-causal read is done instead. This applies only to text protocol queries as the binary protocol queries use a different synchronization mechanism.
SQL like INSERT ... RETURNING that commits a transaction and returns a resultset will only work with causal reads if the connector supports the DEPRECATE_EOF protocol feature. The following table contains a list of MariaDB connectors and whether they support the protocol feature.

Connector

Supported

Version

`causal_reads_timeout`

Type:
Mandatory: No
Dynamic: Yes
Default: 10s

The timeout for the slave synchronization done by causal_reads. The default value is 10 seconds.

`lazy_connect`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Lazy connection creation causes connections to backend servers to be opened only when they are needed. This reduces the load that is placed on the backend servers when the client connections are short. This parameter is a boolean type and is disabled by default.

By default readwritesplit opens as many connections as it can when the session is first opened. This makes the execution of the first query faster when all available connections are already created. When lazy_connect is enabled, this initial connection creation is skipped. If the client executes only read queries, no connection to the master is made. If only write queries are made, only the master connection is used.

`reuse_prepared_statements`

Type:
Mandatory: No
Dynamic: Yes
Default: false

Reuse identical prepared statements inside the same client connection. This is a boolean parameter and is disabled by default. This feature only applies to binary protocol prepared statements.

When this parameter is enabled and the connection prepares an identical prepared statement multiple times, instead of preparing it on the server the existing prepared statement handle is reused. This also means that whenever prepared statements are closed by the client, they will be left open by readwritesplit.

Enabling this feature will increase memory usage of a session. The amount of memory stored per prepared statement is proportional to the length of the prepared SQL statement and the number of parameters the statement has.

Router Diagnostics

The router_diagnostics output for a readwritesplit service contains the following fields.

queries: Number of queries executed through this service.
route_master: Number of writes routed to master.
route_slave: Number of reads routed to slaves.

Server Ranks

The general rule with server ranks is that primary servers will be used before secondary servers. Readwritesplit is an exception to this rule. The following rules govern how readwritesplit behaves with servers that have different ranks.

Sessions will use the current master server as long as possible. This means that sessions with a secondary master will not use the primary master as long as the secondary master is available.
All slave connections will use the same rank as the master connection. Any stale connections with a different rank than the master will be discarded.
If no master connection is available and master_reconnection is enabled, a connection to the best master is created. If the new master has a different priority than existing connections have, the connections with a different rank will be discarded.

Routing hints

The readwritesplit router supports routing hints. For a detailed guide on hint syntax and functionality, please read document.

Note: Routing hints will always have the highest priority when a routing decision is made. This means that it is possible to cause inconsistencies in the session state and the actual data in the database by adding routing hints to DDL/DML statements which are then directed to slave servers. Only use routing hints when you are sure that they can cause no harm.

An exception to this rule is transaction_replay: when it is enabled, all routing hints inside transaction are ignored. This is done to prevent changes done inside a re-playable transaction from affecting servers outside of the transaction. This behavior was added in MaxScale 6.1.4. Older versions allowed routing hints to override the transaction logic.

Known Limitations of Routing Hints

If a SELECT statement with a maxscale route to slave hint is received while autocommit is disabled, the query will be routed to a slave server. This causes some metadata locks to be acquired on the database in question which will block DDL statements on the server until either the connection is closed or autocommit is enabled again.

Module Commands

The readwritesplit router implements the following module commands.

`reset-gtid`

The command resets the global GTID state in the router. It can be used withcausal_reads=global to reset the state. This can be useful when the cluster is reverted to an earlier state and the GTIDs recorded in MaxScale are no longer valid.

The first and only argument to the command is the router name. For example, to reset the GTID state of a readwritesplit named My-RW-Router, the following MaxCtrl command should be used:

Examples

Examples of the readwritesplit router in use can be found in the folder.

Readwritesplit routing decisions

Here is a small explanation which shows what kinds of queries are routed to which type of server.

Routing to Master

Routing to master is important for data consistency and because majority of writes are written to binlog and thus become replicated to slaves.

The following operations are routed to master:

DML statements (INSERT, UPDATE, DELETE etc.)
DDL statements (DROP, CREATE, ALTER etc.)

In addition to these, if the readwritesplit service is configured with themax_slave_replication_lag parameter, and if all slaves suffer from too much replication lag, then statements will be routed to the Master. (There might be other similar configuration parameters in the future which limit the number of statements that will be routed to slaves.)

Transaction Isolation Level Tracking

Use of the SERIALIZABLE transaction isolation level with readwritesplit is not recommended as it somewhat goes against the goals of load balancing.

If either session_track_transaction_info=CHARACTERISTICS orsession_track_system_variables=tx_isolation is configured for the MariaDB server, readwritesplit will track the transaction isolation level and lock the session to the master when the isolation level is set to serializable. This retains the correctness of the isolation level which can otherwise cause problems. Once a session is locked to the master, it will not be unlocked. To reinstate the normal routing behavior, a new connection must be created.

For example, if transaction isolation level tracking cannot be done and an autocommit SELECT is routed to a slave, it no longer behaves in a serializable manner. This can also have an effect on the replication in the slave server.

Routing to Slaves

The ability to route some statements to slaves is important because it also decreases the load targeted to master. Moreover, it is possible to have multiple slaves to share the load in contrast to single master.

Queries which can be routed to slaves must be auto committed and belong to one of the following group:

Read-only statements (i.e. SELECT) that only use read-only built-in functions
All statements within an explicit read-only transaction (START TRANSACTION READ ONLY)
SHOW statements except SHOW MASTER STATUS

The list of supported built-in fuctions can be found .

Routing to every session backend

A third class of statements includes those which modify session data, such as session system variables, user-defined variables, the default database, etc. We call them session commands, and they must be replicated as they affect the future results of read and write operations. They must be executed on all servers that could execute statements on behalf of this client.

Session commands include for example:

Commands that modify the session state (SET, USE, CHANGE USER)
Text protocol PREPARE statements
Binary protocol prepared statements

NOTE: if variable assignment is embedded in a write statement it is routed to Master only. For example, INSERT INTO t1 values(@myvar:=5, 7) would be routed to Master only.

The router stores all of the executed session commands so that in case of a slave failure, a replacement slave can be chosen and the session command history can be repeated on that new slave. This means that the router stores each executed session command for the duration of the session. Applications that use long-running sessions might cause MariaDB MaxScale to consume a growing amount of memory unless the sessions are closed. This can be solved by adjusting the value of max_sescmd_history.

Routing to previous target

In the following cases, a query is routed to the same server where the previous query was executed. If no previous target is found, the query is routed to the current master.

If a query uses the FOUND_ROWS() function, it will be routed to the server where the last query was executed. This is done with the assumption that a query with SQL_CALC_FOUND_ROWS was previously executed.
COM_STMT_FETCH_ROWS will always be routed to the same server where the COM_STMT_EXECUTE was routed.

Limitations

Read queries are routed to the master server in the following situations:

Query is executed inside an open read-write transaction
Statement includes a stored procedure or an UDF call
If there are multiple statements inside one query e.g.INSERT INTO ... ; SELECT LAST_INSERT_ID();

Prepares Statement Limitations

If a prepared statement targets a temporary table on the master, the slave servers will fail to execute it. This will cause all slave connections to be closed (MXS-1816).

Transaction Replay Limitations

If the results from the replacement server are not identical when the transaction is replayed, the client connection is closed. This means that any transaction with a server specific result (e.g. NOW(), @@server_id) cannot be replayed successfully but it will still be attempted.

If a transaction reads data before updating it, the rows should be locked by using SELECT ... FOR UPDATE. This will prevent overlapping transactions when multiple transactions are being replayed that modify the same set of rows.

If the connection to the server where the transaction is being executed is lost when the final COMMIT is being executed, it is impossible to know whether the transaction was successfully committed. This means that there is a possibility for duplicate transaction execution which can result in data duplication in certain cases. Data duplication can happen if the transaction consists of the following statement types:

INSERT of rows into a table that does not have an auto-increment primary key
A "blind update" of one or more rows e.g. UPDATE t SET c = c + 1 WHERE id = 123
A "blind delete" e.g. DELETE FROM t LIMIT 100

This is not an exhaustive list and any operations that do not check the row contents before performing the operation on them might face this problem.

In all cases the problem of duplicate transaction execution can be avoided by including a SELECT ... FOR UPDATE in the statement. This will guarantee that in the case that the transaction fails when it is being committed, the row is only modified if it matches the expected contents.

Similarly, a connection loss during COMMIT can also result in transaction replay failure. This happens due to the same reason as duplicate transaction execution but the retried transaction will not be committed. This can be considered a success case as the transaction replay detected that the results of the two transactions are different. In these cases readwritesplit will abort the transaction and close the client connection.

Statements that result in an implicit commit do not reset the transaction when transaction_replay is enabled. This means that if the transaction is replayed, the transaction will be committed twice due to the implicit commit being present. The exception to this are the transaction management statements such asBEGIN and START TRANSACTION: they are detected and will cause the transaction to be correctly reset.

Any changes to the session state (e.g. autocommit state, SQL mode) done inside a transaction will remain in effect even if the connection to the server where the transaction is being executed fails. When readwritesplit creates a new connection to a server to replay the transaction, it will first restore the session state by executing all session commands that were executed. This means that if the session state is changed mid-transaction in a way that affects the results, transaction replay will fail.

The following partial transaction demonstrates the problem by using inside a transaction.

If this transaction has to be replayed the actual SQL that gets executed is the following.

First the session state is restored by executing all commands that changed the state after which the actual transaction is replayed. Due to the fact that the SQL_MODE was changed mid-transaction, one of the queries will now return an error instead of the result we expected leading to a transaction replay failure.

In a service-to-service configuration (i.e. a service using another service in its targets list ), if the topmost service starts a transaction, all lower-level readwritesplit services will also behave as if a transaction is open. If a connection to a backend database fails during this, it can result in unnecessary transaction replays which in turn can end up with checksum conflicts. The recommended approach is to not use any commands inside a transaction that would be routed to more than one node.

If the connection to the server where a transaction is being executed is lost while a ROLLBACK is being executed, readwritesplit will still attempt to replay the transaction in the hopes that the real response can be delivered to the client. However, this does mean that it is possible that a rolled back transaction which gets replayed ends up with a conflict and is reported as a replay failure when in reality a rolled back transaction could be safely ignored.

Legacy Configuration

In older versions of MaxScale, routers were configured via the router_options parameter. This functionality was deprecated in 2.2 and was removed in 2.3.

JDBC Batched Statements

Readwritesplit does not support pipelining of JDBC batched statements. This is caused by the fact that readwritesplit executes the statements one at a time to track the state of the response.

Limitations in multi-statement handling

When a multi-statement query is executed through the readwritesplit router, it will always be routed to the master. See for more details.

If the multi-statement query creates a temporary table, it will not be detected and reads to this table can be routed to slave servers. To prevent this, always execute the temporary table creation as an individual statement.

Limitations in client session handling

Some of the queries that a client sends are routed to all backends instead of just to one. These queries include USE <db name> and SET autocommit=0, among many others. Readwritesplit sends a copy of these queries to each backend server and forwards the master's reply to the client. Below is a list of MySQL commands which are classified as session commands.

Prior to MaxScale 2.3.0, session commands that were 2²⁴ - 1 bytes or longer were not supported and caused the session to be closed.

There is a possibility for misbehavior. If USE mytable is executed in one of the slaves and fails, it may be due to replication lag rather than the database not existing. Thus, the same command may produce different result in different backend servers. The slaves which fail to execute a session command will be dropped from the active list of slaves for this session to guarantee a consistent session state across all the servers used by the session. In addition, the server will not be used again for routing for the duration of the session.

The above-mentioned behavior for user variables can be partially controlled with the configuration parameter use_sql_variables_in:

WARNING

If a SELECT query modifies a user variable when the use_sql_variables_in parameter is set to all, it will not be routed and the client will receive an error. A log message is written into the log further explaining the reason for the error. Here is an example use of a SELECT query which modifies a user variable and how MariaDB MaxScale responds to it.

Allow user variable modification in SELECT queries by settinguse_sql_variables_in=master. This will route all queries that use user variables to the master.

CC BY-SA / Gnu FDL