MariaDB Galera Cluster is a virtually synchronous multi-master cluster that runs on Linux only. Its Enterprise version is MariaDB Enterprise Cluster (powered by Galera).
MariaDB Enterprise Cluster is a solution designed to handle high workloads exceeding the capacity of a single server. It is based on Galera Cluster technology integrated with MariaDB Enterprise Server and includes features like data-at-rest encryption for added security. This multi-primary replication alternative is ideal for maintaining data consistency across multiple servers, providing enhanced reliability and scalability.
MariaDB Enterprise Cluster, powered by Galera, is available with MariaDB Enterprise Server. MariaDB Galera Cluster is available with MariaDB Community Server.
In order to handle increasing load and especially when that load exceeds what a single server can process, it is best practice to deploy multiple MariaDB Enterprise Servers with a replication solution to maintain data consistency between them. MariaDB Enterprise Cluster is a multi-primary replication solution that serves as an alternative to the single-primary MariaDB Replication.
MariaDB Enterprise Cluster is built on MariaDB Enterprise Server with Galera Cluster and MariaDB MaxScale. In MariaDB Enterprise Server 10.5 and later, it features enterprise-specific options, such as data-at-rest encryption for the write-set cache, that are not available in other Galera Cluster implementations.
As a multi-primary replication solution, any MariaDB Enterprise Server can operate as a Primary Server. This means that changes made to any node in the cluster replicate to every other node in the cluster, using certification-based replication and global ordering of transactions for the InnoDB storage engine.
Note: MariaDB Enterprise Cluster is only available for Linux operating systems.
There are a few things to consider when planning the hardware, virtual machines, or containers for MariaDB Enterprise Cluster.
MariaDB Enterprise Cluster architecture involves deploying MariaDB MaxScale with multiple instances of MariaDB Enterprise Server. The Servers are configured to use multi-primary replication to maintain consistency between themselves while MariaDB MaxScale routes reads and writes between them.
The application establishes a client connection to MariaDB MaxScale. MaxScale then routes statements to one of the MariaDB Enterprise Servers in the cluster. Writes made to any node in this cluster replicate to all the other nodes of the cluster.
When MariaDB Enterprise Servers start in a cluster:
Each Server attempts to establish network connectivity with the other Servers in the cluster
Groups of connected Servers form a component
When a Server establishes network connectivity with the Primary Component, it synchronizes its local database with that of the cluster
As a member of the Primary Component, the Server becomes operational — able to accept read and write queries from clients
During startup, the Primary Component is the Server that was bootstrapped to run as the Primary Component. Once the cluster is online, the Primary Component is any group of connected Servers that includes more than half the total number of Servers in the cluster.
A Server or group of Servers that loses network connectivity with the majority of the cluster becomes non-operational.
When planning the number of systems to provision for MariaDB Enterprise Cluster, keep cluster operation in mind: ensure that each Server has enough disk space and that the cluster can maintain a Primary Component in the event of outages.
Each Server requires the minimum amount of disk space needed to store the entire database. The upper storage limit for MariaDB Enterprise Cluster is that of the smallest disk in use.
Each switch in use should have an odd number of Servers, three or more.
In a cluster that spans multiple switches, each data center in use should have an odd number of switches, three or more.
In a cluster that spans multiple data centers, use an odd number of data centers, three or more.
Each data center in use should have at least one Server dedicated to backup operations. This can be another cluster node or a separate Replica Server kept in sync using MariaDB Replication.
Whether you are planning Servers per switch, switches per data center, or data centers in the cluster, this model helps preserve the Primary Component. A minimum of three means that a single Server or switch can fail without taking down the cluster.
Using an odd number of three or more reduces the risk of a split-brain situation (that is, a case where two separate groups of Servers each believe they are part of the Primary Component and remain operational).
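As a worked example (the numbers are illustrative): in a cluster of five Servers, the Primary Component requires at least three connected Servers (more than half of five), so the cluster remains operational even if any two Servers fail. With four Servers, a clean two-and-two network split would leave neither group with a majority, which is why an even number of Servers is not recommended.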
Nodes in MariaDB Enterprise Cluster are individual MariaDB Enterprise Servers configured to perform multi-primary cluster replication. This configuration is set using a series of system variables in the configuration file.
[mariadb]
# General Configuration
bind_address = 0.0.0.0
innodb_autoinc_lock_mode = 2
# Cluster Configuration
wsrep_cluster_name = "accounting_cluster"
wsrep_cluster_address = "gcomm://192.0.2.1,192.0.2.2,192.0.2.3"
# wsrep Provider
wsrep_provider = /usr/lib/galera/libgalera_enterprise_smm.so
wsrep_provider_options = "evs.suspect_timeout=PT10S"
Additional information on system variables is available in the Reference chapter.
The innodb_autoinc_lock_mode system variable must be set to a value of 2 to enable interleaved lock mode. MariaDB Enterprise Cluster does not support other lock modes.
Also ensure that the bind_address system variable is properly set to allow MariaDB Enterprise Server to listen for TCP/IP connections:
bind_address = 0.0.0.0
innodb_autoinc_lock_mode = 2
MariaDB Enterprise Cluster requires that you set a name for your cluster using the wsrep_cluster_name system variable. When nodes connect to each other, they check the cluster name to ensure that they've connected to the correct cluster before replicating data. All Servers in the cluster must have the same value for this system variable.
Using the wsrep_cluster_address system variable, you define the back-end protocol (always gcomm) and a comma-separated list of the IP addresses or domain names of the other nodes in the cluster.
wsrep_cluster_name = "accounting_cluster"
wsrep_cluster_address = "gcomm://192.0.2.1,192.0.2.2,192.0.2.3"
It is best practice to list all nodes in this system variable, as this is the list the node searches when attempting to reestablish network connectivity with the Primary Component.
Note: In certain environments, such as deployments in the cloud, you may also need to set the wsrep_node_address system variable so that MariaDB Enterprise Server properly informs other Servers how to reach it.
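A minimal sketch of such a configuration, assuming the Server should advertise the placeholder address 192.0.2.1 to the rest of the cluster:
[mariadb]
...
wsrep_node_address = 192.0.2.1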
MariaDB Enterprise Server connects to other Servers and replicates data from the cluster through a wsrep Provider called the Galera Replicator plugin. To enable clustering, specify the path to the relevant .so file using the wsrep_provider system variable.
MariaDB Enterprise Server 10.4 and later installations use an enterprise build of the Galera Enterprise 4 plugin. This includes all the features of Galera Cluster 4 as well as enterprise features like GCache encryption.
To enable MariaDB Enterprise Cluster, use the libgalera_enterprise_smm.so library:
wsrep_provider = /usr/lib/galera/libgalera_enterprise_smm.so
MariaDB Enterprise Server 10.3 and earlier installations use the older community release of the Galera 3 plugin. This is set using the libgalera_smm.so library:
wsrep_provider = /usr/lib/galera/libgalera_smm.so
In addition to system variables, there is a set of options that you can pass to the wsrep Provider to configure or otherwise adjust its operation. This is done through the wsrep_provider_options system variable:
wsrep_provider_options = "evs.suspect_timeout=PT10S"
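Multiple wsrep Provider options can be combined in a single semicolon-separated string. For example, a sketch that also adjusts the EVS inactivity timeout (the values shown are illustrative, not recommendations):
wsrep_provider_options = "evs.suspect_timeout=PT10S; evs.inactive_timeout=PT30S"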
Additional information is available in the Reference chapter.
MariaDB Enterprise Cluster implements a multi-primary replication solution.
When you write to a table on a node, the node collects the write into a write-set transaction, which it then replicates to the other nodes in the cluster.
Your application can write to any node in the cluster. Each node certifies the replicated write-set. If the transaction has no conflicts, the nodes apply it. If the transaction does have conflicts, it is rejected and all of the nodes revert the changes.
The first node you start in MariaDB Enterprise Cluster bootstraps the Primary Component. Each subsequent node that establishes a connection joins and synchronizes with the Primary Component. A cluster achieves a quorum when more than half the nodes are joined to the Primary Component.
When a component forms that has less than half the nodes in the cluster, it becomes non-operational, since it believes there is a running Primary Component to which it has lost network connectivity.
These quorum requirements, combined with the recommendation to use an odd number of nodes, avoid a split-brain situation, in which two separate components each believe they are the Primary Component.
In cases where the cluster goes down and your nodes become non-operational, you can dynamically bootstrap the cluster.
First, find the most up-to-date node (that is, the node with the highest value for the wsrep_last_committed status variable):
SHOW STATUS LIKE 'wsrep_last_committed';
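The output resembles the following; the sequence number shown is purely illustrative:
+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| wsrep_last_committed | 409745 |
+----------------------+--------+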
Once you determine the node with the most recent transaction, you can designate it as the Primary Component by running the following on it:
SET GLOBAL wsrep_provider_options="pc.bootstrap=YES";
The node bootstraps the Primary Component onto itself. Other nodes in the cluster with network connectivity then submit state transfer requests to this node to bring their local databases into sync with what's available on this node.
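To confirm that the bootstrapped node now considers itself part of the Primary Component, you can check the wsrep_cluster_status status variable, which should report Primary:
SHOW STATUS LIKE 'wsrep_cluster_status';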
From time to time a node can fall behind the cluster. This can occur due to expensive operations being issued to it or due to network connectivity issues that lead to write-sets backing up in the queue. Whatever the cause, when a node finds that it has fallen too far behind the cluster, it attempts to initiate a state transfer.
In a state transfer, the node connects to another node in the cluster and attempts to bring its local database back in sync with the cluster. There are two types of state transfers:
Incremental State Transfer (IST)
State Snapshot Transfer (SST)
When the donor node receives a state transfer request, it checks its write-set cache (that is, the GCache) to see if it has enough saved write-sets to bring the joiner into sync. If the donor node has the intervening write-sets, it performs an IST operation, where the donor node only sends the missing write-sets to the joiner. The joiner applies these write-sets following the global ordering to bring its local databases into sync with the cluster.
When the donor does not have enough write-sets cached for an IST, it runs an SST operation. In an SST, the donor uses a backup solution, like MariaDB Enterprise Backup, to copy its data directory to the joiner. When the joiner completes the SST, it begins to process the write-sets that came in during the transfer. Once it's in sync with the cluster, it becomes operational.
ISTs provide the best performance for state transfers; the size of the GCache may need adjustment to facilitate their use.
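The GCache size is controlled through the wsrep Provider. As a sketch (the 2G figure is only an illustrative starting point, not a recommendation):
wsrep_provider_options = "gcache.size=2G"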
MariaDB Enterprise Cluster uses Flow Control to throttle replication when needed, ensuring that all nodes keep pace with the cluster.
Write-sets that replicate to a node are collected by the node in its received queue. The node then processes the write-sets according to global ordering. Large transactions, expensive operations, or simple hardware limitations can lead to the received queue backing up over time.
When a node's received queue grows beyond certain limits, the node initiates Flow Control. In Flow Control, the node pauses replication to work through the write-sets it already has. Once it has worked the received queue down to a certain size, it re-initiates replication.
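The thresholds that trigger and release Flow Control are wsrep Provider options. A sketch, assuming the defaults need tuning (the values are illustrative):
wsrep_provider_options = "gcs.fc_limit=256; gcs.fc_factor=0.9"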
A node is removed, or evicted, from the cluster if it becomes non-responsive.
In MariaDB Enterprise Cluster, each node monitors network connectivity and response times from every other node. MariaDB Enterprise Cluster evaluates network performance using the EVS Protocol.
When a node finds that another node has poor network connectivity, it adds an entry for that node to the delayed list. If the node becomes responsive again and its network performance improves for a certain amount of time, entries for it are removed from the delayed list. That is, the longer a node has network problems, the longer it must remain responsive before it is cleared from the delayed list.
If the number of entries for a node in the delayed list exceeds a threshold established for the cluster, the EVS Protocol evicts the node from the cluster.
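The eviction threshold is configured through the wsrep Provider. A sketch, assuming automatic eviction after five delayed-list entries (the value is illustrative):
wsrep_provider_options = "evs.auto_evict=5"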
Evicted nodes become non-operational components. They cannot rejoin the cluster until you restart MariaDB Enterprise Server.
Under normal operation, huge and long-running transactions are difficult to replicate: MariaDB Enterprise Cluster rejects conflicting transactions and rolls back their changes. A transaction that takes several minutes or longer to run can encounter issues if a small transaction on another node writes to the same table in the meantime, because the large transaction then encounters a conflict when it attempts to replicate and fails.
MariaDB Enterprise Server 10.4 and later support streaming replication for MariaDB Enterprise Cluster. In streaming replication, huge transactions are broken into transactional fragments, which are replicated and applied as the operation runs. This makes it more difficult for intervening sessions to introduce conflicts.
To initiate streaming replication, set the wsrep_trx_fragment_unit and wsrep_trx_fragment_size system variables. You can set the unit to BYTES, ROWS, or STATEMENTS:
SET SESSION wsrep_trx_fragment_unit='STATEMENTS';
SET SESSION wsrep_trx_fragment_size=5;
Then, run your transaction.
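For example, after setting the fragment unit and size as above, a large clean-up transaction might look like the following sketch (the accounting.old_entries table is hypothetical):
START TRANSACTION;
DELETE FROM accounting.old_entries WHERE entry_date < '2020-01-01';
COMMIT;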
Streaming replication works best with very large transactions where you don't expect to encounter conflicts. If the statement does encounter a conflict, the rollback operation is much more expensive than usual. As such, it's best practice to enable streaming replication at the session level and to disable it by setting the wsrep_trx_fragment_size system variable to 0 when it's not needed:
SET SESSION wsrep_trx_fragment_size=0;
Deployments on mixed hardware can introduce issues where some MariaDB Enterprise Servers perform better than others. A Server in one part of the world might perform more reliably or be physically closer to most users than others. In cases where a particular MariaDB Enterprise Server holds logical significance for your cluster, you can weight its value in quorum calculations.
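Quorum weighting is configured through the pc.weight wsrep Provider option, which defaults to 1. A sketch that gives one Server three votes in quorum calculations (the value is illustrative):
wsrep_provider_options = "pc.weight=3"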
Galera Arbitrator is a separate process that runs alongside MariaDB Enterprise Server. While the Arbitrator does not take part in replication, whenever the cluster performs quorum calculations, it gives the Arbitrator a vote as though it were another MariaDB Enterprise Server. In effect, the cluster counts the votes of the MariaDB Enterprise Servers plus any running Arbitrators when determining whether a component is part of the Primary Component.
Bear in mind that the Galera Arbitrator is a separate package, galera-arbitrator-4, which is not installed by default with MariaDB Enterprise Server.
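As a sketch, the Arbitrator daemon (garbd) can be pointed at the cluster using the same cluster name and address list as the Servers; the values below reuse the placeholders from the earlier configuration example:
garbd --group="accounting_cluster" --address="gcomm://192.0.2.1,192.0.2.2,192.0.2.3"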
MariaDB Enterprise Servers that join a cluster attempt to connect to the IP addresses provided to the wsrep_cluster_address system variable. This variable adjusts itself at runtime to include the addresses of all connected nodes.
To scale out MariaDB Enterprise Cluster, start new MariaDB Enterprise Servers with the appropriate wsrep_cluster_address list and the same wsrep_cluster_name value. The new nodes establish network connectivity with the running cluster and request a state transfer to bring their local databases into sync with the cluster.
Once the MariaDB Enterprise Server reports itself as being in sync with the cluster, MariaDB MaxScale can begin including it in the load distribution for the cluster.
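One way to check that a new node has finished joining is to query its local state, which should report Synced once the state transfer completes:
SHOW STATUS LIKE 'wsrep_local_state_comment';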
Being a multi-primary replication solution means that any MariaDB Enterprise Server in the cluster can handle write operations, but write scale-out is minimal as every Server in the cluster needs to apply the changes.
MariaDB Enterprise Cluster does not provide failover capabilities on its own. MariaDB MaxScale is used to route client connections to MariaDB Enterprise Server.
Unlike a traditional load balancer, MariaDB MaxScale is aware of changes in the node and cluster states.
MaxScale removes nodes from the distribution when they initiate a blocking SST operation, enter Flow Control, or otherwise go down, which allows them to recover or catch up without interrupting service from the rest of the cluster.
With MariaDB Enterprise Cluster, each node contains a replica of all the data in the cluster. As such, you can run MariaDB Enterprise Backup on any node to back up the available data. The process for backing up a node is the same as for a single MariaDB Enterprise Server.
MariaDB Enterprise Server supports data-at-rest encryption to secure data on disk, and data-in-transit encryption to secure data on the network.
MariaDB Enterprise Server supports data-at-rest encryption of the GCache, the file used by Galera systems to cache write-sets. Encrypting the GCache ensures that the Server encrypts both the data it temporarily caches from the cluster and the data it permanently stores in tablespaces.
For data-in-transit, MariaDB Enterprise Cluster supports the same encryption as MariaDB Server and additionally provides data-in-transit encryption for Galera replication traffic and for State Snapshot Transfer (SST) traffic.
MariaDB Enterprise Server 10.6 encrypts Galera replication and SST traffic using the server's TLS configuration by default. With the wsrep_ssl_mode system variable, you can configure the node to use the TLS configuration of the wsrep Provider options instead.
MariaDB Enterprise Server 10.5 and earlier support encrypting Galera replication and SST traffic through wsrep Provider options.
TLS encryption is only available when used by all nodes in the cluster.
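For MariaDB Enterprise Server 10.5 and earlier, a hedged sketch of encrypting Galera replication traffic through wsrep Provider options (the certificate paths are placeholders):
wsrep_provider_options = "socket.ssl_cert=/etc/my.cnf.d/certificates/server-cert.pem;socket.ssl_key=/etc/my.cnf.d/certificates/server-key.pem;socket.ssl_ca=/etc/my.cnf.d/certificates/ca.pem"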
To encrypt data-at-rest such as the GCache, stop the server, set encrypt_binlog=ON within the MariaDB Enterprise Server configuration file, and restart the server. This variable also controls encryption of the binary log and the relay log when they are used.
[mariadb]
...
# Controls Binary Log, Relay Log, and GCache Encryption
encrypt_binlog=ON
To stop using encryption on the GCache file, stop the server, set encrypt_binlog=OFF within the MariaDB Enterprise Server configuration file, and restart the server. This variable also controls encryption of the binary log and the relay log when they are used.
[mariadb]
...
# Controls Binary Log, Relay Log, and GCache Encryption
encrypt_binlog=OFF
MariaDB Galera Cluster is a Linux-exclusive, multi-primary cluster designed for MariaDB, offering features such as active-active topology, read/write capabilities on any node, automatic membership and node joining, true parallel replication at the row level, and direct client connections, with an emphasis on the native MariaDB experience.
MariaDB Galera Cluster is a virtually synchronous multi-primary cluster for MariaDB. It is available on Linux only and only supports the InnoDB storage engine (although there is experimental support for MyISAM and, from MariaDB 10.6, Aria; see the wsrep_replicate_myisam system variable or, from MariaDB 10.6, the wsrep_mode system variable).
Active-active multi-primary topology
Read and write to any cluster node
Automatic membership control: failed nodes drop from the cluster
Automatic node joining
True parallel replication, on row level
Direct client connections, native MariaDB look & feel
The above features yield several benefits for a DBMS clustering solution, including:
No replica lag
No lost transactions
Read scalability
Smaller client latencies
The Getting Started with MariaDB Galera Cluster page has instructions on how to get up and running with MariaDB Galera Cluster.
A great resource for Galera users is Codership on Google Groups (codership-team 'at' googlegroups (dot) com). If you use Galera, it is recommended you subscribe.
MariaDB Galera Cluster is powered by:
MariaDB Server.
The functionality of MariaDB Galera Cluster can be obtained by installing the standard MariaDB Server packages and the Galera wsrep provider library package. The following Galera version corresponds to each MariaDB Server version:
In MariaDB 10.4 and later, MariaDB Galera Cluster uses Galera 4. This means that the wsrep API version is 26 and the Galera wsrep provider library is version 4.X.
In MariaDB 10.3 and before, MariaDB Galera Cluster uses Galera 3. This means that the wsrep API is version 25 and the Galera wsrep provider library is version 3.X.
See Deciphering Galera Version Numbers for more information about how to interpret these version numbers.
The following table lists each version of the Galera 4 wsrep provider and the version of MariaDB in which each was first released. If you would like to install Galera 4 using yum, apt, or zypper, then the package is called galera-4.
26.4.2
26.4.1
26.4.0
See here for Galera 3 versions.
Codership on Google Groups (codership-team 'at' googlegroups (dot) com) - A great mailing list for Galera users.
In MariaDB Cluster, transactions are replicated using the wsrep API, synchronously ensuring consistency across nodes. Synchronous replication offers high availability and consistency but is complex and potentially slower compared to asynchronous replication. Due to these challenges, asynchronous replication is often preferred for database performance and scalability, as seen in popular systems like MySQL and PostgreSQL, which typically favor asynchronous or semi-synchronous solutions.
In MariaDB Cluster, the server replicates a transaction at commit time by broadcasting the write set associated with the transaction to every node in the cluster. The client connects directly to the DBMS and experiences behavior that is similar to native MariaDB in most cases. The wsrep API (write set replication API) defines the interface between Galera replication and MariaDB.
The basic difference between synchronous and asynchronous replication is that "synchronous" replication guarantees that if a change happened on one node in the cluster, then the change will happen on other nodes in the cluster "synchronously," or at the same time. "Asynchronous" replication gives no guarantees about the delay between applying changes on the "master" node and the propagation of changes to "slave" nodes. The delay with "asynchronous" replication can be short or long. This also implies that if a master node crashes in an "asynchronous" replication topology, then some of the latest changes may be lost.
Theoretically, synchronous replication has several advantages over asynchronous replication:
Clusters utilizing synchronous replication are always highly available. If one of the nodes crashed, then there would be no data loss. Additionally, all cluster nodes are always consistent.
Clusters utilizing synchronous replication allow transactions to be executed on all nodes in parallel.
Clusters utilizing synchronous replication can guarantee causality across the whole cluster. This means that a SELECT executed on one cluster node after a transaction has been executed on another cluster node sees the effects of that transaction.
However, in practice, synchronous database replication has traditionally been implemented via the so-called "2-phase commit" or distributed locking, which proved to be very slow. Low performance and complexity of implementation of synchronous replication led to a situation where asynchronous replication remains the dominant means for database performance scalability and availability. Widely adopted open-source databases such as MySQL or PostgreSQL offer only asynchronous or semi-synchronous replication solutions.
Galera's replication is not completely synchronous. It is sometimes called virtually synchronous replication.
An alternative approach to synchronous replication, based on group communication and transaction ordering techniques, was suggested by a number of researchers, and prototype implementations showed a lot of promise. We combined our experience in synchronous database replication and the latest research in the field to create the Galera Replication library and the wsrep API.
Galera replication is a highly transparent, scalable, and virtually synchronous replication solution for database clustering to achieve high availability and improved performance. Galera-based clusters are:
Highly available
Highly transparent
Highly scalable (near-linear scalability may be reached depending on the application)
Galera replication functionality is implemented as a shared library and can be linked with any transaction processing system that implements the wsrep API hooks.
The Galera replication library is a protocol stack providing functionality for preparing, replicating, and applying transaction write sets. It consists of:
The wsrep API specifies the interface and responsibilities for the DBMS and the replication provider
The wsrep hooks provide the wsrep integration in the DBMS engine
The Galera provider implements the wsrep API for the Galera library
The certification layer takes care of preparing write sets and performing certification
The replication layer manages the replication protocol and provides total ordering capabilities
The GCS framework provides a plugin architecture for group communication systems
Many GCS implementations can be adapted; we have experimented with Spread and our in-house implementations, vsbes and Gemini
Many components in the Galera replication library were redesigned and improved with the introduction of Galera 4.
Although the Galera provider certifies the write set associated with a transaction at commit time on each node in the cluster, the write set is not necessarily applied on that cluster node immediately. Instead, the write set is placed in the cluster node's receive queue, and it is eventually applied by one of the cluster node's Galera slave threads.
The number of Galera slave threads can be configured with the wsrep_slave_threads system variable.
The Galera slave threads are able to determine which write sets are safe to apply in parallel. However, if your cluster nodes seem to have frequent consistency problems, then setting the value to 1 will probably fix the problem.
When a cluster node's state, as seen by wsrep_local_state_comment, is JOINED, increasing the number of slave threads may help the cluster node catch up with the cluster more quickly. In this case, it may be useful to set the number of threads to twice the number of CPUs on the system.
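For example, on a node with four CPU cores (an assumed figure), the following would set eight slave threads at runtime:
SET GLOBAL wsrep_slave_threads = 8;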
Streaming replication was introduced in Galera 4.
In older versions of MariaDB Cluster, there was a 2GB limit on the size of a transaction you could run: the node waited on the transaction commit before performing replication and certification. With large transactions, long-running writes, and changes to huge datasets, there was a greater possibility of a conflict forcing a rollback of an expensive operation.
Using streaming replication, the node breaks huge transactions up into smaller and more manageable fragments; it then replicates these fragments to the cluster as it works instead of waiting for the commit. Once certified, the fragment can no longer be aborted by conflicting transactions. As this can have performance consequences both during execution and in the event of rollback, it is recommended that you only use it with large transactions that are unlikely to experience conflict.
For more information on streaming replication, see the Galera documentation.
Group Commit support for MariaDB Cluster was introduced in Galera 4.
With Group Commit, groups of transactions are flushed to disk together to improve performance. In previous versions of MariaDB, this feature was not available in MariaDB Cluster, as it interfered with the global ordering of transactions for replication. MariaDB Cluster can now take advantage of Group Commit.
For more information on Group Commit, see the Galera documentation.
This page is licensed: CC BY-SA / Gnu FDL