MariaDB Galera Cluster Usage Guide

MariaDB Galera Cluster usage guide

Quickstart Guide: MariaDB Galera Cluster Usage

This guide provides essential information for effectively using and interacting with a running MariaDB Galera Cluster. It covers connection methods, operational considerations, monitoring, and best practices for applications.

1. Connecting to the Cluster

Since Galera Cluster is multi-primary, any node can accept read and write connections.

a. Using a Load Balancer (Recommended for Production):

Deploying a load balancer or proxy (like MariaDB MaxScale, ProxySQL, or HAProxy) is the recommended approach.

MariaDB MaxScale: Provides intelligent routing (e.g., readwritesplit, router), connection pooling, and advanced cluster awareness (e.g., binlogrouter for replication clients, switchover for failover).
Other Load Balancers: Configure them to distribute connections across your Galera nodes, typically using health checks on port 3306 or other cluster-specific checks.

b. Direct Connection:

You can connect directly to any individual node's IP address or hostname using standard MariaDB client tools or connectors (e.g., mariadb command-line client, MariaDB Connector/J, Connector/Python).

Example (Command-line):Bash
```
mariadb -h <node_ip_address> -u <username> -p
```
While simple, this method lacks automatic failover; your application would need to handle connection retries and failover logic.

2. Basic Operations (Reads & Writes)

Active-Active: You can perform both read and write operations on any node in the cluster. All successful write operations are synchronously replicated to all other nodes.
Transactions: Standard SQL transactions (START TRANSACTION, COMMIT, ROLLBACK) work as expected. Galera handles the replication of committed transactions.

3. DDL (Data Definition Language) Operations

DDL operations (like CREATE TABLE, ALTER TABLE, DROP TABLE) require special attention in a synchronous multi-primary cluster to avoid conflicts and outages.

Total Order Isolation (TOI) - Default:
- This is Galera's default DDL method.
- The DDL statement is executed on all nodes in the same order, and it temporarily blocks other transactions on all nodes while it applies.
- It ensures consistency but can cause brief pauses in application activity, especially on busy clusters.
- Best Practice: Execute DDL during maintenance windows or low-traffic periods.
Rolling Schema Upgrade (RSU) / Percona's pt-online-schema-change:
- For large tables or critical production systems, use tools like pt-online-schema-change (from Percona Toolkit) which performs DDL without blocking writes.
- This tool works by creating a new table, copying data, applying changes, and then swapping the tables. It's generally preferred for minimizing downtime for ALTER TABLE operations.
wsrep_OSU_method:
- This system variable controls how DDL operations are executed.
- TOI (default): Total Order Isolation.
- RSU: Rolling Schema Upgrade (requires manual steps with pt-online-schema-change).
- NBO (Non-Blocking Operations): A newer method allowing non-blocking DDL for some operations, but not fully implemented for all DDL types. Use with caution and test thoroughly.

4. Monitoring Cluster Status

Regularly monitor your Galera Cluster to ensure its health and consistency.

wsrep_cluster_size: Number of nodes currently in the Primary Component.SQL
```
SHOW STATUS LIKE 'wsrep_cluster_size';
```
Expected value: the total number of nodes configured (e.g., 3).
wsrep_local_state_comment / wsrep_local_state: The state of the current node.
- Synced (4): Node is fully synchronized and operational.
- Donor/Desync (2): Node is transferring state to another node.
- Joining (1): Node is in the process of joining the cluster.
- Donor/Stalled (1): Node is stalled.
wsrep_incoming_addresses: List of incoming connections from other cluster nodes.
wsrep_cert_deps_distance: Indicates flow control. A high value suggests that this node is falling behind and flow control may activate.
wsrep_flow_control_paused: Percentage of time the node was paused due to flow control. High values indicate a bottleneck.
wsrep_local_recv_queue / wsrep_local_send_queue: Size of the receive/send queue. Ideally, these should be close to 0. Sustained high values indicate replication lag or node issues.

5. Handling Node Failures and Recovery

Galera Cluster is designed for automatic recovery, but understanding the process is key.

Node Failure: If a node fails, the remaining nodes continue to operate as the Primary Component. The failed node will automatically attempt to rejoin when it comes back online.
Split-Brain Scenarios: If the network partitions the cluster, nodes will try to form a "Primary Component." The partition with the majority of nodes forms the new Primary Component. If no majority can be formed (e.g., a 2-node cluster splits), the cluster will become inactive. A 3-node or higher cluster is recommended to avoid this.
Manual Bootstrapping (Last Resort): If the entire cluster goes down or a split-brain occurs where no Primary Component forms, you might need to manually "bootstrap" a new Primary Component from one of the healthy nodes.
- Choose the node that was most up-to-date.
- Stop MariaDB on that node.
- Start it with: sudo galera_new_cluster or sudo systemctl start mariadb --wsrep-new-cluster.
- Start other nodes normally; they will rejoin the bootstrapped component.

6. Application Best Practices

Use Connection Pooling: Essential for managing connections efficiently in high-traffic applications.
Short Transactions: Keep transactions as short and concise as possible to minimize conflicts and improve throughput. Long-running transactions increase the risk of rollbacks due to certification failures.
Primary Keys: All tables should have a primary key. Galera relies on primary keys for efficient row-level replication. Tables without primary keys can cause performance degradation and issues.
Retry Logic: Implement retry logic in your application for failed transactions (e.g., due to certification failures, deadlock, or temporary network issues).
Connect to a Load Balancer: Always direct your application's connections through a load balancer or proxy to leverage automatic failover and intelligent routing.

By following these guidelines, you can effectively manage and operate your MariaDB Galera Cluster for high availability and performance.

Further Resources:

MariaDB Galera Cluster Guide
Galera Cluster Documentation - Operational Aspects
MariaDB documentation - Galera Cluster Best Practices
MariaDB documentation - Galera Cluster Monitor

PreviousMariaDB Galera Cluster Replication Guide NextGalera Management

Last updated 5 days ago

Was this helpful?