
ColumnStore

MariaDB ColumnStore

Discover MariaDB ColumnStore, the powerful columnar storage engine for analytical workloads. Learn about its architecture, features, and how it enables high-performance data warehousing and analytics.

Quickstart Guides

MariaDB ColumnStore Quickstart Guides provide concise, Docker-friendly steps to quickly set up, configure, and explore the ColumnStore analytic engine.

MariaDB ColumnStore Guide

Quickstart guide for MariaDB ColumnStore

Quickstart Guide: MariaDB ColumnStore

MariaDB ColumnStore is a specialized columnar storage engine designed for high-performance analytical processing and big data workloads. Unlike traditional row-based storage engines, ColumnStore organizes data by columns, which is highly efficient for analytical queries that often access only a subset of columns across vast datasets.

1. What is MariaDB ColumnStore?

MariaDB ColumnStore is a columnar storage engine that integrates with MariaDB Server. It employs a massively parallel distributed data architecture, making it ideal for processing petabytes of data with linear scalability. It was originally ported from InfiniDB and is released under the GPL license.

2. Key Benefits

  • Exceptional Analytical Performance: Delivers superior performance for complex analytical queries (OLAP) due to its columnar nature, which minimizes disk I/O by reading only necessary columns.

  • High Data Compression: Columnar storage allows for much higher compression ratios compared to row-based storage, reducing disk space usage and improving query speed.

  • Massive Scalability: Designed to scale horizontally across multiple nodes, processing petabytes of data with ease.

  • Just-in-Time Projection: Only the required columns are processed and returned, further optimizing query execution.

  • Real-time Analytics: Capable of handling real-time analytical queries efficiently.

3. Architecture Concepts (Simplified)

MariaDB ColumnStore utilizes a distributed architecture with different components working together:

  • User Module (UM): Handles incoming SQL queries, optimizes them for columnar processing, and distributes tasks.

  • Performance Module (PM): Manages data storage, compression, and execution of query fragments on the data segments.

  • Data Files: Data is stored in column-segments across the nodes, highly compressed.

4. Installation Overview

MariaDB ColumnStore is installed as a separate package that integrates with MariaDB Server. The exact installation steps vary depending on your operating system and desired deployment type (single server or distributed cluster).

General Steps (conceptual):

  1. Install MariaDB Server: Ensure you have a compatible MariaDB Server version installed (e.g., MariaDB 10.5.4 or later).

  2. Install ColumnStore Package: Download and install the specific MariaDB ColumnStore package for your OS. This package includes the ColumnStore storage engine and its associated tools.

    • Linux (e.g., Debian/Ubuntu): You would typically add the MariaDB repository configured for ColumnStore and then install mariadb-plugin-columnstore.

    • Single Server vs. Distributed: For a single-server setup, you install all ColumnStore components on one machine. For a distributed setup, you install and configure components across multiple machines.

  3. Configure MariaDB: After installation, you might need to adjust your MariaDB server configuration (my.cnf or equivalent) to properly load and manage the ColumnStore engine.

  4. Initialize ColumnStore: Run a specific columnstore-setup or post-install script to initialize the ColumnStore environment.
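
To make the flow concrete, here is a minimal installation sketch for a single-server setup on Debian/Ubuntu. It assumes MariaDB's repository setup script and the package name mentioned above; the repository URL, target server version, and any post-install steps should be verified against the documentation for your platform and ColumnStore release.

# Add the MariaDB repository (verify the script URL and the server version you need)
$ curl -LsSO https://r.mariadb.com/downloads/mariadb_repo_setup
$ sudo bash mariadb_repo_setup --mariadb-server-version="mariadb-10.6"

# Install the server and the ColumnStore storage engine plugin
$ sudo apt update
$ sudo apt install mariadb-server mariadb-plugin-columnstore

# Restart the server so the ColumnStore engine and its processes start
$ sudo systemctl restart mariadb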

5. Basic Usage

Once MariaDB ColumnStore is installed and configured, you can create and interact with ColumnStore tables using standard SQL.

a. Create a ColumnStore Table:

Specify ENGINE=ColumnStore when creating your table. Note that ColumnStore tables do not support primary keys or traditional indexes the way InnoDB does; the engine is designed for analytical processing and relies on extent elimination instead.

CREATE TABLE sales_data (
    sale_id INT,
    product_name VARCHAR(255),
    category VARCHAR(100),
    sale_date DATE,
    quantity INT,
    price DECIMAL(10, 2)
) ENGINE=ColumnStore;

b. Insert Data:

You can insert data using standard INSERT statements. For large datasets, bulk loading utilities (e.g., LOAD DATA INFILE) are highly recommended for performance.

INSERT INTO sales_data (sale_id, product_name, category, sale_date, quantity, price) VALUES
(1, 'Laptop', 'Electronics', '2023-01-15', 1, 1200.00),
(2, 'Mouse', 'Electronics', '2023-01-15', 2, 25.00),
(3, 'Keyboard', 'Electronics', '2023-01-16', 1, 75.00);
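
For larger files, a bulk load is usually much faster than individual INSERT statements. The following is a minimal sketch, assuming a comma-separated file at the hypothetical path /tmp/sales_data.csv whose columns match the table definition order:

-- Bulk load a CSV file into the ColumnStore table
LOAD DATA INFILE '/tmp/sales_data.csv'
INTO TABLE sales_data
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';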

c. Query Data:

Perform analytical queries. ColumnStore will efficiently process these, often leveraging its columnar nature and parallelism.

-- Get total sales per category
SELECT category, SUM(quantity * price) AS total_sales
FROM sales_data
WHERE sale_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY category
ORDER BY total_sales DESC;

-- Count distinct products
SELECT COUNT(DISTINCT product_name) FROM sales_data;

Further Resources:

  • MariaDB ColumnStore Overview

  • MariaDB documentation: MariaDB ColumnStore

  • DigitalOcean: How to Install MariaDB ColumnStore on Ubuntu 20.04

MariaDB ColumnStore Hardware Guide

Quickstart guide for MariaDB ColumnStore hardware requirements

Quickstart Guide: MariaDB ColumnStore Minimum Hardware Specifications

MariaDB ColumnStore is designed for analytical workloads and scales linearly with hardware resources. While the performance generally improves with more CPU cores, memory, and servers, understanding the minimum hardware specifications is crucial for successful deployment, especially in development and production environments.

General Principle

MariaDB ColumnStore's performance directly benefits from additional hardware resources. More CPU cores enable greater parallel processing, increased memory allows for more data caching (reducing I/O), and more servers enable a larger distributed architecture.

Minimum Hardware Recommendations

The specifications differentiate between a basic development environment and a production-ready setup:

1. For Development Environments:

  • CPU: A minimum of 8 CPU cores.

  • Memory (RAM): A minimum of 32 GB.

  • Storage: Local disk storage is acceptable for development purposes.

2. For Production Environments:

  • CPU: A minimum of 64 CPU cores.

    • Note: This recommendation underscores the highly parallel nature of ColumnStore, which can effectively utilize a large number of cores for analytical processing.

  • Memory (RAM): A minimum of 128 GB.

    • Note: Adequate memory is critical for caching data and intermediate results, directly impacting query performance.

  • Storage: StorageManager (S3) is recommended.

    • Note: This implies leveraging cloud-object storage (like AWS S3 or compatible services) for scalable and durable data persistence in production.

Network Interconnectivity (for Multi-Server Deployments)

  • Minimum Network: For multi-server ColumnStore deployments, a minimum of a 1 Gigabit (1G) network is recommended.

    • Note: This facilitates efficient data transfer between nodes via TCP/IP for replication and query processing across the distributed architecture. For optimal performance in heavy-load scenarios, higher bandwidth (e.g., 10G or more) is highly beneficial.

Adhering to these minimum specifications will provide a baseline for ColumnStore functionality. For specific workload requirements, it's always advisable to conduct performance testing and scale hardware accordingly.

Further Resources:

  • MariaDB ColumnStore Minimum Hardware Specification Documentation

  • MariaDB ColumnStore Overview

  • MariaDB documentation: MariaDB ColumnStore

ColumnStore Architecture

MariaDB ColumnStore uses a shared-nothing, distributed architecture with separate modules for SQL and storage, enabling scalable, high-performance analytics.

Release notes and other documentation for ColumnStore is also available in the Enterprise docs section of the MariaDB website. For example:

  • ColumnStore 23.10 Release Notes

  • ColumnStore 23.02 Release Notes

  • ColumnStore 22.08 Release Notes

  • ColumnStore 6 Release Notes

  • ColumnStore 5.6 Release Notes

  • Deployment Instructions

ColumnStore Architectural Overview

MariaDB ColumnStore enhances MariaDB Enterprise Server with a columnar engine for OLAP and HTAP workloads, using MPP for scalability. It supports cross-engine JOINs, integrates with S3 storage, and provides high-speed bulk loading with multi-node management via REST API.

MariaDB ColumnStore is a columnar storage engine designed for distributed massively parallel processing (MPP), such as for big data analysis. Deployments can consist of one or several MariaDB servers, each running several subprocesses that work together to provide linear scalability and exceptional performance with real-time responses to analytical queries.

It provides a highly available, fault tolerant, and performant columnar storage engine for MariaDB Enterprise Server. MariaDB Enterprise ColumnStore is designed for data warehousing, decision support systems (DSS), online analytical processing (OLAP), and hybrid transactional-analytical processing (HTAP).

Benefits

  • Columnar storage engine that enables MariaDB Enterprise Server to perform new workloads

  • Optimized for online analytical process (OLAP) workloads including data warehousing, decision support systems, and business intelligence

  • Single-stack solution for hybrid transactional-analytical workloads to eliminate barriers and prevent data silos

  • Implements cross-engine JOINs to join Enterprise ColumnStore tables with tables using row-based storage engines, such as InnoDB

  • Smart storage engine that plans and optimizes its own queries using a custom select handler

  • Scalable query execution using massively parallel processing (MPP) strategies, parallel query execution, and distributed function evaluation

  • S3-compatible object storage can be used for highly available, low-cost, multi-regional, resilient, scalable, secure, and virtually limitless data storage

  • High availability and automatic failover by leveraging MariaDB MaxScale

  • REST API for multi-node administration with the Cluster Management API (CMAPI) server

  • Connectors for popular BI platforms such as Microsoft Power BI and Tableau

  • High-speed bulk data loading that bypasses the SQL layer and does not block concurrent read-only queries

Topologies

MariaDB Enterprise ColumnStore supports multiple topologies. Several representative options are described below; they illustrate basic product capabilities, and Enterprise ColumnStore can also be deployed in other topologies.

MariaDB products can be deployed to form other topologies that leverage advanced product capabilities and combine the capabilities of multiple topologies.

Enterprise ColumnStore with Object Storage

(Figure: Enterprise ColumnStore topology with S3 object storage)

The MariaDB Enterprise ColumnStore topology with Object Storage delivers production analytics with high availability, fault tolerance, and limitless data storage by leveraging S3-compatible storage.

The topology consists of:

  • One or more MaxScale nodes

  • An odd number of ColumnStore nodes (minimum of 3) running ES, Enterprise ColumnStore, and CMAPI

The MaxScale nodes:

  • Monitor the health and availability of each ColumnStore node using the MariaDB Monitor (mariadbmon)

  • Accept client and application connections

  • Route queries to ColumnStore nodes using the Read/Write Split Router (readwritesplit)

The ColumnStore nodes:

  • Receive queries from MaxScale

  • Execute queries

  • Use S3-compatible object storage for data

  • Use Shared Local Storage for the Storage Manager directory.
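
For reference, a minimal MaxScale configuration sketch for this topology, assuming three ColumnStore nodes at the hypothetical addresses 192.0.2.1–192.0.2.3 and a monitoring/routing user named maxuser; the module names (mariadbmon, readwritesplit) are the ones described above, and all other values should be adapted to your environment:

[cs-node-1]
type = server
address = 192.0.2.1
port = 3306

[cs-node-2]
type = server
address = 192.0.2.2
port = 3306

[cs-node-3]
type = server
address = 192.0.2.3
port = 3306

# MariaDB Monitor checks node health and performs automatic failover
[ColumnStore-Monitor]
type = monitor
module = mariadbmon
servers = cs-node-1, cs-node-2, cs-node-3
user = maxuser
password = maxpwd
auto_failover = true

# Read/Write Split Router sends writes to the primary and reads to replicas
[Split-Service]
type = service
router = readwritesplit
servers = cs-node-1, cs-node-2, cs-node-3
user = maxuser
password = maxpwd

[Split-Listener]
type = listener
service = Split-Service
port = 3306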

Enterprise ColumnStore with Shared Local Storage

(Figure: Enterprise ColumnStore topology with shared local storage, such as NFS)

The MariaDB Enterprise ColumnStore topology with Shared Local Storage delivers production analytics with high availability and fault tolerance by leveraging shared local storage, such as NFS.

The topology consists of:

  • One or more MaxScale nodes

  • An odd number of ColumnStore nodes (minimum of 3) running ES, Enterprise ColumnStore, and CMAPI

The MaxScale nodes:

  • Monitor the health and availability of each ColumnStore node using the MariaDB Monitor (mariadbmon)

  • Accept client and application connections

  • Route queries to ColumnStore nodes using the Read/Write Split Router (readwritesplit)

The ColumnStore nodes:

  • Receive queries from MaxScale

  • Execute queries

  • Use Shared Local Storage for the DB Root directories.

Software Architecture

The software components and their roles:

  • MariaDB Enterprise ColumnStore: the columnar storage engine, responsible for query execution and data storage

  • MariaDB Enterprise Server: the enterprise-grade database server

  • ColumnStore Storage Engine Plugin: the storage engine plugin that integrates MariaDB Enterprise ColumnStore into MariaDB Enterprise Server

  • Cluster Management API (CMAPI): a REST API used for administrative tasks

  • MariaDB MaxScale: a database proxy that accepts connections, routes queries, and performs auto-failover

MariaDB Enterprise ColumnStore

MariaDB Enterprise ColumnStore is the columnar storage engine that handles data storage and query optimization/execution.

MariaDB Enterprise ColumnStore is a columnar storage engine that is optimized for analytical or online analytical processing (OLAP) workloads, data warehousing, and DSS. MariaDB Enterprise ColumnStore can be used for hybrid transactional-analytical processing (HTAP) workloads when paired with a row-based storage engine, like InnoDB.

MariaDB Enterprise Server

MariaDB Enterprise ColumnStore is built on top of MariaDB Enterprise Server. MariaDB Enterprise ColumnStore 5 is included with the standard MariaDB Enterprise Server 10.5 releases, while MariaDB Enterprise ColumnStore 6 is included with the standard MariaDB Enterprise Server 10.6 releases.

Enterprise ColumnStore interfaces with the Enterprise Server SQL engine through the ColumnStore storage engine plugin.

MariaDB has been continually improving the integration of MariaDB Enterprise ColumnStore with MariaDB Enterprise Server:

  • MariaDB ColumnStore required special custom-built releases of MariaDB Server.

  • MariaDB Enterprise ColumnStore was available for MariaDB Enterprise Server 10.5 releases up to ES 10.5.5-3 as a separately installed component. It was the first release to replace the Operations/Administration/Maintenance (OAM) API with the more modern Cluster Management API (CMAPI), which is still in use.

  • Starting with ES 10.5.6-4, MariaDB Enterprise ColumnStore is included with the standard MariaDB Enterprise Server 10.5 releases.

ColumnStore Storage Engine Plugin

MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using the ColumnStore storage engine plugin. The ColumnStore storage engine plugin enables MariaDB Enterprise Server to interact with ColumnStore tables.

The ColumnStore storage engine plugin is a smart storage engine that implements a custom select handler to fully take advantage of Enterprise ColumnStore's capabilities, such as:

  • Using a custom query planner

  • Selecting data by column instead of by row

  • Parallel query evaluation

  • Distributed aggregations

  • Distributed functions

  • Extent elimination

As a smart storage engine, the ColumnStore storage engine plugin tightly integrates Enterprise ColumnStore with ES, but it has enough independence to efficiently execute analytical queries using a completely unique approach.

For additional information, see "ColumnStore Storage Engine".

Cluster Management API (CMAPI) Server

The Cluster Management API (CMAPI) server provides a REST API that can be used to configure and manage Enterprise ColumnStore.

CMAPI must run on every ColumnStore node in a multi-node deployment but is not required in a single-node deployment.

The REST API can be used to perform multiple operations:

  • Add ColumnStore nodes

  • Remove ColumnStore nodes

  • Start Enterprise ColumnStore

  • Shutdown Enterprise ColumnStore

  • Check the status of Enterprise ColumnStore
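
As an illustration, the cluster status endpoint can be called with curl from any ColumnStore node. This is a minimal sketch that assumes CMAPI's default port 8640 and an API key that you have configured for the CMAPI server:

# Query cluster status over the CMAPI REST API (replace the API key with your own)
$ curl -k -s https://localhost:8640/cmapi/0.4.0/cluster/status \
       --header 'Content-Type: application/json' \
       --header 'x-api-key: your_api_key'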

MariaDB MaxScale

MariaDB Enterprise ColumnStore leverages MariaDB MaxScale as an advanced database proxy and query router.

Multi-node Enterprise ColumnStore deployments must have one or more MaxScale nodes. MaxScale performs many different roles:

  • Routing write queries to the primary server

  • Load balancing read queries on replica servers

  • Monitoring node health

  • Performing automatic failover if a node fails

Storage Architecture

MariaDB Enterprise ColumnStore's storage architecture provides a columnar storage engine with high availability, fault tolerance, compression, and automatic partitioning for production analytics and data warehousing.

For additional information, see "MariaDB Enterprise ColumnStore Storage Architecture".

Columnar Storage Engine

MariaDB Enterprise ColumnStore is a columnar storage engine for MariaDB Enterprise Server. MariaDB Enterprise ColumnStore enables ES to perform analytical workloads, including online analytical processing (OLAP), data warehousing, decision support systems (DSS), and hybrid transactional-analytical processing (HTAP) workloads.

Most traditional relational databases use row-based storage engines. In row-based storage engines, all columns for a table are stored contiguously. Row-based storage engines perform very well for transactional workloads, but are less performant for analytical workloads.

Columnar storage engines store each column separately. Columnar storage engines perform very well for analytical workloads. Analytical workloads are characterized by ad hoc queries on very large data sets by relatively few users.

MariaDB Enterprise ColumnStore automatically partitions each column into extents, which helps improve query performance without using indexes.

S3-Compatible Object Storage

MariaDB Enterprise ColumnStore supports S3-compatible object storage.

S3-compatible object storage is optional, but highly recommended. If S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use Shared Local Storage (such as NFS) for high availability.

S3-compatible object storage is:

  • Compatible: Many object storage services are compatible with the Amazon S3 API.

  • Economical: S3-compatible object storage is often very low cost.

  • Flexible: S3-compatible object storage is available for both cloud and on-premises deployments.

  • Limitless: S3-compatible object storage is often virtually limitless.

  • Resilient: S3-compatible object storage is often low maintenance and highly available, since many services use resilient cloud infrastructure.

  • Scalable: S3-compatible object storage is often highly optimized for read and write scaling.

  • Secure: S3-compatible object storage is often encrypted-at-rest.

Many S3-compatible object storage services exist. MariaDB Corporation cannot make guarantees about all S3-compatible object storage services, because different services provide different functionality.

If you have any questions about using specific S3-compatible object storage with MariaDB Enterprise ColumnStore, contact us.

Shared Local Storage

MariaDB Enterprise ColumnStore can use shared local storage.

Shared local storage is required for high availability. The specific Shared Local Storage requirements depend on whether Enterprise ColumnStore is configured to use S3-compatible object storage:

  • When S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage for high availability.

  • When S3-compatible object storage is not used, Enterprise ColumnStore requires the DB Root directories to use shared local storage for high availability.

The most common shared local storage options for on-premises and cloud deployments are:

  • NFS (Network File System)

  • GlusterFS

The most common shared local storage options for AWS (Amazon Web Services) deployments are:

  • EBS (Elastic Block Store) Multi-Attach

  • EFS (Elastic File System)

The most common shared local storage option for GCP (Google Cloud Platform) deployments is:

  • Filestore

Query Evaluation Architecture

(Figure: Enterprise ColumnStore query execution with S3 flow chart)

MariaDB Enterprise ColumnStore uses distributed query execution and massively parallel processing (MPP) techniques to achieve vertical and horizontal scalability for production analytics and data warehousing.

For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation".

Extent Elimination

(Figure: Enterprise ColumnStore extent elimination during query execution)

MariaDB Enterprise ColumnStore uses extent elimination to scale query evaluation as table size increases.

Most databases are row-based databases that use manually-created indexes to achieve high performance on large tables. This works well for transactional workloads. However, analytical queries tend to have very low selectivity, so traditional indexes are not typically effective for analytical queries.

Enterprise ColumnStore uses extent elimination to achieve high performance, without requiring manually created indexes. Enterprise ColumnStore automatically partitions all data into extents. Enterprise ColumnStore stores the minimum and maximum values for each extent in the extent map. Enterprise ColumnStore uses the minimum and maximum values in the extent map to perform extent elimination.

When Enterprise ColumnStore performs extent elimination, it compares the query's join conditions and filter conditions (i.e., WHERE clause) to the minimum and maximum values for each extent in the extent map. If the extent's minimum and maximum values fall outside the bounds of the query's conditions, Enterprise ColumnStore skips that extent for the query.

Extent elimination is automatically performed for every query. It can significantly decrease I/O for columns with clustered values. For example, extent elimination works effectively for series, ordered, patterned, and time-based data.
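
As a small illustration using the sales_data table from the quickstart above: if one extent of the sale_date column holds only January 2023 values (minimum 2023-01-01, maximum 2023-01-31), a query that filters on a later month never reads that extent's blocks.

-- This extent's sale_date range (2023-01-01 .. 2023-01-31) falls entirely outside
-- the filter below, so extent elimination skips the extent without any block I/O.
SELECT SUM(quantity * price)
FROM sales_data
WHERE sale_date BETWEEN '2023-06-01' AND '2023-06-30';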

Custom Select Handler

The ColumnStore storage engine plugin implements a custom select handler to fully take advantage of Enterprise ColumnStore's capabilities.

All storage engines interact with ES using an internal handler API, which is highly extensible. Storage engines can implement different features by implementing different methods within the handler API.

For select statements, the handler API transforms each query into a SELECT_LEX object, which is provided to the select handler.

The generic select handler is not optimal for Enterprise ColumnStore, because:

  • Enterprise ColumnStore selects data by column, but the generic select handler selects data by row

  • Enterprise ColumnStore supports parallel query evaluation, but the generic select handler does not

  • Enterprise ColumnStore supports distributed aggregations, but the generic select handler does not

  • Enterprise ColumnStore supports distributed functions, but the generic select handler does not

  • Enterprise ColumnStore supports extent elimination, but the generic select handler does not

  • Enterprise ColumnStore has its own query planner, but the generic select handler cannot use it

Smart Storage Engine

The ColumnStore storage engine plugin is known as a smart storage engine because it implements a custom select handler. MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using this plugin, which enables MariaDB Enterprise Server to interact with ColumnStore tables.

As a smart storage engine, the ColumnStore storage engine plugin tightly integrates Enterprise ColumnStore with ES, but it has enough independence to efficiently execute analytical queries using a completely unique approach.

Query Planning

The ColumnStore storage engine plugin is a smart storage engine, which allows MariaDB Enterprise ColumnStore to plan its own queries using the custom select handler.

MariaDB Enterprise ColumnStore's query planning is divided into two steps:

  1. ES provides the query's SELECT_LEX object to the custom select handler. The custom select handler builds a ColumnStore Execution Plan (CSEP).

  2. The custom select handler provides the CSEP to the ExeMgr process on the same node. The ExeMgr process performs extent elimination and creates a job list.

Job Steps

When Enterprise ColumnStore executes a query, the ExeMgr process on the initiator/aggregator node translates the ColumnStore execution plan (CSEP) into a job list. A job list is a sequence of job steps.

Enterprise ColumnStore uses many different types of job steps that provide different scalability benefits:

  • Some types of job steps perform operations in a distributed manner using multiple nodes to operate on different extents. Distributed operations provide horizontal scalability.

  • Some types of job steps perform operations in a multi-threaded manner using a thread pool. Performing multi-threaded operations provides vertical scalability.

As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute job steps.

High Availability and Failover

MariaDB Enterprise ColumnStore leverages common technologies to provide highly available production analytics with automatic failover:

  • S3-compatible object storage: high availability for data (optional)

  • Shared Local Storage: with S3, high availability for the Storage Manager directory; without S3, high availability for the DB Root directories

  • MariaDB Replication: schema replication for ColumnStore tables, schema and data replication for non-ColumnStore tables, and replication of other database objects

  • MaxScale: monitoring, automatic failover, and load balancing

  • Cluster Management API (CMAPI) Server: a REST API for administration, including adding and removing nodes

Shared Local Storage

MariaDB Enterprise ColumnStore can use shared local storage.

Shared local storage is required for high availability. The specific Shared Local Storage requirements depend on whether Enterprise ColumnStore is configured to use S3-compatible object storage:

  • When S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage for high availability.

  • When S3-compatible object storage is not used, Enterprise ColumnStore requires the DB Root directories to use Shared Local Storage for high availability.

The most common shared local storage options for on-premises and cloud deployments are:

  • NFS (Network File System)

  • GlusterFS

The most common shared local storage options for AWS (Amazon Web Services) deployments are:

  • EBS (Elastic Block Store) Multi-Attach

  • EFS (Elastic File System)

The most common shared local storage option for GCP (Google Cloud Platform) deployments is:

  • Filestore

MariaDB Replication

MariaDB Enterprise ColumnStore requires MariaDB Replication to synchronize various database objects on multiple nodes for high availability.

MariaDB replication synchronizes:

  • The schemas for all ColumnStore tables on all nodes

  • The schemas and data for all non-ColumnStore tables on all nodes

  • All other databases objects (stored procedures, stored functions, user accounts, and other objects) on all nodes

MaxScale

MariaDB Enterprise ColumnStore requires MariaDB MaxScale to achieve high availability, automatic failover, and load balancing.

MariaDB Monitor (mariadbmon) in MaxScale monitors the health of each Enterprise ColumnStore node.

MaxScale provides load balancing by routing queries and/or connections to healthy nodes by:

  • Providing query-based routing using Read/Write Split Router (readwritesplit).

  • Providing connection-based routing using Read Connection Router (readconnroute).

When MaxScale's MariaDB Monitor notices the primary node fail, MariaDB Monitor performs automatic failover by:

  • Promoting a replica node to become the new primary node.

  • Re-configuring all replica nodes to replicate from the new primary node.

Cluster Management API (CMAPI) Server

MariaDB Enterprise ColumnStore requires the Cluster Management API (CMAPI) server for high availability.

The CMAPI server provides a REST API that can be used to manage and configure Enterprise ColumnStore.

The CMAPI server has a role in automatic failover. After MaxScale performs automatic failover, the CMAPI server detects the topology change and automatically re-configures the roles of each Enterprise ColumnStore node.

Data Loading

(Figure: Enterprise ColumnStore data loading with S3 flow chart)

MariaDB Enterprise ColumnStore performs bulk data loads very efficiently using a variety of mechanisms including the cpimport tool, specialized handling of certain SQL statements, and minimal locking during data import.

For additional information, see "MariaDB Enterprise ColumnStore Data Loading".

cpimport

MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which provides several benefits:

  • Bypasses the SQL layer to decrease overhead

  • Does not block read queries

  • Requires a write metadata lock on the table, which can be monitored with the METADATA_LOCK_INFO plugin.

  • Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries.

  • Inserts each row in the order the rows are read from the source file. Users can optimize data loads for Enterprise ColumnStore's automatic partitioning by loading presorted data files. For additional information, see "Load Ordered Data in Proper Order".

  • Supports parallel distributed bulk loads

  • Imports data from text files

  • Imports data from binary files

  • Imports data from standard input (stdin)
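
For illustration, a minimal cpimport invocation might look like the following sketch, assuming a database named analytics, a table named sales_data, and a comma-delimited file at the hypothetical path /tmp/sales_data.csv:

# Bulk load a CSV file; -s sets the field delimiter (the default is '|')
$ cpimport -s ',' analytics sales_data /tmp/sales_data.csv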

Batch Insert Mode

MariaDB Enterprise ColumnStore enables batch insert mode by default.

When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for the following statements:

  • LOAD DATA INFILE

  • INSERT INTO ... SELECT FROM ...

Enterprise ColumnStore uses the following rules:

  • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.

  • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.
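
For example, batch insert mode can be toggled for the current session before running a LOAD DATA INFILE statement. This is a minimal sketch; check the columnstore_use_import_for_batchinsert documentation for the values permitted in your release:

-- Disable batch insert mode for this session; subsequent LOAD DATA INFILE statements
-- then use the slower DML interface instead of cpimport
SET SESSION columnstore_use_import_for_batchinsert = OFF;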

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.
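
For example, once the plugin is loaded, the write metadata lock taken by a running bulk load can be observed with a query like the following sketch:

-- One-time installation of the plugin, if it is not already loaded
INSTALL SONAME 'metadata_lock_info';

-- While cpimport is running, the table's write metadata lock appears here
SELECT * FROM information_schema.METADATA_LOCK_INFO;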

Backup and Restore

(Figure: Enterprise ColumnStore backup with S3 flow chart)

MariaDB Enterprise ColumnStore supports backup and restore using well-known tools and methods.

  • S3-compatible object storage: S3 snapshot

  • Shared Local Storage: file system snapshot or file copy

  • Enterprise Server data directory: MariaDB Enterprise Backup

For additional information, see "MariaDB Enterprise ColumnStore Backup and Restore".

S3-Compatible Object Storage

MariaDB Enterprise ColumnStore can leverage S3 snapshots to back up S3-compatible object storage when it is used for Enterprise ColumnStore's data.

The S3-compatible object storage can be backed up by:

  1. Locking the database on the primary node

  2. Performing an S3 snapshot using the vendor's standard snapshot functionality

Shared Local Storage

MariaDB Enterprise ColumnStore can leverage file system snapshots or file copy tools (such as rsync) to back up shared local storage when it is used for the Storage Manager directory or the DB Root directories.

The shared local storage can be backed up by:

  1. Locking the database on the primary node

  2. Performing a file system snapshot or using a file copy tool (such as rsync) to copy the contents of the Storage Manager directory and/or the DB Root directories.
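
As an illustration of the file copy in step 2, an rsync invocation might look like the following sketch, assuming the default Storage Manager path and a hypothetical backup destination:

# Copy the Storage Manager directory to a backup location (run while the database is locked)
$ rsync -av /var/lib/columnstore/storagemanager/ /backups/columnstore/storagemanager/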

Enterprise Server Data Directory

MariaDB Enterprise ColumnStore can leverage the standard MariaDB Enterprise Backup utility to back up the Enterprise Server data directory.

The backup contains:

  • All ColumnStore schemas

  • All non-ColumnStore schemas and data

  • All other database objects

It does not contain:

  • ColumnStore data

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Query Processing

Clients issue a query to the MariaDB Server, which has the ColumnStore storage engine installed. MariaDB Server parses the SQL, identifies the involved ColumnStore tables, and creates an initial logical query execution plan.

Using the ColumnStore storage engine interface (ha_columnstore), MariaDB Server converts involved table references into ColumnStore internal objects. These are then handed off to the ExeMgr, which is responsible for managing and orchestrating query execution across the cluster.

The ExeMgr analyzes the query plan and translates it into a distributed ColumnStore execution plan. It determines the necessary query steps and the execution order, including any required parallelization.

The ExeMgr then references the extent map to identify which PrimProc instances hold the relevant data segments. It applies extent elimination to exclude any PrimProc nodes whose extents do not match the query’s filter criteria.

The ExeMgr dispatches commands to the selected PrimProc instances to perform data block I/O operations.

The PrimProc components perform operations such as:

  • Predicate filtering

  • Join processing

  • Initial aggregation

  • Data retrieval from local disk or external storage (e.g., S3 or cloud object storage)

They then return intermediate result sets back to the ExeMgr.

The ExeMgr handles:

  • Final-stage aggregation

  • Window function evaluation

  • Result-set sorting and shaping

The completed result set is returned to the MariaDB Server, which performs any remaining SQL operations like ORDER BY, LIMIT, or computed expressions in the SELECT list.

Finally, the MariaDB Server returns the result set to the client.

ColumnStore Storage Architecture

Overview

MariaDB Enterprise ColumnStore's storage architecture is designed to provide great performance for analytical queries.

Columnar Storage Engine

MariaDB Enterprise ColumnStore is a columnar storage engine for MariaDB Enterprise Server. MariaDB Enterprise ColumnStore enables ES to perform analytical workloads, including online analytical processing (OLAP), data warehousing, decision support systems (DSS), and hybrid transactional-analytical processing (HTAP) workloads.

Most traditional relational databases use row-based storage engines. In row-based storage engines, all columns for a table are stored contiguously. Row-based storage engines perform very well for transactional workloads, but are less performant for analytical workloads.

Columnar storage engines store each column separately. Columnar storage engines perform very well for analytical workloads. Analytical workloads are characterized by ad hoc queries on very large data sets by relatively few users.

MariaDB Enterprise ColumnStore automatically partitions each column into extents, which helps improve query performance without using indexes.

OLAP Workloads

MariaDB Enterprise ColumnStore enables MariaDB Enterprise Server to perform analytical or online analytical processing (OLAP) workloads.

OLAP workloads are generally characterized by ad hoc queries on very large data sets. Some other typical characteristics are:

  • Each query typically reads a subset of columns in the table

  • Most activity typically consists of read-only queries that perform aggregations, window functions, and various calculations

  • Analytical applications typically require only a few concurrent queries

  • Analytical applications typically require scalability of large, complex queries

  • Analytical applications typically require efficient bulk loads of new data

OLAP workloads are typically required for:

  • Business intelligence (BI)

  • Health informatics

  • Historical data mining

Row-based storage engines have a disadvantage for OLAP workloads. Indexes are not usually very useful for OLAP workloads, because the large size of the data set and the ad hoc nature of the queries preclude the use of indexes to optimize queries.

Columnar storage engines are much better suited for OLAP workloads. MariaDB Enterprise ColumnStore is a columnar storage engine that is designed for OLAP workloads:

  • When a query reads a subset of columns in the table, Enterprise ColumnStore is able to reduce I/O by reading those columns and ignoring all others, because each column is stored separately

  • When most activity consists of read-only queries that perform aggregations, window functions, and various calculations, Enterprise ColumnStore is able to efficiently execute those queries using extent elimination, distributed query execution, and massively parallel processing (MPP) techniques

  • When only a few concurrent queries are required, Enterprise ColumnStore is able to maximize the use of system resources by using multiple threads and multiple nodes to perform work for each query

  • When scalability of large, complex queries is required, Enterprise ColumnStore is able to achieve horizontal and vertical scalability using distributed query execution and massively parallel processing (MPP) techniques

  • When efficient bulk loads of new data is required, Enterprise ColumnStore is able to bulk load new data without affecting existing data using automatic partitioning with the extent map

OLTP Workloads

MariaDB Enterprise Server has had excellent performance for transactional or online transactional processing (OLTP) workloads since the beginning.

OLTP workloads are generally characterized by a fixed set of queries using a relatively small data set. Some other typical characteristics are:

  • Each query typically reads and/or writes many columns in the table

  • Most activity typically consists of small transactions that only read and/or write a small number of rows

  • Transactional applications typically require many concurrent transactions

  • Transactional applications typically require a fast response time and low latency

  • Transactional applications typically require ACID properties to protect data

OLTP workloads are typically required for:

  • Financial transactions performed by financial institutions and e-commerce sites

  • Store inventory changes performed by brick-and-mortar stores and e-commerce sites

  • Account metadata changes performed by many sites that store personal data

Row-based storage engines have several advantages for OLTP workloads:

  • When a query reads and/or writes many columns in the table, row-based storage engines can find all columns on a single page, so the I/O costs of the operation are low

  • When a transaction reads/writes a small number of rows, row-based storage engines can use an index to find the page for each row without a full table scan

  • When many concurrent transactions are operating, row-based storage engines can implement transactional isolation by storing multiple versions of changed rows

  • When a fast response time and low latency are required, row-based storage engines can use indexes to optimize the most common queries

  • When ACID properties are required, row-based storage engines can implement consistency and durability with fewer performance trade-offs, since each row's columns are stored contiguously

InnoDB is ES's default storage engine, and it is a highly performant row-based storage engine.

Hybrid Workloads

MariaDB Enterprise ColumnStore enables MariaDB Enterprise Server to function as a single-stack solution for Hybrid transactional-analytical processing (HTAP) workloads.

Hybrid workloads are characterized by a mix of transactional and analytical queries. Hybrid workloads are also known as "Smart Transactions", "Augmented Transactions", "Translytical", or "Hybrid Operational-Analytical Processing (HOAP)".

Hybrid workloads are typically required for applications that require real-time analytics that lead to immediate action:

  • Financial institutions use transactional queries to handle financial transactions and analytical queries to analyze the transactions for business intelligence

  • Insurance companies use transactional queries to accept/process claims and analytical queries to analyze those claims for business opportunities or risks

  • Health providers use transactional queries to track electronic health records (EHR) and analytical queries to analyze the EHRs to discover health trends or prevent adverse drug interactions

MariaDB Enterprise Server provides multiple components to perform hybrid workloads:

  • For analytical queries, the Enterprise ColumnStore storage engine can be used.

  • For transactional queries, row-based storage engines, such as InnoDB, can be used.

  • For queries that reference both analytical and transactional data, ES's cross-engine join functionality can be used to join Enterprise ColumnStore tables with InnoDB tables.

  • MariaDB MaxScale is a high-performance database proxy, which can dynamically route analytical queries to Enterprise ColumnStore and transactional queries to the transactional storage engine.
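
As a sketch of the cross-engine join mentioned above, assuming a hypothetical InnoDB table named customers and a hypothetical ColumnStore table named sales_facts that share a customer_id column:

-- Join transactional data (InnoDB) with analytical data (ColumnStore)
SELECT c.customer_name,
       SUM(s.quantity * s.price) AS lifetime_value
FROM customers AS c      -- ENGINE=InnoDB
JOIN sales_facts AS s    -- ENGINE=ColumnStore
  ON s.customer_id = c.customer_id
GROUP BY c.customer_name
ORDER BY lifetime_value DESC;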

Storage Options

MariaDB Enterprise ColumnStore supports multiple storage types:

  • S3-Compatible Object Storage: optional, but recommended. Enterprise ColumnStore can use S3-compatible object storage to store data. With multi-node Enterprise ColumnStore, the Storage Manager directory should use shared local storage for high availability.

  • Shared Local Storage: required for multi-node Enterprise ColumnStore with high availability. Enterprise ColumnStore can use shared local storage to store data and metadata. If S3-compatible storage is used for data, the shared local storage is only used for the Storage Manager directory.

  • Non-Shared Local Storage: appropriate for single-node Enterprise ColumnStore. Enterprise ColumnStore can use non-shared local storage to store data and metadata.

Deployment with S3-Compatible Storage

(Figure: Enterprise ColumnStore topology with S3-compatible storage, network diagram)

Deployment with Shared Storage

(Figure: Enterprise ColumnStore topology with shared storage, network diagram)

S3-Compatible Object Storage

MariaDB Enterprise ColumnStore supports S3-compatible object storage.

S3-compatible object storage is optional, but highly recommended. If S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use Shared Local Storage (such as NFS) for high availability.

S3-compatible object storage is:

  • Compatible: Many object storage services are compatible with the Amazon S3 API.

  • Economical: S3-compatible object storage is often very low cost.

  • Flexible: S3-compatible object storage is available for both cloud and on-premises deployments.

  • Limitless: S3-compatible object storage is often virtually limitless.

  • Resilient: S3-compatible object storage is often low maintenance and highly available, since many services use resilient cloud infrastructure.

  • Scalable: S3-compatible object storage is often highly optimized for read and write scaling.

  • Secure: S3-compatible object storage is often encrypted-at-rest.

Many S3-compatible object storage services exist. MariaDB Corporation cannot make guarantees about all S3-compatible object storage services, because different services provide different functionality.

If you have any questions about using specific S3-compatible object storage with MariaDB Enterprise ColumnStore, contact us.

S3 API

MariaDB Enterprise ColumnStore can use any object store that is compatible with the Amazon S3 API.

Many object storage services are compatible with the Amazon S3 API, and compatible object storage services are available for cloud deployments and on-premises deployments, so vendor lock-in is not a concern.

Storage Manager

MariaDB Enterprise ColumnStore's Storage Manager enables remote S3-compatible object storage to be efficiently used. The Storage Manager uses a persistent local disk cache for read/write operations, so that network latency has minimal performance impact on Enterprise ColumnStore. In some cases, it will even perform better than local disk operations.

Enterprise ColumnStore only uses the Storage Manager when S3-compatible storage is used for data.

Storage Manager is configured using storagemanager.cnf.

Storage Manager Directory

MariaDB Enterprise ColumnStore's Storage Manager directory is at the following path by default:

/var/lib/columnstore/storagemanager

To enable high availability when S3-compatible object storage is used, the Storage Manager directory should use Shared Local Storage and be mounted on every ColumnStore node.

Configure the S3 Storage Manager

When you want to use S3-compatible storage for Enterprise ColumnStore, you must configure Enterprise ColumnStore's S3 Storage Manager to use S3-compatible storage.

To configure Enterprise ColumnStore to use S3-compatible storage, edit /etc/columnstore/storagemanager.cnf:

[ObjectStorage]
…
service = S3
…
[S3]
region = your_columnstore_bucket_region
bucket = your_columnstore_bucket_name
endpoint = your_s3_endpoint
aws_access_key_id = your_s3_access_key_id
aws_secret_access_key = your_s3_secret_key
# iam_role_name = your_iam_role
# sts_region = your_sts_region
# sts_endpoint = your_sts_endpoint
# ec2_iam_mode=enabled
# port_number = your_port_number

[Cache]
cache_size = your_local_cache_size
path = your_local_cache_path

The S3-compatible object storage options are configured under [S3]:

  • The bucket option must be set to the name of the bucket.

  • The endpoint option must be set to the endpoint for the S3-compatible object storage.

  • The aws_access_key_id and aws_secret_access_key options must be set to the access key ID and secret access key for the S3-compatible object storage.

  • To use a specific IAM role, you must uncomment and set iam_role_name, sts_region, and sts_endpoint.

  • To use the IAM role assigned to an EC2 instance, you must uncomment ec2_iam_mode=enabled.

  • To use a non-default port number, you must set port_number to the desired port.

The local cache options are configured under [Cache]:

  • The cache_size option is set to 2 GB by default.

  • The path option is set to /var/lib/columnstore/storagemanager/cache by default.

Ensure that the specified path has sufficient storage space for the specified cache size.

Shared Local Storage

MariaDB Enterprise ColumnStore can use shared local storage.

Shared local storage is required for high availability. The specific Shared Local Storage requirements depend on whether Enterprise ColumnStore is configured to use S3-compatible object storage:

When S3-compatible object storage is used, Enterprise ColumnStore requires the Storage Manager directory to use shared local storage for high availability.

When S3-compatible object storage is not used, Enterprise ColumnStore requires the DB Root directories to use shared local storage for high availability.

The most common shared local storage options for on-premises and cloud deployments are:

  • NFS (Network File System)

  • GlusterFS

The most common shared local storage options for AWS (Amazon Web Services) deployments are:

  • EBS (Elastic Block Store) Multi-Attach

  • EFS (Elastic File System)

The most common shared local storage option for GCP (Google Cloud Platform) deployments is:

  • Filestore

Shared Local Storage Options

The most common options for shared local storage are:

  • EBS (Elastic Block Store) Multi-Attach: EBS is a high-performance block-storage service for AWS (Amazon Web Services). EBS Multi-Attach allows an EBS volume to be attached to multiple instances in AWS; only clustered file systems, such as GFS2, are supported. For deployments in AWS, EBS Multi-Attach is a recommended option for the Storage Manager directory, and Amazon S3 storage is the recommended option for data.

  • EFS (Elastic File System): EFS is a scalable, elastic, cloud-native NFS file system for AWS. For deployments in AWS, EFS is a recommended option for the Storage Manager directory, and Amazon S3 storage is the recommended option for data.

  • Filestore: Filestore is high-performance, fully managed storage for GCP (Google Cloud Platform). For deployments in GCP, Filestore is the recommended option for the Storage Manager directory, and Google Object Storage (S3-compatible) is the recommended option for data.

  • NFS (Network File System): NFS is a distributed file system. If NFS is used, the storage should be mounted with the sync option to ensure that each node flushes its changes immediately. For on-premises deployments, NFS is the recommended option for the Storage Manager directory, and any S3-compatible storage is the recommended option for data.

  • GlusterFS: GlusterFS is a distributed file system that supports replication and failover.

Directories Requiring Shared Local Storage for HA

Multi-node MariaDB Enterprise ColumnStore requires some directories to use shared local storage for high availability. The specific requirements depend on whether MariaDB Enterprise ColumnStore is configured to use S3-compatible object storage:

  • Using S3-compatible object storage: the Storage Manager directory must use shared local storage.

  • Not using S3-compatible object storage: the DB Root directories must use shared local storage.

Recommended Storage Options

For best results, MariaDB Corporation would recommend the following storage options:

  • AWS: Amazon S3 storage for data; EBS Multi-Attach or EFS for the Storage Manager directory

  • GCP: Google Object Storage (S3-compatible) for data; Filestore for the Storage Manager directory

  • On-premises: any S3-compatible object storage for data; NFS for the Storage Manager directory

Storage Format

MariaDB Enterprise ColumnStore's storage format is optimized for analytical queries.

DB Root Directories

MariaDB Enterprise ColumnStore stores data in DB Root directories when S3-compatible object storage is not configured.

In multi-node Enterprise ColumnStore, each node has its own DB Root directory.

The DB Root directories are at the following path by default:

  • /var/lib/columnstore/dataN

The N in dataN represents a range of integers that starts at 1 and stops at the number of nodes in the deployment. For example, with a 3-node Enterprise ColumnStore deployment, this would refer to the following directories:

  • /var/lib/columnstore/data1

  • /var/lib/columnstore/data2

  • /var/lib/columnstore/data3

To enable high availability for the DB Root directories, each directory should be mounted on every ColumnStore node using Shared Local Storage.

Extents

(Figure: physical data organization, column extents)

Each column in a table is stored in units called extents.

By default, each extent contains the column values for 8 million rows. The physical size of each extent can range from 8 MB to 64 MB. When an extent reaches the maximum number of column values, Enterprise ColumnStore creates a new extent.

Each extent is stored in 8 KB blocks, and each block has a logical block identifier (LBID).

If a string column is longer than 8 characters, the value is stored in a separate dictionary file, and a pointer to the value is stored in the extent.

Segment Files

(Figure: segment files)

A segment file is used to store Enterprise ColumnStore data within a DB Root directory.

By default, a segment file contains two extents. When a segment file reaches its maximum size, Enterprise ColumnStore creates a new segment file.

The relevant configuration options are:

  • ExtentsPerSegmentFile: configures the maximum number of extents that can be stored in each segment file. The default value is 2.

For example, to configure Enterprise ColumnStore to store more extents in each segment file using the mcsSetConfig utility:

$ mcsSetConfig ExtentMap ExtentsPerSegmentFile 4

Column Partitions

(Figure: column partitions)

Enterprise ColumnStore automatically groups a column's segment files into column partitions.

On disk, each column partition is represented by a directory in the DB Root. The directory contains the segment files for the column partition.

By default, a column partition can contain four segment files, but you can configure Enterprise ColumnStore to store more segment files in each column partition. When a column partition reaches the maximum number of segment files, Enterprise ColumnStore creates a new column partition.

The relevant configuration options are:

  • FilesPerColumnPartition: configures the maximum number of segment files that can be stored in each column partition. The default value is 4.

For example, to configure Enterprise ColumnStore to store more segment files in each column partition using the mcsSetConfig utility:

$ mcsSetConfig ExtentMap FilesPerColumnPartition 8

Extent Map

(Figure: data organization and the Extent Map)

Enterprise ColumnStore maintains an Extent Map to determine which values are located in each extent.

The Extent Map identifies each extent using its logical block identifier (LBID) values, and it maintains the minimum and maximum values within each extent.

The Extent Map is used to implement a performance optimization called Extent Elimination.

The primary node has a master copy of the Extent Map. When Enterprise ColumnStore is started, the primary node copies the Extent Map to the replica nodes.

While Enterprise ColumnStore is running, each node maintains a copy of the Extent Map in its main memory, so that it can be accessed quickly without additional I/O.

If the Extent Map gets corrupted, the mcsRebuildEM utility can rebuild the Extent Map from the contents of the database file system. The mcsRebuildEM utility is available starting in MariaDB Enterprise ColumnStore 6.2.2.

Compression

Enterprise ColumnStore automatically compresses all data on disk using either snappy or LZ4 compression. See the columnstore_compression_type system variable for how to select the desired compression type.

Since Enterprise ColumnStore stores a single column's data in each segment file, the data in each segment file tends to be very similar. The similar data usually allows for excellent compressibility. However, the specific data compression ratio will depend on a lot of factors, such as the randomness of the data and the number of distinct values.

Enterprise ColumnStore's compression strategy is tuned to optimize the performance of I/O-bound queries, because the decompression rate is optimized to maximize read performance.
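
For example, the compression type used for newly created ColumnStore tables can be selected through that system variable. This is a minimal sketch that assumes the numeric codes used by Enterprise ColumnStore (2 for Snappy, 3 for LZ4); check the columnstore_compression_type documentation for your release:

-- Use LZ4 compression for segment files of tables created in this session
SET SESSION columnstore_compression_type = 3;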

Version Buffer

Enterprise ColumnStore uses the version buffer to store blocks that are being modified.

The version buffer is used for multiple tasks:

  • It is used to rollback a transaction.

  • It is used for multi-version concurrency control (MVCC). With MVCC, Enterprise ColumnStore can implement read snapshots, which allows a statement to have a consistent view of the database, even if some of the underlying rows have changed. The snapshot for a given statement is identified by the system change number (SCN).

The version buffer is split between data structures that are in-memory and on-disk.

The in-memory data structures are hash tables that keep track of in-flight transactions. The hash tables store the LBIDs for each block that is being modified by a transaction. The in-memory hash tables start at 4 MB and grow as needed; their size increases as the number of modified blocks increases.

An on-disk version buffer file is stored in each DB Root. By default, the on-disk version buffer file is 1 GB, but you can configure Enterprise ColumnStore to use a different file size. The relevant configuration options are:

| Option | Description |
|---|---|
| VersionBufferFileSize | Configures the size of the on-disk version buffer file in each DB Root. Default value is 1 GB. |

For example, to configure Enterprise ColumnStore to use a larger on-disk version buffer file using the mcsSetConfig utility:

$ mcsSetConfig VersionBuffer VersionBufferFileSize 2GB

Extent Elimination

Using the Extent Map, ColumnStore can perform logical range partitioning and retrieve only the blocks needed to satisfy a query. This is done through Extent Elimination: extents that cannot satisfy the query's join and filter conditions are skipped, which reduces overall I/O.


In Extent Elimination, ColumnStore scans the columns referenced in join and filter conditions. It then uses the logical horizontal partitioning information of each extent, along with the minimum and maximum values stored for the column, to eliminate extents. When a column scan involves a filter, that filter is compared to the minimum and maximum values stored in each extent for the column. If the filter value is outside the extent's minimum and maximum value range, ColumnStore eliminates the extent.

This behavior is automatic and well suited for series, ordered, patterned and time-based data, where the data is loaded frequently and often referenced by time. Any column with clustered values is a good candidate for Extent Elimination.
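For instance, assuming a hypothetical orders table whose orderdate column was loaded in roughly chronological order, a range filter like the one below lets ColumnStore skip every extent whose stored minimum and maximum dates fall outside the requested month:

SELECT COUNT(*)
FROM orders
WHERE orderdate BETWEEN '2016-01-01' AND '2016-01-31';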

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore System Databases

When using ColumnStore, MariaDB Server creates a series of system databases used for operational purposes.

| Database | Description |
|---|---|
| calpontsys | Maintains table metadata about ColumnStore tables. |
| infinidb_querystats | Maintains information about query performance. For more information, see Query Analysis. |
| columnstore_info | Contains stored procedures used to retrieve information about ColumnStore usage. For more information, see the ColumnStore Information Schema tables. |

This page is licensed: CC BY-SA / Gnu FDL

MariaDB Enterprise ColumnStore Locking

Overview

MariaDB Enterprise ColumnStore minimizes locking for analytical workloads, bulk data loads, and online schema changes.

Lockless Reads

MariaDB Enterprise ColumnStore supports lockless reads.

Locking for Writes

MariaDB Enterprise ColumnStore requires a table lock for write operations.

Locking for Data Loading

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.
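For example, assuming the METADATA_LOCK_INFO plugin has been installed on the node, the metadata locks held during a bulk load can be inspected with a query along these lines:

-- One-time setup (requires sufficient privileges)
INSTALL SONAME 'metadata_lock_info';

-- While cpimport is running, list current metadata locks
SELECT * FROM information_schema.METADATA_LOCK_INFO;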

For additional information, see "MariaDB Enterprise ColumnStore Data Loading".

Online Schema Changes

MariaDB Enterprise ColumnStore supports online schema changes, so that supported DDL operations can be performed without blocking reads. The supported DDL operations only require a write metadata lock (MDL) on the target table.
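As a minimal sketch, assuming a hypothetical ColumnStore table named analytics_test, a supported online DDL operation such as adding a column takes only a write metadata lock, so concurrent reads continue:

ALTER TABLE analytics_test ADD COLUMN region VARCHAR(20);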

This page is: Copyright © 2025 MariaDB. All rights reserved.

MariaDB Enterprise ColumnStore Query Evaluation

Overview

MariaDB Enterprise ColumnStore is a smart storage engine designed to efficiently execute analytical queries using distributed query execution and massively parallel processing (MPP) techniques.

Scalability

MariaDB Enterprise ColumnStore is designed to achieve vertical and horizontal scalability for production analytics using distributed query execution and massively parallel processing (MPP) techniques.

Enterprise ColumnStore evaluates each query as a sequence of job steps using sophisticated techniques to get the best performance for complex analytical queries. Some types of job steps are designed to scale with the system's resources. As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute those types of job steps.

Enterprise ColumnStore stores each column on disk in extents. The storage format is designed to maintain scalability, even as the table grows. If an operation does not read parts of a large table, I/O costs are reduced. Enterprise ColumnStore uses a technique called extent elimination that compares the maximum and minimum values in the extent map to the query's conditions, and it avoids scanning extents that don't satisfy the conditions.

Enterprise ColumnStore provides exceptional scalability for analytical queries. Enterprise ColumnStore's design supports targeted scale-out to address increased workload requirements, whether it is a larger query load or increased storage and query processing capacity.

Horizontal Scalability

MariaDB Enterprise ColumnStore provides horizontal scalability by executing some types of job steps in a distributed manner using multiple nodes.

When Enterprise ColumnStore is evaluating a job step, the ExeMgr process or facility on the initiator/aggregator node requests the PrimProc process on each node to perform the job step on different extents in parallel. As more nodes are added, Enterprise ColumnStore can perform more work in parallel.

Enterprise ColumnStore also uses massively parallel processing (MPP) techniques to speed up some types of job steps. For some types of aggregation operations, each node can perform an initial local aggregation, and then the initiator/aggregator node only needs to combine the local results and perform a final aggregation. This technique can be very efficient for some types of aggregation operations, such as for queries that use the AVG(), COUNT(), or SUM() aggregate functions.

Vertical Scalability

MariaDB Enterprise ColumnStore provides vertical scalability by executing some types of job steps in a multi-threaded manner using a thread pool.

When the PrimProc process on a node receives work, it executes the job step on an extent in a multi-threaded manner using a thread pool. Each thread operates on a different block within the extent. As more CPUs are added, Enterprise ColumnStore can work on more blocks in parallel.

Extent Elimination


MariaDB Enterprise ColumnStore uses extent elimination to scale query evaluation as table size increases.

Most databases are row-based databases that use manually-created indexes to achieve high performance on large tables. This works well for transactional workloads. However, analytical queries tend to have very low selectivity, so traditional indexes are not typically effective for analytical queries.

Enterprise ColumnStore uses extent elimination to achieve high performance, without requiring manually created indexes. Enterprise ColumnStore automatically partitions all data into extents. Enterprise ColumnStore stores the minimum and maximum values for each extent in the extent map. Enterprise ColumnStore uses the minimum and maximum values in the extent map to perform extent elimination.

When Enterprise ColumnStore performs extent elimination, it compares the query's join conditions and filter conditions (i.e., WHERE clause) to the minimum and maximum values for each extent in the extent map. If the extent's minimum and maximum values fall outside the bounds of the query's conditions, Enterprise ColumnStore skips that extent for the query.

Extent elimination is automatically performed for every query. It can significantly decrease I/O for columns with clustered values. For example, extent elimination works effectively for series, ordered, patterned, and time-based data.

Custom Select Handler

The ColumnStore storage engine plugin implements a custom select handler to fully take advantage of Enterprise ColumnStore's capabilities.

All storage engines interact with ES using an internal handler API, which is highly extensible. Storage engines can implement different features by implementing different methods within the handler API.

For select statements, the handler API transforms each query into a SELECT_LEX object, which is provided to the select handler.

The generic select handler is not optimal for Enterprise ColumnStore, because:

  • Enterprise ColumnStore selects data by column, but the generic select handler selects data by row

  • Enterprise ColumnStore supports parallel query evaluation, but the generic select handler does not

  • Enterprise ColumnStore supports distributed aggregations, but the generic select handler does not

  • Enterprise ColumnStore supports distributed functions, but the generic select handler does not

  • Enterprise ColumnStore supports extent elimination, but the generic select handler does not

  • Enterprise ColumnStore has its own query planner, but the generic select handler cannot use it

Smart Storage Engine

A storage engine that implements a custom select handler is known as a smart storage engine. The ColumnStore storage engine plugin is a smart storage engine because it implements a custom select handler. MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using this plugin, which enables ES to interact with ColumnStore tables.

As a smart storage engine, the ColumnStore storage engine plugin tightly integrates Enterprise ColumnStore with ES, but it has enough independence to efficiently execute analytical queries using a completely unique approach.

Configure the Select Handler

The ColumnStore storage engine can use either the custom select handler or the generic select handler. The select handler can be configured using the columnstore_select_handler system variable:

| Value | Description |
|---|---|
| AUTO | When set to AUTO, Enterprise ColumnStore automatically chooses the best select handler for a given SELECT query. AUTO was added in Enterprise ColumnStore 6. |
| OFF | When set to OFF, Enterprise ColumnStore uses the generic select handler for all SELECT queries. This value is not recommended unless advised by MariaDB Support. |
| ON | When set to ON, Enterprise ColumnStore uses the custom select handler for all SELECT queries. ON is the default in Enterprise ColumnStore 5 and Enterprise ColumnStore 6. |
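For example, to check the current setting and switch the current session back to the automatic behavior (a sketch assuming Enterprise ColumnStore 6 or later, where AUTO is available, and that the variable is settable at session scope):

SHOW SESSION VARIABLES LIKE 'columnstore_select_handler';

SET SESSION columnstore_select_handler = AUTO;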

Joins

MariaDB Enterprise ColumnStore performs join operations using hash joins.

By default, hash joins are performed in memory.

Configure In-Memory Joins

MariaDB Enterprise ColumnStore can be configured to allocate more memory for hash joins.

The relevant configuration options are:

| Section | Option | Description |
|---|---|---|
| HashJoin | PmMaxMemorySmallSide | Configures the amount of memory available for a single join. Valid values are from 0 to 4 GB. Default value is 1 GB. |
| HashJoin | TotalUmMemory | Configures the amount of memory available for all joins. Values can be specified as a percentage of total system memory or as a specific amount of memory. Valid percentage values are from 0 to 100%. Default value is 25%. |

For example, to configure Enterprise ColumnStore to use more memory for hash joins using the mcsSetConfig utility:

$ mcsSetConfig HashJoin PmMaxMemorySmallSide 2G
$ mcsSetConfig HashJoin TotalUmMemory '40%'

Configure Disk-Based Joins

MariaDB Enterprise ColumnStore can be configured to perform disk-based joins.

The relevant configuration options are:

| Section | Option | Description |
|---|---|---|
| HashJoin | AllowDiskBasedJoin | Enables disk-based joins. Valid values are Y and N. Default value is N. |
| HashJoin | TempFileCompression | Enables compression for temporary files used by disk-based joins. Valid values are Y and N. Default value is N. |
| SystemConfig | SystemTempFileDir | Configures the directory used for temporary files used by disk-based joins and aggregations. Default value is /tmp/columnstore_tmp_files. |

For example, to configure Enterprise ColumnStore to perform disk-based joins using the mcsSetConfig utility:

$ mcsSetConfig HashJoin AllowDiskBasedJoin Y
$ mcsSetConfig HashJoin TempFileCompression Y
$ mcsSetConfig SystemConfig SystemTempFileDir /mariadb/tmp

Aggregations

MariaDB Enterprise ColumnStore performs aggregation operations on all nodes in a distributed manner, and then all nodes send their results to a single node, which combines the results and performs the final aggregation.

By default, aggregation operations are performed in memory.

Configure Disk-Based Aggregations

In Enterprise ColumnStore 5.6.1 and later, disk-based aggregations can be configured.

The relevant configuration options are:

| Section | Option | Description |
|---|---|---|
| RowAggregation | AllowDiskBasedAggregation | Enables disk-based aggregations. Valid values are Y and N. Default value is N. |
| RowAggregation | Compression | Enables compression for temporary files used by disk-based aggregations. Valid values are Y and N. Default value is N. |
| SystemConfig | SystemTempFileDir | Configures the directory used for temporary files used by disk-based joins and aggregations. Default value is /tmp/columnstore_tmp_files. |

For example, to configure Enterprise ColumnStore to perform disk-based aggregations using the mcsSetConfig utility:

$ mcsSetConfig RowAggregation AllowDiskBasedAggregation Y
$ mcsSetConfig RowAggregation Compression SNAPPY
$ mcsSetConfig SystemConfig SystemTempFileDir /mariadb/tmp

Query Planning

The ColumnStore storage engine plugin is a smart storage engine, so MariaDB Enterprise ColumnStore plans its own queries using the custom select handler.

MariaDB Enterprise ColumnStore's query planning is divided into two steps:

  • ES provides the query's SELECT_LEX object to the custom select handler. The custom select handler builds a ColumnStore Execution Plan (CSEP).

  • The custom select handler provides the CSEP to the ExeMgr process or facility on the same node. ExeMgr performs extent elimination and creates a job list.

ExeMgr Process/Facility

The ColumnStore storage engine provides the CSEP to the ExeMgr process or facility on the same node, which will act as the initiator/aggregator node for the query.

Starting with MariaDB Enterprise ColumnStore 22.08, the ExeMgr facility has been integrated into the PrimProc process, so it is no longer a separate process.

ExeMgr performs multiple tasks:

  • Performs extent elimination.

  • Views the optimizer statistics.

  • Transforms the CSEP to a job list, which consists of job steps.

  • Assigns distributed job steps to the PrimProc process on each node.

  • Evaluates non-distributed job steps itself.

  • Provides final query results to ES.

Query Evaluation Process


When Enterprise ColumnStore executes a query, it goes through the following process:

  1. The client or application sends the query to MariaDB MaxScale's listener port.

  2. The query is processed by the Read/Write Split Router (readwritesplit) service associated with the listener.

  3. The service routes the query to the ES TCP port on a ColumnStore node.

  4. MariaDB Enterprise Server (ES) evaluates the query using the handler interface.

  • The handler interface builds a SELECT_LEX object to represent the query.

  • The handler interface provides the SELECT_LEX object to the ColumnStore storage engine's select handler.

  • The select handler transforms the SELECT_LEX object into a ColumnStore Execution Plan (CSEP).

  • The select handler provides the CSEP to the ExeMgr facility on the same node, which will act as the initiator/aggregator node for the query.

  5. ExeMgr transforms the CSEP into a job list, which consists of job steps.

  6. ExeMgr evaluates each job step sequentially.

  • If it is a non-distributed job step, ExeMgr evaluates the job step itself.

  • If it is a distributed job step, ExeMgr provides the job step to the PrimProc process on each node. The PrimProc process on each node evaluates the job step in a multi-threaded manner using a thread pool. After the PrimProc process on each node evaluates its job step, the results are returned to ExeMgr on the initiator/aggregator node as a Row Group.

  7. After all job steps are evaluated, ExeMgr returns the results to ES.

  8. ES returns the results to MaxScale.

  9. MaxScale returns the results to the client or application.

This page is: Copyright © 2025 MariaDB. All rights reserved.

ColumnStore Storage Engine

Overview

MariaDB Enterprise ColumnStore integrates with MariaDB Enterprise Server using the ColumnStore storage engine plugin. The ColumnStore storage engine plugin enables MariaDB Enterprise Server to interact with ColumnStore tables.

For deployment instructions and available documentation, see "MariaDB Enterprise ColumnStore".

Feature Summary

The ColumnStore storage engine has the following features:

| Feature | Detail | Resources |
|---|---|---|
| Storage Engine | ColumnStore | |
| Availability | ES 10.5+, CS 10.5+ | MariaDB Enterprise Server |
| Workload Optimization | OLAP and Hybrid | OLAP Workloads, Hybrid Workloads |
| Table Orientation | Columnar | Columnar Storage Engine |
| ACID-compliant | Yes | |
| Indexes | Unnecessary | Extent Elimination |
| Compression | Yes | Compression |
| High Availability (HA) | Yes | High Availability and Failover |
| Main Memory Caching | Yes | |
| Transaction Logging | Yes | Version Buffer |
| Garbage Collection | Yes | Version Buffer |
| Online Schema Changes | Yes | Online Schema Changes |
| Non-locking Reads | Yes | Lockless Reads |

Examples

Creating a ColumnStore Table

To create a ColumnStore table, use the CREATE TABLE statement with the ENGINE=ColumnStore option:

CREATE DATABASE columnstore_db;

CREATE TABLE columnstore_db.analytics_test (
   id INT,
   str VARCHAR(50)
) ENGINE = ColumnStore;
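Once the table exists, data is typically bulk loaded with cpimport. A sketch, assuming a delimited file named analytics_test.csv whose columns match the table definition:

$ cpimport columnstore_db analytics_test analytics_test.csv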

Multi-Node Configuration

For a multi-node Enterprise ColumnStore deployment, a server configuration similar to the following is required:

[mariadb]
log_error                              = mariadbd.err
character_set_server                   = utf8
collation_server                       = utf8_general_ci
log_bin                                = mariadb-bin
log_bin_index                          = mariadb-bin.index
relay_log                              = mariadb-relay
relay_log_index                        = mariadb-relay.index
log_slave_updates                      = ON
gtid_strict_mode                       = ON

# This must be unique on each cluster node
server_id                              = 1

Configure the Mandatory Utility User Account

To configure the mandatory utility user account, use the mcsSetConfig command:

$ sudo mcsSetConfig CrossEngineSupport Host 127.0.0.1
$ sudo mcsSetConfig CrossEngineSupport Port 3306
$ sudo mcsSetConfig CrossEngineSupport User cross_engine
$ sudo mcsSetConfig CrossEngineSupport Password cross_engine_passwd
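The utility user must also exist in MariaDB Server with privileges that allow cross-engine queries. A hedged sketch (adjust the host, user name, password, and grants to your environment; they must match the mcsSetConfig values above):

CREATE USER 'cross_engine'@'127.0.0.1' IDENTIFIED BY 'cross_engine_passwd';

GRANT SELECT, PROCESS ON *.* TO 'cross_engine'@'127.0.0.1';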

This page is: Copyright © 2025 MariaDB. All rights reserved.

Managing ColumnStore

Managing MariaDB ColumnStore involves setup, configuration, and tools like mcsadmin and cpimport for efficient analytics.

ColumnStore Minimum Hardware Specification

The following table outlines the minimum recommended production server specifications, which apply to both on-premises and cloud deployments:

Per Server

| Item | Development Environment | Production Environment |
|---|---|---|
| Physical Server | 8 Core CPU, 32 GB Memory | 64 Core CPU, 128 GB Memory |
| Storage | Local disk | StorageManager (S3) |
| Network Interconnect | In a multi-server deployment, data is passed between nodes via TCP/IP networking; at least a 1 Gb network is recommended (both environments). | |

Details

These are minimum recommendations and in general the system will perform better with more hardware:

  • More CPU cores and servers will improve query processing response time.

  • More memory will allow the system to cache more data blocks in memory. We have users running systems with anywhere from 64 GB to 512 GB of RAM for UM and 32 to 64 GB of RAM for PM.

  • Faster network will allow data to flow faster between UM and PM nodes.

  • SSDs may be used; however, the system is optimized for block streaming, which may perform well enough with HDDs at lower cost.

  • Where it is an option, it is recommended to use bare metal servers for additional performance since ColumnStore will fully consume CPU cores and memory.

  • In general it makes more sense to use a higher core count / higher memory server for single server or 2 server combined deployments.

  • In a deployment with multiple UM nodes the system will round robin requests from the mysqld handling the query to any ExeMgr in the cluster for load balancing. A higher bandwidth network such as 10g or 40g will be of benefit for large result set queries.

AWS instance sizes

For AWS, our own internal testing generally uses m4.4xlarge instance types as a cost-effective middle ground. The r4.8xlarge has also been tested and performs about twice as fast for about twice the price.

This page is licensed: CC BY-SA / Gnu FDL

Managing ColumnStore Database Environment

Managing MariaDB ColumnStore means deploying its architecture, scaling modules, and maintaining performance through monitoring, optimization, and backups.

Release notes and other documentation for ColumnStore is also available in the Enterprise docs section of the MariaDB website. For example:

  • ColumnStore 23.10 Release Notes

  • ColumnStore 23.02 Release Notes

  • ColumnStore 22.08 Release Notes

  • ColumnStore 6 Release Notes

  • ColumnStore 5.6 Release Notes

  • Deployment Instructions

ColumnStore Partition Management

  1. Introduction

  2. Managing partitions by partition number

  3. Displaying partition information

  4. Disabling partitions

  5. Enabling partitions

  6. Dropping partitions

  7. Managing partitions by column value

  8. Displaying partition information

  9. Disabling partitions

  10. Enabling partitions

  11. Dropping partitions

  12. Dropping data not wholly within one partition

Introduction

MariaDB ColumnStore automatically creates logical horizontal partitions across every column. For ordered or semi-ordered data fields such as an order date this will result in a highly effective partitioning scheme based on that column. This allows for increased performance of queries filtering on that column since partition elimination can be performed. It also allows for data lifecycle management as data can be disabled or dropped by partition cheaply. Caution should be used when disabling or dropping partitions as these commands are destructive.

It is important to understand that a Partition in ColumnStore terms is actually 2 extents (16 million rows) and that extents & partitions are created according to the following algorithm in 1.0.x:

  1. Create 4 extents in 4 files

  2. When these are filled up (after 32M rows), create 4 more extents in the 4 files created in step 1.

  3. When these are filled up (after 64M rows), create a new partition.

Managing partitions by partition number

Displaying partition information

Information about all partitions for a given column can be retrieved using the calShowPartitions stored procedure, which takes two or three parameters: [database_name], table_name, and column_name. The database name is optional; if only two parameters are provided, the current database is assumed. For example:

select calShowPartitions('orders','orderdate');
+-----------------------------------------+
| calShowPartitions('orders','orderdate') |
+-----------------------------------------+
| Part# Min        Max        Status
  0.0.1 1992-01-01 1998-08-02 Enabled
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+-----------------------------------------+

1 row in set (0.05 sec)

Disabling partitions

The calDisablePartitions stored procedure allows for disabling of one or more partitions. A disabled partition still exists on the file system (and can be enabled again at a later time) but will not participate in any query, DML, or import activity. The procedure takes two or three parameters: [database_name], table_name, and a comma-separated list of partition numbers. If only two parameters are provided, the current database is assumed. For example:

select calDisablePartitions('orders','0.0.1');
+----------------------------------------+
| calDisablePartitions('orders','0.0.1') |
+----------------------------------------+
| Partitions are disabled successfully.  |
+----------------------------------------+
1 row in set (0.28 sec)

The result showing the first partition has been disabled:

select calShowPartitions('orders','orderdate');
+-----------------------------------------+
| calShowPartitions('orders','orderdate') |
+-----------------------------------------+
| Part# Min        Max        Status
  0.0.1 1992-01-01 1998-08-02 Disabled
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+-----------------------------------------+
1 row in set (0.05 sec)

Enabling partitions

The calEnablePartitions stored procedure allows for enabling of one or more partitions. The procedure takes the same set of parameters as calDisablePartitions. For example:

select calEnablePartitions('orders', '0.0.1');
+----------------------------------------+
| calEnablePartitions('orders', '0.0.1') |
+----------------------------------------+
| Partitions are enabled successfully.   |
+----------------------------------------+
1 row in set (0.28 sec)

The result showing the first partition has been enabled:

select calShowPartitions('orders','orderdate');
+-----------------------------------------+
| calShowPartitions('orders','orderdate') |
+-----------------------------------------+
| Part# Min        Max        Status
  0.0.1 1992-01-01 1998-08-02 Enabled
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+-----------------------------------------+
1 row in set (0.05 sec)

Dropping partitions

The calDropPartitions stored procedure allows for dropping of one or more partitions. Dropping means that the underlying storage is deleted and the partition is completely removed. A partition can be dropped from either enabled or disabled state. The procedure takes the same set of parameters as calDisablePartitions. Extra caution should be used with this procedure since it is destructive and cannot be reversed. For example:

select calDropPartitions('orders', '0.0.1');
+--------------------------------------+
| calDropPartitions('orders', '0.0.1') |
+--------------------------------------+
| Partitions are enabled successfully  |
+--------------------------------------+
1 row in set (0.28 sec)

The result showing the first partition has been dropped:

select calShowPartitions('orders','orderdate');
+-----------------------------------------+
| calShowPartitions('orders','orderdate') |
+-----------------------------------------+
| Part# Min        Max        Status
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+-----------------------------------------+
1 row in set (0.05 sec)

Managing partitions by column value

Displaying partition information

Information about a range of partitions for a given column can be retrieved using the calShowPartitionsByValue stored procedure. This procedure takes four or five parameters: [database_name], table_name, column_name, start_value, and end_value. If only four parameters are provided, the current database is assumed. Only casual partition column types (INTEGER, DECIMAL, DATE, DATETIME, CHAR up to 8 bytes, and VARCHAR up to 7 bytes) are supported for this function.

The function returns a list of partitions whose minimum and maximum values for the column 'col_name' fall completely within the range of 'start_value' and 'end_value'. For example:

select calShowPartitionsByValue('orders','orderdate', '1992-01-01', '2010-07-24');
+----------------------------------------------------------------------------+
| calShowPartitionsbyvalue('orders','orderdate', '1992-01-02', '2010-07-24') |
+----------------------------------------------------------------------------+
| Part# Min        Max        Status
  0.0.1 1992-01-01 1998-08-02 Enabled
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+----------------------------------------------------------------------------+
1 row in set (0.05 sec)

Disabling partitions

The calDisablePartitionsByValue stored procedure allows for disabling of one or more partitions by value. A disabled partition still exists on the file system (and can be enabled again at a later time) but will not participate in any query, DML or import activity. The procedure takes the same set of arguments as calShowPartitionsByValue. A good practice is to use calShowPartitionsByValue to identify the partitions to be disabled and then the same argument values used to construct the calDisablePartitionsByValue call. For example:

select calDisablePartitionsByValue('orders','orderdate', '1992-01-01', '1998-08-02');
+---------------------------------------------------------------------------------+
| caldisablepartitionsbyvalue ('orders', 'o_orderdate','1992-01-01','1998-08-02') |
+---------------------------------------------------------------------------------+
| Partitions are disabled successfully                                            |
+---------------------------------------------------------------------------------+
1 row in set (0.28 sec)

The result showing the first partition has been disabled:

select calShowPartitionsByValue('orders','orderdate', '1992-01-01', '2010-07-24');
+----------------------------------------------------------------------------+
| calShowPartitionsbyvalue('orders','orderdate', '1992-01-02','2010-07-24' ) |
+----------------------------------------------------------------------------+
| Part# Min        Max        Status
  0.0.1 1992-01-01 1998-08-02 Disabled
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+----------------------------------------------------------------------------+
1 row in set (0.05 sec)

Enabling partitions

The calEnablePartitionsByValue stored procedure allows for enabling of one or more partitions by value. The procedure takes the same set of arguments as calShowPartitionsByValue. A good practice is to use calShowPartitionsByValue to identify the partitions to be enabled and then use the same argument values to construct the calEnablePartitionsByValue call. For example:

select calEnablePartitionsByValue('orders','orderdate', '1992-01-01', '1998-08-02');
+--------------------------------------------------------------------------------+
| calenablepartitionsbyvalue ('orders', 'o_orderdate','1992-01-01','1998-08-02') |
+--------------------------------------------------------------------------------+
| Partitions are enabled successfully                                            |
+--------------------------------------------------------------------------------+
1 row in set (0.28 sec)

The result showing the first partition has been enabled:

select calShowPartitionsByValue('orders','orderdate', '1992-01-01', '2010-07-24');
+----------------------------------------------------------------------------+
| calShowPartitionsbyvalue('orders','orderdate', '1992-01-02','2010-07-24' ) |
+----------------------------------------------------------------------------+
| Part# Min        Max        Status
  0.0.1 1992-01-01 1998-08-02 Enabled
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+----------------------------------------------------------------------------+
1 row in set (0.05 sec)

Dropping partitions

The calDropPartitionsByValue stored procedure allows for dropping of one or more partitions by value. Dropping means that the underlying storage is deleted and the partition is completely removed. A partition can be dropped from either enabled or disabled state. The procedure takes the same set of arguments as calShowPartitionsByValue. A good practice is to use calShowPartitionsByValue to identify the partitions to be dropped and then use the same argument values to construct the calDropPartitionsByValue call. Extra caution should be used with this procedure since it is destructive and cannot be reversed. For example:

select calDropPartitionsByValue('orders','orderdate', '1992-01-01', '1998-08-02');
+------------------------------------------------------------------------------+
| caldroppartitionsbyvalue ('orders', 'o_orderdate','1992-01-01','1998-08-02') |
+------------------------------------------------------------------------------+
| Partitions are enabled successfully.                                         |
+------------------------------------------------------------------------------+
1 row in set (0.28 sec)

The result showing the first partition has been dropped:

select calShowPartitionsByValue('orders','orderdate', '1992-01-01', '2010-07-24');
+----------------------------------------------------------------------------+
| calShowPartitionsbyvalue('orders','orderdate', '1992-01-02','2010-07-24' ) |
+----------------------------------------------------------------------------+
| Part# Min        Max        Status
  0.1.2 1998-08-03 2004-05-15 Enabled
  0.2.3 2004-05-16 2010-07-24 Enabled |
+----------------------------------------------------------------------------+
1 row in set (0.05 sec)

Dropping data not wholly within one partition

Since the partitioning scheme is system maintained, the min and max values are not directly specified but are influenced by the order of data loading. If the goal is to drop a specific date range, additional deletes are required to achieve this. The following cases may occur:

  • For semi-ordered data, there may be overlap between min and max values between partitions.

  • As in the example above, the partition ranged from 1992-01-01 to 1998-08-02. Potentially it may be desirable to drop the remaining 1998 rows.

A bulk delete statement can be used to delete the remaining rows that do not fall exactly within partition ranges. Partition drops are fastest; however, the system optimizes bulk delete statements to delete by block internally, so they are still relatively fast.

delete from orders where orderdate <= '1998-12-31';

This page is licensed: CC BY-SA / Gnu FDL

Managing ColumnStore System

ColumnStore Audit Plugin

  1. Introduction

  2. Installation

  3. Enabling the audit plugin

Introduction

MariaDB Server includes an optional Audit Plugin that enables logging and tracking of all user access and statements. This is included with ColumnStore and can be enabled for it.

Installation

To enable the audit plugin for the currently running instance (but not across restarts), run the following as mcsmysql with the default root account:

INSTALL PLUGIN server_audit 
SONAME 'server_audit.so';

To have this persist across restarts edit the ColumnStore my.cnf file (example shown for root install):

$ vi /usr/local/mariadb/columnstore/mysql/my.cnf
[mysqld]
... 
plugin_load=server_audit=server_audit.so

For more details see the audit plugin installation guide

Enabling the audit plugin

To enable audit logging the following global variable must be set to ON:

SET GLOBAL server_audit_logging=ON;

To ensure this persists across restarts edit the ColumnStore my.cnf file (example shown for root install):

$ vi /usr/local/mariadb/columnstore/mysql/my.cnf
[server]
... 
server_audit_logging=ON

This will enable logging to the file /usr/local/mariadb/columnstore/mysql/db/server_audit.log. For example:

20170914 17:31:24,centos,root,localhost,11,114,QUERY,loans,'SELECT grade, AVG(loan_amnt) avg, FROM loanstats GROUP BY grade ORDER BY grade',0

To have the log entries written to syslog the global variable server_audit_output_type should be changed from 'file' to 'syslog'. In this case the 'syslog_info' entry contains the ColumnStore server instance name, for example:

Sep 14 17:46:51 centos mysql-server_auditing: columnstore-1 centos,root,localhost,11,117,QUERY,loans,'SELECT grade, AVG(loan_amnt) avg,FROM loanstats GROUP BY grade ORDER BY grade',0
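To make that switch at runtime, the variable can be set dynamically; persist it across restarts via the my.cnf approach shown above:

SET GLOBAL server_audit_output_type='syslog';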

For additional configuration and customization options see the Audit Plugin documentation.

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Configuration File Update and Distribution

In the case where an entry in the MariaDB ColumnStore's configuration needs to be updated and distributed, this can be done from the command line from Performance Module #1. All changes made to MariaDB ColumnStore's configuration file need to be applied on PM1.

NOTE: 'mcsadmin distributeconfigfile' only needs to be run if the system is Active. If the system is down, then just make the change and when the system is started, the update will get distributed.

Here is an example:

/usr/local/mariadb/columnstore/bin/configxml.sh setconfig SystemConfig SystemName mcs-1
mcsadmin distributeconfigfile

In cases where MariaDB ColumnStore's configuration file gets out of sync with the PM1 copy, run the following to redistribute the configuration file to the other nodes.

To do this run the following on PM1:

mcsadmin distributeconfigfile

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Redistribute Data

Introduction

When new PM nodes are added to a running instance it may be desirable to redistribute the data in current nodes across all of the nodes. This is not strictly required as ongoing data ingestion will prioritize the new empty nodes for data loading to rebalance the system.

An important point is that the operation works at Partition granularity, so a minimal data set is 64M rows in a table for this to run.

Usage

A command redistributeData is available in the admin console to initiate a data distribution:

mcsadmin> redistributeData start
redistributedata   Tue Dec 13 04:42:31 2016
redistributeData START
Source dbroots: 1 2
Destination dbroots: 1 2

WriteEngineServer returned status 1: Cleared.
WriteEngineServer returned status 2: Redistribute is started.

The command has 3 possible options:

  • Start : start a new redistribution to redistribute data equally amongst the current set of DBRoots in the system.

  • Stop : abort a redistribution leaving the system in a usable state.

  • Status : return status information on an active redistribution.

The start command can take an option of Remove. The Start Remove option is used to remove all data from the enumerated list of dbroots and redistribute the data to the remaining dbroots. This should be done before taking a dbroot out of service in order to preserve the data on that dbroot. The dbroot list is a space delimited list of integers representing the dbroots to be emptied.

As mentioned above, the operation works at partition granularity, which means that a minimal move is 64 million rows. Any table smaller than that will not be redistributed and there may be as much as one full Partition difference in the resulting balance. The redistribute logic does not currently consolidate individually deleted records.

Redistribute can take a long time. During this time, it is required that all data manipulation including bulk inserts are suspended. SuspendDatabaseWrites must be called before redistributedata and ResumeDatabaseWrites should be called when the redistribution is complete.
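A sketch of the full sequence from the admin console follows; the trailing y is assumed to acknowledge each confirmation prompt, and the status command is repeated until the redistribution reports completion:

mcsadmin> suspendDatabaseWrites y
mcsadmin> redistributeData start
mcsadmin> redistributeData status
mcsadmin> resumeDatabaseWrites y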

If "redistributeData stop" is called, all processing stops where it's at, but in a usable state. "redistributeData status" can be used to see how much has been done. A further "redistributeData start" will start over using the new state of the system. This may lead to a less optimal distribution, so stop-start sequences aren't recommended.

While the system is working, "redistributeData status" can be called to see what's happening. A -r option can be used on the status command line to repeat the call and act as a monitor.

To see how much data resides on any given DBRoot for a table, you can use a query like:

select count(*) from <table> where idbdbroot(<any column>)=<dbrootnum>;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore System Monitoring Configuration

  1. Introduction

  2. System monitoring configuration

  3. Module heartbeats

  4. Disk threshold

  5. Memory utilization

  6. Viewing storage configuration

  7. Module monitoring configuration

  8. Alarm trigger count threshold

  9. Clearing alarms

  10. Automated restart based on excessive swapping

  11. Logging level management

Introduction

ColumnStore is designed to be somewhat self-managing and self-healing. The following two processes help achieve this:

  • ProcMon runs on each node and is responsible for ensuring that the other required ColumnStore processes are started and automatically restarted as appropriate on that server. This in turn is started and monitored by the run.sh shell script which ensures it is restarted should it be killed. The run.sh script is invoked and automatically started by the columnstore systemd service at bootup time. This can also be utilized to restart the service on an individual node though generally it is preferred to use the mcsadmin stop, shutdown, and start commands from the PM1 node.

  • ProcMgr runs on each PM node with only one taking an active role at a time, the others remaining in warm standby mode. This process manager is responsible for overall system health, resource monitoring, and PM node failover management.

To provide additional monitoring guarantees, an external monitoring tool should monitor the health of these processes, and potentially of all ColumnStore processes. If the run.sh process fails, the system is at risk of not being able to self-heal.

System monitoring configuration

A number of system configuration variables exist to allow fine tuning of the system monitoring capabilities. In general the default values will work relatively well for many cases.

The configuration parameters are maintained in the /usr/local/mariadb/columnstore/etc/Columnstore.xml file. In a multiple-server deployment, these should only be edited on the PM1 server, as changes will be automatically replicated to the other servers by the system. A system restart is required for a configuration change to take effect.

Convenience utility programs getConfig and setConfig are available to safely update the Columnstore.xml without needing to be comfortable with editing XML files. The -h argument will display usage information. The section value will be SystemConfig for all settings in this document. For example:

# ./setConfig SystemConfig ModuleHeartbeatPeriod 5
# ./getConfig SystemConfig ModuleHeartbeatPeriod
5

Module heartbeats

Heartbeat monitoring occurs between modules (both UM and PM) to determine whether each module is up and functioning. The module heartbeat settings are the same for all modules.

  1. ModuleHeartbeatPeriod refers to how often the heartbeat test is performed. For example, if you set the period to 5, then the heartbeat test is performed every 5 seconds. The initial default value is 1. To disable heartbeat monitoring set the value to -1.

  2. ModuleHeartbeatCount refers to how many failures in a row must take place before a fault is processed. The initial default value is 3.

Disk threshold

Thresholds can be set to trigger a local alert when file system usage crosses a specified percentage of a file system on a server. Critical, Major, or Minor thresholds can be set for the disk usage of each server. However, it is recommended to use an external system monitoring tool configured to monitor for free disk space to perform proactive external alerting or paging. Actual ColumnStore data is stored within the data directories of the installation, and MariaDB db files are stored under the mysql/db directory.

  1. ExternalMinorThreshold - Percentage threshold for when a minor local alarm is triggered. Default value is 70.

  2. ExternalMajorThreshold - Percentage threshold for when a major local alarm is triggered. Default value is 80.

  3. ExternalCriticalThreshold - Percentage threshold for when a critical local alarm is triggered. Default value is 90.

The value is a numeric percentage between 0 and 100. To disable a particular threshold alarm, set it to 0.
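For example, to raise the critical disk threshold to 95% with the same convenience utilities shown earlier (a restart is required for the change to take effect):

# ./setConfig SystemConfig ExternalCriticalThreshold 95
# ./getConfig SystemConfig ExternalCriticalThreshold
95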

Memory utilization

A couple of mcsadmin commands provide convenience functions for monitoring memory utilization across nodes. getSystemMemory returns server level memory statistics and getSystemMemoryUsers shows the top 5 processes by server. The following examples are for a 2 server combined setup:

mcsadmin> getSystemMemory
getsystemmemory   Tue Nov 29 11:14:21 2016

System Memory Usage per Module (in K bytes)

Module  Mem Total  Mem Used  Cache    Mem Usage %  Swap Total  Swap Used  Swap Usage % 
------  ---------  --------  -------  -----------  ----------  ---------  ------------ 
pm1     7979488    1014772   6438432      12       3145724     0               0
pm2     3850724    632712    1134324      16       3145724     0               0

mcsadmin> getSystemMemoryUsers
getsystemmemoryusers   Tue Nov 29 11:41:10 2016

System Process Top Memory Users per Module

Module 'pm1' Top Memory Users (in bytes)

Process             Memory Used  Memory Usage %
-----------------   -----------  --------------
mysqld              19621              3
PrimProc            18990              3
gnome-shell         10192              2
systemd-journald    4236               1
DDLProc             3004               1

Module 'pm2' Top Memory Users (in bytes)

Process             Memory Used  Memory Usage %
-----------------   -----------  --------------
mysqld              19046              5
PrimProc            18891              5
ProcMon             2343               1
workernode          1806               1
WriteEngineServ     1507               1

Viewing storage configuration

To view the storage configuration, use the getStorageConfig command in mcsadmin, or simply use mcsadmin getStorageConfig from the operating system prompt. This will provide information on DBRoots and which PM they are assigned to, if any.

Example:

# mcsadmin getstorageconfig Wed Mar 28 10:40:34 2016

System Storage Configuration

Storage Type = internal
System DBRoot count = 6
DBRoot IDs assigned to 'pm1' = 1
DBRoot IDs assigned to 'pm2' = 2
DBRoot IDs assigned to 'pm3' = 3
DBRoot IDs assigned to 'pm4' = 4
DBRoot IDs assigned to 'pm5' = 5
DBRoot IDs assigned to 'pm6' = 6

Module monitoring configuration

An internal alarm system is used to keep track of internal notable events as a convenience or reference point. It is recommended to use a dedicated system monitoring tool for more proactive alerting of critical CPU, memory, or disk utilization issues for each of the servers.

Alarms are logged to the /var/log/mariadb/columnstore/alarm.log file and a summary is displayed in mcsadmin. The getActiveAlarms command in mcsadmin can be used to retrieve current alarm conditions.

For each module (PM and UM), the following resource monitoring parameters can be configured:

| Resource Monitoring Parameter | mcsadmin command |
|---|---|
| CPU thresholds | setModuleTypeConfig (module name) ModuleCPU(Clear/Minor/Major/Critical)Threshold n (where n = percentage of CPU usage) |
| Disk file system use threshold | setModuleTypeConfig (module name) ModuleDisk(Minor/Major/Critical)Threshold n (where n = percentage of disk space used) |
| Module swap thresholds | setModuleTypeConfig (module name) ModuleSwap(Minor/Major/Critical)Threshold n (where n = percentage of swap space used) |

Alarm trigger count threshold

For an alarm, a threshold can be set for how many times the alarm can be triggered in 30 minutes. The default threshold is 100.

setAlarmConfig (alarmID#) Threshold n

(where n = maximum number of times an alarm can be triggered in 30 minutes)

Example to change Alarm ID 22's threshold to 50:

# mcsadmin setAlarmConfig 22 Threshold 50

Clearing alarms

The resetAlarm command is used to clear and acknowledge the issue is resolved. The resetAlarm command can be invoked with the argument ALL to clear all outstanding local alarms.

Automated restart based on excessive swapping

ColumnStore by default has behavior that will restart a server should swap space utilization exceed the configured module swap major threshold (default is 80%). At this point the system will likely be near unusable and so this is an attempt to recover from very large queries or data loads. The behavior of this is configured by the SystemConfig section configuration variable SwapAction which contains the oam command to be run if the threshold is exceeded. The default value is 'restartSystem' but it can be set to 'none' to disable this behavior. The fact that this has happened can be determined by the following log entry:

Nov 01 11:23:13 [ServerMonitor] 13.306324 |0|0|0| C 09 CAL0000: Swap Space usage over Major threashold, perform OAM command restartSystem
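For example, to disable the automatic restart behavior, SwapAction can be set to none with the same utilities described above (a sketch; a restart is required for configuration changes to take effect):

# ./setConfig SystemConfig SwapAction none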

Logging level management

There are five levels of logging in MariaDB ColumnStore.

  • Critical

  • Error

  • Warning

  • Info

  • Debug

Application log files are written to /var/log/mariadb/columnstore on each server and log rotation / archiving is configured to manage these automatically.

To get details about current logging configuration:

# mcsadmin getlogconfig
getlogconfig   Wed Oct 19 06:58:47 2016

MariaDB Columnstore System Log Configuration Data

System Logging Configuration File being used: /etc/rsyslog.d/49-columnstore.conf

Module    Configured Log Levels
------    ---------------------------------------
pm1       Critical Error Warning Info

The system logging configuration file referenced is a standard syslog configuration file and may be edited to enable and/or disable specific levels, for example to disable debug logging and to only log at the specific level in each file:

# cat /etc/rsyslog.d/49-columnstore.conf
# MariaDb Columnstore Database Platform Logging
local1.=crit -/var/log/mariadb/columnstore/crit.log
local1.=err -/var/log/mariadb/columnstore/err.log
local1.=warning -/var/log/mariadb/columnstore/warning.log
local1.=info -/var/log/mariadb/columnstore/info.log

After making changes to this restart the syslog process, e.g:

# systemctl restart rsyslog

Log rotation and archiving are also configured by the installer, and the settings for this may be found and managed similarly in the file /etc/logrotate.d/columnstore. If the current log files are manually deleted, restart the syslog process to resume logging.

This page is licensed: CC BY-SA / Gnu FDL

Backup & Restore

MariaDB ColumnStore backup and restore manage distributed data using snapshots or tools like mariadb-backup, with restoration ensuring cluster sync via cpimport or file system recovery.

Backup and Restore Overview

Overview

MariaDB Enterprise ColumnStore supports backup and restore.

System of Record

Before you determine a backup strategy for your Enterprise ColumnStore deployment, it is a good idea to determine the system of record for your Enterprise ColumnStore data.

A system of record is the authoritative data source for a given piece of information. Organizations often store duplicate information in several systems, but only a single system can be the authoritative data source.

Enterprise ColumnStore is designed to handle analytical processing for OLAP, data warehousing, DSS, and hybrid workloads on very large data sets. Analytical processing does not generally happen on the system of record. Instead, analytical processing generally occurs on a specialized database that is loaded with data from the separate system of record. Additionally, very large data sets can be difficult to back up. Therefore, it may be beneficial to back up only the system of record.

If Enterprise ColumnStore is not acting as the system of record for your data, you should determine how the system of record affects your backup plan:

  • If your system of record is another database server, you should ensure that the other database server is properly backed up and that your organization has procedures to reload Enterprise ColumnStore from the other database server.

  • If your system of record is a set of data files, you should ensure that the set of data files is properly backed up and that your organization has procedures to reload Enterprise ColumnStore from the set of data files.

Full Backup and Restore

MariaDB Enterprise ColumnStore supports full backup and restore for all storage types. A full backup includes:

  • Enterprise ColumnStore's data and metadata:
    • With S3: an S3 snapshot of the S3-compatible object storage and a file system snapshot or copy of the Storage Manager directory.
    • Without S3: a file system snapshot or copy of the DB Root directories.

  • The MariaDB data directory from the primary node.

To see the procedure to perform a full backup and restore, choose the storage type:

  • Enterprise ColumnStore with Object Storage

  • Enterprise ColumnStore with Shared Local Storage

This page is: Copyright © 2025 MariaDB. All rights reserved.

Backup and Restore with Object Storage

Overview

MariaDB Enterprise ColumnStore supports backup and restore. If Enterprise ColumnStore uses S3-compatible object storage for data and shared local storage for the Storage Manager directory, the S3 bucket, the Storage Manager directory, and the MariaDB data directory must be backed up separately.

Recovery Planning

MariaDB Enterprise ColumnStore supports multiple storage options.

This page discusses how to backup and restore Enterprise ColumnStore when it uses S3-compatible object storage for data and shared local storage (such as NFS) for the Storage Manager directory.

Any file can become corrupt due to hardware issues, crashes, power loss, and other reasons. If the Enterprise ColumnStore data or metadata were to become corrupt, Enterprise ColumnStore could become unusable, and data loss could occur.

If Enterprise ColumnStore is your system of record, it should be backed up regularly.

If Enterprise ColumnStore uses S3-compatible object storage for data and shared local storage for the Storage Manager directory, the following items must be backed up:

  • The MariaDB data directory must be backed up using mariadb-backup.

  • The S3 bucket must be backed up using the vendor's snapshot procedure.

  • The Storage Manager directory must be backed up.

See the instructions below for more details.

Backup


Use the following process to take a backup:

  1. Determine which node is the primary server using curl to send the status command to the CMAPI Server:

$ curl -k -s https://mcs1:8640/cmapi/0.4.0/cluster/status \
   --header 'Content-Type:application/json' \
   --header 'x-api-key:93816fa66cc2d8c224e62275bd4f248234dd4947b68d4af2b29671dd7d5532dd' \
   | jq .

The output will show "dbrm_mode": "master" for the primary server:

{
  "timestamp": "2020-12-15 00:40:34.353574",
  "192.0.2.1": {
    "timestamp": "2020-12-15 00:40:34.362374",
    "uptime": 11467,
    "dbrm_mode": "master",
    "cluster_mode": "readwrite",
    "dbroots": [
      "1"
    ],
    "module_id": 1,
    "services": [
      {
        "name": "workernode",
        "pid": 19202
      },
      {
        "name": "controllernode",
        "pid": 19232
      },
      {
        "name": "PrimProc",
        "pid": 19254
      },
      {
        "name": "ExeMgr",
        "pid": 19292
      },
      {
        "name": "WriteEngine",
        "pid": 19316
      },
      {
        "name": "DMLProc",
        "pid": 19332
      },
      {
        "name": "DDLProc",
        "pid": 19366
      }
    ]
  }
}

  2. Connect to the primary server using MariaDB Client as a user account that has privileges to lock the database:

$ mariadb --host=192.0.2.1 \
   --user=root \
   --password

  3. Lock the database with the FLUSH TABLES WITH READ LOCK statement:

FLUSH TABLES WITH READ LOCK;

Ensure that the client remains connected to the primary server, so that the lock is held for the remaining steps.

  4. Make a copy or snapshot of the Storage Manager directory. By default, it is located at /var/lib/columnstore/storagemanager.

For example, to make a copy of the directory with rsync:

$ sudo mkdir -p /backups/columnstore/202101291600/
$ sudo rsync -av /var/lib/columnstore/storagemanager /backups/columnstore/202101291600/

  5. Use MariaDB Backup to back up the MariaDB data directory:

$ sudo mkdir -p /backups/mariadb/202101291600/
$ sudo mariadb-backup --backup \
   --target-dir=/backups/mariadb/202101291600/ \
   --user=mariadb-backup \
   --password=mbu_passwd

  6. Use MariaDB Backup to prepare the backup:

$ sudo mariadb-backup --prepare \
   --target-dir=/backups/mariadb/202101291600/

  7. Create a snapshot of the S3-compatible storage. Consult the storage vendor's manual for details on how to do this.

  8. Ensure that all previous operations are complete.

  9. In the original client connection to the primary server, unlock the database with the UNLOCK TABLES statement:

UNLOCK TABLES;

Restore

Use the following process to restore a backup:

  1. Deploy Enterprise ColumnStore, so that you can restore the backup to an empty deployment.

  2. Ensure that all services are stopped on each node:

$ sudo systemctl stop mariadb-columnstore-cmapi
$ sudo systemctl stop mariadb-columnstore
$ sudo systemctl stop mariadb

  3. Restore the backup of the Storage Manager directory. By default, it is located at /var/lib/columnstore/storagemanager.

For example, to restore the backup with rsync:

$ sudo rsync -av /backups/columnstore/202101291600/storagemanager/ /var/lib/columnstore/storagemanager/
$ sudo chown -R mysql:mysql /var/lib/columnstore/storagemanager

  4. Use MariaDB Backup to restore the backup of the MariaDB data directory:

$ sudo mariadb-backup --copy-back \
   --target-dir=/backups/mariadb/202101291600/
$ sudo chown -R mysql:mysql /var/lib/mysql

  5. Restore the snapshot of your S3-compatible storage to the new S3 bucket that you plan to use. Consult the storage vendor's manual for details on how to do this.

  6. Update storagemanager.cnf to configure Enterprise ColumnStore to use the S3 bucket. By default, it is located at /etc/columnstore/storagemanager.cnf.

For example:

[ObjectStorage]
…
service = S3
…
[S3]
bucket = your_columnstore_bucket_name
endpoint = your_s3_endpoint
aws_access_key_id = your_s3_access_key_id
aws_secret_access_key = your_s3_secret_key
# iam_role_name = your_iam_role
# sts_region = your_sts_region
# sts_endpoint = your_sts_endpoint

[Cache]
cache_size = your_local_cache_size
path = your_local_cache_path
  • The default local cache size is 2 GB.

  • The default local cache path is /var/lib/columnstore/storagemanager/cache.

  • Ensure that the local cache path has sufficient storage space to store the local cache.

  • The bucket option must be set to the name of the bucket that you created from your snapshot in the previous step.

  • To use an IAM role, you must also uncomment and set iam_role_name, sts_region, and sts_endpoint.

  7. Start the services on each node:

$ sudo systemctl start mariadb
$ sudo systemctl start mariadb-columnstore-cmapi

This page is: Copyright © 2025 MariaDB. All rights reserved.

Backup and Restore with Shared Local Storage

Overview

MariaDB Enterprise ColumnStore supports backup and restore. If Enterprise ColumnStore uses shared local storage for the DB Root directories, the DB Root directories and the MariaDB data directory must be backed up separately.

Recovery Planning

MariaDB Enterprise ColumnStore supports multiple storage options.

This page discusses how to back up and restore Enterprise ColumnStore when it uses shared local storage (such as NFS) for the DB Root directories.

Any file can become corrupt due to hardware issues, crashes, power loss, and other reasons. If the Enterprise ColumnStore data or metadata were to become corrupt, Enterprise ColumnStore could become unusable, and data loss could occur.

If Enterprise ColumnStore is your system of record, it should be backed up regularly.

If Enterprise ColumnStore uses shared local storage for the DB Root directories, the following items must be backed up:

  • The MariaDB data directory must be backed up using MariaDB Backup (mariadb-backup).

  • The Storage Manager directory must be backed up.

  • Each DB Root directory must be backed up.

See the instructions below for more details.

Backup

Use the following process to take a backup:

  1. Determine which node is the primary server using curl to send the status command to the CMAPI Server:

$ curl -k -s https://mcs1:8640/cmapi/0.4.0/cluster/status \
   --header 'Content-Type:application/json' \
   --header 'x-api-key:93816fa66cc2d8c224e62275bd4f248234dd4947b68d4af2b29671dd7d5532dd' \
   | jq .

The output will show "dbrm_mode": "master" for the primary server:

{
  "timestamp": "2020-12-15 00:40:34.353574",
  "192.0.2.1": {
    "timestamp": "2020-12-15 00:40:34.362374",
    "uptime": 11467,
    "dbrm_mode": "master",
    "cluster_mode": "readwrite",
    "dbroots": [
      "1"
    ],
    "module_id": 1,
    "services": [
      {
        "name": "workernode",
        "pid": 19202
      },
      {
        "name": "controllernode",
        "pid": 19232
      },
      {
        "name": "PrimProc",
        "pid": 19254
      },
      {
        "name": "ExeMgr",
        "pid": 19292
      },
      {
        "name": "WriteEngine",
        "pid": 19316
      },
      {
        "name": "DMLProc",
        "pid": 19332
      },
      {
        "name": "DDLProc",
        "pid": 19366
      }
    ]
  }
}

  2. Connect to the primary server using MariaDB Client as a user account that has privileges to lock the database:

$ mariadb --host=192.0.2.1 \
   --user=root \
   --password

  3. Lock the database with the FLUSH TABLES WITH READ LOCK statement:

FLUSH TABLES WITH READ LOCK;

Ensure that the client remains connected to the primary server, so that the lock is held for the remaining steps.

  4. Make a copy or snapshot of the Storage Manager directory. By default, it is located at /var/lib/columnstore/storagemanager.

For example, to make a copy of the directory with rsync:

$ sudo mkdir -p /backups/columnstore/202101291600/
$ sudo rsync -av /var/lib/columnstore/storagemanager /backups/columnstore/202101291600/

  5. Make a copy or snapshot of the DB Root directories. By default, they are located at /var/lib/columnstore/dataN, where the N in dataN represents a range of integers that starts at 1 and stops at the number of nodes in the deployment.

For example, to make a copy of the directories with rsync in a 3-node deployment:

$ sudo rsync -av /var/lib/columnstore/data1 /backups/columnstore/202101291600/
$ sudo rsync -av /var/lib/columnstore/data2 /backups/columnstore/202101291600/
$ sudo rsync -av /var/lib/columnstore/data3 /backups/columnstore/202101291600/

  6. Use MariaDB Backup to back up the MariaDB data directory:

$ sudo mkdir -p /backups/mariadb/202101291600/
$ sudo mariadb-backup --backup \
   --target-dir=/backups/mariadb/202101291600/ \
   --user=mariadb-backup \
   --password=mbu_passwd

  7. Use MariaDB Backup to prepare the backup:

$ sudo mariadb-backup --prepare \
   --target-dir=/backups/mariadb/202101291600/

  8. Ensure that all previous operations are complete.

  9. In the original client connection to the primary server, unlock the database with the UNLOCK TABLES statement:

UNLOCK TABLES;

Restore

Use the following process to restore a backup:

  1. Deploy Enterprise ColumnStore, so that you can restore the backup to an empty deployment.

  2. Ensure that all services are stopped on each node:

$ sudo systemctl stop mariadb-columnstore-cmapi
$ sudo systemctl stop mariadb-columnstore
$ sudo systemctl stop mariadb

  3. Restore the backup of the Storage Manager directory. By default, it is located at /var/lib/columnstore/storagemanager.

For example, to restore the backup with rsync:

$ sudo rsync -av /backups/columnstore/202101291600/storagemanager/ /var/lib/columnstore/storagemanager/
$ sudo chown -R mysql:mysql /var/lib/columnstore/storagemanager

  4. Restore the backup of the DB Root directories. By default, they are located at /var/lib/columnstore/dataN, where the N in dataN represents a range of integers that starts at 1 and stops at the number of nodes in the deployment.

For example, to restore the backup with rsync in a 3-node deployment:

$ sudo rsync -av /backups/columnstore/202101291600/data1/ /var/lib/columnstore/data1/
$ sudo rsync -av /backups/columnstore/202101291600/data2/ /var/lib/columnstore/data2/
$ sudo rsync -av /backups/columnstore/202101291600/data3/ /var/lib/columnstore/data3/
$ sudo chown -R mysql:mysql /var/lib/columnstore/data1
$ sudo chown -R mysql:mysql /var/lib/columnstore/data2
$ sudo chown -R mysql:mysql /var/lib/columnstore/data3

  5. Use MariaDB Backup to restore the backup of the MariaDB data directory:

$ sudo mariadb-backup --copy-back \
   --target-dir=/backups/mariadb/202101291600/
$ sudo chown -R mysql:mysql /var/lib/mysql

  6. Start the services on each node:

$ sudo systemctl start mariadb
$ sudo systemctl start mariadb-columnstore-cmapi

This page is: Copyright © 2025 MariaDB. All rights reserved.

Extent Map Backup & Recovery

MariaDB ColumnStore utilizes an Extent Map to manage data distribution across extents (logical blocks within physical segment files ranging from 8 to 64 MB). Each extent holds a consistent number of rows, with the Extent Map cataloging these extents, their corresponding block identifiers (LBIDs), and the minimum and maximum values for each column's data within the extent.

The primary node maintains the master copy of the Extent Map. Upon system startup, this map is loaded into memory and propagated to other nodes for redundancy and quick access. Corruption of the master Extent Map can render the system unusable and lead to data loss.

Purpose

ColumnStore's extent map is a smart structure that underpins its performance. By providing a logical partitioning scheme, it avoids the overhead associated with indexing and other common row-based database optimizations.

The primary node in a ColumnStore cluster holds the master copy of the extent map. Upon system startup, this master copy is read into memory and then replicated to all other participating nodes for high availability and disaster recovery. Nodes keep the extent map in memory for rapid access during query processing. As data within extents is modified, these updates are broadcast to all participating nodes to maintain consistency.

If the master copy of the extent map becomes corrupted, the entire system could become unusable, potentially leading to data loss. Having a recent backup of the extent map allows for a much faster recovery compared to reloading the entire database in such a scenario.

Backup Procedure

Note: MariaDB recommends implementing regular backups to ensure data integrity and recovery. A common default is to back up every 3 hours and retain backups for at least 10 days.

To safeguard against potential Extent Map corruption, regularly back up the master copy:

  1. Lock Table:

mariadb -e "FLUSH TABLES WITH READ LOCK;"

  2. Save BRM:

save_brm

  3. Create Backup Directory:

mkdir -p /extent_map_backup

  4. Copy Extent Map:

cp -f /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /extent_map_backup

  5. Unlock Tables:

mariadb -e "UNLOCK TABLES;"

Recovery Procedures

Single-Node System

  1. Stop ColumnStore:

systemctl stop mariadb-columnstore

  2. Rename Corrupted Map:

mv /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /tmp/BRM_saves_em.bad

  3. Clear Versioning Files:

> /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vbbm
> /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vss

  4. Restore Backup:

cp -f /extent_map_backup/BRM_saves_em /var/lib/columnstore/data1/systemFiles/dbrm/

  5. Set Ownership:

chown -R mysql:mysql /var/lib/columnstore/data1/systemFiles/dbrm/

  6. Start ColumnStore:

systemctl start mariadb-columnstore

Clustered System

  1. Shutdown Cluster:

curl -s -X PUT https://127.0.0.1:8640/cmapi/0.4.0/cluster/shutdown \
   --header 'Content-Type:application/json' \
   --header 'x-api-key:your_api_key' \
   --data '{"timeout":60}' -k

  2. Rename Corrupted Map:

mv /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /tmp/BRM_saves_em.bad

  3. Clear Versioning Files:

> /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vbbm
> /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_vss

  4. Restore Backup:

cp -f /extent_map_backup/BRM_saves_em /var/lib/columnstore/data1/systemFiles/dbrm/

  5. Set Ownership:

chown -R mysql:mysql /var/lib/columnstore/data1/systemFiles/dbrm

  6. Start Cluster:

curl -s -X PUT https://127.0.0.1:8640/cmapi/0.4.0/cluster/start \
   --header 'Content-Type:application/json' \
   --header 'x-api-key:your_api_key' \
   --data '{"timeout":60}' -k

Automation Recommendation

Incorporate the save_brm command into your data import scripts (e.g., those using cpimport) to automate Extent Map backups. This practice ensures regular backups without manual intervention. Refer to the MariaDB ColumnStore Backup Script for an example implementation.
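
A minimal sketch of such a wrapper, assuming a single-node system with default paths; the database, table, input file, and backup directory are placeholders:

#!/bin/bash
# Hypothetical load wrapper: bulk load, then refresh the Extent Map backup
set -e
cpimport mytest mytable /data/mytable.tbl
save_brm
mkdir -p /extent_map_backup
cp -f /var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em /extent_map_backup/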

This page is licensed: CC BY-SA / Gnu FDL

Security

MariaDB ColumnStore uses MariaDB Server’s security—encryption, access control, auditing, and firewall—for secure analytics.

ColumnStore Security Vulnerabilities

  1. About CVEs

  2. CVEs fixed in ColumnStore

This page is about security vulnerabilities that have been fixed for or still affect MariaDB ColumnStore. In addition, links are included to fixed security vulnerabilities in MariaDB Server since MariaDB ColumnStore is based on MariaDB Server.

Sensitive security issues can be sent directly to the persons responsible for MariaDB security: security [AT] mariadb (dot) org.

About CVEs

CVE® stands for "Common Vulnerabilities and Exposures". It is a publicly available and free-to-use database of known software vulnerabilities, maintained by the MITRE Corporation at cve.org.

CVEs fixed in ColumnStore

The appropriate release notes listed here document CVEs fixed within a given release. Additional information can also be found at Security Vulnerabilities Fixed in MariaDB.

There are no known CVEs on ColumnStore-specific infrastructure outside of the MariaDB server at this time.

This page is licensed: CC BY-SA / Gnu FDL

Credentials Management

Overview

Starting with MariaDB Enterprise ColumnStore 6.2.3, ColumnStore supports encryption for user passwords stored in Columnstore.xml:

  • Encryption keys are created with the cskeys utility

  • Passwords are encrypted using the cspasswd utility

Compatibility

  • MariaDB Enterprise ColumnStore 6

  • MariaDB Enterprise ColumnStore 22.08

  • MariaDB Enterprise ColumnStore 23.02

Encryption Keys

MariaDB Enterprise ColumnStore stores its password encryption keys in the plain-text file /var/lib/columnstore/.secrets.

The encryption keys are not created by default, but can be generated by executing the cskeys utility:

$ cskeys

In a multi-node Enterprise ColumnStore cluster, every ColumnStore node should have the same encryption keys. Therefore, it is recommended to execute cskeys on the primary server, and then copy /var/lib/columnstore/.secrets to every other ColumnStore node and fix the file's permissions:

$ scp 192.0.2.1:/var/lib/columnstore/.secrets /var/lib/columnstore/.secrets
$ sudo chown mysql:mysql /var/lib/columnstore/.secrets
$ sudo chmod 0400 /var/lib/columnstore/.secrets

Encrypt a Password

To encrypt a password:

Generate an encrypted password using the cspasswd utility:

$ cspasswd util_user_passwd
  • If the --interactive command-line option is specified, cspasswd prompts for the password.

Set the encrypted password in Columnstore.xml using the mcsSetConfig utility:

$ sudo mcsSetConfig CrossEngineSupport Password util_user_encrypted_passwd
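
To confirm the value that was stored (this prints the encrypted string, not the plain-text password), the companion mcsGetConfig utility can be used:

$ mcsGetConfig CrossEngineSupport Password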

Decrypt a Password

To decrypt a password, execute the cspasswd utility and specify the --decrypt command-line option:

$ cspasswd --decrypt util_user_encrypted_passwd

This page is: Copyright © 2025 MariaDB. All rights reserved.

Use Cases

MariaDB ColumnStore is ideal for real-time analytics and complex queries on large datasets across industries.

About MariaDB ColumnStore

MariaDB ColumnStore is a columnar storage engine that utilizes a massively parallel distributed data architecture. It's a columnar storage system built by porting InfiniDB 4.6.7 to MariaDB and released under the GPL license.

Since MariaDB 10.5.4, ColumnStore is available as a storage engine for MariaDB Server. Before then, it was available as a separate download.

Release notes and other documentation for ColumnStore are also available in the Enterprise docs section of the MariaDB website. For example:

  • ColumnStore 23.10 Release Notes

  • ColumnStore 23.02 Release Notes

  • ColumnStore 22.08 Release Notes

  • ColumnStore 6 Release Notes

  • ColumnStore 5.6 Release Notes

  • Deployment Instructions

It is designed for big data, scaling to process petabytes of data with linear scalability and exceptional performance, providing real-time responses to analytical queries. It leverages the I/O benefits of columnar storage, compression, just-in-time projection, and horizontal and vertical partitioning to deliver tremendous performance when analyzing large data sets.

Links:

  • MariaDB Columnstore Blogs.

  • A Google Group exists for MariaDB ColumnStore that can be used to discuss ideas and issues and communicate with the community: Send email to mariadb-columnstore@googlegroups.com or use the forum interface

  • Bugs can be reported in MariaDB Jira: jira.mariadb.org (see Reporting Bugs). Please file bugs under the MCOL project and include the output from the support utility if possible.

MariaDB ColumnStore is released under the GPL license.

This page is licensed: CC BY-SA / Gnu FDL

High Availability

MariaDB ColumnStore ensures high availability with multi-node setups and shared storage, while MaxScale adds monitoring and failover for continuous analytics.

Performance Related Configuration Settings

MariaDB ColumnStore

  1. Introduction

  2. Memory management - NumBlocksPct and TotalUmMemory

  3. Query concurrency - MaxOutstandingRequests

  4. Join tuning - PmMaxMemorySmallSide

  5. Multi table join tuning

  6. Disk based joins - AllowDiskBasedJoin

Introduction

A number of system configuration variables exist to allow fine tuning of the system to suit the physical hardware and query characteristics. In general the default values will work relatively well for many cases.

The configuration parameters are maintained in the /usr/local/mariadb/columnstore/etc/Columnstore.xml file. In a multiple-server deployment, these should only be edited on the PM1 server, as the change is automatically replicated to the other servers by the system. A system restart is required for the configuration change to take effect.

Convenience utility programs getConfig and setConfig are available to safely update the Columnstore.xml without needing to be comfortable with editing XML files. The -h argument will display usage information.

Memory management - NumBlocksPct and TotalUmMemory

The NumBlocksPct configuration parameter specifies the percentage of physical memory to utilize for disk block caching. For a single-server or combined multi-server deployment, the default value is 50 to ensure enough physical memory for the UM; for a non-combined multi-server deployment, the default value is 70.

The TotalUmMemory configuration parameter specifies the percentage of physical memory to utilize for joins, intermediate results and set operations on the UM. This specifies an upper limit for small table results in joins rather than a pre-allocation of memory.

In a single server or combined deployment, the sum of NumBlocksPct and TotalUmMemory should typically not exceed 75% of physical memory. With very large memory servers this could be raised but the key point is to leave enough memory for other processes including mysqld.

With version 1.2.2 onwards these can be set to static numeric limits instead of percentages by entering a number with 'M' or 'G' at the end to signify MiB or GiB.
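
For example, the two memory parameters above can be inspected and changed with getConfig and setConfig. This is a minimal sketch: the DBBC and HashJoin section names and the example values are assumptions to verify against your own Columnstore.xml, and a system restart is required afterwards:

# View the current values
/usr/local/mariadb/columnstore/bin/getConfig DBBC NumBlocksPct
/usr/local/mariadb/columnstore/bin/getConfig HashJoin TotalUmMemory

# Set illustrative values (a percentage for the block cache, a static limit for the UM)
/usr/local/mariadb/columnstore/bin/setConfig DBBC NumBlocksPct 50
/usr/local/mariadb/columnstore/bin/setConfig HashJoin TotalUmMemory 8G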

Query concurrency - MaxOutstandingRequests

ColumnStore handles concurrent query execution by managing the rate of concurrent batch primitive steps from the UM to the PM. This is configured using the MaxOutstandingRequests parameter and has a default value of 20. Each batch primitive step is executed within the context of one column extent according to this high-level process:

  • The UM issues up to MaxOutstandingRequests number of batch primitive steps.

  • The PM processes the request using many threads and returns its response. These generally take a fraction of a second up to a low number of seconds depending on the amount of Physical I/O and the performance of that storage.

  • The UM will issue new requests as prior requests complete maintaining the maximum number of outstanding requests.

This scheme allows large queries to use all available resources when they are not otherwise being consumed, and smaller queries to execute with minimal delay. Lower values optimize for higher throughput of smaller queries, while larger values optimize for the response time of individual large queries. The default value should work well under most circumstances; however, the value should be increased as the number of PM nodes is increased.
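
As a sketch, the value could be raised on a larger cluster with setConfig followed by a system restart; the JobList section name used here is an assumption to verify against your own Columnstore.xml:

# Assumed location: <JobList><MaxOutstandingRequests> in Columnstore.xml
/usr/local/mariadb/columnstore/bin/setConfig JobList MaxOutstandingRequests 40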

The number of queries currently running and the number of queries waiting in the queue can be checked with:

select calgetsqlcount();

Join tuning - PmMaxMemorySmallSide

ColumnStore maintains statistics for tables and utilizes these to determine which is the larger table of the two, based both on the number of blocks in the table and an estimation of the predicate cardinality. The first step is to apply any filters as appropriate to the smaller table, returning this data set to the UM. The size of this data set is compared against the configuration parameter PmMaxMemorySmallSide, which has a default value of 64 (MB) and can be set as high as 4 GB. The default allows for approximately 1M rows on the small table side to be joined against billions (or trillions) on the large table side. If the size of the small data set is less than PmMaxMemorySmallSide, the data set is sent to the PM for creation of a distributed hashmap; otherwise it is created on the UM. This setting is therefore important to the tuning of joins and to whether the operation can be distributed or not. It should be set to support your largest expected small-table join size, up to available memory (a configuration sketch follows the list below):

  • Although this will increase the size of data sent from the UM to PM to support the join, it means that the join and subsequent aggregates are pushed down, scaled out, and a smaller data set is returned back to the UM.

  • In a multiple PM deployment, the sizing should be based from available physical memory on the PM servers, how much memory to reserve for block caching, and the number of simultaneous join operations that can be expected to run times the average small table join data size.
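
A minimal sketch of raising this limit, assuming the parameter lives in the HashJoin section of Columnstore.xml as in default installs (a system restart is required afterwards):

# Allow small-side join data sets of up to 1 GiB to be pushed down to the PMs
/usr/local/mariadb/columnstore/bin/setConfig HashJoin PmMaxMemorySmallSide 1G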

Multi table join tuning

The above logic for a single table join extrapolates out to multi table joins where the small table values are precalculated and performed as one single scan against the large table. This works well for the typical star schema case joining multiple dimension tables with a large fact table. For some join scenarios it may be necessary to sequence joins to create the intermediate datasets for joining, this would happen for instance with a snowflake schema structure. In some extreme cases it may be hard for the optimizer to be able to determine the most optimal join path. In this case a hint is available to force a join ordering. The INFINIDB_ORDERED hint will force the first table in the from clause to be considered the largest table and override any statistics based decision, for example:

select /*! INFINIDB_ORDERED */ r_regionkey     
from region r, customer c, nation n    
where r.r_regionkey = n.n_regionkey      
and n.n_nationkey = c.c_nationkey

Note: INFINIDB_ORDERED is deprecated and no longer works in ColumnStore 1.2 and above. For ColumnStore 1.2 and 1.3, use:

set infinidb_ordered_only=ON;

For ColumnStore 1.4, use:

set columnstore_ordered_only=ON;

Disk based joins - AllowDiskBasedJoin

When a join is very large and exceeds the PmMaxMemorySmallSide setting it is performed in memory in the UM server. For very large joins this could exceed the available memory in which case this is detected and a query error reported. A number of configuration parameters are available to enable and configure usage of disk overflow should this occur:

  • AllowDiskBasedJoin – Controls the option to use disk Based joins or not. Valid values are Y (enabled) or N (disabled). By default, this option is disabled.

  • TempFileCompression – Controls whether the disk join files are compressed or noncompressed. Valid values are Y (use compressed files) or N (use non-compressed files).

  • TempFilePath – The directory path used for the disk joins. By default, this path is the tmp directory for your installation (i.e., /usr/local/mariadb/columnstore/tmp). Files (named infinidb-join-data*) in this directory will be created and cleaned on an as needed basis. The entire directory is removed and recreated by ExeMgr at startup. It is strongly recommended that this directory is stored on a dedicated partition.

A MariaDB global or session variable is available to specify a memory limit at which point the query is switched over to a disk-based join (see the sketch after this list):

  • infinidb_um_mem_limit - Memory limit in MB per user (i.e. switch to disk based join if this limit is exceeded). By default, this limit is not set (value of 0).
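
A combined sketch of these settings, assuming they live in the HashJoin section of Columnstore.xml as in default installs; the temporary path is illustrative, the XML changes require a system restart, and the variable can also be set per session:

/usr/local/mariadb/columnstore/bin/setConfig HashJoin AllowDiskBasedJoin Y
/usr/local/mariadb/columnstore/bin/setConfig HashJoin TempFileCompression Y
/usr/local/mariadb/columnstore/bin/setConfig HashJoin TempFilePath /mnt/columnstore-join-tmp

# Switch queries to disk-based joins once a user exceeds ~4 GB (4096 MB) of join memory
mariadb -e "SET GLOBAL infinidb_um_mem_limit = 4096;"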

This page is licensed: CC BY-SA / Gnu FDL

Query Tuning

MariaDB ColumnStore query tuning optimizes analytics using data types, joins, projection elimination, WHERE clauses, and EXPLAIN for performance insights.

Collecting Statistics with ANALYZE TABLE

Overview

In MariaDB Enterprise ColumnStore 6, the ExeMgr process uses optimizer statistics in its query planning process.

ColumnStore uses the optimizer statistics to add support for queries that contain circular inner joins.

In Enterprise ColumnStore 5 and before, ColumnStore would raise the following error when a query containing a circular inner join was executed:

ERROR 1815 (HY000): Internal error: IDB-1003: Circular joins are not supported.

The optimizer statistics store each column's NDV (Number of Distinct Values), which can help the ExeMgr process choose the optimal join order for queries with circular joins. When Enterprise ColumnStore executes a query with a circular join, the query's execution can take longer if ColumnStore chooses a sub-optimal join order. When you collect optimizer statistics for your ColumnStore tables, the ExeMgr process is less likely to choose a sub-optimal join order.

Enterprise ColumnStore's optimizer statistics can be collected for ColumnStore tables by executing ANALYZE TABLE:

ANALYZE TABLE columnstore_tab;

Enterprise ColumnStore's optimizer statistics are not updated automatically. To update the optimizer statistics for a ColumnStore table, ANALYZE TABLE must be re-executed.

Enterprise ColumnStore does not implement an interface to show optimizer statistics.

This page is: Copyright © 2025 MariaDB. All rights reserved.

Query Tuning Recommendations

When tuning queries for MariaDB Enterprise ColumnStore, there are some important details to consider.

Avoid Selecting Unnecessary Columns

Enterprise ColumnStore only reads the columns that are necessary to resolve a query.

For example, the following query selects every column in the table:

SELECT * FROM tab;

Whereas the following query only selects two columns in the table, so it requires less I/O:

SELECT col1, col2 FROM tab;

For best performance, only select the columns that are necessary to resolve a query.

Avoid Large Sorts

When Enterprise ColumnStore performs ORDER BY and LIMIT operations, the operations are performed in a single-threaded manner after the rest of the query processing has been completed, and the full unsorted result-set has been retrieved. For large data sets, the performance overhead can be significant.

Avoid Excessive Aggregations

When Enterprise ColumnStore 5 performs aggregations (e.g., DISTINCT, GROUP BY, COUNT(*)), all of the aggregation work happens in-memory by default. As a consequence, more complex aggregation operations require more memory in that version.

For example, the following query could require a lot of memory in Enterprise ColumnStore 5, since it has to calculate many distinct values in memory:

SELECT DISTINCT col1 FROM tab LIMIT 10000;

Whereas the following query could require much less memory in Enterprise ColumnStore 5, since it has to calculate fewer distinct values:

SELECT DISTINCT col1 FROM tab LIMIT 100;

In Enterprise ColumnStore 6, disk-based aggregations can be enabled.

For best performance, avoid excessive aggregations or enable disk-based aggregations.

For additional information, see "Configure Disk-Based Aggregations".

Avoid Non-Distributed Functions

When Enterprise ColumnStore evaluates built-in functions and aggregate functions, it can often evaluate the function in a distributed manner. Distributed evaluation of functions can significantly improve performance.

Enterprise ColumnStore supports distributed evaluation for some built-in functions. For other built-in functions, the function must be evaluated serially on the final result set.

Enterprise ColumnStore also supports distributed evaluation for user-defined functions developed with ColumnStore's User-Defined Aggregate Function (UDAF) C++ API. For functions developed with Enterprise Server's standard User-Defined Function (UDF) API, the function must be evaluated serially on the final result set.

For best performance, avoid non-distributed functions.

Optimize Large Joins

By default, Enterprise ColumnStore performs all joins as in-memory hash joins.

If the joined tables are very large, the in-memory hash join can require too much memory for the default configuration. There are a couple options to work around this:

  • Enterprise ColumnStore can be configured to use more memory for in-memory hash joins.

  • Enterprise ColumnStore can be configured to use disk-based joins.

  • Enterprise ColumnStore can use optimizer statistics to better optimize the join order.

For additional information, see "Configure In-Memory Joins", "Configure Disk-Based Joins", and "Optimizer Statistics".

Load Ordered Data in Proper Order

Enterprise ColumnStore uses extent elimination to optimize queries. Extent elimination uses the minimum and maximum values in the extent map to determine which extents can be skipped for a query.

When data is loaded into Enterprise ColumnStore, it appends the data to the latest extent. When an extent reaches the maximum number of column values, Enterprise ColumnStore creates a new extent. As a consequence, if ordered data is loaded in its proper order, then similar values will be clustered together in the same extent. This can improve query performance, because extent elimination performs best when similar values are clustered together.

For example, if you expect to query a table with a filter on a timestamp column, you should sort the data using the timestamp column before loading it into Enterprise ColumnStore. Later, when the table is queried with a filter on the timestamp column, Enterprise ColumnStore would be able to skip many extents using extent elimination.

For best performance, load ordered data in proper order.
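
For example, assuming a pipe-delimited load file whose first field is an ISO-format timestamp, the file could be sorted on that field before loading; the file, database, and table names are illustrative:

$ sort -t '|' -k1,1 /data/events_raw.tbl > /data/events_sorted.tbl
$ cpimport mytest events /data/events_sorted.tbl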

Enable Decimal Overflow Checks

When Enterprise ColumnStore performs mathematical operations with very big values using the DECIMAL, NUMERIC, and FIXED data types, the operation can sometimes overflow ColumnStore's maximum precision or scale. The maximum precision and scale depends on the version of Enterprise ColumnStore:

  • In Enterprise ColumnStore 6, the maximum precision (M) is 38, and the maximum scale (D) is 38.

  • In Enterprise ColumnStore 5, the maximum precision (M) is 18, and the maximum scale (D) is 18.

In Enterprise ColumnStore 6, applications can configure Enterprise ColumnStore to check for decimal overflows by setting the columnstore_decimal_overflow_check system variable, but only when the column has a decimal precision that is 18 or more:

SET SESSION columnstore_decimal_overflow_check=ON;

SELECT (big_decimal1 * big_decimal2) AS product
FROM columnstore_tab;

When decimal overflow checks are enabled, math operations have extra overhead.

When the decimal overflow check fails, MariaDB Enterprise ColumnStore raises an error with the ER_INTERNAL_ERROR error code, and it writes detailed information about the overflow check failure to the ColumnStore system logs.

User-Defined Aggregate Function (UDAF) C++ API

MariaDB Enterprise ColumnStore supports Enterprise Server's standard User-Defined Function (UDF) API. However, UDFs developed using that API cannot be executed in a distributed manner.

To support distributed execution of custom aggregate functions, MariaDB Enterprise ColumnStore supports a Distributed User Defined Aggregate Functions (UDAF) C++ API:

  • The Distributed User Defined Aggregate Functions (UDAF) C++ API allows anyone to create aggregate functions of arbitrary complexity for distributed execution in the ColumnStore storage engine.

  • These functions can also be used as Analytic (Window) functions just like any built-in aggregate function.

This page is: Copyright © 2025 MariaDB. All rights reserved.

Query Plans and Optimizer Trace

MariaDB ColumnStore's query plans and Optimizer Trace show how analytical queries run in parallel across its distributed, columnar architecture, aiding performance tuning.

Execution Plan (CSEP)

Overview

The ColumnStore storage engine uses a ColumnStore Execution Plan (CSEP) to represent a query plan internally.

When the select handler receives the SELECT_LEX object, it transforms it into a CSEP as part of the query planning and optimization process. For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation."

Viewing the CSEP

The CSEP for a given query can be viewed by performing the following:

  1. Calling the calSetTrace(1) function:

SELECT calSetTrace(1);

  2. Executing the query:

SELECT column1, column2
FROM columnstore_tab
WHERE column1 > '2020-04-01'
AND column1 < '2020-11-01';

  3. Calling the calGetTrace() function:

SELECT calGetTrace();

This page is: Copyright © 2025 MariaDB. All rights reserved.

Job Steps

Overview

When Enterprise ColumnStore executes a query, the ExeMgr process on the initiator/aggregator node translates the ColumnStore execution plan (CSEP) into a job list. A job list is a sequence of job steps.

Enterprise ColumnStore uses many different types of job steps that provide different scalability benefits:

  • Some types of job steps perform operations in a distributed manner, using multiple nodes to operate to different extents. Distributed operations provide horizontal scalability.

  • Some types of job steps perform operations in a multi-threaded manner using a thread pool. Performing multi-threaded operations provides vertical scalability.

As you increase the number of ColumnStore nodes or the number of cores on each node, Enterprise ColumnStore can use those resources to more efficiently execute job steps.

For additional information, see "MariaDB Enterprise ColumnStore Query Evaluation".

Batch Primitive Step (BPS)

Enterprise ColumnStore defines a batch primitive step to handle many types of tasks, such as scanning/filtering columns, JOIN operations, aggregation, functional filtering, and projecting (putting values into a SELECT list).

In calGetTrace() output, a batch primitive step is abbreviated BPS.

Batch primitive steps are evaluated on multiple nodes in parallel. The PrimProc process on each node evaluates the batch primitive step to one extent at a time. The PrimProc process uses a thread pool to operate on individual blocks within the extent in parallel.

Cross Engine Step (CES)

Enterprise ColumnStore defines a cross-engine step to perform cross-engine joins, in which a ColumnStore table is joined with a table that uses a different storage engine.

In calGetTrace() output, a cross-engine step is abbreviated CES.

Cross-engine steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.

Enterprise ColumnStore can perform cross-engine joins when the mandatory utility user is properly configured.

For additional information, refer to the "Mandatory Utility User Account."

Dictionary Structure Step (DSS)

Enterprise ColumnStore defines a dictionary structure step to scan the dictionary extents that ColumnStore uses to store variable-length string values.

In calGetTrace() output, a dictionary structure step is abbreviated DSS.

Dictionary structure steps are evaluated on multiple nodes in parallel. The PrimProc process on each node evaluates the dictionary structure step to one extent at a time. It uses a thread pool to operate on individual blocks within the extent in parallel.

Dictionary structure steps can require a lot of I/O for a couple of reasons:

  • Dictionary structure steps do not support extent elimination, so all extents for the column must be scanned.

  • Dictionary structure steps must read the column extents to find each pointer and the dictionary extents to find each value, so it doubles the number of extents to scan.

It is generally recommended to avoid queries that will cause dictionary scans.

For additional information, see "Avoid Creating Long String Columns".

Hash Join Step (HJS)

Enterprise ColumnStore defines a hash join step to perform a hash join between two tables.

In calGetTrace() output, a hash join step is abbreviated HJS.

Hash join steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.

Enterprise ColumnStore performs the hash join in memory by default. If you perform large joins, you may be able to get better performance by changing some configuration defaults with mcsSetConfig:

  • Enterprise ColumnStore can be configured to use more memory for in-memory hash joins.

  • Enterprise ColumnStore can be configured to use disk-based joins.

For additional information, see "Configure in-memory joins" and "Configure Disk-Based Joins".

Having Step (HVS)

Enterprise ColumnStore defines a having step to evaluate a HAVING clause on a result set.

In calGetTrace() output, a having step is abbreviated HVS.

Subquery Step (SQS)

Enterprise ColumnStore defines a subquery step to evaluate a subquery.

In calGetTrace() output, a subquery step is abbreviated SQS.

Tuple Aggregation Step (TAS)

Enterprise ColumnStore defines a tuple aggregation step to collect intermediate aggregation prior to the final aggregation and evaluation of the results.

In calGetTrace() output, a tuple aggregation step is abbreviated TAS.

Tuple aggregation steps are primarily evaluated by the ExeMgr process on the initiator/aggregator node. However, the PrimProc process on each node also plays a role, since the PrimProc process on each node provides the intermediate aggregation results to the ExeMgr process on the initiator/aggregator node.

Tuple Annexation Step (TNS)

Enterprise ColumnStore defines a tuple annexation step to perform the final aggregation and evaluation of the results.

In calGetTrace() output, a tuple annexation step is abbreviated TNS.

Tuple annexation steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.

Enterprise ColumnStore 5 performs aggregation operations in memory. As a consequence, more complex aggregation operations require more memory in that version.

In Enterprise ColumnStore 6, disk-based aggregations can be enabled.

For additional information, see "Configure Disk-Based Aggregations".

Tuple Union Step (TUS)

Enterprise ColumnStore defines a tuple union step to perform a union of two subqueries.

In calGetTrace() output, a tuple union step is abbreviated TUS.

Tuple union steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.

Tuple Constant Step (TCS)

Enterprise ColumnStore defines a tuple constant step to evaluate constant values.

In calGetTrace() output, a tuple constant step is abbreviated TCS.

Tuple constant steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.

Window Function Step (WFS)

Enterprise ColumnStore defines a window function step to evaluate window functions.

In calGetTrace() output, a window function step is abbreviated WFS.

Window function steps are evaluated locally by the ExeMgr process on the initiator/aggregator node.

This page is: Copyright © 2025 MariaDB. All rights reserved.

Clients & Tools

MariaDB ColumnStore supports standard MariaDB tools, BI connectors (e.g., Tableau, Power BI), data ingestion (cpimport, Kafka), and REST APIs for admin.

StorageManager

The ColumnStore StorageManager manages columnar data storage and retrieval, optimizing analytical queries.

Certified S3 Object Storage Providers

Hardware (On Prem)

  • Quantum ActiveScale

  • IBM Cloud Object Storage (Formerly known as CleverSafe)

  • DELL EMC

Cloud (IaaS)

  • AWS S3

  • Google GCS

Software Based

Due to the frequent code changes and deviation from the AWS standards, none are approved at this time.

This page is licensed: CC BY-SA / Gnu FDL

Sample storagemanager.cnf

# Sample storagemanager.cnf

[ObjectStorage]
service = S3
object_size = 5M
metadata_path = /var/lib/columnstore/storagemanager/metadata
journal_path = /var/lib/columnstore/storagemanager/journal
max_concurrent_downloads = 21
max_concurrent_uploads = 21
common_prefix_depth = 3

[S3]
region = us-west-1
bucket = my_columnstore_bucket
endpoint = s3.amazonaws.com
aws_access_key_id = AKIAR6P77BUKULIDIL55
aws_secret_access_key = F38aR4eLrgNSWPAKFDJLDAcax0gZ3kYblU79

[LocalStorage]
path = /var/lib/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000

[Cache]
cache_size = 2g
path = /var/lib/columnstore/storagemanager/cache

Note: A region is required even when using an on-prem solution like ActiveScale due to header expectations within the API.

This page is licensed: CC BY-SA / Gnu FDL

Using StorageManager With IAM Role

AWS IAM Role Configuration

We have added a new feature in ColumnStore 5.5.2 that allows you to use AWS IAM roles to connect to S3 buckets without explicitly entering credentials into the storagemanager.cnf config file.

You will need to modify the IAM role of your Amazon EC2 instance to allow for this. Please follow the AWS documentation before beginning this process.

It is important to note that you must update the AWS S3 endpoint based on your chosen region; otherwise, you might face delays in propagation as discussed here and here.

For a complete list of AWS service endpoints, please visit the AWS reference guide.

Sample configuration

Edit your Storage Manager configuration file located at /etc/columnstore/storagemanager.cnf so that it looks similar to the example below (replacing the values in the [S3] section with your own):

[ObjectStorage]
service = S3
object_size = 5M
metadata_path = /var/lib/columnstore/storagemanager/metadata
journal_path = /var/lib/columnstore/storagemanager/journal
max_concurrent_downloads = 21
max_concurrent_uploads = 21
common_prefix_depth = 3

[S3]
ec2_iam_mode=enabled
bucket = my_mcs_bucket
region = us-west-2
endpoint = s3.us-west-2.amazonaws.com

[LocalStorage]
path = /var/lib/columnstore/storagemanager/fake-cloud
fake_latency = n
max_latency = 50000

[Cache]
cache_size = 2g
path = /var/lib/columnstore/storagemanager/cache

Note: This is an AWS only feature. For other deployment methods, see the example here.

This page is licensed: CC BY-SA / Gnu FDL

Data Ingestion Methods & Tools

Learn about data ingestion for MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.

ColumnStore provides several mechanisms to ingest data:

  • cpimport provides the fastest performance for inserting data and the ability to route data to particular PM nodes. Normally this should be the default choice for loading data.

  • LOAD DATA INFILE provides another means of bulk inserting data (see the example after this list).

    • By default with autocommit on it will internally stream the data to an instance of the cpimport process. This requires some memory overhead on the UM server so may be less reliable than cpimport for very large imports.

    • In transactional mode DML inserts are performed which will be significantly slower plus it will consume both binlog transaction files and ColumnStore VersionBuffer files.

  • DML, i.e. INSERT, UPDATE, and DELETE, provide row level changes. ColumnStore is optimized towards bulk modifications and so these operations are slower than they would be in say InnoDB.

    • Currently ColumnStore does not support operating as a replication slave target.

    • Bulk DML operations will in general perform better than multiple individual statements.

      • INSERT INTO SELECT with autocommit behaves similarly to LOAD DATA INFILE in that internally it is mapped to cpimport for higher performance.

      • Bulk update operations based on a join with a small staging table can be relatively fast especially if updating a single column.

  • Using ColumnStore Bulk Write SDK or ColumnStore Streaming Data Adapters
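
For reference, a minimal LOAD DATA INFILE load of a CSV file into a ColumnStore table could look like the following; the file, database, and table names are placeholders:

mariadb --local-infile=1 -e "LOAD DATA LOCAL INFILE '/data/mytable.csv' INTO TABLE mytest.mytable FIELDS TERMINATED BY ',';"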

ColumnStore Bulk Data Loading

Overview

cpimport is a high-speed bulk load utility that imports data into ColumnStore tables in a fast and efficient manner. It accepts as input any flat file containing data that contains a delimiter between fields of data (i.e. columns in a table). The default delimiter is the pipe (‘|’) character, but other delimiters such as commas may be used as well. The data values must be in the same order as the create table statement, i.e. column 1 matches the first column in the table and so on. Date values must be specified in the format 'yyyy-mm-dd'.

cpimport performs the following operations when importing data into a MariaDB ColumnStore database:

  • Data is read from specified flat files.

  • Data is transformed to fit ColumnStore’s column-oriented storage design.

  • Redundant data is tokenized and logically compressed.

  • Data is written to disk.

It is important to note that:

  • The bulk loads are an append operation to a table so they allow existing data to be read and remain unaffected during the process.

  • The bulk loads do not write their data operations to the transaction log; they are not transactional in nature but are considered an atomic operation at this time. Information markers, however, are placed in the transaction log so the DBA is aware that a bulk operation did occur.

  • Upon completion of the load operation, a high water mark in each column file is moved in an atomic operation that allows for any subsequent queries to read the newly loaded data. This append operation provides for consistent read but does not incur the overhead of logging the data.

There are two primary steps to using the cpimport utility:

  1. Optionally create a job file that is used to load data from a flat file into multiple tables.

  2. Run the cpimport utility to perform the data import.

Syntax

The simplest form of the cpimport command is:

cpimport dbName tblName [loadFile]

The full syntax is like this:

cpimport dbName tblName [loadFile]
[-h] [-m mode] [-f filepath] [-d DebugLevel]
[-c readBufferSize] [-b numBuffers] [-r numReaders]
[-e maxErrors] [-B libBufferSize] [-s colDelimiter] [-E EnclosedByChar]
[-C escChar] [-j jobID] [-p jobFilePath] [-w numParsers]
[-n nullOption] [-P pmList] [-i] [-S] [-q batchQty]

positional parameters:
	dbName     Name of the database to load
	tblName    Name of table to load
	loadFile   Optional input file name in current directory,
			unless a fully qualified name is given.
			If not given, input read from STDIN.
Options:
	-b	Number of read buffers
	-c	Application read buffer size(in bytes)
	-d	Print different level(1-3) debug message
	-e	Max number of allowable error per table per PM
	-f	Data file directory path.
			Default is current working directory.
			In Mode 1, -f represents the local input file path.
			In Mode 2, -f represents the PM based input file path.
			In Mode 3, -f represents the local input file path.
	-l	Name of import file to be loaded, relative to -f path. (Cannot be used with -p)
	-h	Print this message.
	-q	Batch Quantity, Number of rows distributed per batch in Mode 1
	-i	Print extended info to console in Mode 3.
	-j	Job ID. In simple usage, default is the table OID.
			unless a fully qualified input file name is given.
	-n	NullOption (0-treat the string NULL as data (default);
			1-treat the string NULL as a NULL value)
	-p	Path for XML job description file.
	-r	Number of readers.
	-s	The delimiter between column values.
	-B	I/O library read buffer size (in bytes)
	-w	Number of parsers.
	-E	Enclosed by character if field values are enclosed.
	-C	Escape character used in conjunction with 'enclosed by'
			character, or as part of NULL escape sequence ('\N');
			default is '\'
	-I	Import binary data; how to treat NULL values:
			1 - import NULL values
			2 - saturate NULL values
	-P	List of PMs ex: -P 1,2,3. Default is all PMs.
	-S	Treat string truncations as errors.
	-m	mode
			1 - rows will be loaded in a distributed manner across PMs.
			2 - PM based input files loaded onto their respective PM.
			3 - input files will be loaded on the local PM.

cpimport modes

Mode 1: Bulk Load from a central location with single data source file

In this mode, you run the cpimport from your primary node (mcs1). The source file is located at this primary location and the data from cpimport is distributed across all the nodes. If no mode is specified, then this is the default.

Figure: cpimport mode 1

Example:

cpimport -m1 mytest mytable mytable.tbl

Mode 2: Bulk load from central location with distributed data source files

In this mode, you run cpimport from your primary node (mcs1). The source data is already partitioned into data files residing on the PMs. Each PM should have a source data file of the same name, containing the partitioned data for that PM.

Figure: cpimport mode 2

Example:

cpimport -m2 mytest mytable -l /home/mydata/mytable.tbl

Mode 3: Parallel distributed bulk load

In this mode, you run cpimport from the individual nodes independently, which will import the source file that exists on that node. Concurrent imports can be executed on every node for the same table.

Figure: cpimport mode 3

Example:

cpimport -m3 mytest mytable /home/mydata/mytable.tbl


Bulk loading data from STDIN

Data can be loaded from STDIN into ColumnStore by simply not including the loadFile parameter.

Example:

cpimport db1 table1

Bulk loading from AWS S3

Similarly the AWS cli utility can be utilized to read data from an s3 bucket and pipe the output into cpimport allowing direct loading from S3. This assumes the aws cli program has been installed and configured on the host:

Example:

aws s3 cp --quiet s3://dthompson-test/trades_bulk.csv - | cpimport test trades -s ","

For troubleshooting connectivity problems remove the --quiet option which suppresses client logging including permission errors.

Bulk loading output of SELECT FROM Table(s)

Standard input can also be used to directly pipe the output from an arbitrary SELECT statement into cpimport. The SELECT statement may select from non-ColumnStore tables such as MyISAM or InnoDB. In the example below, the source table is selected with the -N flag to remove non-data formatting, and the -q flag tells the mariadb client not to cache results, which avoids possible timeouts causing the load to fail.

Example:

mariadb -q -e 'select * from source_table;' -N <source-db> | cpimport -s '\t' <target-db> <target-table>

Bulk loading from JSON

Let's create a sample ColumnStore table:

CREATE DATABASE `json_columnstore`;

USE `json_columnstore`;

CREATE TABLE `products` (
  `product_name` varchar(11) NOT NULL DEFAULT '',
  `supplier` varchar(128) NOT NULL DEFAULT '',
  `quantity` varchar(128) NOT NULL DEFAULT '',
  `unit_cost` varchar(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;

Now let's create a sample products.json file like this:

[{
  "_id": {
    "$oid": "5968dd23fc13ae04d9000001"
  },
  "product_name": "Sildenafil Citrate",
  "supplier": "Wisozk Inc",
  "quantity": 261,
  "unit_cost": "$10.47"
}, {
  "_id": {
    "$oid": "5968dd23fc13ae04d9000002"
  },
  "product_name": "Mountain Juniperus Ashei",
  "supplier": "Keebler-Hilpert",
  "quantity": 292,
  "unit_cost": "$8.74"
}, {
  "_id": {
    "$oid": "5968dd23fc13ae04d9000003"
  },
  "product_name": "Dextromethorphan HBR",
  "supplier": "Schmitt-Weissnat",
  "quantity": 211,
  "unit_cost": "$20.53"
}]

We can then bulk load data from JSON into Columnstore by first piping the data to jq and then to cpimport using a one line command.

Example:

cat products.json | jq -r '.[] | [.product_name,.supplier,.quantity,.unit_cost] | @csv' | cpimport json_columnstore products -s ',' -E '"'

In this example, the JSON data comes from a static JSON file, but the same method works for output streamed from any data source that produces JSON, such as an API or a NoSQL database. For more information on 'jq', please view the manual here.

Bulk loading into multiple tables

There are two ways multiple tables can be loaded:

  1. Run multiple cpimport jobs simultaneously. Tables per import should be unique or PMs for each import should be unique if using mode 3.

  2. Use the colxml utility: colxml creates an XML job file for your database schema before you can import data. Multiple tables may be imported by either importing all tables within a schema or by listing specific tables using the -t option in colxml. Then run cpimport with the job file generated by colxml. Here is an example of how to use colxml and cpimport to import data into all the tables in a database schema:

colxml mytest -j299
cpimport -m1 -j299

colxml syntax

Usage: colxml [options] dbName

Options: 
   -d Delimiter (default '|')
   -e Maximum allowable errors (per table)
   -h Print this message
   -j Job id (numeric)
   -l Load file name
   -n "name in quotes"
   -p Path for XML job description file that is generated
   -s "Description in quotes"
   -t Table name
   -u User
   -r Number of read buffers
   -c Application read buffer size (in bytes)
   -w I/O library buffer size (in bytes), used to read files
   -x Extension of file name (default ".tbl")
   -E EnclosedByChar (if data has enclosed values)
   -C EscapeChar
   -b Debug level (1-3)

Example usage of colxml

The following tables comprise a database named 'tpch2':

MariaDB [tpch2]> show tables;
+-----------------+
| Tables_in_tpch2 |
+-----------------+
| customer        |
| lineitem        |
| nation          |
| orders          |
| part            |
| partsupp        |
| region          |
| supplier        |
+-----------------+
8 rows in set (0.00 sec)

  1. First, put a delimited input data file for each table in /usr/local/mariadb/columnstore/data/bulk/data/import. Each file should be named after its table, with a .tbl extension (for example, customer.tbl).

  2. Run colxml for the load job for the ‘tpch2’ database as shown here:

/usr/local/mariadb/columnstore/bin/colxml tpch2 -j500
Running colxml with the following parameters:
2015-10-07 15:14:20 (9481) INFO :
Schema: tpch2
Tables:
Load Files:
-b 0
-c 1048576
-d |
-e 10
-j 500
-n
-p /usr/local/mariadb/columnstore/data/bulk/job/
-r 5
-s
-u
-w 10485760
-x tbl
File completed for tables:
tpch2.customer
tpch2.lineitem
tpch2.nation
tpch2.orders
tpch2.part
tpch2.partsupp
tpch2.region
tpch2.supplier
Normal exit.

Now run cpimport to use the job file generated by the colxml execution:

/usr/local/mariadb/columnstore/bin/cpimport -j 500
Bulkload root directory : /usr/local/mariadb/columnstore/data/bulk
job description file : Job_500.xml
2015-10-07 15:14:59 (9952) INFO : successfully load job file /usr/local/mariadb/columnstore/data/bulk/job/Job_500.xml
2015-10-07 15:14:59 (9952) INFO : PreProcessing check starts
2015-10-07 15:15:04 (9952) INFO : PreProcessing check completed
2015-10-07 15:15:04 (9952) INFO : preProcess completed, total run time : 5 seconds
2015-10-07 15:15:04 (9952) INFO : No of Read Threads Spawned = 1
2015-10-07 15:15:04 (9952) INFO : No of Parse Threads Spawned = 3
2015-10-07 15:15:06 (9952) INFO : For table tpch2.customer: 150000 rows processed and 150000 rows inserted.
2015-10-07 15:16:12 (9952) INFO : For table tpch2.nation: 25 rows processed and 25 rows inserted.
2015-10-07 15:16:12 (9952) INFO : For table tpch2.lineitem: 6001215 rows processed and 6001215 rows inserted.
2015-10-07 15:16:31 (9952) INFO : For table tpch2.orders: 1500000 rows processed and 1500000 rows inserted.
2015-10-07 15:16:33 (9952) INFO : For table tpch2.part: 200000 rows processed and 200000 rows inserted.
2015-10-07 15:16:44 (9952) INFO : For table tpch2.partsupp: 800000 rows processed and 800000 rows inserted.
2015-10-07 15:16:44 (9952) INFO : For table tpch2.region: 5 rows processed and 5 rows inserted.
2015-10-07 15:16:45 (9952) INFO : For table tpch2.supplier: 10000 rows processed and 10000 rows inserted.

Handling Differences in Column Order and Values

If there are some differences between the input file and table definition then the colxml utility can be utilized to handle these cases:

  • Different order of columns in the input file from table order

  • Input file column values to be skipped / ignored.

  • Target table columns to be defaulted.

In this case, run the colxml utility (the -t argument can be useful for producing a job file for a single table if preferred) to produce the job XML file, use this file as a template for editing, and then use the edited job file when running cpimport.

Consider the following simple table example:

create table emp (
  emp_id int,
  dept_id int,
  name varchar(30),
  salary int,
  hire_date date) engine=columnstore;

This would produce a colxml file with the following table element:

<Table tblName="test.emp" 
      loadName="emp.tbl" maxErrRow="10">
   <Column colName="emp_id"/>
   <Column colName="dept_id"/>
   <Column colName="name"/>
   <Column colName="salary"/>
   <Column colName="hire_date"/>
 </Table>

If your input file had the data such that hire_date comes before salary then the following modification will allow correct loading of that data to the original table definition (note the last 2 Column elements are swapped):

<Table tblName="test.emp" 
      loadName="emp.tbl" maxErrRow="10">
   <Column colName="emp_id"/>
   <Column colName="dept_id"/>
   <Column colName="name"/>
   <Column colName="hire_date"/>
   <Column colName="salary"/>
 </Table>

The following example would ignore the last entry in the file and default salary to its default value (in this case NULL):

<Table tblName="test.emp" 
      loadName="emp.tbl" maxErrRow="10">
   <Column colName="emp_id"/>
   <Column colName="dept_id"/>
   <Column colName="name"/>
   <Column colName="hire_date"/>
   <IgnoreField/>
   <DefaultColumn colName="salary"/>
 </Table>
  • IgnoreField instructs cpimport to ignore and skip the value at that position in the file.

  • DefaultColumn instructs cpimport to default the current table column and not move the column pointer forward to the next delimiter.

Both instructions can be used independently and as many times as makes sense for your data and table definition.

Binary Source Import

It is possible to import from a binary file instead of a CSV file, using fixed-length rows of binary data. This is done with the '-I' flag, which has two modes:

  • -I1 - binary mode with NULLs accepted. Numeric fields containing NULL will be treated as NULL unless the column has a default value.

  • -I2 - binary mode with NULLs saturated. NULLs in numeric fields will be saturated.

Example:

cpimport -I1 mytest mytable /home/mydata/mytable.bin

The following table shows how to represent the data in the binary format:

| Datatype | Description |
|---|---|
| INT/TINYINT/SMALLINT/BIGINT | Little-endian format for the numeric data |
| FLOAT/DOUBLE | IEEE format native to the computer |
| CHAR/VARCHAR | Data padded with '\0' for the length of the field. An entry that is all '\0' is treated as NULL |
| DATE | Using the Date struct below |
| DATETIME | Using the DateTime struct below |
| DECIMAL | Stored using an integer representation of the DECIMAL without the decimal point. With precision/width of 2 or less, 2 bytes should be used; 3-4 should use 3 bytes; 4-9 should use 4 bytes; and 10+ should use 8 bytes |

For NULL values the following table should be used:

| Datatype | Signed NULL | Unsigned NULL |
|---|---|---|
| BIGINT | 0x8000000000000000ULL | 0xFFFFFFFFFFFFFFFEULL |
| INT | 0x80000000 | 0xFFFFFFFE |
| SMALLINT | 0x8000 | 0xFFFE |
| TINYINT | 0x80 | 0xFE |
| DECIMAL | As equiv. INT | As equiv. INT |
| FLOAT | 0xFFAAAAAA | N/A |
| DOUBLE | 0xFFFAAAAAAAAAAAAAULL | N/A |
| DATE | 0xFFFFFFFE | N/A |
| DATETIME | 0xFFFFFFFFFFFFFFFEULL | N/A |
| CHAR/VARCHAR | Fill with '\0' | N/A |

Date Struct

struct Date
{
  unsigned spare : 6;
  unsigned day : 6;
  unsigned month : 4;
  unsigned year : 16;
};

The spare bits in the Date struct must be set to 0x3E.

DateTime Struct

struct DateTime
{
  unsigned msecond : 20;
  unsigned second : 6;
  unsigned minute : 6;
  unsigned hour : 6;
  unsigned day : 6;
  unsigned month : 4;
  unsigned year : 16;
};

Working Folders & Logging

As of version 1.4, cpimport uses the /var/lib/columnstore/bulk folder for all work being done. This folder contains:

  1. Logs

  2. Rollback info

  3. Job info

  4. A staging folder

The log folder typically contains:

-rw-r--r--. 1 root  root        0 Dec 29 06:41 cpimport_1229064143_21779.err
-rw-r--r--. 1 root  root     1146 Dec 29 06:42 cpimport_1229064143_21779.log

A typical log might look like this:

2020-12-29 06:41:44 (21779) INFO : Running distributed import (mode 1) on all PMs...
2020-12-29 06:41:44 (21779) INFO2 : /usr/bin/cpimport.bin -s , -E " -R /tmp/columnstore_tmp_files/BrmRpt112906414421779.rpt -m 1 -P pm1-21779 -T SYSTEM -u388952c1-4ab8-46d6-9857-c44827b1c3b9 bts flights
2020-12-29 06:41:58 (21779) INFO2 : Received a BRM-Report from 1
2020-12-29 06:41:58 (21779) INFO2 : Received a Cpimport Pass from PM1
2020-12-29 06:42:03 (21779) INFO2 : Received a BRM-Report from 2
2020-12-29 06:42:03 (21779) INFO2 : Received a Cpimport Pass from PM2
2020-12-29 06:42:03 (21779) INFO2 : Received a BRM-Report from 3
2020-12-29 06:42:03 (21779) INFO2 : BRM updated successfully
2020-12-29 06:42:03 (21779) INFO2 : Received a Cpimport Pass from PM3
2020-12-29 06:42:04 (21779) INFO2 : Released Table Lock
2020-12-29 06:42:04 (21779) INFO2 : Cleanup succeed on all PMs
2020-12-29 06:42:04 (21779) INFO : For table bts.flights: 374573 rows processed and 374573 rows inserted.
2020-12-29 06:42:04 (21779) INFO : Bulk load completed, total run time : 20.3052 seconds
2020-12-29 06:42:04 (21779) INFO2 : Shutdown of all child threads Finished!!

Prior to version 1.4, this folder was located at /usr/local/mariadb/columnstore/bulk.

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Streaming Data Adapters

The ColumnStore Bulk Data API enables the creation of higher performance adapters for ETL integration and data ingestions. The Streaming Data Adapters are out of box adapters using these API for specific data sources and use cases.

  • MaxScale CDC Data Adapter is integration of the MaxScale CDC streams into MariaDB ColumnStore.

  • Kafka Data Adapter is integration of the Kafka streams into MariaDB ColumnStore.

MaxScale CDC Data Adapter

The MaxScale CDC Data Adapter has been deprecated.

The MaxScale CDC Data Adapter allows you to stream change data events (binary log events) from a MariaDB master hosting non-ColumnStore engines (InnoDB, MyRocks, MyISAM) to MariaDB ColumnStore. In other words, it replicates data from a MariaDB master to MariaDB ColumnStore. It acts as a CDC client for MaxScale and uses the events received from MaxScale as input to the MariaDB ColumnStore Bulk Data API to push the data to MariaDB ColumnStore. It registers with MariaDB MaxScale as a CDC client using the MaxScale CDC Connector API, receiving change data records from MariaDB MaxScale (converted from binlog events received from the master on MariaDB TX) in JSON format. Then, using the MariaDB ColumnStore bulk write SDK, it converts the JSON data into API calls and streams it to a MariaDB PM node. The adapter has options to insert all the events in the same schema as the source database table, or to insert each event with metadata as well as table data. The event metadata includes the event timestamp, the GTID, the event sequence, and the event type (insert, update, delete).

Installation

Pre-requisite:

  • Download and install MaxScale CDC Connector API from connector

  • Download and install MariaDB ColumnStore bulk write SDK from columnstore-bulk-write-sdk.md

CentOS 7

sudo yum -y install epel-release
sudo yum -y install <data adapter>.rpm

Debian 9/Ubuntu Xenial:

sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Debian 8:

sudo echo "deb http://httpredir.debian.org/debian jessie-backports main contrib non-free" >> /etc/apt/sources.list
sudo apt-get update
sudo dpkg -i <data adapter>.deb
sudo apt-get -f install

Usage

Usage: mxs_adapter [OPTION]... DATABASE TABLE

  -f FILE      TSV file with database and table names to stream (must be in `database TAB table NEWLINE` format)
  -h HOST      MaxScale host (default: 127.0.0.1)
  -P PORT      Port number where the CDC service listens (default: 4001)
  -u USER      Username for the MaxScale CDC service (default: admin)
  -p PASSWORD  Password of the user (default: mariadb)
  -c CONFIG    Path to the Columnstore.xml file (default: '/usr/local/mariadb/columnstore/etc/Columnstore.xml')
  -a           Automatically create tables on ColumnStore
  -z           Transform CDC data stream from historical data to current data (implies -n)
  -s           Directory used to store the state files (default: '/var/lib/mxs_adapter')
  -r ROWS      Number of events to group for one bulk load (default: 1)
  -t TIME      Connection timeout (default: 10)
  -n           Disable metadata generation (timestamp, GTID, event type)
  -i TIME      Flush data every TIME seconds (default: 5)
  -l FILE      Log output to FILE instead of stdout
  -v           Print version and exit
  -d           Enable verbose debug output

Streaming Multiple Tables

To stream multiple tables, use the -f parameter to define a path to a TSV formatted file. The file must have one database and one table name per line. The database and table must be separated by a TAB character and the line must be terminated in a newline \n.

Here is an example file with two tables, t1 and t2, both in the test database.

test	t1
test	t2

Automated Table Creation on ColumnStore

You can have the adapter automatically create the tables on the ColumnStore instance with the -a option. In this case, the user used for cross-engine queries will be used to create the table (the values in Columnstore.CrossEngineSupport). This user will require CREATE privileges on all streamed databases and tables.

Data Transformation Mode

The -z option enables the data transformation mode. In this mode, the data is converted from historical, append-only data to the current version of the data. In practice, this replicates changes from a MariaDB master server to ColumnStore via the MaxScale CDC.

Note: This mode is not as fast as the append-only mode and might not be suitable for heavy workloads. This is due to the fact that the data transformation is done via various DML statements.

Quick Start

Download and install both MaxScale and ColumnStore.

Copy the Columnstore.xml file (/usr/local/mariadb/columnstore/etc/Columnstore.xml) from one of the ColumnStore UM or PM nodes to the server where the adapter is installed.

Configure MaxScale according to the CDC tutorial.

Create a CDC user by executing the following MaxAdmin command on the MaxScale server. Replace the <service> with the name of the avrorouter service and <user> and <password> with the credentials that are to be created.

maxadmin call command cdc add_user <service> <user> <password>

Then we can start the adapter by executing the following command.

mxs_adapter -u <user> -p <password> -h <host> -P <port> -c <path to Columnstore.xml> <database> <table>

The <database> and <table> define the table that is streamed to ColumnStore. This table should exist on the master server where MaxScale is reading events from. If the table is not created on ColumnStore, the adapter will print instructions on how to define it in the correct way.

The <user> and <password> are the users created for the CDC user, <host> is the MaxScale address and <port> is the port where the CDC service listener is listening.

The -c flag is optional if you are running the adapter on the server where ColumnStore is located.

Kafka to ColumnStore Adapter

The Kafka data adapter streams all messages published to Apache Kafka topics in Avro format to MariaDB AX automatically and continuously, enabling data from many sources to be streamed and collected for analysis without complex code. The Kafka adapter is built using librdkafka and the MariaDB ColumnStore bulk write SDK.

A tutorial for the Kafka adapter for ingesting Avro formatted data can be found in the kafka-to-columnstore-data-adapter document.

ColumnStore - Pentaho Data Integration - Data Adapter

Starting with MariaDB ColumnStore 1.1.4, a data adapter for Pentaho Data Integration (PDI) / Kettle is available to import data directly into ColumnStore’s WriteEngine. It is built on MariaDB’s rapid-paced Bulk Write SDK.

PDI Plugin Block info graphic

Compatibility notice

The plugin was designed for the following software composition:

  • Operating system: Windows 10 / Ubuntu 16.04 / RHEL/CentOS 7+

  • MariaDB ColumnStore >= 1.1.4

  • MariaDB Java Database client* >= 2.2.1

  • Java >= 8

  • Pentaho Data Integration >= 7

+not officially supported by Pentaho.

*Only needed if you want to execute DDL.

Installation

The following steps are necessary to install the ColumnStore Data adapter (bulk loader plugin):

  1. Build the plugin from source or download it from our website

  2. Extract the archive mariadb-columnstore-kettle-bulk-exporter-plugin-*.zip into your PDI installation directory $PDI-INSTALLATION/plugins.

  3. Copy MariaDB's JDBC Client mariadb-java-client-2.2.x.jar into PDI's lib directory $PDI-INSTALLATION/lib.

  4. Install the additional library dependencies

Ubuntu dependencies

sudo apt-get install libuv1 libxml2 libsnappy1v5

CentOS dependencies

sudo yum install epel-release
sudo yum install libuv libxml2 snappy

Windows 10 dependencies

On Windows the installation of the Visual Studio 2015/2017 C++ Redistributable (x64) is required.

Configuration

Each MariaDB ColumnStore Bulk Loader block needs to be configured. On the one hand, it needs to know how to connect to the underlying Bulk Write SDK to inject data into ColumnStore, and on the other hand, it needs to have a proper JDBC connection to execute DDL.

Both configurations can be set in each block’s settings tab.

PDI Plugin Block settings info graphic

The database connection configuration follows PDI’s default schema.

By default the plugin tries to use ColumnStore's default configuration /usr/local/mariadb/columnstore/etc/Columnstore.xml to connect to the ColumnStore instance through the Bulk Write SDK. In addition, individual paths or variables can be used too.

Information on how to prepare the Columnstore.xml configuration file can be found here.

Usage

PDI Plugin Block mapping info graphic

Once a block is configured and all inputs are connected in PDI, the inputs have to be mapped to ColumnStore’s table format.

One can either choose “Map all inputs”, which sets target columns of adequate type, or choose a custom mapping based on the structure of the existing table.

The SQL button can be used to generate DDL based on the defined mapping and to execute it.

Limitations

This plugin is a beta release.

In addition, it can't handle blob data types and only supports multiple inputs to one block if the input field names are equal for all input sources.

This page is licensed: CC BY-SA / Gnu FDL

Data Import

Learn how to import data into MariaDB ColumnStore. This section covers various methods and tools for efficiently loading large datasets into your columnar database for analytical workloads.

Overview

MariaDB Enterprise ColumnStore supports very efficient bulk data loads.

MariaDB Enterprise ColumnStore performs bulk data loads very efficiently using a variety of mechanisms, including the cpimport tool, specialized handling of certain SQL statements, and minimal locking during data import.

cpimport

MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which provides several benefits:

  • Bypasses the SQL layer to decrease overhead

  • Does not block read queries

  • Requires a write metadata lock on the table, which can be monitored with the METADATA_LOCK_INFO plugin

  • Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries.

  • Inserts each row in the order the rows are read from the source file. Users can optimize data loads for Enterprise ColumnStore's automatic partitioning by loading presorted data files. For additional information, see "Load Ordered Data in Proper Order".

  • Supports parallel distributed bulk loads

  • Imports data from text files

  • Imports data from binary files

  • Imports data from standard input (stdin)

Batch Insert Mode

MariaDB Enterprise ColumnStore enables batch insert mode by default.

When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for the following statements:

  • LOAD DATA [ LOCAL ] INFILE

  • INSERT INTO .. SELECT FROM ..

Enterprise ColumnStore uses the following rules:

  • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. It executes cpimport using a wrapper called cpimport.bin.

  • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.
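For example, here is a minimal sketch of checking and disabling batch insert mode for the current session using the system variable named above (whether it can be changed at session scope may depend on your ColumnStore version):

-- Check the current batch insert mode setting
SHOW SESSION VARIABLES LIKE 'columnstore_use_import_for_batchinsert';

-- Disable batch insert mode for this session;
-- subsequent LOAD DATA / INSERT .. SELECT statements use the slower DML interface
SET SESSION columnstore_use_import_for_batchinsert = OFF;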

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin, as shown in the example below.
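For example, a minimal sketch of observing the lock from another session, assuming the METADATA_LOCK_INFO plugin has been installed:

-- Install the plugin once (requires appropriate privileges)
INSTALL SONAME 'metadata_lock_info';

-- While a bulk load is running, list current metadata locks from another session
SELECT THREAD_ID, LOCK_MODE, LOCK_TYPE, TABLE_SCHEMA, TABLE_NAME
FROM information_schema.METADATA_LOCK_INFO;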

Choose a Data Load Method

| Performance | Method | Interface | Format(s) | Location(s) | Benefits |
|---|---|---|---|---|---|
| Fastest | cpimport | Shell | Text file, binary file, standard input (stdin) | Server file system | Lowest latency. Bypasses SQL layer. Non-blocking |
| Fast | columnstore_info.load_from_s3 | SQL | Text file | S3-compatible object storage | Loads data from the cloud. Translates operation to cpimport command. Non-blocking |
| Fast | LOAD DATA [ LOCAL ] INFILE | SQL | Text file | Server file system, client file system | Translates operation to cpimport command. Non-blocking |
| Slow | INSERT .. SELECT | SQL | Other table(s) | Same MariaDB server | Translates operation to cpimport command. Non-blocking |

Data Loading with cpimport

Overview

MariaDB Enterprise ColumnStore includes a bulk data loading tool called cpimport, which bypasses the SQL layer to decrease the overhead of bulk data loading.

Refer to the cpimport modes and ColumnStore Bulk Data Loading for additional information.

The cpimport tool:

  • Bypasses the SQL layer to decrease overhead

  • Does not block read queries

  • Requires a write metadata lock on the table, which can be monitored with the METADATA_LOCK_INFO plugin

  • Appends the new data to the table. While the bulk load is in progress, the newly appended data is temporarily hidden from queries. After the bulk load is complete, the newly appended data is visible to queries.

  • Inserts each row in the order the rows are read from the source file. Users can optimize data loads for Enterprise ColumnStore's automatic partitioning by loading presorted data files.

  • Supports parallel distributed bulk loads

  • Imports data from text files

  • Imports data from binary files

  • Imports data from standard input (stdin)

Intended Use Cases

You can load data using the cpimport tool in the following cases:

  • You are loading data into a ColumnStore table from a text file stored on the primary node's file system.

  • You are loading data into a ColumnStore table from a binary file stored on the primary node's file system.

  • You are loading data into a ColumnStore table from the output of a command running on the primary node.

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

Import the Schema

Before data can be imported into the tables, the schema must be created.

  1. Connect to the primary server using MariaDB Client:

$ mariadb --host 192.168.0.100 --port 5001 \
      --user db_user --password \
      --default-character-set=utf8

After the command is executed, it will prompt you for a password.

  2. For each database that you are importing, create the database with the CREATE DATABASE statement:

CREATE DATABASE inventory;

  3. For each table that you are importing, create the table with the CREATE TABLE statement:

CREATE TABLE inventory.products (
   product_name varchar(11) NOT NULL DEFAULT '',
   supplier varchar(128) NOT NULL DEFAULT '',
   quantity varchar(128) NOT NULL DEFAULT '',
   unit_cost varchar(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;

Note: To get the best performance from Enterprise ColumnStore, make sure to follow Enterprise ColumnStore's best practices for schema design.

Appends Data

When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.

While the bulk load is in progress, the newly appended data is temporarily hidden from queries.

After the bulk load is complete, the newly appended data is visible to queries.

Sort the Input File

When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.

The order of data can have a significant effect on performance with Enterprise ColumnStore, so it can be helpful to sort the data in the input file prior to importing it.

For additional information, see "Load Ordered Data in Proper Order".

Confirm the Field Delimiter

Before importing a file into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.

The default field delimiter for the cpimport tool is a pipe (|).

To use a different delimiter, you can set the field delimiter.

Import from Text Files

The cpimport tool can import data from a text file if a file is provided as an argument after the database and table name.

For example, to import the file inventory-products.txt into the products table in the inventory database:

$ sudo cpimport \
   inventory products \
   inventory-products.txt

Import from Binary Files

The cpimport tool can import data from a binary file if the -I1 or -I2 option is provided and a file is provided as an argument after the database and table name.

For example, to import the file inventory-products.bin into the products table in the inventory database:

$ sudo cpimport -I1 \
   inventory products \
   inventory-products.bin

The -I1 and -I2 options allow two different binary import modes to be selected:

| Option | Description |
|---|---|
| -I1 | Numeric fields containing NULL will be treated as NULL unless the column has a default value |
| -I2 | Numeric fields containing NULL will be saturated |

The binary file should use the following format for data:

| Data Type(s) | Format |
|---|---|
| BIGINT | Little-endian integer format. Signed NULL: 0x8000000000000000ULL. Unsigned NULL: 0xFFFFFFFFFFFFFFFEULL |
| CHAR | String padded with '\0' to match the length of the field. NULL: '\0' for the full length of the field |
| DATE | Use the format represented by the struct Date. NULL: 0xFFFFFFFE |
| DATETIME | Use the format represented by the struct DateTime. NULL: 0xFFFFFFFFFFFFFFFEULL |
| DECIMAL | Use an integer representation of the value without the decimal point. Sizing depends on precision: 1-2: use 2 bytes; 3-4: use 3 bytes; 4-9: use 4 bytes; 10+: use 8 bytes. Signed and unsigned NULL: see equivalent-sized integer |
| DOUBLE | Native IEEE floating point format. NULL: 0xFFFAAAAAAAAAAAAAULL |
| FLOAT | Native IEEE floating point format. NULL: 0xFFAAAAAA |
| INT | Little-endian integer format. Signed NULL: 0x80000000. Unsigned NULL: 0xFFFFFFFE |
| SMALLINT | Little-endian integer format. Signed NULL: 0x8000. Unsigned NULL: 0xFFFE |
| TINYINT | Little-endian integer format. Signed NULL: 0x80. Unsigned NULL: 0xFE |
| VARCHAR | String padded with '\0' to match the length of the field. NULL: '\0' for the full length of the field |

Binary DATE Format

In binary input files, the cpimport tool expects DATE columns to be in the following format:

struct Date
{
  unsigned spare : 6;
  unsigned day : 6;
  unsigned month : 4;
  unsigned year : 16;
};

Binary DATETIME Format

In binary input files, the cpimport tool expects DATETIME columns to be in the following format:

struct DateTime
{
  unsigned msecond : 20;
  unsigned second : 6;
  unsigned minute : 6;
  unsigned hour : 6;
  unsigned day : 6;
  unsigned month : 4;
  unsigned year : 16;
};

Import from Standard Input

The cpimport tool can import data from standard input (stdin) if no file is provided as an argument.

Importing from standard input is useful in many scenarios.

One scenario is when you want to import data from a remote database. You can use MariaDB Client to query the table using the SELECT statement, and then pipe the results into the standard input of the cpimport tool:

$ mariadb --quick \
   --skip-column-names \
   --execute="SELECT * FROM inventory.products" \
   | cpimport -s '\t' inventory products

Import from S3 using AWS CLI

The cpimport tool can import data from a file stored in a remote S3 bucket.

You can use the AWS CLI to copy the file from S3, and then pipe the contents into the standard input of the cpimport tool:

$ aws s3 cp --quiet s3://columnstore-test/inventory-products.csv - \
   | cpimport -s ',' inventory products

Alternatively, the columnstore_info.load_from_s3 stored procedure can import data from S3-compatible cloud object storage.

Set the Field Delimiter

The default field delimiter for the cpimport tool is a pipe (|).

If your data file uses a different field delimiter, you can specify the field delimiter with the -s option.

For a TSV (tab-separated values) file:

$ sudo cpimport -s '\t' \
   inventory products \
   inventory-products.tsv

For a CSV (comma-separated values) file:

$ sudo cpimport -s ',' \
   inventory products \
   inventory-products.csv

Set the Quoting Style

By default, the cpimport tool does not expect fields to be quoted.

If your data file uses quotes around fields, you can specify the quote character with the -E option.

To load a TSV (tab-separated values) file that uses double quotes:

$ sudo cpimport -s '\t' -E '"' \
   inventory products \
   inventory-products.tsv

To load a CSV (comma-separated values) file that uses optional single quotes:

$ sudo cpimport -s ',' -E "'" \
   inventory products \
   inventory-products.csv

Logging

The cpimport tool writes logs to different directories, depending on the Enterprise ColumnStore version:

  • In Enterprise ColumnStore 5.5.2 and later, logs are written to /var/log/mariadb/columnstore/bulk/

  • In Enterprise ColumnStore 5 releases before 5.5.2, logs are written to /var/lib/columnstore/data/bulk/

  • In Enterprise ColumnStore 1.4, logs are written to /usr/local/mariadb/columnstore/bulk/

Special Handling

Column Order

The cpimport tool requires column values to be in the same order in the input file as the columns in the table definition.

Date Format

The cpimport tool requires DATE values to be specified in the format YYYY-MM-DD.

Transaction Log

The cpimport tool does not write bulk data loads to the transaction log, so they are not transactional.

Binary Log

The cpimport tool does not write bulk data loads to the binary log, so they cannot be replicated using MariaDB replication.

EFS Storage

When Enterprise ColumnStore uses object storage and the Storage Manager directory uses EFS in the default Bursting Throughput mode, the cpimport tool can have performance problems if multiple data load operations are executed consecutively. The performance problems can occur because the Bursting Throughput mode scales the rate relative to the size of the file system, so the burst credits for a small Storage Manager volume can be fully consumed very quickly.

When this problem occurs, some solutions are:

  • Avoid using burst credits by using Provisioned Throughput mode instead of Bursting Throughput mode

  • Monitor burst credit balances in AWS and run data load operations when burst credits are available

  • Increase the burst credit balance by increasing the file system size (for example, by creating a dummy file)

Additional information is available here.

This page is: Copyright © 2025 MariaDB. All rights reserved.

Data Loading with INSERT .. SELECT

Overview

MariaDB Enterprise ColumnStore automatically translates INSERT INTO .. SELECT FROM .. statements into bulk data loads. By default, it translates the statement into a bulk data load that uses cpimport.bin, which is an internal wrapper around the cpimport tool.

Intended Use Cases

You can load data using INSERT INTO .. SELECT FROM .. in the following cases:

You are loading data into a ColumnStore table by querying one or more local tables.

Batch Insert Mode

MariaDB Enterprise ColumnStore enables batch insert mode by default.

When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for INSERT INTO .. SELECT FROM .. statements.

Enterprise ColumnStore uses the following rules:

  • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.

  • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

Import the Schema

Before data can be imported into the tables, the schema must be created.

  1. Connect to the primary server using MariaDB Client:

$ mariadb --host 192.168.0.100 --port 5001 \
      --user db_user --password \
      --default-character-set=utf8

After the command is executed, it will prompt you for a password.

  2. For each database that you are importing, create the database with the CREATE DATABASE statement:

CREATE DATABASE inventory;

  3. For each table that you are importing, create the table with the CREATE TABLE statement:

CREATE TABLE inventory.products (
   product_name varchar(11) NOT NULL DEFAULT '',
   supplier varchar(128) NOT NULL DEFAULT '',
   quantity varchar(128) NOT NULL DEFAULT '',
   unit_cost varchar(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;

Note: To get the best performance from Enterprise ColumnStore, make sure to follow Enterprise ColumnStore's best practices for schema design.

Appends Data

When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.

While the bulk load is in progress, the newly appended data is temporarily hidden from queries.

After the bulk load is complete, the newly appended data is visible to queries.

Sort the Query Results

When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.

The order of data can have a significant effect on performance with Enterprise ColumnStore. If your data is not already sorted, it can be helpful to sort the query results using an ORDER BY clause.

For example:

# Perform import from other table
# and insert in sorted order
INSERT INTO inventory.orders
SELECT *
FROM innodb_inventory.orders
ORDER BY order_date;

For additional information, see "Load Ordered Data in Proper Order".

Confirm the Field Delimiter

Before importing a table's data into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.

The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable. By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character.

To use a different delimiter, you can set the field delimiter.

Set the Field Delimiter

When the data is passed to cpimport, each value is separated by a field delimiter. The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable.

By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character. In general, setting the field delimiter is only required if your data contains this value.

Set the field delimiter by setting the columnstore_import_for_batchinsert_delimiter system variable to the ASCII value for the desired delimiter character.

For example, if you want to use a comma (,) as the field delimiter, you would set columnstore_import_for_batchinsert_delimiter to 44:

# Set field delimiter
SET SESSION columnstore_import_for_batchinsert_delimiter=44;

# Perform import from other table
INSERT INTO inventory.products
SELECT *
FROM innodb_inventory.products;

Set the Quoting Style

When the data is passed to cpimport, each value is enclosed by a quote character. The quote character is determined by the columnstore_import_for_batchinsert_enclosed_by system variable.

By default, Enterprise ColumnStore sets the quote character to the ASCII value 17, which corresponds to the DC1 character. In general, setting the quote character is only required if your data contains this value.

Set the quote character by setting the columnstore_import_for_batchinsert_enclosed_by system variable to the ASCII value for the desired quote character.

For example, if you want to use a double quote (") as the quote character, you would set columnstore_import_for_batchinsert_enclosed_by to 34:

# Set quote character
SET SESSION columnstore_import_for_batchinsert_enclosed_by=34;

# Perform import from other table
INSERT INTO inventory.products
SELECT *
FROM innodb_inventory.products;

This page is: Copyright © 2025 MariaDB. All rights reserved.

Data Loading with LOAD DATA INFILE

Overview

MariaDB Enterprise ColumnStore automatically translates LOAD DATA [ LOCAL ] INFILE statements into bulk data loads. By default, it translates the statement into a bulk data load that uses cpimport.bin, which is an internal wrapper around the cpimport tool.

Intended Use Cases

You can load data using the LOAD DATA INFILE statement in the following cases:

  • You are loading data into a ColumnStore table from a text file stored on the primary node's file system.

  • You are loading data into a ColumnStore table from a text file stored on the client's file system. In this case, LOAD DATA LOCAL INFILE must be used.

Batch Insert Mode

ECStoreDataLoadingS3FlowChart

MariaDB Enterprise ColumnStore enables batch insert mode by default.

When batch insert mode is enabled, MariaDB Enterprise ColumnStore has special handling for LOAD DATA INFILE statements.

Enterprise ColumnStore uses the following rules:

  • If the statement is executed outside of a transaction, Enterprise ColumnStore loads the data using cpimport, which is a command-line utility that is designed to efficiently load data in bulk. Enterprise ColumnStore executes cpimport using a wrapper called cpimport.bin.

  • If the statement is executed inside of a transaction, Enterprise ColumnStore loads the data using the DML interface, which is slower.

Batch insert mode can be disabled by setting the columnstore_use_import_for_batchinsert system variable to OFF. When batch insert mode is disabled, Enterprise ColumnStore executes the statements using the DML interface, which is slower.

Insert Cache

Starting with MariaDB Enterprise ColumnStore 6, an insert cache can be enabled to speed up LOAD DATA INFILE statements.

The insert cache is disabled by default, but it can be enabled by configuring columnstore_cache_inserts=ON:

[mariadb]
...
columnstore_cache_inserts=ON

The cache is flushed to ColumnStore in the following scenarios:

  • When the number of cached rows exceeds the value of columnstore_cache_flush_threshold

  • When a statement other than INSERT or LOAD DATA INFILE is executed

cpimport is used to flush the insert cache to ColumnStore when columnstore_cache_use_import=ON is configured.
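As a quick check (a minimal sketch; variable availability depends on the Enterprise ColumnStore version), the cache-related settings named above can be inspected with:

SHOW VARIABLES LIKE 'columnstore_cache%';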

Locking

MariaDB Enterprise ColumnStore requires a write metadata lock (MDL) on the table when a bulk data load is performed with cpimport.

When a bulk data load is running:

  • Read queries will not be blocked.

  • Write queries and concurrent bulk data loads on the same table will be blocked until the bulk data load operation is complete, and the write metadata lock on the table has been released.

  • The write metadata lock (MDL) can be monitored with the METADATA_LOCK_INFO plugin.

Import the Schema

Before data can be imported into the tables, the schema must be created.

  1. Connect to the primary server using MariaDB Client:

$ mariadb --host 192.168.0.100 --port 5001 \
      --user db_user --password \
      --default-character-set=utf8

After the command is executed, it will prompt you for a password.

  2. For each database that you are importing, create the database with the CREATE DATABASE statement:

CREATE DATABASE inventory;

  3. For each table that you are importing, create the table with the CREATE TABLE statement:

CREATE TABLE inventory.products (
   product_name varchar(11) NOT NULL DEFAULT '',
   supplier varchar(128) NOT NULL DEFAULT '',
   quantity varchar(128) NOT NULL DEFAULT '',
   unit_cost varchar(128) NOT NULL DEFAULT ''
) ENGINE=Columnstore DEFAULT CHARSET=utf8;

Note: To get the best performance from Enterprise ColumnStore, make sure to follow Enterprise ColumnStore's best practices for schema design.

Appends Data

When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read. Appending data reduces the I/O requirements of bulk data loads, so that larger data sets can be loaded very efficiently.

While the bulk load is in progress, the newly appended data is temporarily hidden from queries.

After the bulk load is complete, the newly appended data is visible to queries.

Sort the Input File

When MariaDB Enterprise ColumnStore performs a bulk data load, it appends data to the table in the order in which the data is read.

The order of data can have a significant effect on performance with Enterprise ColumnStore, so it can be helpful to sort the data in the input file prior to importing it.

For additional information, see "Load Ordered Data in Proper Order".

Confirm the Field Delimiter

Before importing a file into MariaDB Enterprise ColumnStore, confirm that the field delimiter is not present in the data.

The field delimiter is determined by the columnstore_import_for_batchinsert_delimiter system variable. By default, Enterprise ColumnStore sets the field delimiter to the ASCII value 7, which corresponds to the BEL character.

To use another delimiter, you can set the field delimiter.

Load a Local Input File on the Client

If you are loading a file located on the client, you can use the LOAD DATA LOCAL INFILE statement. However, for this statement to work, the client must explicitly enable usage of a local infile.

If you are using MariaDB Client the --local-infile option can be used:

$ mariadb --host 192.168.0.1 \
      --user db_user --password \
      --default-character-set=utf8 \
      --local-infile

If you are using MariaDB Connector/C, the MYSQL_OPT_LOCAL_INFILE option can be set with the mysql_optionsv() function:

/* enable local infile */
unsigned int enable_local_infile = 1;
mysql_optionsv(mysql, MYSQL_OPT_LOCAL_INFILE, (void *) &enable_local_infile);

If you are using MariaDB Connector/J, the allowLocalInfile parameter can be set for the connection:

Connection connection = DriverManager.getConnection("jdbc:mariadb://192.168.0.1/test?user=test_user&password=myPassword&allowLocalInfile=true");

If you are using MariaDB Connector/Node.js, the permitLocalInfile parameter can be set for the connection:

mariadb.createConnection({
   host: '192.168.0.1',
   user:'test_user',
   password: 'myPassword',
   permitLocalInfile: 'true'
 });

If you are using MariaDB Connector/Python, the local_infile parameter can be set for the connection:

conn = mariadb.connect(
   user="test_user",
   password="myPassword",
   host="192.168.0.1",
   port=3306,
   local_infile=True)

Set the Field Delimiter

The default field delimiter for the LOAD DATA INFILE statement is a tab.

If your data file uses a different field delimiter, you can specify the field delimiter with the FIELDS TERMINATED BY clause.

To load a TSV (tab-separated values) file:

LOAD DATA INFILE 'inventory-products.tsv'
INTO TABLE inventory.products;

To load a CSV (comma-separated values) file:

LOAD DATA LOCAL INFILE 'inventory-products.csv'
INTO TABLE accounts.contacts
FIELDS TERMINATED BY ',';

Set the Quoting Style

By default, the LOAD DATA INFILE statement does not expect fields to be quoted.

If your data file uses quotes around fields, you can specify the quote character with the FIELDS [OPTIONALLY] ENCLOSED BY clause.

To load a TSV (tab-separated values) file that uses double quotes:

LOAD DATA INFILE 'inventory-products.tsv'
INTO TABLE inventory.products
FIELDS ENCLOSED BY '"';

To load a CSV (comma-separated values) file that uses optional single quotes:

LOAD DATA LOCAL INFILE 'inventory-products.csv'
INTO TABLE accounts.contacts
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\'';

Special Handling

ON DUPLICATE KEY

MariaDB Enterprise ColumnStore ignores the ON DUPLICATE KEY clause.

Please ensure that duplicate data is removed prior to performing a bulk data load.
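For example, a minimal sketch of checking a source table for duplicate keys before loading; the staging.orders table and order_id column are hypothetical:

-- Find key values that appear more than once in the source data
SELECT order_id, COUNT(*) AS occurrences
FROM staging.orders
GROUP BY order_id
HAVING COUNT(*) > 1;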

This page is: Copyright © 2025 MariaDB. All rights reserved.

Data Loading with load_from_s3

Overview

MariaDB Enterprise ColumnStore includes a stored procedure called columnstore_info.load_from_s3, which can load data from a plain-text file containing delimiter-separated values (such as CSV or TSV) stored on S3-compatible cloud object storage.

Compatibility

MariaDB Enterprise ColumnStore 23.02

System Variables

Before you import data with the columnstore_info.load_from_s3 stored procedure, the authentication credentials and the region can be configured using system variables:

  • columnstore_s3_key

  • columnstore_s3_secret

  • columnstore_s3_region

For example, the following statements show how to set the system variables for your current session:

SET columnstore_s3_key='S3_KEY';
SET columnstore_s3_secret='S3_SECRET';
SET columnstore_s3_region='S3_REGION';

Import Data

To import data with the columnstore_info.load_from_s3 stored procedure, use the CALL statement:

CALL columnstore_info.load_from_s3('BUCKET_URL',
      'FILE_NAME',
      'DATABASE_NAME',
      'TABLE_NAME',
      'TERMINATED_BY',
      'ENCLOSED_BY',
      'ESCAPED_BY');

  • Replace 'BUCKET_URL' with the URL of your bucket. The protocol in the URL must be s3: for AWS S3 or gs: for Google Cloud Storage

  • Replace 'FILE_NAME' with the file name to load from. The file must be a plain-text file containing delimiter-separated values, such as a comma-separated values (CSV) or tab-separated values (TSV) file. The supported file format is similar to the plain-text file formats supported by cpimport and LOAD DATA [LOCAL] INFILE. Please note that this stored procedure can't load dump files created by mariadb-dump

  • Replace 'DATABASE_NAME' with the database to import into

  • Replace 'TABLE_NAME' with the table name to import into

  • Replace 'TERMINATED_BY' with the field terminator used in the file, similar to the -s command-line option for cpimport

  • Replace 'ENCLOSED_BY' with the quotes used in the file, similar to the -E command-line option for cpimport

  • Replace 'ESCAPED_BY' with the escape character used in the file, similar to the -C command-line option for cpimport

All parameters are mandatory.

For example, to load a comma-separated values (CSV) file from AWS S3:

CALL columnstore_info.load_from_s3('s3://mariadb-columnstore-test-data/',
   'test-data-db1-tab1.csv',
   'db1',
   'tab1',
   ',',
   '"',
   '\\');

When the stored procedure completes, it returns JSON containing the status of the operation. If the JSON shows an error or "success": false, check your table to see if some or all of your data was loaded, because many errors are non-fatal.

Permissions

When the data file is stored in Amazon S3, the AWS user only requires the s3:GetObject action on the bucket.

For example, the AWS user can use a user policy like the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "readBucket",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}

This page is: Copyright © 2025 MariaDB. All rights reserved.

Tutorials

MariaDB ColumnStore tutorials show how to set up, load data, and run fast SQL queries using columnar storage, often with Docker.

Reference

The MariaDB ColumnStore Reference is a key guide to its architecture, setup, SQL, and analytics use.

ColumnStore Commit

The COMMIT statement makes changes to a table permanent. You should only commit changes after you have verified the integrity of the changed data. Once data is committed, it cannot be undone with the ROLLBACK statement. To return the database to its former state, you must restore the data from backups.
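A minimal sketch of committing a change, reusing the inventory.products table from the data loading examples earlier on this page (the updated values are hypothetical):

START TRANSACTION;

UPDATE inventory.products
SET unit_cost = '$11.00'
WHERE product_name = 'Sildenafil';

-- Make the change permanent; it can no longer be undone with ROLLBACK
COMMIT;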


This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Drop Procedure

The DROP PROCEDURE statement deletes a stored procedure from ColumnStore.


The following statement drops the sp_complex_variable procedure:

DROP PROCEDURE sp_complex_variable;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Rename Table

The RENAME TABLE statement renames one or more ColumnStore tables.


Notes:

  • You cannot currently use RENAME TABLE to move a table from one database to another.

  • See the ALTER TABLE syntax for an alternative to RENAME TABLE.

The following statement renames the orders table:

RENAME TABLE orders TO customer_order;

The following statement renames both the orders table and customer table:

RENAME TABLE orders TO customer_orders, customer TO customers;

You may also use RENAME TABLE to swap tables. This example swaps the customer and vendor tables (assuming the temp_table does not already exist):

RENAME TABLE customer TO temp_table, vendor TO customer, temp_table TO vendor;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Rollback

The ROLLBACK statement undoes transactions that have not been permanently saved to the database with the COMMIT statement. You cannot roll back changes to table properties, including ALTER, CREATE, or DROP TABLE statements.
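A minimal sketch of undoing an uncommitted change, again using the inventory.products table from earlier examples:

START TRANSACTION;

DELETE FROM inventory.products
WHERE supplier = 'Wisozk Inc';

-- Undo the uncommitted delete
ROLLBACK;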


This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Compression Mode

MariaDB ColumnStore has the ability to compress data and this is controlled through a compression mode. This compression mode may be set as a default for the instance or set at the session level.

To set the compression mode at the session level, the following command is used. Once the session has ended, any subsequent session will return to the default for the instance.

set infinidb_compression_type = n

where n is:

  • 0 - compression is turned off. Any subsequent table create statements run will have compression turned off for that table unless any statement overrides have been performed. Any alter statements run to add a column will have compression turned off for that column unless any statement override has been performed.

  • 2 - compression is turned on. Any subsequent table create statements run will have compression turned on for that table unless any statement overrides have been performed. Any alter statements run to add a column will have compression turned on for that column unless any statement override has been performed. ColumnStore uses snappy compression in this mode.
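For example, a minimal sketch of enabling compression for the current session before creating a table (the sales.daily_totals table is hypothetical):

-- Turn compression on for this session
SET infinidb_compression_type = 2;

-- Tables created from now on in this session have compression enabled
CREATE TABLE sales.daily_totals (
  sale_date DATE,
  total_amount DECIMAL(18,2)
) ENGINE=Columnstore;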

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Conditions

  1. filter

  2. String comparisons

  3. Pattern matching

  4. OR processing

  5. table filter

  6. join

A condition is a combination of expressions and operators that returns TRUE, FALSE, or NULL. The following syntax shows the conditions that can be used to return a TRUE, FALSE, or NULL result.

filter

filter:
column| literal| function [=|!=|<>|<|<=|>=|>] column| literal| function | select_statement
column| function [NOT] IN (select_statement | literal, literal,...)
column| function [NOT] BETWEEN (select_statement | literal, literal,...)
column| function  IS [NOT] NULL
string_column|string_function [NOT] LIKE pattern
EXISTS (select_statement)

NOT (filter)
(filter|function) [AND|OR] (filter|function)

Note: A ‘literal’ may be a constant (e.g. 3) or an expression that evaluates to a constant [e.g. 100 - (27 * 3)]. For date columns, you may use the SQL ‘interval’ syntax to perform date arithmetic, as long as all the components of the expression are constants (e.g. ‘1998-12-01’ - interval ‘1’ year)

String comparisons

ColumnStore, unlike the MyISAM engine, is case sensitive for string comparisons used in filters. For the most accurate results, and to avoid confusing results, make sure string filter constants are no longer than the column width itself.

Pattern matching

Pattern matching as described with the LIKE condition allows you to use "_" to match any single character and "%" to match an arbitrary number of characters (including zero characters). To test for literal instances of a wildcard character ("%" or "_"), precede it with the "\" character.
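For example, a minimal sketch (the parts table and its columns are hypothetical):

-- '_' matches exactly one character, '%' matches any number of characters
SELECT part_name FROM parts WHERE part_name LIKE 'bolt_%';

-- Precede a wildcard with '\' to match it literally, e.g. values such as '50% cotton'
SELECT description FROM parts WHERE description LIKE '50\%%';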

OR processing

OR Processing has the following restrictions:

  • Only column comparisons against a literal are allowed in conjunction with an OR. The following query would be allowed since all comparisons are against literals: SELECT count(*) FROM lineitem WHERE l_partkey < 100 OR l_linestatus = 'F';

  • ColumnStore binds ANDs more tightly than ORs. Therefore you must enclose OR-relations in parentheses, just as in any other SQL parser.

SELECT count(*) FROM orders, lineitem 
  WHERE (lineitem.l_orderkey < 100 OR lineitem.l_linenumber > 10) 
    AND lineitem.l_orderkey = orders.o_orderkey;

table filter

The following syntax shows the conditions you can use when executing a condition against two columns. Note that the columns must be from the same table.

col_name_1 [=|!=|<>|<|<=|>=|>] col_name_2
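For example, a condition comparing two columns of the same table (this sketch assumes the TPC-H style lineitem table used elsewhere on this page, with l_shipdate and l_commitdate columns):

SELECT count(*)
FROM lineitem
WHERE l_shipdate > l_commitdate;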

join

The following syntax shows the conditions you can use when executing a join on two tables.

join_condition [AND join_condition]
join_condition:
           [col_name_1|function_name_1] = [col_name_2|function_name_2]
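For example, a join between the orders and lineitem tables referenced elsewhere on this page:

SELECT orders.o_orderkey, count(*) AS line_count
FROM orders, lineitem
WHERE orders.o_orderkey = lineitem.l_orderkey
GROUP BY orders.o_orderkey;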

Notes:

  • ColumnStore tables can only be joined with non-ColumnStore tables in table mode. See Operating Mode for information.

  • ColumnStore will require a join in the WHERE clause for each set of tables in the FROM clause. No cartesian product queries will be allowed.

  • ColumnStore requires that joins must be on the same datatype. In addition, numeric datatypes (INT variations, NUMERIC, DECIMAL) may be mixed in the join if they have the same scale.

  • Circular joins are not supported in ColumnStore. See the Troubleshooting section

  • When the join memory limit is exceeded, a disk-based join will be used for processing if this option has been enabled.

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Data Types

ColumnStore supports the following data types:

Numeric Data Types

| Datatypes | Column Size | Description |
|---|---|---|
| BOOLEAN | 1-byte | A synonym for "TINYINT(1)". Supported from version 1.2.0 onwards. |
| TINYINT | 1-byte | A very small integer. Numeric value with scale 0. Signed: -126 to +127. Unsigned: 0 to 253. |
| SMALLINT | 2-bytes | A small integer. Signed: -32,766 to 32,767. Unsigned: 0 to 65,533. |
| MEDIUMINT | 3-bytes | A medium integer. Signed: -8388608 to 8388607. Unsigned: 0 to 16777215. Supported starting with MariaDB ColumnStore 1.4.2. |
| INTEGER/INT | 4-bytes | A normal-size integer. Numeric value with scale 0. Signed: -2,147,483,646 to 2,147,483,647. Unsigned: 0 to 4,294,967,293. |
| BIGINT | 8-bytes | A large integer. Numeric value with scale 0. Signed: -9,223,372,036,854,775,806 to +9,223,372,036,854,775,807. Unsigned: 0 to +18,446,744,073,709,551,613. |
| DECIMAL/NUMERIC | 2, 4, or 8 bytes | A packed fixed-point number that can have a specific total number of digits and a set number of digits after the decimal. The maximum precision (total number of digits) that can be specified is 18. |
| FLOAT | 4 bytes | Stored in 32-bit IEEE-754 floating point format. As such, the number of significant digits is about 6 and the range of values is approximately +/- 1e38. The MySQL extension to specify precision and scale is not supported. |
| DOUBLE/REAL | 8 bytes | Stored in 64-bit IEEE-754 floating point format. As such, the number of significant digits is about 15 and the range of values is approximately +/- 1e308. The MySQL extension to specify precision and scale is not supported. "REAL" is a synonym for "DOUBLE". |

String Data Types

| Datatypes | Column Size | Description |
|---|---|---|
| CHAR | 1, 2, 4, or 8 bytes | Holds letters and special characters of fixed length. Max length is 255. Default and minimum size is 1 byte. |
| VARCHAR | 1, 2, 4, or 8 bytes, or 8-byte token | Holds letters, numbers, and special characters of variable length. Max length = 8000 bytes or characters and minimum length = 1 byte or character. |
| TINYTEXT | 255 bytes | Holds a small amount of letters, numbers, and special characters of variable length. Supported from version 1.1.0 onwards. |
| TINYBLOB | 255 bytes | Holds a small amount of binary data of variable length. Supported from version 1.1.0 onwards. |
| TEXT | 64 KB | Holds letters, numbers, and special characters of variable length. Supported from version 1.1.0 onwards. |
| BLOB | 64 KB | Holds binary data of variable length. Supported from version 1.1.0 onwards. |
| MEDIUMTEXT | 16 MB | Holds a medium amount of letters, numbers, and special characters of variable length. Supported from version 1.1.0 onwards. |
| MEDIUMBLOB | 16 MB | Holds a medium amount of binary data of variable length. Supported from version 1.1.0 onwards. |
| LONGTEXT | 1.96 GB | Holds a large amount of letters, numbers, and special characters of variable length. Supported from version 1.1.0 onwards. |
| LONGBLOB | 1.96 GB | Holds a large amount of binary data of variable length. Supported from version 1.1.0 onwards. |

Date and Time Data Types

| Datatype | Column Size | Description |
|---|---|---|
| DATE | 4 bytes | Date has year, month, and day. The internal representation of a date is a string of 4 bytes. The first 2 bytes represent the year, .5 bytes the month, and .75 bytes the day in the following format: YYYY-MM-DD. Supported range is 1000-01-01 to 9999-12-31. |
| DATETIME | 8 bytes | A date and time combination. Supported range is 1000-01-01 00:00:00 to 9999-12-31 23:59:59. From version 1.2.0, microseconds are also supported. |
| TIME | 8 bytes | Holds hour, minute, second, and optionally microseconds. Supported range is '-838:59:59.999999' to '838:59:59.999999'. Supported from version 1.2.0 onwards. |
| TIMESTAMP | 4 bytes | Values are stored as the number of seconds since 1970-01-01 00:00:00 UTC, and optionally microseconds. The maximum value is currently 2038-01-19 03:14:07 UTC. Supported starting with MariaDB ColumnStore 1.4.2. |

Notes

  • ColumnStore treats a zero-length string as a NULL value.

  • As with core MariaDB, ColumnStore employs “saturation semantics” on integer values. This means that if a value is inserted into an integer field that is too big/small for it to hold (i.e. it is more negative or more positive than the values indicated above), ColumnStore will “saturate” that value to the min/max value indicated above as appropriate. For example, for a SMALLINT column, if 32800 is attempted, the actual value inserted will be 32767.

  • The largest negative and positive numbers ColumnStore supports appear to be 2 less than what MariaDB supports, because ColumnStore reserves those values for its internal use. For example, if there is a need to store -128 in a column, the TINYINT datatype cannot be used; the SMALLINT datatype must be used instead. If the value -128 is inserted into a TINYINT column, ColumnStore will saturate it to -126 (and issue a warning).

  • ColumnStore truncates rather than rounds decimal constants that have too many digits after the decimal point during bulk load and when running SELECT statements. For INSERT and UPDATE, however, the MariaDB parser will round such constants. You should verify that ETL tools and any INSERT/UPDATE statements only specify the correct number of decimal digits to avoid potential confusion; a short example follows these notes.

  • An optional display width may be added to the BIGINT, INTEGER/INT, SMALLINT & TINYINT columns. As with core MariaDB tables, this value does not affect the internal storage requirements of the column nor does it affect the valid value ranges.

  • All columns in ColumnStore are nullable and the default value for any column is NULL. You may optionally specify NOT NULL for any column and/or one with a DEFAULT value.

  • Unlike other MariaDB storage engines, the actual storage limit for LONGBLOB/LONGTEXT is 2,100,000,000 bytes instead of 4GB per entry. MariaDB's client API is limited to a row length of 1GB.

  • TIMESTAMP and CURRENT_TIMESTAMP are still not supported. (MCOL-3694 / MCOL-1039)
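
A minimal sketch illustrating the saturation and truncation notes above (the table and column names are hypothetical):

CREATE TABLE t_limits (s SMALLINT, d DECIMAL(10,2)) ENGINE=ColumnStore;
-- 32800 exceeds the SMALLINT maximum, so ColumnStore saturates it to 32767 and issues a warning
INSERT INTO t_limits VALUES (32800, 1.999);
-- for INSERT/UPDATE the MariaDB parser rounds 1.999 to 2.00;
-- cpimport and SELECT would instead truncate such a constant to 1.99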

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Decimal Math and Scale

  1. Enable/Disable decimal to double math

  2. ColumnStore decimal scale

  3. Enable/disable decimal scale

  4. Set decimal scale level

MariaDB ColumnStore has the ability to change intermediate decimal mathematical results from the decimal type to double. The decimal type has approximately 17-18 digits of precision but a smaller maximum range, whereas the double type has approximately 15-16 digits of precision but a much larger maximum range. In typical mathematical and scientific applications, the ability to avoid overflow in intermediate results with double math is likely more beneficial than the additional two digits of precision. In banking applications, however, it may be more appropriate to leave the default decimal setting to ensure accuracy to the least significant digit.

Enable/Disable decimal to double math

The columnstore_double_for_decimal_math variable is used to control the data type for intermediate decimal results. This decimal-to-double math may be set as a default for the instance, at the session level, or at the statement level by toggling this variable on and off.

To enable/disable the use of the decimal to double math at the session level, the following command is used. Once the session has ended, any subsequent session will return to the default for the instance.

set columnstore_double_for_decimal_math = n

where n is:

  • off (disabled, default)

  • on (enabled)

ColumnStore decimal scale

ColumnStore has the ability to support varied internal precision on decimal calculations. The columnstore_decimal_scale variable is used internally by the ColumnStore engine to control how many significant digits to the right of the decimal point are carried through in suboperations on calculated columns. If, while running a query, you receive the message 'aggregate overflow', try reducing columnstore_decimal_scale and running the query again. Note that, as you decrease columnstore_decimal_scale, you may see reduced accuracy in the least significant digit(s) of a returned calculated column. The columnstore_use_decimal_scale variable is used to turn this internal precision on and off. These two system variables may be set as a default for the instance or set at the session level.

Enable/disable decimal scale

To enable/disable the use of the decimal scale at the session level, the following command is used. Once the session has ended, any subsequent session will return to the default for the instance.

set columnstore_use_decimal_scale = n

where n is off (disabled) or on (enabled).

Set decimal scale level

To set the decimal scale at the session level, the following command is used. Once the session has ended, any subsequent session will return to the default for the instance.

set columnstore_decimal_scale = n

where n is the amount of precision desired for calculations.
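
For example, a brief hedged session-level sketch combining these settings (the values shown are illustrative only):

set columnstore_double_for_decimal_math = on;
set columnstore_use_decimal_scale = on;
set columnstore_decimal_scale = 8;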

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Distributed Aggregate Functions

MariaDB ColumnStore supports the following aggregate functions; these can be used in the SELECT, HAVING, and ORDER BY clauses of a SQL statement.

| Function | Description |
|---|---|
| AVG([DISTINCT] column) | Average value of a numeric (INT variations, NUMERIC, DECIMAL) column |
| CORR(ColumnY, ColumnX) | The correlation coefficient for non-null pairs in a group |
| COUNT(*, [DISTINCT] column) | The number of rows returned by a query or grouping. All data types are supported |
| COVAR_POP(ColumnY, ColumnX) | The population covariance for non-null pairs in a group |
| COVAR_SAMP(ColumnY, ColumnX) | The sample covariance for non-null pairs in a group |
| MAX([DISTINCT] column) | The maximum value of a column. All data types are supported |
| MIN([DISTINCT] column) | The minimum value of a column. All data types are supported |
| REGR_AVGX(ColumnY, ColumnX) | Average of the independent variable (sum(ColumnX)/N), where N is the number of rows processed by the query |
| REGR_AVGY(ColumnY, ColumnX) | Average of the dependent variable (sum(ColumnY)/N), where N is the number of rows processed by the query |
| REGR_COUNT(ColumnY, ColumnX) | The total number of input rows in which both column Y and column X are non-null |
| REGR_INTERCEPT(ColumnY, ColumnX) | The y-intercept of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs |
| REGR_R2(ColumnY, ColumnX) | The square of the correlation coefficient for the linear model |
| REGR_SLOPE(ColumnY, ColumnX) | The slope of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs |
| REGR_SXX(ColumnY, ColumnX) | REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs |
| REGR_SXY(ColumnY, ColumnX) | REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs |
| REGR_SYY(ColumnY, ColumnX) | REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs |
| STD(), STDDEV(), STDDEV_POP() | The population standard deviation of a numeric (INT variations, NUMERIC, DECIMAL) column |
| STDDEV_SAMP() | The sample standard deviation of a numeric (INT variations, NUMERIC, DECIMAL) column |
| SUM([DISTINCT] column) | The sum of a numeric (INT variations, NUMERIC, DECIMAL) column |
| VARIANCE(), VAR_POP() | The population variance of a numeric (INT variations, NUMERIC, DECIMAL) column |
| VAR_SAMP() | The sample variance of a numeric (INT variations, NUMERIC, DECIMAL) column |

Note

  • Regression functions (REGR_AVGX to REGR_SYY), CORR, COVAR_POP, and COVAR_SAMP are supported in version 1.2.0 and higher.

Example

An example group by query using aggregate functions is:

select year(o_orderdate) order_year, 
avg(o_totalprice) avg_totalprice, 
max(o_totalprice) max_totalprice, 
count(*) order_count 
from orders 
group by order_year 
order by order_year;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Distributed Functions

ColumnStore supports the following functions. These functions can be specified in the projection (SELECT), WHERE and ORDER BY portions of the SQL statement and will be processed in a distributed manner.

  • &

  • ABS()

  • ACOS()

  • ADDDATE()

  • ADDTIME()

  • ASCII()

  • ASIN()

  • ATAN()

  • BETWEEN...AND...

  • BIT_AND()

  • BIT_OR()

  • BIT_XOR()

  • CASE()

  • CAST()

  • CEIL(), CEILING()

  • CHAR_LENGTH(), CHARACTER_LENGTH()

  • COALESCE()

  • CONCAT()

  • CONCAT_WS()

  • CONV()

  • CONVERT()

  • COS()

  • COT()

  • CRC32()

  • DATE()

  • DATE_ADD()

  • DATE_FORMAT()

  • DATE_SUB()

  • DATEDIFF()

  • DAY(), DAYOFMONTH()

  • DAYNAME()

  • DAYOFWEEK()

  • DAYOFYEAR()

  • DEGREES()

  • DIV

  • ELT()

  • EXP()

  • EXTRACT()

  • FIND_IN_SET()

  • FLOOR()

  • FORMAT()

  • FROM_DAYS()

  • FROM_UNIXTIME()

  • GET_FORMAT()

  • GREATEST()

  • GROUP_CONCAT()

  • HEX()

  • HOUR()

  • IF()

  • IFNULL()

  • IN

  • INET_ATON()

  • INET_NTOA()

  • INSERT()

  • INSTR()

  • ISNULL()

  • LAST_DAY()

  • LCASE()

  • LEAST()

  • LEFT()

  • LENGTH()

  • LIKE

  • LN()

  • LOCATE()

  • LOG()

  • LOG2()

  • LOG10()

  • LOWER()

  • LPAD()

  • LTRIM()

  • MAKEDATE()

  • MAKETIME()

  • MD5()

  • MICROSECOND()

  • MINUTE()

  • MOD()

  • MONTH()

  • MONTHNAME()

  • NOW()

  • NULLIF()

  • PERIOD_ADD()

  • PERIOD_DIFF()

  • POSITION()

  • POW(), POWER()

  • QUARTER()

  • RADIANS()

  • RAND()

  • REGEXP()

  • REPEAT()

  • REPLACE()

  • REVERSE()

  • RIGHT()

  • RLIKE()

  • ROUND()

  • RPAD()

  • RTRIM()

  • SEC_TO_TIME()

  • SECOND()

  • SHA(), SHA1()

  • SIGN()

  • SIN()

  • SPACE()

  • SQRT()

  • STR_TO_DATE()

  • STRCMP()

  • SUBDATE()

  • SUBSTR(), SUBSTRING()

  • SUBSTRING_INDEX()

  • SUBTIME()

  • SYSDATE()

  • TAN()

  • TIME()

  • TIME_FORMAT()

  • TIME_TO_SEC()

  • TIMEDIFF()

  • TIMESTAMPADD()

  • TIMESTAMPDIFF()

  • TO_DAYS()

  • TRIM()

  • TRUNCATE()

  • UCASE()

  • UNIX_TIMESTAMP()

  • UNIX_TIME()

  • UPPER()

  • WEEK()

  • WEEKDAY()

  • WEEKOFYEAR()

  • XOR()

  • YEAR()

  • YEARWEEK()

See also

  • ColumnStore Non-Distributed Post-Processed Functions

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Information Functions

  1. Functions

  2. Example

Functions

MariaDB ColumnStore Information Functions are selectable pseudo-functions that return MariaDB ColumnStore-specific "meta" information, so that queries can be directed locally to a specific node. These functions can be specified in the projection (SELECT), WHERE, GROUP BY, HAVING, and ORDER BY portions of the SQL statement and will be processed in a distributed manner.

| Function | Description |
|---|---|
| idbBlockId(column) | The Logical Block Identifier (LBID) for the block containing the physical row |
| idbDBRoot(column) | The DBRoot where the physical row resides |
| idbExtentId(column) | The Logical Block Identifier (LBID) for the first block in the extent containing the physical row |
| idbExtentMax(column) | The max value from the extent map entry for the extent containing the physical row |
| idbExtentMin(column) | The min value from the extent map entry for the extent containing the physical row |
| idbExtentRelativeRid(column) | The row id (1 to 8,388,608) within the column's extent |
| idbLocalPm() | The PM from which the query was launched. This function will return NULL if the query is launched from a standalone UM |
| idbPartition(column) | The three-part partition id (Directory.Segment.DBRoot) |
| idbPm(column) | The PM where the physical row resides |
| idbSegmentDir(column) | The lowest level directory id for the column file containing the physical row |
| idbSegment(column) | The number of the segment file containing the physical row |
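
For example, a hedged sketch of querying these pseudo-functions to see where rows physically reside (lineitem and l_orderkey are assumed names from the examples earlier on this page):

select l_orderkey, idbPm(l_orderkey) pm, idbDBRoot(l_orderkey) dbroot, idbPartition(l_orderkey) part
from lineitem limit 5;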

ColumnStore Information Schema Tables

  1. COLUMNSTORE_TABLES

  2. COLUMNSTORE_COLUMNS

  3. COLUMNSTORE_EXTENTS

  4. COLUMNSTORE_FILES

  5. Stored Procedures

  6. total_usage()

  7. table_usage()

  8. compression_ratio()

MariaDB ColumnStore has four information schema tables that expose information about the table and column storage. The tables were added in version 1.0.5 of ColumnStore and were heavily modified for 1.0.6.

COLUMNSTORE_TABLES

The first table is the INFORMATION_SCHEMA.COLUMNSTORE_TABLES. It contains information about the tables inside ColumnStore.

COLUMNSTORE_COLUMNS

The INFORMATION_SCHEMA.COLUMNSTORE_COLUMNS table contains information about every single column inside ColumnStore.

COLUMNSTORE_EXTENTS

This table displays the extent map in a user-consumable form. An extent is a collection of details about a section of data related to a columnstore column. A majority of columns in ColumnStore will have multiple extents, and the columns table above can be joined to this one to filter results by table or column. The table layout is as follows:

| Column | Description |
|---|---|
| OBJECT_ID | The object ID for the extent |
| OBJECT_TYPE | Whether this is a "Column" or "Dictionary" extent |
| LOGICAL_BLOCK_START | ColumnStore's internal start LBID for this extent |
| LOGICAL_BLOCK_END | ColumnStore's internal end LBID for this extent |
| MIN_VALUE | The minimum value stored in this extent |
| MAX_VALUE | The maximum value stored in this extent |
| WIDTH | The data width for the extent |
| DBROOT | The DBRoot number for the extent |
| PARTITION_ID | The partition ID for the extent |
| SEGMENT_ID | The segment ID for the extent |
| BLOCK_OFFSET | The block offset for the data file; each data file can contain multiple extents for a column |
| MAX_BLOCKS | The maximum number of blocks for the extent |
| HIGH_WATER_MARK | The last block committed to the extent (starting at 0) |
| STATE | The state of the extent (see below) |
| STATUS | The availability status for the column, which is either "Available", "Unavailable", or "Out of service" |
| DATA_SIZE | The uncompressed data size for the extent, calculated as (HWM + 1) * BLOCK_SIZE |

Notes:

  1. The state is "Valid" for a normal state, "Invalid" if a cpimport has completed but the table has not yet been accessed (min/max values will be invalid), or "Updating" if there is a DML statement writing to the column

  2. In ColumnStore the block size is 8192 bytes

  3. By default, ColumnStore will create an extent file of 256 * 1024 * WIDTH bytes for the first partition; if this is too small, then for uncompressed data it will create a file of the maximum size for the extent (MAX_BLOCKS * BLOCK_SIZE). Snappy compression always adds a header block.

  4. Object IDs of less than 3000 are for internal tables and will not appear in any of the information schema tables

  5. HWM is set to zero for the lower segments when there are multiple segments in an extent file; these can be observed when BLOCK_OFFSET > 0

  6. When HWM is 0, the DATA_SIZE will show 0 instead of 8192 to avoid confusion when there are multiple segments in an extent file
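
As a sketch of the join mentioned above, the extents of a single table's columns can be inspected like this (assuming the standard INFORMATION_SCHEMA column names such as TABLE_NAME, COLUMN_NAME, and OBJECT_ID; adjust the schema and table names to your own):

select c.column_name, e.min_value, e.max_value, e.dbroot, e.partition_id
from information_schema.columnstore_columns c
join information_schema.columnstore_extents e on c.object_id = e.object_id
where c.table_schema = 'my_schema' and c.table_name = 'my_table';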

COLUMNSTORE_FILES

The columnstore_files table provides information about each file associated with extents. Each extent can reuse a file at different block offsets, so this is not a 1:1 relationship to the columnstore_extents table.

| Column | Description |
|---|---|
| OBJECT_ID | The object ID for the extent |
| SEGMENT_ID | The segment ID for the extent |
| PARTITION_ID | The partition ID for the extent |
| FILENAME | The full path and filename for the extent file; multiple extents for the same column can point to this file with different BLOCK_OFFSETs |
| FILE_SIZE | The disk file size for the extent |
| COMPRESSED_DATA_SIZE | The amount of the compressed file used; NULL if this is an uncompressed file |

Stored Procedures

total_usage()

The total_usage() procedure gives a total disk usage summary for all the columns in ColumnStore except the columns used for internal maintenance. It is executed using the following query:

> call columnstore_info.total_usage();

table_usage()

The table_usage() procedure gives the total data disk usage, dictionary disk usage, and grand total disk usage per table. It can be called in several ways; the first gives a total for each table:

> call columnstore_info.table_usage(NULL, NULL);

Or for a specific table, my_table in my_schema in this example:

> call columnstore_info.table_usage('my_schema', 'my_table');

You can also request all tables for a specified schema:

> call columnstore_info.table_usage('my_schema', NULL);

Note: The quotes around the table name are required; an error will occur without them.

compression_ratio()

The compression_ratio() procedure calculates the average compression ratio across all the compressed extents in ColumnStore. It is called using

> call columnstore_info.compression_ratio();

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Naming Conventions

This lists the different naming conventions enforced by ColumnStore, compared to the normal MariaDB naming conventions.

  • User names: 64 characters (MariaDB has 80)

  • Table and column names are restricted to alphanumeric and underscore only, i.e "A-Z a-z 0-9 _".

  • The first character of all table and column names should be an ASCII letter (a-z A-Z).

  • ColumnStore reserves certain words that MariaDB does not, such as SELECT, CHAR and TABLE; even wrapped in backticks these cannot be used (see the example after the reserved word list below).

Reserved words

In addition to MariaDB Server reserved words, ColumnStore has additional reserved words that cannot be used as table names, column names or user defined variables, functions or stored procedure names.

ACTION, ADD, ALTER, AUTO_INCREMENT, BIGINT, BIT, CASCADE, CHANGE, CHARACTER, CHARSET, CHECK, CLOB, COLUMN, COLUMNS, COMMENT, CONSTRAINT, CONSTRAINTS, CREATE, CURRENT_USER, DATETIME, DEC, DECIMAL, DEFERRED, DEFAULT, DEFERRABLE, DOUBLE, DROP, ENGINE, EXISTS, FOREIGN, FULL, IDB_BLOB, IDB_CHAR, IDB_DELETE, IDB_FLOAT, IDB_INT, IF, IMMEDIATE, INDEX, INITIALLY, INTEGER, KEY, MATCH, MAX_ROWS, MIN_ROWS, MODIFY, NO, NOT, NULL_TOK, NUMBER, NUMERIC, ON, PARTIAL, PRECISION, PRIMARY, REAL, REFERENCES, RENAME, RESTRICT, SESSION_USER, SET, SMALLINT, SYSTEM_USER, TABLE, TIME, TINYINT, TO, TRUNCATE, UNIQUE, UNSIGNED, UPDATE, USER, VARBINARY, VARCHAR, VARYING, WITH, ZONE
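
A hedged illustration of the backtick note above, using MATCH from the reserved word list (the table names are hypothetical):

CREATE TABLE `match` (id INT) ENGINE=ColumnStore;        -- rejected: MATCH is reserved by ColumnStore, even in backticks
CREATE TABLE match_results (id INT) ENGINE=ColumnStore;  -- accepted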

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Non-Distributed Post-Processed Functions

ColumnStore supports all MariaDB functions that can be used in a post-processing manner where data is returned by ColumnStore first and then MariaDB executes the function on the data returned. The functions are currently supported only in the projection (SELECT) and ORDER BY portions of the SQL statement.

See also

  • ColumnStore Distributed Functions

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Operating Mode

ColumnStore has the ability to support full MariaDB query syntax through an operating mode. This operating mode may be set as a default for the instance or set at the session level. To set the operating mode at the session level, the following command is used. Once the session has ended, any subsequent session will return to the default for the instance.

set infinidb_vtable_mode = n

where n is:

  • 0 - a generic, highly compatible row-by-row processing mode. Some WHERE clause components can be processed by ColumnStore, but joins are processed entirely by mysqld using a nested-loop join mechanism.

  • 1 - (the default) query syntax is evaluated by ColumnStore for compatibility with distributed execution and incompatible queries are rejected. Queries executed in this mode take advantage of distributed execution and typically result in higher performance.

  • 2 - auto-switch mode: ColumnStore will attempt to process the query internally; if it cannot, it will automatically switch the query to run in row-by-row mode.
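
For example, a hedged sketch of switching a session to auto-switch mode and then running a query (the table name is illustrative):

set infinidb_vtable_mode = 2;
select count(*) from orders;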

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore User Defined Aggregate and Window Functions

  1. Introduction

  2. Using user defined aggregate functions

  3. Developing a new function

  4. Limitations

Introduction

Starting with MariaDB ColumnStore 1.1, the ability to create and use user defined aggregate and window functions is supported in addition to scalar functions. With Columnstore 1.2, multiple parameters are supported. A C++ SDK is provided as well as 3 reference examples that provide additional functions that may be of general use:

  • median - mathematical median, equivalent to percentile_cont(0.5)

  • avg_mode - mathematical mode, i.e. the most frequent value in the set

  • ssq - sum of squares, i.e. the sum of each individual number squared in the set

Similar to built-in functions, the SDK supports distributed aggregate execution, where as much of the calculation as possible is scaled out across PM nodes and then collected and finalized on the UM node. Window functions (due to the ordering requirement) are only executed at the UM level.

Using user defined aggregate functions

The reference examples above are included in the standard build of MariaDB ColumnStore and so can be used by registering them as user defined aggregate functions. The same can be done for new functions, assuming the instance has the updated libraries installed. From a mcsmysql prompt:

CREATE AGGREGATE FUNCTION median returns REAL soname 'libudf_mysql.so';
CREATE AGGREGATE FUNCTION avg_mode returns REAL soname 'libudf_mysql.so';
CREATE AGGREGATE FUNCTION ssq returns REAL soname 'libudf_mysql.so';

After this, these may be used in the same way as any other aggregate or window function, such as SUM:

SELECT grade, 
AVG(loan_amnt) avg, 
MEDIAN(loan_amnt) median 
FROM loanstats 
GROUP BY grade 
ORDER BY grade;

Developing a new function

This requires a MariaDB ColumnStore source tree and necessary tools to compile C/C++ code. The SDK and reference examples are available in the utils/udfsdk directory of the source tree. This contains the SDK documentation which is also available here:

  • 1.2.x UDAF SDK Guide

Limitations

  • The implementation of the median and avg_mode functions will scale in memory consumption to the size of the set of unique values in the aggregation.

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore User Defined Functions

  1. Introduction

  2. Developing a user defined function

  3. MariaDB server UDF implementation

  4. ColumnStore distributed UDF implementation

  5. Deploying and using a UDF

Introduction

MariaDB provides extensibility support through user defined functions. For more details on the MariaDB server framework see the user-defined-functions article. This documentation applies to MariaDB ColumnStore version 1.0.10 and above.

MariaDB ColumnStore provides scale out query processing and as such requires a separate distributed implementation of each SQL function. This allows for the function application to happen on each PM server node providing distributed scale out performance.

Thus, to fully implement a user defined function for MariaDB ColumnStore requires implementing 2 different APIs:

  • The MariaDB server UDF API: This allows utilization on all storage engines and is the implementation used if applied in the select list.

  • The ColumnStore distributed UDF API: This enables distributed execution of where clause and group by usage of the function and will be pushed down to PM nodes for execution where possible.

MariaDB ColumnStore supports user defined function implementations in C/C++. User defined aggregate and window functions are not supported in ColumnStore 1.0.

Developing a user defined function

The development kit can be found under the utils/udfsdk directory of the mariadb-columnstore-engine source tree. Developing a user defined function requires you to set up a development environment and be comfortable with C++ development. To set up a ColumnStore development environment, please follow the instructions on dependencies in the ColumnStore server fork repository.

Three main files will need to be modified in order to add a new UDF:

  • udfmysql.cpp : mariadb server UDF implementation

  • udfsdk.h : class headers.

  • udfsdk.cpp : distributed columnstore UDF implementation.

Two reference implementations are provided to provide guidance on creating your own functions:

  • MCS_IsNull: illustrates a simple one-argument function that returns a Boolean indicating whether the expression parameter is null.

  • MCS_Add: illustrates a simple two-argument function that adds two values and returns the sum.

It is simplest to copy these and adapt to your needs. There are no system dependencies on the included reference implementations so these can be removed to simplify the class files if preferred.

MariaDB server UDF implementation

Three functions are required to be implemented, where 'x' is the function name (for more details see user-defined-functions):

  • x_init: perform any parameter validation or setup, such as memory allocation.

  • x: perform the actual function implementation.

  • x_deinit: perform any clean-up tasks, such as deallocating memory.

ColumnStore distributed UDF implementation

The function name and class must be registered in order to be recognized and used by the ColumnStore primitive processor. This is done by adding a line to perform the registration in the UDFSDK::UDFMap() function in the file udfsdk.cpp:

FuncMap UDFSDK::UDFMap() const
{
	FuncMap fm;
	
	// first: function name
	// second: Function pointer
	// please use lower case for the function name, because names may be
	// case-insensitive in MariaDB depending on the setting. In that case,
	// the function names passed to the interface are always in lower case.
	fm["mcs_add"] = new MCS_add();
	fm["mcs_isnull"] = new MCS_isnull();

	return fm;
}

For any new user defined functions add a new entry into the FuncMap object mapping the name to the udf class.

The UDF class should be defined in the file udfsdk.h and implemented in the file udfsdk.cpp. It is easiest to adapt the example classes for a new instance, but each class must implement the funcexp::Func C++ class definition:

  • constructor: any initialization necessary

  • destructor: any de-initialization.

  • getOperationType: this performs parameter validation and returns the result data type.

  • getVal : computes and returns the value of the user defined function for each given return datatype.

The code changes can be built using make within the directory utils/udfsdk; this will create the following libraries, containing the compiled code, in the same directory:

  • libudf_mysql.so.1.0.0

  • libudfsdk.so.1.0.0

Deploying and using a UDF

The two libraries created above must be deployed to the /usr/local/mariadb/columnstore/lib directory (or the equivalent lib directory in a non-root install), replacing the existing files. Symbolic links in the MariaDB server directory point to these but should be validated. Run the following as root from the utils/udfsdk directory in the build tree (specifying a password to restartSystem if needed for a multi-server cluster):

$ cp libudf_mysql.so.1.0.0 libudfsdk.so.1.0.0 /usr/local/mariadb/columnstore/lib/
$ ls -l /usr/local/mariadb/columnstore/mysql/lib/plugin/libudf_mysql.so
lrwxrwxrwx. 1 root root 56 Jul 19 09:47 /usr/local/mariadb/columnstore/mysql/lib/plugin/libudf_mysql.so -> /usr/local/mariadb/columnstore/lib/libudf_mysql.so.1.0.0

Repeat this for each ColumnStore UM and PM node in the cluster and then restart ColumnStore to make the libraries available.

After restarting the system the UDF must be registered with the MariaDB server to be usable:

$ mcsmysql
    > create function mcs_add returns integer soname 'libudf_mysql.so';

The mcs_add function can then be used. To verify correct installation, check that it can be used both in the select list and in the WHERE clause:

MariaDB [test]> create function mcs_add returns integer soname 'libudf_mysql.so';
Query OK, 0 rows affected (0.01 sec)

MariaDB [test]> create table t1(i1 int, i2 int) engine=columnstore;
Query OK, 0 rows affected (0.58 sec)

MariaDB [test]> insert into t1 values (1,1), (2,2);
Query OK, 2 rows affected (0.24 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> select i1, i2, mcs_add(i1,i2) sum from t1;
+------+------+------+
| i1   | i2   | sum  |
+------+------+------+
|    1 |    1 |    2 |
|    2 |    2 |    4 |
+------+------+------+
2 rows in set (0.05 sec)

MariaDB [test]> select i1, i2 from t1 where mcs_add(i1,i2) = 4;
+------+------+
| i1   | i2   |
+------+------+
|    2 |    2 |
+------+------+
1 row in set (0.02 sec)

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Utility Functions

MariaDB ColumnStore Utility Functions are a set of simple functions that return useful information about the system, such as whether it is ready for queries.

| Function | Description |
|---|---|
| mcsSystemReady() | Returns 1 if the system can accept queries, 0 if it is not ready yet. |
| mcsSystemReadOnly() | Returns 1 if ColumnStore is in a write-suspended mode, that is, a user executed SuspendDatabaseWrites. Returns 2 if in a read-only state; ColumnStore puts itself into a read-only state if it detects a logic error that may have corrupted data (generally meaning a ROLLBACK operation failed). Returns 0 if the system is writable. |
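
For example, a quick hedged check of the system state from a client session:

select mcsSystemReady(), mcsSystemReadOnly();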

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Window Functions

  1. Introduction

  2. Syntax

  3. Supported Functions

  4. Note

  5. Examples

  6. Example Schema

  7. Cumulative Sum and Running Max Example

  8. Partitioned Cumulative Sum and Running Max Example

  9. Ranking / Top Results

  10. First and Last Values

  11. Prior and Next Example

  12. Quartiles Example

  13. Percentile Example

Introduction

MariaDB ColumnStore provides support for window functions broadly following the SQL 2003 specification. A window function allows for calculations relating to a window of data surrounding the current row in a result set. This capability provides for simplified queries in support of common business questions such as cumulative totals, rolling averages, and top 10 lists.

Aggregate functions can be utilized as window functions; however, they differ in behavior from a GROUP BY query because the rows remain ungrouped. This provides support for cumulative sums and rolling averages, for example.

Two key concepts for window functions are Partition and Frame:

  • A Partition is a group of rows, or window, that have the same value for a specific column; for example, a Partition can be created over a time period such as a quarter, or over lookup values.

  • The Frame for each row is a subset of the row's Partition. The frame typically is dynamic allowing for a sliding frame of rows within the Partition. The Frame determines the range of rows for the windowing function. A Frame could be defined as the last X rows and next Y rows all the way up to the entire Partition.

Window functions are applied after joins, group by, and having clauses are calculated.

Syntax

A window function is applied in the select clause using the following syntax:

function_name ([expression [, expression ... ]]) OVER ( window_definition )

where window_definition is defined as:

[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

PARTITION BY:

  • Divides the window result set into groups based on one or more expressions.

  • An expression may be a constant, column, or non-window-function expression.

  • A query is not limited to a single partition by clause. Different partition clauses can be used across different window function applications.

  • The partition by columns do not need to be in the select list but do need to be available from the query result set.

  • If there is no PARTITION BY clause, all rows of the result set define the group.

ORDER BY

  • Defines the ordering of values within the partition.

  • Can be ordered by multiple keys which may be a constant, column or non window function expression.

  • The order by columns do not need to be in the select list but need to be available from the query result set.

  • Use of a select column alias from the query is not supported.

  • ASC (default) and DESC options allow for ordering ascending or descending.

  • The NULLS FIRST and NULLS LAST options specify whether null values come first or last in the ordering sequence. NULLS FIRST is the default for ASC order, and NULLS LAST is the default for DESC order.

and the optional frame_clause is defined as:

{ RANGE | ROWS } frame_start
{ RANGE | ROWS } BETWEEN frame_start AND frame_end

and the optional frame_start and frame_end are defined as (value being a numeric expression):

UNBOUNDED PRECEDING
value PRECEDING
CURRENT ROW
value FOLLOWING
UNBOUNDED FOLLOWING

RANGE/ROWS:

  • Defines the windowing clause for calculating the set of rows that the function applies to for calculating a given rows window function result.

  • Requires an ORDER BY clause to define the row order for the window.

  • ROWS specifies the window in physical units, i.e. result set rows, and must be a constant or expression evaluating to a positive numeric value.

  • RANGE specifies the window as a logical offset. If the expression evaluates to a numeric value then the ORDER BY expression must be a numeric or DATE type. If the expression evaluates to an interval value then the ORDER BY expression must be a DATE data type.

  • UNBOUNDED PRECEDING indicates the window starts at the first row of the partition.

  • UNBOUNDED FOLLOWING indicates the window ends at the last row of the partition.

  • CURRENT ROW specifies that the window starts or ends at the current row or value.

  • If omitted, the default is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
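
For instance, a hedged sketch of a 3-row rolling average using an explicit frame (it assumes the opportunities table defined in the Examples section below):

select closeDate,
amount,
avg(amount) over (order by closeDate rows between 2 preceding and current row) rolling_avg
from opportunities
where stageName='ClosedWon';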

Supported Functions

| Function | Description |
|---|---|
| AVG() | The average of all input values. |
| COUNT() | Number of input rows. |
| CUME_DIST() | Calculates the cumulative distribution, or relative rank, of the current row to other rows in the same partition. Number of peer or preceding rows / number of rows in partition. |
| DENSE_RANK() | Ranks items in a group leaving no gaps in ranking sequence when there are ties. |
| FIRST_VALUE() | The value evaluated at the row that is the first row of the window frame (counting from 1); null if no such row. |
| LAG() | The value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, returns default instead. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null. LAG provides access to more than one row of a table at the same time without a self-join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position. |
| LAST_VALUE() | The value evaluated at the row that is the last row of the window frame (counting from 1); null if no such row. |
| LEAD() | Provides access to a row at a given physical offset beyond the current position. Returns the value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, returns default instead. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null. |
| MAX() | Maximum value of expression across all input values. |
| MEDIAN() | An inverse distribution function that assumes a continuous distribution model. It takes a numeric or datetime value and returns the middle value or an interpolated value that would be the middle value once the values are sorted. Nulls are ignored in the calculation. Not available in MariaDB ColumnStore 1.1. |
| MIN() | Minimum value of expression across all input values. |
| NTH_VALUE() | The value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row. |
| NTILE() | Divides an ordered data set into a number of buckets indicated by expr and assigns the appropriate bucket number to each row. The buckets are numbered 1 through expr. The expr value must resolve to a positive constant for each partition. Returns an integer ranging from 1 to the argument value, dividing the partition as equally as possible. |
| PERCENT_RANK() | Relative rank of the current row: (rank - 1) / (total rows - 1). |
| PERCENTILE_CONT() | An inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into that percentile value with respect to the sort specification. Nulls are ignored in the calculation. Not available in MariaDB ColumnStore 1.1. |
| PERCENTILE_DISC() | An inverse distribution function that assumes a discrete distribution model. It takes a percentile value and a sort specification and returns an element from the set. Nulls are ignored in the calculation. Not available in MariaDB ColumnStore 1.1. |
| RANK() | Rank of the current row with gaps; same as row_number of its first peer. |
| REGR_COUNT(ColumnY, ColumnX) | The total number of input rows in which both column Y and column X are non-null. |
| REGR_SLOPE(ColumnY, ColumnX) | The slope of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs. |
| REGR_INTERCEPT(ColumnY, ColumnX) | The y-intercept of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs. |
| REGR_R2(ColumnY, ColumnX) | The square of the correlation coefficient for the linear model. |
| REGR_SXX(ColumnY, ColumnX) | REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs. |
| REGR_SXY(ColumnY, ColumnX) | REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs. |
| REGR_SYY(ColumnY, ColumnX) | REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs. |
| ROW_NUMBER() | Number of the current row within its partition, counting from 1. |
| STDDEV(), STDDEV_POP() | Computes the population standard deviation and returns the square root of the population variance. |
| STDDEV_SAMP() | Computes the cumulative sample standard deviation and returns the square root of the sample variance. |
| SUM() | Sum of expression across all input values. |
| VARIANCE(), VAR_POP() | Population variance of the input values (square of the population standard deviation). |
| VAR_SAMP() | Sample variance of the input values (square of the sample standard deviation). |

Examples

Example Schema

The examples are all based on the following simplified sales opportunity table:

create table opportunities (
id int,
accountName varchar(20),
name varchar(128),
owner varchar(7),
amount decimal(10,2),
closeDate date,
stageName varchar(11)
) engine=columnstore;

Some example values are (thanks to www.mockaroo.com for sample data generation):

| id | accountName | name | owner | amount | closeDate | stageName |
|---|---|---|---|---|---|---|
| 1 | Browseblab | Multi-lateral executive function | Bob | 26444.86 | 2016-10-20 | Negotiating |
| 2 | Mita | Organic demand-driven benchmark | Maria | 477878.41 | 2016-11-28 | ClosedWon |
| 3 | Miboo | De-engineered hybrid groupware | Olivier | 80181.78 | 2017-01-05 | ClosedWon |
| 4 | Youbridge | Enterprise-wide bottom-line Graphic Interface | Chris | 946245.29 | 2016-07-02 | ClosedWon |
| 5 | Skyba | Reverse-engineered fresh-thinking standardization | Maria | 696241.82 | 2017-02-17 | Negotiating |
| 6 | Eayo | Fundamental well-modulated artificial intelligence | Bob | 765605.52 | 2016-08-27 | Prospecting |
| 7 | Yotz | Extended secondary infrastructure | Chris | 319624.20 | 2017-01-06 | ClosedLost |
| 8 | Oloo | Configurable web-enabled data-warehouse | Chris | 321016.26 | 2017-03-08 | ClosedLost |
| 9 | Kaymbo | Multi-lateral web-enabled definition | Bob | 690881.01 | 2017-01-02 | Developing |
| 10 | Rhyloo | Public-key coherent infrastructure | Chris | 965477.74 | 2016-11-07 | Prospecting |

The schema, sample data, and queries are available as an attachment to this article.

Cumulative Sum and Running Max Example

Window functions can be used to achieve cumulative / running calculations on a detail report. In this case a won opportunity report for a 7 day period adds columns to show the accumulated won amount as well as the current highest opportunity amount in preceding rows.

select owner, 
accountName, 
CloseDate, 
amount, 
sum(amount) over (order by CloseDate rows between unbounded preceding and current row) cumeWon, 
max(amount) over (order by CloseDate rows between unbounded preceding and current row) runningMax
from opportunities 
where stageName='ClosedWon' 
and closeDate >= '2016-10-02' and closeDate <= '2016-10-09' 
order by CloseDate;

with example results:

| owner | accountName | CloseDate | amount | cumeWon | runningMax |
|---|---|---|---|---|---|
| Bill | Babbleopia | 2016-10-02 | 437636.47 | 437636.47 | 437636.47 |
| Bill | Thoughtworks | 2016-10-04 | 146086.51 | 583722.98 | 437636.47 |
| Olivier | Devpulse | 2016-10-05 | 834235.93 | 1417958.91 | 834235.93 |
| Chris | Linkbridge | 2016-10-07 | 539977.45 | 2458738.65 | 834235.93 |
| Olivier | Trupe | 2016-10-07 | 500802.29 | 1918761.20 | 834235.93 |
| Bill | Latz | 2016-10-08 | 857254.87 | 3315993.52 | 857254.87 |
| Chris | Avamm | 2016-10-09 | 699566.86 | 4015560.38 | 857254.87 |

Partitioned Cumulative Sum and Running Max Example

The above example can be partitioned, so that the window functions operate over a particular field grouping such as owner and accumulate within that grouping. This is achieved by adding a "partition by owner" clause to the window function.

select owner,  
accountName,  
CloseDate,  
amount,  
sum(amount) over (partition by owner order by CloseDate rows between unbounded preceding and current row) cumeWon,  
max(amount) over (partition by owner order by CloseDate rows between unbounded preceding and current row) runningMax 
from opportunities  
where stageName='ClosedWon' 
and closeDate >= '2016-10-02' and closeDate <= '2016-10-09'  
order by owner, CloseDate;

with example results:

| owner | accountName | CloseDate | amount | cumeWon | runningMax |
|---|---|---|---|---|---|
| Bill | Babbleopia | 2016-10-02 | 437636.47 | 437636.47 | 437636.47 |
| Bill | Thoughtworks | 2016-10-04 | 146086.51 | 583722.98 | 437636.47 |
| Bill | Latz | 2016-10-08 | 857254.87 | 1440977.85 | 857254.87 |
| Chris | Linkbridge | 2016-10-07 | 539977.45 | 539977.45 | 539977.45 |
| Chris | Avamm | 2016-10-09 | 699566.86 | 1239544.31 | 699566.86 |
| Olivier | Devpulse | 2016-10-05 | 834235.93 | 834235.93 | 834235.93 |
| Olivier | Trupe | 2016-10-07 | 500802.29 | 1335038.22 | 834235.93 |

Ranking / Top Results

The rank window function allows for ranking or assigning a numeric order value based on the window function definition. Using the Rank() function will result in the same value for ties / equal values and the next rank value skipped. The Dense_Rank() function behaves similarly except the next consecutive number is used after a tie rather than skipped. The Row_Number() function will provide a unique ordering value. The example query shows the Rank() function being applied to rank sales reps by the number of opportunities for Q4 2016.

select owner, 
wonCount, 
rank() over (order by wonCount desc) rank 
from (
  select owner, 
  count(*) wonCount 
  from opportunities 
  where stageName='ClosedWon' 
  and closeDate >= '2016-10-01' and closeDate < '2016-12-31'  
  group by owner
) t
order by rank;

with example results (note that the query is technically incorrect in using closeDate < '2016-12-31'; however, this creates a tie scenario for illustrative purposes):

| owner | wonCount | rank |
|---|---|---|
| Bill | 19 | 1 |
| Chris | 15 | 2 |
| Maria | 14 | 3 |
| Bob | 14 | 3 |
| Olivier | 10 | 5 |

If the dense_rank function is used the rank values would be 1,2,3,3,4 and for the row_number function the values would be 1,2,3,4,5.

First and Last Values

The first_value and last_value functions allow determining the first and last values of a given range. Combined with a group by this allows summarizing opening and closing values. The example shows a more complex case where detailed information is presented for first and last opportunity by quarter.

select a.year, 
a.quarter, 
f.accountName firstAccountName, 
f.owner firstOwner, 
f.amount firstAmount, 
l.accountName lastAccountName, 
l.owner lastOwner, 
l.amount lastAmount 
from (
  select year, 
  quarter, 
  min(firstId) firstId, 
  min(lastId) lastId 
  from (
    select year(closeDate) year, 
    quarter(closeDate) quarter, 
    first_value(id) over (partition by year(closeDate), quarter(closeDate) order by closeDate rows between unbounded preceding and current row) firstId, 
    last_value(id) over (partition by year(closeDate), quarter(closeDate) order by closeDate rows between current row and unbounded following) lastId 
    from opportunities  where stageName='ClosedWon'
  ) t 
  group by year, quarter order by year,quarter
) a 
join opportunities f on a.firstId = f.id 
join opportunities l on a.lastId = l.id 
order by year, quarter;

with example results:

| year | quarter | firstAccountName | firstOwner | firstAmount | lastAccountName | lastOwner | lastAmount |
|---|---|---|---|---|---|---|---|
| 2016 | 3 | Skidoo | Bill | 523295.07 | Skipstorm | Bill | 151420.86 |
| 2016 | 4 | Skimia | Chris | 961513.59 | Avamm | Maria | 112493.65 |
| 2017 | 1 | Yombu | Bob | 536875.51 | Skaboo | Chris | 270273.08 |

Prior and Next Example

Sometimes it is useful to understand the previous and next values in the context of a given row. The lag and lead window functions provide this capability. By default the offset is one, providing the prior or next value, but a larger offset can also be specified. The example query is a report of opportunities by account name showing the opportunity amount, and the prior and next opportunity amount for that account by close date.

select accountName, 
closeDate,  
amount currentOppAmount, 
lag(amount) over (partition by accountName order by closeDate) priorAmount,
lead(amount) over (partition by accountName order by closeDate) nextAmount
from opportunities 
order by accountName, closeDate 
limit 9;

with example results:

| accountName | closeDate | currentOppAmount | priorAmount | nextAmount |
|---|---|---|---|---|
| Abata | 2016-09-10 | 645098.45 | NULL | 161086.82 |
| Abata | 2016-10-14 | 161086.82 | 645098.45 | 350235.75 |
| Abata | 2016-12-18 | 350235.75 | 161086.82 | 878595.89 |
| Abata | 2016-12-31 | 878595.89 | 350235.75 | 922322.39 |
| Abata | 2017-01-21 | 922322.39 | 878595.89 | NULL |
| Abatz | 2016-10-19 | 795424.15 | NULL | NULL |
| Agimba | 2016-07-09 | 288974.84 | NULL | 914461.49 |
| Agimba | 2016-09-07 | 914461.49 | 288974.84 | 176645.52 |
| Agimba | 2016-09-20 | 176645.52 | 914461.49 | NULL |

Quartiles Example

The NTile window function allows for breaking up a data set into portions, assigning a numeric value to each portion of the range. NTile(4) breaks the data up into quartiles (4 sets). The example query produces a report of all opportunities summarizing the quartile boundaries of amount values.

select t.quartile, 
min(t.amount) min, 
max(t.amount) max 
from (
  select amount, 
  ntile(4) over (order by amount asc) quartile 
  from opportunities 
  where closeDate >= '2016-10-01' and closeDate <= '2016-12-31'
  ) t 
group by quartile 
order by quartile;

With example results:

| quartile | min | max |
|---|---|---|
| 1 | 6337.15 | 287634.01 |
| 2 | 288796.14 | 539977.45 |
| 3 | 540070.04 | 748727.51 |
| 4 | 753670.77 | 998864.47 |

Percentile Example

The percentile functions have a slightly different syntax from other window functions, as can be seen in the example below. These functions can only be applied to numeric values. The argument to the function is the percentile to evaluate. Following 'within group' is the sort expression, which indicates the sort column and optionally the order. Finally, after 'over' is an optional partition by clause; for no partition clause use 'over ()'. The example below utilizes the value 0.5 to calculate the median opportunity amount in the rows. The values sometimes differ because percentile_cont will return the average of the 2 middle rows for an even data set while percentile_disc returns the first encountered in the sort.

select owner,  
accountName,  
CloseDate,  
amount,
percentile_cont(0.5) within group (order by amount) over (partition by owner) pct_cont,
percentile_disc(0.5) within group (order by amount) over (partition by owner) pct_disc
from opportunities  
where stageName='ClosedWon' 
and closeDate >= '2016-10-02' and closeDate <= '2016-10-09'  
order by owner, CloseDate;

With example results:

| owner | accountName | CloseDate | amount | pct_cont | pct_disc |
|---|---|---|---|---|---|
| Bill | Babbleopia | 2016-10-02 | 437636.47 | 437636.4700000000 | 437636.47 |
| Bill | Thoughtworks | 2016-10-04 | 146086.51 | 437636.4700000000 | 437636.47 |
| Bill | Latz | 2016-10-08 | 857254.87 | 437636.4700000000 | 437636.47 |
| Chris | Linkbridge | 2016-10-07 | 539977.45 | 619772.1550000000 | 539977.45 |
| Chris | Avamm | 2016-10-09 | 699566.86 | 619772.1550000000 | 539977.45 |
| Olivier | Devpulse | 2016-10-05 | 834235.93 | 667519.1100000000 | 500802.29 |
| Olivier | Trupe | 2016-10-07 | 500802.29 | 667519.1100000000 | 500802.29 |

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Data Definition Statements

You can use most normal statements from the MariaDB data definition language (DDL) with ColumnStore tables. This section lists DDL that differs for ColumnStore compared to normal MariaDB usage.

ColumnStore Alter Table

  1. Syntax

  2. ADD

  3. Online alter table add column

  4. CHANGE

  5. DROP

  6. RENAME

The ALTER TABLE statement modifies existing tables. It includes adding, deleting, and renaming columns as well as renaming tables.

Syntax

ALTER TABLE tbl_name
    alter_specification [, alter_specification] ...

alter_specification:
    table_option ...
  | ADD [COLUMN] col_name column_definition
  | ADD [COLUMN] (col_name column_definition,...)
  | ALTER [COLUMN] col_name {SET DEFAULT literal | DROP DEFAULT}
  | CHANGE [COLUMN] old_col_name new_col_name column_definition
  | DROP [COLUMN] col_name
  | RENAME [TO] new_tbl_name
 

column_definition:
    data_type
      [NOT NULL | NULL]
      [DEFAULT default_value]
      [COMMENT '[compression=0|1];']

table_options:
    table_option [[,] table_option] ...  (see CREATE TABLE options)


ADD

The ADD clause allows you to add columns to a table. You must specify the data type after the column name. The following statement adds a priority column with an integer datatype to the orders table:

ALTER TABLE orders ADD COLUMN priority INTEGER;

  • Compression level (0 for no compression, 1 for compression) is set at the system level. If a session default exists, it will override the system default. In turn, it can be overridden by the table-level compression comment and, finally, a compression comment at the column level.

Online alter table add column

The ColumnStore engine fully supports online DDL (one session can be adding columns to a table while another session is querying that table). MariaDB ColumnStore provides its own syntax for this, intended for adding columns to a table one at a time only. Do not attempt to use it for any other purpose, and follow the example below as closely as possible.

Scenario: Add an INT column named col7 to the existing table foo:

select calonlinealter('alter table foo add column col7 int;');
alter table foo add column col7 int comment 'schema sync only';

The select statement may take several tens of seconds to run, depending on how many rows are currently in the table. Regardless, other sessions can select against the table during this time (but they won’t be able to see the new column yet). The ALTER TABLE statement will take less than 1 second (depending on how busy MariaDB is), and during this brief time interval, other table reads will be held off.

CHANGE

The CHANGE clause allows you to rename a column in a table.

Notes to CHANGE COLUMN:

  • You cannot currently use CHANGE COLUMN to change the definition of that column.

  • You can only change a single column at a time. The following example renames the order_qty field to quantity in the orders table:

ALTER TABLE orders CHANGE COLUMN order_qty quantity INTEGER;

DROP

The DROP clause allows you to drop columns. All associated data is removed when the column is dropped. You can DROP COLUMN (column_name). The following example alters the orders table to drop the priority column:

ALTER TABLE orders DROP COLUMN priority;

RENAME

The RENAME clause allows you to rename a table. The following example renames the orders table:

ALTER TABLE orders RENAME TO customer_orders;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Alter View

  1. Syntax

Alters the definition of a view. CREATE OR REPLACE VIEW may also be used to alter the definition of a view.

Syntax

CREATE
    [OR REPLACE]
    VIEW view_name [(column_list)]
    AS select_statement

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Create Procedure

  1. Syntax

Creates a stored routine in ColumnStore.

Syntax

CREATE
    [DEFINER = { user | CURRENT_USER }]
    PROCEDURE sp_name ([proc_parameter[,...]])
    [characteristic ...] routine_body

proc_parameter:
    [ IN | OUT | INOUT ] param_name type

type:
    Any valid MariaDB ColumnStore data type

routine_body:
    Valid SQL procedure statement

ColumnStore currently accepts the definition of stored procedures with only input arguments and a single SELECT query while in Operating Mode = 1 (VTABLE mode). However, while in Operating Mode = 0 (TABLE mode), ColumnStore allows more complex definitions of stored procedures (e.g., OUT parameters, DECLARE, cursors, etc.).

See Operating Mode for information on Operating Modes


The following statements create and call the sp_complex_variable stored procedure:

DELIMITER //

CREATE PROCEDURE sp_complex_variable(IN arg_key INT, IN arg_date DATE)
BEGIN
  SELECT *
  FROM lineitem, orders
  WHERE o_custkey < arg_key
    AND l_partkey < 10000
    AND l_shipdate > arg_date
    AND l_orderkey = o_orderkey
  ORDER BY l_orderkey, l_linenumber;
END //

DELIMITER ;

call sp_complex_variable(1000, '1998-10-10');

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Create Table

A database consists of tables that store user data. You can create multiple columns with the CREATE TABLE statement. The data type follows the column name when adding columns.

Syntax

CREATE TABLE [IF NOT EXISTS] tbl_name
    (create_definition, ...)
    ENGINE=columnstore
    [DEFAULT CHARSET=character-set]
    [COMMENT 'compression=0|1']

CREATE TABLE [IF NOT EXISTS] tbl_name
    { LIKE old_table_name | (LIKE old_table_name) }

create_definition:
    { col_name column_definition }

column_definition:
    data_type
      [NOT NULL | NULL]
      [DEFAULT default_value]
      [COMMENT 'compression=0|1']
      [COMMENT='schema sync only']
      [COMMENT 'autoincrement column_name']

Notes:

  • ColumnStore tables should not be created in the mysql, information_schema, calpontsys, or test databases.

  • ColumnStore stores all object names in lowercase.

  • CREATE TABLE AS SELECT is not supported and will instead create the table in the default storage engine.

  • Compression level (0 for no compression, 1 for compression) is set at the system level. If a session default exists, it will override the system default. In turn, it can be overridden by the table-level compression comment and, finally, a compression comment at the column level.

  • A table is created in the front end only by using a ‘schema sync only’ comment.

  • The column DEFAULT value can be a maximum of 64 characters.

  • For maximum compatibility with external tools, MariaDB ColumnStore will accept the following table attributes; however, these are not implemented within MariaDB ColumnStore:

    • MIN_ROWS

    • MAX_ROWS

    • AUTO_INCREMENT

All of these are ignored by ColumnStore.

The following statement creates a table called "orders" with two columns: "orderkey" with datatype integer and "customer" with datatype varchar:

CREATE TABLE orders (
  orderkey INTEGER, 
  customer VARCHAR(45)
) ENGINE=ColumnStore
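
As a sketch of the 'schema sync only' note above (assuming the table's column files already exist, for example because the table was created on another front end), the same definition can be registered on this front end without creating new data files:

-- front-end registration only; no column files are created
CREATE TABLE orders (
  orderkey INTEGER,
  customer VARCHAR(45)
) ENGINE=ColumnStore COMMENT='schema sync only';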

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Create View

Creates a stored query in MariaDB ColumnStore.

Syntax

CREATE
    [OR REPLACE]
    VIEW view_name [(column_list)]
    AS select_statement

Notes to CREATE VIEW:

  • If you describe a view in MariaDB ColumnStore, the column types reported may not match the actual column types in the underlying tables. This is normal and can be ignored. The following statement creates a customer view of orders with status:

CREATE VIEW v_cust_orders (cust_name, order_number, order_status) as
select c.cust_name, o.ordernum, o.status from customer c, orders o
where c.custnum = o.custnum;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Drop Table

The DROP TABLE statement deletes a table from ColumnStore.

Syntax

DROP TABLE [IF EXISTS] tbl_name [RESTRICT]

The RESTRICT clause limits the table to being dropped in the front end only. This could be useful when the table has been dropped on one user module, and needs to be synced to others.

The following statement drops the orders table on the front end only:

DROP TABLE orders RESTRICT;

See also

  • DROP TABLE

This page is licensed: CC BY-SA / Gnu FDL

DDL statements that differ for ColumnStore

In most cases, a ColumnStore table works just as any other MariaDB table. There are however a few differences.

The following data definition statements (DDL) differ from normal MariaDB DDL when used on ColumnStore tables:

  • DROP TABLE: ColumnStore supports DROP TABLE ... RESTRICT, which drops the table in the front end only.

  • RENAME TABLE: ColumnStore doesn't allow renaming a table between databases.

  • CREATE TABLE: ColumnStore doesn't need indexes, partitions, and many other table and column options. See here for ColumnStore-specific syntax.

  • CREATE INDEX: ColumnStore doesn't need indexes, so an index may not be created on a table that is defined with engine=columnstore.

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Data Manipulation Statements

Learn data manipulation statements for MariaDB ColumnStore. This section covers INSERT, UPDATE, DELETE, and LOAD DATA operations, optimized for efficient handling of large analytical datasets.

ColumnStore Delete

The DELETE statement is used to remove rows from tables.

Syntax

DELETE 
 [FROM] tbl_name 
    [WHERE where_condition]
    [ORDER BY ...]
    [LIMIT row_count]

No disk space is recovered after a DELETE. TRUNCATE and DROP PARTITION can be used to recover space. Alternatively, create a new table containing only the rows you want to keep, drop the original table with DROP TABLE, and then use RENAME TABLE on the new table.

LIMIT will limit the number of rows deleted, which will perform the DELETE more quickly. The DELETE ... LIMIT statement can then be performed multiple times to achieve the same effect as DELETE with no LIMIT.
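
For instance, a large delete can be broken into batches by repeating a statement such as the following (10000 is an arbitrary illustrative batch size) until it affects zero rows:

-- delete at most 10,000 matching rows per execution
DELETE FROM customer WHERE custkey < 2000 LIMIT 10000;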

The following statement deletes customer records with a customer key identification between 1001 and 1999:

DELETE FROM customer 
  WHERE custkey > 1000 AND custkey < 2000;

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Disk-Based Joins

Introduction

Joins are performed in memory unless disk-based joins are enabled via AllowDiskBasedJoin in Columnstore.xml. When a join operation exceeds the memory allocated for query joins, the query is aborted with error code IDB-2001. Disk-based joins enable such queries to use disk for intermediate join data when the memory needed for the join exceeds the limit. Although slower than a fully in-memory join and bounded by the temporary space on disk, this allows such queries to complete.

Note: Disk-based joins do not include aggregation and DML joins.

The following variables in the HashJoin element of the Columnstore.xml configuration file relate to disk-based joins. Columnstore.xml resides in the etc directory of your installation (/usr/local/mariadb/columnstore/etc).

  • AllowDiskBasedJoin—Option to use disk-based joins. Valid values are Y (enabled) or N (disabled). The default is disabled.

  • TempFileCompression—Option to use compression for disk join files. Valid values are Y (use compressed files) or N (use non-compressed files).

  • TempFilePath—The directory path used for the disk joins. By default, this path is the tmp directory for your installation (i.e., /tmp/columnstore_tmp_files/joins/). Files in this directory are created and cleaned up as needed, and the entire directory is removed and recreated by ExeMgr at startup.

Note: When using disk-based joins, it is strongly recommended that TempFilePath reside on its own partition, as the partition may fill up as queries are executed.

Per user join memory limit

In addition to the system-wide flags, the following system variable exists at the SQL global and session levels for managing a per-user memory limit for joins.

  • columnstore_um_mem_limit - A value for memory limit in MB per user. When this limit is exceeded by a join, it will switch to a disk-based join. By default, the limit is not set (value of 0).

For modification at the global level, add the following to the my.cnf file (example: /etc/my.cnf.d/server.cnf), where value is the memory limit in MB per user:

[mysqld]
...
columnstore_um_mem_limit = value

For modification at the session level, before issuing your join query from the SQL client, set the session variable as follows.

set columnstore_um_mem_limit = value
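
For example, to cap per-user join memory at 4 GB for the current session (4096 is an arbitrary illustrative value):

-- joins in this session that need more than 4096 MB switch to disk-based joins
SET columnstore_um_mem_limit = 4096;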

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Insert

The INSERT statement allows you to add data to tables.

Syntax

INSERT 
 INTO tbl_name [(col,...)]
 {VALUES | VALUE} ({expr | DEFAULT},...),(...),...

The following statement inserts a row with all column values into the customer table:

INSERT INTO customer (custno, custname, custaddress, phoneno, cardnumber, comments) 
  VALUES (12, 'JohnSmith', '100 First Street, Dallas', '(214) 555-1212', 100, 'On Time');

The following statement inserts two rows with all column values into the customer table:

INSERT INTO customer (custno, custname, custaddress, phoneno, cardnumber, comments) VALUES 
  (12, 'JohnSmith', '100 First Street, Dallas', '(214) 555-1212', 100, 'On Time'),
  (13, 'John Q Public', '200 Second Street, Dallas', '(972) 555-1234', 200, 'LatePayment');

INSERT SELECT

With INSERT ... SELECT, you can quickly insert many rows into a table from one or more other tables.

  • ColumnStore ignores the ON DUPLICATE KEY clause.

  • Non-transactional INSERT ... SELECT is directed to ColumnStore's cpimport tool by default, which significantly increases performance.

  • Transactional INSERT ... SELECT statements (that is, with AUTOCOMMIT off or after a START TRANSACTION) are processed through normal DML processes.
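
For example, a minimal sketch with a hypothetical customer_archive table (AUTOCOMMIT on, so the statement takes the cpimport path described above):

-- hypothetical target table; non-transactional, so cpimport handles the load
INSERT INTO customer_archive (custno, custname)
  SELECT custno, custname FROM customer WHERE custno < 1000;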

Autoincrement

Example: the table comment designates id as the autoincrement column, so the INSERT statements below omit it and ColumnStore generates its values.

create table autoinc_test(
id int,
name varchar(10))
engine=columnstore comment 'autoincrement=id';

INSERT INTO autoinc_test (name) VALUES ('John');
INSERT INTO autoinc_test (name) VALUES ('Doe');

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore LOAD DATA INFILE

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. The file name must be given as a literal string.

LOAD DATA [LOCAL] INFILE 'file_name' 
  INTO TABLE tbl_name
  [CHARACTER SET charset_name]
  [{FIELDS | COLUMNS}
    [TERMINATED BY 'string']
    [[OPTIONALLY] ENCLOSED BY 'char']
    [ESCAPED BY 'char']
  ]
  [LINES
    [STARTING BY 'string']
    [TERMINATED BY 'string']
  ]

  • ColumnStore ignores the ON DUPLICATE KEY clause.

  • Non-transactional LOAD DATA INFILE is directed to ColumnStore's cpimport tool by default, which significantly increases performance.

  • Transactional LOAD DATA INFILE statements (that is, with AUTOCOMMIT off or after a START TRANSACTION) are processed through normal DML processes.

  • Use cpimport for importing UTF-8 data that contains multi-byte values.

The following example loads data into a simple 5-column table. A file named /simpletable.tbl has the following data in it:

1|100|1000|10000|Test Number 1|
2|200|2000|20000|Test Number 2|
3|300|3000|30000|Test Number 3|

The data can then be loaded into the simpletable table with the following syntax:

LOAD DATA INFILE 'simpletable.tbl' INTO TABLE simpletable FIELDS TERMINATED BY '|';

If the default mode is set to use cpimport internally, any output error files are written to the /var/log/mariadb/columnstore/cpimport/ directory, which can be consulted when troubleshooting reported errors.

See Also

LOAD DATA INFILE

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Select

The SELECT statement is used to query the database and display table data. You can add many clauses to filter the data.

Syntax

SELECT
    [ALL | DISTINCT ]
    select_expr [, select_expr ...]
    [ FROM table_references
      [WHERE where_condition]
      [GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP]]
      [HAVING where_condition]
      [ORDER BY {col_name | expr | position} [ASC | DESC], ...]
      [LIMIT {[offset,] row_count | row_count OFFSET offset}]
      [PROCEDURE procedure_name(argument_list)]
      [INTO OUTFILE 'file_name' [CHARACTER SET charset_name] [export_options]
         | INTO DUMPFILE 'file_name' | INTO var_name [, var_name] ]
export_options:
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]

Projection List (SELECT)

If the same column needs to be referenced more than once in the projection list, a unique name is required for each column using a column alias. The total length of the name of a column, inclusive of the length of functions, in the projection list must be 64 characters or less.
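
For example, a sketch using the lineitem table referenced elsewhere on this page, where the same column appears twice in the projection list and each occurrence gets its own alias:

-- each occurrence of quantity has a unique alias
SELECT quantity AS qty, quantity + 1 AS qty_plus_one
FROM lineitem
WHERE orderkey < 1000000;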

WHERE

The WHERE clause filters data retrieval based on criteria. Note that column_alias cannot be used in the WHERE clause. The following statement returns rows in the region table where the region = ‘ASIA’:

SELECT * FROM region WHERE name = 'ASIA';

GROUP BY

GROUP BY groups data based on values in one or more specific columns. The following statement returns rows from the lineitem table where orderkey is less than 1,000,000 and groups them by quantity.

SELECT quantity, count(*) FROM lineitem WHERE orderkey < 1000000 GROUP BY quantity;

HAVING

HAVING is used in combination with the GROUP BY clause. It can be used in a SELECT statement to filter the records that a GROUP BY returns. The following statement returns shipping dates and their row counts where the count is 2,500 or more.

SELECT shipdate, count(*) FROM lineitem GROUP BY shipdate HAVING count(*) >= 2500;

ORDER BY

The ORDER BY clause presents results in a specific order. Note that the ORDER BY clause represents a statement that is post-processed by MariaDB. The following statement returns an ordered quantity column from the lineitem table.

SELECT quantity FROM lineitem WHERE orderkey < 1000000 order by quantity;

The following statement returns an ordered shipmode column from the lineitem table.

Select shipmode from lineitem where orderkey < 1000000 order by 1;

NOTE: When ORDER BY is used in an inner query and LIMIT on an outer query, LIMIT is applied first and then ORDER BY is applied when returning results.

UNION

UNION is used to combine the results from multiple SELECT statements into a single result set. The UNION or UNION DISTINCT clause returns the query results from multiple queries in one display and discards duplicate results. The UNION ALL clause displays query results from multiple queries and does not discard duplicates. The following statement returns the p_name rows in the part table and the partno table and discards duplicate results:

SELECT p_name FROM part UNION select p_name FROM  partno;

The following statement returns all the p_name rows in the part table and the partno table:

SELECT p_name FROM part UNION ALL select p_name FROM  partno;

LIMIT

A limit is used to constrain the number of rows returned by the SELECT statement. LIMIT can have up to two arguments. LIMIT must contain a row count and may optionally contain an offset of the first row to return (the initial row is 0). The following statement returns 5 customer keys from the customer table:

SELECT custkey from customer limit 5;

The following statement returns 5 customer keys from the customer table beginning at offset 1000:

SELECT custkey from customer limit 1000,5;

NOTE: When LIMIT is applied on a nested query's results, and the inner query contains ORDER BY, LIMIT is applied first, and then ORDER BY is applied.

This page is licensed: CC BY-SA / Gnu FDL

ColumnStore Update

The UPDATE statement changes data stored in rows.

Syntax

Single-table syntax:

UPDATE  table_reference 
  SET col1={expr1|DEFAULT} [,col2={expr2|DEFAULT}] ...
  [WHERE where_condition]
  [ORDER BY ...]
  [LIMIT row_count]

Multiple-table syntax:

UPDATE table_references
    SET col1={expr1|DEFAULT} [, col2={expr2|DEFAULT}] ...
    [WHERE where_condition]

Note: Only one table (but multiple columns) from the table list in table_references can be updated.
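
For example, a minimal single-table sketch using the hypothetical customer table from earlier examples on this page:

-- set a comment for one customer row (hypothetical column values)
UPDATE customer
  SET comments = 'Preferred'
  WHERE custno = 12;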

This page is licensed: CC BY-SA / Gnu FDL