Learn data manipulation statements for MariaDB ColumnStore. This section covers INSERT, UPDATE, DELETE, and LOAD DATA operations, optimized for efficient handling of large analytical datasets.
The DELETE statement is used to remove rows from tables.
DELETE
[FROM] tbl_name
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
No disk space is recovered after a DELETE. TRUNCATE and DROP PARTITION can be used to recover space. Alternatively, use CREATE TABLE to create a new table, load only the remaining rows into it, use DROP TABLE on the original table, and then use RENAME TABLE to give the new table the original name.
LIMIT restricts the number of rows deleted, so each DELETE completes more quickly. The DELETE ... LIMIT statement can then be run multiple times to achieve the same effect as a DELETE with no LIMIT.
The following statement deletes customer records with a customer key identification between 1001 and 1999:
DELETE FROM customer
WHERE custkey > 1000 AND custkey < 2000;
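The repeated DELETE ... LIMIT approach described above can be sketched as a simple loop. This is only an illustration, not ColumnStore-specific code: it uses Python's built-in sqlite3 module as a stand-in database, and because stock SQLite builds do not accept DELETE ... LIMIT, the batch is emulated with a rowid subquery. The function name delete_in_batches and the batch size are hypothetical.

```python
import sqlite3

def delete_in_batches(conn, batch_size):
    """Delete rows with 1000 < custkey < 2000 in chunks of batch_size.

    Returns the number of rows removed by each batch.
    """
    counts = []
    while True:
        # SQLite stand-in for:
        #   DELETE FROM customer WHERE custkey > 1000 AND custkey < 2000 LIMIT n
        cur = conn.execute(
            "DELETE FROM customer WHERE rowid IN ("
            " SELECT rowid FROM customer"
            " WHERE custkey > 1000 AND custkey < 2000 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left that matches the condition
        counts.append(cur.rowcount)
    return counts

# Hypothetical sample data: customer keys 900..2099
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (custkey INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?)", [(k,) for k in range(900, 2100)])
conn.commit()

batches = delete_in_batches(conn, 400)
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

The loop stops when a batch deletes no rows, so the end state matches a single DELETE with no LIMIT.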
This page is licensed: CC BY-SA / Gnu FDL
Joins are performed in memory unless disk-based joins are enabled via AllowDiskBasedJoin in Columnstore.xml. When a join operation exceeds the memory allocated for query joins, the query is aborted with error code IDB-2001. Disk-based joins let such queries use disk for intermediate join data when the memory needed for the join exceeds the memory limit. Although slower than a fully in-memory join and bound by the temporary space available on disk, a disk-based join allows such queries to complete.
The following variables in the HashJoin element of the Columnstore.xml configuration file relate to disk-based joins. Columnstore.xml resides in the etc directory of your installation (/usr/local/mariadb/columnstore/etc).
AllowDiskBasedJoin: Option to use disk-based joins. Valid values are Y (enabled) or N (disabled). The default is disabled.
TempFileCompression: Option to use compression for disk join files. Valid values are Y (use compressed files) or N (use non-compressed files).
TempFilePath: The directory path used for the disk joins. By default, this path is the tmp directory for your installation (i.e., /tmp/columnstore_tmp_files/joins/). Files in this directory are created and cleaned up on an as-needed basis. The entire directory is removed and recreated by ExeMgr at startup.
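As a rough sketch of how the three HashJoin settings above fit together, the following Python snippet edits a trimmed XML fragment with the standard library. The fragment is hypothetical: the real Columnstore.xml contains many more elements, and the root element name used here is an assumption. The helper name enable_disk_joins is also made up for illustration, and applying a change to a running system is not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for Columnstore.xml.
# Only the HashJoin element and its three settings come from the text above;
# the root element name is an assumption.
SAMPLE = """<Columnstore>
  <HashJoin>
    <AllowDiskBasedJoin>N</AllowDiskBasedJoin>
    <TempFileCompression>Y</TempFileCompression>
    <TempFilePath>/tmp/columnstore_tmp_files/joins/</TempFilePath>
  </HashJoin>
</Columnstore>"""

def enable_disk_joins(xml_text, compress=True):
    """Return xml_text with disk-based joins switched on."""
    root = ET.fromstring(xml_text)
    hash_join = root.find("HashJoin")
    hash_join.find("AllowDiskBasedJoin").text = "Y"
    hash_join.find("TempFileCompression").text = "Y" if compress else "N"
    return ET.tostring(root, encoding="unicode")

updated = enable_disk_joins(SAMPLE, compress=False)
# Collect the resulting settings as a dict for inspection.
settings = {child.tag: child.text
            for child in ET.fromstring(updated).find("HashJoin")}
```

TempFilePath is left untouched here; the directory is managed by ExeMgr as described above.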
In addition to the system-wide flags at the SQL global and session levels, the following system variables exist for managing per-user memory limits for joins.
columnstore_um_mem_limit: The memory limit per user, in MB. When a join exceeds this limit, it switches to a disk-based join. By default, the limit is not set (value of 0).
To change the limit at the global level, set it in the my.cnf file (for example, /etc/my.cnf.d/server.cnf):
[mysqld]
...
columnstore_um_mem_limit = value
where value is the in-memory limit per user, in MB.
For modification at the session level, before issuing your join query from the SQL client, set the session variable as follows.
SET columnstore_um_mem_limit = value;
The INSERT statement allows you to add data to tables.
INSERT
INTO tbl_name [(col,...)]
{VALUES | VALUE} ({expr | DEFAULT},...),(...),...
The following statement inserts a row with all column values into the customer table:
INSERT INTO customer (custno, custname, custaddress, phoneno, cardnumber, comments)
VALUES (12, 'JohnSmith', '100 First Street, Dallas', '(214) 555-1212', 100, 'On Time');
The following statement inserts two rows with all column values into the customer table:
INSERT INTO customer (custno, custname, custaddress, phoneno, cardnumber, comments) VALUES
(12, 'JohnSmith', '100 First Street, Dallas', '(214) 555-1212', 100, 'On Time'),
(13, 'John Q Public', '200 Second Street, Dallas', '(972) 555-1234', 200, 'LatePayment');
With INSERT ... SELECT, you can quickly insert many rows into a table from one or more other tables.
ColumnStore ignores the ON DUPLICATE KEY clause.
Non-transactional INSERT ... SELECT is directed to ColumnStore's cpimport tool by default, which significantly increases performance.
Transactional INSERT ... SELECT statements (that is, with AUTOCOMMIT off or after a START TRANSACTION) are processed through normal DML processes.
Example
CREATE TABLE autoinc_test(
id INT,
name VARCHAR(10))
ENGINE=columnstore COMMENT 'autoincrement=id';
INSERT INTO autoinc_test (name) VALUES ('John');
INSERT INTO autoinc_test (name) VALUES ('Doe');
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. The file name must be given as a literal string.
LOAD DATA [LOCAL] INFILE 'file_name'
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
ColumnStore ignores the ON DUPLICATE KEY clause.
Non-transactional LOAD DATA INFILE is directed to ColumnStore's cpimport tool by default, which significantly increases performance.
Transactional LOAD DATA INFILE statements (that is, with AUTOCOMMIT off or after a START TRANSACTION) are processed through normal DML processes.
Use cpimport for importing UTF-8 data that contains multi-byte values.
The following example loads data into a simple 5-column table. A file named simpletable.tbl has the following data in it:
1|100|1000|10000|Test Number 1|
2|200|2000|20000|Test Number 2|
3|300|3000|30000|Test Number 3|
The data can then be loaded into the simpletable table with the following syntax:
LOAD DATA INFILE 'simpletable.tbl' INTO TABLE simpletable FIELDS TERMINATED BY '|';
If the default mode is set to use cpimport internally, any error output files are written to the /var/log/mariadb/columnstore/cpimport/ directory, which can be consulted when troubleshooting any errors reported.
The SELECT statement is used to query the database and display table data. You can add many clauses to filter the data.
SELECT
[ALL | DISTINCT ]
select_expr [, select_expr ...]
[ FROM table_references
[WHERE where_condition]
[GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP]]
[HAVING where_condition]
[ORDER BY {col_name | expr | position} [ASC | DESC], ...]
[LIMIT {[offset,] row_count | row_count OFFSET offset}]
[PROCEDURE procedure_name(argument_list)]
[INTO OUTFILE 'file_name' [CHARACTER SET charset_name] [export_options]
| INTO DUMPFILE 'file_name' | INTO var_name [, var_name] ]
export_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
If the same column needs to be referenced more than once in the projection list, a unique name is required for each column using a column alias. The total length of the name of a column, inclusive of the length of functions, in the projection list must be 64 characters or less.
The WHERE clause filters data retrieval based on criteria. Note that a column alias cannot be used in the WHERE clause. The following statement returns rows in the region table where name = 'ASIA':
SELECT * FROM region WHERE name = 'ASIA';
GROUP BY groups data based on values in one or more specific columns. The following statement returns rows from the lineitem table where orderkey is less than 1,000,000 and groups them by quantity.
SELECT quantity, COUNT(*) FROM lineitem WHERE orderkey < 1000000 GROUP BY quantity;
HAVING is used in combination with the GROUP BY clause. It can be used in a SELECT statement to filter the records that a GROUP BY returns. The following statement returns shipping dates and the row count for each date, keeping only dates where the count is 2500 or more.
SELECT shipdate, COUNT(*) FROM lineitem GROUP BY shipdate HAVING COUNT(*) >= 2500;
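To make the effect of HAVING concrete, here is a small sketch that runs the same kind of query with and without the HAVING filter. It uses Python's built-in sqlite3 module as a stand-in rather than ColumnStore. The sample rows and the threshold of 2 are invented for illustration, and ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lineitem (orderkey INTEGER, shipdate TEXT)")
# Hypothetical sample data: three, one, and two rows per ship date.
rows = [
    (1, "1998-01-01"), (2, "1998-01-01"), (3, "1998-01-01"),
    (4, "1998-01-02"),
    (5, "1998-01-03"), (6, "1998-01-03"),
]
conn.executemany("INSERT INTO lineitem VALUES (?, ?)", rows)

# GROUP BY alone returns one row per ship date.
grouped = conn.execute(
    "SELECT shipdate, COUNT(*) FROM lineitem "
    "GROUP BY shipdate ORDER BY shipdate"
).fetchall()

# HAVING filters the groups after aggregation.
filtered = conn.execute(
    "SELECT shipdate, COUNT(*) FROM lineitem "
    "GROUP BY shipdate HAVING COUNT(*) >= 2 ORDER BY shipdate"
).fetchall()
```

Here grouped contains all three dates, while filtered drops the date that has only one row.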
The ORDER BY clause presents results in a specific order. Note that ORDER BY is post-processed by MariaDB. The following statement returns an ordered quantity column from the lineitem table.
SELECT quantity FROM lineitem WHERE orderkey < 1000000 ORDER BY quantity;
The following statement returns an ordered shipmode column from the lineitem table.
SELECT shipmode FROM lineitem WHERE orderkey < 1000000 ORDER BY 1;
NOTE: When ORDER BY is used in an inner query and LIMIT on an outer query, LIMIT is applied first and then ORDER BY is applied when returning results.
UNION is used to combine the results of multiple SELECT statements into a single result set. The UNION or UNION DISTINCT clause returns the results of multiple queries as one result set and discards duplicate rows. The UNION ALL clause returns the results of multiple queries and does not discard the duplicates. The following statement returns the p_name rows in the part table and the partno table and discards the duplicate results:
SELECT p_name FROM part UNION SELECT p_name FROM partno;
The following statement returns all the p_name rows in the part table and the partno table:
SELECT p_name FROM part UNION ALL SELECT p_name FROM partno;
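The difference between the two statements above can be seen on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in rather than ColumnStore, and the part names are invented. The results are sorted in Python because neither query specifies an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (p_name TEXT)")
conn.execute("CREATE TABLE partno (p_name TEXT)")
# Hypothetical sample data: 'nut' appears in both tables.
conn.executemany("INSERT INTO part VALUES (?)", [("bolt",), ("nut",)])
conn.executemany("INSERT INTO partno VALUES (?)", [("nut",), ("washer",)])

# UNION discards the duplicate 'nut'.
distinct_names = sorted(
    r[0] for r in conn.execute(
        "SELECT p_name FROM part UNION SELECT p_name FROM partno"
    )
)

# UNION ALL keeps every row from both queries.
all_names = sorted(
    r[0] for r in conn.execute(
        "SELECT p_name FROM part UNION ALL SELECT p_name FROM partno"
    )
)
```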
LIMIT is used to constrain the number of rows returned by the SELECT statement. LIMIT can have up to two arguments. LIMIT must contain a row count and may optionally contain an offset of the first row to return (the initial row is 0). The following statement returns 5 customer keys from the customer table:
SELECT custkey FROM customer LIMIT 5;
The following statement returns 5 customer keys from the customer table beginning at offset 1000:
SELECT custkey FROM customer LIMIT 1000,5;
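The two LIMIT forms from the syntax (offset, row_count and row_count OFFSET offset) can be compared side by side. This sketch uses Python's built-in sqlite3 module as a stand-in rather than ColumnStore; SQLite accepts the same two forms. The table contents and the offset of 10 are invented, and ORDER BY is added only so the rows come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (custkey INTEGER)")
# Hypothetical sample data: customer keys 1..20
conn.executemany("INSERT INTO customer VALUES (?)", [(k,) for k in range(1, 21)])

def keys(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Row count only: the first 5 rows.
first_five = keys("SELECT custkey FROM customer ORDER BY custkey LIMIT 5")

# Offset first, then row count: skip 10 rows, return the next 5.
comma_form = keys("SELECT custkey FROM customer ORDER BY custkey LIMIT 10,5")

# Equivalent spelling with the OFFSET keyword.
offset_form = keys("SELECT custkey FROM customer ORDER BY custkey LIMIT 5 OFFSET 10")
```

Because the initial row is offset 0, an offset of 10 starts at the eleventh row.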
NOTE: When LIMIT is applied on a nested query's results, and the inner query contains ORDER BY, LIMIT is applied first, and then ORDER BY is applied.
The UPDATE statement changes data stored in rows.
Single-table syntax:
UPDATE table_reference
SET col1={expr1|DEFAULT} [,col2={expr2|DEFAULT}] ...
[WHERE where_condition]
[ORDER BY ...]
[LIMIT row_count]
Multiple-table syntax:
UPDATE table_references
SET col1={expr1|DEFAULT} [, col2={expr2|DEFAULT}] ...
[WHERE where_condition]