All pages
Powered by GitBook
1 of 9

Subqueries

Learn about subqueries in MariaDB Server. This section details how to use nested queries to perform complex data retrieval, filtering, and manipulation operations within a single SQL statement.

Subqueries and ALL

Subqueries using the ALL keyword will return true if the comparison returns true for each row returned by the subquery, or the subquery returns no rows.

Syntax

scalar_expression comparison_operator ALL <Table subquery>
  • scalar_expression may be any expression that evaluates to a single value.

  • comparison_operator may be any one of: =, >, <, >=, <=, <> or !=

ALL returns:

  • NULL if the comparison operator returns NULL for at least one row returned by the Table subquery or scalar_expression returns NULL.

  • FALSE if the comparison operator returns FALSE for at least one row returned by the Table subquery.

  • TRUE if the comparison operator returns TRUE for all rows returned by the Table subquery, or if Table subquery returns no rows.

NOT IN is an alias for <> ALL.

Examples

CREATE TABLE sq1 (num TINYINT);

CREATE TABLE sq2 (num2 TINYINT);

INSERT INTO sq1 VALUES(100);

INSERT INTO sq2 VALUES(40),(50),(60);

SELECT * FROM sq1 WHERE num > ALL (SELECT * FROM sq2);
+------+
| num  |
+------+
|  100 |
+------+

Since 100 > all of 40,50 and 60, the evaluation is true and the row is returned.

Adding a second row to sq1, where the evaluation for that record is false:

INSERT INTO sq1 VALUES(30);

SELECT * FROM sq1 WHERE num > ALL (SELECT * FROM sq2);
+------+
| num  |
+------+
|  100 |
+------+

Adding a new row to sq2, causing all evaluations to be false:

INSERT INTO sq2 VALUES(120);

SELECT * FROM sq1 WHERE num > ALL (SELECT * FROM sq2);
Empty set (0.00 sec)

When the subquery returns no results, the evaluation is still true:

SELECT * FROM sq1 WHERE num > ALL (SELECT * FROM sq2 WHERE num2 > 300);
+------+
| num  |
+------+
|  100 |
|   30 |
+------+

Evaluating against a NULL will cause the result to be unknown, or not true, and therefore return no rows:

INSERT INTO sq2 VALUES (NULL);

SELECT * FROM sq1 WHERE num > ALL (SELECT * FROM sq2);

This page is licensed: CC BY-SA / Gnu FDL

Subqueries and ANY

Subqueries using the ANY keyword will return true if the comparison returns true for at least one row returned by the subquery.

Syntax

The required syntax for an ANY or SOME quantified comparison is:

scalar_expression comparison_operator ANY <Table subquery>

Or:

scalar_expression comparison_operator SOME <Table subquery>
  • scalar_expression may be any expression that evaluates to a single value.

  • comparison_operator may be any one of =, >, <, >=, <=, <> or !=.

ANY returns:

  • TRUE if the comparison operator returns TRUE for at least one row returned by the Table subquery.

  • FALSE if the comparison operator returns FALSE for all rows returned by the Table subquery, or Table subquery has zero rows.

  • NULL if the comparison operator returns NULL for at least one row returned by the Table subquery and doesn't returns TRUE for any of them, or if scalar_expression returns NULL.

SOME is a synonym for ANY, and IN is a synonym for = ANY .

Examples

CREATE TABLE sq1 (num TINYINT);

CREATE TABLE sq2 (num2 TINYINT);

INSERT INTO sq1 VALUES(100);

INSERT INTO sq2 VALUES(40),(50),(120);

SELECT * FROM sq1 WHERE num > ANY (SELECT * FROM sq2);
+------+
| num  |
+------+
|  100 |
+------+

100 is greater than two of the three values, and so the expression evaluates as true.

SOME is a synonym for ANY:

SELECT * FROM sq1 WHERE num < SOME (SELECT * FROM sq2);
+------+
| num  |
+------+
|  100 |
+------+

IN is a synonym for = ANY, and here there are no matches, so no results are returned:

SELECT * FROM sq1 WHERE num IN (SELECT * FROM sq2);
Empty set (0.00 sec)
INSERT INTO sq2 VALUES(100);
Query OK, 1 row affected (0.05 sec)

SELECT * FROM sq1 WHERE num <> ANY (SELECT * FROM sq2);
+------+
| num  |
+------+
|  100 |
+------+

Reading this query, the results may be counter-intuitive. It may seem to read as SELECT * FROM sq1 WHERE num does not match any results in sq2. Since it does match 100, it could seem that the results are incorrect. However, the query returns a result if the match does not match any of sq2. Since 100 already does not match 40, the expression evaluates to true immediately, regardless of the 100's matching. It may be more easily readable to use SOME in a case such as this:

SELECT * FROM sq1 WHERE num <> SOME (SELECT * FROM sq2);
+------+
| num  |
+------+
|  100 |
+------+

This page is licensed: CC BY-SA / Gnu FDL

Subqueries and EXISTS

Syntax

SELECT ... WHERE EXISTS <Table subquery>

Description

Subqueries using the EXISTS keyword will return true if the subquery returns any rows. Conversely, subqueries using NOT EXISTS will return true only if the subquery returns no rows from the table.

EXISTS subqueries ignore the columns specified by the SELECT of the subquery, since they're not relevant. For example,

SELECT col1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

and

SELECT col1 FROM t1 WHERE EXISTS (SELECT col2 FROM t2);

produce identical results.

Examples

CREATE TABLE sq1 (num TINYINT);

CREATE TABLE sq2 (num2 TINYINT);

INSERT INTO sq1 VALUES(100);

INSERT INTO sq2 VALUES(40),(50),(60);

SELECT * FROM sq1 WHERE EXISTS (SELECT * FROM sq2 WHERE num2>50);
+------+
| num  |
+------+
|  100 |
+------+

SELECT * FROM sq1 WHERE NOT EXISTS (SELECT * FROM sq2 GROUP BY num2 HAVING MIN(num2)=40);
Empty set (0.00 sec)

This page is licensed: CC BY-SA / Gnu FDL

Subqueries and JOINs

A subquery can quite often, but not in all cases, be rewritten as a JOIN.

Rewriting Subqueries as JOINS

A subquery using IN can be rewritten with the DISTINCT keyword, for example:

SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table2);

can be rewritten as:

SELECT DISTINCT table1.* FROM table1, table2 WHERE table1.col1=table2.col1;

NOT IN or NOT EXISTS queries can also be rewritten. For example, these two queries returns the same result:

SELECT * FROM table1 WHERE col1 NOT IN (SELECT col1 FROM table2);
SELECT * FROM table1 WHERE NOT EXISTS (SELECT col1 FROM table2 
         WHERE table1.col1=table2.col1);

and both can be rewritten as:

SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id 
       WHERE table2.id IS NULL;

Subqueries that can be rewritten as a LEFT JOIN are sometimes more efficient.

Using Subqueries instead of JOINS

There are some scenarios, though, which call for subqueries rather than joins:

  • When you want duplicates, but not false duplicates. Suppose Table_1 has three rows — {1,1,2} — and Table_2 has two rows — {1,2,2}. If you need to list the rows in Table_1 which are also in Table_2, only this subquery-based SELECT statement will give the right answer (1,1,2):

SELECT Table_1.column_1 
FROM   Table_1 
WHERE  Table_1.column_1 IN 
  (SELECT Table_2.column_1 
    FROM   Table_2);
  • This SQL statement won't work:

SELECT Table_1.column_1 
FROM   Table_1,Table_2 
WHERE  Table_1.column_1 = Table_2.column_1;
  • because the result will be {1,1,2,2} — and the duplication of 2 is an error. This SQL statement won't work either:

SELECT DISTINCT Table_1.column_1 
FROM   Table_1,Table_2 
WHERE  Table_1.column_1 = Table_2.column_1;
  • because the result will be {1,2} — and the removal of the duplicated 1 is an error too.

  • When the outermost statement is not a query. The SQL statement:

UPDATE Table_1 SET column_1 = (SELECT column_1 FROM Table_2);
  • can't be expressed using a join unless some rare SQL3 features are used.

  • When the join is over an expression. The SQL statement:

SELECT * FROM Table_1 
WHERE column_1 + 5 =
  (SELECT MAX(column_1) FROM Table_2);
  • is hard to express with a join. In fact, the only way we can think of is this SQL statement:

SELECT Table_1.*
FROM   Table_1, 
      (SELECT MAX(column_1) AS max_column_1 FROM Table_2) AS Table_2
WHERE  Table_1.column_1 + 5 = Table_2.max_column_1;
  • which still involves a parenthesized query, so nothing is gained from the transformation.

  • When you want to see the exception. For example, suppose the question is: Which books are longer than Das Kapital? These two queries are effectively almost the same:

SELECT DISTINCT Bookcolumn_1.*                     
FROM   Books AS Bookcolumn_1 JOIN Books AS Bookcolumn_2 USING(page_count) 
WHERE  title = 'Das Kapital';

SELECT DISTINCT Bookcolumn_1.* 
FROM   Books AS Bookcolumn_1 
WHERE  Bookcolumn_1.page_count > 
  (SELECT DISTINCT page_count 
  FROM   Books AS Bookcolumn_2 
  WHERE  title = 'Das Kapital');
  • The difference is between these two SQL statements is, if there are two editions of Das Kapital (with different page counts), then the self-join example will return the books which are longer than the shortest edition of Das Kapital. That might be the wrong answer, since the original question didn't ask for "... longer than ANY book named Das Kapital" (it seems to contain a false assumption that there's only one edition).

This page is licensed: CC BY-SA / Gnu FDL

Subqueries in a FROM Clause (Derived Tables)

Although subqueries are more commonly placed in a WHERE clause, they can also form part of the FROM clause. Such subqueries are commonly called derived tables.

If a subquery is used in this way, you must also use an AS clause to name the result of the subquery.

ORACLE mode

MariaDB starting with 10.6.0

Anonymous subqueries in a FROM clause (no AS clause) are permitted in ORACLE mode.

Anonymous subqueries in a FROM clause (no AS clause) are not permitted in ORACLE mode.

Correlation Column List

MariaDB starting with 11.7.0

It is possible to assign column names in the derived table name syntax element.

It is not possible to assign column names in the derived table name syntax element.

Examples

CREATE TABLE student (name CHAR(10), test CHAR(10), score TINYINT); 

INSERT INTO student VALUES 
  ('Chun', 'SQL', 75), ('Chun', 'Tuning', 73), 
  ('Esben', 'SQL', 43), ('Esben', 'Tuning', 31), 
  ('Kaolin', 'SQL', 56), ('Kaolin', 'Tuning', 88), 
  ('Tatiana', 'SQL', 87), ('Tatiana', 'Tuning', 83);

Assume that, given the data above, you want to return the average total for all students. In other words, the average of Chun's 148 (75+73), Esben's 74 (43+31), etc.

You cannot do the following:

SELECT AVG(SUM(score)) FROM student GROUP BY name;
ERROR 1111 (HY000): Invalid use of group function

A subquery in the FROM clause is however permitted:

SELECT AVG(sq_sum) FROM (SELECT SUM(score) AS sq_sum FROM student GROUP BY name) AS t;
+-------------+
| AVG(sq_sum) |
+-------------+
|    134.0000 |
+-------------+

The following is permitted:

SELECT * FROM (SELECT 1 FROM DUAL), (SELECT 2 FROM DUAL);

In this example, the second column of the derived table dt is used both within (WHERE c2 > 0), and outside, (WHERE a2 > 10), the specification. Both conditions apply to t1.c2.

CREATE OR REPLACE TABLE t1(c1 INT, c2 INT, c3 INT);

SELECT a1, a2 FROM (SELECT c1, c2, c3 FROM t1 WHERE c2 > 0) AS dt (a1, a2, a3);

This page is licensed: CC BY-SA / Gnu FDL

Row Subqueries

A row subquery is a subquery returning a single row, as opposed to a scalar subquery, which returns a single column from a row, or a literal.

Examples

CREATE TABLE staff (name VARCHAR(10), age TINYINT);

CREATE TABLE customer (name VARCHAR(10), age TINYINT);

INSERT INTO staff VALUES ('Bilhah',37), ('Valerius',61), ('Maia',25);

INSERT INTO customer VALUES ('Thanasis',48), ('Valerius',61), ('Brion',51);

SELECT * FROM staff WHERE (name,age) = (SELECT name,age FROM customer WHERE name='Valerius');
+----------+------+
| name     | age  |
+----------+------+
| Valerius |   61 |
+----------+------+

Finding all rows in one table and also in another:

SELECT name,age FROM staff WHERE (name,age) IN (SELECT name,age FROM customer);
+----------+------+
| name     | age  |
+----------+------+
| Valerius |   61 |
+----------+------+

This page is licensed: CC BY-SA / Gnu FDL

Scalar Subqueries

A scalar subquery is a subquery that returns a single value. This is the simplest form of a subquery, and can be used in most places a literal or single column value is valid.

The data type, length and character set and collation are all taken from the result returned by the subquery. The result of a subquery can always be NULL, that is, no result returned. Even if the original value is defined as NOT NULL, this is disregarded.

A subquery cannot be used where only a literal is expected, for example LOAD DATA INFILE expects a literal string containing the file name, and LIMIT requires a literal integer.

Examples

CREATE TABLE sq1 (num TINYINT);

CREATE TABLE sq2 (num TINYINT);

INSERT INTO sq1 VALUES (1);

INSERT INTO sq2 VALUES (10* (SELECT num FROM sq1));

SELECT * FROM sq2;
+------+
| num  |
+------+
|   10 |
+------+

Inserting a second row means the subquery is no longer a scalar, and this particular query is not valid:

INSERT INTO sq1 VALUES (2);

INSERT INTO sq2 VALUES (10* (SELECT num FROM sq1));
ERROR 1242 (21000): Subquery returns more than 1 row

No rows in the subquery, so the scalar is NULL:

INSERT INTO sq2 VALUES (10* (SELECT num FROM sq3 WHERE num='3'));

SELECT * FROM sq2;
+------+
| num  |
+------+
|   10 |
| NULL |
+------+

A more traditional scalar subquery, as part of a WHERE clause:

SELECT * FROM sq1 WHERE num = (SELECT MAX(num)/10 FROM sq2); 
+------+
| num  |
+------+
|    1 |
+------+

This page is licensed: CC BY-SA / Gnu FDL

Subquery Limitations

There are a number of limitations regarding subqueries, which are discussed below.

The following tables and data will be used in the examples that follow:

CREATE TABLE staff(name VARCHAR(10),age TINYINT);

CREATE TABLE customer(name VARCHAR(10),age TINYINT);
INSERT INTO staff VALUES 
('Bilhah',37), ('Valerius',61), ('Maia',25);

INSERT INTO customer VALUES 
('Thanasis',48), ('Valerius',61), ('Brion',51);

ORDER BY and LIMIT

To use ORDER BY or limit LIMIT in subqueries both must be used.. For example:

SELECT * FROM staff WHERE name IN (SELECT name FROM customer ORDER BY name);
+----------+------+
| name     | age  |
+----------+------+
| Valerius |   61 |
+----------+------+

is valid, but

SELECT * FROM staff WHERE name IN (SELECT NAME FROM customer ORDER BY name LIMIT 1);
ERROR 1235 (42000): This version of MariaDB doesn't 
  yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

is not.

Modifying and Selecting from the Same Table

It's not possible to both modify and select from the same table in a subquery. For example:

DELETE FROM staff WHERE name = (SELECT name FROM staff WHERE age=61);
ERROR 1093 (HY000): Table 'staff' is specified twice, both 
  as a target for 'DELETE' and as a separate source for data

Row Comparison Operations

There is only partial support for row comparison operations. The expression in

expr op {ALL|ANY|SOME} subquery,

must be scalar and the subquery can only return a single column.

However, because of the way IN is implemented (it is rewritten as a sequence of = comparisons and AND), the expression in

expression [NOT] IN subquery

is permitted to be an n-tuple and the subquery can return rows of n-tuples.

For example:

SELECT * FROM staff WHERE (name,age) NOT IN (
  SELECT name,age FROM customer WHERE age >=51]
);
+--------+------+
| name   | age  |
+--------+------+
| Bilhah |   37 |
| Maia   |   25 |
+--------+------+

is permitted, but

SELECT * FROM staff WHERE (name,age) = ALL (
  SELECT name,age FROM customer WHERE age >=51
);
ERROR 1241 (21000): Operand should contain 1 column(s)

is not.

Correlated Subqueries

Subqueries in the FROM clause cannot be correlated subqueries. They cannot be evaluated for each row of the outer query since they are evaluated to produce a result set during when the query is executed.

Stored Functions

A subquery can refer to a stored function which modifies data. This is an extension to the SQL standard, but can result in indeterminate outcomes. For example, take:

SELECT ... WHERE x IN (SELECT f() ...);

where f() inserts rows. The function f() could be executed a different number of times depending on how the optimizer chooses to handle the query.

This sort of construct is therefore not safe to use in replication that is not row-based, as there could be different results on the master and the slave.

This page is licensed: CC BY-SA / Gnu FDL