Unicode
Unicode is a standard for encoding text across multiple writing systems. MariaDB supports a number of character sets for storing Unicode data:
Character Set
Description
ucs2
UCS-2, each character is represented by a 2-byte code with the most significant byte first. Fixed-length 16-bit encoding.
utf8
Until MariaDB 10.5, this was a UTF-8 encoding using one to three bytes per character. Basic Latin letters, numbers and punctuation use one byte. European and Middle East letters mostly fit into 2 bytes. Korean, Chinese, and Japanese ideographs use 3-bytes. No supplementary characters are stored. From MariaDB 10.6, utf8 is an alias for utf8mb3, but this can changed to ut8mb4 by changing the default value of the old_mode system variable. From MariaDB 11.8, utf8 is an alias for utf8mb4.
utf8mb3
UTF-8 encoding using one to three bytes per character. Basic Latin letters, numbers and punctuation use one byte. European and Middle East letters mostly fit into 2 bytes. Korean, Chinese, and Japanese ideographs use 3-bytes. No supplementary characters are stored. Until MariaDB 10.5, this was an alias for utf8. From MariaDB 10.6, utf8 is by default an alias for utf8mb3, but this can changed to ut8mb4 by changing the default value of the old_mode system variable.
utf8mb4
UTF-8 encoding the same as utf8mb3 but which stores supplementary characters in four bytes.
utf16
UTF-16, same as ucs2, but stores supplementary characters in 32 bits. 16 or 32-bits.
utf32
UTF-32, fixed-length 32-bit encoding.
Support for the UCA-14.0.0 collations was added in MariaDB 10.10 (MDEV-27009) .
Support for the MySQL 8.0 UCA-9-0-0 (utf8mb4_0900_...) collations will be added to MariaDB 11.4.5
This page is licensed: CC BY-SA / Gnu FDL
Last updated
Was this helpful?