Data Types

Many data types make no or little sense when applied to plain files. This whyCONNECT supports only a restricted set of data types. However, ODBC, JDBC or MYSQL source tables may contain data types not supported by CONNECT. In this case, CONNECT makes an automatic conversion to a similar supported type when it is possible.

The data types currently supported by CONNECT are:

Type name
Description
Used for

Type name

Description

Used for

TYPE_STRING

Zero ended string

TYPE_INT

4 bytes integer

TYPE_SHORT

2 bytes integer

TYPE_TINY

1 byte integer

TYPE_BIGINT

8 bytes integer

bigint, longlong

TYPE_DOUBLE

8 bytes floating point

double, float, real

TYPE_DECIM

Numeric value

decimal, numeric, number

TYPE_DATE

4 bytes integer

TYPE_STRING

This type corresponds to what is generally known as CHAR or VARCHAR by database users, or as strings by programmers. Columns containing characters have a maximum length but the character string is of fixed or variable length depending on the file format.

The DATA_CHARSET option must be used to specify the character set used in the data source or file. Note that, unlike usually with MariaDB, when a multi-byte character set is used, the column size represents the number of bytes the column value can contain, not the number of characters.

TYPE_INT

The INTEGER type contains signed integer numeric 4-byte values (the int/ of &#xNAN;the C language) ranging from –2,147,483,648 to 2,147,483,647 for signed &#xNAN;type and 0 to 4,294,967,295 for unsigned type.

TYPE_SHORT

The SHORT data type contains signed integer numeric 2-byte values (the short &#xNAN;integer of the C language) ranging from –32,768 to 32,767 for signed type and 0 to 65,535 for unsigned type.

TYPE_TINY

The TINY data type contains integer numeric 1-byte values (the char of the C language) ranging from –128 to 127 for signed type and 0 to255 for unsigned type. For some table types, TYPE_TINY is used to represent Boolean values (0 is false, anything else is true).

TYPE_BIGINT

The BIGINT data type contains signed integer 8-byte values (the long long of the C language) ranging from -9,223,372,036,854,775,808 to9,223,372,036,854,775,807 for signed type and from 0 to18,446,744,073,709,551,615 for unsigned type.

Inside tables, the coding of all integer values depends on the table type. In tables represented by text files, the number is written in characters, while in tables represented by binary files (BIN or VEC) the number is directly stored in the binary representation corresponding to the platform.

The length (or precision) specification corresponds to the length of the table field in which the value is stored for text files only. It is used to set the output field length for all table types.

TYPE_DOUBLE

The DOUBLE data type corresponds to the C language double type, a floating-point double precision value coded with 8 bytes. Like for integers, the internal coding in tables depends on the table type, characters for text files, and platform binary representation for binary files.

The length specification corresponds to the length of the table field in which the value is stored for text files only. The scale (wasprecision) is the number of decimal digits written into text files. For binary table types (BIN and VEC) this does not apply. The length andscale specifications are used to set the output field length and number of decimals for all types of tables.

TYPE_DECIM

The DECIMAL data type corresponds to what MariaDB or ODBC data sources call NUMBER, NUMERIC, or DECIMAL: a numeric value with a maximum number of digits (the precision) some of them eventually being decimal digits (the scale). The internal coding in CONNECT is a character representation of the number. For instance:

colname decimal(14,6)

This defines a column colname as a number having a precision of 14 and a scale of 6. Supposing it is populated by:

insert into xxx values (-2658.74);

The internal representation of it will be the character string-2658.740000. The way it is stored in a file table depends on the table type. The length field specification corresponds to the length of the table field in which the value is stored and is calculated by CONNECT from theprecision and the scale values. This length is precision plus 1 ifscale is not 0 (for the decimal point) plus 1 if this column is not unsigned (for the eventual minus sign). In fix formatted tables the number is right justified in the field of width length, for variable formatted tables, such as CSV, the field is the representing character string.

Because this type is mainly used by CONNECT to handle numeric or decimal fields of ODBC, JDBC and MySQL table types, CONNECT does not provide decimal calculations or comparison by itself. This is why decimal columns of CONNECT tables cannot be indexed.

DATE Data type

Internally, date/time values are stored by CONNECT as a signed 4-byte integer. The value 0 corresponds to 01 January 1970 12:00:00 am coordinated universal time (UTC). All other date/time values are represented by the number of seconds elapsed since or before midnight (00:00:00), 1 January 1970, to that date/time value. Date/time values before midnight 1 January 1970 are represented by a negative number of seconds.

CONNECT handles dates from 13 December 1901, 20:45:52 to18 January 2038, 19:14:07.

Although date and time information can be represented in both CHAR and INTEGER data types, the DATE data type has special associated properties. For each DATE value, CONNECT can store all or only some of the following information: century, year, month, day, hour, minute, and second.

Date Format in Text Tables

Internally, date/time values are handled as a signed 4-byte integer. But in text tables (type DOS, FIX, CSV, FMT, and DBF) dates are most of the time stored as a formatted character string (although they also can be stored as a numeric string representing their internal value). Because there are infinite ways to format a date, the format to use for decoding dates, as well as the field length in the file, must be associated to date columns (except when they are stored as the internal numeric value).

Note that this associated format is used only to describe the way the temporal value is stored internally. This format is used both for output to decode the date in a SELECT statement as well as for input to encode the date in INSERT or UPDATE statements. However, what is kept in this value depends on the data type used in the column definition (all the MariaDB temporal values can be specified). When creating a table, the format is associated to a date column using the DATE_FORMAT option in the column definition, for instance:

create table birthday (
  Name varchar(17),
  Bday date field_length=10 date_format='MM/DD/YYYY',
  Btime time field_length=8 date_format='hh:mm tt')
engine=CONNECT table_type=CSV;

insert into birthday values ('Charlie','2012-11-12','15:30:00');

select * from birthday;

The SELECT query returns:

Name
Bday
Btime

Name

Bday

Btime

Charlie

2012-11-12

15:30:00

The values of the INSERT statement must be specified using the standard MariaDB syntax and these values are displayed as MariaDB temporal values. Sure enough, the column formats apply only to the way these values are represented inside the CSV files. Here, the inserted record will be:

Charlie,11/12/2012,03:30 PM

Note: The field_length option exists because the MariaDB syntax does not allow specifying the field length between parentheses for temporal column types. If not specified, the field length is calculated from the date format (sometimes as a max value) or made equal to the default length value if there is no date format. In the above example it could have been removed as the calculated values are the ones specified. However, if the table type would have been DOS or FIX, these values could be adjusted to fit the actual field length within the file.

A CONNECT format string consists of a series of elements that represent a particular piece of information and define its format. The elements will be recognized in the order they appear in the format string. Date and time format elements will be replaced by the actual date and time as they appear in the source string. They are defined by the following groups of characters:

Element
Description

Element

Description

YY

The last two digits of the year (that is, 1996 would be coded as "96").

YYYY

The full year (that is, 1996 could be entered as "96" but displayed as “1996”).

MM

The one or two-digit month number.

MMM

The three-character month abbreviation.

MMMM

The full month name.

DD

The one or two-digit month day.

DDD

The three-character weekday abbreviation.

DDDD

The full weekday name.

hh

The one or two-digit hour in 12-hour or 24-hour format.

mm

The one or two-digit minute.

ss

The one or two-digit second.

t

The one-letter AM/PM abbreviation (that is, AM is entered as "A").

tt

The two-letter AM/PM abbreviation (that is, AM is entered as "AM").

Usage Notes

  • To match the source string, you can add body text to the format string, enclosing it in single quotes or double quotes if it would be ambiguous. Punctuation marks do not need to be quoted.

  • The hour information is regarded as 12-hour format if a “t” or “tt” element follows the “hh” element in the format or as 24-hour format otherwise.

  • The "MM", "DD", "hh", "mm", "ss" elements can be specified with one or two letters (e.g. "MM" or "M") making no difference on input, but placing a leading zero to one-digit values on output[1] for two-letter elements.

  • If the format contains elements DDD or DDDD, the day of week name is skipped on input and ignored to calculate the internal date value. On output, the correct day of week name is generated and displayed.

  • Temporal values are always stored as numeric in BIN and VEC tables.

Handling dates that are out of the range of supported CONNECT dates

If you want to make a table containing, for instance, historical dates not being convertible into CONNECT dates, make your column CHAR or VARCHAR and store the dates in the MariaDB format. All date functions applied to these strings will convert them to MariaDB dates and will work as if they were real dates. Of course they must be inserted and will be displayed using the MariaDB format.

NULL handling

CONNECT handles null values for data sources able to produce nulls. Currently this concerns mainly the ODBC, JDBC, MONGO, MYSQL, XML, JSON and INI table types. For INI, JSON, MONGO or XML types, null values are returned when the key is missing in the section (INI) or when the corresponding node does not exist in a row (XML, JSON, MONGO).

For other file tables, the issue is to define what a null value is. In a numeric column, 0 can sometimes be a valid value but, in some other cases, it can make no sense. The same for character columns; is a blank field a valid value or not?

A special case is DATE columns with a DATE_FORMAT specified. Any value not matching the format can be regarded as NULL.

CONNECT leaves the decision to you. When declaring a column in the CREATE TABLE statement, if it is declared NOT NULL, blank or zero values will be considered as valid values. Otherwise they will be considered as NULL values. In all cases, nulls are replaced on insert or update by pseudo null values, a zero-length character string for text types or a zero value for numeric types. Once converted to pseudo null values, they will be recognized as NULL only for columns declared as nullable.

For instance:

create table t1 (a int, b char(10)) engine=connect;
insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');
select * from t1 where a is null;

The select query replies:

a
b

a

b

NULL

zero

NULL

???

Sure enough, the value 0 entered on the first row is regarded as NULL for a nullable column. However, if we execute the query:

select * from t1 where a = 0;

This will return no line because a NULL is not equal to 0 in an SQL where clause.

Now let us see what happens with not null columns:

create table t1 (a int not null, b char(10) not null) engine=connect;
insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');

The insert statement will produce a warning saying:

Level
Code
Message

Level

Code

Message

Warning

1048

Column 'a' cannot be null

It is replaced by a pseudo null 0 on the fourth row. Let us see the result:

select * from t1 where a is null;
select * from t1 where a = 0;

The first query returns no rows, 0 are valid values and not NULL. The second query replies:

a
b

a

b

0

zero

0

???

It shows that the NULL inserted value was replaced by a valid 0 value.

Unsigned numeric types

They are supported by CONNECT since version 1.01.0010 for fixed numeric types (TINY, SHORT, INTEGER, and BITINT).

Data type conversion

CONNECT is able to convert data from one type to another in most cases. These conversions are done without warning even when this leads to truncation or loss of precision. This is true, in particular, for tables of type ODBC, JDBC, MYSQL and PROXY (via MySQL) because the source table may contain some data types not supported by CONNECT. They are converted when possible to CONNECT types.

When converted, MariaDB types are converted as:

MariaDB Types
CONNECT Type
Remark

MariaDB Types

CONNECT Type

Remark

TYPE_INT

4 byte integer

TYPE_SHORT

2 byte integer

TYPE_TINY

1 byte integer

TYPE_STRING

Same length

double, float, real

TYPE_DOUBLE

8 byte floating point

decimal, numeric

TYPE_DECIM

Length depends on precision and scale

all date related types

TYPE_DATE

Date format can be set accordingly

bigint, longlong

TYPE_BIGINT

8 byte integer

TYPE_STRING

Numeric value not accessible

All text types

TYPE_STRING TYPE_ERROR

Depending on the value of the connect_type_conv system variable value.

Other types

TYPE_ERROR

Not supported, no conversion provided.

For ENUM, the length of the column is the length of the longest value of the enumeration. For SET the length is enough to contain all the set values concatenated with comma separator.

In the case of TEXT columns, the handling depends on the values given to the connect_type_conv andconnect_conv_size system variables.

Note: BLOB is currently not converted by default until a TYPE_BIN type is added to CONNECT. However, the FORCE option (from Connect 1.06.006) can be specified for blob columns containing text and the SKIP option also applies to ODBC BLOB columns.

ODBC SQL types are converted as:

SQL Types
Connect Type
Remark

SQL Types

Connect Type

Remark

SQL_CHAR, SQL_VARCHAR

TYPE_STRING

SQL_LONGVARCHAR

TYPE_STRING

len = min(abs(len), connect_conv_size) If the column is generated by discovery (columns not specified) its length is connect_conv_size.

SQL_NUMERIC, SQL_DECIMAL

TYPE_DECIM

SQL_INTEGER

TYPE_INT

SQL_SMALLINT

TYPE_SHORT

SQL_TINYINT, SQL_BIT

TYPE_TINY

SQL_FLOAT, SQL_REAL, SQL_DOUBLE

TYPE_DOUBLE

SQL_DATETIME

TYPE_DATE

len = 10

SQL_INTERVAL

TYPE_STRING

len = 8 + ((scale) ? (scale+1) : 0)

SQL_TIMESTAMP

TYPE_DATE

len = 19 + ((scale) ? (scale +1) : 0)

SQL_BIGINT

TYPE_BIGINT

SQL_GUID

TYPE_STRING

llen=36

SQL_BINARY, SQL_VARBINARY, SQL_LONG-VARBINARY

TYPE_STRING

len = min(abs(len), connect_conv_size) Only if the value of connect_type_conv is force. The column should use the binary charset.

Other types

TYPE_ERROR

Not supported.

JDBC SQL types are converted as:

JDBC Types
Connect Type
Remark

JDBC Types

Connect Type

Remark

(N)CHAR, (N)VARCHAR

TYPE_STRING

LONG(N)VARCHAR

TYPE_STRING

len = min(abs(len), connect_conv_size) If the column is generated by discovery (columns not specified), its length is connect_conv_size

NUMERIC, DECIMAL, VARBINARY

TYPE_DECIM

INTEGER

TYPE_INT

SMALLINT

TYPE_SHORT

TINYINT, BIT

TYPE_TINY

FLOAT, REAL, DOUBLE

TYPE_DOUBLE

DATE

TYPE_DATE

len = 10

TIME

TYPE_DATE

len = 8 + ((scale) ? (scale+1) : 0)

TIMESTAMP

TYPE_DATE

len = 19 + ((scale) ? (scale +1) : 0)

BIGINT

TYPE_BIGINT

UUID (specific to PostgreSQL)

TYPE_STRINGTYPE_ERROR

Other types

TYPE_ERROR

Not supported.

Note: The connect_type_conv SKIP option also applies to ODBC and JDBC tables.

  1. Here input and output are used to specify respectively decoding the date to get its numeric value from the data file and encoding a date to write it in the table file. Input is performed within SELECT queries; output is performed in UPDATE or INSERT queries.

This page is licensed: GPLv2

Last updated

Was this helpful?