Navigation
Learn About
Developing With
Ingres Talk
Information
Toolbox
Views
DataTypeComparisons
From Ingres Community Wiki
|
Comparing Ingres TEXT, CLOB and BLOB Data Types to the MySQL TEXT and BLOB Data Types
When migrating an application from MySQL to Ingres, confusion arises around what data types the MySQL BLOB and TEXT should be mapped to in Ingres. An understanding of the data to be stored in these data types is needed to ensure the best fit is chosen for the Ingres Data Types.
Background Information
MySQL has four TEXT data types (TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT) and four BLOB data types (TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB). The natural inclination is to map the MySQL data types to either the Ingres TEXT data type or the Ingres CLOB and BLOB data types but that is often not the best choice. The Ingres TEXT data type does have some behavior that may be unexpected as it is available for backward compatibility reasons and dates back to the University Ingres timeframe. The Ingres TEXT data type was largely obsoleted with the introduction of the VARCHAR data type. For example, the Ingres TEXT data type allows all ASCII characters except the null character (\0) within text strings. Null characters are converted to blanks. Blanks are only ignored when text strings are compared with C data. Unlike VARCHAR, if the strings are unequal in length, blanks are not added to the shorter string. These caveats are the reason VARCHAR is recommended for varying character data.
The Ingres data types vary from MySQL data types in their default handling when the length is not specified. For example, in MySQL, if you create a column of TEXT(199), MySQL will automatically create the column as TINYTEXT with the maximum storage size being 255; if you just specify TEXT with no maximum size that it will use the default type TEXT and allow storage of up to 64K, & if you specify TEXT(66000), it will actually create the column as MEDIUMTEXT & so on. The MySQL BLOB data type function in the same manner.
The Ingres CHAR, TEXT, and VARCHAR data types differ in that if you don’t specify the maximum allowable length it will default to 1 instead of the maximum size for the column. The Ingres CLOB/BLOB data types do provide for the maximum allowable size of 2GB.
i.e. an Ingres column of type TEXT will create a field size of TEXT(1); a column of type VARCHAR will create a VARCHAR(1), a BYTE will create a field size of BYTE(1), and a VARBYTE will create a field size of VARYBYTE(1). To create a column of maximum storage size for TEXT or VARCHAR you need to specify TEXT(32000) or VARCHAR(32000). To create a column of maximum storage size for BYTE or VARBYTE you need to specify BYTE(32000) or VARBYTE(32000).
The decision of which data type to use comes down to the size of data needing to be stored. If the maximum column length will be 32000 or less, the Ingres TEXT or VARCHAR data type is better to use for character data and BYTE or VARBYTE (BYTE VARYING) for binary data. Although the Ingres CLOB and BLOB data types have the advantage of storing up to 2 Gig per record, they will cost you a lot in performance and have other limitations (e.g. you cannot use them in an index or as part of an “ORDER BY” qualifier, and cannot be used in a scrollable cursor, that is, one that can scroll backwards as well as forwards). If the maximum column size will be larger than 32000 the Ingres CLOB or BLOB data types will need to be used.
Ingres types CHAR vs VARCHAR vs TEXT
Char is described as a fixed length type, while TEXT and VARCHAR are described as variable length. Don't be fooled into thinking that TEXT or VARCHAR implies variable length storage! A better term might be "known length". A CHAR column has to pad out its length with spaces, and as a result has no way of differentiating "foo" and "foo ". CHAR semantics will either strip all spaces, or keep all spaces (depending on operator), whereas TEXT and VARCHAR can work with the exact value.
A quick summary of the semantic differences:
- TEXT will always work with the exact string value, no adding or dropping spaces. "foo" is not equal to "foo ".
- VARCHAR operates with the exact string value for many operators, including specifically LIKE and concatenation. The traditional testing operators (=, !=, and so on) ignore trailing blanks. "foo" is equal to "foo ", but "foo" is NOT LIKE "foo ".
- CHAR has no notion of "exact string value" as such; you're always working with either the blank-stripped value or the fully-padded-out value. LIKE and concatenation use the fully-padded-out value, testing operators use the blank-stripped value.
TEXT is actually the most processor efficient, by a very tiny margin. The lack of trailing blank stripping when testing can lead to unexpected results, though. It is best to avoid TEXT unless you're sure you have a deep understanding of the type semantics and coercion rules. (and if you did, you probably wouldn't be reading this!)
The final character-string type, C, has very strange semantics that ignore leading, trailing, and repeated blanks. This causes lots of overhead, prevents the use of hashing algorithms such as hash join, and just generally confuses things. C type is not recommended at all.
The BYTE and BYTE VARYING types are analogous to CHAR and VARCHAR, except that the byte types fill (or ignore) trailing zeros rather than trailing spaces.
Recommendations
In summary, the recommended Ingres data types are:
- VARCHAR – variable (ie known) length character data up to 32,000 bytes, you need to specify the length
- CHAR – Fixed length character data up to 32,000 bytes, you need to specify the length
- VARBYTE (BYTE VARYING) – variable length binary data up to 32,000 bytes, you need to specify the length.
- BYTE – fixed length binary data up to 32,000 bytes, you need to specify the length
- LONG VARCHAR (or CLOB) – variable length character data more than 32,000 bytes and up to 2 Gig, you do not need to specify length
- LONG BYTE (or BLOB) – variable length binary data more than 32,000 bytes and up to 2 Gig, you do not need to specify length
Categories: DBMS | Migrations | Articles

