Some text from "MySQL 5 Certification Study Guide" about binary and non-binary string data types is as below:
Three data types store non-binary strings: CHAR, VARCHAR, and TEXT. Three data types store binary strings: BINARY, VARBINARY, and BLOB.
The most general difference is that non-binary strings have a character set and consist of characters in that character set, whereas binary strings consist simply of bytes that are distinguished only by their numeric values.
Non-binary strings have the following characteristics:
-
A non-binary string is a sequence of characters that belong to a specific character set. Characters may consist of a single byte, or multiple bytes if the character set allows it. For example, MySQL's default character set is latin1 (also known as ISO-8859-1). The latin1 character set uses one byte per character. In contrast, sjis (the Japanese SJIS character set), contains so many characters that they cannot all be represented in a single byte, so each character requires multiple bytes to store.
-
Multi-byte character sets may require a fixed or variable number of bytes per character. The ucs2 Unicode character set uses two bytes per character, whereas the utf8 Unicode character set uses from one to three bytes per character.
-
Non-binary string comparisons are based on the collation (sorting order) of the character set associated with the string. A given character set may have one or more collations, but a given string has only one of those collations.
-
Multi-byte character comparisons are performed in character units, not in byte units.
-
The collation determines whether uppercase and lowercase versions of a given character are equivalent. If the collation is not case sensitive, strings such as 'ABC', 'Abc', and 'abc' are all considered equal. If the collation is case sensitive, the strings are all considered different.
-
The collation also determines whether to treat instances of a given character with different accent marks as equivalent. The result is that comparisons of non-binary strings may not be accent sensitive. For example, an 'a' with no accent may be considered the same as the 'á' and 'à' characters. A given collation may be case or accent sensitive, or both.
-
A collation can be a binary collation. In this case, comparisons are based on numeric character values. One effect of this is that for character sets with uppercase and lowercase characters or accented characters, the collation is case sensitive and accent sensitive because each of these characters has a different numeric value. Comparison based on a binary collation differs from comparison of binary strings: A binary collation is performed per character, and characters might consist of multiple bytes. Comparisons for binary strings are always byte-based.
A given character set may have several collations to choose from. This enables you to select different sort orders for the same character set. For example, with the latin1 character set, you can choose from any of the following collations, many of which correspond to the sorting order rules of specific languages:
mysql> SHOW COLLATION LIKE 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | | 0 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | | 0 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | | 0 |
| latin1_general_cs | latin1 | 49 | | | 0 |
| latin1_spanish_ci | latin1 | 94 | | | 0 |
+-------------------+---------+----+---------+----------+---------+
Each collation name ends with _ci, _cs, or _bin, signifying that the collation is case insensitive, case sensitive, or binary.
Binary strings have the following characteristics:
-
A binary string is treated as a sequence of byte values. It might appear to contain characters, because you can write a binary string value as a quoted string, but it "really" contains binary data as far as MySQL is concerned.
-
Because binary strings contain bytes, not characters, comparisons of binary strings are performed on the basis of the byte values in the string. This has the implication that the concept of lettercase does not apply the same way as for non-binary strings. Binary strings may appear to be case sensitive, but that is because uppercase and lowercase versions of a given character have different numeric byte values. A binary string also may appear to be accent sensitive, but that is because versions of a character with different accents have different byte values.
The following example shows the difference in how non-binary and binary strings are treated with respect to lettercase. The non-binary string is converted to uppercase by UPPER() because it contains characters for which lettercase applies. The binary string remains unchanged because it consists of byte values that have no lettercase.
mysql> SELECT UPPER('AaBb'), UPPER(BINARY 'AaBb');
+---------------+----------------------+
| UPPER('AaBb') | UPPER(BINARY 'AaBb') |
+---------------+----------------------+
| AABB | AaBb |
+---------------+----------------------+
- A multi-byte character, if stored in a binary string, is treated simply as multiple individual bytes. Character boundaries of the original data no longer apply.
The different treatment of non-binary and binary strings in MySQL is important when it comes to choosing data types for table columns. You normally base the decision on whether you want to treat column values as containing characters or raw bytes. Thus, non-binary columns are more suitable for character strings such as textual descriptions, and binary columns are more suitable for raw data such as images or compressed data.
You can mix non-binary and binary string columns within a single table. Also, for non-binary string columns, different columns can use different character sets and collations. For example, assume that you want to create a table named auth_info, to store login name and password authorization information for users of an application, as well as a picture to associate with each user. You want login names to match in any lettercase, passwords to be case sensitive, and the picture column must store binary image data. The following table definition satisfies these requirements:
CREATE TABLE auth_info
(
login CHAR(32) CHARACTER SET latin1,
password CHAR(32) CHARACTER SET latin1 COLLATE latin1_general_cs,
picture MEDIUMBLOB
);