MySQL FAQs
FAQs Categories
Client Server Commands
Database Structure
Table Types or Storage Engines
Indexes
SQL Statements
Table Joins
Funtions and Operators
Tricky Select Queries
Speed Up Queries
Data Back Up
General Questions
Errors
1PLs Company - #1Payday.Loans Agency - Loans online and near me $100-$2500 (Same Day)
Powered by MySQL
 
Home / SQL Statements / Question No: 222

How to manage case sensitivity in string comparisons?

Answer No: 222

String comparisons are somewhat more complex than numeric or temporal comparisons. Numbers sort in numeric order and dates and times sort in temporal order, but string comparisons depend not only on the specific content of the strings, but on whether they are non-binary or binary. A letter in uppercase may compare as the same or different than the same letter in lowercase, and a letter with one type of accent may be considered the same or different than that letter with another type of accent. Here is a summary:

  • A non-binary string contains characters from a particular character set, and is associated with one of the collations (sorting orders) available for the character set. Characters may consist of single or multiple bytes. A collation can be case insensitive (lettercase is not significant), case sensitive (lettercase is significant), or binary (comparisons are based on numeric character values).
  • A binary string is treated as raw bytes. It has no character set and no collation. Comparisons between binary strings are based on numeric byte values.

The rules that govern string comparison apply in several ways. They determine the result of comparisons performed explicitly with operators such as = and <, and comparisons performed implicitly by ORDER BY, GROUP BY, and DISTINCT operations.

The default character set and collation for literal strings depend on the values of the character_set_connection and collation_connection system variables. The default character set is latin1. The default collation is latin1_swedish_ci, which is case insensitive as indicated by the "_ci" at the end of the collation name. Assuming these connection settings, literal strings are not case sensitive by default. You can see this by comparing strings that differ only in lettercase:

mysql> SELECT 'Hello' = 'hello';
+-------------------+
| 'Hello' = 'hello' |
+-------------------+
|                 1 |
+-------------------+

A given collation might cause certain accented characters to compare the same as other characters. For example, 'ü' and 'ue' are different in the default latin1_swedish_ci collation, but with the latin1_german2_ci collation ("German phone-book" collation), they have the same sort value and thus compare as equal:

mysql> SELECT 'Müller' = 'Mueller';
+----------------------+
| 'Müller' = 'Mueller' |
+----------------------+
|                    0 |
+----------------------+

mysql> SET collation_connection = latin1_german2_ci;

mysql> SELECT 'Müller' = 'Mueller';
+----------------------+
| 'Müller' = 'Mueller' |
+----------------------+
|                    1 |
+----------------------+

For binary strings, lettercase is significant. However, this is not because binary strings are case sensitive per se, because binary strings have no character set. Rather, it is because uppercase and lowercase versions of a character have different numeric values.  A non-binary string can be treated as a binary string by preceding it with the BINARY keyword. If either string in a comparison is binary, both strings are treated as binary:

mysql> SELECT BINARY 'Hello' = 'hello';
+--------------------------+
| BINARY 'Hello' = 'hello' |
+--------------------------+
|                        0 |
+--------------------------+

mysql> SELECT 'Hello' = BINARY 'hello';
+--------------------------+
| 'Hello' = BINARY 'hello' |
+--------------------------+
|                        0 |
+--------------------------+

The sorting principles just described were demonstrated using literal strings, but the same principles apply to string-valued table columns. Suppose that a table t contains a column c and has the following rows:

mysql> SELECT c FROM t;
+-----------+
| c         |
+-----------+
| Hello     |
| goodbye   |
| Bonjour   |
| au revoir |
+-----------+

If c is a CHAR column that has the latin1_swedish_ci collation, it is a non-binary column with a case-insensitive collation. Uppercase and lowercase letters are treated as identical and a sort operation that uses ORDER BY produces results like this:

mysql> SELECT c FROM t ORDER BY c;
+-----------+
| c         |
+-----------+
| au revoir |
| Bonjour   |
| goodbye   |
| Hello     |
+-----------+

If c is declared as a BINARY column instead, it has no character set or collation. ORDER BY sorts using raw byte codes and produces a different result. Assuming that the values are stored on a machine that uses ASCII codes, the numeric values for uppercase letters precede those for lowercase letters and the result looks like this:

mysql> SELECT c FROM t ORDER BY c;
+-----------+
| c         |
+-----------+
| Bonjour   |
| Hello     |
| au revoir |
| goodbye   |
+-----------+

String comparison rules also apply to GROUP BY and DISTINCT operations. Suppose that t has a column c with the following contents:

mysql> SELECT c FROM t;
+---------+
| c       |
+---------+
| Hello   |
| hello   |
| Goodbye |
| goodbye |
+---------+

If c is a non-binary, case-insensitive column, GROUP BY and DISTINCT do not make lettercase distinctions:

mysql> SELECT c, COUNT(*) FROM t GROUP BY c;
+---------+----------+
| c       | COUNT(*) |
+---------+----------+
| Goodbye |        2 |
| Hello   |        2 |
+---------+----------+

mysql> SELECT DISTINCT c FROM t;
+---------+
| c       |
+---------+
| Hello   |
| Goodbye |
+---------+

On the other hand, if c is a BINARY column, those operations use byte values for sorting:

mysql> SELECT c, COUNT(*) FROM t GROUP BY c;
+---------+----------+
| c       | COUNT(*) |
+---------+----------+
| Goodbye |        1 |
| Hello   |        1 |
| goodbye |        1 |
| hello   |        1 |
+---------+----------+

mysql> SELECT DISTINCT c FROM t;
+---------+
| c       |
+---------+
| Hello   |
| hello   |
| Goodbye |
| goodbye |
+---------+

The preceding discussion shows that to understand sorting and comparison behavior for strings, it's important to know whether they are non-binary or binary. This is important when using string functions as well. String functions may treat their arguments as non-binary or binary strings, or return binary or non-binary results. It depends on the function. Here are some examples:

  • LENGTH() returns the length of a string in bytes, whereas CHAR_LENGTH() returns the length in characters. For strings that contain only single-byte characters, the two functions return identical results. For strings that contain multi-byte characters, you should choose the function that is appropriate for the type of result you want. For example, the sjis character set includes characters that require two bytes to represent. The value of LENGTH() for any string containing such characters will be greater than the value of CHAR_LENGTH().
  • The UPPER() and LOWER() functions perform case conversion only if the argument is a non-binary string. Suppose that 'AbCd' is non-binary. In that case, the two functions return a value in the requested lettercase:
    mysql> SELECT UPPER('AbCd'), LOWER('AbCd');
    +---------------+---------------+
    | UPPER('AbCd') | LOWER('AbCd') |
    +---------------+---------------+
    | ABCD          | abcd          |
    +---------------+---------------+
    

    However, if 'AbCd' is a binary string, it has no character set. In that case, the concept of lettercase does not apply, and UPPER() and LOWER() do nothing:

    mysql> SELECT UPPER(BINARY 'AbCd'), LOWER(BINARY 'AbCd');
    +----------------------+----------------------+
    | UPPER(BINARY 'AbCd') | LOWER(BINARY 'AbCd') |
    +----------------------+----------------------+
    | AbCd                 | AbCd                 |
    +----------------------+----------------------+
    

    To make the two functions perform case conversion for a binary string, convert it to a non-binary string. For example:

    mysql> SELECT UPPER(CONVERT(BINARY 'AbCd' USING latin1));
    +--------------------------------------------+
    | UPPER(CONVERT(BINARY 'AbCd' USING latin1)) |
    +--------------------------------------------+
    | ABCD                                       |
    +--------------------------------------------+
    
  • MD5() takes a string argument and produces a 32-byte checksum represented as a string of hexadecimal digits. It treats its argument as a binary string:
    mysql> SELECT MD5('a');
    +----------------------------------+
    | MD5('a')                         |
    +----------------------------------+
    | 0cc175b9c0f1b6a831c399e269772661 |
    +----------------------------------+
    
    mysql> SELECT MD5('A');
    +----------------------------------+
    | MD5('A')                         |
    +----------------------------------+
    | 7fc56270e7a70fa81a5935b72eacbe29 |
    +----------------------------------+
    

Related MySQL FAQs to the Above FAQ

What-does-alias-means-in-MySQL What does alias means in MySQL?

What-does-SQL-Expressions-mean-in-MySQL What does SQL Expressions mean in MySQL?

How-to-use-numeric-expressions-in-MySQL How to use numeric expressions in MySQL?

How-to-use-string-expressions-in-MySQL How to use string expressions in MySQL?

How-to-use-LIKE-pattern-matching-operator How to use LIKE pattern-matching operator?

How-to-use-date-or-temporal-expressions How to use date or temporal expressions?

What-does-NULL-value-mean-in-MySQL What does NULL value mean in MySQL?

How-to-use-comments-in-SQL-statements How to use comments in SQL statements?

How-to-validate-user-login-in-case-sensitive-manner How to validate user login in case sensitive manner?

What-is-difference-between-delete-and-truncate-statements-of-MySQL What is difference between delete and truncate statements of MySQL?

About FAQs: Recently Added FAQs

About MySQL FAQs: Site Map | Bookmark Us | Recommend this Site to Your Friend | Contact Us

Useful Links: Wikipedia.org | Oracle.com | w3schools.com | www.php.net | Github.com

© 2023  www.mysqlfaqs.net
All rights reserved.