DISTINCT helps to eliminate duplicates. If a query returns a result that contains duplicate rows, you can remove duplicates to produce a result set in which every row is unique. To do this, include the keyword DISTINCT after SELECT and before the output column list. Suppose that a query returns a result set that contains duplicated rows:
mysql> SELECT last_name FROM t;
+-----------+
| last_name |
+-----------+
| Brown |
| Larsson |
| Brown |
| Larsson |
+-----------+
Adding DISTINCT removes the duplicates and returns only unique rows:
mysql> SELECT DISTINCT last_name FROM t;
+-----------+
| last_name |
+-----------+
| Brown |
| Larsson |
+-----------+
Duplicate elimination for string values happens differently for non-binary and binary strings. The strings 'ABC', 'Abc', and 'abc' are considered distinct if they're binary strings. If they are non-binary strings, they are considered distinct if they have different values based on their collation. DISTINCT treats all NULL values within a given column as having the same value. Suppose that a table t contains the following rows:
mysql> SELECT i, j FROM t;
+------+------+
| i | j |
+------+------+
| 1 | 2 |
| 1 | NULL |
| 1 | NULL |
+------+------+
For purposes of DISTINCT, the NULL values in the second column are the same, so the second and third rows are identical. Adding DISTINCT to the query eliminates one of them as a duplicate:
mysql> SELECT DISTINCT i, j FROM t;
+------+------+
| i | j |
+------+------+
| 1 | 2 |
| 1 | NULL |
+------+------+
Using DISTINCT is logically equivalent to using GROUP BY on all selected columns with no aggregate function. For such a query, GROUP BY just produces a list of distinct grouping values. If you display and group by a single column, the query produces the distinct values in that column. If you display and group by multiple columns, the query produces the distinct combinations of values in the column. For example, the following two queries produce the same set of rows:
SELECT DISTINCT id FROM t;
SELECT id FROM t GROUP BY id;
As do these:
SELECT DISTINCT id, name FROM t;
SELECT id, name FROM t GROUP BY id, name;
Another correspondence between the behavior of DISTINCT and GROUP BY is that for purposes of assessing distinctness, DISTINCT considers all NULL values the same. This is analogous to the way that GROUP BY groups NULL values. A difference between DISTINCT and GROUP BY is that DISTINCT doesn't cause row sorting. In MySQL, GROUP BY does cause sorting.
DISTINCT can be used with the COUNT() function to count how many distinct values a column contains. In this case, NULL values are ignored:
mysql> SELECT j FROM t;
+------+
| j |
+------+
| 2 |
| NULL |
| NULL |
+------+
mysql> SELECT COUNT(DISTINCT j) FROM t;
+-------------------+
| COUNT(DISTINCT j) |
+-------------------+
| 1 |
+-------------------+
COUNT(DISTINCT expression) counts the number of distinct (unique) non-NULL values of the given expression. expression can be a column name to count the number of distinct non-NULL values in the column:
mysql> SELECT COUNT(DISTINCT i), COUNT(DISTINCT j) FROM t;
+-------------------+-------------------+
| COUNT(DISTINCT i) | COUNT(DISTINCT j) |
+-------------------+-------------------+
| 1 | 3 |
+-------------------+-------------------+
It's also possible to give a list of expressions separated by commas. In this case, COUNT() returns the number of distinct combinations of values that contain no NULL values. The following query counts the number of distinct rows for which neither i nor j is NULL:
mysql> SELECT COUNT(DISTINCT i, j) FROM t;
+----------------------+
| COUNT(DISTINCT i, j) |
+----------------------+
| 2 |
+----------------------+