Nov 10, 2018

How to remove duplicates from existing MySQL table

Identifying Duplicates:

Following is the query to count duplicate records with first_name and last_name in a table.

mysql> SELECT COUNT(*) as duplicates, last_name, first_name
    -> FROM contact
    -> GROUP BY last_name, first_name
    -> HAVING duplicates > 1;

This query will return a list of all the duplicate records in contact table. In general, to identify sets of duplicate values, do the following:
  1. Determine which columns contain the values that may be duplicated.
  2. List those columns in the column selection list, along with COUNT(*).
  3. List the columns in the GROUP BY clause as well.
  4. Add a HAVING clause that eliminates unique values by requiring group counts to be greater than one.
Removing Duplicates from a Query Result:

You can use DISTINCT along with SELECT statement to find out unique records available in a table.

mysql> SELECT DISTINCT last_name, first_name
    -> FROM contact
    -> ORDER BY last_name;

An alternative to DISTINCT is to add a GROUP BY clause that names the columns you're selecting. This has the effect of removing duplicates and selecting only the unique combinations of values in the specified columns:

mysql> SELECT last_name, first_name
    -> FROM contact
    -> GROUP BY (last_name, first_name);

Removing Duplicates Using Table Replacement:

If you have duplicate records in a table and you want to remove all the duplicate records from that table, then here is the procedure:

mysql> CREATE TEMPORARY TABLE contact_copy SELECT last_name, first_name, other_columns
    ->                  FROM contact;
    ->                  GROUP BY (last_name, first_name);
mysql> RENAME TABLE contact TO contact_old, contact_copy TO contact;
mysql> DROP TABLE contact_old;

An easy way of removing duplicate records from a table is to add an INDEX or PRIMAY KEY to that table. Even if this table is already available, you can use this technique to remove duplicate records and you will be safe in future as well.

mysql> ALTER IGNORE TABLE contact
    -> ADD PRIMARY KEY (last_name, first_name);