Duplicate Emails
Problem
Write a SQL query to find all duplicate emails in the Person table.
SQL: SELECT email AS Email FROM Person GROUP BY email HAVING COUNT(*) > 1;
emails = ["[email protected]", "[email protected]", "[email protected]"]["[email protected]"]SELECT email AS Email FROM Person
GROUP BY email HAVING COUNT(*) > 1;
Explanation
A duplicate email is just an address that shows up more than once in the table. So we group the rows by email and keep only the groups that contain at least two rows.
GROUP BY email collapses all rows sharing the same address into one bucket. Inside each bucket, COUNT(*) tells us how many rows fell into it.
The HAVING clause then filters those buckets. We cannot use WHERE for this because WHERE runs before grouping; HAVING runs after, so it can test the group's count. We keep groups where COUNT(*) > 1.
Example: emails ["[email protected]", "[email protected]", "[email protected]"]. Grouping gives [email protected] with count 2 and [email protected] with count 1. Only [email protected] passes COUNT(*) > 1, so the result is ["[email protected]"].
The database builds the counts in essentially one pass, much like filling a hash map of email to tally, which is why this is efficient.