Minimum Deletions to Make Character Frequencies Unique
Problem
Given a string s, return the minimum number of characters to delete so that the frequencies of all remaining characters are pairwise distinct.
s = "aaabbbcc"2from collections import Counter
def min_deletions(s):
counts = sorted(Counter(s).values(), reverse=True)
used = set()
dels = 0
for f in counts:
while f > 0 and f in used:
f -= 1
dels += 1
if f > 0: used.add(f)
return dels
function minDeletions(s) {
const c = new Map();
for (const ch of s) c.set(ch, (c.get(ch) || 0) + 1);
const counts = [...c.values()].sort((a, b) => b - a);
const used = new Set();
let dels = 0;
for (let f of counts) {
while (f > 0 && used.has(f)) { f--; dels++; }
if (f > 0) used.add(f);
}
return dels;
}
class Solution {
public int minDeletions(String s) {
int[] c = new int[26];
for (char ch : s.toCharArray()) c[ch - 'a']++;
Set<Integer> used = new HashSet<>();
int dels = 0;
for (int f : c) {
while (f > 0 && used.contains(f)) { f--; dels++; }
if (f > 0) used.add(f);
}
return dels;
}
}
int minDeletions(string s) {
int c[26] = {0};
for (char ch : s) c[ch - 'a']++;
unordered_set<int> used;
int dels = 0;
for (int f : c) {
while (f > 0 && used.count(f)) { f--; dels++; }
if (f > 0) used.insert(f);
}
return dels;
}
Explanation
We need every remaining letter to have a different frequency. When two letters share a frequency, we have to delete copies of one of them until its frequency is free. The greedy choice is to keep the bigger frequencies intact and shrink the clashing ones downward.
We count each letter, then process the frequencies from largest to smallest. We keep a used set of frequencies already claimed. For each count f, while f is positive and already taken, we drop it by 1 (a deletion) until it lands on a free value or hits 0.
Processing big counts first means a frequency only ever moves down into smaller, still-available slots — which guarantees the fewest total deletions. A frequency that falls to 0 means that letter was wiped out entirely, and 0 is never added to used.
Example: s = "aaabbbcc". Counts sorted desc are [3, 3, 2]. The first 3 claims slot 3. The second 3 clashes, so it drops to 2 (1 deletion); 2 is still free... but next the count-2 letter would clash, so one of them slides to 1. Total deletions: 2, leaving frequencies {3, 2, 1}.
Counting is linear; the shrinking loops add at most a quadratic-in-alphabet amount of work, which is tiny.