Top K Frequent Words

medium string hash map heap

Problem

Given an array of strings words and an integer k, return the k most frequent strings. Sort the answer by frequency from highest to lowest. Sort words with the same frequency by lexicographic order.

Inputwords = ["i","love","leetcode","i","love","coding"], k = 2

Output["i", "love"]

"i" and "love" each appear twice; "i" comes first alphabetically.

words (comma-separated)

import heapq
from collections import Counter
def top_k_frequent(words, k):
    cnt = Counter(words)
    heap = []
    for w, c in cnt.items():
        heapq.heappush(heap, (c, [-ord(ch) for ch in w], w))
        if len(heap) > k:
            heapq.heappop(heap)
    return [w for _, _, w in sorted(heap, key=lambda x: (-x[0], x[2]))]

function topKFrequent(words, k) {
  const cnt = new Map();
  for (const w of words) cnt.set(w, (cnt.get(w) || 0) + 1);
  return [...cnt.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, k)
    .map(([w]) => w);
}

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        Map<String, Integer> cnt = new HashMap<>();
        for (String w : words) cnt.merge(w, 1, Integer::sum);
        PriorityQueue<String> heap = new PriorityQueue<>((a, b) ->
            cnt.get(a).equals(cnt.get(b)) ? b.compareTo(a) : cnt.get(a) - cnt.get(b));
        for (String w : cnt.keySet()) {
            heap.offer(w);
            if (heap.size() > k) heap.poll();
        }
        LinkedList<String> ans = new LinkedList<>();
        while (!heap.isEmpty()) ans.addFirst(heap.poll());
        return ans;
    }
}

vector<string> topKFrequent(vector<string>& words, int k) {
    unordered_map<string, int> cnt;
    for (auto& w : words) cnt[w]++;
    auto cmp = [&](const string& a, const string& b) {
        return cnt[a] == cnt[b] ? a < b : cnt[a] > cnt[b];
    };
    priority_queue<string, vector<string>, decltype(cmp)> heap(cmp);
    for (auto& p : cnt) {
        heap.push(p.first);
        if ((int)heap.size() > k) heap.pop();
    }
    vector<string> ans;
    while (!heap.empty()) { ans.push_back(heap.top()); heap.pop(); }
    reverse(ans.begin(), ans.end());
    return ans;
}

Explanation

This is the classic "keep the top k" pattern, but with a twist: ties in frequency must be broken alphabetically. We count words, then keep only the best k in a size-k min-heap.

First cnt tallies how many times each word appears. Then for each word we push a tuple onto the heap and, if the heap grows beyond k, pop the smallest. Whatever survives is the top k.

The tricky part is the heap key (c, [-ord(ch) for ch in w], w). A min-heap pops the smallest, but for ties we want the alphabetically larger word to be considered "smaller" (so it gets popped first when we trim). Negating each character code flips the alphabetical order, so among equal counts the word later in the dictionary is treated as the weakest and is the first to go.

At the end we sort the surviving entries by count descending, then word ascending to produce the required output order.

Example: words = ["i","love","leetcode","i","love","coding"], k = 2. Counts are i:2, love:2, leetcode:1, coding:1. The two most frequent are i and love, and since both appear twice, i comes first alphabetically, giving ["i", "love"].

Time: O(n log k) Space: O(n)