H-Index

medium array counting sort

Problem

Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return compute the researcher's h-index. According to the definition of h-index on Wikipedia: A scientist has an index h if h of their n papers have at least h citations each, and the other n − h papers have no more than h citations each. If there are several possible values for h, the maximum one is taken as the h-index.

Inputcitations = [3, 0, 6, 1, 5]

Output3

3 papers have ≥ 3 citations (the ones with 3, 6, 5). No 4 papers have ≥ 4 citations. So h = 3.

citations (non-negative, comma-separated)

def h_index(citations):
    n = len(citations)
    buckets = [0] * (n + 1)
    for c in citations:
        buckets[min(c, n)] += 1
    count = 0
    for h in range(n, -1, -1):
        count += buckets[h]
        if count >= h:
            return h
    return 0

function hIndex(citations) {
  const n = citations.length;
  const buckets = new Array(n + 1).fill(0);
  for (const c of citations) buckets[Math.min(c, n)]++;
  let count = 0;
  for (let h = n; h >= 0; h--) {
    count += buckets[h];
    if (count >= h) return h;
  }
  return 0;
}

class Solution {
    public int hIndex(int[] citations) {
        int n = citations.length;
        int[] buckets = new int[n + 1];
        for (int c : citations) buckets[Math.min(c, n)]++;
        int count = 0;
        for (int h = n; h >= 0; h--) {
            count += buckets[h];
            if (count >= h) return h;
        }
        return 0;
    }
}

int hIndex(vector<int>& citations) {
    int n = citations.size();
    vector<int> buckets(n + 1, 0);
    for (int c : citations) buckets[min(c, n)]++;
    int count = 0;
    for (int h = n; h >= 0; h--) {
        count += buckets[h];
        if (count >= h) return h;
    }
    return 0;
}

Explanation

The h-index is the largest h such that at least h papers each have >= h citations. You could sort the citations, but here we get the answer in linear time using a clever bucket (counting-sort) idea.

The key insight: the h-index can never exceed n, the number of papers. So any citation count larger than n is just as useful as n. We make n + 1 buckets and, for each paper, do buckets[min(c, n)] += 1 — capping big counts at n.

Then we sweep h downward from n to 0, adding up papers as we go. The running count after adding buckets[h] equals the number of papers with at least h citations. The first time count >= h, that h is the answer — and because we scan high-to-low, it is automatically the maximum.

Example: citations = [3, 0, 6, 1, 5] with n = 5. The buckets fill so that at h = 3 the running count of papers with >= 3 citations reaches 3 (the papers with 3, 5, and 6). Since 3 >= 3, the h-index is 3.

No sorting is required, so the whole thing runs in time proportional to the number of papers.

Time: O(n) Space: O(n)