Find Median from Data Stream
Problem
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values. For example, for arr = [2,3,4], the median is 3.
add(1), add(2), median()→1.5, add(3), median()→2L for the smaller half and a min-heap R for the larger half. Maintain |L| − |R| ∈ {0, 1}. The median is L.top() (odd) or (L.top() + R.top())/2 (even).import heapq
class MedianFinder:
def __init__(self):
self.lo = [] # max-heap (negated)
self.hi = [] # min-heap
def add(self, x):
heapq.heappush(self.lo, -heapq.heappushpop(self.hi, x))
if len(self.lo) > len(self.hi) + 1:
heapq.heappush(self.hi, -heapq.heappop(self.lo))
def median(self):
if len(self.lo) > len(self.hi):
return -self.lo[0]
return (-self.lo[0] + self.hi[0]) / 2.0
// MaxHeap and MinHeap are simple binary heap classes (omitted).
class MedianFinder {
constructor() { this.lo = new MaxHeap(); this.hi = new MinHeap(); }
add(x) {
if (this.lo.size() === 0 || x <= this.lo.top()) this.lo.push(x);
else this.hi.push(x);
if (this.lo.size() > this.hi.size() + 1) this.hi.push(this.lo.pop());
else if (this.hi.size() > this.lo.size()) this.lo.push(this.hi.pop());
}
median() {
if (this.lo.size() > this.hi.size()) return this.lo.top();
return (this.lo.top() + this.hi.top()) / 2;
}
}
class MedianFinder {
PriorityQueue<Integer> lo = new PriorityQueue<>((a, b) -> b - a);
PriorityQueue<Integer> hi = new PriorityQueue<>();
public void add(int x) {
if (lo.isEmpty() || x <= lo.peek()) lo.add(x); else hi.add(x);
if (lo.size() > hi.size() + 1) hi.add(lo.poll());
else if (hi.size() > lo.size()) lo.add(hi.poll());
}
public double median() {
if (lo.size() > hi.size()) return lo.peek();
return (lo.peek() + hi.peek()) / 2.0;
}
}
class MedianFinder {
priority_queue<int> lo; // max-heap
priority_queue<int, vector<int>, greater<>> hi; // min-heap
public:
void add(int x) {
if (lo.empty() || x <= lo.top()) lo.push(x); else hi.push(x);
if ((int) lo.size() > (int) hi.size() + 1) { hi.push(lo.top()); lo.pop(); }
else if ((int) hi.size() > (int) lo.size()) { lo.push(hi.top()); hi.pop(); }
}
double median() {
if (lo.size() > hi.size()) return lo.top();
return (lo.top() + hi.top()) / 2.0;
}
};
Explanation
The median sits in the middle of the sorted values, so we keep the numbers split into a lower half and an upper half and just peek at the boundary. We use two heaps: lo is a max-heap holding the smaller half (its top is the biggest of the small numbers) and hi is a min-heap holding the larger half (its top is the smallest of the big numbers).
Python only has min-heaps, so we fake a max-heap by storing negated values in lo. The top of the lower half is then -self.lo[0].
The clever one-liner in add is heapq.heappush(self.lo, -heapq.heappushpop(self.hi, x)). It pushes x into hi, immediately pops hi's smallest, and moves that (negated) into lo. This guarantees every value in lo stays no bigger than every value in hi. Then if lo grew too large, we shift its biggest back over to hi to keep the sizes balanced.
For the median: if lo has one extra element (odd count), the median is its top -self.lo[0]; otherwise the two halves are equal and the median is the average of the two tops.
Example: add 1 then 2 — the halves become lo=[1], hi=[2], so the median is (1 + 2) / 2 = 1.5. Add 3 and the count is odd again, giving median 2.