Median Employee Salary
Problem
From Employee(Id, Company, Salary), return the median salary row of every company.
rows = [(1,'A',2341),(2,'A',341),(3,'A',15),(4,'A',15314),(5,'A',451),(6,'A',513),(7,'B',15),(8,'B',13),(9,'B',1154),(10,'B',1345),(11,'B',1221),(12,'B',234),(13,'C',2345),(14,'C',2645),(15,'C',2645),(16,'C',2652),(17,'C',65)]median rows per companySELECT Id, Company, Salary FROM (
SELECT Id, Company, Salary,
ROW_NUMBER() OVER (PARTITION BY Company ORDER BY Salary) AS r,
COUNT(*) OVER (PARTITION BY Company) AS n
FROM Employee
) t
WHERE r BETWEEN n/2 + (n%2=1) AND n/2 + 1;
Explanation
To find a median you need each company's salaries lined up in order and then pick the value(s) in the middle. The query does exactly this using window functions so it never has to physically sort the whole table by hand.
The inner query stamps every row with two numbers per company. ROW_NUMBER() OVER (PARTITION BY Company ORDER BY Salary) gives each row its rank r within its company from lowest salary to highest, and COUNT(*) OVER (PARTITION BY Company) gives n, how many employees that company has.
The outer WHERE keeps only the middle row(s). For an odd count there is one middle rank; for an even count there are two. The bound r BETWEEN n/2 + (n%2=1) AND n/2 + 1 captures both cases at once.
Example: a company with 5 employees has n=5. The bounds become BETWEEN 2+1 AND 2+1, i.e. rank 3 only — the single middle salary. A company with 6 employees has bounds BETWEEN 3 AND 4, keeping the two central rows.
The cost is dominated by ordering salaries within each partition, which is why it is O(N log N).