Leetcode 295: Find Median from Data Stream [Hard]
Link to the problem: https://leetcode.com/problems/find-median-from-data-stream/
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.
- For example, for
arr = [2,3,4]
, the median is3
. - For example, for
arr = [2,3]
, the median is(2 + 3) / 2 = 2.5
.
Implement the MedianFinder class:
MedianFinder()
initializes theMedianFinder
object.void addNum(int num)
adds the integernum
from the data stream to the data structure.double findMedian()
returns the median of all elements so far. Answers within10-5
of the actual answer will be accepted.

Naive Solution:
As the numbers are streamed, append them to an array. Sort the array and then find the median of the array. Very naive and inefficient approach as it sorts the array O(n²) each time an element is added.
Another naive approach can be using Insertion sort to sort the elements as they are streamed. Insertion sort only considers data to sort input only till that point. This makes insertion sort suitable for sorting incoming elements. But still the time complexity is O(n²)
Decent O(n) Solution:
As the elements are streamed, we can place the elements in a sorted manner in an array using binary search. We use the binary search to find the index at which the incoming element has to be placed in the array. The time complexity is O(logn) for this and O(n) for inserting element at that index. So,overall a O(n) solution.
Efficient O(logn) Solution:
An efficient way to solve in O(logn) is by using two heaps. A left heap which is maximum heap and a right heap which is a minimum heap. By storing in this way, if the lengths of both the heaps are same, then the median is sum of top elemets in both heaps, else the median is the top element of the heap whose length is lower than the other. But for this, we have to make sure the difference in elements in both the heaps does not exceed one.
In python, heapq class is inbuilt for heaps. But it implements min-heap. So inorder to implement max-heap using heapq, we multiply the element by -1 while inserting and hence while extracting also we multiply it by -1 to get back the original value.
Follow up Questions:
- If all integer numbers from the stream are in the range
[0, 100]
, how would you optimize your solution?
If all integers are in the range [0,100] then we can maintain an array of size 101 to store the count of each number and to find median, we iterate through the array. The time complexity for this is O(1) since the array is of constant size.
- If
99%
of all integer numbers from the stream are in the range[0, 100]
, how would you optimize your solution?
For this we maintain 3 groups. One group for numbers <0 and we use the two heap approach for this. For numbers [0,100] we use the fixed array for storing count and another group for numbers greater than 100 we use the two heaps approach. Based on the count of elements in each groups we find the median.