Leetcode 295: Find Median from Data Stream [Hard]

Pruthvik Reddy
3 min readJun 8, 2021

Link to the problem: https://leetcode.com/problems/find-median-from-data-stream/

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.

Naive Solution:

As the numbers are streamed, append them to an array. Sort the array and then find the median of the array. Very naive and inefficient approach as it sorts the array O(n²) each time an element is added.

Another naive approach can be using Insertion sort to sort the elements as they are streamed. Insertion sort only considers data to sort input only till that point. This makes insertion sort suitable for sorting incoming elements. But still the time complexity is O(n²)

Decent O(n) Solution:

As the elements are streamed, we can place the elements in a sorted manner in an array using binary search. We use the binary search to find the index at which the incoming element has to be placed in the array. The time complexity is O(logn) for this and O(n) for inserting element at that index. So,overall a O(n) solution.

Efficient O(logn) Solution:

An efficient way to solve in O(logn) is by using two heaps. A left heap which is maximum heap and a right heap which is a minimum heap. By storing in this way, if the lengths of both the heaps are same, then the median is sum of top elemets in both heaps, else the median is the top element of the heap whose length is lower than the other. But for this, we have to make sure the difference in elements in both the heaps does not exceed one.

In python, heapq class is inbuilt for heaps. But it implements min-heap. So inorder to implement max-heap using heapq, we multiply the element by -1 while inserting and hence while extracting also we multiply it by -1 to get back the original value.

Follow up Questions:

  • If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?

If all integers are in the range [0,100] then we can maintain an array of size 101 to store the count of each number and to find median, we iterate through the array. The time complexity for this is O(1) since the array is of constant size.

  • If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?

For this we maintain 3 groups. One group for numbers <0 and we use the two heap approach for this. For numbers [0,100] we use the fixed array for storing count and another group for numbers greater than 100 we use the two heaps approach. Based on the count of elements in each groups we find the median.

Sign up to discover human stories that deepen your understanding of the world.

Pruthvik Reddy
Pruthvik Reddy

Written by Pruthvik Reddy

Purdue CS Student. Looking for opportunities in software development.

Responses (1)

Write a response

If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
in that case, the median is for sure inside the [0..100] range, why need keep each every number that is < 0 or > 0, a simple counter is all we need.

--