Find Median from Data Stream
The problem is asking for a way to efficiently find the median of a set of numbers as they are added one by one. A typical solution to this problem involves using two heaps: a maxheap to store the smaller half of the numbers, and a minheap to store the larger half.
 Initialize Two Heaps: A maxheap to store the left half of the numbers (
left_half
) and a minheap to store the right half (right_half
).  Adding a Number (
addNum
): When adding a number: If
left_half
is empty or the number is less than the maximum value inleft_half
, add it toleft_half
.  Otherwise, add it to
right_half
.  If the heaps are unbalanced (their sizes differ by more than 1), move a number from the larger heap to the smaller one.
 If
 Finding the Median (
findMedian
): The median is: The maximum value in
left_half
if there are an odd number of numbers.  The average of the maximum value in
left_half
and the minimum value inright_half
if there are an even number of numbers.
 The maximum value in
Here’s the code:


Key Takeaways
 By using two heaps, we can efficiently add numbers and find the median in O(log n) time.
 The maxheap keeps track of the smaller half of the numbers, and the minheap keeps track of the larger half.
 Rebalancing the heaps ensures that we can quickly find the median without sorting or scanning the entire data set.
Identifying Problem Isomorphism
“Find Median from Data Stream” deals with the challenge of maintaining and extracting the median value from a continuously incoming data stream.
Two problems that could be considered isomorphic are:
“703. Kth Largest Element in a Stream”: This problem is a simpler counterpart. Here, instead of maintaining the median, we are tasked with maintaining the kth largest element in the stream. It is simpler because it doesn’t require balancing two halves of the data stream like in “Find Median from Data Stream”.
“295. Sliding Window Median”: This problem can be seen as a more complex version. The main task is to find the median of the current window of numbers from a data stream that can change as elements slide in and out of the window. It is more complex due to the added component of the sliding window which also requires managing the data stream.
The order of complexity from simple to more complex:
 “703. Kth Largest Element in a Stream”
 “Find Median from Data Stream”
 “480. Sliding Window Median”
These are related through the underlying theme of managing data streams and maintaining certain properties of the streamed data.
10 Prerequisite LeetCode Problems
“Find Median from Data Stream” (LeetCode Problem #295) involves understanding of data stream, sorting, and binary search. Here are 10 LeetCode problems to prepare:
“Running Sum of 1d Array” (LeetCode Problem #1480): This problem introduces the concept of maintaining a running total of an array, which is a simple form of handling a data stream.
“Kth Missing Positive Number” (LeetCode Problem #1539): This problem introduces the concept of finding specific statistics (in this case, the kth missing number) from an array, which is similar to finding the median from a data stream.
“Top K Frequent Elements” (LeetCode Problem #347): This problem requires processing an array to find specific statistics, which are techniques also needed for “Find Median from Data Stream”.
“Sliding Window Maximum” (LeetCode Problem #239): This problem involves maintaining statistics over a changing “window” of a data stream, which is similar to maintaining the median over a data stream.
“Insert, Delete, GetRandom  O(1)” (LeetCode Problem #380): This problem involves maintaining a data structure with various constraints on operation time complexity, similar to what’s needed for “Find Median from Data Stream”.
“Sort List” (LeetCode Problem #148): As “Find Median from Data Stream” involves sorting a data stream, understanding sorting of a linked list can be a helpful prerequisite.
“Kth Largest Element in a Stream” (LeetCode Problem #703): This problem is very similar to “Find Median from Data Stream”, but it only asks for the Kth largest element rather than the median. This would be good practice for handling data streams.
“Find All Numbers Disappeared in an Array” (LeetCode Problem #448): This problem practices techniques of array manipulation and dealing with missing elements, providing a base for managing streams of data.
“Data Stream as Disjoint Intervals” (LeetCode Problem #352): This problem also involves managing a stream of data, albeit with a different objective than finding the median.
“Maximum Product of Word Lengths” (LeetCode Problem #318): While not directly related to data streams or medians, this problem involves optimizing over a pair of elements, which can be similar to maintaining the two middle elements to find the median in a data stream.
These cover how to handle data streams and finding specific statistics from them, preparing you to tackle the “Find Median from Data Stream” problem.


Problem Classification
The task is to calculate the median of a sequence of numbers as they arrive one by one. This is a Stream Analysis Problem.
The ‘What’ components are:
 Stream of integers as input data
 Need to store data in a structure
 Insertion of new elements into structure
 Finding median of elements seen so far
 Median is middle value in sorted order
 Handle even and odd number of elements
I would categorize this problem as:
 Online algorithm, since data comes as a stream
 Design problem, needing to design a data structure to support operations
 Statistics problem, since it involves calculating the median
 Sorting problem, as median relies on having sorted order
I would classify it as an online median finding problem, which requires designing an efficient data structure that can handle inserts and find the current median after each operation in sorted order.
The key requirements are online processing, custom data structure design, supporting inserts and median finding in sorted order, and efficiently handling even and odd numbers of elements.
Clarification Questions
Here are some clarification questions I would ask about this online median finding problem:
What are the constraints on memory usage? Should we optimize for space or time?
Can elements be deleted as well as inserted?
What should the data structure do if given duplicate elements?
Does the data structure need to support finding any percentiles other than the median?
Is the stream of numbers infinite or finite?
Can we assume the numbers are uniformly randomly distributed?
Does findMedian() need to be constant time O(1), or is linear time O(n) allowed?
Do the numbers come in ascending, descending, or random order in the stream?
What should be done if addNum() is called with an invalid number outside the bounds?
Should the data structure preserve the original input order of numbers?
What should the output be if findMedian() is called before any numbers added?
Are there other methods needed beyond addNum() and findMedian()?
Getting clarity on these requirements would allow me to make better assumptions and design decisions when modeling the data structure and algorithms.
Problem Analysis and Key Insights
Here are some key insights gained from analyzing the median finding problem statement:
The streaming input suggests an online algorithm is needed.
We need to design and implement a custom data structure to support efficient inserts and median finding.
Maintaining sorted order is crucial to allow O(1) median finding.
The data structure needs to handle both even and odd numbers of elements.
New elements can arrive indefinitely, so the structure needs to scale well.
No limit given for memory usage, so optimizing for fast operations is likely the priority.
A balance of moderate insert speed (~log n) and fast O(1) find time is ideal.
The integers are within a reasonable range that fits primitives, so overflow is not a concern.
No specifics given on duplicate handling, so simplest approach can be chosen.
In summary, the core requirements are maintaining sorted order for fast median finding while supporting efficient inserts in an online setting. The stream input and lack of bounds on data size necessitate a scalable solution.
Problem Boundary
Here are some key points about the scope of this online median finding problem:
The input is a stream of integer numbers ranging from 105 to 105.
Need to design a data structure to store the input numbers that supports efficient insertion and median finding operations.
The data structure needs to process inserts in online fashion as numbers arrive.
Median finding must leverage the elements being in sorted order.
Both even and odd numbers of inserted elements must be handled.
Only the addNum() and findMedian() methods need to be implemented.
There are no constraints specified for memory usage or runtime complexity.
Output is simply the median value, not the full sorted order.
The total usage of addNum() and findMedian() is bounded to 5*104 operations.
No need to support deletion or other analytics beyond median.
So in summary, the scope focuses on efficient online insert and median finding within reasonable input size bounds, using a tailored data structure design approach. Details like deletions, duplicate handling, and actual input ordering are not specified.
Here are some ways we can establish boundaries for this online median finding problem:
Input Space
 Integer numbers
 Range from 105 to 105
 Up to 5*104 total numbers inserted
 Stream order unspecified
Output Space
 Median double value
 Between 105 to 105 inclusive
Data Structure
 Must support insertion and median finding
 Median finding must leverage sorted order
 Should scale to 5*104 items
 No deletion needed
Runtime
 addNum() should be faster than O(n) per operation
 findMedian() must be O(1) constant time
Memory
 No specific space complexity bounds given
Other
 Duplicate handling unspecified
 Output for empty structure unspecified
Establishing these clear input, output, functionality, and performance boundaries provides a solid problem specification to design a solution within.
Distilling the Problem to Its Core Elements
This online median finding problem is fundamentally based on the principles of data structures and algorithm design.
In the simplest terms, I would describe it as:
“Designing a structure to efficiently store numbers and find the middle value as new numbers continuously arrive.”
The core problem is storing a stream of numbers in a data structure that allows fast finding of median value. We can simplify it to:
“Build data structure to support fast inserts and median finding on stream.”
The key components are:
 Stream processing
 Custom data structure
 Insertion method
 Median finding method
 Maintaining sorted order
The minimal operations needed are:
 Model the structure
 Implement insertion logic
 Implement median finding logic
 Maintain internal sorting
At its essence, the problem requires applying data structures knowledge to model an efficient structure for fast online statistic tracking on streaming data. The requirements add specifics, but the core is about optimizing data storage and access.
Visual Model of the Problem
Here are some ways to visualize the online median finding problem statement:
Show a stream of numbers being inserted into a data structure. Animate the numbers continuously arriving.
Visually represent the internal state of the data structure, showing the elements in sorted order. Update the visualization on insert.
On median finding, highlight the median element(s) in the internal data structure representation.
Use increasing heights of elements in the structure to represent order and magnitude of numbers.
Show multiple findMedian() calls during insertion process and highlight median upon each call.
Visualize how structure balances on insertion to maintain sorted order and support fast finding.
Contrast with naive unsorted structure that requires linear search for median.
Show earlier insertions being obscured from view to represent streaming window.
Demonstrate handling even and odd numbers of elements differently.
Use color coding and animations to enhance intuitiveness.
These types of visuals could help make core concepts like streaming insertion, maintaining sort order, constant time median finding, and online processing more concrete and intuitive.
Problem Restatement
Here’s how I would paraphrase the problem statement for this online median finding problem:
We need to design a data structure that can store a stream of incoming integers and efficiently find the median value of the elements inserted so far at any point.
The data structure needs to support two main operations:
addNum(num)  Inserts a new integer from the stream into the structure.
findMedian()  Calculates the median value of all elements in the structure currently.
The median is defined as the middle number in sorted order if the count is odd, or the average of the two middle numbers if even.
The data structure needs to leverage the elements being in sorted order somehow to allow finding the median very quickly in constant time.
We don’t know the full input stream in advance. The numbers are arriving one by one in online fashion.
There are no specific bounds given for memory usage or runtime complexity. But addNum() should be reasonably fast, while findMedian() needs to be constant time.
Let me know if I’m missing or unclear on any aspects of the problem based on this paraphrasing! Getting an accurate understanding of the requirements helps drive the solution design.
Abstract Representation of the Problem
Here is one way we could formulate an abstract representation of this online median finding problem:
Let’s consider a data structure D that stores a set S of numbers that arrive in a streaming, online fashion.
D needs to support these operations:
insert(x)  Inserts number x into S
median()  Returns median value of elements in S
Where median is defined as:
 If S is odd, the middle element of the sorted S
 If S is even, average of the middle two elements of sorted S
We want to design the internals of D such that:
 insert(x) has fast O(log n) time complexity
 median() has constant O(1) time complexity
By abstracting the specifics of numbers, streams, and median finding, we can represent the problem more generally as:
 Designing data structure D to store incoming data
 Supporting fast insertions and constant time reads of statistical summary
 Maintaining internal structure of D to enable fast analytics
This framing as an online data structure design problem focused on efficient operations and statistics highlights the key aspects without domainspecific details.
Terminology
Here are some key terms and concepts crucial for understanding this online median finding problem:
Streaming  Continuous arrival of input data over time. Requires online processing.
Online algorithm  Algorithm that processes input piecebypiece in sequential order, without seeing the entire input up front. Needed here.
Median  The middle number in a sorted sequence. The statistic we need to efficiently find.
Sorted order  Elements arranged by increasing/decreasing value. Enables fast median finding.
Balanced tree  Data structure that maintains elements in sorted order and balanced form to enable fast lookups, inserts. Useful here.
Time complexity  How runtime scales with input size. Want fast O(log n) inserts, O(1) median finding.
Space complexity  How memory usage scales with input size. No specific limit given for this problem.
Amortized analysis  Average runtime over entire input stream. Helps analyze incremental costs.
The core aspects are supporting operations in an online, streaming setting, maintaining sorted order for fast statistics, and achieving the desired time complexities. The data structure needs to balance these needs.
Problem Simplification and Explanation
Here’s one way I could explain this median finding problem in simpler terms:
Let’s compare it to a continuously growing line of people waiting to buy tickets (analogy). People keep joining the end of the line. We want to be able to quickly find the middle person in the line at any point in time.
The key concepts are:
People joining the line = numbers being inserted
The line = our data structure storing the numbers
Finding the middle person = determining the median number
Keeping people ordered by arrival = maintaining elements in sorted order
Quickly checking position = fast O(1) median finding time
Joining the end without disrupting order = efficient inserts
So we need a data structure that can store numbers in sorted order, so we can use the middle position to quickly lookup the median. But it also needs to allow efficient inserts at the end.
Balanced trees meet these requirements. As numbers come in, we insert them in the tree to keep sorted order. The tree structure enables fast median finding and reasonably fast inserts.
Constraints
Here are some characteristics of this online median finding problem that we can leverage:
Integer numbers can be stored efficiently without messy decimal precision issues.
Reasonable number range means we don’t have to worry about overflow or underflow issues.
Streaming input means we can focus on incremental insertion rather than batch loading.
Bounded max number of operations allows simplified analysis without needing to handle unbounded streams.
No need to reconstruct full stream ordering since median relies only on value, not position.
Median finding only requires relative ordering, not actual sorting. Maintaining any sorted structure suffices.
Small ongoing insertions are preferred over large recalculations, playing to strengths of tree structures.
No duplicate handling specified, so simplest semantics can be assumed.
Output is just the median value, not the full structure state. Simplifies return logic.
Overall, the discrete and bounded input space, incremental processing, modest output needs, and lack of complex constraints means we can leverage simpler structures like balanced trees rather than more complex stream processing engines.
Here are some key insights gained from analyzing the constraints:
The integer input with reasonable bounds simplifies storage and calculations.
Streaming input promotes an incremental insertoriented approach rather than batch loading.
Bounded usage allows simplified analysis and prevents need to handle unbounded streams.
Output simplicity reduces burden to track stream order or provide debugging info.
Median leverage ordering not full sorting, easing structural constraints.
Small steady inserts better suit a balanced tree rather than periodic rebuild.
No specified duplicate handling simplifies insertion semantics.
Modest output of just median value avoids complex return handling.
Lack of specifics on several aspects like memory or optimality provides flexibility.
In summary, the constraints enable a simple, focused, incremental solution without need to handle complex edge cases, unbounded streams, duplicate logic, or intricate output demands. This allows optimization opportunities.
Case Analysis
Here are some additional test cases covering a wide range of scenarios:
Typical Case
Input: [1, 5, 3, 4, 7, 2, 9]
Output:
addNum(1) > 1 addNum(5) > 3 addNum(3) > 3 addNum(4) > 3.5 addNum(7) > 4 addNum(2) > 3.5 addNum(9) > 4
Large Stream
Input: Stream of 50,000 random integers
Output: Median values should be reasonable for randomness.
Negative Numbers
Input: [3, 1, 5, 2]
Output:
addNum(3) > 3 addNum(1) > 2 addNum(5) > 3 addNum(2) > 2.5
Empty Structure
Input: findMedian() before any addNum()
Output: Undefined behavior
Duplicate Numbers
Input: [2, 2, 3, 2]
Output: Need to define handling, such as skip or place after.
Edge Cases:
 Max/min integers
 All identical numbers
 Ascending/descending streams
Checking these cases validates correct handling of possible input scenarios like negatives, randomness, duplicates, and edge cases.
Here are some ideas for visualizing these median finding test cases:
Typical Case
 Show inserts incrementally building a balanced tree structure
 Highlight median at each step
Large Stream
 Animate continuous insertion into fluctuating tree
 Median value fluctuates randomly
Negative Numbers
 Some elements colored red to denote negative
Empty Structure
 Show findMedian() on empty tree structure
Duplicate Numbers
 Show duplicate elements in tree, highlight issue
Edge Cases
 Inserts skewed all to one side
 All nodes identical values
 Tree imbalance or rebuilding cases
In general:
 Visualize streaming inserts altering tree
 Highlight median finding process
 Call out edge cases through color/animation
These visualizations can help debug issues, gain intuition about tree structure changes on streaming inserts, verify correct median finding, and identify potential edge case flaws.
Here are some key insights gained from analyzing these median finding test cases:
The typical case validates basic functionality and median finding logic.
Large streams help analyze scalability.
Negatives and edge cases reveal potential bugs in ordering.
Empty structure highlights initialization assumptions.
Duplicates emphasize the need for defined handling.
Ascending/descending streams stress test tree balancing logic.
Identical values could produce unstable medians without tiebreaking.
Visualizations provide intuition for balancing, ordering, and median access.
Varied examples validate correctness across diverse inputs.
Edge cases are crucial for ensuring robustness.
Overall, these insights demonstrate the importance of comprehensive testing to verify correctness, reveal assumptions, analyze scalability, identify edge conditions, and gain visual intuition. A wide range of test cases is key to developing a robust solution.
Identification of Applicable Theoretical Concepts
Here are some mathematical and algorithmic concepts that could help optimize this online median finding problem:
Balanced binary search trees  Maintain elements in sorted order to enable O(1) median finding and O(log n) insertion. BSTs like AVL trees or redblack trees are perfect candidates.
Heaps  Heap data structure also maintains order and can be used to find medians. Pair with hash table for speed.
Incremental algorithms  Designing the structure and logic focused on efficient incremental inserts is key for the streaming setting.
Amortized analysis  Helps analyze performance across a sequence of operations rather than worstcase. Useful for incremental costs.
Randomized algorithms  Leverage probability for efficiency, like random pivots for faster expected sorting time.
Sampling  Statistics on a subset can provide approximation. Useful for large data scenarios.
Information theory  Entropy metrics could help guide optimal structure design and analysis.
Applying structures like balanced BSTs and techniques like amortized analysis and randomized algorithms can enhance performance and simplify modeling and solving this problem.
Simple Explanation
Here’s how I would explain this online median finding problem in simple nontechnical terms:
Let’s say you have a neverending line of people waiting to buy tickets, and people keep joining the line (metaphor).
As the line grows, we want to be able to instantly know who the middle person in the line is at any moment.
Whenever someone new joins the line, we need to quickly figure out if they became the new middle person or not.
We can use techniques like keeping people ordered from shortest to tallest, so finding the middle person is easy by looking at their height.
As more people join, we may need to rearrange their positions a bit to keep the shortesttotallest order intact.
The key is, we want to design a system so we can instantly identify the middle person in a continuously growing and changing line of people.
In programming terms, the people are numbers, the line is a data structure, finding the middle person is getting the median number, and keeping them in order is maintaining a sorted structure.
Problem Breakdown and Solution Methodology
Here is a detailed stepbystep approach I would take to solve this online median finding problem:
Use a balanced binary search tree (BST) like an AVL or redblack tree to store the inserted numbers. This keeps elements sorted at all times.
For addNum(num), insert the number into the BST. The tree structure maintains sorting upon insertion.
For findMedian(), if tree size is odd, return root node value (middle element). If even size, return mean of two middle elements.
The tree’s sorted order allows accessing the median element(s) in constant O(1) time. Inserts take O(log n) time.
To handle a stream, simply insert each number via addNum() as it arrives. findMedian() calculates median of current tree state.
Rebalancing may occur during inserts to keep tree balanced for performance.
Increasing stream size increases insert times and tree size. We can optimize with faster BST variants like treaps or randomized BSTs.
Let’s walk through inserting [5, 1, 3] sequentially:
 Insert 5: Tree is [5], median is 5
 Insert 1: Tree is [1, 5], median is 3
 Insert 3: Tree is [1, 3, 5], median is 3
The balanced BST maintains sorted order to allow fast median finding on streams.
Inference of ProblemSolving Approach from the Problem Statement
Here are the key terms and how they guide the solution approach:
Stream  Continuous inputs suggest an incremental online algorithm.
Median  Need to efficiently find middle element in sorted order.
Sorted order  Maintaining sort order enables fast median finding. Signals using a sorted structure.
Insertion  Need to efficiently insert new numbers into the structure. Indicates insertfriendly structures.
Balance  Keeping structure balanced improves performance. Points to balanced trees.
Online  Handling continuously arriving data. Approach needs to focus on fast incremental operations.
Time complexity  O(log n) inserts, O(1) median finding requirements guide design.
Space complexity  No specified limit, but efficiency still preferred.
The core terms of stream, median, and time/space complexity requirements clearly point to modeling this as an online problem needing a sorted, balanced structure that supports efficient inserts and order statistic lookup. This drives selection of a balanced BST as an ideal data structure.
Here are some ways to visualize the properties and concepts of this problem using diagrams:
Balanced BST Operations
 Show insertions modifying structure of BST
 Highlight how balance is maintained
 Indicate order statistics like median accessible in O(1)
Streaming Inserts
 Animate sequential inserts into BST structure
 Values arrive continuously over time from stream
Median Finding
 On query, visually select and highlight median element(s)
 Show how median changes as inserts occur
 Contrast with linear scan of unsorted structure
Time Complexity
 Table of operation times, indicating O(log n) inserts and O(1) median finding
Space Usage
 Treemap visualizing memory usage growing with stream size
These types of diagrams help illustrate key concepts like balanced trees enabling efficient online inserts and order statistic queries, the streaming aspect, and time/space complexities.
How did you infer from the problem statement that this problem can be solved using ?
Simple Explanation of the Proof
I’m having trouble understanding the proof of this algorithm. Could you explain it in a way that’s easy to understand?
Stepwise Refinement
Here is a stepwise refinement for the online median finding problem:
Highlevel approach: Use a balanced BST to store elements and find median
Break into steps:
 Implement balanced BST (AVL, RedBlack, etc)
 Insert method inserts into BST, maintaining balance
 Median method finds inorder predecessor/successor for even/odd size
 Size variable tracks number of elements
 Independent subproblems:
 Balancing rotations on insertion
 Finding median element(s)
 Tracking size
 Repeatable patterns:
 Insertion into BST
 Potential rebalancing on insert
 Order statistic finding for median
The key is modeling it as inserts and order statistics queries on a balanced BST. The core subtasks are the BST operations, median finding, and size tracking. The overall solution revolves around optimized inserts and median queries on the BST.
Solution Approach and Analysis
Here is a detailed stepbystep approach to solving the online median finding problem:
Use a selfbalancing BST like AVL or RedBlack tree to store elements. This keeps elements sorted for fast median finding.
Implement the standard BST insert and search operations to support adding numbers and fetching nodes.
The insert method adds numbers into the tree. Perform any needed rotations to rebalance the tree after inserting.
To find the median, first check if tree size is even or odd.
If odd, return the root node value. If even, return average of predecessor and successor of root, which are the two middle elements.
Track the size of the tree to determine if size is even or odd. Update on every insert.
For the stream, simply insert each number via the insert method. Median can be found in constant time when needed.
Increasing the stream size degrades insert performance and space usage slowly, but finding median remains fast.
Let’s walk through inserting [3, 1, 5, 4, 2] sequentially:
 Insert 3: Size 1, median 3
 Insert 1: Size 2, median 2
 Insert 5: Size 3, median 3
 Insert 4: Size 4, median 3.5
 Insert 2: Size 5, median 3
This leverages the underlying balanced BST to achieve fast inserts and median finding.
Identify Invariant
The invariant in this online median finding problem is:
At all times, the binary search tree maintains the ordering of elements inserted so far in sorted order.
This means after each insertion, the BST structure continues to store all previously inserted numbers in sorted order along with the newly inserted number.
We can maintain this invariant because:
The BST only allows insertions that maintain the underlying sorted structure.
The balancing rotations applied preserve the ordering while restoring balance.
The median finding relies only on the ordering, not the structure itself.
So long as this sorted ordering is maintained after every operation, we can be sure:
 Insertions are all successful
 Median finding works correctly
The invariant provides the relationship between elements needed for correct functioning of the data structure through all operations.
Identify Loop Invariant
What is the loop invariant in this problem?
Is invariant and loop invariant the same for this problem?
Identify Recursion Invariant
Is there an invariant during recursion in this problem?
Is invariant and invariant during recursion the same for this problem?
Generate content for this problem based on this structure: We must show that this loop invariant holds prior to the first iteration of the for loop of lines x  y, that each iteration of the loop maintains the invariant, and that the invariant provides a useful property to show correctness when the loop terminates.
Initialization: Prior to the first iteration of the loop, we have … Maintenance: To see that each iteration maintains the loop invariant, … lines a – b perform the appropriate action to maintain the loop invariant. Termination: At termination,
Thought Process
Here is how I would approach solving this online median finding problem:
The cues from the problem statement:
 Stream of numbers as input
 Need to insert numbers
 Require finding median value
 Median needs sorted order
 Fast insert and lookup times desired
These cues suggest:
 Model stream as sequential inserts
 Maintain sorted structure
 Use tree to enable fast statistics
 Balance tree for performance
Thought process:
 Use a selfbalancing binary search tree (BST)
 Insert numbers sequentially into BST
 For median, check if tree size is even/odd
 If odd, return root value, else return mean of middle values
 Repeat steps 24 as stream progresses
Python code:


This leverages a balanced BST that handles insertion/balancing automatically. We simply insert numbers sequentially and find median in constant time.
Here is an implementation of MedianFinder in Python using a SortedList as the underlying balanced BST structure:


The key aspects are:
Maintain elements in sorted order using SortedList (implements balanced BST)
addNum() inserts into SortedList to keep sorted
findMedian() checks even/odd size to compute median from sorted list
This leverages the underlying sorted structure to achieve O(logn) insert and O(1) median finding.
Establishing Preconditions and Postconditions
Here are the details for the addNum() method:
 Parameters:
 num: Integer  The number to insert
 Preconditions:
 num is within allowed range
 Functionality:
 Inserts num into the underlying data structure
 Maintains sorted order
 Postconditions:
 num is inserted in correct sorted position
 Data structure size is incremented by 1
 Error handling:
 Invalid num value outside +/105 range raises ValueError
 On other errors, raises RuntimeError
In summary, this inserts a number into the structure, keeping sorted order to allow fast future median finding. Input validation is performed and errors handled properly.
And for findMedian():
Parameters: None
Preconditions:
 Data structure contains at least 1 element
 Functionality:
 Finds the current median value of elements
 Postconditions:
 Median float value is returned
 Error handling:
 Empty structure raises ValueError
 On other errors, raises RuntimeError
So this calculates the median value in constant time by leveraging the underlying sorted structure. Errors are handled if preconditions are violated.
Problem Decomposition
Here’s how I would break down this online median finding problem:
Maintain a stream of numbers in a data structure that supports efficiently finding the median value on demand.
Major parts are:
 Structure design and implementation
 Inserting numbers
 Finding median
 Handling stream
Breakdown:
 Implement balanced BST
 Insert method to add numbers
 Median method to calculate value
 Method to get even/odd size
 Stream processing loop
 Error handling
Generalized tasks:
 Balanced BST implementation
 Element insertion
 Statistics calculation
 Stream iteration
 Input validation
Abstraction looks good.
Naming is clear.
Order:
 Design structure
 Implement insertion
 Implement median finding
 Process stream
 Perform error handling
The key is designing a structure for efficient order statistic queries, implementing the operations, and leveraging it for the stream use case. The breakdown identifies reusable components like the BST, insertion, median finding, etc.
From Brute Force to Optimal Solution
Here is a brute force approach and optimizations for the online median finding problem:
Brute force:
 Store inserted numbers in an unsorted array
 On median requests, linear scan entire array to find middle
This requires O(n) for both insert and median finding.
Optimizations:
 Maintain sorted array using binary insertion:
 Insert into sorted position in O(log n)
 Find median by accessing middle index in O(1)
 Use a balanced BST like AVL/RedBlack tree instead of array:
 Insert in O(log n) amortized time
 Improves median to worst case O(1) instead of average
 Store size instead of recounting:
 Avoid O(n) size scans for even/odd
 Move even median logic outside:
 Check parity and pick median strategy just once
 Handle streams externally:
 Makes tree usage generic
These optimizations improve the complexity from O(n) for both operations to:
 Insertion: O(log n) amortized
 Median finding: O(1) worst case
The space improves from O(n) for array to O(n) for tree.
This is achieved by applying sorted structures, balancing, caching, and modular design. The optimizations yield substantial gains on both time and space complexity.
Code Explanation and Design Decisions
Here is an analysis of the logic and reasoning behind this median finding code:
It initializes a SortedList to maintain sorted order. Size tracks count. These represent the problem state.
The main loop iterates over the stream numbers. Each iteration processes the next number in sequence.
The if condition checks if size is even for median calculation. This handles the two median cases.
Size is updated on each insert to track count. SortedList maintains order on inserts.
The invariant is SortedList always sorted. This enables fast median finding.
The final median value returned is calculated on the latest stream state. It satisfies the querying requirement.
Overall, the logic directly maps to processing a stream, leveraging a sorted structure to incrementally update state, handling order constraints, and returning the desired query result. The reasoning ties the code to the problem requirements.
Coding Constructs
Here is an analysis of the highlevel logic of this median finding code:
It uses a data structure approach to efficiently store and access data.
I would explain the code stores values in a clever way that allows easily finding the middle value.
The core logic includes data structures, looping, conditionals, functions and algorithms.
The algorithm inserts numbers into a balanced search tree. This allows efficiently finding the middle value in constant time.
The key steps are insertion while maintaining order, tracking size, checking parity, accessing inorder elements for the median. This leverages the sorted structure.
The overall strategy is incremental insertion into an ordered balanced structure to optimize a statistic query. This aligns to online algorithms and amortized analysis.
Overall, this analysis extracts the highlevel semantics around orderly data structures, online processing, optimized queries, and incremental updates in a language and paradigm agnostic way. The focus is on the conceptual patterns and problemsolving approach.
Language Agnostic Coding Drills
Here is a decomposition of key concepts in this median finding code:
Easy
 Array/list manipulation  Adding elements, accessing by index
 Math operations  Calculating mean
 Variable assignment  Tracking size, median value
 Looping  Iterating over stream
Medium
 Data structures  Implementing a sorted structure like BST
 Sorting elements  Maintaining sorted order
 Modular design  Separating median finding logic
Advanced
 Balanced tree  Selfbalancing for O(log n) inserts
 Amortized analysis  Understanding amortized costs
 Order statistics  Calculating median from element position
 Online algorithms  Optimizing for streaming incremental input
The overall problem solving approach is:
 Implement a selfbalancing ordered data structure
 Build insertion logic to maintain sorted structure
 Encapsulate median finding logic separately
 Insert streaming numbers sequentially
 Return median value for current state
Each concept builds on the last to construct an efficient online median finding system  from basic data manipulation, to complex ordered structures, to online algorithm design.
Targeted Drills in Python
Here are Python coding drills for the key concepts:
Basic
Array manipulation:


Math operations:


Looping:


Intermediate
Sorted structure:


Encapsulation:


Advanced
Balanced BST:


Online algorithms:


These provide building blocks to assemble the overall online median finding system  combining basic coding, data structures, algorithms and modularity.
Q&A
Similar Problems
Here are 10 distinct problems that use similar underlying concepts:
Kth Largest Element in a Stream (703)  Maintains kth largest element using BST, similar ordered structure techniques.
My Calendar I, II, III (729, 731, 732)  Online insertions and queries on interval tree, another ordered structure.
Design Search Autocomplete System (642)  Prefix trie structure to optimize online query.
Design Compressed String Iterator (604)  Online iterator logic with backing data structure.
Design Log Storage System (635)  Appends and splits ordered log chunks like BST splitting.
Sliding Window Median (480)  Online median maintenance within sliding window.
Data Stream as Disjoint Intervals (352)  Online aggregation of numeric stream into structure.
Game of Life (289)  Grid simulation mirrors online stream processing.
Design Log Storage System (635) – Online ordered storage structure.
Subarray Sum Equals K (560)  Sliding window online aggregation techniques.
Common themes include online processing, custom data structures, ordered/indexed design, and aggregated statistics  aligning with our median finding approach.
Language Agnostic Coding Drills
Understanding of data structures: This code uses a data structure called a heap. A heap is a complete binary tree that satisfies the heap property. It’s typically used when we want to keep track of the largest or smallest element that we’ve seen so far. In Python, the heapq library provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.
Heap operations: Understanding how to use heap operations such as
heappush
, which pushes an item on the heap, andheappop
, which pops and returns the smallest item from the heap, is critical. If you’re interested in a max heap, Python’s heapq module does not support it directly, but we can simulate a max heap by inverting the order through multiplying elements by 1.Maintaining the Median: The central concept is maintaining the median as new numbers arrive. This problem is solved using two heaps  a max heap to represent the lower half and a min heap for the upper half. The lengths of the two halves differ by at most one. This approach ensures that the median is at the root of one of the heaps (or both if the total number of items is even).
Constructor Implementation: The
__init__
function is a special method in Python classes, which is automatically called when an object is instantiated. In this case, it initializes two empty heaps.Adding a Number: In the
addNum
function, every new number is always first inserted into the max heap (the lower half). Then, the maximum element from the lower half is moved to the upper half. If after these operations the size of the lower half is smaller than the upper half, we balance them by moving the minimum element from the upper half to the lower half.Finding the Median: In the
findMedian
function, if the sizes of the halves are not equal, the median is the root of the heap with more elements. Otherwise, the median is the average of the roots of both heaps.Final Assembly: The combination of these drills is the final solution. Each incoming number is added using the
addNum
function, and the median is always accessible in O(1) time using thefindMedian
function.
This sequence of drills and concepts arranged in increasing order of difficulty offers a comprehensive understanding of the problemsolving approach used in the given code. It illustrates how heaps can be effectively utilized to solve dynamic median finding problems.
Targeted Drills in Python
 Understanding of data structures: In this case, you need to understand how to work with heaps in Python. Start by creating a basic heap and performing simple operations.


 Heap operations: Practice using
heappush
andheappop
. Try pushing and popping elements and observe how the heap changes.


 Simulating a max heap: Python’s heapq doesn’t support max heap directly, but you can simulate it by inverting the order.


 Constructor Implementation: Write a class with a constructor that initializes two empty heaps.


 Adding a Number: Implement a method that adds a number to the correct heap and maintains the balance.


 Finding the Median: Implement a method that returns the median of the numbers added so far.


You can incrementally build up the final solution by first getting comfortable with heap operations, then implementing each method of the class one by one. Note that the addNum
and findMedian
methods depend on each other to maintain the balance of the two heaps and accurately find the median.