Why compare when you can just count?
- 2 minutes read - 414 wordsThe Bucket Sort algorithm
LC75 - example of Sorting Colors
The problem statement is as below.
Given an array nums with n objects colored red, white, or blue, sort them in-place so that objects of the same color
are adjacent, with the colors in the order red, white, and blue.
We will use the integers 0, 1, and 2 to represent the color red, white, and blue, respectively.
You must solve this problem without using the library's sort function.
class Solution:
def sortColors(self, nums: List[int]) -> None:
"""
Do not return anything, modify nums in-place instead.
"""
# bucket sort
value_range = 3 #red, white or blue
counts = [0] * value_range
for i, n in enumerate(nums):
counts[n] += 1
i = 0
for value, freq in enumerate(counts):
print(value, freq)
for _ in range(freq):
nums[i] = value
i += 1
Quick rundown
The bucket sort algorithm relies on the fact that values belong to a known range. In this example it is [0,2] ∈ Z.
The input arrays have already been prepared, by encoding categorical values into numerical ones. The same
thing could be said about a different numerical range where the start of range is re-indexed or shifted back to 0.
Given the value range is known, the bucket sort algorithm simply counts frequency of apparition of each value. From there, iterating over the range in ascending or descending order provides the values in an ordered fashion. Those values are inserted as many times as they appear in the initial array by using the frequency.
This yields a quite simple algorithm where no comparison is ever done.
Even with the nested loop, the time complexity is left to O(n) where
n is the array length given the known range is usually small.
I see you sitting at the back with your hand raised, and yes, you’re right! Technically, we could consider a bucket sort on an integer range of [-223, 232] but this range is way too large and defies the purpose of this algorithm. In fact, it is not recommended to use bucket sort for extremely large ranges as there will be no benefits in using a frequency-based sorting algorithm.
This sorting algorithm is considered unstable - in case of a tie between two values, it does not keep any track of which one came first, as shown above. This could be made stable in different conditions - for instance by inserting the elements in a linked list instead of counting them.