In this blog we will touch on randomized algorithms, focusing today on the Randomized QuickSort algorithm.
Why do we need randomized algorithms?
- Often faster, but with weaker guarantees.
- Often simpler code, but harder to analyze.
- Sometimes the only practical option, e.g., in big data, machine learning, and security.
First, let's review the QuickSort algorithm.
```
function QS(S)  // Assumes all elements in S are distinct.
    if |S| ≤ 1 then
        return S
    else
        Pick pivot x ∈ S
        L ← {y ∈ S | y < x}
        R ← {y ∈ S | y > x}
        return QS(L) + [x] + QS(R)
```
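As a concrete reference point, here is a minimal Python sketch of the pseudocode above (an illustration, not a production implementation); it uses the first element as the pivot, which is an arbitrary deterministic choice:

```python
def qs(s):
    """QuickSort on a list of distinct elements (not in place)."""
    if len(s) <= 1:
        return s
    x = s[0]                         # pivot: first element (deterministic choice)
    left = [y for y in s if y < x]   # elements smaller than the pivot
    right = [y for y in s if y > x]  # elements larger than the pivot
    return qs(left) + [x] + qs(right)

print(qs([3, 1, 4, 1.5, 9, 2, 6]))  # [1, 1.5, 2, 3, 4, 6, 9]
```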
For details, see the other blog post on QuickSort. Recall that the average-case running time is ${O(n \lg n)}$, while the worst-case running time is ${O(n^2)}$; this worst case occurs when the pivots are consistently extreme, e.g., with the first-element pivot rule on an already sorted or reverse-sorted input.
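To see the quadratic worst case concretely, here is a small experiment (my own illustration, assuming the first-element pivot rule of the sketch above) that counts comparisons on an already sorted input. Each call compares every other element with the pivot, so a sorted input yields ${(n-1) + (n-2) + \cdots + 1 = n(n-1)/2}$ comparisons:

```python
def qs_count(s):
    """Return (sorted list, number of comparisons) for first-element-pivot QS."""
    if len(s) <= 1:
        return s, 0
    x = s[0]
    left = [y for y in s[1:] if y < x]
    right = [y for y in s[1:] if y > x]
    ls, lc = qs_count(left)
    rs, rc = qs_count(right)
    # len(s) - 1 comparisons against the pivot in this call
    return ls + [x] + rs, lc + rc + len(s) - 1

n = 500
_, c = qs_count(list(range(n)))
print(c, n * (n - 1) // 2)  # both 124750: sorted input hits the quadratic case
```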
Randomized QuickSort
We only need to modify the pivot-selection step.
```
function RandQS(S)  // Assumes all elements in S are distinct.
    if |S| ≤ 1 then
        return S
    else
        Pick pivot x ∈ S, uniformly at random
        L ← {y ∈ S | y < x}
        R ← {y ∈ S | y > x}
        return RandQS(L) + [x] + RandQS(R)
```
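The only change from the deterministic Python sketch above is the pivot line; a minimal sketch using Python's random.choice:

```python
import random

def rand_qs(s):
    """Randomized QuickSort: the pivot is chosen uniformly at random."""
    if len(s) <= 1:
        return s
    x = random.choice(s)             # uniformly random pivot
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    return rand_qs(left) + [x] + rand_qs(right)

print(rand_qs([3, 1, 4, 1.5, 9, 2, 6]))  # [1, 1.5, 2, 3, 4, 6, 9]
```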
Next, we analyze the expected running time of Randomized QuickSort.
Analysis of Randomized QuickSort
Let ${\left[S_{(1)}, \ldots, S_{(n)}\right] := \text{RandQS}(S)}$.
For ${i < j}$, let the random variable ${X_{ij} \in \{0, 1\}}$ be the number of times that ${S_{(i)}}$ and ${S_{(j)}}$ are compared. (Any pair is compared at most once: elements are only compared with a pivot, and the pivot is excluded from the recursive calls, so ${X_{ij}}$ is indeed ${0}$ or ${1}$.)
\[\mathbb{E}[\#\text{comparisons}] = \mathbb{E}\left[\sum_{1 \leq i < j \leq n} X_{ij}\right] = \sum_{1 \leq i < j \leq n} \mathbb{E}[X_{ij}]\]

Here we use linearity of expectation:
$$ \begin{equation} \begin{aligned} \mathbb{E}[X + Y] &= \sum_x \sum_y (x+y) \mathbb{P}(X=x, Y=y)\\ &= \sum_x \sum_y x \mathbb{P}(X=x, Y=y) + \sum_x \sum_y y \mathbb{P}(X=x, Y=y)\\ &= \sum_x x \sum_y \mathbb{P}(X=x, Y=y) + \sum_y y \sum_x \mathbb{P}(X=x, Y=y) \\ &= \sum_x x \mathbb{P}(X=x) + \sum_y y \mathbb{P}(Y=y) \\ &= \mathbb{E}[X] + \mathbb{E}[Y] \end{aligned} \end{equation} $$

Note: we do **not** need ${X}$ and ${Y}$ to be **independent** here.
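As a quick sanity check (my own illustration, not part of the original argument), the following simulation takes two clearly dependent variables, ${X}$ (a die roll) and ${Y = X^2}$, and shows empirically that linearity of expectation still holds, while the product rule ${\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]}$, which does require independence, fails:

```python
import random

random.seed(0)
xs = [random.randint(1, 6) for _ in range(100_000)]  # X: a fair die roll
ys = [x * x for x in xs]                             # Y = X^2, fully dependent on X

mean = lambda v: sum(v) / len(v)
# Linearity of expectation holds despite the dependence:
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))  # equal
# The product rule E[XY] = E[X]E[Y] does NOT hold (it needs independence):
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))  # differ
```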
Since ${X_{ij} \in \{0, 1\}}$, it is an indicator random variable for the event that ${S_{(i)}}$ and ${S_{(j)}}$ are compared. Let ${p_{ij}}$ be the probability of this event. Then

\[\mathbb{E}[X_{ij}] = 0 \cdot (1 - p_{ij}) + 1 \cdot p_{ij} = p_{ij}\]

Thus the expectation of an indicator variable equals the probability of the indicated event, so
\[\mathbb{E}[\#\text{comparisons}] = \sum_{i<j} \mathbb{E}[X_{ij}] = \sum_{i<j} p_{ij}\]

**Lemma:** ${S_{(i)}}$ and ${S_{(j)}}$ are compared iff ${S_{(i)}}$ or ${S_{(j)}}$ is the first of ${S_{(i)}, \ldots, S_{(j)}}$ to be chosen as a pivot.
Proof. Each recursive call returns a contiguous block ${\left[S_{(a)}, \ldots, S_{(b)}\right]}$ of the sorted order. Let ${x = S_{(c)}}$ be the pivot chosen in that call, and suppose ${a \leq i < j \leq b}$, so the block looks like

\[S_{(a)} \;\cdots\; S_{(i)} \;\cdots\; S_{(j)} \;\cdots\; S_{(b)}\]
- Case 1: ${c < i}$ or ${c > j}$: ${S_{(i)}}$ and ${S_{(j)}}$ are not compared in this call, but they stay together in the same recursive subproblem.
- Case 2: ${i < c < j}$: ${S_{(i)}}$ and ${S_{(j)}}$ go to different subproblems and are never compared.
- Case 3: ${c = i}$ or ${c = j}$: ${S_{(i)}}$ and ${S_{(j)}}$ are compared exactly once (one of them is the pivot).

So the decision is made exactly when the pivot first falls in ${\{S_{(i)}, \ldots, S_{(j)}\}}$, and ${S_{(i)}}$ and ${S_{(j)}}$ are compared iff that pivot is one of them. ${\square}$
Thus, since the first pivot chosen from ${S_{(i)}, \ldots, S_{(j)}}$ is equally likely to be any of these ${j - i + 1}$ elements, the probability of the event that ${S_{(i)}}$ and ${S_{(j)}}$ are compared is

\[p_{ij} = \frac{2}{j + 1 - i}\]
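As an empirical check (my own illustration; the indices ${i = 2}$, ${j = 7}$ and size ${n = 10}$ are arbitrary choices), we can instrument RandQS to record which pairs get compared and compare the observed frequency with ${2/(j + 1 - i)}$:

```python
import random

def rand_qs_pairs(s, compared):
    """RandQS that also records every pair compared with a pivot."""
    if len(s) <= 1:
        return s
    x = random.choice(s)
    for y in s:
        if y != x:
            compared.add((min(x, y), max(x, y)))  # y is compared with pivot x
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    return rand_qs_pairs(left, compared) + [x] + rand_qs_pairs(right, compared)

n, i, j, trials = 10, 2, 7, 100_000
hits = 0
for _ in range(trials):
    compared = set()
    rand_qs_pairs(list(range(1, n + 1)), compared)
    hits += (i, j) in compared  # on input 1..n, S_(i) = i and S_(j) = j
print(hits / trials, 2 / (j + 1 - i))  # both should be close to 1/3
```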
Now, denoting the harmonic number ${H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}}$, we have

$$\begin{aligned} \mathbb{E}[\#\text{comparisons}] &= \sum_{i<j} p_{ij} \\ &= \sum_{i<j} \frac{2}{j + 1 - i} \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j + 1 - i} \quad (\text{substitute } k = j + 1 - i \text{, i.e., } j = k + i - 1)\\ &= \sum_{i=1}^{n-1} \sum_{k=2}^{n+1-i} \frac{2}{k}\\ &< \sum_{i=1}^{n} \sum_{k=1}^{n} \frac{2}{k}\\ &= \sum_{i=1}^{n} 2H_n \\ &= 2n H_n \end{aligned}$$

Since ${H_n = O(\ln n)}$, the expected number of comparisons is ${O(n \ln n)}$. In fact, one can also show that RandQS finishes in ${O(n \ln n)}$ time with high probability, but we do not prove that here.
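To close, here is a small experiment (again my own sanity check, with arbitrary parameters ${n = 1000}$ and 100 trials) that counts the comparisons RandQS actually performs and checks the average against the ${2nH_n}$ bound:

```python
import random

def rand_qs_count(s):
    """Return (sorted list, number of comparisons) for RandQS."""
    if len(s) <= 1:
        return s, 0
    x = random.choice(s)
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    ls, lc = rand_qs_count(left)
    rs, rc = rand_qs_count(right)
    # len(s) - 1 comparisons against the pivot in this call
    return ls + [x] + rs, lc + rc + len(s) - 1

n, trials = 1000, 100
avg = sum(rand_qs_count(list(range(n)))[1] for _ in range(trials)) / trials
h_n = sum(1 / k for k in range(1, n + 1))
print(avg, 2 * n * h_n)  # the empirical average stays below the 2 n H_n bound
```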