In this blog we will touch on randomized algorithms, focusing today on the Randomized QuickSort algorithm.
Why do we need randomized algorithms?
- Often faster, but with weaker guarantees.
- Often simpler code, but harder to analyze.
- Sometimes the only practical option, e.g., in big data, machine learning, and security.
First, let's review the QuickSort algorithm.
```
function QS(S)  // Assumes all elements in S are distinct.
    if |S| ≤ 1 then
        return S
    else
        Pick pivot x ∈ S
        L ← {y ∈ S | y < x}
        R ← {y ∈ S | y > x}
        return QS(L) + [x] + QS(R)
```
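As a concrete reference point, here is a minimal Python sketch of the pseudocode above (an illustration, not a production implementation); it uses the first element as the pivot, which is an arbitrary deterministic choice:

```python
def qs(s):
    """QuickSort on a list of distinct elements (not in place)."""
    if len(s) <= 1:
        return s
    x = s[0]                         # pivot: first element (deterministic choice)
    left = [y for y in s if y < x]   # elements smaller than the pivot
    right = [y for y in s if y > x]  # elements larger than the pivot
    return qs(left) + [x] + qs(right)

print(qs([3, 1, 4, 1.5, 9, 2, 6]))  # [1, 1.5, 2, 3, 4, 6, 9]
```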
For details, see the other blog post on QuickSort. Recall that the average-case running time is ${O(n \lg n)}$, while the worst-case running time is ${O(n^2)}$; this worst case occurs when the pivots are consistently extreme, e.g., with the first-element pivot rule on an already sorted or reverse-sorted input.
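To see the quadratic worst case concretely, here is a small experiment (my own illustration, assuming the first-element pivot rule of the sketch above) that counts comparisons on an already sorted input. Each call compares every other element with the pivot, so a sorted input yields ${(n-1) + (n-2) + \cdots + 1 = n(n-1)/2}$ comparisons:

```python
def qs_count(s):
    """Return (sorted list, number of comparisons) for first-element-pivot QS."""
    if len(s) <= 1:
        return s, 0
    x = s[0]
    left = [y for y in s[1:] if y < x]
    right = [y for y in s[1:] if y > x]
    ls, lc = qs_count(left)
    rs, rc = qs_count(right)
    # len(s) - 1 comparisons against the pivot in this call
    return ls + [x] + rs, lc + rc + len(s) - 1

n = 500
_, c = qs_count(list(range(n)))
print(c, n * (n - 1) // 2)  # both 124750: sorted input hits the quadratic case
```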
Randomized QuickSort
We only need to modify the pivot-selection step.
```
function RandQS(S)  // Assumes all elements in S are distinct.
    if |S| ≤ 1 then
        return S
    else
        Pick pivot x ∈ S, uniformly at random
        L ← {y ∈ S | y < x}
        R ← {y ∈ S | y > x}
        return RandQS(L) + [x] + RandQS(R)
```
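The only change from the deterministic Python sketch above is the pivot line; a minimal sketch using Python's random.choice:

```python
import random

def rand_qs(s):
    """Randomized QuickSort: the pivot is chosen uniformly at random."""
    if len(s) <= 1:
        return s
    x = random.choice(s)             # uniformly random pivot
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    return rand_qs(left) + [x] + rand_qs(right)

print(rand_qs([3, 1, 4, 1.5, 9, 2, 6]))  # [1, 1.5, 2, 3, 4, 6, 9]
```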
Next, we analyze the expected running time of Randomized QuickSort.
Analysis of Randomized QuickSort
Let ${\left[S_{(1)}, \ldots, S_{(n)}\right] := \text{RandQS}(S)}$.
For ${i < j}$, let the random variable ${X_{ij} \in \{0, 1\}}$ be the number of times that ${S_{(i)}}$ and ${S_{(j)}}$ are compared. (Any pair is compared at most once: elements are only compared with a pivot, and the pivot is excluded from the recursive calls, so ${X_{ij}}$ is indeed ${0}$ or ${1}$.)
\[\mathbb{E}[\#\text{comparisons}] = \mathbb{E}\left[\sum_{1 \leq i < j \leq n} X_{ij}\right] = \sum_{1 \leq i < j \leq n} \mathbb{E}[X_{ij}]\]

Here we use linearity of expectation:
$$ \begin{equation} \begin{aligned} \mathbb{E}[X + Y] &= \sum_x \sum_y (x+y) \mathbb{P}(X=x, Y=y)\\ &= \sum_x \sum_y x \mathbb{P}(X=x, Y=y) + \sum_x \sum_y y \mathbb{P}(X=x, Y=y)\\ &= \sum_x x \sum_y \mathbb{P}(X=x, Y=y) + \sum_y y \sum_x \mathbb{P}(X=x, Y=y) \\ &= \sum_x x \mathbb{P}(X=x) + \sum_y y \mathbb{P}(Y=y) \\ &= \mathbb{E}[X] + \mathbb{E}[Y] \end{aligned} \end{equation} $$

Note: we do **not** need ${X}$ and ${Y}$ to be **independent** here.
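As a quick sanity check (my own illustration, not part of the original argument), the following simulation takes two clearly dependent variables, ${X}$ (a die roll) and ${Y = X^2}$, and shows empirically that linearity of expectation still holds, while the product rule ${\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]}$, which does require independence, fails:

```python
import random

random.seed(0)
xs = [random.randint(1, 6) for _ in range(100_000)]  # X: a fair die roll
ys = [x * x for x in xs]                             # Y = X^2, fully dependent on X

mean = lambda v: sum(v) / len(v)
# Linearity of expectation holds despite the dependence:
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))  # equal
# The product rule E[XY] = E[X]E[Y] does NOT hold (it needs independence):
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))  # differ
```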
Since ${X_{ij} \in \{0, 1\}}$, it is an indicator random variable for the event that ${S_{(i)}}$ and ${S_{(j)}}$ are compared. Let ${p_{ij}}$ be the probability of this event. Then

\[\mathbb{E}[X_{ij}] = 0 \cdot (1 - p_{ij}) + 1 \cdot p_{ij} = p_{ij}\]

Thus the expectation of an indicator variable equals the probability of the indicated event, so
\[\mathbb{E}[\#\text{comparisons}] = \sum_{i<j} \mathbb{E}[X_{ij}] = \sum_{i<j} p_{ij}\]

**Lemma:** ${S_{(i)}}$ and ${S_{(j)}}$ are compared iff ${S_{(i)}}$ or ${S_{(j)}}$ is the first of ${S_{(i)}, \ldots, S_{(j)}}$ to be chosen as a pivot.
Proof. Each recursive call returns a contiguous block ${\left[S_{(a)}, \ldots, S_{(b)}\right]}$ of the sorted order. Let ${x = S_{(c)}}$ be the pivot chosen in that call, and suppose ${a \leq i < j \leq b}$, so the block looks like

\[S_{(a)} \;\cdots\; S_{(i)} \;\cdots\; S_{(j)} \;\cdots\; S_{(b)}\]
- Case 1: ${c < i}$ or ${c > j}$: ${S_{(i)}}$ and ${S_{(j)}}$ are not compared in this call, but they stay together in the same recursive subproblem.
- Case 2: ${i < c < j}$: ${S_{(i)}}$ and ${S_{(j)}}$ go to different subproblems and are never compared.
- Case 3: ${c = i}$ or ${c = j}$: ${S_{(i)}}$ and ${S_{(j)}}$ are compared exactly once (one of them is the pivot).

So the decision is made exactly when the pivot first falls in ${\{S_{(i)}, \ldots, S_{(j)}\}}$, and ${S_{(i)}}$ and ${S_{(j)}}$ are compared iff that pivot is one of them. ${\square}$
Thus, since the first pivot chosen from ${S_{(i)}, \ldots, S_{(j)}}$ is equally likely to be any of these ${j - i + 1}$ elements, the probability of the event that ${S_{(i)}}$ and ${S_{(j)}}$ are compared is

\[p_{ij} = \frac{2}{j + 1 - i}\]
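As an empirical check (my own illustration; the indices ${i = 2}$, ${j = 7}$ and size ${n = 10}$ are arbitrary choices), we can instrument RandQS to record which pairs get compared and compare the observed frequency with ${2/(j + 1 - i)}$:

```python
import random

def rand_qs_pairs(s, compared):
    """RandQS that also records every pair compared with a pivot."""
    if len(s) <= 1:
        return s
    x = random.choice(s)
    for y in s:
        if y != x:
            compared.add((min(x, y), max(x, y)))  # y is compared with pivot x
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    return rand_qs_pairs(left, compared) + [x] + rand_qs_pairs(right, compared)

n, i, j, trials = 10, 2, 7, 100_000
hits = 0
for _ in range(trials):
    compared = set()
    rand_qs_pairs(list(range(1, n + 1)), compared)
    hits += (i, j) in compared  # on input 1..n, S_(i) = i and S_(j) = j
print(hits / trials, 2 / (j + 1 - i))  # both should be close to 1/3
```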
Now, denoting the harmonic number ${H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}}$, we have

$$\begin{aligned} \mathbb{E}[\#\text{comparisons}] &= \sum_{i<j} p_{ij} \\ &= \sum_{i<j} \frac{2}{j + 1 - i} \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j + 1 - i} \quad (\text{substitute } k = j + 1 - i \text{, i.e., } j = k + i - 1)\\ &= \sum_{i=1}^{n-1} \sum_{k=2}^{n+1-i} \frac{2}{k}\\ &< \sum_{i=1}^{n} \sum_{k=1}^{n} \frac{2}{k}\\ &= \sum_{i=1}^{n} 2H_n \\ &= 2n H_n \end{aligned}$$

Since ${H_n = O(\ln n)}$, the expected number of comparisons is ${O(n \ln n)}$. In fact, one can also show that RandQS finishes in ${O(n \ln n)}$ time with high probability, but we do not prove that here.
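To close, here is a small experiment (again my own sanity check, with arbitrary parameters ${n = 1000}$ and 100 trials) that counts the comparisons RandQS actually performs and checks the average against the ${2nH_n}$ bound:

```python
import random

def rand_qs_count(s):
    """Return (sorted list, number of comparisons) for RandQS."""
    if len(s) <= 1:
        return s, 0
    x = random.choice(s)
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    ls, lc = rand_qs_count(left)
    rs, rc = rand_qs_count(right)
    # len(s) - 1 comparisons against the pivot in this call
    return ls + [x] + rs, lc + rc + len(s) - 1

n, trials = 1000, 100
avg = sum(rand_qs_count(list(range(n)))[1] for _ in range(trials)) / trials
h_n = sum(1 / k for k in range(1, n + 1))
print(avg, 2 * n * h_n)  # the empirical average stays below the 2 n H_n bound
```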