Randomized Algorithms: Randomized QuickSort

 

In this blog, we will touch on randomized algorithms. Today we focus on the Randomized QuickSort algorithm.

Why do we need Randomized Algorithms?

  • Faster, but weaker guarantees.
  • Simpler code, but harder to analyze.
  • Sometimes the only option, e.g. in Big Data, Machine Learning, Security, etc.

First, let’s review the QuickSort algorithm

function QS(S)  //Assumes all elements in S are distinct.
   if |S| ≤ 1 then
      return S
   else
      Pick pivot x ∈ S
      L ← {y ∈ S | y < x}
      R ← {y ∈ S | y > x}
      return QS(L) + [x] + QS(R)
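
For concreteness, here is a minimal runnable Python sketch of the pseudocode above; it assumes distinct elements and, as one possible deterministic rule, always picks the first element as the pivot.

```python
def quicksort(s):
    """Direct translation of QS; assumes all elements of s are distinct."""
    if len(s) <= 1:
        return list(s)
    x = s[0]                          # deterministic pivot rule: first element (one possible choice)
    left = [y for y in s if y < x]    # L: elements smaller than the pivot
    right = [y for y in s if y > x]   # R: elements larger than the pivot
    return quicksort(left) + [x] + quicksort(right)

print(quicksort([5, 2, 9, 1, 7]))     # [1, 2, 5, 7, 9]
```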

For details, see another blog post. We know the average-case running time is ${O(n \lg n)}$ and the worst-case running time is ${O(n^2)}$, e.g. when the pivot is always chosen as the first element and the input array is in reverse (or already sorted) order.

Randomized Quicksort

We just need to modify the strategy in the pivot-selection step.

function RandQS(S)  //Assumes all elements in S are distinct.
   if |S| ≤ 1 then
      return S
   else
      Pick pivot x ∈ S, uniformly at random
      L ← {y ∈ S | y < x}
      R ← {y ∈ S | y > x}
      return RandQS(L) + [x] + RandQS(R)
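
A minimal Python sketch of RandQS, differing from the version above only in the pivot step; `random.choice` plays the role of picking uniformly at random.

```python
import random

def rand_quicksort(s):
    """RandQS sketch; assumes all elements of s are distinct."""
    if len(s) <= 1:
        return list(s)
    x = random.choice(s)              # pick the pivot uniformly at random from S
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    return rand_quicksort(left) + [x] + rand_quicksort(right)

print(rand_quicksort(list(range(10, 0, -1))))  # [1, 2, ..., 10]
```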

Next, we show the analysis of Randomized QuickSort.

Analysis of Randomized QuickSort

Let ${\left[S_{(1)}, \ldots, S_{(n)}\right] := \text{RandQS}(S)}$.

For ${i < j}$ let random variable ${X_{ij} \in \{0, 1\}}$ be the number of times that ${S_{(i)}}$ and ${S_{(j)}}$ are compared.

\[\mathbb{E}[\#\text{comparisons}] = \mathbb{E}\left[\sum_{1 \leq i < j \leq n} X_{ij}\right] = \sum_{1 \leq i < j \leq n} \mathbb{E}[X_{ij}]\]
Here we use linearity of expectation:
$$
\begin{aligned}
\mathbb{E}[X + Y] &= \sum_x \sum_y (x+y)\, \mathbb{P}(X=x, Y=y)\\
&= \sum_x \sum_y x\, \mathbb{P}(X=x, Y=y) + \sum_x \sum_y y\, \mathbb{P}(X=x, Y=y)\\
&= \sum_x x \sum_y \mathbb{P}(X=x, Y=y) + \sum_y y \sum_x \mathbb{P}(X=x, Y=y)\\
&= \sum_x x\, \mathbb{P}(X=x) + \sum_y y\, \mathbb{P}(Y=y)\\
&= \mathbb{E}[X] + \mathbb{E}[Y]
\end{aligned}
$$
Note: we do **not** need ${X}$ and ${Y}$ to be **independent** here.
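
As a small sanity check (a toy simulation, not needed for the proof), linearity still holds when the two variables are completely dependent:

```python
import random

# Toy check: Y is fully determined by X, yet E[X + Y] = E[X] + E[Y].
trials = 100_000
sum_x = sum_y = sum_xy = 0
for _ in range(trials):
    x = random.randint(0, 1)   # fair coin
    y = 1 - x                  # perfectly dependent on x
    sum_x += x
    sum_y += y
    sum_xy += x + y
print(sum_x / trials, sum_y / trials, sum_xy / trials)  # ≈ 0.5, ≈ 0.5, exactly 1.0
```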

Since ${X_{ij} \in \{0, 1\}}$, it is an indicator random variable for the event that ${S_{(i)}}$ and ${S_{(j)}}$ are compared.

Let ${p_{ij}}$ be the probability of this event. Then

\[\mathbb{E}[X_{ij}] = 0 \cdot (1 - p_{ij}) + 1 \cdot p_{ij} = p_{ij}\]

Thus the expectation of an indicator variable equals the probability of the indicated event.

\[\mathbb{E}[\#\text{comparisons}] = \sum_{i<j} \mathbb{E}[X_{ij}] = \sum_{i<j} p_{ij}\]

Lemma: ${S_{(i)}}$ and ${S_{(j)}}$ are compared iff ${S_{(i)}}$ or ${S_{(j)}}$ is first of ${S_{(i)}, \ldots, S_{(j)}}$ to be chosen as pivot.

Proof. Each recursive call works on a contiguous block of the sorted order and returns ${\left[S_{(a)}, \ldots, S_{(b)}\right]}$. Let ${x = S_{(c)}}$ be the pivot chosen in that call, and suppose ${a \leq i < j \leq b}$, which looks like

${a}$ ${\cdots}$ ${i}$ ${\cdots}$ ${j}$ ${\cdots}$ ${b}$
  • Case 1: ${c < i}$ or ${c > j}$: ${S_{(i)}}$ and ${S_{(j)}}$ are not compared in this call, but they stay together in the same recursive call, so the decision is deferred.

  • Case 2: ${i < c < j}$: ${S_{(i)}}$ and ${S_{(j)}}$ are sent to different recursive calls (one to ${L}$, one to ${R}$), so they are never compared.

  • Case 3: ${c = i}$ or ${c = j}$: the pivot is one of the pair, so ${S_{(i)}}$ and ${S_{(j)}}$ are compared exactly once (and then separated).

So the decision is made exactly when the pivot first falls in ${\{S_{(i)}, \ldots, S_{(j)}\}}$ (i.e. ${i \leq c \leq j}$): the pair is compared iff that pivot is ${S_{(i)}}$ or ${S_{(j)}}$. ${\square}$

Thus, since the first pivot chosen from ${\{S_{(i)}, \ldots, S_{(j)}\}}$ is equally likely to be any of its ${j - i + 1}$ elements, the probability of the event that ${S_{(i)}}$ and ${S_{(j)}}$ are compared is

\[p_{ij} = \frac{2}{j + 1 - i}\]
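
If you want a numerical sanity check, the rough Monte Carlo sketch below (the helper `compared` is just instrumentation, not part of the algorithm) estimates how often a fixed pair is compared when sorting ${[1, \ldots, n]}$ and compares it with ${2/(j - i + 1)}$.

```python
import random

def compared(s, a, b):
    """Return True if the values a and b are compared in one run of RandQS on s."""
    if len(s) <= 1:
        return False
    x = random.choice(s)
    if x in (a, b) and a in s and b in s:
        return True                          # the pivot is one of the pair: they get compared
    left = [y for y in s if y < x]
    right = [y for y in s if y > x]
    return compared(left, a, b) or compared(right, a, b)

n, i, j, trials = 20, 3, 10, 20_000
hits = sum(compared(list(range(1, n + 1)), i, j) for _ in range(trials))
print(hits / trials, 2 / (j - i + 1))        # both should be close to 0.25
```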

Then we have (denote ${H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}}$)

\[\begin{aligned} \mathbb{E}[\#\text{comparisons}] &= \sum_{i<j} p_{ij} \\ &= \sum_{i<j} \frac{2}{j + 1 - i} \\ &= \sum_{i=1}^{n-1} \sum_{j=i+1}^n \frac{2}{j + 1 - i} \quad (\text{substitute } k := j+1-i\text{, i.e. } j = k+i-1)\\ &= \sum_{i=1}^{n-1} \sum_{k=2}^{n+1-i} \frac{2}{k}\\ &< \sum_{i=1}^{n} \sum_{k=1}^{n} \frac{2}{k}\\ &= \sum_{i=1}^{n} 2H_n \\ &= 2n H_n \end{aligned}\]

We know ${H_n = O(\ln n)}$, so the expected number of comparisons is ${O(n \ln n)}$. In fact, one can also show that RandQS finishes in ${O(n \ln n)}$ time with high probability. (We don’t show this here.)
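
Finally, as a rough empirical illustration, we can count the comparisons one run of RandQS makes and check the average against the ${2 n H_n}$ bound (the counting helper below is only a sketch for this purpose).

```python
import random
from math import log

def rand_qs_comparisons(s):
    """Count comparisons made by one run of RandQS on s (assumes distinct elements)."""
    if len(s) <= 1:
        return 0
    x = random.choice(s)
    rest = [y for y in s if y != x]          # every non-pivot element is compared with x once
    left = [y for y in rest if y < x]
    right = [y for y in rest if y > x]
    return len(rest) + rand_qs_comparisons(left) + rand_qs_comparisons(right)

n, trials = 500, 200
avg = sum(rand_qs_comparisons(list(range(n))) for _ in range(trials)) / trials
h_n = sum(1 / k for k in range(1, n + 1))
print(avg, 2 * n * h_n, 2 * n * log(n))      # the empirical average stays below 2*n*H_n
```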