In this blog we continue our discussion of randomized algorithms, focusing on some tools for their analysis. We want to bound the probability that an extreme event happens, that is, that a random variable takes a “large” value. For this reason, Markov’s and Chebyshev’s inequalities are also called Tail Inequalities.
Occupancy Problems
Imagine we have ${m}$ indistinguishable objects (“balls”) that we randomly assign to ${n}$ distinct classes (“bins”).
- What is the expected maximum number of balls in any bin?
- What is the expected number of bins with exactly ${k}$ balls?
These are called occupancy problems.
Let ${m = n \geq 3}$, and for ${i = 1, \ldots, n}$ let the random variable ${X_i}$ be the number of balls in the ${i}$th bin. Since each of the ${n}$ balls lands in bin ${i}$ independently with probability ${\frac{1}{n}}$, ${X_i}$ follows a binomial distribution, ${X_i \sim B(n,\frac{1}{n})}$.
We want to find ${k}$ such that, with very high probability, no bin contains ${k}$ or more balls. Let ${\mathcal{E}_j(k)}$ be the event that bin ${j}$ contains at least ${k}$ balls $({X_j \geq k})$.
To bound ${\Pr[\mathcal{E}_i(k)]}$, first consider ${\Pr[X_i = j]}$, the probability that bin ${i}$ contains exactly ${j}$ balls:
\[\begin{aligned} \Pr[X_i = j] &= \binom{n}{j} \left(\frac{1}{n}\right)^j \left(1 - \frac{1}{n}\right)^{n-j} \\ &\leq \binom{n}{j} \left(\frac{1}{n}\right)^j \\ &\leq \left(\frac{ne}{j}\right)^j \left(\frac{1}{n}\right)^j \\ &= \left(\frac{e}{j}\right)^j \\ \end{aligned}\]Note: the third line uses the bound ${\binom{n}{j} \leq \left(\frac{ne}{j}\right)^j}$, which follows from Stirling’s approximation ${n! = \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n e^{c_n}}$, ${0< c_n < \frac{1}{12n}}$. So we have
\[\begin{aligned} \binom{n}{j} &= \frac{n!}{j!(n-j)!} \\ &\leq \frac{n^j}{j!} \\ &\leq \frac{n^j}{\sqrt{2 \pi j} \left( \frac{j}{e}\right)^j}\\ &\leq \left(\frac{ne}{j}\right)^j \end{aligned}\]
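As a quick sanity check of the bound ${\binom{n}{j} \leq \left(\frac{ne}{j}\right)^j}$, here is a short Python snippet (an illustrative sketch, not part of the derivation itself) that compares both sides on a log scale for a few values of ${n}$ and ${j}$:

```python
from math import comb, log

# Check C(n, j) <= (n*e/j)^j, compared on a log scale to avoid float overflow for large n.
for n in [10, 100, 1000]:
    for j in [1, 2, n // 2, n]:
        log_lhs = log(comb(n, j))
        log_rhs = j * (log(n) + 1 - log(j))  # log of (n*e/j)^j
        assert log_lhs <= log_rhs, (n, j)
        print(f"n={n:4d} j={j:4d}  ln C(n,j)={log_lhs:8.2f}  ln (ne/j)^j={log_rhs:8.2f}")
```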
Next, we have
\[\begin{aligned} \Pr[\mathcal{E}_i(k)] &\leq \sum_{j=k}^{n} \binom{n}{j} \left(\frac{1}{n}\right)^j \\ &\leq \sum_{j=k}^{n} \left(\frac{e}{j}\right)^j \\ &\leq \left(\frac{e}{k}\right)^k \left(1 + \frac{e}{k} + \left(\frac{e}{k}\right)^2 + \cdots + \left(\frac{e}{k}\right)^{n-k} \right) \\ &\leq \left(\frac{e}{k}\right)^k \left(\frac{1}{1 - \frac{e}{k}}\right) \end{aligned}\]Note: the last step bounds the sum by a geometric series, which requires ${k > e}$; the first step follows from the union bound,
\[\begin{aligned} \Pr[\mathcal{E}_i(k)] &= \Pr\left[\cup_{j=k}^n [X_i = j]\right]\\ & \leq \sum_{j=k}^n \Pr[X_i = j]\\ &\leq \sum_{j=k}^{n} \binom{n}{j} \left(\frac{1}{n}\right)^j \\ \end{aligned}\]Let ${k^* = \min\left\{n+1, \left\lceil \frac{2e}{e-1} \frac{\ln n}{\ln \ln n} \right\rceil\right\} \leq \left\lceil 3.164 \frac{\ln n}{\ln \ln n} \right\rceil}$, then
\[\begin{aligned} \Pr[\mathcal{E}_i(k^*)] &\leq \left( \frac{e}{k^*} \right)^{k^*} \left( \frac{1}{1 - \frac{e}{k^*}} \right) \leq n^{-2} \end{aligned}\]The same holds for all ${i}$, so
\[\begin{aligned} \Pr\left[\bigcup_{i=1}^n \mathcal{E}_i(k^*)\right] &\leq \sum_{i=1}^n \Pr[\mathcal{E}_i(k^*)] \leq \frac{1}{n} \end{aligned}\]We have shown:
Theorem: With probability at least ${1 - \frac{1}{n}}$, every bin has fewer than ${k^* = \min\left\{n+1, \left\lceil \frac{2e}{e-1} \frac{\ln n}{\ln \ln n} \right\rceil\right\}}$ balls in it.
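To see the theorem in action, here is a small simulation (a rough sketch; the helper `k_star` is a name introduced here and simply implements the formula from the theorem) that throws ${n}$ balls into ${n}$ bins many times and checks how often the maximum load reaches ${k^*}$:

```python
import math
import random
from collections import Counter

def max_load(n, rng):
    """Throw n balls into n bins uniformly at random; return the largest bin count."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())

def k_star(n):
    """k* = min(n + 1, ceil((2e/(e-1)) * ln n / ln ln n)), as in the theorem above."""
    c = 2 * math.e / (math.e - 1)
    return min(n + 1, math.ceil(c * math.log(n) / math.log(math.log(n))))

rng = random.Random(0)
n, trials = 10_000, 100
exceed = sum(max_load(n, rng) >= k_star(n) for _ in range(trials))
print(f"n={n}, k*={k_star(n)}, fraction of trials with max load >= k*: {exceed / trials}")
# The theorem predicts this fraction to be at most 1/n for large n.
```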
Birthday Problem
Suppose ${m}$ balls are randomly assigned to ${n}$ bins. What is the probability that all balls land in distinct bins?
For ${n = 365}$ the question can be interpreted as “how large must a group of people be before it is likely that two of them share the same birthday?”
Let ${\mathcal{E}_i}$ be the event that the ${i}$th ball lands in a bin not occupied by any of the first ${i-1}$ balls. From the previous blog we know:
\[\begin{aligned} \Pr\left[\bigcap_{i=2}^m \mathcal{E}_i\right] &= \Pr[\mathcal{E}_2] \cdot \Pr[\mathcal{E}_3 | \mathcal{E}_2] \cdot \ldots \cdot \Pr\left[\mathcal{E}_m | \cap_{i=2}^{m-1} \mathcal{E}_i\right]\\ &= \prod_{i=2}^m \left(1 - \frac{i - 1}{n}\right)\\ &\leq \prod_{i=2}^m e^{-\frac{i-1}{n}} = e^{-\frac{m(m-1)}{2n}} \end{aligned}\]Here the last inequality uses ${1 - x \leq e^{-x}}$. For ${m \geq \sqrt{2n} + 1}$ we have ${\frac{m(m-1)}{2n} \geq 1}$, so the probability that all balls land in distinct bins is at most ${1/e}$.
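For ${n = 365}$ we can compare the exact probability that all balls land in distinct bins with the bound ${e^{-m(m-1)/(2n)}}$; a minimal sketch:

```python
import math

def p_all_distinct(m, n):
    """Exact probability that m balls land in m distinct bins out of n."""
    p = 1.0
    for i in range(2, m + 1):
        p *= 1 - (i - 1) / n
    return p

n = 365
for m in [10, 23, 29, 50]:
    exact = p_all_distinct(m, n)
    bound = math.exp(-m * (m - 1) / (2 * n))
    print(f"m={m:2d}  exact={exact:.4f}  bound={bound:.4f}")
# m = 29 >= sqrt(2*365) + 1 ≈ 28.02, so the bound guarantees a probability of at most 1/e ≈ 0.368.
```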
Markov’s Inequality
Theorem: Let $Y$ be a random variable taking only non-negative values. Then for all $t > 0$:
\[\Pr[Y \geq t] \leq \frac{\mathbb{E}[Y]}{t}\]equivalently, for $k > 0$:
\[\Pr[Y \geq k\mathbb{E}[Y]] \leq \frac{1}{k}\]Proof. Let $Z$ be the indicator variable for the event $Y \geq t$, that is
\[\begin{equation} Z = \begin{cases} 1 & Y \geq t\\ 0 & Y < t \end{cases} \end{equation}\]Since ${Y \geq 0}$, in both cases $Z \leq \frac{Y}{t}$, and thus
\[\Pr[Y \geq t] =\Pr[Z=1]=0 \cdot \Pr[Z=0]+1 \cdot\Pr[Z=1]= \mathbb{E}[Z] \leq \mathbb{E}\left[\frac{Y}{t}\right] = \frac{\mathbb{E}[Y]}{t}\]Setting $t = k\mathbb{E}[Y]$ we get
\(\Pr[Y \geq k\mathbb{E}[Y]] = \Pr[Y \geq t] \leq \frac{\mathbb{E}[Y]}{t} = \frac{1}{k}\) This proves Markov’s inequality. ${\square}$
There is another proof, which I think is easier to understand.
Proof. By definition, $\mathbb{E}[Y]={\sum_{y} y \Pr[Y=y]}$, so we have
\[\begin{aligned} \mathbb{E}[Y] &={\sum_{y} y \Pr[Y=y]}\\ &= {\sum_{y\geq t} y \Pr[Y=y]} + {\sum_{y<t} y \Pr[Y=y]}\\ & \geq {\sum_{y\geq t} t \Pr[Y=y]} + 0 \\ & = t \Pr[Y\geq t] \end{aligned}\]where the second sum can be dropped because it is non-negative (${Y \geq 0}$). Hence ${\Pr[Y \geq t] \leq \frac{\mathbb{E}[Y]}{t}}$. ${\square}$
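As a concrete illustration (a minimal sketch; the choice of a Binomial(100, 1/2) variable is just an arbitrary non-negative example), we can compare the empirical tail probability with Markov’s bound ${1/k}$:

```python
import random

# Estimate Pr[Y >= k * E[Y]] for Y ~ Binomial(100, 1/2) (E[Y] = 50)
# and compare with Markov's bound 1/k.
rng = random.Random(0)
samples = [sum(rng.random() < 0.5 for _ in range(100)) for _ in range(20_000)]
mean = sum(samples) / len(samples)
for k in [1.1, 1.2, 1.5, 2.0]:
    tail = sum(y >= k * mean for y in samples) / len(samples)
    print(f"k={k:.1f}  empirical tail={tail:.4f}  Markov bound={1 / k:.4f}")
# Markov's bound is valid but typically very loose for well-concentrated variables.
```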
Chebyshev’s Inequality
Given a random variable $X$ with expectation $\mathbb{E}[X] = \mu_X$, define its variance as $\sigma_X^2 := \mathbb{E}[(X - \mu_X)^2]$, and its standard deviation as $\sigma_X := \sqrt{\mathbb{E}[(X - \mu_X)^2]}$.
Theorem: Let $X$ be a random variable with expectation $\mu_X$ and standard deviation $\sigma_X$. Then for all $t > 0$:
\[\Pr[|X - \mu_X| \geq t\sigma_X] \leq \frac{1}{t^2}\]Proof. Let $k = t^2$ and $Y = (X - \mu_X)^2$. Then $\sigma_X^2 = \mathbb{E}[Y]$ (by definition) and
\[\Pr[|X - \mu_X| \geq t\sigma_X] = \Pr[(X - \mu_X)^2 \geq t^2\sigma_X^2] = \Pr[Y \geq k \mathbb{E}[Y]]\]By Markov’s inequality, we have
\[\Pr[Y \geq k \mathbb{E}[Y]] \leq \frac{1}{k} = \frac{1}{t^2}\]which completes the proof of Chebyshev’s inequality. ${\square}$
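The same kind of empirical check works for Chebyshev’s inequality (again just a sketch, with ${X \sim B(100, \frac{1}{2})}$, so ${\mu_X = 50}$ and ${\sigma_X = 5}$):

```python
import random

# Estimate Pr[|X - mu| >= t * sigma] for X ~ Binomial(100, 1/2)
# (mu = 50, sigma = 5) and compare with Chebyshev's bound 1/t^2.
rng = random.Random(1)
samples = [sum(rng.random() < 0.5 for _ in range(100)) for _ in range(20_000)]
mu, sigma = 50, 5
for t in [1, 2, 3]:
    tail = sum(abs(x - mu) >= t * sigma for x in samples) / len(samples)
    print(f"t={t}  empirical tail={tail:.4f}  Chebyshev bound={1 / t**2:.4f}")
```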
Summing 2-Independent Variances
Random variables $X_1, \ldots, X_m \in \mathcal{X}$ are pairwise independent (2-independent) iff for all $i \neq j$ and all possible $x, y \in \mathbb{R}$, $\Pr[X_i = x \mid X_j = y] = \Pr[X_i = x]$.
Claim: If $X_1, \ldots, X_m \in \mathcal{X}$ are pairwise independent, for all $i \neq j$ we have ${\mathbb{E}[X_i X_j] =\mathbb{E}[X_i]\mathbb{E}[X_j] }$.
Proof. ${\Pr[X_i = x \cap X_j = y] = \Pr[X_i = x \mid X_j = y] \cdot \Pr[X_j = y] = \Pr[X_i = x]\Pr[X_j = y] }$, then we have \(\begin{aligned} \mathbb{E}[X_i X_j] &= \sum_{x} \sum_{y} xy \Pr[X_i = x \cap X_j = y]\\ & = \sum_{x} \sum_{y} xy \Pr[X_i = x]\Pr[X_j = y] \\ & = \left(\sum_{x} x \Pr[X_i = x]\right)\left(\sum_{y} y \Pr[X_j = y]\right) \end{aligned}\)
Thus, ${\mathbb{E}[X_i X_j] =\mathbb{E}[X_i]\mathbb{E}[X_j] }$, which proves the claim. ${\square}$
Lemma: Let $X_1, \ldots, X_m$ be pairwise independent random variables, and let $X = \sum_{i=1}^m X_i$. Then $\sigma_X^2 = \sum_{i=1}^m \sigma_{X_i}^2$.
Proof. Let ${\mu_i = \mathbb{E}[X_i]}$ and ${\mu = \sum_{i=1}^m \mu_i}$. By definition,
\[\begin{aligned} \sigma_X^2 &= \mathbb{E}[(X - \mu)^2] \\ &= \mathbb{E}\left[\left(\sum_{i=1}^m X_i - \mathbb{E}\left[\sum_{i=1}^m X_i\right]\right)^2\right] \\ &= \mathbb{E}\left[\left(\sum_{i=1}^m X_i - \sum_{i=1}^m \mathbb{E}[X_i]\right)^2\right] \\ &= \mathbb{E}\left[\left(\sum_{i=1}^m (X_i - \mu_i)\right)^2\right] \\ &= \sum_{i=1}^m \mathbb{E}[(X_i - \mu_i)^2] + 2 \sum_{i<j} \mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)] \\ &= \sum_{i=1}^m \mathbb{E}[(X_i - \mu_i)^2] + 2 \sum_{i<j} \mathbb{E}[X_i - \mu_i]\,\mathbb{E}[X_j - \mu_j] \text{ (by the claim)}\\ &= \sum_{i=1}^m \sigma_{X_i}^2 + 2 \sum_{i<j} 0 \cdot 0 \end{aligned}\]Hence, $\sigma_X^2 = \sum_{i=1}^m \sigma_{X_i}^2$. ${\square}$
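A classic example (not from the text above, but a standard construction) showing that pairwise independence is strictly weaker than full independence: let ${X_1, X_2}$ be independent fair bits and ${X_3 = X_1 \oplus X_2}$. The three variables are pairwise independent but not mutually independent, and the lemma still applies. A small sketch verifying this:

```python
from itertools import product

# X1, X2: independent fair bits; X3 = X1 XOR X2.
# Each of the 4 outcomes below occurs with probability 1/4.
outcomes = [(x1, x2, x1 ^ x2) for x1, x2 in product([0, 1], repeat=2)]

def E(values):
    return sum(values) / len(values)

def Var(values):
    mu = E(values)
    return E([(v - mu) ** 2 for v in values])

x1s, x2s, x3s = zip(*outcomes)

# Pairwise independence: E[X1 * X3] = E[X1] * E[X3] (both equal 1/4).
print(E([a * c for a, c in zip(x1s, x3s)]), E(x1s) * E(x3s))

# Variance of the sum equals the sum of the variances (0.75 = 3 * 1/4),
# even though X3 is completely determined by X1 and X2.
sums = [a + b + c for a, b, c in outcomes]
print(Var(sums), Var(x1s) + Var(x2s) + Var(x3s))
```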