This blog is still talking about gready algorithm and local search. And this blog will focus on the k-center problem.
Statement of Problem
k-center problem
- Given graph $G = (V, E)$, and an integer $k$.
- For each vertex pair $(i, j) \in V, d_{ij}$ is the distance between $i$ and $j$; $d_{ij} = 0$ if $i=j$; $d_{ij} = d_{ji}$.
- Satisfy triangular inequality: for each $i, j, \ell \in V$: $d_{ij} + d_{j\ell} \geq d_{i\ell}$.
- Distances model similarity: vertices close to each other are similar and far implies less similar.
- Goal: find $k$ clusters by grouping similar points in one cluster while minimizing the radius of each cluster.
- Goal(formal):
- find $S \subseteq V$ of size $k$, where each vertex of $S$ represents a cluster center.
- Each other vertex assign itself to closest cluster center. For vertex $i$ and set $S$, their distance is: $d(i, S) = \min_{j \in S} d_{ij}$. Radius of $S = \max_{i \in V} d(i, S)$.
- Goal is to find $k$ centers with minimum radius.
Algorithm and Analysis
Algorithm:
-
Step 1: Choose the first center arbitrarily. ${S \leftarrow {i}}$
-
Step 2: Next choose centers one by one until all $k$ centers are chosen and always choose the point that is furthest from the all currently chosen centers. In specific,
\[j \leftarrow \arg\max_{j\in V} d(j,S), S \leftarrow S \cup \{j\}\]
Theorem: The greedy algorithm finds a 2-approximation of k-center.
Proof.
- $S^*$: optimal set of centers; $r^*$: optimal radius.
- Each center in $S$ is connected to some center in $S^*$ and the distance is $\leq r^*$ (because ${r^* = \max_{i\in V} d(i,S)}$). Connect each center of $S$ to nearest center of $S^*$.
Case 1: Each center in $S^*$ is connected to exactly one center in $S$.
a) for each $v \in V$, $\exists s’ \in S^*$, such that $d(s’, v) \leq r^*$. b) let center $s’$ is connected to center $s \in S$, $d(s’, s) \leq r^*$. c) by triangular inequality: $d(s, v) \leq d(s’,s) + d(s’,v) \leq 2r^*$.
Case 2: There exists one center $j \in S^*$ is connected to at least two centers $j’, j’’ \in S$.
a) by triangular inequality: $d(j’, j’’) \leq d(j’, j) + d(j, j’’) \leq 2r^*$. b) w.l.g. let greedy algorithm choose $j’$ first. c) when algorithm choose $j’’$, it was farthest from all chosen centers. By algorithm, ${\forall i \in V, d(j’‘,C) \geq d(i,C)}$, where ${C}$ is the current chosen centers’ set. So, we have \(d_{j'j''} \geq d(j'',C) \geq d(i,C) \geq d(i,S), \forall i \in V\)
d) as $d(j’, j’’) \leq 2r^*$, we know any point has distance at most $2r^*$ from $S$.
In summary, the greedy algorithm is a 2-approximation algorithm. $\square$