Gready Alg and Local Search:The k-center problem

 

This blog is still talking about gready algorithm and local search. And this blog will focus on the k-center problem.

Statement of Problem

k-center problem

  • Given graph $G = (V, E)$, and an integer $k$.
  • For each vertex pair $(i, j) \in V, d_{ij}$ is the distance between $i$ and $j$; $d_{ij} = 0$ if $i=j$; $d_{ij} = d_{ji}$.
  • Satisfy triangular inequality: for each $i, j, \ell \in V$: $d_{ij} + d_{j\ell} \geq d_{i\ell}$.
  • Distances model similarity: vertices close to each other are similar and far implies less similar.
  • Goal: find $k$ clusters by grouping similar points in one cluster while minimizing the radius of each cluster.
  • Goal(formal):
    1. find $S \subseteq V$ of size $k$, where each vertex of $S$ represents a cluster center.
    2. Each other vertex assign itself to closest cluster center. For vertex $i$ and set $S$, their distance is: $d(i, S) = \min_{j \in S} d_{ij}$. Radius of $S = \max_{i \in V} d(i, S)$.
    3. Goal is to find $k$ centers with minimum radius.

Algorithm and Analysis

Algorithm:

  • Step 1: Choose the first center arbitrarily. ${S \leftarrow {i}}$

  • Step 2: Next choose centers one by one until all $k$ centers are chosen and always choose the point that is furthest from the all currently chosen centers. In specific,

    \[j \leftarrow \arg\max_{j\in V} d(j,S), S \leftarrow S \cup \{j\}\]

Theorem: The greedy algorithm finds a 2-approximation of k-center.

Proof.

  • $S^*$: optimal set of centers; $r^*$: optimal radius.
  • Each center in $S$ is connected to some center in $S^*$ and the distance is $\leq r^*$ (because ${r^* = \max_{i\in V} d(i,S)}$). Connect each center of $S$ to nearest center of $S^*$.

Case 1: Each center in $S^*$ is connected to exactly one center in $S$.

a) for each $v \in V$, $\exists s’ \in S^*$, such that $d(s’, v) \leq r^*$. b) let center $s’$ is connected to center $s \in S$, $d(s’, s) \leq r^*$. c) by triangular inequality: $d(s, v) \leq d(s’,s) + d(s’,v) \leq 2r^*$.

Case 2: There exists one center $j \in S^*$ is connected to at least two centers $j’, j’’ \in S$.

a) by triangular inequality: $d(j’, j’’) \leq d(j’, j) + d(j, j’’) \leq 2r^*$. b) w.l.g. let greedy algorithm choose $j’$ first. c) when algorithm choose $j’’$, it was farthest from all chosen centers. By algorithm, ${\forall i \in V, d(j’‘,C) \geq d(i,C)}$, where ${C}$ is the current chosen centers’ set. So, we have \(d_{j'j''} \geq d(j'',C) \geq d(i,C) \geq d(i,S), \forall i \in V\)

d) as $d(j’, j’’) \leq 2r^*$, we know any point has distance at most $2r^*$ from $S$.

In summary, the greedy algorithm is a 2-approximation algorithm. $\square$