This blog will talking about the gready algorithm to solve unweighted and weighted set cover problem. And in the end we will show how to use dual problem to analyze to get better approximation bound.
Gready algorithm for unweighted set cover
Algorithm ${ 4 }$
Let ${ \hat{S_j} }$ be set of uncovered items in ${ S_j }$ at any step of any step.
Step 1: ${ I = \emptyset }$, ${\forall j=1,2, \cdots, m }$ let ${ \hat{S_j} = S_j }$
Step 2: while some ${ e_i }$ is uncovered
-
add ${ \ell }$ to ${ I }$, here ${ \lvert \hat{S_{\ell}} \lvert }$ is maximum.
-
for all ${ j }$, update ${ \hat{S_j} }$
Algorithm ${ 4 }$ is ${ H_n }$-approximation algorithm
Claim 1: Let ${ k_i }$ be the number of uncovered element before just ${ e_i }$ gets covered. ${ \sum_{i=1}^n 1/ k_i \leq H_n }$, here ${ H_n = \sum_{i=1}^n \frac{1}{i} \approx \ln n }$
Proof. without loss of generality, elements are covered in order ${ e_n, e_{n-1},\cdots, e_1 }$.
Before ${ e_i }$ gets covered, at least ${ i }$ elements are still uncovered, that means ${ k_i \geq i }$.
Hence, ${ \sum_{i=1}^n 1/ k_i \leq \sum_{i=1}^n \frac{1}{i} = H_n}$. ${ \square }$
Lemma 1: Let ${ A_1, A_2, \cdots, A_q }$ be finite sets. Then ${ \sum_{j=1}^q \vert A_j \vert \geq \vert \bigcup_{j=1}^q A_j \vert }$
Proof. it’s trivial to get. ${ \square }$
Claim 2: When adding ${ S_{\ell} }$, for each new item ${ e_i \in \hat{S_{\ell}} }$ let ${ y_i = \frac{1}{\vert \hat{S_{\ell}} \vert} }$. We have ${ y_i \leq \frac{OPT}{k_i} }$
Proof. Take arbitrary element ${ e_i \in S_{\ell} }$. For any ${ j = 1,2,\cdots, m}$, let ${ \hat{S_j} }$ be the set of uncovered element in ${ S_j }$, just before ${ e_i }$ gets covered.
By the selection rule of algorithm, ${\forall j, \vert \hat{S_{\ell}} \vert \geq \vert \hat{S_j} \vert }$. (here notice that if some ${ \hat{S_j}=\emptyset }$, we still have ${\forall j, \vert \hat{S_{\ell}} \vert \geq \vert \hat{S_j} \vert = 0}$
Let ${ I’ }$ be an optimal solution ${ OPT = \vert I’ \vert = \sum_{j \in I’} 1 \geq \sum_{j \in I’} \frac{\vert \hat{S_j} \vert}{\vert \hat{S_{\ell}} \vert} = \frac{1}{\vert \hat{S_{\ell}} \vert } \sum_{j \in I’} \vert \hat{S_j} \vert }$
By lemma 1, we have ${ \frac{1}{\vert \hat{S_{\ell}} \vert } \sum_{j \in I’} \vert \hat{S_j} \vert \geq \frac{1}{\vert \hat{S_{\ell}} \vert } \lvert \bigcup_{j \in I’} \hat{S_j} \rvert }$. Here ${ \bigcup_{j \in I’} \hat{S_j} }$ is exactly ${ k_i }$ (the number of uncovered element before just ${ e_i }$ gets covered). And by definition, ${ y_i = \frac{1}{\vert \hat{S_{\ell}} \vert } }$, we have
So, we get ${ y_i \leq \frac{OPT}{k_i} }$. ${ \square }$
Theorem: Algorithm ${ 4 }$ ${ H_n }$-approximates the optimal set cover
-
Polynomial Run-time: it’s clearly in polynomial time
-
Feasibility: by the termination condition, we will cover all the elements in ${ E }$.
-
${ H_n }$-approximation:
When we add a new set ${ S_{\ell} }$, we divide its cost (here is ${ 1 }$) to all newly covered elements. Hence, for each new item ${ e_i \in \hat{S_{\ell}} }$, its take cost ${ y_i = \frac{1}{\hat{S_{\ell}}} }$. Because, we add cost to each item. So, ${ \vert I \vert = \sum_{i =1}^n y_i }$. By claim 2, we have
Hence, Algorithm ${ 4 }$ is ${ H_n }$-approximation algorithm. ${ \square }$
Gready algorithm for weighted set cover
Algorithm ${ 5 }$
Let ${ \hat{S_j} }$ be set of uncovered items in ${ S_j }$ at any step of any step.
Step 1: ${ I = \emptyset }$, ${\forall j=1,2, \cdots, m }$ let ${ \hat{S_j} = S_j }$
Step 2: while some ${ e_i }$ is uncovered
-
add ${ \ell }$ to ${ I }$, here ${ \frac{w_{\ell}}{\lvert \hat{S_{\ell}} \lvert} }$ is minimum.
-
for all ${ j }$, update ${ \hat{S_j} }$
Algorithm ${ 5 }$ is ${ H_n }$-approximation algorithm
Claim 3: When adding ${ S_{\ell} }$, for each new item ${ e_i \in \hat{S_{ell}} }$ let ${ y_i = \frac{w_{\ell}}{\hat{S_{\ell}}} }$. We have ${ y_i \leq \frac{OPT}{k_i} }$
Proof. Take arbitrary element ${ e_i \in S_{\ell} }$. For any ${ j = 1,2,\cdots, m}$, let ${ \hat{S_j} }$ be the set of uncovered element in ${ S_j }$, just before ${ e_i }$ gets covered.
By the selection rule of algorithm, ${\forall j,\vert \hat{S_{j}} \vert \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert} \leq w_j }$. (here notice that if some ${ \hat{S_j}=\emptyset }$, we still have ${\forall j,0= \vert\hat{S_{j}} \vert \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert} \leq w_j }$
Let ${ I’ }$ be an optimal solution ${ OPT = \vert I’ \vert = \sum_{j \in I’} w_j \geq \sum_{j \in I’} w_{\ell} \frac{\vert \hat{S_j} \vert}{\vert \hat{S_{\ell}} \vert} = \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert } \sum_{j \in I’} \vert \hat{S_j} \vert }$
By lemma 1, we have ${ \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert } \sum_{j \in I’} \vert \hat{S_j} \vert \geq \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert } \lvert \bigcup_{j \in I’} \hat{S_j} \rvert }$. Here ${ \bigcup_{j \in I’} \hat{S_j} }$ is exactly ${ k_i }$ (the number of uncovered element before just ${ e_i }$ gets covered). And by definition, ${ y_i = \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert } }$, we have
So, we get ${ y_i \leq \frac{OPT}{k_i} }$. ${ \square }$
Theorem: Algorithm ${ 5 }$ ${ H_n }$-approximates the optimal set cover
-
Polynomial Run-time: it’s clearly in polynomial time
-
Feasibility: by the termination condition, we will cover all the elements in ${ E }$.
-
${ H_n }$-approximation:
When we add a new set ${ S_{\ell} }$, we divide its cost (here is ${ 1 }$) to all newly covered elements. Hence, for each new item ${ e_i \in \hat{S_{\ell}} }$, its take cost ${ y_i = \frac{1}{\hat{S_{\ell}}} }$. Because, we add cost to each item. So, ${ \sum_{j \in I} w_j = \sum_{i =1}^n y_i }$. By claim 2, we have
Hence, Algorithm ${ 5 }$ is ${ H_n }$-approximation algorithm. ${ \square }$
Dual fitting
Using dual LP for analysis, we can get better approximation bound
Let ${ S_j = \{e_1, e_2, \cdots, e_{\vert S_j\vert} \} }$. And we assume the they are covered in order ${ e_{\vert S_j\vert}, e_{\vert S_j\vert-1}, \cdots, e_1 }$
For element ${ e_i }$, just before ${ e_i }$ gets covered, the number of uncovered items in ${ S_j }$ at least ${ i }$, that is ${ \vert \hat{S_j} \vert \geq i }$.
Let ${ e_i }$ is covered by selecting ${ S_{\ell} }$ in some loop of algorithm. We have ${ y_i := \frac{w_{\ell}}{\vert \hat{S_{\ell}} \vert } \leq \frac{w_{j}}{\vert \hat{S_{j}} \vert } \leq \frac{w_{j}}{i } }$
Let ${ g = \max_j \vert S_j \vert }$, we have ${ \sum_{i \in \{i \mid e_i \in S_j\}} y_i = \sum_{i=1}^{\vert S_j\vert } y_i \leq \sum_{i=1}^{\vert S_j\vert } \frac{w_j}{i} = w_j H_{\vert S_j\vert } \leq w_j H_g}$
Let ${\forall i, y’_i = \frac{y_i}{H_g} }$, ${ y’ }$ is a feasible solution.
So, total cost ${ \sum_{j \in I}^n w_j = \sum_{i =1}^n y_i = H_g \sum_{i =1}^n y’_i \leq H_g Z_D^* = H_g Z_{LP}^* = H_g Z_{ILP}^* \leq H_g \cdot OPT}$.
Hence, we use dual fiiting get a better bound, Algorithm ${ 5 }$ is ${ H_g }$-approximation algorithm.