Gready Alg and Local Search:Parallel Macines Job Scheduling

 

This blog is still talking about greedy algorithm and local search. And this blog will focus on the problem “Scheduling jobs on identical parallel machines”.

Statement of Problem

Parallel machine scheduling

  • Given ${n}$ jobs and ${m}$ machines each able to process one job at a time.
  • Each job is associated with a process time ${p_j}$. All jobs are available at time ${0}$.
  • Solution: assign ${n}$ jobs to ${m}$ machines.
  • Cost: If we complete job $j$ at time ${C_j}$, cost of $j$ is ${C_j}$.
  • Objective: minimize (makespan) ${C_{\text{max}} = \max_j C_j}$
  • NP Hard!!

Algorithm 1 (list scheduling)

Algorithm:

  • Step 1: Place the jobs in arbitrary order.
  • Step 2: Following this order place the jobs one by one on a machines which has the smallest workload.

It’s trivial that the algorithm runs in poly time as each job is scheduled exactly once.

Analysis:

Theorem: Algorithms 1 is 2-approximation algorithms.

Proof. First, we can have the lower bound of the optimal schedule.

  1. Length of the OPT schedule is at least length of the longest job: ${C^*_{\text{max}} \geq \max_j p_j}$

  2. Length of the OPT schedule is at least the average of machine load: ${C^*_{\text{max}} \geq \frac{1}{m} \sum_{j} p_j}$

Them, let’s analysis the upper bound of the makespan generate by algorithm.

  1. Denote the length ${C_{\text{max}}}$ be the makespan of the algorithms.

  2. Suppose job $\ell$ completes last, which implies ${C_{\text{max}} = C_\ell}$, and denote its start time be ${S_\ell}$.

  3. No machine has load smaller than ${S_l}$, otherwise we should put $\ell$ into that machine.

  4. We have

\[\begin{aligned} C_{\text{max}} &= C_l = S_l + p_l \\ & \leq \frac{1}{m} \sum_{j\neq \ell} p_j + p_l \quad \text{By item 3}\\ &= \frac{1}{m} \sum_{j=1}^m p_j + \left( 1 - \frac{1}{m} \right)p_l \\ & \leq C^*_\text{max} + \left( 1 - \frac{1}{m} \right)C^*_\text{max} \quad \text{By lower bound of OPT} \\ &\leq \left( 2 - \frac{1}{m} \right)C^*_{\text{max}} \\ \end{aligned}\]

Note: the lower bound of OPT, ${C^*_{\text{max}} \geq \frac{1}{m} \sum_{j=1}^m p_j}$ and ${C^*_{\text{max}} \geq p_{\text{max}} \geq p_l}$.

Algorithm 2 (Longest processing time first, LPT)

Algorithm:

  • Step 1: Order jobs by their processing time: ${p_1 \geq p_2 \geq \ldots \geq p_n}$.
  • Step 2: Apply the list scheduling algorithm with this order.

It’s trivial that the algorithm runs in poly time as each job is scheduled exactly once.

Analysis

Lemma 1: Given $2n$ numbers ${a_n \geq a_{n-1} \geq \cdots \geq a_{1} \geq b_1 \geq b_{2} \geq \cdots \geq b_{n} }$, divide them as $n$ pairs ${P_1,P_2,\cdots,P_n}$, our goal is to minimize ${\max_{i} { \sum_{e\in P_i} e }}$. Hence, ${P_i = {a_i,b_i}, i=1,2,\cdots, n}$ is the minimum way to pair.

Proof. In the pairing way mentioned in lemma 1, we denote the maximum pair is ${a_{\ell},b_{\ell}}$. We will show any other pairing way will leads to a pair ${P_i}$ such that ${\sum P_i \geq a_{\ell}+ b_{\ell}}$.

  1. If pairing way contains a pair like ${a_{\ell}, a_j}$. We can have ${a_{\ell} + a_j \geq a_{\ell}+ b_{\ell}}$, because ${b_{\ell} \leq a_j}$.

  2. If pairing way contains a pair like ${a_{\ell}, b_j, j \neq \ell}$

    • If $j < \ell$, We have $a_{\ell} + b_j \geq a_{\ell}+ b_{\ell}$ because $b_j \geq b_{\ell}$

    • Else, $j > \ell$, so there exists some $k > \ell$ such that $a_k$ is paired with $b_i$, here $i < j$, otherwise $a_{\ell+1},\cdots, a_{n}$ need pair to $b_{j+1},\cdots,b_n$, which is impossible. Hence, we have $a_k \geq a_{\ell}, b_i \geq b_j$. That implies $a_{k} + b_j \geq a_{\ell}+ b_{\ell}$.

Thus we proved Lemma 1. $\square$

Lemma 2: For any input to the problem of minimizing the makespan on identical parallel machines for which the processing requirement of each job is more than one third the optimal makespan, the longest processing time rule (LPT rule) computes an optimal schedule. In another word, ${C^*\text{max} = C\text{max}}$.

Proof. Due to ${p_n > C^*_{\text{max}} / 3}$, we know each machine can have at most ${2}$ jobs, otherwise makespan will greater than ${3p_n \geq C^*_{\text{max}}}$. Hence, we know the total number of jobs will not greater than ${2m}$.

If the total number of jobs is less or equal to $m$, it’s trivial that longest processing time (LPT) rule returns the optimal schedule.

Suppose the total number of jobs is ${m+k, 0 < k \leq m}$. Let’s sort the process time in non-increasing order and denote them as ${f_1,f_2,\cdots, f_{m-k}, \ell_1,\ell_2,\cdots \ell_k, s_1,s_2,\cdots,s_k }$, which satisfying

\[\begin{equation} f_1 \geq f_2 \geq \cdots \geq f_{m-k} \geq \ell_1\geq \ell_2 \geq \cdots \ell_k \geq s_k \geq s_{k-1} \geq \cdots \geq s_1 \nonumber \end{equation}\]

Claim 1: Longest processing time (LPT) rule can return schedule as ${f_1,f_2,\cdots, f_{m-k}}$ are separately assigned to $m-k$ machines, and for remaining $k$ machine, each of them is loaded by some ${\ell_i}$ and ${s_i}$, here ${i = 1,2,\cdots, k}$

Without loss of generality, we assign jobs to machines in reverse order.

First, by LPT rule, we assign ${f_1,f_2,\cdots, f_{m-k}}$ to machine ${m^{\text{th}},(m-1)^{\text{th}},\cdots, (k+1)^{\text{th}}}$ and ${\ell_1,\ell_2,\cdots \ell_k}$ to machine ${k^{\text{th}},(k-1)^{\text{th}},\cdots,1^{\text{st}}}$.

Then ${s_k}$ will be assigned to the machine with smallest current complete time, i.e., ${1^{\text{st}}}$ machine and paired with job ${\ell_k}$.

Induction Hypothesis: ${s_i}$ is paired with ${\ell_i}$ on one identical machine, ${k \geq i}$.

Let’s consider job ${s_{i-1}}$, because ${s_i,s_{i+1},\cdots,s_k}$ are paired with ${\ell_i,\ell_{i+1},\cdots,\ell_k}$, so ${1^{\text{st}},2^{\text{nd}},\cdots, 1^{\text{th}}}$ machine are already occupied by $2$ jobs. Hence, we can only assign job ${s_{i-1}}$ to remaining machines, and the machine with smallest current complete time is ${(i-1)^{\text{th}}}$ machine. So, ${s_{i-1}}$ is paired with ${\ell_{i-1}}$.

By induction, we prove the claim 1. Q.E.D.

For the schedule generated by LPT rule as above, we have two cases

  • If some $f_j = C_{\text{max}}, 1\leq j \leq m-k$, we can get $f_j$ is the longest job, so $f_j \leq C_{\text{max}}^*$. Hence, $C_{\text{max}} \leq C_{\text{max}}^*$, which implies $C_{\text{max}} = C_{\text{max}}^*$

  • If some $\ell_i + s_i = C_{\text{max}}, 1\leq i \leq k$, by lemma 1, we know it’s the minimum way to schedule. So, it’s an optimal schedule by LPT.

Thus, we prove Lemma 2. $\square$

Theorem: LPT is a $4/3$-approximation algorithms.

Proof. We will show it by induction.

  • ${n = 1}$: trivial.

  • Induction Hypothesis: LPT is a $4/3$-approximation algorithms for ${n’ < n}$.

Let’s denote the instance $I$, which has $n$ jobs; And the LPT generate schedule $\sigma$; The makespan returned by algorithm is ${C_{\text{max}}}$. Denote the last complete job is ${\ell}$.

Case 1: if ${\ell < n}$, that means $C_\text{max} = C_\ell > C_n $

  • We can remove job n and let the remaining schedule as $\sigma’$; And we know the schedule ${\sigma}$ and ${\sigma’}$ has the same length. Denote the remaining jobs as instance $I’$.

  • By induction $C^{\sigma’}_{\text{max}}$ is at most $4/3$ times optimal length of $I’$. And we know the optimal schedule of ${I}$ is greater or equal to $I’$. We have

\[C^{\sigma}_{\text{max}} = C^{\sigma'}_{\text{max}} \leq 4/3 \cdot OPT(I') \leq 4/3 \cdot OPT(I)\]

Case 2: if $l = n$

  • Let’s denote $S_n$ as the start time of job $n$
  • We have ${C_{\text{max}} = S_n + p_n \leq \frac{1}{m} \left( \sum_{j} p_j \right) + p_n \leq C^*_{\text{max}} + p_n}$

    • If $p_n \leq \frac{C^*_{\text{max}}}{3}$, it’s trivial to get ${C_\text{max} \leq 4/3 \cdot C^*_\text{max}}$.

    • Otherwise $p_n > \frac{C^*_{\text{max}}}{3}$. By lemma 2, we have $C_{\text{max}} = C^*_\text{max}$.

By induction, we prove that LPT generate a $4/3$-approximation algorithms. $\square$