Dynamic Programming: Weighted Interval Scheduling

 

In this blog post I will share my notes from Penn State CSE 565: Algorithm Design and Analysis.

Dynamic Programming

Key point: divide the problem into smaller instances.

  1. Recursively divide the main problem into subproblems, which may overlap.

  2. Start by solving the subproblems, then combine their solutions.

Example: the maximum subarray problem.

  1. To get an optimal solution for the problem, we first find optimal solutions for the subproblems. Then we combine these solutions of the smaller subproblems into a solution for the larger one.

  2. However, unlike in greedy algorithms, in DP combining the subproblems is non-trivial (see the sketch below).
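
As a quick illustration of these two points (my own sketch in Python, not from the lecture), consider the maximum subarray problem: the subproblem is the best subarray ending at each index, and combining means choosing between extending the previous subarray or starting fresh.

def max_subarray(a):
    # Subproblem value: best sum of a subarray ending at the current index
    best_ending_here = best_overall = a[0]
    for x in a[1:]:
        # Combine subproblems: extend the previous subarray, or start anew at x
        best_ending_here = max(best_ending_here + x, x)
        best_overall = max(best_overall, best_ending_here)
    return best_overall

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from [4, -1, 2, 1]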

Problems

  1. Weighted interval scheduling

  2. Knapsack problem

  3. Fibonacci numbers

  4. RNA folding

  5. Sequence alignment

    • Edit distance (ED)
    • Longest common subsequence (LCS)

Weighted Interval Scheduling

We are given a set of jobs ${ \{ 1, \cdots, n \} }$.

  1. Job ${ j }$ starts at time ${ s_j }$, finishes at ${ f_j }$, and has weight ${ w_j }$.

  2. Two jobs ${ i, j }$ are compatible if they do not overlap, that is, either

$$ s_j > f_i \text{ or } s_i > f_j $$
  • Objective: find a subset of mutually compatible jobs that maximizes the total weight.

Unweighted Interval Scheduling

Let’s consider the easy version, where we ignore the weights.

Set ${ w_j = 1 }$ for each job ${ j }$. The objective is then to maximize the size of a set of mutually compatible jobs. We will use a greedy algorithm.

Algorithm

  1. Consider the jobs in increasing order of their finish time: ${ f_1 \leq f_2 \leq \cdots \leq f_n }$

  2. In the ${ i^{th} }$ round, consider job ${ i }$; if it is compatible with all previously selected jobs, add ${ i }$ to the solution (a sketch follows below).
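
Here is a minimal Python sketch of this greedy algorithm (my own illustration; jobs are (start, finish) pairs, and compatibility uses the strict inequality from the definition above):

def greedy_interval_scheduling(jobs):
    # jobs: list of (start, finish) pairs
    selected = []
    last_finish = float("-inf")
    for s, f in sorted(jobs, key=lambda job: job[1]):  # increasing finish time
        if s > last_finish:  # compatible with every job selected so far
            selected.append((s, f))
            last_finish = f
    return selected

print(greedy_interval_scheduling([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (6, 10)]))
# [(1, 4), (5, 7)]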

Correctness

Let’s prove its correctness.

Proof. Denote the set obtained by the greedy algorithm as ${ I = \{ i_1, i_2, \cdots, i_k \} }$ and the optimal solution as ${ J = \{ j_1, \cdots, j_m \} }$.

Assume ${ I }$ is not optimal, so ${ m > k }$.

For ${ q = 1, 2, \cdots, k }$, compare ${ i_q }$ and ${ j_q }$. If ${ i_q = j_q }$, move to the next comparison. If not, we can replace ${ j_q }$ with ${ i_q }$: because ${ \forall p < q,\ i_p = j_p }$ and the greedy algorithm always picks the compatible job with the earliest finish time, ${ f_{i_q} \leq f_{j_q} }$, so this exchange keeps ${ J }$ feasible and optimal.

When ${ q = k+1 }$: if a job ${ j_{k+1} }$ existed, it would be compatible with ${ i_1, \cdots, i_k }$, so the greedy algorithm would have been able to extend ${ I }$ with it. Hence no such job exists, so ${ m = k }$, which contradicts ${ m > k }$. Thus solution ${ I }$ is optimal.

DP Algorithm

Does this greedy algorithm work for weighted interval scheduling? No!

Take an example:

job ${ a }$: from 1 min to 100 min, weight 10

job ${ b }$: from 2 min to 5 min, weight 1

job ${ c }$: from 7 min to 10 min, weight 2

In this example, the greedy algorithm returns ${ \{ b, c \} }$ with total weight 3. But this is not the optimal solution: job ${ a }$ alone has weight 10.
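
A quick standalone check of this instance (my own sketch; the job names are just labels):

# Run the greedy-by-finish-time rule on the counterexample.
jobs = {"a": (1, 100, 10), "b": (2, 5, 1), "c": (7, 10, 2)}  # name: (start, finish, weight)

selected, last_finish = [], 0
for name in sorted(jobs, key=lambda n: jobs[n][1]):  # increasing finish time
    s, f, w = jobs[name]
    if s > last_finish:
        selected.append(name)
        last_finish = f

print(selected, sum(jobs[n][2] for n in selected))  # ['b', 'c'] 3, yet 'a' alone gives 10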

We will introduce Dynamic Programming to solve “Weighted Interval Scheduling”.

Algorithm in detail

  1. Order the jobs in increasing order of finish time.

  2. For a job ${ j }$, define

$$ P(j) = \text{ largest index } i, i < j, \text{ such that } i,j \text{ are non-overlapping}. $$
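
As an aside (my own sketch, not from the lecture): once the jobs are sorted by finish time, all ${ P(j) }$ values can be computed with binary search over the finish times, e.g. with Python’s bisect:

from bisect import bisect_left

def compute_p(jobs):
    # jobs: list of (start, finish, weight), already sorted by finish time
    finishes = [f for _, f, _ in jobs]
    # bisect_left counts the finish times strictly below s_j; with 1-indexed
    # jobs, that count is exactly the largest compatible index (0 if none).
    return [0] + [bisect_left(finishes, s) for s, _, _ in jobs]
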
  • DP Formulation
$$ OPT(j) = \text{ value of the optimal solution of } \{1,2,...,j\} $$

For the base case, ${ OPT(j) = 0 }$ if ${ j = 0 }$.

Otherwise, we have two cases

Case 1: ${ OPT(j) }$ includes job ${ j }$

-> That means: all jobs in ${ \{ P(j)+1, \ldots, j-1 \} }$ are incompatible with ${ j }$

-> Find ${ OPT(P(j)) }$

-> Add ${ w_j }$ to ${ OPT(P(j)) }$

Case 2: ${ OPT(j) }$ does not include job ${ j }$

-> Find ${ OPT(j-1) }$

-> We don’t include job ${ j }$, so just return ${ OPT(j-1) }$

Therefore the state transition equation is

$$ OPT(j) = \begin{cases} 0 & \text{if } j = 0\\ \max \{ w_j + OPT(P(j)),\ OPT(j-1) \} & \text{otherwise.} \end{cases} $$

Pseudocode

Weighted-Interval-Scheduling((s_1,f_1,w_1), ..., (s_n,f_n,w_n)):
    Sort the jobs by finish time so that f_1 <= f_2 <= ... <= f_n
    Compute P(1) ... P(n)
    return Compute_OPT(n)

Compute_OPT(j):
    if j = 0
        return 0
    else
        return max(w_j + Compute_OPT(P(j)), Compute_OPT(j-1))
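
A direct Python transcription of this recursion might look as follows (my own sketch; jobs is the finish-time-sorted list of (start, finish, weight) triples, and P is the table defined above). As the memoization section below explains, this naive version takes exponential time:

def compute_opt(j, jobs, P):
    # jobs[j-1] = (s_j, f_j, w_j); exponential time without memoization
    if j == 0:
        return 0
    w_j = jobs[j - 1][2]
    return max(w_j + compute_opt(P[j], jobs, P), compute_opt(j - 1, jobs, P))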

Correctness

We will prove the following theorem to establish the correctness of this dynamic programming algorithm.

Theorem: Compute_OPT(j) returns the optimal value for jobs ${ \{ 1, 2, \cdots, j \} }$.

Proof: Base case ${ j = 0 }$: ${ OPT(0) = 0 }$, correct!

Induction step

Assume that for all ${ i < j }$, Compute_OPT(i) returns the optimal value.

  • Since ${ P(j) < j }$, Compute_OPT(P(j)) is optimal

  • Since ${ j-1 < j }$, Compute_OPT(j-1) is optimal

If we include ${ j }$, we have to discard all jobs after ${ P(j) }$. In this case, the optimal value of ${ \{ 1, 2, \cdots, j \} }$ is the optimal value of ${ \{ 1, 2, \cdots, P(j) \} }$ plus ${ w_j }$.

If we don’t include ${ j }$, we just need the optimal value of ${ \{ 1, 2, \cdots, j-1 \} }$, which is Compute_OPT(j-1).

By definition, Compute_OPT(j) ${ = \max \{ w_j + OPT(P(j)), OPT(j-1) \} }$. Because these are the only two cases and we select the larger one, Compute_OPT(j) is indeed the optimal value for ${ \{ 1, 2, \cdots, j \} }$.

Memoized Recursion

In this dynamic programming algorithm, each ${ OPT(n) }$ requires ${ OPT(n-1) }$ and ${ OPT(P(n)) }$, so the naive recursion runs in exponential time. If instead we calculate from ${ OPT(0) }$ up to ${ OPT(n) }$ and record these values, we get an ${ O(n\lg n) }$ algorithm: sorting takes ${ O(n \lg n) }$, and the rest of the algorithm runs in linear time.

Here is the pseudocode of the memoized recursion:

Compute_OPT(n):
    // Initialization
    for j = 0 to n
        M[j] = 0
    // Memoized recursion: reuse the stored values instead of recursing
    for j = 1 to n
        M[j] = max(w_j + M[P(j)], M[j-1])
    return M[n]
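
Putting the pieces together, here is a runnable bottom-up version in Python (my own sketch combining the sorting step, the ${ P(j) }$ table, and the recurrence):

from bisect import bisect_left

def weighted_interval_scheduling(jobs):
    # jobs: list of (start, finish, weight); returns the maximum total weight
    jobs = sorted(jobs, key=lambda job: job[1])  # sort by finish time
    finishes = [f for _, f, _ in jobs]
    n = len(jobs)
    P = [0] + [bisect_left(finishes, s) for s, _, _ in jobs]
    M = [0] * (n + 1)  # M[0] = 0 is the base case
    for j in range(1, n + 1):
        w_j = jobs[j - 1][2]
        M[j] = max(w_j + M[P[j]], M[j - 1])  # the state transition equation
    return M[n]

# The earlier counterexample: job a alone is optimal, with weight 10.
print(weighted_interval_scheduling([(1, 100, 10), (2, 5, 1), (7, 10, 2)]))  # 10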

Identifying jobs

To identify all the jobs in the optimal solution, we use the following pseudocode, which checks the jobs from back to front.

Find_jobs(n):
    // Initialization
    for j = 0 to n
        Jobs[j] = 0
    // check from the back
    Check_job(n)
    return Jobs

Check_job(i):
    if (i == 0)
        return
    else if (w_i + M[P(i)] > M[i-1])
        Jobs[i] = 1
        Check_job(P(i))
    else
        Check_job(i-1)
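
A Python version of this traceback (my own sketch, assuming the sorted jobs list and the P and M tables from the bottom-up implementation above; iteration replaces the recursion in the pseudocode):

def find_jobs(jobs, P, M):
    # Trace back through M to recover one optimal set of (1-indexed) jobs.
    selected = []
    i = len(jobs)
    while i > 0:
        w_i = jobs[i - 1][2]
        if w_i + M[P[i]] > M[i - 1]:  # job i belongs to this optimal solution
            selected.append(i)
            i = P[i]
        else:  # job i can be skipped
            i -= 1
    return selected[::-1]  # report in increasing index order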