In this blog post I will share my notes from Penn State CSE 565: Algorithm Design and Analysis.
Dynamic Programming
Key point: divide the problem into smaller instances.
- Recursively divide the main problem into subproblems, which may overlap.
- Start by solving the subproblems and then combine them.
  - Example: the maximum subarray problem (a sketch follows this list).
- To obtain an optimal solution for the whole problem, first find optimal solutions for the subproblems, then combine the solutions of the smaller subproblems into a solution of the larger ones.
- However, unlike in a greedy algorithm, combining the subproblems in DP is usually much less trivial.
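As a warm-up, here is a minimal Python sketch of the maximum subarray problem solved in this DP spirit (Kadane's algorithm); the function name and the example array are my own choices, not from the lecture.

```python
def max_subarray(a):
    """Maximum sum of a contiguous subarray (Kadane's algorithm)."""
    best_ending_here = best_overall = a[0]
    for x in a[1:]:
        # Subproblem: best subarray ending at the current position.
        best_ending_here = max(x, best_ending_here + x)
        # Combine: the overall optimum is the best over all endings.
        best_overall = max(best_overall, best_ending_here)
    return best_overall

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # 6, from [4, -1, 2, 1]
```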
Problems
- Weighted interval scheduling
- Knapsack problem
- Fibonacci numbers
- RNA folding
- Sequence alignment
  - Edit distance (ED)
  - Longest common subsequence (LCS)
Weighted Interval Scheduling
We are given a set of jobs ${ \{ 1, \cdots, n \} }$.
- Job ${ j }$ starts at time ${ s_j }$, finishes at ${ f_j }$, and has a weight ${ w_j }$.
- Two jobs ${ i, j }$ are compatible if they do not overlap, that is, either ${ f_i \leq s_j }$ or ${ f_j \leq s_i }$.
- Objective: find a subset of mutually compatible jobs that maximizes the total weight.
Unweighted Interval Scheduling
Let's first consider the easy version, where we ignore the weights: every ${ w_j = 1 }$. The objective then becomes maximizing the size of a set of mutually compatible jobs. We will use a greedy algorithm.
Algorithm
- Consider the jobs in increasing order of their finish times, ${ f_1 \leq f_2 \leq \cdots \leq f_n }$.
- In the ${ i^{th} }$ round, consider job ${ i }$; if it is compatible with all the previously selected jobs, add ${ i }$. (A Python sketch of this rule follows the list.)
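Here is a minimal Python sketch of this greedy rule; the function name and the (start, finish) job representation are my own assumptions, not from the lecture.

```python
def greedy_interval_scheduling(jobs):
    """jobs: list of (start, finish) pairs. Returns a max-size compatible subset."""
    selected = []
    last_finish = float("-inf")
    # Consider jobs in increasing order of finish time.
    for start, finish in sorted(jobs, key=lambda job: job[1]):
        # Compatible with everything selected so far iff it starts
        # no earlier than the latest selected finish time.
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

print(greedy_interval_scheduling([(1, 4), (3, 5), (0, 6), (5, 7), (6, 8)]))  # [(1, 4), (5, 7)]
```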
Correctness
Let's prove its correctness.
Proof. Denote the set produced by the greedy algorithm by ${ I = \{ i_1, i_2, \cdots, i_k \} }$ and an optimal solution by ${ J = \{ j_1, \cdots, j_m \} }$, both listed in increasing order of finish time.
Assume ${ I }$ is not optimal, so ${ m > k }$.
For ${ q = 1, 2, \cdots, k }$, compare ${ i_q }$ and ${ j_q }$. If ${ i_q = j_q }$, move on to the next comparison. If not, we can replace ${ j_q }$ with ${ i_q }$: because ${ i_p = j_p }$ for all ${ p < q }$ and the greedy algorithm always picks the compatible job with the earliest finish time, we have ${ f_{i_q} \leq f_{j_q} }$, so after the replacement ${ J }$ is still feasible and still optimal.
After these exchanges, ${ J }$ agrees with ${ I }$ on its first ${ k }$ jobs. If ${ m > k }$, then ${ j_{k+1} }$ would be compatible with every job in ${ I }$, so the greedy algorithm would have extended ${ I }$. Therefore ${ m = k }$, which contradicts ${ m > k }$. Thus solution ${ I }$ is optimal.
DP Algorithm
Does the greedy algorithm work for weighted interval scheduling? No!
Take an example:
job ${ a }$: from 1 min to 100 min, weight 10
job ${ b }$: from 2 min to 5 min, weight 1
job ${ c }$: from 7 min to 10 min, weight 2
In this example, the greedy algorithm returns ${ \{ b, c \} }$ with total weight 3, but this is not the optimal solution: choosing job ${ a }$ alone gives weight 10.
We will introduce Dynamic Programming to solve “Weighted Interval Scheduling”.
Algorithm in detail
- Order the jobs in increasing order of finish time.
- For a job ${ j }$, define ${ P(j) }$ to be the largest index ${ i < j }$ such that job ${ i }$ is compatible with job ${ j }$ (set ${ P(j) = 0 }$ if no such job exists). A sketch of how ${ P(j) }$ can be computed follows this list.
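Here is a minimal Python sketch of one way to compute ${ P(j) }$ after sorting by finish time, using binary search over the finish times; the function name compute_p and the (start, finish, weight) job representation are my own assumptions, not from the lecture.

```python
from bisect import bisect_right

def compute_p(jobs):
    """jobs: list of (start, finish, weight), already sorted by finish time.
    Returns a 1-indexed list P where P[j] is the largest i < j such that
    job i is compatible with job j, or 0 if no such job exists."""
    finishes = [f for (_, f, _) in jobs]
    P = [0] * (len(jobs) + 1)  # P[0] is unused
    for j, (s_j, _, _) in enumerate(jobs, start=1):
        # Number of jobs whose finish time is <= s_j, i.e. the index of the
        # last compatible job (0 if there is none).
        P[j] = bisect_right(finishes, s_j)
    return P
```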
DP Formulation
For the base case, ${ OPT(j) = 0 }$ if ${ j = 0 }$.
Otherwise, we have two cases.
Case 1: ${ OPT(j) }$ includes job ${ j }$
-> That means all jobs in ${ \{ P(j)+1, \cdots, j-1 \} }$ are incompatible with ${ j }$
-> Find ${ OPT(P(j)) }$
-> Add ${ w_j }$ to ${ OPT(P(j)) }$
Case 2: ${ OPT(j) }$ does not include job ${ j }$
-> Find ${ OPT(j-1) }$
-> We don't include job ${ j }$, so just return ${ OPT(j-1) }$
Therefore the state transition equation is
${ OPT(j) = \max \{ w_j + OPT(P(j)),\ OPT(j-1) \} }$
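To make the recurrence concrete, here is how it plays out on the three-job counterexample above, with the jobs reordered by finish time as ${ b, c, a }$ (the ${ P }$ values follow from the given start and finish times):
- ${ P(1) = 0 }$, ${ P(2) = 1 }$ (job ${ b }$ finishes at 5, before ${ c }$ starts at 7), ${ P(3) = 0 }$.
- ${ OPT(1) = \max \{ 1 + OPT(0),\ OPT(0) \} = 1 }$
- ${ OPT(2) = \max \{ 2 + OPT(1),\ OPT(1) \} = 3 }$
- ${ OPT(3) = \max \{ 10 + OPT(0),\ OPT(2) \} = 10 }$, matching the optimal solution ${ \{ a \} }$.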
Pseudocode
Weighted-Interval-Scheduling((s_1,f_1,w_1), ..., (s_n,f_n,w_n)):
    Sort the jobs by finish time so that f_1 <= f_2 <= ... <= f_n
    Compute P(1), ..., P(n)
    return Compute_OPT(n)

Compute_OPT(j):
    if j = 0
        return 0
    else
        return max(w_j + Compute_OPT(P(j)), Compute_OPT(j-1))
Correctness
We will prove the following theorem to establish the correctness of this dynamic programming algorithm.
Theorem: Compute_OPT(j) returns the optimal value for jobs ${ \{ 1, 2, \cdots, j \} }$.
Proof: Base case ${ j = 0 }$: ${ OPT(0) = 0 }$, which is correct.
Induction step
Assume that for all ${ i < j }$, Compute_OPT(i) returns the optimal value.
- Since ${ P(j) < j }$, Compute_OPT(P(j)) is optimal.
- Since ${ j-1 < j }$, Compute_OPT(j-1) is optimal.
If we include job ${ j }$, we have to discard all jobs after ${ P(j) }$. In this case, the optimal value for ${ \{ 1, 2, \cdots, j \} }$ is the optimal value for ${ \{ 1, 2, \cdots, P(j) \} }$ plus ${ w_j }$.
If we do not include job ${ j }$, we just need the optimal value for ${ \{ 1, 2, \cdots, j-1 \} }$, which is Compute_OPT(j-1).
By definition, Compute_OPT(j) ${ = \max \{ w_j + OPT(P(j)),\ OPT(j-1) \} }$. Because these two cases are exhaustive and we take the maximum, Compute_OPT(j) is indeed the optimal value for ${ \{ 1, 2, \cdots, j \} }$.
Memoized Recursion
In this dynamic programming algorithm, each call Compute_OPT(n) recursively calls Compute_OPT(n-1) and Compute_OPT(P(n)), so the naive recursion takes exponential time. If instead we compute the values from ${ OPT(0) }$ up to ${ OPT(n) }$ and record them, we get an ${ O(n \lg n) }$ algorithm: sorting takes ${ O(n \lg n) }$ and the rest of the algorithm runs in linear time.
Here is the pseudocode for the memoized (bottom-up) version.
Compute_OPT(n):
    // Initialization
    for j = 0 to n
        M[j] = 0
    // Fill the table bottom-up, reusing the stored values
    for j = 1 to n
        M[j] = max(w_j + M[P(j)], M[j-1])
    return M[n]
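For reference, here is a minimal runnable Python sketch of the whole bottom-up DP (sorting, the ${ P }$ table, and the ${ M }$ table together); the function name weighted_interval_scheduling and the (start, finish, weight) job representation are my own assumptions, not from the lecture.

```python
from bisect import bisect_right

def weighted_interval_scheduling(jobs):
    """jobs: list of (start, finish, weight). Returns the maximum total weight."""
    jobs = sorted(jobs, key=lambda job: job[1])   # order by finish time
    finishes = [f for (_, f, _) in jobs]
    n = len(jobs)
    # P[j]: largest index i < j compatible with job j (1-indexed, 0 if none)
    P = [0] * (n + 1)
    for j in range(1, n + 1):
        P[j] = bisect_right(finishes, jobs[j - 1][0])
    # M[j]: optimal value over jobs 1..j
    M = [0] * (n + 1)
    for j in range(1, n + 1):
        w_j = jobs[j - 1][2]
        M[j] = max(w_j + M[P[j]], M[j - 1])
    return M[n]

# The counterexample from above: greedy gets 3, the DP gets 10.
print(weighted_interval_scheduling([(1, 100, 10), (2, 5, 1), (7, 10, 2)]))  # 10
```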
Identifying jobs
To identify all the jobs in the optimal solution, we use the following pseudocode, which traces back through the jobs from last to first.
Find_Jobs(n):
    // Initialization
    for j = 0 to n
        Jobs[j] = 0
    // trace back from the last job
    Check_Job(n)
    return Jobs

Check_Job(i):
    if i == 0
        return
    else if w_i + M[P(i)] > M[i-1]
        // job i is in the optimal solution
        Jobs[i] = 1
        Check_Job(P(i))
    else
        Check_Job(i-1)
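A minimal Python sketch of this trace-back, assuming the 1-indexed lists w, P, and M from the DP above are available; the function name recover_jobs is my own, and an iterative loop replaces the recursion, but the case analysis is the same.

```python
def recover_jobs(n, w, P, M):
    """Return the (1-indexed) jobs in an optimal solution.
    w, P, M are 1-indexed lists produced by the DP above."""
    chosen = []
    i = n
    # Walk back from the last job, re-checking which case the DP took.
    while i > 0:
        if w[i] + M[P[i]] > M[i - 1]:
            chosen.append(i)  # job i belongs to the optimal solution
            i = P[i]
        else:
            i -= 1
    return sorted(chosen)
```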