We must know, we will know !

万物并作,吾以观复

Extra topic 1:Palindrome and Manacher algorithm

A DNA string is a reverse palindrome if it is equal to its reverse complement. For instance, $\textbf{ATGCAT}$ is a reverse palindrome, because its reverse string $\textbf{TACGTA}$ can complementarily match to origin string.

\[\begin{align*} & \text{ATGCAT}\\ & \mid \,\; \mid \,\; \mid \,\; \mid \,\; \mid \,\; \mid \,\\ & \text{TACGTA} \end{align*}\]

This phenomenon is quite common in our DNA or RNA sequences. We hope to find all occurrences of reverse palindrome in text $T$. In fact, there are at most $\Theta(n^2)$ palindrome substring in text with length $\vert T \vert = n$. In this blog, we will first give a naïve algorithm and use Manacher algorithm to solve this problem in $\mathcal{O}(n)$. In fact, we can do lowest common accessor on Suffix tree to implement this also, which will be introduced in Extra topic 2.

In this topic, we also consider other problem which is quite similar to above problem. Instead of consider reverse palindrome, we only consider symmetric string that is $T = T^\text{R}$ here $T^\text{R}$ is the reverse string of $T$, so in this case we have odd and even symmetric string. For example, $\textbf{ATGGTA}$ and $\textbf{ATGCGTA}$.

Read more

Lecture 1:Knuth-Morris-Pratt (KMP) algorithm

At the beginning of our bioinformatic algorithms journal, we will focus on the similarity of strings. We we touch following aspects,

  • Equality
    • Find occurrences of string $S$ in a given string $T$.
  • Edit distance
    • Find edit distance between strings $S$ and $T$ (global alignment)
    • Find substring of $T$ with minimal edit distance to given string $S$ (local alignment)
  • Probability
    • Construct probability model that best describes a given set of strings.
    • Check if a given strings is likely to be of the same type.

Today, we will introduce the algorithm for first problem: find all occurrences of a string $P$ (called pattern string) in a given string $T$ as a substring.

Read more

Prerequisite:Logic and Number Theory

In this series, I will introduce the Fundamental Contents in Computer Theory and Algorithms. If you hope to learn more advanced algorithms and techniques, please move to my another series Algorithm Design and Analysis.

In this blog, I try to do some preparation for following contents. I prefer to introduce the algorithms in a more rigorous and strict way. In theoretical computer science and algorithms, we will encounter lot of statements, like the correctness and running time for some specific algorithm. The way to assert a statement is true is Proof, then the statement will become a theorem (or lemma, claim, corollary anyway). Hence, we start from Logic which is very beginning of our story (may not be romantic but must be interesting and attractive).

Another part of this blog is number theory, which is powerful and common tool in our journey. Off course, we hope we can master more math, which are always the enrich toolbox for computer scientists.

Read more

Virtual Memory

This blog talks about Virtual Memory. For each process (the running program), we need to allocate memory to it. We have two goals

(1) User use “memory” like a large and uniform (continuous) array.

(2) And each process has private memory space for safety.

So, we need virtual memory, which is a simple abstraction provided to users. CPU and OS can help us to translate efficient address (virtual address) into physical address.

Basically, we can use the strategy like “Cache”, Then we have some terms

  • A virtual memory “Block” is called a “Page”.
  • A virtual memory address translation “Miss” is called a “Page fault”.

Read more

Randomized Algs:Permutation Routing Problem

This blog, we still talk about randomized algorithms. Let’s consider the parallel computation problem: A network of parallel processors is modeled by a directed graph ${G = (N, E)}$. The nodes ${N}$ represent the processors and the edges ${E}$ model the communication links between the processors. All communication between processors occurs in synchronous steps. Each link can carry at most one unit message (packet) in one step. During a step, a processor can send at most one packet to each of its neighbors. Each processor is uniquely identified by a number between 1 and ${N}$.

Read more