Link to PDF

Project Proposal


We are going to implement a parallel edit distance algorithm. Implementations on both CUDA and C++ (Gates machines) will be compared.


The edit distance between two strings $X$ and $Y$ measures the minimum number of single-character modifications required to change $X$ to $Y$. A popular dynamic programming (DP) algorithm can compute the edit distance in $O(|X||Y|)$ time and space. This involves creating a DP table of size $|X|$ by $|Y|$ and evaluating the entries in row-major order.

The edit distance provides a useful measure of similarity between strings, and can be used for many applications from DNA profiling to spell checking to approximate string matching.

The Challenge

In the most popular dynamic programming edit distance algorithm, the value of each DP entry depends only on the value above, to the left, and of the upper-left neighbor. Because of this, one potential parallel execution in evaluating the DP entries is along a “diagonal frontier.” Since the length of the longest diagonal row changes as we traverse the table, the amount of achievable parallelism changes dynamically throughout a single problem instance. This poses an interesting challenge and involves a very dynamic but predictable allocation of resources.

Another potential avenue of parallelization is to evaluate the DP table in row-major order, but compute each row in parallel. However, as each entry additionally depends on the evaluated value of its left neighbor, this is non-trivial and involves a scan-esque procedure.

As seen, the primary challenge presents itself in the form of non-static parallelizability (diagonal method) or non-trivial parallel computation (row-major method). This is, of course, assuming our implementation is based on the standard DP algorithm. Alternative algorithms will also be considered should they provide potential for parallelization.


We will most likely start from code provided by Hongyi Xin, a reasearcher on this topic, but he has not yet gotten back to us on the subject. If he does not respond in time, then our first order of business will be to write some starter code as a foundation for our final product. In any case, we will investigate parallel implementations on both CUDA and the CPU on the Gates machines. Due to the simplicity of the problem statement, we will also likely experiment with implementations written from scratch. Useful tools for implementation include the Open MP and Cuda libraries provided with the 15418 homework assignments.


Plan to Achieve

Fast implementation of edit distance on the Gates machines. We wish to achieve reasonable speedup on either a 6-core processor or an Nvidia graphics card.

Hope to Achieve

Visual DNA matching demo which also displays performance metrics. Additionally, a useful application of approximate string matching.


At minimum, a speed-up analysis comparing the sequential and parallel execution times.

Platform Choice

The platform we have chosen is C++ running on the Gates machines. The main reason for this is that we’re looking to investigate the performance of our implementation, which is most easily done in C++ due to the low abstraction distance from assembly. Additionally, because we are primarily concerned with achieving high speed-up on a shared memory space, machines with many cores will not be necessary.