Project Checkpoint

Work Completed So Far

Having met with Hongyi, a researcher in this field, we now have several avenues of approach for implementing a parallelized edit distance. He pointed us towards a fast bit-vector algorithm for approximate string matching, which shows promise in parallization over the naive approach.

A serial implementation along with a timing mechanism is now complete; multiple input formats are supported. The gathering of necessary libraries and code bases are now complete, allowing us to begin implementation without further obstruction.



The primary challenge in the pthread implementation (with diagonal parallelization on the DP table) lies in dealing with dynamic scheduling and resource allocation which is necessitated by the non-static nature of the amount of parallelism available throughout a single problem instance. Furthermore, we must be careful in how we arrange data on memory in order to take advantage of locality. Currently, we plan on a “diagonal-major” ordering of the DP table. However, this is non-trivial as the diagonals are of variable length.

For the bit-vector implementation, the main challenge lies in communicating the information at each step between the parallel executing units. A simpler solution might just be to keep all the execution between one warp on the GPU so that the algorithm runs in lock-step, but this limits the possible speedup and the input size. If this were implemented on multiple warps, it is likely that barriers would have to be used, reducing the potential speedup.

Plan to Achieve

We plan to achieve reasonable speedup on both CPU implementations on the Gates machines and CUDA implementations on the NVidia Kepler architecture GPUs.

Hope to Achieve

Visual demo of real-time approximate string matching on one or both of the following:


At minimum, a speed-up analysis comparing the sequential and parallel execution times.