GSOC/transalign

Optimizing transalign

  • Mentor: Ketil (and Christian?)
  • Umbrella: probably OBF?

Rationale

I recently developed and published a method and implementation for more sensitive pairwise alignments. The paper is here and a copy here. I’m really happy about this method, and if nothing else, check the SCOP benchmark. Although it’s difficult to construct a good test case using more complex methods (training sets for HMMs and whatnot) I don’t know anything that is as good as this.

The current implementation is in Haskell, and although it works correctly, it is a bit slow, and more problematic, it consumes too much memory (so going multi-threaded, although pretty easy, may not be of any help).

Approach and goals

The goal is to make this into a more practical tool by reducing resource requirements.
A prospective student would either:

  1. Optimize the Haskell program, primarily to reduce memory footprint, and secondarily to make use of multi-CPU systems.

  2. Reimplement the algorithm (or parts of it) in a different language, and achieving the same as above.

Advantages of 1:

  • Already have a working program, and the type system makes it easy to refactor without introducing errors.
  • Haskell supports lots of good multi-threading programming models (like STM)
  • I know Haskell pretty well, and will be hopefully be able to mentor.

Disadvantages:

  • Haskell has some good debugging tools, but they tend to work really poorly for large memory (i.e. it takes a long time to generate profiles)
  • Needs somebody with a bit (or a lot) of experience optimizing Haskell, and good knowledge of high-perf libraries (like vector)

Advantages of 2:

  • Easier to get a student with adequate skills.
  • More predictable performance models in other languages.
  • Easier to compile and install for many users.

Disadvantages:

  • Ideally, should know enough Haskell to read and understand the code.
  • Likely needs a co-mentor with knowledge of the language in question.

Please contact me at with any questions!