An important goal of the biohaskell effort is to standardize and unify libraries. Below are the various libraries and their current state.
Core library: biocore
This contains (very) basic functionality and data types intended to be shared among other libraries. Using it ensures that libraries are compatible, and that the same types are used to represent the same things.
Libraries split off from bio
After some discussion, it was decided that the old monolithic “bio” library was to be refactored into a core library, with several small libraries implementing specific functionality. Programs using the old bio library should have their imports adjusted to use these instead.
Reading and writing Fasta-formatted sequences (although biocore can generate Fasta-formatted ByteStrings directly).
Reading and writing FastQ-formatted sequences (but see above).
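As a rough illustration of what reading and writing Fasta involves, here is a minimal sketch that uses only the Haskell base library. It is not the API of biocore or of the Fasta library described here: the names FastaRecord, renderFasta and parseFasta are hypothetical, it uses String rather than ByteString, and the 60-column line width is an assumption.

```haskell
-- Hypothetical record type for this sketch; the real libraries use
-- their own (ByteString-based) types.
data FastaRecord = FastaRecord
  { header   :: String  -- text after the leading '>'
  , residues :: String  -- sequence data without line breaks
  } deriving (Eq, Show)

-- Break a list into pieces of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Render one record, wrapping the sequence at 60 columns (assumed width).
renderFasta :: FastaRecord -> String
renderFasta (FastaRecord h s) = unlines (('>' : h) : chunksOf 60 s)

-- Parse zero or more records; any text before the first '>' is ignored.
parseFasta :: String -> [FastaRecord]
parseFasta = go . dropWhile (not . isHeader) . lines
  where
    isHeader l = take 1 l == ">"
    go [] = []
    go (l : ls) =
      let (body, rest) = break isHeader ls
      in FastaRecord (drop 1 l) (concat body) : go rest
```

In this sketch, parsing the output of renderFasta gives back the original record, which is a convenient property to check when experimenting.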
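The FastQ format adds a quality string to each sequence. The following sketch, again using only base and hypothetical names (FastqRecord, parseFastq), assumes the common four-line record layout and Phred+33 quality encoding. It does not reflect the interface of the FastQ library listed here.

```haskell
import Data.Char (ord)

-- Hypothetical record type for this sketch.
data FastqRecord = FastqRecord
  { readName  :: String  -- text after the leading '@'
  , bases     :: String
  , qualities :: [Int]   -- decoded Phred scores
  } deriving (Eq, Show)

-- Parse four-line FastQ records: '@' name, bases, '+' separator, qualities.
-- Blank lines are dropped first, so empty reads are not supported.
-- Assumes Phred+33 encoding; multi-line sequences are not handled.
parseFastq :: String -> Either String [FastqRecord]
parseFastq = go . filter (not . null) . lines
  where
    go [] = Right []
    go (('@' : name) : sq : ('+' : _) : ql : rest)
      | length sq /= length ql = Left ("length mismatch in read " ++ name)
      | otherwise = (FastqRecord name sq (map phred33 ql) :) <$> go rest
    go (l : _) = Left ("malformed record near: " ++ l)
    phred33 c = ord c - 33
```

Consuming exactly four lines per record avoids misreading a quality line that happens to begin with '@'.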
The ACE alignment output format.
The PHD sequence format, as output by Phred.
Parsing the BLAST XML output format.
A small library enabling reading and writing of PSL files, as output by e.g. BLAT. It also contains some example programs for extracting and manipulating PSL data.
Functionality for dealing with SFF files, as produced by Roche 454 and ABI Ion Torrent sequencers. Includes the flower executable, which can convert SFF files into a variety of formats.
The biostockholm package supports parsing and rendering of files in Stockholm 1.0 format. This format is used by Pfam and Rfam for multiple sequence alignments. The library supports both a streaming interface that runs in constant memory and a convenient document interface that uses as much memory as the largest family in the Stockholm file. Both interfaces are accessed using the conduit library, but a lazy version of the document interface is provided for one-off scripts.
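To illustrate why a per-family interface needs only as much memory as the largest family, here is a minimal sketch using lazy lists from base. It is not the biostockholm API and does not use conduit. The names splitFamilies and alignmentRows are hypothetical, and only the "//" family terminator and the "#" markup prefix of the Stockholm format are relied upon.

```haskell
-- Split the lines of a multi-family Stockholm file into one block per
-- family, using the "//" terminator line. Blocks are produced lazily,
-- so a consumer that handles them one at a time does not need the
-- whole file in memory.
splitFamilies :: [String] -> [[String]]
splitFamilies ls = case break (== "//") ls of
  (block, _ : rest) -> block : splitFamilies rest
  (block, [])       -> [block | any (not . null) block]

-- Extract (name, aligned sequence) pairs from one family block,
-- skipping markup lines (starting with '#') and blank lines.
-- Rows of an interleaved alignment are not merged in this sketch.
alignmentRows :: [String] -> [(String, String)]
alignmentRows block =
  [ (name, aln) | l <- block, take 1 l /= "#", [name, aln] <- [words l] ]
```

A one-off script could apply it as map alignmentRows . splitFamilies . lines to the file contents obtained from readFile.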
Work with RNA secondary structure parameter files, transform strings into a highly efficient internal format, and use some functions for dealing with Infernal covariance models. The library will soon be extended with several “DataSource”s, which will allow users to import typical data easily.
Note: the other Biobase* libraries are deprecated; their functionality is now included in Biobase.
RNA secondary structure folding using the algorithm from the ViennaRNA 2.0 package. We only deal with ‘double dangles’ and no lonely pairs. (This library is for testing ghc7+vector+llvm)
The seqloc library by Nick Ingolia provides facilities for working with sequence locations, for instance to describe and manipulate genome annotations.
This library contains data types for sequences and various kinds of alignments, as well as functionality for reading and writing many different file formats. Development is driven by the needs of applications, so while large parts of the library are solid and efficient, other parts are less mature or feature-complete.
No new development will take place here, and functionality will gradually be ported from this library and rolled out as several smaller libraries, based on biocore.
Releases: http://hackage.haskell.org/package/bio
Home page: http://blog.malde.org/index.php/the-haskell-bioinformatics-library/
Darcs repository: http://malde.org/~ketil/biohaskell/biolib