Libraries
An important goal of the biohaskell effort is to standardize and unify libraries. Below are the various libraries and their current state.
Core library: biocore
This contains (very) basic functionality and data types intended to be shared among other libraries. Using it ensures that libraries are compatible, and that the same types are used to represent the same things.
Libraries split off from bio
After some discussion, it was decided that the old monolithic “bio” library was to be refactored into a core library, with several small libraries implementing specific functionality. Programs using the old bio library should have their imports adjusted to use these instead.
- biofasta
Reading and writing Fasta-formatted sequences (although biocore can generate Fasta-formatted ByteStrings directly).
- biofastq
Reading and writing FastQ-formatted sequences (but see the note under biofasta above).
- bioace
The ACE alignment output format.
- bioalign
Calculating alignments.
- biophd
The PHD sequence format, as output by Phred.
- blastxml
Parsing the BLAST XML output format.
- biopsl
A small library enabling reading and writing of PSL files, as output by e.g. BLAT. It also contains some example programs for extracting and manipulating PSL data.
- biosff
Functionality for dealing with SFF files, as produced by Roche 454 and ABI Ion Torrent sequencers. Includes the flower executable, which can convert SFF files into a variety of formats.
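To make the Fasta reading and writing mentioned above concrete, here is a minimal sketch that uses only the Haskell base library. It is not the biofasta or biocore API: the names `SeqRecord`, `parseFasta` and `renderFasta` are hypothetical, and the sketch uses plain `String` where the real libraries work with ByteStrings.

```haskell
-- Hypothetical record type for illustration only; not biocore's actual type.
data SeqRecord = SeqRecord
  { seqHeader :: String  -- header line without the leading '>'
  , seqData   :: String  -- sequence with line breaks removed
  } deriving (Show, Eq)

-- Parse Fasta text: a line starting with '>' opens a record, and the
-- lines that follow are concatenated into its sequence. Blank lines and
-- anything before the first header are ignored.
parseFasta :: String -> [SeqRecord]
parseFasta = go . dropWhile (not . isHeader) . filter (not . null) . lines
  where
    isHeader ('>':_) = True
    isHeader _       = False

    go []       = []
    go (h:rest) =
      let (body, more) = break isHeader rest
      in SeqRecord (drop 1 h) (concat body) : go more

-- Render records as Fasta text, wrapping sequence lines at the given width.
renderFasta :: Int -> [SeqRecord] -> String
renderFasta width = concatMap render
  where
    render (SeqRecord h s) = unlines (('>' : h) : chunks s)

    chunks [] = []
    chunks xs = let (a, b) = splitAt (max 1 width) xs in a : chunks b

main :: IO ()
main = do
  let recs = parseFasta ">seq1 example\nACGT\nACGT\n>seq2\nGG\n"
  mapM_ print recs
  putStr (renderFasta 4 recs)
```

A real parser would also need to handle Windows line endings and very large inputs, which is where a ByteString-based library is the better choice.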
Other libraries
biostockholm
The biostockholm package supports parsing and rendering of files in Stockholm 1.0 format. This format is used by Pfam and Rfam for multiple sequence alignments. The library supports both a streaming interface that runs in constant memory and a convenient document interface that uses as much memory as the largest family in the Stockholm file. Both interfaces are accessed through the conduit library, but a lazy version of the document interface is provided for one-off scripts.
Biobase
Provides support for working with RNA secondary structure parameter files, transforming strings into a highly efficient internal format, and handling Infernal covariance models. The library will soon be extended with several “DataSource”s, which will allow users to import typical data easily.
Note: the other Biobase* libraries are deprecated; their functionality is now included in Biobase.
RNAFold
RNA secondary structure folding using the algorithm from the ViennaRNA 2.0 package. We only deal with ‘double dangles’ and no lonely pairs. (This library is for testing ghc7+vector+llvm.)
SeqLoc
The seqloc library by Nick Ingolia provides facilities for working with sequence locations, for instance to describe and manipulate genome annotations.
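As a rough idea of what working with sequence locations involves, the sketch below models a stranded interval and two typical operations on it. It is not the seqloc API: the types `Strand` and `Loc` and the functions `overlaps` and `extract` are hypothetical names, and the 0-based inclusive coordinates are an assumption made for this example.

```haskell
-- Hypothetical types for illustration only; seqloc's real types differ.
data Strand = Plus | Minus
  deriving (Show, Eq)

data Loc = Loc
  { locStart  :: Int     -- 0-based, inclusive (assumed convention)
  , locEnd    :: Int     -- 0-based, inclusive (assumed convention)
  , locStrand :: Strand
  } deriving (Show, Eq)

-- Two locations overlap if they share at least one position,
-- regardless of strand.
overlaps :: Loc -> Loc -> Bool
overlaps a b = locStart a <= locEnd b && locStart b <= locEnd a

-- Complement of a DNA base; anything unrecognised becomes 'N'.
complement :: Char -> Char
complement c = case c of
  'A' -> 'T'
  'C' -> 'G'
  'G' -> 'C'
  'T' -> 'A'
  _   -> 'N'

-- Extract the subsequence at a location, reverse-complementing it for
-- the minus strand. Returns Nothing if the location is out of bounds.
extract :: Loc -> String -> Maybe String
extract (Loc s e strand) sq
  | s < 0 || e < s || e >= length sq = Nothing
  | otherwise = Just (orient (take (e - s + 1) (drop s sq)))
  where
    orient = case strand of
      Plus  -> id
      Minus -> reverse . map complement

main :: IO ()
main = do
  let genome = "ACGTA"
  print (extract (Loc 1 3 Plus) genome)
  print (extract (Loc 1 3 Minus) genome)
  print (overlaps (Loc 0 4 Plus) (Loc 4 8 Minus))
```

Returning `Maybe` keeps out-of-range annotations from crashing a pipeline; a genome annotation can then be represented as a list of such locations.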
samtools
For reading BAM files (which are Binary SAM files), there is a samtools library, with separate libraries providing iteratee and enumerator interfaces. Available on Hackage.
Deprecated: bio
This library contains data types for sequences and various kinds of alignments, as well as functionality for reading and writing many different file formats. Development is driven by the needs of applications, so while large parts of the library are solid and efficient, other parts are less mature or feature complete.
No new development will take place here, and functionality will gradually be ported from this library and rolled out as several smaller libraries, based on biocore.
Releases: http://hackage.haskell.org/package/bio
Home page: http://blog.malde.org/index.php/the-haskell-bioinformatics-library/
Darcs repository: http://malde.org/~ketil/biohaskell/biolib
