# Working with development sources

BioPerl uses [Dist::Zilla](http://dzil.org/) to author releases.  To
work with the development sources you need Dist::Zilla itself, plus
`Dist::Zilla::PluginBundle::BioPerl` and its dependencies.  Once
those are installed, you can run the following commands:

    dzil test
    dzil install
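
As a rough pre-flight check, the sketch below reports which of the two
prerequisites named above cannot be loaded by the current `perl`.  The
helper name `missing_modules` is made up for this illustration, and the
suggested `cpanm` command assumes that cpanminus is your CPAN client;
any other client should work just as well.

```shell
# Hypothetical helper (not part of BioPerl or Dist::Zilla):
# print each named Perl module that the current perl cannot load.
missing_modules() {
    for module in "$@"; do
        # `perl -MNAME -e1` exits non-zero when NAME cannot be loaded.
        if ! perl -M"$module" -e1 >/dev/null 2>&1; then
            printf '%s\n' "$module"
        fi
    done
}

missing=$(missing_modules Dist::Zilla Dist::Zilla::PluginBundle::BioPerl)
if [ -n "$missing" ]; then
    echo "Missing modules; one way to install them (assuming cpanm):"
    echo "$missing" | sed 's/^/    cpanm /'
else
    echo "Prerequisites found; dzil test and dzil install should be runnable."
fi
```

The check only tells you whether the modules load; it does not verify
their versions or the plugin bundle's own dependencies.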

# The Directory Structure

The bioperl-live repository is organized as follows:

* `lib/` - BioPerl modules

* `examples/` - Scripts demonstrating the many uses of BioPerl

* `scripts/` - Useful production-quality scripts with POD documentation

* `t/` - Perl built-in tests, divided into subdirectories based on
  the specific classes being tested

* `t/data/` - Data files used by the tests, which also serve as good
  example data

* `travis_scripts/` - Scripts to customize Travis

## `Bio::` namespace summary

* `Bio::Seq` is for *Sequences* (protein and DNA).
    * `Bio::PrimarySeq` is a plain sequence (sequence data +
      identifiers)
    * `Bio::Seq` is a fancier `PrimarySeq`, in that it has annotation
      (via `Bio::Annotation::Collection`) and sequence features (via
      `Bio::SeqFeatureI` objects, attached via `Bio::FeatureHolderI`).
    * `Bio::Seq::RichSeq` is all of the above, plus it has slots for
      extra information specific to GenBank/EMBL/SwissProt files.
    * `Bio::Seq::LargeSeq` is for sequences which are too big to fit
      into memory.

* `Bio::SeqIO` is for *reading and writing Sequences*.  It is a front
  end module for separate driver modules supporting the different
  sequence formats.

* `Bio::SeqFeature` represents start/stop/strand-based localised
  annotations (features) of sequences
    * `Bio::SeqFeature::Generic` is a basic catch-all
    * `Bio::SeqFeature::Similarity` is a similarity sequence feature
    * `Bio::SeqFeature::FeaturePair` is a sequence feature which is
      pairwise, such as query/hit pairs

* `Bio::SearchIO` is for reading and writing pairwise alignment
  reports, like BLAST or FASTA.

* `Bio::Search` is where the alignment objects for `SearchIO` are
  defined
    * `Bio::Search::Result::GenericResult` is the result object (a
      blast query is a `Result` object)
    * `Bio::Search::Hit::GenericHit` is the `Hit` object (a query will
      have 0 to many hits in a database)
    * `Bio::Search::HSP::GenericHSP` is the High-scoring Segment Pair
      object defining the alignment(s) of the query and hit.

* `Bio::SimpleAlign` is for multiple sequence alignments

* `Bio::AlignIO` is for reading and writing multiple sequence
  alignment formats

* `Bio::Assembly` provides the start of an infrastructure for
  assemblies, and `Bio::Assembly::IO` provides *IO converters* for them

* `Bio::DB` is the namespace for database query classes
    * `Bio::DB::GenBank/GenPept` are two modules which query NCBI
      Entrez for sequences.
    * `Bio::DB::SwissProt/EMBL` query various EMBL and SwissProt
      repositories for sequences.
    * `Bio::DB::GFF` is Lincoln Stein's fast, lightweight feature and
      sequence database which is the backend to his
      [GBrowse](http://www.gmod.org) system.
    * `Bio::DB::Flat` is a fast implementation of the OBDA flat-file
      indexing system (cross-language and cross-platform, supported by
      O|B|F projects; see http://obda.open-bio.org).
    * `Bio::DB::BioFetch/DBFetch` are for OBDA, Web (HTTP) access to
      remote databases.
    * `Bio::DB::InMemoryCache/FileCache` provide fast local caching of
      sequences from remote databases to speed up your access.
    * `Bio::DB::Registry` is the interface to the OBDA specification
      for remote data sources.
    * `Bio::DB::Biblio` is for access to remote bibliographic databases.
    * `Bio::DB::EUtilities` is the initial set of modules used for
      generic queries using NCBI's eUtils.

* `Bio::Annotation` is a collection of annotation objects (comments,
  DBlinks, References, and misc key/value pairs)

* `Bio::Coordinate` is a system for mapping between different
  coordinate systems such as DNA to protein or between assemblies

* `Bio::Index` is for locally indexed flatfiles with BerkeleyDB

* `Bio::Tools` contains many *miscellaneous parsers and functions* for different
  bioinformatics needs such as:
    * Gene prediction parsers (Genscan, MZEF, Grail, Genemark)
    * Annotation format (GFF)
    * Enumeration of codon tables and valid sequence symbols (CodonTable,
      IUPAC)
    * Phylogenetic program parsing (PAML, Molphy, Phylip)

* `Bio::Map` represents genetic and physical maps

* `Bio::Structure` parses and represents protein structure data

* `Bio::TreeIO` is for reading and writing Tree formats

* `Bio::Tree` is the namespace for all associated Tree classes
    * `Bio::Tree::Tree` is the basic tree object
    * `Bio::Tree::Node` are the nodes which make up the tree
    * `Bio::Tree::Statistics` is for computing statistics for a tree
    * `Bio::Tree::TreeFunctionsI` is where specific tree functions are
      implemented (like `is_monophyletic` and `lca`)

* `Bio::Biblio` is where bibliographic data and database access
  objects are kept

* `Bio::Variation` represents sequences with mutations and variations
  applied so one can compare and represent wild-type and mutation
  versions of a sequence.

* `Bio::Root` contains the basic objects for the internals of BioPerl

# Releases

BioPerl currently uses a [semantic versioning](https://semver.org/)
scheme for version numbers.  Basically, a version has three numbers in
the form `MAJOR.MINOR.PATCH`, each of which is incremented for a
different kind of change:

1. `MAJOR` --- incompatible API changes,
2. `MINOR` --- new functionality in a backwards-compatible manner,
3. `PATCH` --- backwards-compatible bug fixes.
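
To make the three rules concrete, here is a minimal sketch of how a
version number would move under each kind of change.  The function
`bump_version` is a hypothetical helper written for this example, not
part of the BioPerl release tooling, and it assumes a plain
`MAJOR.MINOR.PATCH` string with no pre-release suffix.  Resetting the
lower-order numbers to zero follows the semantic versioning
specification.

```shell
# Hypothetical helper: bump_version VERSION PART prints the next version.
# Assumes VERSION is a plain MAJOR.MINOR.PATCH string.
bump_version() {
    version=$1
    part=$2
    # Split on dots using POSIX parameter expansion.
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    case $part in
        major) major=$((major + 1)); minor=0; patch=0 ;;  # incompatible API change
        minor) minor=$((minor + 1)); patch=0 ;;           # new compatible functionality
        patch) patch=$((patch + 1)) ;;                    # compatible bug fix
        *) echo "unknown part: $part" >&2; return 1 ;;
    esac
    echo "$major.$minor.$patch"
}

bump_version 1.7.2 patch   # prints 1.7.3
bump_version 1.7.2 minor   # prints 1.8.0
bump_version 1.7.2 major   # prints 2.0.0
```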

## 1.7 releases

Before the 1.7 release, the BioPerl project had a single distribution
with all of the BioPerl modules.  During the 1.7 release series,
subsets of the modules were extracted into separate distributions.

## Pre-1.7 releases

From version 1.0 until 1.6, even numbers (e.g. version 1.4) indicated
stable releases.  Stable releases were well tested and recommended for
most uses.  Odd numbers (e.g. version 1.3) were development releases
which one would only use if interested in the latest features.  The
final number (e.g. in `1.2.1`) is the point or patch release.  The
higher the number, the more bug fixes have been incorporated.  In theory
you can upgrade from one point or patch release to the next with no
changes to your own code (for production cases, obviously check things
out carefully before you switch over).
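
The even/odd convention can be written down as a small check on the
second number of the version.  This is only an illustrative sketch:
`release_kind` is a hypothetical helper, not an existing BioPerl
script, and its answer is only meaningful for versions 1.0 through
1.6, where the convention applied.

```shell
# Hypothetical helper: classify a pre-1.7 version by its second number.
# Even means a stable release, odd means a development release.
release_kind() {
    rest=${1#*.}        # drop the leading "1."
    minor=${rest%%.*}   # keep the second number, ignoring any patch level
    if [ $((minor % 2)) -eq 0 ]; then
        echo stable
    else
        echo development
    fi
}

release_kind 1.4     # prints stable
release_kind 1.3     # prints development
release_kind 1.2.1   # prints stable (the patch level does not matter)
```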