# Working with development sources

BioPerl uses [Dist::Zilla](http://dzil.org/) to author releases. You
will also need `Dist::Zilla::PluginBundle::BioPerl` and its
dependencies installed. Then, you can run the following commands:

# The Directory Structure

The bioperl-live repository is organized as follows:

* `lib/` - BioPerl modules
* `examples/` - Scripts demonstrating the many uses of BioPerl
* `scripts/` - Useful production-quality scripts with POD documentation
* `t/` - Perl built-in tests, divided into subdirectories based on the
  specific classes being tested
* `t/data/` - Data files used by the tests; they also provide good
  example data
* `travis_scripts/` - Scripts to customize Travis

## `Bio::` namespace summary

* `Bio::Seq` is for *Sequences* (protein and DNA).
    * `Bio::PrimarySeq` is a plain sequence (sequence data +
      identifiers).
    * `Bio::Seq` is a fancier `PrimarySeq`, in that it has annotation
      (via `Bio::Annotation::Collection`) and sequence features (via
      `Bio::SeqFeatureI` objects, attached via `Bio::FeatureHolderI`).
    * `Bio::Seq::RichSeq` is all of the above, plus it has slots for
      extra information specific to GenBank/EMBL/SwissProt files.
    * `Bio::Seq::LargeSeq` is for sequences which are too big to fit
      into memory.
* `Bio::SeqIO` is for *reading and writing Sequences*. It is a front
  end module for separate driver modules supporting the different
  sequence formats.
* `Bio::SeqFeature` represents start/stop/strand-based localised
  annotations (features) of sequences
    * `Bio::SeqFeature::Generic` is a basic catch-all
    * `Bio::SeqFeature::Similarity` is a similarity sequence feature
    * `Bio::SeqFeature::FeaturePair` is a sequence feature which is
      pairwise, such as query/hit pairs
* `Bio::SearchIO` is for reading and writing pairwise alignment
  reports, like BLAST or FASTA.
* `Bio::Search` is where the alignment objects for `SearchIO` are
  defined
    * `Bio::Search::Result::GenericResult` is the result object (a
      BLAST query is a `Result` object)
    * `Bio::Search::Hit::GenericHit` is the `Hit` object (a query will
      have 0 to many hits in a database)
    * `Bio::Search::HSP::GenericHSP` is the High-scoring Segment Pair
      object defining the alignment(s) of the query and hit.
* `Bio::SimpleAlign` is for multiple sequence alignments
* `Bio::AlignIO` is for reading and writing multiple sequence
  alignment formats
* `Bio::Assembly` provides the start of an infrastructure for assemblies and
  `Bio::Assembly::IO` *IO converters* for them
* `Bio::DB` is the namespace for database query classes
    * `Bio::DB::GenBank/GenPept` are two modules which query NCBI.
    * `Bio::DB::SwissProt/EMBL` query various EMBL and SwissProt
      repositories for sequences.
    * `Bio::DB::GFF` is Lincoln Stein's fast, lightweight feature and
      sequence database which is the backend to his
      [GBrowse](http://www.gmod.org) system.
    * `Bio::DB::Flat` is a fast implementation of the OBDA flat-file
      indexing system (cross-language and cross-platform, supported by
      O|B|F projects; see http://obda.open-bio.org).
    * `Bio::DB::BioFetch/DBFetch` provide OBDA, Web (HTTP) access to
      remote databases.
    * `Bio::DB::InMemoryCache/FileCache` provide fast local caching of
      sequences from remote dbs to speed up your access.
    * `Bio::DB::Registry` is an interface to the OBDA specification.
    * `Bio::DB::Biblio` is for access to remote bibliographic databases.
    * `Bio::DB::EUtilities` is the initial set of modules used for
      generic queries using NCBI's eUtils.
* `Bio::Annotation` is a collection of annotation objects (comments,
  DBlinks, References, and misc key/value pairs)
* `Bio::Coordinate` is a system for mapping between different
  coordinate systems such as DNA to protein or between assemblies
* `Bio::Index` is for locally indexed flatfiles with BerkeleyDB
* `Bio::Tools` contains many *miscellaneous parsers and functions* for different
  bioinformatics needs such as:
    * Gene prediction parsers (Genscan, MZEF, Grail, Genemark)
    * Annotation format (GFF)
    * Enumerate codon tables and valid sequence symbols (CodonTable,
      IUPAC)
    * Phylogenetic program parsing (PAML, Molphy, Phylip)
* `Bio::Map` represents genetic and physical maps
* `Bio::Structure` parses and represents protein structure data
* `Bio::TreeIO` is for reading and writing Tree formats
* `Bio::Tree` is the namespace for all associated Tree classes
    * `Bio::Tree::Tree` is the basic tree object
    * `Bio::Tree::Node` are the nodes which make up the tree
    * `Bio::Tree::Statistics` is for computing statistics for a tree
    * `Bio::Tree::TreeFunctionsI` is where specific tree functions are
      implemented (like `is_monophyletic` and `lca`)
* `Bio::Biblio` is for bibliographic data and database access
* `Bio::Variation` represents sequences with mutations and variations
  applied so one can compare and represent wild-type and mutation
  versions of a sequence.
* `Bio::Root` contains the basic objects for the internals of BioPerl
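
As a minimal sketch of how the `Bio::SeqIO` and `Bio::Seq` entries in the summary above fit together, the following reads FASTA records through `Bio::SeqIO` and reports the identifier and length of each sequence object. The FASTA text, the in-memory filehandle, and the variable names are illustrative assumptions and do not come from the repository.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical two-record FASTA input, held in memory so the sketch
# does not depend on any file in the repository.
my $fasta = <<'FASTA';
>seq1 first example record
ACGTACGTAC
>seq2 second example record
MKVLAAGIVG
FASTA

# Open a read-only filehandle on the string above.
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

# Bio::SeqIO is the front end; -format selects the driver module.
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

# Each call to next_seq returns one sequence object, or undef at the end.
my @summary;
while (my $seq = $in->next_seq) {
    push @summary, sprintf '%s %d', $seq->display_id, $seq->length;
}

print "$_\n" for @summary;
```

Reading from a file instead would only change the constructor arguments, for example `-file => 'input.fasta'` in place of `-fh => $fh`.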

BioPerl currently uses a [semantic versioning](https://semver.org/)
scheme for version numbers. Basically, a version has three numbers in
the form `MAJOR.MINOR.PATCH`, each of which is incremented for:

1. `MAJOR` --- incompatible API changes,
2. `MINOR` --- new functionality in a backwards-compatible manner,
3. `PATCH` --- backwards-compatible bug fixes.
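
To make the three components concrete, here is a small sketch in core Perl that splits a `MAJOR.MINOR.PATCH` string, compares two versions numerically, and names the most significant component that changed. The helper names (`parse_version`, `compare_versions`, `change_level`) and the version strings are hypothetical and are not part of BioPerl.

```perl
use strict;
use warnings;

# Split a MAJOR.MINOR.PATCH string into its three numeric components.
sub parse_version {
    my ($version) = @_;
    my @parts = $version =~ /\A(\d+)\.(\d+)\.(\d+)\z/
        or die "Not a MAJOR.MINOR.PATCH version: $version\n";
    return @parts;
}

# Compare component by component; returns -1, 0 or 1, like <=>.
sub compare_versions {
    my ($left, $right) = @_;
    my @l = parse_version($left);
    my @r = parse_version($right);
    for my $i (0 .. 2) {
        my $cmp = $l[$i] <=> $r[$i];
        return $cmp if $cmp;
    }
    return 0;
}

# Name the most significant component that differs between two versions.
sub change_level {
    my ($old, $new) = @_;
    my @o = parse_version($old);
    my @n = parse_version($new);
    my @names = qw(MAJOR MINOR PATCH);
    for my $i (0 .. 2) {
        return $names[$i] if $o[$i] != $n[$i];
    }
    return 'NONE';
}

printf "%s -> %s is a %s change\n",
    '1.7.2', '1.7.3', change_level('1.7.2', '1.7.3');
```

Comparing the components as numbers rather than as strings matters once a component reaches two digits: `1.7.10` is newer than `1.7.9`, although it sorts earlier as plain text.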

Before the 1.7 release, the BioPerl project had a single distribution
with all of the BioPerl modules. During the 1.7 release series, subsets
of the modules were extracted into separate distributions.

From version 1.0 until 1.6, even minor numbers (e.g. version 1.4)
indicated stable releases. Stable releases were well tested and
recommended for most uses. Odd minor numbers (e.g. version 1.3) were
development releases which one would only use if interested in the
latest features. The final number (e.g. in `1.2.1`) is the point or
patch release. The higher the number, the more bug fixes have been
incorporated. In theory you can upgrade from one point or patch release
to the next with no changes to your own code (for production cases,
obviously check things out carefully before you switch over).
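
The even/odd convention described above can be expressed as a short check. This is only a sketch in core Perl: the function name `legacy_release_type` and the `'not covered'` return value for versions outside the 1.0 to 1.6 range are assumptions made for illustration.

```perl
use strict;
use warnings;

# Classify a pre-1.7 version string under the even/odd convention.
# Accepts "MAJOR.MINOR" or "MAJOR.MINOR.PATCH"; the patch number does
# not affect the classification.
sub legacy_release_type {
    my ($version) = @_;
    my ($major, $minor) = $version =~ /\A(\d+)\.(\d+)(?:\.\d+)?\z/
        or die "Unrecognised version string: $version\n";

    # The convention only applied from version 1.0 until 1.6.
    return 'not covered' unless $major == 1 && $minor <= 6;

    return $minor % 2 == 0 ? 'stable' : 'development';
}

for my $version (qw(1.4 1.3 1.2.1)) {
    printf "%-6s %s\n", $version, legacy_release_type($version);
}
```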