README.md

   1 [![Build Status](https://travis-ci.org/bioperl/bioperl-live.svg?branch=master)](https://travis-ci.org/bioperl/bioperl-live)
   2 [![Coverage Status](https://coveralls.io/repos/bioperl/bioperl-live/badge.png?branch=master)](https://coveralls.io/r/bioperl/bioperl-live?branch=master)
   3 [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.16344.svg)](http://dx.doi.org/10.5281/zenodo.16344)
   4 [![Documentation Status](https://readthedocs.org/projects/bioperl/badge/?version=latest)](https://readthedocs.org/projects/bioperl/?badge=latest)
   5
   6 # Getting Started
   7
   8 Please see the [INSTALL](http://bioperl.org/INSTALL.html) or [INSTALL.WIN](http://bioperl.org/INSTALL.WIN.html) documents for installation
   9 instructions.
  10
  11 # About BioPerl
  12
  13 BioPerl is a package of public domain Perl tools for computational molecular
  14 biology.
  15
  16 Our website (http://bioperl.org/) provides an online resource of modules,
  17 scripts, and web links for developers of Perl-based software for life science
  18 research.
  19
  20 # Contact Information
  21
  22 * BioPerl mailing list: bioperl-l@bioperl.org
  23
  24 * Project website : http://bioperl.org/
  25
  26 * Bug reports : https://github.com/bioperl/bioperl-live/issues
  27
  28 Please submit bug reports, particularly about documentation that you find
  29 unclear or about installation problems. We are also very interested in functions that don't work the way you expect them to!
  30
  31 # The Directory Structure
  32
  33 The BioPerl directory structure is organized as follows:
  34
  35 * **`Bio/`** - BioPerl modules
  36
  37 * **`deobfuscator/`** - Code for tracing OOP relationships
  38
  39 * **`examples/`** - Scripts demonstrating the many uses of BioPerl
  40
  41 * **`ide/`** - Files for developing BioPerl using an IDE
  42
  43 * **`maintenance/`** - BioPerl housekeeping scripts
  44
  45 * **`models/`** - OO UML diagrams for BioPerl classes, generated with the DIA
  46   drawing program (these are quite out-of-date)
  47
  48 * **`scripts/`** - Useful production-quality scripts with POD documentation
  49
  50 * **`t/`** - Perl built-in tests, divided into subdirectories
  51   based on the specific classes being tested
  52
  53 * **`t/data/`** - Data files used for the tests; they also provide good example data
  54
  55 * **`travis_scripts/`** - Scripts to customize Travis
  56
  57 # Documentation
  58
  59 For documentation on BioPerl see the **HOWTO** documents online at http://bioperl.org/howtos.
  60
  61 Useful documentation in the form of example code can also be found in the
  62 **`examples/`** and **`scripts/`** directories. The current collection includes
  63 scripts that run BLAST, index flat files, parse PDB structure files, make
  64 primers, retrieve ESTs based on tissue, align protein to nucleotide sequence,
  65 run GENSCAN on multiple sequences, and much more! See `bioscripts.pod` for a
  66 complete listing.
  67
  68 Individual `*.pm` modules have their own embedded POD documentation as well. A
  69 complete set of hyperlinked POD, or module, documentation is available at
  70 http://www.bioperl.org/.
  71
  72 Remember that '`perldoc`' is your friend. You can use it to read any file
  73 containing POD formatted documentation without needing any type of translator
  74 (e.g. '`perldoc Bio::SeqIO`').
  75
  76 If you used the Build.PL installation, and depending on your platform, you may
  77 have documentation installed as man pages, which can be accessed in the usual
  78 way.
  79
  80 # Releases
  81
  82 BioPerl releases are always available from the website at http://www.bioperl.org/DIST or on CPAN. The latest code can be found at https://github.com/bioperl.
  83
  84 * BioPerl currently uses a semantic numbering scheme to indicate stable release
  85   series vs. development release series. A release number is a three-part
  86   number like `1.2.0`.
  87   * The *first digit indicates the major release*, the idea being that all the
  88     API calls in a major release are reasonably consistent.
  89   * The *second number is the release series*. This is probably the most
  90     important number, and represents added functionality that is
  91     backwards-compatible.
  92   * The *third number is the point or patch release* and represents mainly bug
  93     fixes or additional code that doesn't add significant functionality to the
  94     code base.
  95
  96 From the **1.0 release until the 1.6 release**, even numbers (e.g. `1.4`) indicated stable releases. Stable releases were well tested and recommended for most uses. Odd numbers (e.g. `1.3`) were development releases, which one would use only if interested in the latest features. The final number (e.g. in `1.2.1`) is the point or patch release. The higher the number, the more bug fixes have been incorporated. In theory you can upgrade from one point or patch release to the next with no changes to your own code (for production cases, obviously check things out carefully before you switch over).
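
The numbering convention described above can be illustrated with a short sketch. It uses only core Perl; the helper names `parse_release` and `release_kind` are hypothetical and not part of BioPerl, and the even/odd rule is applied only to the 1.0 through 1.6 series, as an assumption drawn from the paragraph above.

```perl
use strict;
use warnings;

# Split a release number such as "1.2.1" into major, series, and patch parts.
# A missing patch part (e.g. "1.4") is treated as 0.
sub parse_release {
    my ($release) = @_;
    my ($major, $series, $patch) = $release =~ /\A(\d+)\.(\d+)(?:\.(\d+))?\z/
        or die "Unrecognized release number: $release\n";
    return { major => $major, series => $series, patch => $patch // 0 };
}

# Classify a release under the historical 1.0-1.6 convention:
# even series were stable, odd series were development releases.
# Anything outside that range is reported as 'unknown' (an assumption
# of this sketch, not a statement about later releases).
sub release_kind {
    my ($release) = @_;
    my $r = parse_release($release);
    return 'unknown' unless $r->{major} == 1 && $r->{series} <= 6;
    return $r->{series} % 2 == 0 ? 'stable' : 'development';
}

for my $release (qw(1.4 1.3 1.2.1)) {
    printf "%-6s %s\n", $release, release_kind($release);
}
```

Here `1.4` and `1.2.1` are classified as stable and `1.3` as development, matching the examples given in the text.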
  97
  98 The upcoming **1.7 release** will be the last release series to utilize the alternating 'stable'/'developer' convention. Starting immediately after the final 1.6 branch, we will start splitting BioPerl into several smaller easier-to-manage distributions. These will have independent versions, all likely starting with v1.7.0. **We do not anticipate major API changes in the 1.7.x release series, merely that the code will be restructured in a way to make maintenance more feasible.** We anticipate retaining semantic versioning until the 2.x release.
  99
 100 # Caveats and Warnings
 101
 102 When you run the tests with `./Build test`, some tests may issue warning messages or even fail. Sometimes this is because we didn't have anyone to test the test system on your combination of operating system, version of Perl, and associated libraries and other modules. Because BioPerl depends on several
 103 outside libraries, we may not be able to test every single combination, so even if
 104 there are warnings you may find that the package is still perfectly useful.
 105
 106 If you install the bioperl-run system and run its tests without the relevant external
 107 program installed, you'll get messages like `program XXX not found, skipping
 108 tests`. That's okay; BioPerl is doing what it is supposed to do. If you want
 109 to run the program, you need to install it first.
 110
 111 Not all scripts in the `examples/` directory are correct and up-to-date. If you find an issue with a script, please submit a bug report to https://github.com/bioperl/bioperl-live/issues and consider helping out with their maintenance.
 112
 113 If you are unsure which modules are appropriate for solving a
 114 particular issue in bioinformatics, we urge you to look at the HOWTO documents first.
 115
 116 # A simple module summary
 117
 118 Here is a quick summary of many of the useful modules and how the toolkit is
 119 laid out:
 120
 121 All modules are in the **`Bio/`** namespace:
 122
 123 * **`Perl`** is for *new users*, and gives a functional interface to the main
 124   parts of the package.
 125
 126 * **`Seq`** is for *Sequences* (protein and DNA).
 127     * `Bio::PrimarySeq` is a plain sequence (sequence data + identifiers)
 128     * `Bio::Seq` is a fancier `PrimarySeq`, in that it has annotation (via
 129     `Bio::Annotation::Collection`) and sequence features (via `Bio::SeqFeatureI` objects, attached via
 130     `Bio::FeatureHolderI`).
 131     * `Bio::Seq::RichSeq` is all of the above, plus it has slots for extra information specific to GenBank/EMBL/SwissProt files.
 132     * `Bio::Seq::LargeSeq` is for sequences which are too big to
 133     fit into memory.
 134
 135 * **`SeqIO`** is for *reading and writing Sequences*. It is a front end module
 136   for separate driver modules supporting the different sequence formats
 137
 138 * **`SeqFeature`** represents *start/stop/strand-based localized annotations (features) of sequences*
 139     * **`Bio::SeqFeature::Generic`** is a basic catchall
 140     * **`Bio::SeqFeature::Similarity`** is a similarity sequence feature
 141     * **`Bio::SeqFeature::FeaturePair`** is a pairwise sequence feature,
 142     such as a query/hit pair
 143
 144 * **`SearchIO`** is for *reading and writing pairwise alignment reports*, like
 145   BLAST or FASTA
 146
 147 * **`Search`** is where the *alignment objects for `SearchIO` are defined*
 148     * **`Bio::Search::Result::GenericResult`** is the result object (a blast
 149     query is a `Result` object)
 150     * **`Bio::Search::Hit::GenericHit`** is the `Hit` object (a query will have
 151     0 to many hits in a database)
 152     * **`Bio::Search::HSP::GenericHSP`** is the High-scoring Segment Pair
 153     object defining the alignment(s) of the query and hit.
 154
 155 * **`SimpleAlign`** is for *multiple sequence alignments*
 156
 157 * **`AlignIO`** is for *reading and writing multiple sequence alignment
 158   formats*
 159
 160 * **`Assembly`** provides the start of an *infrastructure for assemblies* and
 161   **`Assembly::IO`** *IO converters* for them
 162
 163 * **`DB`** is the namespace for *all the database query classes*
 164     * **`Bio::DB::GenBank/GenPept`** are two modules which query NCBI Entrez for
 165       sequences
 166     * **`Bio::DB::SwissProt/EMBL`** query various EMBL and SwissProt
 167       repositories for sequences
 168     * **`Bio::DB::GFF`** is Lincoln Stein's fast, lightweight feature and
 169       sequence database which is the backend to his GBrowse system (see
 170       www.gmod.org)
 171     * **`Bio::DB::Flat`** is a fast implementation of the OBDA flat-file
 172       indexing system (cross-language and cross-platform, supported by O|B|F
 173       projects; see http://obda.open-bio.org).
 174     * **`Bio::DB::BioFetch/DBFetch`** for OBDA, Web (HTTP) access to remote
 175       databases.
 176     * **`Bio::DB::InMemoryCache/FileCache`** (fast local caching of sequences
 177       from remote dbs to speed up your access).
 178     * **`Bio::DB::Registry`** interface to the OBDA specification for remote
 179       data sources
 180     * **`Bio::DB::Biblio`** for access to remote bibliographic databases.
 181     * **`Bio::DB::EUtilities`** is the initial set of modules used for generic
 182       queries using NCBI's eUtils.
 183
 184 * **`Annotation`** collection of *annotation objects* (comments, DBlinks,
 185   References, and misc key/value pairs)
 186
 187 * **`Coordinate`** is a system for *mapping between different coordinate systems*
 188   such as DNA to protein or between assemblies
 189
 190 * **`Index`** is for *locally indexed flatfiles* with BerkeleyDB
 191
 192 * **`Tools`** contains many *miscellaneous parsers and functions* for different
 193   bioinformatics needs
 194     * Gene prediction parser (Genscan, MZEF, Grail, Genemark)
 195     * Annotation format (GFF)
 196     * Enumerate codon tables and valid sequences symbols (CodonTable,
 197     IUPAC)
 198     * Phylogenetic program parsing (PAML, Molphy, Phylip)
 199
 200 * **`Map`** provides *genetic and physical map representations*
 201
 202 * **`Structure`** - parse and represent *protein structure data*
 203
 204 * **`TreeIO`** is for reading and writing *Tree formats*
 205
 206 * **`Tree`** is the namespace for **all associated Tree classes**
 207     * **`Bio::Tree::Tree`** is the basic tree object
 208     * **`Bio::Tree::Node`** are the nodes which make up the tree
 209     * **`Bio::Tree::Statistics`** is for computing statistics for a tree
 210     * **`Bio::Tree::TreeFunctionsI`** is where specific tree functions are
 211       implemented (like `is_monophyletic` and `lca`)
 212
 213 * **`Bio::Biblio`** is where *bibliographic data and database access objects*
 214   are kept
 215
 216 * **`Variation`** represents *sequences with mutations and variations* applied, so one can compare and represent wild-type and mutant versions of a sequence.
 217
 218 * **`Root`**, basic objects for the *internals of BioPerl*
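
As a rough illustration of how the `Seq` and `SeqIO` entries in the summary above fit together, here is a minimal sketch that reads FASTA records through `Bio::SeqIO` and queries the resulting sequence objects. It assumes BioPerl is installed; the helper name `summarize_fasta` and the demo sequence are made up for this example, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Parse FASTA text held in memory and return one [id, length, protein]
# triple per record. The helper name is hypothetical.
sub summarize_fasta {
    my ($fasta_text) = @_;

    # An in-memory filehandle keeps the sketch free of external files.
    open my $fh, '<', \$fasta_text
        or die "Cannot open in-memory filehandle: $!";

    # SeqIO is the front end; '-format' selects the driver module.
    my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

    my @summary;
    while (my $seq = $in->next_seq) {
        # Each record comes back as a sequence object with accessor methods.
        push @summary, [ $seq->display_id, $seq->length, $seq->translate->seq ];
    }
    return @summary;
}

# Made-up demo record: an identifier, a description, and a short DNA sequence.
my $fasta = ">demo1 made-up example sequence\nATGGCCATTGTAATGGGCCGCTGA\n";
for my $row (summarize_fasta($fasta)) {
    print join("\t", @$row), "\n";
}
```

To read from a file instead, `-file => 'path/to/file'` can replace the `-fh` argument, and other formats can be selected by changing `-format`.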
 219
 220 # Upgrading from an older version
 221
 222 If you have a previously installed version of BioPerl on your system some of
 223 these notes may help you.
 224
 225 * Some modules have been removed because they have been superseded by new
 226   development efforts. They are documented in the **`DEPRECATED`** file that is
 227   included in the release.
 228
 229 * Some methods in the Application Programming Interface (API) have changed or
 230   been removed. You may find that scripts which worked with BioPerl 1.4 give you warnings or do not work at all (although we have tried very hard to
 231   minimize this!). Send an email to the mailing list and we'll be happy to give you
 232   pointers.
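
Before upgrading, it can help to confirm which version is currently installed. The following is a minimal sketch, assuming the installed distribution provides `Bio::Root::Version` as the carrier of the version number; the helper name `installed_bioperl_version` is hypothetical.

```perl
use strict;
use warnings;

# Return the installed BioPerl version, or undef if BioPerl cannot be loaded.
# The module is loaded inside eval so a missing installation is not fatal.
sub installed_bioperl_version {
    return eval {
        require Bio::Root::Version;
        Bio::Root::Version->VERSION;
    };
}

my $version = installed_bioperl_version();
print defined $version
    ? "BioPerl $version is installed\n"
    : "BioPerl does not appear to be installed\n";
```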