From 36a6e2c4859a554891edff61a25313f14d4c434b Mon Sep 17 00:00:00 2001
From: Chris Fields <cjfields@bioperl.org>
Date: Mon, 2 Sep 2013 22:04:30 -0500
Subject: [PATCH] create README.md file

---
 README.md | 246 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 246 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 000000000..2cebd7635
--- /dev/null
+++ b/README.md
@@ -0,0 +1,246 @@
+# Getting Started
+
+Please see the `INSTALL` or `INSTALL.WIN` documents for installation
+instructions.
+
+# About BioPerl
+
+BioPerl is a package of public domain Perl tools for computational molecular
+biology.
+
+Our website (http://bioperl.org/) provides an online resource of modules,
+scripts, and web links for developers of Perl-based software for life science
+research.
+
+# Contact info
+
+BioPerl mailing list: bioperl-l@bioperl.org
+
+There's quite a variety of tools available in BioPerl, and more are added all
+the time. If the tool you're looking for isn't described in the documentation,
+please write to us; it could be undocumented or still in progress.
+
+* Project website : http://bioperl.org/
+
+* Bug reports : https://redmine.open-bio.org/projects/bioperl/
+
+Please send us bug reports, particularly about documentation you find unclear
+or problems with installation. We are also very interested in functions that
+don't work the way you expect them to!
+
+# The directory structure
+
+The BioPerl directory structure is organized as follows:
+
+* **`Bio/`** - BioPerl modules
+
+* **`doc/`** - Documentation utilities
+
+* **`examples/`** - Scripts demonstrating the many uses of BioPerl
+
+* **`ide/`** - Files for developing BioPerl using an IDE
+
+* **`maintenance/`** - BioPerl housekeeping scripts
+
+* **`models/`** - Object-oriented UML diagrams of BioPerl classes, generated
+  with the Dia drawing program (these are quite out-of-date)
+
+* **`scripts/`** - Useful production-quality scripts with POD documentation
+
+* **`t/`** - Perl built-in tests, divided into subdirectories based on the
+  specific classes being tested
+
+* **`t/data/`** - Data files used for the tests; also a source of good example data
+
+# Documentation
+
+For documentation on BioPerl see the **HOWTO** documents and tutorials online at
+http://bioperl.org.
+
+Useful documentation in the form of example code can also be found in the
+**`examples/`** and **`scripts/`** directories. The current collection includes
+scripts that run BLAST, index flat files, parse PDB structure files, make
+primers, retrieve ESTs based on tissue, align protein to nucleotide sequence,
+run GENSCAN on multiple sequences, and much more! See `bioscripts.pod` for a
+complete listing.
+
+Individual `*.pm` modules have their own embedded POD documentation as well. A
+complete set of hyperlinked POD, or module, documentation is available at
+http://www.bioperl.org/.
+
+Remember that '`perldoc`' is your friend. You can use it to read any file
+containing POD formatted documentation without needing any type of translator
+(e.g. '`perldoc Bio::SeqIO`').
+
+If you used the Build.PL installation, and depending on your platform, you may
+have documentation installed as man pages, which can be accessed in the usual
+way.
+
+# Releases
+
+BioPerl releases are always available from the website at
+http://www.bioperl.org/DIST or on CPAN. The latest code can be found at
+https://github.com/bioperl.
+
+BioPerl formerly used a numbering scheme to indicate stable release series vs.
+development release series. A release number has three parts, like 1.2.0. The
+first number indicates the major release, the idea being that all the API
+calls in a major release are reasonably consistent. The second number is the
+release series. This is probably the most important number.
+
+From the 1.0 release until the 1.6 release, even numbers (1.0, 1.2, etc.)
+indicated stable releases. Stable releases were well tested and recommended for
+most uses. Odd numbers (1.1, 1.3, etc.) were development releases, which one
+would use only if one were interested in the latest and greatest features. The
+final number (e.g. 1.2.0, 1.2.1) is the bug fix release. The higher the number,
+the more bug fixes have been incorporated. In theory you can upgrade from one
+bug fix release to the next with no changes to your own code (for production
+cases, obviously check things out carefully before you switch over).
+
+The 1.7 release will be the last release series to use the alternating
+'stable'/'developer' convention. Starting immediately after the final 1.6
+branch, we will start splitting BioPerl into several smaller, easier-to-manage
+distributions, including a developer distribution for cutting-edge (in
+development) code, untested modules, and alternative implementations.
+
+# Caveats and warnings
+
+When you run the tests ("`./Build test`"), some tests may issue warning messages
+or even fail. Sometimes this is because no one was available to run the test
+suite on your particular combination of operating system, version of Perl, and
+associated libraries and other modules. Because BioPerl depends on several
+outside libraries, we may not be able to test every single combination, so even
+if there are warnings you may find that the package is still perfectly useful.
+
+If you install the bioperl-run system and run its tests without the relevant
+external program installed, you'll get messages like '`program XXX not found,
+skipping tests`'. That's okay; BioPerl is doing what it is supposed to do. If
+you want to run the program, you need to install it first.
+
+Not all scripts in the `examples/` directory are correct and up-to-date. We need
+volunteers to help maintain them, so if you find one that is broken, submit a
+bug report to https://redmine.open-bio.org/projects/bioperl/ and consider
+helping out with their maintenance.
+
+If you are unsure which modules are appropriate for solving a particular
+problem in bioinformatics, we urge you to look at the HOWTO documents first.
+
+# A simple module summary
+
+Here is a quick summary of many of the useful modules and how the toolkit is
+laid out:
+
+All modules are in the **`Bio/`** namespace:
+
+* **`Perl`** is for **new users**, and gives a functional interface to the main
+  parts of the package.
+
+* **`Seq`** is for **Sequences** (protein and DNA).
+    * `Bio::PrimarySeq` is a plain sequence (sequence data + identifiers)
+    * `Bio::Seq` is a fancier `PrimarySeq`, in that it has annotation (via
+    `Bio::Annotation::Collection`) and sequence features (via `Bio::SeqFeatureI` objects, attached via
+    `Bio::FeatureHolderI`).
+    * `Bio::Seq::RichSeq` is all of the above, plus it has slots for extra information specific to GenBank/EMBL/SwissProt files.
+    * `Bio::Seq::LargeSeq` is for sequences which are too big to fit into
+    memory.
+
+* **`SeqIO`** is for **reading and writing Sequences**. It is a front end module
+  for separate driver modules supporting the different sequence formats
+
+* **`SeqFeature`** represents **start/stop/strand-based localized annotations (features) of sequences**
+    * **`Bio::SeqFeature::Generic`** is a basic catch-all
+    * **`Bio::SeqFeature::Similarity`** is a similarity sequence feature
+    * **`Bio::SeqFeature::FeaturePair`** is a pairwise sequence feature,
+    such as a query/hit pair
+
+* **`SearchIO`** is for **reading and writing pairwise alignment reports**, like
+  BLAST or FASTA
+
+* **`Search`** is where the **alignment objects for `SearchIO` are defined**
+    * **`Bio::Search::Result::GenericResult`** is the result object (a blast
+    query is a `Result` object)
+    * **`Bio::Search::Hit::GenericHit`** is the `Hit` object (a query will have
+    0 to many hits in a database)
+    * **`Bio::Search::HSP::GenericHSP`** is the High-scoring Segment Pair
+    object defining the alignment(s) of the query and hit.
+
+* **`SimpleAlign`** is for **multiple sequence alignments**
+
+* **`AlignIO`** is for **reading and writing multiple sequence alignment
+  formats**
+
+* **`Assembly`** provides the start of an **infrastructure for assemblies** and
+  **`Assembly::IO`** IO converters for them
+
+* **`DB`** is the namespace for **all the database query classes**
+    * **`Bio::DB::GenBank/GenPept`** are two modules which query NCBI Entrez for
+      sequences
+    * **`Bio::DB::SwissProt/EMBL`** query various EMBL and SwissProt
+      repositories for sequences
+    * **`Bio::DB::GFF`** is Lincoln Stein's fast, lightweight feature and
+      sequence database which is the backend to his GBrowse system (see
+      www.gmod.org)
+    * **`Bio::DB::Flat`** is a fast implementation of the OBDA flat-file
+      indexing system (cross-language and cross-platform, supported by O|B|F
+      projects; see http://obda.open-bio.org).
+    * **`Bio::DB::BioFetch/DBFetch`** for OBDA, Web (HTTP) access to remote
+      databases.
+    * **`Bio::DB::InMemoryCache/FileCache`** (fast local caching of sequences
+      from remote dbs to speed up your access).
+    * **`Bio::DB::Registry`** interface to the OBDA specification for remote
+      data sources
+    * **`Bio::DB::Biblio`** for access to remote bibliographic databases.
+    * **`Bio::DB::EUtilities`** is the initial set of modules used for generic
+      queries using NCBI's eUtils.
+
+* **`Annotation`** is a collection of **annotation objects** (comments, DBlinks,
+  References, and misc key/value pairs)
+
+* **`Coordinate`** is a system for **mapping between different coordinate systems**
+  such as DNA to protein or between assemblies
+
+* **`Index`** is for **locally indexed flatfiles** with BerkeleyDB
+
+* **`Tools`** contains many **miscellaneous parsers and functions** for different
+  bioinformatics needs
+    * Gene prediction parser (Genscan, MZEF, Grail, Genemark)
+    * Annotation format (GFF)
+    * Enumerate codon tables and valid sequence symbols (CodonTable,
+    IUPAC)
+    * Phylogenetic program parsing (PAML, Molphy, Phylip)
+
+* **`Map`** represents **genetic and physical maps**
+
+* **`Structure`** - parse and represent **protein structure data**
+
+* **`TreeIO`** is for reading and writing **Tree formats**
+
+* **`Tree`** is the namespace for **all associated Tree classes**
+    * **`Bio::Tree::Tree`** is the basic tree object
+    * **`Bio::Tree::Node`** represents the nodes which make up the tree
+    * **`Bio::Tree::Statistics`** is for computing statistics for a tree
+    * **`Bio::Tree::TreeFunctionsI`** is where specific tree functions are
+      implemented (like `is_monophyletic` and `lca`)
+
+* **`Bio::Biblio`** is where *bibliographic data and database access objects*
+  are kept
+
+* **`Variation`** represents sequences with mutations and variations applied so
+  one can compare and represent wild-type and mutation versions of a sequence.
+
+* **`Root`** contains basic objects for the internals of BioPerl
+
+# Upgrading from an older version
+
+If you have a previously installed version of BioPerl on your system some of
+these notes may help you.
+
+* Some modules have been removed because they have been superseded by new
+  development efforts. They are documented in the `DEPRECATED` file that is
+  included in the release.
+
+* Some methods in the Application Programming Interface (API) have changed or
+  been removed. You may find that scripts which worked with BioPerl 1.4 give
+  you warnings or do not work at all (although we have tried very hard to
+  minimize this!). Send an email to the list and we'll be happy to give you
+  pointers.
-- 
2.11.4.GIT