## $Id: bioperl.pod,v 1.4 1999-04-12 18:10:00 birney Exp $

## Should contain general info about the distribution including
## links to all the various modules.
##
## 'cookbook' type examples are probably better off being placed
## in the local embedded module PODs. This will make it easier for
## authors to update and maintain.

=head1 NAME

Bioperl - Coordinated OOP-Perl Modules for Biology

=head1 SYNOPSIS

A single synopsis is not very meaningful here, because there are many
different objects to use. Read on...

=head1 DESCRIPTION

Bioperl contains a number of Perl objects which are useful in biology.
Examples include Sequence objects, Alignment objects and database
searching objects. These objects not only do what they are advertised
to do in the documentation, but they also interact: Alignment
objects are made from Sequence objects, and so on. This means that
the objects provide a coordinated framework for computational
biology.

Bioperl development is focused on the objects themselves, and less on the
scripts (programs) that put these objects together. Some example
scripts are provided in the distribution, but they are not its focus.
Of course, as the objects do most of the hard work for you, all you
have to do is combine a number of objects sensibly.

The intent of the bioperl development effort is to make reusable tools
that help people create their own site- or job-specific applications.

The bio.perl.org (http://bio.perl.org) website also attempts to maintain
links and archives of standalone bio-related perl tools that are not
affiliated with or related to the core bioperl effort. Check the site for
useful code ideas and contribute your own if possible.


=head1 INSTALLATION

The Bioperl modules are distributed as a tar file that expands into a
standard perl CPAN distribution. Detailed installation directions
can be found in the README file of the distribution.

The Bioperl modules can now interact with local flat file databases.
To learn how to set this up, see the bioback.pod documentation
(once it has been installed, perldoc bioback will work; alternatively,
run perldoc bioback.pod directly).

=head1 GETTING STARTED

The I<scripts/> directory contains fully working, industrial-strength
scripts for use with bioperl. They are documented (perldoc <scriptname>
will work). This area was only started in the 0.05 distribution, so
not many scripts have been written yet (you are more than welcome to
contribute!).

The example scripts in the I<examples/> directory of the distribution
and its subdirectories give you an idea of how to use some of the
modules and driver code.

If you have installed bioperl in the standard way, as described in the
README of the distribution, these examples should work by just running
them. If you have not installed it in the standard way, you will
have to change the 'use lib' line to point to your installation.
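
For an installation outside the standard perl library path, the change
might look like the following minimal sketch. The directory
I</home/me/bioperl> is a hypothetical placeholder for wherever your copy
of the modules lives, and the require check only reports whether perl
can now find Bio::Seq.

```perl

    use strict;
    use warnings;

    # Hypothetical location of a non-standard bioperl installation;
    # change it to the directory that contains the Bio/ subdirectory.
    use lib '/home/me/bioperl';

    # Report whether Bio::Seq can now be found through @INC.
    my $loaded = eval { require Bio::Seq; 1 };
    print $loaded ? "Bio::Seq found\n" : "Bio::Seq not found in \@INC\n";

```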


I<examples/rev_and_trans.pl> - examples using Bio::Seq.pm for reversing and translating sequences.

I<examples/restriction.pl> - example code for using the Bio::Tools::RestrictionEnzyme.pm module.

I<examples/simplealign.pl> - example code for using the Bio::SimpleAlign module.

I<examples/psw.pl> - example code for using the XS extensions for a protein Smith-Waterman comparison.

I<examples/blast/> - example code for using the Bio::Tools::Blast.pm module.

I<examples/seq/> - example code for working with multiple sequence files.

I<examples/root_object/> - example code for using Bio::Root::Object.pm.
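
The reversing and translating that I<examples/rev_and_trans.pl>
demonstrates can be illustrated without any bioperl modules. The
following plain-Perl sketch is not the Bio::Seq implementation: the
helper names revcom and translate are hypothetical, the standard genetic
code is assumed, codons containing ambiguity symbols translate to X, and
an incomplete trailing codon is ignored.

```perl

    use strict;
    use warnings;

    # Standard genetic code: amino acids for all 64 codons in TCAG order
    # (first base varies slowest, third fastest); '*' marks stop codons.
    my @bases = qw(T C A G);
    my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my %codon_table;
    my $i = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $codon_table{"$b1$b2$b3"} = substr($aa, $i++, 1);
            }
        }
    }

    # Reverse complement, including the IUPAC ambiguity symbols.
    sub revcom {
        my ($dna) = @_;
        my $rc = reverse $dna;
        $rc =~ tr/ACGTUMRWSYKVHDBNacgtumrwsykvhdbn/TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn/;
        return $rc;
    }

    # Translate in reading frame 1; codons not in the table become 'X'.
    sub translate {
        my ($dna) = @_;
        my $seq = uc $dna;
        $seq =~ tr/U/T/;                  # accept RNA as well as DNA
        my $protein = '';
        for (my $pos = 0; $pos + 3 <= length($seq); $pos += 3) {
            my $codon = substr($seq, $pos, 3);
            $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
        }
        return $protein;
    }

    my $dna = 'ATGGCCATTGTAATGGGCCGCTGA';
    print revcom($dna), "\n";       # TCAGCGGCCCATTACAATGGCCAT
    print translate($dna), "\n";    # MAIVMGR*

```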


=head1 OBJECTS


=head2 Sequence objects

I<Bio::Seq> Sequence object

This module is the generic sequence object which lies at
the core of the bioperl project. It stores DNA, RNA, or
amino acid sequence information and brief annotation. It has
associated methods to perform various manipulations of
sequences and support for reading and writing sequence
data in a variety of file formats.

Seq.pm has its own detailed documentation.
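
To give a feel for what a sequence object bundles together, here is a
deliberately tiny sketch. My::SimpleSeq and its methods (new, id, seq,
alphabet, to_fasta) are hypothetical and are not the Bio::Seq interface,
which is described in the Seq.pm documentation. The class stores an
identifier, a description and the sequence, makes a crude guess at the
alphabet, and writes FASTA format.

```perl

    use strict;
    use warnings;

    package My::SimpleSeq;    # hypothetical class, not Bio::Seq

    sub new {
        my ($class, %args) = @_;
        die "sequence required\n" unless defined $args{seq};
        my $self = {
            id   => $args{id}   || 'unknown',
            desc => $args{desc} || '',
            seq  => uc $args{seq},
        };
        return bless $self, $class;
    }

    sub id  { $_[0]{id} }
    sub seq { $_[0]{seq} }

    # Crude guess: only nucleotide symbols (plus N, '?' and '-') means
    # DNA, or RNA if a U is present; anything else is called protein.
    sub alphabet {
        my ($self) = @_;
        my $s = $self->{seq};
        return 'protein' if $s =~ /[^ACGTUN?\-]/;
        return $s =~ /U/ ? 'rna' : 'dna';
    }

    # Write the sequence in FASTA format, wrapped at 60 columns.
    sub to_fasta {
        my ($self) = @_;
        my $header = ">$self->{id}";
        $header .= " $self->{desc}" if length $self->{desc};
        my @lines = $self->{seq} =~ /(.{1,60})/g;
        return join("\n", $header, @lines) . "\n";
    }

    package main;

    my $seq = My::SimpleSeq->new(id => 'test1', desc => 'example', seq => 'atggccattgta');
    print $seq->alphabet, "\n";     # dna
    print $seq->to_fasta;           # >test1 example / ATGGCCATTGTA

```

The alphabet guess is deliberately naive: a short peptide written only
with nucleotide letters, such as ACGT, would be reported as DNA.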

=head2 Blast objects

I<Bio::Tools::Blast>

The Bio::Tools::Blast.pm module encapsulates data and
methods for running BLAST and for parsing and analyzing
BLAST reports, including pre-existing ones.

Blast.pm and its associated helper modules each have
their own detailed documentation.

FEATURES:

       o Supports NCBI Blast1.x, Blast2.x, and WashU-Blast2.x,
         gapped and ungapped. Can parse HTML-formatted as well
         as non-HTML-formatted reports.

       o Launch new Blast analyses remotely or locally.
         Blast objects can be constructed directly from the
         results of the run. (Support for local Blasts
         is not yet complete.)

       o Construct Blast objects from pre-existing files or from
         a new run. Build a Blast object from a single file or build
         multiple Blast objects from an input stream containing
         multiple reports.

       o Add hypertext links to a BLAST report.

       o Generate sequence and sequence alignment objects from
         HSP sequences.
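
The last feature rests on the fact that an HSP in a pairwise BLAST
report carries the query and subject as two aligned strings of equal
length, with gaps written as '-'. The hypothetical helper hsp_stats
below (not part of Bio::Tools::Blast) sketches how identity and gap
counts, and the ungapped sequences needed to build new sequence objects,
can be read off such a pair.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: summarise one HSP from its aligned query and
    # subject strings, where '-' marks a gap.
    sub hsp_stats {
        my ($query, $sbjct) = @_;
        die "empty HSP\n" unless length($query);
        die "aligned strings differ in length\n"
            unless length($query) == length($sbjct);
        my ($ident, $gaps) = (0, 0);
        for my $i (0 .. length($query) - 1) {
            my $q = substr($query, $i, 1);
            my $s = substr($sbjct, $i, 1);
            if ($q eq '-' or $s eq '-')  { $gaps++ }
            elsif (uc($q) eq uc($s))     { $ident++ }
        }
        (my $q_raw = $query) =~ tr/-//d;    # ungapped query sequence
        (my $s_raw = $sbjct) =~ tr/-//d;    # ungapped subject sequence
        return {
            length     => length($query),
            identities => $ident,
            gaps       => $gaps,
            pct_ident  => 100 * $ident / length($query),
            query_seq  => $q_raw,
            sbjct_seq  => $s_raw,
        };
    }

    my $hsp = hsp_stats('MKV-LAT', 'MKVQLGT');
    printf "%d/%d identical (%.1f%%), %d gap(s)\n",
        @{$hsp}{qw(identities length pct_ident gaps)};
    # 5/7 identical (71.4%), 1 gap(s)

```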


=head2 Restriction Enzyme objects

I<Bio::Tools::RestrictionEnzyme>

The Bio::Tools::RestrictionEnzyme.pm module encapsulates
generic data and methods for using restriction
endonucleases for in silico restriction analysis of DNA
sequences.

RestrictionEnzyme.pm has its own detailed documentation.
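
The core of in silico restriction analysis is locating recognition sites
and computing the resulting fragments. The sketch below is a
hypothetical plain-Perl illustration, not the RestrictionEnzyme.pm
interface: it handles one enzyme given as a recognition sequence plus a
cut offset on the top strand (EcoRI, G^AATTC, is the example), treats
the DNA as linear, and ignores ambiguity codes and the bottom-strand cut.

```perl

    use strict;
    use warnings;

    # Top-strand cut positions (0-based; the cut falls just before that
    # base) for a recognition site and cut offset, e.g. EcoRI: GAATTC, 1.
    sub cut_positions {
        my ($dna, $site, $offset) = @_;
        $dna  = uc $dna;
        $site = uc $site;
        my @cuts;
        my $pos = index($dna, $site);
        while ($pos != -1) {
            push @cuts, $pos + $offset;
            $pos = index($dna, $site, $pos + 1);   # also finds overlapping sites
        }
        return @cuts;
    }

    # Fragment lengths produced by cutting a linear molecule of length $len.
    sub fragment_lengths {
        my ($len, @cuts) = @_;
        my @bounds = (0, @cuts, $len);
        return map { $bounds[$_ + 1] - $bounds[$_] } 0 .. $#bounds - 1;
    }

    my $dna  = 'AAGAATTCTTTTGAATTCAA';
    my @cuts = cut_positions($dna, 'GAATTC', 1);
    print "cuts at: @cuts\n";                                             # 3 13
    print "fragments: ", join(',', fragment_lengths(length($dna), @cuts)), "\n";   # 3,10,7

```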

=head2 Protein Smith-Waterman alignment objects

I<Bio::Tools::pSW>

This module allows the production of Smith-Waterman alignments.
I<Warning:> it requires a compiled C extension (bp_sw), which is
provided in the distribution. The Bio::Tools::pSW object is
an object factory which builds Bio::SimpleAlign objects from
two protein sequence objects (Bio::Seq).

DNA alignments will be added soon.
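
Bio::Tools::pSW does its work in compiled C. To show what the algorithm
computes, here is a small plain-Perl Smith-Waterman sketch. It is not
the bp_sw code: it uses a simple hypothetical scoring scheme (match +2,
mismatch -1, linear gap penalty -2) instead of a substitution matrix
and affine gaps, and it returns only the best local score and one
optimal alignment.

```perl

    use strict;
    use warnings;

    # Local alignment by Smith-Waterman with a linear gap penalty.
    # Scores are hypothetical: +2 match, -1 mismatch, -2 per gap position.
    sub smith_waterman {
        my ($s1, $s2) = @_;
        my ($match, $mismatch, $gap) = (2, -1, -2);
        my ($n, $m) = (length($s1), length($s2));
        my @H = map { [ (0) x ($m + 1) ] } 0 .. $n;    # score matrix
        my ($best, $bi, $bj) = (0, 0, 0);

        for my $i (1 .. $n) {
            for my $j (1 .. $m) {
                my $diag = $H[$i-1][$j-1]
                    + (substr($s1, $i-1, 1) eq substr($s2, $j-1, 1) ? $match : $mismatch);
                my $up   = $H[$i-1][$j] + $gap;
                my $left = $H[$i][$j-1] + $gap;
                my $h = 0;                              # local: never below zero
                for ($diag, $up, $left) { $h = $_ if $_ > $h }
                $H[$i][$j] = $h;
                ($best, $bi, $bj) = ($h, $i, $j) if $h > $best;
            }
        }

        # Trace back from the best cell until a zero score is reached.
        my ($a1, $a2) = ('', '');
        my ($i, $j) = ($bi, $bj);
        while ($i > 0 && $j > 0 && $H[$i][$j] > 0) {
            my $c1 = substr($s1, $i-1, 1);
            my $c2 = substr($s2, $j-1, 1);
            if ($H[$i][$j] == $H[$i-1][$j-1] + ($c1 eq $c2 ? $match : $mismatch)) {
                ($a1, $a2) = ($c1 . $a1, $c2 . $a2);
                ($i, $j) = ($i - 1, $j - 1);
            }
            elsif ($H[$i][$j] == $H[$i-1][$j] + $gap) {
                ($a1, $a2) = ($c1 . $a1, '-' . $a2);
                $i--;
            }
            else {
                ($a1, $a2) = ('-' . $a1, $c2 . $a2);
                $j--;
            }
        }
        return ($best, $a1, $a2);
    }

    my ($score, $x, $y) = smith_waterman('HEAGAWGHEE', 'PAWHEAE');
    print "score $score\n$x\n$y\n";

```

A linear gap penalty keeps the recurrence to a single score matrix;
affine gap penalties need additional matrices.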

=head2 SimpleAlign objects

I<Bio::SimpleAlign>

Bio::SimpleAlign encapsulates multiple alignments as simple blocks of
immutable sequences. This module principally provides I/O of multiple
alignments and some easy ways to iterate over an alignment.

It is not capable of complex join or editing functions, which are
better provided by Georg Fuellen's UnivAln module. Nor does it
I<make> alignments; that must be done by external programs or, for
pairwise alignments, the Bio::Tools::pSW module.
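
Iterating over an alignment column by column is the kind of operation
meant here. As a hypothetical sketch that does not use the
Bio::SimpleAlign API, an alignment can be held as equal-length gapped
strings keyed by sequence name; the code walks the columns and builds a
simple majority-rule consensus (gaps are not counted, ties go to the
alphabetically first residue, and all-gap columns yield '-').

```perl

    use strict;
    use warnings;

    # Hypothetical alignment: name => gapped sequence, all the same length.
    my %aln = (
        seq1 => 'ACGT-A',
        seq2 => 'ACGTTA',
        seq3 => 'AGGT-C',
    );

    # One string per alignment column, reading sequences in sorted-name order.
    sub columns {
        my ($aln) = @_;
        my @names = sort keys %$aln;
        my $len   = length($aln->{ $names[0] });
        my @cols;
        for my $col (0 .. $len - 1) {
            push @cols, join '', map { substr($aln->{$_}, $col, 1) } @names;
        }
        return @cols;
    }

    # Majority-rule consensus, ignoring gaps when counting residues.
    sub consensus {
        my ($aln) = @_;
        my $cons = '';
        for my $column (columns($aln)) {
            my %count;
            $count{$_}++ for grep { $_ ne '-' } split //, $column;
            my ($top) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
            $cons .= defined $top ? $top : '-';
        }
        return $cons;
    }

    print join(' ', columns(\%aln)), "\n";   # AAA CCG GGG TTT -T- AAC
    print consensus(\%aln), "\n";            # ACGTTA

```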

=head2 Other Modules

(** to be completed **)

=head2 Using the Bio::Root modules with your own perl5 classes

The I<examples/root_object/> directory in the bioperl distribution
contains code and scripts that show how the Bio::Root modules can be used
as a foundation for robust and fault-tolerant perl5 classes.
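
The idea of a common root class is that every object inherits the same
constructor conventions and error handling. The sketch below is
hypothetical: My::Root, My::Enzyme, throw, _initialize and the -site
argument style are illustrative assumptions, not the Bio::Root API
documented with the distribution. The base class provides a throw
method that dies with a message naming the class, and a subclass uses
it to reject bad arguments, so callers can trap errors with eval.

```perl

    use strict;
    use warnings;

    package My::Root;                  # hypothetical base class

    # Shared constructor: bless, then let the subclass initialise itself.
    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, $class;
        $self->_initialize(%args);
        return $self;
    }

    sub _initialize { }                # default: nothing to set up

    # Die with a message that names the class raising the error.
    sub throw {
        my ($self, $msg) = @_;
        my $class = ref($self) || $self;
        die "[$class] $msg\n";
    }

    package My::Enzyme;                # hypothetical subclass
    our @ISA = ('My::Root');

    sub _initialize {
        my ($self, %args) = @_;
        my $site = $args{-site};
        $self->throw('recognition site must be DNA, got '
                     . (defined $site ? "'$site'" : 'nothing'))
            unless defined $site && $site =~ /^[ACGT]+$/i;
        $self->{site} = uc $site;
    }

    sub site { $_[0]{site} }

    package main;

    my $enzyme = My::Enzyme->new(-site => 'gaattc');
    print $enzyme->site, "\n";                    # GAATTC

    my $bad = eval { My::Enzyme->new(-site => 'XYZ') };
    print "caught: $@" unless $bad;
    # caught: [My::Enzyme] recognition site must be DNA, got 'XYZ'

```

Trapping the error with eval, as in the last lines, is what lets a
calling script recover instead of dying.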

=head1 GETTING INVOLVED

Bioperl is a completely open community of developers. We are not
funded and we don't have a mission statement. We encourage
collaborative code, in particular in perl. You can help us in many
different ways, from a simple statement about how you have used
bioperl to do something interesting to contributing a whole new object
hierarchy. See http://bio.perl.org for more information. Here are
some ways of helping us:

=head2 Telling us you used it

We are very interested to hear about your experience using bioperl. Did
it install cleanly? Did you understand the documentation? Could you
get the objects to do what you wanted them to do? If bioperl was useless,
we want to know why, and if it was great, we want to know that too. Post a
message to vsns-bcd-perl-guts@lists.uni-bielefeld.de (the bioperl 'guts'
mailing list, where all the developers are).

Only by getting people's feedback do we know whether we are providing
anything useful.

=head2 Writing a script that uses it

By writing a good script that uses bioperl, you both show that bioperl
is useful and probably save someone elsewhere from writing it. If you
contribute it to the 'script central' at http://bio.perl.org, other
people can view and use it.

=head2 Find bugs!

We know that there are bugs in there. If you find something which you are
pretty sure is a problem, post a note to bioperl-bugs@bio.perl.org and
we will get on it as soon as possible. (You can also access the bug
system through the web pages.)

=head2 Suggest new functionality

You can suggest areas where the objects are not ideally written and
could be done better. The best way is to find the main developer
of the module (each module was written principally by one person,
except for Seq.pm), talk to him or her, and suggest changes.

=head2 Make your own objects

If you can make a useful object, we will happily include it in the
core. You will probably want to read a lot of the documentation
in the Bio::Root section and also talk to people on the 'guts'
mailing list, vsns-bcd-perl-guts@lists.uni-bielefeld.de.


=head1 CONVENTIONS

=head2 Alphabets

Bioperl modules use the standard extended single-letter genetic
alphabets to represent nucleotide and amino acid sequences.

In addition to the standard alphabet, the following symbols
are also acceptable in a biosequence:

 ?  (a missing nucleotide or amino acid)
 -  (gap in sequence)

=head2 Extended DNA/RNA alphabet

 (includes symbols for nucleotide ambiguity)
 ------------------------------------------
 Symbol       Meaning      Nucleic Acid
 ------------------------------------------
  A            A           Adenine
  C            C           Cytosine
  G            G           Guanine
  T            T           Thymine
  U            U           Uracil
  M          A or C
  R          A or G
  W          A or T
  S          C or G
  Y          C or T
  K          G or T
  V        A or C or G
  H        A or C or T
  D        A or G or T
  B        C or G or T
  X      G or A or T or C
  N      G or A or T or C


 IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE:
   Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.
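
Because each ambiguity symbol stands for a set of bases, an IUPAC-coded
motif can be searched for by turning every symbol into a
regular-expression character class. A minimal sketch, assuming plain DNA
targets; the function name iupac_to_regex is hypothetical, X is treated
like N as in the table above, and only non-overlapping matches are
reported.

```perl

    use strict;
    use warnings;

    # Regex character class for each symbol of the extended DNA/RNA table.
    my %IUPAC = (
        A => 'A',     C => 'C',     G => 'G',     T => 'T',     U => 'U',
        M => '[AC]',  R => '[AG]',  W => '[AT]',  S => '[CG]',
        Y => '[CT]',  K => '[GT]',  V => '[ACG]', H => '[ACT]',
        D => '[AGT]', B => '[CGT]', X => '[ACGT]', N => '[ACGT]',
    );

    # Build a case-insensitive regex from an IUPAC-coded motif.
    sub iupac_to_regex {
        my ($motif) = @_;
        my $re = '';
        for my $sym (split //, uc $motif) {
            die "unknown IUPAC symbol '$sym'\n" unless exists $IUPAC{$sym};
            $re .= $IUPAC{$sym};
        }
        return qr/$re/i;
    }

    # Report every non-overlapping match of the motif in a sequence.
    my $re  = iupac_to_regex('GRCGYC');
    my $dna = 'TTGACGTCAAGGCGCCTT';
    while ($dna =~ /($re)/g) {
        printf "%s at %d\n", $1, $-[1];
    }
    # GACGTC at 2
    # GGCGCC at 10

```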


=head2 Amino Acid alphabet

 ------------------------------------------
 Symbol           Meaning
 ------------------------------------------
 A        Alanine
 B        Aspartic Acid, Asparagine
 C        Cysteine
 D        Aspartic Acid
 E        Glutamic Acid
 F        Phenylalanine
 G        Glycine
 H        Histidine
 I        Isoleucine
 K        Lysine
 L        Leucine
 M        Methionine
 N        Asparagine
 P        Proline
 Q        Glutamine
 R        Arginine
 S        Serine
 T        Threonine
 V        Valine
 W        Tryptophan
 X        Unknown
 Y        Tyrosine
 Z        Glutamic Acid, Glutamine
 *        Terminator


 IUPAC-IUB AMINO ACID SYMBOLS:
   Biochem J. 1984 Apr 15; 219(2): 345-373
   Eur J Biochem. 1993 Apr 1; 213(1): 2
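
A sequence can be checked against these conventions with one character
class per alphabet. A minimal sketch, in which is_valid_seq and the
alphabet names 'dna' (covering DNA and RNA) and 'protein' are
hypothetical; case is ignored, and the extra '?' and '-' symbols from
the Alphabets section are accepted.

```perl

    use strict;
    use warnings;

    # Allowed symbols: the extended tables above plus '?' (missing) and '-' (gap).
    my %ALLOWED = (
        dna     => qr/^[ACGTUMRWSYKVHDBXN?\-]*$/i,    # DNA or RNA
        protein => qr/^[ABCDEFGHIKLMNPQRSTVWXYZ*?\-]*$/i,
    );

    sub is_valid_seq {
        my ($seq, $alphabet) = @_;
        my $re = $ALLOWED{$alphabet}
            or die "unknown alphabet '$alphabet'\n";
        return $seq =~ $re ? 1 : 0;
    }

    for my $s ('ACGTNNRY-?', 'ACGTJ') {
        printf "%-12s dna:%d\n", $s, is_valid_seq($s, 'dna');
    }

```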



=head1 TODO

There are many aspects of bioperl that are being worked on or
should be worked on. The list below is not exhaustive, and it is very
likely that by the time you read this document some of these things
have already been done, so check http://bio.perl.org for more details.
Some modules have their own TODO section, which contains
module-specific action items.

=over

=item Documentation clean-up

'Meta' documentation is still spread around the different objects
rather than collected here.

=item Bio::SeqFeature and Bio::Entry

The Bio::Entry is the 'heavyweight' sequence object, which is more
like a representation of a GenBank or EMBL flat file complete with
feature table. Ian Korf is working on these objects.

=item Bio::Structure

A number of people have been talking about a Structure object
(probably an object hierarchy). See http://bio.perl.org/Projects/Structure
for the current state of affairs in this area and to learn how
to get involved.


=item Bio::SearchDist

A perl wrapper for Sean Eddy's Extreme Value and Gaussian fitting
code (already provided in the bp_sw XS extension). Ewan Birney
is working on this.

=item Bio::Tools::dAlign

DNA alignments like those of pSW, provided through a perl wrapper.
Ewan Birney is working on this.


=item Perl version support

All modules included in the initial Bioperl distribution support perl 5.003 and
higher. At some point we will begin adding features and modules that require
later versions of perl. Individual modules should perhaps explicitly impose
their own perl version requirements. Consider this issue open for discussion
on the guts (developer) mailing list.

=back
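
For the perl version question raised in the last item, one option a
module could adopt is a version declaration at the top of its file. A
minimal sketch with a hypothetical module name; the 5.005 threshold is
only an example, not a decision recorded here.

```perl

    package My::NeedsNewerPerl;      # hypothetical module name

    # Version check at compile time: an older perl stops here with a
    # "Perl ... required" error instead of failing obscurely later on.
    # 5.005 is only an example threshold.
    use 5.005;
    use strict;

    sub perl_version { return $] }   # $] holds the running perl's version

    1;

```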

=head1 ACKNOWLEDGEMENTS

Bioperl owes its early organizational support to its association with
the award-winning VSNS-BCD BioComputing Courses; some students of the
1996 course (Chris Dagdigian, Richard Resnick, Lew Gramer, Alessandro
Guffanti, and others) have contributed code and commentary. Georg
Fuellen, the VSNS-BCD chief organizer, is one of the driving forces
behind bioperl. Steve Brenner, who was an early adopter of Perl for
bioinformatics, provided some of the early work on bioperl.

Bioperl was then taken up by people developing code at the large
genome centres: in particular Steve Chervitz (the current bioperl
coordinator) at Stanford, Ian Korf at the Genome Sequencing Centre
(St Louis), and Ewan Birney at the Sanger Centre (Cambridge, UK). All
of the C code XS extensions were provided by Ewan Birney. Bioperl is
used in anger at these sites, which shows that it is both useful and
working.

Uni-Bielefeld provides our mailing lists.

Jon Orwant at The Perl Journal (http://www.tpj.com) gave us permission to reprint
Lincoln Stein's great article on our website. He also worked with the Perl Institute
(http://www.perl.org) to arrange our perl.org DNS entry.

Hardware for bio.perl.org was donated by Genetics Institute (http://www.genetics.com).

Bandwidth and internet connectivity (ISDN) were donated by Chris Dagdigian until we
co-locate somewhere faster :)


=head1 COPYRIGHT

 Copyright (c) 1996-1998 Georg Fuellen, Richard Resnick, Steven E. Brenner,
 Chris Dagdigian, Steve A. Chervitz, Ewan Birney and others. All Rights Reserved.
 This module is free software; you can redistribute it and/or modify
 it under the same terms as Perl itself.

=cut