## $Id: bioback.pod,v 1.3 1999-04-12 18:09:59 birney Exp $

#
# Documentation about the backend of bioperl
#
# This is for sysadmin/db admin on a new site to get a quick
# overview about how to build things on their system for bioperl
#

=head1 NAME

bioperl backend - how to customise bioperl for your site

=head1 SYNOPSIS

Not really appropriate for a synopsis. Read on.

=head1 DESCRIPTION

This document describes how to customise bioperl for your site.
Bioperl can work with a number of database formats (at the moment,
simple fasta flat files and the EMBL/Swissprot .dat format),
allowing users to retrieve sequences from these databases. In
addition, another layer above flat file indexing is provided,
allowing sites to retrieve sequences from GenBank via the web or
via flat file indexing. If you have the time, you can also write
your own interface to an in-house RDB; using DBI this should be
quite simple.

Two scripts are provided to get you started with the bioperl backend:

=over

=item bpfetch

Fetches sequences from a database.

=item bpindex

Builds indexes for flat file databases so that they are easily
accessible by bpfetch.

=back

The core of the backend system is found in the following modules:

=over

=item Bio::DB::*

Generic access to databases, whether flat file, web or RDB. At the
moment, this provides random access retrieval on the basis of ids or
accession numbers, but does not provide the ability to loop over the
entire database, nor does it provide any complex querying ability.

Bio::DB::BioSeqI is the abstract interface (hence the I) for the
databases. Bio::DB::GenBank and Bio::DB::GenPept are concrete
implementations for network access to the GenBank and GenPept
databases held at NCBI, using HTTP as the protocol.

=item Bio::Index::*

Flat file indexing system for read-only flat file distributions. The
supplied modules give generic access to specific file formats, but the
underlying machinery can be customised for any number of different
flat file systems.

The Index modules EMBL and Fasta are designed as sequence databases
and conform to the Bio::DB::BioSeqI interface, meaning they can be
used wherever a Bio::DB::BioSeqI is expected.

=item Bio::SeqIO::*

Conversion systems for Bio::Seq objects, either to or from sequence
streams. Moving format handling into SeqIO prevents the Bio::Seq
object from bloating up with format code, and the SeqIO system has
the benefit of being very easy to extend to new formats.

=back

=head1 SETTING UP BIOPERL INDICES

If you want to use the bioperl indexing of fasta and EMBL/Swissprot
.dat files, then the bpfetch and bpindex scripts are a good way to
start (reading the scripts also shows you how to use the bioperl
indexing modules). bpfetch and bpindex coordinate through two
environment variables:

  BIOPERL_INDEX - directory where the indices are kept

  BIOPERL_INDEX_TYPE - type of DBM file to use for the index

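As a minimal sketch, the two variables could be set in a Bourne-style
shell as shown here. The directory is a placeholder, and SDBM_File is
only an assumed example value for BIOPERL_INDEX_TYPE (it is a DBM module
shipped with Perl); check the bpindex script itself for the values your
installation accepts.

```shell
# Placeholder location: use the existing value, else a directory under $HOME.
BIOPERL_INDEX="${BIOPERL_INDEX:-$HOME/bioperlindex}"

# Assumed example value: SDBM_File is a DBM module bundled with Perl.
BIOPERL_INDEX_TYPE="${BIOPERL_INDEX_TYPE:-SDBM_File}"

# Export both so that bpindex and bpfetch see the same settings.
export BIOPERL_INDEX BIOPERL_INDEX_TYPE

# Warn early if the index directory has not been created yet.
if [ ! -d "$BIOPERL_INDEX" ]; then
    echo "note: $BIOPERL_INDEX does not exist yet" >&2
fi
```

Under csh or tcsh the equivalent is setenv, one variable per line.
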
The basic way of indexing a database, once BIOPERL_INDEX has been
set up, is to run

  bpindex <index-name> <filenames as full path>

For example, for Fasta files

  bpindex est /nfs/somewhere/fastafiles/est*.fa

Or, for EMBL/Swissprot files

  bpindex -fmt=EMBL swiss /nfs/somewhere/swiss/swissprot.dat

To retrieve sequences from the index, run

  bpfetch <index-name>:<id>

For example,

  bpfetch est:AA01234

or

  bpfetch swiss:VAV_HUMAN

bpfetch has other options to connect to GenBank across the network.

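Wrapper scripts around bpfetch may need to take the <index-name>:<id>
form apart, for example to check that the named index exists before
fetching. The sketch below does this with POSIX parameter expansion.
The function split_spec and the variables index_name and seq_id are
hypothetical names chosen for illustration; this is not how bpfetch
itself is implemented.

```shell
# Split an "<index-name>:<id>" specifier into its two parts.
# Sets index_name and seq_id; returns 1 if there is no colon.
split_spec() {
    spec=$1
    case $spec in
        *:*) ;;
        *)   echo "expected <index-name>:<id>, got '$spec'" >&2
             return 1 ;;
    esac
    index_name=${spec%%:*}   # text before the first colon
    seq_id=${spec#*:}        # text after the first colon
}

split_spec "swiss:VAV_HUMAN" && echo "index=$index_name id=$seq_id"
# prints: index=swiss id=VAV_HUMAN
```
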
=head1 CHECKLIST

=over

=item 1

Make a directory for the indices, for example
/nfs/datadisk/bioperlindex/

=item 2

Set BIOPERL_INDEX to that directory in the system login script
(setenv in csh, export in bash).

=item 3

Run bpindex for each database, for example

   bpindex -fmt=EMBL swissprot /nfs/datadisk/swiss/swissprot.dat

=item 4

You are ready to use bpfetch.

=back

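The checklist can be sketched as a small Bourne shell script. The
variables INDEX_DIR and SWISS_DAT and their default paths are
placeholders, not bioperl settings, and the script skips the indexing
step when bpindex or the data file is not available, so adapt it
before use.

```shell
#!/bin/sh
# Sketch of the checklist steps. All paths are placeholders.
INDEX_DIR="${INDEX_DIR:-$HOME/bioperlindex}"
SWISS_DAT="${SWISS_DAT:-/nfs/datadisk/swiss/swissprot.dat}"

# Step 1: make the index directory.
mkdir -p "$INDEX_DIR"

# Step 2: point bpindex and bpfetch at it.
BIOPERL_INDEX="$INDEX_DIR"
export BIOPERL_INDEX

# Step 3: build the index only if bpindex is installed and the
# flat file is readable; otherwise report and carry on.
if command -v bpindex >/dev/null 2>&1 && [ -r "$SWISS_DAT" ]; then
    if bpindex -fmt=EMBL swissprot "$SWISS_DAT"; then
        status="indexed"
    else
        status="failed"
    fi
else
    status="skipped"
    echo "bpindex or $SWISS_DAT not found; skipping indexing" >&2
fi

echo "index directory: $BIOPERL_INDEX ($status)"
```

In a real deployment the export line belongs in the system login
script, so that every user's bpfetch finds the same indices.
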