README

   1 # $Id$
   2
   3 The bioperl-db package contains interfaces and adaptors that work
   4 with a BioSQL database and serialize and de-serialize Bioperl
   5 objects.
   6
   7 ===================================================================
   8 Information about BioSQL and bioperl-db
   9 ===================================================================
  10
  11 This project was started by Ewan Birney with major work by Elia Stupka
  12 and continued support by Hilmar Lapp and the Bioperl community. Its
  13 purpose is to create a standalone sequence database with few
  14 external dependencies and tight integration with Bioperl. Support for
  15 more databases and bindings in Java and Python by the BioJava and
  16 Biopython projects is welcome, and a working prototype was one of
  17 the accomplishments of the February 2002 hackathon in South Africa.
  18 All questions and comments should be directed to the bioperl list
  19 <bioperl-l@bioperl.org>, and more information about the related
  20 projects can be found at http://www.bioperl.org and http://www.open-bio.org.
  21
  22 I. Scripts located in the scripts/ directory:
  23
  24 biosql/load_seqdatabase.pl      - a very flexible script to load
  25                                   sequences into a BioSQL database
  26 biosql/bioentry2flat.pl         - dump sequence data into rich sequence
  27                                   format flatfile from a BioSQL database
  28 biosql/load_ontology.pl         - load GO or SOFA ontology data into
  29                                   a BioSQL database
  30 biosql/merge-unique-ann.pl      - script used by load_seqdatabase.pl,
  31                                   it merges features and annotations
  32 biosql/update-on-new-date.pl    - script used by load_seqdatabase.pl,
  33                                   it will update based on date
  34 biosql/update-on-new-version.pl - script used by load_seqdatabase.pl,
  35                                   it will update based on version
  36 biosql/cgi-bin/getentry.pl      - example CGI script that fetches from
  37                                   a BioSQL database
  38 biosql/clean_ontology.pl        - "cleans" an ontology in a BioSQL
  39                                   database without foreign keys
  40 biosql/del-assocs-sql.pl        - script used by load_seqdatabase.pl,
  41                                   removes annotations and features
  42 biosql/freshen-annot.pl         - script used by load_seqdatabase.pl,
  43                                   removes annotations and features
  44 biosql/load_interpro.pl         - deprecated
  45 biosql/terms/add-term-annot.pl  - load ontology terms read from a file
  46 biosql/terms/importrelation.pl  - load relations between ontology
  47                                   terms and bioentries
  48 biosql/terms/interpro2go.pl     - load from interpro2go file
  49 corba/caching_corba_server.pl   - set up a CORBA sequence caching server
  50 corba/test_bioenv.pl            - test the bioenv of a running server
  51 corba/bioenv_server.pl          - set up a CORBA sequence server
  52
  53 There is also a script called scripts/load_ncbi_taxonomy.pl in the
  54 BioSQL package that loads taxonomic data from NCBI. Most users use
  55 this script to load the NCBI taxonomy into their BioSQL databases
  56 before loading sequences with load_seqdatabase.pl.
  57
  58 II. Some background information and how it all works:
  59
  60 The adaptor code in Bio::DB and Bio::DB::BioSQL has been completely
  61 refactored and re-architected from scratch since the previous branch,
  62 bioperl-release-1-1-0. By now almost everything works. The
  63 following things are unsupported or do not work yet:
  64
  65      - sub-seqfeatures
  66      - round-tripping fuzzy locations (they will be stored according
  67           to their Bio::Location::CoordinatePolicyI interpretation)
  68      - Bio::Annotation::DBLink::optional_id
  69
  70 To understand the layout of the API and how you can interact with the
  71 adaptors to formulate your own queries, here is what you should know
  72 and read (i.e., read the PODs of all interfaces and modules named
  73 below).
  74
  75 1) Bio::DB::BioDB acts as a factory of database adaptors, where a
  76 database adaptor encapsulates an entire database, not a specific
  77 object-relational mapping or table. It is similar to Bio::SeqIO in
  78 bioperl, where you specify a format and get back a parser for that
  79 format. Here you specify the database and get back a persistence
  80 factory for that database. Note that the only database really
  81 supported right now in this framework is BioSQL.
  82
  83 2) The persistence factory returned by Bio::DB::BioDB->new() will
  84 implement Bio::DB::DBAdaptorI. It allows you to obtain a persistence
  85 adaptor for an object at hand, and to turn an object into a persistent
  86 object.
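
Points 1) and 2) can be sketched roughly as follows. This is a minimal
example, assuming bioperl and bioperl-db are installed and a BioSQL
schema is reachable through MySQL. The host, database name, user,
password, and the sequence itself are placeholders, not values shipped
with this package; check the PODs of Bio::DB::BioDB and
Bio::DB::DBAdaptorI for the authoritative argument list.

```perl
use strict;
use warnings;
use Bio::DB::BioDB;
use Bio::Seq;

# Placeholder connection settings: adjust to your own BioSQL instance.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',        # the database flavor, analogous to a SeqIO format
    -driver   => 'mysql',
    -host     => 'localhost',
    -dbname   => 'biosql',
    -user     => 'biosql_user',
    -pass     => 'secret',
);

# An ordinary, non-persistent bioperl object (made-up example data).
my $seq = Bio::Seq->new(
    -display_id       => 'example1',
    -accession_number => 'X00001',
    -seq              => 'ATGGCCATTGTAATGGGCCGC',
);

# The factory hands out a persistence adaptor for a kind of object ...
my $adp = $db->get_object_adaptor('Bio::SeqI');

# ... and can turn an object at hand into a persistent object.
my $pseq = $db->create_persistent($seq);
```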
  87
  88 3) A persistent object will implement Bio::DB::PersistentObjectI. A
  89 persistent object can be updated in and removed from the database. It
  90 also knows about its primary key in the database once it has been
  91 created or found in the database. A persistent object will still
  92 implement all interfaces and all methods that the non-persistent base
  93 object implements. E.g., a persistent sequence object will implement
  94 Bio::DB::PersistentObjectI and Bio::PrimarySeqI (or Bio::SeqI).
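
A rough sketch of the life cycle described in point 3), under the same
assumptions as a working BioSQL connection. The subroutine name
store_and_report is made up for illustration, and the method names
should be checked against the POD of Bio::DB::PersistentObjectI for
your installed version.

```perl
use strict;
use warnings;

# Hypothetical helper. $db is a persistence factory obtained from
# Bio::DB::BioDB->new(); $seq is a Bio::SeqI object not yet stored.
sub store_and_report {
    my ($db, $seq) = @_;

    my $pseq = $db->create_persistent($seq);

    # Insert a new record; afterwards the object knows its primary key.
    $pseq->create();
    print "primary key: ", $pseq->primary_key(), "\n";

    # Methods of the wrapped sequence object remain available.
    $pseq->desc('updated description');

    # Write the change back to the existing record.
    $pseq->store();

    # Commit through the adaptor (only meaningful if the underlying
    # driver supports transactions).
    $pseq->adaptor->commit();

    # $pseq->remove() would delete the record again.
    return $pseq;
}
```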
  95
  96 4) A persistence adaptor will implement Bio::DB::PersistenceAdaptorI.
  97 Apart from actually implementing all the persistence methods for
  98 persistent objects, a persistence adaptor allows you to locate
  99 objects in the database by key and by query. You can
 100 find_by_primary_key(), find_by_unique_key(), find_by_association(),
 101 and find_by_query(). The latter allows you to formulate object queries
 102 as Bio::DB::Query::BioQuery objects and retrieve the matching objects.
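
The two most common lookups from point 4) might look like the sketch
below. The accession, namespace, and display_id pattern are
placeholders, lookup_examples is a made-up name, and the query syntax
follows what the POD of Bio::DB::Query::BioQuery describes; consult it
before relying on the details.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::DB::Query::BioQuery;

# Hypothetical helper. $db is a persistence factory obtained from
# Bio::DB::BioDB->new().
sub lookup_examples {
    my ($db) = @_;
    my $adp = $db->get_object_adaptor('Bio::SeqI');

    # Lookup by unique key: a template object carries the key values.
    my $template = Bio::Seq->new(
        -accession_number => 'NM_000149',
        -namespace        => 'RefSeq',
    );
    my $found = $adp->find_by_unique_key($template);
    print $found->display_id(), "\n" if $found;

    # Lookup by object query: slots of objects, not columns of tables.
    my $query = Bio::DB::Query::BioQuery->new();
    $query->datacollections(['Bio::PrimarySeqI s']);
    $query->where(["s.display_id like 'ABC%'"]);

    my $result = $adp->find_by_query($query);
    while (my $seq = $result->next_object()) {
        print $seq->accession_number(), "\n";
    }
}
```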
 103
 104 5) The guiding principle for the redesign of the adaptors was to
 105 separate business logic from schema logic. Business logic is largely
 106 driven by the object model (here, the bioperl object model), so it
 107 is specific to that model but mostly independent of the schema.
 108 Schema logic, in contrast, is driven by and specific to the
 109 relational model. It will therefore need to differ from one schema
 110 to another, and even from one schema flavor to another for very
 111 similar relational models, whereas the business logic is unaffected
 112 by such differences.
 113
 114 This had two consequences. First, the user interacts with the adaptors
 115 without knowing anything about the specific schema behind them. All
 116 interaction takes place in object space. You construct queries by
 117 specifying object slots and the values they should have or
 118 match. Joins and associations are also specified on the object level
 119 (cf. Bio::DB::Query::BioQuery). Internally, the respective drivers for
 120 the particular schema translate those queries into schema-specific SQL
 121 queries.
 122
 123 The second consequence was that every persistence adaptor is divided
 124 into two layers: the persistence adaptor itself, which does not
 125 contain a single SQL phrase or query, and its schema-specific driver,
 126 which implements those functions that cannot be accomplished without
 127 actually doing the concrete object-relational mapping.
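
The two-layer split can be illustrated with a toy model in plain Perl.
None of the Toy::* packages exist in bioperl-db, and the second schema
flavor (table and column names) is invented; the sketch only shows the
idea that the adaptor speaks in object slots while each driver alone
knows table and column names.

```perl
use strict;
use warnings;

# Hypothetical driver for one schema flavor.
package Toy::Driver::FlavorA;
sub new { return bless {}, shift }
sub select_sql {
    my ($self, $slot) = @_;
    my %column_for = (display_id => 'name', accession_number => 'accession');
    my $col = $column_for{$slot} or die "unmapped slot: $slot";
    return "SELECT * FROM bioentry WHERE $col = ?";
}

# Hypothetical driver for a second, invented schema flavor.
package Toy::Driver::FlavorB;
sub new { return bless {}, shift }
sub select_sql {
    my ($self, $slot) = @_;
    my %column_for = (display_id => 'ent_name', accession_number => 'ent_acc');
    my $col = $column_for{$slot} or die "unmapped slot: $slot";
    return "SELECT * FROM sg_bioentry WHERE $col = ?";
}

# The adaptor layer: object-level logic only, no SQL text of its own.
package Toy::Adaptor;
sub new {
    my ($class, %args) = @_;
    return bless { driver => $args{driver} }, $class;
}
# Returns the statement and bind value the driver would execute.
sub find_by_slot {
    my ($self, $slot, $value) = @_;
    return ($self->{driver}->select_sql($slot), $value);
}

package main;

# The same object-level request yields schema-specific SQL per driver.
for my $driver (Toy::Driver::FlavorA->new, Toy::Driver::FlavorB->new) {
    my $adp = Toy::Adaptor->new(driver => $driver);
    my ($sql, @bind) = $adp->find_by_slot(accession_number => 'NM_000149');
    print "$sql  [@bind]\n";
}
```

Swapping the driver changes the generated SQL without touching the
adaptor, which is the property the redesign was aiming for.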
 128
 129 ===================================================================
 130 Information about Bio::DB::Map modules and database interface
 131 ===================================================================
 132
 133 These modules have been deprecated and are no longer in the distribution.
 134