# BioPerl module for Bio::DasI
#
# Cared for by Lincoln Stein <lstein@cshl.org>
#
# Copyright Lincoln Stein
#
# You may distribute this module under the same terms as perl itself
#
# POD documentation - main docs before the code

=head1 NAME

Bio::DasI - DAS-style access to a feature database

=head1 SYNOPSIS

  # Open up a feature database somehow...
  $db = Bio::DasI->new(@args);

  @segments = $db->segment(-name => 'NT_29921.4');

  # segments are Bio::Das::SegmentI-compliant objects

  # fetch a list of features
  @features = $db->features(-type=>['type1','type2','type3']);

  # invoke a callback over features
  $db->features(-type=>['type1','type2','type3'],
                -callback => sub { ... }
               );

  $stream = $db->get_seq_stream(-type=>['type1','type2','type3']);
  while (my $feature = $stream->next_seq) {
     # each feature is a Bio::SeqFeatureI-compliant object
  }

  # get all feature types, with a count of each
  %types = $db->types(-enumerate=>1);

  @feature = $db->get_feature_by_name($class=>$name);
  @feature = $db->get_feature_by_target($target_name);
  @feature = $db->get_feature_by_attribute($att1=>$value1,$att2=>$value2);
  $feature = $db->get_feature_by_id($id);

=head1 DESCRIPTION

Bio::DasI is a simplified alternative interface to sequence annotation
databases used by the distributed annotation system (see
L<Bio::Das>). In this scheme, the genome is represented as a series of
features, a subset of which are named. Named features can be used as
reference points for retrieving "segments" (see L<Bio::Das::SegmentI>),
and these can, in turn, be used as the basis for exploring the genome.

In addition to a name, each feature has a "class", which is
essentially a namespace qualifier, and a "type", which describes what
kind of feature it is. Das uses the GO consortium's ontology of
feature types, and so the type is actually an object of class
Bio::Das::FeatureTypeI (see L<Bio::Das::FeatureTypeI>). Bio::DasI
provides methods for interrogating the database for the types it
contains and the counts of each type.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions, preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR - Lincoln Stein

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded by an underscore (_).

=cut

# Let the code begin...

package Bio::DasI;
use strict;

use Bio::Das::SegmentI;
# Object preamble - inherits from Bio::Root::RootI and Bio::SeqFeature::CollectionI
use base qw(Bio::Root::RootI Bio::SeqFeature::CollectionI);

=head2 new

 Title   : new
 Usage   : Bio::DasI->new(@args)
 Function: Create new Bio::DasI object
 Returns : a Bio::DasI object

The new() method creates a new object. The argument list is either a
single argument consisting of a connection string, or the following
list of -name=E<gt>value arguments:

   -dsn            Connection string for database
   -adaptor        Name of an adaptor class to use when connecting
   -aggregator     Array ref containing a list of aggregators
                   ("semantic mappers") to apply to the database
   -user           Authentication username
   -pass           Authentication password

Implementors of DasI may add other arguments.

=cut

sub new { shift->throw_not_implemented }

=head2 types

 Title   : types
 Usage   : $db->types(@args)
 Function: return list of feature types in database
 Returns : a list of Bio::Das::FeatureTypeI objects

This routine returns a list of feature types known to the database. It
is also possible to find out how many times each feature type occurs.

Arguments are -option=E<gt>value pairs as follows:

  -enumerate  if true, count the features

By default, the return value is a list of Bio::Das::FeatureTypeI objects
(see L<Bio::Das::FeatureTypeI>).

If -enumerate is true, then the function returns a hash (not a hash
reference) in which the keys are the stringified versions of
Bio::Das::FeatureTypeI and the values are the number of times each
feature type appears in the database.
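
The two return shapes can be illustrated with a minimal sketch. It assumes
a hypothetical in-memory class (the name My::ToyDB and its constructor are
illustrative only and are not part of BioPerl), and plain "method:source"
strings stand in for Bio::Das::FeatureTypeI objects:

```perl
use strict;
use warnings;

# Hypothetical toy class; not part of BioPerl.
package My::ToyDB;

sub new {
    my ($class, @types) = @_;
    return bless { features => [@types] }, $class;
}

# Mimics the documented behaviour: a list of types by default,
# or a flat hash of type => count when -enumerate is true.
sub types {
    my ($self, %args) = @_;
    my %count;
    $count{$_}++ for @{ $self->{features} };
    return %count if $args{-enumerate};
    return sort keys %count;
}

package main;

my $db    = My::ToyDB->new('exon:curated', 'exon:curated', 'intron:curated');
my @types = $db->types;                     # one entry per distinct type
my %count = $db->types(-enumerate => 1);    # type => number of occurrences

print "$_\t$count{$_}\n" for sort keys %count;
```

Because the enumerated form is a flat hash rather than a reference, the
caller assigns it directly to a hash variable.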

=cut

sub types { shift->throw_not_implemented }

=head2 parse_types

 Title   : parse_types
 Usage   : $db->parse_types(@args)
 Function: parses list of types
 Returns : an array ref containing ['method','source'] pairs
 Args    : a list of types in 'method:source' form

This method takes an array of type names in the format "method:source"
and returns an array reference of ['method','source'] pairs. It will
also accept a single argument consisting of an array reference with
the list of type names.
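
The accepted input forms can be tried out with a standalone sketch. The
function name parse_types_sketch is hypothetical; it restates the same
logic as a plain function so that it runs without BioPerl installed:

```perl
use strict;
use warnings;

# Standalone restatement of the parsing logic as a plain function.
sub parse_types_sketch {
    return [] if !@_ or !defined($_[0]);
    # already a list of [method, source] pairs: pass it through unchanged
    return $_[0] if ref($_[0]) eq 'ARRAY' && ref($_[0][0]);
    my @types = ref($_[0]) ? @{ $_[0] } : @_;
    return [ map { [ split /:/, $_, 2 ] } @types ];
}

my $from_list = parse_types_sketch('exon:curated', 'similarity');
my $from_ref  = parse_types_sketch(['exon:curated', 'similarity']);

# A type name without a colon yields a one-element pair.
for my $pair (@$from_list) {
    my ($method, $source) = @$pair;
    printf "%s / %s\n", $method, defined $source ? $source : '(no source)';
}
```

Note that a name with no colon produces a pair whose source is undefined,
so callers should test for that before using it.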

=cut

# turn feature types in the format "method:source" into a list of [method,source] refs
sub parse_types {
  my $self = shift;
  return []    if !@_ or !defined($_[0]);
  return $_[0] if ref $_[0] eq 'ARRAY' && ref $_[0][0];
  my @types     = ref($_[0]) ? @{$_[0]} : @_;
  my @type_list = map { [split(':',$_,2)] } @types;
  return \@type_list;
}

=head2 segment

 Title   : segment
 Usage   : $db->segment(@args);
 Function: create a segment object
 Returns : segment object(s)

This method generates a Bio::Das::SegmentI object (see
L<Bio::Das::SegmentI>). The segment can be used to find overlapping
features and the raw sequence.

When making the segment() call, you specify the ID of a sequence
landmark (e.g. an accession number, a clone or contig), and a
positional range relative to the landmark. If no range is specified,
then the entire region spanned by the landmark is used to generate the
segment.

Arguments are -option=E<gt>value pairs as follows:

 -name         ID of the landmark sequence.

 -class        A namespace qualifier. It is not necessary for the
               database to honor namespace qualifiers, but if it
               does, this is where the qualifier is indicated.

 -version      Version number of the landmark. It is not necessary for
               the database to honor versions, but if it does, this is
               where the version is indicated.

 -start        Start of the segment relative to the landmark. Positions
               follow standard 1-based sequence rules. If not specified,
               defaults to the beginning of the landmark.

 -end          End of the segment relative to the landmark. If not specified,
               defaults to the end of the landmark.

The return value is a list of Bio::Das::SegmentI objects. If the method
is called in scalar context and no more than one segment satisfies the
request, it may return that segment directly. Otherwise, the method
must throw a "multiple segment exception".
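
One way an implementor might apply the context rule is sketched below. It
is an illustration only: the helper name return_segments is hypothetical,
strings stand in for segment objects, and plain die() is used in place of
the throw() method inherited from Bio::Root::RootI.

```perl
use strict;
use warnings;

# Hypothetical helper an implementation might call at the end of segment().
sub return_segments {
    my @segments = @_;
    return @segments if wantarray;                 # list context: return all
    die "multiple segment exception\n" if @segments > 1;
    return $segments[0];                           # scalar context: one or undef
}

my @all = return_segments('seg_a', 'seg_b');       # list context is always fine
my $one = return_segments('seg_a');                # exactly one match

# In scalar context, two matches must raise the exception.
my $many  = eval { return_segments('seg_a', 'seg_b') };
my $error = $@;
print "caught: $error" if $error;
```

Callers that cannot rule out multiple matches are safer calling segment()
in list context and inspecting the result.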

=cut

sub segment { shift->throw_not_implemented }

=head2 features

 Title   : features
 Usage   : $db->features(@args)
 Function: get all features, possibly filtered by type
 Returns : a list of Bio::SeqFeatureI objects

This routine will retrieve features in the database regardless of
position. It can be used to return all features, or a subset based on
their type.

Arguments are -option=E<gt>value pairs as follows:

  -types      List of feature types to return. Argument is an array
              reference of Bio::Das::FeatureTypeI objects or a set of
              strings that can be converted into FeatureTypeI objects.

  -callback   A callback to invoke on each feature. The subroutine
              will be passed each Bio::SeqFeatureI object in turn.

  -attributes A hash reference containing attributes to match.

The -attributes argument is a hashref containing one or more attributes
to match against:

  -attributes => { Gene => 'abc-1',
                   Note => 'confirmed' }

Attribute matching is simple exact string matching, and multiple
attributes are ANDed together. See L<Bio::DB::ConstraintsI> for a
more sophisticated take on this.

If one provides a callback, it will be invoked on each feature in
turn. If the callback returns a false value, iteration will be
interrupted. When a callback is provided, the method returns undef.
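
The callback and attribute rules can be sketched with a toy stand-in. Every
name here is an assumption made for illustration: toy_features is a
hypothetical function rather than a BioPerl method, and plain hashes stand
in for Bio::SeqFeatureI objects.

```perl
use strict;
use warnings;

# Toy data store: plain hashes stand in for feature objects.
my @store = (
    { name => 'f1', attributes => { Gene => 'abc-1', Note => 'confirmed' } },
    { name => 'f2', attributes => { Gene => 'abc-1', Note => 'predicted' } },
    { name => 'f3', attributes => { Gene => 'abc-1', Note => 'confirmed' } },
);

# Exact string match on every requested attribute (AND), plus an optional
# callback whose false return value interrupts the iteration.
sub toy_features {
    my %args   = @_;
    my $wanted = $args{-attributes} || {};
    my @hits;
    FEATURE: for my $f (@store) {
        for my $key (keys %$wanted) {
            my $value = $f->{attributes}{$key};
            next FEATURE unless defined($value) && $value eq $wanted->{$key};
        }
        if (my $cb = $args{-callback}) {
            last FEATURE unless $cb->($f);
        }
        else {
            push @hits, $f;
        }
    }
    return $args{-callback} ? undef : @hits;
}

my @confirmed = toy_features(-attributes => { Gene => 'abc-1', Note => 'confirmed' });

my @seen;
toy_features(-callback => sub { push @seen, $_[0]{name}; return @seen < 2 });

print "matched: ", join(',', map { $_->{name} } @confirmed), "\n";
print "visited: @seen\n";
```

The callback form avoids building the full result list, which matters when
the database holds many features.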

=cut

sub features { shift->throw_not_implemented }

=head2 get_feature_by_name

 Title   : get_feature_by_name
 Usage   : $db->get_feature_by_name(-class=>$class,-name=>$name)
 Function: fetch features by their name
 Returns : a list of Bio::SeqFeatureI objects
 Args    : the class and name of the desired feature

This method can be used to fetch named feature(s) from the database.
The -class and -name arguments have the same meaning as in segment(),
and the method also accepts the following short-cut forms:

  1) one argument: the argument is treated as the feature name
  2) two arguments: the arguments are treated as the class and name
     (note: this uses _rearrange() so the first argument must not
     begin with a hyphen or it will be interpreted as a named
     argument).

This method may return zero, one, or several Bio::SeqFeatureI objects.
The implementor may allow the name to contain wildcards, in which case
standard C-shell glob semantics are expected.
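
For implementors who choose to support wildcards, the following sketch
shows one possible way to turn a glob into a regular expression. The
function name glob_to_regex is hypothetical, and only the '*' and '?'
wildcards are handled; full C-shell globbing also includes bracket and
brace forms, which are left out here.

```perl
use strict;
use warnings;

# Translate a glob pattern into an anchored regex. Only '*' and '?' are
# treated as wildcards; every other character is matched literally.
sub glob_to_regex {
    my $glob = shift;
    my $re   = '';
    for my $char (split //, $glob) {
        $re .= $char eq '*' ? '.*'
             : $char eq '?' ? '.'
             :                quotemeta($char);
    }
    return qr/^$re$/;
}

my @names   = qw(abc-1 abc-12 abd-1 xyz.1);
my $pattern = glob_to_regex('abc-*');
my @matches = grep { $_ =~ $pattern } @names;

print "@matches\n";
```

Quoting the literal characters keeps a dot in a feature name from acting as
a regex wildcard.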

=cut

sub get_feature_by_name {
  shift->throw_not_implemented();
}

=head2 get_feature_by_target

 Title   : get_feature_by_target
 Usage   : $db->get_feature_by_target($class => $name)
 Function: fetch features by their similarity target
 Returns : a list of Bio::SeqFeatureI objects
 Args    : the class and name of the desired feature

This method can be used to fetch a named feature from the database
based on its similarity hit. The arguments are the same as for
get_feature_by_name(). If this is not implemented, the interface
defaults to using get_feature_by_name().

=cut

sub get_feature_by_target {
  shift->get_feature_by_name(@_);
}

=head2 get_feature_by_id

 Title   : get_feature_by_id
 Usage   : $db->get_feature_by_id($id)
 Function: fetch a feature by its ID
 Returns : a Bio::SeqFeatureI object
 Args    : the ID of the feature

If the database provides unique feature IDs, this can be used to
retrieve a single feature from the database. If not overridden, this
interface calls get_feature_by_name() and returns the first element.

=cut

sub get_feature_by_id {
  (shift->get_feature_by_name(@_))[0];
}

=head2 get_feature_by_attribute

 Title   : get_feature_by_attribute
 Usage   : $db->get_feature_by_attribute(attribute1=>value1,attribute2=>value2)
 Function: fetch features by combinations of attribute values
 Returns : a list of Bio::SeqFeatureI objects
 Args    : a list of attribute name=>value pairs

This method can be used to fetch a set of features from the database.
Attributes are a list of name=E<gt>value pairs. They will be
logically ANDed together. If an attribute value is an array
reference, the list of values in the array is treated as an
alternative set of values to be ORed together.
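
The AND/OR rule can be expressed as a small predicate. This is a sketch
under stated assumptions: attributes_match is a hypothetical helper, and a
plain hash of attribute values stands in for a real feature object.

```perl
use strict;
use warnings;

# True if the attributes satisfy every name => value constraint (AND).
# An array-ref value matches if any one of its elements matches (OR).
sub attributes_match {
    my ($have, %want) = @_;
    for my $name (keys %want) {
        my @allowed = ref($want{$name}) eq 'ARRAY'
                    ? @{ $want{$name} }
                    : ($want{$name});
        my $value = $have->{$name};
        return 0 unless defined($value) && grep { $_ eq $value } @allowed;
    }
    return 1;
}

my %feature = (Gene => 'abc-1', Note => 'confirmed');

my $both  = attributes_match(\%feature, Gene => 'abc-1',
                                        Note => ['confirmed', 'predicted']);
my $wrong = attributes_match(\%feature, Gene => 'abc-2', Note => 'confirmed');

print "both=$both wrong=$wrong\n";
```

An implementation backed by a relational database would more likely push
these constraints into the query than filter in Perl.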

=cut

sub get_feature_by_attribute {
  shift->throw_not_implemented();
}

=head2 search_notes

 Title   : search_notes
 Usage   : $db->search_notes($search_term,$max_results)
 Function: full-text search on features, ENSEMBL-style
 Returns : an array of [$name,$description,$score]

This routine performs a full-text search on feature attributes (which
attributes depend on implementation) and returns a list of
[$name,$description,$score], where $name is the feature ID,
$description is a human-readable description such as a locus line, and
$score is the match strength.

Since this is a decidedly non-standard thing to do (but the generic
genome browser uses it), the default method returns an empty list.
You do not have to implement it.

=cut

sub search_notes { return }

=head2 get_seq_stream

 Title   : get_seq_stream
 Usage   : $seqio = $db->get_seq_stream(@args)
 Function: Performs a query and returns an iterator over it
 Returns : a Bio::SeqIO stream capable of returning Bio::SeqFeatureI objects
 Args    : As in features()

This routine takes the same arguments as features(), but returns a
Bio::SeqIO::Stream-compliant object. Use it like this:

  $stream = $db->get_seq_stream('exon');
  while (my $exon = $stream->next_seq) {
     # ... work with $exon ...
  }

NOTE: This interface also provides get_feature_stream() as an alias
for this method, as that name is more descriptive.
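
The iterator contract can be sketched with a minimal stream class. The
class name My::ToyStream is hypothetical and strings stand in for feature
objects; a real implementation would return Bio::SeqFeatureI-compliant
objects from next_seq().

```perl
use strict;
use warnings;

# Hypothetical minimal stream class; not part of BioPerl.
package My::ToyStream;

sub new {
    my ($class, @features) = @_;
    return bless { queue => [@features] }, $class;
}

# Return the next feature, or undef once the stream is exhausted.
sub next_seq {
    my $self = shift;
    return shift @{ $self->{queue} };
}

package main;

my $stream = My::ToyStream->new('exon1', 'exon2', 'exon3');
my @seen;
while (my $exon = $stream->next_seq) {
    push @seen, $exon;
}

print "@seen\n";
```

Returning undef at the end is what terminates the while loop in the usage
pattern above.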

=cut

sub get_seq_stream     { shift->throw_not_implemented }
sub get_feature_stream { shift->get_seq_stream(@_) }

=head2 refclass

 Title   : refclass
 Usage   : $class = $db->refclass
 Function: returns the default class to use for segment() calls

For data sources which use namespaces to distinguish reference
sequence accessions, this returns the default namespace (or "class")
to use. This interface defines a default of "Accession".

=cut

sub refclass { "Accession" }

1;