# $Id$

# BioPerl module for Bio::DasI

# Cared for by Lincoln Stein <lstein@cshl.org>

# Copyright Lincoln Stein

# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::DasI - DAS-style access to a feature database

=head1 SYNOPSIS

  # Open up a feature database somehow...
  $db = Bio::DasI->new(@args);

  @segments = $db->segment(-name  => 'NT_29921.4',
                           -start => 1,
                           -end   => 1000000);

  # segments are Bio::Das::SegmentI-compliant objects

  # fetch a list of features
  @features = $db->features(-type=>['type1','type2','type3']);

  # invoke a callback over features
  $db->features(-type=>['type1','type2','type3'],
                -callback => sub { ... }
                );

  $stream = $db->get_seq_stream(-type=>['type1','type2','type3']);
  while (my $feature = $stream->next_seq) {
     # each feature is a Bio::SeqFeatureI-compliant object
  }

  # get all feature types
  @types = $db->types;

  # count types
  %types = $db->types(-enumerate=>1);

  @feature = $db->get_feature_by_name($class=>$name);
  @feature = $db->get_feature_by_target($target_name);
  @feature = $db->get_feature_by_attribute($att1=>$value1,$att2=>$value2);
  $feature = $db->get_feature_by_id($id);

  $error = $db->error;

=head1 DESCRIPTION

Bio::DasI is a simplified alternative interface to sequence annotation
databases used by the distributed annotation system (see
L<Bio::Das>). In this scheme, the genome is represented as a series of
features, a subset of which are named. Named features can be used as
reference points for retrieving "segments" (see L<Bio::Das::SegmentI>),
and these can, in turn, be used as the basis for exploring the genome
further.

In addition to a name, each feature has a "class", which is
essentially a namespace qualifier, and a "type", which describes what
kind of feature it is. Das uses the GO consortium's ontology of
feature types, and so the type is actually an object of class
Bio::Das::FeatureTypeI (see L<Bio::Das::FeatureTypeI>). Bio::DasI
provides methods for interrogating the database for the types it
contains and the counts of each type.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR - Lincoln Stein

Email lstein@cshl.org

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::DasI;
use strict;

use Bio::Das::SegmentI;
# Object preamble - inherits from Bio::Root::RootI and Bio::SeqFeature::CollectionI
use base qw(Bio::Root::RootI Bio::SeqFeature::CollectionI);

=head2 new

 Title   : new
 Usage   : Bio::DasI->new(@args)
 Function: Create new Bio::DasI object
 Returns : a Bio::DasI object
 Args    : see below

The new() method creates a new object. The argument list is either a
single argument consisting of a connection string, or the following
list of -name=E<gt>value arguments:

   Argument        Description
   --------        -----------

   -dsn            Connection string for database
   -adaptor        Name of an adaptor class to use when connecting
   -aggregator     Array ref containing list of aggregators
                   ("semantic mappers") to apply to database
   -user           Authentication username
   -pass           Authentication password

Implementors of DasI may add other arguments.

=cut

sub new {shift->throw_not_implemented}

=head2 types

 Title   : types
 Usage   : $db->types(@args)
 Function: return list of feature types in database
 Returns : a list of Bio::Das::FeatureTypeI objects
 Args    : see below

This routine returns a list of feature types known to the database. It
is also possible to find out how many times each feature type occurs.

Arguments are -option=E<gt>value pairs as follows:

  -enumerate  if true, count the features

The returned value will be a list of Bio::Das::FeatureTypeI objects
(see L<Bio::Das::FeatureTypeI>).

If -enumerate is true, then the function returns a hash (not a hash
reference) in which the keys are the stringified versions of
Bio::Das::FeatureTypeI and the values are the number of times each
feature type appears in the database.
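
The two return shapes described above can be illustrated with a
minimal standalone sketch. The helper name list_types() is hypothetical
and is not part of Bio::DasI, and plain "method:source" strings stand
in for stringified Bio::Das::FeatureTypeI objects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DasI). Takes an array ref holding
# one type string per feature. Returns the distinct types as a list, or,
# when -enumerate is true, a flat type => count hash (not a hash ref).
sub list_types {
    my ($feature_types, %opt) = @_;
    my %count;
    $count{$_}++ for @$feature_types;
    return %count if $opt{-enumerate};
    my @distinct = sort keys %count;
    return @distinct;
}

my @per_feature = ('exon:curated', 'exon:curated', 'intron:curated');
my @types  = list_types(\@per_feature);
my %counts = list_types(\@per_feature, -enumerate => 1);
print "$_ occurs $counts{$_} time(s)\n" for @types;
```

A real implementation would query the database rather than count an
in-memory list; only the calling convention is meant to carry over.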

=cut

sub types {  shift->throw_not_implemented; }

=head2 parse_types

 Title   : parse_types
 Usage   : $db->parse_types(@args)
 Function: parses list of types
 Returns : an array ref containing ['method','source'] pairs
 Args    : a list of types in 'method:source' form
 Status  : internal

This method takes an array of type names in the format "method:source"
and returns an array reference of ['method','source'] pairs. It will
also accept a single argument consisting of an array reference with
the list of type names.
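
As a rough illustration of the accepted argument forms, the following
standalone sketch mirrors the splitting logic outside of any class.
The function name split_types() is hypothetical; it is not part of
Bio::DasI.

```perl
use strict;
use warnings;

# Hypothetical standalone helper (not part of Bio::DasI): turn a list of
# "method:source" strings, or a single array ref of them, into an array
# ref of [method, source] pairs.
sub split_types {
    return [] if !@_ or !defined $_[0];
    # Already parsed: an array ref whose first element is itself a ref.
    return $_[0] if ref($_[0]) eq 'ARRAY' && ref($_[0][0]);
    my @types = ref($_[0]) ? @{ $_[0] } : @_;
    # Split on the first colon only, so a source may itself contain colons.
    return [ map { [ split /:/, $_, 2 ] } @types ];
}

my $pairs = split_types('exon:curated', 'similarity');
for my $pair (@$pairs) {
    my ($method, $source) = @$pair;
    print "method=$method source=", (defined $source ? $source : '(none)'), "\n";
}
```

A type given without a colon yields a one-element pair, so callers
should be prepared for an undefined source.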

=cut

# turn feature types in the format "method:source" into a list of [method,source] refs
sub parse_types {
  my $self  = shift;
  return []    if !@_ or !defined($_[0]);
  return $_[0] if ref $_[0] eq 'ARRAY' && ref $_[0][0];
  my @types = ref($_[0]) ? @{$_[0]} : @_;
  my @type_list = map { [split(':',$_,2)] } @types;
  return \@type_list;
}

=head2 segment

 Title   : segment
 Usage   : $db->segment(@args);
 Function: create a segment object
 Returns : segment object(s)
 Args    : see below

This method generates a Bio::Das::SegmentI object (see
L<Bio::Das::SegmentI>). The segment can be used to find overlapping
features and the raw sequence.

When making the segment() call, you specify the ID of a sequence
landmark (e.g. an accession number, a clone or contig), and a
positional range relative to the landmark. If no range is specified,
then the entire region spanned by the landmark is used to generate the
segment.

Arguments are -option=E<gt>value pairs as follows:

 -name         ID of the landmark sequence.

 -class        A namespace qualifier. It is not necessary for the
               database to honor namespace qualifiers, but if it
               does, this is where the qualifier is indicated.

 -version      Version number of the landmark. It is not necessary for
               the database to honor versions, but if it does, this is
               where the version is indicated.

 -start        Start of the segment relative to landmark. Positions
               follow standard 1-based sequence rules. If not specified,
               defaults to the beginning of the landmark.

 -end          End of the segment relative to the landmark. If not specified,
               defaults to the end of the landmark.

The return value is a list of Bio::Das::SegmentI objects. If the method
is called in a scalar context and no more than one segment satisfies
the request, then it is allowed to return that segment.
Otherwise, the method must throw a "multiple segment exception".
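
The context rule above could be handled along the following lines.
This is only a sketch: return_segments() is a hypothetical helper,
plain strings stand in for segment objects, and a real implementation
would presumably call throw() rather than die().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DasI): hand back the segments
# found by a lookup according to the calling context.
sub return_segments {
    my @segments = @_;
    # List context: the caller gets every matching segment.
    return @segments if wantarray;
    # Scalar context: more than one match is an error.
    die "multiple segment exception\n" if @segments > 1;
    # Zero or one match: return it (undef when nothing matched).
    return $segments[0];
}

my @all    = return_segments('segment_a', 'segment_b');
my $single = return_segments('segment_a');
my $ok     = eval { my $ambiguous = return_segments('segment_a', 'segment_b'); 1 };
print "list: @all; scalar: $single; ambiguous scalar call ",
      ($ok ? "succeeded" : "threw"), "\n";
```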

=cut

sub segment { shift->throw_not_implemented }

=head2 features

 Title   : features
 Usage   : $db->features(@args)
 Function: get all features, possibly filtered by type
 Returns : a list of Bio::SeqFeatureI objects
 Args    : see below
 Status  : public

This routine will retrieve features in the database regardless of
position. It can be used to return all features, or a subset based on
their type.

Arguments are -option=E<gt>value pairs as follows:

  -types       List of feature types to return. Argument is an array
               of Bio::Das::FeatureTypeI objects or a set of strings
               that can be converted into FeatureTypeI objects.

  -callback    A callback to invoke on each feature. The subroutine
               will be passed each Bio::SeqFeatureI object in turn.

  -attributes  A hash reference containing attributes to match.

The -attributes argument is a hashref containing one or more attributes
to match against:

  -attributes => { Gene => 'abc-1',
                   Note => 'confirmed' }

Attribute matching is simple exact string matching, and multiple
attributes are ANDed together. See L<Bio::DB::ConstraintsI> for a
more sophisticated take on this.

If one provides a callback, it will be invoked on each feature in
turn. If the callback returns a false value, iteration will be
interrupted. When a callback is provided, the method returns undef.
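
The interplay of -attributes and -callback might look like the
following sketch, which runs over an in-memory list. The function
each_feature() and the hash-ref "features" are assumptions made for
illustration; a real implementor would work with Bio::SeqFeatureI
objects and a database query.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a features() implementation. Features are
# plain hash refs here, and their keys play the role of attributes.
sub each_feature {
    my ($features, %opt) = @_;
    my $wanted   = $opt{-attributes} || {};
    my $callback = $opt{-callback};
    my @matched;
    FEATURE: for my $feature (@$features) {
        # Exact string match on every requested attribute (logical AND).
        for my $name (keys %$wanted) {
            my $value = $feature->{$name};
            next FEATURE unless defined $value && $value eq $wanted->{$name};
        }
        if ($callback) {
            # A false return value from the callback interrupts iteration.
            last FEATURE unless $callback->($feature);
        }
        else {
            push @matched, $feature;
        }
    }
    return if $callback;    # undef (empty list) when a callback was given
    return @matched;
}

my @features = (
    { Name => 'f1', Gene => 'abc-1', Note => 'confirmed' },
    { Name => 'f2', Gene => 'abc-1', Note => 'predicted' },
    { Name => 'f3', Gene => 'abc-1', Note => 'confirmed' },
);
my @seen;
each_feature(\@features,
             -attributes => { Gene => 'abc-1', Note => 'confirmed' },
             -callback   => sub { push @seen, $_[0]{Name}; return 0 });
print "visited: @seen\n";    # the callback returned false, so only f1 is visited
```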

=cut

sub features { shift->throw_not_implemented }

=head2 get_feature_by_name

 Title   : get_feature_by_name
 Usage   : $db->get_feature_by_name(-class=>$class,-name=>$name)
 Function: fetch features by their name
 Returns : a list of Bio::SeqFeatureI objects
 Args    : the class and name of the desired feature
 Status  : public

This method can be used to fetch named feature(s) from the database.
The -class and -name arguments have the same meaning as in segment(),
and the method also accepts the following short-cut forms:

  1) one argument: the argument is treated as the feature name
  2) two arguments: the arguments are treated as the class and name
     (note: this uses _rearrange() so the first argument must not
     begin with a hyphen or it will be interpreted as a named
     argument).

This method may return zero, one, or several Bio::SeqFeatureI objects.
The implementor may allow the name to contain wildcards, in which case
standard C-shell glob semantics are expected.

=cut

sub get_feature_by_name {
  shift->throw_not_implemented();
}

=head2 get_feature_by_target

 Title   : get_feature_by_target
 Usage   : $db->get_feature_by_target($class => $name)
 Function: fetch features by their similarity target
 Returns : a list of Bio::SeqFeatureI objects
 Args    : the class and name of the desired feature
 Status  : public

This method can be used to fetch a named feature from the database
based on its similarity hit. The arguments are the same as
get_feature_by_name(). If this is not implemented, the interface
defaults to using get_feature_by_name().

=cut

sub get_feature_by_target {
  shift->get_feature_by_name(@_);
}

=head2 get_feature_by_id

 Title   : get_feature_by_id
 Usage   : $db->get_feature_by_id($id)
 Function: fetch a feature by its ID
 Returns : a Bio::SeqFeatureI object
 Args    : the ID of the feature
 Status  : public

If the database provides unique feature IDs, this can be used to
retrieve a single feature from the database. If not overridden, this
interface calls get_feature_by_name() and returns the first element.

=cut

sub get_feature_by_id {
  (shift->get_feature_by_name(@_))[0];
}

=head2 get_feature_by_attribute

 Title   : get_feature_by_attribute
 Usage   : $db->get_feature_by_attribute(attribute1=>value1,attribute2=>value2)
 Function: fetch features by combinations of attribute values
 Returns : a list of Bio::SeqFeatureI objects
 Args    : a list of attribute=>value pairs
 Status  : public

This method can be used to fetch a set of features from the database.
Attributes are a list of name=E<gt>value pairs. They will be
logically ANDed together. If an attribute value is an array
reference, the list of values in the array is treated as an
alternative set of values to be ORed together.
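
The AND/OR semantics can be sketched as a small predicate. The name
matches_attributes() and the hash-ref features are hypothetical and
serve only to illustrate the rule; they are not part of Bio::DasI.

```perl
use strict;
use warnings;

# Hypothetical predicate (not part of Bio::DasI): true when the feature,
# a plain hash ref here, satisfies every requested attribute. An array
# ref value lists alternatives, any one of which may match.
sub matches_attributes {
    my ($feature, %wanted) = @_;
    for my $name (keys %wanted) {
        my $have = $feature->{$name};
        return 0 unless defined $have;
        my @alternatives = ref($wanted{$name}) eq 'ARRAY'
                         ? @{ $wanted{$name} }
                         : ($wanted{$name});
        # OR across the alternatives, AND across attribute names.
        return 0 unless grep { $_ eq $have } @alternatives;
    }
    return 1;
}

my @features = (
    { Name => 'f1', Gene => 'abc-1', Note => 'confirmed' },
    { Name => 'f2', Gene => 'xyz-2', Note => 'predicted' },
    { Name => 'f3', Gene => 'xyz-2', Note => 'confirmed' },
);
my @hits = grep {
    matches_attributes($_, Gene => ['abc-1', 'xyz-2'], Note => 'confirmed')
} @features;
print join(', ', map { $_->{Name} } @hits), "\n";
```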

=cut

sub get_feature_by_attribute {
  shift->throw_not_implemented();
}


=head2 search_notes

 Title   : search_notes
 Usage   : $db->search_notes($search_term,$max_results)
 Function: full-text search on features, ENSEMBL-style
 Returns : an array of [$name,$description,$score]
 Args    : see below
 Status  : public

This routine performs a full-text search on feature attributes (which
attributes are searched depends on the implementation) and returns a
list of [$name,$description,$score], where $name is the feature ID,
$description is a human-readable description such as a locus line, and
$score is the match strength.

Since this is a decidedly non-standard thing to do (but the generic
genome browser uses it), the default method returns an empty list.
You do not have to implement it.

=cut

sub search_notes { return }

=head2 get_seq_stream

 Title   : get_seq_stream
 Usage   : $seqio = $db->get_seq_stream(@args)
 Function: Performs a query and returns an iterator over it
 Returns : a Bio::SeqIO stream capable of returning Bio::SeqFeatureI objects
 Args    : As in features()
 Status  : public

This routine takes the same arguments as features(), but returns a
Bio::SeqIO::Stream-compliant object. Use it like this:

  $stream = $db->get_seq_stream('exon');
  while (my $exon = $stream->next_seq) {
     print $exon,"\n";
  }

NOTE: In the interface this method is aliased to get_feature_stream(),
as the name is more descriptive.

=cut

sub get_seq_stream { shift->throw_not_implemented }
sub get_feature_stream {shift->get_seq_stream(@_) }

=head2 refclass

 Title   : refclass
 Usage   : $class = $db->refclass
 Function: returns the default class to use for segment() calls
 Returns : a string
 Args    : none
 Status  : public

For data sources which use namespaces to distinguish reference
sequence accessions, this returns the default namespace (or "class")
to use. This interface defines a default of "Accession".

=cut

sub refclass { "Accession" }