sync with trunk to r15684
[bioperl-live.git] / Bio / DB / SeqFeature / Store / FeatureFileLoader.pm
blob999cb6b0a161721e64ee7ff2a1fc1db120d3b6a5
1 package Bio::DB::SeqFeature::Store::FeatureFileLoader;
3 # $Id$
5 =head1 NAME
7 Bio::DB::SeqFeature::Store::FeatureFileLoader -- feature file loader for Bio::DB::SeqFeature::Store
9 =head1 SYNOPSIS
11 use Bio::DB::SeqFeature::Store;
12 use Bio::DB::SeqFeature::Store::FeatureFileLoader;
14 # Open the sequence database
15 my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql',
16 -dsn => 'dbi:mysql:test',
17 -write => 1 );
19 my $loader =
20 Bio::DB::SeqFeature::Store::FeatureFileLoader->new(-store => $db,
21 -verbose => 1,
22 -fast => 1);
24 $loader->load('./my_genome.fff');
27 =head1 DESCRIPTION
29 The Bio::DB::SeqFeature::Store::FeatureFileLoader object parsers
30 FeatureFile-format sequence annotation files and loads
31 Bio::DB::SeqFeature::Store databases. For certain combinations of
32 SeqFeature classes and SeqFeature::Store databases it features a "fast
33 load" mode which will greatly accelerate the loading of databases by a
34 factor of 5-10.
36 FeatureFile Format (.fff) is very simple:
38 mRNA B0511.1 Chr1:1..100 Type=UTR;Note="putative primase"
39 mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS
40 mRNA B0511.1 Chr1:801..1000 Type=UTR
42 reference = Chr3
43 Cosmid B0511 516..619
44 Cosmid B0511 3185..3294
45 Cosmid B0511 10946..11208
46 Cosmid B0511 13126..13511
47 Cosmid B0511 11394..11539
48 EST yk260e10.5 15569..15724
49 EST yk672a12.5 537..618,3187..3294
50 EST yk595e6.5 552..618
51 EST yk595e6.5 3187..3294
52 EST yk846e07.3 11015..11208
53 EST yk53c10
54 yk53c10.3 15000..15500,15700..15800
55 yk53c10.5 18892..19154
56 EST yk53c10.5 16032..16105
57 SwissProt PECANEX 13153-13656 Note="Swedish fish"
58 FGENESH "Predicted gene 1" 1-205,518-616,661-735,3187-3365,3436-3846 "Pfam domain"
59 # file ends
61 There are up to four columns of WHITESPACE (not necessarily tab)
62 delimited text. Embedded whitespace must be escaped using shell
63 escaping rules (quoting the column or backslashing whitespace).
65 Column 1: The feature type. You may use type:subtype as a convention
66 for method:source.
68 Column 2: The feature name/ID.
70 Column 3: The position of this feature in base pair
71 coordinates. Ranges can be given as either
72 start-end or start..end. A chromosome position
73 can be specified using the format "reference:start..end".
74 A discontinuous feature can be specified by giving
75 multiple ranges separated by commas. Minus-strand features
76 are indicated by specifying a start > end.
78 Column 4: Comment/attribute field. A single Note can be given, or
79 a series of attribute=value pairs, separated by
80 spaces or semicolons, as in "score=23;type=transmembrane"
82 =head2 Specifying Positions and Ranges
84 A feature position is specified using a sequence ID (a genbank
85 accession number, a chromosome name, a contig, or any other meaningful
86 reference system, followed by a colon and a position range. Ranges are
87 two integers separated by double dots or the hyphen. Examples:
88 "Chr1:516..11208", "ctgA:1-5000". Negative coordinates are allowed, as
89 in "Chr1:-187..1000".
91 A discontinuous range ("split location") uses commas to separate the
92 ranges. For example:
94 Gene B0511.1 Chr1:516..619,3185..3294,10946..11208
96 In the case of a split location, the sequence id only has to appear in
97 front of the first range.
99 Alternatively, a split location can be indicated by repeating the
100 features type and name on multiple adjacent lines:
102 Gene B0511.1 Chr1:516..619
103 Gene B0511.1 Chr1:3185..3294
104 Gene B0511.1 Chr1:10946..11208
106 If all the locations are on the same reference sequence, you can
107 specify a default chromosome using a "reference=<seqid>":
109 reference=Chr1
110 Gene B0511.1 516..619
111 Gene B0511.1 3185..3294
112 Gene B0511.1 10946..11208
114 The default seqid is in effect until the next "reference" line
115 appears.
117 =head2 Feature Tags
119 Tags can be added to features by adding a fourth column consisting of
120 "tag=value" pairs:
122 Gene B0511.1 Chr1:516..619,3185..3294 Note="Putative primase"
124 Tags and their values take any form you want, and multiple tags can be
125 separated by semicolons. You can also repeat tags multiple times:
127 Gene B0511.1 Chr1:516..619,3185..3294 GO_Term=GO:100;GO_Term=GO:2087
129 Several tags have special meanings:
131 Tag Meaning
132 --- -------
134 Type The primary tag for a subfeature.
135 Score The score of a feature or subfeature.
136 Phase The phase of a feature or subfeature.
137 URL A URL to link to (via the Bio::Graphics library).
138 Note A note to attach to the feature for display by the Bio::Graphics library.
140 For example, in the common case of an mRNA, you can use the "Type" tag
141 to distinguish the parts of the mRNA into UTR and CDS:
143 mRNA B0511.1 Chr1:1..100 Type=UTR
144 mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS
145 mRNA B0511.1 Chr1:801..1000 Type=UTR
147 The top level feature's primary tag will be "mRNA", and its subparts
148 will have types UTR and CDS as indicated. Additional tags that are
149 placed in the first line of the feature will be applied to the top
150 level. In this example, the note "Putative primase" will be applied to
151 the mRNA at the top level of the feature:
153 mRNA B0511.1 Chr1:1..100 Type=UTR;Note="Putative primase"
154 mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS
155 mRNA B0511.1 Chr1:801..1000 Type=UTR
157 =head2 Feature Groups
159 Features can be grouped so that they are rendered by the "group"
160 glyph. To start a group, create a two-column feature entry showing
161 the group type and a name for the group. Follow this with a list of
162 feature entries with a blank type. For example:
164 EST yk53c10
165 yk53c10.3 15000-15500,15700-15800
166 yk53c10.5 18892-19154
168 This example is declaring that the ESTs named yk53c10.3 and yk53c10.5
169 belong to the same group named yk53c10.
171 =head2 Comments and the #include Directive
173 Lines that begin with the # sign are treated as comments and
174 ignored. When a # sign appears within a line, everything to the right
175 of the symbol is also ignored, unless it looks like an HTML fragment or
176 an HTML color, e.g.:
178 # this is ignored
179 [Example]
180 glyph = generic # this comment is ignored
181 bgcolor = #FF0000
182 link = http://www.google.com/search?q=$name#results
184 Be careful, because the processing of # signs uses a regexp heuristic. To be safe,
185 always put a space after the # sign to make sure it is treated as a comment.
187 The special comment "#include 'filename'" acts like the C preprocessor
188 directive and will insert the comments of a named file into the
189 position at which it occurs. Relative paths will be treated relative
190 to the file in which the #include occurs. Nested #include directives
191 are allowed:
193 #include "/usr/local/share/my_directives.txt"
194 #include 'my_directives.txt'
195 #include chromosome3_features.gff3
197 You can enclose the file path in single or double quotes as shown
198 above. If there are no spaces in the filename the quotes are optional.
200 Include file processing is not very smart. Avoid creating circular
201 #include references. You have been warned!
203 =head2 Caveats
205 Note that this loader always creates denormalized features such that
206 subfeatures and their parents are stored as one big database
207 object. The GFF3 format and its loader is usually preferred for both
208 space and execution efficiency.
210 =head1 METHODS
212 =cut
215 use strict;
216 use Carp 'croak';
217 use File::Spec;
218 use Text::ParseWords 'shellwords','quotewords';
220 use base 'Bio::DB::SeqFeature::Store::Loader';
222 =head2 new
224 Title : new
225 Usage : $loader = Bio::DB::SeqFeature::Store::FeatureFileLoader->new(@options)
226 Function: create a new parser
227 Returns : a Bio::DB::SeqFeature::Store::FeatureFileLoader parser and loader
228 Args : several - see below
229 Status : public
231 This method creates a new FeatureFile loader and establishes its connection
232 with a Bio::DB::SeqFeature::Store database. Arguments are -name=E<gt>$value
233 pairs as described in this table:
235 Name Value
236 ---- -----
238 -store A writeable Bio::DB::SeqFeature::Store database handle.
240 -seqfeature_class The name of the type of Bio::SeqFeatureI object to create
241 and store in the database (Bio::DB::SeqFeature by default)
243 -sf_class A shorter alias for -seqfeature_class
245 -verbose Send progress information to standard error.
247 -fast If true, activate fast loading (see below)
249 -chunk_size Set the storage chunk size for nucleotide/protein sequences
250 (default 2000 bytes)
252 -tmp Indicate a temporary directory to use when loading non-normalized
253 features.
255 When you call new(), a connection to a Bio::DB::SeqFeature::Store
256 database should already have been established and the database
257 initialized (if appropriate).
259 Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store
260 databases support a fast loading mode. Currently the only reliable
261 implementation of fast loading is the combination of DBI::mysql with
262 Bio::DB::SeqFeature. The other important restriction on fast loading
263 is the requirement that a feature that contains subfeatures must occur
264 in the FeatureFile file before any of its subfeatures. Otherwise the
265 subfeatures that occurred before the parent feature will not be
266 attached to the parent correctly. This restriction does not apply to
267 normal (slow) loading.
269 If you use an unnormalized feature class, such as
270 Bio::SeqFeature::Generic, then the loader needs to create a temporary
271 database in which to cache features until all their parts and subparts
272 have been seen. This temporary databases uses the "bdb" adaptor. The
273 -tmp option specifies the directory in which that database will be
274 created. If not present, it defaults to the system default tmp
275 directory specified by File::Spec-E<gt>tmpdir().
277 The -chunk_size option allows you to tune the representation of
278 DNA/Protein sequence in the Store database. By default, sequences are
279 split into 2000 base/residue chunks and then reassembled as
280 needed. This avoids the problem of pulling a whole chromosome into
281 memory in order to fetch a short subsequence from somewhere in the
282 middle. Depending on your usage patterns, you may wish to tune this
283 parameter using a chunk size that is larger or smaller than the
284 default.
286 =cut
288 # sub new {} inherited
290 =head2 load
292 Title : load
293 Usage : $count = $loader->load(@ARGV)
294 Function: load the indicated files or filehandles
295 Returns : number of feature lines loaded
296 Args : list of files or filehandles
297 Status : public
299 Once the loader is created, invoke its load() method with a list of
300 FeatureFile or FASTA file paths or previously-opened filehandles in order to
301 load them into the database. Compressed files ending with .gz, .Z and
302 .bz2 are automatically recognized and uncompressed on the fly. Paths
303 beginning with http: or ftp: are treated as URLs and opened using the
304 LWP GET program (which must be on your path).
306 FASTA files are recognized by their initial "E<gt>" character. Do not feed
307 the loader a file that is neither FeatureFile nor FASTA; I don't know what
308 will happen, but it will probably not be what you expect.
310 =cut
312 # sub load {} inherited
314 =head2 accessors
316 The following read-only accessors return values passed or created during new():
318 store() the long-term Bio::DB::SeqFeature::Store object
320 tmp_store() the temporary Bio::DB::SeqFeature::Store object used
321 during loading
323 sfclass() the Bio::SeqFeatureI class
325 fast() whether fast loading is active
327 seq_chunk_size() the sequence chunk size
329 verbose() verbose progress messages
331 =cut
333 # sub store {} inherited
334 # sub tmp_store {} inherited
335 # sub sfclass {} inherited
336 # sub fast {} inherited
337 # sub seq_chunk_size {} inherited
338 # sub verbose {} inherited
340 =head2 default_seqfeature_class
342 $class = $loader->default_seqfeature_class
344 Return the default SeqFeatureI class (Bio::Graphics::Feature).
346 =cut
348 sub default_seqfeature_class { #override
349 my $self = shift;
350 return 'Bio::Graphics::Feature';
354 =head2 load_fh
356 $count = $loader->load_fh($filehandle)
358 Load the FeatureFile data at the other end of the filehandle and return true
359 if successful. Internally, load_fh() invokes:
361 start_load();
362 do_load($filehandle);
363 finish_load();
365 =cut
367 # sub load_fh { } inherited
369 =head2 start_load, finish_load
371 These methods are called at the start and end of a filehandle load.
373 =cut
375 sub create_load_data {
376 my $self = shift;
377 $self->SUPER::create_load_data();
378 $self->{load_data}{mode} = 'fff';
379 $self->{load_data}{CurrentGroup} = undef;
382 sub finish_load {
383 my $self = shift;
384 $self->_store_group;
385 $self->SUPER::finish_load;
388 =head2 load_line
390 $loader->load_line($data);
392 Load a line of a FeatureFile file. You must bracket this with calls to
393 start_load() and finish_load()!
395 $loader->start_load();
396 $loader->load_line($_) while <FH>;
397 $loader->finish_load();
399 =cut
401 sub load_line {
402 my $self = shift;
403 my $line = shift;
405 chomp($line);
406 return unless $line =~ /\S/; # blank line
407 my $load_data = $self->{load_data};
409 $load_data->{mode} = 'fff' if /\s/; # if it has any whitespace in
410 # it, then back to fff mode
412 if ($line =~ /^\#\s?\#\s*([\#]+)/) { ## meta instruction
413 $load_data->{mode} = 'fff';
414 $self->handle_meta($1);
416 } elsif ($line =~ /^\#/) {
417 $load_data->{mode} = 'fff'; # just to be safe
418 return; # comment
421 elsif ($line =~ /^>\s*(\S+)/) { # FASTA lines are coming
422 $load_data->{mode} = 'fasta';
423 $self->start_or_finish_sequence($1);
426 elsif ($load_data->{mode} eq 'fasta') {
427 $self->load_sequence($line);
430 elsif ($load_data->{mode} eq 'fff') {
431 $self->handle_feature($line);
432 if (++$load_data->{count} % 1000 == 0) {
433 my $now = $self->time();
434 my $nl = -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
435 $self->msg(sprintf("%d features loaded in %5.2fs...$nl",
436 $load_data->{count},$now - $load_data->{start_time}));
437 $load_data->{start_time} = $now;
441 else {
442 $self->throw("I don't know what to do with this line:\n$line");
447 =head2 handle_meta
449 $loader->handle_meta($meta_directive)
451 This method is called to handle meta-directives such as
452 ##sequence-region. The method will receive the directive with the
453 initial ## stripped off.
455 =cut
457 # sub handle_meta { } inherited
459 =head2 handle_feature
461 $loader->handle_feature($gff3_line)
463 This method is called to process a single FeatureFile line. It manipulates
464 information stored a data structure called $self-E<gt>{load_data}.
466 =cut
468 sub handle_feature {
469 my $self = shift;
470 local $_ = shift;
472 my $ld = $self->{load_data};
474 # handle reference line
475 if (/^reference\s*=\s*(.+)/) {
476 $ld->{reference} = $1;
477 return;
480 # parse data lines
481 my @tokens = quotewords('\s+',1,$_);
482 for (0..2) { # remove quotes from everything but last column
483 next unless defined $tokens[$_];
484 $tokens[$_] =~ s/^"//;
485 $tokens[$_] =~ s/"$//;
488 if (@tokens < 3) { # short line; assume a group identifier
489 $self->store_current_feature();
490 my $type = shift @tokens;
491 my $name = shift @tokens;
492 $ld->{CurrentGroup} = $self->_make_indexed_feature($name,$type,'',{_ff_group=>1});
493 $self->_indexit($name => 1);
494 return;
497 my($type,$name,$strand,$bounds,$attributes);
499 if ($tokens[2] =~ /^([+-.]|[+-]?[01])$/) { # old version
500 ($type,$name,$strand,$bounds,$attributes) = @tokens;
501 } else { # new version
502 ($type,$name,$bounds,$attributes) = @tokens;
505 # handle case of there only being one value in the last column,
506 # in which case we treat it the same as Note="value"
507 my $attr = $self->parse_attributes($attributes);
509 # @parts is an array of ([ref,start,end],[ref,start,end],...)
510 my @parts = map { [/(?:(\w+):)?(-?\d+)(?:-|\.\.)(-?\d+)/]} split /(?:,| )\s*/,$bounds;
512 # deal with groups -- a group is ending if $type is defined
513 # and CurrentGroup is set
514 if ($type && $ld->{CurrentGroup}) {
515 $self->_store_group();
518 $type = '' unless defined $type;
519 $name = '' unless defined $name;
520 $type ||= $ld->{CurrentGroup}->primary_tag if $ld->{CurrentGroup};
522 my $reference = $ld->{reference} || 'ChrUN';
523 foreach (@parts) {
524 if (defined $_ && ref($_) eq 'ARRAY'
525 && defined $_->[1]
526 && defined $_->[2])
528 $strand ||= $_->[1] <= $_->[2] ? '+' : '-';
529 ($_->[1],$_->[2]) = ($_->[2],$_->[1]) if $_->[1] > $_->[2];
531 $reference = $_->[0] if defined $_->[0];
532 $_ = [@{$_}[1,2]]; # strip off the reference.
535 # now @parts is an array of [start,end] and $reference contains the seqid
537 # apply coordinate mapper
538 if ($self->{coordinate_mapper} && $reference) {
539 my @remapped = $self->{coordinate_mapper}->($reference,@parts);
540 ($reference,@parts) = @remapped if @remapped;
543 # either create a new feature or add a segment to it
544 my $feature = $ld->{CurrentFeature};
546 $ld->{OldPartType} = $ld->{PartType};
547 $ld->{PartType} = $attr->{Type}[0] if exists $attr->{Type};
548 $ld->{PartType} = $attr->{type}[0] if exists $attr->{type};
549 $ld->{PartType} ||= $type;
551 if ($feature) {
552 local $^W = 0; # avoid uninit warning when display_name() is called
554 # if this is a different feature from what we have now, then we
555 # store the current one, and create a new one
556 if ($feature->display_name ne $name ||
557 $feature->method ne $type) {
558 $self->store_current_feature; # new feature, store old one
559 undef $feature;
560 } else { # create a new multipart feature
561 $self->_multilevel_feature($feature,$ld->{OldPartType})
562 unless $feature->get_SeqFeatures;
563 my $part = $self->_make_feature($name,
564 $ld->{PartType},
565 $strand,
566 $attr,
567 $reference,
568 @{$parts[0]});
569 $feature->add_SeqFeature($part);
573 $feature ||= $self->_make_indexed_feature($name,
574 $type, # side effect is to set CurrentFeature
575 $strand,
576 $attr,
577 $reference,
578 @{$parts[0]});
580 # add more segments to the current feature
581 if (@parts > 1) {
582 for my $part (@parts) {
583 $type ||= $feature->primary_tag;
584 my $sp = $self->_make_feature($name,
585 $ld->{PartType},
586 $strand,
587 $attr,
588 $reference,
589 @{$part});
590 $feature->add_SeqFeature($sp);
595 sub _multilevel_feature { # turn a single-level feature into a multilevel one
596 my $self = shift;
597 my $f = shift;
598 my $type = shift;
599 my %attributes = $f->attributes;
600 $attributes{Score} = [$f->score] if defined $f->score;
601 $attributes{Phase} = [$f->phase] if defined $f->phase;
602 my @args = ($f->display_name,
603 $type||$f->type,
604 $f->strand,
605 \%attributes,
606 $f->seq_id,
607 $f->start,
608 $f->end);
609 my $subpart = $self->_make_feature(@args);
610 $f->add_SeqFeature($subpart);
613 sub _make_indexed_feature {
614 my $self = shift;
615 my $f = $self->_make_feature(@_);
616 my $name = $f->display_name;
617 $self->{load_data}{CurrentFeature} = $f;
618 $self->{load_data}{CurrentID} = $name;
619 $self->_indexit($name => 1);
620 return $f;
623 sub _make_feature {
624 my $self = shift;
625 my ($name,$type,$strand,$attributes,$ref,$start,$end) = @_;
627 # some basic error checking
628 $self->throw("invalid feature line: $_")
629 if ($ref && !defined $start)
630 or ($ref && !defined $end)
631 or ($start && $start !~ /^[-\d]+$/)
632 or ($end && $end !~ /^[-\d]+$/)
633 or !defined $type
634 or !defined $name;
636 $strand ||= '';
638 my @args = (-name => $name,
639 -strand => $strand eq '+' ? 1
640 :$strand eq '-' ? -1
641 :$strand eq '' ? 0
642 :$strand eq '.' ? 0
643 :$strand == 1 ? 1
644 :$strand == -1 ? -1
646 -attributes => $attributes,
649 if (my ($method,$source) = $type =~ /(\S+):(\S+)/) {
650 push @args,(-primary_tag => $method,
651 -source => $source);
652 } else {
653 push @args,(-primary_tag => $type);
656 push @args,(-seq_id => $ref) if defined $ref;
657 push @args,(-start => $start) if defined $start;
658 push @args,(-end => $end) if defined $end;
660 # pull out special attributes
661 if (my $score = $attributes->{Score} || $attributes->{score}) {
662 push @args,(-score => $score->[0]);
663 delete $attributes->{$_} foreach qw(Score score);
666 if (my $note = $attributes->{Note} || $attributes->{note}) {
667 push @args,(-desc => join '; ',@$note);
668 delete $attributes->{$_} foreach qw(Note note);
671 if (my $url = $attributes->{url} || $attributes->{Url}) {
672 push @args,(-url => $url->[0]);
673 delete $attributes->{$_} foreach qw (Url url);
676 if (my $phase = $attributes->{phase} || $attributes->{Phase}) {
677 push @args,(-phase => $phase->[0]);
678 delete $attributes->{$_} foreach qw (Phase phase);
681 $self->_indexit($name=>1)
682 if $self->index_subfeatures && $name;
684 return $self->sfclass->new(@args);
687 =head2 store_current_feature
689 $loader->store_current_feature()
691 This method is called to store the currently active feature in the
692 database. It uses a data structure stored in $self-E<gt>{load_data}.
694 =cut
696 sub store_current_feature { # overridden
697 my $self = shift;
699 # handle open groups
700 # if there is an open group, then we simply add the current
701 # feature to the group.
702 my $ld = $self->{load_data};
703 if ($ld->{CurrentGroup} && $ld->{CurrentFeature}) {
704 $ld->{CurrentGroup}->add_SeqFeature($ld->{CurrentFeature})
705 unless $ld->{CurrentGroup} eq $ld->{CurrentFeature}; # paranoia - shouldn't happen
706 return;
708 else {
709 $self->SUPER::store_current_feature();
713 sub _store_group {
714 my $self = shift;
715 my $ld = $self->{load_data};
716 my $group = $ld->{CurrentGroup} or return;
717 # if there is an unattached feature, then add it
718 $self->store_current_feature() if $ld->{CurrentFeature};
719 $ld->{CurrentFeature} = $group;
720 $ld->{CurrentID} = $group->display_name;
721 $self->_indexit($ld->{CurrentID} => 1);
722 undef $ld->{CurrentGroup};
723 $self->store_current_feature();
726 =head2 build_object_tree
728 $loader->build_object_tree()
730 This method gathers together features and subfeatures and builds the
731 graph that connects them.
733 =cut
736 # put objects together
738 sub build_object_tree {
739 croak "We shouldn't be building an object tree in the FeatureFileLoader";
742 =head2 build_object_tree_in_tables
744 $loader->build_object_tree_in_tables()
746 This method gathers together features and subfeatures and builds the
747 graph that connects them, assuming that parent/child relationships
748 will be stored in a database table.
750 =cut
752 sub build_object_tree_in_tables {
753 croak "We shouldn't be building an object tree in the FeatureFileLoader";
756 =head2 build_object_tree_in_features
758 $loader->build_object_tree_in_features()
760 This method gathers together features and subfeatures and builds the
761 graph that connects them, assuming that parent/child relationships are
762 stored in the seqfeature objects themselves.
764 =cut
766 sub build_object_tree_in_features {
767 croak "We shouldn't be building an object tree in the FeatureFileLoader";
770 =head2 attach_children
772 $loader->attach_children($store,$load_data,$load_id,$feature)
774 This recursively adds children to features and their subfeatures. It
775 is called when subfeatures are directly contained within other
776 features, rather than stored in a relational table.
778 =cut
780 sub attach_children {
781 croak "We shouldn't be attaching children in the
782 FeatureFileLoader!";
785 =head2 parse_attributes
787 @attributes = $loader->parse_attributes($attribute_line)
789 This method parses the information contained in the $attribute_line
790 into a flattened hash (array). It may return one element, in which case it is
791 an implicit
793 =cut
795 sub parse_attributes {
796 my $self = shift;
797 my $att = shift;
799 $att ||= ''; # to prevent uninit variable warnings from quotewords()
801 my @pairs = quotewords('[;\s]',1,$att);
802 my %attributes;
803 for my $pair (@pairs) {
804 unless ($pair =~ /=/) {
805 push @{$attributes{Note}},(quotewords('',0,$pair))[0] || $pair;
806 } else {
807 my ($tag,$value) = quotewords('\s*=\s*',0,$pair);
808 $tag = 'Note' if $tag eq 'description';
809 push @{$attributes{$tag}},$value;
812 return \%attributes;
815 =head2 start_or_finish_sequence
817 $loader->start_or_finish_sequence('Chr9')
819 This method is called at the beginning and end of a fasta section.
821 =cut
826 __END__
829 =head1 BUGS
831 This is an early version, so there are certainly some bugs. Please
832 use the BioPerl bug tracking system to report bugs.
834 =head1 SEE ALSO
836 L<bioperl>,
837 L<Bio::DB::SeqFeature::Store>,
838 L<Bio::DB::SeqFeature::Segment>,
839 L<Bio::DB::SeqFeature::NormalizedFeature>,
840 L<Bio::DB::SeqFeature::GFF3Loader>,
841 L<Bio::DB::SeqFeature::Store::DBI::mysql>,
842 L<Bio::DB::SeqFeature::Store::bdb>
844 =head1 AUTHOR
846 Lincoln Stein E<lt>lstein@cshl.orgE<gt>.
848 Copyright (c) 2006 Cold Spring Harbor Laboratory.
850 This library is free software; you can redistribute it and/or modify
851 it under the same terms as Perl itself.
853 =cut