package Bio::DB::SeqFeature::Store::GFF3Loader;

Bio::DB::SeqFeature::Store::GFF3Loader -- GFF3 file loader for Bio::DB::SeqFeature::Store

  use Bio::DB::SeqFeature::Store;
  use Bio::DB::SeqFeature::Store::GFF3Loader;

  # Open the sequence database (it must be writable)
  my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
                                           -dsn     => 'dbi:mysql:test',
                                           -write   => 1);

  my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $db);

  $loader->load('./my_genome.gff3');

The Bio::DB::SeqFeature::Store::GFF3Loader object parses GFF3-format
sequence annotation files and loads Bio::DB::SeqFeature::Store
databases. For certain combinations of SeqFeature classes and
SeqFeature::Store databases it features a "fast load" mode which
accelerates the loading of GFF3 files by a factor of 5-10.

The GFF3 file format has been extended very slightly to accommodate
Bio::DB::SeqFeature::Store. First, the loader recognizes a new
directive:

  # #index-subfeatures [0|1]

Note that you can place a space between the two #'s in order to
prevent GFF3 validators from complaining.

If this is true, then subfeatures are indexed (the default) so that
they can be retrieved with a query. See L<Bio::DB::SeqFeature::Store>
for an explanation of this. If false, then subfeatures can only be
accessed through their parent feature.
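
As a rough illustration of how such a directive can be recognized, the sketch below reuses the two patterns that appear in load_line() and handle_meta() further down in this file. The helper name index_subfeatures_setting() is hypothetical and is not part of the loader's API.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of an index-subfeatures directive,
# or undef if the line is not such a directive.
sub index_subfeatures_setting {
    my $line = shift;

    # "##" or "# #" introduces a meta directive
    return unless $line =~ /^\#\s?\#\s*(.+)/;
    my $directive = $1;

    return unless $directive =~ /index-subfeatures\s+(\S+)/i;
    return $1;
}

for my $line ('##index-subfeatures 0', '# #index-subfeatures 1', '# just a comment') {
    my $setting = index_subfeatures_setting($line);
    printf "%-26s => %s\n", $line, defined $setting ? $setting : 'not a directive';
}
```

Both spellings of the directive yield the flag value, while an ordinary comment is left alone.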

Second, the loader recognizes a new attribute tag called index, which
if present, controls indexing of the current feature. Example:

 ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;index=1

You can use this to turn indexing on and off, overriding the default
for a particular feature.

Note that the loader keeps a record -- in memory -- of each feature
that it has processed. If you find the loader running out of memory on
particularly large GFF3 files, please split the input file into
smaller pieces and do the load in steps.

# load utility - incrementally load the store based on GFF3 file
# slow mode -- features can occur in any order in the GFF3 file
# fast mode -- all features with same ID must be contiguous in GFF3 file

use Bio::DB::GFF::Util::Rearrange;
use Bio::DB::SeqFeature::Store::LoadHelper;
use constant DEBUG => 0;

use base 'Bio::DB::SeqFeature::Store::Loader';

my %Special_attributes =(
                         Gap    => 1, Target => 1,
                         Parent => 1, Name   => 1,
                         index  => 1, Index  => 1,
my %Strandedness = ( '+'  => 1,

 Usage   : $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(@options)
 Function: create a new parser
 Returns : a Bio::DB::SeqFeature::Store::GFF3Loader gff3 parser and loader
 Args    : several - see below

This method creates a new GFF3 loader and establishes its connection
with a Bio::DB::SeqFeature::Store database. Arguments are -name=E<gt>$value
pairs as described in this table:

 -store             A writable Bio::DB::SeqFeature::Store database handle.

 -seqfeature_class  The name of the type of Bio::SeqFeatureI object to create
                    and store in the database (Bio::DB::SeqFeature by default)

 -sf_class          A shorter alias for -seqfeature_class

 -verbose           Send progress information to standard error.

 -fast              If true, activate fast loading (see below)

 -chunk_size        Set the storage chunk size for nucleotide/protein sequences

 -tmp               Indicate a temporary directory to use when loading non-normalized
                    features (defaults to the system temporary directory)

 -ignore_seqregion  Ignore ##sequence-region directives. The default is to create a
                    feature corresponding to the directive.

 -noalias_target    Don't create an Alias attribute for a target_id named in a
                    Target attribute. The default is to create an Alias
                    attribute containing the target_id found in a Target
                    attribute.

When you call new(), a connection to a Bio::DB::SeqFeature::Store
database should already have been established and the database
initialized (if appropriate).

Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store
databases support a fast loading mode. Currently the only reliable
implementation of fast loading is the combination of DBI::mysql with
Bio::DB::SeqFeature. The other important restriction on fast loading
is the requirement that a feature that contains subfeatures must occur
in the GFF3 file before any of its subfeatures. Otherwise the
subfeatures that occurred before the parent feature will not be
attached to the parent correctly. This restriction does not apply to
normal (slow) loading.
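
Before choosing fast loading, it may help to check that every parent precedes its children. The following is a minimal sketch of such a check in plain Perl. The function parent_order_problems() is hypothetical and not part of this module, and it looks only at the ID and Parent attributes of tab-delimited lines.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check: report each Parent reference that points
# at an ID which has not been seen yet. An empty list means the ordering
# satisfies the parent-before-child requirement.
sub parent_order_problems {
    my @lines = @_;
    my (%seen, @problems);
    my $n = 0;
    for my $line (@lines) {
        $n++;
        next if $line =~ /^\s*(?:\#|$)/;   # comments, directives, blank lines
        last if $line =~ /^>/;             # a FASTA section has started
        my @cols = split /\t/, $line;
        next unless defined $cols[8];

        my %attr;
        for my $pair (split /;/, $cols[8]) {
            my ($tag, $value) = split /=/, $pair, 2;
            $attr{$tag} = $value if defined $tag && defined $value;
        }
        if (defined $attr{Parent}) {
            for my $parent (split /,/, $attr{Parent}) {
                push @problems, "line $n: Parent=$parent appears before ID=$parent"
                    unless $seen{$parent};
            }
        }
        $seen{ $attr{ID} } = 1 if defined $attr{ID};
    }
    return @problems;
}

my @gff = (
    join("\t", qw(ctg123 . exon 1300 1500 . + .), 'ID=exon1;Parent=mrna1'),
    join("\t", qw(ctg123 . mRNA 1300 9000 . + .), 'ID=mrna1'),
);
print "$_\n" for parent_order_problems(@gff);
```

Here the exon is listed before its mRNA, so one problem is reported. Swapping the two lines produces no output.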

If you use an unnormalized feature class, such as
Bio::SeqFeature::Generic, then the loader needs to create a temporary
database in which to cache features until all their parts and subparts
have been seen. This temporary database uses the "berkeleydb"
adaptor. The -tmp option specifies the directory in which that
database will be created. If not present, it defaults to the system
default tmp directory specified by File::Spec-E<gt>tmpdir().

The -chunk_size option allows you to tune the representation of
DNA/Protein sequence in the Store database. By default, sequences are
split into 2000 base/residue chunks and then reassembled as
needed. This avoids the problem of pulling a whole chromosome into
memory in order to fetch a short subsequence from somewhere in the
middle. Depending on your usage patterns, you may wish to tune this
parameter using a chunk size that is larger or smaller than the
default.
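
To make the chunking idea concrete, here is a small self-contained sketch. It is not the Store's actual storage code. The functions chunk_sequence() and fetch_subseq() are hypothetical, and an in-memory hash stands in for the database. It shows why a short subsequence can be fetched by touching only the chunks that overlap it.

```perl
use strict;
use warnings;

# Split a sequence into fixed-size chunks keyed by their 0-based offset.
sub chunk_sequence {
    my ($seq, $chunk_size) = @_;
    my %chunks;
    for (my $offset = 0; $offset < length($seq); $offset += $chunk_size) {
        $chunks{$offset} = substr($seq, $offset, $chunk_size);
    }
    return \%chunks;
}

# Fetch a 1-based, inclusive subsequence, reading only overlapping chunks.
sub fetch_subseq {
    my ($chunks, $chunk_size, $start, $end) = @_;
    my $first = int(($start - 1) / $chunk_size) * $chunk_size;
    my $last  = int(($end   - 1) / $chunk_size) * $chunk_size;
    my $joined = '';
    for (my $offset = $first; $offset <= $last; $offset += $chunk_size) {
        $joined .= $chunks->{$offset} // '';
    }
    return substr($joined, $start - 1 - $first, $end - $start + 1);
}

my $chunks = chunk_sequence('ACGTACGTAC', 4);   # chunks: ACGT, ACGT, AC
print fetch_subseq($chunks, 4, 3, 6), "\n";     # bases 3..6
```

A smaller chunk size means less data is read per short query but more chunks are needed for long ones, which is the trade-off the option lets you tune.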

  my $self  = $class->SUPER::new(@_);
  my ($ignore_seqregion) = rearrange(['IGNORE_SEQREGION'],@_);
  $self->ignore_seqregion($ignore_seqregion);
  my ($noalias_target) = rearrange(['NOALIAS_TARGET'],@_);
  $self->noalias_target($noalias_target);

=head2 ignore_seqregion

  $ignore_it = $loader->ignore_seqregion([$new_flag])

Get or set the ignore_seqregion flag, which if true, will cause
GFF3 ##sequence-region directives to be ignored. The default behavior
is to create a feature corresponding to the region.

sub ignore_seqregion {
    my $d    = $self->{ignore_seqregion};
    $self->{ignore_seqregion} = shift if @_;

=head2 noalias_target

  $noalias_target = $loader->noalias_target([$new_flag])

Get or set the noalias_target flag, which if true, will disable the creation of
an Alias attribute for a target_id named in a Target attribute. The default is
to create an Alias attribute containing the target_id found in a Target
attribute.

    my $d    = $self->{noalias_target};
    $self->{noalias_target} = shift if @_;

 Usage   : $count = $loader->load(@ARGV)
 Function: load the indicated files or filehandles
 Returns : number of feature lines loaded
 Args    : list of files or filehandles

Once the loader is created, invoke its load() method with a list of
GFF3 or FASTA file paths or previously-opened filehandles in order to
load them into the database. Compressed files ending with .gz, .Z and
.bz2 are automatically recognized and uncompressed on the fly. Paths
beginning with http: or ftp: are treated as URLs and opened using the
LWP GET program (which must be on your path).

FASTA files are recognized by their initial "E<gt>" character. Do not feed
the loader a file that is neither GFF3 nor FASTA; I don't know what
will happen, but it will probably not be what you expect.

# sub load { } inherited

The following read-only accessors return values passed or created during new():

 store()          the long-term Bio::DB::SeqFeature::Store object

 tmp_store()      the temporary Bio::DB::SeqFeature::Store object used
                  during loading

 sfclass()        the Bio::SeqFeatureI class

 fast()           whether fast loading is active

 seq_chunk_size() the sequence chunk size

 verbose()        verbose progress messages

# sub store          inherited
# sub tmp_store      inherited
# sub sfclass        inherited
# sub seq_chunk_size inherited
# sub verbose        inherited

=head2 Internal Methods

The following methods are used internally and may be overridden by
subclasses.

=item default_seqfeature_class

  $class = $loader->default_seqfeature_class

Return the default SeqFeatureI class (Bio::DB::SeqFeature).

# sub default_seqfeature_class { } inherited

=item subfeatures_normalized

  $flag = $loader->subfeatures_normalized([$new_flag])

Get or set a flag that indicates that the subfeatures are
normalized. This is deduced from the SeqFeature class information.

# sub subfeatures_normalized { } inherited

=item subfeatures_in_table

  $flag = $loader->subfeatures_in_table([$new_flag])

Get or set a flag that indicates that feature/subfeature relationships
are stored in a table. This is deduced from the SeqFeature class and
Store information.

# sub subfeatures_in_table { } inherited

  $count = $loader->load_fh($filehandle)

Load the GFF3 data at the other end of the filehandle and return true
if successful. Internally, load_fh() invokes:

  do_load($filehandle);

# sub load_fh { } inherited

=item start_load, finish_load

These methods are called at the start and end of a filehandle load.

sub create_load_data { #overridden
  $self->SUPER::create_load_data;
  $self->{load_data}{TemporaryID}      = "GFFLoad0000000";
  $self->{load_data}{IndexSubfeatures} = $self->index_subfeatures();
  $self->{load_data}{mode}             = 'gff';

  $self->{load_data}{Helper}           =
    Bio::DB::SeqFeature::Store::LoadHelper->new($self->{tmpdir});

sub finish_load { #overridden
  $self->store_current_feature();      # during fast loading, we will have a feature left at the very end
  $self->start_or_finish_sequence();   # finish any half-loaded sequences

  $self->msg("Building object tree...");
  my $start = $self->time();
  $self->build_object_tree;
  $self->msg(sprintf "%5.2fs\n",$self->time()-$start);

    $self->msg("Loading bulk data into database...");
    $start = $self->time();
    $self->store->finish_bulk_update;
    $self->msg(sprintf "%5.2fs\n",$self->time()-$start);

  eval {$self->store->commit};

  # don't delete load data so that caller can ask for the loaded IDs
  # $self->delete_load_data;

  $count = $loader->do_load($fh)

This is called by load_fh() to load the GFF3 file's filehandle and
return the number of lines loaded.

# sub do_load { } inherited

    $loader->load_line($data);

Load a line of a GFF3 file. You must bracket this with calls to
start_load() and finish_load()!

    $loader->start_load();
    $loader->load_line($_) while <FH>;
    $loader->finish_load();
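
The way load_line() decides what to do with each line can be summarized by the stand-alone sketch below. It mirrors the regular expressions used in the method body that follows but performs no loading. The function classify_line() is a hypothetical illustration, not part of the loader.

```perl
use strict;
use warnings;

# Classify one input line given the current mode ('gff' or 'fasta').
# Returns the kind of line and the mode to use for the next line.
sub classify_line {
    my ($line, $mode) = @_;
    return ('blank', $mode) unless $line =~ /^\S/;

    # a tab, or a chrom.sizes-style "name length" line, forces GFF mode
    $mode = 'gff' if $line =~ /\t/ or $line =~ /^\w+\s+\d+\s*$/;

    return ('meta',         'gff')   if $line =~ /^\#\s?\#\s*(.+)/;
    return ('comment',      'gff')   if $line =~ /^\#/;
    return ('fasta_header', 'fasta') if $line =~ /^>\s*(\S+)/;
    return ('sequence',     'fasta') if $mode eq 'fasta';
    return ('feature',      'gff');
}

my $mode = 'gff';
for my $line ("##gff-version 3",
              "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene1",
              ">ctg123",
              "gattaca") {
    (my $kind, $mode) = classify_line($line, $mode);
    print "$kind\n";
}
```

The mode is sticky: once a FASTA header has been seen, untabbed lines are treated as sequence until a tab, a comment, or a directive switches the mode back to GFF.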

sub load_line { #overridden
    my $load_data = $self->{load_data};
    $load_data->{line}++;

    return unless $line =~ /^\S/;     # blank line

    # if it has a tab in it or looks like a chrom.sizes file, switch to gff mode
    $load_data->{mode} = 'gff' if $line =~ /\t/
        or $line =~ /^\w+\s+\d+\s*$/;

    if ($line =~ /^\#\s?\#\s*(.+)/) {  ## meta instruction
      $load_data->{mode} = 'gff';
      $self->handle_meta($1);

    } elsif ($line =~ /^\#/) {
      $load_data->{mode} = 'gff';  # just to be safe

    elsif ($line =~ /^>\s*(\S+)/) { # FASTA lines are coming
      $load_data->{mode} = 'fasta';
      $self->start_or_finish_sequence($1);

    elsif ($load_data->{mode} eq 'fasta') {
      $self->load_sequence($line);

    elsif ($load_data->{mode} eq 'gff') {
      $self->handle_feature($line);
      if (++$load_data->{count} % 1000 == 0) {
        my $now = $self->time();
        my $nl = -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
        local $^W = 0; # kill uninit variable warning
        $self->msg(sprintf("%d features loaded in %5.2fs (%5.2fs/1000 features)...%s$nl",
                           $load_data->{count},$now - $load_data->{start_time},
                           $now - $load_data->{millenium_time},
        $load_data->{millenium_time} = $now;

      $self->throw("I don't know what to do with this line:\n$line");

  $loader->handle_meta($meta_directive)

This method is called to handle meta-directives such as
##sequence-region. The method will receive the directive with the
initial ## stripped off.

  my $instruction = shift;

  if ( $instruction =~ /^#$/ ) {
    $self->store_current_feature() ;      # during fast loading, we will have a feature left at the very end
    $self->start_or_finish_sequence();    # finish any half-loaded sequences
    if ( $self->store->can('handle_resolution_meta') ) {
      $self->store->handle_resolution_meta($instruction);

  if ($instruction =~ /sequence-region\s+(.+)\s+(-?\d+)\s+(-?\d+)/i
      && !$self->ignore_seqregion()) {
    my($ref,$start,$end,$strand) = $self->_remap($1,$2,$3,+1);
    my $feature = $self->sfclass->new(-name        => $ref,
                                      -primary_tag => 'region');
    $self->store->store($feature);

  if ($instruction =~/index-subfeatures\s+(\S+)/i) {
    $self->{load_data}{IndexSubfeatures} = $1;
    $self->store->index_subfeatures($1);

  if ( $self->store->can('handle_unrecognized_meta') ) {
    $self->store->handle_unrecognized_meta($instruction);

  $loader->handle_feature($gff3_line)

This method is called to process a single GFF3 line. It manipulates
information stored in a data structure called $self-E<gt>{load_data}.
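
The first step of that processing, splitting a line into its nine columns and turning "." placeholders into undef, can be sketched on its own as follows. The function split_gff3_line() and the hash keys it returns are hypothetical. The real method goes on to validate coordinates, parse attributes and build feature objects.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited GFF3 line into named columns.
# A "." in any column is treated as "no value" and becomes undef.
sub split_gff3_line {
    my $line = shift;
    chomp $line;
    my @columns = map { $_ eq '.' ? undef : $_ } split /\t/, $line;
    die "invalid GFF line: $line\n" if @columns < 4;

    my %feature;
    @feature{qw(refname source method start end score strand phase attributes)} = @columns;
    return \%feature;
}

my $f = split_gff3_line("ctg123\t.\tTF_binding_site\t1000\t1012\t.\t+\t.\tID=tfbs00001;index=1\n");
printf "%s %s %d-%d source=%s\n",
    $f->{refname}, $f->{method}, $f->{start}, $f->{end},
    defined $f->{source} ? $f->{source} : '(none)';
```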

sub handle_feature { #overridden
  my $gff_line = shift;
  my $ld       = $self->{load_data};

  my $allow_whitespace = $self->allow_whitespace;

  # special case for a chrom.sizes-style line
  if ($gff_line =~ /^(\w+)\s+(\d+)\s*$/) {
    @columns = ($1,undef,'chromosome',1,$2,undef,undef,undef,"Name=$1");
    $gff_line =~ s/\s+/\t/g if $allow_whitespace;
    @columns = map {$_ eq '.' ? undef : $_ } split /\t/,$gff_line;

  $self->invalid_gff($gff_line) if @columns < 4;
  $self->invalid_gff($gff_line) if @columns > 9 && $allow_whitespace;

  if (@columns > 9) { #oops, split too much due to whitespace
    $columns[8] = join(' ',@columns[8..$#columns]);

  my ($refname,$source,$method,$start,$end,$score,$strand,$phase,$attributes) = @columns;

  $self->invalid_gff($gff_line) unless defined $refname;
  $self->invalid_gff($gff_line) unless !defined $start || $start =~ /^[\d.-]+$/;
  $self->invalid_gff($gff_line) unless !defined $end   || $end   =~ /^[\d.-]+$/;
  $self->invalid_gff($gff_line) unless defined $method;

  $strand = $Strandedness{$strand||0};
  my ($reserved,$unreserved) = $attributes ? $self->parse_attributes($attributes) : ();

  my $name        = ($reserved->{Name} && $reserved->{Name}[0]);

  my $has_loadid  = defined $reserved->{ID}[0];

  my $feature_id  = defined $reserved->{ID}[0] ? $reserved->{ID}[0] : $ld->{TemporaryID}++;
  my @parent_ids  = @{$reserved->{Parent}} if defined $reserved->{Parent};

  my $index_it = $ld->{IndexSubfeatures};
  if (exists $reserved->{Index} || exists $reserved->{index}) {
    $index_it = $reserved->{Index}[0] || $reserved->{index}[0];

  # Everything in the unreserved hash becomes an attribute, so we copy
  # some attributes over
  $unreserved->{Note}    = $reserved->{Note}   if exists $reserved->{Note};
  $unreserved->{Alias}   = $reserved->{Alias}  if exists $reserved->{Alias};
  $unreserved->{Target}  = $reserved->{Target} if exists $reserved->{Target};
  $unreserved->{Gap}     = $reserved->{Gap}    if exists $reserved->{Gap};
  $unreserved->{load_id} = $reserved->{ID}     if exists $reserved->{ID};

  # mec@stowers-institute.org, wondering why not all attributes are
  # carried forward, adds ID tag in particular service of
  # round-tripping ID, which, though present in database as load_id
  # attribute, was getting lost as itself
  # $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID};

  # TEMPORARY HACKS TO SIMPLIFY DEBUGGING
  $feature_id = '' unless defined $feature_id;
  $name       = '' unless defined $name;  # prevent uninit variable warnings
  # push @{$unreserved->{Alias}},$feature_id if $has_loadid && $feature_id ne $name;
  $unreserved->{parent_id} = \@parent_ids if DEBUG && @parent_ids;

  # POSSIBLY A PERMANENT HACK -- TARGETS BECOME ALIASES
  # THIS IS TO ALLOW FOR TARGET-BASED LOOKUPS
  if (exists $reserved->{Target} && !$self->{noalias_target}) {
    my %aliases = map {$_=>1} @{$unreserved->{Alias}};
    for my $t (@{$reserved->{Target}}) {
      (my $tc = $t) =~ s/\s+.*$//;  # get rid of coordinates
      push @{$unreserved->{Alias}},$tc unless $name eq $tc || $aliases{$tc};

  ($refname,$start,$end,$strand) = $self->_remap($refname,$start,$end,$strand) or return;

  my @args = (-display_name => $name,
              -strand       => $strand || 0,
              -primary_tag  => $method || 'feature',
              -attributes   => $unreserved,

  # Here's where we handle feature lines that have the same ID (multiple locations, not
  # parent/child relationships)

  # Current feature is the same as the previous feature, which hasn't yet been loaded
  if (defined $ld->{CurrentID} && $ld->{CurrentID} eq $feature_id) {
    $old_feat = $ld->{CurrentFeature};

  # Current feature is the same as a feature that was loaded earlier
  elsif (defined(my $id = $self->{load_data}{Helper}->local2global($feature_id))) {
    $old_feat = $self->fetch($feature_id)
      or $self->warn(<<END);
ID=$feature_id has been used more than once, but it cannot be found in the database.
This can happen if you have specified fast loading, but features sharing the same ID
are not contiguous in the GFF file. This will be loaded as a separate feature.

  # contiguous feature, so add a segment
  warn $old_feat if defined $old_feat and !ref $old_feat;
  if (defined $old_feat) {
    # set this to 1 to disable split-location behavior
    if (0 && @parent_ids) {                # If multiple features are held together by the same ID
      $feature_id = $ld->{TemporaryID}++;  # AND they have a Parent attribute, this causes an undesirable
    }                                      # additional layer of aggregation. Changing the ID fixes this.
           $old_feat->seq_id ne $refname ||
           $old_feat->start  != $start ||
           $old_feat->end    != $end # make sure endpoints are distinct
      $self->add_segment($old_feat,$self->sfclass->new(@args));

  # we get here if this is a new feature
  # first of all, store the current feature if it is there
  $self->store_current_feature() if defined $ld->{CurrentID};

  # now create the new feature
  # (index top-level features only if policy asks us to)
  my $feature = $self->sfclass->new(@args);
  $feature->object_store($self->store) if $feature->can('object_store');  # for lazy table features
  $ld->{CurrentFeature} = $feature;
  $ld->{CurrentID}      = $feature_id;

  my $top_level = !@parent_ids;
  my $has_id    = defined $reserved->{ID}[0];
  $index_it   ||= $top_level;

  my $helper = $ld->{Helper};
  $helper->indexit($feature_id=>1)  if $index_it;
  $helper->toplevel($feature_id=>1) if !$self->{fast}
                                       && $top_level;  # need to track top level features

  for my $parent (@parent_ids) {
    $helper->add_children($parent=>$feature_id);

  $self->throw("invalid GFF line at line $self->{load_data}{line}.\n".$line);

=item allow_whitespace

   $allow_it = $loader->allow_whitespace([$newvalue]);

Get or set the allow_whitespace flag. If true, then GFF3 files are
allowed to be delimited with whitespace in addition to tabs.

sub allow_whitespace {
  my $d    = $self->{allow_whitespace};
  $self->{allow_whitespace} = shift if @_;

=item store_current_feature

  $loader->store_current_feature()

This method is called to store the currently active feature in the
database. It uses a data structure stored in $self-E<gt>{load_data}.

# sub store_current_feature { } inherited

=item build_object_tree

 $loader->build_object_tree()

This method gathers together features and subfeatures and builds the graph that connects them.

# put objects together

sub build_object_tree {
  $self->subfeatures_in_table ? $self->build_object_tree_in_tables : $self->build_object_tree_in_features;

=item build_object_tree_in_tables

 $loader->build_object_tree_in_tables()

This method gathers together features and subfeatures and builds the
graph that connects them, assuming that parent/child relationships
will be stored in a database table.

sub build_object_tree_in_tables {
  my $store  = $self->store;
  my $helper = $self->{load_data}{Helper};

  while (my ($load_id,$children) = $helper->each_family()) {

    my $parent_id = $helper->local2global($load_id);
    die $self->throw("$load_id doesn't have a primary id")
      unless defined $parent_id;

    my @children = map {$helper->local2global($_)} @$children;
    # this updates the table that keeps track of parent/child relationships,
    # but does not update the parent object -- so (start,end) had better be right!!!
    $store->add_SeqFeature($parent_id,@children);

=item build_object_tree_in_features

 $loader->build_object_tree_in_features()

This method gathers together features and subfeatures and builds the
graph that connects them, assuming that parent/child relationships are
stored in the seqfeature objects themselves.

sub build_object_tree_in_features {
  my $store      = $self->store;
  my $tmp        = $self->tmp_store;
  my $ld         = $self->{load_data};
  my $normalized = $self->subfeatures_normalized;

  my $helper     = $ld->{Helper};

  while (my $load_id = $helper->each_toplevel) {
    my $feature  = $self->fetch($load_id)
      or $self->throw("$load_id (id="
                      .$helper->local2global($load_id)
                      .") should have a database entry, but doesn't");
    $self->attach_children($store,$ld,$load_id,$feature);
    # Indexed objects are updated, not created anew
    $feature->primary_id(undef) unless $helper->indexit($load_id);
    $store->store($feature);

=item attach_children

 $loader->attach_children($store,$load_data,$load_id,$feature)

This recursively adds children to features and their subfeatures. It
is called when subfeatures are directly contained within other
features, rather than stored in a relational table.

sub attach_children {
  my ($store,$ld,$load_id,$feature)  = @_;

  my $children = $ld->{Helper}->children() or return;
  for my $child_id (@$children) {
    my $child = $self->fetch($child_id)
      or $self->throw("$child_id should have a database entry, but doesn't");
    $self->attach_children($store,$ld,$child_id,$child);   # recursive call
    $feature->add_SeqFeature($child);

 my $feature = $loader->fetch($load_id)

Given a load ID (from the ID= attribute) this method returns the
feature from the temporary database or the permanent one, depending on
where it is stored.

  my $helper = $self->{load_data}{Helper};
  my $id     = $helper->local2global($load_id);

  ($self->subfeatures_normalized || $helper->indexit($load_id)
   ? $self->store->fetch($id)
   : $self->tmp_store->fetch($id)

 $loader->add_segment($parent,$child)

This method is used to add a split location to the parent.

  my ($parent,$child) = @_;

  if ($parent->can('add_segment')) { # probably a lazy table feature
    my $segment_count =  $parent->can('denormalized_segment_count') ? $parent->denormalized_segment_count
                       : $parent->can('denormalized_segments')      ? $parent->denormalized_segments
                       : $parent->can('segments')                   ? $parent->segments
    unless ($segment_count) {  # convert into a segmented object
      if ($parent->can('clone')) {
        $segment = $parent->clone;
        my %clone = %$parent;
        $segment  = bless \%clone,ref $parent;
      delete $segment->{segments};
      eval {$segment->object_store(undef) };
      $segment->primary_id(undef);

      # this updates the object and expands its start and end positions without writing
      # the segments into the database as individual objects
      $parent->add_segment($segment);
    $parent->add_segment($child);

  # a conventional Bio::SeqFeature::Generic object - create a split location
    my $current_location = $parent->location;
    if ($current_location->can('add_sub_Location')) {
      $current_location->add_sub_Location($child->location);
      eval "require Bio::Location::Split" unless Bio::Location::Split->can('add_sub_Location');
      my $new_location = Bio::Location::Split->new();
      $new_location->add_sub_Location($current_location);
      $new_location->add_sub_Location($child->location);
      $parent->location($new_location);

=item parse_attributes

 ($reserved,$unreserved) = $loader->parse_attributes($attribute_line)

This method parses the information contained in the $attribute_line
into two hashrefs, one containing the values of reserved attribute
tags (e.g. ID) and the other containing the values of unreserved ones.
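
A simplified, stand-alone sketch of this split is shown below. It is not the method itself. The reserved tag list is taken from the tags this file looks up through $reserved, the local unescape() only decodes two-digit %XX escapes, and the GFF2 fallback is omitted.

```perl
use strict;
use warnings;

# Tags treated as reserved in this sketch (assumption: collected from the
# $reserved->{...} lookups elsewhere in this file).
my %reserved_tags = map { $_ => 1 } qw(Gap Target Parent Name Alias ID Note index Index);

# Decode %XX escapes only; pluses are left untouched.
sub unescape {
    my $text = shift;
    $text =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
    return $text;
}

# Split a GFF3 attribute column into reserved and unreserved hashrefs,
# each mapping a tag to an array ref of values.
sub parse_attributes {
    my $column = shift;
    my (%reserved, %unreserved);
    for my $pair (split /;/, $column) {
        my ($tag, $value) = split /=/, $pair, 2;
        next unless defined($tag) && length($tag) && defined($value);
        $tag = unescape($tag);
        my @values = map { unescape($_) } split /,/, $value;
        my $bucket = $reserved_tags{$tag} ? \%reserved : \%unreserved;
        push @{ $bucket->{$tag} }, @values;
    }
    return (\%reserved, \%unreserved);
}

my ($reserved, $unreserved) =
    parse_attributes('ID=tfbs00001;Name=my%20site;index=1;Note=a,b;color=red');
print "ID:    $reserved->{ID}[0]\n";
print "Name:  $reserved->{Name}[0]\n";
print "color: $unreserved->{color}[0]\n";
```

Values are split on commas before unescaping, so an escaped comma (%2C) inside a value survives as part of that value.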

sub parse_attributes {
  unless ($att =~ /=/) {  # ouch! must be a GFF line
    require Bio::DB::SeqFeature::Store::GFF2Loader
      unless Bio::DB::SeqFeature::Store::GFF2Loader->can('parse_attributes');
    return $self->Bio::DB::SeqFeature::Store::GFF2Loader::parse_attributes($att);

  my @pairs =  map { my ($name,$value) = split '=';
                    [$self->unescape($name) => $value];
  my (%reserved,%unreserved);
    unless (defined $_->[1]) {
      warn "$tag does not have a value at GFF3 file line $.\n";
    my @values = split ',',$_->[1];
    map {$_ = $self->unescape($_);} @values;
    if ($Special_attributes{$tag}) {  # reserved attribute
      push @{$reserved{$tag}},@values;
      push @{$unreserved{$tag}},@values
  return (\%reserved,\%unreserved);

=item start_or_finish_sequence

  $loader->start_or_finish_sequence('Chr9')

This method is called at the beginning and end of a fasta section.

# sub start_or_finish_sequence { } inherited

  $loader->load_sequence('gatttcccaaa')

This method is called to load some amount of sequence after
start_or_finish_sequence() is first called.

# sub load_sequence { } inherited

 my $io_file = $loader->open_fh($filehandle_or_path)

This method opens up the indicated file or pipe, using some
intelligence to recognize compressed files and URLs and doing the
right thing.

# sub open_fh { } inherited

# sub msg { } inherited

 my $time = $loader->time

This method returns the current time in seconds, using Time::HiRes if available.

# sub time { } inherited

 my $unescaped = GFF3Loader::unescape($escaped)

This is an internal utility. It is the same as CGI::Util::unescape,
but doesn't change pluses into spaces and ignores unicode escapes.

# sub unescape { } inherited

  my ($ref,$start,$end,$strand) = @_;
  my $mapper = $self->coordinate_mapper;
  return ($ref,$start,$end,$strand) unless $mapper;

  my ($newref,$coords) = $mapper->($ref,[$start,$end]);
  return unless defined $coords->[0];
  if ($coords->[0] > $coords->[1]) {
    @{$coords} = reverse(@{$coords});
  return ($newref,@{$coords},$strand);

sub _indexit { # override
  return $self->{load_data}{Helper}->indexit(@_);

sub _local2global { # override
  return $self->{load_data}{Helper}->local2global(@_);

 my $ids    = $self->local_ids;

After performing a load, this returns an array ref containing all the
load file IDs that were contained within the file just loaded.

sub local_ids { # override
  return $self->{load_data}{Helper}->local_ids(@_);

 my $ids    = $loader->loaded_ids;

After performing a load, this returns an array ref containing all the
feature primary ids that were created during the load.

sub loaded_ids { # override
  return $self->{load_data}{Helper}->loaded_ids(@_);

This is an early version, so there are certainly some bugs. Please
use the BioPerl bug tracking system to report bugs.

L<Bio::DB::SeqFeature::Store>,
L<Bio::DB::SeqFeature::Segment>,
L<Bio::DB::SeqFeature::NormalizedFeature>,
L<Bio::DB::SeqFeature::Store::GFF2Loader>,
L<Bio::DB::SeqFeature::Store::DBI::mysql>,
L<Bio::DB::SeqFeature::Store::berkeleydb>

Lincoln Stein E<lt>lstein@cshl.orgE<gt>.

Copyright (c) 2006 Cold Spring Harbor Laboratory.

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.