Merge pull request #181 from bioperl/limit-dockerhub-trigger
[bioperl-live.git] / Bio / SeqIO / bsml.pm
blob7026188cf43caa60dc8ef20036b1aa59aeb5e22c
2 # BioPerl module for Bio::SeqIO::bsml
4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
6 # Cared for by Charles Tilford (tilfordc@bms.com)
7 # Copyright (C) Charles Tilford 2001
9 # This library is free software; you can redistribute it and/or
10 # modify it under the terms of the GNU Lesser General Public
11 # License as published by the Free Software Foundation; either
12 # version 2.1 of the License, or (at your option) any later version.
14 # This library is distributed in the hope that it will be useful,
15 # but WITHOUT ANY WARRANTY; without even the implied warranty of
16 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
17 # Lesser General Public License for more details.
19 # You should have received a copy of the GNU Lesser General Public
20 # License along with this library; if not, write to the Free Software
21 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
22 # Also at: http://www.gnu.org/copyleft/lesser.html
25 # Much of the basic documentation in this module has been
26 # cut-and-pasted from the embl.pm (Ewan Birney) SeqIO module.
29 =head1 NAME
31 Bio::SeqIO::bsml - BSML sequence input/output stream
33 =head1 SYNOPSIS
35 It is probably best not to use this object directly, but rather go
36 through the SeqIO handler system. To read a BSML file:
38 $stream = Bio::SeqIO->new( -file => $filename, -format => 'bsml');
40 while ( my $bioSeqObj = $stream->next_seq() ) {
41 # do something with $bioSeqObj
44 To write a Seq object to the current file handle in BSML XML format:
46 $stream->write_seq( -seq => $seqObj);
48 If instead you would like a XML::DOM object containing the BSML, use:
50 my $newXmlObject = $stream->to_bsml( -seq => $seqObj);
52 =head1 DEPENDENCIES
54 In addition to parts of the Bio:: hierarchy, this module uses:
56 XML::DOM
58 =head1 DESCRIPTION
60 This object can transform Bio::Seq objects to and from BSML (XML)
61 flatfiles.
63 =head2 NOTE:
65 2/1/02 - I have changed the API to more closely match argument
66 passing used by other BioPerl methods ( -tag => value ). Internal
67 methods are using the same API, but you should not be calling those
68 anyway...
70 =head1 FEEDBACK
72 =head2 Mailing Lists
74 User feedback is an integral part of the evolution of this and other
75 Bioperl modules. Send your comments and suggestions preferably to one
76 of the Bioperl mailing lists. Your participation is much appreciated.
78 bioperl-l@bioperl.org - General discussion
79 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
81 =head2 Support
83 Please direct usage questions or support issues to the mailing list:
85 I<bioperl-l@bioperl.org>
87 rather than to the module maintainer directly. Many experienced and
88 reponsive experts will be able look at the problem and quickly
89 address it. Please include a thorough description of the problem
90 with code and data examples if at all possible.
92 =head2 Reporting Bugs
94 Report bugs to the Bioperl bug tracking system to help us keep track
95 the bugs and their resolution.
96 Bug reports can be submitted via the web:
98 https://github.com/bioperl/bioperl-live/issues
100 =head2 Things Still to Do
102 * The module now uses the new Collection.pm system. However,
103 Annotations associated with a Feature object still seem to use the
104 old system, so parsing with the old methods are included..
106 * Generate Seq objects with no sequence data but an assigned
107 length. This appears to be an issue with Bio::Seq. It is possible
108 (and reasonable) to make a BSML document with features but no
109 sequence data.
111 * Support <Seq-data-import>. Do not know how commonly this is used.
113 * Some features are awaiting implementation in later versions of
114 BSML. These include:
116 * Nested feature support
118 * Complex feature (ie joins)
120 * Unambiguity in strand (ie -1,0,1, not just 'complement' )
122 * More friendly dblink structures
124 * Location.pm (or RangeI::union?) appears to have a bug when 'expand'
125 is used.
127 * More intelligent hunting for sequence and feature titles? It is not
128 terribly clear where the most appropriate field is located, better
129 grepping (eg looking for a reasonable count for spaces and numbers)
130 may allow for titles better than "AE008041".
132 =head1 AUTHOR - Charles Tilford
134 Bristol-Myers Squibb Bioinformatics
136 Email tilfordc@bms.com
138 I have developed the BSML specific code for this package, but have used
139 code from other SeqIO packages for much of the nuts-and-bolts. In particular
140 I have used code from the embl.pm module either directly or as a framework
141 for many of the subroutines that are common to SeqIO modules.
143 =cut
145 package Bio::SeqIO::bsml;
146 use strict;
148 use Bio::SeqFeature::Generic;
149 use Bio::Species;
150 use XML::DOM;
151 use Bio::Seq::SeqFactory;
152 use Bio::Annotation::Collection;
153 use Bio::Annotation::Comment;
154 use Bio::Annotation::Reference;
155 use Bio::Annotation::DBLink;
157 use base qw(Bio::SeqIO);
159 my $idcounter = {}; # Used to generate unique id values
160 my $nvtoken = ": "; # The token used if a name/value pair has to be stuffed
161 # into a single line
163 =head1 METHODS
165 =cut
167 # LS: this seems to get overwritten on line 1317, generating a redefinition error. Dead code?
168 # CAT: This was inappropriately added in revision 1.10 - I added the check for existance of a sequence factory to the actual _initialize
169 # sub _initialize {
170 # my($self,@args) = @_;
171 # $self->SUPER::_initialize(@args);
172 # if( ! defined $self->sequence_factory ) {
173 # $self->sequence_factory(Bio::Seq::SeqFactory->new(-verbose => $self->verbose(), -type => 'Bio::Seq::RichSeq'));
177 =head2 next_seq
179 Title : next_seq
180 Usage : my $bioSeqObj = $stream->next_seq
181 Function: Retrieves the next sequence from a SeqIO::bsml stream.
182 Returns : A reference to a Bio::Seq::RichSeq object
183 Args :
185 =cut
187 sub next_seq {
188 my $self = shift;
189 my ($desc);
190 my $bioSeq = $self->sequence_factory->create(-verbose =>$self->verbose());
192 unless (exists $self->{'domtree'}) {
193 $self->throw("A BSML document has not yet been parsed.");
194 return;
196 my $dom = $self->{'domtree'};
197 my $seqElements = $dom->getElementsByTagName ("Sequence");
198 if ($self->{'current_node'} == $seqElements->getLength ) {
199 # There are no more <Sequence>s to process
200 return;
202 my $xmlSeq = $seqElements->item($self->{'current_node'});
204 # Assume that title attribute contains the best display id
205 if (my $val = $xmlSeq->getAttribute( "title")) {
206 $bioSeq->display_id($val);
209 # Set the molecule type
210 if (my $val = $xmlSeq->getAttribute( "molecule" )) {
211 my %mol = ('dna' => 'DNA', 'rna' => 'RNA', 'aa' => 'protein');
212 $bioSeq->molecule($mol{ lc($val) });
215 # Set the accession number
216 if (my $val = $xmlSeq->getAttribute( "ic-acckey" )) {
217 $bioSeq->accession_number($val);
220 # Get the sequence data for the element
221 if (my $seqData = &FIRSTDATA($xmlSeq->getElementsByTagName("Seq-data")
222 ->item(0) ) ) {
223 # Sequence data exists, transfer to the Seq object
224 # Remove white space and CRs (not neccesary?)
225 $seqData =~ s/[\s\n\r]//g;
226 $bioSeq->seq($seqData);
227 } elsif (my $import = $xmlSeq->getElementsByTagName("Seq-dataimport")
228 ->item(0) ) {
229 #>>>> # What about <Seq-data-import> ??
231 } elsif (my $val = $xmlSeq->getAttribute("length")) {
232 # No sequence defined, set the length directly
234 #>>>> # This does not appear to work - length is apparently calculated
235 # from the sequence. How to make a "virtual" sequence??? Such
236 # creatures are common in BSML...
237 $bioSeq->length($val);
240 my $species = Bio::Species->new();
241 my @classification = ();
243 # Peruse the generic <Attributes> - those that are direct children of
244 # the <Sequence> or the <Feature-tables> element
245 # Sticky wicket here - data not controlled by schema, could be anything
246 my @seqDesc = ();
247 my %specs = ('common_name' => 'y',
248 'genus' => 'y',
249 'species' => 'y',
250 'sub_species' => 'y',
252 my %seqMap = (
253 'add_date' => [ qw(date date-created date-last-updated)],
254 'keywords' => [ 'keyword', ],
255 'seq_version' => [ 'version' ],
256 'division' => [ 'division' ],
257 'add_secondary_accession' => ['accession'],
258 'pid' => ['pid'],
259 'primary_id' => [ 'primary.id', 'primary_id' ],
261 my @links;
262 my $floppies = &GETFLOPPIES($xmlSeq);
263 for my $attr (@{$floppies}) {
264 # Don't want to get attributes from <Feature> or <Table> elements yet
265 my $parent = $attr->getParentNode->getNodeName;
266 next unless($parent eq "Sequence" || $parent eq "Feature-tables");
268 my ($name, $content) = &FLOPPYVALS($attr);
269 $name = lc($name);
270 if (exists $specs{$name}) { # It looks like part of species...
271 $species->$name($content);
272 next;
274 my $value = "";
275 # Cycle through the Seq methods:
276 for my $method (keys %seqMap) {
277 # Cycle through potential matching attributes:
278 for my $match (@{$seqMap{$method}}) {
279 # If the <Attribute> name matches one of the keys,
280 # set $value, unless it has already been set
281 $value ||= $content if ($name =~ /$match/i);
283 if ($value ne "") {
285 if( $method eq 'seq_version'&& $value =~ /\S+\.(\d+)/ ) {
286 # hack for the fact that data in version is actually
287 # ACCESSION.VERSION
288 ($value) = $1;
290 $bioSeq->$method($value);
291 last;
294 if( $name eq 'database-xref' ) {
295 my ($link_id,$link_db) = split(/:/,$value);
296 push @links, Bio::Annotation::DBLink->new(-primary_id => $link_id,
297 -database => $link_db);
299 next if ($value ne "");
301 if ($name =~ /^species$/i) { # Uh, it's the species designation?
302 if ($content =~ / /) {
303 # Assume that a full species name has been provided
304 # This will screw up if the last word is the subspecies...
305 my @break = split " ", $content;
306 @classification = reverse @break;
307 } else {
308 $classification[0] = $content;
310 next;
312 if ($name =~ /sub[_ ]?species/i) { # Should be the subspecies...
313 $species->sub_species( $content );
314 next;
316 if ($name =~ /classification/i) { # Should be species classification
317 # We will assume that there are spaces separating the terms:
318 my @bits = split " ", $content;
319 # Now make sure there is not other cruft as well (eg semi-colons)
320 for my $i (0..$#bits) {
321 $bits[$i] =~ /(\w+)/;
322 $bits[$i] = $1;
324 $species->classification( @bits );
325 next;
327 if ($name =~ /comment/) {
328 my $com = Bio::Annotation::Comment->new('-text' => $content);
329 # $bioSeq->annotation->add_Comment($com);
330 $bioSeq->annotation->add_Annotation('comment', $com);
331 next;
333 # Description line - collect all descriptions for later assembly
334 if ($name =~ /descr/) {
335 push @seqDesc, $content;
336 next;
338 # Ok, we have no idea what this attribute is. Dump to SimpleValue
339 my $simp = Bio::Annotation::SimpleValue->new( -value => $content);
340 $bioSeq->annotation->add_Annotation($name, $simp);
342 unless ($#seqDesc < 0) {
343 $bioSeq->desc( join "; ", @seqDesc);
346 #>>>> This should be modified so that any IDREF associated with the
347 # <Reference> is then used to associate the reference with the
348 # appropriate Feature
350 # Extract out <Reference>s associated with the sequence
351 my @refs;
352 my %tags = (
353 -title => "RefTitle",
354 -authors => "RefAuthors",
355 -location => "RefJournal",
357 for my $ref ( $xmlSeq->getElementsByTagName ("Reference") ) {
358 my %refVals;
359 for my $tag (keys %tags) {
360 my $rt = &FIRSTDATA($ref->getElementsByTagName($tags{$tag})
361 ->item(0));
362 next unless ($rt);
363 $rt =~ s/^[\s\r\n]+//; # Kill leading space
364 $rt =~ s/[\s\r\n]+$//; # Kill trailing space
365 $rt =~ s/[\s\r\n]+/ /; # Collapse internal space runs
366 $refVals{$tag} = $rt;
368 my $reference = Bio::Annotation::Reference->new( %refVals );
370 # Pull out any <Reference> information hidden in <Attributes>
371 my %refMap = (
372 comment => [ 'comment', 'remark' ],
373 medline => [ 'medline', ],
374 pubmed => [ 'pubmed' ],
375 start => [ 'start', 'begin' ],
376 end => [ 'stop', 'end' ],
378 my @refCom = ();
379 my $floppies = &GETFLOPPIES($ref);
380 for my $attr (@{$floppies}) {
381 my ($name, $content) = &FLOPPYVALS($attr);
382 my $value = "";
383 # Cycle through the Seq methods:
384 for my $method (keys %refMap) {
385 # Cycle through potential matching attributes:
386 for my $match (@{$refMap{$method}}) {
387 # If the <Attribute> name matches one of the keys,
388 # set $value, unless it has already been set
389 $value ||= $content if ($name =~ /$match/i);
391 if ($value ne "") {
392 my $str = '$reference->' . $method . "($value)";
393 eval($str);
394 next;
397 next if ($value ne "");
398 # Don't know what the <Attribute> is, dump it to comments:
399 push @refCom, $name . $nvtoken . $content;
401 unless ($#refCom < 0) {
402 # Random stuff was found, tack it to the comment field
403 my $exist = $reference->comment;
404 $exist .= join ", ", @refCom;
405 $reference->comment($exist);
407 push @refs, $reference;
409 $bioSeq->annotation->add_Annotation('reference' => $_) for @refs;
410 my $ann_col = $bioSeq->annotation;
411 # Extract the <Feature>s for this <Sequence>
412 for my $feat ( $xmlSeq->getElementsByTagName("Feature") ) {
413 $bioSeq->add_SeqFeature( $self->_parse_bsml_feature($feat) );
416 $species->classification( @classification );
417 $bioSeq->species( $species );
419 $bioSeq->annotation->add_Annotation('dblink' => $_) for @links;
421 $self->{'current_node'}++;
422 return $bioSeq;
424 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
425 # Get all the <Attribute> and <Qualifier> children for an object, and
426 # return them as an array reference
427 # ('floppy' since these elements have poor/no schema control)
428 sub GETFLOPPIES {
429 my $obj = shift;
431 my @floppies;
432 my $attributes = $obj->getElementsByTagName ("Attribute");
433 for (my $i = 0; $i < $attributes->getLength; $i++) {
434 push @floppies, $attributes->item($i);
436 my $qualifiers = $obj->getElementsByTagName ("Qualifier");
437 for (my $i = 0; $i < $qualifiers->getLength; $i++) {
438 push @floppies, $qualifiers->item($i);
440 return \@floppies;
442 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
443 # Given a DOM <Attribute> or <Qualifier> object, return the [name, value] pair
444 sub FLOPPYVALS {
445 my $obj = shift;
447 my ($name, $value);
448 if ($obj->getNodeName eq "Attribute") {
449 $name = $obj->getAttribute('name');
450 $value = $obj->getAttribute('content');
451 } elsif ($obj->getNodeName eq "Qualifier") {
452 # Wheras <Attribute>s require both 'name' and 'content' attributes,
453 # <Qualifier>s can technically have either blank (and sometimes do)
454 my $n = $obj->getAttribute('value-type');
455 $name = $n if ($n ne "");
456 my $v = $obj->getAttribute('value');
457 $value = $v if ($v ne "");
459 return ($name, $value);
461 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
462 # Returns the value of the first TEXT_NODE encountered below an element
463 # Rational - avoid grabbing a comment rather than the PCDATA. Not foolproof...
464 sub FIRSTDATA {
465 my $element = shift;
466 return unless ($element);
468 my $hopefuls = $element->getChildNodes;
469 my $data;
470 for (my $i = 0; $i < $hopefuls->getLength; $i++) {
471 if ($hopefuls->item($i)->getNodeType ==
472 XML::DOM::Node::TEXT_NODE() ) {
473 $data = $hopefuls->item($i)->getNodeValue;
474 last;
477 return $data;
479 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
480 # Just collapses whitespace runs in a string
481 sub STRIP {
482 my $string = shift;
483 $string =~ s/[\s\r\n]+/ /g;
484 return $string;
487 =head2 to_bsml
489 Title : to_bsml
490 Usage : my $domDoc = $obj->to_bsml(@args)
491 Function: Generates an XML structure for one or more Bio::Seq objects.
492 If $seqref is an array ref, the XML tree generated will include
493 all the sequences in the array.
494 Returns : A reference to the XML DOM::Document object generated / modified
495 Args : Argument array in form of -key => val. Recognized keys:
497 -seq A Bio::Seq reference, or an array reference of many of them
499 -xmldoc Specifies an existing XML DOM document to add the sequences
500 to. If included, then only data (no page formatting) will
501 be added. If not, a new XML::DOM::Document will be made,
502 and will be populated with both <Sequence> data, as well as
503 <Page> display elements.
505 -nodisp Do not generate <Display> elements, or any children
506 thereof, even if -xmldoc is not set.
508 -skipfeat If set to 'all', all <Feature>s will be skipped. If it is
509 a hash reference, any <Feature> with a class matching a key
510 in the hash will be skipped - for example, to skip 'source'
511 and 'score' features, use:
513 -skipfeat => { source => 'Y', score => 'Y' }
515 -skiptags As above: if set to 'all', no tags are included, and if a
516 hash reference, those specific tags will be ignored.
518 Skipping some or all tags and features can result in
519 noticeable speed improvements.
521 -nodata If true, then <Seq-data> will not be included. This may be
522 useful if you just want annotations and do not care about
523 the raw ACTG information.
525 -return Default is 'xml', which will return a reference to the BSML
526 XML object. If set to 'seq' will return an array ref of the
527 <Sequence> objects added (rather than the whole XML object)
529 -close Early BSML browsers will crash if an element *could* have
530 children but does not, and is closed as an empty element
531 e.g. <Styles/>. If -close is true, then such tags are given
532 a comment child to explicitly close them e.g. <Styles><!--
533 --></Styles>. This is default true, set to "0" if you do
534 not want this behavior.
536 Examples : my $domObj = $stream->to_bsml( -seq => \@fourCoolSequenceObjects,
537 -skipfeat => { source => 1 },
540 # Or add sequences to an existing BSML document:
541 $stream->to_bsml( -seq => \@fourCoolSequenceObjects,
542 -skipfeat => { source => 1 },
543 -xmldoc => $myBsmlDocumentInProgress, );
545 =cut
547 sub to_bsml {
548 my $self = shift;
549 my $args = $self->_parseparams( -close => 1,
550 -return => 'xml',
551 @_);
552 $args->{NODISP} ||= $args->{NODISPLAY};
553 my $seqref = $args->{SEQ};
554 $seqref = (ref($seqref) eq 'ARRAY') ? $seqref : [ $seqref ];
556 #############################
557 # Basic BSML XML Components #
558 #############################
560 my $xml;
561 my ($bsmlElem, $defsElem, $seqsElem, $dispElem);
562 if ($args->{XMLDOC}) {
563 # The user has provided an existing XML DOM object
564 $xml = $args->{XMLDOC};
565 unless ($xml->isa("XML::DOM::Document")) {
566 $self->throw('SeqIO::bsml.pm error:\n'.
567 'When calling ->to_bsml( { xmldoc => $myDoc }), $myDoc \n' .
568 'should be an XML::DOM::Document object, or an object that\n'.
569 'inherits from that class (like BsmlHelper.pm)');
571 } else {
572 # The user has not provided a new document, make one from scratch
573 $xml = XML::DOM::Document->new();
574 $xml->setXMLDecl( $xml->createXMLDecl("1.0") );
575 my $url = "http://www.labbook.com/dtd/bsml2_2.dtd";
576 my $doc = $xml->createDocumentType("Bsml",$url);
577 $xml->setDoctype($doc);
578 $bsmlElem = $self->_addel( $xml, 'Bsml');
579 $defsElem = $self->_addel( $bsmlElem, 'Definitions');
580 $seqsElem = $self->_addel( $defsElem, 'Sequences');
581 unless ($args->{NODISP}) {
582 $dispElem = $self->_addel( $bsmlElem, 'Display');
583 my $stylElem = $self->_addel( $dispElem, 'Styles');
584 my $style = $self->_addel( $stylElem, 'Style', {
585 type => "text/css" });
586 my $styleText =
587 qq(Interval-widget { display : "1"; }\n) .
588 qq(Feature { display-auto : "1"; });
589 $style->appendChild( $xml->createTextNode($styleText) );
593 # Establish fundamental BSML elements, if they do not already exist
594 $bsmlElem ||= $xml->getElementsByTagName("Bsml")->item(0);
595 $defsElem ||= $xml->getElementsByTagName("Definitions")->item(0);
596 $seqsElem ||= $xml->getElementsByTagName("Sequences")->item(0);
598 ###############
599 # <Sequences> #
600 ###############
602 # Map over Bio::Seq to BSML
603 my %mol = ('dna' => 'DNA', 'rna' => 'RNA', 'protein' => 'AA');
604 my @xmlSequences;
606 for my $bioSeq (@{$seqref}) {
607 my $xmlSeq = $xml->createElement("Sequence");
608 my $FTs = $xml->createElement("Feature-tables");
610 # Array references to hold <Reference> objects:
611 my $seqRefs = []; my $featRefs = [];
612 # Array references to hold <Attribute> values (not objects):
613 my $seqDesc = [];
614 push @{$seqDesc}, ["comment" , "This file generated to BSML 2.2 standards - joins will be collapsed to a single feature enclosing all members of the join"];
615 push @{$seqDesc}, ["description" , eval{$bioSeq->desc}];
616 for my $kwd ( eval{$bioSeq->get_keywords} ) {
617 push @{$seqDesc}, ["keyword" , $kwd];
619 push @{$seqDesc}, ["keyword" , eval{$bioSeq->keywords}];
620 push @{$seqDesc}, ["version" , eval{
621 join(".", $bioSeq->accession_number, $bioSeq->seq_version); }];
622 push @{$seqDesc}, ["division" , eval{$bioSeq->division}];
623 push @{$seqDesc}, ["pid" , eval{$bioSeq->pid}];
624 # push @{$seqDesc}, ["bio_object" , ref($bioSeq)];
625 push @{$seqDesc}, ["primary_id" , eval{$bioSeq->primary_id}];
626 for my $dt (eval{$bioSeq->get_dates()} ) {
627 push @{$seqDesc}, ["date" , $dt];
629 for my $ac (eval{$bioSeq->get_secondary_accessions()} ) {
630 push @{$seqDesc}, ["secondary_accession" , $ac];
633 # Determine the accession number and a unique identifier
634 my $acc = $bioSeq->accession_number eq "unknown" ?
635 "" : $bioSeq->accession_number;
636 my $id;
637 my $pi = $bioSeq->primary_id;
638 if ($pi && $pi !~ /Bio::/) {
639 # Not sure I understand what primary_id is... It sometimes
640 # is a string describing a reference to a BioSeq object...
641 $id = "SEQ" . $bioSeq->primary_id;
642 } else {
643 # Nothing useful found, make a new unique ID
644 $id = $acc || ("SEQ-io" . $idcounter->{Sequence}++);
646 # print "$id->",ref($bioSeq->primary_id),"\n";
647 # An id field with spaces is interpreted as an idref - kill the spaces
648 $id =~ s/ /-/g;
649 # Map over <Sequence> attributes
650 my %attr = ( 'title' => $bioSeq->display_id,
651 'length' => $bioSeq->length,
652 'ic-acckey' => $acc,
653 'id' => $id,
654 'representation' => 'raw',
656 $attr{molecule} = $mol{ lc($bioSeq->molecule) } if $bioSeq->can('molecule');
659 for my $a (keys %attr) {
660 $xmlSeq->setAttribute($a, $attr{$a}) if (defined $attr{$a} &&
661 $attr{$a} ne "");
663 # Orphaned Attributes:
664 $xmlSeq->setAttribute('topology', 'circular')
665 if ($bioSeq->is_circular);
666 # <Sequence> strand, locus
668 $self->_add_page($xml, $xmlSeq) if ($dispElem);
669 ################
670 # <Attributes> #
671 ################
673 # Check for Bio::Annotations on the * <Sequence> *.
674 $self->_parse_annotation( -xml => $xml, -obj => $bioSeq,
675 -desc => $seqDesc, -refs => $seqRefs);
677 # Incorporate species data
678 if (ref($bioSeq->species) eq 'Bio::Species') {
679 # Need to peer into Bio::Species ...
680 my @specs = ('common_name', 'genus', 'species', 'sub_species');
681 for my $sp (@specs) {
682 next unless (my $val = $bioSeq->species()->$sp());
683 push @{$seqDesc}, [$sp , $val];
685 push @{$seqDesc}, ['classification',
686 (join " ", $bioSeq->species->classification) ];
687 # Species::binomial will return "genus species sub_species" ...
688 } elsif (my $val = $bioSeq->species) {
689 # Ok, no idea what it is, just dump it in there...
690 push @{$seqDesc}, ["species", $val];
693 # Add the description <Attribute>s for the <Sequence>
694 for my $seqD (@{$seqDesc}) {
695 $self->_addel($xmlSeq, "Attribute", {
696 name => $seqD->[0], content => $seqD->[1]}) if ($seqD->[1]);
699 # If sequence references were added, make a Feature-table for them
700 unless ($#{$seqRefs} < 0) {
701 my $seqFT = $self->_addel($FTs, "Feature-table", {
702 title => "Sequence References", });
703 for my $feat (@{$seqRefs}) {
704 $seqFT->appendChild($feat);
708 # This is the appropriate place to add <Feature-tables>
709 $xmlSeq->appendChild($FTs);
711 #############
712 # <Feature> #
713 #############
715 #>>>> # Perhaps it is better to loop through top_Seqfeatures?...
716 #>>>> # ...however, BSML does not have a hierarchy for Features
718 if (defined $args->{SKIPFEAT} &&
719 $args->{SKIPFEAT} eq 'all') {
720 $args->{SKIPFEAT} = { all => 1};
721 } else { $args->{SKIPFEAT} ||= {} }
722 for my $class (keys %{$args->{SKIPFEAT}}) {
723 $args->{SKIPFEAT}{lc($class)} = $args->{SKIPFEAT}{$class};
725 # Loop through all the features
726 my @features = $bioSeq->all_SeqFeatures();
727 if (@features && !$args->{SKIPFEAT}{all}) {
728 my $ft = $self->_addel($FTs, "Feature-table", {
729 title => "Features", });
730 for my $bioFeat (@features ) {
731 my $featDesc = [];
732 my $class = lc($bioFeat->primary_tag);
733 # The user may have specified to ignore this type of feature
734 next if ($args->{SKIPFEAT}{$class});
735 my $id = "FEAT-io" . $idcounter->{Feature}++;
736 my $xmlFeat = $self->_addel( $ft, 'Feature', {
737 'id' => $id,
738 'class' => $class ,
739 'value-type' => $bioFeat->source_tag });
740 # Check for Bio::Annotations on the * <Feature> *.
741 $self->_parse_annotation( -xml => $xml, -obj => $bioFeat,
742 -desc => $featDesc, -id => $id,
743 -refs =>$featRefs, );
744 # Add the description stuff for the <Feature>
745 for my $de (@{$featDesc}) {
746 $self->_addel($xmlFeat, "Attribute", {
747 name => $de->[0], content => $de->[1]}) if ($de->[1]);
749 $self->_parse_location($xml, $xmlFeat, $bioFeat);
751 # loop through the tags, add them as <Qualifiers>
752 next if (defined $args->{SKIPTAGS} &&
753 $args->{SKIPTAGS} =~ /all/i);
754 # Tags can consume a lot of CPU cycles, and can often be
755 # rather non-informative, so -skiptags can allow total or
756 # selective omission of tags.
757 for my $tag ($bioFeat->all_tags()) {
758 next if (exists $args->{SKIPTAGS}{$tag});
759 for my $val ($bioFeat->each_tag_value($tag)) {
760 $self->_addel( $xmlFeat, 'Qualifier', {
761 'value-type' => $tag ,
762 'value' => $val });
768 ##############
769 # <Seq-data> #
770 ##############
772 # Add sequence data
773 if ( (my $data = $bioSeq->seq) && !$args->{NODATA} ) {
774 my $d = $self->_addel($xmlSeq, 'Seq-data');
775 $d->appendChild( $xml->createTextNode($data) );
778 # If references were added, make a Feature-table for them
779 unless ($#{$featRefs} < 0) {
780 my $seqFT = $self->_addel($FTs, "Feature-table", {
781 title => "Feature References", });
782 for my $feat (@{$featRefs}) {
783 $seqFT->appendChild($feat);
787 # Place the completed <Sequence> tree as a child of <Sequences>
788 $seqsElem->appendChild($xmlSeq);
789 push @xmlSequences, $xmlSeq;
792 # Prevent browser crashes by explicitly closing empty elements:
793 if ($args->{CLOSE}) {
794 my @problemChild = ('Sequences', 'Sequence', 'Feature-tables',
795 'Feature-table', 'Screen', 'View',);
796 for my $kid (@problemChild) {
797 for my $prob ($xml->getElementsByTagName($kid)) {
798 unless ($prob->hasChildNodes) {
799 $prob->appendChild(
800 $xml->createComment(" Must close <$kid> explicitly "));
806 if (defined $args->{RETURN} &&
807 $args->{RETURN} =~ /seq/i) {
808 return \@xmlSequences;
809 } else {
810 return $xml;
814 =head2 write_seq
816 Title : write_seq
817 Usage : $obj->write_seq(@args)
818 Function: Prints out an XML structure for one or more Bio::Seq objects.
819 If $seqref is an array ref, the XML tree generated will include
820 all the sequences in the array. This method is fairly simple,
821 most of the processing is performed within to_bsml.
822 Returns : A reference to the XML object generated / modified
823 Args : Argument array. Recognized keys:
825 -seq A Bio::Seq reference, or an array reference of many of them
827 Alternatively, the method may be called simply as...
829 $obj->write_seq( $bioseq )
831 ... if only a single argument is passed, it is assumed that
832 it is the sequence object (can also be an array ref of
833 many Seq objects )
835 -printmime If true prints "Content-type: $mimetype\n\n" at top of
836 document, where $mimetype is the value designated by this
837 key. For generic XML use text/xml, for BSML use text/x-bsml
839 -return This option will be supressed, since the nature of this
840 method is to print out the XML document. If you wish to
841 retrieve the <Sequence> objects generated, use the to_bsml
842 method directly.
844 =cut
846 sub write_seq {
847 my $self = shift;
848 my $args = $self->_parseparams( @_);
849 if ($#_ == 0 ) {
850 # If only a single value is passed, assume it is the seq object
851 unshift @_, "-seq";
853 # Build a BSML XML DOM object based on the sequence(s)
854 my $xml = $self->to_bsml( @_,
855 -return => undef );
856 # Convert to a string
857 my $out = $xml->toString;
858 # Print after putting a return after each element - more readable
859 $out =~ s/>/>\n/g;
860 $self->_print("Content-type: " . $args->{PRINTMIME} . "\n\n")
861 if ($args->{PRINTMIME});
862 $self->_print( $out );
863 # Return the DOM tree in case the user wants to do something with it
865 $self->flush if $self->_flush_on_write && defined $self->_fh;
866 return $xml;
869 =head1 INTERNAL METHODS
870 #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-
872 The following methods are used for internal processing, and should probably
873 not be accessed by the user.
875 =head2 _parse_location
877 Title : _parse_location
878 Usage : $obj->_parse_location($xmlDocument, $parentElem, $SeqFeatureObj)
879 Function: Adds <Interval-loc> and <Site-loc> children to <$parentElem> based
880 on locations / sublocations found in $SeqFeatureObj. If
881 sublocations exist, the original location will be ignored.
882 Returns : An array ref containing the elements added to the parent.
883 These will have already been added to <$parentElem>
884 Args : 0 The DOM::Document being modified
885 1 The DOM::Element parent that you want to add to
886 2 Reference to the Bio::SeqFeature being analyzed
888 =cut
890 ###############################
891 # <Interval-loc> & <Site-loc> #
892 ###############################
894 sub _parse_location {
895 my $self = shift;
896 my ($xml, $xmlFeat, $bioFeat) = @_;
897 my $bioLoc = $bioFeat->location;
898 my @locations;
899 if (ref($bioLoc) =~ /Split/) {
900 @locations = $bioLoc->sub_Location;
901 # BSML 2.2 does not recognize / support joins. For this reason,
902 # we will just use the upper-level location. The line below can
903 # be deleted or commented out if/when BSML 3 supports complex
904 # interval deffinitions:
905 @locations = ($bioLoc);
906 } else {
907 @locations = ($bioLoc);
909 my @added = ();
911 # Add the site or interval positional information:
912 for my $loc (@locations) {
913 my ($start, $end) = ($loc->start, $loc->end);
914 my %locAttr;
915 # Strand information is not well described in BSML
916 $locAttr{complement} = 1 if ($loc->strand == -1);
917 if ($start ne "" && ($start == $end || $end eq "")) {
918 $locAttr{sitepos} = $start;
919 push @added, $self->_addel($xmlFeat,'Site-loc',\%locAttr);
920 } elsif ($start ne "" && $end ne "") {
921 if ($start > $end) {
922 # The feature is on the complementary strand
923 ($start, $end) = ($end, $start);
924 $locAttr{complement} = 1;
926 $locAttr{startpos} = $start;
927 $locAttr{endpos} = $end;
928 push @added, $self->_addel($xmlFeat,'Interval-loc',\%locAttr);
929 } else {
930 warn "Failure to parse SeqFeature location. Start = '$start' & End = '$end'";
933 return \@added;
936 =head2 _parse_bsml_feature
938 Title : _parse_bsml_feature
939 Usage : $obj->_parse_bsml_feature($xmlFeature )
940 Function: Will examine the <Feature> element provided by $xmlFeature and
941 return a generic seq feature.
942 Returns : Bio::SeqFeature::Generic
943 Args : 0 XML::DOM::Element <Feature> being analyzed.
945 =cut
947 sub _parse_bsml_feature {
948 my $self = shift;
949 my ($feat) = @_;
951 my $basegsf = Bio::SeqFeature::Generic->new();
952 # score
953 # frame
954 # source_tag
956 # Use the class as the primary tag value, if it is present
957 if ( my $val = $feat->getAttribute("class") ) {
958 $basegsf->primary_tag($val);
961 # Positional information is in <Interval-loc>s or <Site-loc>s
962 # We need to grab these in order, to try to recreate joins...
963 my @locations = ();
964 for my $kid ($feat->getChildNodes) {
965 my $nodeName = $kid->getNodeName;
966 next unless ($nodeName eq "Interval-loc" ||
967 $nodeName eq "Site-loc");
968 push @locations, $kid;
970 if ($#locations == 0) {
971 # There is only one location specified
972 $self->_parse_bsml_location($locations[0], $basegsf);
973 } elsif ($#locations > 0) {
974 #>>>> # This is not working, I think the error is somewhere downstream
975 # of add_sub_SeqFeature, probably in RangeI::union ?
976 # The sub features are added fine, but the EXPANDed parent feature
977 # location has a messed up start - Bio::SeqFeature::Generic ref
978 # instead of an integer - and an incorrect end - the end of the first
979 # sub feature added, not of the union of all of them.
981 # Also, the SeqIO::genbank.pm output is odd - the sub features appear
982 # to be listed with the *previous* feature, not this one.
984 for my $location (@locations) {
985 my $subgsf = $self->_parse_bsml_location($location);
986 # print "start ", $subgsf->start,"\n";
987 # print "end ", $subgsf->end,"\n";
988 $basegsf->add_sub_SeqFeature($subgsf, 'EXPAND');
990 # print $feat->getAttribute('id'),"\n";
991 # print $basegsf->primary_tag,"\n";
993 } else {
994 # What to do if there are no locations? Nothing needed?
997 # Look at any <Attribute>s or <Qualifier>s that are present:
998 my $floppies = &GETFLOPPIES($feat);
999 for my $attr (@{$floppies}) {
1000 my ($name, $content) = &FLOPPYVALS($attr);
1001 # Don't know what the object is, dump it to a tag:
1002 $basegsf->add_tag_value(lc($name), $content);
1005 # Mostly this helps with debugging, but may be of utility...
1006 # Add a tag holding the BSML id value
1007 if ( (my $val = $feat->getAttribute('id')) &&
1008 !$basegsf->has_tag('bsml-id')) {
1009 # Decided that this got a little sloppy...
1010 # $basegsf->add_tag_value("bsml-id", $val);
1012 return $basegsf;
1015 =head2 _parse_bsml_location
1017 Title : _parse_bsml_location
1018 Usage : $obj->_parse_bsml_feature( $intOrSiteLoc, $gsfObject )
1019 Function: Will examine the <Interval-loc> or <Site-loc> element provided
1020 Returns : Bio::SeqFeature::Generic
1021 Args : 0 XML::DOM::Element <Interval/Site-loc> being analyzed.
1022 1 Optional SeqFeature::Generic to use
1024 =cut
1026 sub _parse_bsml_location {
1027 my $self = shift;
1028 my ($loc, $gsf) = @_;
1030 $gsf ||= Bio::SeqFeature::Generic->new();
1031 my $type = $loc->getNodeName;
1032 my ($start, $end);
1033 if ($type eq 'Interval-loc') {
1034 $start = $loc->getAttribute('startpos');
1035 $end = $loc->getAttribute('endpos');
1036 } elsif ($type eq 'Site-loc') {
1037 $start = $end = $loc->getAttribute('sitepos');
1038 } else {
1039 warn "Unknown location type '$type', could not make GSF\n";
1040 return;
1042 $gsf->start($start);
1043 $gsf->end($end);
1045 # BSML does not have an explicit method to set undefined strand
1046 if (my $s = $loc->getAttribute("complement")) {
1047 if ($s) {
1048 $gsf->strand(-1);
1049 } else {
1050 $gsf->strand(1);
1052 } else {
1053 # We're setting "strand nonspecific" here - bad idea?
1054 # In most cases the user likely meant it to be on the + strand
1055 $gsf->strand(0);
1058 return $gsf;
1061 =head2 _parse_reference
1063 Title : _parse_reference
1064 Usage : $obj->_parse_reference(@args )
1065 Function: Makes a new <Reference> object from a ::Reference, which is
1066 then stored in an array provide by -refs. It will be
1067 appended to the XML tree later.
1068 Returns :
1069 Args : Argument array. Recognized keys:
1071 -xml The DOM::Document being modified
1073 -refobj The Annotation::Reference Object
1075 -refs An array reference to hold the new <Reference> DOM object
1077 -id Optional. If the XML id for the 'calling' element is
1078 provided, it will be placed in any <Reference> refs
1079 attribute.
1081 =cut
1083 sub _parse_reference {
1084 my $self = shift;
1085 my $args = $self->_parseparams( @_);
1086 my ($xml, $ref, $refRef) = ($args->{XML}, $args->{REFOBJ}, $args->{REFS});
1088 ###############
1089 # <Reference> #
1090 ###############
1092 my $xmlRef = $xml->createElement("Reference");
1093 #>> This may not be the right way to make a BSML dbxref...
1094 if (my $link = $ref->medline) {
1095 $xmlRef->setAttribute('dbxref', $link);
1098 # Make attributes for some of the characteristics
1099 my %stuff = ( start => $ref->start,
1100 end => $ref->end,
1101 rp => $ref->rp,
1102 comment => $ref->comment,
1103 pubmed => $ref->pubmed,
1105 for my $s (keys %stuff) {
1106 $self->_addel($xmlRef, "Attribute", {
1107 name => $s, content => $stuff{$s} }) if ($stuff{$s});
1109 $xmlRef->setAttribute('refs', $args->{ID}) if ($args->{ID});
1110 # Add the basic information
1111 # Should probably check for content before creation...
1112 $self->_addel($xmlRef, "RefAuthors")->
1113 appendChild( $xml->createTextNode(&STRIP($ref->authors)) );
1114 $self->_addel($xmlRef, "RefTitle")->
1115 appendChild( $xml->createTextNode(&STRIP($ref->title)) );
1116 $self->_addel($xmlRef, "RefJournal")->
1117 appendChild( $xml->createTextNode(&STRIP($ref->location)) );
1118 # References will be added later in a <Feature-Table>
1119 push @{$refRef}, $xmlRef;
1122 =head2 _parse_annotation
1124 Title : _parse_annotation
1125 Usage : $obj->_parse_annotation(@args )
1126 Function: Will examine any Annotations found in -obj. Data found in
1127 ::Comment and ::DBLink structures, as well as Annotation
1128 description fields are stored in -desc for later
1129 generation of <Attribute>s. <Reference> objects are generated
1130 from ::References, and are stored in -refs - these will
1131 be appended to the XML tree later.
1132 Returns :
1133 Args : Argument array. Recognized keys:
1135 -xml The DOM::Document being modified
1137 -obj Reference to the Bio object being analyzed
1139 -descr An array reference for holding description text items
1141 -refs An array reference to hold <Reference> DOM objects
1143 -id Optional. If the XML id for the 'calling' element is
1144 provided, it will be placed in any <Reference> refs
1145 attribute.
1147 =cut
1149 sub _parse_annotation {
1150 my $self = shift;
1151 my $args = $self->_parseparams( @_);
1152 my ($xml, $obj, $descRef, $refRef) =
1153 ( $args->{XML}, $args->{OBJ}, $args->{DESC}, $args->{REFS} );
1154 # No good place to put any of this (except for references). Most stuff
1155 # just gets dumped to <Attribute>s
1156 my $ann = $obj->annotation;
1157 return unless ($ann);
1158 # use BMS::Branch; my $debug = BMS::Branch->new( ); warn "$obj :"; $debug->branch($ann);
1159 unless (ref($ann) =~ /Collection/) {
1160 # Old style annotation. It seems that Features still use this
1161 # form of object
1162 $self->_parse_annotation_old(@_);
1163 return;
1166 for my $key ($ann->get_all_annotation_keys()) {
1167 for my $thing ($ann->get_Annotations($key)) {
1168 if ($key eq 'description') {
1169 push @{$descRef}, ["description" , $thing->value];
1170 } elsif ($key eq 'comment') {
1171 push @{$descRef}, ["comment" , $thing->text];
1172 } elsif ($key eq 'dblink') {
1173 # DBLinks get dumped to attributes, too
1174 push @{$descRef}, ["db_xref" , $thing->database . ":"
1175 . $thing->primary_id ];
1176 if (my $com = $thing->comment) {
1177 push @{$descRef}, ["link" , $com->text ];
1180 } elsif ($key eq 'reference') {
1181 $self->_parse_reference( @_, -refobj => $thing );
1182 } elsif (ref($thing) =~ /SimpleValue/) {
1183 push @{$descRef}, [$key , $thing->value];
1184 } else {
1185 # What is this??
1186 push @{$descRef}, ["error", "bsml.pm did not understand ".
1187 "'$key' = '$thing'" ];
1193 =head2 _parse_annotation_old
1195 Title : _parse_annotation_old
1196 Usage : $obj->_parse_annotation_old(@args)
1197 Function: As above, but for the old Annotation system.
1198 Apparently needed because Features are still using the old-style
1199 annotations?
1200 Returns :
1201 Args : Argument array. Recognized keys:
1203 -xml The DOM::Document being modified
1205 -obj Reference to the Bio object being analyzed
1207 -descr An array reference for holding description text items
1209 -refs An array reference to hold <Reference> DOM objects
1211 -id Optional. If the XML id for the 'calling' element is
1212 provided, it will be placed in any <Reference> refs
1213 attribute.
1215 =cut
1217 ###############
1218 # <Reference> #
1219 ###############
1221 sub _parse_annotation_old {
1222 my $self = shift;
1223 my $args = $self->_parseparams( @_);
1224 my ($xml, $obj, $descRef, $refRef) =
1225 ( $args->{XML}, $args->{OBJ}, $args->{DESC}, $args->{REFS} );
1226 # No good place to put any of this (except for references). Most stuff
1227 # just gets dumped to <Attribute>s
1228 if (my $ann = $obj->annotation) {
1229 push @{$descRef}, ["annotation", $ann->description];
1230 for my $com ($ann->each_Comment) {
1231 push @{$descRef}, ["comment" , $com->text];
1234 # Gene names just get dumped to <Attribute name="gene">
1235 for my $gene ($ann->each_gene_name) {
1236 push @{$descRef}, ["gene" , $gene];
1239 # DBLinks get dumped to attributes, too
1240 for my $link ($ann->each_DBLink) {
1241 push @{$descRef}, ["db_xref" ,
1242 $link->database . ":" . $link->primary_id ];
1243 if (my $com = $link->comment) {
1244 push @{$descRef}, ["link" , $com->text ];
1248 # References get produced and temporarily held
1249 for my $ref ($ann->each_Reference) {
1250 $self->_parse_reference( @_, -refobj => $ref );
1255 =head2 _add_page
1257 Title : _add_page
1258 Usage : $obj->_add_page($xmlDocument, $xmlSequenceObject)
1259 Function: Adds a simple <Page> and <View> structure for a <Sequence>
1260 Returns : a reference to the newly created <Page>
1261 Args : 0 The DOM::Document being modified
1262 1 Reference to the <Sequence> object
1264 =cut
1266 sub _add_page {
1267 my $self = shift;
1268 my ($xml, $seq) = @_;
1269 my $disp = $xml->getElementsByTagName("Display")->item(0);
1270 my $page = $self->_addel($disp, "Page");
1271 my ($width, $height) = ( 7.8, 5.5);
1272 my $screen = $self->_addel($page, "Screen", {
1273 width => $width, height => $height, });
1274 # $screen->appendChild($xml->createComment("Must close explicitly"));
1275 my $view = $self->_addel($page, "View", {
1276 seqref => $seq->getAttribute('id'),
1277 title => $seq->getAttribute('title'),
1278 title1 => "{NAME}",
1279 title2 => "{LENGTH} {UNIT}",
1281 $self->_addel($view, "View-line-widget", {
1282 shape => 'horizontal',
1283 hcenter => $width/2 + 0.7,
1284 'linear-length' => $width - 2,
1286 $self->_addel($view, "View-axis-widget");
1287 return $page;
1291 =head2 _addel
1293 Title : _addel
1294 Usage : $obj->_addel($parentElem, 'ChildName',
1295 { anAttr => 'someValue', anotherAttr => 'aValue',})
1296 Function: Add an element with attribute values to a DOM tree
1297 Returns : a reference to the newly added element
1298 Args : 0 The DOM::Element parent that you want to add to
1299 1 The name of the new child element
1300 2 Optional hash reference containing
1301 attribute name => attribute value assignments
1303 =cut
1305 sub _addel {
1306 my $self = shift;
1307 my ($root, $name, $attr) = @_;
1309 # Find the DOM::Document for the parent
1310 my $doc = $root->getOwnerDocument || $root;
1311 my $elem = $doc->createElement($name);
1312 for my $a (keys %{$attr}) {
1313 $elem->setAttribute($a, $attr->{$a});
1315 $root->appendChild($elem);
1316 return $elem;
1319 =head2 _show_dna
1321 Title : _show_dna
1322 Usage : $obj->_show_dna($newval)
1323 Function: (cut-and-pasted directly from embl.pm)
1324 Returns : value of _show_dna
1325 Args : newvalue (optional)
1327 =cut
1329 sub _show_dna {
1330 my $obj = shift;
1331 if( @_ ) {
1332 my $value = shift;
1333 $obj->{'_show_dna'} = $value;
1335 return $obj->{'_show_dna'};
1338 =head2 _initialize
1340 Title : _initialize
1341 Usage : $dom = $obj->_initialize(@args)
1342 Function: Coppied from embl.pm, and augmented with initialization of the
1343 XML DOM tree
1344 Returns :
1345 Args : -file => the XML file to be parsed
1347 =cut
1349 sub _initialize {
1350 my($self,@args) = @_;
1352 $self->SUPER::_initialize(@args);
1353 # hash for functions for decoding keys.
1354 $self->{'_func_ftunit_hash'} = {};
1355 $self->_show_dna(1); # sets this to one by default. People can change it
1357 my %param = @args; # From SeqIO.pm
1358 @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
1359 if ( exists $param{-file} && $param{-file} !~ /^>/) {
1360 # Is it blasphemy to add your own keys to an object in another package?
1361 # domtree => the parsed DOM tree retruned by XML::DOM
1362 $self->{'domtree'} = $self->_parse_xml( $param{-file} );
1363 # current_node => the <Sequence> node next in line for next_seq
1364 $self->{'current_node'} = 0;
1367 $self->sequence_factory( Bio::Seq::SeqFactory->new
1368 ( -verbose => $self->verbose(),
1369 -type => 'Bio::Seq::RichSeq'))
1370 if( ! defined $self->sequence_factory );
1374 =head2 _parseparams
1376 Title : _parseparams
1377 Usage : my $paramHash = $obj->_parseparams(@args)
1378 Function: Borrowed from Bio::Parse.pm, who borrowed it from CGI.pm
1379 Lincoln Stein -> Richard Resnick -> here
1380 Returns : A hash reference of the parameter keys (uppercase) pointing to
1381 their values.
1382 Args : An array of key, value pairs. Easiest to pass values as:
1383 -key1 => value1, -key2 => value2, etc
1384 Leading "-" are removed.
1386 =cut
1388 sub _parseparams {
1389 my $self = shift;
1390 my %hash = ();
1391 my @param = @_;
1393 # Hacked out from Parse.pm
1394 # The next few lines strip out the '-' characters which
1395 # preceed the keys, and capitalizes them.
1396 for (my $i=0;$i<@param;$i+=2) {
1397 $param[$i]=~s/^\-//;
1398 $param[$i]=~tr/a-z/A-Z/;
1400 pop @param if @param %2; # not an even multiple
1401 %hash = @param;
1402 return \%hash;
1405 =head2 _parse_xml
1407 Title : _parse_xml
1408 Usage : $dom = $obj->_parse_xml($filename)
1409 Function: uses XML::DOM to construct a DOM tree from the BSML document
1410 Returns : a reference to the parsed DOM tree
1411 Args : 0 Path to the XML file needing to be parsed
1413 =cut
1415 sub _parse_xml {
1416 my $self = shift;
1417 my $file = shift;
1419 unless (-e $file) {
1420 $self->throw("Could not parse non-existant XML file '$file'.");
1421 return;
1423 my $parser = XML::DOM::Parser->new();
1424 my $doc = $parser->parsefile ($file);
1425 return $doc;
1428 sub DESTROY {
1429 my $self = shift;
1430 # Reports off the net imply that DOM::Parser will memory leak if you
1431 # do not explicitly dispose of it:
1432 # http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/788458
1433 my $dom = $self->{'domtree'};
1434 # For some reason the domtree can get undef-ed somewhere...
1435 $dom->dispose if ($dom);
1439 =head1 TESTING SCRIPT
1441 The following script may be used to test the conversion process. You
1442 will need a file of the format you wish to test. The script will
1443 convert the file to BSML, store it in /tmp/bsmltemp, read that file
1444 into a new SeqIO stream, and write it back as the original
1445 format. Comparison of this second file to the original input file
1446 will allow you to track where data may be lost or corrupted. Note
1447 that you will need to specify $readfile and $readformat.
1449 use Bio::SeqIO;
1450 # Tests preservation of details during round-trip conversion:
1451 # $readformat -> BSML -> $readformat
1452 my $tempspot = "/tmp/bsmltemp"; # temp folder to hold generated files
1453 my $readfile = "rps4y.embl"; # The name of the file you want to test
1454 my $readformat = "embl"; # The format of the file being tested
1456 system "mkdir $tempspot" unless (-d $tempspot);
1457 # Make Seq object from the $readfile
1458 my $biostream = Bio::SeqIO->new( -file => "$readfile" );
1459 my $seq = $biostream->next_seq();
1461 # Write BSML from SeqObject
1462 my $bsmlout = Bio::SeqIO->new( -format => 'bsml',
1463 -file => ">$tempspot/out.bsml");
1464 warn "\nBSML written to $tempspot/out.bsml\n";
1465 $bsmlout->write_seq($seq);
1466 # Need to kill object for following code to work... Why is this so?
1467 $bsmlout = "";
1469 # Make Seq object from BSML
1470 my $bsmlin = Bio::SeqIO->new( -file => "$tempspot/out.bsml",
1471 -format => 'bsml');
1472 my $seq2 = $bsmlin->next_seq();
1474 # Write format back from Seq Object
1475 my $genout = Bio::SeqIO->new( -format => $readformat,
1476 -file => ">$tempspot/out.$readformat");
1477 $genout->write_seq($seq2);
1478 warn "$readformat written to $tempspot/out.$readformat\n";
1480 # BEING LOST:
1481 # Join information (not possible in BSML 2.2)
1482 # Sequence type (??)
1484 =cut