parse DBlinks now supported
[bioperl-live.git] / Bio / SeqIO / bsml.pm
blobb79e5a8b6253731ac655d7d9a2e030b4a1b968c8
2 # BioPerl module for Bio::SeqIO::bsml
4 # Cared for by Charles Tilford (tilfordc@bms.com)
5 # Copyright (C) Charles Tilford 2001
7 # This library is free software; you can redistribute it and/or
8 # modify it under the terms of the GNU Lesser General Public
9 # License as published by the Free Software Foundation; either
10 # version 2.1 of the License, or (at your option) any later version.
12 # This library is distributed in the hope that it will be useful,
13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15 # Lesser General Public License for more details.
17 # You should have received a copy of the GNU Lesser General Public
18 # License along with this library; if not, write to the Free Software
19 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
20 # Also at: http://www.gnu.org/copyleft/lesser.html
23 # Much of the basic documentation in this module has been
24 # cut-and-pasted from the embl.pm (Ewan Birney) SeqIO module.
27 =head1 NAME
29 Bio::SeqIO::bsml - BSML sequence input/output stream
31 =head1 SYNOPSIS
33 It is probably best not to use this object directly, but rather go
34 through the SeqIO handler system. To read a BSML file:
36 $stream = Bio::SeqIO->new( -file => $filename, -format => 'bsml');
38 while ( my $bioSeqObj = $stream->next_seq() ) {
39 # do something with $bioSeqObj
42 To write a Seq object to the current file handle in BSML XML format:
44 $stream->write_seq( -seq => $seqObj);
46 If instead you would like a XML::DOM object containing the BSML, use:
48 my $newXmlObject = $stream->to_bsml( -seq => $seqObj);
50 =head1 DEPENDENCIES
52 In addition to parts of the Bio:: hierarchy, this module uses:
54 XML::DOM
56 =head1 DESCRIPTION
58 This object can transform Bio::Seq objects to and from BSML (XML)
59 flatfiles.
61 =head2 NOTE:
63 2/1/02 - I have changed the API to more closely match argument
64 passing used by other BioPerl methods ( -tag => value ). Internal
65 methods are using the same API, but you should not be calling those
66 anyway...
68 =head1 FEEDBACK
70 =head2 Mailing Lists
72 User feedback is an integral part of the evolution of this and other
73 Bioperl modules. Send your comments and suggestions preferably to one
74 of the Bioperl mailing lists. Your participation is much appreciated.
76 bioperl-l@bioperl.org - General discussion
77 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
79 =head2 Reporting Bugs
81 Report bugs to the Bioperl bug tracking system to help us keep track
82 the bugs and their resolution.
83 Bug reports can be submitted via the web:
85 http://bugzilla.open-bio.org/
87 =head2 Things Still to Do
89 * The module now uses the new Collection.pm system. However,
90 Annotations associated with a Feature object still seem to use the
91 old system, so parsing with the old methods are included..
93 * Generate Seq objects with no sequence data but an assigned
94 length. This appears to be an issue with Bio::Seq. It is possible
95 (and reasonable) to make a BSML document with features but no
96 sequence data.
98 * Support <Seq-data-import>. Do not know how commonly this is used.
100 * Some features are awaiting implementation in later versions of
101 BSML. These include:
103 * Nested feature support
105 * Complex feature (ie joins)
107 * Unambiguity in strand (ie -1,0,1, not just 'complement' )
109 * More friendly dblink structures
111 * Location.pm (or RangeI::union?) appears to have a bug when 'expand'
112 is used.
114 * More intelligent hunting for sequence and feature titles? It is not
115 terribly clear where the most appropriate field is located, better
116 grepping (eg looking for a reasonable count for spaces and numbers)
117 may allow for titles better than "AE008041".
119 =head1 AUTHOR - Charles Tilford
121 Bristol-Myers Squibb Bioinformatics
123 Email tilfordc@bms.com
125 I have developed the BSML specific code for this package, but have used
126 code from other SeqIO packages for much of the nuts-and-bolts. In particular
127 I have used code from the embl.pm module either directly or as a framework
128 for many of the subroutines that are common to SeqIO modules.
130 =cut
132 package Bio::SeqIO::bsml;
133 use strict;
135 use Bio::SeqFeature::Generic;
136 use Bio::Species;
137 use XML::DOM;
138 use Bio::Seq::SeqFactory;
139 use Bio::Annotation::Collection;
140 use Bio::Annotation::Comment;
141 use Bio::Annotation::Reference;
142 use Bio::Annotation::DBLink;
144 use base qw(Bio::SeqIO);
146 my $idcounter = {}; # Used to generate unique id values
147 my $nvtoken = ": "; # The token used if a name/value pair has to be stuffed
148 # into a single line
150 =head1 METHODS
152 =cut
154 # LS: this seems to get overwritten on line 1317, generating a redefinition error. Dead code?
155 # CAT: This was inappropriately added in revision 1.10 - I added the check for existance of a sequence factory to the actual _initialize
156 # sub _initialize {
157 # my($self,@args) = @_;
158 # $self->SUPER::_initialize(@args);
159 # if( ! defined $self->sequence_factory ) {
160 # $self->sequence_factory(Bio::Seq::SeqFactory->new(-verbose => $self->verbose(), -type => 'Bio::Seq::RichSeq'));
164 =head2 next_seq
166 Title : next_seq
167 Usage : my $bioSeqObj = $stream->next_seq
168 Function: Retrieves the next sequence from a SeqIO::bsml stream.
169 Returns : A reference to a Bio::Seq::RichSeq object
170 Args :
172 =cut
174 sub next_seq {
175 my $self = shift;
176 my ($desc);
177 my $bioSeq = $self->sequence_factory->create(-verbose =>$self->verbose());
179 unless (exists $self->{'domtree'}) {
180 $self->throw("A BSML document has not yet been parsed.");
181 return;
183 my $dom = $self->{'domtree'};
184 my $seqElements = $dom->getElementsByTagName ("Sequence");
185 if ($self->{'current_node'} == $seqElements->getLength ) {
186 # There are no more <Sequence>s to process
187 return;
189 my $xmlSeq = $seqElements->item($self->{'current_node'});
191 # Assume that title attribute contains the best display id
192 if (my $val = $xmlSeq->getAttribute( "title")) {
193 $bioSeq->display_id($val);
196 # Set the molecule type
197 if (my $val = $xmlSeq->getAttribute( "molecule" )) {
198 my %mol = ('dna' => 'DNA', 'rna' => 'RNA', 'aa' => 'protein');
199 $bioSeq->molecule($mol{ lc($val) });
202 # Set the accession number
203 if (my $val = $xmlSeq->getAttribute( "ic-acckey" )) {
204 $bioSeq->accession_number($val);
207 # Get the sequence data for the element
208 if (my $seqData = &FIRSTDATA($xmlSeq->getElementsByTagName("Seq-data")
209 ->item(0) ) ) {
210 # Sequence data exists, transfer to the Seq object
211 # Remove white space and CRs (not neccesary?)
212 $seqData =~ s/[\s\n\r]//g;
213 $bioSeq->seq($seqData);
214 } elsif (my $import = $xmlSeq->getElementsByTagName("Seq-dataimport")
215 ->item(0) ) {
216 #>>>> # What about <Seq-data-import> ??
218 } elsif (my $val = $xmlSeq->getAttribute("length")) {
219 # No sequence defined, set the length directly
221 #>>>> # This does not appear to work - length is apparently calculated
222 # from the sequence. How to make a "virtual" sequence??? Such
223 # creatures are common in BSML...
224 $bioSeq->length($val);
227 my $species = Bio::Species->new();
228 my @classification = ();
230 # Peruse the generic <Attributes> - those that are direct children of
231 # the <Sequence> or the <Feature-tables> element
232 # Sticky wicket here - data not controlled by schema, could be anything
233 my @seqDesc = ();
234 my %specs = ('common_name' => 'y',
235 'genus' => 'y',
236 'species' => 'y',
237 'sub_species' => 'y',
239 my %seqMap = (
240 'add_date' => [ qw(date date-created date-last-updated)],
241 'keywords' => [ 'keyword', ],
242 'seq_version' => [ 'version' ],
243 'division' => [ 'division' ],
244 'add_secondary_accession' => ['accession'],
245 'pid' => ['pid'],
246 'primary_id' => [ 'primary.id', 'primary_id' ],
248 my @links;
249 my $floppies = &GETFLOPPIES($xmlSeq);
250 for my $attr (@{$floppies}) {
251 # Don't want to get attributes from <Feature> or <Table> elements yet
252 my $parent = $attr->getParentNode->getNodeName;
253 next unless($parent eq "Sequence" || $parent eq "Feature-tables");
255 my ($name, $content) = &FLOPPYVALS($attr);
256 $name = lc($name);
257 if (exists $specs{$name}) { # It looks like part of species...
258 $species->$name($content);
259 next;
261 my $value = "";
262 # Cycle through the Seq methods:
263 for my $method (keys %seqMap) {
264 # Cycle through potential matching attributes:
265 for my $match (@{$seqMap{$method}}) {
266 # If the <Attribute> name matches one of the keys,
267 # set $value, unless it has already been set
268 $value ||= $content if ($name =~ /$match/i);
270 if ($value ne "") {
272 if( $method eq 'seq_version'&& $value =~ /\S+\.(\d+)/ ) {
273 # hack for the fact that data in version is actually
274 # ACCESSION.VERSION
275 ($value) = $1;
277 $bioSeq->$method($value);
278 last;
281 if( $name eq 'database-xref' ) {
282 my ($link_id,$link_db) = split(/:/,$value);
283 push @links, Bio::Annotation::DBLink->new(-primary_id => $link_id,
284 -database => $link_db);
286 next if ($value ne "");
288 if ($name =~ /^species$/i) { # Uh, it's the species designation?
289 if ($content =~ / /) {
290 # Assume that a full species name has been provided
291 # This will screw up if the last word is the subspecies...
292 my @break = split " ", $content;
293 @classification = reverse @break;
294 } else {
295 $classification[0] = $content;
297 next;
299 if ($name =~ /sub[_ ]?species/i) { # Should be the subspecies...
300 $species->sub_species( $content );
301 next;
303 if ($name =~ /classification/i) { # Should be species classification
304 # We will assume that there are spaces separating the terms:
305 my @bits = split " ", $content;
306 # Now make sure there is not other cruft as well (eg semi-colons)
307 for my $i (0..$#bits) {
308 $bits[$i] =~ /(\w+)/;
309 $bits[$i] = $1;
311 $species->classification( @bits );
312 next;
314 if ($name =~ /comment/) {
315 my $com = Bio::Annotation::Comment->new('-text' => $content);
316 # $bioSeq->annotation->add_Comment($com);
317 $bioSeq->annotation->add_Annotation('comment', $com);
318 next;
320 # Description line - collect all descriptions for later assembly
321 if ($name =~ /descr/) {
322 push @seqDesc, $content;
323 next;
325 # Ok, we have no idea what this attribute is. Dump to SimpleValue
326 my $simp = Bio::Annotation::SimpleValue->new( -value => $content);
327 $bioSeq->annotation->add_Annotation($name, $simp);
329 unless ($#seqDesc < 0) {
330 $bioSeq->desc( join "; ", @seqDesc);
333 #>>>> This should be modified so that any IDREF associated with the
334 # <Reference> is then used to associate the reference with the
335 # appropriate Feature
337 # Extract out <Reference>s associated with the sequence
338 my @refs;
339 my %tags = (
340 -title => "RefTitle",
341 -authors => "RefAuthors",
342 -location => "RefJournal",
344 for my $ref ( $xmlSeq->getElementsByTagName ("Reference") ) {
345 my %refVals;
346 for my $tag (keys %tags) {
347 my $rt = &FIRSTDATA($ref->getElementsByTagName($tags{$tag})
348 ->item(0));
349 next unless ($rt);
350 $rt =~ s/^[\s\r\n]+//; # Kill leading space
351 $rt =~ s/[\s\r\n]+$//; # Kill trailing space
352 $rt =~ s/[\s\r\n]+/ /; # Collapse internal space runs
353 $refVals{$tag} = $rt;
355 my $reference = Bio::Annotation::Reference->new( %refVals );
357 # Pull out any <Reference> information hidden in <Attributes>
358 my %refMap = (
359 comment => [ 'comment', 'remark' ],
360 medline => [ 'medline', ],
361 pubmed => [ 'pubmed' ],
362 start => [ 'start', 'begin' ],
363 end => [ 'stop', 'end' ],
365 my @refCom = ();
366 my $floppies = &GETFLOPPIES($ref);
367 for my $attr (@{$floppies}) {
368 my ($name, $content) = &FLOPPYVALS($attr);
369 my $value = "";
370 # Cycle through the Seq methods:
371 for my $method (keys %refMap) {
372 # Cycle through potential matching attributes:
373 for my $match (@{$refMap{$method}}) {
374 # If the <Attribute> name matches one of the keys,
375 # set $value, unless it has already been set
376 $value ||= $content if ($name =~ /$match/i);
378 if ($value ne "") {
379 my $str = '$reference->' . $method . "($value)";
380 eval($str);
381 next;
384 next if ($value ne "");
385 # Don't know what the <Attribute> is, dump it to comments:
386 push @refCom, $name . $nvtoken . $content;
388 unless ($#refCom < 0) {
389 # Random stuff was found, tack it to the comment field
390 my $exist = $reference->comment;
391 $exist .= join ", ", @refCom;
392 $reference->comment($exist);
394 push @refs, $reference;
396 $bioSeq->annotation->add_Annotation('reference' => $_) for @refs;
397 my $ann_col = $bioSeq->annotation;
398 # Extract the <Feature>s for this <Sequence>
399 for my $feat ( $xmlSeq->getElementsByTagName("Feature") ) {
400 $bioSeq->add_SeqFeature( $self->_parse_bsml_feature($feat) );
403 $species->classification( @classification );
404 $bioSeq->species( $species );
406 $bioSeq->annotation->add_Annotation('dblink' => $_) for @links;
408 $self->{'current_node'}++;
409 return $bioSeq;
411 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
412 # Get all the <Attribute> and <Qualifier> children for an object, and
413 # return them as an array reference
414 # ('floppy' since these elements have poor/no schema control)
415 sub GETFLOPPIES {
416 my $obj = shift;
418 my @floppies;
419 my $attributes = $obj->getElementsByTagName ("Attribute");
420 for (my $i = 0; $i < $attributes->getLength; $i++) {
421 push @floppies, $attributes->item($i);
423 my $qualifiers = $obj->getElementsByTagName ("Qualifier");
424 for (my $i = 0; $i < $qualifiers->getLength; $i++) {
425 push @floppies, $qualifiers->item($i);
427 return \@floppies;
429 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
430 # Given a DOM <Attribute> or <Qualifier> object, return the [name, value] pair
431 sub FLOPPYVALS {
432 my $obj = shift;
434 my ($name, $value);
435 if ($obj->getNodeName eq "Attribute") {
436 $name = $obj->getAttribute('name');
437 $value = $obj->getAttribute('content');
438 } elsif ($obj->getNodeName eq "Qualifier") {
439 # Wheras <Attribute>s require both 'name' and 'content' attributes,
440 # <Qualifier>s can technically have either blank (and sometimes do)
441 my $n = $obj->getAttribute('value-type');
442 $name = $n if ($n ne "");
443 my $v = $obj->getAttribute('value');
444 $value = $v if ($v ne "");
446 return ($name, $value);
448 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
449 # Returns the value of the first TEXT_NODE encountered below an element
450 # Rational - avoid grabbing a comment rather than the PCDATA. Not foolproof...
451 sub FIRSTDATA {
452 my $element = shift;
453 return unless ($element);
455 my $hopefuls = $element->getChildNodes;
456 my $data;
457 for (my $i = 0; $i < $hopefuls->getLength; $i++) {
458 if ($hopefuls->item($i)->getNodeType ==
459 XML::DOM::Node::TEXT_NODE() ) {
460 $data = $hopefuls->item($i)->getNodeValue;
461 last;
464 return $data;
466 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
467 # Just collapses whitespace runs in a string
468 sub STRIP {
469 my $string = shift;
470 $string =~ s/[\s\r\n]+/ /g;
471 return $string;
474 =head2 to_bsml
476 Title : to_bsml
477 Usage : my $domDoc = $obj->to_bsml(@args)
478 Function: Generates an XML structure for one or more Bio::Seq objects.
479 If $seqref is an array ref, the XML tree generated will include
480 all the sequences in the array.
481 Returns : A reference to the XML DOM::Document object generated / modified
482 Args : Argument array in form of -key => val. Recognized keys:
484 -seq A Bio::Seq reference, or an array reference of many of them
486 -xmldoc Specifies an existing XML DOM document to add the sequences
487 to. If included, then only data (no page formatting) will
488 be added. If not, a new XML::DOM::Document will be made,
489 and will be populated with both <Sequence> data, as well as
490 <Page> display elements.
492 -nodisp Do not generate <Display> elements, or any children
493 thereof, even if -xmldoc is not set.
495 -skipfeat If set to 'all', all <Feature>s will be skipped. If it is
496 a hash reference, any <Feature> with a class matching a key
497 in the hash will be skipped - for example, to skip 'source'
498 and 'score' features, use:
500 -skipfeat => { source => 'Y', score => 'Y' }
502 -skiptags As above: if set to 'all', no tags are included, and if a
503 hash reference, those specific tags will be ignored.
505 Skipping some or all tags and features can result in
506 noticable speed improvements.
508 -nodata If true, then <Seq-data> will not be included. This may be
509 useful if you just want annotations and do not care about
510 the raw ACTG information.
512 -return Default is 'xml', which will return a reference to the BSML
513 XML object. If set to 'seq' will return an array ref of the
514 <Sequence> objects added (rather than the whole XML object)
516 -close Early BSML browsers will crash if an element *could* have
517 children but does not, and is closed as an empty element
518 e.g. <Styles/>. If -close is true, then such tags are given
519 a comment child to explicitly close them e.g. <Styles><!--
520 --></Styles>. This is default true, set to "0" if you do
521 not want this behavior.
523 Examples : my $domObj = $stream->to_bsml( -seq => \@fourCoolSequenceObjects,
524 -skipfeat => { source => 1 },
527 # Or add sequences to an existing BSML document:
528 $stream->to_bsml( -seq => \@fourCoolSequenceObjects,
529 -skipfeat => { source => 1 },
530 -xmldoc => $myBsmlDocumentInProgress, );
532 =cut
534 sub to_bsml {
535 my $self = shift;
536 my $args = $self->_parseparams( -close => 1,
537 -return => 'xml',
538 @_);
539 $args->{NODISP} ||= $args->{NODISPLAY};
540 my $seqref = $args->{SEQ};
541 $seqref = (ref($seqref) eq 'ARRAY') ? $seqref : [ $seqref ];
543 #############################
544 # Basic BSML XML Components #
545 #############################
547 my $xml;
548 my ($bsmlElem, $defsElem, $seqsElem, $dispElem);
549 if ($args->{XMLDOC}) {
550 # The user has provided an existing XML DOM object
551 $xml = $args->{XMLDOC};
552 unless ($xml->isa("XML::DOM::Document")) {
553 $self->throw('SeqIO::bsml.pm error:\n'.
554 'When calling ->to_bsml( { xmldoc => $myDoc }), $myDoc \n' .
555 'should be an XML::DOM::Document object, or an object that\n'.
556 'inherits from that class (like BsmlHelper.pm)');
558 } else {
559 # The user has not provided a new document, make one from scratch
560 $xml = XML::DOM::Document->new();
561 $xml->setXMLDecl( $xml->createXMLDecl("1.0") );
562 my $url = "http://www.labbook.com/dtd/bsml2_2.dtd";
563 my $doc = $xml->createDocumentType("Bsml",$url);
564 $xml->setDoctype($doc);
565 $bsmlElem = $self->_addel( $xml, 'Bsml');
566 $defsElem = $self->_addel( $bsmlElem, 'Definitions');
567 $seqsElem = $self->_addel( $defsElem, 'Sequences');
568 unless ($args->{NODISP}) {
569 $dispElem = $self->_addel( $bsmlElem, 'Display');
570 my $stylElem = $self->_addel( $dispElem, 'Styles');
571 my $style = $self->_addel( $stylElem, 'Style', {
572 type => "text/css" });
573 my $styleText =
574 qq(Interval-widget { display : "1"; }\n) .
575 qq(Feature { display-auto : "1"; });
576 $style->appendChild( $xml->createTextNode($styleText) );
580 # Establish fundamental BSML elements, if they do not already exist
581 $bsmlElem ||= $xml->getElementsByTagName("Bsml")->item(0);
582 $defsElem ||= $xml->getElementsByTagName("Definitions")->item(0);
583 $seqsElem ||= $xml->getElementsByTagName("Sequences")->item(0);
585 ###############
586 # <Sequences> #
587 ###############
589 # Map over Bio::Seq to BSML
590 my %mol = ('dna' => 'DNA', 'rna' => 'RNA', 'protein' => 'AA');
591 my @xmlSequences;
593 for my $bioSeq (@{$seqref}) {
594 my $xmlSeq = $xml->createElement("Sequence");
595 my $FTs = $xml->createElement("Feature-tables");
597 # Array references to hold <Reference> objects:
598 my $seqRefs = []; my $featRefs = [];
599 # Array references to hold <Attribute> values (not objects):
600 my $seqDesc = [];
601 push @{$seqDesc}, ["comment" , "This file generated to BSML 2.2 standards - joins will be collapsed to a single feature enclosing all members of the join"];
602 push @{$seqDesc}, ["description" , eval{$bioSeq->desc}];
603 for my $kwd ( eval{$bioSeq->get_keywords} ) {
604 push @{$seqDesc}, ["keyword" , $kwd];
606 push @{$seqDesc}, ["keyword" , eval{$bioSeq->keywords}];
607 push @{$seqDesc}, ["version" , eval{
608 join(".", $bioSeq->accession_number, $bioSeq->seq_version); }];
609 push @{$seqDesc}, ["division" , eval{$bioSeq->division}];
610 push @{$seqDesc}, ["pid" , eval{$bioSeq->pid}];
611 # push @{$seqDesc}, ["bio_object" , ref($bioSeq)];
612 push @{$seqDesc}, ["primary_id" , eval{$bioSeq->primary_id}];
613 for my $dt (eval{$bioSeq->get_dates()} ) {
614 push @{$seqDesc}, ["date" , $dt];
616 for my $ac (eval{$bioSeq->get_secondary_accessions()} ) {
617 push @{$seqDesc}, ["secondary_accession" , $ac];
620 # Determine the accession number and a unique identifier
621 my $acc = $bioSeq->accession_number eq "unknown" ?
622 "" : $bioSeq->accession_number;
623 my $id;
624 my $pi = $bioSeq->primary_id;
625 if ($pi && $pi !~ /Bio::/) {
626 # Not sure I understand what primary_id is... It sometimes
627 # is a string describing a reference to a BioSeq object...
628 $id = "SEQ" . $bioSeq->primary_id;
629 } else {
630 # Nothing useful found, make a new unique ID
631 $id = $acc || ("SEQ-io" . $idcounter->{Sequence}++);
633 # print "$id->",ref($bioSeq->primary_id),"\n";
634 # An id field with spaces is interpreted as an idref - kill the spaces
635 $id =~ s/ /-/g;
636 # Map over <Sequence> attributes
637 my %attr = ( 'title' => $bioSeq->display_id,
638 'length' => $bioSeq->length,
639 'ic-acckey' => $acc,
640 'id' => $id,
641 'representation' => 'raw',
643 $attr{molecule} = $mol{ lc($bioSeq->molecule) } if $bioSeq->can('molecule');
646 for my $a (keys %attr) {
647 $xmlSeq->setAttribute($a, $attr{$a}) if (defined $attr{$a} &&
648 $attr{$a} ne "");
650 # Orphaned Attributes:
651 $xmlSeq->setAttribute('topology', 'circular')
652 if ($bioSeq->is_circular);
653 # <Sequence> strand, locus
655 $self->_add_page($xml, $xmlSeq) if ($dispElem);
656 ################
657 # <Attributes> #
658 ################
660 # Check for Bio::Annotations on the * <Sequence> *.
661 $self->_parse_annotation( -xml => $xml, -obj => $bioSeq,
662 -desc => $seqDesc, -refs => $seqRefs);
664 # Incorporate species data
665 if (ref($bioSeq->species) eq 'Bio::Species') {
666 # Need to peer into Bio::Species ...
667 my @specs = ('common_name', 'genus', 'species', 'sub_species');
668 for my $sp (@specs) {
669 next unless (my $val = $bioSeq->species()->$sp());
670 push @{$seqDesc}, [$sp , $val];
672 push @{$seqDesc}, ['classification',
673 (join " ", $bioSeq->species->classification) ];
674 # Species::binomial will return "genus species sub_species" ...
675 } elsif (my $val = $bioSeq->species) {
676 # Ok, no idea what it is, just dump it in there...
677 push @{$seqDesc}, ["species", $val];
680 # Add the description <Attribute>s for the <Sequence>
681 for my $seqD (@{$seqDesc}) {
682 $self->_addel($xmlSeq, "Attribute", {
683 name => $seqD->[0], content => $seqD->[1]}) if ($seqD->[1]);
686 # If sequence references were added, make a Feature-table for them
687 unless ($#{$seqRefs} < 0) {
688 my $seqFT = $self->_addel($FTs, "Feature-table", {
689 title => "Sequence References", });
690 for my $feat (@{$seqRefs}) {
691 $seqFT->appendChild($feat);
695 # This is the appropriate place to add <Feature-tables>
696 $xmlSeq->appendChild($FTs);
698 #############
699 # <Feature> #
700 #############
702 #>>>> # Perhaps it is better to loop through top_Seqfeatures?...
703 #>>>> # ...however, BSML does not have a hierarchy for Features
705 if (defined $args->{SKIPFEAT} &&
706 $args->{SKIPFEAT} eq 'all') {
707 $args->{SKIPFEAT} = { all => 1};
708 } else { $args->{SKIPFEAT} ||= {} }
709 for my $class (keys %{$args->{SKIPFEAT}}) {
710 $args->{SKIPFEAT}{lc($class)} = $args->{SKIPFEAT}{$class};
712 # Loop through all the features
713 my @features = $bioSeq->all_SeqFeatures();
714 if (@features && !$args->{SKIPFEAT}{all}) {
715 my $ft = $self->_addel($FTs, "Feature-table", {
716 title => "Features", });
717 for my $bioFeat (@features ) {
718 my $featDesc = [];
719 my $class = lc($bioFeat->primary_tag);
720 # The user may have specified to ignore this type of feature
721 next if ($args->{SKIPFEAT}{$class});
722 my $id = "FEAT-io" . $idcounter->{Feature}++;
723 my $xmlFeat = $self->_addel( $ft, 'Feature', {
724 'id' => $id,
725 'class' => $class ,
726 'value-type' => $bioFeat->source_tag });
727 # Check for Bio::Annotations on the * <Feature> *.
728 $self->_parse_annotation( -xml => $xml, -obj => $bioFeat,
729 -desc => $featDesc, -id => $id,
730 -refs =>$featRefs, );
731 # Add the description stuff for the <Feature>
732 for my $de (@{$featDesc}) {
733 $self->_addel($xmlFeat, "Attribute", {
734 name => $de->[0], content => $de->[1]}) if ($de->[1]);
736 $self->_parse_location($xml, $xmlFeat, $bioFeat);
738 # loop through the tags, add them as <Qualifiers>
739 next if (defined $args->{SKIPTAGS} &&
740 $args->{SKIPTAGS} =~ /all/i);
741 # Tags can consume a lot of CPU cycles, and can often be
742 # rather non-informative, so -skiptags can allow total or
743 # selective omission of tags.
744 for my $tag ($bioFeat->all_tags()) {
745 next if (exists $args->{SKIPTAGS}{$tag});
746 for my $val ($bioFeat->each_tag_value($tag)) {
747 $self->_addel( $xmlFeat, 'Qualifier', {
748 'value-type' => $tag ,
749 'value' => $val });
755 ##############
756 # <Seq-data> #
757 ##############
759 # Add sequence data
760 if ( (my $data = $bioSeq->seq) && !$args->{NODATA} ) {
761 my $d = $self->_addel($xmlSeq, 'Seq-data');
762 $d->appendChild( $xml->createTextNode($data) );
765 # If references were added, make a Feature-table for them
766 unless ($#{$featRefs} < 0) {
767 my $seqFT = $self->_addel($FTs, "Feature-table", {
768 title => "Feature References", });
769 for my $feat (@{$featRefs}) {
770 $seqFT->appendChild($feat);
774 # Place the completed <Sequence> tree as a child of <Sequences>
775 $seqsElem->appendChild($xmlSeq);
776 push @xmlSequences, $xmlSeq;
779 # Prevent browser crashes by explicitly closing empty elements:
780 if ($args->{CLOSE}) {
781 my @problemChild = ('Sequences', 'Sequence', 'Feature-tables',
782 'Feature-table', 'Screen', 'View',);
783 for my $kid (@problemChild) {
784 for my $prob ($xml->getElementsByTagName($kid)) {
785 unless ($prob->hasChildNodes) {
786 $prob->appendChild(
787 $xml->createComment(" Must close <$kid> explicitly "));
793 if (defined $args->{RETURN} &&
794 $args->{RETURN} =~ /seq/i) {
795 return \@xmlSequences;
796 } else {
797 return $xml;
801 =head2 write_seq
803 Title : write_seq
804 Usage : $obj->write_seq(@args)
805 Function: Prints out an XML structure for one or more Bio::Seq objects.
806 If $seqref is an array ref, the XML tree generated will include
807 all the sequences in the array. This method is fairly simple,
808 most of the processing is performed within to_bsml.
809 Returns : A reference to the XML object generated / modified
810 Args : Argument array. Recognized keys:
812 -seq A Bio::Seq reference, or an array reference of many of them
814 Alternatively, the method may be called simply as...
816 $obj->write_seq( $bioseq )
818 ... if only a single argument is passed, it is assumed that
819 it is the sequence object (can also be an array ref of
820 many Seq objects )
822 -printmime If true prints "Content-type: $mimetype\n\n" at top of
823 document, where $mimetype is the value designated by this
824 key. For generic XML use text/xml, for BSML use text/x-bsml
826 -return This option will be supressed, since the nature of this
827 method is to print out the XML document. If you wish to
828 retrieve the <Sequence> objects generated, use the to_bsml
829 method directly.
831 =cut
833 sub write_seq {
834 my $self = shift;
835 my $args = $self->_parseparams( @_);
836 if ($#_ == 0 ) {
837 # If only a single value is passed, assume it is the seq object
838 unshift @_, "-seq";
840 # Build a BSML XML DOM object based on the sequence(s)
841 my $xml = $self->to_bsml( @_,
842 -return => undef );
843 # Convert to a string
844 my $out = $xml->toString;
845 # Print after putting a return after each element - more readable
846 $out =~ s/>/>\n/g;
847 $self->_print("Content-type: " . $args->{PRINTMIME} . "\n\n")
848 if ($args->{PRINTMIME});
849 $self->_print( $out );
850 # Return the DOM tree in case the user wants to do something with it
852 $self->flush if $self->_flush_on_write && defined $self->_fh;
853 return $xml;
856 =head1 INTERNAL METHODS
857 #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-
859 The following methods are used for internal processing, and should probably
860 not be accessed by the user.
862 =head2 _parse_location
864 Title : _parse_location
865 Usage : $obj->_parse_location($xmlDocument, $parentElem, $SeqFeatureObj)
866 Function: Adds <Interval-loc> and <Site-loc> children to <$parentElem> based
867 on locations / sublocations found in $SeqFeatureObj. If
868 sublocations exist, the original location will be ignored.
869 Returns : An array ref containing the elements added to the parent.
870 These will have already been added to <$parentElem>
871 Args : 0 The DOM::Document being modified
872 1 The DOM::Element parent that you want to add to
873 2 Reference to the Bio::SeqFeature being analyzed
875 =cut
877 ###############################
878 # <Interval-loc> & <Site-loc> #
879 ###############################
881 sub _parse_location {
882 my $self = shift;
883 my ($xml, $xmlFeat, $bioFeat) = @_;
884 my $bioLoc = $bioFeat->location;
885 my @locations;
886 if (ref($bioLoc) =~ /Split/) {
887 @locations = $bioLoc->sub_Location;
888 # BSML 2.2 does not recognize / support joins. For this reason,
889 # we will just use the upper-level location. The line below can
890 # be deleted or commented out if/when BSML 3 supports complex
891 # interval deffinitions:
892 @locations = ($bioLoc);
893 } else {
894 @locations = ($bioLoc);
896 my @added = ();
898 # Add the site or interval positional information:
899 for my $loc (@locations) {
900 my ($start, $end) = ($loc->start, $loc->end);
901 my %locAttr;
902 # Strand information is not well described in BSML
903 $locAttr{complement} = 1 if ($loc->strand == -1);
904 if ($start ne "" && ($start == $end || $end eq "")) {
905 $locAttr{sitepos} = $start;
906 push @added, $self->_addel($xmlFeat,'Site-loc',\%locAttr);
907 } elsif ($start ne "" && $end ne "") {
908 if ($start > $end) {
909 # The feature is on the complementary strand
910 ($start, $end) = ($end, $start);
911 $locAttr{complement} = 1;
913 $locAttr{startpos} = $start;
914 $locAttr{endpos} = $end;
915 push @added, $self->_addel($xmlFeat,'Interval-loc',\%locAttr);
916 } else {
917 warn "Failure to parse SeqFeature location. Start = '$start' & End = '$end'";
920 return \@added;
923 =head2 _parse_bsml_feature
925 Title : _parse_bsml_feature
926 Usage : $obj->_parse_bsml_feature($xmlFeature )
927 Function: Will examine the <Feature> element provided by $xmlFeature and
928 return a generic seq feature.
929 Returns : Bio::SeqFeature::Generic
930 Args : 0 XML::DOM::Element <Feature> being analyzed.
932 =cut
934 sub _parse_bsml_feature {
935 my $self = shift;
936 my ($feat) = @_;
938 my $basegsf = Bio::SeqFeature::Generic->new();
939 # score
940 # frame
941 # source_tag
943 # Use the class as the primary tag value, if it is present
944 if ( my $val = $feat->getAttribute("class") ) {
945 $basegsf->primary_tag($val);
948 # Positional information is in <Interval-loc>s or <Site-loc>s
949 # We need to grab these in order, to try to recreate joins...
950 my @locations = ();
951 for my $kid ($feat->getChildNodes) {
952 my $nodeName = $kid->getNodeName;
953 next unless ($nodeName eq "Interval-loc" ||
954 $nodeName eq "Site-loc");
955 push @locations, $kid;
957 if ($#locations == 0) {
958 # There is only one location specified
959 $self->_parse_bsml_location($locations[0], $basegsf);
960 } elsif ($#locations > 0) {
961 #>>>> # This is not working, I think the error is somewhere downstream
962 # of add_sub_SeqFeature, probably in RangeI::union ?
963 # The sub features are added fine, but the EXPANDed parent feature
964 # location has a messed up start - Bio::SeqFeature::Generic ref
965 # instead of an integer - and an incorrect end - the end of the first
966 # sub feature added, not of the union of all of them.
968 # Also, the SeqIO::genbank.pm output is odd - the sub features appear
969 # to be listed with the *previous* feature, not this one.
971 for my $location (@locations) {
972 my $subgsf = $self->_parse_bsml_location($location);
973 # print "start ", $subgsf->start,"\n";
974 # print "end ", $subgsf->end,"\n";
975 $basegsf->add_sub_SeqFeature($subgsf, 'EXPAND');
977 # print $feat->getAttribute('id'),"\n";
978 # print $basegsf->primary_tag,"\n";
980 } else {
981 # What to do if there are no locations? Nothing needed?
984 # Look at any <Attribute>s or <Qualifier>s that are present:
985 my $floppies = &GETFLOPPIES($feat);
986 for my $attr (@{$floppies}) {
987 my ($name, $content) = &FLOPPYVALS($attr);
988 # Don't know what the object is, dump it to a tag:
989 $basegsf->add_tag_value(lc($name), $content);
992 # Mostly this helps with debugging, but may be of utility...
993 # Add a tag holding the BSML id value
994 if ( (my $val = $feat->getAttribute('id')) &&
995 !$basegsf->has_tag('bsml-id')) {
996 # Decided that this got a little sloppy...
997 # $basegsf->add_tag_value("bsml-id", $val);
999 return $basegsf;
1002 =head2 _parse_bsml_location
1004 Title : _parse_bsml_location
1005 Usage : $obj->_parse_bsml_feature( $intOrSiteLoc, $gsfObject )
1006 Function: Will examine the <Interval-loc> or <Site-loc> element provided
1007 Returns : Bio::SeqFeature::Generic
1008 Args : 0 XML::DOM::Element <Interval/Site-loc> being analyzed.
1009 1 Optional SeqFeature::Generic to use
1011 =cut
1013 sub _parse_bsml_location {
1014 my $self = shift;
1015 my ($loc, $gsf) = @_;
1017 $gsf ||= Bio::SeqFeature::Generic->new();
1018 my $type = $loc->getNodeName;
1019 my ($start, $end);
1020 if ($type eq 'Interval-loc') {
1021 $start = $loc->getAttribute('startpos');
1022 $end = $loc->getAttribute('endpos');
1023 } elsif ($type eq 'Site-loc') {
1024 $start = $end = $loc->getAttribute('sitepos');
1025 } else {
1026 warn "Unknown location type '$type', could not make GSF\n";
1027 return;
1029 $gsf->start($start);
1030 $gsf->end($end);
1032 # BSML does not have an explicit method to set undefined strand
1033 if (my $s = $loc->getAttribute("complement")) {
1034 if ($s) {
1035 $gsf->strand(-1);
1036 } else {
1037 $gsf->strand(1);
1039 } else {
1040 # We're setting "strand nonspecific" here - bad idea?
1041 # In most cases the user likely meant it to be on the + strand
1042 $gsf->strand(0);
1045 return $gsf;
1048 =head2 _parse_reference
1050 Title : _parse_reference
1051 Usage : $obj->_parse_reference(@args )
1052 Function: Makes a new <Reference> object from a ::Reference, which is
1053 then stored in an array provide by -refs. It will be
1054 appended to the XML tree later.
1055 Returns :
1056 Args : Argument array. Recognized keys:
1058 -xml The DOM::Document being modified
1060 -refobj The Annotation::Reference Object
1062 -refs An array reference to hold the new <Reference> DOM object
1064 -id Optional. If the XML id for the 'calling' element is
1065 provided, it will be placed in any <Reference> refs
1066 attribute.
1068 =cut
1070 sub _parse_reference {
1071 my $self = shift;
1072 my $args = $self->_parseparams( @_);
1073 my ($xml, $ref, $refRef) = ($args->{XML}, $args->{REFOBJ}, $args->{REFS});
1075 ###############
1076 # <Reference> #
1077 ###############
1079 my $xmlRef = $xml->createElement("Reference");
1080 #>> This may not be the right way to make a BSML dbxref...
1081 if (my $link = $ref->medline) {
1082 $xmlRef->setAttribute('dbxref', $link);
1085 # Make attributes for some of the characteristics
1086 my %stuff = ( start => $ref->start,
1087 end => $ref->end,
1088 rp => $ref->rp,
1089 comment => $ref->comment,
1090 pubmed => $ref->pubmed,
1092 for my $s (keys %stuff) {
1093 $self->_addel($xmlRef, "Attribute", {
1094 name => $s, content => $stuff{$s} }) if ($stuff{$s});
1096 $xmlRef->setAttribute('refs', $args->{ID}) if ($args->{ID});
1097 # Add the basic information
1098 # Should probably check for content before creation...
1099 $self->_addel($xmlRef, "RefAuthors")->
1100 appendChild( $xml->createTextNode(&STRIP($ref->authors)) );
1101 $self->_addel($xmlRef, "RefTitle")->
1102 appendChild( $xml->createTextNode(&STRIP($ref->title)) );
1103 $self->_addel($xmlRef, "RefJournal")->
1104 appendChild( $xml->createTextNode(&STRIP($ref->location)) );
1105 # References will be added later in a <Feature-Table>
1106 push @{$refRef}, $xmlRef;
1109 =head2 _parse_annotation
1111 Title : _parse_annotation
1112 Usage : $obj->_parse_annotation(@args )
1113 Function: Will examine any Annotations found in -obj. Data found in
1114 ::Comment and ::DBLink structures, as well as Annotation
1115 description fields are stored in -desc for later
1116 generation of <Attribute>s. <Reference> objects are generated
1117 from ::References, and are stored in -refs - these will
1118 be appended to the XML tree later.
1119 Returns :
1120 Args : Argument array. Recognized keys:
1122 -xml The DOM::Document being modified
1124 -obj Reference to the Bio object being analyzed
1126 -descr An array reference for holding description text items
1128 -refs An array reference to hold <Reference> DOM objects
1130 -id Optional. If the XML id for the 'calling' element is
1131 provided, it will be placed in any <Reference> refs
1132 attribute.
1134 =cut
1136 sub _parse_annotation {
1137 my $self = shift;
1138 my $args = $self->_parseparams( @_);
1139 my ($xml, $obj, $descRef, $refRef) =
1140 ( $args->{XML}, $args->{OBJ}, $args->{DESC}, $args->{REFS} );
1141 # No good place to put any of this (except for references). Most stuff
1142 # just gets dumped to <Attribute>s
1143 my $ann = $obj->annotation;
1144 return unless ($ann);
1145 # use BMS::Branch; my $debug = BMS::Branch->new( ); warn "$obj :"; $debug->branch($ann);
1146 unless (ref($ann) =~ /Collection/) {
1147 # Old style annotation. It seems that Features still use this
1148 # form of object
1149 $self->_parse_annotation_old(@_);
1150 return;
1153 for my $key ($ann->get_all_annotation_keys()) {
1154 for my $thing ($ann->get_Annotations($key)) {
1155 if ($key eq 'description') {
1156 push @{$descRef}, ["description" , $thing->value];
1157 } elsif ($key eq 'comment') {
1158 push @{$descRef}, ["comment" , $thing->text];
1159 } elsif ($key eq 'dblink') {
1160 # DBLinks get dumped to attributes, too
1161 push @{$descRef}, ["db_xref" , $thing->database . ":"
1162 . $thing->primary_id ];
1163 if (my $com = $thing->comment) {
1164 push @{$descRef}, ["link" , $com->text ];
1167 } elsif ($key eq 'reference') {
1168 $self->_parse_reference( @_, -refobj => $thing );
1169 } elsif (ref($thing) =~ /SimpleValue/) {
1170 push @{$descRef}, [$key , $thing->value];
1171 } else {
1172 # What is this??
1173 push @{$descRef}, ["error", "bsml.pm did not understand ".
1174 "'$key' = '$thing'" ];
1180 =head2 _parse_annotation_old
1182 Title : _parse_annotation_old
1183 Usage : $obj->_parse_annotation_old(@args)
1184 Function: As above, but for the old Annotation system.
1185 Apparently needed because Features are still using the old-style
1186 annotations?
1187 Returns :
1188 Args : Argument array. Recognized keys:
1190 -xml The DOM::Document being modified
1192 -obj Reference to the Bio object being analyzed
1194 -descr An array reference for holding description text items
1196 -refs An array reference to hold <Reference> DOM objects
1198 -id Optional. If the XML id for the 'calling' element is
1199 provided, it will be placed in any <Reference> refs
1200 attribute.
1202 =cut
1204 ###############
1205 # <Reference> #
1206 ###############
1208 sub _parse_annotation_old {
1209 my $self = shift;
1210 my $args = $self->_parseparams( @_);
1211 my ($xml, $obj, $descRef, $refRef) =
1212 ( $args->{XML}, $args->{OBJ}, $args->{DESC}, $args->{REFS} );
1213 # No good place to put any of this (except for references). Most stuff
1214 # just gets dumped to <Attribute>s
1215 if (my $ann = $obj->annotation) {
1216 push @{$descRef}, ["annotation", $ann->description];
1217 for my $com ($ann->each_Comment) {
1218 push @{$descRef}, ["comment" , $com->text];
1221 # Gene names just get dumped to <Attribute name="gene">
1222 for my $gene ($ann->each_gene_name) {
1223 push @{$descRef}, ["gene" , $gene];
1226 # DBLinks get dumped to attributes, too
1227 for my $link ($ann->each_DBLink) {
1228 push @{$descRef}, ["db_xref" ,
1229 $link->database . ":" . $link->primary_id ];
1230 if (my $com = $link->comment) {
1231 push @{$descRef}, ["link" , $com->text ];
1235 # References get produced and temporarily held
1236 for my $ref ($ann->each_Reference) {
1237 $self->_parse_reference( @_, -refobj => $ref );
1242 =head2 _add_page
1244 Title : _add_page
1245 Usage : $obj->_add_page($xmlDocument, $xmlSequenceObject)
1246 Function: Adds a simple <Page> and <View> structure for a <Sequence>
1247 Returns : a reference to the newly created <Page>
1248 Args : 0 The DOM::Document being modified
1249 1 Reference to the <Sequence> object
1251 =cut
1253 sub _add_page {
1254 my $self = shift;
1255 my ($xml, $seq) = @_;
1256 my $disp = $xml->getElementsByTagName("Display")->item(0);
1257 my $page = $self->_addel($disp, "Page");
1258 my ($width, $height) = ( 7.8, 5.5);
1259 my $screen = $self->_addel($page, "Screen", {
1260 width => $width, height => $height, });
1261 # $screen->appendChild($xml->createComment("Must close explicitly"));
1262 my $view = $self->_addel($page, "View", {
1263 seqref => $seq->getAttribute('id'),
1264 title => $seq->getAttribute('title'),
1265 title1 => "{NAME}",
1266 title2 => "{LENGTH} {UNIT}",
1268 $self->_addel($view, "View-line-widget", {
1269 shape => 'horizontal',
1270 hcenter => $width/2 + 0.7,
1271 'linear-length' => $width - 2,
1273 $self->_addel($view, "View-axis-widget");
1274 return $page;
1278 =head2 _addel
1280 Title : _addel
1281 Usage : $obj->_addel($parentElem, 'ChildName',
1282 { anAttr => 'someValue', anotherAttr => 'aValue',})
1283 Function: Add an element with attribute values to a DOM tree
1284 Returns : a reference to the newly added element
1285 Args : 0 The DOM::Element parent that you want to add to
1286 1 The name of the new child element
1287 2 Optional hash reference containing
1288 attribute name => attribute value assignments
1290 =cut
1292 sub _addel {
1293 my $self = shift;
1294 my ($root, $name, $attr) = @_;
1296 # Find the DOM::Document for the parent
1297 my $doc = $root->getOwnerDocument || $root;
1298 my $elem = $doc->createElement($name);
1299 for my $a (keys %{$attr}) {
1300 $elem->setAttribute($a, $attr->{$a});
1302 $root->appendChild($elem);
1303 return $elem;
1306 =head2 _show_dna
1308 Title : _show_dna
1309 Usage : $obj->_show_dna($newval)
1310 Function: (cut-and-pasted directly from embl.pm)
1311 Returns : value of _show_dna
1312 Args : newvalue (optional)
1314 =cut
1316 sub _show_dna {
1317 my $obj = shift;
1318 if( @_ ) {
1319 my $value = shift;
1320 $obj->{'_show_dna'} = $value;
1322 return $obj->{'_show_dna'};
1325 =head2 _initialize
1327 Title : _initialize
1328 Usage : $dom = $obj->_initialize(@args)
1329 Function: Coppied from embl.pm, and augmented with initialization of the
1330 XML DOM tree
1331 Returns :
1332 Args : -file => the XML file to be parsed
1334 =cut
1336 sub _initialize {
1337 my($self,@args) = @_;
1339 $self->SUPER::_initialize(@args);
1340 # hash for functions for decoding keys.
1341 $self->{'_func_ftunit_hash'} = {};
1342 $self->_show_dna(1); # sets this to one by default. People can change it
1344 my %param = @args; # From SeqIO.pm
1345 @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
1346 if ( exists $param{-file} && $param{-file} !~ /^>/) {
1347 # Is it blasphemy to add your own keys to an object in another package?
1348 # domtree => the parsed DOM tree retruned by XML::DOM
1349 $self->{'domtree'} = $self->_parse_xml( $param{-file} );
1350 # current_node => the <Sequence> node next in line for next_seq
1351 $self->{'current_node'} = 0;
1354 $self->sequence_factory( Bio::Seq::SeqFactory->new
1355 ( -verbose => $self->verbose(),
1356 -type => 'Bio::Seq::RichSeq'))
1357 if( ! defined $self->sequence_factory );
1361 =head2 _parseparams
1363 Title : _parseparams
1364 Usage : my $paramHash = $obj->_parseparams(@args)
1365 Function: Borrowed from Bio::Parse.pm, who borrowed it from CGI.pm
1366 Lincoln Stein -> Richard Resnick -> here
1367 Returns : A hash reference of the parameter keys (uppercase) pointing to
1368 their values.
1369 Args : An array of key, value pairs. Easiest to pass values as:
1370 -key1 => value1, -key2 => value2, etc
1371 Leading "-" are removed.
1373 =cut
1375 sub _parseparams {
1376 my $self = shift;
1377 my %hash = ();
1378 my @param = @_;
1380 # Hacked out from Parse.pm
1381 # The next few lines strip out the '-' characters which
1382 # preceed the keys, and capitalizes them.
1383 for (my $i=0;$i<@param;$i+=2) {
1384 $param[$i]=~s/^\-//;
1385 $param[$i]=~tr/a-z/A-Z/;
1387 pop @param if @param %2; # not an even multiple
1388 %hash = @param;
1389 return \%hash;
1392 =head2 _parse_xml
1394 Title : _parse_xml
1395 Usage : $dom = $obj->_parse_xml($filename)
1396 Function: uses XML::DOM to construct a DOM tree from the BSML document
1397 Returns : a reference to the parsed DOM tree
1398 Args : 0 Path to the XML file needing to be parsed
1400 =cut
1402 sub _parse_xml {
1403 my $self = shift;
1404 my $file = shift;
1406 unless (-e $file) {
1407 $self->throw("Could not parse non-existant XML file '$file'.");
1408 return;
1410 my $parser = new XML::DOM::Parser;
1411 my $doc = $parser->parsefile ($file);
1412 return $doc;
1415 sub DESTROY {
1416 my $self = shift;
1417 # Reports off the net imply that DOM::Parser will memory leak if you
1418 # do not explicitly dispose of it:
1419 # http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/788458
1420 my $dom = $self->{'domtree'};
1421 # For some reason the domtree can get undef-ed somewhere...
1422 $dom->dispose if ($dom);
1426 =head1 TESTING SCRIPT
1428 The following script may be used to test the conversion process. You
1429 will need a file of the format you wish to test. The script will
1430 convert the file to BSML, store it in /tmp/bsmltemp, read that file
1431 into a new SeqIO stream, and write it back as the original
1432 format. Comparison of this second file to the original input file
1433 will allow you to track where data may be lost or corrupted. Note
1434 that you will need to specify $readfile and $readformat.
1436 use Bio::SeqIO;
1437 # Tests preservation of details during round-trip conversion:
1438 # $readformat -> BSML -> $readformat
1439 my $tempspot = "/tmp/bsmltemp"; # temp folder to hold generated files
1440 my $readfile = "rps4y.embl"; # The name of the file you want to test
1441 my $readformat = "embl"; # The format of the file being tested
1443 system "mkdir $tempspot" unless (-d $tempspot);
1444 # Make Seq object from the $readfile
1445 my $biostream = Bio::SeqIO->new( -file => "$readfile" );
1446 my $seq = $biostream->next_seq();
1448 # Write BSML from SeqObject
1449 my $bsmlout = Bio::SeqIO->new( -format => 'bsml',
1450 -file => ">$tempspot/out.bsml");
1451 warn "\nBSML written to $tempspot/out.bsml\n";
1452 $bsmlout->write_seq($seq);
1453 # Need to kill object for following code to work... Why is this so?
1454 $bsmlout = "";
1456 # Make Seq object from BSML
1457 my $bsmlin = Bio::SeqIO->new( -file => "$tempspot/out.bsml",
1458 -format => 'bsml');
1459 my $seq2 = $bsmlin->next_seq();
1461 # Write format back from Seq Object
1462 my $genout = Bio::SeqIO->new( -format => $readformat,
1463 -file => ">$tempspot/out.$readformat");
1464 $genout->write_seq($seq2);
1465 warn "$readformat written to $tempspot/out.$readformat\n";
1467 # BEING LOST:
1468 # Join information (not possible in BSML 2.2)
1469 # Sequence type (??)
1471 =cut