1 #-----------------------------------------------------------------
4 # BioPerl module Bio::Search::Hit::PsiBlastHit
6 # (This module was originally called Bio::Tools::Blast::Sbjct)
8 # Please direct questions and support issues to <bioperl-l@bioperl.org>
10 # Cared for by Steve Chervitz <sac@bioperl.org>
12 # You may distribute this module under the same terms as perl itself
13 #-----------------------------------------------------------------
19 Bio::Search::Hit::PsiBlastHit - Bioperl BLAST Hit object
23 See L<Bio::Search::Result::BlastResult>.
27 The Bio::Search::Hit::PsiBlastHit.pm module encapsulates data and
28 methods for manipulating "hits" from a BLAST report. A BLAST hit is a
29 collection of HSPs along with other metadata such as sequence name and
30 score information. Hit objects are accessed via
31 L<Bio::Search::Result::BlastResult> objects after parsing a BLAST
32 report using the L<Bio::SearchIO> system.
34 In Blast lingo, the "sbjct" sequences are all the sequences in a
35 target database which were compared against a "query" sequence. The
36 terms "sbjct" and "hit" will be used interchangeably in this module.
37 All methods that take 'sbjct' as an argument also support 'hit' as a
40 This module supports BLAST versions 1.x and 2.x, gapped and ungapped,
43 The construction of PsiBlastHit objects is performed by
44 Bio::SearchIO::blast::PsiBlastHitFactory in a process that is
45 orchestrated by the Blast parser (L<Bio::SearchIO::blast>).
46 The resulting PsiBlastHits are then accessed via
47 L<Bio::Search::Result::BlastResult>). Therefore, you do not need to
48 use L<Bio::Search::Hit::PsiBlastHit>) directly. If you need to
49 construct PsiBlastHits directly, see the C<new()> function for details.
51 For L<Bio::SearchIO> BLAST parsing usage examples, see the
52 C<examples/search-blast> directory of the Bioperl distribution.
55 =head2 HSP Tiling and Ambiguous Alignments
57 If a Blast hit has more than one HSP, the Bio::Search::Hit::PsiBlastHit.pm
58 object has the ability to merge overlapping HSPs into contiguous
59 blocks. This permits the PsiBlastHit object to sum data across all HSPs
60 without counting data in the overlapping regions multiple times, which
61 would happen if data from each overlapping HSP are simply summed. HSP
62 tiling is performed automatically when methods of the PsiBlastHit object
63 that rely on tiled data are invoked. These include
64 L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>,
65 L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>,
66 L<num_unaligned_query()|num_unaligned_query>, L<num_unaligned_hit()|num_unaligned_hit>.
68 It also permits the assessment of an "ambiguous alignment" if the
69 query (or sbjct) sequences from different HSPs overlap
70 (see L<ambiguous_aln()|ambiguous_aln>). The existence
71 of an overlap could indicate a biologically interesting region in the
72 sequence, such as a repeated domain. The PsiBlastHit object uses the
73 C<-OVERLAP> parameter to determine when two sequences overlap; if this is
74 set to 2 -- the default -- then any two sbjct or query HSP sequences
75 must overlap by more than two residues to get merged into the same
76 contig and counted as an overlap. See the L<BUGS | BUGS> section below for
77 "issues" with HSP tiling.
80 The results of the HSP tiling is reported with the following ambiguity codes:
82 'q' = Query sequence contains multiple sub-sequences matching
83 a single region in the sbjct sequence.
85 's' = Subject (PsiBlastHit) sequence contains multiple sub-sequences matching
86 a single region in the query sequence.
88 'qs' = Both query and sbjct sequences contain more than one
89 sub-sequence with similarity to the other sequence.
92 For addition information about ambiguous BLAST alignments, see
93 L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
97 Bio::Search::Hit::PsiBlastHit.pm is a concrete class that inherits from
98 L<Bio::Root::Root> and L<Bio::Search::Hit::HitI>. and relies on
99 L<Bio::Search::HSP::BlastHSP>.
104 One consequence of the HSP tiling is that methods that rely on HSP
105 tiling such as L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>
106 etc. may report misleading numbers when C<-OVERLAP> is set to a large
107 number. For example, say we have two HSPs and the query sequence tile
111 Full seq: ------------------------------------------------------------
113 HSP1: --------------- (6 identical matches)
115 HSP2: ------------- (6 identical matches)
118 If C<-OVERLAP> is set to some number over 4, HSP1 and HSP2 will not be
119 tiled into a single contig and their numbers of identical matches will
120 be added, giving a total of 12, not 10 if they had be combined into
121 one contig. This can lead to number greater than 1.0 for methods
122 L<frac_identical()|frac_identical> and L<frac_conserved()|frac_conserved>. This is less of an issue
123 with gapped Blast since it tends to combine HSPs that would be listed
124 separately without gapping. (Fractions E<gt>1.0 can be viewed as a
125 signal for an interesting alignment that warrants further inspection,
126 thus turning this bug into a feature :-).
128 Using large values for C<-OVERLAP> can lead to incorrect numbers
129 reported by methods that rely on HSP tiling but can be useful if you
130 care more about detecting ambiguous alignments. Setting C<-OVERLAP>
131 to zero will lead to the most accurate numbers for the
132 tiling-dependent methods but will be useless for detecting overlapping
133 HSPs since all HSPs will appear to overlap.
138 Bio::Search::HSP::BlastHSP.pm - Blast HSP object.
139 Bio::Search::Result::BlastResult.pm - Blast Result object.
140 Bio::Search::Hit::HitI.pm - Interface implemented by PsiBlastHit.pm
141 Bio::Root::Root.pm - Base class for PsiBlastHit.pm
145 http://bio.perl.org/ - Bioperl Project Homepage
152 User feedback is an integral part of the evolution of this and other
153 Bioperl modules. Send your comments and suggestions preferably to one
154 of the Bioperl mailing lists. Your participation is much appreciated.
156 bioperl-l@bioperl.org - General discussion
157 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
161 Please direct usage questions or support issues to the mailing list:
163 I<bioperl-l@bioperl.org>
165 rather than to the module maintainer directly. Many experienced and
166 reponsive experts will be able look at the problem and quickly
167 address it. Please include a thorough description of the problem
168 with code and data examples if at all possible.
170 =head2 Reporting Bugs
172 Report bugs to the Bioperl bug tracking system to help us keep track
173 the bugs and their resolution. Bug reports can be submitted via the
176 http://bugzilla.open-bio.org/
180 Steve Chervitz E<lt>sac@bioperl.orgE<gt>
182 See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
184 =head1 ACKNOWLEDGEMENTS
186 This software was originally developed in the Department of Genetics
187 at Stanford University. I would also like to acknowledge my
188 colleagues at Affymetrix for useful feedback.
192 Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
196 This software is provided "as is" without warranty of any kind.
200 The rest of the documentation details each of the object methods.
201 Internal methods are usually preceded with a _
206 # Let the code begin...
208 package Bio
::Search
::Hit
::PsiBlastHit
;
211 use Bio
::Search
::BlastUtils
;
212 use vars
qw(%SUMMARY_OFFSET);
217 use base qw(Bio::Root::Root Bio::Search::Hit::HitI);
222 Usage : $hit = Bio::Search::Hit::PsiBlastHit->new( %named_params );
223 : Bio::Search::Hit::PsiBlastHit.pm objects are constructed
224 : automatically by Bio::SearchIO::PsiBlastHitFactory.pm,
225 : so there is no need for direct instantiation.
226 Purpose : Constructs a new PsiBlastHit object and Initializes key variables
228 Returns : A Bio::Search::Hit::PsiBlastHit object
229 Argument : Named Parameters:
230 : Parameter keys are case-insensitive.
231 : -RAW_DATA => array reference holding raw BLAST report data
232 : for a single hit. This includes all lines
233 : within the HSP alignment listing section of a
234 : traditional BLAST or PSI-BLAST (non-XML) report,
235 : starting at (or just after) the leading '>'.
236 : -HOLD_RAW_DATA => boolean, should -RAW_DATA be saved within the object.
237 : -QUERY_LEN => Length of the query sequence
238 : -ITERATION => integer (PSI-BLAST iteration number in which hit was found)
239 : -OVERLAP => integer (maximum overlap between adjacent
241 : -PROGRAM => string (type of Blast: BLASTP, BLASTN, etc)
242 : -SIGNIF => significance
243 : -IS_PVAL => boolean, true if -SIGNIF contains a P-value
244 : -SCORE => raw BLAST score
245 : -FOUND_AGAIN => boolean, true if this was a hit from the
246 : section of a PSI-BLAST with iteration > 1
247 : containing sequences that were also found
249 Comments : This object accepts raw Blast report data not because it
250 : is required for parsing, but in order to retrieve it
251 : (only available if -HOLD_RAW_DATA is set to true).
253 See Also : L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<Bio::Root::Root::new()|Bio::Root::Root>
260 my ($class, @args ) = @_;
261 my $self = $class->SUPER::new
( @args );
263 my ($raw_data, $signif, $is_pval, $hold_raw);
265 ($self->{'_blast_program'}, $self->{'_query_length'}, $raw_data, $hold_raw,
266 $self->{'_overlap'}, $self->{'_iteration'}, $signif, $is_pval,
267 $self->{'_score'}, $self->{'_found_again'} ) =
268 $self->_rearrange( [qw(PROGRAM
277 FOUND_AGAIN )], @args );
279 # TODO: Handle this in parser. Just pass in name parameter.
280 $self->_set_id( $raw_data->[0] );
283 $self->{'_p'} = $signif;
285 $self->{'_expect'} = $signif;
289 $self->{'_hit_data'} = $raw_data;
297 #print STDERR "-->DESTROYING $self\n";
301 #=================================================
302 # Begin Bio::Search::Hit::HitI implementation
303 #=================================================
308 Usage : $alg = $hit->algorithm();
309 Function: Gets the algorithm specification that was used to obtain the hit
310 For BLAST, the algorithm denotes what type of sequence was aligned
311 against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated
312 dna-prt, TBLASTN prt-translated dna, TBLASTX translated
314 Returns : a scalar string
322 my ($self,@args) = @_;
323 return $self->{'_blast_program'};
328 Usage : $hit->name([string]);
329 Purpose : Set/Get a string to identify the hit.
330 Example : $name = $hit->name;
331 : $hit->name('M81707');
332 Returns : String consisting of the hit's name or undef if not set.
333 Comments : The name is parsed out of the "Query=" line as the first chunk of
334 non-whitespace text. If you want the rest of the line, use
337 See Also: L<accession()|accession>
349 $name =~ s/^\s+|(\s+|,)$//g;
350 $self->{'_name'} = $name;
352 return $self->{'_name'};
357 Usage : $hit_object->description( [integer] );
358 Purpose : Set/Get a description string for the hit.
359 This is parsed out of the "Query=" line as everything after
360 the first chunk of non-whitespace text. Use $hit->name()
361 to get the first chunk (the ID of the sequence).
362 Example : $description = $hit->description;
363 : $desc_60char = $hit->description(60);
364 Argument : Integer (optional) indicating the desired length of the
365 : description string to be returned.
366 Returns : String consisting of the hit's description or undef if not set.
375 my( $self, $len ) = @_;
376 $len = (defined $len) ?
$len : (CORE
::length $self->{'_description'});
377 return substr( $self->{'_description'}, 0 ,$len );
383 Usage : $acc = $hit->accession();
384 Function: Retrieve the accession (if available) for the hit
385 Returns : a scalar string (empty string if not set)
387 Comments: Accession numbers are extracted based on the assumption that they
388 are delimited by | characters (NCBI-style). If this is not the case,
389 use the name() method and parse it as necessary.
391 See Also: L<name()|name>
395 #--------------------
397 #--------------------
399 if(@_) { $self->{'_accession'} = shift; }
400 $self->{'_accession'} || '';
405 Usage : $hit_object->raw_score();
406 Purpose : Gets the BLAST score of the best HSP for the current Blast hit.
407 Example : $score = $hit_object->raw_score();
412 See Also : L<bits()|bits>
421 # The check for $self->{'_score'} is a remnant from the 'query' mode days
422 # in which the sbjct object would collect data from the description line only.
425 if(not defined($self->{'_score'})) {
426 $score = $self->hsp->score;
428 $score = $self->{'_score'};
436 Usage : $hit_object->length();
437 Purpose : Get the total length of the hit sequence.
438 Example : $len = $hit_object->length();
442 Comments : Developer note: when using the built-in length function within
443 : this module, call it as CORE::length().
445 See Also : L<logical_length()|logical_length>, L<length_aln()|length_aln>
453 return $self->{'_length'};
458 Equivalent to L<signif()|signif>
463 sub significance
{ shift->signif( @_ ); }
470 Usage : $hsp = $obj->next_hsp();
471 Function : returns the next available High Scoring Pair object
473 Returns : Bio::Search::HSP::BlastHSP or undef if finished
483 unless($self->{'_hsp_queue_started'}) {
484 $self->{'_hsp_queue'} = [$self->hsps()];
485 $self->{'_hsp_queue_started'} = 1;
487 pop @
{$self->{'_hsp_queue'}};
490 #=================================================
491 # End Bio::Search::Hit::HitI implementation
492 #=================================================
495 # Providing a more explicit method for getting name of hit
496 # (corresponds with column name in HitTableWriter)
504 # Older method Delegates to description()
509 return $self->description( @_ );
512 # Providing a more explicit method for getting description of hit
513 # (corresponds with column name in HitTableWriter)
515 sub hit_description
{
518 return $self->description( @_ );
523 Equivalent to L<raw_score()|raw_score>
528 sub score
{ shift->raw_score( @_ ); }
534 Equivalent to L<length()|length>
538 # Providing a more explicit method for getting length of hit
540 sub hit_length
{ shift->length( @_ ); }
546 Usage : $hit_object->signif( [format] );
547 Purpose : Get the P or Expect value for the best HSP of the given BLAST hit.
548 : The value returned is the one which is reported in the description
549 : section of the Blast report. For Blast1 and WU-Blast2, this
550 : is a P-value, for Blast2, it is an Expect value.
551 Example : $obj->signif() # returns 1.3e-34
552 : $obj->signif('exp') # returns -34
553 : $obj->signif('parts') # returns (1.3, -34)
554 Returns : Float or scientific notation number (the raw P/Expect value, DEFAULT).
555 : Integer if format == 'exp' (the magnitude of the base 10 exponent).
556 : 2-element list (float, int) if format == 'parts' and P/Expect value
557 : is in scientific notation (see Comments).
558 Argument : format: string of 'raw' | 'exp' | 'parts'
559 : 'raw' returns value given in report. Default. (1.2e-34)
560 : 'exp' returns exponent value only (34)
561 : 'parts' returns the decimal and exponent as a
562 : 2-element list (1.2, -34) (see Comments).
564 Comments : The signif() method provides a way to deal with the fact that
565 : Blast1 and Blast2 formats (and WU- vs. NCBI-BLAST) differ in
566 : what is reported in the description lines of each hit in the
567 : Blast report. The signif() method frees any client code from
568 : having to know if this is a P-value or an Expect value,
569 : making it easier to write code that can process both
570 : Blast1 and Blast2 reports. This is not necessarily a good thing,
571 : since one should always know when one is working with P-values or
572 : Expect values (hence the deprecated status).
573 : Use of expect() is recommended since all hits will have an Expect value.
575 : Using the 'parts' argument is not recommended since it will not
576 : work as expected if the expect value is not in scientific notation.
577 : That is, floats are not converted into sci notation before
578 : splitting into parts.
580 See Also : L<p()|p>, L<expect()|expect>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
587 # Some duplication of logic for p(), expect() and signif() for the sake of performance.
588 my ($self, $fmt) = @_;
590 my $val = defined($self->{'_p'}) ?
$self->{'_p'} : $self->{'_expect'};
593 defined($val) or $self->throw("Can't get P- or Expect value: HSPs may not have been set.");
595 return $val if not $fmt or $fmt =~ /^raw/i;
596 ## Special formats: exponent-only or as list.
597 return &Bio
::Search
::BlastUtils
::get_exponent
($val) if $fmt =~ /^exp/i;
598 return (split (/eE/, $val)) if $fmt =~ /^parts/i;
600 ## Default: return the raw P/Expect-value.
609 # Need to add blank lines where we've removed them.
610 foreach( @
{$self->{'_hit_data'}} ) {
615 $data .= /^\s*(Score|Query)/ ?
"\n$_" : $_;
624 # Usage : $hit_object->_set_length( "233" );
625 # Purpose : Set the total length of the hit sequence.
626 # Example : $hit_object->_set_length( $len );
628 # Argument : Integer (only when setting). Any commas will be stripped out.
636 my ($self, $len) = @_;
637 $len =~ s/,//g; # get rid of commas
638 $self->{'_length'} = $len;
641 #=head2 _set_description
643 # Usage : Private method; called automatically during construction
644 # Purpose : Sets the description of the hit sequence.
645 # : For sequence without descriptions, does not set any description.
646 # Argument : Array containing description (multiple lines).
647 # Comments : Processes the supplied description:
648 # 1. Join all lines into one string.
649 # 2. Remove sequence id at the beginning of description.
650 # 3. Removes junk charactes at begin and end of description.
655 sub _set_description
{
657 my( $self, @desc ) = @_;
660 # print STDERR "PsiBlastHit: RAW DESC:\n@desc\n";
662 $desc = join(" ", @desc);
664 my $name = $self->name;
667 $desc =~ s/^\s*\S+\s+//; # remove the sequence ID(s)
668 # This won't work if there's no description.
669 $desc =~ s/^\s*$name//; # ...but this should.
670 $desc =~ s/^[\s!]+//;
673 $self->{'_description'} = $desc;
676 # print STDERR "PsiBlastHit: _set_description = $desc\n";
682 Usage : print $hit->to_string;
683 Function: Returns a string representation for the Blast Hit.
684 Primarily intended for debugging purposes.
686 Returns : A string of the form:
687 [PsiBlastHit] <name> <description>
689 [PsiBlastHit] emb|Z46660|SC9725 S.cerevisiae chromosome XIII cosmid
698 return "[PsiBlastHit] " . $self->name . " " . $self->description;
704 # Usage : Private method; automatically called by new()
705 # Purpose : Sets the name of the PsiBlastHit sequence from the BLAST summary line.
706 # : The identifier is assumed to be the first
707 # : chunk of non-whitespace characters in the description line
708 # : Does not assume any semantics in the structure of the identifier
709 # : (Formerly, this method attempted to extract database name from
710 # : the seq identifiers, but this was prone to break).
712 # Argument : String containing description line of the hit from Blast report
713 # : or first line of an alignment section (with or without the leading '>').
714 # Throws : Warning if cannot locate sequence ID.
716 #See Also : L<new()|new>, L<accession()|accession>
723 my( $self, $desc ) = @_;
725 # New strategy: Assume only that the ID is the first white space
726 # delimited chunk. Not attempting to extract accession & database name.
727 # Clients will have to interpret it as necessary.
728 if($desc =~ /^>?(\S+)\s*(.*)/) {
729 my ($name, $desc) = ($1, $2);
731 $self->{'_description'} = $desc;
732 # Note that this description comes from the summary section of the
733 # BLAST report and so may be truncated. The full description will be
734 # set from the alignment section. We're setting description here in case
735 # the alignment section isn't being parsed.
737 # Assuming accession is delimited with | symbols (NCBI-style)
738 my @pieces = split(/\|/,$name);
739 my $acc = pop @pieces;
740 $self->accession( $acc );
743 $self->warn("Can't locate sequence identifier in summary line.", "Line = $desc");
744 $desc = 'Unknown sequence ID' if not $desc;
752 Usage : $ambig_code = $hit_object->ambiguous_aln();
753 Purpose : Sets/Gets ambiguity code data member.
754 Example : (see usage)
755 Returns : String = 'q', 's', 'qs', '-'
756 : 'q' = query sequence contains overlapping sub-sequences
757 : while sbjct does not.
758 : 's' = sbjct sequence contains overlapping sub-sequences
759 : while query does not.
760 : 'qs' = query and sbjct sequence contains overlapping sub-sequences
761 : relative to each other.
762 : '-' = query and sbjct sequence do not contains multiple domains
763 : relative to each other OR both contain the same distribution
764 : of similar domains.
767 Status : Experimental
769 See Also : L<Bio::Search::BlastUtils::tile_hsps>, L<HSP Tiling and Ambiguous Alignments>
773 #--------------------
775 #--------------------
777 if(@_) { $self->{'_ambiguous_aln'} = shift; }
778 $self->{'_ambiguous_aln'} || '-';
785 Usage : $blast_object->overlap( [integer] );
786 Purpose : Gets/Sets the allowable amount overlap between different HSP sequences.
787 Example : $blast_object->overlap(5);
788 : $overlap = $blast_object->overlap;
792 Status : Experimental
793 Comments : Any two HSPs whose sequences overlap by less than or equal
794 : to the overlap() number of resides will be considered separate HSPs
795 : and will not get tiled by Bio::Search::BlastUtils::_adjust_contigs().
797 See Also : L<Bio::Search::BlastUtils::_adjust_contigs()|Bio::Search::BlastUtils>, L<BUGS | BUGS>
805 if(@_) { $self->{'_overlap'} = shift; }
806 defined $self->{'_overlap'} ?
$self->{'_overlap'} : 0;
816 Usage : $hit_object->bits();
817 Purpose : Gets the BLAST bit score of the best HSP for the current Blast hit.
818 Example : $bits = $hit_object->bits();
821 Throws : Exception if bit score is not set.
822 Comments : For BLAST1, the non-bit score is listed in the summary line.
824 See Also : L<score()|score>
833 # The check for $self->{'_bits'} is a remnant from the 'query' mode days
834 # in which the sbjct object would collect data from the description line only.
837 if(not defined($self->{'_bits'})) {
838 $bits = $self->hsp->bits;
840 $bits = $self->{'_bits'};
849 Usage : $hit_object->n();
850 Purpose : Gets the N number for the current Blast hit.
851 : This is the number of HSPs in the set which was ascribed
852 : the lowest P-value (listed on the description line).
853 : This number is not the same as the total number of HSPs.
854 : To get the total number of HSPs, use num_hsps().
855 Example : $n = $hit_object->n();
858 Throws : Exception if HSPs have not been set (BLAST2 reports).
859 Comments : Note that the N parameter is not reported in gapped BLAST2.
860 : Calling n() on such reports will result in a call to num_hsps().
861 : The num_hsps() method will count the actual number of
862 : HSPs in the alignment listing, which may exceed N in
865 See Also : L<num_hsps()|num_hsps>
874 # The check for $self->{'_n'} is a remnant from the 'query' mode days
875 # in which the sbjct object would collect data from the description line only.
878 if(not defined($self->{'_n'})) {
883 $n ||= $self->num_hsps;
892 Usage : $hit_object->frame();
893 Purpose : Gets the reading frame for the best HSP after HSP tiling.
894 : This is only valid for BLASTX and TBLASTN/X reports.
895 Example : $frame = $hit_object->frame();
896 Returns : Integer (-2 .. +2)
898 Throws : Exception if HSPs have not been set (BLAST2 reports).
899 Comments : This method requires that all HSPs be tiled. If they have not
900 : already been tiled, they will be tiled first automatically..
901 : If you don't want the tiled data, iterate through each HSP
902 : calling frame() on each (use hsps() to get all HSPs).
904 See Also : L<hsps()|hsps>
913 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
915 # The check for $self->{'_frame'} is a remnant from the 'query' mode days
916 # in which the sbjct object would collect data from the description line only.
919 if(not defined($self->{'_frame'})) {
920 $frame = $self->hsp->frame('hit');
922 $frame = $self->{'_frame'};
933 Usage : $hit_object->p( [format] );
934 Purpose : Get the P-value for the best HSP of the given BLAST hit.
935 : (Note that P-values are not provided with NCBI Blast2 reports).
936 Example : $p = $sbjct->p;
937 : $p = $sbjct->p('exp'); # get exponent only.
938 : ($num, $exp) = $sbjct->p('parts'); # split sci notation into parts
939 Returns : Float or scientific notation number (the raw P-value, DEFAULT).
940 : Integer if format == 'exp' (the magnitude of the base 10 exponent).
941 : 2-element list (float, int) if format == 'parts' and P-value
942 : is in scientific notation (See Comments).
943 Argument : format: string of 'raw' | 'exp' | 'parts'
944 : 'raw' returns value given in report. Default. (1.2e-34)
945 : 'exp' returns exponent value only (34)
946 : 'parts' returns the decimal and exponent as a
947 : 2-element list (1.2, -34) (See Comments).
948 Throws : Warns if no P-value is defined. Uses expect instead.
949 Comments : Using the 'parts' argument is not recommended since it will not
950 : work as expected if the P-value is not in scientific notation.
951 : That is, floats are not converted into sci notation before
952 : splitting into parts.
954 See Also : L<expect()|expect>, L<signif()|signif>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
961 # Some duplication of logic for p(), expect() and signif() for the sake of performance.
962 my ($self, $fmt) = @_;
964 my $val = $self->{'_p'};
967 if(not defined $val) {
968 # P-value not defined, must be a NCBI Blast2 report.
969 # Use expect instead.
970 $self->warn( "P-value not defined. Using expect() instead.");
971 $val = $self->{'_expect'};
974 return $val if not $fmt or $fmt =~ /^raw/i;
975 ## Special formats: exponent-only or as list.
976 return &Bio
::Search
::BlastUtils
::get_exponent
($val) if $fmt =~ /^exp/i;
977 return (split (/eE/, $val)) if $fmt =~ /^parts/i;
979 ## Default: return the raw P-value.
987 Usage : $hit_object->expect( [format] );
988 Purpose : Get the Expect value for the best HSP of the given BLAST hit.
989 Example : $e = $sbjct->expect;
990 : $e = $sbjct->expect('exp'); # get exponent only.
991 : ($num, $exp) = $sbjct->expect('parts'); # split sci notation into parts
992 Returns : Float or scientific notation number (the raw expect value, DEFAULT).
993 : Integer if format == 'exp' (the magnitude of the base 10 exponent).
994 : 2-element list (float, int) if format == 'parts' and Expect
995 : is in scientific notation (see Comments).
996 Argument : format: string of 'raw' | 'exp' | 'parts'
997 : 'raw' returns value given in report. Default. (1.2e-34)
998 : 'exp' returns exponent value only (34)
999 : 'parts' returns the decimal and exponent as a
1000 : 2-element list (1.2, -34) (see Comments).
1001 Throws : Exception if the Expect value is not defined.
1002 Comments : Using the 'parts' argument is not recommended since it will not
1003 : work as expected if the expect value is not in scientific notation.
1004 : That is, floats are not converted into sci notation before
1005 : splitting into parts.
1007 See Also : L<p()|p>, L<signif()|signif>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
1014 # Some duplication of logic for p(), expect() and signif() for the sake of performance.
1015 my ($self, $fmt) = @_;
1019 # For Blast reports that list the P value on the description line,
1020 # getting the expect value requires fully parsing the HSP data.
1021 # For NCBI blast, there's no problem.
1022 if(not defined($self->{'_expect'})) {
1023 if( defined $self->{'_hsps'}) {
1024 $self->{'_expect'} = $val = $self->hsp->expect;
1026 # If _expect is not set and _hsps are not set,
1027 # then this must be a P-value-based report that was
1028 # run without setting the HSPs (shallow parsing).
1029 $self->throw("Can't get expect value. HSPs have not been set.");
1032 $val = $self->{'_expect'};
1036 defined($val) or $self->throw("Can't get Expect value.");
1038 return $val if not $fmt or $fmt =~ /^raw/i;
1039 ## Special formats: exponent-only or as list.
1040 return &Bio
::Search
::BlastUtils
::get_exponent
($val) if $fmt =~ /^exp/i;
1041 return (split (/eE/, $val)) if $fmt =~ /^parts/i;
1043 ## Default: return the raw Expect-value.
1050 Usage : $hit_object->hsps();
1051 Purpose : Get a list containing all HSP objects.
1052 : Get the numbers of HSPs for the current hit.
1053 Example : @hsps = $hit_object->hsps();
1054 : $num = $hit_object->hsps(); # alternatively, use num_hsps()
1055 Returns : Array context : list of Bio::Search::HSP::BlastHSP.pm objects.
1056 : Scalar context: integer (number of HSPs).
1057 : (Equivalent to num_hsps()).
1058 Argument : n/a. Relies on wantarray
1059 Throws : Exception if the HSPs have not been collected.
1061 See Also : L<hsp()|hsp>, L<num_hsps()|num_hsps>
1070 if (not ref $self->{'_hsps'}) {
1071 $self->throw("Can't get HSPs: data not collected.");
1075 # returning list containing all HSPs.
1076 ? @
{$self->{'_hsps'}}
1077 # returning number of HSPs.
1078 : scalar(@
{$self->{'_hsps'}});
1085 Usage : $hit_object->hsp( [string] );
1086 Purpose : Get a single BlastHSP.pm object for the present PsiBlastHit.pm object.
1087 Example : $hspObj = $hit_object->hsp; # same as 'best'
1088 : $hspObj = $hit_object->hsp('best');
1089 : $hspObj = $hit_object->hsp('worst');
1090 Returns : Object reference for a Bio::Search::HSP::BlastHSP.pm object.
1091 Argument : String (or no argument).
1092 : No argument (default) = highest scoring HSP (same as 'best').
1093 : 'best' or 'first' = highest scoring HSP.
1094 : 'worst' or 'last' = lowest scoring HSP.
1095 Throws : Exception if the HSPs have not been collected.
1096 : Exception if an unrecognized argument is used.
1098 See Also : L<hsps()|hsps>, L<num_hsps>()
1105 my( $self, $option ) = @_;
1108 if (not ref $self->{'_hsps'}) {
1109 $self->throw("Can't get HSPs: data not collected.");
1112 my @hsps = @
{$self->{'_hsps'}};
1114 return $hsps[0] if $option =~ /best|first|1/i;
1115 return $hsps[$#hsps] if $option =~ /worst|last/i;
1117 $self->throw("Can't get HSP for: $option\n" .
1118 "Valid arguments: 'best', 'worst'");
1125 Usage : $hit_object->num_hsps();
1126 Purpose : Get the number of HSPs for the present Blast hit.
1127 Example : $nhsps = $hit_object->num_hsps();
1130 Throws : Exception if the HSPs have not been collected.
1132 See Also : L<hsps()|hsps>
1141 if (not defined $self->{'_hsps'}) {
1142 $self->throw("Can't get HSPs: data not collected.");
1145 return scalar(@
{$self->{'_hsps'}});
1150 =head2 logical_length
1152 Usage : $hit_object->logical_length( [seq_type] );
1153 : (mostly intended for internal use).
1154 Purpose : Get the logical length of the hit sequence.
1155 : For query sequence of BLASTX and TBLASTX reports and the hit
1156 : sequence of TBLASTN and TBLASTX reports, the returned length
1157 : is the length of the would-be amino acid sequence (length/3).
1158 : For all other BLAST flavors, this function is the same as length().
1159 Example : $len = $hit_object->logical_length();
1161 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1162 ('sbjct' is synonymous with 'hit')
1164 Comments : This is important for functions like frac_aligned_query()
1165 : which need to operate in amino acid coordinate space when dealing
1166 : with T?BLASTX type reports.
1168 See Also : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>
1172 #--------------------
1173 sub logical_length
{
1174 #--------------------
1176 my $seqType = shift || 'query';
1177 $seqType = 'sbjct' if $seqType eq 'hit';
1181 # For the sbjct, return logical sbjct length
1182 if( $seqType eq 'sbjct' ) {
1183 $length = $self->{'_logical_length'} || $self->{'_length'};
1186 # Otherwise, return logical query length
1187 $length = $self->{'_query_length'};
1189 # Adjust length based on BLAST flavor.
1190 if($self->{'_blast_program'} =~ /T?BLASTX/ ) {
1200 Usage : $hit_object->length_aln( [seq_type] );
1201 Purpose : Get the total length of the aligned region for query or sbjct seq.
1202 : This number will include all HSPs
1203 Example : $len = $hit_object->length_aln(); # default = query
1204 : $lenAln = $hit_object->length_aln('query');
1206 Argument : seq_Type = 'query' or 'hit' or 'sbjct' (Default = 'query')
1207 ('sbjct' is synonymous with 'hit')
1208 Throws : Exception if the argument is not recognized.
1209 Comments : This method will report the logical length of the alignment,
1210 : meaning that for TBLAST[NX] reports, the length is reported
1211 : using amino acid coordinate space (i.e., nucleotides / 3).
1213 : This method requires that all HSPs be tiled. If they have not
1214 : already been tiled, they will be tiled first automatically..
1215 : If you don't want the tiled data, iterate through each HSP
1216 : calling length() on each (use hsps() to get all HSPs).
1218 See Also : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>, L<gaps()|gaps>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<Bio::Search::HSP::BlastHSP::length()|Bio::Search::HSP::BlastHSP>
1225 my( $self, $seqType ) = @_;
1227 $seqType ||= 'query';
1228 $seqType = 'sbjct' if $seqType eq 'hit';
1230 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1232 my $data = $self->{'_length_aln_'.$seqType};
1234 ## If we don't have data, figure out what went wrong.
1236 $self->throw("Can't get length aln for sequence type \"$seqType\"" .
1237 "Valid types are 'query', 'hit', 'sbjct' ('sbjct' = 'hit')");
1245 Usage : $hit_object->gaps( [seq_type] );
1246 Purpose : Get the number of gaps in the aligned query, sbjct, or both sequences.
1247 : Data is summed across all HSPs.
1248 Example : $qgaps = $hit_object->gaps('query');
1249 : $hgaps = $hit_object->gaps('hit');
1250 : $tgaps = $hit_object->gaps(); # default = total (query + hit)
1251 Returns : scalar context: integer
1252 : array context without args: two-element list of integers
1253 : (queryGaps, sbjctGaps)
1254 : Array context can be forced by providing an argument of 'list' or 'array'.
1256 : CAUTION: Calling this method within printf or sprintf is arrray context.
1257 : So this function may not give you what you expect. For example:
1258 : printf "Total gaps: %d", $hit->gaps();
1259 : Actually returns a two-element array, so what gets printed
1260 : is the number of gaps in the query, not the total
1262 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total' | 'list' (default = 'total')
1263 ('sbjct' is synonymous with 'hit')
1265 Comments : If you need data for each HSP, use hsps() and then interate
1266 : through each HSP object.
1267 : This method requires that all HSPs be tiled. If they have not
1268 : already been tiled, they will be tiled first automatically..
1269 : Not relying on wantarray since that will fail in situations
1270 : such as printf "%d", $hit->gaps() in which you might expect to
1271 : be printing the total gaps, but evaluates to array context.
1273 See Also : L<length_aln()|length_aln>
1280 my( $self, $seqType ) = @_;
1282 $seqType ||= (wantarray ?
'list' : 'total');
1283 $seqType = 'sbjct' if $seqType eq 'hit';
1285 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1287 $seqType = lc($seqType);
1289 if($seqType =~ /list|array/i) {
1290 return ($self->{'_gaps_query'}, $self->{'_gaps_sbjct'});
1293 if($seqType eq 'total') {
1294 return ($self->{'_gaps_query'} + $self->{'_gaps_sbjct'}) || 0;
1296 return $self->{'_gaps_'.$seqType} || 0;
1304 Usage : $hit_object->matches( [class] );
1305 Purpose : Get the total number of identical or conserved matches
1306 : (or both) across all HSPs.
1307 : (Note: 'conservative' matches are indicated as 'positives'
1308 : in the Blast report.)
1309 Example : ($id,$cons) = $hit_object->matches(); # no argument
1310 : $id = $hit_object->matches('id');
1311 : $cons = $hit_object->matches('cons');
1312 Returns : Integer or a 2-element array of integers
1313 Argument : class = 'id' | 'cons' OR none.
1314 : If no argument is provided, both identical and conservative
1315 : numbers are returned in a two element list.
1316 : (Other terms can be used to refer to the conservative
1317 : matches, e.g., 'positive'. All that is checked is whether or
1318 : not the supplied string starts with 'id'. If not, the
1319 : conservative matches are returned.)
1320 Throws : Exception if the requested data cannot be obtained.
1321 Comments : If you need data for each HSP, use hsps() and then interate
1322 : through the HSP objects.
1323 : Does not rely on wantarray to return a list. Only checks for
1324 : the presence of an argument (no arg = return list).
1326 See Also : L<Bio::Search::HSP::BlastHSP::matches()|Bio::Search::HSP::BlastHSP>, L<hsps()|hsps>
1333 my( $self, $arg) = @_;
1337 @data = ($self->{'_totalIdentical'}, $self->{'_totalConserved'});
1339 return @data if @data;
1343 if($arg =~ /^id/i) {
1344 $data = $self->{'_totalIdentical'};
1346 $data = $self->{'_totalConserved'};
1348 return $data if $data;
1351 ## Something went wrong if we make it to here.
1352 $self->throw("Can't get identical or conserved data: no data.");
1358 Usage : $sbjct->start( [seq_type] );
1359 Purpose : Gets the start coordinate for the query, sbjct, or both sequences
1360 : in the PsiBlastHit object. If there is more than one HSP, the lowest start
1361 : value of all HSPs is returned.
1362 Example : $qbeg = $sbjct->start('query');
1363 : $sbeg = $sbjct->start('hit');
1364 : ($qbeg, $sbeg) = $sbjct->start();
1365 Returns : scalar context: integer
1366 : array context without args: list of two integers (queryStart, sbjctStart)
1367 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1368 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1369 ('sbjct' is synonymous with 'hit')
1371 Comments : This method requires that all HSPs be tiled. If there is more than one
1372 : HSP and they have not already been tiled, they will be tiled first automatically..
1373 : Remember that the start and end coordinates of all HSPs are
1374 : normalized so that start < end. Strand information can be
1375 : obtained by calling $hit->strand().
1377 See Also : L<end()|end>, L<range()|range>, L<strand()|strand>, L<HSP Tiling and Ambiguous Alignments>, L<Bio::Search::HSP::BlastHSP::start|Bio::Search::HSP::BlastHSP>
1384 my ($self, $seqType) = @_;
1386 $seqType ||= (wantarray ?
'list' : 'query');
1387 $seqType = 'sbjct' if $seqType eq 'hit';
1389 # If there is only one HSP, defer this call to the solitary HSP.
1390 if($self->num_hsps == 1) {
1391 return $self->hsp->start($seqType);
1393 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1394 if($seqType =~ /list|array/i) {
1395 return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
1397 ## Sensitive to member name changes.
1398 $seqType = "_\L$seqType\E";
1399 return $self->{$seqType.'Start'};
1407 Usage : $sbjct->end( [seq_type] );
1408 Purpose : Gets the end coordinate for the query, sbjct, or both sequences
1409 : in the PsiBlastHit object. If there is more than one HSP, the largest end
1410 : value of all HSPs is returned.
1411 Example : $qend = $sbjct->end('query');
1412 : $send = $sbjct->end('hit');
1413 : ($qend, $send) = $sbjct->end();
1414 Returns : scalar context: integer
1415 : array context without args: list of two integers (queryEnd, sbjctEnd)
1416 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1417 Argument : In scalar context: seq_type = 'query' or 'sbjct'
1418 : (case insensitive). If not supplied, 'query' is used.
1420 Comments : This method requires that all HSPs be tiled. If there is more than one
1421 : HSP and they have not already been tiled, they will be tiled first automatically..
1422 : Remember that the start and end coordinates of all HSPs are
1423 : normalized so that start < end. Strand information can be
1424 : obtained by calling $hit->strand().
1426 See Also : L<start()|start>, L<range()|range>, L<strand()|strand>, L<HSP Tiling and Ambiguous Alignments>, L<Bio::Search::HSP::BlastHSP::end|Bio::Search::HSP::BlastHSP>
1433 my ($self, $seqType) = @_;
1435 $seqType ||= (wantarray ?
'list' : 'query');
1436 $seqType = 'sbjct' if $seqType eq 'hit';
1438 # If there is only one HSP, defer this call to the solitary HSP.
1439 if($self->num_hsps == 1) {
1440 return $self->hsp->end($seqType);
1442 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1443 if($seqType =~ /list|array/i) {
1444 return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
1446 ## Sensitive to member name changes.
1447 $seqType = "_\L$seqType\E";
1448 return $self->{$seqType.'Stop'};
1455 Usage : $sbjct->range( [seq_type] );
1456 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence
1457 : in the HSP alignment.
1458 Example : ($qbeg, $qend) = $sbjct->range('query');
1459 : ($sbeg, $send) = $sbjct->range('hit');
1460 Returns : Two-element array of integers
1461 Argument : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query')
1462 ('sbjct' is synonymous with 'hit')
1465 See Also : L<start()|start>, L<end()|end>
1472 my ($self, $seqType) = @_;
1473 $seqType ||= 'query';
1474 $seqType = 'sbjct' if $seqType eq 'hit';
1475 return ($self->start($seqType), $self->end($seqType));
1479 =head2 frac_identical
1481 Usage : $hit_object->frac_identical( [seq_type] );
1482 Purpose : Get the overall fraction of identical positions across all HSPs.
1483 : The number refers to only the aligned regions and does not
1484 : account for unaligned regions in between the HSPs, if any.
1485 Example : $frac_iden = $hit_object->frac_identical('query');
1486 Returns : Float (2-decimal precision, e.g., 0.75).
1487 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total'
1488 : default = 'query' (but see comments below).
1489 : ('sbjct' is synonymous with 'hit')
1491 Comments : Different versions of Blast report different values for the total
1492 : length of the alignment. This is the number reported in the
1493 : denominators in the stats section:
1494 : "Identical = 34/120 Positives = 67/120".
1495 : NCBI BLAST uses the total length of the alignment (with gaps)
1496 : WU-BLAST uses the length of the query sequence (without gaps).
1498 : Therefore, when called with an argument of 'total',
1499 : this method will report different values depending on the
1500 : version of BLAST used. Total does NOT take into account HSP
1501 : tiling, so it should not be used.
1503 : To get the fraction identical among only the aligned residues,
1504 : ignoring the gaps, call this method without an argument or
1505 : with an argument of 'query' or 'hit'.
1507 : If you need data for each HSP, use hsps() and then iterate
1508 : through the HSP objects.
1509 : This method requires that all HSPs be tiled. If they have not
1510 : already been tiled, they will be tiled first automatically.
1512 See Also : L<frac_conserved()|frac_conserved>, L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1517 sub frac_identical
{
1519 my ($self, $seqType) = @_;
1520 $seqType ||= 'query';
1521 $seqType = 'sbjct' if $seqType eq 'hit';
1523 ## Sensitive to member name format.
1524 $seqType = lc($seqType);
1526 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1528 sprintf( "%.2f", $self->{'_totalIdentical'}/$self->{'_length_aln_'.$seqType});
1533 =head2 frac_conserved
1535 Usage : $hit_object->frac_conserved( [seq_type] );
1536 Purpose : Get the overall fraction of conserved positions across all HSPs.
1537 : The number refers to only the aligned regions and does not
1538 : account for unaligned regions in between the HSPs, if any.
1539 Example : $frac_cons = $hit_object->frac_conserved('hit');
1540 Returns : Float (2-decimal precision, e.g., 0.75).
1541 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total'
1542 : default = 'query' (but see comments below).
1543 : ('sbjct' is synonymous with 'hit')
1545 Comments : Different versions of Blast report different values for the total
1546 : length of the alignment. This is the number reported in the
1547 : denominators in the stats section:
1548 : "Positives = 34/120 Positives = 67/120".
1549 : NCBI BLAST uses the total length of the alignment (with gaps)
1550 : WU-BLAST uses the length of the query sequence (without gaps).
1552 : Therefore, when called with an argument of 'total',
1553 : this method will report different values depending on the
1554 : version of BLAST used. Total does NOT take into account HSP
1555 : tiling, so it should not be used.
1557 : To get the fraction conserved among only the aligned residues,
1558 : ignoring the gaps, call this method without an argument or
1559 : with an argument of 'query' or 'hit'.
1561 : If you need data for each HSP, use hsps() and then interate
1562 : through the HSP objects.
1563 : This method requires that all HSPs be tiled. If they have not
1564 : already been tiled, they will be tiled first automatically.
1566 See Also : L<frac_identical()|frac_identical>, L<matches()|matches>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1570 #--------------------
1571 sub frac_conserved
{
1572 #--------------------
1573 my ($self, $seqType) = @_;
1574 $seqType ||= 'query';
1575 $seqType = 'sbjct' if $seqType eq 'hit';
1577 ## Sensitive to member name format.
1578 $seqType = lc($seqType);
1580 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1582 sprintf( "%.2f", $self->{'_totalConserved'}/$self->{'_length_aln_'.$seqType});
1588 =head2 frac_aligned_query
1590 Usage : $hit_object->frac_aligned_query();
1591 Purpose : Get the fraction of the query sequence which has been aligned
1592 : across all HSPs (not including intervals between non-overlapping
1594 Example : $frac_alnq = $hit_object->frac_aligned_query();
1595 Returns : Float (2-decimal precision, e.g., 0.75).
1598 Comments : If you need data for each HSP, use hsps() and then interate
1599 : through the HSP objects.
1600 : To compute the fraction aligned, the logical length of the query
1601 : sequence is used, meaning that for [T]BLASTX reports, the
1602 : full length of the query sequence is converted into amino acids
1603 : by dividing by 3. This is necessary because of the way
1604 : the lengths of aligned sequences are computed.
1605 : This method requires that all HSPs be tiled. If they have not
1606 : already been tiled, they will be tiled first automatically.
1608 See Also : L<frac_aligned_hit()|frac_aligned_hit>, L<logical_length()|logical_length>, L<length_aln()|length_aln>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1612 #----------------------
1613 sub frac_aligned_query
{
1614 #----------------------
1617 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1619 sprintf( "%.2f", $self->{'_length_aln_query'}/$self->logical_length('query'));
1624 =head2 frac_aligned_hit
1626 Usage : $hit_object->frac_aligned_hit();
1627 Purpose : Get the fraction of the hit (sbjct) sequence which has been aligned
1628 : across all HSPs (not including intervals between non-overlapping
1630 Example : $frac_alnq = $hit_object->frac_aligned_hit();
1631 Returns : Float (2-decimal precision, e.g., 0.75).
1634 Comments : If you need data for each HSP, use hsps() and then interate
1635 : through the HSP objects.
1636 : To compute the fraction aligned, the logical length of the sbjct
1637 : sequence is used, meaning that for TBLAST[NX] reports, the
1638 : full length of the sbjct sequence is converted into amino acids
1639 : by dividing by 3. This is necessary because of the way
1640 : the lengths of aligned sequences are computed.
1641 : This method requires that all HSPs be tiled. If they have not
1642 : already been tiled, they will be tiled first automatically.
1644 See Also : L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, , L<logical_length()|logical_length>, L<length_aln()|length_aln>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1648 #--------------------
1649 sub frac_aligned_hit
{
1650 #--------------------
1653 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1655 sprintf( "%.2f", $self->{'_length_aln_sbjct'}/$self->logical_length('sbjct'));
1659 ## These methods are being maintained for backward compatibility.
1661 =head2 frac_aligned_sbjct
1663 Same as L<frac_aligned_hit()|frac_aligned_hit>
1668 sub frac_aligned_sbjct
{ my $self=shift; $self->frac_aligned_hit(@_); }
1671 =head2 num_unaligned_sbjct
1673 Same as L<num_unaligned_hit()|num_unaligned_hit>
1678 sub num_unaligned_sbjct
{ my $self=shift; $self->num_unaligned_hit(@_); }
1683 =head2 num_unaligned_hit
1685 Usage : $hit_object->num_unaligned_hit();
1686 Purpose : Get the number of the unaligned residues in the hit sequence.
1687 : Sums across all all HSPs.
1688 Example : $num_unaln = $hit_object->num_unaligned_hit();
1692 Comments : See notes regarding logical lengths in the comments for frac_aligned_hit().
1693 : They apply here as well.
1694 : If you need data for each HSP, use hsps() and then interate
1695 : through the HSP objects.
1696 : This method requires that all HSPs be tiled. If they have not
1697 : already been tiled, they will be tiled first automatically..
1699 See Also : L<num_unaligned_query()|num_unaligned_query>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<frac_aligned_hit()|frac_aligned_hit>
1703 #---------------------
1704 sub num_unaligned_hit
{
1705 #---------------------
1708 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1710 my $num = $self->logical_length('sbjct') - $self->{'_length_aln_sbjct'};
1711 ($num < 0 ?
0 : $num );
1715 =head2 num_unaligned_query
1717 Usage : $hit_object->num_unaligned_query();
1718 Purpose : Get the number of the unaligned residues in the query sequence.
1719 : Sums across all all HSPs.
1720 Example : $num_unaln = $hit_object->num_unaligned_query();
1724 Comments : See notes regarding logical lengths in the comments for frac_aligned_query().
1725 : They apply here as well.
1726 : If you need data for each HSP, use hsps() and then interate
1727 : through the HSP objects.
1728 : This method requires that all HSPs be tiled. If they have not
1729 : already been tiled, they will be tiled first automatically..
1731 See Also : L<num_unaligned_hit()|num_unaligned_hit>, L<frac_aligned_query()|frac_aligned_query>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1735 #-----------------------
1736 sub num_unaligned_query
{
1737 #-----------------------
1740 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1742 my $num = $self->logical_length('query') - $self->{'_length_aln_query'};
1743 ($num < 0 ?
0 : $num );
1750 Usage : $hit->seq_inds( seq_type, class, collapse );
1751 Purpose : Get a list of residue positions (indices) across all HSPs
1752 : for identical or conserved residues in the query or sbjct sequence.
1753 Example : @s_ind = $hit->seq_inds('query', 'identical');
1754 : @h_ind = $hit->seq_inds('hit', 'conserved');
1755 : @h_ind = $hit->seq_inds('hit', 'conserved', 1);
1756 Returns : Array of integers
1757 : May include ranges if collapse is non-zero.
1758 Argument : [0] seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1759 : ('sbjct' is synonymous with 'hit')
1760 : [1] class = 'identical' or 'conserved' (default = 'identical')
1761 : (can be shortened to 'id' or 'cons')
1762 : (actually, anything not 'id' will evaluate to 'conserved').
1763 : [2] collapse = boolean, if non-zero, consecutive positions are merged
1764 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
1765 : collapses to "1-5 7 9-11". This is useful for
1766 : consolidating long lists. Default = no collapse.
1768 Comments : Note that HSPs are not tiled for this. This could be a problem
1769 : for hits containing mutually exclusive HSPs.
1770 : TODO: Consider tiling and then reporting seq_inds for the
1773 See Also : L<Bio::Search::HSP::BlastHSP::seq_inds()|Bio::Search::HSP::BlastHSP>
1780 my ($self, $seqType, $class, $collapse) = @_;
1782 $seqType ||= 'query';
1783 $class ||= 'identical';
1786 $seqType = 'sbjct' if $seqType eq 'hit';
1789 foreach $hsp ($self->hsps) {
1790 # This will merge data for all HSPs together.
1791 push @inds, $hsp->seq_inds($seqType, $class);
1794 # Need to remove duplicates and sort the merged positions.
1796 my %tmp = map { $_, 1 } @inds;
1797 @inds = sort {$a <=> $b} keys %tmp;
1800 $collapse ?
&Bio
::Search
::BlastUtils
::collapse_nums
(@inds) : @inds;
1806 Usage : $sbjct->iteration( );
1807 Purpose : Gets the iteration number in which the Hit was found.
1808 Example : $iteration_num = $sbjct->iteration();
1809 Returns : Integer greater than or equal to 1
1810 Non-PSI-BLAST reports will report iteration as 1, but this number
1811 is only meaningful for PSI-BLAST reports.
1815 See Also : L<found_again()|found_again>
1820 sub iteration
{ shift->{'_iteration'} }
1826 Usage : $sbjct->found_again;
1827 Purpose : Gets a boolean indicator whether or not the hit has
1828 been found in a previous iteration.
1829 This is only applicable to PSI-BLAST reports.
1831 This method indicates if the hit was reported in the
1832 "Sequences used in model and found again" section of the
1833 PSI-BLAST report or if it was reported in the
1834 "Sequences not found previously or not previously below threshold"
1835 section of the PSI-BLAST report. Only for hits in iteration > 1.
1837 Example : if( $sbjct->found_again()) { ... };
1838 Returns : Boolean (1 or 0) for PSI-BLAST report iterations greater than 1.
1839 Returns undef for PSI-BLAST report iteration 1 and non PSI_BLAST
1844 See Also : L<found_again()|found_again>
1849 sub found_again
{ shift->{'_found_again'} }
1855 Usage : $sbjct->strand( [seq_type] );
1856 Purpose : Gets the strand(s) for the query, sbjct, or both sequences
1857 : in the best HSP of the PsiBlastHit object after HSP tiling.
1858 : Only valid for BLASTN, TBLASTX, BLASTX-query, TBLASTN-hit.
1859 Example : $qstrand = $sbjct->strand('query');
1860 : $sstrand = $sbjct->strand('hit');
1861 : ($qstrand, $sstrand) = $sbjct->strand();
1862 Returns : scalar context: integer '1', '-1', or '0'
1863 : array context without args: list of two strings (queryStrand, sbjctStrand)
1864 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1865 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1866 ('sbjct' is synonymous with 'hit')
1868 Comments : This method requires that all HSPs be tiled. If they have not
1869 : already been tiled, they will be tiled first automatically..
1870 : If you don't want the tiled data, iterate through each HSP
1871 : calling strand() on each (use hsps() to get all HSPs).
1873 : Formerly (prior to 10/21/02), this method would return the
1874 : string "-1/1" for hits with HSPs on both strands.
1875 : However, now that strand and frame is properly being accounted
1876 : for during HSP tiling, it makes more sense for strand()
1877 : to return the strand data for the best HSP after tiling.
1879 : If you really want to know about hits on opposite strands,
1880 : you should be iterating through the HSPs using methods on the
1883 : A possible use case where knowing whether a hit has HSPs
1884 : on both strands would be when filtering via SearchIO for hits with
1885 : this property. However, in this case it would be better to have a
1886 : dedicated method such as $hit->hsps_on_both_strands(). Similarly
1887 : for frame. This could be provided if there is interest.
1889 See Also : L<Bio::Search::HSP::BlastHSP::strand>()
1896 my ($self, $seqType) = @_;
1898 Bio
::Search
::BlastUtils
::tile_hsps
($self) if not $self->{'_tile_hsps'};
1900 $seqType ||= (wantarray ?
'list' : 'query');
1901 $seqType = 'sbjct' if $seqType eq 'hit';
1904 # If there is only one HSP, defer this call to the solitary HSP.
1905 if($self->num_hsps == 1) {
1906 return $self->hsp->strand($seqType);
1908 elsif( defined $self->{'_qstrand'}) {
1909 # Get the data computed during hsp tiling.
1910 $qstr = $self->{'_qstrand'};
1911 $hstr = $self->{'_sstrand'};
1914 # otherwise, iterate through all HSPs collecting strand info.
1915 # This will return the string "-1/1" if there are HSPs on different strands.
1916 # NOTE: This was the pre-10/21/02 procedure which will no longer be used,
1917 # (unless the above elsif{} is commented out).
1919 foreach my $hsp( $self->hsps ) {
1920 my ( $q, $h ) = $hsp->strand();
1924 $qstr = join( '/', sort keys %qstr);
1925 $hstr = join( '/', sort keys %hstr);
1928 if($seqType =~ /list|array/i) {
1929 return ($qstr, $hstr);
1930 } elsif( $seqType eq 'query' ) {
1941 #####################################################################################
1943 #####################################################################################
1946 =head1 FOR DEVELOPERS ONLY
1950 Information about the various data members of this module is provided for those
1951 wishing to modify or understand the code. Two things to bear in mind:
1955 =item 1 Do NOT rely on these in any code outside of this module.
1957 All data members are prefixed with an underscore to signify that they are private.
1958 Always use accessor methods. If the accessor doesn't exist or is inadequate,
1959 create or modify an accessor (and let me know, too!). (An exception to this might
1960 be for BlastHSP.pm which is more tightly coupled to PsiBlastHit.pm and
1961 may access PsiBlastHit data members directly for efficiency purposes, but probably
1964 =item 2 This documentation may be incomplete and out of date.
1966 It is easy for these data member descriptions to become obsolete as
1967 this module is still evolving. Always double check this info and search
1968 for members not described here.
1972 An instance of Bio::Search::Hit::PsiBlastHit.pm is a blessed reference to a hash containing
1973 all or some of the following fields:
1976 --------------------------------------------------------------
1977 _hsps : Array ref for a list of Bio::Search::HSP::BlastHSP.pm objects.
1979 _db : Database identifier from the summary line.
1981 _desc : Description data for the hit from the summary line.
1983 _length : Total length of the hit sequence.
1985 _score : BLAST score.
1987 _bits : BLAST score (in bits). Matrix-independent.
1989 _p : BLAST P value. Obtained from summary section. (Blast1/WU-Blast only)
1991 _expect : BLAST Expect value. Obtained from summary section.
1993 _n : BLAST N value (number of HSPs) (Blast1/WU-Blast2 only)
1995 _frame : Reading frame for TBLASTN and TBLASTX analyses.
1997 _totalIdentical: Total number of identical aligned monomers.
1999 _totalConserved: Total number of conserved aligned monomers (a.k.a. "positives").
2001 _overlap : Maximum number of overlapping residues between adjacent HSPs
2002 : before considering the alignment to be ambiguous.
2004 _ambiguous_aln : Boolean. True if the alignment of all HSPs is ambiguous.
2006 _length_aln_query : Length of the aligned region of the query sequence.
2008 _length_aln_sbjct : Length of the aligned region of the sbjct sequence.