1 #-----------------------------------------------------------------------------
2 # PACKAGE : Bio::Tools::Sigcleave.pm
3 # AUTHOR : Chris Dagdigian, dag@sonsorol.org
4 # CREATED : Jan 28 1999
7 # Copyright (c) 1997-9 bioperl, Chris Dagdigian and others. All Rights Reserved.
8 # This module is free software; you can redistribute it and/or
9 # modify it under the same terms as Perl itself.
13 # Object framework ripped from Steve Chervits's SeqPattern.pm
15 # Core EGCG Sigcleave emulation from perl code developed by
16 # Danh Nguyen & Kamalakar Gulukota which itself was based
17 # loosely on Colgrove's signal.c program.
19 # The overall idea is to replicate the output of the sigcleave
20 # program which was distributed with the EGCG extension to the GCG sequence
21 # analysis package. There is also an accessor method for just getting at
24 #-----------------------------------------------------------------------------
26 package Bio
::Tools
::Sigcleave
;
28 use Bio
::Root
::Global
qw(:devel);
33 use vars qw
($ID $VERSION %WeightTable);
34 $ID = 'Bio::Tools::Sigcleave';
39 Bio::Tools::Sigcleave.pm - Bioperl object for sigcleave analysis
43 =head2 Object Creation
45 use Bio::Tools::Sigcleave ();
47 $sigcleave_object = new Bio::Tools::Sigcleave(-file=>'sigtest.aa',
48 -desc=>'test sigcleave protein seq',
53 Sigcleave objects can be created via the same methods as Bio::Seq objects. The
54 one additional parameter is "-threshold" which sets the score reporting limit
55 for the algorithim. The above exmple shows a sigcleave object being created
56 from a protein sequence file. See the Bio::Seq documention to see the other ways
57 that objects can be created.
59 =head2 Object Methods & Accessors
61 %raw_results = $sigcleave_object->signals;
63 $formatted_output = $sigcleave_object->pretty_print;
67 This module is included with the central Bioperl distribution:
69 http://bio.perl.org/Core/Latest
70 ftp://bio.perl.org/pub/DIST
72 Follow the installation instructions included in the README file.
76 "Sigcleave" was a program distributed as part of the free EGCG add-on to
77 earlier versions of the GCG Sequence Analysis package.
79 From the EGCG documentation:
80 SigCleave uses the von Heijne method to locate signal sequences, and to identify
81 the cleavage site. The method is 95% accurate in resolving signal sequences from
82 non-signal sequences with a cutoff score of 3.5, and 75-80% accurate in identifying
83 the cleavage site. The program reports all hits above a minimum value.
85 The EGCG Sigcleave program was written by Peter Rice
86 (E-mail: pmr@sanger.ac.uk Post: Informatics Division, The Sanger Centre,
87 Wellcome Trust Genome Campus, Hinxton, Cambs, CB10 1SA, UK).
89 Since EGCG is no longer distributed for the latest versions of GCG, this code
90 was developed to emulate the output of the original program as much as possible for
91 those who lost access to sigcleave when upgrading to newer versions of GCG.
93 The EGCG website can be found at: http://www.sanger.ac.uk/Software/EGCG/
95 There are 2 accessor methods for this object. "signals" will return a perl
96 associative array containing the sigcleave scores keyed by amino acid position.
97 "pretty_print" returns a formatted string similar to the output of the original
100 In both cases, the "threshold" setting controls the score reporting level. If no
101 value for threshold is passed in by the user, the code defaults to a reporting value
104 In this implemntation the accessor will never return any score/position pair which does not
105 meet the threshold limit. This is the slightly different from the behaviour of
106 the 8.1 EGCG sigcleave program which will report the highest of the under-threshold
107 results if nothing else is found.
111 Example of pretty_print output:
113 SIGCLEAVE of sigtest from: 1 to 146
115 Report scores over 3.5
116 Maximum score 4.9 at residue 131
118 Sequence: FVILAAMSIQGSA-NLQTQWKSTASLALET
119 | (signal) | (mature peptide)
122 Other entries above 3.5
124 Maximum score 3.7 at residue 112
126 Sequence: CSRQLFGWLFCKV-HPGAIVFVILAAMSIQGSANLQTQWKSTASLALET
127 | (signal) | (mature peptide)
133 No warranty implied or expressed. Use at your own risk :) Users unfamiliar
134 with the original Sigcleave application should read the von Heijne papers.
136 The emphasis here is on correctly replicating the calls that 8.1 EGCG sigcleave
137 would make. This code has been tested against a non-redundant curated set
138 of 405 Swissprot proteins representing secreted, non-secreted, membrane and
139 transit proteins. Except for the EGCG sigcleave habit of reporting an
140 under-threshold score if nothing better is found the output was identical.
142 The weight matrix in this code is for eukaryote signal sequences.
144 Please see the example script located in the bioperl distribution
145 to see how this code can be used.
149 When updating and maintaining a module, it helps to know that people
150 are actually using it. Let us know if you find a bug, think this code
151 is useful or have any improvements/features to suggest.
153 =head2 Reporting Bugs
155 Report bugs to the Bioperl bug tracking system to help us keep track the bugs and
156 their resolution. Bug reports can be submitted via email or the web:
158 bioperl-bugs@bio.perl.org
159 http://bio.perl.org/bioperl-bugs/
163 Chris Dagdigian, dag@sonsorol.org & others
167 Bio::Tools::Sigcleave.pm, $Id$
171 Copyright (c) 1999 Chris Dagdigian & others. All Rights Reserved.
172 This module is free software; you can redistribute it and/or
173 modify it under the same terms as Perl itself.
175 =head1 REFERENCES / SEE ALSO
177 von Heijne G. (1986) "A new method for predicting signal sequences cleavage sites."
178 Nucleic Acids Res. 14, 4683-4690.
180 von Heijne G. (1987) in "Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit"
181 (Acad. Press, (1987), 113-117).
183 EGCG website: http://www.sanger.ac.uk/Software/EGCG/
187 The following documentation describes the various functions
188 contained in this module. Some functions are for internal
189 use and are not meant to be called by the user; they are
190 preceded by an underscore ("_").
198 #### END of main POD documentation.
203 %Bio::Tools
::Sigcleave
::WeightTable
= (
204 'A' => [16 ,13 ,14 ,15 ,20 ,18 ,18 ,17 ,25 ,15 ,47 , 6 ,80 ,18 , 6 ,14.5],
205 'C' => [3 , 6 , 9 , 7 , 9 ,14 , 6 , 8 , 5 , 6 ,19 , 3 , 9 , 8 , 3 , 4.5],
206 'D' => [0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 5 , 3 , 0 , 5 , 0 ,10 ,11 , 8.9],
207 'E' => [0 , 0 , 0 , 1 , 0 , 0 , 0 , 0 , 3 , 7 , 0 , 7 , 0 ,13 ,14 ,10.0],
208 'F' => [13 , 9 ,11 ,11 , 6 , 7 ,18 ,13 , 4 , 5 , 0 ,13 , 0 , 6 , 4 , 5.6],
209 'G' => [4 , 4 , 3 , 6 , 3 ,13 , 3 , 2 ,19 ,34 , 5 , 7 ,39 ,10 , 7 ,12.1],
210 'H' => [0 , 0 , 0 , 0 , 0 , 1 , 1 , 0 , 5 , 0 , 0 , 6 , 0 , 4 , 2 , 3.4],
211 'I' => [15 ,15 , 8 , 6 ,11 , 5 , 4 , 8 , 5 , 1 ,10 , 5 , 0 , 8 , 7 , 7.4],
212 'K' => [0 , 0 , 0 , 1 , 0 , 0 , 1 , 0 , 0 , 4 , 0 , 2 , 0 ,11 , 9 ,11.3],
213 'L' => [71 ,68 ,72 ,79 ,78 ,45 ,64 ,49 ,10 ,23 , 8 ,20 , 1 , 8 , 4 ,12.1],
214 'M' => [0 , 3 , 7 , 4 , 1 , 6 , 2 , 2 , 0 , 0 , 0 , 1 , 0 , 1 , 2 , 2.7],
215 'N' => [0 , 1 , 0 , 1 , 1 , 0 , 0 , 0 , 3 , 3 , 0 ,10 , 0 , 4 , 7 , 7.1],
216 'P' => [2 , 0 , 2 , 0 , 0 , 4 , 1 , 8 ,20 ,14 , 0 , 1 , 3 , 0 ,22 , 7.4],
217 'Q' => [0 , 0 , 0 , 1 , 0 , 6 , 1 , 0 ,10 , 8 , 0 ,18 , 3 ,19 ,10 , 6.3],
218 'R' => [2 , 0 , 0 , 0 , 0 , 1 , 0 , 0 , 7 , 4 , 0 ,15 , 0 ,12 , 9 , 7.6],
219 'S' => [9 , 3 , 8 , 6 ,13 ,10 ,15 ,16 ,26 ,11 ,23 ,17 ,20 ,15 ,10 ,11.4],
220 'T' => [2 ,10 , 5 , 4 , 5 ,13 , 7 , 7 ,12 , 6 ,17 , 8 , 6 , 3 ,10 , 9.7],
221 'V' => [20 ,25 ,15 ,18 ,13 ,15 ,11 ,27 , 0 ,12 ,32 , 3 , 0 , 8 ,17 ,11.1],
222 'W' => [4 , 3 , 3 , 1 , 1 , 2 , 6 , 3 , 1 , 3 , 0 , 9 , 0 , 2 , 0 , 1.8],
223 'Y' => [0 , 1 , 4 , 0 , 0 , 1 , 3 , 1 , 1 , 2 , 0 , 5 , 0 , 1 , 7 , 5.6],
227 ## Now we calculate the _real_ values for the weight table
230 ## yeah yeah yeah there is lots of math here that gets repeated
231 ## every single time a sigcleave object gets created. This is
232 ## a quick hack to make sure that we get the scores as accurate as
233 ## possible. Need all those significant digits....
235 ## suggestions for speedup aproaches welcome
238 my ($i, $j, $expected);
239 foreach $i (keys %WeightTable)
241 $expected = $WeightTable{$i}[15];
244 for ($j=0; $j<16; $j++)
246 if ($WeightTable{$i}[$j] == 0)
248 $WeightTable{$i}[$j] = 1;
249 if ($j == 10 || $j == 12) { $WeightTable{$i}[$j] = 1.e
-10; }
251 $WeightTable{$i}[$j] = log($WeightTable{$i}[$j]/$expected);
259 #####################################################################################
261 #####################################################################################
267 Usage : n/a; automatically called by Bio::Root::Object::new()
268 Purpose : Verifies that the type is correct for superclass (Bio::Seq.pm)
269 : and calls superclass constructor last.
271 Argument : Parameters passed to new()
272 Throws : Exception if given type is not protein.
274 See Also : B<Bio::Root::Object::new()>, B<Bio::Seq::_initialize()>
281 my($self, %param) = @_;
283 my($threshold,$type) = $self->_rearrange([qw(THRESHOLD
286 my $make = $self->SUPER::_initialize
(%param);
288 # complain if not protein
289 if ($type =~ /nuc|[dr]na/i) {
290 $self->throw("Sigcleave.pm only supports protein sequences.");
291 } elsif ($type =~ /amino|pep|prot/i) {
292 $self->{type
} = 'amino';
295 # set threshold if supplied, otherwise default to 3.5
296 if (defined $threshold) {
297 $self->{threshold
} = $threshold;
299 $self->{threshold
} = 3.5;
310 Usage : N/A This is an internal method. Not meant to be called from outside
313 Purpose : calculates sigcleave score and amino acid position for the
314 : given protein sequence. The score reporting threshold can
315 : be adjusted by passing in the "threshold" parameter during
316 : object construction. If no threshold is passed in, the code
317 : defaults to reporting any scores equal to or above 3.5
319 Returns : nothing. results are added to the object
332 ## need to shut strict() up
351 my $seqEnd = $self->seq_len;
352 my $pep = $self->str;
353 my $minWeight = $self->threshold;
355 ## The weight table is keyed by UPPERCASE letters so we uppercase
356 ## the pep string because we don't want to alter the actual object
361 for($seqPos = $seqBegin; $seqPos < $seqEnd; $seqPos++)
363 $istart = (0 > $seqPos + $pVal)?
0 : $seqPos + $pVal;
364 $iend = ($seqPos + $nVal - 1 < $seqEnd)?
$seqPos + $nVal - 1 : $seqEnd;
365 $icol= $iend - $istart + 1;
367 for ($k=0; $k<$icol; $k++)
369 $c = substr($pep, $istart + $k, 1);
371 ## CD: The if(defined) stuff was put in here because Sigcleave.pm
372 ## CD: kept getting warnings about undefined vals during 'make test' ...
373 if(defined $WeightTable{$c}[$k]) { $weight += $WeightTable{$c}[$k]; }
377 if ($weight >= $minWeight)
378 { $signals{$seqPos+1} = sprintf ("%.1f", $weight); }
381 $self->{"signal_scores"} = { %signals };
389 Usage : $value = $self->threshold
391 Purpose : Accessor method sigcleave score reporting threshold.
406 return $self->{threshold
};
414 Usage : %sigcleave_results = $sigcleave_object->signals;
416 Purpose : Accessor method for sigcleave results
418 Returns : Associative array. The key value represents the amino acid position
419 : and the value represents the score. Only scores that
420 : are greater than or equal to the THRESHOLD value are reported.
436 foreach $position ( sort keys %{ $self->{signal_scores
} } ) {
437 $results{$position} = $self->{signal_scores
}{$position};
448 Usage : $output = $sigcleave_object->pretty_print;
449 : print $sigcleave_object->pretty_print;
451 Purpose : Emulates the output of the EGCG Sigcleave
454 Returns : A formatted string.
469 my %results = $self->signals;
470 my @hits = keys %results;
471 my $hitcount = $#hits; $hitcount++;
472 my $thresh = $self->threshold;
473 my $seqlen = $self->seq_len;
474 my $name = $self->id;
475 my $pep = $self->str;
478 $output = "SIGCLEAVE of $name from: 1 to $seqlen\n\n";
481 $output .= "Report scores over $thresh\n";
482 foreach $pos ((sort { $results{$b} cmp $results{$a} } keys %results)) {
483 my $start = $pos - 13;
484 $output .= sprintf ("Maximum score %1.1f at residue %3d\n",$results{$pos},$pos);
486 $output .= " Sequence: ";
487 $output .= substr($pep,$start,13);
489 $output .= substr($pep,$pos,50);
491 $output .= " | \(signal\) | \(mature peptide\)\n";
492 $output .= sprintf(" %3d %3d\n\n",$start,$pos);
494 if(($hitcount > 1) && ($cnt == 1)) {
495 $output .= " Other entries above $thresh\n\n";
508 #########################################################################
510 #########################################################################