# $Id$
#
# BioPerl module for Bio::SearchDist
#
# Cared for by Ewan Birney <birney@ebi.ac.uk>
#
# Copyright Ewan Birney
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::SearchDist - A Perl wrapper around Sean Eddy's histogram object

=head1 SYNOPSIS

  use Bio::SearchDist;

  my $dis = Bio::SearchDist->new();
  foreach my $score ( @scores ) {
      $dis->add_score($score);
  }

  if( $dis->fit_evd() ) {
      foreach my $score ( @scores ) {
          my $evalue = $dis->evalue($score);
          print "Score $score had an evalue of $evalue\n";
      }
  } else {
      warn("Could not fit histogram to an EVD!");
  }

=head1 DESCRIPTION

The Bio::SearchDist object is a wrapper around Sean Eddy's excellent
histogram object. The histogram object takes in a set of scores,
sensibly distributed around 0, that are assumed to come from an
Extreme Value Distribution (EVD). Having added all the scores from a
database search via the add_score() method, you can fit an EVD to them
using fit_evd(). Once the fit succeeds, you can get the E-value for
each score (or for a new score) using evalue($score).
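
The E-value of a score is the number of scores at least that large expected by chance. Assuming the fit yields an EVD with location mu and scale lambda, and that N is the number of scores added (the conventional definition, as used in HMMER; how the Bio::Ext::Align engine counts N is not checked here), E(s) is N times P(S >= s), where P(S >= s) is 1 - exp(-exp(-lambda * (s - mu))). A minimal Perl sketch of that arithmetic; evd_pvalue() and evd_evalue() are hypothetical helpers, not methods of this module:

```perl
use strict;
use warnings;

# Hypothetical helper: P(S >= $score) for an EVD (Gumbel) with location $mu
# and scale parameter $lambda.
sub evd_pvalue {
    my ($score, $mu, $lambda) = @_;
    my $y = exp( -$lambda * ($score - $mu) );
    # For large scores 1 - exp(-$y) loses almost all precision in floating
    # point, so use the first-order approximation 1 - exp(-y) ~ y instead.
    return $y < 1e-8 ? $y : 1 - exp(-$y);
}

# Hypothetical helper: expected number of the $n scores that reach $score
# by chance alone.
sub evd_evalue {
    my ($score, $mu, $lambda, $n) = @_;
    return $n * evd_pvalue($score, $mu, $lambda);
}

# Example: 10000 database scores fitted with mu = 5, lambda = 0.7 (made-up values).
printf "E(%d) = %.3g\n", $_, evd_evalue($_, 5, 0.7, 10000) for (10, 20, 30);
```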

The fitting procedure is described in more detail in Sean Eddy's own
code (available from http://hmmer.wustl.edu, or in the histogram.h
header file in Compile/SW). In short, it fits an EVD by maximum
likelihood, pruning the top end of the distribution so that real
positives are excluded from the fit. The pruning comes from an idea
originally due to Richard Mott, and the likelihood fitting is from a
book by Lawless [should ref here].
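
The following sketch illustrates maximum likelihood fitting of an EVD under stated assumptions. fit_evd_ml() solves the likelihood equation for lambda, 1/lambda - mean(x) + sum(x exp(-lambda x)) / sum(exp(-lambda x)) = 0, by Newton-Raphson and then obtains mu in closed form. fit_evd_pruned() stands in for the real pruning with a simple loop that drops scores whose E-value falls below a cutoff and refits. Both function names and the cutoff parameter are hypothetical; this is not the censored fit in Eddy's code, only an illustration of the idea.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: maximum likelihood fit of an EVD (Gumbel) to the
# scores in @$x, solving the likelihood equation for lambda by Newton-Raphson.
# No censoring: every score passed in takes part in the fit.
sub fit_evd_ml {
    my ($x) = @_;
    my $n    = @$x;
    my $mean = sum(@$x) / $n;
    my $var  = sum(map { ($_ - $mean) ** 2 } @$x) / $n;
    die "need at least two distinct scores\n" unless $var > 0;

    # Starting guess from the moments: an EVD has variance pi^2 / (6 lambda^2).
    my $pi     = 4 * atan2(1, 1);
    my $lambda = $pi / sqrt(6 * $var);

    for (1 .. 100) {
        # Shift exponents by the mean so exp() stays in range; ratios are unchanged.
        my ($s, $sx, $sxx) = (0, 0, 0);
        for my $xi (@$x) {
            my $e = exp(-$lambda * ($xi - $mean));
            $s   += $e;
            $sx  += $xi * $e;
            $sxx += $xi * $xi * $e;
        }
        # f(lambda) = 1/lambda - mean + sum(x e)/sum(e); its root is the ML lambda.
        my $f   = 1 / $lambda - $mean + $sx / $s;
        my $df  = -1 / $lambda**2 - $sxx / $s + ($sx / $s)**2;
        my $new = $lambda - $f / $df;
        $new = $lambda / 2 if $new <= 0;    # keep lambda positive
        my $converged = abs($new - $lambda) < 1e-10;
        $lambda = $new;
        last if $converged;
    }

    # Given lambda, the ML estimate of mu has a closed form.
    my $s  = sum(map { exp(-$lambda * ($_ - $mean)) } @$x);
    my $mu = $mean - log($s / $n) / $lambda;
    return ($mu, $lambda);
}

# Simplified pruning loop (an assumption, not the censored fit in Eddy's
# code): refit after dropping every score whose E-value under the current
# fit falls below $e_cut, until no more scores are dropped.
sub fit_evd_pruned {
    my ($scores, $e_cut) = @_;
    my @kept = @$scores;
    my ($mu, $lambda) = fit_evd_ml(\@kept);
    for (1 .. 20) {    # cap the number of pruning rounds
        my $n    = @kept;
        my @next = grep {
            $n * (1 - exp(-exp(-$lambda * ($_ - $mu)))) >= $e_cut
        } @kept;
        last if @next == @kept;
        @kept = @next;
        ($mu, $lambda) = fit_evd_ml(\@kept);
    }
    return ($mu, $lambda, scalar @kept);
}
```

fit_evd_pruned() returns the fitted mu and lambda together with the number of scores that survived pruning.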

The object relies on the scores being sensibly distributed around 0
and on integer bins being sensible for the histogram. Bit scores are
often ideal for this (bit-based scoring schemes are what this
histogram object was originally designed for).
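
To see why integer bins suit bit scores, here is a small sketch that counts scores in integer-width bins. It assumes each score is rounded down with POSIX::floor before binning (the engine's exact rounding rule is not documented here), and integer_histogram() is a hypothetical helper. Bit scores spread over several bins, while scores that all lie between 0 and 1 collapse into a single bin and leave nothing to fit:

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: count scores in integer-width bins keyed by floor(score).
sub integer_histogram {
    my (@scores) = @_;
    my %bins;
    $bins{ floor($_) }++ for @scores;
    return \%bins;
}

my $bits  = integer_histogram(-3.2, -0.5, 0.4, 1.7, 2.1, 2.9, 5.5);
my $probs = integer_histogram(0.01, 0.2, 0.35, 0.8, 0.99);
printf "bit scores use %d bins, probabilities use %d bin(s)\n",
    scalar(keys %$bits), scalar(keys %$probs);
# prints: bit scores use 6 bins, probabilities use 1 bin(s)
```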

=head1 CONTACT

The original code this module is based on comes from the histogram
module of the HMMer2 package; see http://hmmer.wustl.edu/

It is used in Bioperl via the compiled XS extension, which is cared
for by Ewan Birney (birney@ebi.ac.uk). Please contact Ewan first about
the use of this module.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with an underscore (_).

=cut

# Let the code begin...


package Bio::SearchDist;
use strict;

BEGIN {
    eval {
        require Bio::Ext::Align;
    };
    if ( $@ ) {
        print STDERR $@;
        print STDERR ("\nThe C-compiled engine for the histogram object (Bio::Ext::Align) has not been installed.\nPlease install the bioperl-ext package.\n\n");
        exit(1);
    }
}

use base qw(Bio::Root::Root);

sub new {
    my($class,@args) = @_;
    my $self = $class->SUPER::new(@args);
    my($min, $max, $lump) =
        $self->_rearrange([qw(MIN MAX LUMP)], @args);

    # defined() rather than truth tests for the bounds, so an explicit
    # bound of 0 (e.g. -min => 0) is kept instead of being replaced.
    $min  = -100 unless defined $min;
    $max  = +100 unless defined $max;
    $lump = 50   unless $lump;

    $self->_engine(&Bio::Ext::Align::new_Histogram($min,$max,$lump));

    return $self;
}

=head2 add_score

 Title   : add_score
 Usage   : $dis->add_score(300);
 Function: Adds a single score to the distribution
 Returns : nothing
 Args    : the score to add (a number)

=cut

sub add_score{
    my ($self,$score) = @_;
    $self->_engine()->add($score);
}

=head2 fit_evd

 Title   : fit_evd
 Usage   : $dis->fit_evd();
 Function: Fits an EVD to the current distribution
 Returns : 1 if it fits successfully, 0 if not
 Args    : none

=cut

sub fit_evd{
    my ($self) = @_;

    return $self->_engine()->fit_EVD(10000,1);
}
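
=head2 fit_Gaussian

 Title   : fit_Gaussian
 Usage   : $dis->fit_Gaussian($high);
 Function: Fits a Gaussian to the current distribution using the
           underlying histogram engine
 Returns : the value returned by the engine's fit_Gaussian method
 Args    : $high (optional), passed to the engine; defaults to 10000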
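
fit_Gaussian() delegates to the C engine. Assuming $high is an upper score cutoff, so that scores above it are left out of the fit (suggested by the name, not documented here), a maximum likelihood Gaussian fit reduces to the mean and the n-divisor standard deviation of the remaining scores. A minimal sketch; fit_gaussian_ml() is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: maximum likelihood Gaussian fit (mean and standard
# deviation) to the scores at or below $high. Returns an empty list if
# fewer than two scores remain.
sub fit_gaussian_ml {
    my ($scores, $high) = @_;
    $high = 10000 unless defined $high;    # same default as fit_Gaussian()
    my @x = grep { $_ <= $high } @$scores;
    return unless @x >= 2;
    my $n    = @x;
    my $mean = 0;
    $mean += $_ / $n for @x;
    my $var = 0;
    $var += ($_ - $mean) ** 2 / $n for @x;    # ML estimate divides by n, not n - 1
    return ($mean, sqrt $var);
}
```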

=cut

sub fit_Gaussian{
    my ($self,$high) = @_;

    if( ! defined $high ) {
        $high = 10000;
    }

    return $self->_engine()->fit_Gaussian($high);
}

=head2 evalue

 Title   : evalue
 Usage   : $eval = $dis->evalue($score)
 Function: Returns the E-value of this score under the fitted EVD
           (call fit_evd() first)
 Returns : float
 Args    : a score

=cut

sub evalue{
    my ($self,$score) = @_;

    return $self->_engine()->evalue($score);
}

=head2 _engine

 Title   : _engine
 Usage   : $obj->_engine($newval)
 Function: underlying bp_sw:: histogram engine
 Returns : value of _engine
 Args    : newvalue (optional)

=cut

sub _engine{
    my ($self,$value) = @_;
    if( defined $value) {
        $self->{'_engine'} = $value;
    }
    return $self->{'_engine'};
}

## End of Package

1;

__END__