=head1 NAME

Bio::Matrix::PSM::SiteMatrixI - SiteMatrixI interface, holds a
position scoring matrix (or position weight matrix) and log-odds

=head1 SYNOPSIS

  # You cannot use this module directly; see Bio::Matrix::PSM::SiteMatrix
  # for an example implementation

=head1 DESCRIPTION

SiteMatrix is designed to provide some basic methods for working with position
scoring (weight) matrices, such as those describing transcription factor
binding sites. A DNA PSM consists of four vectors with frequencies for
{A,C,G,T}. This is the minimum information you should provide to construct a
PSM object. The vectors can be provided as strings holding frequencies x 10
rounded to an integer, with characters from {0..a}, where 'a' represents the
maximum (10). This is like MEME's compressed representation of a matrix and it
is quite useful when working with a relational database. If arrays are provided
as input (references to arrays, actually), the values can be any numbers, real
or integer (frequencies or counts).
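
As a rough illustration of the string encoding described above, the sketch
below decodes one such string back into frequencies. The helper name
decode_freq_string is hypothetical and not part of Bio::Matrix::PSM; it only
assumes that each character is a digit 0..9 or 'a' (10), standing for
frequency x 10.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl method: each character encodes
# frequency x 10 rounded to an integer, with 'a' standing for 10.
sub decode_freq_string {
    my ($str) = @_;
    return map { (lc($_) eq 'a' ? 10 : $_) / 10 } split //, $str;
}

my @freq_A = decode_freq_string('0a52');
print join(',', @freq_A), "\n";    # prints 0,1,0.5,0.2
```
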

When creating the object you can ask the constructor to make a simple pseudo
count correction by adding a number (typically 1) to all positions (with the
-correction option). After adding the number the frequencies will be
calculated. Only use correction when you supply counts, not frequencies.
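
One plausible reading of the correction, sketched for a single position: the
pseudo count is added to each of the four counts, and the corrected counts are
then normalised to frequencies. The helper corrected_freqs and its argument
layout are assumptions made for this example, not the constructor's actual
code.

```perl
use strict;
use warnings;

# Hypothetical helper: add a pseudo count to the four base counts at one
# position, then turn the corrected counts into frequencies.
sub corrected_freqs {
    my ($counts, $correction) = @_;    # $counts = { A => n, C => n, G => n, T => n }
    my %corrected = map { $_ => $counts->{$_} + $correction } qw(A C G T);
    my $total = 0;
    $total += $_ for values %corrected;
    return map { $_ => $corrected{$_} / $total } qw(A C G T);
}

my %freq = corrected_freqs({ A => 7, C => 0, G => 2, T => 1 }, 1);
printf "%s=%.3f\n", $_, $freq{$_} for qw(A C G T);
# prints A=0.571, C=0.071, G=0.214 and T=0.143, one per line
```

Note that the zero count for C becomes a small non-zero frequency, which is
what makes a later log-odds calculation possible.
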

Throws an exception if: you mix arrays and strings as input (for example, the
A vector is given as an array and C as a string); a position vector is
(0,0,0,0); or one of the probability vectors is shorter than the rest.

Summary of the methods I use most frequently (details below):

  IUPAC            - returns the IUPAC compliant consensus as a string
  score            - returns the score as a real number
  IC               - information content. Returns a real number
  id               - identifier. Returns a string
  accession_number - accession number. Returns a string
  next_pos         - returns the sequence probability for each letter, the
                     IUPAC symbol, the IUPAC probability and the simple
                     sequence consensus letter for this position. Rewinds at
                     the end. Returns a hash.
  curpos           - current position get/set. Returns an integer.
  regexp           - constructs a regular expression based on the IUPAC
                     consensus. For example AGWV will be
                     [Aa][Gg][AaTt][AaCcGg]
  width            - site width
  get_string       - gets the probability vector for a single base as a string.
  get_array        - gets the probability vector for a single base as an array.
  get_logs_array   - gets the log-odds vector for a single base as an array.

Newer methods, which may interest anyone who wants to store a PSM in a
relational database without creating an entry for each position, compress a
PSM vector into a string, usually losing less than 1% of the data.
This can be done with:

  my $str=$matrix->get_compressed_freq('A');

or

  my $str=$matrix->get_compressed_logs('A');

Loading from a database should be done with new, but this is not yet
implemented. However, you can still uncompress such a string with:

  my @arr=Bio::Matrix::PSM::_uncompress_string ($str,1,1); # for PSM

or

  my @arr=Bio::Matrix::PSM::_uncompress_string ($str,1000,2); # for log odds

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  https://github.com/bioperl/bioperl-live/issues

=head1 AUTHOR - Stefan Kirov

Email skirov@utk.edu

=head1 APPENDIX

=cut

# Let the code begin...

package Bio::Matrix::PSM::SiteMatrixI;

# use strict;
use base qw(Bio::Root::RootI);

=head2 calc_weight

 Title   : calc_weight
 Usage   : $self->calc_weight({A=>0.2562,C=>0.2438,G=>0.2432,T=>0.2568});
 Function: Recalculates the PSM (or weights) based on the PFM (the frequency
           matrix) and a user supplied background model.
 Throws  : if no model is supplied
 Example :
 Returns :
 Args    : reference to a hash with background frequencies for A,C,G and T
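
To make the relation between frequencies, background and weights concrete,
here is a minimal sketch that assumes the common log2(frequency/background)
definition of a log-odds weight. Whether an implementing class uses exactly
this formula is not stated here; the helper log_odds and its data layout are
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: log2 odds of each frequency against the background.
# Assumes every frequency is non-zero (log of 0 is a fatal error in Perl),
# for example after a pseudo count correction.
sub log_odds {
    my ($freqs, $bg) = @_;    # $freqs = { A => [...], C => [...], ... }
    my %weights;
    for my $base (qw(A C G T)) {
        $weights{$base} =
            [ map { log($_ / $bg->{$base}) / log(2) } @{ $freqs->{$base} } ];
    }
    return \%weights;
}

my $bg = { A => 0.25, C => 0.25, G => 0.25, T => 0.25 };
my $w  = log_odds({ A => [0.5, 0.25],   C => [0.125, 0.25],
                    G => [0.25, 0.25],  T => [0.125, 0.25] }, $bg);
printf "%.2f\n", $w->{A}[0];    # prints 1.00
```
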

=cut

sub calc_weight {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 next_pos

 Title   : next_pos
 Usage   : my %base=$site->next_pos;
 Function: Retrieves the next position features: frequencies and weights for
           A,C,G,T, the main letter (as in consensus) and the
           probability for this letter to occur at this position and
           the current position
 Throws  :
 Example :
 Returns : hash (pA,pC,pG,pT,lA,lC,lG,lT,base,prob,rel)
 Args    : none

=cut

sub next_pos {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 curpos

 Title   : curpos
 Usage   : my $pos=$site->curpos;
 Function: Gets/sets the current position. Converts to 0 if the argument is
           negative and to width if it is greater than width
 Throws  :
 Example :
 Returns : integer
 Args    : integer

=cut

sub curpos {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 e_val

 Title   : e_val
 Usage   : my $score=$site->e_val;
 Function: Gets/sets the e-value
 Throws  :
 Example :
 Returns : real number
 Args    : real number

=cut

sub e_val {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 consensus

 Title   : consensus
 Usage   :
 Function: Returns the consensus
 Returns : string
 Args    : (optional) threshold value 1 to 10, default 5
           '5' means the returned characters had a 50% or higher presence at
           their position
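
The threshold rule can be sketched as follows, under the assumption that a
position below the threshold is reported as 'N'. The fallback character, the
helper name simple_consensus and the data layout are all choices made for this
example; an implementing class may handle such positions differently.

```perl
use strict;
use warnings;

# Hypothetical helper: per position, report the most frequent base if its
# frequency x 10 reaches the threshold, otherwise 'N' (an assumption).
sub simple_consensus {
    my ($freqs, $threshold) = @_;    # $freqs = { A => [...], C => [...], ... }
    $threshold = 5 unless defined $threshold;
    my $width = scalar @{ $freqs->{A} };
    my $cons  = '';
    for my $i (0 .. $width - 1) {
        my ($best) = sort { $freqs->{$b}[$i] <=> $freqs->{$a}[$i] } qw(A C G T);
        $cons .= $freqs->{$best}[$i] * 10 >= $threshold ? $best : 'N';
    }
    return $cons;
}

my %freqs = (A => [0.7, 0.3], C => [0.1, 0.3], G => [0.1, 0.2], T => [0.1, 0.2]);
print simple_consensus(\%freqs), "\n";    # prints AN
```
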

=cut

sub consensus {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 accession_number

 Title   : accession_number
 Usage   :
 Function: Accession number; this is a unique id for the SiteMatrix object as
           well as for any other object inheriting from SiteMatrix
 Throws  :
 Example :
 Returns : string
 Args    : string

=cut

sub accession_number {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 width

 Title   : width
 Usage   : my $width=$site->width;
 Function: Returns the length of the site
 Throws  :
 Example :
 Returns : number
 Args    :

=cut

sub width {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 IUPAC

 Title   : IUPAC
 Usage   : my $iupac_consensus=$site->IUPAC;
 Function: Returns IUPAC compliant consensus
 Throws  :
 Example :
 Returns : string
 Args    :

=cut

sub IUPAC {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 IC

 Title   : IC
 Usage   : my $ic=$site->IC;
 Function: Information content
 Throws  :
 Example :
 Returns : real number
 Args    : none
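
For orientation, one widely used definition of information content for a DNA
matrix is the sum over positions of 2 + sum_b p_b * log2(p_b), which assumes a
uniform background. The sketch below implements that definition only; it is
not taken from an implementing class, and the helper information_content is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sum over positions of 2 + sum_b p_b * log2(p_b),
# i.e. Shannon information content against a uniform background.
# A zero frequency contributes nothing to the sum.
sub information_content {
    my ($freqs) = @_;    # $freqs = { A => [...], C => [...], ... }
    my $ic = 0;
    for my $i (0 .. $#{ $freqs->{A} }) {
        my $pos_ic = 2;
        for my $base (qw(A C G T)) {
            my $p = $freqs->{$base}[$i];
            $pos_ic += $p * log($p) / log(2) if $p > 0;
        }
        $ic += $pos_ic;
    }
    return $ic;
}

# Position 1 is fully conserved (2 bits), position 2 is uniform (0 bits).
my %freqs = (A => [1, 0.25], C => [0, 0.25], G => [0, 0.25], T => [0, 0.25]);
printf "%.2f\n", information_content(\%freqs);    # prints 2.00
```
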

=cut

sub IC {
    my $self=shift;
    $self->throw_not_implemented();
}

=head2 get_string

 Title   : get_string
 Usage   : my $freq_A=$site->get_string('A');
 Function: Returns the given probability vector as a string. Useful if you
           want to store things in a relational database, where arrays are
           not the first choice
 Throws  : If the argument is outside {A,C,G,T}
 Example :
 Returns : string
 Args    : character {A,C,G,T}

=cut

sub get_string {
    my $self=shift;
    $self->throw_not_implemented();
}

=head2 id

 Title   : id
 Usage   : my $id=$site->id;
 Function: Gets/sets the site id
 Throws  :
 Example :
 Returns : string
 Args    : string

=cut

sub id {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 regexp

 Title   : regexp
 Usage   : my $regexp=$site->regexp;
 Function: Returns a regular expression which matches the IUPAC convention.
           N will match X, N, - and .
 Throws  :
 Example :
 Returns : string
 Args    :
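
The mapping from an IUPAC consensus to a regular expression can be sketched
with a lookup table of the standard IUPAC nucleotide codes. The helper
iupac_to_regexp is hypothetical, and mapping N (or any unknown symbol) to '.'
is an assumption of this sketch rather than the documented behaviour.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  M => 'AC',  K => 'GT',  S => 'CG',  W => 'AT',
    H => 'ACT', B => 'CGT', V => 'ACG', D => 'AGT',
);

# Hypothetical helper: one case-insensitive character class per symbol.
# Mapping N (and anything unknown) to '.' is an assumption of this sketch.
sub iupac_to_regexp {
    my ($consensus) = @_;
    my $re = '';
    for my $sym (split //, uc $consensus) {
        my $bases = $IUPAC{$sym};
        $re .= defined $bases
            ? '[' . join('', map { $_ . lc $_ } split //, $bases) . ']'
            : '.';
    }
    return $re;
}

print iupac_to_regexp('AGWV'), "\n";    # prints [Aa][Gg][AaTt][AaCcGg]
```
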

=cut

sub regexp {
    my $self=shift;
    $self->throw_not_implemented();
}

=head2 regexp_array

 Title   : regexp_array
 Usage   : my @regexp=$site->regexp_array;
 Function: Returns a regular expression which matches the IUPAC convention.
           N will match X, N, - and .
 Throws  :
 Example :
 Returns : array
 Args    :
 To do   : I have separated regexp and regexp_array, but
           maybe they can be rewritten as one - just check what
           should be returned

=cut

sub regexp_array {
    my $self=shift;
    $self->throw_not_implemented();
}

=head2 get_array

 Title   : get_array
 Usage   : my @freq_A=$site->get_array('A');
 Function: Returns an array with frequencies for a specified base
 Throws  :
 Example :
 Returns : array
 Args    : char

=cut

sub get_array {
    my $self=shift;
    $self->throw_not_implemented();
}

=head2 _to_IUPAC

 Title   : _to_IUPAC
 Usage   :
 Function: Converts a single position to an IUPAC compliant symbol and
           returns its probability. For rules see the implementation.
 Throws  :
 Example :
 Returns : char, real number
 Args    : real numbers for A,C,G,T (positional)

=cut

sub _to_IUPAC {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 _to_cons

 Title   : _to_cons
 Usage   :
 Function: Converts a single position to a simple consensus character and
           returns its probability. For rules see the implementation.
 Throws  :
 Example :
 Returns : char, real number
 Args    : real numbers for A,C,G,T (positional)

=cut

sub _to_cons {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 _calculate_consensus

 Title   : _calculate_consensus
 Usage   :
 Function: Internal stuff
 Throws  :
 Example :
 Returns :
 Args    :

=cut

sub _calculate_consensus {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 _compress_array

 Title   : _compress_array
 Usage   :
 Function: Will compress an array of real signed numbers to a string (i.e. a
           vector of bytes): -127 to +127 for bi-directional (signed) and
           0..255 for unsigned
 Throws  :
 Example : Internal stuff
 Returns : String
 Args    : array reference, followed by a max value and
           direction (optional, default 1 = unsigned); 1 is unsigned, any
           other value is signed.
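
The byte encoding described above can be sketched as a scale-and-round step
plus its inverse. The two helpers below are hypothetical stand-ins written for
this example, not the BioPerl internals; in particular, clipping out-of-range
values and shifting signed levels by 127 are assumptions about how such a
scheme might be built.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the internal helpers, not the BioPerl code.
# Unsigned mode maps 0..max onto 0..255; signed mode maps -max..max onto
# -127..127 and shifts by 127 so that every level fits in one character.
sub compress_array {
    my ($array, $max, $direction) = @_;
    my $unsigned = !defined $direction || $direction == 1;
    my $scale    = ($unsigned ? 255 : 127) / $max;
    my $min      = $unsigned ? 0 : -$max;
    my $str      = '';
    for my $value (@$array) {
        my $v = $value > $max ? $max : $value < $min ? $min : $value;    # clip
        my $level = sprintf('%.0f', $scale * $v);                        # round
        $str .= chr($level + ($unsigned ? 0 : 127));
    }
    return $str;
}

sub uncompress_string {
    my ($str, $max, $direction) = @_;
    my $unsigned = !defined $direction || $direction == 1;
    my $scale    = ($unsigned ? 255 : 127) / $max;
    my $shift    = $unsigned ? 0 : 127;
    return map { (ord($_) - $shift) / $scale } split //, $str;
}

my @freqs = (0, 0.25, 0.5, 1);
my @back  = uncompress_string(compress_array(\@freqs, 1, 1), 1, 1);
printf "%.3f ", $_ for @back;    # each value is within 1/255 of the original
print "\n";
```

With 255 levels, the rounding error in unsigned mode is at most half a level,
which is in line with the "less than 1%" loss mentioned in the description.
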

=cut

sub _compress_array {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 _uncompress_string

 Title   : _uncompress_string
 Usage   :
 Function: Will uncompress a string (vector of bytes) to create an array of
           real signed numbers (opposite to _compress_array)
 Throws  :
 Example : Internal stuff
 Returns : array
 Args    : string, followed by a max value and
           direction (optional, default 1 = unsigned); 1 is unsigned, any
           other value is signed.

=cut

sub _uncompress_string {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 get_compressed_freq

 Title   : get_compressed_freq
 Usage   :
 Function: A method to provide a compressed frequency vector. It uses one
           byte to code the frequency for one of the probability vectors for
           one position. Useful for relational databases. Improvement of the
           previous 0..a coding.
 Throws  :
 Example : my $strA=$self->get_compressed_freq('A');
 Returns : String
 Args    : char

=cut

sub get_compressed_freq {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 get_compressed_logs

 Title   : get_compressed_logs
 Usage   :
 Function: A method to provide a compressed log-odds vector. It uses one byte
           to code the log value for one of the log-odds vectors for one
           position.
 Throws  :
 Example : my $strA=$self->get_compressed_logs('A');
 Returns : String
 Args    : char

=cut

sub get_compressed_logs {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 sequence_match_weight

 Title   : sequence_match_weight
 Usage   :
 Function: This method will calculate the score of a match, based on the PWM
           if such is associated with the matrix object. Returns undef if no
           PWM data is available.
 Throws  : if the length of the sequence is different from the matrix width
 Example : my $score=$matrix->sequence_match_weight('ACGGATAG');
 Returns : Floating point
 Args    : string
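
A minimal sketch of such a score, assuming it is the sum of the log-odds
weights of the observed base at each position (the usual way to score a
sequence against a PWM). The helper match_weight is hypothetical and expects a
sequence made up of A, C, G and T only.

```perl
use strict;
use warnings;

# Hypothetical helper: add up the log-odds weight of the observed base at
# each position; dies when the sequence length differs from the matrix width.
sub match_weight {
    my ($weights, $seq) = @_;    # $weights = { A => [...], C => [...], ... }
    my $width = scalar @{ $weights->{A} };
    die "Sequence length differs from matrix width\n" if length($seq) != $width;
    my $score = 0;
    my @bases = split //, uc $seq;
    $score += $weights->{ $bases[$_] }[$_] for 0 .. $#bases;
    return $score;
}

my %weights = (A => [1, -2], C => [-1, 0.5], G => [-1, -2], T => [0, 1]);
print match_weight(\%weights, 'AC'), "\n";    # prints 1.5
```
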

=cut

sub sequence_match_weight {
    my $self = shift;
    $self->throw_not_implemented();
}

=head2 get_all_vectors

 Title   : get_all_vectors
 Usage   :
 Function: Returns all possible sequence vectors that satisfy the PFM under
           a given threshold
 Throws  : If the threshold is outside of 0..1 (no sense to do that)
 Example : my @vectors=$self->get_all_vectors(0.4);
 Returns : Array of strings
 Args    : (optional) floating

=cut

sub get_all_vectors {
    my $self = shift;
    $self->throw_not_implemented();
}