2 # BioPerl module for Bio::Align::AlignI
4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
6 # Cared for by Jason Stajich <jason@bioperl.org>
8 # Copyright Jason Stajich
10 # You may distribute this module under the same terms as perl itself
12 # POD documentation - main docs before the code
16 Bio::Align::AlignI - An interface for describing sequence alignments.
20 # get a Bio::Align::AlignI somehow - typically using Bio::AlignIO system
22 print $aln->length, "\n";
23 print $aln->num_residues, "\n";
24 print $aln->is_flush, "\n";
25 print $aln->num_sequences, "\n";
26 print $aln->percentage_identity, "\n";
27 print $aln->consensus_string(50), "\n";
29 # find the position in the alignment for a sequence location
30 $pos = $aln->column_from_residue_number('1433_LYCES', 14); # = 6;
32 # extract sequences and check values for the alignment column $pos
    foreach $seq ($aln->each_seq) {
        $res = $seq->subseq($pos, $pos);
        $count{$res}++;    # tally the residue seen in this column
    }
    foreach $res (keys %count) {
        printf "Res: %s Count: %2d\n", $res, $count{$res};
    }
43 This interface describes the basis for alignment objects.
49 User feedback is an integral part of the evolution of this and other
50 Bioperl modules. Send your comments and suggestions preferably to
51 the Bioperl mailing list. Your participation is much appreciated.
53 bioperl-l@bioperl.org - General discussion
54 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
58 Please direct usage questions or support issues to the mailing list:
60 I<bioperl-l@bioperl.org>
62 rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
64 address it. Please include a thorough description of the problem
65 with code and data examples if at all possible.
69 Report bugs to the Bioperl bug tracking system to help us keep track
70 of the bugs and their resolution. Bug reports can be submitted via the
73 https://github.com/bioperl/bioperl-live/issues
75 =head1 AUTHOR - Jason Stajich
77 Email jason@bioperl.org
81 Ewan Birney, birney@ebi.ac.uk
82 Heikki Lehvaslaiho, heikki-at-bioperl-dot-org
86 The rest of the documentation details each of the object methods.
87 Internal methods are usually preceded with a _
92 # Let the code begin...
package Bio::Align::AlignI;
use base qw(Bio::Root::RootI);
101 =head1 Modifier methods
103 These methods modify the MSE by adding, removing or shuffling complete
109 Usage : $myalign->add_seq($newseq);
110 Function : Adds another sequence to the alignment. *Does not* align
111 it - just adds it to the hashes.
113 Argument : a Bio::LocatableSeq object
116 See L<Bio::LocatableSeq> for more information.
122 $self->throw_not_implemented();
128 Usage : $aln->remove_seq($seq);
129 Function : Removes a single sequence from an alignment
131 Argument : a Bio::LocatableSeq object
137 $self->throw_not_implemented();
143 Usage : $aln->purge(0.7);
Removes sequences above the given identity threshold.
148 This function will grind on large alignments. Beware!
149 (perhaps not ideally implemented)
152 Returns : An array of the removed sequences
160 $self->throw_not_implemented();
163 =head2 sort_alphabetically
165 Title : sort_alphabetically
166 Usage : $ali->sort_alphabetically
169 Changes the order of the alignment to alphabetical on name
170 followed by numerical by number.
sub sort_alphabetically {
    my ($self) = @_;
    $self->throw_not_implemented();
}
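The ordering described for sort_alphabetically can be sketched in plain Perl. This is a minimal illustration only: the hash records and the helper name `sort_by_name_then_start` are hypothetical stand-ins, not BioPerl objects or API, and it assumes that "numerical by number" refers to the start position of each sequence.

```perl
use strict;
use warnings;

# Hypothetical records standing in for aligned sequences (not BioPerl objects).
my @seqs = (
    { id => 'seqB', start => 10 },
    { id => 'seqA', start => 50 },
    { id => 'seqA', start => 5  },
);

# Order by name first, then numerically by start position (an assumption
# about what "numerical by number" means in the description).
sub sort_by_name_then_start {
    my @records = @_;
    return sort { $a->{id} cmp $b->{id} or $a->{start} <=> $b->{start} } @records;
}

# prints "seqA/5 seqA/50 seqB/10"
print join(' ', map { "$_->{id}/$_->{start}" } sort_by_name_then_start(@seqs)), "\n";
```

A string comparison alone would place start 50 before start 5, which is why the second key uses the numeric `<=>` operator.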
182 =head1 Sequence selection methods
Methods returning one or more sequence objects.
189 Usage : foreach $seq ( $align->each_seq() )
190 Function : Gets an array of Seq objects from the alignment
198 $self->throw_not_implemented();
201 =head2 each_alphabetically
203 Title : each_alphabetically
204 Usage : foreach $seq ( $ali->each_alphabetically() )
Returns an array of sequence objects sorted alphabetically
208 by name and then by start point.
209 Does not change the order of the alignment
sub each_alphabetically {
    my ($self) = @_;
    $self->throw_not_implemented();
}
221 =head2 each_seq_with_id
223 Title : each_seq_with_id
Usage : foreach $seq ( $align->each_seq_with_id($id) )
227 Gets an array of Seq objects from the
228 alignment, the contents being those sequences
229 with the given name (there may be more than one)
232 Argument : a seq name
sub each_seq_with_id {
    my ($self) = @_;
    $self->throw_not_implemented();
}
241 =head2 get_seq_by_pos
243 Title : get_seq_by_pos
244 Usage : $seq = $aln->get_seq_by_pos(3) # third sequence from the alignment
247 Gets a sequence based on its position in the alignment.
248 Numbering starts from 1. Sequence positions larger than
249 num_sequences() will throw an error.
251 Returns : a Bio::LocatableSeq object
252 Argument : positive integer for the sequence position
258 $self->throw_not_implemented();
261 =head1 Create new alignments
The results of these methods are horizontal or vertical subsets of the
Usage : $aln2 = $aln->select(1, 3) # first three sequences
272 Creates a new alignment from a continuous subset of
273 sequences. Numbering starts from 1. Sequence positions
274 larger than num_sequences() will throw an error.
276 Returns : a Bio::SimpleAlign object
277 Argument : positive integer for the first sequence
278 positive integer for the last sequence to include (optional)
284 $self->throw_not_implemented();
288 =head2 select_noncont
290 Title : select_noncont
291 Usage : $aln2 = $aln->select_noncont(1, 3) # first and 3rd sequences
294 Creates a new alignment from a subset of
295 sequences. Numbering starts from 1. Sequence positions
296 larger than num_sequences() will throw an error.
298 Returns : a Bio::SimpleAlign object
299 Args : array of integers for the sequences
305 $self->throw_not_implemented();
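The difference between select (a continuous range) and select_noncont (an explicit list of positions) can be sketched on a plain array. The helper `pick_rows` is hypothetical and only mirrors the documented 1-based numbering and the error on out-of-range positions; it does not build alignment objects.

```perl
use strict;
use warnings;

# Pick rows by 1-based position; die on positions outside 1..N, mirroring
# the documented "will throw an error" behaviour.
sub pick_rows {
    my ($rows, @positions) = @_;
    my $n = scalar @$rows;
    for my $pos (@positions) {
        die "position $pos is outside 1..$n\n" if $pos < 1 || $pos > $n;
    }
    return [ map { $rows->[ $_ - 1 ] } @positions ];
}

my @rows = qw(seq1 seq2 seq3 seq4);

# Continuous subset, like select(1, 3): rows 1 through 3.
print join(',', @{ pick_rows(\@rows, 1 .. 3) }), "\n";    # seq1,seq2,seq3

# Non-continuous subset, like select_noncont(1, 3): rows 1 and 3 only.
print join(',', @{ pick_rows(\@rows, 1, 3) }), "\n";      # seq1,seq3
```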
311 Usage : $aln2 = $aln->slice(20, 30)
314 Creates a slice from the alignment inclusive of start and
315 end columns. Sequences with no residues in the slice are
316 excluded from the new alignment and a warning is printed.
317 Slice beyond the length of the sequence does not do
320 Returns : a Bio::SimpleAlign object
321 Argument : positive integer for start column
322 positive integer for end column
328 $self->throw_not_implemented();
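A column slice can be illustrated on gapped strings. This is a sketch under stated assumptions: `slice_columns` is a hypothetical helper, gaps are taken to be '.' or '-', and the requested columns are assumed to lie within the alignment length. It returns plain strings, not a Bio::SimpleAlign object.

```perl
use strict;
use warnings;

# Cut columns $start..$end (1-based, inclusive) out of each aligned row and
# drop rows that contain only gap characters in that window.
sub slice_columns {
    my ($rows, $start, $end) = @_;
    my %sliced;
    for my $id (keys %$rows) {
        my $piece = substr($rows->{$id}, $start - 1, $end - $start + 1);
        if ($piece =~ /[^.\-]/) {
            $sliced{$id} = $piece;
        } else {
            warn "$id has no residues in columns $start-$end; skipped\n";
        }
    }
    return \%sliced;
}

my %rows = (
    Seq1 => 'AC..DEF.GH',
    Seq2 => 'ACGG.RTY..',
    Seq3 => 'AC.DDEFGHI',
);

# Columns 3-4: Seq1 has only gaps there, so it is excluded with a warning.
my $slice = slice_columns(\%rows, 3, 4);
print "$_ $slice->{$_}\n" for sort keys %$slice;    # Seq2 GG, then Seq3 .D
```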
331 =head1 Change sequences within the MSE
333 These methods affect characters in all sequences without changing the
340 Usage : $ali->map_chars('\.','-')
343 Does a s/$arg1/$arg2/ on the sequences. Useful for gap
346 Notice that the from (arg1) is interpreted as a regex,
347 so be careful about quoting meta characters (eg
$ali->map_chars('.','-') won't do what you want)
Argument : 'from' regexp
358 $self->throw_not_implemented();
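The quoting pitfall mentioned for map_chars is easy to reproduce with a plain substitution. The helper `map_chars_sketch` is hypothetical and works on strings rather than alignment objects; applying the substitution globally (the /g flag) is an assumption of this sketch.

```perl
use strict;
use warnings;

# Apply a pattern substitution to every aligned string. The global /g flag
# is an assumption of this sketch.
sub map_chars_sketch {
    my ($from, $to, @seqs) = @_;
    s/$from/$to/g for @seqs;    # @seqs is a copy, so the caller's data is untouched
    return @seqs;
}

my @rows = ('AC..DEF', 'ACGG.RT');

# Escaped dot: only literal '.' characters become '-'.
print join(' ', map_chars_sketch('\.', '-', @rows)), "\n";    # AC--DEF ACGG-RT

# Bare dot: '.' matches any character, so every position becomes '-'.
print join(' ', map_chars_sketch('.', '-', @rows)), "\n";     # ------- -------
```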
364 Usage : $ali->uppercase()
365 Function : Sets all the sequences to uppercase
373 $self->throw_not_implemented();
379 Usage : $align->match_line()
Function : Generates a match line, much like a consensus string,
except that the line marks each match with a '*'.
382 Argument : (optional) Match line characters ('*' by default)
383 (optional) Strong match char (':' by default)
384 (optional) Weak match char ('.' by default)
390 $self->throw_not_implemented();
396 Usage : $ali->match()
399 Goes through all columns and changes residues that are
400 identical to residue in first sequence to match '.'
401 character. Sets match_char.
403 USE WITH CARE: Most MSE formats do not support match
404 characters in sequences, so this is mostly for output
405 only. NEXUS format (Bio::AlignIO::nexus) can handle
409 Argument : a match character, optional, defaults to '.'
415 $self->throw_not_implemented();
421 Usage : $ali->unmatch()
424 Undoes the effect of method match. Unsets match_char.
427 Argument : a match character, optional, defaults to '.'
433 $self->throw_not_implemented();
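The effect of match and unmatch can be sketched as a round trip on plain strings. Both helpers are hypothetical; they assume a flush alignment (all rows of equal length) and that the match character does not otherwise occur in the rows, for example as a gap symbol.

```perl
use strict;
use warnings;

# Replace residues identical to the first row with a match character.
sub match_sketch {
    my ($rows, $char) = @_;
    $char = '.' unless defined $char;
    my @ref = split //, $rows->[0];
    my @out = ($rows->[0]);
    for my $row (@$rows[1 .. $#$rows]) {
        my @res = split //, $row;
        push @out, join '', map { $res[$_] eq $ref[$_] ? $char : $res[$_] } 0 .. $#res;
    }
    return \@out;
}

# Restore the residues hidden behind the match character.
sub unmatch_sketch {
    my ($rows, $char) = @_;
    $char = '.' unless defined $char;
    my @ref = split //, $rows->[0];
    my @out = ($rows->[0]);
    for my $row (@$rows[1 .. $#$rows]) {
        my @res = split //, $row;
        push @out, join '', map { $res[$_] eq $char ? $ref[$_] : $res[$_] } 0 .. $#res;
    }
    return \@out;
}

my $matched = match_sketch(['ACDEF', 'ACDQF', 'TCDEF']);
print join(' ', @$matched), "\n";                     # ACDEF ...Q. T....
print join(' ', @{ unmatch_sketch($matched) }), "\n"; # ACDEF ACDQF TCDEF
```

The first row is never altered, since it is the reference that the other rows are compared against.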
439 Methods for setting and reading the MSE attributes.
441 Note that the methods defining character semantics depend on the user
442 to set them sensibly. They are needed only by certain input/output
443 methods. Unset them by setting to an empty string ('').
448 Usage : $myalign->id("Ig")
449 Function : Gets/sets the id field of the alignment
450 Returns : An id string
451 Argument : An id string (optional)
457 $self->throw_not_implemented();
463 Usage : $myalign->missing_char("?")
464 Function : Gets/sets the missing_char attribute of the alignment
465 It is generally recommended to set it to 'n' or 'N'
466 for nucleotides and to 'X' for protein.
Returns : A missing_char string
Argument : A missing_char string (optional)
474 $self->throw_not_implemented();
480 Usage : $myalign->match_char('.')
481 Function : Gets/sets the match_char attribute of the alignment
Returns : A match_char string
Argument : A match_char string (optional)
489 $self->throw_not_implemented();
495 Usage : $myalign->gap_char('-')
496 Function : Gets/sets the gap_char attribute of the alignment
Returns : A gap_char string, defaults to '-'
Argument : A gap_char string (optional)
504 $self->throw_not_implemented();
510 Usage : my @symbolchars = $aln->symbol_chars;
511 Function: Returns all the seen symbols (other than gaps)
512 Returns : array of characters that are the seen symbols
513 Argument: boolean to include the gap/missing/match characters
519 $self->throw_not_implemented();
522 =head1 Alignment descriptors
These read-only methods describe the MSE in various ways.
527 =head2 consensus_string
529 Title : consensus_string
530 Usage : $str = $ali->consensus_string($threshold_percent)
531 Function : Makes a strict consensus
532 Returns : consensus string
533 Argument : Optional threshold ranging from 0 to 100.
The consensus residue has to appear in at least threshold %
535 of the sequences at a given location, otherwise a '?'
536 character will be placed at that location.
sub consensus_string {
    my ($self) = @_;
    $self->throw_not_implemented();
}
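The threshold rule for consensus_string can be sketched on plain strings. This is an illustration under assumptions, not the behaviour of any particular implementing class: `consensus_sketch` is a hypothetical helper, gap characters are counted like any other symbol, ties are broken alphabetically, and the rows are assumed to be of equal length.

```perl
use strict;
use warnings;

# Per column, take the most frequent symbol; emit '?' when it appears in
# fewer than $threshold percent of the rows.
sub consensus_sketch {
    my ($rows, $threshold) = @_;
    $threshold = 0 unless defined $threshold;
    my $n = scalar @$rows;
    my $consensus = '';
    for my $col (0 .. length($rows->[0]) - 1) {
        my %count;
        $count{ substr($_, $col, 1) }++ for @$rows;
        # Most frequent symbol; ties broken alphabetically (sketch assumption).
        my ($best) = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
        $consensus .= (100 * $count{$best} / $n >= $threshold) ? $best : '?';
    }
    return $consensus;
}

my @rows = ('ACDEF', 'ACDQF', 'TCDEW', 'GCKEF');
print consensus_sketch(\@rows, 50), "\n";    # ACDEF
print consensus_sketch(\@rows, 60), "\n";    # ?CDEF (A covers only 50% of column 1)
```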
546 =head2 consensus_iupac
548 Title : consensus_iupac
549 Usage : $str = $ali->consensus_iupac()
552 Makes a consensus using IUPAC ambiguity codes from DNA
553 and RNA. The output is in upper case except when gaps in
554 a column force output to be in lower case.
556 Note that if your alignment sequences contain a lot of
IUPAC ambiguity codes you often have to manually set
558 alphabet. Bio::PrimarySeq::_guess_type thinks they
559 indicate a protein sequence.
561 Returns : consensus string
563 Throws : on protein sequences
sub consensus_iupac {
    my ($self) = @_;
    $self->throw_not_implemented();
}
576 Usage : if( $ali->is_flush() )
Function : Tells you whether the alignment
: is flush, i.e. all sequences are of the same length
590 $self->throw_not_implemented();
596 Usage : $len = $ali->length()
597 Function : Returns the maximum length of the alignment.
598 To be sure the alignment is a block, use is_flush
606 $self->throw_not_implemented();
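The relationship between length and is_flush can be sketched on plain strings: the length is the longest row, and the alignment is flush only when every row reaches it. Both helper names are hypothetical and operate on strings, not on alignment objects.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Longest row in the alignment (0 for an empty list).
sub max_length_sketch {
    my @rows = @_;
    return @rows ? max(map { length($_) } @rows) : 0;
}

# True (1) when every row has the same length, false (0) otherwise.
sub is_flush_sketch {
    my @rows = @_;
    my $len = max_length_sketch(@rows);
    return (grep { length($_) != $len } @rows) ? 0 : 1;
}

my @ragged = ('ACDEF', 'ACD');
my @block  = ('ACDEF', 'AC--F');

print max_length_sketch(@ragged), ' ', is_flush_sketch(@ragged), "\n";    # 5 0
print max_length_sketch(@block),  ' ', is_flush_sketch(@block),  "\n";    # 5 1
```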
609 =head2 maxname_length
611 Title : maxname_length
612 Usage : $ali->maxname_length()
615 Gets the maximum length of the displayname in the
616 alignment. Used in writing out various MSE formats.
625 $self->throw_not_implemented();
631 Usage : $no = $ali->num_residues
632 Function : number of residues in total in the alignment
635 Note : replaces no_residues
641 $self->throw_not_implemented();
646 Title : num_sequences
647 Usage : $depth = $ali->num_sequences
Function : number of sequences in the sequence alignment
651 Note : replaces no_sequences
657 $self->throw_not_implemented();
660 =head2 percentage_identity
662 Title : percentage_identity
663 Usage : $id = $align->percentage_identity
664 Function: The function calculates the percentage identity of the alignment
665 Returns : The percentage identity of the alignment (as defined by the
sub percentage_identity {
    my ($self) = @_;
    $self->throw_not_implemented();
}
676 =head2 overall_percentage_identity
678 Title : overall_percentage_identity
679 Usage : $id = $align->overall_percentage_identity
680 Function: The function calculates the percentage identity of
681 the conserved columns
682 Returns : The percentage identity of the conserved columns
sub overall_percentage_identity {
    my ($self) = @_;
    $self->throw_not_implemented();
}
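One plausible reading of "percentage identity of the conserved columns" is the share of columns in which every sequence carries the same symbol. The sketch below follows that reading as an assumption; it is not a confirmed definition from any implementing class. The helper `conserved_column_percentage` is hypothetical, treats gap characters like any other symbol, and expects rows of equal length.

```perl
use strict;
use warnings;

# Percentage of columns in which all rows show the same symbol.
sub conserved_column_percentage {
    my @rows = @_;
    my $len = length $rows[0];
    my $conserved = 0;
    for my $col (0 .. $len - 1) {
        my %seen;
        $seen{ substr($_, $col, 1) } = 1 for @rows;
        $conserved++ if keys(%seen) == 1;    # exactly one distinct symbol
    }
    return 100 * $conserved / $len;
}

# Columns 2, 3 and 5 are fully conserved: 3 of 5 columns.
print conserved_column_percentage('ACDEF', 'ACDQF', 'TCDEF', 'ACDEF'), "\n";    # 60
```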
693 =head2 average_percentage_identity
695 Title : average_percentage_identity
696 Usage : $id = $align->average_percentage_identity
697 Function: The function uses a fast method to calculate the average
698 percentage identity of the alignment
699 Returns : The average percentage identity of the alignment
sub average_percentage_identity {
    my ($self) = @_;
    $self->throw_not_implemented();
}
709 =head1 Alignment positions
711 Methods to map a sequence position into an alignment column and back.
712 column_from_residue_number() does the former. The latter is really a
property of the sequence object and can be done using
714 L<Bio::LocatableSeq::location_from_column>:
716 # select somehow a sequence from the alignment, e.g.
717 my $seq = $aln->get_seq_by_pos(1);
# $loc is undef or a Bio::LocationI object
719 my $loc = $seq->location_from_column(5);
722 =head2 column_from_residue_number
724 Title : column_from_residue_number
725 Usage : $col = $ali->column_from_residue_number( $seqname, $resnumber)
728 This function gives the position in the alignment
729 (i.e. column number) of the given residue number in the
730 sequence with the given name. For example, for the
733 Seq1/91-97 AC..DEF.GH
734 Seq2/24-30 ACGG.RTY..
735 Seq3/43-51 AC.DDEFGHI
737 column_from_residue_number( "Seq1", 94 ) returns 6.
738 column_from_residue_number( "Seq2", 25 ) returns 2.
739 column_from_residue_number( "Seq3", 50 ) returns 9.
741 An exception is thrown if the residue number would lie
742 outside the length of the alignment
(e.g. column_from_residue_number( "Seq2", 22 )).
745 Note: If the parent sequence is represented by more than one
746 alignment sequence and the residue number is present in
747 them, this method finds only the first one.
749 Returns : A column number for the position in the alignment of the
750 given residue in the given sequence (1 = first column)
751 Args : A sequence id/name (not a name/start-end)
752 A residue number in the whole sequence (not just that
753 segment of it in the alignment)
sub column_from_residue_number {
    my ($self) = @_;
    $self->throw_not_implemented();
}
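The mapping from a residue number to an alignment column can be traced with a short standalone sketch that reproduces the three documented results. The helper `column_for_residue` is hypothetical: it takes the gapped string and the start and end of the segment directly instead of looking a sequence up by name, and it assumes gaps are written as '.' or '-'.

```perl
use strict;
use warnings;

# Map a residue number in the parent sequence to a 1-based alignment column.
# $aligned is the gapped string; $start and $end delimit the segment
# (the "91-97" in "Seq1/91-97").
sub column_for_residue {
    my ($aligned, $start, $end, $resnumber) = @_;
    die "residue $resnumber is outside $start-$end\n"
        if $resnumber < $start || $resnumber > $end;
    my $current = $start - 1;    # number of the last residue seen so far
    my $column  = 0;
    for my $char (split //, $aligned) {
        $column++;
        next if $char eq '.' || $char eq '-';    # gaps consume a column only
        $current++;
        return $column if $current == $resnumber;
    }
    die "residue $resnumber not found in the aligned string\n";
}

print column_for_residue('AC..DEF.GH', 91, 97, 94), "\n";    # 6
print column_for_residue('ACGG.RTY..', 24, 30, 25), "\n";    # 2
print column_for_residue('AC.DDEFGHI', 43, 51, 50), "\n";    # 9
```

Requesting residue 22 of Seq2 (segment 24-30) dies in this sketch, matching the documented exception for residue numbers outside the aligned segment.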
762 =head1 Sequence names
764 Methods to manipulate the display name. The default name based on the
765 sequence id and subsequence positions can be overridden in various
771 Usage : $myalign->displayname("Ig", "IgA")
772 Function : Gets/sets the display name of a sequence in the alignment
774 Returns : A display name string
775 Argument : name of the sequence
776 displayname of the sequence (optional)
782 $self->throw_not_implemented();
785 =head2 set_displayname_count
787 Title : set_displayname_count
788 Usage : $ali->set_displayname_count
791 Sets the names to be name_# where # is the number of
792 times this name has been used.
sub set_displayname_count {
    my ($self) = @_;
    $self->throw_not_implemented();
}
804 =head2 set_displayname_flat
806 Title : set_displayname_flat
807 Usage : $ali->set_displayname_flat()
808 Function : Makes all the sequences be displayed as just their name,
sub set_displayname_flat {
    my ($self) = @_;
    $self->throw_not_implemented();
}
820 =head2 set_displayname_normal
822 Title : set_displayname_normal
823 Usage : $ali->set_displayname_normal()
824 Function : Makes all the sequences be displayed as name/start-end
sub set_displayname_normal {
    my ($self) = @_;
    $self->throw_not_implemented();
}
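The three display name styles described in this section (name/start-end, just the name, and name_#) can be compared side by side. The records and the three helper functions are hypothetical stand-ins that return lists of strings; they do not set anything on an alignment object.

```perl
use strict;
use warnings;

# Hypothetical records standing in for aligned sequences.
my @seqs = (
    { id => 'Ig',  start => 1,  end => 50  },
    { id => 'Ig',  start => 60, end => 110 },
    { id => 'Fn3', start => 5,  end => 90  },
);

# "normal" style: name/start-end
sub names_normal { return map { "$_->{id}/$_->{start}-$_->{end}" } @_ }

# "flat" style: just the name
sub names_flat { return map { $_->{id} } @_ }

# "count" style: name_#, where # counts how often the name has been used
sub names_count {
    my %seen;
    return map { $_->{id} . '_' . ++$seen{ $_->{id} } } @_;
}

print join(' ', names_normal(@seqs)), "\n";    # Ig/1-50 Ig/60-110 Fn3/5-90
print join(' ', names_flat(@seqs)),   "\n";    # Ig Ig Fn3
print join(' ', names_count(@seqs)),  "\n";    # Ig_1 Ig_2 Fn3_1
```

The flat style produces duplicate names when one parent sequence contributes several segments, which is the situation the counted style disambiguates.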
835 =head1 Deprecated methods
840 Usage : $no = $ali->no_residues
841 Function : number of residues in total in the alignment
844 Note : deprecated in favor of num_residues()
849 # immediate deprecation
856 Usage : $depth = $ali->no_sequences
Function : number of sequences in the sequence alignment
860 Note : deprecated in favor of num_sequences()
865 # immediate deprecation