More tests for PrimarySeq
[bioperl-live.git] / Bio / SeqUtils.pm
blob018e6aacd9f0fa9f5cc068a4e1c5feb5ba9e9482
1 # BioPerl module for Bio::SeqUtils
3 # Please direct questions and support issues to <bioperl-l@bioperl.org>
5 # Cared for by Heikki Lehvaslaiho <heikki-at-bioperl-dot-org>
7 # Copyright Heikki Lehvaslaiho
9 # You may distribute this module under the same terms as perl itself
11 # POD documentation - main docs before the code
13 =head1 NAME
15 Bio::SeqUtils - Additional methods for PrimarySeq objects
17 =head1 SYNOPSIS
19 use Bio::SeqUtils;
20 # get a Bio::PrimarySeqI compliant object, $seq, somehow
21 $util = Bio::SeqUtils->new();
22 $polypeptide_3char = $util->seq3($seq);
23 # or
24 $polypeptide_3char = Bio::SeqUtils->seq3($seq);
26 # set the sequence string (stored in one char code in the object)
27 Bio::SeqUtils->seq3($seq, $polypeptide_3char);
29 # translate a sequence in all six frames
30 @seqs = Bio::SeqUtils->translate_6frames($seq);
32 # inplace editing of the sequence
33 Bio::SeqUtils->mutate($seq,
34 Bio::LiveSeq::Mutation->new(-seq => 'c',
35 -pos => 3
36 ));
37 # mutate a sequence to desired similarity%
38 $newseq = Bio::SeqUtils-> evolve
39 ($seq, $similarity, $transition_transversion_rate);
41 # concatenate two or more sequences with annotations and features,
42 # the first sequence will be modified
43 Bio::SeqUtils->cat(@seqs);
44 my $catseq=$seqs[0];
46 # truncate a sequence, retaining features and adjusting their
47 # coordinates if necessary
48 my $truncseq = Bio::SeqUtils->trunc_with_features($seq, 100, 200);
50 # reverse complement a sequence and its features
51 my $revcomseq = Bio::SeqUtils->revcom_with_features($seq);
53 # simulate cloning of a fragment into a vector. Cut the vector at
54 # positions 1000 and 1100 (deleting postions 1001 to 1099) and
55 # "ligate" a fragment into the sites. The fragment is
56 # reverse-complemented in this example (option "flip").
57 # All features of the vector and fragment are preserved and
58 # features that are affected by the deletion/insertion are
59 # modified accordingly.
60 # $vector and $fragment must be Bio::SeqI compliant objects
61 my $new_molecule = Bio::Sequtils->ligate(
62 -vector => $vector,
63 -fragment => $fragment,
64 -left => 1000,
65 -right => 1100,
66 -flip => 1
69 # delete a segment of a seqence (from pos 1000 to 1100, inclusive),
70 # again preserving features and annotations
71 my $new_molecule = Bio::SeqUtils->cut( $seq, 1000, 1100 );
73 # insert a fragment into a recipient between positions 1000 and
74 # 1001. $recipient is a Bio::SeqI compliant object
75 my $new_molecule = Bio::SeqUtils::PbrTools->insert(
76 $recipient_seq,
77 $fragment_seq,
78 1000
81 =head1 DESCRIPTION
83 This class is a holder of methods that work on Bio::PrimarySeqI-
84 compliant sequence objects, e.g. Bio::PrimarySeq and
85 Bio::Seq. These methods are not part of the Bio::PrimarySeqI
86 interface and should in general not be essential to the primary function
87 of sequence objects. If you are thinking of adding essential
88 functions, it might be better to create your own sequence class.
89 See L<Bio::PrimarySeqI>, L<Bio::PrimarySeq>, and L<Bio::Seq> for more.
91 The methods take as their first argument a sequence object. It is
92 possible to use methods without first creating a SeqUtils object,
93 i.e. use it as an anonymous hash.
95 The first two methods, seq3() and seq3in(), give out or read in protein
96 sequences coded in three letter IUPAC amino acid codes.
98 The next two methods, translate_3frames() and translate_6frames(), wrap
99 around the standard translate method to give back an array of three
100 forward or all six frame translations.
102 The mutate() method mutates the sequence string with a mutation
103 description object.
105 The cat() method concatenates two or more sequences. The first sequence
106 is modified by addition of the remaining sequences. All annotations and
107 sequence features will be transferred.
109 The revcom_with_features() and trunc_with_features() methods are similar
110 to the revcom() and trunc() methods from Bio::Seq, but also adjust any
111 features associated with the sequence as appropriate.
113 There are also methods that simulate molecular cloning with rich
114 sequence objects.
115 The delete() method cuts a segment out of a sequence and re-joins the
116 left and right fragments (like splicing or digesting and re-ligating a
117 molecule). Positions (and types) of sequence features are adjusted
118 accordingly:
119 Features that span the deleted segment are converted to split featuress
120 to indicate the disruption. (Sub)Features that extend into the deleted
121 segment are truncated.
122 A new molecule is created and returned.
124 The insert() method inserts a fragment (which can be a rich Bio::Seq
125 object) into another sequence object adding all annotations and
126 features to the final product.
127 Features that span the insertion site are converted to split features
128 to indicate the disruption.
129 A new feature is added to indicate the inserted fragment itself.
130 A new molecule is created and returned.
132 The ligate() method simulates digesting a recipient (vector) and
133 ligating a fragment into it, which can also be flipped if needed. It
134 is simply a combination of a deletion and an insertion step and
135 returns a new molecule. The rules for modifying feature locations
136 outlined above are also used here, e.g. features that span the cut
137 sites are converted to split features with truncated sub-locations.
140 =head1 FEEDBACK
142 =head2 Mailing Lists
144 User feedback is an integral part of the evolution of this and other
145 Bioperl modules. Send your comments and suggestions preferably to one
146 of the Bioperl mailing lists. Your participation is much appreciated.
148 bioperl-l@bioperl.org - General discussion
149 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
151 =head2 Support
153 Please direct usage questions or support issues to the mailing list:
155 I<bioperl-l@bioperl.org>
157 rather than to the module maintainer directly. Many experienced and
158 reponsive experts will be able look at the problem and quickly
159 address it. Please include a thorough description of the problem
160 with code and data examples if at all possible.
162 =head2 Reporting Bugs
164 Report bugs to the Bioperl bug tracking system to help us keep track
165 the bugs and their resolution. Bug reports can be submitted via the
166 web:
168 https://redmine.open-bio.org/projects/bioperl/
170 =head1 AUTHOR - Heikki Lehvaslaiho
172 Email: heikki-at-bioperl-dot-org
174 =head1 CONTRIBUTORS
176 Roy R. Chaudhuri - roy.chaudhuri at gmail.com
177 Frank Schwach - frank.schwach@sanger.ac.uk
179 =head1 APPENDIX
181 The rest of the documentation details each of the object
182 methods. Internal methods are usually preceded with a _
184 =cut
187 # Let the code begin...
190 package Bio::SeqUtils;
191 use vars qw(%ONECODE %THREECODE);
192 use strict;
193 use Carp;
194 use Scalar::Util qw(blessed);
195 use base qw(Bio::Root::Root);
196 # new inherited from RootI
198 BEGIN {
199 # Note : Ambiguity code 'J' = I/L (used for ambiguities in mass-spec data)
200 %ONECODE =
201 ('Ala' => 'A', 'Asx' => 'B', 'Cys' => 'C', 'Asp' => 'D',
202 'Glu' => 'E', 'Phe' => 'F', 'Gly' => 'G', 'His' => 'H',
203 'Ile' => 'I', 'Lys' => 'K', 'Leu' => 'L', 'Met' => 'M',
204 'Asn' => 'N', 'Pro' => 'P', 'Gln' => 'Q', 'Arg' => 'R',
205 'Ser' => 'S', 'Thr' => 'T', 'Val' => 'V', 'Trp' => 'W',
206 'Xaa' => 'X', 'Tyr' => 'Y', 'Glx' => 'Z', 'Ter' => '*',
207 'Sec' => 'U', 'Pyl' => 'O', 'Xle' => 'J'
210 %THREECODE =
211 ('A' => 'Ala', 'B' => 'Asx', 'C' => 'Cys', 'D' => 'Asp',
212 'E' => 'Glu', 'F' => 'Phe', 'G' => 'Gly', 'H' => 'His',
213 'I' => 'Ile', 'K' => 'Lys', 'L' => 'Leu', 'M' => 'Met',
214 'N' => 'Asn', 'P' => 'Pro', 'Q' => 'Gln', 'R' => 'Arg',
215 'S' => 'Ser', 'T' => 'Thr', 'V' => 'Val', 'W' => 'Trp',
216 'Y' => 'Tyr', 'Z' => 'Glx', 'X' => 'Xaa', '*' => 'Ter',
217 'U' => 'Sec', 'O' => 'Pyl', 'J' => 'Xle'
221 =head2 seq3
223 Title : seq3
224 Usage : $string = Bio::SeqUtils->seq3($seq)
225 Function: Read only method that returns the amino acid sequence as a
226 string of three letter codes. alphabet has to be
227 'protein'. Output follows the IUPAC standard plus 'Ter' for
228 terminator. Any unknown character, including the default
229 unknown character 'X', is changed into 'Xaa'. A noncoded
230 aminoacid selenocystein is recognized (Sec, U).
232 Returns : A scalar
233 Args : character used for stop in the protein sequence optional,
234 defaults to '*' string used to separate the output amino
235 acid codes, optional, defaults to ''
237 =cut
239 sub seq3 {
240 my ($self, $seq, $stop, $sep ) = @_;
242 $seq->isa('Bio::PrimarySeqI') ||
243 $self->throw('Not a Bio::PrimarySeqI object but [$self]');
244 $seq->alphabet eq 'protein' ||
245 $self->throw('Not a protein sequence');
247 if (defined $stop) {
248 length $stop != 1 and $self->throw('One character stop needed, not [$stop]');
249 $THREECODE{$stop} = "Ter";
251 $sep ||= '';
253 my $aa3s;
254 foreach my $aa (split //, uc $seq->seq) {
255 $THREECODE{$aa} and $aa3s .= $THREECODE{$aa}. $sep, next;
256 $aa3s .= 'Xaa'. $sep;
258 $sep and substr($aa3s, -(length $sep), length $sep) = '' ;
259 return $aa3s;
262 =head2 seq3in
264 Title : seq3in
265 Usage : $seq = Bio::SeqUtils->seq3in($seq, 'MetGlyTer')
266 Function: Method for changing of the sequence of a
267 Bio::PrimarySeqI sequence object. The three letter amino
268 acid input string is converted into one letter code. Any
269 unknown character triplet, including the default 'Xaa', is
270 converted into 'X'.
272 Returns : Bio::PrimarySeq object
273 Args : sequence string
274 optional character to be used for stop in the protein sequence,
275 defaults to '*'
276 optional character to be used for unknown in the protein sequence,
277 defaults to 'X'
279 =cut
281 sub seq3in {
282 my ($self, $seq, $string, $stop, $unknown) = @_;
284 $seq->isa('Bio::PrimarySeqI') ||
285 $self->throw("Not a Bio::PrimarySeqI object but [$self]");
286 $seq->alphabet eq 'protein' ||
287 $self->throw('Not a protein sequence');
289 if (defined $stop) {
290 length $stop != 1 and $self->throw("One character stop needed, not [$stop]");
291 $ONECODE{'Ter'} = $stop;
293 if (defined $unknown) {
294 length $unknown != 1 and $self->throw("One character stop needed, not [$unknown]");
295 $ONECODE{'Xaa'} = $unknown;
298 my ($aas, $aa3);
299 my $length = (length $string) - 2;
300 for (my $i = 0 ; $i < $length ; $i += 3) {
301 $aa3 = substr($string, $i, 3);
302 $aa3 = ucfirst(lc($aa3));
303 $ONECODE{$aa3} and $aas .= $ONECODE{$aa3}, next;
304 $aas .= $ONECODE{'Xaa'};
306 $seq->seq($aas);
307 return $seq;
310 =head2 translate_3frames
312 Title : translate_3frames
313 Usage : @prots = Bio::SeqUtils->translate_3frames($seq)
314 Function: Translate a nucleotide sequence in three forward frames.
315 The IDs of the sequences are appended with '-0F', '-1F', '-2F'.
316 Returns : An array of seq objects
317 Args : sequence object
318 same arguments as to Bio::PrimarySeqI::translate
320 =cut
322 sub translate_3frames {
323 my ($self, $seq, @args ) = @_;
325 $self->throw('Object [$seq] '. 'of class ['. ref($seq). '] can not be translated.')
326 unless $seq->can('translate');
328 my ($stop, $unknown, $frame, $tableid, $fullCDS, $throw) = @args;
329 my @seqs;
330 my $f = 0;
331 while ($f != 3) {
332 my $translation = $seq->translate($stop, $unknown,$f,$tableid, $fullCDS, $throw );
333 $translation->id($seq->id. "-". $f. "F");
334 push @seqs, $translation;
335 $f++;
338 return @seqs;
341 =head2 translate_6frames
343 Title : translate_6frames
344 Usage : @prots = Bio::SeqUtils->translate_6frames($seq)
345 Function: translate a nucleotide sequence in all six frames
346 The IDs of the sequences are appended with '-0F', '-1F', '-2F',
347 '-0R', '-1R', '-2R'.
348 Returns : An array of seq objects
349 Args : sequence object
350 same arguments as to Bio::PrimarySeqI::translate
352 =cut
354 sub translate_6frames {
355 my ($self, $seq, @args ) = @_;
357 my @seqs = $self->translate_3frames($seq, @args);
358 my @seqs2 = $self->translate_3frames($seq->revcom, @args);
359 foreach my $seq2 (@seqs2) {
360 my ($tmp) = $seq2->id;
361 $tmp =~ s/F$/R/g;
362 $seq2->id($tmp);
364 return @seqs, @seqs2;
368 =head2 valid_aa
370 Title : valid_aa
371 Usage : my @aa = $table->valid_aa
372 Function: Retrieves a list of the valid amino acid codes.
373 The list is ordered so that first 21 codes are for unique
374 amino acids. The rest are ['B', 'Z', 'X', '*'].
375 Returns : array of all the valid amino acid codes
376 Args : [optional] $code => [0 -> return list of 1 letter aa codes,
377 1 -> return list of 3 letter aa codes,
378 2 -> return associative array of both ]
380 =cut
382 sub valid_aa{
383 my ($self,$code) = @_;
385 if( ! $code ) {
386 my @codes;
387 foreach my $c ( sort values %ONECODE ) {
388 push @codes, $c unless ( $c =~ /[BZX\*]/ );
390 push @codes, qw(B Z X *); # so they are in correct order ?
391 return @codes;
393 elsif( $code == 1 ) {
394 my @codes;
395 foreach my $c ( sort keys %ONECODE ) {
396 push @codes, $c unless ( $c =~ /(Asx|Glx|Xaa|Ter)/ );
398 push @codes, ('Asx', 'Glx', 'Xaa', 'Ter' );
399 return @codes;
401 elsif( $code == 2 ) {
402 my %codes = %ONECODE;
403 foreach my $c ( keys %ONECODE ) {
404 my $aa = $ONECODE{$c};
405 $codes{$aa} = $c;
407 return %codes;
408 } else {
409 $self->warn("unrecognized code in ".ref($self)." method valid_aa()");
410 return ();
414 =head2 mutate
416 Title : mutate
417 Usage : Bio::SeqUtils->mutate($seq,$mutation1, $mutation2);
418 Function: Inplace editing of the sequence.
420 The second argument can be a Bio::LiveSeq::Mutation object
421 or an array of them. The mutations are applied sequentially
422 checking only that their position is within the current
423 sequence. Insertions are inserted before the given
424 position.
426 Returns : boolean
427 Args : sequence object
428 mutation, a Bio::LiveSeq::Mutation object, or an array of them
430 See L<Bio::LiveSeq::Mutation>.
432 =cut
434 sub mutate {
435 my ($self, $seq, @mutations ) = @_;
437 $self->throw('Object [$seq] '. 'of class ['. ref($seq).
438 '] should be a Bio::PrimarySeqI ')
439 unless $seq->isa('Bio::PrimarySeqI');
440 $self->throw('Object [$mutations[0]] '. 'of class ['. ref($mutations[0]).
441 '] should be a Bio::LiveSeq::Mutation')
442 unless $mutations[0]->isa('Bio::LiveSeq::Mutation');
444 foreach my $mutation (@mutations) {
445 $self->throw('Attempting to mutate sequence beyond its length')
446 unless $mutation->pos - 1 <= $seq->length;
448 my $string = $seq->seq;
449 substr $string, $mutation->pos - 1, $mutation->len, $mutation->seq;
450 $seq->seq($string);
456 =head2 cat
458 Title : cat
459 Usage : Bio::SeqUtils->cat(@seqs);
460 my $catseq=$seqs[0];
461 Function: Concatenates a list of Bio::Seq objects, adding them all on to the
462 end of the first sequence. Annotations and sequence features are
463 copied over from any additional objects, and the coordinates of any
464 copied features are adjusted appropriately.
465 Returns : a boolean
466 Args : array of sequence objects
468 Note that annotations have no sequence locations. If you concatenate
469 sequences with the same annotations they will all be added.
471 =cut
473 sub cat {
474 my ($self, $seq, @seqs) = @_;
475 $self->throw('Object [$seq] '. 'of class ['. ref($seq).
476 '] should be a Bio::PrimarySeqI ')
477 unless $seq->isa('Bio::PrimarySeqI');
480 for my $catseq (@seqs) {
481 $self->throw('Object [$catseq] '. 'of class ['. ref($catseq).
482 '] should be a Bio::PrimarySeqI ')
483 unless $catseq->isa('Bio::PrimarySeqI');
485 $self->throw('Trying to concatenate sequences with different alphabets: '.
486 $seq->display_id. '('. $seq->alphabet. ') and '. $catseq->display_id.
487 '('. $catseq->alphabet. ')')
488 unless $catseq->alphabet eq $seq->alphabet;
491 my $length=$seq->length;
492 $seq->seq($seq->seq.$catseq->seq);
494 # move annotations
495 if ($seq->isa("Bio::AnnotatableI") and $catseq->isa("Bio::AnnotatableI")) {
496 foreach my $key ( $catseq->annotation->get_all_annotation_keys() ) {
498 foreach my $value ( $catseq->annotation->get_Annotations($key) ) {
499 $seq->annotation->add_Annotation($key, $value);
504 # move SeqFeatures
505 if ( $seq->isa('Bio::SeqI') and $catseq->isa('Bio::SeqI')) {
506 for my $feat ($catseq->get_SeqFeatures) {
507 $seq->add_SeqFeature($self->_coord_adjust($feat, $length));
516 =head2 trunc_with_features
518 Title : trunc_with_features
519 Usage : $trunc=Bio::SeqUtils->trunc_with_features($seq, $start, $end);
520 Function: Like Bio::Seq::trunc, but keeps features (adjusting coordinates
521 where necessary. Features that partially overlap the region have
522 their location changed to a Bio::Location::Fuzzy.
523 Returns : A new sequence object
524 Args : A sequence object, start coordinate, end coordinate (inclusive)
527 =cut
529 sub trunc_with_features{
530 use Bio::Range;
531 my ($self,$seq,$start,$end) = @_;
532 $self->throw('Object [$seq] '. 'of class ['. ref($seq).
533 '] should be a Bio::SeqI ')
534 unless $seq->isa('Bio::SeqI');
535 my $trunc=$seq->trunc($start, $end);
536 my $truncrange=Bio::Range->new(-start=>$start, -end=>$end, -strand=>0);
537 # move annotations
538 foreach my $key ( $seq->annotation->get_all_annotation_keys() ) {
539 foreach my $value ( $seq->annotation->get_Annotations($key) ) {
540 $trunc->annotation->add_Annotation($key, $value);
544 # move features
545 foreach ( grep
546 { $_=$self->_coord_adjust($_, 1-$start, $end+1-$start) if $_->overlaps($truncrange) }
547 $seq->get_SeqFeatures ) {
548 $trunc->add_SeqFeature($_);
550 return $trunc;
554 =head2 delete
556 Title : delete
557 Function: cuts a segment out of a sequence and re-joins the left and right fragments
558 (like splicing or digesting and re-ligating a molecule).
559 Positions (and types) of sequence features are adjusted accordingly:
560 Features that span the cut site are converted to split featuress to
561 indicate the disruption.
562 Features that extend into the cut-out fragment are truncated.
563 A new molecule is created and returned.
564 Usage : my $cutseq = Bio::SeqUtils::PbrTools->cut( $seq, 1000, 1100 );
565 Args : a Bio::PrimarySeqI compliant object to cut,
566 first nt of the segment to be deleted
567 last nt of the segment to be deleted
568 optional:
569 hash-ref of options:
570 clone_obj: if true, clone the input sequence object rather
571 than calling "new" on the object's class
573 Returns : a new Bio::Seq object
575 =cut
577 sub delete{
578 my $self = shift;
579 my ( $seq, $left, $right, $opts_ref) = @_ ;
580 $self->throw( 'was expecting 3-4 paramters but got '.@_) unless @_ == 3 || @_ == 4;
582 $self->throw( 'Object of class ['
583 . ref($seq)
584 . '] should be a Bio::PrimarySeqI ' )
585 unless blessed ($seq) && $seq->isa('Bio::PrimarySeqI');
587 $self->throw("Left coordinate ($left) must be >= 1") if $left < 1;
588 if ($right > $seq->length ){
589 $self->throw("Right coordinate ($right) must be less than "
590 . 'sequence length ('. $seq->length .')');
593 # piece together the sequence string of the remaining fragments
594 my $left_seq = $seq->subseq( 1, $left - 1) ;
595 my $right_seq = $seq->subseq( $right + 1, $seq->length);
596 if (!$left_seq || !$right_seq ){
597 $self->throw('could not assemble sequences. At least one of the fragments is empty');
599 my $seq_str = $left_seq . $right_seq;
601 # create the new seq object with the same class as the recipient
602 # or (if requested), make a clone of the existing object. In the
603 # latter case we need to remove sequence features from the cloned
604 # object instead of copying them
605 my $product;
606 if ($opts_ref->{clone_obj}){
607 $product = $self->_new_seq_via_clone( $seq, $seq_str );
608 } else {
609 $product = $self->_new_seq_from_old( $seq, { seq => $seq_str } );
612 # move sequence features
613 if ( $product->isa('Bio::SeqI') && $seq->isa('Bio::SeqI') ) {
614 for my $feat ( $seq->get_SeqFeatures ) {
615 my $adjfeat = $self->_coord_adjust_deletion( $feat, $left, $right );
616 $product->add_SeqFeature( $adjfeat ) if $adjfeat;
621 # add a feature to annotatde the deletion
622 my $deletion_feature = Bio::SeqFeature::Generic->new(
623 -primary_tag => 'misc_feature',
624 -tag => {
625 note => 'deletion of ' . ( $right - $left + 1 ) .'bp'
627 -location => Bio::Location::Simple->new(
628 -start => $left - 1,
629 -end => $left,
630 -location_type => 'IN-BETWEEN'
633 $product->add_SeqFeature( $deletion_feature );
634 return $product;
637 =head2 insert
639 Title : insert
640 Function: inserts a fragment (a Bio::Seq object) into a nother sequence object
641 adding all annotations and features to the final product.
642 Features that span the insertion site are converted to split
643 features to indicate the disruption.
644 A new feature is added to indicate the inserted fragment itself.
645 A new molecule is created and returned.
646 Usage : # insert a fragment after pos 1000
647 my $insert_seq = Bio::SeqUtils::PbrTools->insert(
648 $recipient_seq,
649 $fragment_seq,
650 1000
652 Args : recipient sequence (a Bio::PrimarySeqI compliant object),
653 a fragmetn to insert (Bio::PrimarySeqI compliant object),
654 insertion position (fragment is inserted to the right of this pos)
655 pos=0 will prepend the fragment to the recipient
656 optional:
657 hash-ref of options:
658 clone_obj: if true, clone the input sequence object rather
659 than calling "new" on the object's class
660 Returns : a new Bio::Seq object
662 =cut
664 sub insert{
665 my $self = shift;
666 my ( $recipient, $fragment, $insert_pos, $opts_ref) = @_ ;
667 $self->throw( 'was expecting 3-4 paramters but got '.@_) unless @_ == 3 || @_ == 4;
669 $self->throw( 'Recipient object of class ['
670 . ref($recipient)
671 . '] should be a Bio::PrimarySeqI ' )
672 unless blessed ($recipient) && $recipient->isa('Bio::PrimarySeqI');
674 $self->throw( 'Fragment object of class ['
675 . ref($fragment)
676 . '] should be a Bio::PrimarySeqI ' )
677 unless blessed ($fragment) && $fragment->isa('Bio::PrimarySeqI');
679 $self->throw( 'Can\'t concatenate sequences with different alphabets: '
680 . 'recipient is '.$recipient->alphabet
681 . ' and fragment is '. $fragment->alphabet )
682 unless $recipient->alphabet eq $fragment->alphabet;
684 if ($insert_pos < 0 or $insert_pos > $recipient->length ){
685 $self->throw("insertion position ($insert_pos) must be between 0 and "
686 . 'recipient sequence length ('. $recipient->length .')');
689 if ($fragment->can('is_circular') && $fragment->is_circular){
690 $self->throw('Can\'t insert circular fragments');
693 if ( !$recipient->seq ){
694 $self->throw('Recipient has no sequence, can not insert into this object');
697 # construct raw sequence of the new molecule
698 my $left_seq = $insert_pos > 0
699 ? $recipient->subseq( 1, $insert_pos )
700 : '';
701 my $mid_seq = $fragment->seq;
702 my $right_seq = $insert_pos < $recipient->length
703 ? $recipient->subseq( $insert_pos + 1, $recipient->length)
704 : '';
705 my $seq_str = $left_seq . $mid_seq . $right_seq;
707 # create the new seq object with the same class as the recipient
708 # or (if requested), make a clone of the existing object. In the
709 # latter case we need to remove sequence features from the cloned
710 # object instead of copying them
711 my $product;
712 if ($opts_ref->{clone_obj}){
713 $product = $self->_new_seq_via_clone( $recipient, $seq_str );
714 } else {
715 my @desc;
716 push @desc, 'Inserted fragment: '.$fragment->desc if defined $fragment->desc;
717 push @desc, 'Recipient: '.$recipient->desc if defined $recipient->desc;
718 $product = $self->_new_seq_from_old(
719 $recipient,
721 seq => $seq_str,
722 display_id => $recipient->display_id,
723 accession_number => $recipient->accession_number || '',
724 alphabet => $recipient->alphabet,
725 desc => join('; ', @desc),
726 verbose => $recipient->verbose || $fragment->verbose,
727 is_circular => $recipient->is_circular || 0,
731 } # if clone_obj
733 # move annotations from fragment to product
734 if ( $product->isa("Bio::AnnotatableI") && $fragment->isa("Bio::AnnotatableI") ) {
735 foreach my $key ( $fragment->annotation->get_all_annotation_keys ) {
736 foreach my $value ( $fragment->annotation->get_Annotations($key) ) {
737 $product->annotation->add_Annotation( $key, $value );
742 # move sequence features to product with adjusted coordinates
743 if ( $product->isa('Bio::SeqI') ) {
744 # for the fragment, just shift the features to new position
745 if ( $fragment->isa('Bio::SeqI') ) {
746 for my $feat ( $fragment->get_SeqFeatures ) {
747 my $adjfeat = $self->_coord_adjust( $feat, $insert_pos ) ;
748 $product->add_SeqFeature( $adjfeat ) if $adjfeat ;
751 # for recipient, shift and modify features according to insertion.
752 if ( $recipient->isa('Bio::SeqI') ) {
753 for my $feat ( $recipient->get_SeqFeatures ) {
754 my $adjfeat = $self->_coord_adjust_insertion( $feat, $insert_pos, $fragment->length );
755 $product->add_SeqFeature($adjfeat) if $adjfeat;
760 # add a feature to annotate the insertion
761 my $insertion_feature = Bio::SeqFeature::Generic->new(
762 -start => $insert_pos + 1,
763 -end => $insert_pos + $fragment->length,
764 -primary_tag => 'misc_feature',
765 -tag => {note => 'inserted fragment'},
767 $product->add_SeqFeature( $insertion_feature );
769 return $product;
772 =head2 ligate
774 title : ligate
775 function: pastes a fragment (which can also have features) into a recipient
776 sequence between two "cut" sites, preserving features and adjusting
777 their locations.
778 This is a shortcut for deleting a segment from a sequence object followed
779 by an insertion of a fragmnet and is supposed to be used to simulate
780 in-vitro cloning where a recipient (a vector) is digested and a fragment
781 is then ligated into the recipient molecule. The fragment can be flipped
782 (reverse-complemented with all its features).
783 A new sequence object is returned to represent the product of the reaction.
784 Features and annotations are transferred from the insert to the product
785 and features on the recipient are adjusted according to the methods
786 L</"delete"> amd L</"insert">:
787 Features spanning the insertion site will be split up into two sub-locations.
788 (Sub-)features in the deleted region are themselves deleted.
789 (Sub-)features that extend into the deleted region are truncated.
790 The class of the product object depends on the class of the recipient (vector)
791 sequence object. if it is not possible to instantiate a new
792 object of that class, a Bio::Primaryseq object is created instead.
793 usage : # insert the flipped fragment between positions 1000 and 1100 of the
794 # vector, i.e. everything between these two positions is deleted and
795 # replaced by the fragment
796 my $new_molecule = Bio::Sequtils::Pbrtools->ligate(
797 -recipient => $vector,
798 -fragment => $fragment,
799 -left => 1000,
800 -right => 1100,
801 -flip => 1,
802 -clone_obj => 1
804 args : recipient: the recipient/vector molecule
805 fragment: molecule that is to be ligated into the vector
806 left: left cut site (fragment will be inserted to the right of
807 this position)
808 optional:
809 right: right cut site (fragment will be inseterted to the
810 left of this position). defaults to left+1
811 flip: boolean, if true, the fragment is reverse-complemented
812 (including features) before inserting
813 clone_obj: if true, clone the recipient object to create the product
814 instead of calling "new" on its class
815 returns : a new Bio::Seq object of the ligated fragments
817 =cut
819 sub ligate {
820 my $self = shift;
821 my ($recipient, $fragment, $left, $right, $flip, $clone_obj ) = $self->_rearrange(
822 [qw(RECIPIENT FRAGMENT LEFT RIGHT FLIP CLONE_OBJ )],@_);
823 $self->throw("missing required parameter 'recipient'") unless $recipient;
824 $self->throw("missing required parameter 'fragment'") unless $fragment;
825 $self->throw("missing required parameter 'left'") unless defined $left;
827 $right ||= $left + 1;
829 $self->throw("Fragment must be a Bio::PrimarySeqI compliant object but it is a ".
830 ref($fragment)) unless blessed($fragment) && $fragment->isa('Bio::PrimarySeqI');
832 $fragment = $self->revcom_with_features($fragment) if $flip;
834 my $opts_ref = {};
835 $opts_ref->{clone_obj} = 1 if $clone_obj;
837 # clone in two steps: first delete between the insertion sites,
838 # then insert the fragment. Step 1 is skipped if insert positions
839 # are adjacent (no deletion)
840 my ($product1, $product2) ;
841 eval {
842 if ($right == $left + 1){
843 $product1 = $recipient ;
844 } else {
845 $product1 = $self->delete($recipient, $left + 1, $right - 1, $opts_ref ) ;
848 $self->throw("Failed in step 1 (cut recipient): ".$@) if $@;
849 eval { $product2 = $self->insert( $product1, $fragment, $left, $opts_ref ) };
850 $self->throw("Failed in step 2 (insert fragment): ".$@) if $@;
852 return $product2;
857 =head2 _coord_adjust_deletion
859 title : _coord_adjust_deletion
860 function: recursively adjusts coordinates of seqfeatures on a molecule
861 where a segment has been deleted.
862 (sub)features that span the deletion site become split features.
863 (sub)features that extend into the deletion site are truncated.
864 A note is added to the feature to inform about the size and
865 position of the deletion.
866 usage : my $adjusted_feature = Bio::Sequtils::_coord_adjust_deletion(
867 $feature,
868 $start,
869 $end
871 args : a Bio::SeqFeatureI compliant object,
872 start (inclusive) position of the deletion site,
873 end (inclusive) position of the deletion site
874 returns : a Bio::SeqFeatureI compliant object
876 =cut
878 sub _coord_adjust_deletion {
879 my ($self, $feat, $left, $right)=@_;
881 $self->throw('object [$feat] '. 'of class ['. ref($feat)
882 . '] should be a Bio::SeqFeatureI ')
883 unless $feat->isa('Bio::SeqFeatureI');
884 $self->throw('missing coordinates: need a left and a right position')
885 unless defined $left && defined $right;
887 if ($left > $right){
888 if ($feat->can('is_circular') && $feat->is_circular){
889 # todo handle circular molecules
890 $self->throw('can not yet handle deletions in circular molecules if deletion spans origin');
891 } else {
892 $self->throw("left coordinate ($left) must be less than right ($right)"
893 . " but it was greater");
897 my $deletion = Bio::Location::Simple->new(
898 -start => $left,
899 -end => $right,
901 my $del_length = $right - $left + 1;
903 my @adjsubfeat;
904 for my $subfeat ($feat->get_SeqFeatures) {
905 my $adjsubfeat = $self->_coord_adjust_deletion($subfeat, $left, $right);
906 push @adjsubfeat, $adjsubfeat if $adjsubfeat;
909 my @loc;
910 my $note;
911 for ($feat->location->each_Location) {
912 next if $deletion->contains( $_ ); # this location will be deleted;
913 my $strand = $_->strand;
914 my $type = $_->location_type;
915 my $start = $_->start;
916 my $start_type = $_->can('start_pos_type') ? $_->start_pos_type : undef;
917 my $end = $_->end;
918 my $end_type = $_->can('end_pos_type') ? $_->end_pos_type : undef;
919 my @newcoords=();
920 if ($_->start < $deletion->start && $_->end > $deletion->end){ # split the feature
921 @newcoords = (
922 [ $start, ($deletion->start - 1), $start_type, $end_type ],
923 [ ($deletion->start ), $end - $del_length, $start_type, $end_type ]
925 $note = $del_length . 'bp internal deletion between pos '
926 . ($deletion->start - 1) . ' and ' . $deletion->start;
927 } elsif ($_->contains( $deletion->start )){ # truncate feature end
928 @newcoords = ([$start, ($deletion->start - 1), $start_type, $end_type ]);
929 $note = ($end - $deletion->start + 1). 'bp deleted from feature ';
930 if ($feat->strand){
931 $note .= $feat->strand == 1 ? "3' " : "5' ";
933 $note .= 'end'
934 } elsif ($_->contains( $deletion->end )){ # truncate feature start and shift end
935 @newcoords = ([($deletion->start), $end - $del_length, $start_type, $end_type]);
936 $note = ( $deletion->end - $start + 1) . 'bp deleted from feature ';
937 if ($feat->strand){
938 $note .= $feat->strand == 1 ? "5' end" : "3' end";
939 } else {
940 $note .= 'start';
942 } elsif ($start >= $deletion->end){ # just shift entire location
943 @newcoords = ([$start - $del_length, $end - $del_length, $start_type, $end_type ]);
944 } else { # not affected by deletion
945 @newcoords = ( [$start, $end, $start_type, $end_type ]);
948 # if we have no coordinates, we return nothing
949 # the feature is deleted
950 return unless @newcoords;
952 my @subloc = $self->_location_objects_from_coordinate_list(
953 \@newcoords,
954 $strand,
955 $type
958 # put together final location which could be a split now
959 push @loc, $self->_single_loc_object_from_collection( @subloc );
960 } # each location
962 # create new feature based on original one and move annotation across
963 my $newfeat=Bio::SeqFeature::Generic->new(-primary=>$feat->primary_tag);
964 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
965 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
966 $newfeat->annotation->add_Annotation($key, $value);
969 foreach my $key ( $feat->get_all_tags() ) {
970 $newfeat->add_tag_value($key, $feat->get_tag_values($key));
973 # If we have a note about the deleted bases, add it
974 if ($note){
975 $newfeat->add_tag_value('note',$note);
978 # set modified location(s) for the new feature and
979 # add its subfeatures if any
980 my $loc = $self->_single_loc_object_from_collection( @loc );
981 $loc ? $newfeat->location( $loc ) : return ;
982 $newfeat->add_SeqFeature($_) for @adjsubfeat;
984 return $newfeat;
988 =head2 _coord_adjust_insertion
990 title : _coord_adjust_insertion
991 function: recursively adjusts coordinates of seqfeatures on a molecule
992 where another sequence has been inserted.
993 (sub)features that span the insertion site become split features
994 and a note is added about the size and positin of the insertion.
995 Features with an IN-BETWEEN location at the insertion site
996 are lost (such features can only exist between adjacent bases)
997 usage : my $adjusted_feature = Bio::Sequtils::_coord_adjust_insertion(
998 $feature,
999 $insert_pos,
1000 $insert_length
1002 args : a Bio::SeqFeatureI compliant object,
1003 insertion position (insert to the right of this position)
1004 length of inserted fragment
1005 returns : a Bio::SeqFeatureI compliant object
1007 =cut
1009 sub _coord_adjust_insertion {
1010 my ($self, $feat, $insert_pos, $insert_len)=@_;
1012 $self->throw('object [$feat] '. 'of class ['. ref($feat)
1013 . '] should be a Bio::SeqFeatureI ')
1014 unless $feat->isa('Bio::SeqFeatureI');
1015 $self->throw('missing insert position') unless defined $insert_pos;
1016 $self->throw('missing insert length') unless defined $insert_len;
1018 my @adjsubfeat;
1019 for my $subfeat ($feat->get_SeqFeatures) {
1020 push @adjsubfeat, $self->_coord_adjust_insertion($subfeat, $insert_pos, $insert_len);
1023 my @loc;
1024 my $note;
1025 for ($feat->location->each_Location) {
1026 # loose IN-BETWEEN features at the insertion site
1027 if ($_->location_type eq 'IN-BETWEEN' && $_->start == $insert_pos ){
1028 next;
1030 my $strand = $_->strand;
1031 my $type = $_->location_type;
1032 my $start = $_->start;
1033 my $start_type = $_->can('start_pos_type') ? $_->start_pos_type : undef;
1034 my $end = $_->end;
1035 my $end_type = $_->can('end_pos_type') ? $_->end_pos_type : undef;
1036 my @newcoords=();
1037 if ($start <= $insert_pos && $end > $insert_pos){ # split the feature
1038 @newcoords = (
1039 [ $start, $insert_pos, $start_type, $end_type ],
1040 [ ($insert_pos + 1 + $insert_len ), $end + $insert_len, $start_type, $end_type ]
1042 $note = $insert_len . 'bp internal insertion between pos '
1043 . $insert_pos . ' and ' . ( $insert_pos + $insert_len + 1);
1045 } elsif ($start > $insert_pos){ # just shift entire location
1046 @newcoords = ([$start + $insert_len, $end + $insert_len, $start_type, $end_type ]);
1047 } else { # not affected
1048 @newcoords = ( [$start, $end, $start_type, $end_type ]);
1051 # if we have deleted all coordinates, return nothing
1052 # (possible if all locations are IN-BETWEEN)
1053 return unless @newcoords;
1055 my @subloc = $self->_location_objects_from_coordinate_list(
1056 \@newcoords,
1057 $strand,
1058 $type
1060 # put together final location which could be a split now
1061 push @loc, $self->_single_loc_object_from_collection( @subloc );
1062 } # each location
1064 # create new feature based on original one and move annotation across
1065 my $newfeat=Bio::SeqFeature::Generic->new(-primary=>$feat->primary_tag);
1066 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1067 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1068 $newfeat->annotation->add_Annotation($key, $value);
1071 foreach my $key ( $feat->get_all_tags() ) {
1072 $newfeat->add_tag_value($key, $feat->get_tag_values($key));
1075 # If we have a note about the inserted bases, add it
1076 if ($note){
1077 $newfeat->add_tag_value('note',$note);
1080 # set modified location(s) for the new feature and
1081 # add its subfeatures if any
1082 my $loc = $self->_single_loc_object_from_collection( @loc );
1083 $loc ? $newfeat->location( $loc ) : return ;
1084 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1086 return $newfeat;
1090 =head2 _single_loc_object_from_collection
1092 Title : _single_loc_object_from_collection
1093 Function: takes an array of location objects. Returns either a split
1094 location object if there are more than one locations in the
1095 array or returns the single location if there is only one
1096 Usage : my $loc = _single_loc_object_from_collection( @sublocs );
1097 Args : array of Bio::Location objects
1098 Returns : a single Bio:;Location object containing all locations
1100 =cut
1102 sub _single_loc_object_from_collection {
1103 my ($self, @locs) = @_;
1104 my $loc;
1105 if (@locs > 1){
1106 $loc = Bio::Location::Split->new;
1107 $loc->add_sub_Location( @locs);
1108 } elsif (@locs == 1) {
1109 $loc = shift @locs;
1111 return $loc;
1112 } # _single_loc_object_from_collection
1114 =head2 _location_objects_from_coordinate_list
1116 Title : _location_objects_from_coordinate_list
1117 Function: takes an array-ref of start/end coordinates, a strand and a
1118 type and returns a list of Bio::Location objects (Fuzzy by
1119 default, Simple in case of in-between coordinates).
1120 If location type is not "IN-BETWEEN", individual types may be
1121 passed in for start and end location as per Bio::Location::Fuzzy
1122 documentation.
1123 Usage : my @loc_objs = $self->_location_objects_from_coordinate_list(
1124 \@coords,
1125 $strand,
1126 $type
1128 Args : array-ref of array-refs each containing:
1129 start, end [, start-type, end-type]
1130 where types are optional. If given, must be
1131 a one of ('BEFORE', 'AFTER', 'EXACT','WITHIN', 'BETWEEN')
1132 strand (all locations must be on same strand)
1133 location-type (EXACT, IN-BETWEEN etc)
1134 Returns : list of Bio::Location objects
1136 =cut
1138 sub _location_objects_from_coordinate_list {
1139 my $self = shift;
1140 my ($coords_ref, $strand, $type) = @_;
1141 $self->throw('expected 3 parameters but got '.@_) unless @_ == 3;
1142 $self->throw('first argument must be an ARRAY reference#')
1143 unless ref($coords_ref) eq 'ARRAY';
1145 my @loc;
1146 foreach my $coords_set (@$coords_ref){
1147 my ($start, $end, $start_type, $end_type ) = @$coords_set;
1148 # taken from Bio::SeqUtils::_coord_adjust
1149 unless ( $type eq 'IN-BETWEEN' ) {
1150 my $loc = Bio::Location::Fuzzy->new(
1151 -start => $start,
1152 -end => $end,
1153 -strand => $strand,
1154 -location_type => $type
1156 $loc->start_pos_type( $start_type ) if $start_type;
1157 $loc->end_pos_type( $end_type ) if $end_type;
1158 push @loc, $loc;
1160 else {
1161 push @loc,
1162 Bio::Location::Simple->new(
1163 -start => $start,
1164 -end => $end,
1165 -strand => $strand,
1166 -location_type => $type
1169 } # each coords_set
1170 return @loc;
1171 } # _location_objects_from_coordinate_list
1173 =head2 _new_seq_via_clone
1175 Title : _new_seq_via_clone
1176 Function: clone a sequence object using Bio::Root::Root::clone and set the new sequence string
1177 sequence features are removed.
1178 Usage : my $new_seq = $self->_new_seq_via_clone( $seq_obj, $seq_str );
1179 Args : original seq object [, new sequence string]
1180 Returns : a clone of the original sequence object, optionally with new sequence string
1182 =cut
1184 sub _new_seq_via_clone {
1185 my ($self, $in_seq_obj, $seq_str) = @_;
1186 my $out_seq_obj = $in_seq_obj->clone;
1187 $out_seq_obj->remove_SeqFeatures if $out_seq_obj->can('remove_SeqFeatures');
1188 if ( blessed $out_seq_obj->seq && $out_seq_obj->seq->isa('Bio::PrimarySeq')){
1189 $out_seq_obj->seq->seq( $seq_str );
1190 } else {
1191 $out_seq_obj->seq( $seq_str );
1193 return $out_seq_obj;
1195 } # _new_seq_via_clone
1197 =head2 _new_seq_from_old
1199 Title : _new_seq_from_old
1200 Function: creates a new sequence obejct, if possible of the same class as the old and adds
1201 attributes to it. Also copies annotation across to the new object.
1202 Usage : my $new_seq = $self->_new_seq_from_old( $seq_obj, { seq => $seq_str, display_id => 'some_ID'});
1203 Args : old sequence object
1204 hashref of attributes for the new sequence (sequence string etc.)
1205 Returns : a new Bio::Seq object
1207 =cut
1209 sub _new_seq_from_old {
1210 my ($self, $in_seq_obj, $attr) = @_ ;
1211 $self->throw('attributes must be a hashref') if $attr && ref($attr) ne 'HASH';
1213 my $seqclass;
1214 if ( $in_seq_obj->can_call_new ) {
1215 $seqclass = ref($in_seq_obj);
1216 } else {
1217 $seqclass = 'Bio::Primaryseq';
1218 $self->_attempt_to_load_seq;
1221 my $out_seq_obj = $seqclass->new(
1222 -seq => $attr->{seq} || $in_seq_obj->seq,
1223 -display_id => $attr->{display_id} || $in_seq_obj->display_id,
1224 -accession_number => $attr->{accession_number} || $in_seq_obj->accession_number || '',
1225 -alphabet => $in_seq_obj->alphabet,
1226 -desc => $attr->{desc} || $in_seq_obj->desc,
1227 -verbose => $attr->{verbose} || $in_seq_obj->verbose,
1228 -is_circular => $attr->{is_circular} || $in_seq_obj->is_circular || 0,
1230 # move the annotation across to the product
1231 if ( $out_seq_obj->isa("Bio::AnnotatableI") && $in_seq_obj->isa("Bio::AnnotatableI") ) {
1232 foreach my $key ( $in_seq_obj->annotation->get_all_annotation_keys ) {
1233 foreach my $value ( $in_seq_obj->annotation->get_Annotations($key) ) {
1234 $out_seq_obj->annotation->add_Annotation( $key, $value );
1238 return $out_seq_obj;
1239 } # _new_seq_from_old
1242 =head2 _coord_adjust
1244 Title : _coord_adjust
1245 Usage : my $newfeat=Bio::SeqUtils->_coord_adjust($feature, 100, $seq->length);
1246 Function: Recursive subroutine to adjust the coordinates of a feature
1247 and all its subfeatures. If a sequence length is specified, then
1248 any adjusted features that have locations beyond the boundaries
1249 of the sequence are converted to Bio::Location::Fuzzy objects.
1251 Returns : A Bio::SeqFeatureI compliant object.
1252 Args : A Bio::SeqFeatureI compliant object,
1253 the number of bases to add to the coordinates
1254 (optional) the length of the parent sequence
1257 =cut
1259 sub _coord_adjust {
1260 my ($self, $feat, $add, $length)=@_;
1261 $self->throw('Object [$feat] '. 'of class ['. ref($feat).
1262 '] should be a Bio::SeqFeatureI ')
1263 unless $feat->isa('Bio::SeqFeatureI');
1264 my @adjsubfeat;
1265 for my $subfeat ($feat->get_SeqFeatures) {
1266 push @adjsubfeat, $self->_coord_adjust($subfeat, $add, $length);
1268 my @loc;
1269 for ($feat->location->each_Location) {
1270 my @coords=($_->start, $_->end);
1271 my $strand=$_->strand;
1272 my $type=$_->location_type;
1273 map s/[<>]?(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords;
1274 push @coords, $_->start_pos_type, $_->end_pos_type;
1275 $coords[2]='BEFORE' if substr($coords[0],0,1) eq '<';
1276 $coords[3]='AFTER' if substr($coords[1],0,1) eq '>';
1277 push @loc, $self->_location_objects_from_coordinate_list(
1278 [\@coords], $strand, $type
1281 my $newfeat=Bio::SeqFeature::Generic->new(-primary=>$feat->primary_tag);
1282 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1283 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1284 $newfeat->annotation->add_Annotation($key, $value);
1287 foreach my $key ( $feat->get_all_tags() ) {
1288 $newfeat->add_tag_value($key, $feat->get_tag_values($key));
1290 my $loc = $self->_single_loc_object_from_collection( @loc );
1291 $loc ? $newfeat->location( $loc ) : return ;
1292 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1293 return $newfeat;
1297 =head2 revcom_with_features
1299 Title : revcom_with_features
1300 Usage : $revcom=Bio::SeqUtils->revcom_with_features($seq);
1301 Function: Like Bio::Seq::revcom, but keeps features (adjusting coordinates
1302 as appropriate.
1303 Returns : A new sequence object
1304 Args : A sequence object
1307 =cut
1309 sub revcom_with_features{
1310 my ($self,$seq) = @_;
1311 $self->throw('Object [$seq] '. 'of class ['. ref($seq).
1312 '] should be a Bio::SeqI ')
1313 unless $seq->isa('Bio::SeqI');
1314 my $revcom=$seq->revcom;
1316 #move annotations
1317 foreach my $key ( $seq->annotation->get_all_annotation_keys() ) {
1318 foreach my $value ( $seq->annotation->get_Annotations($key) ) {
1319 $revcom->annotation->add_Annotation($key, $value);
1323 #move features
1324 for (map {$self->_feature_revcom($_, $seq->length)} reverse $seq->get_SeqFeatures) {
1325 $revcom->add_SeqFeature($_);
1327 return $revcom;
1330 =head2 _feature_revcom
1332 Title : _feature_revcom
1333 Usage : my $newfeat=Bio::SeqUtils->_feature_revcom($feature, $seq->length);
1334 Function: Recursive subroutine to reverse complement a feature and
1335 all its subfeatures. The length of the parent sequence must be
1336 specified.
1338 Returns : A Bio::SeqFeatureI compliant object.
1339 Args : A Bio::SeqFeatureI compliant object,
1340 the length of the parent sequence
1343 =cut
1345 sub _feature_revcom {
1346 my ($self, $feat, $length)=@_;
1347 $self->throw('Object [$feat] '. 'of class ['. ref($feat).
1348 '] should be a Bio::SeqFeatureI ')
1349 unless $feat->isa('Bio::SeqFeatureI');
1350 my @adjsubfeat;
1351 for my $subfeat ($feat->get_SeqFeatures) {
1352 push @adjsubfeat, $self->_feature_revcom($subfeat, $length);
1354 my @loc;
1355 for ($feat->location->each_Location) {
1356 my $type=$_->location_type;
1357 my $strand;
1358 if ($_->strand==-1) {$strand=1}
1359 elsif ($_->strand==1) {$strand=-1}
1360 else {$strand=$_->strand}
1361 my $newend=$self->_coord_revcom($_->start,
1362 $_->start_pos_type,
1363 $length);
1364 my $newstart=$self->_coord_revcom($_->end,
1365 $_->end_pos_type,
1366 $length);
1367 my $newstart_type=$_->end_pos_type;
1368 $newstart_type='BEFORE' if $_->end_pos_type eq 'AFTER';
1369 $newstart_type='AFTER' if $_->end_pos_type eq 'BEFORE';
1370 my $newend_type=$_->start_pos_type;
1371 $newend_type='BEFORE' if $_->start_pos_type eq 'AFTER';
1372 $newend_type='AFTER' if $_->start_pos_type eq 'BEFORE';
1373 push @loc, $self->_location_objects_from_coordinate_list(
1374 [[$newstart, $newend, $newstart_type, $newend_type]], $strand, $type
1377 my $newfeat=Bio::SeqFeature::Generic->new(-primary=>$feat->primary_tag);
1378 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1379 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1380 $newfeat->annotation->add_Annotation($key, $value);
1383 foreach my $key ( $feat->get_all_tags() ) {
1384 $newfeat->add_tag_value($key, $feat->get_tag_values($key));
1387 my $loc = $self->_single_loc_object_from_collection( @loc );
1388 $loc ? $newfeat->location( $loc ) : return ;
1390 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1391 return $newfeat;
1394 sub _coord_revcom {
1395 my ($self, $coord, $type, $length)=@_;
1396 if ($type eq 'BETWEEN' or $type eq 'WITHIN') {
1397 $coord=~s/(\d+)(\D*)(\d+)/$length+1-$3.$2.$length+1-$1/ge;
1398 } else {
1399 $coord=~s/(\d+)/$length+1-$1/ge;
1400 $coord=~tr/<>/></;
1401 $coord='>'.$coord if $type eq 'BEFORE' and substr($coord,0,1) ne '>';
1402 $coord='<'.$coord if $type eq 'AFTER' and substr($coord,0,1) ne '<';
1404 return $coord;
1407 =head2 evolve
1409 Title : evolve
1410 Usage : my $newseq = Bio::SeqUtils->
1411 evolve($seq, $similarity, $transition_transversion_rate);
1412 Function: Mutates the sequence by point mutations until the similarity of
1413 the new sequence has decreased to the required level.
1414 Transition/transversion rate is adjustable.
1415 Returns : A new Bio::PrimarySeq object
1416 Args : sequence object
1417 percentage similarity (e.g. 80)
1418 tr/tv rate, optional, defaults to 1 (= 1:1)
1420 Set the verbosity of the Bio::SeqUtils object to positive integer to
1421 see the mutations as they happen.
1423 This method works only on nucleotide sequences. It prints a warning if
1424 you set the target similarity to be less than 25%.
1426 Transition/transversion ratio is an observed attribute of an sequence
1427 comparison. We are dealing here with the transition/transversion rate
1428 that we set for our model of sequence evolution.
1430 =cut
1432 sub evolve {
1433 my ($self, $seq, $sim, $rate) = @_;
1434 $rate ||= 1;
1436 $self->throw('Object [$seq] '. 'of class ['. ref($seq).
1437 '] should be a Bio::PrimarySeqI ')
1438 unless $seq->isa('Bio::PrimarySeqI');
1440 $self->throw("[$sim] ". ' should be a positive integer or float under 100')
1441 unless $sim =~ /^[+\d.]+$/ and $sim <= 100;
1443 $self->warn("Nucleotide sequences are 25% similar by chance.
1444 Do you really want to set similarity to [$sim]%?\n")
1445 unless $sim >25 ;
1447 $self->throw('Only nucleotide sequences are supported')
1448 if $seq->alphabet eq 'protein';
1451 # arrays of possible changes have transitions as first items
1452 my %changes;
1453 $changes{'a'} = ['t', 'c', 'g'];
1454 $changes{'t'} = ['a', 'c', 'g'];
1455 $changes{'c'} = ['g', 'a', 't'];
1456 $changes{'g'} = ['c', 'a', 't'];
1459 # given the desired rate, find out where cut off points need to be
1460 # when random numbers are generated from 0 to 100
1461 # we are ignoring identical mutations (e.g. A->A) to speed things up
1462 my $bin_size = 100/($rate + 2);
1463 my $transition = 100 - (2*$bin_size);
1464 my $first_transversion = $transition + $bin_size;
1466 # unify the look of sequence strings
1467 my $string = lc $seq->seq; # lower case
1468 $string =~ s/u/t/; # simplyfy our life; modules should deal with the change anyway
1469 # store the original sequence string
1470 my $oristring = $string;
1471 my $length = $seq->length;
1473 while (1) {
1474 # find the location in the string to change
1475 my $loc = int (rand $length) + 1;
1478 # nucleotide to change
1479 my $oldnuc = substr $string, $loc-1, 1;
1480 my $newnuc;
1482 # nucleotide it is changed to
1483 my $choose = rand(100);
1484 if ($choose < $transition ) {
1485 $newnuc = $changes{$oldnuc}[0];
1487 elsif ($choose < $first_transversion ) {
1488 $newnuc = $changes{$oldnuc}[1];
1489 } else {
1490 $newnuc = $changes{$oldnuc}[2];
1493 # do the change
1494 substr $string, $loc-1, 1 , $newnuc;
1496 $self->debug("$loc$oldnuc>$newnuc\n");
1498 # stop evolving if the limit has been reached
1499 last if $self->_get_similarity($oristring, $string) <= $sim;
1503 return new Bio::PrimarySeq(-id => $seq->id. "-$sim",
1504 -description => $seq->description,
1505 -seq => $string
1510 sub _get_similarity {
1511 my ($self, $oriseq, $seq) = @_;
1513 my $len = length($oriseq);
1514 my $c;
1516 for (my $i = 0; $i< $len; $i++ ) {
1517 $c++ if substr($oriseq, $i, 1) eq substr($seq, $i, 1);
1519 return 100 * $c/$len;