change eutil URL
[bioperl-live.git] / Bio / SeqUtils.pm
blobc9290bca88f2e14249c8ca27280a24765a9654ac
1 # BioPerl module for Bio::SeqUtils
3 # Please direct questions and support issues to <bioperl-l@bioperl.org>
5 # Cared for by Heikki Lehvaslaiho <heikki-at-bioperl-dot-org>
7 # Copyright Heikki Lehvaslaiho
9 # You may distribute this module under the same terms as perl itself
11 # POD documentation - main docs before the code
13 =head1 NAME
15 Bio::SeqUtils - Additional methods for PrimarySeq objects
17 =head1 SYNOPSIS
19 use Bio::SeqUtils;
20 # get a Bio::PrimarySeqI compliant object, $seq, somehow
21 $util = Bio::SeqUtils->new();
22 $polypeptide_3char = $util->seq3($seq);
23 # or
24 $polypeptide_3char = Bio::SeqUtils->seq3($seq);
26 # set the sequence string (stored in one char code in the object)
27 Bio::SeqUtils->seq3($seq, $polypeptide_3char);
29 # translate a sequence in all six frames
30 @seqs = Bio::SeqUtils->translate_6frames($seq);
32 # inplace editing of the sequence
33 Bio::SeqUtils->mutate($seq,
34 Bio::LiveSeq::Mutation->new(-seq => 'c',
35 -pos => 3
36 ));
37 # mutate a sequence to desired similarity%
38 $newseq = Bio::SeqUtils-> evolve
39 ($seq, $similarity, $transition_transversion_rate);
41 # concatenate two or more sequences with annotations and features,
42 # the first sequence will be modified
43 Bio::SeqUtils->cat(@seqs);
44 my $catseq=$seqs[0];
46 # truncate a sequence, retaining features and adjusting their
47 # coordinates if necessary
48 my $truncseq = Bio::SeqUtils->trunc_with_features($seq, 100, 200);
50 # reverse complement a sequence and its features
51 my $revcomseq = Bio::SeqUtils->revcom_with_features($seq);
53 # simulate cloning of a fragment into a vector. Cut the vector at
54 # positions 1000 and 1100 (deleting postions 1001 to 1099) and
55 # "ligate" a fragment into the sites. The fragment is
56 # reverse-complemented in this example (option "flip").
57 # All features of the vector and fragment are preserved and
58 # features that are affected by the deletion/insertion are
59 # modified accordingly.
60 # $vector and $fragment must be Bio::SeqI compliant objects
61 my $new_molecule = Bio::Sequtils->ligate(
62 -vector => $vector,
63 -fragment => $fragment,
64 -left => 1000,
65 -right => 1100,
66 -flip => 1
69 # delete a segment of a seqence (from pos 1000 to 1100, inclusive),
70 # again preserving features and annotations
71 my $new_molecule = Bio::SeqUtils->cut( $seq, 1000, 1100 );
73 # insert a fragment into a recipient between positions 1000 and
74 # 1001. $recipient is a Bio::SeqI compliant object
75 my $new_molecule = Bio::SeqUtils::PbrTools->insert(
76 $recipient_seq,
77 $fragment_seq,
78 1000
81 =head1 DESCRIPTION
83 This class is a holder of methods that work on Bio::PrimarySeqI-
84 compliant sequence objects, e.g. Bio::PrimarySeq and
85 Bio::Seq. These methods are not part of the Bio::PrimarySeqI
86 interface and should in general not be essential to the primary function
87 of sequence objects. If you are thinking of adding essential
88 functions, it might be better to create your own sequence class.
89 See L<Bio::PrimarySeqI>, L<Bio::PrimarySeq>, and L<Bio::Seq> for more.
91 The methods take as their first argument a sequence object. It is
92 possible to use methods without first creating a SeqUtils object,
93 i.e. use it as an anonymous hash.
95 The first two methods, seq3() and seq3in(), give out or read in protein
96 sequences coded in three letter IUPAC amino acid codes.
98 The next two methods, translate_3frames() and translate_6frames(), wrap
99 around the standard translate method to give back an array of three
100 forward or all six frame translations.
102 The mutate() method mutates the sequence string with a mutation
103 description object.
105 The cat() method concatenates two or more sequences. The first sequence
106 is modified by addition of the remaining sequences. All annotations and
107 sequence features will be transferred.
109 The revcom_with_features() and trunc_with_features() methods are similar
110 to the revcom() and trunc() methods from Bio::Seq, but also adjust any
111 features associated with the sequence as appropriate.
113 There are also methods that simulate molecular cloning with rich
114 sequence objects.
115 The delete() method cuts a segment out of a sequence and re-joins the
116 left and right fragments (like splicing or digesting and re-ligating a
117 molecule). Positions (and types) of sequence features are adjusted
118 accordingly:
119 Features that span the deleted segment are converted to split featuress
120 to indicate the disruption. (Sub)Features that extend into the deleted
121 segment are truncated.
122 A new molecule is created and returned.
124 The insert() method inserts a fragment (which can be a rich Bio::Seq
125 object) into another sequence object adding all annotations and
126 features to the final product.
127 Features that span the insertion site are converted to split features
128 to indicate the disruption.
129 A new feature is added to indicate the inserted fragment itself.
130 A new molecule is created and returned.
132 The ligate() method simulates digesting a recipient (vector) and
133 ligating a fragment into it, which can also be flipped if needed. It
134 is simply a combination of a deletion and an insertion step and
135 returns a new molecule. The rules for modifying feature locations
136 outlined above are also used here, e.g. features that span the cut
137 sites are converted to split features with truncated sub-locations.
140 =head1 FEEDBACK
142 =head2 Mailing Lists
144 User feedback is an integral part of the evolution of this and other
145 Bioperl modules. Send your comments and suggestions preferably to one
146 of the Bioperl mailing lists. Your participation is much appreciated.
148 bioperl-l@bioperl.org - General discussion
149 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
151 =head2 Support
153 Please direct usage questions or support issues to the mailing list:
155 I<bioperl-l@bioperl.org>
157 rather than to the module maintainer directly. Many experienced and
158 reponsive experts will be able look at the problem and quickly
159 address it. Please include a thorough description of the problem
160 with code and data examples if at all possible.
162 =head2 Reporting Bugs
164 Report bugs to the Bioperl bug tracking system to help us keep track
165 the bugs and their resolution. Bug reports can be submitted via the
166 web:
168 https://redmine.open-bio.org/projects/bioperl/
170 =head1 AUTHOR - Heikki Lehvaslaiho
172 Email: heikki-at-bioperl-dot-org
174 =head1 CONTRIBUTORS
176 Roy R. Chaudhuri - roy.chaudhuri at gmail.com
177 Frank Schwach - frank.schwach@sanger.ac.uk
179 =head1 APPENDIX
181 The rest of the documentation details each of the object
182 methods. Internal methods are usually preceded with a _
184 =cut
186 # Let the code begin...
188 package Bio::SeqUtils;
189 use strict;
190 use warnings;
191 use Scalar::Util qw(blessed);
192 use parent qw(Bio::Root::Root);
194 # new inherited from RootI
196 our %ONECODE = (
197 'Ala' => 'A',
198 'Asx' => 'B',
199 'Cys' => 'C',
200 'Asp' => 'D',
201 'Glu' => 'E',
202 'Phe' => 'F',
203 'Gly' => 'G',
204 'His' => 'H',
205 'Ile' => 'I',
206 'Lys' => 'K',
207 'Leu' => 'L',
208 'Met' => 'M',
209 'Asn' => 'N',
210 'Pro' => 'P',
211 'Gln' => 'Q',
212 'Arg' => 'R',
213 'Ser' => 'S',
214 'Thr' => 'T',
215 'Val' => 'V',
216 'Trp' => 'W',
217 'Xaa' => 'X',
218 'Tyr' => 'Y',
219 'Glx' => 'Z',
220 'Ter' => '*',
221 'Sec' => 'U',
222 'Pyl' => 'O',
223 'Xle' => 'J'
226 our %THREECODE = (
227 'A' => 'Ala',
228 'B' => 'Asx',
229 'C' => 'Cys',
230 'D' => 'Asp',
231 'E' => 'Glu',
232 'F' => 'Phe',
233 'G' => 'Gly',
234 'H' => 'His',
235 'I' => 'Ile',
236 'K' => 'Lys',
237 'L' => 'Leu',
238 'M' => 'Met',
239 'N' => 'Asn',
240 'P' => 'Pro',
241 'Q' => 'Gln',
242 'R' => 'Arg',
243 'S' => 'Ser',
244 'T' => 'Thr',
245 'V' => 'Val',
246 'W' => 'Trp',
247 'Y' => 'Tyr',
248 'Z' => 'Glx',
249 'X' => 'Xaa',
250 '*' => 'Ter',
251 'U' => 'Sec',
252 'O' => 'Pyl',
253 'J' => 'Xle'
256 =head2 seq3
258 Title : seq3
259 Usage : $string = Bio::SeqUtils->seq3($seq)
260 Function: Read only method that returns the amino acid sequence as a
261 string of three letter codes. alphabet has to be
262 'protein'. Output follows the IUPAC standard plus 'Ter' for
263 terminator. Any unknown character, including the default
264 unknown character 'X', is changed into 'Xaa'. A noncoded
265 aminoacid selenocystein is recognized (Sec, U).
267 Returns : A scalar
268 Args : character used for stop in the protein sequence optional,
269 defaults to '*' string used to separate the output amino
270 acid codes, optional, defaults to ''
272 =cut
274 sub seq3 {
275 my ( $self, $seq, $stop, $sep ) = @_;
277 $seq->isa('Bio::PrimarySeqI')
278 || $self->throw('Not a Bio::PrimarySeqI object but [$self]');
279 $seq->alphabet eq 'protein'
280 || $self->throw('Not a protein sequence');
282 if ( defined $stop ) {
283 length $stop != 1
284 and $self->throw('One character stop needed, not [$stop]');
285 $THREECODE{$stop} = "Ter";
287 $sep ||= '';
289 my $aa3s;
290 foreach my $aa ( split //, uc $seq->seq ) {
291 $THREECODE{$aa} and $aa3s .= $THREECODE{$aa} . $sep, next;
292 $aa3s .= 'Xaa' . $sep;
294 $sep and substr( $aa3s, -( length $sep ), length $sep ) = '';
295 return $aa3s;
298 =head2 seq3in
300 Title : seq3in
301 Usage : $seq = Bio::SeqUtils->seq3in($seq, 'MetGlyTer')
302 Function: Method for changing of the sequence of a
303 Bio::PrimarySeqI sequence object. The three letter amino
304 acid input string is converted into one letter code. Any
305 unknown character triplet, including the default 'Xaa', is
306 converted into 'X'.
308 Returns : Bio::PrimarySeq object
309 Args : sequence string
310 optional character to be used for stop in the protein sequence,
311 defaults to '*'
312 optional character to be used for unknown in the protein sequence,
313 defaults to 'X'
315 =cut
317 sub seq3in {
318 my ( $self, $seq, $string, $stop, $unknown ) = @_;
320 $seq->isa('Bio::PrimarySeqI')
321 || $self->throw("Not a Bio::PrimarySeqI object but [$self]");
322 $seq->alphabet eq 'protein'
323 || $self->throw('Not a protein sequence');
325 if ( defined $stop ) {
326 length $stop != 1
327 and $self->throw("One character stop needed, not [$stop]");
328 $ONECODE{'Ter'} = $stop;
330 if ( defined $unknown ) {
331 length $unknown != 1
332 and $self->throw("One character stop needed, not [$unknown]");
333 $ONECODE{'Xaa'} = $unknown;
336 my ( $aas, $aa3 );
337 my $length = ( length $string ) - 2;
338 for ( my $i = 0 ; $i < $length ; $i += 3 ) {
339 $aa3 = substr( $string, $i, 3 );
340 $aa3 = ucfirst( lc($aa3) );
341 $ONECODE{$aa3} and $aas .= $ONECODE{$aa3}, next;
342 $aas .= $ONECODE{'Xaa'};
344 $seq->seq($aas);
345 return $seq;
348 =head2 translate_3frames
350 Title : translate_3frames
351 Usage : @prots = Bio::SeqUtils->translate_3frames($seq)
352 Function: Translate a nucleotide sequence in three forward frames.
353 The IDs of the sequences are appended with '-0F', '-1F', '-2F'.
354 Returns : An array of seq objects
355 Args : sequence object
356 same arguments as to Bio::PrimarySeqI::translate
358 =cut
360 sub translate_3frames {
361 my ( $self, $seq, @args ) = @_;
363 $self->throw( 'Object [$seq] '
364 . 'of class ['
365 . ref($seq)
366 . '] can not be translated.' )
367 unless $seq->can('translate');
369 my ( $stop, $unknown, $frame, $tableid, $fullCDS, $throw ) = @args;
370 my @seqs;
371 my $f = 0;
372 while ( $f != 3 ) {
373 my $translation =
374 $seq->translate( $stop, $unknown, $f, $tableid, $fullCDS, $throw );
375 $translation->id( $seq->id . "-" . $f . "F" );
376 push @seqs, $translation;
377 $f++;
380 return @seqs;
383 =head2 translate_6frames
385 Title : translate_6frames
386 Usage : @prots = Bio::SeqUtils->translate_6frames($seq)
387 Function: translate a nucleotide sequence in all six frames
388 The IDs of the sequences are appended with '-0F', '-1F', '-2F',
389 '-0R', '-1R', '-2R'.
390 Returns : An array of seq objects
391 Args : sequence object
392 same arguments as to Bio::PrimarySeqI::translate
394 =cut
396 sub translate_6frames {
397 my ( $self, $seq, @args ) = @_;
399 my @seqs = $self->translate_3frames( $seq, @args );
400 my @seqs2 = $self->translate_3frames( $seq->revcom, @args );
401 foreach my $seq2 (@seqs2) {
402 my ($tmp) = $seq2->id;
403 $tmp =~ s/F$/R/g;
404 $seq2->id($tmp);
406 return @seqs, @seqs2;
409 =head2 valid_aa
411 Title : valid_aa
412 Usage : my @aa = $table->valid_aa
413 Function: Retrieves a list of the valid amino acid codes.
414 The list is ordered so that first 21 codes are for unique
415 amino acids. The rest are ['B', 'Z', 'X', '*'].
416 Returns : array of all the valid amino acid codes
417 Args : [optional] $code => [0 -> return list of 1 letter aa codes,
418 1 -> return list of 3 letter aa codes,
419 2 -> return associative array of both ]
421 =cut
423 sub valid_aa {
424 my ( $self, $code ) = @_;
426 if ( !$code ) {
427 my @codes;
428 foreach my $c ( sort values %ONECODE ) {
429 push @codes, $c unless ( $c =~ /[BZX\*]/ );
431 push @codes, qw(B Z X *); # so they are in correct order ?
432 return @codes;
434 elsif ( $code == 1 ) {
435 my @codes;
436 foreach my $c ( sort keys %ONECODE ) {
437 push @codes, $c unless ( $c =~ /(Asx|Glx|Xaa|Ter)/ );
439 push @codes, ( 'Asx', 'Glx', 'Xaa', 'Ter' );
440 return @codes;
442 elsif ( $code == 2 ) {
443 my %codes = %ONECODE;
444 foreach my $c ( keys %ONECODE ) {
445 my $aa = $ONECODE{$c};
446 $codes{$aa} = $c;
448 return %codes;
450 else {
451 $self->warn(
452 "unrecognized code in " . ref($self) . " method valid_aa()" );
453 return ();
457 =head2 mutate
459 Title : mutate
460 Usage : Bio::SeqUtils->mutate($seq,$mutation1, $mutation2);
461 Function: Inplace editing of the sequence.
463 The second argument can be a Bio::LiveSeq::Mutation object
464 or an array of them. The mutations are applied sequentially
465 checking only that their position is within the current
466 sequence. Insertions are inserted before the given
467 position.
469 Returns : boolean
470 Args : sequence object
471 mutation, a Bio::LiveSeq::Mutation object, or an array of them
473 See L<Bio::LiveSeq::Mutation>.
475 =cut
477 sub mutate {
478 my ( $self, $seq, @mutations ) = @_;
480 $self->throw( 'Object [$seq] '
481 . 'of class ['
482 . ref($seq)
483 . '] should be a Bio::PrimarySeqI ' )
484 unless $seq->isa('Bio::PrimarySeqI');
485 $self->throw( 'Object [$mutations[0]] '
486 . 'of class ['
487 . ref( $mutations[0] )
488 . '] should be a Bio::LiveSeq::Mutation' )
489 unless $mutations[0]->isa('Bio::LiveSeq::Mutation');
491 foreach my $mutation (@mutations) {
492 $self->throw('Attempting to mutate sequence beyond its length')
493 unless $mutation->pos - 1 <= $seq->length;
495 my $string = $seq->seq;
496 substr $string, $mutation->pos - 1, $mutation->len, $mutation->seq;
497 $seq->seq($string);
502 =head2 cat
504 Title : cat
505 Usage : Bio::SeqUtils->cat(@seqs);
506 my $catseq=$seqs[0];
507 Function: Concatenates a list of Bio::Seq objects, adding them all on to the
508 end of the first sequence. Annotations and sequence features are
509 copied over from any additional objects, and the coordinates of any
510 copied features are adjusted appropriately.
511 Returns : a boolean
512 Args : array of sequence objects
514 Note that annotations have no sequence locations. If you concatenate
515 sequences with the same annotations they will all be added.
517 =cut
519 sub cat {
520 my ( $self, $seq, @seqs ) = @_;
521 $self->throw( 'Object [$seq] '
522 . 'of class ['
523 . ref($seq)
524 . '] should be a Bio::PrimarySeqI ' )
525 unless $seq->isa('Bio::PrimarySeqI');
527 for my $catseq (@seqs) {
528 $self->throw( 'Object [$catseq] '
529 . 'of class ['
530 . ref($catseq)
531 . '] should be a Bio::PrimarySeqI ' )
532 unless $catseq->isa('Bio::PrimarySeqI');
534 $self->throw(
535 'Trying to concatenate sequences with different alphabets: '
536 . $seq->display_id . '('
537 . $seq->alphabet
538 . ') and '
539 . $catseq->display_id . '('
540 . $catseq->alphabet
541 . ')' )
542 unless $catseq->alphabet eq $seq->alphabet;
544 my $length = $seq->length;
545 $seq->seq( $seq->seq . $catseq->seq );
547 # move annotations
548 if ( $seq->isa("Bio::AnnotatableI")
549 and $catseq->isa("Bio::AnnotatableI") )
551 foreach my $key ( $catseq->annotation->get_all_annotation_keys() ) {
553 foreach my $value ( $catseq->annotation->get_Annotations($key) )
555 $seq->annotation->add_Annotation( $key, $value );
560 # move SeqFeatures
561 if ( $seq->isa('Bio::SeqI') and $catseq->isa('Bio::SeqI') ) {
562 for my $feat ( $catseq->get_SeqFeatures ) {
563 $seq->add_SeqFeature( $self->_coord_adjust( $feat, $length ) );
571 =head2 trunc_with_features
573 Title : trunc_with_features
574 Usage : $trunc=Bio::SeqUtils->trunc_with_features($seq, $start, $end);
575 Function: Like Bio::Seq::trunc, but keeps features (adjusting coordinates
576 where necessary. Features that partially overlap the region have
577 their location changed to a Bio::Location::Fuzzy.
578 Returns : A new sequence object
579 Args : A sequence object, start coordinate, end coordinate (inclusive)
582 =cut
584 sub trunc_with_features {
585 use Bio::Range;
586 my ( $self, $seq, $start, $end ) = @_;
587 $self->throw( 'Object [$seq] '
588 . 'of class ['
589 . ref($seq)
590 . '] should be a Bio::SeqI ' )
591 unless $seq->isa('Bio::SeqI');
592 my $trunc = $seq->trunc( $start, $end );
593 my $truncrange =
594 Bio::Range->new( -start => $start, -end => $end, -strand => 0 );
596 # move annotations
597 foreach my $key ( $seq->annotation->get_all_annotation_keys() ) {
598 foreach my $value ( $seq->annotation->get_Annotations($key) ) {
599 $trunc->annotation->add_Annotation( $key, $value );
603 # move features
604 foreach (
605 grep {
606 $_ = $self->_coord_adjust( $_, 1 - $start, $end + 1 - $start )
607 if $_->overlaps($truncrange)
608 } $seq->get_SeqFeatures
611 $trunc->add_SeqFeature($_);
613 return $trunc;
616 =head2 delete
618 Title : delete
619 Function: cuts a segment out of a sequence and re-joins the left and right fragments
620 (like splicing or digesting and re-ligating a molecule).
621 Positions (and types) of sequence features are adjusted accordingly:
622 Features that span the cut site are converted to split featuress to
623 indicate the disruption.
624 Features that extend into the cut-out fragment are truncated.
625 A new molecule is created and returned.
626 Usage : my $cutseq = Bio::SeqUtils::PbrTools->cut( $seq, 1000, 1100 );
627 Args : a Bio::PrimarySeqI compliant object to cut,
628 first nt of the segment to be deleted
629 last nt of the segment to be deleted
630 optional:
631 hash-ref of options:
632 clone_obj: if true, clone the input sequence object rather
633 than calling "new" on the object's class
635 Returns : a new Bio::Seq object
637 =cut
639 sub delete {
640 my $self = shift;
641 my ( $seq, $left, $right, $opts_ref ) = @_;
642 $self->throw( 'was expecting 3-4 paramters but got ' . @_ )
643 unless @_ == 3 || @_ == 4;
645 $self->throw(
646 'Object of class [' . ref($seq) . '] should be a Bio::PrimarySeqI ' )
647 unless blessed($seq) && $seq->isa('Bio::PrimarySeqI');
649 $self->throw("Left coordinate ($left) must be >= 1") if $left < 1;
650 if ( $right > $seq->length ) {
651 $self->throw( "Right coordinate ($right) must be less than "
652 . 'sequence length ('
653 . $seq->length
654 . ')' );
657 # piece together the sequence string of the remaining fragments
658 my $left_seq = $seq->subseq( 1, $left - 1 );
659 my $right_seq = $seq->subseq( $right + 1, $seq->length );
660 if ( !$left_seq || !$right_seq ) {
661 $self->throw(
662 'could not assemble sequences. At least one of the fragments is empty'
665 my $seq_str = $left_seq . $right_seq;
667 # create the new seq object with the same class as the recipient
668 # or (if requested), make a clone of the existing object. In the
669 # latter case we need to remove sequence features from the cloned
670 # object instead of copying them
671 my $product;
672 if ( $opts_ref->{clone_obj} ) {
673 $product = $self->_new_seq_via_clone( $seq, $seq_str );
675 else {
676 $product = $self->_new_seq_from_old( $seq, { seq => $seq_str } );
679 # move sequence features
680 if ( $product->isa('Bio::SeqI') && $seq->isa('Bio::SeqI') ) {
681 for my $feat ( $seq->get_SeqFeatures ) {
682 my $adjfeat = $self->_coord_adjust_deletion( $feat, $left, $right );
683 $product->add_SeqFeature($adjfeat) if $adjfeat;
687 # add a feature to annotatde the deletion
688 my $deletion_feature = Bio::SeqFeature::Generic->new(
689 -primary_tag => 'misc_feature',
690 -tag => { note => 'deletion of ' . ( $right - $left + 1 ) . 'bp' },
691 -location => Bio::Location::Simple->new(
692 -start => $left - 1,
693 -end => $left,
694 -location_type => 'IN-BETWEEN'
697 $product->add_SeqFeature($deletion_feature);
698 return $product;
701 =head2 insert
703 Title : insert
704 Function: inserts a fragment (a Bio::Seq object) into a nother sequence object
705 adding all annotations and features to the final product.
706 Features that span the insertion site are converted to split
707 features to indicate the disruption.
708 A new feature is added to indicate the inserted fragment itself.
709 A new molecule is created and returned.
710 Usage : # insert a fragment after pos 1000
711 my $insert_seq = Bio::SeqUtils::PbrTools->insert(
712 $recipient_seq,
713 $fragment_seq,
714 1000
716 Args : recipient sequence (a Bio::PrimarySeqI compliant object),
717 a fragmetn to insert (Bio::PrimarySeqI compliant object),
718 insertion position (fragment is inserted to the right of this pos)
719 pos=0 will prepend the fragment to the recipient
720 optional:
721 hash-ref of options:
722 clone_obj: if true, clone the input sequence object rather
723 than calling "new" on the object's class
724 Returns : a new Bio::Seq object
726 =cut
728 sub insert {
729 my $self = shift;
730 my ( $recipient, $fragment, $insert_pos, $opts_ref ) = @_;
731 $self->throw( 'was expecting 3-4 paramters but got ' . @_ )
732 unless @_ == 3 || @_ == 4;
734 $self->throw( 'Recipient object of class ['
735 . ref($recipient)
736 . '] should be a Bio::PrimarySeqI ' )
737 unless blessed($recipient) && $recipient->isa('Bio::PrimarySeqI');
739 $self->throw( 'Fragment object of class ['
740 . ref($fragment)
741 . '] should be a Bio::PrimarySeqI ' )
742 unless blessed($fragment) && $fragment->isa('Bio::PrimarySeqI');
744 $self->throw( 'Can\'t concatenate sequences with different alphabets: '
745 . 'recipient is '
746 . $recipient->alphabet
747 . ' and fragment is '
748 . $fragment->alphabet )
749 unless $recipient->alphabet eq $fragment->alphabet;
751 if ( $insert_pos < 0 or $insert_pos > $recipient->length ) {
752 $self->throw( "insertion position ($insert_pos) must be between 0 and "
753 . 'recipient sequence length ('
754 . $recipient->length
755 . ')' );
758 if ( $fragment->can('is_circular') && $fragment->is_circular ) {
759 $self->throw('Can\'t insert circular fragments');
762 if ( !$recipient->seq ) {
763 $self->throw(
764 'Recipient has no sequence, can not insert into this object');
767 # construct raw sequence of the new molecule
768 my $left_seq =
769 $insert_pos > 0
770 ? $recipient->subseq( 1, $insert_pos )
771 : '';
772 my $mid_seq = $fragment->seq;
773 my $right_seq =
774 $insert_pos < $recipient->length
775 ? $recipient->subseq( $insert_pos + 1, $recipient->length )
776 : '';
777 my $seq_str = $left_seq . $mid_seq . $right_seq;
779 # create the new seq object with the same class as the recipient
780 # or (if requested), make a clone of the existing object. In the
781 # latter case we need to remove sequence features from the cloned
782 # object instead of copying them
783 my $product;
784 if ( $opts_ref->{clone_obj} ) {
785 $product = $self->_new_seq_via_clone( $recipient, $seq_str );
787 else {
788 my @desc;
789 push @desc, 'Inserted fragment: ' . $fragment->desc
790 if defined $fragment->desc;
791 push @desc, 'Recipient: ' . $recipient->desc
792 if defined $recipient->desc;
793 $product = $self->_new_seq_from_old(
794 $recipient,
796 seq => $seq_str,
797 display_id => $recipient->display_id,
798 accession_number => $recipient->accession_number || '',
799 alphabet => $recipient->alphabet,
800 desc => join( '; ', @desc ),
801 verbose => $recipient->verbose || $fragment->verbose,
802 is_circular => $recipient->is_circular || 0,
806 } # if clone_obj
808 # move annotations from fragment to product
809 if ( $product->isa("Bio::AnnotatableI")
810 && $fragment->isa("Bio::AnnotatableI") )
812 foreach my $key ( $fragment->annotation->get_all_annotation_keys ) {
813 foreach my $value ( $fragment->annotation->get_Annotations($key) ) {
814 $product->annotation->add_Annotation( $key, $value );
819 # move sequence features to product with adjusted coordinates
820 if ( $product->isa('Bio::SeqI') ) {
822 # for the fragment, just shift the features to new position
823 if ( $fragment->isa('Bio::SeqI') ) {
824 for my $feat ( $fragment->get_SeqFeatures ) {
825 my $adjfeat = $self->_coord_adjust( $feat, $insert_pos );
826 $product->add_SeqFeature($adjfeat) if $adjfeat;
830 # for recipient, shift and modify features according to insertion.
831 if ( $recipient->isa('Bio::SeqI') ) {
832 for my $feat ( $recipient->get_SeqFeatures ) {
833 my $adjfeat =
834 $self->_coord_adjust_insertion( $feat, $insert_pos,
835 $fragment->length );
836 $product->add_SeqFeature($adjfeat) if $adjfeat;
841 # add a feature to annotate the insertion
842 my $insertion_feature = Bio::SeqFeature::Generic->new(
843 -start => $insert_pos + 1,
844 -end => $insert_pos + $fragment->length,
845 -primary_tag => 'misc_feature',
846 -tag => { note => 'inserted fragment' },
848 $product->add_SeqFeature($insertion_feature);
850 return $product;
853 =head2 ligate
855 title : ligate
856 function: pastes a fragment (which can also have features) into a recipient
857 sequence between two "cut" sites, preserving features and adjusting
858 their locations.
859 This is a shortcut for deleting a segment from a sequence object followed
860 by an insertion of a fragmnet and is supposed to be used to simulate
861 in-vitro cloning where a recipient (a vector) is digested and a fragment
862 is then ligated into the recipient molecule. The fragment can be flipped
863 (reverse-complemented with all its features).
864 A new sequence object is returned to represent the product of the reaction.
865 Features and annotations are transferred from the insert to the product
866 and features on the recipient are adjusted according to the methods
867 L</"delete"> amd L</"insert">:
868 Features spanning the insertion site will be split up into two sub-locations.
869 (Sub-)features in the deleted region are themselves deleted.
870 (Sub-)features that extend into the deleted region are truncated.
871 The class of the product object depends on the class of the recipient (vector)
872 sequence object. if it is not possible to instantiate a new
873 object of that class, a Bio::Primaryseq object is created instead.
874 usage : # insert the flipped fragment between positions 1000 and 1100 of the
875 # vector, i.e. everything between these two positions is deleted and
876 # replaced by the fragment
877 my $new_molecule = Bio::Sequtils::Pbrtools->ligate(
878 -recipient => $vector,
879 -fragment => $fragment,
880 -left => 1000,
881 -right => 1100,
882 -flip => 1,
883 -clone_obj => 1
885 args : recipient: the recipient/vector molecule
886 fragment: molecule that is to be ligated into the vector
887 left: left cut site (fragment will be inserted to the right of
888 this position)
889 optional:
890 right: right cut site (fragment will be inseterted to the
891 left of this position). defaults to left+1
892 flip: boolean, if true, the fragment is reverse-complemented
893 (including features) before inserting
894 clone_obj: if true, clone the recipient object to create the product
895 instead of calling "new" on its class
896 returns : a new Bio::Seq object of the ligated fragments
898 =cut
900 sub ligate {
901 my $self = shift;
902 my ( $recipient, $fragment, $left, $right, $flip, $clone_obj ) =
903 $self->_rearrange( [qw(RECIPIENT FRAGMENT LEFT RIGHT FLIP CLONE_OBJ )],
904 @_ );
905 $self->throw("missing required parameter 'recipient'") unless $recipient;
906 $self->throw("missing required parameter 'fragment'") unless $fragment;
907 $self->throw("missing required parameter 'left'") unless defined $left;
909 $right ||= $left + 1;
911 $self->throw(
912 "Fragment must be a Bio::PrimarySeqI compliant object but it is a "
913 . ref($fragment) )
914 unless blessed($fragment) && $fragment->isa('Bio::PrimarySeqI');
916 $fragment = $self->revcom_with_features($fragment) if $flip;
918 my $opts_ref = {};
919 $opts_ref->{clone_obj} = 1 if $clone_obj;
921 # clone in two steps: first delete between the insertion sites,
922 # then insert the fragment. Step 1 is skipped if insert positions
923 # are adjacent (no deletion)
924 my ( $product1, $product2 );
925 eval {
926 if ( $right == $left + 1 ) {
927 $product1 = $recipient;
929 else {
930 $product1 =
931 $self->delete( $recipient, $left + 1, $right - 1, $opts_ref );
934 $self->throw( "Failed in step 1 (cut recipient): " . $@ ) if $@;
935 eval { $product2 = $self->insert( $product1, $fragment, $left, $opts_ref ) };
936 $self->throw( "Failed in step 2 (insert fragment): " . $@ ) if $@;
938 return $product2;
942 =head2 _coord_adjust_deletion
944 title : _coord_adjust_deletion
945 function: recursively adjusts coordinates of seqfeatures on a molecule
946 where a segment has been deleted.
947 (sub)features that span the deletion site become split features.
948 (sub)features that extend into the deletion site are truncated.
949 A note is added to the feature to inform about the size and
950 position of the deletion.
951 usage : my $adjusted_feature = Bio::Sequtils::_coord_adjust_deletion(
952 $feature,
953 $start,
954 $end
956 args : a Bio::SeqFeatureI compliant object,
957 start (inclusive) position of the deletion site,
958 end (inclusive) position of the deletion site
959 returns : a Bio::SeqFeatureI compliant object
961 =cut
963 sub _coord_adjust_deletion {
964 my ( $self, $feat, $left, $right ) = @_;
966 $self->throw( 'object [$feat] '
967 . 'of class ['
968 . ref($feat)
969 . '] should be a Bio::SeqFeatureI ' )
970 unless $feat->isa('Bio::SeqFeatureI');
971 $self->throw('missing coordinates: need a left and a right position')
972 unless defined $left && defined $right;
974 if ( $left > $right ) {
975 if ( $feat->can('is_circular') && $feat->is_circular ) {
977 # todo handle circular molecules
978 $self->throw(
979 'can not yet handle deletions in circular molecules if deletion spans origin'
982 else {
983 $self->throw(
984 "left coordinate ($left) must be less than right ($right)"
985 . " but it was greater" );
988 my $deletion = Bio::Location::Simple->new(
989 -start => $left,
990 -end => $right,
992 my $del_length = $right - $left + 1;
994 my @adjsubfeat;
995 for my $subfeat ( $feat->get_SeqFeatures ) {
996 my $adjsubfeat =
997 $self->_coord_adjust_deletion( $subfeat, $left, $right );
998 push @adjsubfeat, $adjsubfeat if $adjsubfeat;
1001 my @loc;
1002 my $note;
1003 for ( $feat->location->each_Location ) {
1004 next if $deletion->contains($_); # this location will be deleted;
1005 my $strand = $_->strand;
1006 my $type = $_->location_type;
1007 my $start = $_->start;
1008 my $start_type = $_->can('start_pos_type') ? $_->start_pos_type : undef;
1009 my $end = $_->end;
1010 my $end_type = $_->can('end_pos_type') ? $_->end_pos_type : undef;
1011 my @newcoords = ();
1012 if ( $start < $deletion->start && $end > $deletion->end )
1013 { # split the feature
1014 @newcoords = (
1015 [ $start, ( $deletion->start - 1 ), $start_type, $end_type ],
1017 ( $deletion->start ), $end - $del_length,
1018 $start_type, $end_type
1021 $note =
1022 $del_length
1023 . 'bp internal deletion between pos '
1024 . ( $deletion->start - 1 ) . ' and '
1025 . $deletion->start;
1027 elsif ( $_->start < $deletion->start && $_->end >= $deletion->start )
1028 { # truncate feature end
1029 @newcoords =
1030 ( [ $start, ( $deletion->start - 1 ), $start_type, $end_type ] );
1031 $note =
1032 ( $end - $deletion->start + 1 ) . 'bp deleted from feature ';
1033 if ( $feat->strand ) {
1034 $note .= $feat->strand == 1 ? "3' " : "5' ";
1036 $note .= 'end';
1038 elsif ( $_->start <= $deletion->end && $_->end > $deletion->end )
1039 { # truncate feature start and shift end
1040 @newcoords = (
1042 ( $deletion->start ), $end - $del_length,
1043 $start_type, $end_type
1046 $note =
1047 ( $deletion->end - $start + 1 ) . 'bp deleted from feature ';
1048 if ( $feat->strand ) {
1049 $note .= $feat->strand == 1 ? "5' end" : "3' end";
1051 else {
1052 $note .= 'start';
1055 elsif ( $start >= $deletion->end ) { # just shift entire location
1056 @newcoords = (
1058 $start - $del_length, $end - $del_length,
1059 $start_type, $end_type
1063 else { # not affected by deletion
1064 @newcoords = ( [ $start, $end, $start_type, $end_type ] );
1067 # if we have no coordinates, we return nothing
1068 # the feature is deleted
1069 return unless @newcoords;
1071 my @subloc =
1072 $self->_location_objects_from_coordinate_list( \@newcoords, $strand,
1073 $type );
1074 push @loc, $self->_single_loc_object_from_collection(@subloc);
1075 } # each location
1077 # create new feature based on original one and move annotation across
1078 my $newfeat =
1079 Bio::SeqFeature::Generic->new( -primary => $feat->primary_tag );
1080 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1081 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1082 $newfeat->annotation->add_Annotation( $key, $value );
1085 foreach my $key ( $feat->get_all_tags() ) {
1086 $newfeat->add_tag_value( $key, $feat->get_tag_values($key) );
1089 # If we have a note about the deleted bases, add it
1090 if ($note) {
1091 $newfeat->add_tag_value( 'note', $note );
1094 # set modified location(s) for the new feature and
1095 # add its subfeatures if any
1096 my $loc = $self->_single_loc_object_from_collection(@loc);
1097 $loc ? $newfeat->location($loc) : return;
1098 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1100 return $newfeat;
1104 =head2 _coord_adjust_insertion
1106 title : _coord_adjust_insertion
1107 function: recursively adjusts coordinates of seqfeatures on a molecule
1108 where another sequence has been inserted.
1109 (sub)features that span the insertion site become split features
1110 and a note is added about the size and positin of the insertion.
1111 Features with an IN-BETWEEN location at the insertion site
1112 are lost (such features can only exist between adjacent bases)
1113 usage : my $adjusted_feature = Bio::Sequtils::_coord_adjust_insertion(
1114 $feature,
1115 $insert_pos,
1116 $insert_length
1118 args : a Bio::SeqFeatureI compliant object,
1119 insertion position (insert to the right of this position)
1120 length of inserted fragment
1121 returns : a Bio::SeqFeatureI compliant object
1123 =cut
1125 sub _coord_adjust_insertion {
1126 my ( $self, $feat, $insert_pos, $insert_len ) = @_;
1128 $self->throw( 'object [$feat] '
1129 . 'of class ['
1130 . ref($feat)
1131 . '] should be a Bio::SeqFeatureI ' )
1132 unless $feat->isa('Bio::SeqFeatureI');
1133 $self->throw('missing insert position') unless defined $insert_pos;
1134 $self->throw('missing insert length') unless defined $insert_len;
1136 my @adjsubfeat;
1137 for my $subfeat ( $feat->get_SeqFeatures ) {
1138 push @adjsubfeat,
1139 $self->_coord_adjust_insertion( $subfeat, $insert_pos, $insert_len );
1142 my @loc;
1143 my $note;
1144 for ( $feat->location->each_Location ) {
1146 # loose IN-BETWEEN features at the insertion site
1147 next
1148 if ( $_->location_type eq 'IN-BETWEEN' && $_->start == $insert_pos );
1149 my $strand = $_->strand;
1150 my $type = $_->location_type;
1151 my $start = $_->start;
1152 my $start_type = $_->can('start_pos_type') ? $_->start_pos_type : undef;
1153 my $end = $_->end;
1154 my $end_type = $_->can('end_pos_type') ? $_->end_pos_type : undef;
1155 my @newcoords = ();
1156 if ( $start <= $insert_pos && $end > $insert_pos ) { # split the feature
1157 @newcoords = (
1158 [ $start, $insert_pos, $start_type, $end_type ],
1160 ( $insert_pos + 1 + $insert_len ), $end + $insert_len,
1161 $start_type, $end_type
1164 $note =
1165 $insert_len
1166 . 'bp internal insertion between pos '
1167 . $insert_pos . ' and '
1168 . ( $insert_pos + $insert_len + 1 );
1171 elsif ( $start > $insert_pos ) { # just shift entire location
1172 @newcoords = (
1174 $start + $insert_len, $end + $insert_len,
1175 $start_type, $end_type
1179 else { # not affected
1180 @newcoords = ( [ $start, $end, $start_type, $end_type ] );
1183 # if we have deleted all coordinates, return nothing
1184 # (possible if all locations are IN-BETWEEN)
1185 return unless @newcoords;
1187 my @subloc =
1188 $self->_location_objects_from_coordinate_list( \@newcoords, $strand,
1189 $type );
1191 # put together final location which could be a split now
1192 push @loc, $self->_single_loc_object_from_collection(@subloc);
1193 } # each location
1195 # create new feature based on original one and move annotation across
1196 my $newfeat =
1197 Bio::SeqFeature::Generic->new( -primary => $feat->primary_tag );
1198 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1199 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1200 $newfeat->annotation->add_Annotation( $key, $value );
1203 foreach my $key ( $feat->get_all_tags() ) {
1204 $newfeat->add_tag_value( $key, $feat->get_tag_values($key) );
1207 # If we have a note about the inserted bases, add it
1208 if ($note) {
1209 $newfeat->add_tag_value( 'note', $note );
1212 # set modified location(s) for the new feature and
1213 # add its subfeatures if any
1214 my $loc = $self->_single_loc_object_from_collection(@loc);
1215 $loc ? $newfeat->location($loc) : return;
1216 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1218 return $newfeat;
1222 =head2 _single_loc_object_from_collection
1224 Title : _single_loc_object_from_collection
1225 Function: takes an array of location objects. Returns either a split
1226 location object if there are more than one locations in the
1227 array or returns the single location if there is only one
1228 Usage : my $loc = _single_loc_object_from_collection( @sublocs );
1229 Args : array of Bio::Location objects
1230 Returns : a single Bio:;Location object containing all locations
1232 =cut
1234 sub _single_loc_object_from_collection {
1235 my ( $self, @locs ) = @_;
1236 my $loc;
1237 if ( @locs > 1 ) {
1238 $loc = Bio::Location::Split->new;
1239 $loc->add_sub_Location(@locs);
1241 elsif ( @locs == 1 ) {
1242 $loc = shift @locs;
1244 return $loc;
1245 } # _single_loc_object_from_collection
1247 =head2 _location_objects_from_coordinate_list
1249 Title : _location_objects_from_coordinate_list
1250 Function: takes an array-ref of start/end coordinates, a strand and a
1251 type and returns a list of Bio::Location objects (Fuzzy by
1252 default, Simple in case of in-between coordinates).
1253 If location type is not "IN-BETWEEN", individual types may be
1254 passed in for start and end location as per Bio::Location::Fuzzy
1255 documentation.
1256 Usage : my @loc_objs = $self->_location_objects_from_coordinate_list(
1257 \@coords,
1258 $strand,
1259 $type
1261 Args : array-ref of array-refs each containing:
1262 start, end [, start-type, end-type]
1263 where types are optional. If given, must be
1264 a one of ('BEFORE', 'AFTER', 'EXACT','WITHIN', 'BETWEEN')
1265 strand (all locations must be on same strand)
1266 location-type (EXACT, IN-BETWEEN etc)
1267 Returns : list of Bio::Location objects
1269 =cut
1271 sub _location_objects_from_coordinate_list {
1272 my $self = shift;
1273 my ( $coords_ref, $strand, $type ) = @_;
1274 $self->throw( 'expected 3 parameters but got ' . @_ ) unless @_ == 3;
1275 $self->throw('first argument must be an ARRAY reference#')
1276 unless ref($coords_ref) eq 'ARRAY';
1278 my @loc;
1279 foreach my $coords_set (@$coords_ref) {
1280 my ( $start, $end, $start_type, $end_type ) = @$coords_set;
1282 # taken from Bio::SeqUtils::_coord_adjust
1283 if ( $type ne 'IN-BETWEEN' ) {
1284 my $loc = Bio::Location::Fuzzy->new(
1285 -start => $start,
1286 -end => $end,
1287 -strand => $strand,
1288 -location_type => $type
1290 $loc->start_pos_type($start_type) if $start_type;
1291 $loc->end_pos_type($end_type) if $end_type;
1292 push @loc, $loc;
1294 else {
1295 push @loc,
1296 Bio::Location::Simple->new(
1297 -start => $start,
1298 -end => $end,
1299 -strand => $strand,
1300 -location_type => $type
1303 } # each coords_set
1304 return @loc;
1305 } # _location_objects_from_coordinate_list
1307 =head2 _new_seq_via_clone
1309 Title : _new_seq_via_clone
1310 Function: clone a sequence object using Bio::Root::Root::clone and set the new sequence string
1311 sequence features are removed.
1312 Usage : my $new_seq = $self->_new_seq_via_clone( $seq_obj, $seq_str );
1313 Args : original seq object [, new sequence string]
1314 Returns : a clone of the original sequence object, optionally with new sequence string
1316 =cut
1318 sub _new_seq_via_clone {
1319 my ( $self, $in_seq_obj, $seq_str ) = @_;
1320 my $out_seq_obj = $in_seq_obj->clone;
1321 $out_seq_obj->remove_SeqFeatures if $out_seq_obj->can('remove_SeqFeatures');
1322 if ( blessed $out_seq_obj->seq
1323 && $out_seq_obj->seq->isa('Bio::PrimarySeq') )
1325 $out_seq_obj->seq->seq($seq_str);
1327 else {
1328 $out_seq_obj->seq($seq_str);
1330 return $out_seq_obj;
1332 } # _new_seq_via_clone
1334 =head2 _new_seq_from_old
1336 Title : _new_seq_from_old
1337 Function: creates a new sequence obejct, if possible of the same class as the old and adds
1338 attributes to it. Also copies annotation across to the new object.
1339 Usage : my $new_seq = $self->_new_seq_from_old( $seq_obj, { seq => $seq_str, display_id => 'some_ID'});
1340 Args : old sequence object
1341 hashref of attributes for the new sequence (sequence string etc.)
1342 Returns : a new Bio::Seq object
1344 =cut
1346 sub _new_seq_from_old {
1347 my ( $self, $in_seq_obj, $attr ) = @_;
1348 $self->throw('attributes must be a hashref')
1349 if $attr && ref($attr) ne 'HASH';
1351 my $seqclass;
1352 if ( $in_seq_obj->can_call_new ) {
1353 $seqclass = ref($in_seq_obj);
1355 else {
1356 $seqclass = 'Bio::Primaryseq';
1357 $self->_attempt_to_load_seq;
1360 my $out_seq_obj = $seqclass->new(
1361 -seq => $attr->{seq} || $in_seq_obj->seq,
1362 -display_id => $attr->{display_id} || $in_seq_obj->display_id,
1363 -accession_number => $attr->{accession_number}
1364 || $in_seq_obj->accession_number
1365 || '',
1366 -alphabet => $in_seq_obj->alphabet,
1367 -desc => $attr->{desc} || $in_seq_obj->desc,
1368 -verbose => $attr->{verbose} || $in_seq_obj->verbose,
1369 -is_circular => $attr->{is_circular} || $in_seq_obj->is_circular || 0,
1372 # move the annotation across to the product
1373 if ( $out_seq_obj->isa("Bio::AnnotatableI")
1374 && $in_seq_obj->isa("Bio::AnnotatableI") )
1376 foreach my $key ( $in_seq_obj->annotation->get_all_annotation_keys ) {
1377 foreach my $value ( $in_seq_obj->annotation->get_Annotations($key) )
1379 $out_seq_obj->annotation->add_Annotation( $key, $value );
1383 return $out_seq_obj;
1384 } # _new_seq_from_old
1386 =head2 _coord_adjust
1388 Title : _coord_adjust
1389 Usage : my $newfeat=Bio::SeqUtils->_coord_adjust($feature, 100, $seq->length);
1390 Function: Recursive subroutine to adjust the coordinates of a feature
1391 and all its subfeatures. If a sequence length is specified, then
1392 any adjusted features that have locations beyond the boundaries
1393 of the sequence are converted to Bio::Location::Fuzzy objects.
1395 Returns : A Bio::SeqFeatureI compliant object.
1396 Args : A Bio::SeqFeatureI compliant object,
1397 the number of bases to add to the coordinates
1398 (optional) the length of the parent sequence
1401 =cut
1403 sub _coord_adjust {
1404 my ( $self, $feat, $add, $length ) = @_;
1405 $self->throw( 'Object [$feat] '
1406 . 'of class ['
1407 . ref($feat)
1408 . '] should be a Bio::SeqFeatureI ' )
1409 unless $feat->isa('Bio::SeqFeatureI');
1410 my @adjsubfeat;
1411 for my $subfeat ( $feat->get_SeqFeatures ) {
1412 push @adjsubfeat, $self->_coord_adjust( $subfeat, $add, $length );
1414 my @loc;
1415 for ( $feat->location->each_Location ) {
1416 my @coords = ( $_->start, $_->end );
1417 my $strand = $_->strand;
1418 my $type = $_->location_type;
1419 foreach (@coords) {
1420 $self->throw("can not handle negative feature positions (got: $_)")
1421 if $_ < 0;
1422 if ( $add + $_ < 1 ) {
1423 $_ = '<1';
1425 elsif ( defined $length and $add + $_ > $length ) {
1426 $_ = ">$length";
1428 else {
1429 $_ = $add + $_;
1432 push @loc,
1433 $self->_location_objects_from_coordinate_list( [ \@coords ],
1434 $strand, $type );
1436 my $newfeat =
1437 Bio::SeqFeature::Generic->new( -primary => $feat->primary_tag );
1438 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1439 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1440 $newfeat->annotation->add_Annotation( $key, $value );
1443 foreach my $key ( $feat->get_all_tags() ) {
1444 $newfeat->add_tag_value( $key, $feat->get_tag_values($key) );
1446 my $loc = $self->_single_loc_object_from_collection(@loc);
1447 $loc ? $newfeat->location($loc) : return;
1448 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1449 return $newfeat;
1452 =head2 revcom_with_features
1454 Title : revcom_with_features
1455 Usage : $revcom=Bio::SeqUtils->revcom_with_features($seq);
1456 Function: Like Bio::Seq::revcom, but keeps features (adjusting coordinates
1457 as appropriate.
1458 Returns : A new sequence object
1459 Args : A sequence object
1462 =cut
1464 sub revcom_with_features {
1465 my ( $self, $seq ) = @_;
1466 $self->throw( 'Object [$seq] '
1467 . 'of class ['
1468 . ref($seq)
1469 . '] should be a Bio::SeqI ' )
1470 unless $seq->isa('Bio::SeqI');
1471 my $revcom = $seq->revcom;
1473 #move annotations
1474 foreach my $key ( $seq->annotation->get_all_annotation_keys() ) {
1475 foreach my $value ( $seq->annotation->get_Annotations($key) ) {
1476 $revcom->annotation->add_Annotation( $key, $value );
1480 #move features
1481 for ( map { $self->_feature_revcom( $_, $seq->length ) }
1482 reverse $seq->get_SeqFeatures )
1484 $revcom->add_SeqFeature($_);
1486 return $revcom;
1489 =head2 _feature_revcom
1491 Title : _feature_revcom
1492 Usage : my $newfeat=Bio::SeqUtils->_feature_revcom($feature, $seq->length);
1493 Function: Recursive subroutine to reverse complement a feature and
1494 all its subfeatures. The length of the parent sequence must be
1495 specified.
1497 Returns : A Bio::SeqFeatureI compliant object.
1498 Args : A Bio::SeqFeatureI compliant object,
1499 the length of the parent sequence
1502 =cut
1504 sub _feature_revcom {
1505 my ( $self, $feat, $length ) = @_;
1506 $self->throw( 'Object [$feat] '
1507 . 'of class ['
1508 . ref($feat)
1509 . '] should be a Bio::SeqFeatureI ' )
1510 unless $feat->isa('Bio::SeqFeatureI');
1511 my @adjsubfeat;
1512 for my $subfeat ( $feat->get_SeqFeatures ) {
1513 push @adjsubfeat, $self->_feature_revcom( $subfeat, $length );
1515 my @loc;
1516 for ( $feat->location->each_Location ) {
1517 my $type = $_->location_type;
1518 my $strand;
1519 if ( $_->strand == -1 ) { $strand = 1 }
1520 elsif ( $_->strand == 1 ) { $strand = -1 }
1521 else { $strand = $_->strand }
1522 my $newend =
1523 $self->_coord_revcom( $_->start, $_->start_pos_type, $length );
1524 my $newstart =
1525 $self->_coord_revcom( $_->end, $_->end_pos_type, $length );
1526 my $newstart_type = $_->end_pos_type;
1527 $newstart_type = 'BEFORE' if $_->end_pos_type eq 'AFTER';
1528 $newstart_type = 'AFTER' if $_->end_pos_type eq 'BEFORE';
1529 my $newend_type = $_->start_pos_type;
1530 $newend_type = 'BEFORE' if $_->start_pos_type eq 'AFTER';
1531 $newend_type = 'AFTER' if $_->start_pos_type eq 'BEFORE';
1532 push @loc,
1533 $self->_location_objects_from_coordinate_list(
1534 [ [ $newstart, $newend, $newstart_type, $newend_type ] ],
1535 $strand, $type );
1537 my $newfeat =
1538 Bio::SeqFeature::Generic->new( -primary => $feat->primary_tag );
1539 foreach my $key ( $feat->annotation->get_all_annotation_keys() ) {
1540 foreach my $value ( $feat->annotation->get_Annotations($key) ) {
1541 $newfeat->annotation->add_Annotation( $key, $value );
1544 foreach my $key ( $feat->get_all_tags() ) {
1545 $newfeat->add_tag_value( $key, $feat->get_tag_values($key) );
1548 my $loc = $self->_single_loc_object_from_collection(@loc);
1549 $loc ? $newfeat->location($loc) : return;
1551 $newfeat->add_SeqFeature($_) for @adjsubfeat;
1552 return $newfeat;
1555 sub _coord_revcom {
1556 my ( $self, $coord, $type, $length ) = @_;
1557 if ( $type eq 'BETWEEN' or $type eq 'WITHIN' ) {
1558 $coord =~ s/(\d+)(\D*)(\d+)/$length+1-$3.$2.$length+1-$1/ge;
1560 else {
1561 $coord =~ s/(\d+)/$length+1-$1/ge;
1562 $coord =~ tr/<>/></;
1563 $coord = '>' . $coord
1564 if $type eq 'BEFORE' and substr( $coord, 0, 1 ) ne '>';
1565 $coord = '<' . $coord
1566 if $type eq 'AFTER' and substr( $coord, 0, 1 ) ne '<';
1568 return $coord;
1571 =head2 evolve
1573 Title : evolve
1574 Usage : my $newseq = Bio::SeqUtils->
1575 evolve($seq, $similarity, $transition_transversion_rate);
1576 Function: Mutates the sequence by point mutations until the similarity of
1577 the new sequence has decreased to the required level.
1578 Transition/transversion rate is adjustable.
1579 Returns : A new Bio::PrimarySeq object
1580 Args : sequence object
1581 percentage similarity (e.g. 80)
1582 tr/tv rate, optional, defaults to 1 (= 1:1)
1584 Set the verbosity of the Bio::SeqUtils object to positive integer to
1585 see the mutations as they happen.
1587 This method works only on nucleotide sequences. It prints a warning if
1588 you set the target similarity to be less than 25%.
1590 Transition/transversion ratio is an observed attribute of an sequence
1591 comparison. We are dealing here with the transition/transversion rate
1592 that we set for our model of sequence evolution.
1594 =cut
1596 sub evolve {
1597 my ( $self, $seq, $sim, $rate ) = @_;
1598 $rate ||= 1;
1600 $self->throw( 'Object [$seq] '
1601 . 'of class ['
1602 . ref($seq)
1603 . '] should be a Bio::PrimarySeqI ' )
1604 unless $seq->isa('Bio::PrimarySeqI');
1606 $self->throw(
1607 "[$sim] " . ' should be a positive integer or float under 100' )
1608 unless $sim =~ /^[+\d.]+$/ and $sim <= 100;
1610 $self->warn(
1611 "Nucleotide sequences are 25% similar by chance.
1612 Do you really want to set similarity to [$sim]%?\n"
1613 ) unless $sim > 25;
1615 $self->throw('Only nucleotide sequences are supported')
1616 if $seq->alphabet eq 'protein';
1618 # arrays of possible changes have transitions as first items
1619 my %changes;
1620 $changes{'a'} = [ 't', 'c', 'g' ];
1621 $changes{'t'} = [ 'a', 'c', 'g' ];
1622 $changes{'c'} = [ 'g', 'a', 't' ];
1623 $changes{'g'} = [ 'c', 'a', 't' ];
1625 # given the desired rate, find out where cut off points need to be
1626 # when random numbers are generated from 0 to 100
1627 # we are ignoring identical mutations (e.g. A->A) to speed things up
1628 my $bin_size = 100 / ( $rate + 2 );
1629 my $transition = 100 - ( 2 * $bin_size );
1630 my $first_transversion = $transition + $bin_size;
1632 # unify the look of sequence strings
1633 my $string = lc $seq->seq; # lower case
1634 $string =~
1635 s/u/t/; # simplyfy our life; modules should deal with the change anyway
1636 # store the original sequence string
1637 my $oristring = $string;
1638 my $length = $seq->length;
1640 # stop evolving if the limit has been reached
1641 until ( $self->_get_similarity( $oristring, $string ) <= $sim ) {
1643 # find the location in the string to change
1644 my $loc = int( rand $length ) + 1;
1646 # nucleotide to change
1647 my $oldnuc = substr $string, $loc - 1, 1;
1648 my $newnuc;
1650 # nucleotide it is changed to
1651 my $choose = rand(100);
1652 if ( $choose < $transition ) {
1653 $newnuc = $changes{$oldnuc}[0];
1655 elsif ( $choose < $first_transversion ) {
1656 $newnuc = $changes{$oldnuc}[1];
1658 else {
1659 $newnuc = $changes{$oldnuc}[2];
1662 # do the change
1663 substr $string, $loc - 1, 1, $newnuc;
1665 $self->debug("$loc$oldnuc>$newnuc\n");
1668 return new Bio::PrimarySeq(
1669 -id => $seq->id . "-$sim",
1670 -description => $seq->description,
1671 -seq => $string
1675 sub _get_similarity {
1676 my ( $self, $oriseq, $seq ) = @_;
1678 my $len = length($oriseq);
1679 my $c;
1681 for ( my $i = 0 ; $i < $len ; $i++ ) {
1682 $c++ if substr( $oriseq, $i, 1 ) eq substr( $seq, $i, 1 );
1684 return 100 * $c / $len;