rework tempfile handling to explicitly close a filehanle - assigning it to undef...
[bioperl-run.git] / Bio / Tools / Run / Alignment / TCoffee.pm
blobfe05a4fd2e0ce14acd1c44139481e4e41f1560be
1 # $Id$
3 # BioPerl module for Bio::Tools::Run::Alignment::TCoffee
5 # Cared for by Jason Stajich, Peter Schattner
7 # Copyright Jason Stajich, Peter Schattner
9 # You may distribute this module under the same terms as perl itself
11 # POD documentation - main docs before the code
13 =head1 NAME
15 Bio::Tools::Run::Alignment::TCoffee - Object for the calculation of a
16 multiple sequence alignment from a set of unaligned sequences or
17 alignments using the TCoffee program
19 =head1 SYNOPSIS
21 # Build a tcoffee alignment factory
22 @params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
23 $factory = new Bio::Tools::Run::Alignment::TCoffee (@params);
25 # Pass the factory a list of sequences to be aligned.
26 $inputfilename = 't/cysprot.fa';
27 # $aln is a SimpleAlign object.
28 $aln = $factory->align($inputfilename);
30 # or where @seq_array is an array of Bio::Seq objects
31 $seq_array_ref = \@seq_array;
32 $aln = $factory->align($seq_array_ref);
34 # Or one can pass the factory a pair of (sub)alignments
35 #to be aligned against each other, e.g.:
37 # where $aln1 and $aln2 are Bio::SimpleAlign objects.
38 $aln = $factory->profile_align($aln1,$aln2);
40 # Or one can pass the factory an alignment and one or more
41 # unaligned sequences to be added to the alignment. For example:
43 # $seq is a Bio::Seq object.
44 $aln = $factory->profile_align($aln1,$seq);
46 There are various additional options and input formats available. See
47 the DESCRIPTION section that follows for additional details.
49 =head1 DESCRIPTION
51 Note: this DESCRIPTION only documents the (Bio)perl interface to
52 TCoffee.
54 There are a number of possible parameters one can pass in TCoffee.
55 One should really read the online manual for the best explaination of
56 all the features. See
57 http://igs-server.cnrs-mrs.fr/~cnotred/Documentation/t_coffee/t_coffee_doc.html
59 These can be specified as parameters when instantiating a new TCoffee
60 object, or through get/set methods of the same name (lowercase).
62 =head1 PARAMETERS FOR ALIGNMENT COMPUTATION
64 =head2 IN
66 Title : IN
67 Description : (optional) input filename, this is specified when
68 align so should not use this directly unless one
69 understand TCoffee program very well.
71 =head2 TYPE
73 Title : TYPE
74 Args : [string] DNA, PROTEIN
75 Description : (optional) set the sequence type, guessed automatically
76 so should not use this directly
78 =head2 PARAMETERS
80 Title : PARAMETERS
81 Description : (optional) Indicates a file containing extra parameters
83 =head2 EXTEND
85 Title : EXTEND
86 Args : 0, 1, or positive value
87 Default : 1
88 Description : Flag indicating that library extension should be
89 carried out when performing multiple alignments, if set
90 to 0 then extension is not made, if set to 1 extension
91 is made on all pairs in the library. If extension is
92 set to another positive value, the extension is only
93 carried out on pairs having a weigth value superior to
94 the specified limit.
96 =head2 DP_NORMALISE
98 Title : DP_NORMALISE
99 Args : 0 or positive value
100 Default : 1000
101 Description : When using a value different from 0, this flag sets the
102 score of the highest scoring pair to 1000.
104 =head2 DP_MODE
106 Title : DP_MODE
107 Args : [string] gotoh_pair_wise, myers_miller_pair_wise,
108 fasta_pair_wise cfasta_pair_wise
109 Default : cfast_fair_wise
110 Description : Indicates the type of dynamic programming used by
111 the program
113 gotoh_pair_wise : implementation of the gotoh algorithm
114 (quadratic in memory and time)
116 myers_miller_pair_wise : implementation of the Myers and Miller
117 dynamic programming algorithm ( quadratic in time and linear in
118 space). This algorithm is recommended for very long sequences. It
119 is about 2 time slower than gotoh. It only accepts tg_mode=1.
121 fasta_pair_wise: implementation of the fasta algorithm. The
122 sequence is hashed, looking for ktuples words. Dynamic programming
123 is only carried out on the ndiag best scoring diagonals. This is
124 much faster but less accurate than the two previous.
126 cfasta_pair_wise : c stands for checked. It is the same
127 algorithm. The dynamic programming is made on the ndiag best
128 diagonals, and then on the 2*ndiags, and so on until the scores
129 converge. Complexity will depend on the level of divergence of the
130 sequences, but will usually be L*log(L), with an accuracy
131 comparable to the two first mode ( this was checked on BaliBase).
133 =head2 KTUPLE
135 Title : KTUPLE
136 Args : numeric value
137 Default : 1 or 2 (1 for protein, 2 for DNA )
139 Description : Indicates the ktuple size for cfasta_pair_wise dp_mode
140 and fasta_pair_wise. It is set to 1 for proteins, and 2
141 for DNA. The alphabet used for protein is not the 20
142 letter code, but a mildly degenerated version, where
143 some residues are grouped under one letter, based on
144 physicochemical properties:
145 rk, de, qh, vilm, fy (the other residues are
146 not degenerated).
148 =head2 NDIAGS
150 Title : NDIAGS
151 Args : numeric value
152 Default : 0
153 Description : Indicates the number of diagonals used by the
154 fasta_pair_wise algorithm. When set to 0,
155 n_diag=Log (length of the smallest sequence)
157 =head2 DIAG_MODE
159 Title : DIAG_MODE
160 Args : numeric value
161 Default : 0
164 Description : Indicates the manner in which diagonals are scored
165 during the fasta hashing.
167 0 indicates that the score of a diagonal is equal to the
168 sum of the scores of the exact matches it contains.
171 1 indicates that this score is set equal to the score of
172 the best uninterrupted segment
174 1 can be useful when dealing with fragments of sequences.
176 =head2 SIM_MATRIX
178 Title : SIM_MATRIX
179 Args : string
180 Default : vasiliky
181 Description : Indicates the manner in which the amino acid is being
182 degenerated when hashing. All the substitution matrix
183 are acceptable. Categories will be defined as sub-group
184 of residues all having a positive substitution score
185 (they can overlap).
187 If you wish to keep the non degenerated amino acid
188 alphabet, use 'idmat'
190 =head2 MATRIX
192 Title : MATRIX
193 Args :
194 Default :
195 Description : This flag is provided for compatibility with
196 ClustalW. Setting matrix = 'blosum' is equivalent to
197 -in=Xblosum62mt , -matrix=pam is equivalent to
198 in=Xpam250mt . Apart from this, the rules are similar
199 to those applying when declaring a matrix with the
200 -in=X fl
202 =head2 GAPOPEN
204 Title : GAPOPEN
205 Args : numeric
206 Default : 0
207 Description : Indicates the penalty applied for opening a gap. The
208 penalty must be negative. If you provide a positive
209 value, it will automatically be turned into a negative
210 number. We recommend a value of 10 with pam matrices,
211 and a value of 0 when a library is used.
213 =head2 GAPEXT
215 Title : GAPEXT
216 Args : numeric
217 Default : 0
218 Description : Indicates the penalty applied for extending a gap.
221 =head2 COSMETIC_PENALTY
223 Title : COSMETIC_PENALTY
224 Args : numeric
225 Default : 100
226 Description : Indicates the penalty applied for opening a gap. This
227 penalty is set to a very low value. It will only have
228 an influence on the portions of the alignment that are
229 unalignable. It will not make them more correct, but
230 only more pleasing to the eye ( i.e. Avoid stretches of
231 lonely residues).
233 The cosmetic penalty is automatically turned off if a
234 substitution matrix is used rather than a library.
236 =head2 TG_MODE
238 Title : TG_MODE
239 Args : 0,1,2
240 Default : 1
241 Description : (Terminal Gaps)
242 0: indicates that terminal gaps must be panelized with
243 a gapopen and a gapext penalty.
244 1: indicates that terminal gaps must be penalized only
245 with a gapext penalty
246 2: indicates that terminal gaps must not be penalized.
248 =head2 WEIGHT
250 Title : WEIGHT
251 Args : sim or sim_<matrix_name or matrix_file> or integer value
252 Default : sim
255 Description : Weight defines the way alignments are weighted when
256 turned into a library.
258 sim indicates that the weight equals the average
259 identity within the match residues.
261 sim_matrix_name indicates the average identity with two
262 residues regarded as identical when their
263 substitution value is positive. The valid matrices
264 names are in matrices.h (pam250mt) . Matrices not
265 found in this header are considered to be
266 filenames. See the format section for matrices. For
267 instance, -weight=sim_pam250mt indicates that the
268 grouping used for similarity will be the set of
269 classes with positive substitutions. Other groups
270 include
272 sim_clustalw_col ( categories of clustalw
273 marked with :)
275 sim_clustalw_dot ( categories of clustalw
276 marked with .)
279 Value indicates that all the pairs found in the
280 alignments must be given the same weight equal to
281 value. This is useful when the alignment one wishes to
282 turn into a library must be given a pre-specified score
283 (for instance if they come from a structure
284 super-imposition program). Value is an integer:
286 -weight=1000
288 Note : Weight only affects methods that return an alignment to
289 T-Coffee, such as ClustalW. On the contrary, the
290 version of Lalign we use here returns a library where
291 weights have already been applied and are therefore
292 insensitive to the -weight flag.
294 =head2 SEQ_TO_ALIGN
296 Title : SEQ_TO_ALIGN
297 Args : filename
298 Default : no file - align all the sequences
300 Description : You may not wish to align all the sequences brought in
301 by the -in flag. Supplying the seq_to_align flag allows
302 for this, the file is simply a list of names in Fasta
303 format.
305 However, note that library extension will be carried out
306 on all the sequences.
308 =head1 PARAMETERS FOR TREE COMPUTATION AND OUTPUT
310 =head2 NEWTREE
312 Title : NEWTREE
313 Args : treefile
314 Default : no file
315 Description : Indicates the name of the new tree to compute. The
316 default will be <sequence_name>.dnd, or <run_name.dnd>.
317 Format is Phylips tree format
319 =head2 USETREE
321 Title : USETREE
322 Args : treefile
323 Default : no file specified
324 Description : This flag indicates that rather than computing a new
325 dendrogram, t_coffee can use a pre-computed one. The
326 tree files are in phylips format and compatible with
327 ClustalW. In most cases, using a pre-computed tree will
328 halve the computation time required by t_coffee. It is
329 also possible to use trees output by ClustalW or
330 Phylips. Format is Phylips tree format
332 =head2 TREE_MODE
334 Title : TREE_MODE
335 Args : slow, fast, very_fast
336 Default : very_fast
337 Description : This flag indicates the method used for computing the
338 dendrogram.
339 slow : the chosen dp_mode using the extended library,
340 fast : The fasta dp_mode using the extended library.
341 very_fast: The fasta dp_mode using pam250mt.
343 =head2 QUICKTREE
345 Title : QUICKTREE
346 Args :
347 Default :
348 Description : This flag is kept for compatibility with ClustalW.
349 It indicates that: -tree_mode=very_fast
351 =head1 PARAMETERS FOR ALIGNMENT OUTPUT
353 =head2 OUTFILE
355 Title : OUTFILE
356 Args : out_aln file, default, no
357 Default : default ( yourseqfile.aln)
358 Description : indicates name of output alignment file
360 =head2 OUTPUT
362 Title : OUTPUT
363 Args : format1, format2
364 Default : clustalw
365 Description : Indicated format for outputting outputfile
366 Supported formats are:
368 clustalw_aln, clustalw: ClustalW format.
369 gcg, msf_aln : Msf alignment.
370 pir_aln : pir alignment.
371 fasta_aln : fasta alignment.
372 phylip : Phylip format.
373 pir_seq : pir sequences (no gap).
374 fasta_seq : fasta sequences (no gap).
375 As well as:
376 score_html : causes the output to be a reliability
377 plot in HTML
378 score_pdf : idem in PDF.
379 score_ps : idem in postscript.
381 More than one format can be indicated:
382 -output=clustalw,gcg, score_html
384 =head2 CASE
386 Title : CASE
387 Args : upper, lower
388 Default : upper
389 Description : triggers choice of the case for output
391 =head2 CPU
393 Title : CPU
394 Args : value
395 Default : 0
396 Description : Indicates the cpu time (micro seconds) that must be
397 added to the t_coffee computation time.
399 =head2 OUT_LIB
401 Title : OUT_LIB
402 Args : name of library, default, no
403 Default : default
404 Description : Sets the name of the library output. Default implies
405 <run_name>.tc_lib
407 =head2 OUTORDER
409 Title : OUTORDER
410 Args : input or aligned
411 Default : input
412 Description : Sets the name of the library output. Default implies
413 <run_name>.tc_lib
415 =head2 SEQNOS
417 Title : SEQNOS
418 Args : on or off
419 Default : off
420 Description : Causes the output alignment to contain residue numbers
421 at the end of each line:
423 =head1 PARAMETERS FOR GENERIC OUTPUT
425 =head2 RUN_NAME
427 Title : RUN_NAME
428 Args : your run name
429 Default :
430 Description : This flag causes the prefix <your sequences> to be
431 replaced by <your run name> when renaming the default
432 files.
434 =head2 ALIGN
436 Title : ALIGN
437 Args :
438 Default :
439 Description : Indicates that the program must produce the
440 alignment. This flag is here for compatibility with
441 ClustalW
443 =head2 QUIET
445 Title : QUIET
446 Args : stderr, stdout, or filename, or nothing
447 Default : stderr
448 Description : Redirects the standard output to either a file.
449 -quiet on its own redirect the output to /dev/null.
451 =head2 CONVERT
453 Title : CONVERT
454 Args :
455 Default :
456 Description : Indicates that the program must not compute the
457 alignment but simply convert all the sequences,
458 alignments and libraries into the format indicated with
459 -output. This flag can also be used if you simply want
460 to compute a library ( i.e. You have an alignment and
461 you want to turn it into a library).
463 =head1 FEEDBACK
465 =head2 Mailing Lists
467 User feedback is an integral part of the evolution of this and other
468 Bioperl modules. Send your comments and suggestions preferably to one
469 of the Bioperl mailing lists. Your participation is much appreciated.
471 bioperl-l@bioperl.org - General discussion
472 http://bio.perl.org/MailList.html - About the mailing lists
474 =head2 Reporting Bugs
476 Report bugs to the Bioperl bug tracking system to help us keep track
477 the bugs and their resolution. Bug reports can be submitted via
478 email or the web:
480 bioperl-bugs@bio.perl.org
481 http://bio.perl.org/bioperl-bugs/
483 =head1 AUTHOR - Jason Stajich, Peter Schattner
485 Email jason@bioperl.org, schattner@alum.mit.edu
487 =head1 APPENDIX
489 The rest of the documentation details each of the object
490 methods. Internal methods are usually preceded with a _
492 =cut
494 package Bio::Tools::Run::Alignment::TCoffee;
496 use vars qw($AUTOLOAD @ISA $PROGRAMNAME $PROGRAM %DEFAULTS
497 @TCOFFEE_PARAMS @TCOFFEE_SWITCHES @OTHER_SWITCHES %OK_FIELD
499 use strict;
500 use Bio::Seq;
501 use Bio::SeqIO;
502 use Bio::SimpleAlign;
503 use Bio::AlignIO;
504 use Bio::Root::Root;
505 use Bio::Root::IO;
506 use Bio::Factory::ApplicationFactoryI;
507 use Bio::Tools::Run::WrapperBase;
508 @ISA = qw(Bio::Root::Root Bio::Tools::Run::WrapperBase
509 Bio::Factory::ApplicationFactoryI);
511 # You will need to enable TCoffee to find the tcoffee program. This can be done
512 # in (at least) twp ways:
513 # 1. define an environmental variable TCOFFEE:
514 # export TCOFFEEDIR=/home/progs/tcoffee or
515 # 2. include a definition of an environmental variable TCOFFEEDIR
516 # in every script that will
517 # use Bio::Tools::Run::Alignment::TCoffee.pm.
518 # BEGIN {$ENV{TCOFFEEDIR} = '/home/progs/tcoffee'; }
520 BEGIN {
521 %DEFAULTS = ( 'MATRIX' => 'blosum',
522 'OUTPUT' => 'clustalw',
523 'AFORMAT'=> 'msf',
524 'METHODS' => [qw(Mlalign_id_pair Mclustalw_pair)]
528 @TCOFFEE_PARAMS = qw(IN TYPE PARAMETERS DO_NORMALISE EXTEND
529 DP_MODE KTUPLE NDIAGS DIAG_MODE SIM_MATRIX
530 MATRIX GAPOPEN GAPEXT COSMETIC_PENALTY TG_MODE
531 WEIGHT SEQ_TO_ALIGN NEWTREE USETREE TREE_MODE
532 OUTFILE OUTPUT CASE CPU OUT_LIB OUTORDER SEQNOS
533 RUN_NAME CONVERT
536 @TCOFFEE_SWITCHES = qw(QUICKTREE);
538 @OTHER_SWITCHES = qw(QUIET ALIGN KEEPDND);
540 # Authorize attribute fields
541 foreach my $attr ( @TCOFFEE_PARAMS, @TCOFFEE_SWITCHES, @OTHER_SWITCHES ) {
542 $OK_FIELD{$attr}++; }
545 =head2 program_name
547 Title : program_name
548 Usage : $factory>program_name()
549 Function: holds the program name
550 Returns: string
551 Args : None
553 =cut
555 sub program_name {
556 return 't_coffee';
559 =head2 program_dir
561 Title : program_dir
562 Usage : $factory->program_dir(@params)
563 Function: returns the program directory, obtiained from ENV variable.
564 Returns: string
565 Args :
567 =cut
569 sub program_dir {
570 return Bio::Root::IO->catfile($ENV{TCOFFEEDIR}) if $ENV{TCOFFEEDIR};
573 sub new {
574 my ($class,@args) = @_;
575 my $self = $class->SUPER::new(@args);
576 my ($attr, $value);
578 while (@args) {
579 $attr = shift @args;
580 $value = shift @args;
581 next if( $attr =~ /^-/); # don't want named parameters
582 $self->$attr($value);
584 $self->matrix($DEFAULTS{'MATRIX'}) unless( $self->matrix );
585 $self->output($DEFAULTS{'OUTPUT'}) unless( $self->output );
586 # $self->aformat($DEFAULTS{'AFORMAT'}) unless $self->aformat;
588 $self->methods($DEFAULTS{'METHODS'}) unless $self->methods;
590 return $self;
593 sub AUTOLOAD {
594 my $self = shift;
595 my $attr = $AUTOLOAD;
596 $attr =~ s/.*:://;
597 $attr = uc $attr;
598 # aliasing
599 $attr = 'OUTFILE' if $attr eq 'OUTFILE_NAME';
600 $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr};
602 $self->{$attr} = shift if @_;
603 return $self->{$attr};
606 =head2 error_string
608 Title : error_string
609 Usage : $obj->error_string($newval)
610 Function: Where the output from the last analysus run is stored.
611 Returns : value of error_string
612 Args : newvalue (optional)
615 =cut
617 sub error_string{
618 my ($self,$value) = @_;
619 if( defined $value) {
620 $self->{'error_string'} = $value;
622 return $self->{'error_string'};
626 =head2 version
628 Title : version
629 Usage : exit if $prog->version() < 1.8
630 Function: Determine the version number of the program
631 Example :
632 Returns : float or undef
633 Args : none
635 =cut
637 sub version {
638 my ($self) = @_;
639 my $exe;
640 return undef unless $exe = $self->executable;
641 my $string = `$exe -quiet=stdout 2>&1` ;
642 $string =~ /Version_([\d.]+)/;
643 return $1 || undef;
647 =head2 run
649 Title : run
650 Usage : my $output = $application->run(-seq => $seq,
651 -profile => $profile,
652 -type => 'profile-aln');
653 Function: Generic run of an application
654 Returns : Bio::SimpleAlign object
655 Args : key-value parameters allowed for TCoffee runs AND
656 -type => profile-aln or alignment for profile alignments or
657 just multiple sequence alignment
658 -seq => either Bio::PrimarySeqI object OR
659 array ref of Bio::PrimarySeqI objects OR
660 filename of sequences to run with
661 -profile => profile to align to, if this is an array ref
662 will specify the first two entries as the two
663 profiles to align to each other
668 =cut
670 sub run{
671 my ($self,@args) = @_;
672 my ($type,$seq,$profile) = $self->_rearrange([qw(TYPE
674 PROFILE)],
675 @args);
676 if( $type =~ /align/i ) {
677 return $self->align($seq);
678 } elsif( $type =~ /profile/i ) {
679 return $self->profile_align($seq,$profile);
680 } else {
681 $self->warn("unrecognized alignment type $type\n");
683 return undef;
686 =head2 align
688 Title : align
689 Usage :
690 $inputfilename = 't/cysprot.fa';
691 $aln = $factory->align($inputfilename);
693 $seq_array_ref = \@seq_array; @seq_array is array of Seq objs
694 $aln = $factory->align($seq_array_ref);
695 Function: Perform a multiple sequence alignment
696 Example :
697 Returns : Reference to a SimpleAlign object containing the
698 sequence alignment.
699 Args : Name of a file containing a set of unaligned fasta sequences
700 or else an array of references to Bio::Seq objects.
702 Throws an exception if argument is not either a string (eg a
703 filename) or a reference to an array of Bio::Seq objects. If
704 argument is string, throws exception if file corresponding to string
705 name can not be found. If argument is Bio::Seq array, throws
706 exception if less than two sequence objects are in array.
708 =cut
710 sub align {
711 my ($self,$input) = @_;
712 # Create input file pointer
713 $self->io->_io_cleanup();
714 my ($infilename,$type) = $self->_setinput($input);
715 if (!$infilename) {
716 $self->throw("Bad input data or less than 2 sequences in $input !");
719 my $param_string = $self->_setparams();
721 # run tcoffee
722 return &_run($self, 'align', [$infilename,$type], $param_string);
724 #################################################
726 =head2 profile_align
728 Title : profile_align
729 Usage :
730 Function: Perform an alignment of 2 (sub)alignments
731 Example :
732 Returns : Reference to a SimpleAlign object containing the (super)alignment.
733 Args : Names of 2 files containing the subalignments
734 or references to 2 Bio::SimpleAlign objects.
736 Throws an exception if arguments are not either strings (eg filenames)
737 or references to SimpleAlign objects.
740 =cut
742 sub profile_align {
743 my $self = shift;
744 my $input1 = shift;
745 my $input2 = shift;
746 my ($temp,$infilename1,$infilename2,$type1,$type2,$input,$seq);
748 $self->io->_io_cleanup();
749 # Create input file pointers
750 ($infilename1,$type1) = $self->_setinput($input1);
751 ($infilename2,$type2) = $self->_setinput($input2);
753 if (!$infilename1 || !$infilename2) {
754 $self->throw("Bad input data: $input1 or $input2 !");
757 my $param_string = $self->_setparams();
759 # run tcoffee
760 my $aln = $self->_run('profile-aln',
761 [$infilename1,$type1],
762 [$infilename2,$type2],
763 $param_string);
766 #################################################
768 =head2 _run
770 Title : _run
771 Usage : Internal function, not to be called directly
772 Function: makes actual system call to tcoffee program
773 Example :
774 Returns : nothing; tcoffee output is written to a
775 temporary file OR specified output file
776 Args : Name of a file containing a set of unaligned fasta sequences
777 and hash of parameters to be passed to tcoffee
780 =cut
782 sub _run {
783 my ($infilename, $infile1,$infile2) = ('','','');
784 my $self = shift;
785 my $command = shift;
786 my $instring;
787 if ($command =~ /align/) {
788 my $infile = shift ;
789 my $type;
790 ($infilename,$type) = @$infile;
791 $instring = '-in='.join(',',($infilename, 'X'.$self->matrix,
792 $self->methods));
794 if ($command =~ /profile/) {
795 my $in1 = shift ;
796 my $in2 = shift ;
797 my ($type1,$type2);
798 ($infile1,$type1) = @$in1;
799 ($infile2,$type2) = @$in2;
800 $instring = '-in='.join(',',($type1.$infile1, $type2.$infile2,
801 'X'.$self->matrix,
802 $self->methods));
804 my $param_string = shift;
805 # my ($paramfh,$parameterFile) = $self->io->tempfile;
806 # print $paramfh ( "$instring\n-output=gcg$param_string") ;
807 # close($paramfh);
808 # my $commandstring = "t_coffee -output=gcg -parameters $parameterFile" ; ##MJL
810 my $commandstring = $self->executable." $instring $param_string";
812 $self->debug( "tcoffee command = $commandstring \n");
814 my $status = system($commandstring);
815 my $outfile = $self->outfile();
817 $self->throw( "TCoffee call crashed: $? [command $commandstring]\n")
818 if( !-e $outfile || -z $outfile );
820 # unlink ($parameterFile);
823 # retrieve alignment (Note: MSF format for AlignIO = GCG format of
824 # tcoffee)
826 my $in = Bio::AlignIO->new('-file' => $outfile, '-format' =>
827 $self->output);
828 my $aln = $in->next_aln();
830 # Replace file suffix with dnd to find name of dendrogram file(s) to delete
831 if( ! $self->keepdnd ) {
832 foreach my $f ( $infilename, $infile1, $infile2 ) {
833 next if( !defined $f || $f eq '');
834 $f =~ s/\.[^\.]*$// ;
835 # because TCoffee writes these files to the CWD
836 if( $Bio::Root::IO::PATHSEP ) {
837 my @line = split(/$Bio::Root::IO::PATHSEP/, $f);
838 $f = pop @line;
839 } else {
840 (undef, undef, $f) = File::Spec->splitpath($f);
842 unlink $f .'.dnd' if( $f ne '' );
845 return $aln;
849 =head2 _setinput
851 Title : _setinput
852 Usage : Internal function, not to be called directly
853 Function: Create input file for tcoffee program
854 Example :
855 Returns : name of file containing tcoffee data input AND
856 type of file (if known, S for sequence, L for sequence library,
857 A for sequence alignment)
858 Args : Seq or Align object reference or input file name
861 =cut
863 sub _setinput {
864 my ($self,$input) = @_;
865 my ($infilename, $seq, $temp, $tfh);
866 # If $input is not a reference it better be the name of a
867 # file with the sequence/ alignment data...
868 my $type = '';
869 if (! ref $input) {
870 # check that file exists or throw
871 $infilename = $input;
872 unless (-e $input) {return 0;}
873 # let's peek and guess
874 open(IN,$infilename) || $self->throw("Cannot open $infilename");
875 my $header = <IN>;
876 if( $header =~ /^\s+\d+\s+\d+/ ||
877 $header =~ /Pileup/i ||
878 $header =~ /clustal/i ) { # phylip
879 $type = 'A';
881 close(IN);
882 return ($infilename,$type);
883 } elsif (ref($input) =~ /ARRAY/i ) { # $input may be an
884 # array of BioSeq objects...
885 # Open temporary file for both reading & writing of array
886 ($tfh,$infilename) = $self->io->tempfile();
887 if( ! ref($input->[0]) ) {
888 $self->warn("passed an array ref which did not contain objects to _setinput");
889 return undef;
890 } elsif( $input->[0]->isa('Bio::PrimarySeqI') ) {
891 $temp = Bio::SeqIO->new('-fh' => $tfh,
892 '-format' => 'fasta');
893 my $ct = 1;
894 foreach $seq (@$input) {
895 return 0 unless ( ref($seq) &&
896 $seq->isa("Bio::PrimarySeqI") );
897 if( ! defined $seq->display_id ||
898 $seq->display_id =~ /^\s+$/) {
899 $seq->display_id( "Seq".$ct++);
901 $temp->write_seq($seq);
903 $temp->close();
904 undef $temp;
905 close($tfh);
906 $tfh = undef;
907 $type = 'S';
908 } elsif( $input->[0]->isa('Bio::Align::AlignI' ) ) {
909 $temp = Bio::AlignIO->new('-fh' => $tfh,
910 '-format' => $self->aformat);
911 foreach my $aln (@$input) {
912 next unless ( ref($aln) &&
913 $aln->isa("Bio::Align::AlignI") );
914 $temp->write_aln($aln);
916 $temp->close();
917 undef $temp;
918 close($tfh);
919 $tfh = undef;
920 $type = 'A';
921 } else {
922 $self->warn( "got an array ref with 1st entry ".
923 $input->[0].
924 " and don't know what to do with it\n");
927 return ($infilename,$type);
928 # $input may be a SimpleAlign object.
929 } elsif ( $input->isa("Bio::Align::AlignI") ) {
930 # Open temporary file for both reading & writing of SimpleAlign object
931 ($tfh, $infilename) = $self->io->tempfile();
932 $temp = Bio::AlignIO->new(-fh=>$tfh,
933 '-format' => 'clustalw');
934 $temp->write_aln($input);
935 close($tfh);
936 undef $tfh;
937 return ($infilename,'A');
940 # or $input may be a single BioSeq object (to be added to
941 # a previous alignment)
942 elsif ( $input->isa("Bio::PrimarySeqI")) {
943 # Open temporary file for both reading & writing of BioSeq object
944 ($tfh,$infilename) = $self->io->tempfile();
945 $temp = Bio::SeqIO->new(-fh=> $tfh, '-format' =>'Fasta');
946 $temp->write_seq($input);
947 $temp->close();
948 close($tfh);
949 undef $tfh;
950 return ($infilename,'S');
951 } else {
952 $self->warn("Got $input and don't know what to do with it\n");
954 return 0;
958 =head2 _setparams
960 Title : _setparams
961 Usage : Internal function, not to be called directly
962 Function: Create parameter inputs for tcoffee program
963 Example :
964 Returns : parameter string to be passed to tcoffee
965 during align or profile_align
966 Args : name of calling object
968 =cut
970 sub _setparams {
971 my ($self) = @_;
972 my ($attr, $value,$param_string);
973 $param_string = '';
974 my $laststr;
975 for $attr ( @TCOFFEE_PARAMS ) {
976 $value = $self->$attr();
977 next unless (defined $value);
978 my $attr_key = lc $attr;
979 if( $attr_key =~ /matrix/ ) {
980 $self->{'_in'} = [ "X".lc($value) ];
981 } else {
982 $attr_key = ' -'.$attr_key;
983 $param_string .= $attr_key .'='.$value;
986 for $attr ( @TCOFFEE_SWITCHES) {
987 $value = $self->$attr();
988 next unless ($value);
989 my $attr_key = lc $attr; #put switches in format expected by tcoffee
990 $attr_key = ' -'.$attr_key;
991 $param_string .= $attr_key ;
994 # Set default output file if no explicit output file selected
995 unless ($self->outfile ) {
996 my ($tfh, $outfile) = $self->io->tempfile(-dir=>$self->tempdir());
997 close($tfh);
998 undef $tfh;
999 $self->outfile($outfile);
1000 $param_string .= " -outfile=$outfile" ;
1003 if ($self->quiet() || $self->verbose < 0) { $param_string .= ' -quiet';}
1004 return $param_string;
1007 =head2 aformat
1009 Title : aformat
1010 Usage : my $alignmentformat = $self->aformat();
1011 Function: Get/Set alignment format
1012 Returns : string
1013 Args : string
1016 =cut
1018 sub aformat{
1019 my $self = shift;
1020 $self->{'_aformat'} = shift if @_;
1021 return $self->{'_aformat'};
1025 =head2 methods
1027 Title : methods
1028 Usage : my @methods = $self->methods()
1029 Function: Get/Set Alignment methods - NOT VALIDATED
1030 Returns : array of strings
1031 Args : arrayref of strings
1034 =cut
1036 sub methods{
1037 my ($self) = shift;
1039 @{$self->{'_methods'}} = @{shift || []} if @_;
1040 return @{$self->{'_methods'} || []};
1044 =head1 Bio::Tools::Run::BaseWrapper methods
1046 =cut
1048 =head2 no_param_checks
1050 Title : no_param_checks
1051 Usage : $obj->no_param_checks($newval)
1052 Function: Boolean flag as to whether or not we should
1053 trust the sanity checks for parameter values
1054 Returns : value of no_param_checks
1055 Args : newvalue (optional)
1058 =cut
1060 =head2 save_tempfiles
1062 Title : save_tempfiles
1063 Usage : $obj->save_tempfiles($newval)
1064 Function:
1065 Returns : value of save_tempfiles
1066 Args : newvalue (optional)
1069 =cut
1071 =head2 outfile_name
1073 Title : outfile_name
1074 Usage : my $outfile = $tcoffee->outfile_name();
1075 Function: Get/Set the name of the output file for this run
1076 (if you wanted to do something special)
1077 Returns : string
1078 Args : [optional] string to set value to
1081 =cut
1084 =head2 tempdir
1086 Title : tempdir
1087 Usage : my $tmpdir = $self->tempdir();
1088 Function: Retrieve a temporary directory name (which is created)
1089 Returns : string which is the name of the temporary directory
1090 Args : none
1093 =cut
1095 =head2 cleanup
1097 Title : cleanup
1098 Usage : $tcoffee->cleanup();
1099 Function: Will cleanup the tempdir directory
1100 Returns : none
1101 Args : none
1104 =cut
1106 =head2 io
1108 Title : io
1109 Usage : $obj->io($newval)
1110 Function: Gets a L<Bio::Root::IO> object
1111 Returns : L<Bio::Root::IO>
1112 Args : none
1115 =cut
1117 1; # Needed to keep compiler happy