Clarify warnings when defaulting the value of end()
# $Id$
#
# BioPerl module for Bio::AlignIO
#
#       based on the Bio::SeqIO module
#       by Ewan Birney <birney@ebi.ac.uk>
#       and Lincoln Stein <lstein@cshl.org>
#
# Copyright Peter Schattner
#
# You may distribute this module under the same terms as perl itself
#
# History
# September, 2000 AlignIO written by Peter Schattner

# POD documentation - main docs before the code

=head1 NAME

Bio::AlignIO - Handler for AlignIO Formats

=head1 SYNOPSIS

    use Bio::AlignIO;

    $inputfilename = "testaln.fasta";
    $in  = Bio::AlignIO->new(-file   => $inputfilename,
                             -format => 'fasta');
    $out = Bio::AlignIO->new(-file   => ">out.aln.pfam",
                             -format => 'pfam');

    while ( my $aln = $in->next_aln() ) {
        $out->write_aln($aln);
    }

    # OR

    use Bio::AlignIO;

    open my $MYIN, '<', 'testaln.fasta' or die "Cannot read testaln.fasta: $!";
    $in  = Bio::AlignIO->newFh(-fh     => $MYIN,
                               -format => 'fasta');
    open my $MYOUT, '>', 'testaln.pfam' or die "Cannot write testaln.pfam: $!";
    $out = Bio::AlignIO->newFh(-fh     => $MYOUT,
                               -format => 'pfam');

    # World's smallest Fasta<->pfam format converter:
    print $out $_ while <$in>;

=head1 DESCRIPTION

L<Bio::AlignIO> is a handler module for the formats in the AlignIO set,
for example, L<Bio::AlignIO::fasta>. It is the officially sanctioned way
of getting at the alignment objects. The resulting alignment is a
L<Bio::Align::AlignI>-compliant object.

The idea is that you request an object for a particular format.
All the objects have a notion of an internal file that is read
from or written to. A particular AlignIO object instance is configured
for either input or output; you can think of it as a stream object.

Each object has functions:

    $stream->next_aln();

And:

    $stream->write_aln($aln);

Also:

    $stream->type() # returns 'INPUT' or 'OUTPUT'

As an added bonus, you can recover a filehandle that is tied to the
AlignIO object, allowing you to use the standard E<lt>E<gt> and print
operations to read and write alignment objects:

    use Bio::AlignIO;

    # read from standard input
    $stream = Bio::AlignIO->newFh(-format => 'Fasta');

    while ( $aln = <$stream> ) {
        # do something with $aln
    }

And:

    print $stream $aln; # when stream is in output mode

L<Bio::AlignIO> is patterned on the L<Bio::SeqIO> module and shares
most of its features. One significant difference is that
L<Bio::AlignIO> usually handles IO for only a single alignment at a time,
whereas L<Bio::SeqIO> handles IO for multiple sequences in a single stream.
The principal reason for this is that whereas simultaneously handling
multiple sequences is a common requirement, simultaneous handling of
multiple alignments is not. The only current exception is the format
C<bl2seq>, which parses results of the BLAST C<bl2seq> program and
may produce several alignment pairs. This set of alignment pairs can
be read using multiple calls to C<next_aln()>.

=head1 CONSTRUCTORS

=head2 Bio::AlignIO-E<gt>new()

    $seqIO = Bio::AlignIO->new(-file => 'filename',   -format => $format);
    $seqIO = Bio::AlignIO->new(-fh   => \*FILEHANDLE, -format => $format);
    $seqIO = Bio::AlignIO->new(-format => $format);
    $seqIO = Bio::AlignIO->new(-fh   => \*STDOUT,     -format => $format);

The C<new()> class method constructs a new L<Bio::AlignIO> object.
The returned object can be used to retrieve or print alignment
objects. C<new()> accepts the following parameters:

=over 4

=item -file

A file path to be opened for reading or writing. The usual Perl
conventions apply:

    'file'       # open file for reading
    '>file'      # open file for writing
    '>>file'     # open file for appending
    '+<file'     # open file read/write
    'command |'  # open a pipe from the command
    '| command'  # open a pipe to the command

=item -fh

You may provide new() with a previously-opened filehandle. For
example, to read from STDIN:

    $seqIO = Bio::AlignIO->new(-fh => \*STDIN);

Note that you must pass filehandles as references to globs.

If neither a filehandle nor a filename is specified, then the module
will read from the @ARGV array or STDIN, using the familiar E<lt>E<gt>
semantics.

=item -format

Specify the format of the file. Supported formats include:

    bl2seq      Bl2seq Blast output
    clustalw    clustalw (.aln) format
    emboss      EMBOSS water and needle format
    fasta       FASTA format
    maf         Multiple Alignment Format
    mase        mase (seaview) format
    mega        MEGA format
    meme        MEME format
    msf         msf (GCG) format
    nexus       Swofford et al NEXUS format
    pfam        Pfam sequence alignment format
    phylip      Felsenstein PHYLIP format
    prodom      prodom (protein domain) format
    psi         PSI-BLAST format
    selex       selex (hmmer) format
    stockholm   stockholm format

Currently only those formats which were implemented in L<Bio::SimpleAlign>
have been incorporated into L<Bio::AlignIO>. Specifically, C<mase>, C<stockholm>
and C<prodom> have only been implemented for input. See the specific module
(e.g. L<Bio::AlignIO::prodom>) for notes on supported versions.

If no format is specified and a filename is given, then the module
will attempt to deduce it from the filename suffix. If this is unsuccessful,
C<fasta> format is assumed.

The format name is case insensitive; C<FASTA>, C<Fasta> and C<fasta> are
all treated equivalently.

=back

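
The suffix-based deduction mentioned under C<-format> can be pictured with
the standalone sketch below. It is only an illustration, not the module's
implementation: C<guess_by_suffix> is a hypothetical helper, and it copies
just a handful of the suffix patterns from this module's C<_guess_format>.

    use strict;
    use warnings;

    # Hypothetical helper: map a filename suffix to a format name.
    # Returns nothing (undef in scalar context) when no pattern matches.
    sub guess_by_suffix {
        my ($filename) = @_;
        return unless defined $filename;
        return 'clustalw'  if $filename =~ /\.aln$/i;
        return 'fasta'     if $filename =~ /\.(fasta|fast|seq|fa|fsa|nt|aa)$/i;
        return 'nexus'     if $filename =~ /\.(nexus|nex)$/i;
        return 'pfam'      if $filename =~ /\.(pfam|pfm)$/i;
        return 'stockholm' if $filename =~ /\.stk$/i;
        return;
    }

    # The file names here are made up for the demonstration.
    for my $file (qw(prot.aln testaln.fasta tree.NEX unknown.xyz)) {
        my $format = guess_by_suffix($file);
        printf "%-14s => %s\n", $file, defined $format ? $format : '(no match)';
    }

Passing C<-format> explicitly avoids relying on the file name at all.
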
=head2 Bio::AlignIO-E<gt>newFh()

    $fh = Bio::AlignIO->newFh(-fh => \*FILEHANDLE, -format => $format);
    # read from STDIN or use @ARGV:
    $fh = Bio::AlignIO->newFh(-format => $format);

This constructor behaves like C<new()>, but returns a tied filehandle
rather than a L<Bio::AlignIO> object. You can read alignments from this
object using the familiar E<lt>E<gt> operator, and write to it using
C<print>. The usual array and $_ semantics work. For example, you can
read all alignment objects into an array like this:

    @alignments = <$fh>;

Other operations, such as read(), sysread(), write(), close(), and printf()
are not supported.

=over 1

=item -flush

By default, all files (or filehandles) opened for writing alignments
will be flushed after each write_aln(), making the file immediately
usable. If you do not need this facility and would like to marginally
improve the efficiency of writing multiple alignments to the same file
(or filehandle), pass the -flush option '0' or any other value that
evaluates as defined but false:

    my $clustal = Bio::AlignIO->new(-file   => "<prot.aln",
                                    -format => "clustalw");
    my $msf     = Bio::AlignIO->new(-file   => ">prot.msf",
                                    -format => "msf",
                                    -flush  => 0); # go as fast as we can!
    while ( my $aln = $clustal->next_aln ) { $msf->write_aln($aln) }

=back

=head1 OBJECT METHODS

See below for more detailed summaries. The main methods are:

=head2 $alignment = $AlignIO-E<gt>next_aln()

Fetch an alignment from a formatted file.

=head2 $AlignIO-E<gt>write_aln($aln)

Write the specified alignment to a file.

=head2 TIEHANDLE(), READLINE(), PRINT()

These provide the tie interface. See L<perltie> for more details.

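
The following self-contained sketch, using only core Perl, shows how a tied
filehandle of this kind can work. It is an illustration under stated
assumptions rather than this module's code: C<My::Queue>, C<next_item> and
C<write_item> are hypothetical stand-ins for a format-specific AlignIO class
and its C<next_aln>/C<write_aln> methods, and plain strings stand in for
alignment objects.

    package My::Queue;
    use strict;
    use warnings;
    use Symbol ();

    # Hypothetical stream class that keeps its items in memory.
    sub new {
        my ($class, @items) = @_;
        return bless { in => [@items], out => [] }, $class;
    }
    sub next_item  { my $self = shift; return shift @{ $self->{in} } }
    sub write_item { my $self = shift; push @{ $self->{out} }, @_; return 1 }

    # Return a fresh glob tied to this class, wrapping the stream object.
    sub fh {
        my $self = shift;
        my $glob = Symbol::gensym();
        tie *$glob, ref($self), $self;
        return $glob;
    }

    # Tie interface: the tied object keeps a reference to the stream.
    sub TIEHANDLE {
        my ($class, $stream) = @_;
        return bless { stream => $stream }, $class;
    }

    sub READLINE {
        my $self = shift;
        # Scalar context: hand back one item per read.
        return $self->{stream}->next_item unless wantarray;
        # List context: drain the stream.
        my @all;
        while ( defined( my $item = $self->{stream}->next_item ) ) {
            push @all, $item;
        }
        return @all;
    }

    sub PRINT { my $self = shift; return $self->{stream}->write_item(@_) }

    package main;

    my $stream = My::Queue->new('aln1', 'aln2');
    my $fh     = $stream->fh;
    while ( my $item = <$fh> ) { print "read $item\n" }
    print $fh 'aln3';    # dispatched to PRINT, stored in $stream->{out}

Because only TIEHANDLE, READLINE and PRINT are defined, other filehandle
operations are not available on such a handle.
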
=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR - Peter Schattner

Email: schattner@alum.mit.edu

=head1 CONTRIBUTORS

Jason Stajich, jason@bioperl.org

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::AlignIO;

use strict;

use Bio::Seq;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::Tools::GuessSeqFormat;
use base qw(Bio::Root::Root Bio::Root::IO);

=head2 new

 Title   : new
 Usage   : $stream = Bio::AlignIO->new(-file   => $filename,
                                       -format => 'Format')
 Function: Returns a new alignment stream
 Returns : A Bio::AlignIO::Handler initialised with
           the appropriate format
 Args    : -file   => $filename
           -format => format
           -fh     => filehandle to attach to
           -displayname_flat => 1 [optional]
                                to force the displayname to not show start/end
                                information

=cut

sub new {
    my ($caller,@args) = @_;
    my $class = ref($caller) || $caller;

    # or do we want to call SUPER on an object if $caller is an
    # object?
    if( $class =~ /Bio::AlignIO::(\S+)/ ) {
        my ($self) = $class->SUPER::new(@args);
        $self->_initialize(@args);
        return $self;
    } else {

        my %param = @args;
        @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
        my $format = $param{'-format'} ||
            $class->_guess_format( $param{-file} || $ARGV[0] );
        unless ($format) {
            if ($param{-file}) {
                $format = Bio::Tools::GuessSeqFormat->new(-file => $param{-file}||$ARGV[0] )->guess;
            }
            elsif ($param{-fh}) {
                $format = Bio::Tools::GuessSeqFormat->new(-fh => $param{-fh}||$ARGV[0] )->guess;
            }
        }
        $format = "\L$format"; # normalize capitalization to lower case
        $class->throw("Unknown format given or could not determine it [$format]")
            unless $format;

        return unless( $class->_load_format_module($format) );
        return "Bio::AlignIO::$format"->new(@args);
    }
}

=head2 newFh

 Title   : newFh
 Usage   : $fh = Bio::AlignIO->newFh(-file=>$filename,-format=>'Format')
 Function: does a new() followed by an fh()
 Example : $fh = Bio::AlignIO->newFh(-file=>$filename,-format=>'Format')
           $aln = <$fh>;    # read an alignment object
           print $fh $aln;  # write an alignment object
 Returns : filehandle tied to the Bio::AlignIO::Fh class
 Args    :

=cut

sub newFh {
    my $class = shift;
    return unless my $self = $class->new(@_);
    return $self->fh;
}

=head2 fh

 Title   : fh
 Usage   : $obj->fh
 Function:
 Example : $fh = $obj->fh;  # make a tied filehandle
           $aln = <$fh>;    # read an alignment object
           print $fh $aln;  # write an alignment object
 Returns : filehandle tied to the Bio::AlignIO::Fh class
 Args    :

=cut

sub fh {
    my $self = shift;
    my $class = ref($self) || $self;
    my $s = Symbol::gensym;
    tie $$s,$class,$self;
    return $s;
}

# _initialize is where the heavy stuff will happen when new is called

sub _initialize {
    my($self,@args) = @_;
    my ($flat,$alphabet) = $self->_rearrange([qw(DISPLAYNAME_FLAT ALPHABET)],
                                             @args);
    $self->force_displayname_flat($flat) if defined $flat;
    $self->alphabet($alphabet);
    $self->_initialize_io(@args);
}

=head2 _load_format_module

 Title   : _load_format_module
 Usage   : *INTERNAL AlignIO stuff*
 Function: Loads up (like use) a module at run time on demand
 Example :
 Returns :
 Args    :

=cut

sub _load_format_module {
    my ($self,$format) = @_;
    my $module = "Bio::AlignIO::" . $format;
    my $ok;

    eval {
        $ok = $self->_load_module($module);
    };
    if ( $@ ) {
        print STDERR <<END;
$self: $format cannot be found
Exception $@
For more information about the AlignIO system please see the AlignIO docs.
This includes ways of checking for formats at compile time, not run time
END
        return;
    }
    return 1;
}

=head2 next_aln

 Title   : next_aln
 Usage   : $aln = $stream->next_aln
 Function: reads the next $aln object from the stream
 Returns : a Bio::Align::AlignI compliant object
 Args    :

=cut

sub next_aln {
    my ($self,$aln) = @_;
    $self->throw("Sorry, you cannot read from a generic Bio::AlignIO object.");
}

=head2 write_aln

 Title   : write_aln
 Usage   : $stream->write_aln($aln)
 Function: writes the $aln object into the stream
 Returns : 1 for success and 0 for error
 Args    : Bio::Align::AlignI compliant object

=cut

sub write_aln {
    my ($self,$aln) = @_;
    $self->throw("Sorry, you cannot write to a generic Bio::AlignIO object.");
}

=head2 _guess_format

 Title   : _guess_format
 Usage   : $obj->_guess_format($filename)
 Function:
 Example :
 Returns : guessed format of filename (lower case)
 Args    :

=cut

sub _guess_format {
    my $class = shift;
    return unless $_ = shift;
    return 'clustalw'  if /\.aln$/i;
    return 'emboss'    if /\.(water|needle)$/i;
    return 'metafasta' if /\.metafasta$/;
    return 'fasta'     if /\.(fasta|fast|seq|fa|fsa|nt|aa)$/i;
    return 'maf'       if /\.maf/i;
    return 'mega'      if /\.(meg|mega)$/i;
    return 'meme'      if /\.meme$/i;
    return 'msf'       if /\.(msf|pileup|gcg)$/i;
    return 'nexus'     if /\.(nexus|nex)$/i;
    return 'pfam'      if /\.(pfam|pfm)$/i;
    return 'phylip'    if /\.(phylip|phlp|phyl|phy|ph)$/i;
    return 'psi'       if /\.psi$/i;
    return 'stockholm' if /\.stk$/i;
    return 'selex'     if /\.(selex|slx|selx|slex|sx)$/i;
    return 'xmfa'      if /\.xmfa$/i;
}

sub DESTROY {
    my $self = shift;
    $self->close();
}

sub TIEHANDLE {
    my $class = shift;
    return bless {'alignio' => shift},$class;
}

sub READLINE {
    my $self = shift;
    return $self->{'alignio'}->next_aln() unless wantarray;
    my (@list,$obj);
    push @list,$obj while $obj = $self->{'alignio'}->next_aln();
    return @list;
}

sub PRINT {
    my $self = shift;
    $self->{'alignio'}->write_aln(@_);
}

=head2 force_displayname_flat

 Title   : force_displayname_flat
 Usage   : $obj->force_displayname_flat($newval)
 Function:
 Example :
 Returns : value of force_displayname_flat (a scalar)
 Args    : on set, new value (a scalar or undef, optional)

=cut

sub force_displayname_flat{
    my $self = shift;
    return $self->{'_force_displayname_flat'} = shift if @_;
    return $self->{'_force_displayname_flat'} || 0;
}

=head2 alphabet

 Title   : alphabet
 Usage   : $obj->alphabet($newval)
 Function: Get/Set alphabet for purpose of passing to Bio::LocatableSeq creation
 Example : $obj->alphabet('dna');
 Returns : value of alphabet (a scalar)
 Args    : on set, new value (a scalar or undef, optional)

=cut

sub alphabet {
    my $self = shift;
    my $value = shift;
    if ( defined $value ) {
        $self->throw("Invalid alphabet $value") unless $value eq 'rna' || $value eq 'protein' || $value eq 'dna';
        $self->{'_alphabet'} = $value;
    }
    return $self->{'_alphabet'};
}

1;