# $Id$
#
# BioPerl module for Bio::Structure::IO
#
# Copyright 2001, 2002 Kris Boulez
#
# You may distribute this module under the same terms as perl itself
#
# _history
# October 18, 1999  Largely rewritten by Lincoln Stein
# November 16, 2001 Copied Bio::SeqIO to Bio::Structure::IO and modified
#                   where needed. Factoring out common methods
#                   (to Bio::Root::IO) might be a good idea.

# POD documentation - main docs before the code

=head1 NAME

Bio::Structure::IO - Handler for Structure Formats

=head1 SYNOPSIS

  use Bio::Structure::IO;

  $in = Bio::Structure::IO->new(-file => "inputfilename",
                                -format => 'pdb');

  while ( my $struc = $in->next_structure() ) {
      print "Structure ", $struc->id, " number of models: ",
            scalar $struc->model,"\n";
  }

=head1 DESCRIPTION

Bio::Structure::IO is a handler module for the formats in the
Structure::IO set (e.g. L<Bio::Structure::IO::pdb>). It is the officially
sanctioned way of getting at the format objects, which most people
should use.

The Bio::Structure::IO system can be thought of like biological file
handles. They are attached to filehandles with smart formatting rules
(e.g. PDB format) and can either read or write structure objects
(Bio::Structure objects, or more correctly, Bio::Structure::StructureI
implementing objects, of which Bio::Structure is one such object). If
you want to know what to do with a Bio::Structure object, read
L<Bio::Structure>.

The idea is that you request a stream object for a particular format.
All the stream objects have a notion of an internal file that is read
from or written to. A particular Structure::IO object instance is
configured for either input or output. A specific example of a stream
object is the Bio::Structure::IO::pdb object.

Each stream object has functions

  $stream->next_structure();

and

  $stream->write_structure($struc);

also

  $stream->type() # returns 'INPUT' or 'OUTPUT'

As an added bonus, you can recover a filehandle that is tied to the
Bio::Structure::IO object, allowing you to use the standard E<lt>E<gt>
and print operations to read and write structure objects:

  use Bio::Structure::IO;

  $stream = Bio::Structure::IO->newFh(-format => 'pdb'); # read from standard input

  while ( $structure = <$stream> ) {
      # do something with $structure
  }

and

  print $stream $structure; # when stream is in output mode

=head1 CONSTRUCTORS

=head2 Bio::Structure::IO-E<gt>new()

   $stream = Bio::Structure::IO->new(-file => 'filename', -format=>$format);
   $stream = Bio::Structure::IO->new(-fh => \*FILEHANDLE, -format=>$format);
   $stream = Bio::Structure::IO->new(-format => $format);

The new() class method constructs a new Bio::Structure::IO object. The
returned object can be used to retrieve or print Bio::Structure
objects. new() accepts the following parameters:

=over 4

=item -file

A file path to be opened for reading or writing. The usual Perl
conventions apply:

   'file'       # open file for reading
   '>file'      # open file for writing
   '>>file'     # open file for appending
   '+<file'     # open file read/write
   'command |'  # open a pipe from the command
   '| command'  # open a pipe to the command

=item -fh

You may provide new() with a previously-opened filehandle. For
example, to read from STDIN:

   $strucIO = Bio::Structure::IO->new(-fh => \*STDIN);

Note that you must pass filehandles as references to globs.

If neither a filehandle nor a filename is specified, then the module
will read from the @ARGV array or STDIN, using the familiar E<lt>E<gt>
semantics.

=item -format

Specify the format of the file. Supported formats include:

   pdb        Protein Data Bank format

If no format is specified and a filename is given, then the module
will attempt to deduce it from the filename. If this is unsuccessful,
PDB format is assumed.

The format name is case insensitive. 'PDB', 'Pdb' and 'pdb' are
all supported.

=back

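As a minimal sketch of the parameters above, the following copies every
structure from one PDB file into another. The file names in.pdb and
out.pdb are placeholders. The '>' prefix opens the output file for
writing, and the output stream is given its format as 'PDB' to show that
the format name is case insensitive.

  use strict;
  use Bio::Structure::IO;

  # in.pdb and out.pdb are placeholder file names
  my $in  = Bio::Structure::IO->new(-file => 'in.pdb',   -format => 'pdb');
  my $out = Bio::Structure::IO->new(-file => '>out.pdb', -format => 'PDB');

  # copy each structure read from the input stream to the output stream
  while ( my $struc = $in->next_structure ) {
      $out->write_structure($struc);
  }
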
=head2 Bio::Structure::IO-E<gt>newFh()

   $fh = Bio::Structure::IO->newFh(-fh => \*FILEHANDLE, -format=>$format);
   $fh = Bio::Structure::IO->newFh(-format => $format);
   # etc.

This constructor behaves like new(), but returns a tied filehandle
rather than a Bio::Structure::IO object. You can read structures from this
object using the familiar E<lt>E<gt> operator, and write to it using
print(). The usual array and $_ semantics work. For example, you can
read all structure objects into an array like this:

   @structures = <$fh>;

Other operations, such as read(), sysread(), write(), close(), and printf()
are not supported.

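The following is a minimal sketch of the tied-filehandle style, again
with placeholder file names. Because the usual $_ semantics work, each
structure can be copied from one handle to the other in a single
statement.

  use strict;
  use Bio::Structure::IO;

  # in.pdb and out.pdb are placeholder file names
  my $in  = Bio::Structure::IO->newFh(-file => 'in.pdb',   -format => 'pdb');
  my $out = Bio::Structure::IO->newFh(-file => '>out.pdb', -format => 'pdb');

  # each <$in> reads one structure into $_; print writes it to $out
  print $out $_ while <$in>;
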
=head1 OBJECT METHODS

See below for more detailed summaries. The main methods are:

=head2 $structure = $structIO-E<gt>next_structure()

Fetch the next structure from the stream.

=head2 $structIO-E<gt>write_structure($struc [,$another_struc,...])

Write the specified structure(s) to the stream.

=head2 TIEHANDLE(), READLINE(), PRINT()

These provide the tie interface. See L<perltie> for more details.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHORS - Ewan Birney, Lincoln Stein, Kris Boulez

Email birney@ebi.ac.uk, lstein@cshl.org, kris.boulez@algonomics.com

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with an underscore (_).

=cut

# Let the code begin...

package Bio::Structure::IO;

use strict;

use Bio::PrimarySeq;
use Symbol;

use base qw(Bio::Root::Root Bio::Root::IO);

=head2 new

 Title   : new
 Usage   : $stream = Bio::Structure::IO->new(-file => $filename, -format => 'Format')
 Function: Returns a new Bio::Structure::IO stream
 Returns : A Bio::Structure::IO handler initialised with the appropriate format
 Args    : -file => $filename
           -format => format
           -fh => filehandle to attach to

=cut

my $entry = 0;

sub new {
    my ($caller,@args) = @_;
    my $class = ref($caller) || $caller;

    # or do we want to call SUPER on an object if $caller is an
    # object?
    if( $class =~ /Bio::Structure::IO::(\S+)/ ) {
        my ($self) = $class->SUPER::new(@args);
        $self->_initialize(@args);
        return $self;
    } else {

        my %param = @args;
        @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
        my $format = $param{'-format'} ||
            $class->_guess_format( $param{-file} || $ARGV[0] ) ||
                'pdb';
        $format = "\L$format";    # normalize capitalization to lower case

        return unless( &_load_format_module($format) );
        return "Bio::Structure::IO::$format"->new(@args);
    }
}

=head2 newFh

 Title   : newFh
 Usage   : $fh = Bio::Structure::IO->newFh(-file=>$filename,-format=>'Format')
 Function: does a new() followed by an fh()
 Example : $fh = Bio::Structure::IO->newFh(-file=>$filename,-format=>'Format')
           $structure = <$fh>;   # read a structure object
           print $fh $structure; # write a structure object
 Returns : filehandle tied to the class of the new stream
           (e.g. Bio::Structure::IO::pdb)
 Args    : same as new()

=cut

sub newFh {
    my $class = shift;
    return unless my $self = $class->new(@_);
    return $self->fh;
}

=head2 fh

 Title   : fh
 Usage   : $obj->fh
 Function: Returns a filehandle tied to this stream object
 Example : $fh = $obj->fh;      # make a tied filehandle
           $structure = <$fh>;   # read a structure object
           print $fh $structure; # write a structure object
 Returns : filehandle tied to the class of $obj (e.g. Bio::Structure::IO::pdb)
 Args    : none

=cut


sub fh {
    my $self = shift;
    my $class = ref($self) || $self;
    my $s = Symbol::gensym;
    tie $$s,$class,$self;
    return $s;
}

# _initialize is chained for all Structure::IO classes

sub _initialize {
    my($self, @args) = @_;

    # not really necessary unless we put more in RootI
    $self->SUPER::_initialize(@args);

    # initialize the IO part
    $self->_initialize_io(@args);
}

=head2 next_structure

 Title   : next_structure
 Usage   : $structure = $stream->next_structure
 Function: Reads the next structure object from the stream and returns a
           Bio::Structure::Entry object.

           Certain driver modules may encounter entries in the stream that
           are either misformatted or that use syntax not yet understood
           by the driver. If such an incident is recoverable, e.g., by
           dismissing a feature of a feature table or some other non-mandatory
           part of an entry, the driver will issue a warning. In the case
           of a non-recoverable situation an exception will be thrown.
           Do not assume that you can resume parsing the same stream after
           catching the exception. Note that you can always turn recoverable
           errors into exceptions by calling $stream->verbose(2) (see the
           Bio::Root::RootI POD page).
 Returns : a Bio::Structure::Entry object
 Args    : none

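Here is a minimal sketch of the stricter error handling described above,
assuming a placeholder input file in.pdb. Recoverable problems are
promoted to exceptions with verbose(2), and the read is wrapped in an
eval so the exception can be caught.

  use strict;
  use Bio::Structure::IO;

  # in.pdb is a placeholder file name
  my $stream = Bio::Structure::IO->new(-file => 'in.pdb', -format => 'pdb');
  $stream->verbose(2);    # turn recoverable warnings into exceptions

  my $structure = eval { $stream->next_structure };
  if ( $@ ) {
      # do not keep reading from $stream after a failure like this
      warn "could not parse the next entry: $@";
  }
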
=cut

sub next_structure {
    my ($self) = @_;
    $self->throw("Sorry, you cannot read from a generic Bio::Structure::IO object.");
}

# Do we want people to read out the sequence directly from a $structIO stream?

##=head2 next_primary_seq
##
## Title   : next_primary_seq
## Usage   : $seq = $stream->next_primary_seq
## Function: Provides a primaryseq type of sequence object
## Returns : A Bio::PrimarySeqI object
## Args    : none
##
##
##=cut
##
##sub next_primary_seq {
##  my ($self) = @_;
##
##  # in this case, we default to next_seq. This is because
##  # Bio::Seq's are Bio::PrimarySeqI objects. However we
##  # expect certain sub classes to override this method to provide
##  # less parsing heavy methods to retrieving the objects
##
##  return $self->next_seq();
##}

=head2 write_structure

 Title   : write_structure
 Usage   : $stream->write_structure($structure)
 Function: writes the $structure object into the stream
 Returns : 1 for success and 0 for error
 Args    : Bio::Structure object

=cut

sub write_structure {
    my ($self, $struc) = @_;
    $self->throw("Sorry, you cannot write to a generic Bio::Structure::IO object.");
}

# Do we need this here?

##=head2 alphabet
##
## Title   : alphabet
## Usage   : $self->alphabet($newval)
## Function: Set/get the molecule type for the Seq objects to be created.
## Example : $seqio->alphabet('protein')
## Returns : value of alphabet: 'dna', 'rna', or 'protein'
## Args    : newvalue (optional)
## Throws  : Exception if the argument is not one of 'dna', 'rna', or 'protein'
##
##=cut
##
##sub alphabet {
##   my ($self, $value) = @_;
##
##   if ( defined $value) {
##       # instead of hard-coding the allowed values once more, we check by
##       # creating a dummy sequence object
##       eval {
##           my $seq = Bio::PrimarySeq->new('-alphabet' => $value);
##       };
##       if($@) {
##           $self->throw("Invalid alphabet: $value\n. See Bio::PrimarySeq for allowed values.");
##       }
##       $self->{'alphabet'} = "\L$value";
##   }
##   return $self->{'alphabet'};
##}

=head2 _load_format_module

 Title   : _load_format_module
 Usage   : *INTERNAL Structure::IO stuff*
 Function: Loads up (like use) a module at run time on demand
 Example :
 Returns : 1 on success; false (after printing a message to STDERR)
           if the module cannot be loaded
 Args    : the format name in lower case, e.g. 'pdb'

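Here is a minimal sketch of loading a driver by hand. _load_format_module()
is a plain function rather than a method, so it is called with its
package-qualified name and only the lower-case format name. new() already
does this for you, so direct calls should rarely be needed.

  use strict;
  use Bio::Structure::IO;

  # loads Bio/Structure/IO/pdb.pm at run time; a false return means the
  # module could not be found and a message was printed to STDERR
  Bio::Structure::IO::_load_format_module('pdb')
      or die "the pdb driver could not be loaded\n";
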
=cut

sub _load_format_module {
    my ($format) = @_;
    my ($module, $load, $m);

    $module = "_<Bio/Structure/IO/$format.pm";
    $load = "Bio/Structure/IO/$format.pm";

    return 1 if $main::{$module};
    eval {
        require $load;
    };
    if ( $@ ) {
        print STDERR <<END;
$load: $format cannot be found
Exception $@
For more information about the Structure::IO system please see the
Bio::Structure::IO docs. This includes ways of checking for formats at
compile time, not run time
END
        return;
    }
    return 1;
}

=head2 _concatenate_lines

 Title   : _concatenate_lines
 Usage   : $s = $self->_concatenate_lines($line, $continuation_line)
 Function: Private. Concatenates two strings assuming that the second stems
           from a continuation line of the first. Adds a space between both
           unless the first ends with a dash.

           Takes care of either arg being empty.
 Example :
 Returns : A string.
 Args    : the first line and its continuation line

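The following is a minimal sketch of the joining rules described above.
The method never looks at its invocant, so for a quick check it can be
called on the class name. The input strings are arbitrary examples.

  use strict;
  use Bio::Structure::IO;

  # a space is added between the two parts
  my $joined = Bio::Structure::IO->_concatenate_lines('HYDROLASE', 'INHIBITOR');
  # 'HYDROLASE INHIBITOR'

  # no space is added after a trailing dash
  my $hyphen = Bio::Structure::IO->_concatenate_lines('ANTI-', 'FREEZE');
  # 'ANTI-FREEZE'

  # an empty first part is simply dropped
  my $empty = Bio::Structure::IO->_concatenate_lines('', 'PROTEIN');
  # 'PROTEIN'
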
=cut

sub _concatenate_lines {
    my ($self, $s1, $s2) = @_;
    $s1 .= " " if($s1 && ($s1 !~ /-$/) && $s2);
    return ($s1 ? $s1 : "") . ($s2 ? $s2 : "");
}

=head2 _filehandle

 Title   : _filehandle
 Usage   : $obj->_filehandle($newval)
 Function: This method is deprecated. Call _fh() instead.
 Example :
 Returns : value of _filehandle
 Args    : newvalue (optional)


=cut

sub _filehandle {
    my ($self,@args) = @_;
    return $self->_fh(@args);
}

=head2 _guess_format

 Title   : _guess_format
 Usage   : $obj->_guess_format($filename)
 Function: Guesses the format from the extension of the file name
 Example :
 Returns : guessed format of filename (lower case), or a false value
           if the extension is not recognized
 Args    : a file name

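The following is a minimal sketch of the extension matching. The file
names are arbitrary examples. An unrecognized extension yields a false
value, which lets new() fall back to PDB format.

  use strict;
  use Bio::Structure::IO;

  my $f1 = Bio::Structure::IO->_guess_format('1crn.pdb');     # 'pdb'
  my $f2 = Bio::Structure::IO->_guess_format('pdb1crn.ENT');  # 'pdb' (case-insensitive)
  my $f3 = Bio::Structure::IO->_guess_format('notes.xyz');    # false: unknown extension
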
=cut

sub _guess_format {
    my $class = shift;
    local $_ = shift;    # localize so the caller's $_ is not clobbered
    return unless $_;
    # only the pdb format is documented for Bio::Structure::IO; the
    # SeqIO-style guesses (fasta, genbank, ...) named driver modules that
    # new() could not load, which defeated the PDB fallback
    return 'pdb' if /\.(ent|pdb)$/i;
    return;
}

sub DESTROY {
    my $self = shift;

    $self->close();
}

sub TIEHANDLE {
    my ($class,$val) = @_;
    return bless {'structio' => $val}, $class;
}

sub READLINE {
    my $self = shift;
    return $self->{'structio'}->next_structure() unless wantarray;
    my (@list, $obj);
    push @list, $obj while $obj = $self->{'structio'}->next_structure();
    return @list;
}

sub PRINT {
    my $self = shift;
    $self->{'structio'}->write_structure(@_);
}