#
# BioPerl module for Bio::Structure::IO
#
# Copyright 2001, 2002 Kris Boulez
#
# You may distribute this module under the same terms as perl itself
#
# _history
# October 18, 1999  Largely rewritten by Lincoln Stein
# November 16, 2001 Copied Bio::SeqIO to Bio::Structure::IO and modified
#                   where needed. Factoring out common methods
#                   (to Bio::Root::IO) might be a good idea.

# POD documentation - main docs before the code

=head1 NAME

Bio::Structure::IO - Handler for Structure Formats

=head1 SYNOPSIS

    use Bio::Structure::IO;

    $in = Bio::Structure::IO->new(-file   => "inputfilename",
                                  -format => 'pdb');

    while ( my $struc = $in->next_structure() ) {
        print "Structure ", $struc->id, " number of models: ",
              scalar $struc->model,"\n";
    }

=head1 DESCRIPTION

Bio::Structure::IO is a handler module for the formats in the
Structure::IO set (e.g. L<Bio::Structure::IO::pdb>). It is the officially
sanctioned way of getting at the format objects, which most people
should use.

The Bio::Structure::IO system can be thought of like biological file
handles. They are attached to filehandles with smart formatting rules
(e.g. PDB format) and can either read or write structure objects
(Bio::Structure objects, or more correctly, Bio::Structure::StructureI
implementing objects, of which Bio::Structure is one such object). If
you want to know what to do with a Bio::Structure object, read
L<Bio::Structure>.

The idea is that you request a stream object for a particular format.
All the stream objects have a notion of an internal file that is read
from or written to. A particular Structure::IO object instance is
configured for either input or output. A specific example of a stream
object is the Bio::Structure::IO::pdb object.

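
The dispatch idea can be sketched in a few self-contained lines. This is
only an illustration under stated assumptions, not the module's code: the
C<Demo::IO> front end and the C<Demo::IO::pdb> driver are hypothetical
names, and the driver is defined inline rather than loaded from disk.

```perl

    use strict;
    use warnings;

    # Hypothetical per-format driver; a real driver would parse a file.
    package Demo::IO::pdb;
    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }
    sub next_structure {
        my ($self) = @_;
        return "structure from " . $self->{-file};
    }

    # Hypothetical front end: picks the driver class from -format.
    package Demo::IO;
    sub new {
        my ($class, %args) = @_;
        my $format = lc( $args{-format} || 'pdb' );
        my $driver = "Demo::IO::$format";
        die "No driver for format '$format'\n" unless $driver->can('new');
        return $driver->new(%args);
    }

    package main;
    my $stream = Demo::IO->new(-file => 'example.pdb', -format => 'PDB');
    print ref($stream), "\n";              # class of the returned driver
    print $stream->next_structure, "\n";

```

The caller only ever names the front-end class; which driver object comes
back depends on the (case-folded) format argument.
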
Each stream object has functions

    $stream->next_structure();

and

    $stream->write_structure($struc);

also

    $stream->type() # returns 'INPUT' or 'OUTPUT'

As an added bonus, you can recover a filehandle that is tied to the
Structure::IO object, allowing you to use the standard E<lt>E<gt>
and print operations to read and write structure objects:

    use Bio::Structure::IO;

    $stream = Bio::Structure::IO->newFh(-format => 'pdb'); # read from standard input

    while ( $structure = <$stream> ) {
        # do something with $structure
    }

and

    print $stream $structure; # when stream is in output mode

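
How a tied filehandle routes E<lt>E<gt> and print to stream methods can be
sketched without BioPerl installed. The C<Demo::Stream> class below is a
hypothetical stand-in that serves strings instead of structure objects; it
is a minimal illustration of the tie mechanism, not the module's
implementation.

```perl

    use strict;
    use warnings;
    use Symbol ();

    # Hypothetical stand-in for a stream object: serves items from a list
    # and collects whatever is written to it.
    package Demo::Stream;
    sub new {
        my ($class, @items) = @_;
        return bless { in => [@items], out => [] }, $class;
    }
    sub next_structure  { my $self = shift; return shift @{ $self->{in} } }
    sub write_structure { my ($self, @s) = @_; push @{ $self->{out} }, @s; return 1 }

    # Tie interface: <$fh> maps to next_structure, print to write_structure.
    sub fh {
        my $self = shift;
        my $sym  = Symbol::gensym();
        tie *$sym, ref($self), $self;
        return $sym;
    }
    sub TIEHANDLE { my ($class, $stream) = @_; return bless { stream => $stream }, $class }
    sub READLINE {
        my $self = shift;
        return $self->{stream}->next_structure unless wantarray;
        my @all;
        while ( defined( my $s = $self->{stream}->next_structure ) ) { push @all, $s }
        return @all;
    }
    sub PRINT { my $self = shift; return $self->{stream}->write_structure(@_) }

    package main;
    my $stream = Demo::Stream->new('1ABC', '2XYZ');
    my $fh     = $stream->fh;

    while ( my $structure = <$fh> ) {    # scalar context: one item per read
        print "read $structure\n";
    }
    print $fh 'written-item';            # routed to write_structure
    print "stored: @{ $stream->{out} }\n";

```
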
=head1 CONSTRUCTORS

=head2 Bio::Structure::IO-E<gt>new()

    $stream = Bio::Structure::IO->new(-file => 'filename',   -format=>$format);
    $stream = Bio::Structure::IO->new(-fh   => \*FILEHANDLE, -format=>$format);
    $stream = Bio::Structure::IO->new(-format => $format);

The new() class method constructs a new Bio::Structure::IO object. The
returned object can be used to retrieve or print Bio::Structure
objects. new() accepts the following parameters:

=over 4

=item -file

A file path to be opened for reading or writing. The usual Perl
conventions apply:

    'file'       # open file for reading
    '>file'      # open file for writing
    '>>file'     # open file for appending
    '+<file'     # open file read/write
    'command |'  # open a pipe from the command
    '| command'  # open a pipe to the command

=item -fh

You may provide new() with a previously-opened filehandle. For
example, to read from STDIN:

    $strucIO = Bio::Structure::IO->new(-fh => \*STDIN);

Note that you must pass filehandles as references to globs.

If neither a filehandle nor a filename is specified, then the module
will read from the @ARGV array or STDIN, using the familiar E<lt>E<gt>
semantics.

=item -format

Specify the format of the file. Supported formats include:

    pdb         Protein Data Bank format

If no format is specified and a filename is given, then the module
will attempt to deduce it from the filename. If this is unsuccessful,
PDB format is assumed.

The format name is case insensitive. 'PDB', 'Pdb' and 'pdb' are
all supported.

=back

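
The precedence described for -format (explicit value first, then the file
extension, then the PDB default) can be sketched as follows. The
C<resolve_format> helper is hypothetical and exists only for this
illustration; it checks just the C<.pdb> and C<.ent> extensions.

```perl

    use strict;
    use warnings;

    # Hypothetical helper (not part of the module): an explicit -format wins,
    # then the file extension, then the 'pdb' default.
    sub resolve_format {
        my (%args) = @_;
        my $format = $args{-format};
        if ( !$format && defined $args{-file} ) {
            $format = 'pdb' if $args{-file} =~ /\.(ent|pdb)$/i;
        }
        return lc( $format || 'pdb' );
    }

    print resolve_format(-format => 'PDB'), "\n";           # explicit, case-folded
    print resolve_format(-file   => 'model.ent'), "\n";     # guessed from extension
    print resolve_format(-file   => 'unknown.xyz'), "\n";   # falls back to default

```
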
=head2 Bio::Structure::IO-E<gt>newFh()

    $fh = Bio::Structure::IO->newFh(-fh => \*FILEHANDLE, -format=>$format);
    $fh = Bio::Structure::IO->newFh(-format => $format);
    # etc.

This constructor behaves like new(), but returns a tied filehandle
rather than a Bio::Structure::IO object. You can read structures from this
object using the familiar E<lt>E<gt> operator, and write to it using
print(). The usual array and $_ semantics work. For example, you can
read all structure objects into an array like this:

    @structures = <$fh>;

Other operations, such as read(), sysread(), write(), close(), and printf()
are not supported.

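
The scalar and list semantics, and what an unsupported operation looks like
in practice, can be tried with a small tied class. C<Demo::Fh> is a
hypothetical class that implements only READLINE and PRINT; it sketches the
behaviour and is not the module's own tie class.

```perl

    use strict;
    use warnings;
    use Symbol ();

    # Hypothetical tied class implementing only READLINE and PRINT.
    package Demo::Fh;
    sub TIEHANDLE {
        my ($class, @items) = @_;
        return bless { items => [@items] }, $class;
    }
    sub READLINE {
        my $self = shift;
        return shift @{ $self->{items} } unless wantarray;   # scalar: next item
        return splice @{ $self->{items} };                    # list: all remaining
    }
    sub PRINT { return 1 }

    package main;
    my $fh = Symbol::gensym();
    tie *$fh, 'Demo::Fh', qw(s1 s2 s3);

    my $first      = <$fh>;      # scalar context
    my @structures = <$fh>;      # list context drains the rest
    print "first=$first rest=@structures\n";

    # printf needs a PRINTF method, which this class does not define,
    # so the call dies; eval traps the error.
    my $ok = eval { printf $fh "%s", 'x'; 1 };
    print "printf supported: ", ( $ok ? 'yes' : 'no' ), "\n";

```
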
=head1 OBJECT METHODS

See below for more detailed summaries. The main methods are:

=head2 $structure = $structIO-E<gt>next_structure()

Fetch the next structure from the stream.

=head2 $structIO-E<gt>write_structure($struc [,$another_struc,...])

Write the specified structure(s) to the stream.

=head2 TIEHANDLE(), READLINE(), PRINT()

These provide the tie interface. See L<perltie> for more details.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution.
Bug reports can be submitted via the web:

  https://github.com/bioperl/bioperl-live/issues

=head1 AUTHORS - Ewan Birney, Lincoln Stein, Kris Boulez

Email birney@ebi.ac.uk, lstein@cshl.org, kris.boulez@algonomics.com


=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::Structure::IO;

use strict;

use Bio::PrimarySeq;
use Symbol;

use base qw(Bio::Root::Root Bio::Root::IO);

=head2 new

 Title   : new
 Usage   : $stream = Bio::Structure::IO->new(-file => $filename, -format => 'Format')
 Function: Returns a new Bio::Structure::IO stream
 Returns : A Bio::Structure::IO handler initialised with the appropriate format
 Args    : -file => $filename
           -format => format
           -fh => filehandle to attach to

=cut

my $entry = 0;

sub new {
    my ($caller,@args) = @_;
    my $class = ref($caller) || $caller;

    # or do we want to call SUPER on an object if $caller is an
    # object?
    if( $class =~ /Bio::Structure::IO::(\S+)/ ) {
        # called on a format-specific subclass: build the object directly
        my ($self) = $class->SUPER::new(@args);
        $self->_initialize(@args);
        return $self;
    } else {
        # called on the generic class: pick a format and delegate to its driver
        my %param = @args;
        @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
        my $format = $param{'-format'} ||
            $class->_guess_format( $param{-file} || $ARGV[0] ) ||
            'pdb';
        $format = "\L$format"; # normalize capitalization to lower case

        return unless( _load_format_module($format) );
        return "Bio::Structure::IO::$format"->new(@args);
    }
}

=head2 newFh

 Title   : newFh
 Usage   : $fh = Bio::Structure::IO->newFh(-file=>$filename,-format=>'Format')
 Function: does a new() followed by an fh()
 Example : $fh = Bio::Structure::IO->newFh(-file=>$filename,-format=>'Format')
           $structure = <$fh>;   # read a structure object
           print $fh $structure; # write a structure object
 Returns : filehandle tied to the Bio::Structure::IO::Fh class
 Args    :

=cut

sub newFh {
    my $class = shift;
    return unless my $self = $class->new(@_);
    return $self->fh;
}

=head2 fh

 Title   : fh
 Usage   : $obj->fh
 Function:
 Example : $fh = $obj->fh;       # make a tied filehandle
           $structure = <$fh>;   # read a structure object
           print $fh $structure; # write a structure object
 Returns : filehandle tied to the Bio::Structure::IO::Fh class
 Args    :

=cut


sub fh {
    my $self = shift;
    my $class = ref($self) || $self;
    my $s = Symbol::gensym;
    tie $$s,$class,$self;
    return $s;
}

=head2 format

 Title   : format
 Usage   : $format = $obj->format()
 Function: Get the structure format
 Returns : structure format
 Args    : none

=cut

# format() method inherited from Bio::Root::IO


# _initialize is chained for all Structure::IO classes

sub _initialize {
    my($self, @args) = @_;

    # not really necessary unless we put more in RootI
    $self->SUPER::_initialize(@args);

    # initialize the IO part
    $self->_initialize_io(@args);
}

=head2 next_structure

 Title   : next_structure
 Usage   : $structure = $stream->next_structure
 Function: Reads the next structure object from the stream and returns a
           Bio::Structure::Entry object.

           Certain driver modules may encounter entries in the stream that
           are either misformatted or that use syntax not yet understood
           by the driver. If such an incident is recoverable, e.g., by
           dismissing a feature of a feature table or some other non-mandatory
           part of an entry, the driver will issue a warning. In the case
           of a non-recoverable situation an exception will be thrown.
           Do not assume that you can resume parsing the same stream after
           catching the exception. Note that you can always turn recoverable
           errors into exceptions by calling $stream->verbose(2) (see
           Bio::Root::RootI POD page).
 Returns : a Bio::Structure::Entry object
 Args    : none

=cut

sub next_structure {
    my ($self, $struc) = @_;
    $self->throw("Sorry, you cannot read from a generic Bio::Structure::IO object.");
}

# Do we want people to read out the sequence directly from a $structIO stream
#
##=head2 next_primary_seq
##
## Title   : next_primary_seq
## Usage   : $seq = $stream->next_primary_seq
## Function: Provides a primaryseq type of sequence object
## Returns : A Bio::PrimarySeqI object
## Args    : none
##
##
##=cut
##
##sub next_primary_seq {
##   my ($self) = @_;
##
##   # in this case, we default to next_seq. This is because
##   # Bio::Seq's are Bio::PrimarySeqI objects. However we
##   # expect certain sub classes to override this method to provide
##   # less parsing heavy methods to retrieving the objects
##
##   return $self->next_seq();
##}

=head2 write_structure

 Title   : write_structure
 Usage   : $stream->write_structure($structure)
 Function: writes the $structure object into the stream
 Returns : 1 for success and 0 for error
 Args    : Bio::Structure object

=cut

# Named write_structure to match the POD above and the tie interface below.
sub write_structure {
    my ($self, $struc) = @_;
    $self->throw("Sorry, you cannot write to a generic Bio::Structure::IO object.");
}

# Do we need this here
#
##=head2 alphabet
##
## Title   : alphabet
## Usage   : $self->alphabet($newval)
## Function: Set/get the molecule type for the Seq objects to be created.
## Example : $seqio->alphabet('protein')
## Returns : value of alphabet: 'dna', 'rna', or 'protein'
## Args    : newvalue (optional)
## Throws  : Exception if the argument is not one of 'dna', 'rna', or 'protein'
##
##=cut
##
##sub alphabet {
##   my ($self, $value) = @_;
##
##   if ( defined $value) {
##       # instead of hard-coding the allowed values once more, we check by
##       # creating a dummy sequence object
##       eval {
##           my $seq = Bio::PrimarySeq->new('-alphabet' => $value);
##       };
##       if($@) {
##           $self->throw("Invalid alphabet: $value\n. See Bio::PrimarySeq for allowed values.");
##       }
##       $self->{'alphabet'} = "\L$value";
##   }
##   return $self->{'alphabet'};
##}

=head2 _load_format_module

 Title   : _load_format_module
 Usage   : *INTERNAL Structure::IO stuff*
 Function: Loads up (like use) a module at run time on demand
 Example :
 Returns :
 Args    :

=cut

sub _load_format_module {
    my ($format) = @_;
    my ($module, $load);

    $module = "_<Bio/Structure/IO/$format.pm";
    $load = "Bio/Structure/IO/$format.pm";

    # already loaded: nothing to do
    return 1 if $main::{$module};
    eval {
        require $load;
    };
    if ( $@ ) {
        print STDERR <<END;
$load: $format cannot be found
Exception $@
For more information about the Structure::IO system please see the
Bio::Structure::IO docs. This includes ways of checking for formats at
compile time, not run time
END
        return;
    }
    return 1;
}

=head2 _concatenate_lines

 Title   : _concatenate_lines
 Usage   : $s = _concatenate_lines($line, $continuation_line)
 Function: Private. Concatenates two strings assuming that the second stems
           from a continuation line of the first. Adds a space between both
           unless the first ends with a dash.

           Takes care of either arg being empty.
 Example :
 Returns : A string.
 Args    :

=cut

sub _concatenate_lines {
    my ($self, $s1, $s2) = @_;
    $s1 .= " " if($s1 && ($s1 !~ /-$/) && $s2);
    return ($s1 ? $s1 : "") . ($s2 ? $s2 : "");
}

=head2 _filehandle

 Title   : _filehandle
 Usage   : $obj->_filehandle($newval)
 Function: This method is deprecated. Call _fh() instead.
 Example :
 Returns : value of _filehandle
 Args    : newvalue (optional)


=cut

sub _filehandle {
    my ($self,@args) = @_;
    return $self->_fh(@args);
}

=head2 _guess_format

 Title   : _guess_format
 Usage   : $obj->_guess_format($filename)
 Function:
 Example :
 Returns : guessed format of filename (lower case)
 Args    :

=cut

# The sequence-format entries below were carried over from Bio::SeqIO;
# the POD above lists only 'pdb' as a supported format.
sub _guess_format {
    my $class = shift;
    my $file  = shift;    # lexical copy, so the global $_ is left untouched
    return unless $file;
    return 'fasta'   if $file =~ /\.(fasta|fast|seq|fa|fsa|nt|aa)$/i;
    return 'genbank' if $file =~ /\.(gb|gbank|genbank)$/i;
    return 'scf'     if $file =~ /\.scf$/i;
    return 'pir'     if $file =~ /\.pir$/i;
    return 'embl'    if $file =~ /\.(embl|ebl|emb|dat)$/i;
    return 'raw'     if $file =~ /\.(txt)$/i;
    return 'gcg'     if $file =~ /\.gcg$/i;
    return 'ace'     if $file =~ /\.ace$/i;
    return 'bsml'    if $file =~ /\.(bsm|bsml)$/i;
    return 'pdb'     if $file =~ /\.(ent|pdb)$/i;
    return;
}

sub DESTROY {
    my $self = shift;

    $self->close();
}

sub TIEHANDLE {
    my ($class,$val) = @_;
    return bless {'structio' => $val}, $class;
}

# <$fh> reads via next_structure(): one object in scalar context,
# all remaining objects in list context.
sub READLINE {
    my $self = shift;
    return $self->{'structio'}->next_structure() || undef unless wantarray;
    my (@list, $obj);
    push @list, $obj while $obj = $self->{'structio'}->next_structure();
    return @list;
}

# print $fh ... writes via write_structure().
sub PRINT {
    my $self = shift;
    $self->{'structio'}->write_structure(@_);
}

1;