Bio/SeqIO/genbank.pm

   1 #
   2 # BioPerl module for Bio::SeqIO::genbank
   3 #
   4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
   5 #
   6 # Cared for by Bioperl project bioperl-l(at)bioperl.org
   7 #
   8 # Copyright Elia Stupka and contributors see AUTHORS section
   9 #
  10 # You may distribute this module under the same terms as perl itself
  11
  12 # POD documentation - main docs before the code
  13
  14 =head1 NAME
  15
  16 Bio::SeqIO::genbank - GenBank sequence input/output stream
  17
  18 =head1 SYNOPSIS
  19
  20 It is probably best not to use this object directly, but
  21 rather go through the SeqIO handler:
  22
  23     $stream = Bio::SeqIO->new(-file   => $filename,
  24                               -format => 'GenBank');
  25
  26     while ( my $seq = $stream->next_seq ) {
  27         # do something with $seq
  28     }
  29
  30
  31 =head1 DESCRIPTION
  32
  33 This object can transform Bio::Seq objects to and from GenBank flat
  34 file databases.
  35
  36 There is some flexibility here about how to write GenBank output
  37 that is not fully documented.
  38
  39 =head2 Optional functions
  40
  41 =over 3
  42
  43 =item _show_dna()
  44
  45 (output only) shows the dna or not
  46
  47 =item _post_sort()
  48
  49 (output only) provides a sorting func which is applied to the FTHelpers
  50 before printing
  51
  52 =item _id_generation_func()
  53
  54 This is function which is called as
  55
  56    print "ID   ", $func($seq), "\n";
  57
  58 To generate the ID line. If it is not there, it generates a sensible ID
  59 line using a number of tools.
  60
  61 If you want to output annotations in Genbank format they need to be
  62 stored in a Bio::Annotation::Collection object which is accessible
  63 through the Bio::SeqI interface method L<annotation()|annotation>.
  64
  65 The following are the names of the keys which are pulled from a
  66 L<Bio::Annotation::Collection> object:
  67
  68  reference       - Should contain Bio::Annotation::Reference objects
  69  comment         - Should contain Bio::Annotation::Comment objects
  70  dblink          - Should contain a Bio::Annotation::DBLink object
  71  segment         - Should contain a Bio::Annotation::SimpleValue object
  72  origin          - Should contain a Bio::Annotation::SimpleValue object
  73  wgs             - Should contain a Bio::Annotation::SimpleValue object
  74
  75 =back
  76
  77 =head1 Where does the data go?
  78
  79 Data parsed in Bio::SeqIO::genbank is stored in a variety of data
  80 fields in the sequence object that is returned. Here is a partial list
  81 of fields.
  82
  83 Items listed as RichSeq or Seq or PrimarySeq and then NAME() tell you
  84 the top level object which defines a function called NAME() which
  85 stores this information.
  86
  87 Items listed as Annotation 'NAME' tell you the data is stored the
  88 associated Bio::AnnotationCollectionI object which is associated with
  89 Bio::Seq objects.  If it is explicitly requested that no annotations
  90 should be stored when parsing a record of course they will not be
  91 available when you try and get them.  If you are having this problem
  92 look at the type of SeqBuilder that is being used to contruct your
  93 sequence object.
  94
  95  Comments             Annotation 'comment'
  96  References           Annotation 'reference'
  97  Segment              Annotation 'segment'
  98  Origin               Annotation 'origin'
  99  Dbsource             Annotation 'dblink'
 100
 101  Accessions           PrimarySeq accession_number()
 102  Secondary accessions RichSeq get_secondary_accessions()
 103  GI number            PrimarySeq primary_id()
 104  LOCUS                PrimarySeq display_id()
 105  Keywords             RichSeq get_keywords()
 106  Dates                RichSeq get_dates()
 107  Molecule             RichSeq molecule()
 108  Seq Version          RichSeq seq_version()
 109  PID                  RichSeq pid()
 110  Division             RichSeq division()
 111  Features             Seq get_SeqFeatures()
 112  Alphabet             PrimarySeq alphabet()
 113  Definition           PrimarySeq description() or desc()
 114  Version              PrimarySeq version()
 115
 116  Sequence             PrimarySeq seq()
 117
 118 There is more information in the Feature-Annotation HOWTO about each
 119 field and how it is mapped to the Sequence object
 120 L<http://bioperl.open-bio.org/wiki/HOWTO:Feature-Annotation>.
 121
 122 =head1 FEEDBACK
 123
 124 =head2 Mailing Lists
 125
 126 User feedback is an integral part of the evolution of this and other
 127 Bioperl modules. Send your comments and suggestions preferably to one
 128 of the Bioperl mailing lists.  Your participation is much appreciated.
 129
 130   bioperl-l@bioperl.org                  - General discussion
 131   http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
 132
 133 =head2 Support
 134
 135 Please direct usage questions or support issues to the mailing list:
 136
 137 I<bioperl-l@bioperl.org>
 138
 139 rather than to the module maintainer directly. Many experienced and
 140 reponsive experts will be able look at the problem and quickly
 141 address it. Please include a thorough description of the problem
 142 with code and data examples if at all possible.
 143
 144 =head2 Reporting Bugs
 145
 146 Report bugs to the Bioperl bug tracking system to help us keep track
 147 the bugs and their resolution. Bug reports can be submitted via the web:
 148
 149   https://github.com/bioperl/bioperl-live/issues
 150
 151 =head1 AUTHOR - Bioperl Project
 152
 153 bioperl-l at bioperl.org
 154
 155 Original author Elia Stupka, elia -at- tigem.it
 156
 157 =head1 CONTRIBUTORS
 158
 159 Ewan Birney birney at ebi.ac.uk
 160 Jason Stajich jason at bioperl.org
 161 Chris Mungall cjm at fruitfly.bdgp.berkeley.edu
 162 Lincoln Stein lstein at cshl.org
 163 Heikki Lehvaslaiho, heikki at ebi.ac.uk
 164 Hilmar Lapp, hlapp at gmx.net
 165 Donald G. Jackson, donald.jackson at bms.com
 166 James Wasmuth, james.wasmuth at ed.ac.uk
 167 Brian Osborne, bosborne at alum.mit.edu
 168 Chris Fields, cjfields at bioperl dot org
 169
 170 =head1 APPENDIX
 171
 172 The rest of the documentation details each of the object
 173 methods. Internal methods are usually preceded with a _
 174
 175 =cut
 176
 177 # Let the code begin...
 178
 179 package Bio::SeqIO::genbank;
 180 use strict;
 181
 182 use Bio::SeqIO::FTHelper;
 183 use Bio::SeqFeature::Generic;
 184 use Bio::Species;
 185 use Bio::Seq::SeqFactory;
 186 use Bio::Annotation::Collection;
 187 use Bio::Annotation::Comment;
 188 use Bio::Annotation::Reference;
 189 use Bio::Annotation::DBLink;
 190
 191 use base qw(Bio::SeqIO);
 192
 193 # Note that a qualifier that exceeds one line (i.e. a long label) will
 194 # automatically be quoted regardless:
 195
 196 our $FTQUAL_LINE_LENGTH = 60;
 197
 198 our %FTQUAL_NO_QUOTE = map {$_ => 1} qw(
 199     anticodon           citation
 200     codon               codon_start
 201     cons_splice         direction
 202     evidence            label
 203     mod_base            number
 204     rpt_type            rpt_unit
 205     transl_except       transl_table
 206     usedin
 207     );
 208
 209 our %DBSOURCE = map {$_ => 1} qw(
 210     EchoBASE     IntAct    SWISS-2DPAGE    ECO2DBASE    ECOGENE    TIGRFAMs
 211     TIGR    GO    InterPro    Pfam    PROSITE    SGD    GermOnline
 212     HSSP    PhosSite    Ensembl    RGD    AGD    ArrayExpress    KEGG
 213     H-InvDB    HGNC    LinkHub    PANTHER    PRINTS    SMART    SMR
 214     MGI    MIM    RZPD-ProtExp    ProDom    MEROPS    TRANSFAC    Reactome
 215     UniGene    GlycoSuiteDB    PIRSF    HSC-2DPAGE    PHCI-2DPAGE
 216     PMMA-2DPAGE    Siena-2DPAGE    Rat-heart-2DPAGE    Aarhus/Ghent-2DPAGE
 217     Biocyc    MetaCyc    Biocyc:Metacyc    GenomeReviews    FlyBase
 218     TMHOBP    COMPLUYEAST-2DPAGE    OGP    DictyBase    HAMAP
 219     PhotoList    Gramene    WormBase    WormPep    Genew    ZFIN
 220     PeroxiBase    MaizeDB    TAIR    DrugBank    REBASE    HPA
 221     swissprot    GenBank    GenPept    REFSEQ    embl    PDB    UniProtKB
 222     DIP    PeptideAtlas    PRIDE    CYGD    HOGENOME    Gene3D
 223     Project);
 224
 225 our %VALID_MOLTYPE = map {$_ => 1} qw(NA DNA RNA tRNA rRNA cDNA cRNA ms-DNA
 226     mRNA  uRNA  ss-RNA  ss-DNA  snRNA snoRNA PRT);
 227
 228 our %VALID_ALPHABET = (
 229     'bp' => 'dna',
 230     'aa' => 'protein',
 231     'rc' => '' # rc = release candidate; file has no sequences
 232 );
 233
 234 sub _initialize {
 235     my($self, @args) = @_;
 236
 237     $self->SUPER::_initialize(@args);
 238     # hash for functions for decoding keys.
 239     $self->{'_func_ftunit_hash'} = {};
 240     $self->_show_dna(1); # sets this to one by default. People can change it
 241     if ( not defined $self->sequence_factory ) {
 242         $self->sequence_factory
 243             (Bio::Seq::SeqFactory->new(-verbose => $self->verbose,
 244                                        -type    => 'Bio::Seq::RichSeq'));
 245     }
 246 }
 247
 248 =head2 next_seq
 249
 250  Title   : next_seq
 251  Usage   : $seq = $stream->next_seq()
 252  Function: returns the next sequence in the stream
 253  Returns : Bio::Seq object
 254  Args    :
 255
 256 =cut
 257
 258 sub next_seq {
 259     my ($self, @args) = @_;
 260     my %args    = @args;
 261     my $builder = $self->sequence_builder;
 262     my $seq;
 263     my %params;
 264
 265   RECORDSTART:
 266     while (1) {
 267         my $buffer;
 268         my ( @acc,        @features );
 269         my ( $display_id, $annotation );
 270         my $species;
 271
 272         # initialize; we may come here because of starting over
 273         @features   = ();
 274         $annotation = undef;
 275         @acc        = ();
 276         $species    = undef;
 277         %params     = ( -verbose => $self->verbose );    # reset hash
 278         local ($/) = "\n";
 279         while ( defined( $buffer = $self->_readline ) ) {
 280             last if index( $buffer, 'LOCUS       ' ) == 0;
 281         }
 282         return unless defined $buffer;                   # end of file
 283         $buffer =~ /^LOCUS\s+(\S.*)$/o
 284             or $self->throw(  "GenBank stream with bad LOCUS line. "
 285                             . "Not GenBank in my book. Got '$buffer'");
 286         my @tokens = split( ' ', $1 );
 287
 288         # this is important to have the id for display in e.g. FTHelper,
 289         # otherwise you won't know which entry caused an error
 290         $display_id = shift @tokens;
 291         $params{'-display_id'} = $display_id;
 292
 293         # may still be useful if we don't want the seq
 294         my $seqlength = shift @tokens;
 295         if ( exists $VALID_ALPHABET{$seqlength} ) {
 296             # moved one token too far.  No locus name?
 297             $self->warn(  "Bad LOCUS name?  Changing [$params{'-display_id'}] "
 298                         . "to 'unknown' and length to '$display_id'"
 299             );
 300             $params{'-display_id'} = 'unknown';
 301             $params{'-length'}     = $display_id;
 302
 303             # add token back...
 304             unshift @tokens, $seqlength;
 305         }
 306         else {
 307             $params{'-length'} = $seqlength;
 308         }
 309
 310         # the alphabet of the entry
 311         # shouldn't assign alphabet unless one
 312         # is specifically designated (such as for rc files)
 313         my $alphabet = lc( shift @tokens );
 314         $params{'-alphabet'} =
 315           ( exists $VALID_ALPHABET{$alphabet} )
 316           ? $VALID_ALPHABET{$alphabet}
 317           : $self->warn("Unknown alphabet: $alphabet");
 318
 319         # for aa there is usually no 'molecule' (mRNA etc)
 320         if ( $params{'-alphabet'} eq 'protein' ) {
 321             $params{'-molecule'} = 'PRT';
 322         }
 323         else {
 324             $params{'-molecule'} = shift(@tokens);
 325         }
 326
 327         # take care of lower case issues
 328         if ( $params{'-molecule'} eq 'dna' or $params{'-molecule'} eq 'rna' ) {
 329             $params{'-molecule'} = uc $params{'-molecule'};
 330         }
 331         $self->debug( "Unrecognized molecule type: " . $params{'-molecule'} )
 332             if not exists( $VALID_MOLTYPE{ $params{'-molecule'} } );
 333
 334         my $circ = shift @tokens;
 335         if ( $circ eq 'circular' ) {
 336             $params{'-is_circular'} = 1;
 337             $params{'-division'}    = shift @tokens;
 338         }
 339         else {
 340             # 'linear' or 'circular' may actually be omitted altogether
 341             $params{'-division'} =
 342                 ( CORE::length($circ) == 3 ) ? $circ : shift @tokens;
 343         }
 344         my $date = join( ' ', @tokens );    # we lump together the rest
 345
 346         # this is per request bug #1513
 347         # we can handle:
 348         # 9-10-2003
 349         # 9-10-03
 350         # 09-10-2003
 351         # 09-10-03
 352         if ( $date =~ s/\s*((\d{1,2})-(\w{3})-(\d{2,4})).*/$1/ ) {
 353             if ( length($date) < 11 ) {
 354                 # improperly formatted date
 355                 # But we'll be nice and fix it for them
 356                 my ( $d, $m, $y ) = ( $2, $3, $4 );
 357                 if ( length($d) == 1 ) {
 358                     $d = "0$d";
 359                 }
 360
 361                 # guess the century here
 362                 if ( length($y) == 2 ) {
 363                     if ( $y > 60 ) {    # arbitrarily guess that '60' means 1960
 364                         $y = "19$y";
 365                     }
 366                     else {
 367                         $y = "20$y";
 368                     }
 369                     $self->warn(  "Date was malformed, guessing the "
 370                                 . "century for $date to be $y\n"
 371                     );
 372                 }
 373                 $params{'-dates'} = [ join( '-', $d, $m, $y ) ];
 374             }
 375             else {
 376                 $params{'-dates'} = [$date];
 377             }
 378         }
 379
 380         # set them all at once
 381         $builder->add_slot_value(%params);
 382         %params = ();
 383
 384         # parse the rest if desired, otherwise start over
 385         if ( not $builder->want_object ) {
 386             $builder->make_object;
 387             next RECORDSTART;
 388         }
 389
 390         # set up annotation depending on what the builder wants
 391         if ( $builder->want_slot('annotation') ) {
 392             $annotation = Bio::Annotation::Collection->new;
 393         }
 394
 395         $buffer = $self->_readline;
 396         while ( defined( my $line = $buffer ) ) {
 397             # Description line(s)
 398             if ($line =~ /^DEFINITION\s+(\S.*\S)/) {
 399                 my @desc = ($1);
 400                 while ( defined( $line = $self->_readline ) ) {
 401                     if ($line =~ /^\s+(.*)/) {
 402                         push( @desc, $1 );
 403                         next;
 404                     }
 405                     last;
 406                 }
 407                 $builder->add_slot_value( -desc => join( ' ', @desc ) );
 408
 409                 # we'll continue right here because DEFINITION
 410                 # always comes at the top of the entry
 411                 $buffer = $line;
 412             }
 413
 414             # accession number (there can be multiple accessions)
 415             if ($line =~ /^ACCESSION\s+(\S.*\S)/) {
 416                 push( @acc, split( /\s+/, $1 ) );
 417                 while ( defined( $line = $self->_readline ) ) {
 418                     if ($line =~ /^\s+(.*)/) {
 419                         push( @acc, split( /\s+/, $1 ) );
 420                         next;
 421                     }
 422                     last;
 423                 }
 424                 $buffer = $line;
 425                 next;
 426             }
 427
 428             # PID
 429             elsif ($line =~ /^PID\s+(\S+)/) {
 430                 $params{'-pid'} = $1;
 431             }
 432
 433             # Version number
 434             elsif ($line =~ /^VERSION\s+(\S.+)$/) {
 435                 my ( $acc, $gi ) = split( ' ', $1 );
 436                 if ( $acc =~ /^\w+\.(\d+)/ ) {
 437                     $params{'-version'}     = $1;
 438                     $params{'-seq_version'} = $1;
 439                 }
 440                 if ( $gi && ( index( $gi, "GI:" ) == 0 ) ) {
 441                     $params{'-primary_id'} = substr( $gi, 3 );
 442                 }
 443             }
 444
 445             # Keywords
 446             elsif ($line =~ /^KEYWORDS\s+(\S.*)/) {
 447                 my @kw = split( /\s*\;\s*/, $1 );
 448                 while ( defined( $line = $self->_readline ) ) {
 449                     chomp $line;
 450                     if ($line =~ /^\s+(.*)/) {
 451                         push( @kw, split( /\s*\;\s*/, $1 ) );
 452                         next;
 453                     }
 454                     last;
 455                 }
 456
 457                 @kw && $kw[-1] =~ s/\.$//;
 458                 $params{'-keywords'} = \@kw;
 459                 $buffer = $line;
 460                 next;
 461             }
 462
 463             # Organism name and phylogenetic information
 464             elsif ($line =~ /^SOURCE\s+\S/) {
 465                 if ( $builder->want_slot('species') ) {
 466                     $species = $self->_read_GenBank_Species( \$buffer );
 467                     $builder->add_slot_value( -species => $species );
 468                 }
 469                 else {
 470                     while ( defined( $buffer = $self->_readline ) ) {
 471                         last if substr( $buffer, 0, 1 ) ne ' ';
 472                     }
 473                 }
 474                 next;
 475             }
 476
 477             # References
 478             elsif ($line =~ /^REFERENCE\s+\S/) {
 479                 if ($annotation) {
 480                     my @refs = $self->_read_GenBank_References( \$buffer );
 481                     foreach my $ref (@refs) {
 482                         $annotation->add_Annotation( 'reference', $ref );
 483                     }
 484                 }
 485                 else {
 486                     while ( defined( $buffer = $self->_readline ) ) {
 487                         last if substr( $buffer, 0, 1 ) ne ' ';
 488                     }
 489                 }
 490                 next;
 491             }
 492
 493             # Project
 494             elsif ($line =~ /^PROJECT\s+(\S.*)/) {
 495                 if ($annotation) {
 496                     my $project =
 497                         Bio::Annotation::SimpleValue->new( -value => $1 );
 498                     $annotation->add_Annotation( 'project', $project );
 499                 }
 500             }
 501
 502             # Comments
 503             elsif ($line =~ /^COMMENT\s+(\S.*)/) {
 504                 if ($annotation) {
 505                     my $comment = $1;
 506                     while ( defined( $line = $self->_readline ) ) {
 507                         last if ($line =~ /^\S/);
 508                         $comment .= $line;
 509                     }
 510                     $comment =~ s/\n/ /g;
 511                     $comment =~ s/  +/ /g;
 512                     $annotation->add_Annotation(
 513                         'comment',
 514                         Bio::Annotation::Comment->new(
 515                             -text    => $comment,
 516                             -tagname => 'comment'
 517                         )
 518                     );
 519                     $buffer = $line;
 520                 }
 521                 else {
 522                     while ( defined( $buffer = $self->_readline ) ) {
 523                         last if substr( $buffer, 0, 1 ) ne ' ';
 524                     }
 525                 }
 526                 next;
 527             }
 528
 529             # Corresponding Genbank nucleotide id, Genpept only
 530             elsif ($line =~ /^DB(?:SOURCE|LINK)\s+(\S.+)/) {
 531                 if ($annotation) {
 532                     my $dbsource = $1;
 533                     while ( defined( $line = $self->_readline ) ) {
 534                         last if ($line =~ /^\S/);
 535                         $dbsource .= $line;
 536                     }
 537
 538                     # deal with UniProKB dbsources
 539                     if ( $dbsource =~
 540                         s/(UniProt(?:KB)?|swissprot):\s+locus\s+(\S+)\,.+\n//
 541                         ) {
 542                         $annotation->add_Annotation(
 543                             'dblink',
 544                             Bio::Annotation::DBLink->new(
 545                                 -primary_id => $2,
 546                                 -database   => $1,
 547                                 -tagname    => 'dblink'
 548                             )
 549                         );
 550                         if ( $dbsource =~ s/\s+created:\s+([^\.]+)\.\n// ) {
 551                             $annotation->add_Annotation(
 552                                 'swissprot_dates',
 553                                 Bio::Annotation::SimpleValue->new(
 554                                     -tagname => 'date_created',
 555                                     -value   => $1
 556                                 )
 557                             );
 558                         }
 559                         while ( $dbsource =~
 560                                s/\s+(sequence|annotation)\s+updated:\s+([^\.]+)\.\n//g
 561                             ) {
 562                             $annotation->add_Annotation(
 563                                 'swissprot_dates',
 564                                 Bio::Annotation::SimpleValue->new(
 565                                     -tagname => 'date_updated',
 566                                     -value   => $2
 567                                 )
 568                             );
 569                         }
 570                         $dbsource =~ s/\n/ /g;
 571                         if ( $dbsource =~
 572                             s/\s+xrefs:\s+((?:\S+,\s+)+\S+)\s+xrefs/xrefs/
 573                             ) {
 574                             # will use $i to determine even or odd
 575                             # for swissprot the accessions are paired
 576                             my $i = 0;
 577                             for my $dbsrc ( split( /,\s+/, $1 ) ) {
 578                                 if (   $dbsrc =~ /(\S+)\.(\d+)/
 579                                     or $dbsrc =~ /(\S+)/
 580                                     ) {
 581                                     my ( $id, $version ) = ( $1, $2 );
 582                                     $version = '' unless defined $version;
 583                                     my $db = ( $id =~ /^\d\S{3}/ ) ? 'PDB'
 584                                            : ( $i++ % 2 )          ? 'GenPept'
 585                                            : 'GenBank';
 586
 587                                     $annotation->add_Annotation(
 588                                         'dblink',
 589                                         Bio::Annotation::DBLink->new(
 590                                             -primary_id => $id,
 591                                             -version    => $version,
 592                                             -database   => $db,
 593                                             -tagname    => 'dblink'
 594                                         )
 595                                     );
 596                                 }
 597                             }
 598                         }
 599                         elsif ( $dbsource =~ s/\s+xrefs:\s+(.+)\s+xrefs/xrefs/i ) {
 600                             # download screwed up and ncbi didn't put acc in for gi numbers
 601                             my $i = 0;
 602                             for my $id ( split( /\,\s+/, $1 ) ) {
 603                                 my ( $acc, $db );
 604                                 if ( $id =~ /gi:\s+(\d+)/ ) {
 605                                     $acc = $1;
 606                                     $db = ( $i++ % 2 ) ? 'GenPept' : 'GenBank';
 607                                 }
 608                                 elsif ( $id =~ /pdb\s+accession\s+(\S+)/ ) {
 609                                     $acc = $1;
 610                                     $db  = 'PDB';
 611                                 }
 612                                 else {
 613                                     $acc = $id;
 614                                     $db  = '';
 615                                 }
 616                                 $annotation->add_Annotation(
 617                                     'dblink',
 618                                     Bio::Annotation::DBLink->new(
 619                                         -primary_id => $acc,
 620                                         -database   => $db,
 621                                         -tagname    => 'dblink'
 622                                     )
 623                                 );
 624                             }
 625                         }
 626                         else {
 627                             $self->debug("Cannot match $dbsource\n");
 628                         }
 629                         if ( $dbsource =~ s/xrefs\s+
 630                                             \(non\-sequence\s+databases\):\s+
 631                                             ((?:\S+,\s+)+\S+)//x
 632                             ) {
 633                             for my $id ( split( /\,\s+/, $1 ) ) {
 634                                 my $db;
 635
 636                                 # this is because GenBank dropped the spaces!!!
 637                                 # I'm sure we're not going to get this right
 638                                 ##if ( $id =~ s/^://i ) {
 639                                 ##    $db = $1;
 640                                 ##}
 641                                 $db = substr( $id, 0, index( $id, ':' ) );
 642                                 if ( not exists $DBSOURCE{$db} ) {
 643                                     $db = '';    # do we want 'GenBank' here?
 644                                 }
 645                                 $id = substr( $id, index( $id, ':' ) + 1 );
 646                                 $annotation->add_Annotation(
 647                                     'dblink',
 648                                     Bio::Annotation::DBLink->new(
 649                                         -primary_id => $id,
 650                                         -database   => $db,
 651                                         -tagname    => 'dblink'
 652                                     )
 653                                 );
 654                             }
 655                         }
 656                     }
 657                     else {
 658                         if ( $dbsource =~
 659                             /^(\S*?):?\s*accession\s+(\S+)\.(\d+)/
 660                             ) {
 661                             my ( $db, $id, $version ) = ( $1, $2, $3 );
 662                             $annotation->add_Annotation(
 663                                 'dblink',
 664                                 Bio::Annotation::DBLink->new(
 665                                     -primary_id => $id,
 666                                     -version    => $version,
 667                                     -database   => $db || 'GenBank',
 668                                     -tagname    => 'dblink'
 669                                 )
 670                             );
 671                         }
 672                         elsif ( $dbsource =~ /^(\S*?):?\s*accession\s+(\S+)/ ) {
 673                             my ( $db, $id ) = ( $1, $2 );
 674                             $annotation->add_Annotation(
 675                                 'dblink',
 676                                 Bio::Annotation::DBLink->new(
 677                                     -primary_id => $id,
 678                                     -database   => $db || 'GenBank',
 679                                     -tagname    => 'dblink'
 680                                 )
 681                             );
 682                         }
 683                         elsif ( $dbsource =~ /(\S+)([\.:])\s*(\S+)/ ) {
 684                             my ( $db, $version );
 685                             my @ids = ();
 686                             if ( $2 eq ':' ) {
 687                                 $db = $1;
 688
 689                                 # Genbank 192 release notes say this: "The second
 690                                 # field can consist of multiple comma-separated
 691                                 # identifiers, if a sequence record has multiple
 692                                 # DBLINK cross-references of a given type."
 693                                 # For example: DBLINK      Project:100,200,300"
 694                                 @ids = split( /,/, $3 );
 695                             }
 696                             else {
 697                                 ( $db, $version ) = ( 'GenBank', $3 );
 698                                 $ids[0] = $1;
 699                             }
 700
 701                             foreach my $id (@ids) {
 702                                 $annotation->add_Annotation(
 703                                     'dblink',
 704                                     Bio::Annotation::DBLink->new(
 705                                         -primary_id => $id,
 706                                         -version    => $version,
 707                                         -database   => $db,
 708                                         -tagname    => 'dblink'
 709                                     )
 710                                 );
 711                             }
 712                         }
 713                         else {
 714                             $self->warn(
 715                                 "Unrecognized DBSOURCE data: $dbsource\n");
 716                         }
 717                     }
 718                     $buffer = $line;
 719                 }
 720                 else {
 721                     while ( defined( $buffer = $self->_readline ) ) {
 722                         last if substr( $buffer, 0, 1 ) ne ' ';
 723                     }
 724                 }
 725                 next;
 726             }
 727
 728             # Exit at start of Feature table, or start of sequence
 729             if ($line =~ /^(FEATURES|ORIGIN)/) {
 730                 my $trap;
 731             }
 732             last if ($line =~ /^(FEATURES|ORIGIN)/);
 733
 734             # Get next line and loop again
 735             $buffer = $self->_readline;
 736         }
 737         return unless defined $buffer;
 738
 739         # add them all at once for efficiency
 740         $builder->add_slot_value(
 741             -accession_number     => shift(@acc),
 742             -secondary_accessions => \@acc,
 743             %params
 744         );
 745         $builder->add_slot_value( -annotation => $annotation ) if $annotation;
 746         %params = ();    # reset before possible re-use to avoid setting twice
 747
 748         # start over if we don't want to continue with this entry
 749         if ( not $builder->want_object ) {
 750             $builder->make_object;
 751             next RECORDSTART;
 752         }
 753
 754         # some "minimal" formats may not necessarily have a feature table
 755         if (    $builder->want_slot('features')
 756             and defined $buffer
 757             and $buffer =~ /^FEATURES/o
 758             ) {
 759             # need to read the first line of the feature table
 760             $buffer = $self->_readline;
 761
 762             # DO NOT read lines in the while condition -- this is done
 763             # as a side effect in _read_FTHelper_GenBank!
 764
 765             # part of new circular spec:
 766             # commented out for now until kinks worked out
 767             #my $sourceEnd = 0;
 768             #$sourceEnd = $2 if ($buffer =~ /(\d+?)\.\.(\d+?)$/);
 769
 770             while ( defined $buffer ) {
 771                 # check immediately -- not at the end of the loop
 772                 # note: GenPept entries obviously do not have a BASE line
 773                 last if ( $buffer =~ /^BASE|ORIGIN|CONTIG|WGS/o );
 774
 775                 # slurp in one feature at a time -- at return, the start of
 776                 # the next feature will have been read already, so we need
 777                 # to pass a reference, and the called method must set this
 778                 # to the last line read before returning
 779                 my $ftunit = $self->_read_FTHelper_GenBank( \$buffer );
 780
 781                 # implement new circular spec: features that cross the origin are now
 782                 # seamless instead of being 2 separate joined features
 783                 # commented out until kinks get worked out
 784                 #if ((! $args{'-nojoin'}) && $ftunit->{'loc'} =~ /^join\((\d+?)\.\.(\d+?),(\d+?)..(\d+?)\)$/
 785                 #&& $sourceEnd == $2 && $3 == 1) {
 786                 #my $start = $1;
 787                 #my $end = $2 + $4;
 788                 #$ftunit->{'loc'} = "$start..$end";
 789                 #}
 790
 791                 # fix suggested by James Diggans
 792
 793                 if ( not defined $ftunit ) {
 794                     # GRRRR. We have fallen over. Try to recover
 795                     $self->warn(  "Unexpected error in feature table for "
 796                                 . $params{'-display_id'}
 797                                 . " Skipping feature, attempting to recover" );
 798
 799                     unless (   $buffer =~ /^\s{5,5}\S+/o
 800                             or $buffer =~ /^\S+/o
 801                         ) {
 802                         $buffer = $self->_readline;
 803                     }
 804                     next;    # back to reading FTHelpers
 805                 }
 806
 807                 # process ftunit
 808                 my $feat =
 809                     $ftunit->_generic_seqfeature( $self->location_factory,
 810                                                   $display_id );
 811
 812                 # add taxon_id from source if available
 813                 if (   $species
 814                     and $feat->primary_tag eq 'source'
 815                     and $feat->has_tag('db_xref')
 816                     and (    not $species->ncbi_taxid
 817                          or (    $species->ncbi_taxid
 818                              and $species->ncbi_taxid =~ /^list/ ) )
 819                     ) {
 820                     foreach my $tagval ( $feat->get_tag_values('db_xref') ) {
 821                         if ( index( $tagval, "taxon:" ) == 0 ) {
 822                             $species->ncbi_taxid( substr( $tagval, 6 ) );
 823                             last;
 824                         }
 825                     }
 826                 }
 827
 828                 # add feature to list of features
 829                 push( @features, $feat );
 830             }
 831             $builder->add_slot_value( -features => \@features );
 832         }
 833
 834         if ( defined $buffer ) {
 835             # CONTIG lines: TODO, this needs to be cleaned up
 836             if ($buffer =~/^CONTIG\s+(.*)/o) {
 837                 my $ctg = $1;
 838                 while ( defined( $buffer = $self->_readline ) ) {
 839                     last if $buffer =~ m{^ORIGIN|//}o;
 840                     $buffer =~ s/\s+(.*)/$1/;
 841                     $ctg .= $buffer;
 842                 }
 843                 if ($ctg) {
 844                     $annotation->add_Annotation(
 845                         Bio::Annotation::SimpleValue->new(
 846                             -tagname => 'contig',
 847                             -value   => $ctg
 848                         )
 849                     );
 850                 }
 851             }
 852             elsif ($buffer =~ /^WGS|WGS_SCAFLD\s+/o) {    # catch WGS/WGS_SCAFLD lines
 853                 while ( $buffer =~ s/(^WGS|WGS_SCAFLD)\s+// ) {    # gulp lines
 854                     chomp $buffer;
 855                     $annotation->add_Annotation(
 856                         Bio::Annotation::SimpleValue->new(
 857                             -value   => $buffer,
 858                             -tagname => lc $1
 859                         )
 860                     );
 861                     $buffer = $self->_readline;
 862                 }
 863             }
 864             elsif ( $buffer !~ m{^ORIGIN|//}o ) {    # advance to the sequence, if any
 865                 while ( defined( $buffer = $self->_readline ) ) {
 866                     last if $buffer =~ m{^(ORIGIN|//)};
 867                 }
 868             }
 869         }
 870         if ( not $builder->want_object ) {
 871             $builder->make_object;        # implicit end-of-object
 872             next RECORDSTART;
 873         }
 874         if ( $builder->want_slot('seq') ) {
 875             # the fact that we want a sequence does not necessarily mean that
 876             # there also is a sequence ...
 877             if ( defined $buffer and $buffer =~ s/^ORIGIN\s+// ) {
 878                 if ( $annotation and length($buffer) > 0 ) {
 879                     $annotation->add_Annotation(
 880                         'origin',
 881                         Bio::Annotation::SimpleValue->new(
 882                             -tagname => 'origin',
 883                             -value   => $buffer
 884                         )
 885                     );
 886                 }
 887                 my $seqc = '';
 888                 while ( defined( $buffer = $self->_readline ) ) {
 889                     last if $buffer =~ m{^//};
 890                     $buffer = uc $buffer;
 891                     $buffer =~ s/[^A-Za-z]//g;
 892                     $seqc .= $buffer;
 893                 }
 894
 895                 $builder->add_slot_value( -seq => $seqc );
 896             }
 897         }
 898         elsif ( defined($buffer) and ( substr( $buffer, 0, 2 ) ne '//' ) ) {
 899             # advance to the end of the record
 900             while ( defined( $buffer = $self->_readline ) ) {
 901                 last if substr( $buffer, 0, 2 ) eq '//';
 902             }
 903         }
 904
 905         # Unlikely, but maybe the sequence is so weird that we don't want it
 906         # anymore. We don't want to return undef if the stream's not exhausted
 907         # yet.
 908         $seq = $builder->make_object;
 909         next RECORDSTART unless $seq;
 910         last RECORDSTART;
 911     }    # end while RECORDSTART
 912
 913     return $seq;
 914 }
 915
 916 =head2 write_seq
 917
 918  Title   : write_seq
 919  Usage   : $stream->write_seq($seq)
 920  Function: writes the $seq object (must be seq) to the stream
 921  Returns : 1 for success and 0 for error
 922  Args    : array of 1 to n Bio::SeqI objects
 923
 924 =cut
 925
 926 sub write_seq {
 927     my ($self,@seqs) = @_;
 928
 929     foreach my $seq ( @seqs ) {
 930         $self->throw("Attempting to write with no seq!") unless defined $seq;
 931
 932         if ( not ref $seq or not $seq->isa('Bio::SeqI') ) {
 933             $self->warn(" $seq is not a SeqI compliant module. Attempting to dump, but may fail!");
 934         }
 935
 936         my $str   = $seq->seq;
 937         my $len   = $seq->length;
 938         my $alpha = $seq->alphabet;
 939
 940         my ($div, $mol);
 941         if (   not $seq->can('division')
 942             or not defined($div = $seq->division)
 943             ) {
 944             $div = 'UNK';
 945         }
 946         if (   not $seq->can('molecule')
 947             or not defined ($mol = $seq->molecule)
 948            ) {
 949             $mol =  $alpha || 'DNA';
 950         }
 951
 952         my $circular = ($seq->is_circular) ? 'circular' : 'linear  ';
 953
 954         local($^W) = 0; # supressing warnings about uninitialized fields.
 955
 956         my $temp_line;
 957         if ( $self->_id_generation_func ) {
 958             $temp_line = &{$self->_id_generation_func}($seq);
 959         }
 960         else {
 961             my $date = '';
 962             if ( $seq->can('get_dates') ) {
 963                 ($date) = $seq->get_dates;
 964             }
 965
 966             $self->warn("No whitespace allowed in GenBank display id [". $seq->display_id. "]")
 967                 if $seq->display_id =~ /\s/;
 968
 969             my @data = ( lc($alpha) eq 'protein' ) ? ('aa', '', '') : ('bp', '', $mol);
 970             $temp_line = sprintf ("%-12s%-15s%13s %s%4s%-8s%-8s %3s %-s\n",
 971                                   'LOCUS', $seq->id, $len,
 972                                   @data, $circular, $div, $date);
 973         }
 974
 975         $self->_print($temp_line);
 976         $self->_write_line_GenBank_regex("DEFINITION  ", "            ",
 977                                          $seq->desc,     "\\s\+\|\$",80);
 978
 979         # if there, write the accession line
 980
 981         if ( $self->_ac_generation_func ) {
 982             $temp_line = &{$self->_ac_generation_func}($seq);
 983             $self->_print("ACCESSION   $temp_line\n");
 984         }
 985         else {
 986             my @acc = ();
 987             push @acc, $seq->accession_number;
 988             if ( $seq->isa('Bio::Seq::RichSeqI') ) {
 989                 push @acc, $seq->get_secondary_accessions;
 990             }
 991             $self->_print("ACCESSION   ", join(" ", @acc), "\n");
 992             # otherwise - cannot print <sigh>
 993         }
 994
 995         # if PID defined, print it
 996         if ($seq->isa('Bio::Seq::RichSeqI') and $seq->pid) {
 997             $self->_print("PID         ", $seq->pid, "\n");
 998         }
 999
1000         # if there, write the version line
1001         if ( defined $self->_sv_generation_func ) {
1002             $temp_line = &{$self->_sv_generation_func}($seq);
1003             if ( $temp_line ) {
1004                 $self->_print("VERSION     $temp_line\n");
1005             }
1006         }
1007         elsif ($seq->isa('Bio::Seq::RichSeqI') and defined($seq->seq_version)) {
1008             my $id = $seq->primary_id; # this may be a GI number
1009             my $data = (defined $id and $id =~ /^\d+$/) ? "  GI:$id" : "";
1010             $self->_print("VERSION     ",
1011                           $seq->accession_number, ".",
1012                           $seq->seq_version, $data, "\n");
1013         }
1014
1015         # if there, write the PROJECT line
1016         for my $proj ( $seq->annotation->get_Annotations('project') ) {
1017             $self->_print("PROJECT     ".$proj->value."\n");
1018         }
1019
1020         # if there, write the DBSOURCE line
1021         foreach my $ref ( $seq->annotation->get_Annotations('dblink') ) {
1022             my ($db, $id) = ($ref->database, $ref->primary_id);
1023             my $prefix = $db eq 'Project' ? 'DBLINK' : 'DBSOURCE';
1024             my $text   = $db eq 'GenBank' ? ''
1025                        : $db eq 'Project' ? "$db:$id"
1026                        : "$db accession $id";
1027             $self->_print(sprintf ("%-11s %s\n", $prefix, $text));
1028         }
1029
1030         # if there, write the keywords line
1031         if ( defined $self->_kw_generation_func ) {
1032             $temp_line = &{$self->_kw_generation_func}($seq);
1033             $self->_print("KEYWORDS    $temp_line\n");
1034         }
1035         elsif ( $seq->can('keywords') ) {
1036             my $kw = $seq->keywords;
1037             $kw .= '.' if ( $kw !~ /\.$/ );
1038             $self->_print("KEYWORDS    $kw\n");
1039         }
1040
1041         # SEGMENT if it exists
1042         foreach my $ref ( $seq->annotation->get_Annotations('segment') ) {
1043             $self->_print(sprintf ("%-11s %s\n",'SEGMENT',
1044                                    $ref->value));
1045         }
1046
1047         # Organism lines
1048         if (my $spec = $seq->species) {
1049             my ($on, $sn, $cn) = ($spec->can('organelle') ? $spec->organelle : '',
1050                                   $spec->scientific_name,
1051                                   $spec->common_name);
1052             my @classification;
1053             if ($spec->isa('Bio::Species')) {
1054                 @classification = $spec->classification;
1055                 shift @classification;
1056             }
1057             else {
1058                 # Bio::Taxon should have a DB handle of some type attached, so
1059                 # derive the classification from that
1060                 my $node = $spec;
1061                 while ($node) {
1062                     $node = $node->ancestor || last;
1063                     unshift @classification, $node->node_name;
1064                     #$node eq $root && last;
1065                 }
1066                 @classification = reverse @classification;
1067             }
1068             my $abname = $spec->name('abbreviated') ? # from genbank file
1069                          $spec->name('abbreviated')->[0] : $sn;
1070             my $sl = $on ? "$on "          : '';
1071             $sl   .= $cn ? "$abname ($cn)" : $abname;
1072
1073             $self->_write_line_GenBank_regex("SOURCE      ", ' 'x12, $sl, "\\s\+\|\$", 80);
1074             $self->_print("  ORGANISM  ", $spec->scientific_name, "\n");
1075             my $OC = join('; ', reverse @classification) . '.';
1076             $self->_write_line_GenBank_regex(' 'x12,' 'x12, $OC, "\\s\+\|\$", 80);
1077         }
1078
1079         # Reference lines
1080         my $count = 1;
1081         foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
1082             $temp_line = "REFERENCE   $count";
1083             if ($ref->start) {
1084                 $temp_line .= sprintf ("  (%s %d to %d)",
1085                                        ($seq->alphabet() eq "protein" ?
1086                                         "residues" : "bases"),
1087                                        $ref->start, $ref->end);
1088             }
1089             elsif ($ref->gb_reference) {
1090                 $temp_line .= sprintf ("  (%s)", $ref->gb_reference);
1091             }
1092             $self->_print("$temp_line\n");
1093             $self->_write_line_GenBank_regex("  AUTHORS   ", ' 'x12,
1094                                              $ref->authors,    "\\s\+\|\$", 80);
1095             $self->_write_line_GenBank_regex("  CONSRTM   ", ' 'x12,
1096                                              $ref->consortium, "\\s\+\|\$", 80) if $ref->consortium;
1097             $self->_write_line_GenBank_regex("  TITLE     ", ' 'x12,
1098                                              $ref->title,      "\\s\+\|\$", 80);
1099             $self->_write_line_GenBank_regex("  JOURNAL   ", ' 'x12,
1100                                              $ref->location,   "\\s\+\|\$", 80);
1101             if ( $ref->medline) {
1102                 $self->_write_line_GenBank_regex("  MEDLINE   ", ' 'x12,
1103                                                  $ref->medline,  "\\s\+\|\$", 80);
1104                 # I am assuming that pubmed entries only exist when there
1105                 # are also MEDLINE entries due to the indentation
1106             }
1107             # This could be a wrong assumption
1108             if ( $ref->pubmed ) {
1109                 $self->_write_line_GenBank_regex("   PUBMED   ", ' 'x12,
1110                                                  $ref->pubmed,   "\\s\+\|\$", 80);
1111             }
1112             # put remark at the end
1113             if ($ref->comment) {
1114                 $self->_write_line_GenBank_regex("  REMARK    ", ' 'x12,
1115                                                  $ref->comment,  "\\s\+\|\$", 80);
1116             }
1117             $count++;
1118         }
1119
1120         # Comment lines
1121         foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {
1122             $self->_write_line_GenBank_regex("COMMENT     ", ' 'x12,
1123                                              $comment->text, "\\s\+\|\$", 80);
1124         }
1125
1126         # FEATURES section
1127         $self->_print("FEATURES             Location/Qualifiers\n");
1128
1129         if ( defined $self->_post_sort ) {
1130             # we need to read things into an array. Process. Sort them. Print 'em
1131             my $post_sort_func = $self->_post_sort;
1132             my @fth;
1133
1134             foreach my $sf ( $seq->top_SeqFeatures ) {
1135                 push @fth, Bio::SeqIO::FTHelper::from_SeqFeature($sf, $seq);
1136             }
1137
1138             @fth = sort { &$post_sort_func($a, $b) } @fth;
1139
1140             foreach my $fth ( @fth ) {
1141                 $self->_print_GenBank_FTHelper($fth);
1142             }
1143         }
1144         else {
1145             # not post sorted. And so we can print as we get them.
1146             # lower memory load...
1147             foreach my $sf ( $seq->top_SeqFeatures ) {
1148                 my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf, $seq);
1149                 foreach my $fth ( @fth ) {
1150                     if ( ! $fth->isa('Bio::SeqIO::FTHelper') ) {
1151                         $sf->throw("Cannot process FTHelper... $fth");
1152                     }
1153                     $self->_print_GenBank_FTHelper($fth);
1154                 }
1155             }
1156         }
1157
1158         # deal with WGS; WGS_SCAFLD present only if WGS is also present
1159         if ($seq->annotation->get_Annotations('wgs')) {
1160             foreach my $wgs (map {$seq->annotation->get_Annotations($_)}
1161                              qw(wgs wgs_scaffold)
1162                 ) {
1163                 $self->_print(sprintf ("%-11s %s\n",
1164                                        uc($wgs->tagname),
1165                                        $wgs->value));
1166             }
1167             $self->_show_dna(0);
1168         }
1169         if ($seq->annotation->get_Annotations('contig')) {
1170             my $ct = 0;
1171             my $cline;
1172             foreach my $contig ($seq->annotation->get_Annotations('contig')) {
1173                 unless ($ct) {
1174                     $cline = uc($contig->tagname) . "      " . $contig->value . "\n";
1175                 }
1176                 else {
1177                     $cline = "            " . $contig->value . "\n";
1178                 }
1179                 $self->_print($cline);
1180                 $ct++;
1181             }
1182             $self->_show_dna(0);
1183         }
1184         if ( $seq->length == 0 ) {
1185             $self->_show_dna(0);
1186         }
1187
1188         if ( $self->_show_dna == 0 ) {
1189             $self->_print("\n//\n");
1190             return;
1191         }
1192
1193         # finished printing features.
1194
1195         $str =~ tr/A-Z/a-z/;
1196
1197         my ($o) = $seq->annotation->get_Annotations('origin');
1198         $self->_print(sprintf("%-12s%s\n",
1199                               'ORIGIN', $o ? $o->value : ''));
1200         # print out the sequence
1201         my $nuc = 60;           # Number of nucleotides per line
1202         my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line
1203         my $out_pat   = 'A11' x 6; # Pattern for packing a line
1204         my $length = length $str;
1205
1206         # Calculate the number of nucleotides which fit on whole lines
1207         my $whole = int($length / $nuc) * $nuc;
1208
1209         # Print the whole lines
1210         my $i;
1211         for ($i = 0; $i < $whole; $i += $nuc) {
1212             my $blocks = pack $out_pat,
1213             unpack $whole_pat,
1214             substr($str, $i, $nuc);
1215             chop $blocks;
1216             $self->_print(sprintf("%9d $blocks\n", $i + $nuc - 59));
1217         }
1218
1219         # Print the last line
1220         if (my $last = substr($str, $i)) {
1221             my $last_len = length($last);
1222             my $last_pat = 'a10' x int($last_len / 10)
1223                          . 'a' . $last_len % 10;
1224             my $blocks = pack $out_pat,
1225             unpack($last_pat, $last);
1226             $blocks =~ s/ +$//;
1227             $self->_print(sprintf("%9d $blocks\n",
1228                                   $length - $last_len + 1));
1229         }
1230
1231         $self->_print("//\n");
1232
1233         $self->flush if $self->_flush_on_write && defined $self->_fh;
1234         return 1;
1235     }
1236 }
1237
1238 =head2 _print_GenBank_FTHelper
1239
1240  Title   : _print_GenBank_FTHelper
1241  Usage   :
1242  Function:
1243  Example :
1244  Returns :
1245  Args    :
1246
1247 =cut
1248
1249 sub _print_GenBank_FTHelper {
1250     my ( $self, $fth ) = @_;
1251
1252     if ( not ref $fth or not $fth->isa('Bio::SeqIO::FTHelper') ) {
1253         $fth->warn(
1254             "$fth is not a FTHelper class. Attempting to print but there could be issues"
1255         );
1256     }
1257
1258     my $spacer = ( length $fth->key >= 15 ) ? ' ' : '';
1259     $self->_write_line_GenBank_regex(
1260         sprintf( "     %-16s%s", $fth->key, $spacer ),
1261                  " " x 21, $fth->loc, "\,\|\$", 80 );
1262
1263     foreach my $tag ( keys %{ $fth->field } ) {
1264         # Account for hash structure in Annotation::DBLink, not the expected array
1265         if ( $tag eq 'db_xref' and grep /HASH/, @{ $fth->field->{$tag} }) {
1266             for my $ref ( @{ $fth->field->{$tag} } ) {
1267                 my $db = $ref->{'database'};
1268                 my $id = $ref->{'primary_id'};
1269                 $self->_write_line_GenBank_regex
1270                     ( " " x 21, " " x 21,
1271                       "/$tag=\"$db:$id\"", "\.\|\$", 80 );
1272             }
1273         }
1274         # The usual case, where all values are found in an array
1275         else {
1276             foreach my $value ( @{ $fth->field->{$tag} } ) {
1277                 $value =~ s/\"/\"\"/g;
1278                 if ( $value eq "_no_value" ) {
1279                     $self->_write_line_GenBank_regex
1280                         ( " " x 21, " " x 21,
1281                           "/$tag", "\.\|\$", 80 );
1282                 }
1283
1284                # There are almost 3x more quoted qualifier values and they
1285                # are more common too so we take quoted ones first.
1286                # Long qualifiers, that will be line wrapped, are always quoted
1287                 elsif (   not $FTQUAL_NO_QUOTE{$tag}
1288                        or length("/$tag=$value") >= $FTQUAL_LINE_LENGTH
1289                     ) {
1290                     my ($pat) = ( $value =~ /\s/ ? '\s|$' : '.|$' );
1291                     $self->_write_line_GenBank_regex
1292                         ( " " x 21, " " x 21,
1293                           "/$tag=\"$value\"", $pat, 80 );
1294                 }
1295                 else {
1296                     $self->_write_line_GenBank_regex
1297                         ( " " x 21, " " x 21,
1298                           "/$tag=$value", "\.\|\$", 80 );
1299                 }
1300             }
1301         }
1302     }
1303 }
1304
1305 =head2 _read_GenBank_References
1306
1307  Title   : _read_GenBank_References
1308  Usage   :
1309  Function: Reads references from GenBank format. Internal function really
1310  Returns :
1311  Args    :
1312
1313 =cut
1314
1315 sub _read_GenBank_References {
1316     my ($self, $buffer) = @_;
1317     my (@refs);
1318     my $ref;
1319
1320     # assumme things are starting with RN
1321     if ( $$buffer !~ /^REFERENCE/ ) {
1322         warn("Not parsing line '$$buffer' which maybe important");
1323     }
1324
1325     my $line = $$buffer;
1326
1327     my (@title,@loc,@authors,@consort,@com,@medline,@pubmed);
1328
1329   REFLOOP:
1330     while( defined($line) or defined($line = $self->_readline) ) {
1331         if ($line =~ /^\s{2}AUTHORS\s+(.*)/o) {
1332             push @authors, $1;
1333             while ( defined($line = $self->_readline) ) {
1334                 if ($line =~ /^\s{9,}(.*)/o) {
1335                     push @authors, $1;
1336                     next;
1337                 }
1338                 last;
1339             }
1340             $ref->authors(join(' ', @authors));
1341         }
1342
1343         if ($line =~ /^\s{2}CONSRTM\s+(.*)/o) {
1344             push @consort, $1;
1345             while ( defined($line = $self->_readline) ) {
1346                 if ($line =~ /^\s{9,}(.*)/o) {
1347                     push @consort, $1;
1348                     next;
1349                 }
1350                 last;
1351             }
1352             $ref->consortium(join(' ', @consort));
1353         }
1354
1355         if ($line =~ /^\s{2}TITLE\s+(.*)/o) {
1356             push @title, $1;
1357             while ( defined($line = $self->_readline) ) {
1358                 if ($line =~ /^\s{9,}(.*)/o) {
1359                     push @title, $1;
1360                     next;
1361                 }
1362                 last;
1363             }
1364             $ref->title(join(' ', @title));
1365         }
1366
1367         if ($line =~ /^\s{2}JOURNAL\s+(.*)/o) {
1368             push @loc, $1;
1369             while ( defined($line = $self->_readline) ) {
1370                 # we only match when there are at least 4 spaces
1371                 # there is probably a better way to match this
1372                 # as it assumes that the describing tag is short enough
1373                 if ($line =~ /^\s{9,}(.*)/o) {
1374                     push @loc, $1;
1375                     next;
1376                 }
1377                 last;
1378             }
1379             $ref->location(join(' ', @loc));
1380             redo REFLOOP;
1381         }
1382
1383         if ($line =~ /^\s{2}REMARK\s+(.*)/o) {
1384             push @com, $1;
1385             while ( defined($line = $self->_readline) ) {
1386                 if ($line =~ /^\s{9,}(.*)/o) {
1387                     push @com, $1;
1388                     next;
1389                 }
1390                 last;
1391             }
1392             $ref->comment(join(' ', @com));
1393             redo REFLOOP;
1394         }
1395
1396         if ( $line =~ /^\s{2}MEDLINE\s+(.*)/ ) {
1397             push @medline, $1;
1398             while ( defined($line = $self->_readline) ) {
1399                 if ($line =~ /^\s{9,}(.*)/) {
1400                     push @medline, $1;
1401                     next;
1402                 }
1403                 last;
1404             }
1405             $ref->medline(join(' ', @medline));
1406             redo REFLOOP;
1407         }
1408
1409         if ( $line =~ /^\s{3}PUBMED\s+(.*)/ ) {
1410             push @pubmed, $1;
1411             while ( defined($line = $self->_readline) ) {
1412                 if ($line =~ /^\s{9,}(.*)/) {
1413                     push @pubmed, $1;
1414                     next;
1415                 }
1416                 last;
1417             }
1418             $ref->pubmed(join(' ', @pubmed));
1419             redo REFLOOP;
1420         }
1421
1422         if ( $line =~ /^REFERENCE/o ) {
1423             # store current reference
1424             $self->_add_ref_to_array(\@refs,$ref) if defined $ref;
1425
1426             # reset
1427             @authors = ();
1428             @title   = ();
1429             @loc     = ();
1430             @com     = ();
1431             @pubmed  = ();
1432             @medline = ();
1433
1434             # create the new reference object
1435             $ref = Bio::Annotation::Reference->new(-tagname => 'reference');
1436
1437             # check whether start and end base is given
1438             if ($line =~ /^REFERENCE\s+\d+\s+\([a-z]+ (\d+) to (\d+)\)/){
1439                 $ref->start($1);
1440                 $ref->end($2);
1441             }
1442             elsif ($line =~ /^REFERENCE\s+\d+\s+\((.*)\)/) {
1443                 $ref->gb_reference($1);
1444             }
1445         }
1446
1447         last if ($line =~ /^(FEATURES)|(COMMENT)/o);
1448
1449         $line = undef; # Empty $line to trigger read of next line
1450     }
1451
1452     # store last reference
1453     $self->_add_ref_to_array(\@refs, $ref) if defined $ref;
1454
1455     $$buffer = $line;
1456
1457     #print "\nnumber of references found: ", $#refs+1,"\n";
1458
1459     return @refs;
1460 }
1461
1462 =head2 _add_ref_to_array
1463
1464 Title: _add_ref_to_array
1465 Usage:
1466 Function: Adds a Reference object to an array of Reference objects, takes
1467           care of possible cleanups to be done (currently, only author and title
1468           will be chopped of trailing semicolons).
1469 Args:     A reference to an array of Reference objects and
1470           the Reference object to be added
1471 Returns: nothing
1472
1473 =cut
1474
1475 sub _add_ref_to_array {
1476     my ($self, $refs, $ref) = @_;
1477
1478     # first, polish author and title by removing possible trailing semicolons
1479     my $au    = $ref->authors;
1480     my $title = $ref->title;
1481     $au    =~ s/;\s*$//g if $au;
1482     $title =~ s/;\s*$//g if $title;
1483     $ref->authors($au);
1484     $ref->title($title);
1485     # the rest should be clean already, so go ahead and add it
1486     push @{$refs}, $ref;
1487 }
1488
1489 =head2 _read_GenBank_Species
1490
1491  Title   : _read_GenBank_Species
1492  Usage   :
1493  Function: Reads the GenBank Organism species and classification
1494            lines. Able to deal with unconvential Organism naming
1495            formats, and varietas in plants
1496  Example : ORGANISM  unknown marine gamma proteobacterium NOR5
1497            $genus = undef
1498            $species = unknown marine gamma proteobacterium NOR5
1499
1500            ORGANISM  Drosophila sp. 'white tip scutellum'
1501            $genus = Drosophila
1502            $species = sp. 'white tip scutellum'
1503            (yes, this really is a species and that is its name)
1504            $subspecies = undef
1505
1506            ORGANISM  Ajellomyces capsulatus var. farciminosus
1507            $genus = Ajellomyces
1508            $species = capsulatus
1509            $subspecies = var. farciminosus
1510
1511            ORGANISM  Hepatitis delta virus
1512            $genus = undef (though this virus has a genus in its lineage, we
1513                            cannot know that without a database lookup)
1514            $species = Hepatitis delta virus
1515
1516  Returns : A Bio::Species object
1517  Args    : A reference to the current line buffer
1518
1519 =cut
1520
1521 sub _read_GenBank_Species {
1522     my ($self, $buffer) = @_;
1523
1524     my @unkn_names = ('other', 'unknown organism', 'not specified', 'not shown',
1525                       'Unspecified', 'Unknown', 'None', 'unclassified',
1526                       'unidentified organism', 'not supplied');
1527     # dictionary of synonyms for taxid 32644
1528     my @unkn_genus = ('unknown', 'unclassified', 'uncultured', 'unidentified');
1529     # all above can be part of valid species name
1530
1531     my $line = $$buffer;
1532
1533     my( $sub_species, $species, $genus, $sci_name, $common,
1534         $class_lines, $source_flag, $abbr_name, $organelle, $sl );
1535     my %source = map { $_ => 1 } qw(SOURCE ORGANISM CLASSIFICATION);
1536
1537     # upon first entering the loop, we must not read a new line -- the SOURCE
1538     # line is already in the buffer (HL 05/10/2000)
1539     my ($ann, $tag, $data);
1540     while (defined($line) or defined($line = $self->_readline)) {
1541         # de-HTMLify (links that may be encountered here don't contain
1542         # escaped '>', so a simple-minded approach suffices)
1543         $line =~ s{<[^>]+>}{}g;
1544         if ($line =~ m{^(?:\s{0,2})(\w+)\s+(.+)?$}ox) {
1545             ($tag, $data) = ($1, $2 || '');
1546             last if ($tag and not exists $source{$tag});
1547         }
1548         else {
1549             return unless $tag;
1550             ($data = $line) =~ s{^\s+}{};
1551             chomp $data;
1552             $tag = 'CLASSIFICATION' if (    $tag ne 'CLASSIFICATION'
1553                                         and $tag eq 'ORGANISM'
1554                                         # Don't match "str." or "var." (NC_021815)
1555                                         and  $line =~ m{(?<!\bstr|\bvar)[;\.]+});
1556         }
1557         (exists $ann->{$tag}) ? ($ann->{$tag} .= ' '.$data) : ($ann->{$tag} .= $data);
1558         $line = undef;
1559     }
1560
1561     ($sl, $class_lines, $sci_name) = ($ann->{SOURCE}, $ann->{CLASSIFICATION}, $ann->{ORGANISM});
1562
1563     $$buffer = $line;
1564
1565     $sci_name or return;
1566
1567     # parse out organelle, common name, abbreviated name if present;
1568     # this should catch everything, but falls back to
1569     # entire SOURCE line just in case
1570     if ($sl =~ m{^(mitochondrion|chloroplast|plastid)?
1571                   \s*(.*?)
1572                   \s*(?: \( (.*?) \) )?\.?
1573                   $
1574                  }xms
1575         ) {
1576         ($organelle, $abbr_name, $common) = ($1, $2, $3); # optional
1577     }
1578     else {
1579         $abbr_name = $sl; # nothing caught; this is a backup!
1580     }
1581
1582     # Convert data in classification lines into classification array.
1583     # only split on ';' or '.' so that classification that is 2 or more words will
1584     # still get matched, use map() to remove trailing/leading/intervening spaces
1585     my @class = map { $_ =~ s/^\s+//;
1586                       $_ =~ s/\s+$//;
1587                       $_ =~ s/\s{2,}/ /g;
1588                       $_; }
1589                 split /(?<!subgen)[;\.]+/, $class_lines;
1590
1591     # do we have a genus?
1592     my $possible_genus =  quotemeta($class[-1])
1593                        . ($class[-2] ? "|" . quotemeta($class[-2]) : '');
1594     if ($sci_name =~ /^($possible_genus)/) {
1595         $genus = $1;
1596         ($species) = $sci_name =~ /^$genus\s+(.+)/;
1597     }
1598     else {
1599         $species = $sci_name;
1600     }
1601
1602     # is this organism of rank species or is it lower?
1603     # (we don't catch everything lower than species, but it doesn't matter -
1604     # this is just so we abide by previous behaviour whilst not calling a
1605     # species a subspecies)
1606     if ($species and $species =~ /(.+)\s+((?:subsp\.|var\.).+)/) {
1607         ($species, $sub_species) = ($1, $2);
1608     }
1609
1610     # Don't make a species object if it's empty or "Unknown" or "None"
1611     # return unless $genus and  $genus !~ /^(Unknown|None)$/oi;
1612     # Don't make a species object if it belongs to taxid 32644
1613 #    my $unkn = grep { $_ =~ /^\Q$sl\E$/; } @unkn_names;
1614     my $unkn = grep { $_ eq $sl } @unkn_names;
1615     return unless (defined $species or defined $genus) and $unkn == 0;
1616
1617     # Bio::Species array needs array in Species -> Kingdom direction
1618     push @class, $sci_name;
1619     @class = reverse @class;
1620
1621     my $make = Bio::Species->new;
1622     $make->scientific_name($sci_name);
1623     $make->classification(@class)          if @class > 0;
1624     $make->common_name( $common )          if $common;
1625     $make->name('abbreviated', $abbr_name) if $abbr_name;
1626     $make->organelle($organelle)           if $organelle;
1627     #$make->sub_species( $sub_species )     if $sub_species;
1628     return $make;
1629 }
1630
1631 =head2 _read_FTHelper_GenBank
1632
1633  Title   : _read_FTHelper_GenBank
1634  Usage   : _read_FTHelper_GenBank($buffer)
1635  Function: reads the next FT key line
1636  Example :
1637  Returns : Bio::SeqIO::FTHelper object
1638  Args    : filehandle and reference to a scalar
1639
1640 =cut
1641
1642 sub _read_FTHelper_GenBank {
1643     my ($self, $buffer) = @_;
1644
1645     my ($key, # The key of the feature
1646         $loc  # The location line from the feature
1647     );
1648     my @qual = (); # An array of lines making up the qualifiers
1649
1650     if ($$buffer =~ /^\s{5}(\S+)\s+(.+?)\s*$/o) {
1651         $key = $1;
1652         $loc = $2;
1653         # Read all the lines up to the next feature
1654         while ( defined(my $line = $self->_readline) ) {
1655             if ($line =~ /^(\s+)(.+?)\s*$/o) {
1656                 # Lines inside features are preceded by 21 spaces
1657                 # A new feature is preceded by 5 spaces
1658                 if (length($1) > 6) {
1659                     # Add to qualifiers if we're in the qualifiers, or if it's
1660                     # the first qualifier
1661                     if (@qual or (index($2,'/') == 0)) {
1662                         push @qual, $2;
1663                     }
1664                     # We're still in the location line, so append to location
1665                     else {
1666                         $loc .= $2;
1667                     }
1668                 }
1669                 else {
1670                     # We've reached the start of the next feature
1671                     # Put the first line of the next feature into the buffer
1672                     $$buffer = $line;
1673                     last;
1674                 }
1675             }
1676             else {
1677                 # We're at the end of the feature table
1678                 # Put the first line of the next feature into the buffer
1679                 $$buffer = $line;
1680                 last;
1681             }
1682         }
1683     }
1684     else {
1685         # No feature key
1686         $self->debug("no feature key!\n");
1687         # change suggested by JDiggans to avoid infinite loop-
1688         # see bugreport 1062.
1689         # reset buffer to prevent infinite loop
1690         $$buffer = $self->_readline;
1691         return;
1692     }
1693
1694     # Make the new FTHelper object
1695     my $out = Bio::SeqIO::FTHelper->new;
1696     $out->verbose($self->verbose);
1697     $out->key($key);
1698     $out->loc($loc);
1699
1700     # Now parse and add any qualifiers.  (@qual is kept
1701     # intact to provide informative error messages.)
1702   QUAL:
1703     for (my $i = 0; $i < @qual; $i++) {
1704         my $data = $qual[$i];
1705         my ( $qualifier, $value ) = ($data =~ m{^/([^=]+)(?:=(.+))?})
1706             or $self->warn(  "cannot see new qualifier in feature $key: "
1707                            . $qual[$i]);
1708         $qualifier = '' unless( defined $qualifier );
1709
1710         if (defined $value) {
1711             # Do we have a quoted value?
1712             if (substr($value, 0, 1) eq '"') {
1713                 # Keep adding to value until we find the trailing quote
1714                 # and the quotes are balanced
1715                 while ($value !~ /\"$/ or $value =~ tr/"/"/ % 2) {
1716                     if ($i >= $#qual) {
1717                         $self->warn(  "Unbalanced quote in:\n"
1718                                     . join("\n", @qual)
1719                                     . "No further qualifiers will "
1720                                     . "be added for this feature");
1721                         last QUAL;
1722                     }
1723                     # modifying a for-loop variable inside of the loop
1724                     # is not the best programming style ...
1725                     $i++;
1726                     my $next = $qual[$i];
1727
1728                     # add to value with a space unless the value appears
1729                     # to be a sequence (translation for example)
1730                     # if (($value.$next) =~ /[^A-Za-z\"\-]/o) {
1731                     # changed to explicitly look for translation tag - cjf 06/8/29
1732                     if ($qualifier !~ /^translation$/i ) {
1733                         $value .= " ";
1734                     }
1735                     $value .= $next;
1736                 }
1737                 # Trim leading and trailing quotes
1738                 $value =~ s/^"|"$//g;
1739                 # Undouble internal quotes
1740                 $value =~ s/""/\"/g;
1741             }
1742             elsif ( $value =~ /^\(/ ) { # values quoted by ()s
1743                 # Keep adding to value until we find the trailing bracket
1744                 # and the ()s are balanced
1745                 my $left  = ($value =~ tr/\(/\(/); # count left parens
1746                 my $right = ($value =~ tr/\)/\)/); # count right parens
1747
1748                 while( $left != $right ) { # was "$value !~ /\)$/ or $left != $right"
1749                     if ( $i >= $#qual) {
1750                         $self->warn(  "Unbalanced parens in:\n"
1751                                     . join("\n", @qual)
1752                                     . "\nNo further qualifiers will "
1753                                     . "be added for this feature");
1754                         last QUAL;
1755                     }
1756                     $i++;
1757                     my $next = $qual[$i];
1758                     $value .=  $next;
1759                     $left  += ($next =~ tr/\(/\(/);
1760                     $right += ($next =~ tr/\)/\)/);
1761                 }
1762             }
1763         }
1764         else {
1765             $value = '_no_value';
1766         }
1767         # Store the qualifier
1768         $out->field->{$qualifier} ||= [];
1769         push @{$out->field->{$qualifier}}, $value;
1770     }
1771     return $out;
1772 }
1773
1774 =head2 _write_line_GenBank
1775
1776  Title   : _write_line_GenBank
1777  Usage   :
1778  Function: internal function
1779  Example :
1780  Returns :
1781  Args    :
1782
1783 =cut
1784
1785 sub _write_line_GenBank {
1786     my ($self, $pre1, $pre2, $line, $length) = @_;
1787
1788     $length or $self->throw("Miscalled write_line_GenBank without length. Programming error!");
1789     my $subl  = $length - length $pre2;
1790     my $linel = length $line;
1791     my $i;
1792
1793     my $subr = substr($line,0,$length - length $pre1);
1794
1795     $self->_print("$pre1$subr\n");
1796     for($i = ($length - length $pre1); $i < $linel; $i += $subl) {
1797         $subr = substr($line, $i, $subl);
1798         $self->_print("$pre2$subr\n");
1799     }
1800 }
1801
1802 =head2 _write_line_GenBank_regex
1803
1804  Title   : _write_line_GenBank_regex
1805  Usage   :
1806  Function: internal function for writing lines of specified
1807            length, with different first and the next line
1808            left hand headers and split at specific points in the
1809            text
1810  Example :
1811  Returns : nothing
1812  Args    : file handle,
1813            first header,
1814            second header,
1815            text-line,
1816            regex for line breaks,
1817            total line length
1818
1819 =cut
1820
1821 sub _write_line_GenBank_regex {
1822     my ($self, $pre1, $pre2, $line, $regex, $length) = @_;
1823
1824     #print STDOUT "Going to print with $line!\n";
1825
1826     $length or $self->throw("Miscalled write_line_GenBank without length. Programming error!");
1827
1828     my $subl  = $length - (length $pre1) - 2;
1829     my @lines = ();
1830
1831   CHUNK:
1832     while ($line) {
1833         foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
1834             if ($line =~ m/^(.{0,$subl})($pat)(.*)/ ) {
1835                 my $l = $1 . $2;
1836                 $line = substr($line, length $l);
1837                 # be strict about not padding spaces according to
1838                 # genbank format
1839                 $l =~ s/\s+$//;
1840                 next CHUNK if ($l eq '');
1841                 push @lines, $l;
1842                 next CHUNK;
1843             }
1844         }
1845         # if we get here none of the patterns matched $subl or less chars
1846         $self->warn(  "trouble dissecting \"$line\"\n     into chunks "
1847                     . "of $subl chars or less - this tag won't print right");
1848         # insert a space char to prevent infinite loops
1849         $line = substr($line, 0, $subl) . " " . substr($line, $subl);
1850     }
1851     my $s = shift @lines;
1852     $self->_print("$pre1$s\n") if $s;
1853     foreach my $s ( @lines ) {
1854         $self->_print("$pre2$s\n");
1855     }
1856 }
1857
1858 =head2 _post_sort
1859
1860  Title   : _post_sort
1861  Usage   : $obj->_post_sort($newval)
1862  Function:
1863  Returns : value of _post_sort
1864  Args    : newvalue (optional)
1865
1866 =cut
1867
1868 sub _post_sort {
1869     my ($obj,$value) = @_;
1870     if ( defined $value) {
1871         $obj->{'_post_sort'} = $value;
1872     }
1873     return $obj->{'_post_sort'};
1874 }
1875
1876 =head2 _show_dna
1877
1878  Title   : _show_dna
1879  Usage   : $obj->_show_dna($newval)
1880  Function:
1881  Returns : value of _show_dna
1882  Args    : newvalue (optional)
1883
1884 =cut
1885
1886 sub _show_dna {
1887     my ($obj,$value) = @_;
1888     if ( defined $value) {
1889         $obj->{'_show_dna'} = $value;
1890     }
1891     return $obj->{'_show_dna'};
1892 }
1893
1894 =head2 _id_generation_func
1895
1896  Title   : _id_generation_func
1897  Usage   : $obj->_id_generation_func($newval)
1898  Function:
1899  Returns : value of _id_generation_func
1900  Args    : newvalue (optional)
1901
1902 =cut
1903
1904 sub _id_generation_func {
1905     my ($obj,$value) = @_;
1906     if ( defined $value ) {
1907         $obj->{'_id_generation_func'} = $value;
1908     }
1909     return $obj->{'_id_generation_func'};
1910 }
1911
1912 =head2 _ac_generation_func
1913
1914  Title   : _ac_generation_func
1915  Usage   : $obj->_ac_generation_func($newval)
1916  Function:
1917  Returns : value of _ac_generation_func
1918  Args    : newvalue (optional)
1919
1920 =cut
1921
1922 sub _ac_generation_func {
1923     my ($obj,$value) = @_;
1924     if ( defined $value ) {
1925         $obj->{'_ac_generation_func'} = $value;
1926     }
1927     return $obj->{'_ac_generation_func'};
1928 }
1929
1930 =head2 _sv_generation_func
1931
1932  Title   : _sv_generation_func
1933  Usage   : $obj->_sv_generation_func($newval)
1934  Function:
1935  Returns : value of _sv_generation_func
1936  Args    : newvalue (optional)
1937
1938 =cut
1939
1940 sub _sv_generation_func {
1941     my ($obj,$value) = @_;
1942     if ( defined $value ) {
1943         $obj->{'_sv_generation_func'} = $value;
1944     }
1945     return $obj->{'_sv_generation_func'};
1946 }
1947
1948 =head2 _kw_generation_func
1949
1950  Title   : _kw_generation_func
1951  Usage   : $obj->_kw_generation_func($newval)
1952  Function:
1953  Returns : value of _kw_generation_func
1954  Args    : newvalue (optional)
1955
1956 =cut
1957
1958 sub _kw_generation_func {
1959     my ($obj,$value) = @_;
1960     if ( defined $value ) {
1961         $obj->{'_kw_generation_func'} = $value;
1962     }
1963     return $obj->{'_kw_generation_func'};
1964 }
1965
1966 1;