scripts/biosql/load_ontology.pl

   1 #!/usr/local/bin/perl
   2 #
   3 # $Id$
   4 #
   5 # Cared for by Hilmar Lapp <hlapp at gmx.net>
   6 #
   7 # Copyright Hilmar Lapp
   8 #
   9 # You may distribute this module under the same terms as perl itself
  10
  11 #
  12 # (c) Hilmar Lapp, hlapp at gmx.net, 2003.
  13 # (c) GNF, Genomics Institute of the Novartis Research Foundation, 2003.
  14 #
  15 # You may distribute this module under the same terms as perl itself.
  16 # Refer to the Perl Artistic License (see the license accompanying this
  17 # software package, or see http://www.perl.com/language/misc/Artistic.html)
  18 # for the terms under which you may use, modify, and redistribute this module.
  19 #
  20 # THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
  21 # WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
  22 # MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
  23 #
  24
  25 =head1 NAME
  26
  27 load_ontology.pl
  28
  29 =head1 SYNOPSIS
  30
  31   # for loading the Gene Ontology:
  32   load_ontology.pl --host somewhere.edu --dbname biosql \
  33                    --namespace "Gene Ontology" --format goflat \
  34                    --fmtargs "-defs_file,GO.defs" \
  35                    function.ontology process.ontology component.ontology
  36   # in practice, you will want to use options for dealing with
  37   # obsolete terms; read the documentation of respective arguments
  38
  39   # for loading the SOFA part of the sequence ontology (currently
  40   # there is no term definition file for SOFA):
  41   load_ontology.pl --host somewhere.edu --dbname biosql \
  42                    --namespace "SOFA" --format soflat sofa.ontology
  43
  44 =head1 DESCRIPTION
  45
  46 This script loads a BioSQL database with an ontology. There are a number of
  47 options to do with where the BioSQL database is (ie, hostname,
  48 user for database, password, database name) followed by the database
  49 name you wish to load this into and then any number of files that make
  50 up the ontology. The files are assumed formatted identically with the
  51 format given in the --format flag.
  52
  53 There are more options than the ones shown above, see below. In
  54 particular, there is a variety of options to specify how you want to
  55 handle obsolete terms. If you try to load the Gene Ontology, you will
  56 want to check out those options. Also, you may want to consult a
  57 thread from the bioperl mailing list in this regard, see
  58 http://bioperl.org/pipermail/bioperl-l/2004-February/014846.html .
  59
  60 Also, consider using --safe always unless you do want the script to
  61 terminate at the first issue it encounters with loading.
  62
  63 =head1 ARGUMENTS
  64
  65 The arguments after the named options constitute the filelist. If
  66 there are no such files, input is read from stdin. Mandatory options
  67 are marked by (M). Default values for each parameter are shown in
  68 square brackets.  (Note that -bulk is no longer available):
  69
  70 =over 2
  71
  72 =item --host $URL
  73
  74 the host name or IP address incl. port [localhost]
  75
  76 =item --dbname $db_name
  77
  78 the name of the schema [biosql]
  79
  80 =item --dbuser $username
  81
  82 database username [root]
  83
  84 =item --dbpass $password
  85
  86 password [undef]
  87
  88 =item --driver $driver
  89
  90 the DBI driver name for the RDBMS e.g., mysql, Pg, or Oracle
  91
  92 =item --dsn dsn
  93
  94 Instead of providing the database connection and driver parameters
  95 individually, you may also specify the DBI-formatted DSN that is to be
  96 used verbatim for connecting to the database. Note that if you do give
  97 individual parameters in addition they will not supplant what is in
  98 the DSN string. Hence, the only database-related parameter that may be
  99 useful to specify in addition is --driver, as that is used also for
 100 selecting the driver-specific adaptors that generate SQL
 101 code. Usually, the driver will be parsed out from the DSN though and
 102 therefore will be set as well by setting the DSN.
 103
 104 Consult the POD of your DBI driver for how to properly format the DSN
 105 for it. A typical example is dbi:Pg:dbname=biosql;host=foo.bar.edu
 106 (for PostgreSQL). Note that the DSN will be specific to the driver
 107 being used.
 108
 109 =item --schema schemaname
 110
 111 The schema under which the BioSQL tables reside in the database. For
 112 Oracle and MySQL this is synonymous with the user, and won't have an
 113 effect. PostgreSQL since v7.4 supports schemas as the namespace for
 114 collections of tables within a database.
 115
 116 =item --initrc paramfile
 117
 118 Instead of, or in addition to, specifying every individual database
 119 connection parameter you may put them into a file that when read by
 120 perl evaluates to an array or hash reference. This option specifies
 121 the file to read; the special value DEFAULT (or no value) will use a
 122 file ./.bioperldb or $HOME/.bioperldb, whichever is found first in
 123 that order.
 124
 125 Constructing a file that evaluates to a hash reference is very
 126 simple. The first non-space character needs to be an open curly brace,
 127 and the last non-space character a closing curly brace. In between the
 128 curly braces, write option name enclosed by single quotes, followed by
 129 => (equal to or greater than), followed by the value in single
 130 quotes. Separate each such option/value pair by comma. Here is an
 131 example:
 132
 133 {
 134     '-dbname' => 'mybiosql', '-host' => 'foo.bar.edu', '-user' => 'cleo'
 135 }
 136
 137 Line breaks and white space don't matter (except if in the value
 138 itself). Also note that options only have a single dash as prefix, and
 139 they need to be those accepted by Bio::DB::BioDB->new()
 140 (L<Bio::DB::BioDB>) or Bio::DB::SimpleDBContext->new()
 141 (L<Bio::DB::SimpleDBContext>). Those sometimes differ slightly from the
 142 option names used by this script, e.g., --dbuser corresponds to -user.
 143
 144 Note also that using the above example, you can use it for --initrc
 145 and still connect as user caesar by also supplying --dbuser caesar on
 146 the command line. I.e., command line arguments override any parameters
 147 also found in the initrc file.
 148
 149 Finally, note that if using this option with default file name and the
 150 default file is not found at any of the default locations, the option
 151 will be ignored; it is not considered an error.
 152
 153 =item --namespace $namesp
 154
 155 The namespace (name of the ontology) under which the terms and
 156 relationships in the input files are to be created in the database
 157 [bioperl ontology]. Note that the namespace will be left untouched if the
 158 object(s) to be submitted has it set already.
 159
 160 Note that the DAG-edit flat file parser from more recent (1.2.2 and
 161 later) bioperl releases can auto-discover the ontology name.
 162
 163 =item --lookup
 164
 165 Flag to look-up by unique key first, converting the insert into an
 166 update if the object is found. This pertains to terms only, as there
 167 is nothing to update about relationships if they are found by unique
 168 key (the unique key comprises of all columns).
 169
 170 =item --noupdate
 171
 172 Don't update if object is found (with --lookup). Again, this only
 173 pertains to terms.
 174
 175 =item --remove
 176
 177 Flag to remove terms before actually adding them (this necessitates a
 178 prior lookup). Note that this is not relevant for relationships (if
 179 one is found by lookup, removing and re-adding has essentially the
 180 same result as leaving it untouched).
 181
 182 =item --noobsolete
 183
 184 Flag to exclude from upload terms marked as obsolete. Note that with
 185 this flag, any update, removal, or object merge that you specify using
 186 other parameters will not apply to obsolete terms. I.e., if you have
 187 terms existing in your database that are marked as obsolete in the
 188 input file, using this flag will prevent the existing terms from being
 189 updated to reflect the obsolete status. Therefore, this flag is best
 190 used when first loading an ontology. You may want to consider using
 191 --updobsolete instead.
 192
 193 Note that relationships found in the input file(s) that reference an
 194 obsolete term will be omitted from loading with this flag in effect.
 195
 196 =item --updobsolete
 197
 198 Flag to exclude from upload terms marked as obsolete unless they are
 199 already present in the database. If they are, they will be updated,
 200 and the --mergeobjs procedure will apply. If they are not, they will
 201 be treated as if --noobsolete had been specified. Note that
 202 relationships will not be updated for obsolete terms.
 203
 204 In contrast to --noobsolete, using this flag will increase the
 205 database operations mildly (because of the look-ups necessary to
 206 determine whether obsolete terms are present, and the subsequent
 207 update for those that are), but it will capture change of status for
 208 existing terms. At the same time, you won't load obsolete terms from a
 209 new ontology that you haven't loaded before.
 210
 211 =item --delobsolete
 212
 213 Delete terms marked as obsolete from the database. Note that --remove
 214 together with --noobsolete will have the same effect. Note also that
 215 specifying this flag will not affect those terms that are only in your
 216 database but not in the input file, regardless of whether they are
 217 marked as obsolete or not.
 218
 219 Be aware that even though deleting obsolete terms may sound like a
 220 very sane thing to do, you may have annotated features or bioentries using
 221 those terms. Deleting the obsolete terms will then remove those
 222 annotations (qualifier/value pairs) as well.
 223
 224 =item --safe
 225
 226 flag to continue despite errors when loading (the entire object
 227 transaction will still be rolled back)
 228
 229 =item --testonly
 230
 231 don't commit anything, rollback at the end
 232
 233 =item --format
 234
 235 This may theoretically be any OntologyIO format understood by
 236 bioperl. All input files must have the same format.
 237
 238 Examples:
 239     # this is the default
 240     --format goflat
 241     # Simple ASCII hierarchy (e.g., eVoc)
 242     --format simplehierarchy
 243
 244 Note that some formats may come with event-type parsers, specifically
 245 with XML SAX event parsers. While those aren't truly
 246 OntologyIO-compliant parsers (they can't be because OntologyIO defines
 247 a stream of ontologies as the API), this script supports them
 248 nevertheless. For instance, at the time of this writing there is an
 249 InterPro XML SAX event handler (aliased to --format interprosax) which
 250 will persist terms to the database as they are encountered in the
 251 event stream, which greatly reduces the amount of memory
 252 needed. Credit for conceiving this idea and writing the SAX handler
 253 goes to Juguang Xiao, juguang at tll.org.sg.
 254
 255 =item --fmtargs
 256
 257 Use this argument to specify initialization parameters for the parser
 258 for the input format. The argument value is expected to be a string
 259 with parameter names and values delimited by comma.
 260
 261 Usually you will want to protect the argument list from interpretation
 262 by the shell, so surround it with double or single quotes.
 263
 264 If a parameter value contains a comma, escape it with a backslash
 265 (which means you also must protect the whole argument from the shell
 266 in order to preserve the backslash)
 267
 268 Examples:
 269
 270     # turn parser exceptions into warnings (don't try this at home)
 271     --fmtargs "-verbose,-1"
 272     # verbose parser with an additional path argument
 273     --fmtargs "-verbose,1,-indexpath,/home/luke/warp"
 274     # escape commas in values
 275     --fmtargs "-ontology_name,Big Blue\, v2,-indent_string,\,"
 276
 277 =item --mergeobjs
 278
 279 This is a string or a file defining a closure. If provided, the
 280 closure is called if a look-up for the unique key of the new object
 281 was successful (hence, it will never be called without supplying
 282 --lookup, but not --noupdate, at the same time).
 283
 284 The closure will be passed three (3) arguments: the object found by
 285 lookup, the new object to be submitted, and the Bio::DB::DBAdaptorI
 286 (see L<Bio::DB::DBAdaptorI>) implementing object for the desired
 287 database. If the closure returns a value, it must be the object to be
 288 inserted or updated in the database (if $obj->primary_key returns a
 289 value, the object will be updated). If it returns undef, the script
 290 will skip to the next object in the input stream.
 291
 292 The purpose of the closure can be manifold. It was originally
 293 conceived as a means to customarily merge attributes or associated
 294 objects of the new object to the existing (found) one in order to
 295 avoid duplications but still capture additional information (e.g.,
 296 annotation). However, there is a multitude of other operations it can
 297 be used for, like physically deleting or altering certain associated
 298 information from the database (the found object and all its associated
 299 objects will implement Bio::DB::PersistentObjectI, see
 300 L<Bio::DB::PersistentObjectI>). Since the third argument is the
 301 persistent object and adaptor factory for the database, there is
 302 literally no limit as to the database operations the closure could
 303 possibly do.
 304
 305 =item --computetc "[identity];[base predicate];[subclasses];[ontology]"
 306
 307 Recompute the transitive closure table for the ontology after it has
 308 been loaded. A possibly existing transitive closure will be deleted
 309 first.
 310
 311 The argument specifies three terms the algorithm relies on, and their
 312 ontology, each separated by semicolon. Each of the three terms may be
 313 omitted, but the semicolons need to be present. Alternatively, you may
 314 omit the argument altogether in which case it will assume a sensible
 315 default value ("identity;related-to;implies;Predicate Ontology"). See
 316 below for what this means.
 317
 318 Every predicate in the ontology for which the transitive closure is to
 319 be computed is expected to have a relationship to itself. This
 320 relationship is commonly referred to as the identity relationship. The
 321 first term specifies the predicate name for this relationship, e.g.,
 322 'identity'. The second and third term pertain to ontologies that have
 323 valid paths with mixed predicates. If this occurs, the second term
 324 denotes the base predicate for any combination of two different
 325 predicates, and the third predicate denotes the predicate for the
 326 relationship between any predicate and the base predicate, where the
 327 base predicate is the object and the ontology's predicate is the
 328 subject. For instance, one might want to provide 'related-to' as the
 329 base predicate, and 'implies' as the predicate of the subclassing
 330 relationship, which would give rise to triples like
 331 (is-a,implies,related-to), (part-of,implies,related-to), etc. The
 332 string following the last semicolon denotes the name of the ontology
 333 under which to store those triples as well as the identity, base
 334 predicate, and subclasses predicate terms.
 335
 336 If any of the terms are omitted (provided as empty strings), the
 337 corresponding relationships will not be generated. Note that the
 338 computed transitive closure may then be incomplete.
 339
 340 =item more args
 341
 342 The remaining arguments will be treated as files to parse and load. If
 343 there are no additional arguments, input is expected to come from
 344 standard input.
 345
 346 =back
 347
 348 =head1 Authors
 349
 350 Hilmar Lapp E<lt>hlapp at gmx.netE<gt>
 351
 352 =cut
 353
 354
 355 use Getopt::Long;
 356 use Symbol;
 357 use Carp (qw:cluck confess croak:);
 358 use Bio::DB::BioDB;
 359 use Bio::OntologyIO;
 360 use Bio::Root::RootI;
 361
 362 ####################################################################
 363 # Defaults for options changeable through command line
 364 ####################################################################
 365 my $host; # should make the driver to default to localhost
 366 my $dbname;
 367 my $dbuser;
 368 my $driver;
 369 my $dbpass;
 370 my $schema;
 371 my $format = 'goflat';
 372 my $fmtargs = '';
 373 my $namespace = "bioperl ontology";
 374 my $initrc;              # use an initialization file for parameters?
 375 my $dsn;                 # DSN to use verbatim for connecting, if any
 376 my $mergefunc;           # if and how to merge old (found) and new objects
 377 # flags
 378 my $remove_flag = 0;     # remove object before creating?
 379 my $lookup_flag = 0;     # look up object before creating, update if found?
 380 my $no_update_flag = 0;  # do not update if found on look up?
 381 my $no_obsolete = 0;     # whether to include obsolete terms
 382 my $upd_obsolete = 0;    # whether to include obsolete terms
 383 my $del_obsolete = 0;    # whether to delete obsolete terms
 384 my $compute_tc;          # compute the transitive closure?
 385 my $help = 0;            # WTH?
 386 my $debug = 0;           # try it ...
 387 my $testonly_flag = 0;   # don't commit anything, rollback at the end?
 388 my $safe_flag = 0;       # tolerate exceptions on create?
 389 my $printerror = 0;      # whether to print DBI error messages
 390 my $computetc_default = "identity;related-to;implies;Predicate Ontology";
 391 ####################################################################
 392 # Global defaults or definitions not changeable through commandline
 393 ####################################################################
 394
 395 #
 396 # map of I/O type to the next_XXXX method name
 397 #
 398 # Right now there is only a single IO subsystem we support here, so we
 399 # could do well without. We leave it in here to easily be able to adapt
 400 # in the future should it become necessary.
 401 #
 402 my %nextobj_map = (
 403                    'Bio::OntologyIO' => 'next_ontology',
 404                    );
 405
 406 ####################################################################
 407 # End of defaults
 408 ####################################################################
 409
 410 #
 411 # get options from commandline
 412 #
 413 my $ok = GetOptions( 'host:s'      => \$host,
 414                      'driver:s'    => \$driver,
 415                      'dbname:s'    => \$dbname,
 416                      'dbuser:s'    => \$dbuser,
 417                      'dbpass:s'    => \$dbpass,
 418                      'dsn=s'       => \$dsn,
 419                      'schema=s'    => \$schema,
 420                      'format:s'    => \$format,
 421                      'fmtargs=s'   => \$fmtargs,
 422                      'initrc:s'    => \$initrc,
 423                      'namespace:s' => \$namespace,
 424                      'mergeobjs:s' => \$mergefunc,
 425                      'safe'        => \$safe_flag,
 426                      'remove'      => \$remove_flag,
 427                      'lookup'      => \$lookup_flag,
 428                      'noupdate'    => \$no_update_flag,
 429                      'noobsolete'  => \$no_obsolete,
 430                      'updobsolete' => \$upd_obsolete,
 431                      'delobsolete' => \$del_obsolete,
 432                      'computetc:s' => \$compute_tc,
 433                      'debug'       => \$debug,
 434                      'testonly'    => \$testonly_flag,
 435                      'printerror'  => \$printerror,
 436                      'h|help'      => \$help
 437                      );
 438
 439 if((! $ok) || $help) {
 440     if(! $ok) {
 441         print STDERR "missing or unsupported option(s) on commandline\n";
 442     }
 443     system("perldoc $0");
 444     exit($ok ? 0 : 2);
 445 }
 446
 447 #
 448 # determine the function for re-throwing exceptions depending on $debug and
 449 # $safe_flag
 450 #
 451 our $throw = $safe_flag ?
 452     ($debug > 0 ? \&Carp::cluck : \&Carp::carp) :
 453     ($debug > 0 ? \&Carp::confess : \&Carp::croak);
 454
 455 #
 456 # check $computetc whether it needs to assume the default value
 457 #
 458 $compute_tc = $computetc_default
 459     unless $compute_tc || (!defined($compute_tc));
 460
 461 #
 462 # load and/or parse object merge function if supplied
 463 #
 464 my $merge_objs = parse_code($mergefunc) if $mergefunc;
 465
 466 #
 467 # determine input source(s)
 468 #
 469 my @files = @ARGV ? @ARGV : (\*STDIN);
 470
 471 #
 472 # determine input format and type. Having copy-and-pasted it from
 473 # load_seqdatabase.pl, we support more sophistication than we currently
 474 # need or disclose.
 475 #
 476 my $objio;
 477 my @fmtelems = split(/::/, $format);
 478 if(@fmtelems > 1) {
 479     $format = pop(@fmtelems);
 480     $objio = join('::', @fmtelems);
 481 } else {
 482     # default is OntologyIO
 483     $objio = "OntologyIO";
 484 }
 485 $objio = "Bio::".$objio if $objio !~ /^Bio::/;
 486 my $nextobj = $nextobj_map{$objio}||"next_ontology";
 487
 488 # the format might come with argument specifications
 489 my @fmtargs = split(/,/,$fmtargs,-1);
 490 # arguments might have had commas in them - we require them to be
 491 # escaped by backslash and need to stitch them back together now
 492 my $i = 0;
 493 while($i+1 < @fmtargs) {
 494     if($fmtargs[$i] =~ s/\\$//) {
 495         splice(@fmtargs, $i, 2, $fmtargs[$i].",".$fmtargs[$i+1]);
 496     } else {
 497         $i++;
 498     }
 499 }
 500
 501 #
 502 # check whether we need to apply defaults
 503 #
 504 $initrc = "DEFAULT" unless $initrc || !defined($initrc);
 505
 506 #
 507 # create the DBAdaptorI for our database
 508 #
 509 my $db = Bio::DB::BioDB->new(-database   => "biosql",
 510                              -printerror => $printerror,
 511                              -host       => $host,
 512                              -dbname     => $dbname,
 513                              -driver     => $driver,
 514                              -user       => $dbuser,
 515                              -pass       => $dbpass,
 516                              -dsn        => $dsn,
 517                              -schema     => $schema,
 518                              -initrc     => $initrc,
 519                              );
 520 $db->verbose($debug) if $debug > 0;
 521
 522 #
 523 # Open the ontology parser on all files supplied. Unlike other IO parsers,
 524 # ontologies may easily involve more than 1 input file to extract the
 525 # entire ontology.
 526 #
 527
 528 # open depending on whether it's a stream or a bunch of files
 529 my $ontin;
 530 my @parserargs = $format ? (-format => $format) : ();
 531 push(@parserargs, @fmtargs);
 532
 533 if(@files == 1) {
 534     my $prmname = ref($files[0]) ? "-fh" : "-file";
 535     $ontin = $objio->new($prmname, $files[0], @parserargs);
 536 } else {
 537     $ontin = $objio->new(-files => \@files, @parserargs);
 538 }
 539
 540 # set up the array of constant arguments to pass to the persistence handler
 541 my @persist_args = ('-db'          => $db,
 542                     '-termfactory' => $ontin->term_factory,
 543                     '-throw'       => $throw,
 544                     '-mergeobs'    => $merge_objs,
 545                     '-lookup'      => $lookup_flag,
 546                     '-remove'      => $remove_flag,
 547                     '-noupdate'    => $no_update_flag,
 548                     '-noobsolete'  => $no_obsolete,
 549                     '-delobsolete' => $del_obsolete,
 550                     '-updobsolete' => $upd_obsolete,
 551                     '-testonly'    => $testonly_flag,
 552                     );
 553
 554
 555 # The input parser may in fact be a SAX event handler, not a truly
 556 # OntologyIO-compliant parser. A SAX handler needs to be treated
 557 # fundamentally different from this point on than an OntologyIO
 558 # compliant parser. While the former is to be handed off to a XML SAX
 559 # parser, the latter needs to be looped over the ontologies it
 560 # returns.
 561
 562 if ($ontin->isa("Bio::OntologyIO::Handlers::BaseSAXHandler")) {
 563     # this is a SAX event handler, not a true OntologyIO parser
 564
 565     # pull in the XML SAX parser
 566     eval {
 567         require XML::Parser::PerlSAX;
 568     };
 569     croak "failed to load required XML SAX parser:\n$@" if $@;
 570
 571     # complete setup of the SAX event handler: pass in our persistence handlers
 572     $ontin->persist_term_handler(\&persist_term, @persist_args);
 573     $ontin->persist_relationship_handler(\&persist_relationship,@persist_args);
 574     $ontin->db($db);
 575
 576     # make sure the (default) ontology has a name
 577     my $ont = $ontin->_ontology();
 578     $ont->name($namespace) unless $ont->name;
 579
 580     # instantiate the XML SAX parser and pass it the event handler
 581     my $parser = XML::Parser::PerlSAX->new(Handler => $ontin);
 582
 583     # parsing the file will persist all terms and relationships, so we need
 584     # to delete the relationships first to avoid having stale ones around
 585     print STDERR "\t...deleting all relationships for ",$ont->name,"\n";
 586     remove_all_relationships('-ontology' => $ont, @persist_args);
 587
 588     # now go ahead and parse the file
 589     print STDERR "\t...parsing and loading ",$ont->name,"\n";
 590     $parser->parse(Source => {SystemId => $files[0]});
 591
 592     # Generate the transitive closure if requested
 593     if($compute_tc) {
 594         print STDERR "\t... transitive closure\n";
 595         compute_tc($db, $ont, $ontin->term_factory(), $compute_tc);
 596     }
 597
 598     print STDERR "\tDone with ",$ont->name,"\n";
 599
 600 } else {
 601     # this is a truly OntologyIO compliant parser, or so I hope
 602
 603     # loop over the input stream(s)
 604     while( my $ont = $ontin->$nextobj ) {
 605         # don't forget to add namespace if the parser doesn't supply one
 606         $ont->name($namespace) unless $ont->name();
 607
 608         print STDERR "Loading ontology ",$ont->name(),":\n\t... terms\n";
 609
 610         # in order to allow callbacks to the user and generally a
 611         # better ability to interfere with and customize the upload
 612         # process, we load all terms first here instead of simply
 613         # going for the relationships
 614
 615         foreach my $term ($ont->get_all_terms()) {
 616             # call the persistence handler - there is only one right now
 617             persist_term('-term' => $term, @persist_args);
 618         }
 619
 620         # after all terms have been processed, we run through the relationships
 621         # more or less non-interactively (i.e., without invoking a callback)
 622
 623         print STDERR "\t... relationships\n";
 624
 625         # first off, we need to delete the existing relationships in order
 626         # to avoid having stale ones around
 627         remove_all_relationships('-ontology' => $ont, @persist_args);
 628
 629         # now go and insert all of them
 630         foreach my $rel ($ont->get_relationships()) {
 631             # pass on to persistence function - there's only one right now
 632             persist_relationship('-rel' => $rel, @persist_args);
 633         }
 634
 635         # Generate the transitive closure if requested
 636         if($compute_tc) {
 637             print STDERR "\t... transitive closure\n";
 638             compute_tc($db, $ont, $ontin->term_factory(), $compute_tc);
 639         }
 640
 641         print STDERR "\tDone with ".$ont->name.".\n";
 642     }
 643
 644     # close the parser explicitly in case it needs this to be called
 645     $ontin->close();
 646 }
 647
 648 print STDERR "Done, cleaning up.\n";
 649
 650 if ($db && $testonly_flag) {
 651     $db->get_object_adaptor("Bio::Ontology::TermI")->rollback();
 652 }
 653 # done!
 654
 655 #################################################################
 656 # Implementation of functions                                   #
 657 #################################################################
 658
 659 sub parse_code{
 660     my $src = shift;
 661     my $code;
 662
 663     # file or subroutine?
 664     if(-r $src) {
 665         if(! (($code = do $src) && (ref($code) eq "CODE"))) {
 666             die "error in parsing code block $src: $@" if $@;
 667             die "unable to read file $src: $!" if $!;
 668             die "failed to run $src, or it failed to return a closure";
 669         }
 670     } else {
 671         $code = eval $src;
 672         die "error in parsing code block \"$src\": $@" if $@;
 673         die "\"$src\" fails to return a closure"
 674             unless ref($code) eq "CODE";
 675     }
 676     return $code;
 677 }
 678
 679 sub compute_tc{
 680     my ($db, $ont, $termfact, $params) = @_;
 681
 682     # split the parameter string into term names and ontology name
 683     my ($idpred,$basepred,$subclpred,$predont) = split(/;/,$params);
 684     # setup the predicate ontology
 685     $predont = Bio::Ontology::Ontology->new(-name => $predont);
 686     # setup the terms from their names and ontology
 687     if($idpred) {
 688         $idpred = $termfact->create_object(-name => $idpred,
 689                                            -ontology => $predont);
 690     }
 691     if($basepred) {
 692         $basepred = $termfact->create_object(-name => $basepred,
 693                                              -ontology => $predont);
 694     }
 695     if($subclpred) {
 696         $subclpred = $termfact->create_object(-name => $subclpred,
 697                                               -ontology => $predont);
 698     }
 699     # we need the ontology object adaptor
 700     my $ontadp = $db->get_object_adaptor($predont);
 701
 702     # done with setup, go for it
 703     eval {
 704         $ontadp->compute_transitive_closure($ont,
 705                                             -truncate => 1,
 706                                             -predicate_superclass => $basepred,
 707                                             -subclass_predicate => $subclpred,
 708                                             -identity_predicate => $idpred);
 709         $ontadp->commit();
 710     };
 711     if($@) {
 712         my $msg = "transitive closure generation failed for ".$ont->name.
 713             ":\n$@";
 714         $ontadp->rollback();
 715         &$throw($msg);
 716     }
 717 }
 718
 719 =head2 persist_term
 720
 721  Title   : persist_term
 722  Usage   :
 723  Function: Persist an ontology term to the database. This function may
 724            also be used as the persistence handler for event handlers,
 725            e.g., an XML event stream handler.
 726
 727            This method requires many options and accepts even
 728            more. See below.
 729
 730  Example :
 731  Returns :
 732  Args    : Named parameters. Currently the following parameters are
 733            recognized. Mandatory parameters are marked by an M in
 734            parentheses. Flags by definition are not mandatory; their
 735            default value will be false.
 736
 737              -term        the ontology term object to persist (M)
 738              -db          the adaptor factory returned by Bio::DB::BioDB (M)
 739              -termfactory the factory for creating terms (M)
 740              -throw       the error notification method to use
 741              -mergeobs    the closure for merging old and new term
 742              -lookup      whether to lookup terms first
 743              -remove      whether to delete existing term first
 744              -noobsolete  whether to completely ignore obsolete terms
 745              -delobsolete whether to delete existing obsolete terms
 746              -updobsolete whether to update existing obsolete terms
 747              -testonly    whether to not commit the term upon success
 748
 749
 750 =cut
 751
 752 sub persist_term {
 753     my ($term, $db, $termfactory, $throw,
 754         $merge_objs, $lookup_flag, $remove_flag, $no_update_flag,
 755         $no_obsolete, $del_obsolete, $upd_obsolete,
 756         $testonly_flag) =
 757           Bio::Root::RootI->_rearrange([qw(TERM
 758                                            DB
 759                                            TERMFACTORY
 760                                            THROW
 761                                            MERGEOBJS
 762                                            LOOKUP
 763                                            REMOVE
 764                                            NOUPDATE
 765                                            NOOBSOLETE
 766                                            DELOBSOLETE
 767                                            UPDOBSOLETE
 768                                            TESTONLY)],
 769                                        @_);
 770     # if the term is obsolete and we don't want to look at obsolete
 771     # terms, skip to the next one right away
 772     return if $no_obsolete && $term->is_obsolete();
 773     # look up or delete first? this may pertain only to obsolete terms.
 774     my ($pterm, $lterm, $adp);
 775     if($lookup_flag || $remove_flag ||
 776        (($del_obsolete || $upd_obsolete) && $term->is_obsolete())) {
 777         # look up
 778         $adp = $db->get_object_adaptor($term);
 779         $lterm = $adp->find_by_unique_key($term,
 780                                           -obj_factory => $termfactory);
 781             # found?
 782         if($lterm) {
 783             # merge old and new if a function for this is provided
 784             $term = &$merge_objs($lterm, $term, $db) if $merge_objs;
 785             # the return value may indicate to skip to the next
 786             return unless $term;
 787         } elsif(($del_obsolete || $upd_obsolete) && $term->is_obsolete()) {
 788             # don't store obsolete terms if we're supposed to only update
 789             # or delete them
 790             return;
 791         }
 792     }
 793     # try to serialize
 794     eval {
 795         $adp = $lterm->adaptor() if $lterm;
 796         # delete if requested
 797         if($lterm &&
 798            ($remove_flag || ($del_obsolete && $term->is_obsolete()))) {
 799             $lterm->remove();
 800         }
 801         # on update, skip the rest if we are not supposed to update,
 802         # and proceed with insert or update otherwise
 803         if(! ($lterm && $no_update_flag)) {
 804             # create a persistent object out of the term
 805             $pterm = $db->create_persistent($term);
 806             $adp = $pterm->adaptor();
 807             # store the primary key of what we found by lookup (this
 808             # is going to be an udate then)
 809             if($lterm && $lterm->primary_key) {
 810                 $pterm->primary_key($lterm->primary_key);
 811             }
 812             $pterm->store();
 813         }
 814         $adp->commit() unless $testonly_flag;
 815     };
 816     if ($@) {
 817         my $msg = "Could not store term ";
 818         if (defined($term->object_id())) {
 819             $msg .= $term->object_id().", name ";
 820         }
 821         $msg .= "'".$term->name()."':\n$@\n";
 822         $adp->rollback();
 823         $throw = \&Carp::croak unless $throw;
 824         &$throw($msg);
 825     }
 826 }
 827
 828 =head2 remove_all_relationships
 829
 830  Title   : remove_all_relationships
 831  Usage   :
 832  Function: Removes all relationships of an ontology from the
 833            database. This is a necessary step before inserting the
 834            latest ones in order to avoid stale relationships staying
 835            in the database.
 836
 837            See below for the parameters that this method accepts
 838            and/or requires.
 839
 840  Example :
 841  Returns :
 842  Args    : Named parameters. Currently the following parameters are
 843            recognized. Mandatory parameters are marked by an M in
 844            parentheses. Flags by definition are not mandatory; their
 845            default value will be false.
 846
 847              -ontology    the ontology for which to remove relationships (M)
 848              -db          the adaptor factory returned by Bio::DB::BioDB (M)
 849              -throw       the error notification method to use
 850              -testonly    whether to not commit the term upon success
 851
 852
 853 =cut
 854
 855 sub remove_all_relationships {
 856     my ($ont, $db, $throw, $testonly_flag) =
 857           Bio::Root::RootI->_rearrange([qw(ONTOLOGY
 858                                            DB
 859                                            THROW
 860                                            TESTONLY)],
 861                                        @_);
 862
 863     my $reladp = $db->get_object_adaptor("Bio::Ontology::RelationshipI");
 864     eval {
 865         $reladp->remove_all_relationships($ont);
 866         $reladp->commit() unless $testonly_flag;
 867     };
 868     if ($@) {
 869         $reladp->rollback();
 870         $throw = \&Carp::croak;
 871         &$throw("failed to remove relationships prior to inserting them: $@");
 872     }
 873 }
 874
 875 =head2 persist_relationship
 876
 877  Title   : persist_relationship
 878  Usage   :
 879  Function: Persist a term relationship to the database. This function
 880            may also be used as the persistence handler for event
 881            handlers, e.g., an XML event stream handler.
 882
 883            See below for the required and recognized parameters.
 884
 885  Example :
 886  Returns :
 887  Args    : Named parameters. Currently the following parameters are
 888            recognized. Mandatory parameters are marked by an M in
 889            parentheses. Flags by definition are not mandatory; their
 890            default value will be false.
 891
 892              -rel         the term relationship object to persist (M)
 893              -db          the adaptor factory returned by Bio::DB::BioDB (M)
 894              -throw       the error notification method to use
 895              -noobsolete  whether to completely ignore obsolete terms
 896              -delobsolete whether to delete existing obsolete terms
 897              -testonly    whether to not commit the term upon success
 898
 899
 900 =cut
 901
 902 sub persist_relationship {
 903     my ($rel, $db, $throw, $no_obsolete, $del_obsolete, $testonly_flag) =
 904           Bio::Root::RootI->_rearrange([qw(REL
 905                                            DB
 906                                            THROW
 907                                            NOOBSOLETE
 908                                            DELOBSOLETE
 909                                            TESTONLY)],
 910                                        @_);
 911     # don't bother with relationships that reference an obsolete term
 912     # if we don't load obsolete terms
 913     if($del_obsolete || $no_obsolete) {
 914         return if ($rel->subject_term->is_obsolete() ||
 915                    $rel->object_term->is_obsolete() ||
 916                    $rel->predicate_term->is_obsolete());
 917     }
 918     my $prel = $db->create_persistent($rel);
 919     eval {
 920         $prel->create();
 921         $prel->commit() unless $testonly_flag;
 922     };
 923     if ($@) {
 924         my $msg = "Could not store term relationship (".
 925             join(",",
 926                  $rel->subject_term->name(),
 927                  $rel->predicate_term->name(),
 928                  $rel->object_term->name()).
 929                  "):\n$@\n";
 930         $prel->rollback();
 931         $throw = \&Carp::croak unless $throw;
 932         &$throw($msg);
 933     }
 934 }