squash perl 5.12 warning
[bioperl-db.git] / scripts / biosql / load_ontology.pl
blobdf3aee2944454d13d16535b93578de41d5763836
1 #!/usr/local/bin/perl
3 # $Id$
5 # Cared for by Hilmar Lapp <hlapp at gmx.net>
7 # Copyright Hilmar Lapp
9 # You may distribute this module under the same terms as perl itself
12 # (c) Hilmar Lapp, hlapp at gmx.net, 2003.
13 # (c) GNF, Genomics Institute of the Novartis Research Foundation, 2003.
15 # You may distribute this module under the same terms as perl itself.
16 # Refer to the Perl Artistic License (see the license accompanying this
17 # software package, or see http://www.perl.com/language/misc/Artistic.html)
18 # for the terms under which you may use, modify, and redistribute this module.
20 # THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
21 # WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
22 # MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
25 =head1 NAME
27 load_ontology.pl
29 =head1 SYNOPSIS
31 # for loading the Gene Ontology:
32 load_ontology.pl --host somewhere.edu --dbname biosql \
33 --namespace "Gene Ontology" --format goflat \
34 --fmtargs "-defs_file,GO.defs" \
35 function.ontology process.ontology component.ontology
36 # in practice, you will want to use options for dealing with
37 # obsolete terms; read the documentation of respective arguments
39 # for loading the SOFA part of the sequence ontology (currently
40 # there is no term definition file for SOFA):
41 load_ontology.pl --host somewhere.edu --dbname biosql \
42 --namespace "SOFA" --format soflat sofa.ontology
44 =head1 DESCRIPTION
46 This script loads a BioSQL database with an ontology. There are a number of
47 options to do with where the BioSQL database is (ie, hostname,
48 user for database, password, database name) followed by the database
49 name you wish to load this into and then any number of files that make
50 up the ontology. The files are assumed formatted identically with the
51 format given in the --format flag.
53 There are more options than the ones shown above, see below. In
54 particular, there is a variety of options to specify how you want to
55 handle obsolete terms. If you try to load the Gene Ontology, you will
56 want to check out those options. Also, you may want to consult a
57 thread from the bioperl mailing list in this regard, see
58 http://bioperl.org/pipermail/bioperl-l/2004-February/014846.html .
60 Also, consider using --safe always unless you do want the script to
61 terminate at the first issue it encounters with loading.
63 =head1 ARGUMENTS
65 The arguments after the named options constitute the filelist. If
66 there are no such files, input is read from stdin. Mandatory options
67 are marked by (M). Default values for each parameter are shown in
68 square brackets. (Note that -bulk is no longer available):
70 =over 2
72 =item --host $URL
74 the host name or IP address incl. port [localhost]
76 =item --dbname $db_name
78 the name of the schema [biosql]
80 =item --dbuser $username
82 database username [root]
84 =item --dbpass $password
86 password [undef]
88 =item --driver $driver
90 the DBI driver name for the RDBMS e.g., mysql, Pg, or Oracle
92 =item --dsn dsn
94 Instead of providing the database connection and driver parameters
95 individually, you may also specify the DBI-formatted DSN that is to be
96 used verbatim for connecting to the database. Note that if you do give
97 individual parameters in addition they will not supplant what is in
98 the DSN string. Hence, the only database-related parameter that may be
99 useful to specify in addition is --driver, as that is used also for
100 selecting the driver-specific adaptors that generate SQL
101 code. Usually, the driver will be parsed out from the DSN though and
102 therefore will be set as well by setting the DSN.
104 Consult the POD of your DBI driver for how to properly format the DSN
105 for it. A typical example is dbi:Pg:dbname=biosql;host=foo.bar.edu
106 (for PostgreSQL). Note that the DSN will be specific to the driver
107 being used.
109 =item --schema schemaname
111 The schema under which the BioSQL tables reside in the database. For
112 Oracle and MySQL this is synonymous with the user, and won't have an
113 effect. PostgreSQL since v7.4 supports schemas as the namespace for
114 collections of tables within a database.
116 =item --initrc paramfile
118 Instead of, or in addition to, specifying every individual database
119 connection parameter you may put them into a file that when read by
120 perl evaluates to an array or hash reference. This option specifies
121 the file to read; the special value DEFAULT (or no value) will use a
122 file ./.bioperldb or $HOME/.bioperldb, whichever is found first in
123 that order.
125 Constructing a file that evaluates to a hash reference is very
126 simple. The first non-space character needs to be an open curly brace,
127 and the last non-space character a closing curly brace. In between the
128 curly braces, write option name enclosed by single quotes, followed by
129 => (equal to or greater than), followed by the value in single
130 quotes. Separate each such option/value pair by comma. Here is an
131 example:
134 '-dbname' => 'mybiosql', '-host' => 'foo.bar.edu', '-user' => 'cleo'
137 Line breaks and white space don't matter (except if in the value
138 itself). Also note that options only have a single dash as prefix, and
139 they need to be those accepted by Bio::DB::BioDB->new()
140 (L<Bio::DB::BioDB>) or Bio::DB::SimpleDBContext->new()
141 (L<Bio::DB::SimpleDBContext>). Those sometimes differ slightly from the
142 option names used by this script, e.g., --dbuser corresponds to -user.
144 Note also that using the above example, you can use it for --initrc
145 and still connect as user caesar by also supplying --dbuser caesar on
146 the command line. I.e., command line arguments override any parameters
147 also found in the initrc file.
149 Finally, note that if using this option with default file name and the
150 default file is not found at any of the default locations, the option
151 will be ignored; it is not considered an error.
153 =item --namespace $namesp
155 The namespace (name of the ontology) under which the terms and
156 relationships in the input files are to be created in the database
157 [bioperl ontology]. Note that the namespace will be left untouched if the
158 object(s) to be submitted has it set already.
160 Note that the DAG-edit flat file parser from more recent (1.2.2 and
161 later) bioperl releases can auto-discover the ontology name.
163 =item --lookup
165 Flag to look-up by unique key first, converting the insert into an
166 update if the object is found. This pertains to terms only, as there
167 is nothing to update about relationships if they are found by unique
168 key (the unique key comprises of all columns).
170 =item --noupdate
172 Don't update if object is found (with --lookup). Again, this only
173 pertains to terms.
175 =item --remove
177 Flag to remove terms before actually adding them (this necessitates a
178 prior lookup). Note that this is not relevant for relationships (if
179 one is found by lookup, removing and re-adding has essentially the
180 same result as leaving it untouched).
182 =item --noobsolete
184 Flag to exclude from upload terms marked as obsolete. Note that with
185 this flag, any update, removal, or object merge that you specify using
186 other parameters will not apply to obsolete terms. I.e., if you have
187 terms existing in your database that are marked as obsolete in the
188 input file, using this flag will prevent the existing terms from being
189 updated to reflect the obsolete status. Therefore, this flag is best
190 used when first loading an ontology. You may want to consider using
191 --updobsolete instead.
193 Note that relationships found in the input file(s) that reference an
194 obsolete term will be omitted from loading with this flag in effect.
196 =item --updobsolete
198 Flag to exclude from upload terms marked as obsolete unless they are
199 already present in the database. If they are, they will be updated,
200 and the --mergeobjs procedure will apply. If they are not, they will
201 be treated as if --noobsolete had been specified. Note that
202 relationships will not be updated for obsolete terms.
204 In contrast to --noobsolete, using this flag will increase the
205 database operations mildly (because of the look-ups necessary to
206 determine whether obsolete terms are present, and the subsequent
207 update for those that are), but it will capture change of status for
208 existing terms. At the same time, you won't load obsolete terms from a
209 new ontology that you haven't loaded before.
211 =item --delobsolete
213 Delete terms marked as obsolete from the database. Note that --remove
214 together with --noobsolete will have the same effect. Note also that
215 specifying this flag will not affect those terms that are only in your
216 database but not in the input file, regardless of whether they are
217 marked as obsolete or not.
219 Be aware that even though deleting obsolete terms may sound like a
220 very sane thing to do, you may have annotated features or bioentries using
221 those terms. Deleting the obsolete terms will then remove those
222 annotations (qualifier/value pairs) as well.
224 =item --safe
226 flag to continue despite errors when loading (the entire object
227 transaction will still be rolled back)
229 =item --testonly
231 don't commit anything, rollback at the end
233 =item --format
235 This may theoretically be any OntologyIO format understood by
236 bioperl. All input files must have the same format.
238 Examples:
239 # this is the default
240 --format goflat
241 # Simple ASCII hierarchy (e.g., eVoc)
242 --format simplehierarchy
244 Note that some formats may come with event-type parsers, specifically
245 with XML SAX event parsers. While those aren't truly
246 OntologyIO-compliant parsers (they can't be because OntologyIO defines
247 a stream of ontologies as the API), this script supports them
248 nevertheless. For instance, at the time of this writing there is an
249 InterPro XML SAX event handler (aliased to --format interprosax) which
250 will persist terms to the database as they are encountered in the
251 event stream, which greatly reduces the amount of memory
252 needed. Credit for conceiving this idea and writing the SAX handler
253 goes to Juguang Xiao, juguang at tll.org.sg.
255 =item --fmtargs
257 Use this argument to specify initialization parameters for the parser
258 for the input format. The argument value is expected to be a string
259 with parameter names and values delimited by comma.
261 Usually you will want to protect the argument list from interpretation
262 by the shell, so surround it with double or single quotes.
264 If a parameter value contains a comma, escape it with a backslash
265 (which means you also must protect the whole argument from the shell
266 in order to preserve the backslash)
268 Examples:
270 # turn parser exceptions into warnings (don't try this at home)
271 --fmtargs "-verbose,-1"
272 # verbose parser with an additional path argument
273 --fmtargs "-verbose,1,-indexpath,/home/luke/warp"
274 # escape commas in values
275 --fmtargs "-ontology_name,Big Blue\, v2,-indent_string,\,"
277 =item --mergeobjs
279 This is a string or a file defining a closure. If provided, the
280 closure is called if a look-up for the unique key of the new object
281 was successful (hence, it will never be called without supplying
282 --lookup, but not --noupdate, at the same time).
284 The closure will be passed three (3) arguments: the object found by
285 lookup, the new object to be submitted, and the Bio::DB::DBAdaptorI
286 (see L<Bio::DB::DBAdaptorI>) implementing object for the desired
287 database. If the closure returns a value, it must be the object to be
288 inserted or updated in the database (if $obj->primary_key returns a
289 value, the object will be updated). If it returns undef, the script
290 will skip to the next object in the input stream.
292 The purpose of the closure can be manifold. It was originally
293 conceived as a means to customarily merge attributes or associated
294 objects of the new object to the existing (found) one in order to
295 avoid duplications but still capture additional information (e.g.,
296 annotation). However, there is a multitude of other operations it can
297 be used for, like physically deleting or altering certain associated
298 information from the database (the found object and all its associated
299 objects will implement Bio::DB::PersistentObjectI, see
300 L<Bio::DB::PersistentObjectI>). Since the third argument is the
301 persistent object and adaptor factory for the database, there is
302 literally no limit as to the database operations the closure could
303 possibly do.
305 =item --computetc "[identity];[base predicate];[subclasses];[ontology]"
307 Recompute the transitive closure table for the ontology after it has
308 been loaded. A possibly existing transitive closure will be deleted
309 first.
311 The argument specifies three terms the algorithm relies on, and their
312 ontology, each separated by semicolon. Each of the three terms may be
313 omitted, but the semicolons need to be present. Alternatively, you may
314 omit the argument altogether in which case it will assume a sensible
315 default value ("identity;related-to;implies;Predicate Ontology"). See
316 below for what this means.
318 Every predicate in the ontology for which the transitive closure is to
319 be computed is expected to have a relationship to itself. This
320 relationship is commonly referred to as the identity relationship. The
321 first term specifies the predicate name for this relationship, e.g.,
322 'identity'. The second and third term pertain to ontologies that have
323 valid paths with mixed predicates. If this occurs, the second term
324 denotes the base predicate for any combination of two different
325 predicates, and the third predicate denotes the predicate for the
326 relationship between any predicate and the base predicate, where the
327 base predicate is the object and the ontology's predicate is the
328 subject. For instance, one might want to provide 'related-to' as the
329 base predicate, and 'implies' as the predicate of the subclassing
330 relationship, which would give rise to triples like
331 (is-a,implies,related-to), (part-of,implies,related-to), etc. The
332 string following the last semicolon denotes the name of the ontology
333 under which to store those triples as well as the identity, base
334 predicate, and subclasses predicate terms.
336 If any of the terms are omitted (provided as empty strings), the
337 corresponding relationships will not be generated. Note that the
338 computed transitive closure may then be incomplete.
340 =item more args
342 The remaining arguments will be treated as files to parse and load. If
343 there are no additional arguments, input is expected to come from
344 standard input.
346 =back
348 =head1 Authors
350 Hilmar Lapp E<lt>hlapp at gmx.netE<gt>
352 =cut
355 use Getopt::Long;
356 use Symbol;
357 use Carp (qw:cluck confess croak:);
358 use Bio::DB::BioDB;
359 use Bio::OntologyIO;
360 use Bio::Root::RootI;
362 ####################################################################
363 # Defaults for options changeable through command line
364 ####################################################################
365 my $host; # should make the driver to default to localhost
366 my $dbname;
367 my $dbuser;
368 my $driver;
369 my $dbpass;
370 my $schema;
371 my $format = 'goflat';
372 my $fmtargs = '';
373 my $namespace = "bioperl ontology";
374 my $initrc; # use an initialization file for parameters?
375 my $dsn; # DSN to use verbatim for connecting, if any
376 my $mergefunc; # if and how to merge old (found) and new objects
377 # flags
378 my $remove_flag = 0; # remove object before creating?
379 my $lookup_flag = 0; # look up object before creating, update if found?
380 my $no_update_flag = 0; # do not update if found on look up?
381 my $no_obsolete = 0; # whether to include obsolete terms
382 my $upd_obsolete = 0; # whether to include obsolete terms
383 my $del_obsolete = 0; # whether to delete obsolete terms
384 my $compute_tc; # compute the transitive closure?
385 my $help = 0; # WTH?
386 my $debug = 0; # try it ...
387 my $testonly_flag = 0; # don't commit anything, rollback at the end?
388 my $safe_flag = 0; # tolerate exceptions on create?
389 my $printerror = 0; # whether to print DBI error messages
390 my $computetc_default = "identity;related-to;implies;Predicate Ontology";
391 ####################################################################
392 # Global defaults or definitions not changeable through commandline
393 ####################################################################
396 # map of I/O type to the next_XXXX method name
398 # Right now there is only a single IO subsystem we support here, so we
399 # could do well without. We leave it in here to easily be able to adapt
400 # in the future should it become necessary.
402 my %nextobj_map = (
403 'Bio::OntologyIO' => 'next_ontology',
406 ####################################################################
407 # End of defaults
408 ####################################################################
411 # get options from commandline
413 my $ok = GetOptions( 'host:s' => \$host,
414 'driver:s' => \$driver,
415 'dbname:s' => \$dbname,
416 'dbuser:s' => \$dbuser,
417 'dbpass:s' => \$dbpass,
418 'dsn=s' => \$dsn,
419 'schema=s' => \$schema,
420 'format:s' => \$format,
421 'fmtargs=s' => \$fmtargs,
422 'initrc:s' => \$initrc,
423 'namespace:s' => \$namespace,
424 'mergeobjs:s' => \$mergefunc,
425 'safe' => \$safe_flag,
426 'remove' => \$remove_flag,
427 'lookup' => \$lookup_flag,
428 'noupdate' => \$no_update_flag,
429 'noobsolete' => \$no_obsolete,
430 'updobsolete' => \$upd_obsolete,
431 'delobsolete' => \$del_obsolete,
432 'computetc:s' => \$compute_tc,
433 'debug' => \$debug,
434 'testonly' => \$testonly_flag,
435 'printerror' => \$printerror,
436 'h|help' => \$help
439 if((! $ok) || $help) {
440 if(! $ok) {
441 print STDERR "missing or unsupported option(s) on commandline\n";
443 system("perldoc $0");
444 exit($ok ? 0 : 2);
448 # determine the function for re-throwing exceptions depending on $debug and
449 # $safe_flag
451 our $throw = $safe_flag ?
452 ($debug > 0 ? \&Carp::cluck : \&Carp::carp) :
453 ($debug > 0 ? \&Carp::confess : \&Carp::croak);
456 # check $computetc whether it needs to assume the default value
458 $compute_tc = $computetc_default
459 unless $compute_tc || (!defined($compute_tc));
462 # load and/or parse object merge function if supplied
464 my $merge_objs = parse_code($mergefunc) if $mergefunc;
467 # determine input source(s)
469 my @files = @ARGV ? @ARGV : (\*STDIN);
472 # determine input format and type. Having copy-and-pasted it from
473 # load_seqdatabase.pl, we support more sophistication than we currently
474 # need or disclose.
476 my $objio;
477 my @fmtelems = split(/::/, $format);
478 if(@fmtelems > 1) {
479 $format = pop(@fmtelems);
480 $objio = join('::', @fmtelems);
481 } else {
482 # default is OntologyIO
483 $objio = "OntologyIO";
485 $objio = "Bio::".$objio if $objio !~ /^Bio::/;
486 my $nextobj = $nextobj_map{$objio}||"next_ontology";
488 # the format might come with argument specifications
489 my @fmtargs = split(/,/,$fmtargs,-1);
490 # arguments might have had commas in them - we require them to be
491 # escaped by backslash and need to stitch them back together now
492 my $i = 0;
493 while($i+1 < @fmtargs) {
494 if($fmtargs[$i] =~ s/\\$//) {
495 splice(@fmtargs, $i, 2, $fmtargs[$i].",".$fmtargs[$i+1]);
496 } else {
497 $i++;
502 # check whether we need to apply defaults
504 $initrc = "DEFAULT" unless $initrc || !defined($initrc);
507 # create the DBAdaptorI for our database
509 my $db = Bio::DB::BioDB->new(-database => "biosql",
510 -printerror => $printerror,
511 -host => $host,
512 -dbname => $dbname,
513 -driver => $driver,
514 -user => $dbuser,
515 -pass => $dbpass,
516 -dsn => $dsn,
517 -schema => $schema,
518 -initrc => $initrc,
520 $db->verbose($debug) if $debug > 0;
523 # Open the ontology parser on all files supplied. Unlike other IO parsers,
524 # ontologies may easily involve more than 1 input file to extract the
525 # entire ontology.
528 # open depending on whether it's a stream or a bunch of files
529 my $ontin;
530 my @parserargs = $format ? (-format => $format) : ();
531 push(@parserargs, @fmtargs);
533 if(@files == 1) {
534 my $prmname = ref($files[0]) ? "-fh" : "-file";
535 $ontin = $objio->new($prmname, $files[0], @parserargs);
536 } else {
537 $ontin = $objio->new(-files => \@files, @parserargs);
540 # set up the array of constant arguments to pass to the persistence handler
541 my @persist_args = ('-db' => $db,
542 '-termfactory' => $ontin->term_factory,
543 '-throw' => $throw,
544 '-mergeobs' => $merge_objs,
545 '-lookup' => $lookup_flag,
546 '-remove' => $remove_flag,
547 '-noupdate' => $no_update_flag,
548 '-noobsolete' => $no_obsolete,
549 '-delobsolete' => $del_obsolete,
550 '-updobsolete' => $upd_obsolete,
551 '-testonly' => $testonly_flag,
555 # The input parser may in fact be a SAX event handler, not a truly
556 # OntologyIO-compliant parser. A SAX handler needs to be treated
557 # fundamentally different from this point on than an OntologyIO
558 # compliant parser. While the former is to be handed off to a XML SAX
559 # parser, the latter needs to be looped over the ontologies it
560 # returns.
562 if ($ontin->isa("Bio::OntologyIO::Handlers::BaseSAXHandler")) {
563 # this is a SAX event handler, not a true OntologyIO parser
565 # pull in the XML SAX parser
566 eval {
567 require XML::Parser::PerlSAX;
569 croak "failed to load required XML SAX parser:\n$@" if $@;
571 # complete setup of the SAX event handler: pass in our persistence handlers
572 $ontin->persist_term_handler(\&persist_term, @persist_args);
573 $ontin->persist_relationship_handler(\&persist_relationship,@persist_args);
574 $ontin->db($db);
576 # make sure the (default) ontology has a name
577 my $ont = $ontin->_ontology();
578 $ont->name($namespace) unless $ont->name;
580 # instantiate the XML SAX parser and pass it the event handler
581 my $parser = XML::Parser::PerlSAX->new(Handler => $ontin);
583 # parsing the file will persist all terms and relationships, so we need
584 # to delete the relationships first to avoid having stale ones around
585 print STDERR "\t...deleting all relationships for ",$ont->name,"\n";
586 remove_all_relationships('-ontology' => $ont, @persist_args);
588 # now go ahead and parse the file
589 print STDERR "\t...parsing and loading ",$ont->name,"\n";
590 $parser->parse(Source => {SystemId => $files[0]});
592 # Generate the transitive closure if requested
593 if($compute_tc) {
594 print STDERR "\t... transitive closure\n";
595 compute_tc($db, $ont, $ontin->term_factory(), $compute_tc);
598 print STDERR "\tDone with ",$ont->name,"\n";
600 } else {
601 # this is a truly OntologyIO compliant parser, or so I hope
603 # loop over the input stream(s)
604 while( my $ont = $ontin->$nextobj ) {
605 # don't forget to add namespace if the parser doesn't supply one
606 $ont->name($namespace) unless $ont->name();
608 print STDERR "Loading ontology ",$ont->name(),":\n\t... terms\n";
610 # in order to allow callbacks to the user and generally a
611 # better ability to interfere with and customize the upload
612 # process, we load all terms first here instead of simply
613 # going for the relationships
615 foreach my $term ($ont->get_all_terms()) {
616 # call the persistence handler - there is only one right now
617 persist_term('-term' => $term, @persist_args);
620 # after all terms have been processed, we run through the relationships
621 # more or less non-interactively (i.e., without invoking a callback)
623 print STDERR "\t... relationships\n";
625 # first off, we need to delete the existing relationships in order
626 # to avoid having stale ones around
627 remove_all_relationships('-ontology' => $ont, @persist_args);
629 # now go and insert all of them
630 foreach my $rel ($ont->get_relationships()) {
631 # pass on to persistence function - there's only one right now
632 persist_relationship('-rel' => $rel, @persist_args);
635 # Generate the transitive closure if requested
636 if($compute_tc) {
637 print STDERR "\t... transitive closure\n";
638 compute_tc($db, $ont, $ontin->term_factory(), $compute_tc);
641 print STDERR "\tDone with ".$ont->name.".\n";
644 # close the parser explicitly in case it needs this to be called
645 $ontin->close();
648 print STDERR "Done, cleaning up.\n";
650 if ($db && $testonly_flag) {
651 $db->get_object_adaptor("Bio::Ontology::TermI")->rollback();
653 # done!
655 #################################################################
656 # Implementation of functions #
657 #################################################################
659 sub parse_code{
660 my $src = shift;
661 my $code;
663 # file or subroutine?
664 if(-r $src) {
665 if(! (($code = do $src) && (ref($code) eq "CODE"))) {
666 die "error in parsing code block $src: $@" if $@;
667 die "unable to read file $src: $!" if $!;
668 die "failed to run $src, or it failed to return a closure";
670 } else {
671 $code = eval $src;
672 die "error in parsing code block \"$src\": $@" if $@;
673 die "\"$src\" fails to return a closure"
674 unless ref($code) eq "CODE";
676 return $code;
679 sub compute_tc{
680 my ($db, $ont, $termfact, $params) = @_;
682 # split the parameter string into term names and ontology name
683 my ($idpred,$basepred,$subclpred,$predont) = split(/;/,$params);
684 # setup the predicate ontology
685 $predont = Bio::Ontology::Ontology->new(-name => $predont);
686 # setup the terms from their names and ontology
687 if($idpred) {
688 $idpred = $termfact->create_object(-name => $idpred,
689 -ontology => $predont);
691 if($basepred) {
692 $basepred = $termfact->create_object(-name => $basepred,
693 -ontology => $predont);
695 if($subclpred) {
696 $subclpred = $termfact->create_object(-name => $subclpred,
697 -ontology => $predont);
699 # we need the ontology object adaptor
700 my $ontadp = $db->get_object_adaptor($predont);
702 # done with setup, go for it
703 eval {
704 $ontadp->compute_transitive_closure($ont,
705 -truncate => 1,
706 -predicate_superclass => $basepred,
707 -subclass_predicate => $subclpred,
708 -identity_predicate => $idpred);
709 $ontadp->commit();
711 if($@) {
712 my $msg = "transitive closure generation failed for ".$ont->name.
713 ":\n$@";
714 $ontadp->rollback();
715 &$throw($msg);
719 =head2 persist_term
721 Title : persist_term
722 Usage :
723 Function: Persist an ontology term to the database. This function may
724 also be used as the persistence handler for event handlers,
725 e.g., an XML event stream handler.
727 This method requires many options and accepts even
728 more. See below.
730 Example :
731 Returns :
732 Args : Named parameters. Currently the following parameters are
733 recognized. Mandatory parameters are marked by an M in
734 parentheses. Flags by definition are not mandatory; their
735 default value will be false.
737 -term the ontology term object to persist (M)
738 -db the adaptor factory returned by Bio::DB::BioDB (M)
739 -termfactory the factory for creating terms (M)
740 -throw the error notification method to use
741 -mergeobs the closure for merging old and new term
742 -lookup whether to lookup terms first
743 -remove whether to delete existing term first
744 -noobsolete whether to completely ignore obsolete terms
745 -delobsolete whether to delete existing obsolete terms
746 -updobsolete whether to update existing obsolete terms
747 -testonly whether to not commit the term upon success
750 =cut
752 sub persist_term {
753 my ($term, $db, $termfactory, $throw,
754 $merge_objs, $lookup_flag, $remove_flag, $no_update_flag,
755 $no_obsolete, $del_obsolete, $upd_obsolete,
756 $testonly_flag) =
757 Bio::Root::RootI->_rearrange([qw(TERM
759 TERMFACTORY
760 THROW
761 MERGEOBJS
762 LOOKUP
763 REMOVE
764 NOUPDATE
765 NOOBSOLETE
766 DELOBSOLETE
767 UPDOBSOLETE
768 TESTONLY)],
769 @_);
770 # if the term is obsolete and we don't want to look at obsolete
771 # terms, skip to the next one right away
772 return if $no_obsolete && $term->is_obsolete();
773 # look up or delete first? this may pertain only to obsolete terms.
774 my ($pterm, $lterm, $adp);
775 if($lookup_flag || $remove_flag ||
776 (($del_obsolete || $upd_obsolete) && $term->is_obsolete())) {
777 # look up
778 $adp = $db->get_object_adaptor($term);
779 $lterm = $adp->find_by_unique_key($term,
780 -obj_factory => $termfactory);
781 # found?
782 if($lterm) {
783 # merge old and new if a function for this is provided
784 $term = &$merge_objs($lterm, $term, $db) if $merge_objs;
785 # the return value may indicate to skip to the next
786 return unless $term;
787 } elsif(($del_obsolete || $upd_obsolete) && $term->is_obsolete()) {
788 # don't store obsolete terms if we're supposed to only update
789 # or delete them
790 return;
793 # try to serialize
794 eval {
795 $adp = $lterm->adaptor() if $lterm;
796 # delete if requested
797 if($lterm &&
798 ($remove_flag || ($del_obsolete && $term->is_obsolete()))) {
799 $lterm->remove();
801 # on update, skip the rest if we are not supposed to update,
802 # and proceed with insert or update otherwise
803 if(! ($lterm && $no_update_flag)) {
804 # create a persistent object out of the term
805 $pterm = $db->create_persistent($term);
806 $adp = $pterm->adaptor();
807 # store the primary key of what we found by lookup (this
808 # is going to be an udate then)
809 if($lterm && $lterm->primary_key) {
810 $pterm->primary_key($lterm->primary_key);
812 $pterm->store();
814 $adp->commit() unless $testonly_flag;
816 if ($@) {
817 my $msg = "Could not store term ";
818 if (defined($term->object_id())) {
819 $msg .= $term->object_id().", name ";
821 $msg .= "'".$term->name()."':\n$@\n";
822 $adp->rollback();
823 $throw = \&Carp::croak unless $throw;
824 &$throw($msg);
828 =head2 remove_all_relationships
830 Title : remove_all_relationships
831 Usage :
832 Function: Removes all relationships of an ontology from the
833 database. This is a necessary step before inserting the
834 latest ones in order to avoid stale relationships staying
835 in the database.
837 See below for the parameters that this method accepts
838 and/or requires.
840 Example :
841 Returns :
842 Args : Named parameters. Currently the following parameters are
843 recognized. Mandatory parameters are marked by an M in
844 parentheses. Flags by definition are not mandatory; their
845 default value will be false.
847 -ontology the ontology for which to remove relationships (M)
848 -db the adaptor factory returned by Bio::DB::BioDB (M)
849 -throw the error notification method to use
850 -testonly whether to not commit the term upon success
853 =cut
855 sub remove_all_relationships {
856 my ($ont, $db, $throw, $testonly_flag) =
857 Bio::Root::RootI->_rearrange([qw(ONTOLOGY
859 THROW
860 TESTONLY)],
861 @_);
863 my $reladp = $db->get_object_adaptor("Bio::Ontology::RelationshipI");
864 eval {
865 $reladp->remove_all_relationships($ont);
866 $reladp->commit() unless $testonly_flag;
868 if ($@) {
869 $reladp->rollback();
870 $throw = \&Carp::croak;
871 &$throw("failed to remove relationships prior to inserting them: $@");
875 =head2 persist_relationship
877 Title : persist_relationship
878 Usage :
879 Function: Persist a term relationship to the database. This function
880 may also be used as the persistence handler for event
881 handlers, e.g., an XML event stream handler.
883 See below for the required and recognized parameters.
885 Example :
886 Returns :
887 Args : Named parameters. Currently the following parameters are
888 recognized. Mandatory parameters are marked by an M in
889 parentheses. Flags by definition are not mandatory; their
890 default value will be false.
892 -rel the term relationship object to persist (M)
893 -db the adaptor factory returned by Bio::DB::BioDB (M)
894 -throw the error notification method to use
895 -noobsolete whether to completely ignore obsolete terms
896 -delobsolete whether to delete existing obsolete terms
897 -testonly whether to not commit the term upon success
900 =cut
902 sub persist_relationship {
903 my ($rel, $db, $throw, $no_obsolete, $del_obsolete, $testonly_flag) =
904 Bio::Root::RootI->_rearrange([qw(REL
906 THROW
907 NOOBSOLETE
908 DELOBSOLETE
909 TESTONLY)],
910 @_);
911 # don't bother with relationships that reference an obsolete term
912 # if we don't load obsolete terms
913 if($del_obsolete || $no_obsolete) {
914 return if ($rel->subject_term->is_obsolete() ||
915 $rel->object_term->is_obsolete() ||
916 $rel->predicate_term->is_obsolete());
918 my $prel = $db->create_persistent($rel);
919 eval {
920 $prel->create();
921 $prel->commit() unless $testonly_flag;
923 if ($@) {
924 my $msg = "Could not store term relationship (".
925 join(",",
926 $rel->subject_term->name(),
927 $rel->predicate_term->name(),
928 $rel->object_term->name()).
929 "):\n$@\n";
930 $prel->rollback();
931 $throw = \&Carp::croak unless $throw;
932 &$throw($msg);