Bio/AnalysisI.pm

# $Id$
#
# BioPerl module for Bio::AnalysisI
#
# Cared for by Martin Senger <martin.senger@gmail.com>
# For copyright and disclaimer see below.
#

# POD documentation - main docs before the code

=head1 NAME

Bio::AnalysisI - An interface to any (local or remote) analysis tool

=head1 SYNOPSIS

This is an interface module - you do not instantiate it.
Use the C<Bio::Tools::Run::Analysis> module instead:

  use Bio::Tools::Run::Analysis;
  my $tool = Bio::Tools::Run::Analysis->new(@args);

=head1 DESCRIPTION

This interface contains all public methods for accessing and
controlling local and remote analysis tools. It is meant to be used on
the client side.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR

Martin Senger (martin.senger@gmail.com)

=head1 COPYRIGHT

Copyright (c) 2003, Martin Senger and EMBL-EBI.
All Rights Reserved.

This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=head1 SEE ALSO

L<http://www.ebi.ac.uk/soaplab/Perl_Client.html>.

=head1 APPENDIX

The method descriptions below are the main documentation of this
module.

If you call any of these methods directly on a C<Bio::AnalysisI>
object, you will get a I<not implemented> error message. Call them on
a C<Bio::Tools::Run::Analysis> object instead.

=cut


# Let the code begin...

package Bio::AnalysisI;
use strict;

use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 analysis_name

 Usage   : $tool->analysis_name;
 Returns : the name of this analysis
 Args    : none

=cut

sub analysis_name { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 analysis_spec

 Usage   : $tool->analysis_spec;
 Returns : a hash reference describing this analysis
 Args    : none

The returned hash reference uses the following keys (not all of them
are always present, and others may be present as well): C<name>,
C<type>, C<version>, C<supplier>, C<installation>, C<description>.

Here is an example of the output for the analysis C<edit.seqret>:

  Analysis 'edit.seqret':
        installation => EMBL-EBI
        description => Reads and writes (returns) sequences
        supplier => EMBOSS
        version => 2.6.0
        type => edit
        name => seqret
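
The sample output above can be reproduced from the returned hash with
a few lines of plain Perl. This is only a sketch. The hash is copied
from the example, the helper C<format_analysis_spec> is hypothetical
and not part of BioPerl, and the keys are printed in sorted order
because Perl hashes have no fixed order.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: render an analysis_spec() hash reference in the
    # layout of the example above ("Analysis 'type.name':" plus key => value).
    sub format_analysis_spec {
        my ($spec) = @_;
        # Not every key is guaranteed to be present, so skip missing ones.
        my $id  = join '.', grep { defined } @{$spec}{qw(type name)};
        my $out = "Analysis '$id':\n";
        for my $key (sort keys %$spec) {
            $out .= "      $key => $spec->{$key}\n";
        }
        return $out;
    }

    # Values copied from the edit.seqret example above.
    my $spec = {
        installation => 'EMBL-EBI',
        description  => 'Reads and writes (returns) sequences',
        supplier     => 'EMBOSS',
        version      => '2.6.0',
        type         => 'edit',
        name         => 'seqret',
    };
    my $text = format_analysis_spec($spec);
    print $text;

```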

=cut

sub analysis_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 describe

 Usage   : $tool->describe;
 Returns : an XML detailed description of this analysis
 Args    : none

The returned XML string contains metadata describing this analysis
service. It also includes the metadata returned (in a form that is
easier to use) by the methods C<analysis_spec>, C<input_spec> and
C<result_spec>.

The DTD of the returned metadata is based on an adopted standard (the
BSA specification for analysis engines):

  <!ELEMENT DsLSRAnalysis (analysis)+>

  <!ELEMENT analysis (description?, input*, output*, extension?)>

  <!ATTLIST analysis
      type          CDATA #REQUIRED
      name          CDATA #IMPLIED
      version       CDATA #IMPLIED
      supplier      CDATA #IMPLIED
      installation  CDATA #IMPLIED>

  <!ELEMENT description ANY>
  <!ELEMENT extension ANY>

  <!ELEMENT input (default?, allowed*, extension?)>

  <!ATTLIST input
      type          CDATA #REQUIRED
      name          CDATA #REQUIRED
      mandatory     (true|false) "false">

  <!ELEMENT default (#PCDATA)>
  <!ELEMENT allowed (#PCDATA)>

  <!ELEMENT output (extension?)>

  <!ATTLIST output
      type          CDATA #REQUIRED
      name          CDATA #REQUIRED>

The DTD may, however, be extended with provider-specific metadata. For
example, the experimental EBI SOAP-based service on top of EMBOSS uses
the DTD explained at C<http://www.ebi.ac.uk/~senger/applab>.

=cut

sub describe { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 input_spec

 Usage   : $tool->input_spec;
 Returns : an array reference with hashes as elements
 Args    : none

Each analysis input is named and can also be associated with a default
value, a list of allowed values and a few other attributes. The names
are what you use to feed the service with input data: the inputs are
given to the methods C<create_job>, C<run> or C<wait_for> as
name/value pairs.

Here is a (slightly shortened) example of an input specification:

 $input_spec = [
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sequence_usa'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sequence_direct_data'
          },
          {
            'mandatory' => 'false',
            'allowed_values' => [
                                  'gcg',
                                  'gcg8',
                                  ...
                                  'raw'
                                ],
            'type' => 'String',
            'name' => 'sformat'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sbegin'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'send'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sprotein'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'snucleotide'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sreverse'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'slower'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'supper'
          },
          {
            'mandatory' => 'false',
            'default' => 'false',
            'type' => 'String',
            'name' => 'firstonly'
          },
          {
            'mandatory' => 'false',
            'default' => 'fasta',
            'allowed_values' => [
                                  'gcg',
                                  'gcg8',
                                  'embl',
                                  ...
                                  'raw'
                                ],
            'type' => 'String',
            'name' => 'osformat'
          }
        ];
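
Because every element is a plain hash, a client can check its inputs
before submitting a job. The following minimal sketch reports three
kinds of problem: missing mandatory inputs, values outside
C<allowed_values>, and unknown names. The helper C<check_inputs> is
hypothetical. The specification is a shortened copy of the example
above, with the elided values left out. Note that C<mandatory> holds
the string C<'true'> or C<'false'>, not a Perl boolean.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: validate a hash of name => value inputs against an
    # input_spec()-style array reference; returns a list of problem messages.
    sub check_inputs {
        my ($spec, $inputs) = @_;
        my (@problems, %known);
        for my $item (@$spec) {
            my $name = $item->{name};
            $known{$name} = 1;
            # 'mandatory' is the string 'true' or 'false'.
            if (($item->{mandatory} || 'false') eq 'true'
                    && !exists $inputs->{$name}) {
                push @problems, "missing mandatory input '$name'";
            }
            next unless exists $inputs->{$name} && $item->{allowed_values};
            my %allowed = map { $_ => 1 } @{ $item->{allowed_values} };
            push @problems, "value '$inputs->{$name}' not allowed for '$name'"
                unless $allowed{ $inputs->{$name} };
        }
        # Names the analysis does not recognize are probably typos.
        push @problems, "unknown input '$_'"
            for grep { !$known{$_} } sort keys %$inputs;
        return @problems;
    }

    my $input_spec = [
        { mandatory => 'false', type => 'String', name => 'sequence_usa' },
        { mandatory => 'false', type => 'String', name => 'sformat',
          allowed_values => [ 'gcg', 'gcg8', 'raw' ] },
        { mandatory => 'false', type => 'String', name => 'osformat',
          allowed_values => [ 'gcg', 'gcg8', 'embl', 'raw' ] },
    ];

    print "$_\n" for check_inputs($input_spec,
        { sequence_usa => 'embl:hsu52852', osformat => 'xml' });

```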

=cut

sub input_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 result_spec

 Usage   : $tool->result_spec;
 Returns : a hash reference with result names as keys
           and result types as values
 Args    : none

The analysis results are named and can be retrieved by their names
using the job methods C<results> and C<result>.

Here is an example of the result specification (again for the service
I<edit.seqret>):

  $result_spec = {
          'outseq' => 'String',
          'report' => 'String',
          'detailed_status' => 'String'
        };

=cut

sub result_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 create_job

 Usage   : $tool->create_job ( {'sequence'=>'tatat'} )
 Returns : Bio::Tools::Run::Analysis::Job
 Args    : data and parameters for this execution
           (in various formats)

Create an object representing a single execution of this analysis
tool.

Call this method if you wish to "set the stage": to create a job with
all its input data but without actually running it. This method is
called automatically by the methods C<run> and C<wait_for>, so usually
you do not need to call it directly.

The input data and parameters for this execution can be specified in
various ways:

=over

=item array reference

The array has scalar elements of the form

   name = [[@]value]

where C<name> is the name of an input data item or parameter (use the
method C<input_spec> to find which names this analysis recognizes) and
C<value> is its value, for example C<osformat=embl> or
C<sequence=@my.seq>. If C<value> is missing, 1 is assumed (which is
convenient for boolean options). If C<value> starts with C<@>, it is
treated as a local filename and the contents of that file are used as
the value.

=item hash reference

The same as the array reference, but without the equal sign: the hash
keys are the input names and the hash values are their data. The
values can again start with C<@> to indicate a local filename.

=item scalar

In this case, the parameter is a job ID obtained in a previous
invocation: such a job already exists on the server side, and we are
just re-creating it here using the same job ID.

I<TBD: the same should also be possible by passing a reference to a
Bio::Tools::Run::Analysis::Job object.>

=item undef

Finally, if the parameter is undefined, the server is asked to create
an empty job. The input data may be added later using the
C<set_data...> method(s); see scripts/papplmaker.PLS for details.

=back
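
The array and hash forms carry the same information. The sketch below
shows one way to turn either of them into a single hash that follows
the rules above. It is an illustration, not the code used by
C<Bio::Tools::Run::Analysis>. The helper C<normalise_inputs> is
hypothetical, and two details are assumptions: each array element is
split at its first C<=>, and an element with no value (or an empty
one) gets the value 1.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: turn the array-reference or hash-reference input
    # forms described above into a plain hash reference of name => value.
    sub normalise_inputs {
        my ($input) = @_;
        my %pairs;
        if (ref $input eq 'ARRAY') {
            for my $elem (@$input) {
                # 'name', 'name=value' or 'name=@file'; split at the first '='.
                my ($name, $value) = split /=/, $elem, 2;
                # A missing (or empty) value means 1, handy for boolean options.
                $pairs{$name} = (defined $value && length $value) ? $value : 1;
            }
        }
        elsif (ref $input eq 'HASH') {
            %pairs = %$input;
        }
        else {
            die "expected an array or hash reference\n";
        }
        # A value starting with '@' names a local file whose contents are used.
        for my $name (keys %pairs) {
            next unless defined $pairs{$name} && $pairs{$name} =~ /\A\@(.+)\z/s;
            my $file = $1;
            open my $fh, '<', $file or die "cannot read '$file': $!\n";
            local $/;    # slurp mode: read the whole file at once
            $pairs{$name} = <$fh>;
            close $fh;
        }
        return \%pairs;
    }

    my $in = normalise_inputs(
        [ 'sequence_usa=embl:hsu52852', 'osformat=embl', 'firstonly' ]);
    print "$_ => $in->{$_}\n" for sort keys %$in;
    # firstonly => 1
    # osformat => embl
    # sequence_usa => embl:hsu52852

```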

=cut

sub create_job { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 run

 Usage   : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing the started job (an execution)
 Args    : the same as for create_job

Create a job and start it, but do not wait for its completion.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 wait_for

 Usage   : $tool->wait_for ( { 'sequence' => '@my.file' } )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing the finished job
 Args    : the same as for create_job

Create a job, start it and wait for its completion.

Note that this is a blocking method: it returns only after the
executed job finishes, either normally or with an error.

Usually, after this call, you ask for the results of the finished job:

    $tool->wait_for (...)->results;

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
#
#   Bio::AnalysisI::JobI
#
# -----------------------------------------------------------------------------

package Bio::AnalysisI::JobI;

=head1 Module Bio::AnalysisI::JobI

An interface to the public methods provided by
C<Bio::Tools::Run::Analysis::Job> objects.

The C<Bio::Tools::Run::Analysis::Job> objects represent a created,
running, or finished execution of an analysis tool.

The factory for these objects is the module
C<Bio::Tools::Run::Analysis>, where the following methods return a
C<Bio::Tools::Run::Analysis::Job> object:

    create_job   (returning a prepared job)
    run          (returning a running job)
    wait_for     (returning a finished job)

=cut

use strict;
use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 id

 Usage   : $job->id;
 Returns : this job ID
 Args    : none

Each job (an execution) is identified by this unique ID, which can be
used later to re-create the same job (in other words, to re-connect to
it). This is useful when a job takes a long time to finish and your
client program does not want to wait for it within the same session.

=cut

sub id { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 run

 Usage   : $job->run
 Returns : itself
 Args    : none

Start a previously created job. The job must already have all its
input data filled in. This differs from the method of the same name in
C<Bio::Tools::Run::Analysis>, which also creates a new job and lets you
set its input data.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 wait_for

 Usage   : $job->wait_for
 Returns : itself
 Args    : none

Wait until a previously started execution of this job finishes. This
is a blocking call.

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 terminate

 Usage   : $job->terminate
 Returns : itself
 Args    : none

Stop the currently running job (represented by this object). This is a
definitive stop; there is no way to resume the job later.

=cut

sub terminate { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 last_event

 Usage   : $job->last_event
 Returns : an XML string
 Args    : none

Return a short XML document describing what last happened to this
job. The document follows this DTD:

   <!-- place for extensions -->
   <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">

   <!ELEMENT analysis_event (message?, (%event_body_template;)?)>

   <!ATTLIST analysis_event
       timestamp  CDATA #IMPLIED>

   <!ELEMENT message (#PCDATA)>

   <!ELEMENT state_changed EMPTY>
   <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
   <!ATTLIST state_changed
       previous_state  (%analysis_state;) "created"
       new_state       (%analysis_state;) "created">

   <!ELEMENT heartbeat_progress EMPTY>

   <!ELEMENT percent_progress EMPTY>
   <!ATTLIST percent_progress
       percentage CDATA #REQUIRED>

   <!ELEMENT time_progress EMPTY>
   <!ATTLIST time_progress
       remaining CDATA #REQUIRED>

   <!ELEMENT step_progress EMPTY>
   <!ATTLIST step_progress
       total_steps      CDATA #IMPLIED
       steps_completed  CDATA #REQUIRED>

Here is an example of what is returned after a job was created and
started, but before it finished (the example uses the analysis
C<display.showdb>, which does not need any input data):

   use Bio::Tools::Run::Analysis;
   my $job = Bio::Tools::Run::Analysis->new (-name => 'display.showdb')
             ->run;
   print $job->last_event;

It prints:

   <?xml version = "1.0"?>
   <analysis_event>
     <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
     <state_changed previous_state="created" new_state="running"/>
   </analysis_event>

The same example, but now after the job has finished:

   use Bio::Tools::Run::Analysis;
   my $job = Bio::Tools::Run::Analysis->new (-name => 'display.showdb')
             ->wait_for;
   print $job->last_event;

It prints:

   <?xml version = "1.0"?>
   <analysis_event>
     <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
     <state_changed previous_state="running" new_state="completed"/>
   </analysis_event>
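
For simple monitoring, it is often enough to pull the optional
C<message> and the C<state_changed> attributes out of the returned
string. The sketch below uses regular expressions, so it only copes
with documents shaped like the examples above. A real client should
use an XML parser. The helper C<parse_last_event> is hypothetical.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: pull the message, the state_changed attributes and
    # a percent_progress value out of a last_event() document. Regex-based,
    # so it only copes with simple documents like the examples above.
    sub parse_last_event {
        my ($xml) = @_;
        my %event;
        $event{message} = $1 if $xml =~ m{<message>(.*?)</message>}s;
        if ($xml =~ m{<state_changed\b([^>]*)>}) {
            my $attrs = $1;
            # Collect previous_state="..." and new_state="..." pairs.
            while ($attrs =~ /(\w+)\s*=\s*"([^"]*)"/g) {
                $event{$1} = $2;
            }
        }
        $event{percentage} = $1
            if $xml =~ m{<percent_progress\b[^>]*\bpercentage\s*=\s*"([^"]*)"};
        return \%event;
    }

    # The second document shown above, as a single string.
    my $xml = '<?xml version = "1.0"?>'
            . '<analysis_event>'
            . '<message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>'
            . '<state_changed previous_state="running" new_state="completed"/>'
            . '</analysis_event>';
    my $event = parse_last_event($xml);
    print "$event->{previous_state} -> $event->{new_state}\n";   # running -> completed

```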

=cut

sub last_event { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 status

 Usage   : $job->status
 Returns : a string describing the job status
 Args    : none

Return one of the following strings (or possibly others, if a server
implementation has extended the set of job states):

   CREATED
   RUNNING
   COMPLETED
   TERMINATED_BY_REQUEST
   TERMINATED_BY_ERROR
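
The last three states are final. A client that started a job with
C<run>, rather than the blocking C<wait_for>, can therefore poll
C<status> until one of them appears. The sketch below assumes only
that the job object has a C<status> method. The class C<FakeJob> is a
hypothetical stand-in so that the example runs without a server, and
the polling interval is an arbitrary choice. A server-specific state
not listed above is treated as still active here, which may need
adjusting for a given service.

```perl

    use strict;
    use warnings;

    # States after which a job no longer changes (see the list above).
    my %FINAL = map { $_ => 1 }
        qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

    sub is_finished {
        my ($status) = @_;
        return $FINAL{$status} ? 1 : 0;
    }

    # Ask the job for its status every $interval seconds until it is final.
    sub poll_until_finished {
        my ($job, $interval) = @_;
        my $status;
        until (is_finished($status = $job->status)) {
            sleep $interval;
        }
        return $status;
    }

    # Hypothetical stand-in for a Bio::Tools::Run::Analysis::Job object:
    # it reports RUNNING twice and then COMPLETED.
    package FakeJob;
    sub new    { my ($class) = @_; return bless { calls => 0 }, $class }
    sub status { my ($self) = @_; return ++$self->{calls} < 3 ? 'RUNNING' : 'COMPLETED' }

    package main;
    my $final = poll_until_finished(FakeJob->new, 0);
    print "$final\n";    # COMPLETED

```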

=cut

sub status { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 created

 Usage   : $job->created (1)
 Returns : the time when this job was created
 Args    : optional; a true value requests a formatted time

Without an argument, return the creation time of this job in seconds
since the beginning of the UNIX epoch (1 January 1970). With a true
argument, return a formatted time, using the rules described in
C<Bio::Tools::Run::Analysis::Utils::format_time>.
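
The unformatted value is an ordinary epoch count, so core Perl can
convert it when a different layout is wanted. In the sketch below,
C<scalar localtime> produces strings in the same layout as the
formatted values in the C<times> example. C<POSIX::strftime> gives a
form that does not depend on the time zone. This is not necessarily
how C<format_time> itself works.

```perl

    use strict;
    use warnings;
    use POSIX qw(strftime);

    # An unformatted creation time (seconds since the epoch), taken from the
    # times example in this document.
    my $created = 1046713926;

    # Perl's built-in conversion; the text depends on the local time zone.
    my $local = scalar localtime $created;
    # A zone-independent ISO 8601 rendering in UTC.
    my $utc = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $created);

    print "$local\n";
    print "$utc\n";    # 2003-03-03T17:52:06Z

```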

=cut

sub created { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 started

 Usage   : $job->started (1)
 Returns : the time when this job was started
 Args    : optional

See C<created>.

=cut

sub started { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 ended

 Usage   : $job->ended (1)
 Returns : the time when this job ended
 Args    : optional

See C<created>.

=cut

sub ended { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 elapsed

 Usage   : $job->elapsed
 Returns : the elapsed time of the execution of this job
           (in milliseconds), or 0 if the job has not been started yet
 Args    : none

Note that some server implementations cannot measure milliseconds, so
the returned time may be rounded to whole seconds.

=cut

sub elapsed { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 times

 Usage   : $job->times ('formatted')
 Returns : a hash reference with all time characteristics
 Args    : optional

A convenience method returning a hash reference with the following
keys:

   created
   started
   ended
   elapsed

See C<created> for remarks on time formatting.

An example with both formatted and unformatted times:

   use Data::Dumper;
   use Bio::Tools::Run::Analysis;
   my $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
             ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
             ->times (1);
   print Data::Dumper->Dump ( [$rh], ['Times']);
   $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
             ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
             ->times;
   print Data::Dumper->Dump ( [$rh], ['Times']);

It prints:

   $Times = {
           'ended'   => 'Mon Mar  3 17:52:06 2003',
           'started' => 'Mon Mar  3 17:52:05 2003',
           'elapsed' => '1000',
           'created' => 'Mon Mar  3 17:52:05 2003'
         };
   $Times = {
           'ended'   => '1046713961',
           'started' => '1046713926',
           'elapsed' => '35000',
           'created' => '1046713926'
         };
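
The unformatted values mix units: the three timestamps are in seconds,
while C<elapsed> is in milliseconds. The second hash above is
consistent with this, since (1046713961 - 1046713926) * 1000 = 35000.
The following minimal sketch uses that hash as data. It converts
C<elapsed> to seconds and cross-checks the result against the
timestamps. The helper C<elapsed_seconds> is hypothetical.

```perl

    use strict;
    use warnings;

    # The unformatted hash printed above (the second $Times).
    my $times = {
        ended   => '1046713961',
        started => '1046713926',
        elapsed => '35000',
        created => '1046713926',
    };

    # Hypothetical helper: elapsed time in seconds. 'elapsed' is reported in
    # milliseconds and is 0 for a job that has not been started yet.
    sub elapsed_seconds {
        my ($t) = @_;
        return ($t->{elapsed} || 0) / 1000;
    }

    # Timestamps are whole seconds, so their difference is a cross-check.
    my $from_stamps = $times->{ended} - $times->{started};
    printf "%.1f s reported, %d s from timestamps\n",
        elapsed_seconds($times), $from_stamps;
    # 35.0 s reported, 35 s from timestamps

```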

=cut

sub times { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 results

 Usage   : $job->results (...)
 Returns : one or more results created by this job
 Args    : various, see below

This is a complex method that tries to handle all kinds of results
sensibly. In particular, it helps to put binary results (such as
images) into local files. In general, it deals with the following
facts:

=over

=item *

Each analysis tool may produce several results.

=item *

Some results may contain binary data not suitable for printing into a
terminal window.

=item *

Some results may be split into a variable number of parts (this is
mainly true for image results, which can consist of several *.png
files).

=back

Note also that results have names, which distinguish them when there
are several. The names can be obtained with the method C<result_spec>
of the analysis object.

Here are the rules the method follows:

    Retrieving NAMED results:
    -------------------------
     results ('name1', ...)   => return results as they are, no storing into files

     results ( { 'name1' => 'filename', ... } )  => store into 'filename', return 'filename'
     results ( 'name1=filename', ...)            => ditto

     results ( { 'name1' => '-', ... } )         => send result to STDOUT, do not return anything
     results ( 'name1=-', ...)                   => ditto

     results ( { 'name1' => '@', ... } )  => store into a file whose name is invented by
                                             this method, perhaps using the
                                             RESULT_NAME_TEMPLATE environment variable
     results ( 'name1=@', ...)            => ditto

     results ( { 'name1' => '?', ... } )  => find out the type of this result, then use
                                             {'name1' => '@'} for binary results and a
                                             regular return for non-binary results
     results ( 'name1=?', ...)            => ditto

    Retrieving ALL results:
    -----------------------
     results()     => return all results as they are, no storing into files

     results ('@') => return all results, as if each of them were given
                      as {'name' => '@'} (see above)

     results ('?') => return all results, as if each of them were given
                      as {'name' => '?'} (see above)

    Misc:
    -----
     * any result can be returned as a scalar value or as an array reference
       (the latter is used for results consisting of several parts, such as
       images); this applies regardless of whether the returned result is the
       result itself or a filename created for the result

     * look in the documentation of the panalysis[.PLS] script for examples
       (especially of how to use various templates for inventing file names)
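
The argument forms above can be told apart mechanically. The sketch
below only classifies the arguments into a result name plus an action;
it does not retrieve anything. The helper C<classify_result_args> is
hypothetical. Its action labels (C<return>, C<file>, C<stdout>,
C<invent>, C<auto>) simply mirror the rules listed above, and the name
C<*> stands for all results.

```perl

    use strict;
    use warnings;

    # Special targets and the (hypothetical) action names used below.
    my %SPECIAL = ('-' => 'stdout', '@' => 'invent', '?' => 'auto');

    # Hypothetical helper: turn the arguments of results(...) into a list of
    # [name, action, filename] triples following the rules above. The name
    # '*' stands for "all results".
    sub classify_result_args {
        my @args = @_;
        return [ '*', 'return', undef ] unless @args;        # results()
        if (@args == 1 && !ref($args[0]) && $args[0] =~ /\A[\@?]\z/) {
            return [ '*', $SPECIAL{ $args[0] }, undef ];      # results('@'), results('?')
        }
        my @pairs;
        for my $arg (@args) {
            if (ref $arg eq 'HASH') {
                push @pairs, map { [ $_, $arg->{$_} ] } sort keys %$arg;
            }
            elsif ($arg =~ /\A([^=]+)=(.*)\z/s) {
                push @pairs, [ $1, $2 ];                    # 'name=target'
            }
            else {
                push @pairs, [ $arg, undef ];               # plain 'name'
            }
        }
        my @out;
        for my $pair (@pairs) {
            my ($name, $target) = @$pair;
            if    (!defined $target)         { push @out, [ $name, 'return', undef ] }
            elsif (exists $SPECIAL{$target}) { push @out, [ $name, $SPECIAL{$target}, undef ] }
            else                             { push @out, [ $name, 'file', $target ] }
        }
        return @out;
    }

    for my $c (classify_result_args('outseq=seq.embl', { report => '-' }, 'detailed_status')) {
        my ($name, $action, $file) = @$c;
        print "$name: $action", (defined $file ? " ($file)" : ''), "\n";
    }
    # outseq: file (seq.embl)
    # report: stdout
    # detailed_status: return

```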

=cut

sub results { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 result

 Usage   : $job->result (...)
 Returns : the first result
 Args    : see 'results'

=cut

sub result { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 remove

 Usage   : $job->remove
 Returns : 1
 Args    : none

The job object is not removed immediately. Instead, it is marked (by
setting its C<_destroy_on_exit> attribute to 1) for deletion when the
client program ends; the deletion includes a request asking the server
to forget its mirror of the job.

=cut

sub remove { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

1;
__END__
