# $Id$
#
# BioPerl module for Bio::AnalysisI
#
# Cared for by Martin Senger <martin.senger@gmail.com>
# For copyright and disclaimer see below.
#

# POD documentation - main docs before the code

=head1 NAME

Bio::AnalysisI - An interface to any (local or remote) analysis tool

=head1 SYNOPSIS

This is an interface module - you do not instantiate it.
Use the C<Bio::Tools::Run::Analysis> module instead:

  use Bio::Tools::Run::Analysis;
  my $tool = Bio::Tools::Run::Analysis->new(@args);

=head1 DESCRIPTION

This interface contains all public methods for accessing and
controlling local and remote analysis tools. It is meant to be used on
the client side.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR

Martin Senger (martin.senger@gmail.com)

=head1 COPYRIGHT

Copyright (c) 2003, Martin Senger and EMBL-EBI.
All Rights Reserved.

This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=head1 SEE ALSO

L<http://www.ebi.ac.uk/soaplab/Perl_Client.html>.

=head1 APPENDIX

This is actually the main documentation...

If you call any of these methods directly on a C<Bio::AnalysisI>
object you will get a I<not implemented> error message. Call them on
a C<Bio::Tools::Run::Analysis> object instead.

=cut

# Let the code begin...

package Bio::AnalysisI;
use strict;

use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 analysis_name

 Usage   : $tool->analysis_name;
 Returns : the name of this analysis
 Args    : none

=cut

sub analysis_name { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 analysis_spec

 Usage   : $tool->analysis_spec;
 Returns : a hash reference describing this analysis
 Args    : none

The returned hash reference uses the following keys (not all of them
are always present, and others may be present as well): C<name>,
C<type>, C<version>, C<supplier>, C<installation>, C<description>.

Here is an example output:

  Analysis 'edit.seqret':
        installation => EMBL-EBI
        description => Reads and writes (returns) sequences
        supplier => EMBOSS
        version => 2.6.0
        type => edit
        name => seqret

=cut

sub analysis_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 describe

 Usage   : $tool->describe;
 Returns : a detailed XML description of this analysis
 Args    : none

The returned XML string contains metadata describing this analysis
service. It also includes the metadata returned, in a form that is
easier to use, by the methods C<analysis_spec>, C<input_spec> and
C<result_spec>.

The DTD used for the returned metadata is based on the adopted standard
(BSA specification for analysis engine):

  <!ELEMENT DsLSRAnalysis (analysis)+>

  <!ELEMENT analysis (description?, input*, output*, extension?)>

  <!ATTLIST analysis
      type          CDATA #REQUIRED
      name          CDATA #IMPLIED
      version       CDATA #IMPLIED
      supplier      CDATA #IMPLIED
      installation  CDATA #IMPLIED>

  <!ELEMENT description ANY>
  <!ELEMENT extension ANY>

  <!ELEMENT input (default?, allowed*, extension?)>

  <!ATTLIST input
      type          CDATA #REQUIRED
      name          CDATA #REQUIRED
      mandatory     (true|false) "false">

  <!ELEMENT default (#PCDATA)>
  <!ELEMENT allowed (#PCDATA)>

  <!ELEMENT output (extension?)>

  <!ATTLIST output
      type          CDATA #REQUIRED
      name          CDATA #REQUIRED>

The DTD may, however, be extended by provider-specific metadata. For
example, the EBI experimental SOAP-based service on top of EMBOSS uses
the DTD explained at C<http://www.ebi.ac.uk/~senger/applab>.

=cut

sub describe { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 input_spec

 Usage   : $tool->input_spec;
 Returns : an array reference with hashes as elements
 Args    : none

The analysis input data are named, and each can also be associated
with a default value, with allowed values and with a few other
attributes. The names are important for feeding the service with the
input data (the inputs are given to the methods C<create_job>, C<run>,
and/or C<wait_for> as name/value pairs).

Here is a (slightly shortened) example of an input specification:

 $input_spec = [
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sequence_usa'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sequence_direct_data'
          },
          {
            'mandatory' => 'false',
            'allowed_values' => [
                                  'gcg',
                                  'gcg8',
                                  ...
                                  'raw'
                                ],
            'type' => 'String',
            'name' => 'sformat'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sbegin'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'send'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sprotein'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'snucleotide'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'sreverse'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'slower'
          },
          {
            'mandatory' => 'false',
            'type' => 'String',
            'name' => 'supper'
          },
          {
            'mandatory' => 'false',
            'default' => 'false',
            'type' => 'String',
            'name' => 'firstonly'
          },
          {
            'mandatory' => 'false',
            'default' => 'fasta',
            'allowed_values' => [
                                  'gcg',
                                  'gcg8',
                                  'embl',
                                  ...
                                  'raw'
                                ],
            'type' => 'String',
            'name' => 'osformat'
          }
        ];

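An input specification of this shape can be walked with ordinary Perl.
The following is a minimal sketch, not part of this module: the data is
a hand-written stand-in for what C<input_spec> might return, and
C<summarize_input> is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hand-written stand-in for what $tool->input_spec might return;
# a real call needs a reachable analysis service.
my $input_spec = [
    { name => 'sequence_usa', type => 'String', mandatory => 'false' },
    { name => 'firstonly',    type => 'String', mandatory => 'false',
      default => 'false' },
    { name => 'osformat',     type => 'String', mandatory => 'false',
      default => 'fasta',
      allowed_values => [ 'gcg', 'gcg8', 'embl', 'raw' ] },
];

# Build a one-line summary of a single input (hypothetical helper).
sub summarize_input {
    my ($input) = @_;
    my $line = "$input->{name} ($input->{type})";
    $line .= ' [mandatory]' if ($input->{mandatory} // 'false') eq 'true';
    $line .= " default=$input->{default}" if defined $input->{default};
    $line .= ' allowed=' . join('|', @{ $input->{allowed_values} })
        if ref($input->{allowed_values}) eq 'ARRAY';
    return $line;
}

my @summary = map { summarize_input($_) } @$input_spec;
print "$_\n" for @summary;
```

Such a summary is handy for checking which names to pass to
C<create_job>, C<run> or C<wait_for> before submitting anything.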

=cut

sub input_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 result_spec

 Usage   : $tool->result_spec;
 Returns : a hash reference with result names as keys
           and result types as values
 Args    : none

The analysis results are named and can be retrieved using their names
by methods C<results> and C<result>.

Here is an example of the result specification (again for the service
I<edit.seqret>):

  $result_spec = {
          'outseq' => 'String',
          'report' => 'String',
          'detailed_status' => 'String'
        };

=cut

sub result_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 create_job

 Usage   : $tool->create_job ( {'sequence'=>'tatat'} )
 Returns : Bio::Tools::Run::Analysis::Job
 Args    : data and parameters for this execution
           (in various formats)

Create an object representing a single execution of this analysis
tool.

Call this method if you wish to "stage the scene" - to create a job
with all input data but without actually running it. This method is
called automatically from other methods (C<run> and C<wait_for>) so
usually you do not need to call it directly.

The input data and parameters for this execution can be specified in
various ways:

=over

=item array reference

The array has scalar elements of the form

   name = [[@]value]

where C<name> is the name of an input data or input parameter (see
method C<input_spec> for finding what names are recognized by this
analysis) and C<value> is a value for this data/parameter. If C<value>
is missing, a 1 is assumed (which is convenient for boolean
options). If C<value> starts with C<@> it is treated as a local
filename, and the contents of that file are used as the data/parameter
value.

=item hash reference

The same as with the array reference but now there is no need to use
an equal sign. The hash keys are input names and hash values their
data. The values can again start with a C<@> sign indicating a local
filename.

=item scalar

In this case, the parameter represents a job ID obtained in some
previous invocation - such a job already exists on the server side, and
we are just re-creating it here using the same job ID.

I<TBD: here we should allow the same by using a reference to the
Bio::Tools::Run::Analysis::Job object.>

=item undef

Finally, if the parameter is undefined, ask the server to create an
empty job. The input data may be added later using C<set_data...>
method(s) - see scripts/papplmaker.PLS for details.

=back

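The array-reference form can be illustrated with a small parser. This
is only a sketch of the rules listed above, not the code used by
C<Bio::Tools::Run::Analysis>; C<parse_inputs> is a hypothetical helper,
and treating an empty value (C<name=>) like a missing one is an
assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn [ 'name=value', 'flag', 'name=@file' ] into a
# hash reference, following the rules described above.
sub parse_inputs {
    my ($args) = @_;
    my %inputs;
    for my $arg (@$args) {
        # Split on the first equal sign, tolerating spaces around it.
        my ($name, $value) = split /\s*=\s*/, $arg, 2;

        # A bare name means "true" (assumption: so does an empty value).
        $value = 1 unless defined $value && length $value;

        # A leading '@' names a local file whose contents become the value.
        if ($value =~ /^\@(.+)$/) {
            my $file = $1;
            open my $fh, '<', $file or die "Cannot read $file: $!";
            local $/;            # slurp the whole file
            $value = <$fh>;
            close $fh;
        }
        $inputs{$name} = $value;
    }
    return \%inputs;
}

my $inputs = parse_inputs([ 'osformat=embl', 'firstonly' ]);
print "$_ => $inputs->{$_}\n" for sort keys %$inputs;
```

The hash-reference form needs no splitting at all; only the C<@>
handling would apply to its values.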

=cut

sub create_job { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 run

 Usage   : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing started job (an execution)
 Args    : the same as for create_job

Create a job and start it, but do not wait for its completion.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 wait_for

 Usage   : $tool->wait_for ( { 'sequence' => '@my.file' } )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing finished job
 Args    : the same as for create_job

Create a job, start it and wait for its completion.

Note that this is a blocking method. It returns only after the
executed job finishes, either normally or with an error.

Usually, after this call, you ask for the results of the finished job:

    $tool->wait_for (...)->results;

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
#
#   Bio::AnalysisI::JobI
#
# -----------------------------------------------------------------------------

package Bio::AnalysisI::JobI;

=head1 Module Bio::AnalysisI::JobI

An interface to the public methods provided by C<Bio::Tools::Run::Analysis::Job>
objects.

The C<Bio::Tools::Run::Analysis::Job> objects represent a created,
running, or finished execution of an analysis tool.

The factory for these objects is module C<Bio::Tools::Run::Analysis>
where the following methods return an
C<Bio::Tools::Run::Analysis::Job> object:

    create_job   (returning a prepared job)
    run          (returning a running job)
    wait_for     (returning a finished job)

=cut

use strict;
use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 id

 Usage   : $job->id;
 Returns : this job ID
 Args    : none

Each job (an execution) is identifiable by this unique ID, which can be
used later to re-create the same job (in other words: to re-connect to
the same job). It is useful when a job takes a long time to finish and
your client program does not want to wait for it within the same
session.

=cut

sub id { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 run

 Usage   : $job->run
 Returns : itself
 Args    : none

It starts a previously created job. The job must already have all its
input data filled in. This differs from the method of the same name of
the C<Bio::Tools::Run::Analysis> object, where the C<run> method also
creates a new job and so allows input data to be set.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 wait_for

 Usage   : $job->wait_for
 Returns : itself
 Args    : none

It waits until a previously started execution of this job finishes.

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 terminate

 Usage   : $job->terminate
 Returns : itself
 Args    : none

Stop the currently running job (represented by this object). This is a
definitive stop; there is no way to resume the job later.

=cut

sub terminate { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 last_event

 Usage   : $job->last_event
 Returns : an XML string
 Args    : none

It returns a short XML document showing what happened last with this
job. The document follows this DTD:

   <!-- place for extensions -->
   <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">

   <!ELEMENT analysis_event (message?, (%event_body_template;)?)>

   <!ATTLIST analysis_event
       timestamp  CDATA #IMPLIED>

   <!ELEMENT message (#PCDATA)>

   <!ELEMENT state_changed EMPTY>
   <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
   <!ATTLIST state_changed
       previous_state  (%analysis_state;) "created"
       new_state       (%analysis_state;) "created">

   <!ELEMENT heartbeat_progress EMPTY>

   <!ELEMENT percent_progress EMPTY>
   <!ATTLIST percent_progress
       percentage CDATA #REQUIRED>

   <!ELEMENT time_progress EMPTY>
   <!ATTLIST time_progress
       remaining CDATA #REQUIRED>

   <!ELEMENT step_progress EMPTY>
   <!ATTLIST step_progress
       total_steps      CDATA #IMPLIED
       steps_completed  CDATA #REQUIRED>

Here is an example of what is returned after a job was created and
started, but before it finishes (note that the example uses the
analysis 'showdb', which does not need any input data):

   use Bio::Tools::Run::Analysis;
   print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
                                  ->run
                                  ->last_event;

It prints:

   <?xml version = "1.0"?>
   <analysis_event>
     <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
     <state_changed previous_state="created" new_state="running"/>
   </analysis_event>

The same example, but now after the job finishes:

   use Bio::Tools::Run::Analysis;
   print Bio::Tools::Run::Analysis->new(-name => 'display.showdb')
                                  ->wait_for
                                  ->last_event;

   <?xml version = "1.0"?>
   <analysis_event>
     <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
     <state_changed previous_state="running" new_state="completed"/>
   </analysis_event>

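A client usually wants the state transition rather than the raw XML.
The sketch below pulls it out with regular expressions, which is only
adequate for documents as small and regular as the examples above; a
real XML parser is the safer choice. C<state_change> is a hypothetical
helper, not a method of this interface.

```perl
use strict;
use warnings;

# Sample event copied from the documentation above.
my $event = <<'XML';
<?xml version = "1.0"?>
<analysis_event>
  <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
  <state_changed previous_state="created" new_state="running"/>
</analysis_event>
XML

# Hypothetical helper: return (previous_state, new_state), or an empty
# list when the event carries no state_changed element.
sub state_change {
    my ($xml) = @_;
    my ($element) = $xml =~ /<state_changed\b([^>]*)>/ or return;
    my %attr = $element =~ /(\w+)\s*=\s*"([^"]*)"/g;
    # Both attributes default to "created" in the DTD.
    return ($attr{previous_state} // 'created',
            $attr{new_state}      // 'created');
}

my ($from, $to) = state_change($event);
print "$from -> $to\n" if defined $from;
```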

=cut

sub last_event { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 status

 Usage   : $job->status
 Returns : string describing the job status
 Args    : none

It returns one of the following strings (and perhaps others, if a
server implementation has extended the set of possible job states):

   CREATED
   RUNNING
   COMPLETED
   TERMINATED_BY_REQUEST
   TERMINATED_BY_ERROR

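When a client does not want to block in C<wait_for>, it can poll the
status instead. The following sketch assumes that the last three
strings above are final states and that any other string, including a
server-specific one, means the job may still change. C<poll_until_done>
and the C<FakeJob> class are hypothetical and exist only so that the
example runs without a server.

```perl
use strict;
use warnings;

# States after which a job no longer changes (names from the list above).
my %FINAL = map { $_ => 1 }
    qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

# Hypothetical helper: poll any object offering a status() method until
# it reports a final state or $max_polls is used up.
# Returns the last status seen.
sub poll_until_done {
    my ($job, $max_polls, $delay) = @_;
    my $status;
    for (1 .. $max_polls) {
        $status = $job->status;
        last if $FINAL{$status};
        sleep $delay if $delay;
    }
    return $status;
}

# Stand-in job that replays a fixed list of states; not a BioPerl class.
package FakeJob {
    sub new {
        my ($class, @states) = @_;
        return bless { states => [@states] }, $class;
    }
    sub status {
        my ($self) = @_;
        my $state = shift @{ $self->{states} };
        return defined $state ? $state : 'COMPLETED';
    }
}

my $job = FakeJob->new(qw(CREATED RUNNING RUNNING COMPLETED));
print poll_until_done($job, 10, 0), "\n";
```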

=cut

sub status { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 created

 Usage   : $job->created (1)
 Returns : time when this job was created
 Args    : optional

Without an argument it returns the creation time of this job in
seconds since the beginning of the UNIX epoch (1 January 1970). With a
true argument it returns a formatted time, using the rules described
in C<Bio::Tools::Run::Analysis::Utils::format_time>.

=cut

sub created { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 started

 Usage   : $job->started (1)
 Returns : time when this job was started
 Args    : optional

See C<created>.

=cut

sub started { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 ended

 Usage   : $job->ended (1)
 Returns : time when this job was terminated
 Args    : optional

See C<created>.

=cut

sub ended { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 elapsed

 Usage   : $job->elapsed
 Returns : elapsed time of the execution of the given job
           (in milliseconds), or 0 if the job was not yet started
 Args    : none

Note that some server implementations cannot count in milliseconds, so
the returned time may be rounded to seconds.

=cut

sub elapsed { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 times

 Usage   : $job->times ('formatted')
 Returns : a hash reference with all time characteristics
 Args    : optional

It is a convenience method returning a hash reference with the
following keys:

   created
   started
   ended
   elapsed

See C<created> for remarks on time formatting.

An example, first with formatted and then with unformatted times:

   use Data::Dumper;
   use Bio::Tools::Run::Analysis;
   my $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
             ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
             ->times (1);
   print Data::Dumper->Dump ( [$rh], ['Times']);
   $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
             ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
             ->times;
   print Data::Dumper->Dump ( [$rh], ['Times']);

   $Times = {
           'ended'   => 'Mon Mar  3 17:52:06 2003',
           'started' => 'Mon Mar  3 17:52:05 2003',
           'elapsed' => '1000',
           'created' => 'Mon Mar  3 17:52:05 2003'
         };
   $Times = {
           'ended'   => '1046713961',
           'started' => '1046713926',
           'elapsed' => '35000',
           'created' => '1046713926'
         };

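The unformatted values are plain epoch seconds (and milliseconds for
C<elapsed>), so they can be rendered with core Perl alone. This sketch
uses the numbers from the second dump above; C<readable> is a
hypothetical helper, and it uses C<gmtime> only to keep the output
independent of the local time zone, whereas the real formatting rules
are those of C<Bio::Tools::Run::Analysis::Utils::format_time>.

```perl
use strict;
use warnings;

# Unformatted values copied from the second dump above.
my $times = {
    created => '1046713926',
    started => '1046713926',
    ended   => '1046713961',
    elapsed => '35000',
};

# Hypothetical helper: render epoch seconds as a readable UTC string.
sub readable {
    my ($epoch) = @_;
    return scalar gmtime $epoch;
}

printf "%-8s %s\n", $_, readable($times->{$_}) for qw(created started ended);

# elapsed is reported in milliseconds; convert it to seconds for display.
printf "%-8s %.1f s\n", 'elapsed', $times->{elapsed} / 1000;
```

In this sample the difference between C<ended> and C<started> agrees
with C<elapsed>, which is a cheap sanity check on a returned hash.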

=cut

sub times { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 results

 Usage   : $job->results (...)
 Returns : one or more results created by this job
 Args    : various, see below

This is a complex method that tries to handle all kinds of results
sensibly. In particular, it helps to store binary results (such as
images) in local files. In general, it deals with the following facts:

=over

=item *

Each analysis tool may produce more than one result.

=item *

Some results may contain binary data not suitable for printing into a
terminal window.

=item *

Some results may be split into a variable number of parts (this is
mainly true for image results, which can consist of several *.png
files).

=back

Note also that results are named so that they can be told apart when
there are several of them. The names can be obtained by method
C<result_spec>.

Here are the rules the method follows:

    Retrieving NAMED results:
    -------------------------
     results ('name1', ...)   => return results as they are, no storing into files

     results ( { 'name1' => 'filename', ... } )  => store into 'filename', return 'filename'
     results ( 'name1=filename', ...)            => ditto

     results ( { 'name1' => '-', ... } )         => send result to the STDOUT, do not return anything
     results ( 'name1=-', ...)                   => ditto

     results ( { 'name1' => '@', ... } )  => store into file whose name is invented by
                                             this method, perhaps using RESULT_NAME_TEMPLATE env
     results ( 'name1=@', ...)            => ditto

     results ( { 'name1' => '?', ... } )  => find of what type is this result and then use
                                             {'name1'=>'@'} for binary files, and a regular
                                             return for non-binary files
     results ( 'name1=?', ...)            => ditto

    Retrieving ALL results:
    -----------------------
     results()     => return all results as they are, no storing into files

     results ('@') => return all results, as if each of them given
                      as {'name' => '@'} (see above)

     results ('?') => return all results, as if each of them given
                      as {'name' => '?'} (see above)

    Misc:
    -----
     * any result can be returned as a scalar value, or as an array reference
       (the latter is used for results consisting of more parts, such as images);
       this applies regardless of whether the returned result is the result itself
       or a filename created for the result

     * look in the documentation of the C<panalysis[.PLS]> script for examples
       (especially how to use various templates for inventing file names)

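Because a result may arrive either as a scalar or as an array
reference, client code often normalizes it first. A minimal sketch,
with C<result_parts> as a hypothetical helper and made-up file names in
place of a real call to C<results>:

```perl
use strict;
use warnings;

# Hypothetical helper: a result may be a plain scalar or an array
# reference (multi-part results such as images); always hand back a
# flat list of parts.
sub result_parts {
    my ($result) = @_;
    return () unless defined $result;
    return ref($result) eq 'ARRAY' ? @$result : ($result);
}

# Stand-in values; with a live job they would come from $job->result(...).
my @single = result_parts('report.txt');
my @multi  = result_parts([ 'plot_1.png', 'plot_2.png' ]);

print scalar(@single), " part, ", scalar(@multi), " parts\n";
```

The same normalization works whether the values are the results
themselves or the names of files created for them.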

=cut

sub results { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 result

 Usage   : $job->result (...)
 Returns : the first result
 Args    : see 'results'

=cut

sub result { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 remove

 Usage   : $job->remove
 Returns : 1
 Args    : none

The job object is not actually removed at this time. Instead it is
marked (by setting the C<_destroy_on_exit> attribute to 1) as ready for
deletion when the client program ends. That deletion includes a request
to the server to forget the job's mirror object on the server side.

=cut

sub remove { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

1;
__END__