# $Id$
#
# BioPerl module for Bio::AnalysisI
#
# Please direct questions and support issues to <bioperl-l@bioperl.org>
#
# Cared for by Martin Senger <martin.senger@gmail.com>
#
# For copyright and disclaimer see below.
#

# POD documentation - main docs before the code

=head1 NAME

Bio::AnalysisI - An interface to any (local or remote) analysis tool

=head1 SYNOPSIS

This is an interface module - you do not instantiate it.
Use the C<Bio::Tools::Run::Analysis> module instead:

  use Bio::Tools::Run::Analysis;
  my $tool = Bio::Tools::Run::Analysis->new(@args);

=head1 DESCRIPTION

This interface contains all public methods for accessing and
controlling local and remote analysis tools. It is meant to be used on
the client side.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.
=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/

=head1 AUTHOR

Martin Senger (martin.senger@gmail.com)

=head1 COPYRIGHT

Copyright (c) 2003, Martin Senger and EMBL-EBI.
All Rights Reserved.

This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=head1 SEE ALSO

  http://www.ebi.ac.uk/Tools/webservices/soaplab/guide

=head1 APPENDIX

This is actually the main documentation...

If you try to call any of these methods directly on this
C<Bio::AnalysisI> object you will get a I<not implemented> error
message. You need to call them on a C<Bio::Tools::Run::Analysis> object instead.

=cut
# Let the code begin...

package Bio::AnalysisI;

use strict;
use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 analysis_name

 Usage   : $tool->analysis_name;
 Returns : a name of this analysis
 Args    : none

=cut

sub analysis_name { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 analysis_spec

 Usage   : $tool->analysis_spec;
 Returns : a hash reference describing this analysis
 Args    : none

The returned hash reference uses the following keys (not all of them always
present, perhaps others present as well): C<name>, C<type>, C<version>,
C<supplier>, C<installation>, C<description>.

Here is an example output:

  Analysis 'edit.seqret':
        installation => EMBL-EBI
        description => Reads and writes (returns) sequences
        supplier => EMBOSS
        version => 2.6.0
        type => edit
        name => seqret

=cut

sub analysis_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 describe

 Usage   : $tool->describe;
 Returns : an XML detailed description of this analysis
 Args    : none

The returned XML string contains metadata describing this analysis
service. It includes the same metadata that are returned (and more easily
used) by the methods C<analysis_spec>, C<input_spec> and C<result_spec>.
The DTD used for the returned metadata is based on the adopted standard
(the BSA specification for the analysis engine):

  <!ELEMENT DsLSRAnalysis (analysis)+>

  <!ELEMENT analysis (description?, input*, output*, extension?)>

  <!ATTLIST analysis
      type          CDATA #REQUIRED
      name          CDATA #IMPLIED
      version       CDATA #IMPLIED
      supplier      CDATA #IMPLIED
      installation  CDATA #IMPLIED>

  <!ELEMENT description ANY>
  <!ELEMENT extension ANY>

  <!ELEMENT input (default?, allowed*, extension?)>

  <!ATTLIST input
      type       CDATA #REQUIRED
      name       CDATA #REQUIRED
      mandatory  (true|false) "false">

  <!ELEMENT default (#PCDATA)>
  <!ELEMENT allowed (#PCDATA)>

  <!ELEMENT output (extension?)>

  <!ATTLIST output
      type  CDATA #REQUIRED
      name  CDATA #REQUIRED>

But the DTD may be extended by provider-specific metadata. For
example, the EBI experimental SOAP-based service on top of EMBOSS uses
a DTD explained at C<http://www.ebi.ac.uk/~senger/applab>.

=cut

sub describe { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 input_spec

 Usage   : $tool->input_spec;
 Returns : an array reference with hashes as elements
 Args    : none

The analysis input data are named, and can also be associated with a
default value, with allowed values and with a few other attributes. The
names are important for feeding the service with the input data (the
inputs are given to the methods C<create_job>, C<Bio::AnalysisI|run>, and/or
C<Bio::AnalysisI|wait_for> as name/value pairs).
204 Here is a (slightly shortened) example of an input specification:
206 $input_spec = [
208 'mandatory' => 'false',
209 'type' => 'String',
210 'name' => 'sequence_usa'
213 'mandatory' => 'false',
214 'type' => 'String',
215 'name' => 'sequence_direct_data'
218 'mandatory' => 'false',
219 'allowed_values' => [
220 'gcg',
221 'gcg8',
223 'raw'
225 'type' => 'String',
226 'name' => 'sformat'
229 'mandatory' => 'false',
230 'type' => 'String',
231 'name' => 'sbegin'
234 'mandatory' => 'false',
235 'type' => 'String',
236 'name' => 'send'
239 'mandatory' => 'false',
240 'type' => 'String',
241 'name' => 'sprotein'
244 'mandatory' => 'false',
245 'type' => 'String',
246 'name' => 'snucleotide'
249 'mandatory' => 'false',
250 'type' => 'String',
251 'name' => 'sreverse'
254 'mandatory' => 'false',
255 'type' => 'String',
256 'name' => 'slower'
259 'mandatory' => 'false',
260 'type' => 'String',
261 'name' => 'supper'
264 'mandatory' => 'false',
265 'default' => 'false',
266 'type' => 'String',
267 'name' => 'firstonly'
270 'mandatory' => 'false',
271 'default' => 'fasta',
272 'allowed_values' => [
273 'gcg',
274 'gcg8',
275 'embl',
277 'raw'
279 'type' => 'String',
280 'name' => 'osformat'
284 =cut
286 sub input_spec { shift->throw_not_implemented(); }
288 # -----------------------------------------------------------------------------
=head2 result_spec

 Usage   : $tool->result_spec;
 Returns : a hash reference with result names as keys
           and result types as values
 Args    : none

The analysis results are named and can be retrieved using their names
by the methods C<results> and C<result>.

Here is an example of the result specification (again for the service
I<edit.seqret>):

  $result_spec = {
          'outseq' => 'String',
          'report' => 'String',
          'detailed_status' => 'String'
        };

=cut

sub result_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 create_job

 Usage   : $tool->create_job ( {'sequence'=>'tatat'} )
 Returns : Bio::Tools::Run::Analysis::Job
 Args    : data and parameters for this execution
           (in various formats)

Create an object representing a single execution of this analysis
tool.

Call this method if you wish to "stage the scene" - to create a job
with all input data but without actually running it. This method is
called automatically from other methods (C<Bio::AnalysisI|run> and
C<Bio::AnalysisI|wait_for>) so usually you do not need to call it directly.

The input data and parameters for this execution can be specified in
various ways:

=over

=item array reference

The array has scalar elements of the form

   name = [[@]value]

where C<name> is the name of an input data or input parameter (see
method C<input_spec> for finding what names are recognized by this
analysis) and C<value> is a value for this data/parameter. If C<value>
is missing, a value of 1 is assumed (which is convenient for boolean
options). If C<value> starts with C<@> it is treated as a local
filename, and its contents are used as the data/parameter value.

=item hash reference

The same as with the array reference, but now there is no need to use
an equal sign. The hash keys are input names and the hash values their
data. The values can again start with a C<@> sign indicating a local
filename.

=item scalar

In this case, the parameter represents a job ID obtained in some
previous invocation - such a job already exists on the server side, and
we are just re-creating it here using the same job ID.

I<TBD: here we should allow the same by using a reference to the
Bio::Tools::Run::Analysis::Job object.>

=item undef

Finally, if the parameter is undefined, the server is asked to create
an empty job. The input data may be added later using the C<set_data...>
method(s) - see scripts/papplmaker.PLS for details.

=back
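For illustration, here is a sketch of the first three forms, using input
names from the C<edit.seqret> example shown in C<input_spec> (the file name
C<my.seq> is hypothetical):

  use Bio::Tools::Run::Analysis;
  my $tool = Bio::Tools::Run::Analysis->new (-name => 'edit.seqret');

  # array reference: 'name=value' pairs; '@' marks a local file name
  my $job1 = $tool->create_job ( ['sequence_direct_data=@my.seq', 'osformat=embl'] );

  # hash reference: the same inputs given as name => value pairs
  my $job2 = $tool->create_job ( { 'sequence_direct_data' => '@my.seq',
                                   'osformat'             => 'embl' } );

  # scalar: re-create a job from a previously obtained job ID
  my $job3 = $tool->create_job ($job1->id);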
=cut

sub create_job { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 run

 Usage   : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing started job (an execution)
 Args    : the same as for create_job

Create a job and start it, but do not wait for its completion.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 wait_for

 Usage   : $tool->wait_for ( { 'sequence' => '@my.file' } )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing finished job
 Args    : the same as for create_job

Create a job, start it and wait for its completion.

Note that this is a blocking method. It returns only after the
executed job finishes, either normally or by an error.

Usually, after this call, you ask for results of the finished job:

  $analysis->wait_for (...)->results;

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
# Bio::AnalysisI::JobI

# -----------------------------------------------------------------------------

package Bio::AnalysisI::JobI;

=head1 Module Bio::AnalysisI::JobI

An interface to the public methods provided by C<Bio::Tools::Run::Analysis::Job>
objects.

The C<Bio::Tools::Run::Analysis::Job> objects represent a created,
running, or finished execution of an analysis tool.

The factory for these objects is the module C<Bio::Tools::Run::Analysis>,
where the following methods return a
C<Bio::Tools::Run::Analysis::Job> object:

  create_job   (returning a prepared job)
  run          (returning a running job)
  wait_for     (returning a finished job)

=cut

use strict;
use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------
=head2 id

 Usage   : $job->id;
 Returns : this job's ID
 Args    : none

Each job (an execution) is identifiable by this unique ID, which can be
used later to re-create the same job (in other words: to re-connect to
the same job). It is useful in cases when a job takes a long time to
finish and your client program does not want to wait for it within the
same session.
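A sketch of this pattern (assuming C<$tool> is a C<Bio::Tools::Run::Analysis>
object as in the SYNOPSIS; the file name C<job.id> is arbitrary):

  # first session: start the job and save its ID
  my $job = $tool->run ( { 'sequence_usa' => 'embl:hsu52852' } );
  open my $fh, '>', 'job.id' or die $!;
  print $fh $job->id;
  close $fh;

  # a later session: re-connect to the same job using the saved ID
  open $fh, '<', 'job.id' or die $!;
  my $id = <$fh>;
  close $fh;
  my $same_job = $tool->create_job ($id);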
=cut

sub id { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 Bio::AnalysisI::JobI::run

 Usage   : $job->run
 Returns : itself
 Args    : none

It starts a previously created job. The job must already have all its
input data filled in. This differs from the method of the same name in
the C<Bio::Tools::Run::Analysis> object, where the C<run> method also
creates a new job, allowing the input data to be set.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 Bio::AnalysisI::JobI::wait_for

 Usage   : $job->wait_for
 Returns : itself
 Args    : none

It waits until a previously started execution of this job finishes.

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 terminate

 Usage   : $job->terminate
 Returns : itself
 Args    : none

Stop the currently running job (represented by this object). This is a
definitive stop; there is no way to resume the job later.

=cut

sub terminate { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 last_event

 Usage   : $job->last_event
 Returns : an XML string
 Args    : none

It returns a short XML document showing what happened last with this
job. This is the DTD used:

  <!-- place for extensions -->
  <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">

  <!ELEMENT analysis_event (message?, (%event_body_template;)?)>

  <!ATTLIST analysis_event
      timestamp  CDATA #IMPLIED>

  <!ELEMENT message (#PCDATA)>

  <!ELEMENT state_changed EMPTY>
  <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
  <!ATTLIST state_changed
      previous_state  (%analysis_state;) "created"
      new_state       (%analysis_state;) "created">

  <!ELEMENT heartbeat_progress EMPTY>

  <!ELEMENT percent_progress EMPTY>
  <!ATTLIST percent_progress
      percentage  CDATA #REQUIRED>

  <!ELEMENT time_progress EMPTY>
  <!ATTLIST time_progress
      remaining  CDATA #REQUIRED>

  <!ELEMENT step_progress EMPTY>
  <!ATTLIST step_progress
      total_steps      CDATA #IMPLIED
      steps_completed  CDATA #REQUIRED>
Here is an example of what is returned after a job was created and
started, but before it finishes (note that the example uses an
analysis 'showdb' which does not need any input data):

  use Bio::Tools::Run::Analysis;
  print Bio::Tools::Run::Analysis->new (-name => 'display.showdb')
        ->run
        ->last_event;

It prints:

  <?xml version = "1.0"?>
  <analysis_event>
    <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
    <state_changed previous_state="created" new_state="running"/>
  </analysis_event>

The same example, but now after it finishes:

  use Bio::Tools::Run::Analysis;
  print Bio::Tools::Run::Analysis->new (-name => 'display.showdb')
        ->wait_for
        ->last_event;

  <?xml version = "1.0"?>
  <analysis_event>
    <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
    <state_changed previous_state="running" new_state="completed"/>
  </analysis_event>

=cut

sub last_event { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 status

 Usage   : $job->status
 Returns : string describing the job status
 Args    : none

It returns one of the following strings (and perhaps more if a server
implementation extended the possible job states):

  CREATED
  RUNNING
  COMPLETED
  TERMINATED_BY_REQUEST
  TERMINATED_BY_ERROR
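A client that does not want to block in C<wait_for> might, for example, poll
this status instead (a sketch, assuming C<$tool> is a
C<Bio::Tools::Run::Analysis> object as in the SYNOPSIS; the polling interval
is arbitrary):

  my $job = $tool->run ( { 'sequence_usa' => 'embl:hsu52852' } );
  while ($job->status eq 'CREATED' or $job->status eq 'RUNNING') {
      sleep 10;   # poll every 10 seconds
  }
  print 'Finished with status: ', $job->status, "\n";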
=cut

sub status { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 created

 Usage   : $job->created (1)
 Returns : time when this job was created
 Args    : optional

Without any argument it returns the time of creation of this job in
seconds, counting from the beginning of the UNIX epoch
(1 January 1970). With a true argument it returns a formatted time, using
the rules described in C<Bio::Tools::Run::Analysis::Utils::format_time>.

=cut

sub created { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 started

 Usage   : $job->started (1)
 Returns : time when this job was started
 Args    : optional

See C<created>.

=cut

sub started { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 ended

 Usage   : $job->ended (1)
 Returns : time when this job was terminated
 Args    : optional

See C<created>.

=cut

sub ended { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 elapsed

 Usage   : $job->elapsed
 Returns : elapsed time of the execution of the given job
           (in milliseconds), or 0 if the job has not yet started
 Args    : none

Note that some server implementations cannot count in milliseconds - so
the returned time may be rounded to seconds.

=cut

sub elapsed { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 times

 Usage   : $job->times ('formatted')
 Returns : a hash reference with all time characteristics
 Args    : optional

It is a convenient method returning a hash reference with the following
keys:

  created
  started
  ended
  elapsed

See C<created> for remarks on time formatting.

An example - both for formatted and unformatted times:

  use Data::Dumper;
  use Bio::Tools::Run::Analysis;
  my $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
           ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
           ->times (1);
  print Data::Dumper->Dump ( [$rh], ['Times']);
  $rh = Bio::Tools::Run::Analysis->new(-name => 'nucleic_cpg_islands.cpgplot')
        ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
        ->times;
  print Data::Dumper->Dump ( [$rh], ['Times']);

  $Times = {
        'ended' => 'Mon Mar 3 17:52:06 2003',
        'started' => 'Mon Mar 3 17:52:05 2003',
        'elapsed' => '1000',
        'created' => 'Mon Mar 3 17:52:05 2003'
      };
  $Times = {
        'ended' => '1046713961',
        'started' => '1046713926',
        'elapsed' => '35000',
        'created' => '1046713926'
      };

=cut

sub times { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 results

 Usage   : $job->results (...)
 Returns : one or more results created by this job
 Args    : various, see below

This is a complex method trying to make sense of all kinds of
results. In particular, it tries to help with putting binary results
(such as images) into local files. Generally it deals with the
following facts:

=over

=item *

Each analysis tool may produce more than one result.

=item *

Some results may contain binary data not suitable for printing into a
terminal window.

=item *

Some results may be split into a variable number of parts (this is
mainly true for the image results that can consist of more *.png
files).

=back

Note also that results have names to distinguish them if there are more of
them. The names can be obtained by the method C<result_spec>.

Here are the rules for how the method works:

  Retrieving NAMED results:
  -------------------------
  results ('name1', ...)  => return results as they are, no storing into files

  results ( { 'name1' => 'filename', ... } )  => store into 'filename', return 'filename'
  results ( 'name1=filename', ...)            => ditto

  results ( { 'name1' => '-', ... } )  => send result to the STDOUT, do not return anything
  results ( 'name1=-', ...)            => ditto

  results ( { 'name1' => '@', ... } )  => store into file whose name is invented by
                                          this method, perhaps using RESULT_NAME_TEMPLATE env
  results ( 'name1=@', ...)            => ditto

  results ( { 'name1' => '?', ... } )  => find of what type is this result and then use
                                          {'name1' => '@'} for binary files, and a regular
                                          return for non-binary files
  results ( 'name=?', ...)             => ditto

  Retrieving ALL results:
  -----------------------
  results()     => return all results as they are, no storing into files

  results ('@') => return all results, as if each of them were given
                   as {'name' => '@'} (see above)

  results ('?') => return all results, as if each of them were given
                   as {'name' => '?'} (see above)

  Misc:
  -----
  * any result can be returned as a scalar value, or as an array reference
    (the latter is used for results consisting of more parts, such as images);
    this applies regardless of whether the returned result is the result itself
    or a filename created for the result

  * look in the documentation of the C<panalysis[.PLS]> script for examples
    (especially how to use various templates for inventing file names)
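Putting some of these rules together - a sketch using the result names from
the C<result_spec> example above (the file name C<outseq.embl> is arbitrary):

  # return the 'outseq' result as it is (a scalar, or an array reference)
  my $seq = $job->results ('outseq');

  # store 'outseq' into a named file, send 'report' to the STDOUT
  $job->results ( { 'outseq' => 'outseq.embl', 'report' => '-' } );

  # store all results into files with invented names
  $job->results ('@');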
=cut

sub results { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 result

 Usage   : $job->result (...)
 Returns : the first result
 Args    : see 'results'

=cut

sub result { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
=head2 remove

 Usage   : $job->remove
 Returns : 1
 Args    : none

The job object is not actually removed at this time; rather, it is marked
(by setting the C<_destroy_on_exit> attribute to 1) as ready for deletion
when the client program ends (which includes a request to the server to
forget the job's mirror object on the server side).

=cut

sub remove { shift->throw_not_implemented(); }
# -----------------------------------------------------------------------------

1;

__END__