src/gromacs/analysisdata.h

/*
 * This file is part of the GROMACS molecular simulation package.
 *
 * Copyright (c) 2010,2012,2013,2014,2015, by the GROMACS development team, led by
 * Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
 * and including many others, as listed in the AUTHORS file in the
 * top-level source directory and at http://www.gromacs.org.
 *
 * GROMACS is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 2.1
 * of the License, or (at your option) any later version.
 *
 * GROMACS is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with GROMACS; if not, see
 * http://www.gnu.org/licenses, or write to the Free Software Foundation,
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
 *
 * If you want to redistribute modifications to GROMACS, please
 * consider that scientific software is very special. Version
 * control is crucial - bugs must be traceable. We will be happy to
 * consider code for inclusion in the official distribution, but
 * derived work must not be called official GROMACS. Details are found
 * in the README & COPYING files - if they are missing, get the
 * official version at http://www.gromacs.org.
 *
 * To help us fund GROMACS development, we humbly ask that you cite
 * the research papers on the package. Check out http://www.gromacs.org.
 */
/*! \defgroup module_analysisdata Parallelizable Handling of Output Data (analysisdata)
 * \ingroup group_analysismodules
 * \brief
 * Provides functionality for handling and processing output data from
 * analysis.
 *
 * <H3>Overview</H3>
 *
 * This module provides functionality to do common processing for tabular data
 * in analysis tools.  In addition to providing this common functionality, one
 * major driver for this module is to make it simple to write analysis tools
 * that process frames in parallel: the functionality in this module takes care
 * of the necessary synchronization and communication so that output from the
 * frames is collected and written in the correct order.
 * See \ref page_analysisdata for an overview of the high-level functionality
 * and the terminology used.
 *
 * This module consists of two main parts.  The first is formed by the
 * gmx::AbstractAnalysisData class and the classes that derive from it:
 * gmx::AnalysisData and gmx::AnalysisArrayData.  These classes are used to
 * process and store raw data as produced by the analysis tool.  They also
 * provide an interface to attach data modules that implement
 * gmx::IAnalysisDataModule.
 *
 * Modules that implement gmx::IAnalysisDataModule form the second part
 * of the module, and they provide functionality to process the data.
 * These modules can also derive from gmx::AbstractAnalysisData, allowing other
 * modules to be attached to them to form a processing chain that best suits
 * the analysis tool.  Typically, such a processing chain ends in a plotting
 * module that writes the data into a file, but the final module can also
 * provide direct access to the processed data, allowing the analysis tool to
 * do custom postprocessing outside the module framework.
 *
 * <H3>Using Data Objects and Modules</H3>
 *
 * To use the functionality in this module, you typically declare one or more
 * gmx::AnalysisData objects and set their properties.  You then create some
 * module objects, set their properties (see the list of classes that implement
 * gmx::IAnalysisDataModule), and attach them to the data objects or to
 * one another using gmx::AbstractAnalysisData::addModule().  Then you add the
 * actual data values to the gmx::AnalysisData object, which automatically
 * passes them on to the modules.
 * After all data has been added, you may optionally access some results
 * directly from the module objects or from the gmx::AnalysisData object
 * itself.  However, in many cases it is sufficient to add a plotting module
 * to the processing chain when setting it up; the plotting module then writes
 * the results into a file automatically.
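 *
 * The data flow described above can be pictured with the following
 * standalone sketch.  It is not GROMACS code: `DataSource`, `Module`,
 * `RawData`, `ScaleModule`, and `CollectModule` are hypothetical names chosen
 * for illustration, with much simpler signatures than
 * gmx::AbstractAnalysisData and gmx::IAnalysisDataModule.  It shows a raw data
 * object passing each frame to an attached module, which is itself a data
 * source for the next module in the chain.
 *
```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical module interface (not gmx::IAnalysisDataModule).
class Module
{
public:
    virtual ~Module() = default;
    // Receives all values of one finished frame.
    virtual void frameFinished(int index, const std::vector<double>& values) = 0;
};

// Hypothetical data source: forwards every frame to all attached modules.
class DataSource
{
public:
    void addModule(std::shared_ptr<Module> module) { modules_.push_back(std::move(module)); }

protected:
    void notifyModules(int index, const std::vector<double>& values)
    {
        for (const auto& module : modules_)
        {
            module->frameFinished(index, values);
        }
    }

private:
    std::vector<std::shared_ptr<Module>> modules_;
};

// Raw data produced by the analysis tool (the role of the input data object).
class RawData : public DataSource
{
public:
    void addFrame(int index, const std::vector<double>& values) { notifyModules(index, values); }
};

// A processing module that is also a data source, so it can be chained.
class ScaleModule : public Module, public DataSource
{
public:
    explicit ScaleModule(double factor) : factor_(factor) {}

    void frameFinished(int index, const std::vector<double>& values) override
    {
        std::vector<double> scaled(values);
        for (double& value : scaled)
        {
            value *= factor_;
        }
        notifyModules(index, scaled);
    }

private:
    double factor_;
};

// End of the chain: stores the processed frames for direct access by the tool.
// It expects frames in increasing index order (serial delivery).
class CollectModule : public Module
{
public:
    void frameFinished(int index, const std::vector<double>& values) override
    {
        assert(frames.empty() || index > frames.back().first);
        frames.emplace_back(index, values);
    }

    std::vector<std::pair<int, std::vector<double>>> frames;
};
```
 *
 * For example, attaching a `ScaleModule(2.0)` to a `RawData` object and a
 * `CollectModule` to the `ScaleModule`, then adding the frame {1.0, 2.5},
 * leaves {2.0, 5.0} in `CollectModule::frames`.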
 *
 * For simple processing needs with a small amount of data, the
 * gmx::AnalysisArrayData class is also provided; it keeps all the data in an
 * in-memory array and allows you to manipulate the data as you wish before you
 * pass the data to the attached modules.
 *
 * <H3>Data Modules</H3>
 *
 * Modules that implement gmx::IAnalysisDataModule can operate in two
 * modes:
 *  - In _serial_ mode, the frames are always presented to the module in
 *    order of increasing index, even if they become ready in a different
 *    order in the attached data.
 *  - In _parallel_ mode, the frames are presented in the order that they
 *    become available in the input data, which may not be sequential.
 *    This mode allows the input data to optimize its behavior if it does not
 *    need to store and sort the frames.
 *
 * The figure below shows the sequence of callbacks that the module receives.
 * Arrows show a dependency between callbacks: the event at the start of the
 * arrow always occurs before the event at the end.  The events in the box are
 * repeated for each frame.  Dashed lines within this box show dependencies
 * between different frames:
 *  - In serial mode, all the events are called in a deterministic order, with
 *    each frame completely processed before the next starts.
 *  - In parallel mode, multiple frames can be in progress simultaneously, and
 *    the events for different frames can even occur concurrently on different
 *    threads.  However, frameFinishedSerial() events will always occur in
 *    deterministic, sequential order for the frames.  Also, the number of
 *    concurrent frames is limited by the parallelization factor M passed to
 *    parallelDataStarted(): only the M frames following the last frame for
 *    which frameFinishedSerial() has been called can be in progress.
 *
 * \dot
 *     digraph datamodule_events {
 *         rankdir = LR
 *         node [ shape=box ]
 *
 *         start  [ label="dataStarted()",
 *                  URL="\ref gmx::IAnalysisDataModule::dataStarted()" ]
 *         pstart [ label="parallelDataStarted()",
 *                  URL="\ref gmx::IAnalysisDataModule::parallelDataStarted()" ]
 *         subgraph cluster_frame {
 *             label = "for each frame"
 *             framestart   [ label="frameStarted()",
 *                            URL="\ref gmx::IAnalysisDataModule::frameStarted()" ]
 *             pointsadd    [ label="pointsAdded()",
 *                            URL="\ref gmx::IAnalysisDataModule::pointsAdded()" ]
 *             framefinish  [ label="frameFinished()",
 *                            URL="\ref gmx::IAnalysisDataModule::frameFinished()" ]
 *             serialfinish [ label="frameFinishedSerial()",
 *                            URL="\ref gmx::IAnalysisDataModule::frameFinishedSerial()" ]
 *         }
 *         finish [ label="dataFinished()",
 *                  URL="\ref gmx::IAnalysisDataModule::dataFinished()" ]
 *
 *         start -> framestart
 *         pstart -> framestart
 *         framestart -> pointsadd
 *         pointsadd -> pointsadd [ label="0..*", dir=back ]
 *         pointsadd -> framefinish
 *         framefinish -> serialfinish
 *         serialfinish -> finish
 *
 *         framestart:se -> serialfinish:sw [ dir=back, style=dashed, weight=0,
 *                                            label="serial: frame n+1\nparallel: frame n+M" ]
 *         serialfinish -> serialfinish [ dir=back, style=dashed,
 *                                        label="frame n+1" ]
 *     }
 * \enddot
 *
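 * The ordering guarantees described above can be modelled with a small
 * bookkeeping class.  The sketch below is not the GROMACS implementation and
 * tracks only frame indices, not data; the class name `SerialOrderWindow` is
 * hypothetical, and its `window` parameter plays the role of the
 * parallelization factor M.
 *
```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical sketch (not GROMACS code) of the frame-ordering rules.
class SerialOrderWindow
{
public:
    explicit SerialOrderWindow(int window) : window_(window) { assert(window >= 1); }

    // A frame may start only if frameFinishedSerial() has already been called
    // for frame (index - M); every frame below nextSerial_ is serially finished.
    bool canStart(int index) const
    {
        return index >= nextSerial_ && index < nextSerial_ + window_;
    }

    // Called when a frame finishes, possibly out of order.  Returns the frames
    // whose serial step (frameFinishedSerial()) can now run, in increasing order.
    std::vector<int> frameFinished(int index)
    {
        assert(canStart(index) && finished_.count(index) == 0);
        finished_.insert(index);
        std::vector<int> serial;
        // Release the contiguous run of finished frames starting at nextSerial_.
        while (finished_.erase(nextSerial_) > 0)
        {
            serial.push_back(nextSerial_);
            ++nextSerial_;
        }
        return serial;
    }

    int nextSerial() const { return nextSerial_; }

private:
    int           window_;
    int           nextSerial_ = 0; // first frame not yet finished serially
    std::set<int> finished_;       // finished frames waiting for their serial turn
};
```
 *
 * With a window of 3, finishing frames 2, 1, and 0 in that order releases
 * nothing, nothing, and then frames 0, 1, 2 together; frame 5 may then start,
 * but frame 6 may not.  With a window of 1, the class reduces to serial mode:
 * frame n+1 cannot start before frame n has been finished serially.
 *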
 * If the input data supports parallel mode, it calls parallelDataStarted().
 * If the module returns `true` from this method, it receives the frames in
 * parallel mode.  If the module returns `false`, it receives the frames in
 * serial order.
 * If the input data does not support parallel mode, it calls dataStarted(),
 * and the module always receives the frames in order.
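 *
 * The negotiation and the callback order can be illustrated with the sketch
 * below.  `ToyModule`, `RecordingModule`, and `drive()` are hypothetical: the
 * methods only borrow the names of the gmx::IAnalysisDataModule callbacks and
 * take much simpler arguments (for example, one value per pointsAdded() call).
 * The driver processes frames one at a time, which satisfies both the serial
 * and the parallel ordering rules, and reports whether the module accepted
 * parallel mode.
 *
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified callback interface; not gmx::IAnalysisDataModule.
struct ToyModule
{
    virtual ~ToyModule() = default;
    virtual void dataStarted() {}
    // Returning true accepts parallel mode; the default asks for serial order.
    virtual bool parallelDataStarted(int) { return false; }
    virtual void frameStarted(int) {}
    virtual void pointsAdded(int, double) {}
    virtual void frameFinished(int) {}
    virtual void frameFinishedSerial(int) {}
    virtual void dataFinished() {}
};

// Logs every callback it receives so that the sequence can be inspected.
struct RecordingModule : ToyModule
{
    explicit RecordingModule(bool acceptParallel) : accept(acceptParallel) {}

    bool parallelDataStarted(int factor) override
    {
        log.push_back("parallelDataStarted(" + std::to_string(factor) + ")");
        return accept;
    }
    void dataStarted() override { log.push_back("dataStarted"); }
    void frameStarted(int i) override { log.push_back("frameStarted " + std::to_string(i)); }
    void pointsAdded(int i, double) override { log.push_back("pointsAdded " + std::to_string(i)); }
    void frameFinished(int i) override { log.push_back("frameFinished " + std::to_string(i)); }
    void frameFinishedSerial(int i) override
    {
        log.push_back("frameFinishedSerial " + std::to_string(i));
    }
    void dataFinished() override { log.push_back("dataFinished"); }

    bool                     accept;
    std::vector<std::string> log;
};

// Drives one module through the callback sequence of the figure above and
// returns true if the module was put in parallel mode.
bool drive(ToyModule& module, bool sourceSupportsParallel, int factor,
           const std::vector<std::vector<double>>& frames)
{
    assert(factor >= 1);
    bool parallel = false;
    if (sourceSupportsParallel)
    {
        parallel = module.parallelDataStarted(factor);
    }
    else
    {
        module.dataStarted();
    }
    for (int i = 0; i < static_cast<int>(frames.size()); ++i)
    {
        module.frameStarted(i);
        for (double value : frames[i])
        {
            module.pointsAdded(i, value);
        }
        module.frameFinished(i);
        module.frameFinishedSerial(i);
    }
    module.dataFinished();
    return parallel;
}
```
 *
 * For a source that supports parallel mode and two frames with two and zero
 * values, a module that returns `false` logs `parallelDataStarted(4)` (for a
 * factor of 4), five events for the first frame, three for the second, and
 * finally `dataFinished`.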
 *
 * The timing of the module callbacks relative to the addition of data to the
 * data object depends on the type of the module and the type of the data.
 * However, modules generally do not need to know the details of how this
 * happens, as long as they handle the callback sequence shown in the figure
 * above.
 *
 * For parallel processing, the gmx::AnalysisData object itself only provides
 * the infrastructure to support all of the above, including the reordering of
 * the frames for serial processing.  However, the caller is still responsible
 * for the actual thread synchronization, and must call
 * gmx::AnalysisData::finishFrameSerial() for each frame from a suitable
 * context where the serial processing for that frame can be done.  When using
 * the data objects as part of the trajectory analysis framework
 * (\ref page_analysisframework), these calls are handled by the framework.
 *
 * \if libapi
 * <H3>Writing New Data and Module Objects</H3>
 *
 * New data modules can be implemented to perform custom operations that are
 * not supported by the modules provided in this module.  This is done by
 * creating a new class that implements gmx::IAnalysisDataModule.
 * If the new module computes values that can be used as input for other
 * modules, the new class should also derive from gmx::AbstractAnalysisData, and
 * preferably use gmx::AnalysisDataStorage internally to implement storage of
 * values.  See the documentation of the mentioned classes for more details on
 * how to implement custom modules.
 * When implementing a new module, consider whether it could be of more
 * general use; if so, it should be added to this module.
 *
 * It is also possible to implement new data source objects by deriving a class
 * from gmx::AbstractAnalysisData.  This should not normally be necessary, since
 * this module provides general data source objects for most typical uses.
 * If the classes in this module are not suitable for some specific use,
 * consider whether a new generic class could be added (or an existing one
 * extended) instead of implementing a local custom solution.
 * \endif
 *
 * \author Teemu Murtola <teemu.murtola@gmail.com>
 */
/*! \file
 * \brief
 * Public API convenience header for analysis data handling.
 *
 * \author Teemu Murtola <teemu.murtola@gmail.com>
 * \inpublicapi
 * \ingroup module_analysisdata
 */
#ifndef GMX_ANALYSISDATA_H
#define GMX_ANALYSISDATA_H

#include "gromacs/analysisdata/analysisdata.h"
#include "gromacs/analysisdata/arraydata.h"
#include "gromacs/analysisdata/dataframe.h"
#include "gromacs/analysisdata/modules/average.h"
#include "gromacs/analysisdata/modules/displacement.h"
#include "gromacs/analysisdata/modules/histogram.h"
#include "gromacs/analysisdata/modules/lifetime.h"
#include "gromacs/analysisdata/modules/plot.h"

#endif