docs/manual/analyse.tex

   1 %
   2 % This file is part of the GROMACS molecular simulation package.
   3 %
   4 % Copyright (c) 2013,2014,2015,2016,2017, by the GROMACS development team, led by
   5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
   6 % and including many others, as listed in the AUTHORS file in the
   7 % top-level source directory and at http://www.gromacs.org.
   8 %
   9 % GROMACS is free software; you can redistribute it and/or
  10 % modify it under the terms of the GNU Lesser General Public License
  11 % as published by the Free Software Foundation; either version 2.1
  12 % of the License, or (at your option) any later version.
  13 %
  14 % GROMACS is distributed in the hope that it will be useful,
  15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
  16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  17 % Lesser General Public License for more details.
  18 %
  19 % You should have received a copy of the GNU Lesser General Public
  20 % License along with GROMACS; if not, see
  21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
  22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
  23 %
  24 % If you want to redistribute modifications to GROMACS, please
  25 % consider that scientific software is very special. Version
  26 % control is crucial - bugs must be traceable. We will be happy to
  27 % consider code for inclusion in the official distribution, but
  28 % derived work must not be called official GROMACS. Details are found
  29 % in the README & COPYING files - if they are missing, get the
  30 % official version at http://www.gromacs.org.
  31 %
  32 % To help us fund GROMACS development, we humbly ask that you cite
  33 % the research papers on the package. Check out http://www.gromacs.org.
  34
  35 \chapter{Analysis}
  36 \label{ch:analysis}
  37 In this chapter different ways of analyzing your trajectory are described.
  38 The names of the corresponding analysis programs are given.
  39 Specific information on the in- and output of these programs can be found
  40 in the online manual at {\wwwpage}.
  41 The output files are often produced as finished Grace/Xmgr graphs.
  42
  43 First, in \secref{usinggroups}, the group concept in analysis is explained.
  44 \ssecref{selections} explains a newer concept of dynamic selections,
  45 which is currently supported by a few tools.
  46 Then, the different analysis tools are presented.
  47
  48 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Groups in Analysis
  49
  50 \section{Using Groups}
  51 \label{sec:usinggroups}\index{groups}
  52 {\tt gmx make_ndx, gmx mk_angndx, gmx select}\\
  53 In \chref{algorithms}, it was explained how {\em groups of
  54 atoms} can be used in {\tt mdrun} (see~\secref{groupconcept}).
  55 In most analysis programs, groups
  56 of atoms must also be chosen. Most programs can generate several default
  57 index groups, but groups can always be read from an index file. Let's
  58 consider the example of a simulation of a binary mixture of components A and B. When
  59 we want to calculate the radial distribution function (RDF)
  60 $g_{AB}(r)$ of A with respect to B, we have to calculate:
  61 \beq
  62 4\pi r^2 g_{AB}(r)      ~=~     V~\sum_{i \in A}^{N_A} \sum_{j \in B}^{N_B} P(r)
  63 \eeq
  64 where $V$ is the volume and $P(r)$ is the probability of finding a B atom
  65 at distance $r$ from an A atom.
  66
  67 By having the user define the {\em atom numbers} for groups A and B in
  68 a simple file, we can calculate this $g_{AB}$ in the most general way, without
  69 having to make any assumptions in the RDF program about the type of
  70 particles.
  71
  72 Groups can therefore consist of a series of {\em atom numbers}, but in
  73 some cases also of {\em molecule numbers}.  It is also possible to
  74 specify a series of angles by {\em triples} of {\em atom numbers},
  75 dihedrals by {\em quadruples} of {\em atom numbers} and bonds or
  76 vectors (in a molecule) by {\em pairs} of {\em atom numbers}. When
  77 appropriate the type of index file will be specified for the following
  78 analysis programs.  To help creating such \swapindex{index}{file}s ({\tt
  79 index.ndx}), there are a couple of programs to generate them, using
  80 either your input configuration or the topology.  To generate an
  81 index file consisting of a series of {\em atom numbers} (as in the
  82 example of $g_{AB}$), use {\tt \normindex{gmx make_ndx}} or
  83 {\tt \normindex{gmx select}}.  To generate an index file
  84 with angles or dihedrals, use {\tt \normindex{gmx mk_angndx}}.
  85 Of course you can also
  86 make them by hand. The general format is presented here:
  87
  88 \begin{verbatim}
  89 [ Oxygen ]
  90    1       4       7
  91
  92 [ Hydrogen ]
  93    2       3       5       6
  94    8       9
  95 \end{verbatim}
  96
  97 First, the group name is written between square brackets. The following
  98 atom numbers may be spread out over as many lines as you like. The
  99 atom numbering starts at 1.
 100
 101 \swapindexquiet{choosing}{groups}%
 102 Each tool that can use groups will offer the available alternatives
 103 for the user to choose. That choice can be made with the number of the
 104 group, or its name. In fact, the first few letters of the group
 105 name will suffice if that will distinguish the group from all others.
 106 There are ways to use Unix shell features to choose group names
 107 on the command line, rather than interactively. Consult {\wwwpage}
 108 for suggestions.
 109
 110 \subsection{Default Groups}
 111 \label{subsec:defaultgroups}
 112 When no index file is supplied to analysis tools or {\tt grompp},
 113 a number of \swapindex{default}{groups} are generated to choose from:
 114 \begin{description}
 115 \item[{\tt System}]\mbox{}\\
 116         all atoms in the system
 117 \item[{\tt Protein}]\mbox{}\\
 118         all protein atoms
 119 \item[{\tt Protein-H}]\mbox{}\\
 120         protein atoms excluding hydrogens
 121 \item[{\tt C-alpha}]\mbox{}\\
 122         C$_{\alpha}$ atoms
 123 \item[{\tt Backbone}]\mbox{}\\
 124         protein backbone atoms; N, C$_{\alpha}$ and C
 125 \item[{\tt MainChain}]\mbox{}\\
 126         protein main chain atoms: N, C$_{\alpha}$, C and O, including
 127         oxygens in C-terminus
 128 \item[{\tt MainChain+Cb}]\mbox{}\\
 129         protein main chain atoms including C$_{\beta}$
 130 \item[{\tt MainChain+H}]\mbox{}\\
 131         protein main chain atoms including backbone amide hydrogens and
 132         hydrogens on the N-terminus
 133 \item[{\tt SideChain}]\mbox{}\\
 134         protein side chain atoms; that is all atoms except N,
 135         C$_{\alpha}$, C, O, backbone amide hydrogens, oxygens in
 136         C-terminus and hydrogens on the N-terminus
 137 \item[{\tt SideChain-H}]\mbox{}\\
 138         protein side chain atoms excluding all hydrogens
 139 \item[{\tt Prot-Masses}]\mbox{}\\
 140         protein atoms excluding dummy masses (as used in virtual site
 141         constructions of NH$_3$ groups and tryptophan side-chains),
 142         see also \secref{vsitetop}; this group is only included when
 143         it differs from the ``{\tt Protein}'' group
 144 \item[{\tt Non-Protein}]\mbox{}\\
 145         all non-protein atoms
 146 \item[{\tt DNA}]\mbox{}\\
 147         all DNA atoms
 148 \item[{\tt RNA}]\mbox{}\\
 149         all RNA atoms
 150 \item[{\tt Water}]\mbox{}\\
 151         water molecules (names like {\tt SOL}, {\tt WAT}, {\tt HOH}, etc.)  See
 152         {\tt \normindex{residuetypes.dat}} for a full listing
 153 \item[{\tt non-Water}]\mbox{}\\
 154         anything not covered by the {\tt Water} group
 155 \item[{\tt Ion}]\mbox{}\\
 156         any name matching an Ion entry in {\tt \normindex{residuetypes.dat}}
 157 \item[{\tt Water_and_Ions}]\mbox{}\\
 158         combination of the {\tt Water} and {\tt Ions} groups
 159 \item[{\tt molecule_name}]\mbox{}\\
 160         for all residues/molecules which are not recognized as protein,
 161         DNA, or RNA; one group per residue/molecule name is generated
 162 \item[{\tt Other}]\mbox{}\\
 163         all atoms which are neither protein, DNA, nor RNA.
 164 \end{description}
 165 Empty groups will not be generated.
 166 Most of the groups only contain protein atoms.
 167 An atom is considered a protein atom if its residue name is listed
 168 in the {\tt \normindex{residuetypes.dat}} file and is listed as a
 169 ``Protein'' entry.  The process for determinding DNA, RNA, etc. is
 170 analogous. If you need to modify these classifications, then you
 171 can copy the file from the library directory into your working
 172 directory and edit the local copy.
 173
 174 \subsection{Selections}
 175 \label{subsec:selections}
 176 {\tt \normindex{gmx select}}\\
 177 Currently, a few analysis tools support an extended concept of {\em
 178 (dynamic) \normindex{selections}}.  There are three main differences to
 179 traditional index groups:
 180 \begin{itemize}
 181   \item The selections are specified as text instead of reading fixed
 182     atom indices from a file, using a syntax similar to VMD.  The text
 183     can be entered interactively, provided on the command line, or from
 184     a file.
 185   \item The selections are not restricted to atoms, but can also specify
 186     that the analysis is to be performed on, e.g., center-of-mass
 187     positions of a group of atoms.
 188     Some tools may not support selections that do not evaluate to single
 189     atoms, e.g., if they require information that is available only for
 190     single atoms, like atom names or types.
 191   \item The selections can be dynamic, i.e., evaluate to different atoms
 192     for different trajectory frames.  This allows analyzing only a
 193     subset of the system that satisfies some geometric criteria.
 194 \end{itemize}
 195 As an example of a simple selection,
 196 {\tt resname ABC and within 2 of resname DEF}
 197 selects all atoms in residues named ABC that are within 2\,nm of any
 198 atom in a residue named DEF.
 199
 200 Tools that accept selections can also use traditional index files
 201 similarly to older tools: it is possible to give an {\tt .ndx} file to
 202 the tool, and directly select a group from the index file as a
 203 selection, either by group number or by group name.  The index groups
 204 can also be used as a part of a more complicated selection.
 205
 206 To get started, you can run {\tt gmx select} with a single structure,
 207 and use the interactive prompt to try out different selections.
 208 The tool provides, among others, output options {\tt -on} and
 209 {\tt -ofpdb} to write out the selected atoms to an index file and to a
 210 {\tt .pdb} file, respectively.  This does not allow testing selections
 211 that evaluate to center-of-mass positions, but other selections can be
 212 tested and the result examined.
 213
 214 The detailed syntax and the individual keywords that can be used in
 215 selections can be accessed by typing {\tt help} in the interactive
 216 prompt of any selection-enabled tool, as well as with
 217 {\tt gmx help selections}.  The help is divided into subtopics that can
 218 be accessed with, e.g., {\tt help syntax} / {\tt gmx help selections
 219 syntax}.  Some individual selection keywords have extended help as well,
 220 which can be accessed with, e.g., {\tt help keywords within}.
 221
 222 The interactive prompt does not currently provide much editing
 223 capabilities.  If you need them, you can run the program under
 224 {\tt rlwrap}.
 225
 226 For tools that do not yet support the selection syntax, you can use
 227 {\tt gmx select -on} to generate static index groups to pass to the
 228 tool.  However, this only allows for a small subset (only the first
 229 bullet from the above list) of the flexibility that fully
 230 selection-aware tools offer.
 231
 232 It is also possible to write your own analysis
 233 tools to take advantage of the flexibility of these selections: see the
 234 {\tt template.cpp} file in the {\tt share/gromacs/template} directory of
 235 your installation for an example.
 236
 237 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Looking at your trajectory
 238
 239 \section{Looking at your trajectory}
 240 \label{sec:lookwhostalking}
 241 \begin{figure}
 242 \centerline{
 243 {\includegraphics[width=8cm,angle=90]{plots/ngmxdump}}}
 244 \caption{The window of {\tt gmx view} showing a box of water.}
 245 \label{fig:ngmxdump}
 246 \end{figure}
 247 {\tt gmx view}\\
 248 Before analyzing your trajectory it is often informative to look at
 249 your trajectory first. {\gromacs} comes with a simple trajectory
 250 viewer {\tt \normindex{gmx view}}; the advantage with this one is that it does not
 251 require OpenGL, which usually isn't present on {\eg} supercomputers.
 252 It is also possible to generate a
 253 hard-copy in Encapsulated Postscript format (see
 254 \figref{ngmxdump}). If you want a faster and more fancy viewer
 255  there are several programs
 256 that can read the {\gromacs} trajectory formats -- have a look at our
 257 homepage ({\wwwpage}) for updated links.
 258
 259 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% General properties
 260
 261 \section{General properties}
 262 \label{sec:genprop}
 263 {\tt gmx energy, gmx traj}\\
 264 To analyze some or all {\em energies} and other properties, such as
 265 {\em total pressure}, {\em pressure tensor}, {\em density}, {\em
 266 box-volume} and {\em box-sizes}, use the program {\tt \normindex{gmx energy}}.  A
 267 choice can be made from a list a set of energies, like potential,
 268 kinetic or total energy, or individual contributions, like
 269 Lennard-Jones or dihedral energies.
 270
 271 The {\em center-of-mass velocity}, defined as
 272 \beq
 273 {\bf v}_{com} = {1 \over M} \sum_{i=1}^N m_i {\bf v}_i
 274 \eeq
 275 with $M = \sum_{i=1}^N m_i$ the total mass of the system, can be
 276 monitored in time by the program {\tt \normindex{gmx traj} -com -ov}. It is however
 277 recommended to remove the center-of-mass velocity every step (see
 278 \chref{algorithms})!
 279
 280 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Radial distribution functions
 281
 282 \section{Radial distribution functions}
 283 \label{sec:rdf}
 284 {\tt gmx rdf}\\
 285 The {\em radial distribution function} (RDF) or pair correlation
 286 function $g_{AB}(r)$ between particles of type $A$ and $B$ is defined
 287 in the following way:
 288 \newcommand{\dfrac}[2]{\displaystyle \frac{#1}{#2}}
 289 \beq
 290 \begin{array}{rcl}
 291 g_{AB}(r)&=&    \dfrac{\langle \rho_B(r) \rangle}{\langle\rho_B\rangle_{local}}         \\
 292          &=&    \dfrac{1}{\langle\rho_B\rangle_{local}}\dfrac{1}{N_A}
 293                 \sum_{i \in A}^{N_A} \sum_{j \in B}^{N_B}
 294                 \dfrac{\delta( r_{ij} - r )}{4 \pi r^2}         \\
 295 \end{array}
 296 \eeq
 297 with $\langle\rho_B(r)\rangle$ the particle density of type $B$ at a distance $r$
 298 around particles $A$, and $\langle\rho_B\rangle_{local}$ the particle density of
 299 type $B$ averaged over all spheres around particles $A$ with radius
 300 $r_{max}$ (see \figref{rdfex}C).
 301
 302 \begin{figure}
 303 \centerline{
 304 {\includegraphics[width=7cm]{plots/rdf}}}
 305 \caption[Definition of slices in {\tt gmx rdf}.]{Definition of slices
 306 in {\tt gmx rdf}: A. $g_{AB}(r)$. B. $g_{AB}(r,\theta)$. The slices are
 307 colored gray. C. Normalization $\langle\rho_B\rangle_{local}$. D. Normalization
 308 $\langle\rho_B\rangle_{local,\:\theta }$. Normalization volumes are colored gray.}
 309 \label{fig:rdfex}
 310 \end{figure}
 311
 312 Usually the value of $r_{max}$ is half of the box length.  The
 313 averaging is also performed in time.  In practice the analysis program
 314 {\tt \normindex{gmx rdf}} divides the system into spherical slices (from $r$ to
 315 $r+dr$, see \figref{rdfex}A) and makes a histogram in stead of
 316 the $\delta$-function. An example of the RDF of oxygen-oxygen in
 317 SPC water~\cite{Berendsen81} is given in \figref{rdf}.
 318
 319 \begin{figure}
 320 \centerline{
 321 {\includegraphics[width=8cm]{plots/rdfO-O}}}
 322 \caption{$g_{OO}(r)$ for Oxygen-Oxygen of SPC-water.}
 323 \label{fig:rdf}
 324 \end{figure}
 325
 326 % TODO: This functionality isn't there...
 327 With {\tt gmx rdf} it is also possible to calculate an angle dependent rdf
 328 $g_{AB}(r,\theta)$, where the angle $\theta$ is defined with respect to a
 329 certain laboratory axis ${\bf e}$, see \figref{rdfex}B.
 330 \bea
 331 g_{AB}(r,\theta) &=& {1 \over \langle\rho_B\rangle_{local,\:\theta }} {1 \over N_A} \sum_{i \in A}^{N_A} \sum_{j \in B}^{N_B} {\delta( r_{ij} - r ) \delta(\theta_{ij} -\theta) \over 2 \pi r^2 sin(\theta)}\\
 332 cos(\theta_{ij}) &=& {{\bf r}_{ij} \cdot {\bf e} \over \|r_{ij}\| \;\| e\| }
 333 \eea
 334 This $g_{AB}(r,\theta)$ is useful for analyzing anisotropic systems.
 335 {\bf Note} that in this case the normalization $\langle\rho_B\rangle_{local,\:\theta}$ is
 336 the average density in all angle slices from $\theta$ to $\theta + d\theta$
 337 up to $r_{max}$, so angle dependent, see \figref{rdfex}D.
 338
 339 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Correlation functions
 340
 341 \section{Correlation functions}
 342 \label{sec:corr}
 343
 344 \subsection{Theory of correlation functions}
 345 The theory of correlation functions is well established~\cite{Allen87}.
 346 We describe here the implementation of the various
 347 \normindex{correlation} function flavors in the {\gromacs} code.
 348 The definition of the autocorrelation function\index{autocorrelation function}
 349 (ACF)
 350 $C_f(t)$ for a property $f(t)$ is:
 351 \beq
 352 C_f(t)  ~=~     \left\langle f(\xi) f(\xi+t)\right\rangle_{\xi}
 353 \label{eqn:corr}
 354 \eeq
 355 where the notation on the right hand side indicates averaging over $\xi$, {\ie} over
 356 time origins.
 357 It is also possible to compute cross-correlation function from two properties
 358 $f(t)$ and $g(t)$:
 359 \beq
 360 C_{fg}(t) ~=~   \left\langle f(\xi) g(\xi+t)\right\rangle_{\xi}
 361 \eeq
 362 however, in {\gromacs} there is no standard mechanism to do this
 363 ({\bf note:} you can use the {\tt \normindex{xmgr}} program to compute cross correlations).
 364 The integral of the correlation function over time is the
 365 correlation time $\tau_f$:
 366 \beq
 367 \tau_f  ~=~     \int_0^{\infty} C_f(t) {\rm d} t
 368 \label{eqn:corrtime}
 369 \eeq
 370
 371 In practice, correlation functions are calculated based on data points with
 372 discrete time intervals {$\Delta$t}, so that the ACF from an MD simulation is:
 373 \beq
 374 C_f(j\Delta t)  ~=~     \frac{1}{N-j}\sum_{i=0}^{N-1-j} f(i\Delta t) f((i+j)\Delta t)
 375 \label{eqn:corrmd}
 376 \eeq
 377 where $N$ is the number of available time frames for the calculation.
 378 The resulting ACF is
 379 obviously only available at time points with the same interval {$\Delta$t}.
 380 Since, for many applications, it is necessary to know the short time behavior
 381 of the ACF ({\eg} the first 10 ps) this often means that we have to save the
 382 data with intervals much shorter than the time scale of interest.
 383 Another implication of \eqnref{corrmd} is that in principle we can not compute
 384 all points of the ACF with the same accuracy, since we have $N-1$ data points
 385 for $C_f(\Delta t)$ but only 1 for $C_f((N-1)\Delta t)$. However, if we decide to
 386 compute only an ACF of length $M\Delta t$, where $M \leq N/2$ we can compute
 387 all points with the same statistical accuracy:
 388 \beq
 389 C_f(j\Delta t)  ~=~ \frac{1}{M}\sum_{i=0}^{N-1-M} f(i\Delta t)f((i+j)\Delta t)
 390 \eeq
 391 Here of course $j < M$.
 392 $M$ is sometimes referred to as the \normindex{time lag} of the correlation function.
 393 When we decide to do this, we intentionally do not use all the available points
 394 for very short time intervals ($j << M$), but it makes it easier to interpret
 395 the results.
 396 Another aspect that may not be neglected when computing
 397 ACFs from simulation is that usually the time origins $\xi$ (\eqnref{corr})
 398 are not statistically independent, which may introduce a bias in the results.
 399 This can be tested using a block-averaging procedure, where only time origins
 400 with a spacing at least the length of the time lag are included, {\eg} using
 401 $k$ time origins with spacing of $M\Delta t$ (where $kM \leq N$):
 402 \beq
 403 C_f(j\Delta t)  ~=~ \frac{1}{k}\sum_{i=0}^{k-1} f(iM\Delta t)f((iM+j)\Delta t)
 404 \eeq
 405 However, one
 406 needs very long simulations to get good accuracy this way, because there are
 407 many fewer points that contribute to the ACF.
 408
 409 \subsection{Using FFT for computation of the ACF}
 410 The computational cost for calculating an ACF according to \eqnref{corrmd}
 411 is proportional to $N^2$, which is considerable. However, this can be improved
 412 by using fast Fourier transforms to do the convolution~\cite{Allen87}.
 413
 414 \subsection{Special forms of the ACF}
 415 There are some important varieties on the ACF, {\eg} the ACF of a vector \ve{p}:
 416 \beq
 417 C_{\ve{p}}(t) ~=~       \int_0^{\infty} P_n(\cos\angle\left(\ve{p}(\xi),\ve{p}(\xi+t)\right) {\rm d} \xi
 418 \label{eqn:corrleg}
 419 \eeq
 420 where $P_n(x)$ is the $n^{th}$ order Legendre polynomial
 421 \footnote{$P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = (3x^2-1)/2$}.
 422 Such correlation times
 423 can actually be obtained experimentally using {\eg} NMR or other relaxation
 424 experiments. {\gromacs} can compute correlations using
 425 the 1$^{st}$ and 2$^{nd}$ order Legendre polynomial (\eqnref{corrleg}).
 426 This can also be used for rotational autocorrelation
 427 ({\tt \normindex{gmx rotacf}})
 428 and dipole autocorrelation ({\tt \normindex{gmx dipoles}}).
 429
 430 In order to study torsion angle dynamics, we define a dihedral
 431 autocorrelation function as~\cite{Spoel97a}:
 432 \beq
 433 C(t)    ~=~     \left\langle \cos(\theta(\tau)-\theta(\tau+t))\right\rangle_{\tau}
 434 \label{eqn:coenk}
 435 \eeq
 436 {\bf Note} that this is not a  product of two functions
 437 as is generally used for correlation
 438 functions, but it may be rewritten as the sum of two products:
 439 \beq
 440 C(t)    ~=~     \left\langle\cos(\theta(\tau))\cos(\theta(\tau+t))\,+\,\sin(\theta(\tau))\sin(\theta(\tau+t))\right\rangle_{\tau}
 441 \label{eqn:cot}
 442 \eeq
 443
 444 \subsection{Some Applications}
 445 The program {\tt \normindex{gmx velacc}} calculates the {\em velocity autocorrelation
 446 function}.
 447 \beq
 448 C_{\ve{v}} (\tau) ~=~ \langle {\ve{v}}_i(\tau) \cdot {\ve{v}}_i(0) \rangle_{i \in A}
 449 \eeq
 450 The self diffusion coefficient can be calculated using the Green-Kubo
 451 relation~\cite{Allen87}:
 452 \beq
 453 D_A ~=~ {1\over 3} \int_0^{\infty} \langle {\bf v}_i(t) \cdot {\bf v}_i(0) \rangle_{i \in A} \; dt
 454 \eeq
 455 which is just the integral of the velocity autocorrelation function.
 456 There is a widely-held belief that the velocity ACF converges faster than the mean
 457 square displacement (\secref{msd}), which can also be used for the computation of
 458 diffusion constants. However, Allen \& Tildesley~\cite{Allen87}
 459 warn us that the long-time
 460 contribution to the velocity ACF can not be ignored, so care must be taken.
 461
 462 Another important quantity is the dipole correlation time. The {\em dipole
 463 correlation function} for particles of type $A$ is calculated as follows by
 464 {\tt \normindex{gmx dipoles}}:
 465 \beq
 466 C_{\mu} (\tau) ~=~
 467 \langle {\bf \mu}_i(\tau) \cdot {\bf \mu}_i(0) \rangle_{i \in A}
 468 \eeq
 469 with ${\bf \mu}_i = \sum_{j \in i} {\bf r}_j q_j$. The dipole correlation time
 470 can be computed using \eqnref{corrtime}.
 471 For some applications see~\cite{Spoel98a}.
 472
 473 The \normindex{viscosity} of a liquid can be related to the correlation
 474 time of the Pressure tensor $\ve{P}$~\cite{PSmith93c,Balasubramanian96}.
 475 {\tt \normindex{gmx energy}} can compute the viscosity,
 476 but this is not very accurate~\cite{Hess2002a}, and
 477 actually the values do not converge.
 478
 479 \section{Curve fitting in \gromacs}
 480 \subsection{Sum of exponential functions}
 481 Sometimes it is useful to fit a curve to an analytical function, for
 482 example in the case of autocorrelation functions with noisy
 483 tails. {\gromacs} is not a general purpose curve-fitting tool however
 484 and therefore {\gromacs} only supports a limited number of
 485 functions.
 486 Table~\ref{tab:fitfn} lists the available options with the
 487 corresponding command-line options. The underlying routines for
 488 fitting use the Levenberg-Marquardt algorithm as implemented in the
 489 {\tt lmfit} package~\cite{lmfit} (a bare-bones version of which is
 490 included in {\gromacs} in which an option for error-weighted fitting
 491 was implemented).
 492 \begin{table}[ht]
 493 \centering
 494 \caption{Overview of fitting functions supported in (most) analysis tools
 495   that compute autocorrelation functions. The ``Note'' column describes
 496   properties of the output parameters.}
 497 \label{tab:fitfn}
 498 \begin{tabular}{lcc}
 499 \hline
 500 Command & Functional form $f(t)$& Note\\
 501 line option         &           & \\
 502 \hline
 503 exp        & $e^{-t/{a_0}}$ &\\
 504 aexp      & $a_1e^{-t/{a_0}}$ &\\
 505 exp_exp & $a_1e^{-t/{a_0}}+(1-a_1)e^{-t/{a_2}}$ & $a_2\ge a_0\ge 0$\\
 506 exp5      & $a_1e^{-t/{a_0}}+a_3e^{-t/{a_2}}+a_4$ &$a_2\ge a_0\ge 0$\\
 507 exp7     &
 508 $a_1e^{-t/{a_0}}+a_3e^{-t/{a_2}}+a_5e^{-t/{a_4}}+a_6$&$a_4\ge a_2\ge
 509 a_0 \ge0$\\
 510 exp9    &
 511 $a_1e^{-t/{a_0}}+a_3e^{-t/{a_2}}+a_5e^{-t/{a_4}}+a_7e^{-t/{a_6}}+a_8$&$a_6\ge
 512 a_4\ge a_2\ge a_0\ge 0$\\
 513 \hline
 514 \end{tabular}
 515 \end{table}
 516
 517 \subsection{Error estimation}
 518 Under the hood {\gromacs} implements some more fitting functions,
 519 namely a function to estimate the error in time-correlated data due to Hess~\cite{Hess2002a}:
 520 \beq
 521 \varepsilon^2(t) =
 522 \alpha\tau_1\left(1+\frac{\tau_1}{t}\left(e^{-t/\tau_1}-1\right)\right)
 523       + (1-\alpha)\tau_2\left(1+\frac{\tau_2}{t}\left(e^{-t/\tau_2}-1\right)\right)
 524 \eeq
 525 where $\tau_1$ and
 526 $\tau_2$ are time constants (with $\tau_2 \ge \tau_1$) and $\alpha$
 527 usually is close to 1 (in the fitting procedure it is enforced that
 528 $0\leq\alpha\leq 1$). This is used in {\tt gmx analyze} for error
 529 estimation using
 530 \beq
 531 \lim_{t\rightarrow\infty}\varepsilon(t) = \sigma\sqrt{\frac{2(\alpha\tau_1+(1-\alpha)\tau_2)}{T}}
 532 \eeq
 533 where $\sigma$ is the standard deviation of the data set and $T$ is
 534 the total simulation time~\cite{Hess2002a}.
 535
 536 \subsection{Interphase boundary demarcation}
 537 In order to determine the position and width of an interface,
 538 Steen-S{\ae}thre {\em et al.} fitted a density profile to the
 539 following function
 540 \beq
 541 f(x) ~=~ \frac{a_0+a_1}{2} - \frac{a_0-a_1}{2}{\rm
 542   erf}\left(\frac{x-a_2}{a_3^2}\right)
 543 \eeq
 544 where $a_0$ and $a_1$ are densities of different phases, $x$ is the
 545 coordinate normal to the interface, $a_2$ is the position of the
 546 interface and $a_3$ is the width of the
 547 interface~\cite{Steen-Saethre2014a}.
 548 This is implemented in {\tt gmx densorder}.
 549
 550 \subsection{Transverse current autocorrelation function}
 551 In order to establish the transverse current autocorrelation function
 552 (useful for computing viscosity~\cite{Palmer1994a})
 553 the following function is fitted:
 554 \beq
 555 f(x) ~=~ e^{-\nu}\left({\rm cosh}(\omega\nu)+\frac{{\rm
 556     sinh}(\omega\nu)}{\omega}\right)
 557 \eeq
 558 with $\nu = x/(2a_0)$ and $\omega = \sqrt{1-a_1}$.
 559 This is implemented in {\tt gmx tcaf}.
 560
 561 \subsection{Viscosity estimation from pressure autocorrelation
 562   function}
 563 The viscosity is a notoriously difficult property to extract from
 564 simulations~\cite{Hess2002a,Wensink2003a}. It is {\em in principle}
 565 possible to determine it by integrating the pressure autocorrelation
 566 function~\cite{PSmith93c}, however this is often hampered by the noisy
 567 tail of the ACF. A workaround to this is fitting the ACF to the
 568 following function~\cite{Guo2002b}:
 569 \beq
 570 f(t)/f(0) = (1-C) {\rm cos}(\omega t) e^{-(t/\tau_f)^{\beta_f}} + C
 571 e^{-(t/\tau_s)^{\beta_s}}
 572 \eeq
 573 where $\omega$ is the frequency of rapid pressure oscillations (mainly
 574 due to bonded forces in molecular simulations), $\tau_f$ and $\beta_f$
 575 are the time constant and exponent of fast relaxation in a
 576 stretched-exponential approximation, $\tau_s$ and $\beta_s$ are constants
 577 for slow relaxation and $C$ is the pre-factor that determines the
 578 weight between fast and slow relaxation. After a fit, the integral of
 579 the function $f(t)$ is used to compute the viscosity:
 580 \beq
 581 \eta = \frac{V}{k_B T}\int_0^{\infty} f(t) dt
 582 \eeq
 583 This equation has been
 584 applied to computing the bulk and shear viscosity using different
 585 elements from the pressure tensor~\cite{Fanourgakis2012a}.
 586 This is implemented in {\tt gmx viscosity}.
 587
 588 \section{Mean Square Displacement}
 589 \label{sec:msd}
 590 {\tt gmx msd}\\
 591 To determine the self diffusion coefficient\index{diffusion coefficient} $D_A$
 592 of
 593 particles of type $A$, one can use the \normindex{Einstein
 594 relation}~\cite{Allen87}:
 595 \beq
 596 \lim_{t \rightarrow \infty} \langle
 597 \|{\bf r}_i(t) - {\bf r}_i(0)\|^2 \rangle_{i \in A} ~=~ 6 D_A t
 598 \eeq
 599 This {\em mean square displacement} and $D_A$ are calculated by the
 600 program {\tt \normindex{gmx msd}}. Normally an index file containing
 601 atom numbers is used and the MSD is averaged over these atoms.  For
 602 molecules consisting of more than one atom, ${\bf r}_i$ can be taken
 603 as the center of mass positions of the molecules. In that case, you
 604 should use an index file with molecule numbers. The results will be
 605 nearly identical to averaging over atoms, however. The {\tt gmx msd}
 606 program can
 607 also be used for calculating diffusion in one or two dimensions. This
 608 is useful for studying lateral diffusion on interfaces.
 609
 610 An example of the mean square displacement of SPC water is given in
 611 \figref{msdwater}.
 612
 613 \begin{figure}
 614 \centerline{
 615 {\includegraphics[width=8cm]{plots/msdwater}}}
 616 \caption{Mean Square Displacement of SPC-water.}
 617 \label{fig:msdwater}
 618 \end{figure}
 619
 620 %
 621 %%%%%%%%%%%%%%%%%%%%% Bonds, angles and dihedral %%%%%%%%%%%%%%%%%%%
 622 %
 623 \section{Bonds/distances, angles and dihedrals}
 624 \label{sec:bad}
 625 {\tt gmx distance, gmx angle, gmx gangle}\\
 626 To monitor specific {\em bonds} in your modules, or more generally
 627 distances between points, the program
 628 {\tt \normindex{gmx distance}} can calculate distances as a function of
 629 time, as well as the distribution of the distance.
 630 With a traditional index file, the groups should consist of pairs of
 631 atom numbers, for example:
 632 \begin{verbatim}
 633 [ bonds_1 ]
 634  1     2
 635  3     4
 636  9    10
 637
 638 [ bonds_2 ]
 639 12    13
 640 \end{verbatim}
 641 Selections are also supported, with first two positions defining the
 642 first distance, second pair of positions defining the second distance
 643 and so on.  You can calculate the distances between CA and CB atoms in
 644 all your residues (assuming that every residue either has both atoms, or
 645 neither) using a selection such as:
 646 \begin{verbatim}
 647 name CA CB
 648 \end{verbatim}
 649 The selections also allow more generic distances to be computed.
 650 For example, to compute the distances between centers of mass of two
 651 residues, you can use:
 652 \begin{verbatim}
 653 com of resname AAA plus com of resname BBB
 654 \end{verbatim}
 655
 656 The program {\tt \normindex{gmx angle}} calculates the distribution of {\em angles} and
 657 {\em dihedrals} in time. It also gives the average angle or dihedral.
 658 The index file consists of triplets or quadruples of atom numbers:
 659
 660 \begin{verbatim}
 661 [ angles ]
 662  1     2     3
 663  2     3     4
 664  3     4     5
 665
 666 [ dihedrals ]
 667  1     2     3     4
 668  2     3     5     5
 669 \end{verbatim}
 670
 671 For the dihedral angles you can use either the ``biochemical convention''
 672 ($\phi = 0 \equiv cis$) or ``polymer convention'' ($\phi = 0 \equiv trans$),
 673 see \figref{dih_def}.
 674
 675 \begin{figure}
 676 \centerline{
 677 {\includegraphics[width=3.5cm,angle=270]{plots/dih-def}}}
 678 \caption[Dihedral conventions.]{Dihedral conventions: A. ``Biochemical
 679 convention''. B. ``Polymer convention''.}
 680 \label{fig:dih_def}
 681 \end{figure}
 682
 683 The program {\tt \normindex{gmx gangle}} provides a selection-enabled
 684 version to compute angles.  This tool can also compute angles and
 685 dihedrals, but does not support all the options of {\tt gmx angle}, such
 686 as autocorrelation or other time series analyses.
 687 In addition, it supports angles between two vectors, a vector and a
 688 plane, two planes (defined by 2 or 3 points, respectively), a
 689 vector/plane and the $z$ axis, or a vector/plane and the normal of a
 690 sphere (determined by a single position).
 691 Also the angle between a vector/plane compared to its position in the
 692 first frame is supported.
 693 For planes, {\tt \normindex{gmx gangle}} uses the normal vector
 694 perpendicular to the plane.
 695 See \figref{sgangle}A, B, C) for the definitions.
 696
 697 \begin{figure}
 698 \centerline{
 699 {\includegraphics{plots/sgangle}}}
 700 \caption[Angle options of {\tt gmx gangle}.]{Angle options of {\tt gmx gangle}:
 701 A. Angle between two vectors.
 702 B. Angle between two planes.
 703 C. Angle between a vector and the $z$ axis.
 704 D. Angle between a vector and the normal of a sphere.
 705 Also other combinations are supported: planes and vectors can be used
 706 interchangeably.}
 707 \label{fig:sgangle}
 708 \end{figure}
 709
 710 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Radius of gyration and distances
 711
 712 \section{Radius of gyration and distances}
 713 \label{sec:rg}
 714 {\tt gmx gyrate, gmx distance, gmx mindist, gmx mdmat, gmx pairdist, gmx xpm2ps}\\
 715 To have a rough measure for the compactness of a structure, you can calculate
 716 the {\em radius of gyration} with the program {\tt \normindex{gmx gyrate}} as follows:
 717 \beq
 718 R_g ~=~ \left({\frac{\sum_i \|{\bf r}_i\|^2 m_i}{\sum_i m_i}}\right)^{\half}
 719 \label{eqn:rg}
 720 \eeq
 721 where $m_i$ is the mass of atom $i$ and ${\bf r}_i$ the position of
 722 atom $i$ with respect to the center of mass of the molecule. It is especially
 723 useful to characterize polymer solutions and proteins. The program will also
 724 provide the radius of gyration around the coordinate axis (or, optionally, principal axes)
 725 by only summing the radii components orthogonal to each axis, for instance
 726 \beq
 727 R_{g,x} ~=~ \left({\frac{\sum_i \left( r_{i,y}^2 + r_{i,z}^2 \right) m_i}{\sum_i m_i}}\right)^{\half}
 728 \label{eqn:rgaxis}
 729 \eeq
 730
 731 Sometimes it is interesting to plot the {\em distance} between two atoms,
 732 or the {\em minimum} distance between two groups of atoms
 733 ({\eg}: protein side-chains in a salt bridge).
 734 To calculate these distances between certain groups there are several
 735 possibilities:
 736 \begin{description}
 737 \item[$\bullet$]
 738 The {\em distance between the geometrical centers} of two groups can be
 739 calculated with the program
 740 {{\tt \normindex{gmx distance}}, as explained in \secref{bad}.}
 741 \item[$\bullet$]
 742 The {\em minimum distance} between two groups of atoms during time
 743 can be calculated with the program {\tt \normindex{gmx mindist}}. It also calculates the
 744 {\em number of contacts} between these groups
 745 within a certain radius $r_{max}$.
 746 \item[$\bullet$]
 747 {\tt \normindex{gmx pairdist}} is a selection-enabled version of
 748 {\tt gmx mindist}.
 749 \item[$\bullet$]
 750 To monitor the {\em minimum distances between amino acid residues}
 751 within a (protein) molecule, you can use
 752 the program {\tt \normindex{gmx mdmat}}. This minimum distance between two residues
 753 A$_i$ and A$_j$ is defined as the smallest distance between any pair of
 754 atoms (i $\in$ A$_i$, j $\in$ A$_j$).
 755 The output is a symmetrical matrix of smallest distances
 756 between all residues.
 757 To visualize this matrix, you can use a program such as {\tt xv}.
 758 If you want to view the axes and legend or if you want to print
 759 the matrix, you can convert it with
 760 {\tt xpm2ps} into a Postscript picture, see \figref{mdmat}.
 761 \begin{figure}
 762 \centerline{
 763 \includegraphics[width=6.5cm]{plots/distm}}
 764 \caption{A minimum distance matrix for a peptide~\protect\cite{Spoel96b}.}
 765 \label{fig:mdmat}
 766 \end{figure}
 767
 768 Plotting these matrices for different time-frames, one can analyze changes
 769 in the structure, and {\eg} forming of salt bridges.
 770 \end{description}
 771
 772 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Root mean square deviations
 773
 774 \section{Root mean square deviations in structure}
 775 \label{sec:rmsd}
 776 {\tt gmx rms, gmx rmsdist}\\
 777 The {\em root mean square deviation} ($RMSD$) of certain atoms in a molecule
 778 with respect to a reference structure can be calculated with the program
 779 {\tt \normindex{gmx rms}} by least-square fitting the structure to the reference structure
 780 ($t_2 = 0$) and subsequently calculating the $RMSD$ (\eqnref{rmsd}).
 781 \beq
 782 RMSD(t_1,t_2) ~=~ \left[\frac{1}{M} \sum_{i=1}^N m_i \|{\bf r}_i(t_1)-{\bf r}_i(t_2)\|^2 \right]^{\frac{1}{2}}
 783 \label{eqn:rmsd}
 784 \eeq
 785 where $M = \sum_{i=1}^N m_i$ and ${\bf r}_i(t)$ is the position of atom $i$ at time $t$.
 786 {\bf Note} that fitting does not have to use the same atoms as the calculation
 787 of the $RMSD$; {\eg} a protein is usually fitted on the backbone atoms
 788 (N,C$_{\alpha}$,C), but the $RMSD$ can be computed of the backbone
 789 or of the whole protein.
 790
 791 Instead of comparing the structures to the initial structure at time $t=0$
 792 (so for example a crystal structure), one can also calculate \eqnref{rmsd}
 793 with a structure at time $t_2=t_1-\tau$.
 794 This gives some insight in the mobility as a function of $\tau$.
 795 A matrix can also be made with the $RMSD$ as a function of $t_1$ and $t_2$,
 796 which gives a nice graphical interpretation of a trajectory.
 797 If there are transitions in a trajectory, they will clearly show up in
 798 such a matrix.
 799
 800 Alternatively the $RMSD$ can be computed using a fit-free method with the
 801 program {\tt \normindex{gmx rmsdist}}:
 802 \beq
 803 RMSD(t) ~=~     \left[\frac{1}{N^2}\sum_{i=1}^N \sum_{j=1}^N    \|{\bf r}_{ij}(t)-{\bf r}_{ij}(0)\|^2\right]^{\frac{1}{2}}
 804 \label{eqn:rmsdff}
 805 \eeq
 806 where the {\em distance} {\bf r}$_{ij}$ between atoms at time $t$
 807 is compared with the distance between the same atoms at time $0$.
 808
 809 \section{Covariance analysis\index{covariance analysis}}
 810 \label{sec:covanal}
 811 Covariance analysis, also called
 812 \seeindex{principal component analysis}{covariance analysis}
 813 or \seeindex{essential dynamics}{covariance analysis}
 814 \cite{Amadei93}{,} can find correlated motions.
 815 It uses the covariance matrix $C$ of the atomic coordinates:
 816 \beq
 817 C_{ij} = \left \langle
 818 M_{ii}^{\frac{1}{2}} (x_i - \langle x_i \rangle)
 819 M_{jj}^{\frac{1}{2}}  (x_j - \langle x_j \rangle)
 820 \right \rangle
 821 \eeq
 822 where $M$ is a diagonal matrix containing the masses of the atoms
 823 (mass-weighted analysis) or the unit matrix (non-mass weighted analysis).
 824 $C$ is a symmetric $3N \times 3N$ matrix, which can be diagonalized with
 825 an orthonormal transformation matrix $R$:
 826 \beq
 827 R^T C R = \mbox{diag}(\lambda_1,\lambda_2,\ldots,\lambda_{3N})
 828 ~~~~\mbox{where}~~\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_{3N}
 829 \eeq
 830 The columns of $R$ are the eigenvectors, also called principal or
 831 essential modes.
 832 $R$ defines a transformation to a new coordinate system. The trajectory
 833 can be projected on the principal modes to give the principal components
 834 $p_i(t)$:
 835 \beq
 836 {\bf p}(t) = R^T M^{\frac{1}{2}} ({\bf x}(t) - \langle {\bf x} \rangle)
 837 \eeq
 838 The eigenvalue $\lambda_i$ is the mean square fluctuation of principal
 839 component $i$. The first few principal modes often describe
 840 collective, global motions in the system.
 841 The trajectory can be filtered along one (or more) principal modes.
 842 For one principal mode $i$ this goes as follows:
 843 \beq
 844 {\bf x}^f(t) =
 845 \langle {\bf x} \rangle + M^{-\frac{1}{2}} R_{*i} \, p_i(t)
 846 \eeq
 847
 848 When the analysis is performed on a macromolecule, one often wants to
 849 remove the overall rotation and translation to look at the internal motion
 850 only. This can be achieved by least square fitting to a reference structure.
 851 Care has to be taken that the reference structure is representative for the
 852 ensemble, since the choice of reference structure influences the covariance
 853 matrix.
 854
 855 One should always check if the principal modes are well defined.
 856 If the first principal component resembles a half cosine and
 857 the second resembles a full cosine, you might be filtering noise (see below).
 858 A good way to check the relevance of the first few principal
 859 modes is to calculate the overlap of the sampling between
 860 the first and second half of the simulation.
 861 {\bf Note} that this can only be done when the same reference structure is
 862 used for the two halves.
 863
 864 A good measure for the overlap has been defined in~\cite{Hess2002b}.
 865 The elements of the covariance matrix are proportional to the square
 866 of the displacement, so we need to take the square root of the matrix
 867 to examine the extent of sampling. The square root can be
 868 calculated from the eigenvalues $\lambda_i$ and the eigenvectors,
 869 which are the columns of the rotation matrix $R$.
 870 For a symmetric and diagonally-dominant matrix $A$ of size $3N \times 3N$
 871 the square root can be calculated as:
 872 \beq
 873 A^\frac{1}{2} =
 874 R \, \mbox{diag}(\lambda_1^\frac{1}{2},\lambda_2^\frac{1}{2},\ldots,\lambda_{3N}^\frac{1}{2}) \, R^T
 875 \eeq
 876 It can be verified easily that the product of this matrix with itself gives
 877 $A$.
 878 Now we can define a difference $d$ between covariance matrices $A$ and $B$
 879 as follows:
 880 \begin{eqnarray}
 881 d(A,B) & = & \sqrt{\mbox{tr}\left(\left(A^\frac{1}{2} - B^\frac{1}{2}\right)^2\right)
 882 }
 883 \\ & = &
 884 \sqrt{\mbox{tr}\left(A + B - 2 A^\frac{1}{2} B^\frac{1}{2}\right)}
 885 \\ & = &
 886 \left( \sum_{i=1}^N \left( \lambda_i^A + \lambda_i^B \right)
 887 - 2 \sum_{i=1}^N \sum_{j=1}^N \sqrt{\lambda_i^A \lambda_j^B}
 888 \left(R_i^A \cdot R_j^B\right)^2 \right)^\frac{1}{2}
 889 \end{eqnarray}
 890 where tr is the trace of a matrix.
 891 We can now define the overlap $s$ as:
 892 \beq
 893 s(A,B) = 1 - \frac{d(A,B)}{\sqrt{\mbox{tr}A + \mbox{tr} B}}
 894 \eeq
 895 The overlap is 1 if and only if matrices $A$ and $B$ are identical.
 896 It is 0 when the sampled subspaces are completely orthogonal.
 897
 898 A commonly-used measure is the subspace overlap of the first few
 899 eigenvectors of covariance matrices.
 900 The overlap of the subspace spanned by $m$ orthonormal vectors
 901 ${\bf w}_1,\ldots,{\bf w}_m$ with a reference subspace spanned by
 902 $n$ orthonormal vectors ${\bf v}_1,\ldots,{\bf v}_n$
 903 can be quantified as follows:
 904 \beq
 905 \mbox{overlap}({\bf v},{\bf w}) =
 906 \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m ({\bf v}_i \cdot {\bf w}_j)^2
 907 \eeq
 908 The overlap will increase with increasing $m$ and will be 1 when
 909 set ${\bf v}$ is a subspace of set ${\bf w}$.
 910 The disadvantage of this method is that it does not take the eigenvalues
 911 into account. All eigenvectors are weighted equally, and when
 912 degenerate subspaces are present (equal eigenvalues), the calculated overlap
 913 will be too low.
 914
 915 Another useful check is the cosine content. It has been proven that the
 916 the principal components of random diffusion are cosines with the number of
 917 periods equal to half the principal component index~\cite{Hess2000,Hess2002b}.
 918 The eigenvalues are proportional to the index to the power $-2$.
 919 The cosine content is defined as:
 920 \beq
 921 \frac{2}{T}
 922 \left( \int_0^T \cos\left(\frac{i \pi t}{T}\right) \, p_i(t) \mbox{d} t \right)^2
 923 \left( \int_0^T p_i^2(t) \mbox{d} t \right)^{-1}
 924 \eeq
 925 When the cosine content of the first few principal components
 926 is close to 1, the largest fluctuations are not connected with
 927 the potential, but with random diffusion.
 928
 929 The covariance matrix is built and diagonalized by
 930 {\tt \normindex{gmx covar}}.
 931 The principal components and overlap (and many more things)
 932 can be plotted and analyzed with {\tt \normindex{gmx anaeig}}.
 933 The cosine content can be calculated with {\tt \normindex{gmx analyze}}.
 934
 935 \section{Dihedral principal component analysis}
 936 {\tt gmx angle, gmx covar, gmx anaeig}\\
 937 Principal component analysis can be performed in dihedral
 938 space~\cite{Mu2005a} using {\gromacs}. You start by defining the
 939 dihedral angles of interest in an index file, either using {\tt
 940  gmx mk_angndx} or otherwise. Then you use the {\tt gmx angle} program
 941 with the {\tt -or} flag to produce a new {\tt .trr} file containing the cosine and
 942 sine of each dihedral angle in two coordinates, respectively. That is,
 943 in the {\tt .trr} file you will have a series of numbers corresponding to:
 944 cos($\phi_1$), sin($\phi_1$), cos($\phi_2$), sin($\phi_2$), ...,
 945 cos($\phi_n$), sin($\phi_n$), and the array is padded with zeros, if
 946 necessary.  Then you can use this {\tt .trr} file as input for the {\tt
 947  gmx covar} program and perform principal component analysis as usual.
 948 For this to work you will need to generate a reference file ({\tt .tpr},
 949 {\tt .gro}, {\tt .pdb} etc.) containing the same number of ``atoms''
 950 as the new {\tt .trr} file, that is for $n$ dihedrals you need 2$n$/3 atoms
 951 (rounded up if not an integer number).
 952 You should use the {\tt -nofit} option for {\tt
 953 gmx covar} since the coordinates in the dummy reference file do not
 954 correspond in any way to the information in the {\tt .trr} file. Analysis of
 955 the results is done using {\tt gmx anaeig}.
 956
 957 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Hydrogen bonds
 958
 959 \section{Hydrogen bonds}
 960 {\tt gmx hbond}\\
 961 The program {\tt \normindex{gmx hbond}} analyzes the {\em hydrogen bonds} (H-bonds)
 962 between all possible donors D and acceptors A. To determine if an
 963 H-bond exists, a geometrical criterion is used, see also
 964 \figref{hbond}:
 965 \beq
 966 \begin{array}{rclcl}
 967 r       & \leq  & r_{HB}        & = & 0.35~\mbox{nm}    \\
 968 \alpha  & \leq  & \alpha_{HB}   & = & 30^o              \\
 969 \end{array}
 970 \eeq
 971
 972 \begin{figure}
 973 \centerline{\includegraphics[width=2.5cm,angle=270]{plots/hbond}}
 974 \caption{Geometrical Hydrogen bond criterion.}
 975 \label{fig:hbond}
 976 \end{figure}
 977
 978 The value of $r_{HB} = 0.35$~nm corresponds to the first minimum of the RDF of
 979 SPC water (see also \figref{rdf}).
 980
 981 The program {\tt gmx hbond} analyzes all hydrogen bonds existing
 982 between two groups of atoms (which must be either identical or
 983 non-overlapping) or in specified donor-hydrogen-acceptor triplets, in
 984 the following ways:
 985
 986 \begin{figure}
 987 \centerline{
 988 {\includegraphics[width=5cm,angle=270]{plots/hbond-insert}}}
 989 \caption[Insertion of water into an H-bond.]{Insertion of water into
 990 an H-bond. (1) Normal H-bond between two residues. (2) H-bonding
 991 bridge via a water molecule.}
 992 \label{fig:insert}
 993 \end{figure}
 994
 995 \begin{itemize}
 996 \item
 997 Donor-Acceptor distance ($r$) distribution of all H-bonds
 998 \item
 999 Hydrogen-Donor-Acceptor angle ($\alpha$) distribution of all H-bonds
1000 \item
1001 The total number of H-bonds in each time frame
1002 \item
1003 \newcommand{\nn}[1]{$n$-$n$+$#1$}
1004 The number of H-bonds in time between residues, divided into groups
1005 \nn{i} where $n$ and $n$+$i$ stand for residue numbers and $i$ goes
1006 from 0 to 6. The group for $i=6$ also includes all H-bonds for
1007 $i>6$. These groups include the \nn{3}, \nn{4} and \nn{5} H-bonds,
1008 which provide a measure for the formation of $\alpha$-helices or
1009 $\beta$-turns or strands.
1010 \item
1011 The lifetime of the H-bonds is calculated from the average over all
1012 autocorrelation functions of the existence functions (either 0 or 1)
1013 of all H-bonds:
1014 \beq
1015 C(\tau) ~=~ \langle s_i(t)~s_i (t + \tau) \rangle
1016 \label{eqn:hbcorr}
1017 \eeq
1018 with $s_i(t) = \{0,1\}$ for H-bond $i$ at time $t$. The integral of
1019 $C(\tau)$ gives a rough estimate of the average H-bond lifetime
1020 $\tau_{HB}$:
1021 \beq
1022 \tau_{HB} ~=~ \int_{0}^{\infty} C(\tau) d\tau
1023 \label{eqn:hblife}
1024 \eeq
1025 Both the integral and the complete autocorrelation function $C(\tau)$
1026 will be output, so that more sophisticated analysis ({\eg}\@ using
1027 multi-exponential fits) can be used to get better estimates for
1028 $\tau_{HB}$. A more complete analysis is given in ref.~\cite{Spoel2006b};
1029 one of the more fancy option is the Luzar and Chandler analysis
1030 of hydrogen bond kinetics~\cite{Luzar96b,Luzar2000a}.
1031 \item
1032 An H-bond existence map can be generated of dimensions {\em
1033 \#~H-bonds}$\times${\em \#~frames}. The ordering is identical to the index
1034 file (see below), but reversed, meaning that the last triplet in the index
1035 file corresponds to the first row of the existence map.
1036 \item
1037 Index groups are output containing the analyzed groups, all
1038 donor-hydrogen atom pairs and acceptor atoms in these groups,
1039 donor-hydrogen-acceptor triplets involved in hydrogen bonds between
1040 the analyzed groups and all solvent atoms involved in insertion.
1041
1042 \end{itemize}
1043
1044 %
1045 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Protein related items
1046
1047 \section{Protein-related items}
1048 {\tt gmx do_dssp, gmx rama, gmx wheel}\\
1049 To analyze structural changes of a protein, you can calculate the radius of
1050 gyration or the minimum residue distances over time
1051 (see \secref{rg}), or calculate the RMSD (\secref{rmsd}).
1052
1053 You can also look at the changing of {\em secondary structure elements}
1054 during your run. For this, you can use the program {\tt \normindex{gmx do_dssp}}, which is
1055 an interface for the commercial program {\tt DSSP}~\cite{Kabsch83}. For
1056 further information, see the {\tt DSSP} manual. A typical output plot of
1057 {\tt gmx do_dssp} is given in \figref{dssp}.
1058
1059 \begin{figure}
1060 \centerline{
1061 \includegraphics[width=12cm]{plots/dssp}}
1062 \caption{Analysis of the secondary structure elements of a peptide in time.}
1063 \label{fig:dssp}
1064 \end{figure}
1065
1066 One other important analysis of proteins is the so-called
1067 {\em Ramachandran plot}.
1068 This is the projection of the structure on the two dihedral angles $\phi$ and
1069 $\psi$ of the protein backbone, see \figref{phipsi}.
1070
1071 \begin{figure}
1072 \centerline{
1073 \includegraphics[width=5cm]{plots/phipsi}}
1074 \caption{Definition of the dihedral angles $\phi$ and $\psi$ of the protein backbone.}
1075 \label{fig:phipsi}
1076 \end{figure}
1077
1078 To evaluate this Ramachandran plot you can use the program {\tt
1079 \normindex{gmx rama}}.
1080 A typical output is given in \figref{rama}.
1081
1082 \begin{figure}
1083 \centerline{
1084 {\includegraphics[width=8cm]{plots/rama}}}
1085 \caption{Ramachandran plot of a small protein.}
1086 \label{fig:rama}
1087 \end{figure}
1088
1089 When studying $\alpha$-helices
1090 it is useful to have a {\em helical wheel} projection
1091 of your peptide, to see whether a peptide is amphipathic. This can be done
1092 using the {\tt \normindex{gmx wheel}} program. Two examples are
1093 plotted in \figref{wheel}.
1094
1095 \begin{figure}
1096 \centerline{\includegraphics[width=\htw]{plots/hpr-wheel}}
1097 \caption{Helical wheel projection of the N-terminal helix of HPr.}
1098 \label{fig:wheel}
1099 \end{figure}
1100
1101 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Membrane Related Items
1102
1103 \section{Interface-related items}
1104 {\tt gmx order, gmx density, gmx potential, gmx traj}\\
1105 When simulating molecules with long carbon tails, it can be
1106 interesting to calculate their average orientation. There are several
1107 flavors of order parameters, most of which are related. The program
1108 {\tt \normindex{gmx order}} can calculate order parameters using the equation:
1109
1110 \begin{equation}
1111 S_{z} = \frac{3}{2}\langle {\cos^2{\theta_z}} \rangle - \frac{1}{2}
1112 \label{eqn:Sgr}
1113 \end{equation}
1114 where $\theta_z$ is the angle between the $z$-axis of the simulation
1115 box and the molecular axis under consideration. The latter is defined as the
1116 vector from C$_{n-1}$ to C$_{n+1}$. The parameters $S_x$
1117 and $S_y$ are defined in the same way. The brackets imply averaging over time
1118 and molecules. Order parameters can vary between 1 (full order along
1119 the interface normal) and $-1/2$ (full order perpendicular to the
1120 normal), with a value of zero in the case of isotropic orientation.
1121
1122 The program can do two things for you. It can calculate the order
1123 parameter for each CH$_2$ segment separately, for any of three axes,
1124 or it can divide the box in slices and calculate the average value of
1125 the order parameter per segment in one slice. The first method gives
1126 an idea of the ordering of a molecule from head to tail, the second
1127 method gives an idea of the ordering as function of the box length.
1128
1129 The electrostatic potential ($\psi$) across the interface can be
1130 computed from a trajectory by evaluating the double integral of the
1131 charge density ($\rho(z)$):
1132 \beq
1133 \psi(z) - \psi(-\infty) = - \int_{-\infty}^z dz' \int_{-\infty}^{z'} \rho(z'')dz''/ \epsilon_0
1134 \label{eqn:elpotgr}
1135 \eeq
1136 where the position $z=-\infty$ is far enough in the bulk phase such that
1137 the field is zero.  With this method, it is possible to ``split'' the
1138 total potential into separate contributions from lipid and water
1139 molecules. The program {\tt \normindex{gmx potential}} divides the box in slices and
1140 sums all charges of the atoms in each slice. It then integrates this
1141 charge density to give the electric field, which is in turn integrated to
1142 give the potential. Charge density, electric field, and potential are written
1143 to {\tt xvgr} input files.
1144
1145 The program {\tt \normindex{gmx traj}} is a very simple analysis program. All it
1146 does is print the coordinates, velocities, or forces of selected atoms.
1147 It can also calculate the center of mass of one or more
1148 molecules and print the coordinates of the center of mass to three
1149 files. By itself, this is probably not a very useful analysis, but
1150 having the coordinates of selected molecules or atoms can be very
1151 handy for further analysis, not only in interfacial systems.
1152
1153 The program {\tt \normindex{gmx density}} calculates the mass density of
1154 groups and gives a plot of the density against a box
1155 axis. This is useful for looking at the distribution of groups or
1156 atoms across the interface.
1157
1158 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Chemical shifts
1159
1160 % \section{Chemical shifts}
1161 % {\tt total, do_shift}\\
1162 % You can compute the NMR chemical shifts of protons with the program
1163 % {\tt \normindex{do_shift}}. This is just an {\gromacs} interface to
1164 % the public domain program {\tt total}~\cite{Williamson93a}. For
1165 % further information, read the article. Although there is limited
1166 % support for this in {\gromacs}, users are encouraged to use the
1167 % software provided by David Case's group at Scripps because it seems
1168 % to be more up-to-date.
1169
1170 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1171
1172 % LocalWords:  Xmgr ndx mk angndx rdf dihedrals grompp hydrogens MainChain Prot
1173 % LocalWords:  oxygens SideChain tryptophan vsitetop aminoacids dat ngmx dr SPC
1174 % LocalWords:  OO ACF fg xmgr corrmd ACFs kM iM FFT th NMR nd corrleg rotacf dt
1175 % LocalWords:  velacc Kubo msd Tildesley corrtime msdwater sgangle cis dih xpm
1176 % LocalWords:  mindist mdmat rms rmsdist rmsd jj diag eigenvectors covar anaeig
1177 % LocalWords:  macromolecule trr tpr gro pdb nofit hbond rclcl HB multi Luzar
1178 % LocalWords:  dssp rama xrama rg Ramachandran phipsi amphipathic HPr xvgr pvd
1179 % LocalWords:  Scripps RDF online usinggroups mdrun groupconcept HOH
1180 % LocalWords:  residuetypes determinding OpenGL traj ov rcl ij ps nm
1181 % LocalWords:  autocorrelation interfacial pairdist