docs/manual/algorithms.tex

   1 %
   2 % This file is part of the GROMACS molecular simulation package.
   3 %
   4 % Copyright (c) 2013,2014,2015, by the GROMACS development team, led by
   5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
   6 % and including many others, as listed in the AUTHORS file in the
   7 % top-level source directory and at http://www.gromacs.org.
   8 %
   9 % GROMACS is free software; you can redistribute it and/or
  10 % modify it under the terms of the GNU Lesser General Public License
  11 % as published by the Free Software Foundation; either version 2.1
  12 % of the License, or (at your option) any later version.
  13 %
  14 % GROMACS is distributed in the hope that it will be useful,
  15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
  16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  17 % Lesser General Public License for more details.
  18 %
  19 % You should have received a copy of the GNU Lesser General Public
  20 % License along with GROMACS; if not, see
  21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
  22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
  23 %
  24 % If you want to redistribute modifications to GROMACS, please
  25 % consider that scientific software is very special. Version
  26 % control is crucial - bugs must be traceable. We will be happy to
  27 % consider code for inclusion in the official distribution, but
  28 % derived work must not be called official GROMACS. Details are found
  29 % in the README & COPYING files - if they are missing, get the
  30 % official version at http://www.gromacs.org.
  31 %
  32 % To help us fund GROMACS development, we humbly ask that you cite
  33 % the research papers on the package. Check out http://www.gromacs.org.
  34
  35 \newcommand{\nproc}{\mbox{$M$}}
  36 \newcommand{\natom}{\mbox{$N$}}
  37 \newcommand{\nx}{\mbox{$n_x$}}
  38 \newcommand{\ny}{\mbox{$n_y$}}
  39 \newcommand{\nz}{\mbox{$n_z$}}
  40 \newcommand{\nsgrid}{NS grid}
  41 \newcommand{\fftgrid}{FFT grid}
  42 \newcommand{\dgrid}{\mbox{$\delta_{grid}$}}
  43 \newcommand{\bfv}[1]{{\mbox{\boldmath{$#1$}}}}
  44 % non-italicized boldface for math (e.g. matrices)
  45 \newcommand{\bfm}[1]{{\bf #1}}
  46 \newcommand{\dt}{\Delta t}
  47 \newcommand{\rv}{\bfv{r}}
  48 \newcommand{\vv}{\bfv{v}}
  49 \newcommand{\F}{\bfv{F}}
  50 \newcommand{\pb}{\bfv{p}}
  51 \newcommand{\veps}{v_{\epsilon}}
  52 \newcommand{\peps}{p_{\epsilon}}
  53 \newcommand{\sinhx}[1]{\frac{\sinh{\left( #1\right)}}{#1}}
  54 \chapter{Algorithms}
  55 \label{ch:algorithms}
  56 \section{Introduction}
  57 In this chapter we first describe some general concepts used in
  58 {\gromacs}:  {\em periodic boundary conditions} (\secref{pbc})
  59 and the {\em group concept} (\secref{groupconcept}). The MD algorithm is
  60 described in \secref{MD}: first a global form of the algorithm is
  61 given, which is refined in subsequent subsections. The (simple) EM
  62 (Energy Minimization) algorithm is described in \secref{EM}. Some
  63 other algorithms for special purpose dynamics are described after
  64 this.
  65
  66 %\ifthenelse{\equal{\gmxlite}{1}}{}{
  67 %In the final \secref{par} of this chapter a few principles are
  68 %given on which parallelization of {\gromacs} is based. The
  69 %parallelization is hardly visible for the user and is therefore not
  70 %treated in detail.
  71 %} % Brace matches ifthenelse test for gmxlite
  72
  73 A few issues are of general interest. In all cases the {\em system}
  74 must be defined, consisting of molecules. Molecules in turn consist of
  75 particles with defined interaction functions. The detailed
  76 description of the {\em topology} of the molecules and of the {\em force
  77 field} and the calculation of forces are given in
  78 \chref{ff}. In the present chapter we describe
  79 other aspects of the algorithm, such as pair list generation, update of
  80 velocities and positions, coupling to external temperature and
  81 pressure, and conservation of constraints.
  82 \ifthenelse{\equal{\gmxlite}{1}}{}{
  83 The {\em analysis} of the data generated by an MD simulation is treated in \chref{analysis}.
  84 } % Brace matches ifthenelse test for gmxlite
  85
  86 \section{Periodic boundary conditions\index{periodic boundary conditions}}
  87 \label{sec:pbc}
  88 \begin{figure}
  89 \centerline{\includegraphics[width=9cm]{plots/pbctric}}
  90 \caption {Periodic boundary conditions in two dimensions.}
  91 \label{fig:pbc}
  92 \end{figure}
  93 The classical way to minimize edge effects in a finite system is to
  94 apply {\em periodic boundary conditions}. The atoms of the system to
  95 be simulated are put into a space-filling box, which is surrounded by
  96 translated copies of itself (\figref{pbc}).  Thus there are no
  97 boundaries of the system; the artifact caused by unwanted boundaries
  98 in an isolated cluster is now replaced by the artifact of periodic
  99 conditions. If the system is crystalline, such boundary conditions are
 100 desired (although motions are naturally restricted to periodic motions
 101 with wavelengths fitting into the box). If one wishes to simulate
 102 non-periodic systems, such as liquids or solutions, the periodicity by
 103 itself causes errors. The errors can be evaluated by comparing various
 104 system sizes; they are expected to be less severe than the errors
 105 resulting from an unnatural boundary with vacuum.
 106
 107 There are several possible shapes for space-filling unit cells. Some,
 108 like the {\em \normindex{rhombic dodecahedron}} and the
 109 {\em \normindex{truncated octahedron}}~\cite{Adams79}, are closer to being a sphere
 110 than a cube is, and are therefore better suited to the
 111 study of an approximately spherical macromolecule in solution, since
 112 fewer solvent molecules are required to fill the box given a minimum
 113 distance between macromolecular images. At the same time, rhombic
 114 dodecahedra and truncated octahedra are special cases of {\em triclinic}
 115 unit cells\index{triclinic unit cell}, the most general space-filling unit cells,
 116 which comprise all possible space-filling shapes~\cite{Bekker95}.
 117 For this reason, {\gromacs} is based on the triclinic unit cell.
 118
 119 {\gromacs} uses periodic boundary conditions, combined with the {\em
 120 \normindex{minimum image convention}}: only one -- the nearest -- image of each
 121 particle is considered for short-range non-bonded interaction terms.
 122 For long-range electrostatic interactions this is not always accurate
 123 enough, and {\gromacs} therefore also incorporates lattice sum methods
 124 such as Ewald summation, PME and PPPM.
 125
 126 {\gromacs} supports triclinic boxes of any shape.
 127 The simulation box (unit cell) is defined by the 3 box vectors
 128 ${\bf a}$, ${\bf b}$ and ${\bf c}$.
 129 The box vectors must satisfy the following conditions:
 130 \beq
 131 \label{eqn:box_rot}
 132 a_y = a_z = b_z = 0
 133 \eeq
 134 \beq
 135 \label{eqn:box_shift1}
 136 a_x>0,~~~~b_y>0,~~~~c_z>0
 137 \eeq
 138 \beq
 139 \label{eqn:box_shift2}
 140 |b_x| \leq \frac{1}{2} \, a_x,~~~~
 141 |c_x| \leq \frac{1}{2} \, a_x,~~~~
 142 |c_y| \leq \frac{1}{2} \, b_y
 143 \eeq
 144 Equation (\ref{eqn:box_rot}) can always be satisfied by rotating the box.
 145 Inequalities (\ref{eqn:box_shift1}) and (\ref{eqn:box_shift2}) can always be
 146 satisfied by adding and subtracting box vectors.
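
As an illustration (a worked sketch with made-up numbers, not a box taken
from an actual simulation), consider a box whose second vector violates
inequality (\ref{eqn:box_shift2}). Subtracting ${\bf a}$ once gives an
equivalent lattice vector that satisfies it:
\beq
{\bf b} = (0.8\,a_x,~b_y,~0)
~~\longrightarrow~~
{\bf b}' = {\bf b} - {\bf a} = (-0.2\,a_x,~b_y,~0),
~~~~|b'_x| = 0.2\,a_x \leq \frac{1}{2}\,a_x
\eeq
Because condition (\ref{eqn:box_rot}) makes the matrix of box vectors
triangular, the box volume reduces to the product of the diagonal elements,
and it is not changed by such a shift:
\beq
V = {\bf a} \cdot ({\bf b} \times {\bf c}) = a_x\,b_y\,c_z
\eeq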
 147
 148 Even when simulating using a triclinic box, {\gromacs} always keeps the
 149 particles in a brick-shaped volume for efficiency,
 150 as illustrated in \figref{pbc} for a 2-dimensional system.
 151 Therefore, from the output trajectory it might seem that the simulation was
 152 done in a rectangular box. The program {\tt trjconv} can be used to convert
 153 the trajectory to a different unit-cell representation.
 154
 155 It is also possible to simulate without periodic boundary conditions,
 156 but it is usually more efficient to simulate an isolated cluster of molecules
 157 in a large periodic box, since fast grid searching can only be used
 158 in a periodic system.
 159
 160 \begin{figure}
 161 \centerline{
 162 \includegraphics[width=5cm]{plots/rhododec}
 163 ~~~~\includegraphics[width=5cm]{plots/truncoct}
 164 }
 165 \caption {A rhombic dodecahedron and truncated octahedron
 166 (arbitrary orientations).}
 167 \label{fig:boxshapes}
 168 \end{figure}
 169
 170 \subsection{Some useful box types}
 171 \begin{table}
 172 \centerline{
 173 \begin{tabular}{|c|c|c|ccc|ccc|}
 174 \dline
 175 box type & image & box & \multicolumn{3}{c|}{box vectors} & \multicolumn{3}{c|}{box vector angles} \\
 176  & distance & volume & ~{\bf a}~ & {\bf b} & {\bf c} &
 177    $\angle{\bf bc}$ & $\angle{\bf ac}$ & $\angle{\bf ab}$ \\
 178 \dline
 179              &     &       & $d$ & 0              & 0              & & & \\
 180 cubic        & $d$ & $d^3$ & 0   & $d$            & 0              & $90^\circ$ & $90^\circ$ & $90^\circ$ \\
 181              &     &       & 0   & 0              & $d$            & & & \\
 182 \hline
 183 rhombic      &     &       & $d$ & 0              & $\frac{1}{2}\,d$ & & & \\
 184 dodecahedron & $d$ & $\frac{1}{2}\sqrt{2}\,d^3$ & 0   & $d$            & $\frac{1}{2}\,d$ & $60^\circ$ & $60^\circ$ & $90^\circ$ \\
 185 (xy-square)  &     & $0.707\,d^3$ & 0   & 0              & $\frac{1}{2}\sqrt{2}\,d$ & & & \\
 186 \hline
 187 rhombic      &     &       & $d$ & $\frac{1}{2}\,d$ & $\frac{1}{2}\,d$ & & & \\
 188 dodecahedron & $d$ & $\frac{1}{2}\sqrt{2}\,d^3$ & 0 & $\frac{1}{2}\sqrt{3}\,d$ & $\frac{1}{6}\sqrt{3}\,d$ & $60^\circ$ & $60^\circ$ & $60^\circ$ \\
 189 (xy-hexagon) &     & $0.707\,d^3$ & 0   & 0              & $\frac{1}{3}\sqrt{6}\,d$ & & & \\
 190 \hline
 191 truncated    &     &       & $d$ & $\frac{1}{3}\,d$ & $-\frac{1}{3}\,d$ & & &\\
 192 octahedron   & $d$ & $\frac{4}{9}\sqrt{3}\,d^3$ & 0   & $\frac{2}{3}\sqrt{2}\,d$ & $\frac{1}{3}\sqrt{2}\,d$ & $70.53^\circ$ & $109.47^\circ$ & $70.53^\circ$ \\
 193              &     & $0.770\,d^3$ & 0   & 0              & $\frac{1}{3}\sqrt{6}\,d$ & & & \\
 194 \dline
 195 \end{tabular}
 196 }
 197 \caption{The cubic box, the rhombic \normindex{dodecahedron} and the truncated
 198 \normindex{octahedron}.}
 199 \label{tab:boxtypes}
 200 \end{table}
 201 The three most useful box types for simulations of solvated systems
 202 are described in \tabref{boxtypes}.  The rhombic dodecahedron
 203 (\figref{boxshapes}) is the smallest and most regular space-filling
 204 unit cell. Each of the 12 image cells is at the same distance.  The
 205 volume is 71\% of the volume of a cube having the same image
 206 distance. This saves about 29\% of CPU-time when simulating a
 207 spherical or flexible molecule in solvent. There are two different
 208 orientations of a rhombic dodecahedron that satisfy equations
 209 \ref{eqn:box_rot}, \ref{eqn:box_shift1} and \ref{eqn:box_shift2}.
 210 The program {\tt editconf} produces the orientation
 211 which has a square intersection with the xy-plane.  This orientation
 212 was chosen because the first two box vectors coincide with the x and
 213 y-axis, which is easier to comprehend. The other orientation can be
 214 useful for simulations of membrane proteins. In this case the
 215 cross-section with the xy-plane is a hexagon, whose area
 216 is 14\% smaller than the area of a square with the same image
 217 distance.  The height of the box ($c_z$) should be changed to obtain
 218 an optimal spacing.  This box shape not only saves CPU time, but
 219 also results in a more uniform arrangement of the proteins.
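
The numbers quoted above can be checked against \tabref{boxtypes}. The
following is only a worked verification. It uses the fact that, for box
vectors satisfying condition (\ref{eqn:box_rot}), the volume is
$a_x b_y c_z$ and the area of the two-dimensional cell spanned by
${\bf a}$ and ${\bf b}$ is $a_x b_y$:
\begin{eqnarray}
V_\mathrm{dodecahedron} &=& d \cdot d \cdot \frac{1}{2}\sqrt{2}\,d
 ~=~ \frac{1}{2}\sqrt{2}\,d^3 \approx 0.707\,d^3 \\
V_\mathrm{octahedron} &=& d \cdot \frac{2}{3}\sqrt{2}\,d \cdot \frac{1}{3}\sqrt{6}\,d
 ~=~ \frac{4}{9}\sqrt{3}\,d^3 \approx 0.770\,d^3 \\
A_\mathrm{hexagon} &=& d \cdot \frac{1}{2}\sqrt{3}\,d \approx 0.866\,d^2
\end{eqnarray}
The last value lies about 13.4\% below the area $d^2$ of the square
cross-section, which corresponds to the roughly 14\% mentioned above.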
 220
 221 \subsection{Cut-off restrictions}
 222 The \normindex{minimum image convention} implies that the cut-off radius used to
 223 truncate non-bonded interactions may not exceed half the shortest box
 224 vector:
 225 \beq
 226 \label{eqn:physicalrc}
 227   R_c < \half \min(\|{\bf a}\|,\|{\bf b}\|,\|{\bf c}\|),
 228 \eeq
 229 because otherwise more than one image of a particle could lie within the
 230 cut-off distance. When a macromolecule, such as a protein, is studied in
 231 solution, this restriction alone is not sufficient: in principle, a single
 232 solvent molecule should not be able
 233 to `see' both sides of the macromolecule. This means that the length of
 234 each box vector must exceed the length of the macromolecule in the
 235 direction of that edge {\em plus} two times the cut-off radius $R_c$.
 236 It is, however, common to compromise in this respect, and make the solvent
 237 layer somewhat smaller in order to reduce the computational cost.
 238 For efficiency reasons the cut-off with triclinic boxes is more restricted.
 239 For grid search the extra restriction is weak:
 240 \beq
 241 \label{eqn:gridrc}
 242 R_c < \min(a_x,b_y,c_z)
 243 \eeq
 244 For simple search the extra restriction is stronger:
 245 \beq
 246 \label{eqn:simplerc}
 247 R_c < \half \min(a_x,b_y,c_z)
 248 \eeq
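
As a worked example with hypothetical numbers, take the rhombic
dodecahedron (xy-square) of \tabref{boxtypes} with an image distance
$d = 7$~nm, so that $\|{\bf a}\| = \|{\bf b}\| = \|{\bf c}\| = 7$~nm,
$a_x = b_y = 7$~nm and $c_z = \frac{1}{2}\sqrt{2}\,d \approx 4.95$~nm.
The three restrictions then read:
\begin{eqnarray}
\mbox{minimum image (\ref{eqn:physicalrc}):} & & R_c < \frac{1}{2} \cdot 7~\mbox{nm} = 3.5~\mbox{nm} \\
\mbox{grid search (\ref{eqn:gridrc}):} & & R_c < \min(7,\,7,\,4.95)~\mbox{nm} = 4.95~\mbox{nm} \\
\mbox{simple search (\ref{eqn:simplerc}):} & & R_c < \frac{1}{2} \cdot 4.95~\mbox{nm} \approx 2.47~\mbox{nm}
\end{eqnarray}
In this sketch the grid-search restriction is weaker than the minimum
image condition and adds nothing, whereas with simple search
restriction (\ref{eqn:simplerc}) becomes the limiting one. The same box
would accommodate a macromolecule of 5~nm with $R_c = 1$~nm under the strict
solvent criterion, since $5 + 2 \cdot 1 = 7$~nm.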
 249
 250 Each unit cell (cubic, rectangular or triclinic)
 251 is surrounded by 26 translated images. A
 252 particular image can therefore always be identified by an index pointing to one
 253 of 27 {\em translation vectors} and constructed by applying a
 254 translation with the indexed vector (see \ssecref{forces}).
 255 Restriction (\ref{eqn:gridrc}) ensures that only 26 images need to be
 256 considered.
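
The count of 27 follows from combining each box vector with a coefficient
of $-1$, 0 or $+1$. In a notation introduced here only for illustration,
the translation vectors can be enumerated as
\beq
{\bf t}_{ijk} = i\,{\bf a} + j\,{\bf b} + k\,{\bf c},~~~~i,j,k \in \{-1,0,1\}
\eeq
which gives $3^3 = 27$ vectors: the null vector ${\bf t}_{000}$ for the
central cell and 26 vectors for the surrounding images.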
 257
 258 %\ifthenelse{\equal{\gmxlite}{1}}{}{
 259 \section{The group concept}
 260 \label{sec:groupconcept}\index{group}
 261 The {\gromacs} MD and analysis programs use user-defined {\em groups} of
 262 atoms to perform certain actions on. The maximum number of groups is
 263 256, but each atom can only belong to six different groups, one
 264 each of the following:
 265 \begin{description}
 266 \item[\swapindex{temperature-coupling}{group}]
 267 The \normindex{temperature coupling} parameters (reference
 268 temperature, time constant, number of degrees of freedom, see
 269 \ssecref{update}) can be defined for each T-coupling group
 270 separately. For example, in a solvated macromolecule the solvent (which
 271 tends to generate more heating by force and integration errors) can be
 272 coupled to a bath with a shorter time constant than the macromolecule,
 273 or a surface can be kept cooler than an adsorbing molecule. Many
 274 different T-coupling groups may be defined. See also center of mass
 275 groups below.
 276
 277 \item[\swapindex{freeze}{group}\index{frozen atoms}]
 278 Atoms that belong to a freeze group are kept stationary in the
 279 dynamics. This is useful during equilibration, {\eg} to avoid badly
 280 placed solvent molecules giving unreasonable kicks to protein atoms,
 281 although the same effect can also be obtained by putting a restraining
 282 potential on the atoms that must be protected. The freeze option can
 283 be used, if desired, on just one or two coordinates of an atom,
 284 thereby freezing the atoms in a plane or on a line.  When an atom is
 285 partially frozen, constraints will still be able to move it, even in a
 286 frozen direction. A fully frozen atom cannot be moved by constraints.
 287 Many freeze groups can be defined.  Frozen coordinates are unaffected
 288 by pressure scaling; in some cases this can produce unwanted results,
 289 particularly when constraints are also used (in this case you will
 290 get very large pressures). Accordingly, it is recommended to avoid
 291 combining freeze groups with constraints and pressure coupling. For the
 292 sake of equilibration it could suffice to start with freezing in a
 293 constant volume simulation, and afterward use position restraints in
 294 conjunction with constant pressure.
 295
 296 \item[\swapindex{accelerate}{group}]
 297 On each atom in an ``accelerate group'' an acceleration
 298 $\ve{a}^g$ is imposed. This is equivalent to an external
 299 force. This feature makes it possible to drive the system into a
 300 non-equilibrium state, which enables
 301 \swapindex{non-equilibrium}{MD} and hence the calculation of transport properties.
 302
 303 \item[\swapindex{energy-monitor}{group}]
 304 Mutual interactions between all energy-monitor groups are compiled
 305 during the simulation. This is done separately for Lennard-Jones and
 306 Coulomb terms.  In principle up to 256 groups could be defined, but
 307 that would lead to 256$\times$256 items! It is better to use this concept
 308 sparingly.
 309
 310 All non-bonded interactions between pairs of energy-monitor groups can
 311 be excluded\index{exclusions}
 312 \ifthenelse{\equal{\gmxlite}{1}}
 313 {.}
 314 {(see details in the User Guide).}
 315 Pairs of particles from excluded pairs of energy-monitor groups
 316 are not put into the pair list.
 317 This can result in a significant speedup
 318 for simulations where interactions within or between parts of the system
 319 are not required.
 320
 321 \item[\swapindex{center of mass}{group}\index{removing COM motion}]
 322 In \gromacs\ the center of mass (COM) motion can be removed, for
 323 either the complete system or for groups of atoms. The latter is
 324 useful, {\eg} for systems where there is limited friction ({\eg} gas
 325 systems) to prevent center of mass motion from occurring. It makes sense to
 326 use the same groups for temperature coupling and center of mass motion
 327 removal.
 328
 329 \item[\swapindex{Compressed position output}{group}]
 330
 331 In order to further reduce the size of the compressed trajectory file
 332 ({\tt .xtc{\index{XTC}}} or {\tt .tng{\index{TNG}}}), it is possible
 333 to store only a subset of all particles. All x-compression groups that
 334 are specified are saved, the rest are not. If no such groups are
 335 specified, then all atoms are saved to the compressed trajectory file.
 336
 337 \end{description}
 338 The use of groups in {\gromacs} tools is described in
 339 \secref{usinggroups}.
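
To see why energy-monitor groups are best used sparingly, note that the
interaction energy between groups $A$ and $B$ is the same as that between
$B$ and $A$. A simple counting argument (not a statement about how the
energy file is organized) then gives the number of distinct group pairs
for $n$ groups, with self-pairs included:
\beq
n_\mathrm{pairs} = \frac{n(n+1)}{2}
\eeq
For $n = 4$ this amounts to 10 pairs, but for $n = 256$ it is already
32\,896 pairs, each of which is accumulated separately for the
Lennard-Jones and Coulomb terms.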
 340 %} % Brace matches ifthenelse test for gmxlite
 341
 342 \section{Molecular Dynamics}
 343 \label{sec:MD}
 344 \begin{figure}
 345 \begin{center}
 346 \addtolength{\fboxsep}{0.5cm}
 347 \begin{shadowenv}[12cm]
 348 {\large \bf THE GLOBAL MD ALGORITHM}
 349 \rule{\textwidth}{2pt} \\
 350 {\bf 1. Input initial conditions}\\[2ex]
 351 Potential interaction $V$ as a function of atom positions\\
 352 Positions $\ve{r}$ of all atoms in the system\\
 353 Velocities $\ve{v}$ of all atoms in the system \\
 354 $\Downarrow$\\
 355 \rule{\textwidth}{1pt}\\
 356 {\bf repeat 2,3,4} for the required number of steps:\\
 357 \rule{\textwidth}{1pt}\\
 358 {\bf 2. Compute forces} \\[1ex]
 359 The force on any atom  \\[1ex]
 360 $\ve{F}_i = - \displaystyle\frac{\partial V}{\partial \ve{r}_i}$ \\[1ex]
 361 is computed by calculating the force between non-bonded atom pairs: \\
 362 $\ve{F}_i = \sum_j \ve{F}_{ij}$ \\
 363 plus the forces due to bonded interactions (which may depend on 1, 2,
 364 3, or 4 atoms), plus restraining and/or external forces. \\
 365 The potential and kinetic energies and the pressure tensor may be computed. \\
 366 $\Downarrow$\\
 367 {\bf 3. Update configuration} \\[1ex]
 368 The movement of the atoms is simulated by numerically solving Newton's
 369 equations of motion \\[1ex]
 370 $\displaystyle
 371 \frac {\de^2\ve{r}_i}{\de t^2} = \frac{\ve{F}_i}{m_i} $ \\
 372 or \\
 373 $\displaystyle
 374 \frac{\de\ve{r}_i}{\de t} = \ve{v}_i ; \;\;
 375 \frac{\de\ve{v}_i}{\de t} = \frac{\ve{F}_i}{m_i} $ \\[1ex]
 376 $\Downarrow$ \\
 377 {\bf 4.} if required: {\bf Output step} \\
 378 write positions, velocities, energies, temperature, pressure, etc. \\
 379 \end{shadowenv}
 380 \caption{The global MD algorithm}
 381 \label{fig:global}
 382 \end{center}
 383 \end{figure}
 384 A global flow scheme for MD is given in \figref{global}. Each
 385 MD or  EM run requires as input a set of initial coordinates and --
 386 optionally -- initial velocities of all particles involved. This
 387 chapter does not describe how these are obtained; for the setup of an
 388 actual MD run check the online manual at {\wwwpage}.
 389
 390 \subsection{Initial conditions}
 391 \subsubsection{Topology and force field}
 392 The system topology, including a description of the force field, must
 393 be read in.
 394 \ifthenelse{\equal{\gmxlite}{1}}
 395 {.}
 396 {Force fields and topologies are described in \chref{ff}
 397 and \ref{ch:top}, respectively.}
 398 All this information is static; it is never modified during the run.
 399
 400 \subsubsection{Coordinates and velocities}
 401 \begin{figure}
 402 \centerline{\includegraphics[width=8cm]{plots/maxwell}}
 403 \caption{A Maxwell-Boltzmann velocity distribution, generated from
 404     random numbers.}
 405 \label{fig:maxwell}
 406 \end{figure}
 407
 408 Then, before a run starts, the box size and the coordinates and
 409 velocities of all particles are required. The box size and shape are
 410 determined by three vectors (nine numbers) $\ve{b}_1, \ve{b}_2, \ve{b}_3$,
 411 which represent the three basis vectors of the periodic box.
 412
 413 If the run starts at $t=t_0$, the coordinates at $t=t_0$ must be
 414 known. The {\em leap-frog algorithm}, the default algorithm used to
 415 advance the system by a time step $\Dt$ (see \ssecref{update}), also requires
 416 that the velocities at $t=t_0 - \hDt$ are known. If velocities are not
 417 available, the program can generate initial atomic velocities
 418 $v_i, i=1\ldots 3N$ with a \normindex{Maxwell-Boltzmann distribution}
 419 (\figref{maxwell}) at a given absolute temperature $T$:
 420 \beq
 421 p(v_i) = \sqrt{\frac{m_i}{2 \pi kT}}\exp\left(-\frac{m_i v_i^2}{2kT}\right)
 422 \eeq
 423 where $k$ is Boltzmann's constant (see \chref{defunits}).
 424 To accomplish this, normally distributed random numbers are generated
 425 by adding twelve random numbers $R_k$ in the range $0 \le R_k < 1$ and
 426 subtracting 6.0 from their sum. The result is then multiplied by the
 427 standard deviation of the velocity distribution $\sqrt{kT/m_i}$. Since
 428 the resulting total energy will not correspond exactly to the required
 429 temperature $T$, a correction is made: first the center-of-mass motion
 430 is removed and then all velocities are scaled so that the total
 431 energy corresponds exactly to $T$ (see \eqnref{E-T}).
 432 % Why so complicated? What's wrong with Box-Mueller transforms?
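
The recipe works because a uniform random number on $[0,1)$ has mean
$\frac{1}{2}$ and variance $\frac{1}{12}$, so that the sum of twelve
independent ones has mean 6 and unit variance, and is approximately
Gaussian by the central limit theorem. In a notation used only for this
sketch ($\xi$ is not a symbol from the rest of the manual):
\beq
\xi = \sum_{k=1}^{12} R_k - 6,~~~~
\langle \xi \rangle = 12 \cdot \frac{1}{2} - 6 = 0,~~~~
\langle \xi^2 \rangle = 12 \cdot \frac{1}{12} = 1,~~~~
v_i = \xi \sqrt{\frac{kT}{m_i}}
\eeq
As an illustrative number, for an atom of mass 16~u at $T = 300$~K, with
$kT \approx 2.494$~kJ~mol$^{-1}$, the standard deviation is
$\sqrt{2.494/16} \approx 0.395$~nm~ps$^{-1}$ per velocity component.
Note that $\xi$ is confined to the interval $[-6,6)$, so the far tails of
the Gaussian are truncated.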
 433
 434 \subsubsection{Center-of-mass motion\index{removing COM motion}}
 435 The \swapindex{center-of-mass}{velocity} is normally set to zero at
 436 every step; there is (usually) no net external force acting on the
 437 system and the center-of-mass velocity should remain constant. In
 438 practice, however, the update algorithm introduces a very slow change in
 439 the center-of-mass velocity, and therefore in the total kinetic energy of
 440 the system -- especially when temperature coupling is used. If such
 441 changes are not quenched, an appreciable center-of-mass motion
 442 can develop in long runs, and the temperature will be
 443 significantly misinterpreted. Something similar may happen due to overall
 444 rotational motion, but only when an isolated cluster is simulated. In
 445 periodic systems with filled boxes, the overall rotational motion is
 446 coupled to other degrees of freedom and does not cause such problems.
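
Removing the center-of-mass motion amounts to subtracting the
mass-weighted mean velocity from every particle. Written out as a sketch
of the standard definition (the symbol $\ve{v}_\mathrm{com}$ is introduced
here for illustration only):
\beq
\ve{v}_\mathrm{com} = \frac{\sum_i m_i \ve{v}_i}{\sum_i m_i},~~~~
\ve{v}_i \leftarrow \ve{v}_i - \ve{v}_\mathrm{com}
\eeq
after which the total momentum $\sum_i m_i \ve{v}_i$ vanishes. When the
motion is removed for separate groups of atoms, the sums run over the
atoms of each group.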
 447
 448
 449 \subsection{Neighbor searching\swapindexquiet{neighbor}{searching}}
 450 \label{subsec:ns}
 451 As mentioned in \chref{ff}, internal forces are
 452 either generated from fixed (static) lists, or from dynamic lists.
 453 The latter consist of non-bonded interactions between any pair of particles.
 454 When calculating the non-bonded forces, it is convenient to have all
 455 particles in a rectangular box.
 456 As shown in \figref{pbc}, it is possible to transform a
 457 triclinic box into a rectangular box.
 458 The output coordinates are always in a rectangular box, even when a
 459 dodecahedron or triclinic box was used for the simulation.
 460 Equation \ref{eqn:box_rot} ensures that we can reset particles
 461 in a rectangular box by first shifting them with
 462 box vector ${\bf c}$, then with ${\bf b}$ and finally with ${\bf a}$.
 463 Equations \ref{eqn:box_shift2}, \ref{eqn:physicalrc} and \ref{eqn:gridrc}
 464 ensure that we can find the 14 nearest triclinic images within
 465 a linear combination that does not involve multiples of box vectors.
 466
 467 \subsubsection{Pair list generation}
 468 The non-bonded pair forces need to be calculated only for those pairs
 469 $i,j$  for which the distance $r_{ij}$ between $i$ and the
 470 \swapindex{nearest}{image}
 471 of $j$ is less than a given cut-off radius $R_c$. Some of the particle
 472 pairs that fulfill this criterion are excluded, when their interaction
 473 is already fully accounted for by bonded interactions.  {\gromacs}
 474 employs a {\em pair list} that contains those particle pairs for which
 475 non-bonded forces must be calculated.  The pair list contains particles
 476 $i$, a displacement vector for particle $i$, and all particles $j$ that
 477 are within \verb'rlist' of this particular image of particle $i$.  The
 478 list is updated every \verb'nstlist' steps, where \verb'nstlist' is
 479 typically 10. There is an option to calculate the total non-bonded
 480 force on each particle due to all particles in a shell around the
 481 list cut-off, {\ie} at a distance between \verb'rlist' and
 482 \verb'rlistlong'.  This force is calculated during the pair list update
 483 and  retained during \verb'nstlist' steps.
 484
 485 To make the \normindex{neighbor list}, all particles that are close
 486 ({\ie} within the neighbor list cut-off) to a given particle must be found.
 487 This searching, usually called neighbor search (NS) or pair search,
 488 involves periodic boundary conditions and determining the {\em image}
 489 (see \secref{pbc}). The search algorithm is $O(N)$, although a simpler
 490 $O(N^2)$ algorithm is still available under some conditions.
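
The difference between the two scalings can be made concrete with a rough
estimate. The numbers are illustrative assumptions (a water-like number
density $\rho$ of 100 atoms per nm$^3$ and a list cut-off $r$ of 1~nm), not
values prescribed by {\gromacs}. Checking all pairs among $N$ particles
costs
\beq
\frac{N(N-1)}{2} \approx 5 \times 10^{9}~~~~\mbox{distance checks for}~N = 10^5,
\eeq
whereas the resulting pair list, with every pair counted once, holds only
about
\beq
\frac{N}{2} \cdot \frac{4}{3}\pi\,r^3 \rho \approx
\frac{10^5}{2} \cdot 4.19 \cdot 100 \approx 2 \times 10^{7}~~~~\mbox{pairs},
\eeq
a number that grows linearly with $N$ at fixed density.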
 491
 492 \subsubsection{\normindex{Cut-off schemes}: group versus Verlet}
 493 From version 4.6, {\gromacs} supports two different cut-off scheme
 494 setups: the original one based on particle groups and one using a Verlet
 495 buffer. There are some important differences that affect results,
 496 performance and feature support. The group scheme can be made to work
 497 (almost) like the Verlet scheme, but this will lead to a decrease in
 498 performance. The group scheme is especially fast for water molecules,
 499 which are abundant in many simulations, but on the most recent x86
 500 processors, this advantage is negated by the better instruction-level
 501 parallelism available in the Verlet-scheme implementation. The group
 502 scheme is deprecated in version 5.0, and will be removed in a future
 503 version. For practical details of choosing and setting up
 504 cut-off schemes, please see the User Guide.
 505
 506 In the group scheme, a neighbor list is generated consisting of pairs
 507 of groups of at least one particle. These groups were originally
 508 \swapindex{charge}{group}s \ifthenelse{\equal{\gmxlite}{1}}{}{(see
 509   \secref{chargegroup})}, but with a proper treatment of long-range
 510 electrostatics, performance in unbuffered simulations is their only advantage. A pair of groups
 511 is put into the neighbor list when their centers of geometry are within
 512 the cut-off distance of each other. Interactions between all particle pairs (one from
 513 each charge group) are calculated for a certain number of MD steps,
 514 until the neighbor list is updated. This setup is efficient, as the
 515 neighbor search only checks distances between charge-group pairs, not
 516 particle pairs (saves a factor of $3 \times 3 = 9$ with a three-particle water
 517 model) and the non-bonded force kernels can be optimized for, say, a
 518 water molecule ``group''. Without explicit buffering, this setup leads
 519 to energy drift as some particle pairs which are within the cut-off do not
 520 interact and some outside the cut-off do interact. This can be caused
 521 by
 522 \begin{itemize}
 523 \item particles moving across the cut-off between neighbor search steps, and/or
 524 \item for charge groups consisting of more than one particle, particle pairs
 525   moving in/out of the cut-off when their charge group center of
 526   geometry distance is outside/inside of the cut-off.
 527 \end{itemize}
 528 Explicitly adding a buffer to the neighbor list will remove such
 529 artifacts, but this comes at a high computational cost. How severe the
 530 artifacts are depends on the system, the properties in which you are
 531 interested, and the cut-off setup.
 532
 533 The Verlet cut-off scheme uses a buffered pair list by default. It
 534 also uses clusters of particles, but these are not static as in the group
 535 scheme. Rather, the clusters are defined spatially and consist of 4 or
 536 8 particles, which is convenient for stream computing, using {\eg} SSE, AVX
 537 or CUDA on GPUs. At neighbor search steps, a pair list is created
 538 with a Verlet buffer, {\ie} the pair-list cut-off is larger than the
 539 interaction cut-off. In the non-bonded kernels, interactions are only
 540 computed when a particle pair is within the cut-off distance at that
 541 particular time step. This ensures that as particles move between pair
 542 search steps, forces between nearly all particles within the cut-off
 543 distance are calculated. We say {\em nearly} all particles, because
 544 {\gromacs} uses a fixed pair list update frequency for
 545 efficiency. A particle-pair, whose distance was outside the cut-off,
 546 could possibly move enough during this fixed number of
 547 steps that its distance is now within the cut-off. This
 548 small chance results in a small energy drift, and the size of the
 549 chance depends on the temperature. When temperature
 550 coupling is used, the buffer size can be determined automatically,
 551 given a certain tolerance on the energy drift.
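
To give a feeling for the distances involved, the following
back-of-the-envelope estimate assumes free flight, {\ie} it ignores
interactions and therefore overestimates the displacement. The masses,
time step and pair-list lifetime are hypothetical. A particle whose
velocity along one dimension has standard deviation $\sqrt{k_B T/m}$ is
displaced in a time $t$ over a distance with standard deviation
$t\sqrt{k_B T/m}$; for the relative displacement of two particles the
inverse masses add:
\beq
\sigma = t \sqrt{k_B T \left(\frac{1}{m_1}+\frac{1}{m_2}\right)}
= 0.02~\mbox{ps} \cdot \sqrt{2.494 \cdot \frac{2}{18}}~\mbox{nm~ps}^{-1}
\approx 0.011~\mbox{nm}
\eeq
for two particles of mass 18~u at 300~K and a pair list that is kept for
10 steps of 2~fs. This is the length scale against which the size of the
buffer should be judged.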
 552
 553 The Verlet cut-off scheme is implemented in a very efficient fashion
 554 based on clusters of particles. The simplest example is a cluster size
 555 of 4 particles. The pair list is then constructed based on cluster
 556 pairs. The cluster-pair search is much faster than searching based on
 557 particle pairs, because $4 \times 4 = 16$ particle pairs are put in
 558 the list at once. The non-bonded force calculation kernel can then
 559 calculate many particle-pair interactions at once, which maps nicely
 560 to SIMD or SIMT units on modern hardware, which can perform multiple
 561 floating-point operations at once. These non-bonded kernels
 562 are much faster than the kernels used in the group scheme for most
 563 types of systems, particularly on newer hardware.
 564
 565 \ifthenelse{\equal{\gmxlite}{1}}{}{
 566 \subsubsection{Energy drift and pair-list buffering}
 567 For a canonical (NVT) ensemble, the average energy error caused by the
 568 finite Verlet buffer size can be determined from the atomic
 569 displacements and the shape of the potential at the cut-off.
 570 %Since we are interested in the small drift regime, we will assume
 571 %#that atoms will only move within the cut-off distance in the last step,
 572 %$n_\mathrm{ps}-1$, of the pair list update interval $n_\mathrm{ps}$.
 573 %Over this number of steps the displacment of an atom with mass $m$
 574 The displacement distribution along one dimension for a freely moving
 575 particle with mass $m$ over time $t$ at temperature $T$ is Gaussian
 576 with zero mean and variance $\sigma^2 = t^2\,k_B T/m$. For the distance
 577 between two particles, the variance changes to $\sigma^2 = \sigma_{12}^2 =
 578 t^2\,k_B T(1/m_1+1/m_2)$.  Note that in practice particles usually
 579 interact with other particles over time $t$ and therefore the real
 580 displacement distribution is much narrower.  Given a non-bonded
 581 interaction cut-off distance of $r_c$ and a pair-list cut-off
 582 $r_\ell=r_c+r_b$, we can then write the average energy error after
 583 time $t$ for pair interactions between one particle of type 1
 584 surrounded by particles of type 2 with number density $\rho_2$, when
 585 the inter-particle distance changes from $r_0$ to $r_t$, as:
 586
 587 \begin{eqnarray}
 588 \langle \Delta V \rangle \! &=&
 589 \int_{0}^{r_c} \int_{r_\ell}^\infty 4 \pi r_0^2 \rho_2 V(r_t) G\!\left(\frac{r_t-r_0}{\sigma}\right) d r_0\, d r_t \\
 590 &\approx&
 591 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty 4 \pi r_0^2 \rho_2 \Big[ V'(r_c) (r_t - r_c) +
 592 \nonumber\\
 593 & &
 594 \phantom{\int_{-\infty}^{r_c} \int_{r_\ell}^\infty 4 \pi r_0^2 \rho_2 \Big[}
 595  V''(r_c)\frac{1}{2}(r_t - r_c)^2 \Big] G\!\left(\frac{r_t-r_0}{\sigma}\right) d r_0 \, d r_t\\
 596 &\approx&
 597 4 \pi (r_\ell+\sigma)^2 \rho_2
 598 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty \Big[ V'(r_c) (r_t - r_c) +
 599 \nonumber\\
 600 & &
 601 \phantom{4 \pi (r_\ell+\sigma)^2 \rho_2 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty \Big[}
 602 V''(r_c)\frac{1}{2}(r_t - r_c)^2 +
 603 \nonumber\\
 604 & &
 605 \phantom{4 \pi (r_\ell+\sigma)^2 \rho_2 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty \Big[}
 606 V'''(r_c)\frac{1}{6}(r_t - r_c)^3 \Big] G\!\left(\frac{r_t-r_0}{\sigma}\right)
 607 d r_0 \, d r_t\\
 608 &=&
 609 4 \pi (r_\ell+\sigma)^2 \rho_2 \bigg\{
 610 \frac{1}{2}V'(r_c)\left[r_b \sigma G\!\left(\frac{r_b}{\sigma}\right) - (r_b^2+\sigma^2)E\!\left(\frac{r_b}{\sigma}\right) \right] +
 611 \nonumber\\
 612 & &
 613 \phantom{4 \pi (r_\ell+\sigma)^2 \rho_2 \bigg\{ }
 614 \frac{1}{6}V''(r_c)\left[ \sigma(r_b^2+2\sigma^2)G\!\left(\frac{r_b}{\sigma}\right) - r_b(r_b^2+3\sigma^2 ) E\!\left(\frac{r_b}{\sigma}\right) \right] +
 615 \nonumber\\
 616 & &
 617 \phantom{4 \pi (r_\ell+\sigma)^2 \rho_2 \bigg\{ }
 618 \frac{1}{24}V'''(r_c)\left[ r_b\sigma(r_b^2+5\sigma^2)G\!\left(\frac{r_b}{\sigma}\right) - (r_b^4+6r_b^2\sigma^2+3\sigma^4 ) E\!\left(\frac{r_b}{\sigma}\right) \right]
 619 \bigg\}
 620 \end{eqnarray}
 621
 622 where $G$ is a Gaussian distribution with 0 mean and unit variance and
 623 $E(x)=\frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. We always want to achieve
 624 small energy error, so $\sigma$ will be small compared to both $r_c$
 625 and $r_\ell$, thus the approximations in the equations above are good,
 626 since the Gaussian distribution decays rapidly. The energy error needs
 627 to be averaged over all particle pair types and weighted with the
 628 particle counts. In {\gromacs} we don't allow cancellation of error
 629 between pair types, so we average the absolute values. To obtain the
 630 average energy error per unit time, it needs to be divided by the
 631 neighbor-list life time $t = ({\tt nstlist} - 1)\times{\tt dt}$. This
 632 function cannot be inverted analytically, so we use bisection to
 633 obtain the buffer size $r_b$ for a target drift.  Again we note that
 634 in practice the error will usually be much smaller than this estimate,
 635 as in the condensed phase particle displacements will be much smaller
 636 than for freely moving particles, which is the assumption used here.
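
     As a worked illustration of the pair-list life time (the values are
     chosen here to match those quoted in the caption of
     \figref{verletdrift}), a pair-list update period of 10 steps and a
     time step of 2 fs give
     \beq
     t = ({\tt nstlist} - 1)\times{\tt dt} = (10 - 1)\times 2~\mathrm{fs} = 18~\mathrm{fs}
     \eeq
     so in this case the estimated energy error is divided by 18 fs to
     obtain an error per unit time.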
 637
 638 When (bond) constraints are present, some particles will have fewer
 639 degrees of freedom. This will reduce the energy errors. The
 640 displacement in an arbitrary direction of a particle with 2 degrees of
 641 freedom is not Gaussian, but rather follows the complementary error
 642 function: \beq
 643 \frac{\sqrt{\pi}}{2\sqrt{2}\sigma}\,\mathrm{erfc}\left(\frac{|r|}{\sqrt{2}\,\sigma}\right)
 644 \eeq where $\sigma^2$ is again the variance of the free-particle displacement given above.  This distribution can no
 645 longer be integrated analytically to obtain the energy error. But we
 646 can generate a tight upper bound using a scaled and shifted Gaussian
 647 distribution (not shown). This Gaussian distribution can then be used
 648 to calculate the energy error as described above. We consider
 649 particles constrained, i.e. having 2 degrees of freedom or fewer, when
 650 they are connected by constraints to particles with a total mass of at
 651 least 1.5 times the mass of the particle itself. For a particle with
 652 a single constraint this would give a total mass along the constraint
 653 direction of at least 2.5, which leads to a reduction in the variance
 654 of the displacement along that direction by at least a factor of 6.25.
 655 As the Gaussian distribution decays very rapidly, this effectively
 656 removes one degree of freedom from the displacement. Multiple
 657 constraints would reduce the displacement even more, but as this gets
 658 very complex, we consider those as particles with 2 degrees of
 659 freedom.
 660
 661 There is one important implementation detail that reduces the energy
 662 errors caused by the finite Verlet buffer list size. The derivation
 663 above assumes a particle pair-list. However, the {\gromacs}
 664 implementation uses a cluster pair-list for efficiency. The pair list
 665 consists of pairs of clusters of 4 particles in most cases, also
 666 called a $4 \times 4$ list, but the list can also be $4 \times 8$ (GPU
 667 CUDA kernels and AVX 256-bit single precision kernels) or $4 \times 2$
 668 (SSE double-precision kernels). This means that the pair-list is
 669 effectively much larger than the corresponding $1 \times 1$ list. Thus
 670 slightly beyond the pair-list cut-off there will still be a large
 671 fraction of particle pairs present in the list. This fraction can be
 672 determined in a simulation and accurately estimated under some
 673 reasonable assumptions. The fraction decreases with increasing
 674 pair-list range, meaning that a smaller buffer can be used. For
 675 typical all-atom simulations with a cut-off of 0.9 nm this fraction is
 676 around 0.9, which gives a reduction in the energy errors of a factor of
 677 10. This reduction is taken into account during the automatic Verlet
 678 buffer calculation and results in a smaller buffer size.
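
     The factor of 10 can be read as follows. This is a simplified sketch,
     assuming that the energy error is proportional to the fraction of
     particle pairs just beyond the pair-list cut-off that is missing from
     the list; the symbol $f$ for the fraction that is still present is
     introduced here for illustration only:
     \beq
     \langle \Delta V \rangle_{\mathrm{cluster}} \approx (1-f)\,\langle \Delta V \rangle_{1 \times 1},
     \qquad f = 0.9 ~\Rightarrow~ \frac{1}{1-f} = 10
     \eeq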
 679
 680 \begin{figure}
 681 \centerline{\includegraphics[width=9cm]{plots/verlet-drift}}
 682 \caption {Energy drift per atom for an SPC/E water system at 300K with
 683   a time step of 2 fs and a pair-list update period of 10 steps
 684   (pair-list life time: 18 fs). PME was used with {\tt ewald-rtol} set
 685   to 10$^{-5}$; this parameter affects the shape of the potential at
 686   the cut-off. Error estimates due to finite Verlet buffer size are
 687   shown for a $1 \times 1$ atom pair list and $4 \times 4$ atom pair
 688   list without and with (dashed line) cancellation of positive and
 689   negative errors. Real energy drift is shown for simulations using
 690   double- and mixed-precision settings. Rounding errors in the SETTLE
 691   constraint algorithm from the use of single precision cause
 692   the drift to become negative
 693   at large buffer size. Note that at zero buffer size, the real drift
 694   is small because positive (H-H) and negative (O-H) energy errors
 695   cancel.}
 696 \label{fig:verletdrift}
 697 \end{figure}
 698
 699 In \figref{verletdrift} one can see that for small buffer sizes the drift
 700 of the total energy is much smaller than the pair energy error tolerance,
 701 due to cancellation of errors. For larger buffer size, the error estimate
 702 is a factor of 6 higher than the drift of the total energy, or alternatively
 703 the buffer estimate is 0.024 nm too large. This is because the protons
 704 don't move freely over 18 fs, but rather vibrate.
 705 %At a buffer size of zero there is cancellation of
 706 %drift due to repulsive (H-H) and attractive (O-H) interactions.
 707
 708 \subsubsection{Cut-off artifacts and switched interactions}
 709 With the Verlet scheme, the pair potentials are shifted to be zero at
 710 the cut-off, which makes the potential the integral of the force.
 711 This is only possible in the group scheme if the shape of the potential
 712 is such that its value is zero at the cut-off distance.
 713 However, there can still be energy drift when the
 714 forces are non-zero at the cut-off. This effect is extremely small and
 715 often not noticeable, as other integration errors (e.g. from constraints)
 716 may dominate. To
 717 completely avoid cut-off artifacts, the non-bonded forces can be
 718 switched exactly to zero at some distance smaller than the neighbor
 719 list cut-off (there are several ways to do this in {\gromacs}, see
 720 \secref{mod_nb_int}). One then has a buffer with the size equal to the
 721 neighbor list cut-off less the longest interaction cut-off.
 722
 723 } % Brace matches ifthenelse test for gmxlite
 724
 725 \subsubsection{Simple search\swapindexquiet{simple}{search}}
 726 Due to \eqnsref{box_rot}{simplerc}, the vector $\rvij$
 727 connecting images within the cut-off $R_c$ can be found by constructing:
 728 \bea
 729 \ve{r}'''   & = & \ve{r}_j-\ve{r}_i \\
 730 \ve{r}''    & = & \ve{r}''' - {\bf c}*\verb'round'(r'''_z/c_z) \\
 731 \ve{r}'     & = & \ve{r}'' - {\bf b}*\verb'round'(r''_y/b_y) \\
 732 \ve{r}_{ij} & = & \ve{r}' - {\bf a}*\verb'round'(r'_x/a_x)
 733 \eea
 734 When distances between two particles in a triclinic box are needed
 735 that do not obey \eqnref{box_rot},
 736 many shifts of combinations of box vectors need to be considered to find
 737 the nearest image.
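
     As a worked example of this construction (the numbers are chosen
     purely for illustration), take a rectangular box with
     ${\bf a}=(3,0,0)$, ${\bf b}=(0,3,0)$ and ${\bf c}=(0,0,3)$ nm and a
     difference vector $\ve{r}_j-\ve{r}_i = (2.0,\,-1.8,\,0.4)$ nm. The
     rounded ratios are $0.4/3 \rightarrow 0$, $-1.8/3 \rightarrow -1$ and
     $2.0/3 \rightarrow 1$, so that
     \bea
     \ve{r}'''   & = & (2.0,\,-1.8,\,0.4) \nonumber \\
     \ve{r}''    & = & \ve{r}''' - 0\,{\bf c} = (2.0,\,-1.8,\,0.4) \nonumber \\
     \ve{r}'     & = & \ve{r}'' + {\bf b} = (2.0,\,1.2,\,0.4) \nonumber \\
     \ve{r}_{ij} & = & \ve{r}' - {\bf a} = (-1.0,\,1.2,\,0.4)
     \eea
     Every component of the result is at most half a box length in
     magnitude, as expected for the nearest image in a rectangular box.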
 738
 739 \ifthenelse{\equal{\gmxlite}{1}}{}{
 740
 741 \begin{figure}
 742 \centerline{\includegraphics[width=8cm]{plots/nstric}}
 743 \caption {Grid search in two dimensions. The arrows are the box vectors.}
 744 \label{fig:grid}
 745 \end{figure}
 746
 747 \subsubsection{Grid search\swapindexquiet{grid}{search}}
 748 \label{sec:nsgrid}
 749 The grid search is schematically depicted in \figref{grid}.  All
 750 particles are put on the {\nsgrid}, with the smallest spacing $\ge$
 751 $R_c/2$ in each of the directions.  In the direction of each box
 752 vector, a particle $i$ has three images. For each direction the image
 753 may be $-1$, 0 or $+1$, corresponding to a translation over $-1$, 0 or $+1$ box
 754 vector. We do not search the surrounding {\nsgrid} cells for neighbors
 755 of $i$ and then calculate the image, but rather construct the images
 756 first and then search neighbors corresponding to that image of $i$.
 757 As \figref{grid} shows, some grid cells may be searched more than once
 758 for different images of $i$. This is not a problem, since, due to the
 759 minimum image convention, at most one image will ``see'' the
 760 $j$-particle.  For every particle, fewer than 125 (5$^3$) neighboring
 761 cells are searched.  Therefore, the algorithm scales linearly with the
 762 number of particles.  Although the prefactor is large, the scaling
 763 behavior makes the algorithm far superior to the standard $O(N^2)$
 764 algorithm when there are more than a few hundred particles.  The
 765 grid search is equally fast for rectangular and triclinic boxes.  Thus
 766 for most protein and peptide simulations the rhombic dodecahedron will
 767 be the preferred box shape.
 768 } % Brace matches ifthenelse test for gmxlite
 769
 770 \ifthenelse{\equal{\gmxlite}{1}}{}{
 771 \subsubsection{Charge groups}
 772 \label{sec:chargegroup}\swapindexquiet{charge}{group}%
 773 Charge groups were originally introduced to reduce cut-off artifacts
 774 of Coulomb interactions. When a plain cut-off is used, significant
 775 jumps in the potential and forces arise when atoms with (partial) charges
 776 move in and out of the cut-off radius. When all chemical moieties have
 777 a net charge of zero, these jumps can be reduced by moving groups
 778 of atoms with net charge zero, called charge groups, in and
 779 out of the neighbor list. This reduces the cut-off effects from
 780 the charge-charge level to the dipole-dipole level, which decays
 781 much faster. With the advent of full range electrostatics methods,
 782 such as particle-mesh Ewald (\secref{pme}), the use of charge groups is
 783 no longer required for accuracy. It might even have a slight negative effect
 784 on the accuracy or efficiency, depending on how the neighbor list is made
 785 and the interactions are calculated.
 786
 787 But there is still an important reason for using ``charge groups'': efficiency with the group cut-off scheme.
 788 Where applicable, neighbor searching is carried out on the basis of
 789 charge groups which are defined in the molecular topology.
 790 If the nearest image distance between the {\em
 791 geometrical centers} of the atoms of two charge groups is less than
 792 the cut-off radius, all atom pairs between the charge groups are
 793 included in the pair list.
 794 The neighbor searching for a water system, for instance,
 795 is $3^2=9$ times faster when each molecule is treated as a charge group.
 796 Also the highly optimized water force loops (see \secref{waterloops})
 797 only work when all atoms in a water molecule form a single charge group.
 798 Currently the name {\em neighbor-search group} would be more appropriate,
 799 but the name charge group is retained for historical reasons.
 800 When developing a new force field, the advice is to use charge groups
 801 of 3 to 4 atoms for optimal performance. For all-atom force fields
 802 this is relatively easy, as one can simply put hydrogen atoms, and in some
 803 cases oxygen atoms, in the same charge group as the heavy atom they
 804 are connected to; for example: CH$_3$, CH$_2$, CH, NH$_2$, NH, OH, CO$_2$, CO.
 805
 806 With the Verlet cut-off scheme, charge groups are ignored.
 807
 808 } % Brace matches ifthenelse test for gmxlite
 809
 810 \subsection{Compute forces}
 811 \label{subsec:forces}
 812
 813 \subsubsection{Potential energy}
 814 When forces are computed, the \swapindex{potential}{energy} of each
 815 interaction term is computed as well. The total potential energy is
 816 summed for various contributions, such as Lennard-Jones, Coulomb, and
 817 bonded terms. It is also possible to compute these contributions for
 818 {\em energy-monitor groups} of atoms that are separately defined (see
 819 \secref{groupconcept}).
 820
 821 \subsubsection{Kinetic energy and temperature}
 822 The \normindex{temperature} is given by the total
 823 \swapindex{kinetic}{energy} of the $N$-particle system:
 824 \beq
 825 E_{\mathrm{kin}} = \half \sum_{i=1}^N m_i v_i^2
 826 \eeq
 827 From this the absolute temperature $T$ can be computed using:
 828 \beq
 829 \half N_{\mathrm{df}} kT = E_{\mathrm{kin}}
 830 \label{eqn:E-T}
 831 \eeq
 832 where $k$ is Boltzmann's constant and $N_{\mathrm{df}}$ is the number of
 833 degrees of freedom, which can be computed from:
 834 \beq
 835 N_{\mathrm{df}}  ~=~     3 N - N_c - N_{\mathrm{com}}
 836 \eeq
 837 Here $N_c$ is the number of {\em \normindex{constraints}} imposed on the system.
 838 When performing molecular dynamics $N_{\mathrm{com}}=3$ additional degrees of
 839 freedom must be removed, because the three
 840 center-of-mass velocities are constants of the motion, which are usually
 841 set to zero. When simulating in vacuo, the rotation around the center of mass
 842 can also be removed; in this case $N_{\mathrm{com}}=6$.
 843 When more than one temperature-coupling group\index{temperature-coupling group} is used, the number of degrees
 844 of freedom for group $i$ is:
 845 \beq
 846 N^i_{\mathrm{df}}  ~=~  (3 N^i - N^i_c) \frac{3 N - N_c - N_{\mathrm{com}}}{3 N - N_c}
 847 \eeq
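     As a worked example (the system is chosen here for illustration
     only), consider 1000 rigid three-site water molecules, each with three
     constraints, in a periodic box. Then $N = 3000$, $N_c = 3000$ and
     $N_{\mathrm{com}} = 3$, so that
     \beq
     N_{\mathrm{df}} ~=~ 3 \times 3000 - 3000 - 3 ~=~ 5997
     \eeq
     Assuming every particle and every constraint belongs to exactly one
     group, the group values add up to the same total:
     \beq
     \sum_i N^i_{\mathrm{df}} ~=~ (3 N - N_c) \frac{3 N - N_c - N_{\mathrm{com}}}{3 N - N_c} ~=~ 3 N - N_c - N_{\mathrm{com}}
     \eeq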
 848
 849 The kinetic energy can also be written as a tensor, which is necessary
 850 for pressure calculation in a triclinic system, or systems where shear
 851 forces  are imposed:
 852 \beq
 853 {\bf E}_{\mathrm{kin}} = \half \sum_i^N m_i \vvi \otimes \vvi
 854 \eeq
 855
 856 \subsubsection{Pressure and virial}
 857 The \normindex{pressure}
 858 tensor {\bf P} is calculated from the difference between
 859 the kinetic energy tensor ${\bf E}_{\mathrm{kin}}$ and the \normindex{virial} ${\bf \Xi}$:
 860 \beq
 861 {\bf P} = \frac{2}{V} ({\bf E}_{\mathrm{kin}}-{\bf \Xi})
 862 \label{eqn:P}
 863 \eeq
 864 where $V$ is the volume of the computational box.
 865 The scalar pressure $P$, which can be used for pressure coupling in the case
 866 of isotropic systems, is computed as:
 867 \beq
 868 P       = {\rm trace}({\bf P})/3
 869 \eeq
 870
 871 The virial tensor ${\bf \Xi}$ is defined as:
 872 \beq
 873 {\bf \Xi} = -\half \sum_{i<j} \rvij \otimes \Fvij
 874 \label{eqn:Xi}
 875 \eeq
 876
 877 \ifthenelse{\equal{\gmxlite}{1}}{}{
 878 The {\gromacs} implementation of the virial computation is described
 879 in \secref{virial}.
 880 } % Brace matches ifthenelse test for gmxlite
 881
 882
 883 \subsection{The \swapindex{leap-frog}{integrator}}
 884 \label{subsec:update}
 885 \begin{figure}
 886 \centerline{\includegraphics[width=8cm]{plots/leapfrog}}
 887 \caption[The Leap-Frog integration method.]{The Leap-Frog integration method. The algorithm is called Leap-Frog because $\ve{r}$ and $\ve{v}$ are leaping
 888 like  frogs over each other's backs.}
 889 \label{fig:leapfrog}
 890 \end{figure}
 891
 892 The default MD integrator in {\gromacs} is the so-called {\em leap-frog}
 893 algorithm~\cite{Hockney74} for the integration of the equations of
 894 motion.  When extremely accurate integration with temperature
 895 and/or pressure coupling is required, the velocity Verlet integrators
 896 are also present and may be preferable (see \ssecref{vverlet}). The leap-frog
 897 algorithm uses positions $\ve{r}$ at time $t$ and
 898 velocities $\ve{v}$ at time $t-\hDt$; it updates positions and
 899 velocities using the forces
 900 $\ve{F}(t)$ determined by the positions at time $t$ using these relations:
 901 \bea
 902 \label{eqn:leapfrogv}
 903 \ve{v}(t+\hDt)  &~=~&   \ve{v}(t-\hDt)+\frac{\Dt}{m}\ve{F}(t)   \\
 904 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\Dt\ve{v}(t+\hDt)
 905 \eea
 906 The algorithm is visualized in \figref{leapfrog}.
 907 It produces trajectories that are identical to the Verlet~\cite{Verlet67} algorithm,
 908 whose position-update relation is
 909 \beq
 910 \ve{r}(t+\Dt)~=~2\ve{r}(t) - \ve{r}(t-\Dt) + \frac{1}{m}\ve{F}(t)\Dt^2+O(\Dt^4)
 911 \eeq
 912 The algorithm is of third order in $\ve{r}$ and is time-reversible.
 913 See ref.~\cite{Berendsen86b} for the merits of this algorithm and comparison
 914 with other time integration algorithms.
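
     The equivalence with the Verlet algorithm can be checked in a few
     lines (a short derivation added for illustration, without temperature
     or pressure coupling and without constraints). Writing the leap-frog
     position update for two consecutive steps and subtracting gives
     \bea
     \ve{r}(t+\Dt) - \ve{r}(t) &~=~& \Dt\,\ve{v}(t+\hDt) \nonumber \\
     \ve{r}(t) - \ve{r}(t-\Dt) &~=~& \Dt\,\ve{v}(t-\hDt) \nonumber \\
     \ve{r}(t+\Dt) - 2\ve{r}(t) + \ve{r}(t-\Dt) &~=~& \Dt\left[\ve{v}(t+\hDt) - \ve{v}(t-\hDt)\right] ~=~ \frac{1}{m}\ve{F}(t)\Dt^2
     \eea
     where the last step uses \eqnref{leapfrogv}.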
 915
 916 The \swapindex{equations of}{motion} are modified for temperature
 917 coupling and pressure coupling, and extended to include the
 918 conservation of constraints, all of which are described below.
 919
 920 \subsection{The \swapindex{velocity Verlet}{integrator}}
 921 \label{subsec:vverlet}
 922 The velocity Verlet algorithm~\cite{Swope82} is also implemented in
 923 {\gromacs}, though it is not yet fully integrated with all sets of
 924 options.  In velocity Verlet, positions $\ve{r}$ and velocities
 925 $\ve{v}$ at time $t$ are used to integrate the equations of motion;
 926 velocities at the previous half step are not required.  \bea
 927 \label{eqn:velocityverlet1}
 928 \ve{v}(t+\hDt)  &~=~&   \ve{v}(t)+\frac{\Dt}{2m}\ve{F}(t)   \\
 929 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\Dt\,\ve{v}(t+\hDt) \\
 930 \ve{v}(t+\Dt)   &~=~&   \ve{v}(t+\hDt)+\frac{\Dt}{2m}\ve{F}(t+\Dt)
 931 \eea
 932 or, equivalently,
 933 \bea
 934 \label{eqn:velocityverlet2}
 935 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+ \Dt\,\ve{v}(t) + \frac{\Dt^2}{2m}\ve{F}(t) \\
 936 \ve{v}(t+\Dt)   &~=~&   \ve{v}(t)+ \frac{\Dt}{2m}\left[\ve{F}(t) + \ve{F}(t+\Dt)\right]
 937 \eea
 938 With no temperature or pressure coupling, and with {\em corresponding}
 939 starting points, leap-frog and velocity Verlet will generate identical
 940 trajectories, as can easily be verified by hand from the equations
 941 above.  Given a single starting file with the {\em same} starting
 942 point $\ve{r}(0)$ and $\ve{v}(0)$, leap-frog and velocity Verlet will
 943 {\em not} give identical trajectories, as leap-frog will interpret the
 944 velocities as corresponding to $t=-\hDt$, while velocity Verlet will
 945 interpret them as corresponding to the timepoint $t=0$.
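
     What {\em corresponding} starting points means can be made explicit
     (a sketch added for illustration, without coupling or constraints).
     Both integrators must produce the same half-step velocity
     $\ve{v}(\hDt)$ from the same positions $\ve{r}(0)$:
     \bea
     \ve{v}(\hDt) &~=~& \ve{v}(0)+\frac{\Dt}{2m}\ve{F}(0) \qquad \mbox{(velocity Verlet)} \nonumber \\
     \ve{v}(\hDt) &~=~& \ve{v}(-\hDt)+\frac{\Dt}{m}\ve{F}(0) \qquad \mbox{(leap-frog)} \nonumber \\
     \ve{v}(-\hDt) &~=~& \ve{v}(0)-\frac{\Dt}{2m}\ve{F}(0)
     \eea
     The last line is the leap-frog starting velocity that corresponds to
     a velocity Verlet run started from $\ve{v}(0)$.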
 946
 947 \subsection{Understanding reversible integrators: The Trotter decomposition}
 948 To further understand the relationship between velocity Verlet and
 949 leap-frog integration, we introduce the reversible Trotter formulation
 950 of dynamics, which is also useful for understanding implementations of
 951 thermostats and barostats in {\gromacs}.
 952
 953 A system of coupled, first-order differential equations can be evolved
 954 from time $t = 0$ to time $t$ by applying the evolution operator
 955 \bea
 956 \Gamma(t) &=& \exp(iLt) \Gamma(0) \nonumber \\
 957 iL &=& \dot{\Gamma}\cdot \nabla_{\Gamma},
 958 \eea
 959 where $L$ is the Liouville operator, and $\Gamma$ is the
 960 multidimensional vector of independent variables (positions and
 961 velocities).
 962 A short-time approximation to the true operator, accurate at time $\Dt
 963 = t/P$, is applied $P$ times in succession to evolve the system as
 964 \beq
 965 \Gamma(t) = \prod_{i=1}^P \exp(iL\Dt) \Gamma(0)
 966 \eeq
 967 For NVE dynamics, the Liouville operator is
 968 \bea
 969 iL = \sum_{i=1}^{N} \vv_i \cdot \nabla_{\rv_i} + \sum_{i=1}^N \frac{1}{m_i}\F(r_i) \cdot \nabla_{\vv_i}.
 970 \eea
 971 This can be split into two additive operators
 972 \bea
 973 iL_1 &=& \sum_{i=1}^N \frac{1}{m_i}\F(r_i) \cdot \nabla_{\vv_i} \nonumber \\
 974 iL_2 &=& \sum_{i=1}^{N} \vv_i \cdot \nabla_{\rv_i}
 975 \eea
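     To make the splitting concrete, the exponential of each operator acts
     as a simple update (a sketch added for illustration, using the
     definitions above and the fact that the force depends on the
     positions only):
     \bea
     \exp(iL_1\Dt) &:& \ve{v}_i \rightarrow \ve{v}_i + \frac{\Dt}{m_i}\ve{F}(\ve{r}_i), \qquad \ve{r}_i \rightarrow \ve{r}_i \nonumber \\
     \exp(iL_2\Dt) &:& \ve{r}_i \rightarrow \ve{r}_i + \Dt\,\ve{v}_i, \qquad \ve{v}_i \rightarrow \ve{v}_i
     \eea
     That is, $iL_1$ generates a velocity update at fixed positions and
     $iL_2$ generates a position update at fixed velocities.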
 976 Then a short-time, symmetric, and thus reversible approximation of the true dynamics will be
 977 \bea
 978 \exp(iL\Dt) = \exp(iL_1\hDt) \exp(iL_2\Dt) \exp(iL_1\hDt) + \mathcal{O}(\Dt^3).
 979 \label{eq:NVE_Trotter}
 980 \eea
 981 This corresponds to velocity Verlet integration.  The first
 982 exponential term over $\hDt$ corresponds to a velocity half-step, the
 983 second exponential term over $\Dt$ corresponds to a full position
 984 step, and the last exponential term over $\hDt$ is the final velocity
 985 half step.  For future times $t = n\Dt$, this becomes
 986 \bea
 987 \exp(iLn\Dt) &\approx&  \left(\exp(iL_1\hDt) \exp(iL_2\Dt) \exp(iL_1\hDt)\right)^n \nonumber \\
 988              &\approx&  \exp(iL_1\hDt) \bigg(\exp(iL_2\Dt) \exp(iL_1\Dt)\bigg)^{n-1} \nonumber \\
 989              &       &  \;\;\;\; \exp(iL_2\Dt) \exp(iL_1\hDt)
 990 \eea
 991 This formalism allows us to easily see the difference between the
 992 different flavors of Verlet integrators.  The leap-frog integrator can
 993 be seen as starting with Eq.~\ref{eq:NVE_Trotter} with the
 994 $\exp\left(iL_1 \dt\right)$ term, instead of the half-step velocity
 995 term, yielding
 996 \bea
 997 \exp(iL\dt) &=& \exp\left(iL_1 \dt\right) \exp\left(iL_2 \Dt \right) + \mathcal{O}(\Dt^3).
 998 \eea
 999 Here, the full step in velocity is between $t-\hDt$ and $t+\hDt$,
1000 since it is a combination of the velocity half steps in velocity
1001 Verlet. For future times $t = n\Dt$, this becomes
1002 \bea
1003 \exp(iLn\dt) &\approx& \bigg(\exp\left(iL_1 \dt\right) \exp\left(iL_2 \Dt \right)  \bigg)^{n}.
1004 \eea
1005 Although at first this does not appear symmetric, as long as the full velocity
1006 step is between $t-\hDt$ and $t+\hDt$, then this is simply a way of
1007 starting velocity Verlet at a different place in the cycle.
1008
1009 Even though the trajectory and thus potential energies are identical
1010 between leap-frog and velocity Verlet, the kinetic energy and
1011 temperature will not necessarily be the same.  Standard velocity
1012 Verlet uses the velocities at time $t$ to calculate the kinetic energy
1013 and thus the temperature only at time $t$; the kinetic energy is then a sum over all particles
1014 \bea
1015 KE_{\mathrm{full}}(t) &=& \sum_i \frac{1}{2}m_i\ve{v}_i(t)^2 \nonumber\\
1016       &=& \sum_i \frac{1}{2}m_i\left(\frac{1}{2}\ve{v}_i(t-\hDt)+\frac{1}{2}\ve{v}_i(t+\hDt)\right)^2,
1017 \eea
1018 with the square on the {\em outside} of the average.  Standard
1019 leap-frog calculates the kinetic energy at time $t$ based on the
1020 average kinetic energies at the timesteps $t+\hDt$ and $t-\hDt$, or
1021 the sum over all particles
1022 \bea
1023 KE_{\mathrm{average}}(t) &=& \sum_i \frac{1}{2}m_i\left(\frac{1}{2}\ve{v}_i(t-\hDt)^2+\frac{1}{2}\ve{v}_i(t+\hDt)^2\right),
1024 \eea
1025 where the square is {\em inside} the average.
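
     The size of the difference between the two definitions can be worked
     out directly. This is a short derivation added for illustration; it
     assumes that the kinetic energy of particle $i$ is
     $\half m_i \ve{v}_i^2$ and that no constraints or coupling modify the
     velocity update, so that
     $\ve{v}_i(t+\hDt)-\ve{v}_i(t-\hDt) = \Dt\,\ve{F}_i(t)/m_i$. Using
     $\half(a^2+b^2) - \left(\half(a+b)\right)^2 = \frac{1}{4}(a-b)^2$ one
     finds
     \bea
     KE_{\mathrm{average}}(t) - KE_{\mathrm{full}}(t)
       &=& \sum_i \frac{m_i}{8}\left|\ve{v}_i(t+\hDt)-\ve{v}_i(t-\hDt)\right|^2 \nonumber \\
       &=& \sum_i \frac{\Dt^2}{8 m_i}\left|\ve{F}_i(t)\right|^2 ~\geq~ 0
     \eea
     Under these assumptions the two definitions therefore differ by a
     term of order $\Dt^2$ that vanishes in the limit of small time step.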
1026
1027 A non-standard variant of velocity Verlet which averages the kinetic
1028 energies $KE(t+\hDt)$ and $KE(t-\hDt)$, exactly like leap-frog, is also
1029 now implemented in {\gromacs} (as {\tt .mdp} file option {\tt md-vv-avek}).  Without
1030 temperature and pressure coupling, velocity Verlet with
1031 half-step-averaged kinetic energies and leap-frog will be identical up
1032 to numerical precision.  For temperature- and pressure-control schemes,
1033 however, velocity Verlet with half-step-averaged kinetic energies and
1034 leap-frog will be different, as will be discussed in the section on
1035 thermostats and barostats.
1036
1037 The half-step-averaged kinetic energy and temperature are slightly more
1038 accurate for a given step size; the average kinetic energy obtained
1039 with the half-step-averaged kinetic energies ({\em md} and
1040 {\em md-vv-avek}) will be closer to the kinetic energy obtained in the
1041 limit of small step size than will the full-step kinetic energy (using
1042 {\em md-vv}).  For NVE simulations, this difference is usually not
1043 significant, since the positions and velocities of the particles are
1044 still identical; it makes a difference in the way the temperature
1045 of the simulation is {\em interpreted}, but {\em not} in the
1046 trajectories that are produced.  Although the kinetic energy is more
1047 accurate with the half-step-averaged method, meaning that it changes
1048 less as the timestep gets large, it is also more noisy.  The RMS deviation
1049 of the total energy of the system (sum of kinetic plus
1050 potential) in the half-step-averaged kinetic energy case will be
1051 higher (about twice as high in most cases) than in the full-step kinetic
1052 energy case.  The drift will still be the same, however, as again, the
1053 trajectories are identical.
1054
1055 For NVT simulations, however, there {\em will} be a difference, as
1056 discussed in the section on temperature control, since the velocities
1057 of the particles are adjusted such that kinetic energies of the
1058 simulations, which can be calculated either way, reach the
1059 distribution corresponding to the set temperature.  In this case, the
1060 three methods will not give identical results.
1061
1062 Because the velocity and position are both defined at the same time
1063 $t$, the velocity Verlet integrator can be used for some methods,
1064 especially rigorously correct pressure control methods, that are not
1065 actually possible with leap-frog.  The integration itself takes
1066 negligibly more time than leap-frog, but twice as many communication
1067 calls are currently required.  In most cases, and especially for large
1068 systems where communication speed is important for parallelization and
1069 differences between thermodynamic ensembles vanish in the $1/N$ limit,
1070 and when only NVT ensembles are required, leap-frog will likely be the
1071 preferred integrator.  For pressure control simulations where the fine
1072 details of the thermodynamics are important, only velocity Verlet
1073 allows the true ensemble to be calculated.  In either case, simulation
1074 with double precision may be required to get fine details of
1075 thermodynamics correct.
1076
1077 \subsection{Twin-range cut-offs\index{twin-range!cut-off}}
1078 To save computation time, slowly varying forces can be calculated
1079 less often than rapidly varying forces. In {\gromacs}
1080 such a \normindex{multiple time step} splitting is possible between
1081 short and long range non-bonded interactions.
1082 In {\gromacs} versions up to 4.0, an irreversible integration scheme
1083 was used which is also used by the {\gromos} simulation package:
1084 every $n$ steps the long range forces are determined and these are
1085 then also used (without modification) for the next $n-1$ integration steps
1086 in \eqnref{leapfrogv}. Such an irreversible scheme can result in bad energy
1087 conservation and, possibly, bad sampling.
1088 Since version 4.5, a leap-frog version of the reversible Trotter decomposition scheme~\cite{Tuckerman1992a} is used.
1089 In this integrator the long-range forces are determined every $n$ steps
1090 and are then integrated into the velocity in \eqnref{leapfrogv} using
1091 a time step of $\Dt_\mathrm{LR} = n \Dt$:
1092 \beq
1093 \ve{v}(t+\hDt) =
1094 \left\{ \begin{array}{lll} \displaystyle
1095   \ve{v}(t-\hDt) + \frac{1}{m}\left[\ve{F}_\mathrm{SR}(t) + n \ve{F}_\mathrm{LR}(t)\right] \Dt &,& \mathrm{step} ~\%~ n = 0  \\ \noalign{\medskip} \displaystyle
1096   \ve{v}(t-\hDt) + \frac{1}{m}\ve{F}_\mathrm{SR}(t)\Dt &,& \mathrm{step} ~\%~ n \neq 0  \\
1097 \end{array} \right.
1098 \eeq
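     As a consistency check (added for illustration), the long-range
     contribution to the velocity over one full cycle of $n$ steps is
     applied as a single impulse:
     \beq
     \Delta\ve{v}_\mathrm{LR} ~=~ \frac{1}{m}\, n\, \ve{F}_\mathrm{LR}(t)\,\Dt ~=~ \frac{1}{m}\ve{F}_\mathrm{LR}(t)\,\Dt_\mathrm{LR}
     \eeq
     For example, with the illustrative values $\Dt = 2$ fs and $n = 5$,
     the long-range forces are evaluated once every 10 fs and integrated
     with $\Dt_\mathrm{LR} = 10$ fs.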
1099
1100 The parameter $n$ is equal to the neighbor list update frequency. In
1101 version 4.5, the velocity Verlet version of multiple time-stepping is not yet
1102 fully implemented.
1103
1104 Several other simulation packages use multiple time stepping for
1105 bonds and/or the PME mesh forces. In {\gromacs} we have not implemented
1106 this (yet), since we use a different philosophy. Bonds can be constrained
1107 (which is also a more sound approximation of a physical quantum
1108 oscillator), which allows the smallest time step to be increased
1109 to the larger one. This not only halves the number of force calculations,
1110 but also the update calculations. For even larger time steps, angle vibrations
1111 involving hydrogen atoms can be removed using virtual interaction
1112 \ifthenelse{\equal{\gmxlite}{1}}
1113 {sites,}
1114 {sites (see \secref{rmfast}),}
1115 which brings the shortest time step up to the
1116 PME mesh update frequency of a multiple time stepping scheme.
1117
1118 As an example we show the energy conservation for integrating
1119 the equations of motion for SPC/E water at 300 K. To avoid cut-off
1120 effects, reaction-field electrostatics with $\epsilon_{RF}=\infty$ and
1121 shifted Lennard-Jones interactions are used, both with a buffer region.
1122 The long-range interactions were evaluated between 1.0 and 1.4 nm.
1123 In \figref{twinrangeener} one can see that for electrostatics the Trotter scheme
1124 does an order of magnitude better up to  $\Dt_{LR}$ = 16 fs.
1125 The electrostatics depends strongly on the orientation of the water molecules,
1126 which changes rapidly.
1127 For Lennard-Jones interactions, the energy drift is linear in $\Dt_{LR}$
1128 and roughly two orders of magnitude smaller than for the electrostatics.
1129 Lennard-Jones forces are smaller than Coulomb forces and
1130 they are mainly affected by translation of water molecules, not rotation.
1131
1132 \begin{figure}
1133 \centerline{\includegraphics[width=12cm]{plots/drift-all}}
1134 \caption{Energy drift per degree of freedom in SPC/E water
1135 with twin-range cut-offs
1136 for reaction field (left) and Lennard-Jones interaction (right)
1137 as a function of the long-range time step length for the irreversible
1138 ``\gromos'' scheme and a reversible Trotter scheme.}
1139 \label{fig:twinrangeener}
1140 \end{figure}
1141
1142 \subsection{Temperature coupling\index{temperature coupling}}
1143 While direct use of molecular dynamics gives rise to the NVE (constant
1144 number, constant volume, constant energy) ensemble, most quantities
1145 that we wish to calculate are actually from a constant temperature
1146 (NVT) ensemble, also called the canonical ensemble. {\gromacs} can use
1147 the {\em weak-coupling} scheme of Berendsen~\cite{Berendsen84},
1148 stochastic randomization through the Andersen
1149 thermostat~\cite{Andersen80}, the extended ensemble Nos{\'e}-Hoover
1150 scheme~\cite{Nose84,Hoover85}, or a velocity-rescaling
1151 scheme~\cite{Bussi2007a} to simulate constant temperature, with
1152 advantages of each of the schemes laid out below.
1153
1154 There are several other reasons why it might be necessary to control
1155 the temperature of the system (drift during equilibration, drift as a
1156 result of force truncation and integration errors, heating due to
1157 external or frictional forces), but this is not entirely correct to do
1158 from a thermodynamic standpoint, and in some cases only masks the
1159 symptoms (increase in temperature of the system) rather than the
1160 underlying problem (deviations from correct physics in the dynamics).
1161 For larger systems, errors in ensemble averages and structural
1162 properties incurred by using temperature control to remove slow drifts
1163 in temperature appear to be negligible, but no completely
comprehensive comparisons have been carried out, and some caution must
be taken in interpreting the results.
1166
1167 \subsubsection{Berendsen temperature coupling\pawsindexquiet{Berendsen}{temperature coupling}\index{weak coupling}}
1168 The Berendsen algorithm mimics weak coupling with first-order
1169 kinetics to an external heat bath with given temperature $T_0$.
1170 See ref.~\cite{Berendsen91} for a comparison with the
1171 Nos{\'e}-Hoover scheme. The effect of this algorithm is
1172 that a deviation of the system temperature from $T_0$ is slowly
1173 corrected according to:
1174 \beq
1175 \frac{\de T}{\de t} = \frac{T_0-T}{\tau}
1176 \label{eqn:Tcoupling}
1177 \eeq
1178 which means that a temperature deviation decays exponentially with a
1179 time constant $\tau$.
1180 This method of coupling has the advantage that the strength of the
1181 coupling can be varied and adapted to the user requirement: for
1182 equilibration purposes the coupling time can be taken quite short
1183 ({\eg} 0.01 ps), but for reliable equilibrium runs it can be taken much
1184 longer ({\eg} 0.5 ps) in which case it hardly influences the
1185 conservative dynamics.
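
As a brief worked illustration (a sketch assuming constant $T_0$ and
$\tau$, with numbers chosen only for the example), \eqnref{Tcoupling}
can be integrated in closed form:
\beq
T(t) = T_0 + \left[ T(0) - T_0 \right] \exp\left(-t/\tau\right).
\eeq
For a hypothetical initial deviation $T(0) - T_0 = 10$ K and
$\tau = 0.5$ ps, the deviation after 1 ps is
$10\,\exp(-2) \approx 1.4$ K.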
1186
1187 The Berendsen thermostat suppresses the fluctuations of the kinetic
1188 energy.  This means that one does not generate a proper canonical
1189 ensemble, so rigorously, the sampling will be incorrect.  This
1190 error scales with $1/N$, so for very large systems most ensemble
1191 averages will not be affected significantly, except for the
1192 distribution of the kinetic energy itself.  However, fluctuation
1193 properties, such as the heat capacity, will be affected.  A similar
1194 thermostat which does produce a correct ensemble is the velocity
1195 rescaling thermostat~\cite{Bussi2007a} described below.
1196
The heat flow into or out of the system is effected by scaling the
1198 velocities of each particle every step, or every $n_\mathrm{TC}$ steps,
1199 with a time-dependent factor $\lambda$, given by:
1200 \beq
1201 \lambda = \left[ 1 + \frac{n_\mathrm{TC} \Delta t}{\tau_T}
1202 \left\{\frac{T_0}{T(t -  \hDt)} - 1 \right\} \right]^{1/2}
1203 \label{eqn:lambda}
1204 \eeq
1205 The parameter $\tau_T$ is close, but not exactly equal, to the time constant
1206 $\tau$ of the temperature coupling (\eqnref{Tcoupling}):
1207 \beq
1208 \tau = 2 C_V \tau_T / N_{df} k
1209 \eeq
1210 where $C_V$ is the total heat capacity of the system, $k$ is Boltzmann's
1211 constant, and $N_{df}$ is the total number of degrees of freedom. The
1212 reason that $\tau \neq \tau_T$ is that the kinetic energy change
1213 caused by scaling the velocities is partly redistributed between
1214 kinetic and potential energy and hence the change in temperature is
1215 less than the scaling energy.  In practice, the ratio $\tau / \tau_T$
1216 ranges from 1 (gas) to 2 (harmonic solid) to 3 (water). When we use
1217 the term ``temperature coupling time constant,'' we mean the parameter
1218 \normindex{$\tau_T$}.
{\bf Note} that in practice the scaling factor $\lambda$ is limited to
the range $0.8 \leq \lambda \leq 1.25$, to avoid scaling by very large
1221 numbers which may crash the simulation. In normal use,
1222 $\lambda$ will always be much closer to 1.0.
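
To make the effect of \eqnref{lambda} concrete, consider a worked
example with assumed, purely illustrative values $n_\mathrm{TC} = 1$,
$\Delta t = 0.002$ ps, $\tau_T = 0.1$ ps, $T_0 = 300$ K and $T = 310$ K:
\beq
\lambda = \left[ 1 + \frac{0.002}{0.1}
\left\{\frac{300}{310} - 1 \right\} \right]^{1/2} \approx 0.99968 .
\eeq
Since the kinetic temperature is proportional to the square of the
velocities, scaling by $\lambda$ changes it (neglecting the half-step
offset in $T(t - \hDt)$) to
\beq
\lambda^2 T = T + \frac{n_\mathrm{TC} \Delta t}{\tau_T} \left( T_0 - T \right) = 309.8~\mathrm{K},
\eeq
which is the first-order discretization of the relaxation in
\eqnref{Tcoupling} with $\tau_T$ in place of $\tau$.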
1223
1224 \subsubsection{Velocity-rescaling temperature coupling\pawsindexquiet{velocity-rescaling}{temperature coupling}}
1225 The velocity-rescaling thermostat~\cite{Bussi2007a} is essentially a Berendsen
1226 thermostat (see above) with an additional stochastic term that ensures
1227 a correct kinetic energy distribution by modifying it according to
1228 \beq
1229 \de K = (K_0 - K) \frac{\de t}{\tau_T} + 2 \sqrt{\frac{K K_0}{N_f}} \frac{\de W}{\sqrt{\tau_T}},
1230 \label{eqn:vrescale}
1231 \eeq
1232 where $K$ is the kinetic energy, $N_f$ the number of degrees of freedom and $\de W$ a Wiener process.
1233 There are no additional parameters, except for a random seed.
1234 This thermostat produces a correct canonical ensemble and still has
1235 the advantage of the Berendsen thermostat: first order decay of
1236 temperature deviations and no oscillations.
1237 When an $NVT$ ensemble is used, the conserved energy quantity
1238 is written to the energy and log file.
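
As a short consistency check (a sketch that assumes \eqnref{vrescale}
is interpreted in the It\^{o} sense, so that the term proportional to
$\de W$ has zero mean), averaging over the noise gives
\beq
\frac{\de \langle K \rangle}{\de t} = \frac{K_0 - \langle K \rangle}{\tau_T},
\eeq
which is the same first-order relaxation as for the Berendsen
thermostat. Over a single step $\Delta t$ at $K = K_0$, the stochastic
term has a standard deviation of
$2 K_0 \sqrt{\Delta t / (N_f \tau_T)}$, so its size relative to $K_0$
decreases as $1/\sqrt{N_f}$.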
1239
1240 \subsubsection{\normindex{Andersen thermostat}}
1241 One simple way to maintain a thermostatted ensemble is to take an
1242 $NVE$ integrator and periodically re-select the velocities of the
particles from a Maxwell-Boltzmann distribution~\cite{Andersen80}.
This can either be done by randomizing all the velocities
simultaneously (massive collision) every $\tau_T/\Dt$ steps ({\tt andersen-massive}), or by
randomizing every particle with a small probability $\Dt/\tau_T$
every timestep ({\tt andersen}), where in both cases $\Dt$ is the timestep and
$\tau_T$ is a characteristic coupling time scale.
1249 Because of the way constraints operate, all particles in the same
1250 constraint group must be randomized simultaneously.  Because of
1251 parallelization issues, the {\tt andersen} version cannot currently (5.0) be
1252 used in systems with constraints. {\tt andersen-massive} can be used regardless of constraints.
1253 This thermostat is also currently only possible with velocity Verlet algorithms,
1254 because it operates directly on the velocities at each timestep.
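
As a small numerical illustration (values assumed for the example
only), with $\Dt = 0.002$ ps and $\tau_T = 1$ ps,
{\tt andersen-massive} randomizes all velocities every
$\tau_T/\Dt = 500$ steps, while {\tt andersen} randomizes each particle
with probability $p = \Dt/\tau_T = 0.002$ per step. In the latter case
the probability that a given particle has not been randomized after
$n$ steps is
\beq
(1-p)^n \approx \exp\left(-n \Dt/\tau_T\right),
\eeq
so that the mean time between randomizations of a particle is
$\Dt/p = \tau_T$ in both variants.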
1255
This algorithm completely avoids some of the ergodicity issues of other thermostatting
algorithms, as energy cannot flow back and forth between energetically
decoupled components of the system as it can with velocity-scaling thermostats.
However, it can slow down the kinetics of the system by randomizing
correlated motions of the system, including slowing sampling when
$\tau_T$ is at moderate levels (less than 10 ps). This algorithm
should therefore generally not be used when examining kinetics or
transport properties of the system~\cite{Basconi2013}.
1264
1265 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1266 \subsubsection{Nos{\'e}-Hoover temperature coupling\index{Nose-Hoover temperature coupling@Nos{\'e}-Hoover temperature coupling|see{temperature coupling, Nos{\'e}-Hoover}}{\index{temperature coupling Nose-Hoover@temperature coupling Nos{\'e}-Hoover}}\index{extended ensemble}}
1267
The Berendsen weak-coupling algorithm is
extremely efficient for relaxing a system to the target temperature,
but once the system has reached equilibrium it might be more
important to probe a correct canonical ensemble, which the
weak-coupling scheme unfortunately does not generate.
1273
1274 To enable canonical ensemble simulations, {\gromacs} also supports the
1275 extended-ensemble approach first proposed by Nos{\'e}~\cite{Nose84}
1276 and later modified by Hoover~\cite{Hoover85}. The system Hamiltonian is
1277 extended by introducing a thermal reservoir and a friction term in the
1278 equations of motion.  The friction force is proportional to the
1279 product of each particle's velocity and a friction parameter, $\xi$.
1280 This friction parameter (or ``heat bath'' variable) is a fully
1281 dynamic quantity with its own momentum ($p_{\xi}$) and equation of
1282 motion; the time derivative is calculated from the difference between
1283 the current kinetic energy and the reference temperature.
1284
1285 In this formulation, the particles' equations of motion in
1286 \figref{global} are replaced by:
1287 \beq
1288 \frac {\de^2\ve{r}_i}{\de t^2} = \frac{\ve{F}_i}{m_i} -
1289 \frac{p_{\xi}}{Q}\frac{\de \ve{r}_i}{\de t} ,
1290 \label{eqn:NH-eqn-of-motion}
1291 \eeq where the equation of motion for the heat bath parameter $\xi$ is:
1292 \beq \frac {\de p_{\xi}}{\de t} = \left( T - T_0 \right).  \eeq The
1293 reference temperature is denoted $T_0$, while $T$ is the current
1294 instantaneous temperature of the system. The strength of the coupling
1295 is determined by the constant $Q$ (usually called the ``mass parameter''
1296 of the reservoir) in combination with the reference
temperature.~\footnote{Note that in some derivations, an alternative
  notation $\xi_{\mathrm{alt}} = v_{\xi} = p_{\xi}/Q$ is used.}
1299
1300 The conserved quantity for the Nos{\'e}-Hoover equations of motion is not
1301 the total energy, but rather
1302 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) +\frac{p_{\xi}^2}{2Q} + N_fkT\xi,
1304 \eea
1305 where $N_f$ is the total number of degrees of freedom.
1306
1307 In our opinion, the mass parameter is a somewhat awkward way of
1308 describing coupling strength, especially due to its dependence on
1309 reference temperature (and some implementations even include the
1310 number of degrees of freedom in your system when defining $Q$).  To
1311 maintain the coupling strength, one would have to change $Q$ in
1312 proportion to the change in reference temperature. For this reason, we
prefer to let the {\gromacs} user work with the period
$\tau_T$ of the oscillations of kinetic energy between the system and
the reservoir instead. It is directly related to $Q$ and $T_0$ via:
1316 \beq
1317 Q = \frac {\tau_T^2 T_0}{4 \pi^2}.
1318 \eeq
1319 This provides a much more intuitive way of selecting the
1320 Nos{\'e}-Hoover coupling strength (similar to the weak-coupling
1321 relaxation), and in addition $\tau_T$ is independent of system size
1322 and reference temperature.
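
For illustration (assumed values, in the units implied by the relation
between $Q$, $\tau_T$ and $T_0$ given above), $\tau_T = 1$ ps and
$T_0 = 300$ K correspond to
\beq
Q = \frac{(1~\mathrm{ps})^2 \times 300~\mathrm{K}}{4 \pi^2} \approx 7.6~\mathrm{ps^2\,K},
\eeq
and keeping $\tau_T$ fixed while raising $T_0$ to 600 K doubles $Q$,
which is exactly the rescaling that a user would otherwise have to
apply by hand.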
1323
1324 It is however important to keep the difference between the
1325 weak-coupling scheme and the Nos{\'e}-Hoover algorithm in mind:
1326 Using weak coupling you get a
1327 strongly damped {\em exponential relaxation},
1328 while the Nos{\'e}-Hoover approach
1329 produces an {\em oscillatory relaxation}.
1330 The actual time it takes to relax with Nos{\'e}-Hoover coupling is
1331 several times larger than the period of the
1332 oscillations that you select. These oscillations (in contrast
to exponential relaxation) also mean that
1334 the time constant normally should be 4--5 times larger
1335 than the relaxation time used with weak coupling, but your
1336 mileage may vary.
1337
Nos{\'e}-Hoover dynamics in simple systems, such as collections of
harmonic oscillators, can be {\em nonergodic}, meaning that only a
subset of phase space is ever sampled, even if the simulations
1341 were to run for infinitely long.  For this reason, the Nos{\'e}-Hoover
1342 chain approach was developed, where each of the Nos{\'e}-Hoover
1343 thermostats has its own Nos{\'e}-Hoover thermostat controlling its
1344 temperature.  In the limit of an infinite chain of thermostats, the
1345 dynamics are guaranteed to be ergodic. Using just a few chains can
1346 greatly improve the ergodicity, but recent research has shown that the
1347 system will still be nonergodic, and it is still not entirely clear
what the practical effect of this is~\cite{Cooke2008}. Currently, the
1349 default number of chains is 10, but this can be controlled by the
1350 user.  In the case of chains, the equations are modified in the
1351 following way to include a chain of thermostatting
1352 particles~\cite{Martyna1992}:
1353
1354 \bea
1355 \frac {\de^2\ve{r}_i}{\de t^2} &~=~& \frac{\ve{F}_i}{m_i} - \frac{p_{{\xi}_1}}{Q_1} \frac{\de \ve{r}_i}{\de t} \nonumber \\
1356 \frac {\de p_{{\xi}_1}}{\de t} &~=~& \left( T - T_0 \right) - p_{{\xi}_1} \frac{p_{{\xi}_2}}{Q_2} \nonumber \\
\frac {\de p_{{\xi}_{i=2\ldots N-1}}}{\de t} &~=~& \left(\frac{p_{\xi_{i-1}}^2}{Q_{i-1}} -kT\right) - p_{\xi_i} \frac{p_{\xi_{i+1}}}{Q_{i+1}} \nonumber \\
1358 \frac {\de p_{\xi_N}}{\de t} &~=~& \left(\frac{p_{\xi_{N-1}}^2}{Q_{N-1}}-kT\right)
1359 \label{eqn:NH-chain-eqn-of-motion}
1360 \eea
1361 The conserved quantity for Nos{\'e}-Hoover chains is
1362 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) +\sum_{k=1}^M\frac{p^2_{\xi_k}}{2Q_k} + N_fkT\xi_1 + kT\sum_{k=2}^M \xi_k
1364 \eea
1365 The values and velocities of the Nos{\'e}-Hoover thermostat variables
1366 are generally not included in the output, as they take up a fair
1367 amount of space and are generally not important for analysis of
1368 simulations, but this can be overridden by defining the environment
1369 variable {\tt GMX_NOSEHOOVER_CHAINS}, which will print the values of all
1370 the positions and velocities of all Nos{\'e}-Hoover particles in the
1371 chain to the {\tt .edr} file.  Leap-frog simulations currently can only have
Nos{\'e}-Hoover chain lengths of 1, but this will likely be updated in
a later version.
1374
1375 As described in the integrator section, for temperature coupling, the
1376 temperature that the algorithm attempts to match to the reference
1377 temperature is calculated differently in velocity Verlet and leap-frog
1378 dynamics.  Velocity Verlet ({\em md-vv}) uses the full-step kinetic
1379 energy, while leap-frog and {\em md-vv-avek} use the half-step-averaged
1380 kinetic energy.
1381
1382 We can examine the Trotter decomposition again to better understand
1383 the differences between these constant-temperature integrators.  In
1384 the case of Nos{\'e}-Hoover dynamics (for simplicity, using a chain
1385 with $N=1$, with more details in Ref.~\cite{Martyna1996}), we split
1386 the Liouville operator as
1387 \beq
1388 iL = iL_1 + iL_2 + iL_{\mathrm{NHC}},
1389 \eeq
1390 where
1391 \bea
1392 iL_1 &=& \sum_{i=1}^N \left[\frac{\pb_i}{m_i}\right]\cdot \frac{\partial}{\partial \rv_i} \nonumber \\
1393 iL_2 &=& \sum_{i=1}^N \F_i\cdot \frac{\partial}{\partial \pb_i} \nonumber \\
1394 iL_{\mathrm{NHC}} &=& \sum_{i=1}^N-\frac{p_{\xi}}{Q}\vv_i\cdot \nabla_{\vv_i} +\frac{p_{\xi}}{Q}\frac{\partial }{\partial \xi} + \left( T - T_0 \right)\frac{\partial }{\partial p_{\xi}}
1395 \eea
1396 For standard velocity Verlet with Nos{\'e}-Hoover temperature control, this becomes
1397 \bea
1398 \exp(iL\dt) &=& \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1399 &&\exp\left(iL_1 \dt\right) \exp\left(iL_2 \dt/2\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) + \mathcal{O}(\Dt^3).
1400 \eea
1401 For half-step-averaged temperature control using {\em md-vv-avek},
1402 this decomposition will not work, since we do not have the full step
1403 temperature until after the second velocity step.  However, we can
1404 construct an alternate decomposition that is still reversible, by
1405 switching the place of the NHC and velocity portions of the
1406 decomposition:
1407 \bea
1408 \exp(iL\dt) &=& \exp\left(iL_2 \dt/2\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right)\exp\left(iL_1 \dt\right)\nonumber \\
1409 &&\exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_2 \dt/2\right)+ \mathcal{O}(\Dt^3)
1410 \label{eq:half_step_NHC_integrator}
1411 \eea
1412 This formalism allows us to easily see the difference between the
1413 different flavors of velocity Verlet integrator.  The leap-frog
1414 integrator can be seen as starting with
1415 Eq.~\ref{eq:half_step_NHC_integrator} just before the $\exp\left(iL_1
1416 \dt\right)$ term, yielding:
1417 \bea
1418 \exp(iL\dt) &=&  \exp\left(iL_1 \dt\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \nonumber \\
1419 &&\exp\left(iL_2 \dt\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) + \mathcal{O}(\Dt^3)
1420 \eea
and then using some algebraic tricks to solve for some quantities that are
required before they are actually calculated~\cite{Holian95}.
1423
1424 % }
1425
1426 \subsubsection{Group temperature coupling}\index{temperature-coupling group}%
1427 In {\gromacs} temperature coupling can be performed on groups of
1428 atoms, typically a protein and solvent. The reason such algorithms
1429 were introduced is that energy exchange between different components
1430 is not perfect, due to different effects including cut-offs etc. If
1431 now the whole system is coupled to one heat bath, water (which
1432 experiences the largest cut-off noise) will tend to heat up and the
1433 protein will cool down. Typically 100 K differences can be obtained.
With the use of proper electrostatic methods (PME) these differences
1435 are much smaller but still not negligible.  The parameters for
1436 temperature coupling in groups are given in the {\tt mdp} file.
1437 Recent investigation has shown that small temperature differences
1438 between protein and water may actually be an artifact of the way
1439 temperature is calculated when there are finite timesteps, and very
1440 large differences in temperature are likely a sign of something else
1441 seriously going wrong with the system, and should be investigated
1442 carefully~\cite{Eastwood2010}.
1443
1444 One special case should be mentioned: it is possible to temperature-couple only
1445 part of the system, leaving other parts without temperature
1446 coupling. This is done by specifying ${-1}$ for the time constant
1447 $\tau_T$ for the group that should not be thermostatted.  If only
1448 part of the system is thermostatted, the system will still eventually
1449 converge to an NVT system.  In fact, one suggestion for minimizing
1450 errors in the temperature caused by discretized timesteps is that if
1451 constraints on the water are used, then only the water degrees of
1452 freedom should be thermostatted, not protein degrees of freedom, as
1453 the higher frequency modes in the protein can cause larger deviations
1454 from the ``true'' temperature, the temperature obtained with small
1455 timesteps~\cite{Eastwood2010}.
1456
1457 \subsection{Pressure coupling\index{pressure coupling}}
1458 In the same spirit as the temperature coupling, the system can also be
coupled to a ``pressure bath.'' {\gromacs} supports the Berendsen
1460 algorithm~\cite{Berendsen84} that scales coordinates and box vectors
1461 every step, the extended-ensemble Parrinello-Rahman approach~\cite{Parrinello81,Nose83}, and for
1462 the velocity Verlet variants, the Martyna-Tuckerman-Tobias-Klein
1463 (MTTK) implementation of pressure
1464 control~\cite{Martyna1996}. Parrinello-Rahman and Berendsen can be
1465 combined with any of the temperature coupling methods above. MTTK can
only be used with Nos{\'e}-Hoover temperature control. From version 5.1 onwards,
it can only be used when the system does not have constraints.
1468
1469 \subsubsection{Berendsen pressure coupling\pawsindexquiet{Berendsen}{pressure coupling}\index{weak coupling}}
1470 \label{sec:berendsen_pressure_coupling}
1471 The Berendsen algorithm rescales the
1472 coordinates and box vectors every step, or every $n_\mathrm{PC}$ steps,
1473  with a matrix {\boldmath $\mu$},
1474 which has the effect of a first-order kinetic relaxation of the pressure
1475 towards a given reference pressure ${\bf P}_0$ according to
1476 \beq
1477 \frac{\de {\bf P}}{\de t} = \frac{{\bf P}_0-{\bf P}}{\tau_p}.
1478 \eeq
1479 The scaling matrix {\boldmath $\mu$} is given by
1480 \beq
1481 \mu_{ij}
1482 = \delta_{ij} - \frac{n_\mathrm{PC}\Delta t}{3\, \tau_p} \beta_{ij} \{P_{0ij} - P_{ij}(t) \}.
1483 \label{eqn:mu}
1484 \eeq
1485 \index{isothermal compressibility}
1486 \index{compressibility}
1487 Here, {\boldmath $\beta$} is the isothermal compressibility of the system.
1488 In most cases this will be a diagonal matrix, with equal elements on the
1489 diagonal, the value of which is generally not known.
1490 It suffices to take a rough estimate because the value of {\boldmath $\beta$}
1491 only influences the non-critical time constant of the
1492 pressure relaxation without affecting the average pressure itself.
1493 For water at 1 atm and 300 K
1494 $\beta = 4.6 \times 10^{-10}$ Pa$^{-1} = 4.6 \times 10^{-5}$ bar$^{-1}$,
1495 which is $7.6 \times 10^{-4}$ MD units (see \chref{defunits}).
1496 Most other liquids have similar values.
1497 When scaling completely anisotropically, the system has to be rotated in
1498 order to obey \eqnref{box_rot}.
1499 This rotation is approximated in first order in the scaling, which is usually
1500 less than $10^{-4}$. The actual scaling matrix {\boldmath $\mu'$} is
1501 \beq
1502 \mbox{\boldmath $\mu'$} =
1503 \left(\begin{array}{ccc}
1504 \mu_{xx} & \mu_{xy} + \mu_{yx} & \mu_{xz} + \mu_{zx} \\
1505 0        & \mu_{yy}            & \mu_{yz} + \mu_{zy} \\
1506 0        & 0                   & \mu_{zz}
1507 \end{array}\right).
1508 \eeq
1509 The velocities are neither scaled nor rotated.
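
As a worked example of \eqnref{mu} (a sketch for isotropic scaling with
assumed values $n_\mathrm{PC} = 1$, $\Delta t = 0.002$ ps,
$\tau_p = 1$ ps, $\beta = 4.6 \times 10^{-5}$ bar$^{-1}$, $P_0 = 1$ bar
and $P = 101$ bar):
\beq
\mu = 1 - \frac{0.002}{3 \times 1}\, 4.6 \times 10^{-5}\, (1 - 101)
\approx 1 + 3.1 \times 10^{-6}.
\eeq
The box volume then grows by a factor
$\mu^3 \approx 1 + 9.2 \times 10^{-6}$, which for this compressibility
corresponds to a pressure change of about
$-9.2 \times 10^{-6} / \beta = -0.2$ bar, in agreement with
$(P_0 - P)\,\Delta t/\tau_p$.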
1510
1511 In {\gromacs}, the Berendsen scaling can also be done isotropically,
1512 which means that instead of $\ve{P}$ a diagonal matrix with elements of size
1513 trace$(\ve{P})/3$ is used. For systems with interfaces, semi-isotropic
1514 scaling can be useful.
1515 In this case, the $x/y$-directions are scaled isotropically and the $z$
1516 direction is scaled independently. The compressibility in the $x/y$ or
1517 $z$-direction can be set to zero, to scale only in the other direction(s).
1518
1519 If you allow full anisotropic deformations and use constraints you
1520 might have to scale more slowly or decrease your timestep to avoid
1521 errors from the constraint algorithms.  It is important to note that
1522 although the Berendsen pressure control algorithm yields a simulation
1523 with the correct average pressure, it does not yield the exact NPT
1524 ensemble, and it is not yet clear exactly what errors this approximation
1525 may yield.
1526
1527 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1528 \subsubsection{Parrinello-Rahman pressure coupling\pawsindexquiet{Parrinello-Rahman}{pressure coupling}}
1529
1530 In cases where the fluctuations in pressure or volume are important
1531 {\em per se} ({\eg} to calculate thermodynamic properties), especially
1532 for small systems, it may be a problem that the exact ensemble is not
1533 well defined for the weak-coupling scheme, and that it does not
1534 simulate the true NPT ensemble.
1535
1536 {\gromacs} also supports constant-pressure simulations using the
1537 Parrinello-Rahman approach~\cite{Parrinello81,Nose83}, which is similar
1538 to the Nos{\'e}-Hoover temperature coupling, and in theory gives the
1539 true NPT ensemble.  With the Parrinello-Rahman barostat, the box
1540 vectors as represented by the matrix \ve{b} obey the matrix equation
1541 of motion\footnote{The box matrix representation \ve{b} in {\gromacs}
1542 corresponds to the transpose of the box matrix representation \ve{h}
1543 in the paper by Nos{\'e} and Klein. Because of this, some of our
1544 equations will look slightly different.}
1545 \beq
\frac{\de^2 \ve{b}}{\de t^2}= V \ve{W}^{-1} \ve{b}'^{-1} \left( \ve{P} - \ve{P}_{ref}\right).
1547 \eeq
1548
1549 The volume of the box is denoted $V$, and $\ve{W}$ is a matrix parameter that determines
1550 the strength of the coupling. The matrices \ve{P} and \ve{P}$_{ref}$ are the
1551 current and reference pressures, respectively.
1552
1553 The equations of motion for the particles are also changed, just as
1554 for the Nos{\'e}-Hoover coupling. In most cases you would combine the
1555 Parrinello-Rahman barostat with the Nos{\'e}-Hoover
1556 thermostat, but to keep it simple we only show the Parrinello-Rahman
1557 modification here:
1558
1559 \bea \frac {\de^2\ve{r}_i}{\de t^2} & = & \frac{\ve{F}_i}{m_i} -
1560 \ve{M} \frac{\de \ve{r}_i}{\de t} , \\ \ve{M} & = & \ve{b}^{-1} \left[
1561   \ve{b} \frac{\de \ve{b}'}{\de t} + \frac{\de \ve{b}}{\de t} \ve{b}'
1562   \right] \ve{b}'^{-1}.  \eea The (inverse) mass parameter matrix
1563 $\ve{W}^{-1}$ determines the strength of the coupling, and how the box
1564 can be deformed.  The box restriction (\ref{eqn:box_rot}) will be
1565 fulfilled automatically if the corresponding elements of $\ve{W}^{-1}$
1566 are zero. Since the coupling strength also depends on the size of your
1567 box, we prefer to calculate it automatically in {\gromacs}.  You only
1568 have to provide the approximate isothermal compressibilities
1569 {\boldmath $\beta$} and the pressure time constant $\tau_p$ in the
1570 input file ($L$ is the largest box matrix element): \beq \left(
1571 \ve{W}^{-1} \right)_{ij} = \frac{4 \pi^2 \beta_{ij}}{3 \tau_p^2 L}.
1572 \eeq Just as for the Nos{\'e}-Hoover thermostat, you should realize
1573 that the Parrinello-Rahman time constant is {\em not} equivalent to
1574 the relaxation time used in the Berendsen pressure coupling algorithm.
1575 In most cases you will need to use a 4--5 times larger time constant
1576 with Parrinello-Rahman coupling. If your pressure is very far from
1577 equilibrium, the Parrinello-Rahman coupling may result in very large
1578 box oscillations that could even crash your run.  In that case you
1579 would have to increase the time constant, or (better) use the weak-coupling
1580 scheme to reach the target pressure, and then switch to
1581 Parrinello-Rahman coupling once the system is in equilibrium.
1582 Additionally, using the leap-frog algorithm, the pressure at time $t$
1583 is not available until after the time step has completed, and so the
pressure from the previous step must be used. This makes the algorithm
not directly reversible, so it may not be appropriate for high-precision
thermodynamic calculations.
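
To give a feeling for the magnitudes involved (assumed, illustrative
values $\beta = 4.6 \times 10^{-5}$ bar$^{-1}$, $\tau_p = 2$ ps and
$L = 5$ nm), the diagonal elements of the inverse mass parameter matrix
become
\beq
\left( \ve{W}^{-1} \right)_{ii} =
\frac{4 \pi^2 \times 4.6 \times 10^{-5}}{3 \times 2^2 \times 5}
\approx 3.0 \times 10^{-5}~\mathrm{bar^{-1}\,ps^{-2}\,nm^{-1}}.
\eeq
Because of the $\tau_p^{-2}$ dependence, doubling $\tau_p$ weakens the
coupling by a factor of four.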
1587
1588 \subsubsection{Surface-tension coupling\pawsindexquiet{surface-tension}{pressure coupling}}
1589 When a periodic system consists of more than one phase, separated by
1590 surfaces which are parallel to the $xy$-plane,
1591 the surface tension and the $z$-component of the pressure can be coupled
1592 to a pressure bath. Presently, this only works with the Berendsen
1593 pressure coupling algorithm in {\gromacs}.
1594 The average surface tension $\gamma(t)$ can be calculated from
1595 the difference between the normal and the lateral pressure
1596 \bea
1597 \gamma(t) & = &
1598 \frac{1}{n} \int_0^{L_z}
1599 \left\{ P_{zz}(z,t) - \frac{P_{xx}(z,t) + P_{yy}(z,t)}{2} \right\} \mbox{d}z \\
1600 & = &
1601 \frac{L_z}{n} \left\{ P_{zz}(t) - \frac{P_{xx}(t) + P_{yy}(t)}{2} \right\},
1602 \eea
1603 where $L_z$ is the height of the box and $n$ is the number of surfaces.
The pressure in the $z$-direction is corrected by scaling the height of
1605 the box with $\mu_{zz}$
1606 \beq
1607 \Delta P_{zz} = \frac{\Delta t}{\tau_p} \{ P_{0zz} - P_{zz}(t) \}
1608 \eeq
1609 \beq
1610 \mu_{zz} = 1 + \beta_{zz} \Delta P_{zz}
1611 \eeq
1612 This is similar to normal pressure coupling, except that the factor
1613 of $1/3$ is missing.
1614 The pressure correction in the $z$-direction is then used to get the
1615 correct convergence for the surface tension to the reference value $\gamma_0$.
1616 The correction factor for the box length in the $x$/$y$-direction is
1617 \beq
1618 \mu_{x/y} = 1 + \frac{\Delta t}{2\,\tau_p} \beta_{x/y}
1619         \left( \frac{n \gamma_0}{\mu_{zz} L_z}
1620         - \left\{ P_{zz}(t)+\Delta P_{zz} - \frac{P_{xx}(t) + P_{yy}(t)}{2} \right\}
1621         \right)
1622 \eeq
1623 The value of $\beta_{zz}$ is more critical than with normal pressure
1624 coupling. Normally an incorrect compressibility will just scale $\tau_p$,
1625 but with surface tension coupling it affects the convergence of the surface
1626 tension.
1627 When $\beta_{zz}$ is set to zero (constant box height), $\Delta P_{zz}$ is also set
1628 to zero, which is necessary for obtaining the correct surface tension.
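
As a worked example of the expression for $\gamma(t)$ given earlier in
this section (assumed values for a hypothetical slab system with
$n = 2$ surfaces, $L_z = 10$ nm, $P_{zz} = 1$ bar and
$P_{xx} = P_{yy} = -49$ bar):
\beq
\gamma = \frac{10~\mathrm{nm}}{2}
\left\{ 1 - \frac{(-49) + (-49)}{2} \right\}~\mathrm{bar}
= 250~\mathrm{bar\,nm} = 25~\mathrm{mN\,m^{-1}},
\eeq
using $1$ bar nm $= 10^{-4}$ N m$^{-1}$.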
1629
1630 \subsubsection{MTTK pressure control algorithms}
1631
1632 As mentioned in the previous section, one weakness of leap-frog
1633 integration is in constant pressure simulations, since the pressure
1634 requires a calculation of both the virial and the kinetic energy at
1635 the full time step; for leap-frog, this information is not available
1636 until {\em after} the full timestep.  Velocity Verlet does allow the
1637 calculation, at the cost of an extra round of global communication,
and can compute, modulo any integration errors, the true NPT ensemble.
1639
1640 The full equations, combining both pressure coupling and temperature
1641 coupling, are taken from Martyna {\em et al.}~\cite{Martyna1996} and
1642 Tuckerman~\cite{Tuckerman2006} and are referred to here as MTTK
1643 equations (Martyna-Tuckerman-Tobias-Klein).  We introduce for
1644 convenience $\epsilon = (1/3)\ln (V/V_0)$, where $V_0$ is a reference
volume.  The velocity of $\epsilon$ is $\veps = p_{\epsilon}/W =
\dot{\epsilon} = \dot{V}/3V$, and we define $\alpha = 1 + 3/N_{dof}$ (see
Ref.~\cite{Tuckerman2006}).
1648
1649 The isobaric equations are
1650 \bea
1651 \dot{\rv}_i &=& \frac{\pb_i}{m_i} + \frac{\peps}{W} \rv_i \nonumber \\
1652 \frac{\dot{\pb}_i}{m_i} &=& \frac{1}{m_i}\F_i - \alpha\frac{\peps}{W} \frac{\pb_i}{m_i} \nonumber \\
1653 \dot{\epsilon} &=& \frac{\peps}{W} \nonumber \\
\frac{\dot{\peps}}{W} &=& \frac{3V}{W}(P_{\mathrm{int}} - P) + (\alpha-1)\left(\sum_{i=1}^N\frac{\pb_i^2}{m_i}\right),
1655 \eea
1656 where
1657 \bea
P_{\mathrm{int}} &=& P_{\mathrm{kin}} -P_{\mathrm{vir}} = \frac{1}{3V}\left[\sum_{i=1}^N \left(\frac{\pb_i^2}{2m_i} - \rv_i \cdot \F_i
\right)\right].
1660 \eea
1661 The terms including $\alpha$ are required to make phase space
1662 incompressible~\cite{Tuckerman2006}. The $\epsilon$ acceleration term
1663 can be rewritten as
1664 \bea
1665 \frac{\dot{\peps}}{W} &=& \frac{3V}{W}\left(\alpha P_{\mathrm{kin}} - P_{\mathrm{vir}} - P\right)
1666 \eea
1667 In terms of velocities, these equations become
1668 \bea
1669 \dot{\rv}_i &=& \vv_i + \veps \rv_i \nonumber \\
1670 \dot{\vv}_i &=& \frac{1}{m_i}\F_i - \alpha\veps \vv_i \nonumber \\
1671 \dot{\epsilon} &=& \veps \nonumber \\
\dot{\veps} &=& \frac{3V}{W}(P_{\mathrm{int}} - P) + (\alpha-1)\left( \sum_{i=1}^N \frac{1}{2} m_i \vv_i^2\right)\nonumber \\
1673 P_{\mathrm{int}} &=& P_{\mathrm{kin}} - P_{\mathrm{vir}} = \frac{1}{3V}\left[\sum_{i=1}^N \left(\frac{1}{2} m_i\vv_i^2 - \rv_i \cdot \F_i\right)\right]
1674 \eea
1675 For these equations, the conserved quantity is
1676 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) + \frac{p_\epsilon^2}{2W} + PV
1678 \eea
1679 The next step is to add temperature control.  Adding Nos{\'e}-Hoover
1680 chains, including to the barostat degree of freedom, where we use
1681 $\eta$ for the barostat Nos{\'e}-Hoover variables, and $Q^{\prime}$
1682 for the coupling constants of the thermostats of the barostats, we get
1683 \bea
1684 \dot{\rv}_i &=& \frac{\pb_i}{m_i} + \frac{\peps}{W} \rv_i \nonumber \\
1685 \frac{\dot{\pb}_i}{m_i} &=& \frac{1}{m_i}\F_i - \alpha\frac{\peps}{W} \frac{\pb_i}{m_i} - \frac{p_{\xi_1}}{Q_1}\frac{\pb_i}{m_i}\nonumber \\
1686 \dot{\epsilon} &=& \frac{\peps}{W} \nonumber \\
1687 \frac{\dot{\peps}}{W} &=& \frac{3V}{W}(\alpha P_{\mathrm{kin}} - P_{\mathrm{vir}} - P) -\frac{p_{\eta_1}}{Q^{\prime}_1}\peps \nonumber \\
1688 \dot{\xi}_k &=& \frac{p_{\xi_k}}{Q_k} \nonumber \\
1689 \dot{\eta}_k &=& \frac{p_{\eta_k}}{Q^{\prime}_k} \nonumber \\
\dot{p}_{\xi_k} &=& G_k - \frac{p_{\xi_{k+1}}}{Q_{k+1}} p_{\xi_k} \;\;\;\; k=1,\ldots, M-1 \nonumber \\
\dot{p}_{\eta_k} &=& G^\prime_k - \frac{p_{\eta_{k+1}}}{Q^\prime_{k+1}} p_{\eta_k} \;\;\;\; k=1,\ldots, M-1 \nonumber \\
1692 \dot{p}_{\xi_M} &=& G_M \nonumber \\
\dot{p}_{\eta_M} &=& G^\prime_M,
1694 \eea
1695 where
1696 \bea
1697 P_{\mathrm{int}} &=& P_{\mathrm{kin}} - P_{\mathrm{vir}} = \frac{1}{3V}\left[\sum_{i=1}^N \left(\frac{\pb_i^2}{2m_i} - \rv_i \cdot \F_i\right)\right] \nonumber \\
1698 G_1  &=& \sum_{i=1}^N \frac{\pb^2_i}{m_i} - N_f kT \nonumber \\
1699 G_k  &=&  \frac{p^2_{\xi_{k-1}}}{2Q_{k-1}} - kT \;\; k = 2,\ldots,M \nonumber \\
1700 G^\prime_1 &=& \frac{\peps^2}{2W} - kT \nonumber \\
1701 G^\prime_k &=& \frac{p^2_{\eta_{k-1}}}{2Q^\prime_{k-1}} - kT \;\; k = 2,\ldots,M
1702 \eea
1703 The conserved quantity is now
1704 \bea
1705 H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) + \frac{p^2_\epsilon}{2W} + PV + \nonumber \\
1706 \sum_{k=1}^M\frac{p^2_{\xi_k}}{2Q_k} +\sum_{k=1}^M\frac{p^2_{\eta_k}}{2Q^{\prime}_k} + N_fkT\xi_1 +  kT\sum_{k=2}^M \xi_k + kT\sum_{k=1}^M \eta_k
1707 \eea
1708 Returning to the Trotter decomposition formalism, for pressure control and temperature control~\cite{Martyna1996} we get:
1709 \bea
1710 iL = iL_1 + iL_2 + iL_{\epsilon,1} + iL_{\epsilon,2} + iL_{\mathrm{NHC-baro}} + iL_{\mathrm{NHC}}
1711 \eea
1712 where ``NHC-baro'' corresponds to the Nos{\'e}-Hoover chain of the barostat,
1713 and NHC corresponds to the NHC of the particles,
1714 \bea
1715 iL_1 &=& \sum_{i=1}^N \left[\frac{\pb_i}{m_i} + \frac{\peps}{W}\rv_i\right]\cdot \frac{\partial}{\partial \rv_i} \\
1716 iL_2 &=& \sum_{i=1}^N \left[\F_i - \alpha \frac{\peps}{W}\pb_i\right] \cdot \frac{\partial}{\partial \pb_i} \\
1717 iL_{\epsilon,1} &=& \frac{p_\epsilon}{W} \frac{\partial}{\partial \epsilon}\\
1718 iL_{\epsilon,2} &=& G_{\epsilon} \frac{\partial}{\partial p_\epsilon}
1719 \eea
1720 and where
1721 \bea
1722 G_{\epsilon} = 3V\left(\alpha P_{\mathrm{kin}} - P_{\mathrm{vir}} - P\right)
1723 \eea
1724 Using the Trotter decomposition, we get
1725 \bea
1726 \exp(iL\dt) &=& \exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right)\exp\left(iL_{\mathrm{NHC}}\dt/2\right) \nonumber \\
1727 &&\exp\left(iL_{\epsilon,2}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1728 &&\exp\left(iL_{\epsilon,1}\dt\right) \exp\left(iL_1 \dt\right) \nonumber \\
1729 &&\exp\left(iL_2 \dt/2\right) \exp\left(iL_{\epsilon,2}\dt/2\right) \nonumber \\
1730 &&\exp\left(iL_{\mathrm{NHC}}\dt/2\right)\exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right) + \mathcal{O}(\dt^3)
1731 \eea
1732 The action of $\exp\left(iL_1 \dt\right)$ comes from the solution of
1733 the differential equation
1734 $\dot{\rv}_i = \vv_i + \veps \rv_i$
1735 with $\vv_i = \pb_i/m_i$ and $\veps$ held constant, with initial condition
1736 $\rv_i(0)$, evaluated at $t=\Delta t$.  This yields the evolution
1737 \beq
1738 \rv_i(\dt) = \rv_i(0)e^{\veps \dt} + \Delta t \vv_i(0) e^{\veps \dt/2} \sinhx{\veps \dt/2}.
1739 \eeq
1740 The action of $\exp\left(iL_2 \dt/2\right)$ comes from the solution
1741 of the differential equation $\dot{\vv}_i = \frac{\F_i}{m_i} -
1742 \alpha\veps\vv_i$, yielding
1743 \beq
1744 \vv_i(\dt/2) = \vv_i(0)e^{-\alpha\veps \dt/2} + \frac{\Delta t}{2m_i}\F_i(0) e^{-\alpha\veps \dt/4}\sinhx{\alpha\veps \dt/4}.
1745 \eeq
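
As a check on the two propagators above, both follow from the standard
solution of a linear first-order equation with constant coefficients.
The sketch below assumes that the macro used for the hyperbolic factor
denotes $\sinh(x)/x$ (it is defined outside this section), so the
hyperbolic functions are written out explicitly here.
For constant $\vv_i$ and $\veps$,
\bea
\rv_i(\dt) &=& \rv_i(0)e^{\veps \dt} + \vv_i(0)\,\frac{e^{\veps \dt}-1}{\veps} \nonumber \\
&=& \rv_i(0)e^{\veps \dt} + \dt\,\vv_i(0)\, e^{\veps \dt/2}\, \frac{\sinh\left(\veps \dt/2\right)}{\veps \dt/2}, \nonumber
\eea
where the second line uses $e^{x/2}\sinh(x/2) = (e^{x}-1)/2$.
In the same way, for constant $\F_i$ and $\veps$,
\bea
\vv_i(\dt/2) &=& \vv_i(0)e^{-\alpha\veps \dt/2} + \frac{\F_i(0)}{m_i}\,\frac{1-e^{-\alpha\veps \dt/2}}{\alpha\veps} \nonumber \\
&=& \vv_i(0)e^{-\alpha\veps \dt/2} + \frac{\dt}{2m_i}\F_i(0)\, e^{-\alpha\veps \dt/4}\, \frac{\sinh\left(\alpha\veps \dt/4\right)}{\alpha\veps \dt/4}. \nonumber
\eea
In the limit $\veps \rightarrow 0$ the factor $\sinh(x)/x$ tends to one and
both expressions reduce to the ordinary velocity Verlet updates
$\rv_i(0) + \dt\,\vv_i(0)$ and $\vv_i(0) + \frac{\dt}{2m_i}\F_i(0)$.
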
1746 {\em md-vv-avek} uses the full-step kinetic energies to determine the pressure for pressure control,
1747 but the half-step-averaged kinetic energy for the temperatures, which can be written as a Trotter decomposition as
1748 \bea
1749 \exp(iL\dt) &=& \exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right) \exp\left(iL_{\epsilon,2}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1750 &&\exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_{\epsilon,1}\dt\right) \exp\left(iL_1 \dt\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \nonumber \\
1751 &&\exp\left(iL_2 \dt/2\right) \exp\left(iL_{\epsilon,2}\dt/2\right) \exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right) + \mathcal{O}(\dt^3)
1752 \eea
1753
1754 With constraints, the equations become significantly more complicated,
1755 in that each of these equations needs to be solved iteratively for the
1756 constraint forces. Before {\gromacs} 5.1, these iterative
1757 constraints were solved as described in~\cite{Yu2010}. From {\gromacs}
1758 5.1 onward, MTTK with constraints has been removed because of
1759 numerical stability issues with the iterations.
1760
1761 \subsubsection{Infrequent evaluation of temperature and pressure coupling}
1762
1763 Temperature and pressure control require global communication to
1764 compute the kinetic energy and virial, which can become costly if
1765 performed every step for large systems.  We can rearrange the Trotter
1766 decomposition to give alternative symplectic, reversible integrators with
1767 the coupling steps every $n$ steps instead of every step.  These new
1768 integrators will diverge if the coupling time step is too large, as
1769 the auxiliary variable integrations will not converge.  However, in
1770 most cases, long coupling times are more appropriate, as they disturb
1771 the dynamics less~\cite{Martyna1996}.
1772
1773 Standard velocity Verlet with Nos{\'e}-Hoover temperature control has a Trotter expansion
1774 \bea
1775 \exp(iL\dt) &\approx& \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1776 &&\exp\left(iL_1 \dt\right) \exp\left(iL_2 \dt/2\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right).
1777 \eea
1778 If the Nos{\'e}-Hoover chain is sufficiently slow with respect to the motions of the system, we can
1779 write an alternate integrator over $n$ steps for velocity Verlet as
1780 \bea
1781 \exp(iL\dt) &\approx& \exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right)\left[\exp\left(iL_2 \dt/2\right)\right. \nonumber \\
1782 &&\left.\exp\left(iL_1 \dt\right) \exp\left(iL_2 \dt/2\right)\right]^n \exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right).
1783 \eea
1784 For pressure control, this becomes
1785 \bea
1786 \exp(iL\dt) &\approx& \exp\left(iL_{\mathrm{NHC-baro}}(n\dt/2)\right)\exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right) \nonumber \\
1787 &&\exp\left(iL_{\epsilon,2}(n\dt/2)\right) \left[\exp\left(iL_2 \dt/2\right)\right. \nonumber \\
1788 &&\exp\left(iL_{\epsilon,1}\dt\right) \exp\left(iL_1 \dt\right) \nonumber \\
1789 &&\left.\exp\left(iL_2 \dt/2\right)\right]^n \exp\left(iL_{\epsilon,2}(n\dt/2)\right) \nonumber \\
1790 &&\exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right)\exp\left(iL_{\mathrm{NHC-baro}}(n\dt/2)\right),
1791 \eea
1792 where the box volume integration occurs every step, but the auxiliary variable
1793 integrations happen every $n$ steps.
1794
1795 % } % Brace matches ifthenelse test for gmxlite
1796
1797
1798 \subsection{The complete update algorithm}
1799 \begin{figure}
1800 \begin{center}
1801 \addtolength{\fboxsep}{0.5cm}
1802 \begin{shadowenv}[12cm]
1803 {\large \bf THE UPDATE ALGORITHM}
1804 \rule{\textwidth}{2pt} \\
1805 Given:\\
1806 Positions $\ve{r}$ of all atoms at time $t$ \\
1807 Velocities $\ve{v}$ of all atoms at time $t-\hDt$ \\
1808 Accelerations $\ve{F}/m$ on all atoms at time $t$.\\
1809 (Forces are computed disregarding any constraints)\\
1810 Total kinetic energy and virial at $t-\Dt$\\
1811 $\Downarrow$ \\
1812 {\bf 1.} Compute the scaling factors $\lambda$ and $\mu$\\
1813 according to \eqnsref{lambda}{mu}\\
1814 $\Downarrow$ \\
1815 {\bf 2.} Update and scale velocities: $\ve{v}' =  \lambda (\ve{v} +
1816 \ve{a} \Delta t)$ \\
1817 $\Downarrow$ \\
1818 {\bf 3.} Compute new unconstrained coordinates: $\ve{r}' = \ve{r} + \ve{v}'
1819 \Delta t$ \\
1820 $\Downarrow$ \\
1821 {\bf 4.} Apply constraint algorithm to coordinates: constrain($\ve{r}^{'} \rightarrow  \ve{r}'';
1822 \,  \ve{r}$) \\
1823 $\Downarrow$ \\
1824 {\bf 5.} Correct velocities for constraints: $\ve{v} = (\ve{r}'' -
1825 \ve{r}) / \Delta t$ \\
1826 $\Downarrow$ \\
1827 {\bf 6.} Scale coordinates and box: $\ve{r} = \mu \ve{r}''; \ve{b} =
1828 \mu  \ve{b}$ \\
1829 \end{shadowenv}
1830 \caption{The MD update algorithm with the leap-frog integrator}
1831 \label{fig:complete-update}
1832 \end{center}
1833 \end{figure}
1834 The complete algorithm for the update of velocities and coordinates is
1835 given using leap-frog in \figref{complete-update}. The SHAKE algorithm of step
1836 4 is explained below.
1837
1838 {\gromacs} has a provision to ``freeze''  (prevent motion of) selected
1839 particles\index{frozen atoms}, which must be defined as a ``\swapindex{freeze}{group}.'' This is implemented
1840 using a {\em freeze factor $\ve{f}_g$}, which is a vector, and differs for each
1841 freeze group (see \secref{groupconcept}). This vector contains only
1842 zero (freeze) or one (don't freeze).
1843 When we take this freeze factor and the external acceleration $\ve{a}_h$ into
1844 account, the update algorithm for the velocities becomes
1845 \beq
1846 \ve{v}(t+\hdt)~=~\ve{f}_g * \lambda * \left[ \ve{v}(t-\hdt) +\frac{\ve{F}(t)}{m}\Delta t + \ve{a}_h \Delta t \right],
1847 \eeq
1848 where $g$ and $h$ are group indices which differ per atom.
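
As a small illustration of the freeze factor (the values are
hypothetical and not taken from any {\gromacs} input), consider a freeze
group that is frozen only in the $z$ direction, so that
$\ve{f}_g = (1, 1, 0)$, and assume that the product with $\ve{f}_g$ is
taken component by component. The update then reads
\bea
v_{x}(t+\hdt) &=& \lambda \left[ v_{x}(t-\hdt) + \frac{F_{x}(t)}{m}\Delta t + a_{h,x} \Delta t \right] \nonumber \\
v_{z}(t+\hdt) &=& 0, \nonumber
\eea
with the $y$ component treated like the $x$ component. Whatever force
acts along $z$, the new $z$ velocity is zero, so the $z$ coordinate of
such a particle is not changed by the subsequent position update.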
1849
1850 \subsection{Output step}
1851 The most important output of the MD run is the {\em
1852 \swapindex{trajectory}{file}}, which contains particle coordinates
1853 and (optionally) velocities at regular intervals.
1854 The trajectory file contains frames that could include positions,
1855 velocities and/or forces, as well as information about the dimensions
1856 of the simulation volume, integration step, integration time, etc. The
1857 interpretation of the time varies with the integrator chosen, as
1858 described above. For Velocity Verlet integrators, velocities labeled
1859 at time $t$ are for that time. For other integrators (e.g. leap-frog,
1860 stochastic dynamics), the velocities labeled at time $t$ are for time
1861 $t - \hDt$.
1862
1863 Since the trajectory
1864 files are lengthy, one should not save every step! To retain all
1865 information it suffices to write a frame every 15 steps, since at
1866 least 30 steps are made per period of the highest frequency in the
1867 system, and Shannon's \normindex{sampling} theorem states that two samples per
1868 period of the highest frequency in a band-limited signal contain all
1869 available information. But that still gives very long files! So, if
1870 the highest frequencies are not of interest, 10 or 20 samples per ps
1871 may suffice. Be aware of the distortion of high-frequency motions by
1872 the {\em stroboscopic effect}, called {\em aliasing}: higher frequencies
1873 are  mirrored with respect to the sampling frequency and appear as
1874 lower frequencies.
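
The numbers in this paragraph can be checked with a short calculation;
the time step and frequencies used here are illustrative assumptions.
If the fastest motion has a period of $30\,\Delta t$, writing a frame
every 15 steps gives exactly two samples per period, the minimum
required by the sampling theorem. With an assumed $\Delta t = 1$~fs this
amounts to about $6.7 \times 10^4$ frames per ns, compared with $10^4$
frames per ns when writing 10 samples per ps. When frames are written at
a rate $f_s$, a motion with frequency $f > f_s/2$ appears at the aliased
frequency
\beq
f_{\mathrm{alias}} = \left| f - k f_s \right|,
\eeq
where $k$ is the integer nearest to $f/f_s$. For example, with
$f_s = 10$~ps$^{-1}$ a vibration at $f = 12$~ps$^{-1}$ shows up at
$2$~ps$^{-1}$ and cannot be distinguished from a genuine motion at that
lower frequency.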
1875
1876 {\gromacs} can also write reduced-precision coordinates for a subset of
1877 the simulation system to a special compressed trajectory file
1878 format. All the other tools can read and write this format. See
1879 the User Guide for details on how to set up your {\tt .mdp} file
1880 to have {\tt mdrun} use this feature.
1881
1882 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1883 \section{Shell molecular dynamics}
1884 {\gromacs} can simulate \normindex{polarizability} using the
1885 \normindex{shell model} of Dick and Overhauser~\cite{Dick58}. In such models
1886 a shell particle representing the electronic degrees of freedom is
1887 attached to a nucleus by a spring. The potential energy is minimized with
1888 respect to the shell position  at every step of the simulation (see below).
1889 Successful applications of shell models in {\gromacs} have been published
1890 for N$_2$~\cite{Jordan95} and water~\cite{Maaren2001a}.
1891
1892 \subsection{Optimization of the shell positions}
1893 The force \ve{F}$_S$ on a shell particle $S$ can be decomposed into two
1894 components
1895 \begin{equation}
1896 \ve{F}_S ~=~ \ve{F}_{bond} + \ve{F}_{nb}
1897 \end{equation}
1898 where \ve{F}$_{bond}$ denotes the component representing the
1899 polarization energy, usually represented by a harmonic potential and
1900 \ve{F}$_{nb}$ is the sum of Coulomb and van der Waals interactions. If we
1901 assume that \ve{F}$_{nb}$ is almost constant we can analytically derive the
1902 optimal position of the shell, i.e. where \ve{F}$_S$ = 0. If we have the
1903 shell S connected to atom A we have
1904 \begin{equation}
1905 \ve{F}_{bond} ~=~ -k_b \left( \ve{x}_S - \ve{x}_A\right).
1906 \end{equation}
1907 In an iterative solver, we have positions \ve{x}$_S(n)$ where $n$ is
1908 the iteration count. We now have at iteration $n$
1909 \begin{equation}
1910 \ve{F}_{nb} ~=~ \ve{F}_S + k_b \left( \ve{x}_S(n) - \ve{x}_A\right)
1911 \end{equation}
1912 and the optimal position for the shells \ve{x}$_S(n+1)$ thus follows from
1913 \begin{equation}
1914 \ve{F}_S + k_b \left( \ve{x}_S(n) - \ve{x}_A\right) - k_b \left( \ve{x}_S(n+1) - \ve{x}_A\right) = 0
1915 \end{equation}
1916 if we write
1917 \begin{equation}
1918 \Delta \ve{x}_S = \ve{x}_S(n+1) - \ve{x}_S(n)
1919 \end{equation}
1920 we finally obtain
1921 \begin{equation}
1922 \Delta \ve{x}_S = \ve{F}_S/k_b
1923 \end{equation}
1924 which then yields the algorithm to compute the next trial in the optimization
1925 of shell positions
1926 \begin{equation}
1927 \ve{x}_S(n+1) ~=~ \ve{x}_S(n) + \ve{F}_S/k_b.
1928 \end{equation}
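
To see why this update works, consider a one-dimensional sketch with
made-up parameters: an atom at $x_A = 0$, a spring constant
$k_b = 2\times10^5$~kJ~mol$^{-1}$~nm$^{-2}$, a restoring spring force
$-k_b(x_S - x_A)$, and a non-bonded force on the shell that is exactly
constant, $F_{nb} = 400$~kJ~mol$^{-1}$~nm$^{-1}$. Starting from
$x_S(0) = 0$, the total force is $F_S = 400$~kJ~mol$^{-1}$~nm$^{-1}$ and
the update gives
\begin{equation}
x_S(1) ~=~ x_S(0) + F_S/k_b ~=~ 0 + \frac{400}{2\times10^5}~\mbox{nm} ~=~ 0.002~\mbox{nm}.
\end{equation}
At this position the spring force is
$-2\times10^5 \times 0.002 = -400$~kJ~mol$^{-1}$~nm$^{-1}$, which cancels
$F_{nb}$, so the total force vanishes after a single iteration. In a real
system the non-bonded force changes slightly when the shell moves, which
is why several iterations are generally needed.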
1929 % } % Brace matches ifthenelse test for gmxlite
1930
1931 \section{Constraint algorithms\index{constraint algorithms}}
1932 Constraints can be imposed in {\gromacs} using LINCS (default) or
1933 the traditional SHAKE method.
1934
1935 \subsection{\normindex{SHAKE}}
1936 \label{subsec:SHAKE}
1937 The SHAKE~\cite{Ryckaert77} algorithm changes a set of unconstrained
1938 coordinates $\ve{r}^{'}$ to a set of coordinates $\ve{r}''$ that
1939 fulfill a list of distance constraints, using a reference set
1940 $\ve{r}$, as
1941 \beq
1942 {\rm SHAKE}(\ve{r}^{'} \rightarrow \ve{r}'';\, \ve{r})
1943 \eeq
1944 This action is consistent with solving a set of Lagrange multipliers
1945 in the constrained equations of motion. SHAKE needs a {\em relative tolerance};
1946 it will continue until all constraints are satisfied within
1947 that relative tolerance. An error message is
1948 given if SHAKE cannot reset the coordinates because the deviation is
1949 too large, or if a given number of iterations is surpassed.
1950
1951 Assume the equations of motion must fulfill $K$ holonomic constraints,
1952 expressed as
1953 \beq
1954 \sigma_k(\ve{r}_1 \ldots \ve{r}_N) = 0; \;\; k=1 \ldots K.
1955 \eeq
1956 For example, $(\ve{r}_1 - \ve{r}_2)^2 - b^2 = 0$.
1957 Then the forces are defined as
1958 \beq
1959 - \frac{\partial}{\partial \ve{r}_i} \left( V + \sum_{k=1}^K \lambda_k
1960 \sigma_k \right),
1961 \eeq
1962 where $\lambda_k$ are Lagrange multipliers which must be solved to
1963 fulfill the constraint equations. The second part of this sum
1964 determines the {\em constraint forces} $\ve{G}_i$, defined by
1965 \beq
1966 \ve{G}_i = -\sum_{k=1}^K \lambda_k \frac{\partial \sigma_k}{\partial
1967 \ve{r}_i}
1968 \eeq
1969 The displacement due to the constraint forces in the leap-frog or
1970 Verlet algorithm is equal to $(\ve{G}_i/m_i)(\Dt)^2$. Solving the
1971 Lagrange multipliers (and hence the displacements) requires the
1972 solution of a set of coupled equations of the second degree. These are
1973 solved iteratively by SHAKE.
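
For a single distance constraint the iteration can be written out
explicitly. The following is a sketch of the standard linearized SHAKE
step for one bond between particles 1 and 2, not a description of the
{\gromacs} implementation; the symbols $g$ and $\ve{r}_{12}$ are
introduced here for illustration only. With
$\ve{r}_{12} = \ve{r}_1 - \ve{r}_2$ taken from the reference set and
$\ve{r}'_{12} = \ve{r}'_1 - \ve{r}'_2$ from the unconstrained
coordinates, the corrections are directed along $\ve{r}_{12}$:
\bea
\ve{r}''_1 &=& \ve{r}'_1 - \frac{g}{m_1}\,\ve{r}_{12} \nonumber \\
\ve{r}''_2 &=& \ve{r}'_2 + \frac{g}{m_2}\,\ve{r}_{12}. \nonumber
\eea
Requiring $(\ve{r}''_1 - \ve{r}''_2)^2 = b^2$ gives a quadratic equation
in $g$; neglecting the term of order $g^2$ yields
\beq
g = \frac{(\ve{r}'_{12})^2 - b^2}{2\left(1/m_1 + 1/m_2\right)\ve{r}_{12}\cdot\ve{r}'_{12}}.
\eeq
Because the quadratic term is dropped, and because moving one bond
disturbs the other bonds that share an atom with it, the step is
repeated over all constraints until each one is satisfied within the
relative tolerance.
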
1974 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1975 \label{subsec:SETTLE}
1976 For the special case of rigid water molecules, which often make up more
1977 than 80\% of the simulation system, we have implemented the
1978 \normindex{SETTLE}
1979 algorithm~\cite{Miyamoto92} (\secref{constraints}).
1980
1981 For velocity Verlet, an additional round of constraining must be
1982 done, to constrain the velocities of the second velocity half step,
1983 removing any component of the velocity parallel to the bond vector.
1984 This step is called RATTLE, and is covered in more detail in the
1985 original Andersen paper~\cite{Andersen1983a}.
1986
1987 % } % Brace matches ifthenelse test for gmxlite
1988
1989
1990
1991
1992 \newcommand{\fs}[1]{\begin{equation} \label{eqn:#1}}
1993 \newcommand{\fe}{\end{equation}}
1994 \newcommand{\p}{\partial}
1995 \newcommand{\Bm}{\ve{B}}
1996 \newcommand{\M}{\ve{M}}
1997 \newcommand{\iM}{\M^{-1}}
1998 \newcommand{\Tm}{\ve{T}}
1999 \newcommand{\Sm}{\ve{S}}
2000 \newcommand{\fo}{\ve{f}}
2001 \newcommand{\con}{\ve{g}}
2002 \newcommand{\lenc}{\ve{d}}
2003
2004 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2005 \subsection{\normindex{LINCS}}
2006 \label{subsec:lincs}
2007
2008 \subsubsection{The LINCS algorithm}
2009 LINCS is an algorithm that resets bonds to their correct lengths
2010 after an unconstrained update~\cite{Hess97}.
2011 The method is non-iterative, as it always uses two steps.
2012 Although LINCS is based on matrices, no matrix-matrix multiplications are
2013 needed. The method is more stable and faster than SHAKE,
2014 but it can only be used with bond constraints and
2015 isolated angle constraints, such as the proton angle in OH.
2016 Because of its stability, LINCS is especially useful for Brownian dynamics.
2017 LINCS has two parameters, which are explained in the subsection parameters.
2018 The parallel version of LINCS, P-LINCS, is described
2019 in subsection \ssecref{plincs}.
2020
2021 \subsubsection{The LINCS formulas}
2022 We consider a system of $N$ particles, with positions given by a
2023 $3N$ vector $\ve{r}(t)$.
2024 For molecular dynamics the equations of motion are given by Newton's Law
2025 \fs{c1}
2026 {\de^2 \ve{r} \over \de t^2} = \iM \ve{F},
2027 \fe
2028 where $\ve{F}$ is the $3N$ force vector
2029 and $\M$ is a $3N \times 3N$ diagonal matrix,
2030 containing the masses of the particles.
2031 The system is constrained by $K$ time-independent constraint equations
2032 \fs{c2}
2033 g_i(\ve{r}) = | \ve{r}_{i_1}-\ve{r}_{i_2} | - d_i = 0 ~~~~~~i=1,\ldots,K.
2034 \fe
2035
2036 In a numerical integration scheme, LINCS is applied after an
2037 unconstrained update, just like SHAKE. The algorithm works in two
2038 steps (see \figref{lincs}). In the first step, the projections
2039 of the new bonds on the old bonds are set to zero. In the second step,
2040 a correction is applied for the lengthening of the bonds due to
2041 rotation. The numerics for the first step and the second step are very
2042 similar. A complete derivation of the algorithm can be found in
2043 \cite{Hess97}. Only a short description of the first step is given
2044 here.
2045
2046 \begin{figure}
2047 \centerline{\includegraphics[height=50mm]{plots/lincs}}
2048 \caption[The three position updates needed for one time step.]{The
2049 three position updates needed for one time step. The dashed line is
2050 the old bond of length $d$, the solid lines are the new bonds. $l=d
2051 \cos \theta$ and $p=(2 d^2 - l^2)^{1 \over 2}$.}
2052 \label{fig:lincs}
2053 \end{figure}
2054
2055 A new notation is introduced for the gradient matrix of the constraint
2056 equations which appears on the right hand side of this equation:
2057 \fs{c3}
2058 B_{hi} = {\p g_h \over \p r_i}
2059 \fe
2060 Notice that $\Bm$ is a $K \times 3N$ matrix; it contains the directions
2061 of the constraints.
2062 The following equation shows how the new constrained coordinates
2063 $\ve{r}_{n+1}$ are related to the unconstrained coordinates
2064 $\ve{r}_{n+1}^{unc}$ by
2065 \fs{m0}
2066 \begin{array}{c}
2067   \ve{r}_{n+1}=(\ve{I}-\Tm_n \ve{B}_n) \ve{r}_{n+1}^{unc} + \Tm_n \lenc=
2068   \\[2mm]
2069   \ve{r}_{n+1}^{unc} -
2070 \iM \Bm_n^T (\Bm_n \iM \Bm_n^T)^{-1} (\Bm_n \ve{r}_{n+1}^{unc} - \lenc)
2071 \end{array}
2072 \fe
2073 where $\Tm = \iM \Bm^T (\Bm \iM \Bm^T)^{-1}$.
2074 The derivation of this equation from \eqnsref{c1}{c2} can be found
2075 in \cite{Hess97}.
2076
2077 This first step does not set the real bond lengths to the prescribed lengths,
2078 but the projection of the new bonds onto the old directions of the bonds.
2079 To correct for the rotation of bond $i$, the projection of the
2080 bond, $p_i$, on the old direction is set to
2081 \fs{m1a}
2082 p_i=\sqrt{2 d_i^2 - l_i^2},
2083 \fe
2084 where $l_i$ is the bond length after the first projection.
2085 The corrected positions are
2086 \fs{m1b}
2087 \ve{r}_{n+1}^*=(\ve{I}-\Tm_n \Bm_n)\ve{r}_{n+1} + \Tm_n \ve{p}.
2088 \fe
2089 This correction for rotational effects is actually an iterative process,
2090 but during MD only one iteration is applied.
2091 The relative constraint deviation after this procedure will be less than
2092 0.0001 for every constraint.
2093 In energy minimization, this might not be accurate enough, so the number
2094 of iterations is equal to the order of the expansion (see below).
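
The form of \eqnref{m1a} can be understood from a single isolated bond;
the following is a sketch under that assumption, with $q_i$ a symbol
introduced here for illustration. After the first step the bond has
length $l_i$ and its projection on the old direction equals $d_i$, so
its component perpendicular to the old direction is
$q_i = \sqrt{l_i^2 - d_i^2}$. The correction acts along the old
direction only and leaves $q_i$ unchanged, so setting the projection to
$p_i$ gives a bond of length
\beq
\sqrt{p_i^2 + q_i^2} = \sqrt{\left(2 d_i^2 - l_i^2\right) + \left(l_i^2 - d_i^2\right)} = d_i.
\eeq
For coupled constraints the displacements made for neighboring bonds
change the perpendicular components, so the result is no longer exact
after one pass, which is why the correction is in general iterative.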
2095
2096 Half of the CPU time goes to inverting the constraint coupling
2097 matrix $\Bm_n \iM \Bm_n^T$, which has to be done every time step.
2098 This $K \times K$ matrix
2099 has $1/m_{i_1} + 1/m_{i_2}$ on the diagonal.
2100 The off-diagonal elements are only non-zero when two bonds are connected,
2101 in which case the element is
2102 $\cos \phi /m_c$,  where $m_c$ is
2103 the mass of the atom connecting the
2104 two bonds and $\phi$ is the angle between the bonds.
2105
2106 The constraint coupling matrix is inverted through a power expansion.
2107 A $K \times K$ matrix $\ve{S}$ is
2108 introduced which is the inverse square root of
2109 the diagonal of $\Bm_n \iM \Bm_n^T$.
2110 This matrix is used to convert the diagonal elements
2111 of the coupling matrix to one:
2112 \fs{m2}
2113 \begin{array}{c}
2114 (\Bm_n \iM \Bm_n^T)^{-1}
2115 = \Sm \Sm^{-1} (\Bm_n \iM \Bm_n^T)^{-1} \Sm^{-1} \Sm  \\[2mm]
2116 = \Sm (\Sm \Bm_n \iM \Bm_n^T \Sm)^{-1} \Sm =
2117   \Sm (\ve{I} - \ve{A}_n)^{-1} \Sm
2118 \end{array}
2119 \fe
2120 The matrix $\ve{A}_n$ is symmetric and sparse and has zeros on the diagonal.
2121 Thus a simple trick can be used to calculate the inverse:
2122 \fs{m3}
2123 (\ve{I}-\ve{A}_n)^{-1}=
2124         \ve{I} + \ve{A}_n + \ve{A}_n^2 + \ve{A}_n^3 + \ldots
2125 \fe
2126
2127 This inversion method is only valid if the absolute values of all the
2128 eigenvalues of $\ve{A}_n$ are smaller than one.
2129 In molecules with only bond constraints, the connectivity is so low
2130 that this will always be true, even if ring structures are present.
2131 Problems can arise in angle-constrained molecules.
2132 By constraining angles with additional distance constraints,
2133 multiple small ring structures are introduced.
2134 This gives a high connectivity, leading to large eigenvalues.
2135 Therefore LINCS should NOT be used with coupled angle-constraints.
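
As an illustration of why the expansion converges for ordinary bond
constraints, consider two constraints that share one atom, with all
three atoms of equal mass $m$ (an idealized case chosen here for
simplicity). The coupling matrix then has $2/m$ on the diagonal and an
off-diagonal element of magnitude $|\cos\phi|/m$, so $\Sm$ has
$\sqrt{m/2}$ on its diagonal and
\beq
|A_{12}| = |A_{21}| = \sqrt{\frac{m}{2}}\,\frac{|\cos \phi|}{m}\,\sqrt{\frac{m}{2}} = \frac{|\cos \phi|}{2}.
\eeq
The eigenvalues of this $2 \times 2$ matrix are $\pm|\cos\phi|/2$, which
never exceed $1/2$ in magnitude. For a tetrahedral angle,
$\cos\phi = -1/3$ and the eigenvalues are $\pm 1/6$, so each additional
term in the expansion reduces the truncation error by a factor of six.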
2136
2137 For molecules with all bonds constrained the eigenvalues of $\ve{A}_n$
2138 are around 0.4. This means that with each additional order
2139 in the expansion \eqnref{m3} the deviations decrease by a factor of 0.4.
2140 But for relatively isolated triangles of constraints the largest
2141 eigenvalue is around 0.7.
2142 Such triangles can occur when removing hydrogen angle vibrations
2143 with an additional angle constraint in alcohol groups
2144 or when constraining water molecules with LINCS, for instance
2145 with flexible constraints.
2146 The constraints in such triangles converge twice as slowly as
2147 the other constraints. Therefore, starting with {\gromacs} 4,
2148 additional terms are added to the expansion for such triangles
2149 \fs{m3_ang}
2150 (\ve{I}-\ve{A}_n)^{-1} \approx
2151         \ve{I} + \ve{A}_n + \ldots + \ve{A}_n^{N_i} +
2152         \left(\ve{A}^*_n + \ldots + {\ve{A}_n^*}^{N_i} \right) \ve{A}_n^{N_i}
2153 \fe
2154 where $N_i$ is the normal order of the expansion and
2155 $\ve{A}^*$ only contains the elements of $\ve{A}$ that couple
2156 constraints within rigid triangles, all other elements are zero.
2157 In this manner, the accuracy of angle constraints comes close
2158 to that of the other constraints, while the series of matrix vector
2159 multiplications required for determining the expansion
2160 only needs to be extended for a few constraint couplings.
2161 This procedure is described in the P-LINCS paper~\cite{Hess2008a}.
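
The effect of the eigenvalue on the required order can be estimated from
the geometric series. The estimate below is a rough sketch that treats
the largest eigenvalue magnitude, written $\rho$ here, as the only
relevant quantity. Truncating \eqnref{m3} after the term
$\ve{A}_n^{N_i}$ leaves a relative error of about $\rho^{N_i+1}$. For
$N_i = 4$,
\beq
0.4^{5} \approx 0.010, \;\;\;\; 0.7^{5} \approx 0.17,
\eeq
so at the same order a triangle of constraints is roughly 16 times less
accurate than ordinary bond constraints. Reaching a relative error of
$0.010$ with $\rho = 0.7$ would take
$N_i + 1 \approx 5 \ln 0.4 / \ln 0.7 \approx 13$ terms, which illustrates
why it pays to extend the expansion only for the couplings inside
triangles.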
2162
2163 \subsubsection{The LINCS Parameters}
2164 The accuracy of LINCS depends on the number of matrices used
2165 in the expansion \eqnref{m3}. For MD calculations a fourth order
2166 expansion is enough. For Brownian dynamics with
2167 large time steps an eighth order expansion may be necessary.
2168 The order is a parameter in the {\tt *.mdp} file.
2169 The implementation of LINCS is done in such a way that the
2170 algorithm will never crash. Even when it is impossible
2171 to reset the constraints, LINCS will generate a conformation
2172 which fulfills the constraints as well as possible.
2173 However, LINCS will generate a warning when in one step a bond
2174 rotates over more than a predefined angle.
2175 This angle is set by the user in the {\tt *.mdp} file.
2176
2177 % } % Brace matches ifthenelse test for gmxlite
2178
2179
2180 \section{Simulated Annealing}
2181 \label{sec:SA}
2182 The well known \swapindex{simulated}{annealing}
2183 (SA) protocol is supported in {\gromacs}, and you can even couple multiple
2184 groups of atoms separately with an arbitrary number of reference temperatures
2185 that change during the simulation. The annealing is implemented by simply
2186 changing the current reference temperature for each group in the temperature
2187 coupling, so the actual relaxation and coupling properties depend on the
2188 type of thermostat you use and how hard you are coupling it. Since we are
2189 changing the reference temperature it is important to remember that the system
2190 will NOT instantaneously reach this value - you need to allow for the inherent
2191 relaxation time in the coupling algorithm too. If you are changing the
2192 annealing reference temperature faster than the temperature relaxation you
2193 will probably end up with a crash when the difference becomes too large.
2194
2195 The annealing protocol is specified as a series of corresponding times and
2196 reference temperatures for each group, and you can also choose whether you only
2197 want a single sequence (after which the temperature will be coupled to the
2198 last reference value), or if the annealing should be periodic and restart at
2199 the first reference point once the sequence is completed. You can mix and
2200 match both types of annealing and non-annealed groups in your simulation.
2201
2202 \newcommand{\vrond}{\stackrel{\circ}{\ve{r}}}
2203 \newcommand{\rond}{\stackrel{\circ}{r}}
2204 \newcommand{\ruis}{\ve{r}^G}
2205
2206 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2207 \section{Stochastic Dynamics\swapindexquiet{stochastic}{dynamics}}
2208 \label{sec:SD}
2209 Stochastic or velocity \swapindex{Langevin}{dynamics} adds a friction
2210 and a noise term to Newton's equations of motion, as
2211 \beq
2212 \label{SDeq}
2213 m_i {\de^2 \ve{r}_i \over \de t^2} =
2214 - m_i \gamma_i {\de \ve{r}_i \over \de t} + \ve{F}_i(\ve{r}) + \vrond_i,
2215 \eeq
2216 where $\gamma_i$ is the friction constant $[1/\mbox{ps}]$ and
2217 $\vrond_i\!\!(t)$  is a noise process with
2218 $\langle \rond_i\!\!(t) \rond_j\!\!(t+s) \rangle =
2219     2 m_i \gamma_i k_B T \delta(s) \delta_{ij}$.
2220 When $1/\gamma_i$ is large compared to the time scales present in the system,
2221 one could see stochastic dynamics as molecular dynamics with stochastic
2222 temperature-coupling. The advantage compared to MD with Berendsen
2223 temperature-coupling is that in case of SD the generated ensemble is known.
2224 For simulating a system in vacuum there is the additional advantage that there is no
2225 accumulation of errors for the overall translational and rotational
2226 degrees of freedom.
2227 When $1/\gamma_i$ is small compared to the time scales present in the system,
2228 the dynamics will be completely different from MD, but the sampling is
2229 still correct.
2230
2231 In {\gromacs} there are two algorithms to integrate equation (\ref{SDeq}):
2232 a simple and efficient one
2233 and a more complex leap-frog algorithm~\cite{Gunsteren88}, which is now deprecated.
2234 The accuracy of both integrators is equivalent to the normal MD leap-frog and
2235 Velocity Verlet integrator, except with constraints where the complex
2236 SD integrator samples at a temperature that is slightly too high (although that error is smaller than the one from the Velocity Verlet integrator that uses the kinetic energy from the full-step velocity). The simple integrator is nearly identical to the common way of discretizing the Langevin equation, but the friction and velocity term are applied in an impulse fashion~\cite{Goga2012}.
2237 The simple integrator is:
2238 \bea
2239 \label{eqn:sd_int1}
2240 \ve{v}'  &~=~&   \ve{v}(t-\hDt) + \frac{1}{m}\ve{F}(t)\Dt \\
2241 \Delta\ve{v}     &~=~&   -\alpha \, \ve{v}'(t+\hDt) + \sqrt{\frac{k_B T}{m}\alpha(2 - \alpha)} \, \ruis_i \\
2242 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\left(\ve{v}' +\frac{1}{2}\Delta \ve{v}\right)\Dt \label{eqn:sd1_x_upd}\\
2243 \ve{v}(t+\hDt)  &~=~&   \ve{v}' + \Delta \ve{v} \\
2244 \alpha &~=~& 1 - e^{-\gamma \Dt}
2245 \eea
2246 where $\ruis_i$ is Gaussian distributed noise with $\mu = 0$, $\sigma = 1$.
2247 The velocity is first updated a full time step without friction and noise to get $\ve{v}'$, identical to the normal update in leap-frog. The friction and noise are then applied as an impulse at step $t+\Dt$. The advantage of this scheme is that the velocity-dependent terms act at the full time step, which makes the correct integration of forces that depend on both coordinates and velocities, such as constraints and dissipative particle dynamics (DPD, not implemented yet), straightforward. With constraints, the coordinate update \eqnref{sd1_x_upd} is split into a normal leap-frog update and a $\Delta \ve{v}$. After both of these updates the constraints are applied to coordinates and velocities.
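
The noise amplitude in the impulse step is fixed by requiring that the
Maxwell-Boltzmann distribution is left unchanged. As a sketch of that
argument (the symbol $c$ is introduced here for illustration), write the
impulse as $\ve{v} = (1-\alpha)\ve{v}' + c\,\ruis_i$ and suppose that
each component of $\ve{v}'$ has variance $k_B T/m$. Since $\ruis_i$ is
independent of $\ve{v}'$ and has unit variance, stationarity requires
\beq
(1-\alpha)^2\, \frac{k_B T}{m} + c^2 = \frac{k_B T}{m}
\;\;\; \Rightarrow \;\;\;
c^2 = \frac{k_B T}{m}\left[1 - (1-\alpha)^2\right] = \frac{k_B T}{m}\,\alpha(2-\alpha).
\eeq
With $\gamma = 0.5$~ps$^{-1}$ and an assumed time step of 2~fs,
$\alpha = 1 - e^{-0.001} \approx 0.0010$, so the impulse removes about
0.1\% of the velocity each step and replaces it with noise of the
matching amplitude.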
2248
2249 In the deprecated complex algorithm, four Gaussian random numbers are required
2250 per integration step per degree of freedom, and with constraints the
2251 coordinates need to be constrained twice per integration step.
2252 Depending on the computational cost of the force calculation,
2253 this can take a significant part of the simulation time.
2254 Exact continuation of a stochastic dynamics simulation is not possible,
2255 because the state of the random number generator is not stored.
2256
2257 When using SD as a thermostat, an appropriate value for $\gamma$ is e.g. 0.5 ps$^{-1}$,
2258 since this results in a friction that is lower than the internal friction
2259 of water, while it still provides efficient thermostatting.
2260
2261
2262 \section{Brownian Dynamics\swapindexquiet{Brownian}{dynamics}}
2263 \label{sec:BD}
2264 In the limit of high friction, stochastic dynamics reduces to
2265 Brownian dynamics, also called position Langevin dynamics.
2266 This applies to over-damped systems,
2267 {\ie} systems in which the inertia effects are negligible.
2268 The equation is
2269 \beq
2270 {\de \ve{r}_i \over \de t} = \frac{1}{\gamma_i} \ve{F}_i(\ve{r}) + \vrond_i
2271 \eeq
2272 where $\gamma_i$ is the friction coefficient $[\mbox{amu/ps}]$ and
2273 $\vrond_i\!\!(t)$  is a noise process with
2274 $\langle \rond_i\!\!(t) \rond_j\!\!(t+s) \rangle =
2275     2 \delta(s) \delta_{ij} k_B T / \gamma_i$.
2276 In {\gromacs} the equations are integrated with a simple, explicit scheme
2277 \beq
2278 \ve{r}_i(t+\Delta t) = \ve{r}_i(t) +
2279         {\Delta t \over \gamma_i} \ve{F}_i(\ve{r}(t))
2280         + \sqrt{2 k_B T {\Delta t \over \gamma_i}}\, \ruis_i,
2281 \eeq
2282 where $\ruis_i$ is Gaussian distributed noise with $\mu = 0$, $\sigma = 1$.
2283 The friction coefficients $\gamma_i$ can be chosen the same for all
2284 particles or as $\gamma_i = m_i\,\gamma'_i$, where the friction constants
2285 $\gamma'_i$ can be different for different groups of atoms.
2286 Because the system is assumed to be over-damped, large timesteps
2287 can be used. LINCS should be used for the constraints since SHAKE
2288 will not converge for large atomic displacements.
2289 BD is an option of the {\tt mdrun} program.
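
A quick consistency check on the integration scheme, offered as a
sketch: for a free particle ($\ve{F}_i = 0$) each Cartesian component
moves per step by $\sqrt{2 k_B T \Delta t/\gamma_i}$ times a Gaussian
number of unit variance, so after $n$ independent steps the mean square
displacement of one component $x_i$ is
\beq
\left\langle \left[ x_i(n\Delta t) - x_i(0) \right]^2 \right\rangle = 2\, \frac{k_B T}{\gamma_i}\, n \Delta t = 2 D_i t,
\eeq
with $D_i = k_B T/\gamma_i$ and $t = n \Delta t$. This is the Einstein
relation for diffusion in one dimension, and it holds here for any
choice of $\Delta t$.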
2290 % } % Brace matches ifthenelse test for gmxlite
2291
2292 \section{Energy Minimization}
2293 \label{sec:EM}\index{energy minimization}%
2294 Energy minimization in {\gromacs} can be done using steepest descent,
2295 conjugate gradients, or l-bfgs (limited-memory
2296 Broyden-Fletcher-Goldfarb-Shanno quasi-Newtonian minimizer...we
2297 prefer the abbreviation). EM is just an option of the {\tt mdrun}
2298 program.
2299
2300 \subsection{Steepest Descent\index{steepest descent}}
2301 Although steepest descent is certainly not the most efficient
2302 algorithm for searching, it is robust and easy to implement.
2303
2304 We define the vector $\ve{r}$ as the vector of all $3N$ coordinates.
2305 Initially a maximum displacement $h_0$ ({\eg} 0.01 nm) must be given.
2306
2307 First the forces $\ve{F}$ and potential energy are calculated.
2308 New positions are calculated by
2309 \beq
2310 \ve{r}_{n+1} =  \ve{r}_n + \frac{\ve{F}_n}{\max (|\ve{F}_n|)} h_n,
2311 \eeq
2312 where $h_n$ is the maximum displacement and $\ve{F}_n$ is the force,
2313 or the negative gradient of the  potential $V$. The notation $\max
2314 (|\ve{F}_n|)$ means the largest of the absolute values of the force
components.  The forces and energy are again computed for the new positions. \\
2316 If ($V_{n+1} < V_n$) the new positions are accepted and $h_{n+1} = 1.2
2317 h_n$. \\
2318 If ($V_{n+1} \geq V_n$) the new positions are rejected and $h_n = 0.2 h_n$.
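
As an illustration of this step-size control (the sequence of outcomes is
hypothetical), starting from $h_0 = 0.01$ nm, two accepted steps followed by
one rejected step give
\beq
h_1 = 1.2\, h_0 = 0.012~\mbox{nm}, ~~~~
h_2 = 1.2\, h_1 = 0.0144~\mbox{nm}, ~~~~
h_2 \rightarrow 0.2\, h_2 = 0.00288~\mbox{nm},
\eeq
after which the step from $\ve{r}_2$ is retried with the reduced displacement.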
2319
2320 The algorithm stops when either a user-specified number of force
2321 evaluations has been performed ({\eg} 100), or when the maximum of the absolute
2322 values of the force (gradient) components is smaller than a specified
2323 value $\epsilon$.
2324 Since force truncation produces some noise in the
2325 energy evaluation, the stopping criterion should not be made too tight
2326 to avoid endless iterations. A reasonable value for $\epsilon$ can be
2327 estimated from the root mean square force $f$ a harmonic oscillator would exhibit at a
2328 temperature $T$. This value is
2329 \beq
2330   f = 2 \pi \nu \sqrt{ 2mkT},
2331 \eeq
2332 where $\nu$ is the oscillator frequency, $m$ the (reduced) mass, and
2333 $k$ Boltzmann's constant. For a weak oscillator with a wave number of
2334 100 cm$^{-1}$ and a mass of 10 atomic units, at a temperature of 1 K,
2335 $f=7.7$ kJ~mol$^{-1}$~nm$^{-1}$. A value for $\epsilon$ between 1 and
2336 10 is acceptable.
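
The quoted value can be reproduced in {\gromacs} units (nm, ps, amu,
kJ~mol$^{-1}$); the conversion constants below are standard values, rounded
for illustration. A wave number of 100 cm$^{-1}$ corresponds to
$\nu = 100~\mbox{cm}^{-1} \times 2.998 \times 10^{-2}~\mbox{cm~ps}^{-1}
\approx 3.0$ ps$^{-1}$, and $kT = 0.0083145$ kJ~mol$^{-1}$ at 1 K, so that
\beq
f = 2 \pi \times 3.0~\mbox{ps}^{-1} \times
    \sqrt{2 \times 10~\mbox{amu} \times 0.0083145~\mbox{kJ~mol}^{-1}}
  \approx 18.8 \times 0.408
  \approx 7.7~\mbox{kJ~mol}^{-1}~\mbox{nm}^{-1},
\eeq
using 1 kJ~mol$^{-1}$ = 1 amu~nm$^2$~ps$^{-2}$.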
2337
2338 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2339 \subsection{Conjugate Gradient\index{conjugate gradient}}
2340 Conjugate gradient is slower than steepest descent in the early stages
2341 of the minimization, but becomes more efficient closer to the energy
2342 minimum.  The parameters and stop criterion are the same as for
steepest descent.  In {\gromacs} conjugate gradient cannot be used
2344 with constraints, including the SETTLE algorithm for
2345 water~\cite{Miyamoto92}, as this has not been implemented. If water is
2346 present it must be of a flexible model, which can be specified in the
2347 {\tt *.mdp} file by {\tt define = -DFLEXIBLE}.
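
A minimal sketch of the corresponding {\tt *.mdp} settings is given below.
The tolerance and the step count are illustrative values, not
recommendations, and whether {\tt -DFLEXIBLE} has any effect depends on the
water topology that is included:
{\small
\begin{verbatim}
define     = -DFLEXIBLE   ; select the flexible water model
integrator = cg           ; conjugate gradient minimization
emtol      = 10.0         ; stop when the maximum force drops below this
nsteps     = 1000         ; upper limit on the number of minimization steps
\end{verbatim}}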
2348
2349 This is not really a restriction, since the accuracy of conjugate
2350 gradient is only required for minimization prior to a normal-mode
2351 analysis, which cannot be performed with constraints.  For most other
2352 purposes steepest descent is efficient enough.
2353 % } % Brace matches ifthenelse test for gmxlite
2354
2355 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2356 \subsection{\normindex{L-BFGS}}
2357 The original BFGS algorithm works by successively creating better
2358 approximations of the inverse Hessian matrix, and moving the system to
2359 the currently estimated minimum. The memory requirements for this are
2360 proportional to the square of the number of particles, so it is not
2361 practical for large systems like biomolecules. Instead, we use the
2362 L-BFGS algorithm of Nocedal~\cite{Byrd95a,Zhu97a}, which approximates
2363 the inverse Hessian by a fixed number of corrections from previous
2364 steps. This sliding-window technique is almost as efficient as the
original method, but the memory requirements are much lower:
proportional to the number of particles multiplied by the number of
correction steps. In practice we have found it to converge faster than conjugate
2368 gradients, but due to the correction steps it is not yet parallelized.
2369 It is also noteworthy that switched or shifted interactions usually
2370 improve the convergence, since sharp cut-offs mean the potential
2371 function at the current coordinates is slightly different from the
2372 previous steps used to build the inverse Hessian approximation.
2373 % } % Brace matches ifthenelse test for gmxlite
2374
2375 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2376 \section{Normal-Mode Analysis\index{normal-mode analysis}\index{NMA}}
2377 Normal-mode analysis~\cite{Levitt83,Go83,BBrooks83b}
2378 can be performed using {\gromacs}, by diagonalization of the mass-weighted
2379 \normindex{Hessian} $H$:
2380 \bea
2381 R^T M^{-1/2} H M^{-1/2} R   &=& \mbox{diag}(\lambda_1,\ldots,\lambda_{3N})
2382 \\
2383 \lambda_i &=& (2 \pi \omega_i)^2
2384 \eea
2385 where $M$ contains the atomic masses, $R$ is a matrix that contains
2386 the eigenvectors as columns, $\lambda_i$ are the eigenvalues
2387 and $\omega_i$ are the corresponding frequencies.
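
For orientation, a worked conversion under the assumption that $H$ is
expressed in kJ~mol$^{-1}$~nm$^{-2}$ and the masses in amu, so that
$\lambda_i$ has units of ps$^{-2}$: an eigenvalue $\lambda_i = 355$ ps$^{-2}$
(an arbitrary illustrative value) gives
\beq
\omega_i = \frac{\sqrt{\lambda_i}}{2\pi}
   \approx \frac{18.84~\mbox{ps}^{-1}}{2\pi} \approx 3.0~\mbox{ps}^{-1},
\eeq
which corresponds to a wave number of $\omega_i/c \approx 100$ cm$^{-1}$ with
$c = 2.998 \times 10^{-2}$ cm~ps$^{-1}$.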
2388
2389 First the Hessian matrix, which is a $3N \times 3N$ matrix where $N$
2390 is the number of atoms, needs to be calculated:
2391 \bea
2392 H_{ij}  &=&     \frac{\partial^2 V}{\partial x_i \partial x_j}
2393 \eea
2394 where $x_i$ and $x_j$ denote the atomic x, y or z coordinates.
2395 In practice, this equation is not used, but the Hessian is
2396 calculated numerically from the force as:
2397 \bea
2398 H_{ij} &=& -
2399   \frac{f_i({\bf x}+h{\bf e}_j) - f_i({\bf x}-h{\bf e}_j)}{2h}
2400 \\
2401 f_i     &=& - \frac{\partial V}{\partial x_i}
2402 \eea
2403 where ${\bf e}_j$ is the unit vector in direction $j$.
2404 It should be noted that
2405 for a usual normal-mode calculation, it is necessary to completely minimize
2406 the energy prior to computation of the Hessian.
2407 The tolerance required depends on the type of system,
2408 but a rough indication is 0.001 kJ mol$^{-1}$.
2409 Minimization should be done with conjugate gradients or L-BFGS in double precision.
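
The accuracy of the central difference can be seen from a Taylor expansion
of the force around ${\bf x}$ (a standard result, given here as a sketch):
\beq
- \frac{f_i({\bf x}+h{\bf e}_j) - f_i({\bf x}-h{\bf e}_j)}{2h}
  = \frac{\partial^2 V}{\partial x_i \partial x_j}
  + \frac{h^2}{6}\, \frac{\partial^4 V}{\partial x_i \partial x_j^3}
  + O(h^4).
\eeq
The truncation error thus decreases as $h^2$, whereas rounding errors in the
force difference are amplified by a factor $1/h$, which is consistent with
the recommendation to work in double precision.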
2410
2411 A number of {\gromacs} programs are involved in these
2412 calculations. First, the energy should be minimized using {\tt mdrun}.
2413 Then, {\tt mdrun} computes the Hessian.  {\bf Note} that for generating
2414 the run input file, one should use the minimized conformation from
2415 the full precision trajectory file, as the structure file is not
2416 accurate enough.
2417 {\tt \normindex{g_nmeig}} does the diagonalization and
2418 the sorting of the normal modes according to their frequencies.
2419 Both {\tt mdrun} and {\tt g_nmeig} should be run in double precision.
2420 The normal modes can be analyzed with the program {\tt g_anaeig}.
2421 Ensembles of structures at any temperature and for any subset of
2422 normal modes can be generated with {\tt \normindex{g_nmens}}.
2423 An overview of normal-mode analysis and the related principal component
2424 analysis (see \secref{covanal}) can be found in~\cite{Hayward95b}.
2425 % } % Brace matches ifthenelse test for gmxlite
2426
2427 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2428
2429 \section{Free energy calculations\index{free energy calculations}}
2430 \label{sec:fecalc}
2431 \subsection{Slow-growth methods\index{slow-growth methods}}
2432 Free energy calculations can be performed
2433 in {\gromacs} using  a number of methods, including ``slow-growth.'' An example problem
2434 might be calculating the difference in free energy of binding of an inhibitor {\bf I}
2435 to an enzyme {\bf E} and to a mutated enzyme {\bf E$^{\prime}$}. It
is not feasible with computer simulations to perform a docking
calculation for such a large complex, or even to release the inhibitor from
the enzyme, in a reasonable amount of computer time with reasonable accuracy.
2439 However, if we consider the free energy cycle in~\figref{free}A
2440 we can write:
2441 \beq
2442 \Delta G_1 - \Delta G_2 =       \Delta G_3 - \Delta G_4
2443 \label{eqn:ddg}
2444 \eeq
2445 If we are interested in the left-hand term we can equally well compute
2446 the right-hand term.
2447 \begin{figure}
2448 \centerline{\includegraphics[width=6cm,angle=270]{plots/free1}\hspace{2cm}\includegraphics[width=6cm,angle=270]{plots/free2}}
\caption[Free energy cycles.]{Free energy cycles. {\bf A:} to
calculate $\Delta G_{12}$, the difference in the free energy of
binding of inhibitor {\bf I} to enzymes {\bf E} and {\bf
E$^{\prime}$}, respectively. {\bf B:} to calculate $\Delta G_{12}$, the
difference in the free energy of binding of inhibitors {\bf I} and
{\bf I$^{\prime}$}, respectively, to enzyme {\bf E}.}
2455 \label{fig:free}
2456 \end{figure}
2457
2458 If we want to compute the difference in free energy of binding of two
2459 inhibitors {\bf I} and {\bf I$^{\prime}$} to an enzyme {\bf E} (\figref{free}B)
2460 we can again use \eqnref{ddg} to compute the desired property.
2461
2462 \newcommand{\sA}{^{\mathrm{A}}}
2463 \newcommand{\sB}{^{\mathrm{B}}}
2464 Free energy differences between two molecular species can
2465 be calculated in {\gromacs} using the ``slow-growth'' method.
2466 Such free energy differences between different molecular species are
2467 physically meaningless, but they can be used to obtain meaningful
2468 quantities employing a thermodynamic cycle.
2469 The method requires a simulation during which the Hamiltonian of the
2470 system changes slowly from that describing one system (A) to that
2471 describing the other system (B). The change must be so slow that the
2472 system remains in equilibrium during the process; if that requirement
2473 is fulfilled, the change is reversible and a slow-growth simulation from B to A
2474 will yield the same results (but with a different sign) as a slow-growth
2475 simulation from A to B. This is a useful check, but the user should be
2476 aware of the danger that equality of forward and backward growth results does
2477 not guarantee correctness of the results.
2478
2479 The required modification of the Hamiltonian $H$ is realized by making
2480 $H$ a function of a \textit{coupling parameter} $\lambda:
2481 H=H(p,q;\lambda)$ in such a way that $\lambda=0$ describes system A
2482 and $\lambda=1$ describes system B:
2483 \beq
2484   H(p,q;0)=H\sA (p,q);~~~~ H(p,q;1)=H\sB (p,q).
2485 \eeq
2486 In {\gromacs}, the functional form of the $\lambda$-dependence is
2487 different for the various force-field contributions and is described
2488 in section \secref{feia}.
2489
2490 The Helmholtz free energy $A$ is related to the
2491 partition function $Q$ of an $N,V,T$ ensemble, which is assumed to be
2492 the equilibrium ensemble generated by a MD simulation at constant
2493 volume and temperature. The generally more useful Gibbs free energy
2494 $G$ is related to the partition function $\Delta$ of an $N,p,T$
2495 ensemble, which is assumed to be the equilibrium ensemble generated by
2496 a MD simulation at constant pressure and temperature:
2497 \bea
2498  A(\lambda) &=&  -k_BT \ln Q \\
2499  Q &=& c \int\!\!\int \exp[-\beta H(p,q;\lambda)]\,dp\,dq \\
2500  G(\lambda) &=&  -k_BT \ln \Delta \\
2501  \Delta &=& c \int\!\!\int\!\!\int \exp[-\beta H(p,q;\lambda) -\beta
2502 pV]\,dp\,dq\,dV \\
2503 G &=& A + pV,
2504 \eea
2505 where $\beta = 1/(k_BT)$ and $c = (N! h^{3N})^{-1}$.
2506 These integrals over phase space cannot be evaluated from a
2507 simulation, but it is possible to evaluate the derivative with
2508 respect to $\lambda$ as an ensemble average:
2509 \beq
2510  \frac{dA}{d\lambda} =  \frac{\int\!\!\int (\partial H/ \partial
2511 \lambda) \exp[-\beta H(p,q;\lambda)]\,dp\,dq}{\int\!\!\int \exp[-\beta
2512 H(p,q;\lambda)]\,dp\,dq} =
2513 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{NVT;\lambda},
2514 \eeq
2515 with a similar relation for $dG/d\lambda$ in the $N,p,T$
2516 ensemble.  The difference in free energy between A and B can be found
2517 by integrating the derivative over $\lambda$:
2518 \bea
2519   A\sB(V,T)-A\sA(V,T) &=& \int_0^1 \left\langle \frac{\partial
2520 H}{\partial \lambda} \right\rangle_{NVT;\lambda} \,d\lambda
2521 \label{eq:delA} \\
2522  G\sB(p,T)-G\sA(p,T) &=& \int_0^1 \left\langle \frac{\partial
2523 H}{\partial \lambda} \right\rangle_{NpT;\lambda} \,d\lambda.
2524 \label{eq:delG}
2525 \eea
2526 If one wishes to evaluate $G\sB(p,T)-G\sA(p,T)$,
2527 the natural choice is a constant-pressure simulation. However, this
2528 quantity can also be obtained from a slow-growth simulation at
2529 constant volume, starting with system A at pressure $p$ and volume $V$
and ending with system B at pressure $p\sB$, by applying the following
2531 small (but, in principle, exact) correction:
2532 \beq
2533   G\sB(p)-G\sA(p) =
2534 A\sB(V)-A\sA(V) - \int_p^{p\sB}[V\sB(p')-V]\,dp'
2535 \eeq
Here we omitted the constant $T$ from the notation. This correction is
roughly equal to $-\frac{1}{2} (p\sB-p)\Delta V=-(\Delta V)^2/(2
\kappa V)$, where $\Delta V$ is the volume change at $p$ and $\kappa$
is the isothermal compressibility. This is usually
small; for example, the growth of a water molecule from nothing
in a bath of 1000 water molecules at constant volume would produce an
additional pressure of as much as 22 bar, but a correction to the
Helmholtz free energy of just $-20$ J mol$^{-1}$.
2544
2545 In Cartesian coordinates, the kinetic energy term in the Hamiltonian
2546 depends only on the momenta, and can be separately integrated and, in
2547 fact, removed from the equations. When masses do not change, there is
2548 no contribution from the kinetic energy at all; otherwise the
2549 integrated contribution to the free energy is $-\frac{3}{2} k_BT \ln
2550 (m\sB/m\sA)$. {\bf Note} that this is only true in the absence of constraints.
2551
2552 \subsection{Thermodynamic integration\index{thermodynamic integration}\index{BAR}\index{Bennett's acceptance ratio}}
2553 {\gromacs} offers the possibility to integrate eq.~\ref{eq:delA} or
eq.~\ref{eq:delG} in one simulation over the full range from A to
2555 B. However, if the change is large and insufficient sampling can be
2556 expected, the user may prefer to determine the value of $\langle
2557 dG/d\lambda \rangle$ accurately at a number of well-chosen
2558 intermediate values of $\lambda$. This can easily be done by setting
2559 the stepsize {\tt delta_lambda} to zero. Each simulation can be
2560 equilibrated first, and a proper error estimate can be made for each
2561 value of $dG/d\lambda$ from the fluctuation of $\partial H/\partial
2562 \lambda$. The total free energy change is then determined afterward
2563 by an appropriate numerical integration procedure.
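
One possible choice, given here only as a sketch, is the trapezoidal rule.
With simulations at $\lambda_0 = 0 < \lambda_1 < \ldots < \lambda_n = 1$
(the index $k$ and the number of points $n+1$ are notation introduced for
this example),
\beq
\Delta G \approx \sum_{k=0}^{n-1} \frac{\lambda_{k+1}-\lambda_k}{2}
\left( \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_k}
     + \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_{k+1}}
\right).
\eeq
If the simulations are statistically independent, the variance of $\Delta G$
follows by summing the variances of the individual averages, each weighted
by the square of its quadrature weight.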
2564
{\gromacs} also supports the use of Bennett's Acceptance Ratio~\cite{Bennett1976}
for calculating values of $\Delta G$ for transformations from state A to state B using
the program {\tt \normindex{g_bar}}. The same data can also be used to calculate free
energies using MBAR~\cite{Shirts2008}, though the analysis currently requires the
external {\tt pymbar} package, at https://SimTK.org/home/pymbar.
2570
2571 The $\lambda$-dependence for the force-field contributions is
2572 described in detail in section \secref{feia}.
2573 % } % Brace matches ifthenelse test for gmxlite
2574
2575 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2576 \section{Replica exchange\index{replica exchange}}
2577 Replica exchange molecular dynamics (\normindex{REMD})
2578 is a method that can be used to speed up
2579 the sampling of any type of simulation, especially if
2580 conformations are separated by relatively high energy barriers.
2581 It involves simulating multiple replicas of the same system
2582 at different temperatures and randomly exchanging the complete state
2583 of two replicas at regular intervals with the probability:
2584 \beq
2585 P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
2586 \left(\frac{1}{k_B T_1} - \frac{1}{k_B T_2}\right)(U_1 - U_2)
2587  \right] \right)
2588 \eeq
2589 where $T_1$ and $T_2$ are the reference temperatures and $U_1$ and $U_2$
2590 are the instantaneous potential energies of replicas 1 and 2 respectively.
2591 After exchange the velocities are scaled by $(T_1/T_2)^{\pm0.5}$
and a neighbor search is performed at the next step.
2593 This combines the fast sampling and frequent barrier-crossing
2594 of the highest temperature with correct Boltzmann sampling at
2595 all the different temperatures~\cite{Hukushima96a,Sugita99}.
2596 We only attempt exchanges for neighboring temperatures as the probability
2597 decreases very rapidly with the temperature difference.
2598 One should not attempt exchanges for all possible pairs in one step.
2599 If, for instance, replicas 1 and 2 would exchange, the chance of
2600 exchange for replicas 2 and 3 not only depends on the energies of
2601 replicas 2 and 3, but also on the energy of replica 1.
2602 In {\gromacs} this is solved by attempting exchange for all ``odd''
2603 pairs on ``odd'' attempts and for all ``even'' pairs on ``even'' attempts.
2604 If we have four replicas: 0, 1, 2 and 3, ordered in temperature
2605 and we attempt exchange every 1000 steps, pairs 0-1 and 2-3
2606 will be tried at steps 1000, 3000 etc. and pair 1-2 at steps 2000, 4000 etc.
2607
2608 How should one choose the temperatures?
2609 The energy difference can be written as:
2610 \beq
2611 U_1 - U_2 =  N_{df} \frac{c}{2} k_B (T_1 - T_2)
2612 \eeq
2613 where $N_{df}$ is the total number of degrees of freedom of one replica
2614 and $c$ is 1 for harmonic potentials and around 2 for protein/water systems.
2615 If $T_2 = (1+\epsilon) T_1$ the probability becomes:
2616 \beq
2617 P(1 \leftrightarrow 2)
2618   = \exp\left( -\frac{\epsilon^2 c\,N_{df}}{2 (1+\epsilon)} \right)
2619 \approx \exp\left(-\epsilon^2 \frac{c}{2} N_{df} \right)
2620 \eeq
2621 Thus for a probability of $e^{-2}\approx 0.135$
2622 one obtains $\epsilon \approx 2/\sqrt{c\,N_{df}}$.
2623 With all bonds constrained one has $N_{df} \approx 2\, N_{atoms}$
2624 and thus for $c$ = 2 one should choose $\epsilon$ as $1/\sqrt{N_{atoms}}$.
2625 However there is one problem when using pressure coupling. The density at
2626 higher temperatures will decrease, leading to higher energy~\cite{Seibert2005a},
2627 which should be taken into account. The {\gromacs} website features a
2628 so-called ``REMD calculator,'' that lets you type in the temperature range and
2629 the number of atoms, and based on that proposes a set of temperatures.
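
As a worked example of the estimate above (it ignores the pressure-coupling
effect just mentioned and assumes that $c$ and $N_{df}$ do not depend on
temperature), take $N_{atoms} = 10000$ with all bonds constrained and
$c = 2$, so that $\epsilon \approx 1/\sqrt{10000} = 0.01$. Applying
$T_{k+1} = (1+\epsilon)\, T_k$ to every neighboring pair gives a geometric
temperature ladder, and the number of intervals $n$ needed to span the range
from 300 K to 400 K is
\beq
n = \frac{\ln (400/300)}{\ln (1+\epsilon)}
  \approx \frac{0.2877}{0.00995} \approx 29,
\eeq
{\ie} about 30 replicas.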
2630
2631 An extension to the REMD for the isobaric-isothermal ensemble was
2632 proposed by Okabe {\em et al.}~\cite{Okabe2001a}. In this work the
2633 exchange probability is modified to:
2634 \beq
2635 P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
2636 \left(\frac{1}{k_B T_1} - \frac{1}{k_B T_2}\right)(U_1 - U_2) +
2637 \left(\frac{P_1}{k_B T_1} - \frac{P_2}{k_B T_2}\right)\left(V_1-V_2\right)
2638  \right] \right)
2639 \eeq
2640 where $P_1$ and $P_2$ are the respective reference pressures and $V_1$ and
2641 $V_2$ are the respective instantaneous volumes in the simulations.
2642 In most cases the differences in volume are so small that the second
2643 term is negligible. It only plays a role when the difference between
2644 $P_1$ and $P_2$ is large or in phase transitions.
2645
2646 Hamiltonian replica exchange is also supported in {\gromacs}.  In
2647 Hamiltonian replica exchange, each replica has a different
2648 Hamiltonian, defined by the free energy pathway specified for the simulation.  The
2649 exchange probability to maintain the correct ensemble probabilities is:
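\beq P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
    -\frac{1}{k_B T}\left((U_1(x_2) - U_1(x_1)) + (U_2(x_1) - U_2(x_2))\right)
\right]
\right)
\eeq
where $U_i(x_j)$ is the potential energy of the configuration of replica $j$
evaluated with the Hamiltonian of replica $i$, and both replicas are
simulated at the same temperature $T$.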
2655 The separate Hamiltonians are defined by the free energy functionality
2656 of {\gromacs}, with swaps made between the different values of
2657 $\lambda$ defined in the mdp file.
2658
Hamiltonian and temperature replica exchange can also be performed
simultaneously, using the acceptance criterion:
\beq
P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
-\left(\frac{U_1(x_2) - U_1(x_1)}{k_B T_1} + \frac{U_2(x_1) - U_2(x_2)}{k_B T_2}\right)
 \right] \right)
\eeq
2666
2667 Gibbs sampling replica exchange has also been implemented in
2668 {\gromacs}~\cite{Chodera2011}.  In Gibbs sampling replica exchange, all
2669 possible pairs are tested for exchange, allowing swaps between
2670 replicas that are not neighbors.
2671
2672 Gibbs sampling replica exchange requires no additional potential
2673 energy calculations.  However there is an additional communication
2674 cost in Gibbs sampling replica exchange, as for some permutations,
2675 more than one round of swaps must take place.  In some cases, this
2676 extra communication cost might affect the efficiency.
2677
2678 All replica exchange variants are options of the {\tt mdrun}
program. They will only work when MPI is installed, due to the inherent
2680 parallelism in the algorithm. For efficiency each replica can run on a
2681 separate rank.  See the manual page of {\tt mdrun} on how to use these
2682 multinode features.
2683
2684 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2685
2686 \section{Essential Dynamics sampling\index{essential dynamics}\index{principal component analysis}\seeindexquiet{PCA}{covariance analysis}}
2687 The results from Essential Dynamics (see \secref{covanal})
2688 of a protein can be used to guide MD simulations. The idea is that
2689 from an initial MD simulation (or from other sources) a definition of
2690 the collective fluctuations with largest amplitude is obtained. The
2691 position along one or more of these collective modes can be
2692 constrained in a (second) MD simulation in a number of ways for
2693 several purposes. For example, the position along a certain mode may
2694 be kept fixed to monitor the average force (free-energy gradient) on
2695 that coordinate in that position. Another application is to enhance
2696 sampling efficiency with respect to usual MD
2697 \cite{Degroot96a,Degroot96b}. In this case, the system is encouraged
2698 to sample its available configuration space more systematically than
2699 in a diffusion-like path that proteins usually take.
2700
2701 Another possibility to enhance sampling is \normindex{flooding}.
2702 Here a flooding potential is added to certain
2703 (collective) degrees of freedom to expel the system out
2704 of a region of phase space \cite{Lange2006a}.
2705
2706 The procedure for essential dynamics sampling or flooding is as follows.
2707 First, the eigenvectors and eigenvalues need to be determined
2708 using covariance analysis ({\tt g_covar})
2709 or normal-mode analysis ({\tt g_nmeig}).
2710 Then, this information is fed into {\tt make_edi},
2711 which has many options for selecting vectors and setting parameters,
2712 see {\tt gmx make_edi -h}.
2713 The generated {\tt edi} input file is then passed to {\tt mdrun}.
2714
2715 % } % Brace matches ifthenelse test for gmxlite
2716
2717 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2718 \section{\normindex{Expanded Ensemble}}
2719
2720 In an expanded ensemble simulation~\cite{Lyubartsev1992}, both the coordinates and the
2721 thermodynamic ensemble are treated as configuration variables that can
2722 be sampled over.  The probability of any given state can be written as:
2723 \beq
2724 P(\vec{x},k) \propto \exp\left(-\beta_k U_k + g_k\right),
2725 \eeq
2726 where $\beta_k = \frac{1}{k_B T_k}$ is the $\beta$ corresponding to the $k$th
2727 thermodynamic state, and $g_k$ is a user-specified weight factor corresponding
2728 to the $k$th state.  This space is therefore a {\em mixed}, {\em generalized}, or {\em
2729   expanded} ensemble which samples from multiple thermodynamic
2730 ensembles simultaneously. $g_k$ is chosen to give a specific weighting
2731 of each subensemble in the expanded ensemble, and can either be fixed,
2732 or determined by an iterative procedure. The set of $g_k$ is
2733 frequently chosen to give each thermodynamic ensemble equal
2734 probability, in which case $g_k$ is equal to the free energy in
2735 non-dimensional units, but they can be set to arbitrary values as
2736 desired.  Several different algorithms can be used to equilibrate
2737 these weights, described in the mdp option listings.
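
The connection between equal sampling and the free energy follows by
integrating out the coordinates. In this short derivation $Z_k = \int
\exp(-\beta_k U_k)\, d\vec{x}$ and $f_k$ are symbols introduced here:
\beq
P(k) \propto \int \exp\left(-\beta_k U_k + g_k\right) d\vec{x}
   = e^{g_k} Z_k = \exp\left(g_k - f_k\right),
~~~~ f_k = -\ln Z_k,
\eeq
so $P(k)$ is the same for all $k$ exactly when $g_k = f_k$ up to an additive
constant, where $f_k$ is the dimensionless free energy of state $k$.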
2738 % } % Brace matches ifthenelse test for gmxlite
2739
2740 In {\gromacs}, this space is sampled by alternating sampling in the $k$
2741 and $\vec{x}$ directions.  Sampling in the $\vec{x}$ direction is done
2742 by standard molecular dynamics sampling; sampling between the
different thermodynamic states is done by Monte Carlo, with several
2744 different Monte Carlo moves supported. The $k$ states can be defined
2745 by different temperatures, or choices of the free energy $\lambda$
2746 variable, or both.  Expanded ensemble simulations thus represent a
2747 serialization of the replica exchange formalism, allowing a single
2748 simulation to explore many thermodynamic states.
2749
2750
2751
2752 \section{Parallelization\index{parallelization}}
2753 The CPU time required for a simulation can be reduced by running the simulation
2754 in parallel over more than one core.
2755 Ideally, one would want to have linear scaling: running on $N$ cores
2756 makes the simulation $N$ times faster. In practice this can only be
achieved for a small number of cores. The scaling depends
strongly on the algorithms used. Also, different algorithms can have different
2759 restrictions on the interaction ranges between atoms.
2760
2761 \section{Domain decomposition\index{domain decomposition}}
2762 Since most interactions in molecular simulations are local,
2763 domain decomposition is a natural way to decompose the system.
2764 In domain decomposition, a spatial domain is assigned to each rank,
2765 which will then integrate the equations of motion for the particles
2766 that currently reside in its local domain. With domain decomposition,
2767 there are two choices that have to be made: the division of the unit cell
2768 into domains and the assignment of the forces to domains.
2769 Most molecular simulation packages use the half-shell method for assigning
2770 the forces. But there are two methods that always require less communication:
2771 the eighth shell~\cite{Liem1991} and the midpoint~\cite{Shaw2006} method.
2772 {\gromacs} currently uses the eighth shell method, but for certain systems
2773 or hardware architectures it might be advantageous to use the midpoint
2774 method. Therefore, we might implement the midpoint method in the future.
2775 Most of the details of the domain decomposition can be found
2776 in the {\gromacs} 4 paper~\cite{Hess2008b}.
2777
2778 \subsection{Coordinate and force communication}
2779 In the most general case of a triclinic unit cell,
the space is divided with a 1-, 2-, or 3-D grid into parallelepipeds
2781 that we call domain decomposition cells.
2782 Each cell is assigned to a particle-particle rank.
2783 The system is partitioned over the ranks at the beginning
2784 of each MD step in which neighbor searching is performed.
2785 Since the neighbor searching is based on charge groups, charge groups
2786 are also the units for the domain decomposition.
2787 Charge groups are assigned to the cell where their center of geometry resides.
2788 Before the forces can be calculated, the coordinates from some
2789 neighboring cells need to be communicated,
2790 and after the forces are calculated, the forces need to be communicated
2791 in the other direction.
2792 The communication and force assignment is based on zones that
2793 can cover one or multiple cells.
2794 An example of a zone setup is shown in \figref{ddcells}.
2795
2796 \begin{figure}
2797 \centerline{\includegraphics[width=6cm]{plots/dd-cells}}
2798 \caption{
2799 A non-staggered domain decomposition grid of 3$\times$2$\times$2 cells.
2800 Coordinates in zones 1 to 7 are communicated to the corner cell
2801 that has its home particles in zone 0.
2802 $r_c$ is the cut-off radius.
2803 \label{fig:ddcells}
2804 }
2805 \end{figure}
2806
2807 The coordinates are communicated by moving data along the ``negative''
2808 direction in $x$, $y$ or $z$ to the next neighbor. This can be done in one
2809 or multiple pulses. In \figref{ddcells} two pulses in $x$ are required,
2810 then one in $y$ and then one in $z$. The forces are communicated by
2811 reversing this procedure. See the {\gromacs} 4 paper~\cite{Hess2008b}
2812 for details on determining which non-bonded and bonded forces
2813 should be calculated on which rank.
2814
2815 \subsection{Dynamic load balancing\swapindexquiet{dynamic}{load balancing}}
2816 When different ranks have a different computational load
2817 (load imbalance), all ranks will have to wait for the one
2818 that takes the most time. One would like to avoid such a situation.
2819 Load imbalance can occur due to three reasons:
2820 \begin{itemize}
2821 \item inhomogeneous particle distribution
2822 \item inhomogeneous interaction cost distribution (charged/uncharged,
2823   water/non-water due to {\gromacs} water innerloops)
2824 \item statistical fluctuation (only with small particle numbers)
2825 \end{itemize}
2826 So we need a dynamic load balancing algorithm
2827 where the volume of each domain decomposition cell
2828 can be adjusted {\em independently}.
2829 To achieve this, the 2- or 3-D domain decomposition grids need to be
2830 staggered. \figref{ddtric} shows the most general case in 2-D.
2831 Due to the staggering, one might require two distance checks
2832 for deciding if a charge group needs to be communicated:
2833 a non-bonded distance and a bonded distance check.
2834
2835 \begin{figure}
2836 \centerline{\includegraphics[width=7cm]{plots/dd-tric}}
2837 \caption{
2838 The zones to communicate to the rank of zone 0,
2839 see the text for details. $r_c$ and $r_b$ are the non-bonded
2840 and bonded cut-off radii respectively, $d$ is an example
of a distance between consecutive, staggered boundaries of cells.
2842 \label{fig:ddtric}
2843 }
2844 \end{figure}
2845
2846 By default, {\tt mdrun} automatically turns on the dynamic load
2847 balancing during a simulation when the total performance loss
2848 due to the force calculation imbalance is 5\% or more.
2849 {\bf Note} that the reported force load imbalance numbers might be higher,
since the force calculation is only part of the work that needs to be done
2851 during an integration step.
2852 The load imbalance is reported in the log file at log output steps
2853 and when the {\tt -v} option is used also on screen.
2854 The average load imbalance and the total performance loss
2855 due to load imbalance are reported at the end of the log file.
2856
2857 There is one important parameter for the dynamic load balancing,
2858 which is the minimum allowed scaling. By default, each dimension
2859 of the domain decomposition cell can scale down by at least
2860 a factor of 0.8. For 3-D domain decomposition this allows cells
2861 to change their volume by about a factor of 0.5, which should allow
2862 for compensation of a load imbalance of 100\%.
2863 The minimum allowed scaling can be changed with the {\tt -dds}
2864 option of {\tt mdrun}.
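
The quoted volume factor follows directly from the per-dimension limit:
\beq
\frac{V_{\min}}{V_0} = 0.8^3 = 0.512 \approx 0.5,
\eeq
where $V_0$ is the unscaled cell volume and $V_{\min}$ the smallest volume
allowed (symbols introduced here for illustration). Under the simplifying
assumption that the cost of a cell is proportional to its volume, a cell
carrying twice the average load can thus be brought back to roughly the
average load.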
2865
2866 \subsection{Constraints in parallel\index{constraints}}
2867 \label{subsec:plincs}
2868 Since with domain decomposition parts of molecules can reside
2869 on different ranks, bond constraints can cross cell boundaries.
2870 Therefore a parallel constraint algorithm is required.
2871 {\gromacs} uses the \normindex{P-LINCS} algorithm~\cite{Hess2008a},
2872 which is the parallel version of the \normindex{LINCS} algorithm~\cite{Hess97}
2873 % \ifthenelse{\equal{\gmxlite}{1}}
2874 {.}
2875 {(see \ssecref{lincs}).}
2876 The P-LINCS procedure is illustrated in \figref{plincs}.
2877 When molecules cross the cell boundaries, atoms in such molecules
2878 up to ({\tt lincs_order + 1}) bonds away are communicated over the cell boundaries.
2879 Then, the normal LINCS algorithm can be applied to the local bonds
2880 plus the communicated ones. After this procedure, the local bonds
2881 are correctly constrained, even though the extra communicated ones are not.
2882 One coordinate communication step is required for the initial LINCS step
2883 and one for each iteration. Forces do not need to be communicated.
2884
2885 \begin{figure}
2886 \centerline{\includegraphics[width=6cm]{plots/par-lincs2}}
2887 \caption{
2888 Example of the parallel setup of P-LINCS with one molecule
2889 split over three domain decomposition cells, using a matrix
2890 expansion order of 3.
2891 The top part shows which atom coordinates need to be communicated
2892 to which cells. The bottom parts show the local constraints (solid)
2893 and the non-local constraints (dashed) for each of the three cells.
2894 \label{fig:plincs}
2895 }
2896 \end{figure}
2897
2898 \subsection{Interaction ranges}
2899 Domain decomposition takes advantage of the locality of interactions.
2900 This means that there will be limitations on the range of interactions.
2901 By default, {\tt mdrun} tries to find the optimal balance between
interaction range and efficiency. However, a simulation can still
stop with an error message about missing interactions,
or it might run slightly faster with shorter
interaction ranges. A list of interaction ranges
2906 and their default values is given in \tabref{dd_ranges}.
2907
2908 \begin{table}
2909 \centerline{
2910 \begin{tabular}{|c|c|ll|}
2911 \dline
2912 interaction & range & option & default \\
2913 \dline
2914 non-bonded        & $r_c$ = max($r_{\mathrm{list}}$,$r_{\mathrm{VdW}}$,$r_{\mathrm{Coul}}$) & {\tt mdp} file & \\
2915 two-body bonded   & max($r_{\mathrm{mb}}$,$r_c$) & {\tt mdrun -rdd} & starting conf. + 10\% \\
2916 multi-body bonded & $r_{\mathrm{mb}}$ & {\tt mdrun -rdd} & starting conf. + 10\% \\
2917 constraints       & $r_{\mathrm{con}}$ & {\tt mdrun -rcon} & est. from bond lengths \\
2918 virtual sites     & $r_{\mathrm{con}}$ & {\tt mdrun -rcon} & 0 \\
2919 \dline
2920 \end{tabular}
2921 }
2922 \caption{The interaction ranges with domain decomposition.}
2923 \label{tab:dd_ranges}
2924 \end{table}
2925
2926 In most cases the defaults of {\tt mdrun} should not cause the simulation
to stop with an error message about missing interactions.
The range for the bonded interactions is determined from the distance
between bonded charge groups in the starting configuration, with 10\% added
2930 for headroom. For the constraints, the value of $r_{\mathrm{con}}$ is determined by
2931 taking the maximum distance that ({\tt lincs_order + 1}) bonds can cover
2932 when they all connect at angles of 120 degrees.
2933 The actual constraint communication is not limited by $r_{\mathrm{con}}$,
2934 but by the minimum cell size $L_C$, which has the following lower limit:
2935 \beq
2936 L_C \geq \max(r_{\mathrm{mb}},r_{\mathrm{con}})
2937 \eeq
2938 Without dynamic load balancing the system is actually allowed to scale
2939 beyond this limit when pressure scaling is used.
{\bf Note} that for triclinic boxes, $L_C$ is not simply the box diagonal
component divided by the number of cells in that direction;
rather, it is the shortest distance between the triclinic cell borders.
2943 For rhombic dodecahedra this is a factor of $\sqrt{3/2}$ shorter
2944 along $x$ and $y$.
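
As a rough illustration of the estimate for $r_{\mathrm{con}}$, and not
necessarily the exact expression used by {\tt mdrun}, assume $n$ consecutive
bonds of equal length $b$ in a planar zig-zag with all angles at 120 degrees
(the equal bond length is a simplification, and both symbols are introduced
here only for this example). Each bond then advances $b \cos 30^\circ$ along
the chain axis, and for odd $n$ the end points are additionally offset by
$b/2$ perpendicular to the axis, so that the end-to-end distance is
\beq
d_n = \left\{ \begin{array}{ll}
\frac{\sqrt{3}}{2}\, n\, b & \mbox{for even } n \\[1ex]
\frac{1}{2} \sqrt{3 n^2 + 1}\; b & \mbox{for odd } n
\end{array} \right.
\eeq
For example, $n = 4$ gives $d_4 = 2 \sqrt{3}\, b \approx 3.46\, b$.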
2945
2946 When $r_{\mathrm{mb}} > r_c$, {\tt mdrun} employs a smart algorithm to reduce
2947 the communication. Simply communicating all charge groups within
2948 $r_{\mathrm{mb}}$ would increase the amount of communication enormously.
Therefore only charge groups that are connected by bonded interactions
2950 to charge groups which are not locally present are communicated.
2951 This leads to little extra communication, but also to a slightly
2952 increased cost for the domain decomposition setup.
2953 In some cases, {\eg} coarse-grained simulations with a very short cut-off,
2954 one might want to set $r_{\mathrm{mb}}$ by hand to reduce this cost.
2955
2956 \subsection{Multiple-Program, Multiple-Data PME parallelization\index{PME}}
2957 \label{subsec:mpmd_pme}
Electrostatic interactions are long-range; therefore special
2959 algorithms are used to avoid summation over many atom pairs.
2960 In {\gromacs} this is usually
2961 % \ifthenelse{\equal{\gmxlite}{1}}
2962 {.}
2963 {PME (\secref{pme}).}
2964 Since with PME all particles interact with each other, global communication
2965 is required. This will usually be the limiting factor for
2966 scaling with domain decomposition.
2967 To reduce the effect of this problem, we have come up with
2968 a Multiple-Program, Multiple-Data approach~\cite{Hess2008b}.
2969 Here, some ranks are selected to do only the PME mesh calculation,
2970 while the other ranks, called particle-particle (PP) ranks,
2971 do all the rest of the work.
For rectangular boxes the optimal PP to PME rank ratio is usually 3:1;
for rhombic dodecahedra it is usually 2:1.
2974 When the number of PME ranks is reduced by a factor of 4, the number
2975 of communication calls is reduced by about a factor of 16.
2976 Or put differently, we can now scale to 4 times more ranks.
In addition, for modern 4- or 8-core machines in a network,
2978 the effective network bandwidth for PME is quadrupled,
2979 since only a quarter of the cores will be using the network connection
2980 on each machine during the PME calculations.
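
The factor of 16 follows if one assumes, as a simplified model, that the
global PME communication involves every PME rank exchanging data with every
other PME rank, so that the number of communication calls
$N_{\mathrm{calls}}$ grows with the square of the number of PME ranks
$M_{\mathrm{PME}}$ (both symbols are introduced only for this illustration):
\beq
\frac{N_{\mathrm{calls}}(M_{\mathrm{PME}}/4)}{N_{\mathrm{calls}}(M_{\mathrm{PME}})}
\approx \frac{(M_{\mathrm{PME}}/4)^2}{M_{\mathrm{PME}}^2} = \frac{1}{16}
\eeq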
2981
2982 \begin{figure}
2983 \centerline{\includegraphics[width=12cm]{plots/mpmd-pme}}
2984 \caption{
2985 Example of 8 ranks without (left) and with (right) MPMD.
2986 The PME communication (red arrows) is much higher on the left
2987 than on the right. For MPMD additional PP - PME coordinate
2988 and force communication (blue arrows) is required,
2989 but the total communication complexity is lower.
2990 \label{fig:mpmd_pme}
2991 }
2992 \end{figure}
2993
2994 {\tt mdrun} will by default interleave the PP and PME ranks.
If the ranks are not numbered consecutively inside the machines,
2996 one might want to use {\tt mdrun -ddorder pp_pme}.
2997 For machines with a real 3-D torus and proper communication software
2998 that assigns the ranks accordingly one should use
2999 {\tt mdrun -ddorder cartesian}.
3000
3001 To optimize the performance one should usually set up the cut-offs
3002 and the PME grid such that the PME load is 25 to 33\% of the total
3003 calculation load. {\tt grompp} will print an estimate for this load
3004 at the end and also {\tt mdrun} calculates the same estimate
3005 to determine the optimal number of PME ranks to use.
3006 For high parallelization it might be worthwhile to optimize
3007 the PME load with the {\tt mdp} settings and/or the number
3008 of PME ranks with the {\tt -npme} option of {\tt mdrun}.
For changing the electrostatics settings it is useful to know
that the accuracy of the electrostatics remains nearly constant
3011 when the Coulomb cut-off and the PME grid spacing are scaled
3012 by the same factor.
3013 {\bf Note} that it is usually better to overestimate than to underestimate
3014 the number of PME ranks, since the number of PME ranks is smaller
3015 than the number of PP ranks, which leads to less total waiting time.
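
The load fractions quoted above are consistent with the PP to PME rank
ratios given earlier in this section, under the simplifying assumption that
the fraction of ranks dedicated to PME should match the fraction $f$ of the
total calculation load that PME represents ($f$, $M_{\mathrm{PP}}$ and
$M_{\mathrm{PME}}$ are symbols introduced only for this illustration):
\beq
\frac{M_{\mathrm{PP}}}{M_{\mathrm{PME}}} \approx \frac{1-f}{f}, \qquad
f = \frac{1}{4} \Rightarrow 3:1, \qquad
f = \frac{1}{3} \Rightarrow 2:1
\eeq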
3016
3017 The PME domain decomposition can be 1-D or 2-D along the $x$ and/or
3018 $y$ axis. 2-D decomposition is also known as \normindex{pencil decomposition} because of
3019 the shape of the domains at high parallelization.
3020 1-D decomposition along the $y$ axis can only be used when
3021 the PP decomposition has only 1 domain along $x$. 2-D PME decomposition
requires the number of domains along $x$ to be equal to that of
3023 the PP decomposition. {\tt mdrun} automatically chooses 1-D or 2-D
3024 PME decomposition (when possible with the total given number of ranks),
3025 based on the minimum amount of communication for the coordinate redistribution
3026 in PME plus the communication for the grid overlap and transposes.
3027 To avoid superfluous communication of coordinates and forces
3028 between the PP and PME ranks, the number of DD cells in the $x$
direction should ideally be the same as or a multiple of the number
3030 of PME ranks. By default, {\tt mdrun} takes care of this issue.
3031
3032 \subsection{Domain decomposition flow chart}
3033 In \figref{dd_flow} a flow chart is shown for domain decomposition
3034 with all possible communication for different algorithms.
3035 For simpler simulations, the same flow chart applies,
without the algorithms that are not used
and their associated communication.
3038
3039 \begin{figure}
3040 \centerline{\includegraphics[width=12cm]{plots/flowchart}}
3041 \caption{
3042 Flow chart showing the algorithms and communication (arrows)
3043 for a standard MD simulation with virtual sites, constraints
3044 and separate PME-mesh ranks.
3045 \label{fig:dd_flow}
3046 }
3047 \end{figure}
3048
3049
3050 \section{Implicit solvation\index{implicit solvation}\index{Generalized Born methods}}
3051 \label{sec:gbsa}
Implicit solvent models provide an efficient way of representing
the electrostatic effects of solvent molecules, while avoiding a
large part of the computation involved in an accurate, explicit
description of the surrounding water in molecular dynamics simulations.
3056 Implicit solvation models offer several advantages compared with
3057 explicit solvation, including eliminating the need for the equilibration of water
3058 around the solute, and the absence of viscosity, which allows the protein
3059 to more quickly explore conformational space.
3060
3061 Implicit solvent calculations in {\gromacs} can be done using the
generalized Born formalism, and the Still~\cite{Still97}, HCT~\cite{Truhlar96},
3063 and OBC~\cite{Case04} models are available for calculating the Born radii.
3064
3065 Here, the free energy $G_{\mathrm{solv}}$ of solvation is the sum of three terms,
3066 a solvent-solvent cavity term ($G_{\mathrm{cav}}$), a solute-solvent van der
3067 Waals term ($G_{\mathrm{vdw}}$), and finally a solvent-solute electrostatics
3068 polarization term ($G_{\mathrm{pol}}$).
3069
3070 The sum of $G_{\mathrm{cav}}$ and $G_{\mathrm{vdw}}$ corresponds to the (non-polar)
3071 free energy of solvation for a molecule from which all charges
3072 have been removed, and is commonly called $G_{\mathrm{np}}$,
3073 calculated from the total solvent accessible surface area
3074 multiplied with a surface tension.
3075 The total expression for the solvation free energy then becomes:
3076
3077 \beq
3078 G_{\mathrm{solv}} = G_{\mathrm{np}}  + G_{\mathrm{pol}}
3079 \label{eqn:gb_solv}
3080 \eeq
3081
3082 Under the generalized Born model, $G_{\mathrm{pol}}$ is calculated from the generalized Born equation~\cite{Still97}:
3083
3084 \beq
3085 G_{\mathrm{pol}} = \left(1-\frac{1}{\epsilon}\right) \sum_{i=1}^n \sum_{j>i}^n \frac {q_i q_j}{\sqrt{r^2_{ij} + b_i b_j \exp\left(\frac{-r^2_{ij}}{4 b_i b_j}\right)}}
3086 \label{eqn:gb_still}
3087 \eeq
3088
3089 In {\gromacs}, we have introduced the substitution~\cite{Larsson10}:
3090
3091 \beq
3092 c_i=\frac{1}{\sqrt{b_i}}
3093 \label{eqn:gb_subst}
3094 \eeq
3095
3096 which makes it possible to introduce a cheap transformation to a new
3097 variable $x$ when evaluating each interaction, such that:
3098
3099 \beq
3100 x=\frac{r_{ij}}{\sqrt{b_i b_j }} = r_{ij} c_i c_j
3101 \label{eqn:gb_subst2}
3102 \eeq
3103
In the end, the full re-formulation of Equation~\ref{eqn:gb_still} becomes:
3105
3106 \beq
3107 G_{\mathrm{pol}} = \left(1-\frac{1}{\epsilon}\right) \sum_{i=1}^n \sum_{j>i}^n \frac{q_i q_j}{\sqrt{b_i  b_j}} ~\xi (x) = \left(1-\frac{1}{\epsilon}\right) \sum_{i=1}^n q_i c_i \sum_{j>i}^n q_j c_j~\xi (x)
3108 \label{eqn:gb_final}
3109 \eeq
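
The function $\xi(x)$ is not written out above. Substituting
$r_{ij} = x \sqrt{b_i b_j}$ from Equation~\ref{eqn:gb_subst2} into the
denominator of Equation~\ref{eqn:gb_still} suggests the following form,
given here as a derivation from the equations in this section rather than
as a statement about the implementation:
\beq
\sqrt{r^2_{ij} + b_i b_j \exp\left(\frac{-r^2_{ij}}{4 b_i b_j}\right)}
= \sqrt{b_i b_j}\, \sqrt{x^2 + \exp\left(-\frac{x^2}{4}\right)}
\quad \Rightarrow \quad
\xi(x) = \frac{1}{\sqrt{x^2 + \exp\left(-\frac{x^2}{4}\right)}}
\eeq
Since $\xi$ depends only on the single variable $x$, it can in principle be
evaluated or tabulated once and reused for all atom pairs.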
3110
3111 The non-polar part ($G_{\mathrm{np}}$) of Equation~\ref{eqn:gb_solv} is calculated
directly from the Born radius of each atom using a simple ACE-type
3113 approximation by Schaefer {\em et al.}~\cite{Karplus98}, including a
3114 simple loop over all atoms.
3115 This requires only one extra solvation parameter, independent of atom type,
3116 but differing slightly between the three Born radii models.
3117
3118 % LocalWords:  GROningen MAchine BIOSON Groningen GROMACS Berendsen der Spoel
3119 % LocalWords:  Drunen Comp Phys Comm ROck NS FFT pbc EM ifthenelse gmxlite ff
3120 % LocalWords:  octahedra triclinic Ewald PME PPPM trjconv xy solvated
3121 % LocalWords:  boxtypes boxshapes editconf Lennard COM XTC TNG kT defunits
3122 % LocalWords:  Boltzmann's Mueller nb int mdrun chargegroup simplerc prefactor
3123 % LocalWords:  pme waterloops CH NH CO df com virial integrator Verlet vverlet
3124 % LocalWords:  integrators ref timepoint timestep timesteps mdp md vv avek NVE
3125 % LocalWords:  NVT off's leapfrogv lll LR rmfast SPC fs Nos physicality ps GMX
3126 % LocalWords:  Tcoupling nonergodic thermostatting NOSEHOOVER algorithmes ij yx
3127 % LocalWords:  Parrinello Rahman rescales atm anisotropically ccc xz zx yy yz
3128 % LocalWords:  zy zz se barostat compressibilities MTTK NPT Martyna al isobaric
3129 % LocalWords:  Tuckerman vir PV fkT iLt iL Liouville NHC Eq baro mu trj mol bc
3130 % LocalWords:  freezegroup Shannon's polarizability Overhauser barostats iLn KE
3131 % LocalWords:  negligibly thermostatted Tobias  rhombic maxwell et xtc tng TC rlist
3132 % LocalWords:  waals LINCS holonomic plincs lincs unc ang SA Langevin SD amu BD
3133 % LocalWords:  bfgs Broyden Goldfarb Shanno mkT kJ DFLEXIBLE Nocedal diag nmeig
3134 % LocalWords:  diagonalization anaeig nmens covanal ddg feia BT dp dq pV dV dA
3135 % LocalWords:  NpT eq stepsize REMD constrainted website Okabe MPI covar edi dd
3136 % LocalWords:  progman NMR ddcells innerloops ddtric tric dds rdd conf rcon est
3137 % LocalWords:  mb PP MPMD ddorder pp cartesian grompp npme parallelizable edr
3138 % LocalWords:  macromolecule nstlist vacuo parallelization dof indices MBAR AVX
3139 % LocalWords:  TOL numerics parallelized eigenvectors dG parallelepipeds VdW np
3140 % LocalWords:  Coul multi solvation HCT OBC solv cav vdw Schaefer symplectic dt
3141 % LocalWords:  pymbar multinode subensemble Monte solute subst groupconcept GPU
3142 % LocalWords:  dodecahedron octahedron dodecahedra equilibration usinggroups nm
3143 % LocalWords:  topologies rlistlong CUDA GPUs rcoulomb SIMD BlueGene FPUs erfc
3144 % LocalWords:  cutoffschemesupport unbuffered bondeds AdResS OpenMP ewald rtol
3145 % LocalWords:  verletdrift peptide RMS rescaling ergodicity ergodic discretized
3146 % LocalWords:  isothermal compressibility isotropically anisotropic iteratively
3147 % LocalWords:  incompressible integrations translational biomolecules NMA PCA
3148 % LocalWords:  Bennett's equilibrated Hamiltonians covariance equilibrate
3149 % LocalWords:  inhomogeneous conformational online other's th