   1 % This file is part of the GROMACS molecular simulation package.
   2 %
   3 % Copyright (c) 2013, by the GROMACS development team, led by
   4 % David van der Spoel, Berk Hess, Erik Lindahl, and including many
   5 % others, as listed in the AUTHORS file in the top-level source
   6 % directory and at http://www.gromacs.org.
   7 %
   8 % GROMACS is free software; you can redistribute it and/or
   9 % modify it under the terms of the GNU Lesser General Public License
  10 % as published by the Free Software Foundation; either version 2.1
  11 % of the License, or (at your option) any later version.
  12 %
  13 % GROMACS is distributed in the hope that it will be useful,
  14 % but WITHOUT ANY WARRANTY; without even the implied warranty of
  15 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  16 % Lesser General Public License for more details.
  17 %
  18 % You should have received a copy of the GNU Lesser General Public
  19 % License along with GROMACS; if not, see
  20 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
  21 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
  22 %
  23 % If you want to redistribute modifications to GROMACS, please
  24 % consider that scientific software is very special. Version
  25 % control is crucial - bugs must be traceable. We will be happy to
  26 % consider code for inclusion in the official distribution, but
  27 % derived work must not be called official GROMACS. Details are found
  28 % in the README & COPYING files - if they are missing, get the
  29 % official version at http://www.gromacs.org.
  30 %
  31 % To help us fund GROMACS development, we humbly ask that you cite
  32 % the research papers on the package. Check out http://www.gromacs.org
  33
  34 \newcommand{\nproc}{\mbox{$M$}}
  35 \newcommand{\natom}{\mbox{$N$}}
  36 \newcommand{\nx}{\mbox{$n_x$}}
  37 \newcommand{\ny}{\mbox{$n_y$}}
  38 \newcommand{\nz}{\mbox{$n_z$}}
  39 \newcommand{\nsgrid}{NS grid}
  40 \newcommand{\fftgrid}{FFT grid}
  41 \newcommand{\dgrid}{\mbox{$\delta_{grid}$}}
  42 \newcommand{\bfv}[1]{{\mbox{\boldmath{$#1$}}}}
  43 % non-italicized boldface for math (e.g. matrices)
  44 \newcommand{\bfm}[1]{{\bf #1}}
  45 \newcommand{\dt}{\Delta t}
  46 \newcommand{\rv}{\bfv{r}}
  47 \newcommand{\vv}{\bfv{v}}
  48 \newcommand{\F}{\bfv{F}}
  49 \newcommand{\pb}{\bfv{p}}
  50 \newcommand{\veps}{v_{\epsilon}}
  51 \newcommand{\peps}{p_{\epsilon}}
  52 \newcommand{\sinhx}[1]{\frac{\sinh{\left( #1\right)}}{#1}}
  53 \chapter{Algorithms}
  54 \label{ch:algorithms}
  55 \section{Introduction}
In this chapter we first describe some general concepts used in
  57 {\gromacs}:  {\em periodic boundary conditions} (\secref{pbc})
  58 and the {\em group concept} (\secref{groupconcept}). The MD algorithm is
  59 described in \secref{MD}: first a global form of the algorithm is
  60 given, which is refined in subsequent subsections. The (simple) EM
  61 (Energy Minimization) algorithm is described in \secref{EM}. Some
  62 other algorithms for special purpose dynamics are described after
  63 this.
  64
  65 %\ifthenelse{\equal{\gmxlite}{1}}{}{
  66 %In the final \secref{par} of this chapter a few principles are
  67 %given on which parallelization of {\gromacs} is based. The
  68 %parallelization is hardly visible for the user and is therefore not
  69 %treated in detail.
  70 %} % Brace matches ifthenelse test for gmxlite
  71
  72 A few issues are of general interest. In all cases the {\em system}
  73 must be defined, consisting of molecules. Molecules again consist of
  74 particles  with defined interaction functions. The detailed
  75 description of the {\em topology} of the molecules and of the {\em force
  76 field} and the calculation of forces is given in
\chref{ff}. In the present chapter we describe
other aspects of the algorithm, such as pair list generation, update of
velocities and positions, coupling to external temperature and
pressure, and conservation of constraints.
  81 \ifthenelse{\equal{\gmxlite}{1}}{}{
  82 The {\em analysis} of the data generated by an MD simulation is treated in \chref{analysis}.
  83 } % Brace matches ifthenelse test for gmxlite
  84
  85 \section{Periodic boundary conditions\index{periodic boundary conditions}}
  86 \label{sec:pbc}
  87 \begin{figure}
  88 \centerline{\includegraphics[width=9cm]{plots/pbctric}}
  89 \caption {Periodic boundary conditions in two dimensions.}
  90 \label{fig:pbc}
  91 \end{figure}
  92 The classical way to minimize edge effects in a finite system is to
  93 apply {\em periodic boundary conditions}. The atoms of the system to
  94 be simulated are put into a space-filling box, which is surrounded by
  95 translated copies of itself (\figref{pbc}).  Thus there are no
  96 boundaries of the system; the artifact caused by unwanted boundaries
  97 in an isolated cluster is now replaced by the artifact of periodic
  98 conditions. If the system is crystalline, such boundary conditions are
  99 desired (although motions are naturally restricted to periodic motions
 100 with wavelengths fitting into the box). If one wishes to simulate
 101 non-periodic systems, such as liquids or solutions, the periodicity by
 102 itself causes errors. The errors can be evaluated by comparing various
 103 system sizes; they are expected to be less severe than the errors
 104 resulting from an unnatural boundary with vacuum.
 105
 106 There are several possible shapes for space-filling unit cells. Some,
 107 like the {\em \normindex{rhombic dodecahedron}} and the
 108 {\em \normindex{truncated octahedron}}~\cite{Adams79} are closer to being a sphere
 109 than a cube is, and are therefore better suited to the
 110 study of an approximately spherical macromolecule in solution, since
 111 fewer solvent molecules are required to fill the box given a minimum
 112 distance between macromolecular images. At the same time, rhombic
 113 dodecahedra and truncated octahedra are special cases of {\em triclinic}
unit cells\index{triclinic unit cell}, the most general space-filling unit cells,
which comprise all possible space-filling shapes~\cite{Bekker95}.
 116 For this reason, {\gromacs} is based on the triclinic unit cell.
 117
 118 {\gromacs} uses periodic boundary conditions, combined with the {\em
 119 \normindex{minimum image convention}}: only one -- the nearest -- image of each
 120 particle is considered for short-range non-bonded interaction terms.
 121 For long-range electrostatic interactions this is not always accurate
 122 enough, and {\gromacs} therefore also incorporates lattice sum methods
 123 such as Ewald Sum, PME and PPPM.
 124
 125 {\gromacs} supports triclinic boxes of any shape.
 126 The simulation box (unit cell) is defined by the 3 box vectors
 127 ${\bf a}$,${\bf b}$ and ${\bf c}$.
 128 The box vectors must satisfy the following conditions:
 129 \beq
 130 \label{eqn:box_rot}
 131 a_y = a_z = b_z = 0
 132 \eeq
 133 \beq
 134 \label{eqn:box_shift1}
 135 a_x>0,~~~~b_y>0,~~~~c_z>0
 136 \eeq
 137 \beq
 138 \label{eqn:box_shift2}
 139 |b_x| \leq \frac{1}{2} \, a_x,~~~~
 140 |c_x| \leq \frac{1}{2} \, a_x,~~~~
 141 |c_y| \leq \frac{1}{2} \, b_y
 142 \eeq
Equation (\ref{eqn:box_rot}) can always be satisfied by rotating the box.
Inequalities (\ref{eqn:box_shift1}) and (\ref{eqn:box_shift2}) can always be
satisfied by adding and subtracting box vectors.
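
As a check, the three conditions can be verified for the rhombic
dodecahedron (xy-square) of \tabref{boxtypes}, whose box vectors for an
image distance $d$ are ${\bf a}=(d,0,0)$, ${\bf b}=(0,d,0)$ and
${\bf c}=(\frac{1}{2}d,\frac{1}{2}d,\frac{1}{2}\sqrt{2}\,d)$:
\beq
a_y=a_z=b_z=0,~~~~
a_x=b_y=d>0,~~~~
c_z=\frac{1}{2}\sqrt{2}\,d>0,~~~~
|b_x|=0,~~~~
|c_x|=|c_y|=\frac{1}{2}\,d
\eeq
so (\ref{eqn:box_shift2}) holds, with equality for $c_x$ and $c_y$.
To illustrate the last remark with a hypothetical case: if a box had
$c_x = 0.8\,a_x$, replacing ${\bf c}$ by ${\bf c}-{\bf a}$ would give
$c_x=-0.2\,a_x$ while describing the same periodic lattice.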
 146
 147 Even when simulating using a triclinic box, {\gromacs} always keeps the
 148 particles in a brick-shaped volume for efficiency,
 149 as illustrated in \figref{pbc} for a 2-dimensional system.
 150 Therefore, from the output trajectory it might seem that the simulation was
 151 done in a rectangular box. The program {\tt trjconv} can be used to convert
 152 the trajectory to a different unit-cell representation.
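
As a hedged usage sketch (the file names are placeholders, and the
options should be checked against {\tt trjconv -h} for the installed
version), a trajectory could be rewritten in the compact unit-cell
representation, keeping molecules whole, with a command along these lines:
\begin{verbatim}
trjconv -f traj.xtc -s topol.tpr -ur compact -pbc mol -o traj_compact.xtc
\end{verbatim}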
 153
 154 It is also possible to simulate without periodic boundary conditions,
 155 but it is usually more efficient to simulate an isolated cluster of molecules
 156 in a large periodic box, since fast grid searching can only be used
 157 in a periodic system.
 158
 159 \begin{figure}
 160 \centerline{
 161 \includegraphics[width=5cm]{plots/rhododec}
 162 ~~~~\includegraphics[width=5cm]{plots/truncoct}
 163 }
 164 \caption {A rhombic dodecahedron and truncated octahedron
 165 (arbitrary orientations).}
 166 \label{fig:boxshapes}
 167 \end{figure}
 168
 169 \subsection{Some useful box types}
 170 \begin{table}
 171 \centerline{
 172 \begin{tabular}{|c|c|c|ccc|ccc|}
 173 \dline
 174 box type & image & box & \multicolumn{3}{c|}{box vectors} & \multicolumn{3}{c|}{box vector angles} \\
 175  & distance & volume & ~{\bf a}~ & {\bf b} & {\bf c} &
 176    $\angle{\bf bc}$ & $\angle{\bf ac}$ & $\angle{\bf ab}$ \\
 177 \dline
 178              &     &       & $d$ & 0              & 0              & & & \\
 179 cubic        & $d$ & $d^3$ & 0   & $d$            & 0              & $90^\circ$ & $90^\circ$ & $90^\circ$ \\
 180              &     &       & 0   & 0              & $d$            & & & \\
 181 \hline
 182 rhombic      &     &       & $d$ & 0              & $\frac{1}{2}\,d$ & & & \\
 183 dodecahedron & $d$ & $\frac{1}{2}\sqrt{2}\,d^3$ & 0   & $d$            & $\frac{1}{2}\,d$ & $60^\circ$ & $60^\circ$ & $90^\circ$ \\
 184 (xy-square)  &     & $0.707\,d^3$ & 0   & 0              & $\frac{1}{2}\sqrt{2}\,d$ & & & \\
 185 \hline
 186 rhombic      &     &       & $d$ & $\frac{1}{2}\,d$ & $\frac{1}{2}\,d$ & & & \\
 187 dodecahedron & $d$ & $\frac{1}{2}\sqrt{2}\,d^3$ & 0 & $\frac{1}{2}\sqrt{3}\,d$ & $\frac{1}{6}\sqrt{3}\,d$ & $60^\circ$ & $60^\circ$ & $60^\circ$ \\
 188 (xy-hexagon) &     & $0.707\,d^3$ & 0   & 0              & $\frac{1}{3}\sqrt{6}\,d$ & & & \\
 189 \hline
 190 truncated    &     &       & $d$ & $\frac{1}{3}\,d$ & $-\frac{1}{3}\,d$ & & &\\
octahedron   & $d$ & $\frac{4}{9}\sqrt{3}\,d^3$ & 0   & $\frac{2}{3}\sqrt{2}\,d$ & $\frac{1}{3}\sqrt{2}\,d$ & $70.53^\circ$ & $109.47^\circ$ & $70.53^\circ$ \\
 192              &     & $0.770\,d^3$ & 0   & 0              & $\frac{1}{3}\sqrt{6}\,d$ & & & \\
 193 \dline
 194 \end{tabular}
 195 }
 196 \caption{The cubic box, the rhombic \normindex{dodecahedron} and the truncated
 197 \normindex{octahedron}.}
 198 \label{tab:boxtypes}
 199 \end{table}
 200 The three most useful box types for simulations of solvated systems
 201 are described in \tabref{boxtypes}.  The rhombic dodecahedron
 202 (\figref{boxshapes}) is the smallest and most regular space-filling
 203 unit cell. Each of the 12 image cells is at the same distance.  The
 204 volume is 71\% of the volume of a cube having the same image
 205 distance. This saves about 29\% of CPU-time when simulating a
 206 spherical or flexible molecule in solvent. There are two different
 207 orientations of a rhombic dodecahedron that satisfy equations
 208 \ref{eqn:box_rot}, \ref{eqn:box_shift1} and \ref{eqn:box_shift2}.
 209 The program {\tt editconf} produces the orientation
 210 which has a square intersection with the xy-plane.  This orientation
 211 was chosen because the first two box vectors coincide with the x and
 212 y-axis, which is easier to comprehend. The other orientation can be
 213 useful for simulations of membrane proteins. In this case the
cross-section with the xy-plane is a hexagon, whose area is
14\% smaller than that of a square with the same image
 216 distance.  The height of the box ($c_z$) should be changed to obtain
 217 an optimal spacing.  This box shape not only saves CPU time, it
 218 also results in a more uniform arrangement of the proteins.
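
The percentages quoted above can be checked from \tabref{boxtypes}.
Because of (\ref{eqn:box_rot}) the matrix of box vectors is triangular,
so the box volume is simply the product of its diagonal elements:
\beq
V = a_x\,b_y\,c_z
\eeq
For the rhombic dodecahedron (xy-square) this gives
$V = d \cdot d \cdot \frac{1}{2}\sqrt{2}\,d \approx 0.707\,d^3$, and for
the truncated octahedron
$V = d \cdot \frac{2}{3}\sqrt{2}\,d \cdot \frac{1}{3}\sqrt{6}\,d =
\frac{4}{9}\sqrt{3}\,d^3 \approx 0.770\,d^3$. In the hexagonal
orientation the cross-sectional area is
$a_x\,b_y = \frac{1}{2}\sqrt{3}\,d^2 \approx 0.866\,d^2$, that is about
13.4\% less than the $d^2$ of a square, which rounds to the 14\% quoted.

As a hedged usage sketch (the file names are placeholders and the
1.0~nm distance between solute and box edge is an arbitrary choice;
check {\tt editconf -h} for the installed version), such a box could be
requested with:
\begin{verbatim}
editconf -f conf.gro -o conf_box.gro -bt dodecahedron -d 1.0
\end{verbatim}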
 219
 220 \subsection{Cut-off restrictions}
 221 The \normindex{minimum image convention} implies that the cut-off radius used to
 222 truncate non-bonded interactions may not exceed half the shortest box
 223 vector:
 224 \beq
 225 \label{eqn:physicalrc}
 226   R_c < \half \min(\|{\bf a}\|,\|{\bf b}\|,\|{\bf c}\|),
 227 \eeq
 228 because otherwise more than one image would be within the cut-off distance
 229 of the force. When a macromolecule, such as a protein, is studied in
 230 solution, this restriction alone is not sufficient: in principle, a single
 231 solvent molecule should not be able
 232 to `see' both sides of the macromolecule. This means that the length of
 233 each box vector must exceed the length of the macromolecule in the
 234 direction of that edge {\em plus} two times the cut-off radius $R_c$.
 235 It is, however, common to compromise in this respect, and make the solvent
 236 layer somewhat smaller in order to reduce the computational cost.
 237 For efficiency reasons the cut-off with triclinic boxes is more restricted.
 238 For grid search the extra restriction is weak:
 239 \beq
 240 \label{eqn:gridrc}
 241 R_c < \min(a_x,b_y,c_z)
 242 \eeq
 243 For simple search the extra restriction is stronger:
 244 \beq
 245 \label{eqn:simplerc}
 246 R_c < \half \min(a_x,b_y,c_z)
 247 \eeq
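
As a worked example (the image distance $d=6$~nm is an arbitrary
illustrative value), consider the rhombic dodecahedron (xy-square) of
\tabref{boxtypes}. All three box vectors have length $d$, while
$a_x=b_y=d$ and $c_z=\frac{1}{2}\sqrt{2}\,d\approx 4.24$~nm, so the three
restrictions become
\beq
\begin{array}{lll}
\mbox{minimum image (\ref{eqn:physicalrc}):} & R_c < \half \cdot 6~\mbox{nm} & = 3.00~\mbox{nm} \\
\mbox{grid search (\ref{eqn:gridrc}):}       & R_c < \min(6,\,6,\,4.24)~\mbox{nm} & = 4.24~\mbox{nm} \\
\mbox{simple search (\ref{eqn:simplerc}):}   & R_c < \half \cdot 4.24~\mbox{nm} & = 2.12~\mbox{nm}
\end{array}
\eeq
For this box the grid-search restriction is already implied by the
minimum image convention, whereas simple search limits the cut-off to
about 2.12~nm.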
 248
 249 Each unit cell (cubic, rectangular or triclinic)
 250 is surrounded by 26 translated images. A
 251 particular image can therefore always be identified by an index pointing to one
 252 of 27 {\em translation vectors} and constructed by applying a
 253 translation with the indexed vector (see \ssecref{forces}).
 254 Restriction (\ref{eqn:gridrc}) ensures that only 26 images need to be
 255 considered.
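
The count of 27 can be made explicit. Using integer labels $i,j,k$ that
are introduced here only for illustration (the indexing actually used in
the code is not specified in this section), the translation vectors are
\beq
{\bf t}_{ijk} = i\,{\bf a} + j\,{\bf b} + k\,{\bf c},~~~~
i,j,k \in \{-1,0,1\}
\eeq
which gives $3^3 = 27$ vectors: the zero vector for the central cell
plus the 26 vectors that generate the surrounding images.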
 256
 257 %\ifthenelse{\equal{\gmxlite}{1}}{}{
 258 \section{The group concept}
 259 \label{sec:groupconcept}\index{group}
 260 The {\gromacs} MD and analysis programs use user-defined {\em groups} of
 261 atoms to perform certain actions on. The maximum number of groups is
 262 256, but each atom can only belong to six different groups, one
 263 each of the following:
 264 \begin{description}
 265 \item[temperature-coupling group \swapindex{temperature-coupling}{group}]
 266 The \normindex{temperature coupling} parameters (reference
 267 temperature, time constant, number of degrees of freedom, see
 268 \ssecref{update}) can be defined for each T-coupling group
separately. For example, in a solvated macromolecule the solvent (which
tends to generate more heating by force and integration errors) can be
coupled to a bath with a shorter time constant than the macromolecule,
 272 or a surface can be kept cooler than an adsorbing molecule. Many
 273 different T-coupling groups may be defined. See also center of mass
 274 groups below.
 275
 276 \item[\swapindex{freeze}{group}\index{frozen atoms}]
 277 Atoms that belong to a freeze group are kept stationary in the
 278 dynamics. This is useful during equilibration, {\eg} to avoid badly
 279 placed solvent molecules giving unreasonable kicks to protein atoms,
 280 although the same effect can also be obtained by putting a restraining
 281 potential on the atoms that must be protected. The freeze option can
 282 be used, if desired, on just one or two coordinates of an atom,
 283 thereby freezing the atoms in a plane or on a line.  When an atom is
 284 partially frozen, constraints will still be able to move it, even in a
 285 frozen direction. A fully frozen atom can not be moved by constraints.
 286 Many freeze groups can be defined.  Frozen coordinates are unaffected
 287 by pressure scaling; in some cases this can produce unwanted results,
 288 particularly when constraints are also used (in this case you will
 289 get very large pressures). Accordingly, it is recommended to avoid
 290 combining freeze groups with constraints and pressure coupling. For the
 291 sake of equilibration it could suffice to start with freezing in a
 292 constant volume simulation, and afterward use position restraints in
 293 conjunction with constant pressure.
 294
 295 \item[\swapindex{accelerate}{group}]
 296 On each atom in an ``accelerate group'' an acceleration
 297 $\ve{a}^g$ is imposed. This is equivalent to an external
force. This feature makes it possible to drive the system into a
non-equilibrium state, to perform
\swapindex{non-equilibrium}{MD}, and hence to obtain transport properties.
 301
 302 \item[\swapindex{energy-monitor}{group}]
 303 Mutual interactions between all energy-monitor groups are compiled
 304 during the simulation. This is done separately for Lennard-Jones and
 305 Coulomb terms.  In principle up to 256 groups could be defined, but
 306 that would lead to 256$\times$256 items! Better use this concept
 307 sparingly.
 308
 309 All non-bonded interactions between pairs of energy-monitor groups can
 310 be excluded\index{exclusions}
 311 \ifthenelse{\equal{\gmxlite}{1}}
 312 {.}
 313 {(see \secref{mdpopt}).}
 314 Pairs of particles from excluded pairs of energy-monitor groups
 315 are not put into the pair list.
 316 This can result in a significant speedup
 317 for simulations where interactions within or between parts of the system
 318 are not required.
 319
 320 \item[\swapindex{center of mass}{group}\index{removing COM motion}]
 321 In \gromacs\ the center of mass (COM) motion can be removed, for
 322 either the complete system or for groups of atoms. The latter is
useful, {\eg} for systems where there is limited friction ({\eg} gas
systems), to prevent center-of-mass motion from occurring. It makes sense to
 325 use the same groups for temperature coupling and center of mass motion
 326 removal.
 327
 328 \item[\swapindex{XTC output}{group}]
In order to reduce the size of the {\tt .xtc{\index{XTC}}} trajectory file, only a subset
of all particles can be stored. All XTC groups that are specified
are saved; the rest are not. If no XTC groups are specified, then all
atoms are saved to the {\tt .xtc} file.
 333
 334 \end{description}
 335 The use of groups in {\gromacs} tools is described in
 336 \secref{usinggroups}.
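
As an illustration, the group types listed above might be assigned in
the {\tt mdp} file as sketched below. The group names ({\tt Protein},
{\tt SOL}, {\tt Wall}, {\tt Ions}) and all numerical values are
hypothetical, and the option names are quoted from memory of the
{\tt mdp} options documentation, so they should be checked there before
use. Groups other than the default ones would have to be supplied
through an index file.
\begin{verbatim}
; temperature-coupling groups: one time constant (ps) and
; one reference temperature (K) per group
tc-grps         = Protein  SOL
tau-t           = 0.5      0.1
ref-t           = 300      300
; freeze group: frozen in x and y, free in z
freezegrps      = Wall
freezedim       = Y Y N
; accelerate group: acceleration components in nm/ps^2
acc-grps        = Ions
accelerate      = 0.1 0.0 0.0
; energy-monitor groups and excluded group pairs
energygrps      = Protein  SOL  Wall
energygrp-excl  = Wall Wall
; center-of-mass motion removal groups
comm-grps       = Protein  SOL
; atoms written to the .xtc file
xtc-grps        = Protein
\end{verbatim}
With three energy-monitor groups as in this sketch, $3 \times 4 / 2 = 6$
distinct group pairs are monitored, each split into Lennard-Jones and
Coulomb terms.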
 337 %} % Brace matches ifthenelse test for gmxlite
 338
 339 \section{Molecular Dynamics}
 340 \label{sec:MD}
 341 \begin{figure}
 342 \begin{center}
 343 \addtolength{\fboxsep}{0.5cm}
 344 \begin{shadowenv}[12cm]
 345 {\large \bf THE GLOBAL MD ALGORITHM}
 346 \rule{\textwidth}{2pt} \\
 347 {\bf 1. Input initial conditions}\\[2ex]
 348 Potential interaction $V$ as a function of atom positions\\
 349 Positions $\ve{r}$ of all atoms in the system\\
 350 Velocities $\ve{v}$ of all atoms in the system \\
 351 $\Downarrow$\\
 352 \rule{\textwidth}{1pt}\\
 353 {\bf repeat 2,3,4} for the required number of steps:\\
 354 \rule{\textwidth}{1pt}\\
 355 {\bf 2. Compute forces} \\[1ex]
 356 The force on any atom  \\[1ex]
 357 $\ve{F}_i = - \displaystyle\frac{\partial V}{\partial \ve{r}_i}$ \\[1ex]
 358 is computed by calculating the force between non-bonded atom pairs: \\
 359 $\ve{F}_i = \sum_j \ve{F}_{ij}$ \\
 360 plus the forces due to bonded interactions (which may depend on 1, 2,
 361 3, or 4 atoms), plus restraining and/or external forces. \\
 362 The potential and kinetic energies and the pressure tensor are computed. \\
 363 $\Downarrow$\\
 364 {\bf 3. Update configuration} \\[1ex]
 365 The movement of the atoms is simulated by numerically solving Newton's
 366 equations of motion \\[1ex]
 367 $\displaystyle
 368 \frac {\de^2\ve{r}_i}{\de t^2} = \frac{\ve{F}_i}{m_i} $ \\
 369 or \\
 370 $\displaystyle
 371 \frac{\de\ve{r}_i}{\de t} = \ve{v}_i ; \;\;
 372 \frac{\de\ve{v}_i}{\de t} = \frac{\ve{F}_i}{m_i} $ \\[1ex]
 373 $\Downarrow$ \\
 374 {\bf 4.} if required: {\bf Output step} \\
 375 write positions, velocities, energies, temperature, pressure, etc. \\
 376 \end{shadowenv}
 377 \caption{The global MD algorithm}
 378 \label{fig:global}
 379 \end{center}
 380 \end{figure}
 381 A global flow scheme for MD is given in \figref{global}. Each
 382 MD or  EM run requires as input a set of initial coordinates and --
 383 optionally -- initial velocities of all particles involved. This
 384 chapter does not describe how these are obtained; for the setup of an
 385 actual MD run check the online manual at {\wwwpage}.
 386
 387 \subsection{Initial conditions}
 388 \subsubsection{Topology and force field}
 389 The system topology, including a description of the force field, must
 390 be read in.
 391 \ifthenelse{\equal{\gmxlite}{1}}
{}
 393 {Force fields and topologies are described in \chref{ff}
 394 and \ref{ch:top}, respectively.}
 395 All this information is static; it is never modified during the run.
 396
 397 \subsubsection{Coordinates and velocities}
 398 \begin{figure}
 399 \centerline{\includegraphics[width=8cm]{plots/maxwell}}
 400 \caption{A Maxwell-Boltzmann velocity distribution, generated from
 401     random numbers.}
 402 \label{fig:maxwell}
 403 \end{figure}
 404
Then, before a run starts, the box size and the coordinates and
velocities of all particles are required. The box size and shape are
 407 determined by three vectors (nine numbers) $\ve{b}_1, \ve{b}_2, \ve{b}_3$,
 408 which represent the three basis vectors of the periodic box.
 409
 410 If the run starts at $t=t_0$, the coordinates at $t=t_0$ must be
 411 known. The {\em leap-frog algorithm}, the default algorithm used to
 412 update the time step with $\Dt$ (see \ssecref{update}), also requires
 413 that the velocities at $t=t_0 - \hDt$ are known. If velocities are not
available, the program can generate initial atomic velocities
$v_i, i=1\ldots 3N$ from a \normindex{Maxwell-Boltzmann distribution}
(\figref{maxwell}) at a given absolute temperature $T$:
 417 \beq
 418 p(v_i) = \sqrt{\frac{m_i}{2 \pi kT}}\exp\left(-\frac{m_i v_i^2}{2kT}\right)
 419 \eeq
 420 where $k$ is Boltzmann's constant (see \chref{defunits}).
 421 To accomplish this, normally distributed random numbers are generated
 422 by adding twelve random numbers $R_k$ in the range $0 \le R_k < 1$ and
 423 subtracting 6.0 from their sum. The result is then multiplied by the
 424 standard deviation of the velocity distribution $\sqrt{kT/m_i}$. Since
 425 the resulting total energy will not correspond exactly to the required
 426 temperature $T$, a correction is made: first the center-of-mass motion
 427 is removed and then all velocities are scaled so that the total
 428 energy corresponds exactly to $T$ (see \eqnref{E-T}).
 429 % Why so complicated? What's wrong with Box-Mueller transforms?
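
The reason this recipe yields approximately normal numbers can be
spelled out. Assuming the $R_k$ are independent and uniform on $[0,1)$,
each has mean $\frac{1}{2}$ and variance $\frac{1}{12}$, so that
\beq
\left\langle \sum_{k=1}^{12} R_k - 6 \right\rangle = 12\cdot\frac{1}{2} - 6 = 0,
~~~~
\mathrm{Var}\left(\sum_{k=1}^{12} R_k - 6\right) = 12\cdot\frac{1}{12} = 1
\eeq
By the central limit theorem the sum is close to a standard normal
variable, although it is strictly bounded to the interval $[-6,6)$.
Multiplying by $\sqrt{kT/m_i}$ then gives a variance of $kT/m_i$, as
required by the distribution above.

In the {\tt mdp} file, velocity generation might be requested as
sketched below; the temperature and seed are illustrative values, and
the option names should be checked against the {\tt mdp} options
documentation.
\begin{verbatim}
gen-vel   = yes
gen-temp  = 300      ; K
gen-seed  = 173529   ; illustrative seed
\end{verbatim}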
 430
 431 \subsubsection{Center-of-mass motion\index{removing COM motion}}
 432 The \swapindex{center-of-mass}{velocity} is normally set to zero at
 433 every step; there is (usually) no net external force acting on the
 434 system and the center-of-mass velocity should remain constant. In
 435 practice, however, the update algorithm introduces a very slow change in
 436 the center-of-mass velocity, and therefore in the total kinetic energy of
 437 the system -- especially when temperature coupling is used. If such
 438 changes are not quenched, an appreciable center-of-mass motion
 439 can develop in long runs, and the temperature will be
 440 significantly misinterpreted. Something similar may happen due to overall
 441 rotational motion, but only when an isolated cluster is simulated. In
 442 periodic systems with filled boxes, the overall rotational motion is
 443 coupled to other degrees of freedom and does not cause such problems.
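
For the translational case, the removal can be written down as follows
(a sketch; the symbol $\ve{v}_{com}$ is introduced here and is not
defined elsewhere in this chapter):
\beq
\ve{v}_{com} = \frac{\sum_i m_i \ve{v}_i}{\sum_i m_i},~~~~
\ve{v}_i \leftarrow \ve{v}_i - \ve{v}_{com}
\eeq
After this operation the total momentum $\sum_i m_i \ve{v}_i$ of the
group is zero. In the {\tt mdp} file this might be selected as sketched
below; the values are illustrative and the option names should be
checked against the {\tt mdp} options documentation.
\begin{verbatim}
comm-mode  = Linear   ; remove translational center-of-mass motion
nstcomm    = 1        ; apply the removal every step
\end{verbatim}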
 444
 445
 446 \subsection{Neighbor searching\swapindexquiet{neighbor}{searching}}
 447 \label{subsec:ns}
 448 As mentioned in \chref{ff}, internal forces are
 449 either generated from fixed (static) lists, or from dynamic lists.
 450 The latter consist of non-bonded interactions between any pair of particles.
 451 When calculating the non-bonded forces, it is convenient to have all
 452 particles in a rectangular box.
 453 As shown in \figref{pbc}, it is possible to transform a
 454 triclinic box into a rectangular box.
 455 The output coordinates are always in a rectangular box, even when a
 456 dodecahedron or triclinic box was used for the simulation.
 457 Equation \ref{eqn:box_rot} ensures that we can reset particles
 458 in a rectangular box by first shifting them with
 459 box vector ${\bf c}$, then with ${\bf b}$ and finally with ${\bf a}$.
 460 Equations \ref{eqn:box_shift2}, \ref{eqn:physicalrc} and \ref{eqn:gridrc}
 461 ensure that we can find the 14 nearest triclinic images within
 462 a linear combination that does not involve multiples of box vectors.
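
The order of the shifts can be made explicit. As a sketch, assume that
the rectangular volume is $0 \le r_x < a_x$, $0 \le r_y < b_y$,
$0 \le r_z < c_z$ (the placement of the origin is an assumption made
here for illustration). A position $\ve{r}$ is then reset by
\begin{eqnarray}
\ve{r} & \leftarrow & \ve{r} - \lfloor r_z/c_z \rfloor\,{\bf c} \\
\ve{r} & \leftarrow & \ve{r} - \lfloor r_y/b_y \rfloor\,{\bf b} \\
\ve{r} & \leftarrow & \ve{r} - \lfloor r_x/a_x \rfloor\,{\bf a}
\end{eqnarray}
where each line uses the components left by the previous one. The first
shift fixes $r_z$ but may change $r_x$ and $r_y$; the second fixes $r_y$
without disturbing $r_z$ because $b_z=0$; the third fixes $r_x$ without
disturbing $r_y$ or $r_z$ because $a_y=a_z=0$.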
 463
\subsubsection{Pair list generation}
 465 The non-bonded pair forces need to be calculated only for those pairs
 466 $i,j$  for which the distance $r_{ij}$ between $i$ and the
 467 \swapindex{nearest}{image}
 468 of $j$ is less than a given cut-off radius $R_c$. Some of the particle
 469 pairs that fulfill this criterion are excluded, when their interaction
 470 is already fully accounted for by bonded interactions.  {\gromacs}
 471 employs a {\em pair list} that contains those particle pairs for which
 472 non-bonded forces must be calculated.  The pair list contains atoms
 473 $i$, a displacement vector for atom $i$, and all particles $j$ that
 474 are within \verb'rlist' of this particular image of atom $i$.  The
 475 list is updated every \verb'nstlist' steps, where \verb'nstlist' is
 476 typically 10. There is an option to calculate the total non-bonded
force on each particle due to all particles in a shell around the
 478 list cut-off, {\ie} at a distance between \verb'rlist' and
 479 \verb'rlistlong'.  This force is calculated during the pair list update
 480 and  retained during \verb'nstlist' steps.
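
As a sketch, the options named above might appear in the {\tt mdp} file
as follows. The cut-off values are illustrative only, further cut-off
options are needed for a complete setup, and the option names should be
checked against the {\tt mdp} options documentation.
\begin{verbatim}
nstlist    = 10     ; update the pair list every 10 steps
ns-type    = grid   ; grid-based neighbor search
rlist      = 1.0    ; nm, pair-list cut-off
rlistlong  = 1.4    ; nm, outer radius of the shell
\end{verbatim}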
 481
 482 To make the \normindex{neighbor list}, all particles that are close
 483 ({\ie} within the neighbor list cut-off) to a given particle must be found.
 484 This searching, usually called neighbor search (NS) or pair search,
 485 involves periodic boundary conditions and determining the {\em image}
 486 (see \secref{pbc}). The search algorithm is $O(N)$, although a simpler
 487 $O(N^2)$ algorithm is still available under some conditions.
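
A rough estimate shows why the scaling matters. The number of
particles within the list cut-off of a given particle is approximately
\beq
n \approx \frac{4}{3}\pi\,r_{list}^3\,\rho
\eeq
where $\rho$ is the number density and $r_{list}$ stands for the value
of \verb'rlist' (both symbols are introduced here for illustration). For
liquid water, with roughly 100 atoms per nm$^3$, and a list cut-off of
1~nm this gives $n \approx 419$. For a hypothetical system of
$N = 10^5$ atoms the pair list then holds about $N n/2 \approx 2 \times
10^7$ pairs, whereas a search over all pairs would examine
$N(N-1)/2 \approx 5 \times 10^9$ distances.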
 488
 489 \subsubsection{\normindex{Cut-off schemes}: group versus Verlet}
 490 From version 4.6, {\gromacs} supports two different cut-off scheme
 491 setups: the original one based on atom groups and one using a Verlet
 492 buffer. There are some important differences that affect results,
 493 performance and feature support. The group scheme can be made to work
 494 (almost) like the Verlet scheme, but this will lead to a decrease in
 495 performance. The group scheme is especially fast for water molecules,
 496 which are abundant in many simulations.
 497
 498 In the group scheme, a neighbor list is generated consisting of pairs
 499 of groups of at least one atom. These groups were originally
 500 \swapindex{charge}{group}s \ifthenelse{\equal{\gmxlite}{1}}{}{(see
 501   \secref{chargegroup})}, but with a proper treatment of long-range
 502 electrostatics, performance is their only advantage. A pair of groups
 503 is put into the neighbor list when their center of geometry is within
 504 the cut-off distance. Interactions between all atom pairs (one from
 505 each charge group) are calculated for a certain number of MD steps,
 506 until the neighbor list is updated. This setup is efficient, as the
neighbor search only checks distances between charge-group pairs, not
 508 atom pairs (saves a factor of $3 \times 3 = 9$ with a three-atom water
 509 model) and the non-bonded force kernels can be optimized for, say, a
 510 water molecule ``group''. Without explicit buffering, this setup leads
 511 to energy drift as some atom pairs which are within the cut-off don't
 512 interact and some outside the cut-off do interact. This can be caused
 513 by
 514 \begin{itemize}
 515 \item atoms moving across the cut-off between neighbor search steps, and/or
 516 \item for charge groups consisting of more than one atom, atom pairs
 517   moving in/out of the cut-off when their charge group center of
 518   geometry distance is outside/inside of the cut-off.
 519 \end{itemize}
 520 Explicitly adding a buffer to the neighbor list will remove such
 521 artifacts, but this comes at a high computational cost. How severe the
 522 artifacts are depends on the system, the properties in which you are
 523 interested, and the cut-off setup.
 524
 525 The Verlet cut-off scheme uses a buffered pair list by default. It
 526 also uses clusters of atoms, but these are not static as in the group
 527 scheme. Rather, the clusters are defined spatially and consist of 4 or
 528 8 atoms, which is convenient for stream computing, using e.g. SSE, AVX
 529 or CUDA on GPUs. At neighbor search steps, an atom pair list (or
 530 cluster pair list, but that's an implementation detail) is created
 531 with a Verlet buffer. Thus the pair-list cut-off is larger than the
 532 interaction cut-off. In the non-bonded force kernels, forces are only
 533 added when an atom pair is within the cut-off distance at that
 534 particular time step. This ensures that as atoms move between pair
 535 search steps, forces between nearly all atoms within the cut-off
 536 distance are calculated. We say {\em nearly} all atoms, because
 537 {\gromacs} uses a fixed pair list update frequency for
 538 efficiency. There is a small chance that an atom pair distance is
 539 decreased to within the cut-off in this fixed number of steps. This
 540 small chance results in a small energy drift. When temperature
 541 coupling is used, the buffer size can be determined automatically,
 542 given a certain limit on the energy drift.
 543
 544 The Verlet scheme specific settings in the {\tt mdp} file are:
 545 \begin{verbatim}
 546 cutoff-scheme        = Verlet
 547 verlet-buffer-drift  = 0.005
 548 \end{verbatim}
 549 The Verlet buffer size is determined from the latter option, which is
 550 by default set to 0.005 kJ/mol/ps energy drift per atom. Note that the
 551 total energy drift in the system is affected by many factors and it is
 552 usually much smaller than this default setting for the estimate. For
 553 constant energy (NVE) simulations, this drift should be set to -1 and
 554 a buffer has to be set manually by specifying {\tt rlist} $>$ {\tt
 555   rcoulomb}. The simplest way to get a reasonable buffer size is to
 556 use an NVT {\tt mdp} file with the target temperature set to what you
 557 expect in your NVE simulation, and transfer the buffer size printed by
 558 {\tt grompp} to your NVE {\tt mdp} file.
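
Following the procedure just described, an NVE setup might look as
sketched below. The cut-off and buffer values are illustrative only;
the value of {\tt rlist} should be taken from the {\tt grompp} output of
the corresponding NVT setup.
\begin{verbatim}
cutoff-scheme        = Verlet
verlet-buffer-drift  = -1      ; switch off the automatic buffer
rcoulomb             = 1.0     ; nm
rvdw                 = 1.0     ; nm
rlist                = 1.1     ; nm, interaction cut-off plus buffer
\end{verbatim}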
 559
 560 The Verlet cut-off scheme is implemented in a very efficient fashion
 561 based on clusters of particles. The simplest example is a cluster size
 562 of 4 particles. The pair list is then constructed based on cluster
pairs. The cluster-pair search is much faster than searching based on
particle pairs, because $4 \times 4 = 16$ particle pairs are put in
the list at once. The non-bonded force calculation kernel can then
calculate all 16 particle-pair interactions at once, which maps nicely
to SIMD units which can perform multiple floating-point operations at once
 568 (e.g. SSE, AVX, CUDA on GPUs, BlueGene FPUs). These non-bonded kernels
 569 are much faster than the kernels used in the group scheme for most
 570 types of systems, except for water molecules when not using a buffered
 571 pair list. This latter case is quite common for (bio-)molecular
 572 simulations, so for greatest speed, it is worth comparing the
 573 performance of both schemes.
 574
 575 As the Verlet cut-off scheme was introduced in version 4.6, not
 576 all features of the group scheme are supported yet. The Verlet scheme
 577 supports a few new features which the group scheme does not support.
The features that are not (fully) supported by both cut-off schemes are
listed in \tabref{cutoffschemesupport}.
 580
 581 \begin{table}
 582 \centerline{
 583 \begin{tabular}{|l|c|c|}
 584 \dline
 585 Non-bonded interaction feature    & group & Verlet \\
 586 \dline
 587 unbuffered cut-off scheme         & $\surd$ & \\
 588 exact cut-off                     & shift/switch & $\surd$ \\
 589 shifted interactions              & force+energy & energy \\
 590 switched forces                   & $\surd$ & \\
 591 non-periodic systems              & $\surd$ & Z  + walls \\
 592 implicit solvent                  & $\surd$ & \\
 593 free energy perturbed non-bondeds & $\surd$ & \\
 594 group energy contributions        & $\surd$ & CPU (not on GPU) \\
 595 energy group exclusions           & $\surd$ & \\
 596 AdResS multi-scale                & $\surd$ & \\
 597 OpenMP multi-threading            & only PME & $\surd$ \\
 598 native GPU support                &         & $\surd$ \\
 599 \dline
 600 \end{tabular}
 601 }
 602 \caption{Differences (only) in the support of non-bonded features
 603   between the group and Verlet cut-off schemes.}
 604 \label{tab:cutoffschemesupport}
 605 \end{table}
 606
 607 \ifthenelse{\equal{\gmxlite}{1}}{}{
 608 \subsubsection{Energy drift and pair-list buffering}
 609 For a canonical ensemble, the average energy drift caused by the
 610 finite Verlet buffer size can be determined from the atomic
 611 displacements and the shape of the potential at the cut-off.
 612 %Since we are interested in the small drift regime, we will assume
 613 %#that atoms will only move within the cut-off distance in the last step,
 614 %$n_\mathrm{ps}-1$, of the pair list update interval $n_\mathrm{ps}$.
 615 %Over this number of steps the displacment of an atom with mass $m$
The displacement distribution along one dimension for a freely moving
particle with mass $m$ over time $t$ at temperature $T$ is Gaussian
with zero mean and variance $\sigma^2 = t^2\,k_B T/m$. For the distance
between two atoms, the variance changes to $\sigma^2 = \sigma_{12}^2 =
t^2\,k_B T(1/m_1+1/m_2)$.  Note that in practice particles usually
 621 interact with other particles over time $t$ and therefore the real
 622 displacement distribution is much narrower.  Given a non-bonded
 623 interaction cut-off distance of $r_c$ and a pair-list cut-off
 624 $r_\ell=r_c+r_b$, we can then write the average energy drift after
 625 time $t$ for pair interactions between one particle of type 1
 626 surrounded by particles of type 2 with number density $\rho_2$, when
the inter-particle distance changes from $r_0$ to $r_t$, as:
 628
 629 \begin{eqnarray}
 630 \langle \Delta V \rangle \! &=&
 631 \int_{0}^{r_c} \int_{r_\ell}^\infty 4 \pi r_0^2 \rho_2 V(r_t) G\!\left(\frac{r_t-r_0}{\sigma}\right) d r_0\, d r_t \\
 632 &\approx&
 633 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty 4 \pi r_0^2 \rho_2 \Big[ V'(r_c) (r_t - r_c) +
 634 \nonumber\\
 635 & &
 636 \phantom{\int_{-\infty}^{r_c} \int_{r_\ell}^\infty 4 \pi r_0^2 \rho_2 \Big[}
 637  V''(r_c)\frac{1}{2}(r_t - r_c)^2 \Big] G\!\left(\frac{r_t-r_0}{\sigma}\right) d r_0 \, d r_t\\
 638 &\approx&
 639 4 \pi (r_\ell+\sigma)^2 \rho_2
 640 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty \Big[ V'(r_c) (r_t - r_c) +
 641 \nonumber\\
 642 & &
 643 \phantom{4 \pi (r_\ell+\sigma)^2 \rho_2 \int_{-\infty}^{r_c} \int_{r_\ell}^\infty \Big[}
 644 V''(r_c)\frac{1}{2}(r_t - r_c)^2 \Big] G\!\left(\frac{r_t-r_0}{\sigma}\right) d r_0 \, d r_t\\
 645 &=&
 646 4 \pi (r_\ell+\sigma)^2 \rho_2 \bigg\{
 647 \frac{1}{2}V'(r_c)\left[r_b \sigma G\!\left(\frac{r_b}{\sigma}\right) - (r_b^2+\sigma^2)E\!\left(\frac{r_b}{\sigma}\right) \right] +
 648 \nonumber\\
 649 & &
 650 \phantom{4 \pi (r_\ell+\sigma)^2 \rho_2 \bigg\{ }
 651 \frac{1}{6}V''(r_c)\left[ \sigma(r_b^2+\sigma^2)G\!\left(\frac{r_b}{\sigma}\right) - r_b(r_b^2+3\sigma^2 ) E\!\left(\frac{r_b}{\sigma}\right) \right]
 652 \bigg\}
 653 \end{eqnarray}
 654
 655 where $G$ is a Gaussian distribution with 0 mean and unit variance and
 656 $E(x)=\frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. We always want to achieve
 657 small energy drift, so $\sigma$ will be small compared to both $r_c$
 658 and $r_\ell$, thus the approximations in the equations above are good,
 659 since the Gaussian distribution decays rapidly. The energy drift needs
 660 to be averaged over all particle pair types and weighted with the
 661 particle counts. In {\gromacs} we don't allow cancellation of drift
 662 between pair types, so we average the absolute values. To obtain the
 663 average energy drift per unit time, it needs to be divided by the
 664 neighbor-list life time $t = ({\tt nstlist} - 1)\times{\tt dt}$. This
function cannot be inverted analytically, so we use bisection to
obtain the buffer size $r_b$ for a target drift.  Again we note that
in practice the drift will usually be much smaller than this estimate,
 668 as in the condensed phase particle displacements will be much smaller
 669 than for freely moving particles, which is the assumption used here.
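
As a rough numerical illustration (an added sketch with assumed input
values, not output of the {\gromacs} code), take a pair-list life time
of $t = 18$~fs at $T = 300$~K, so that $k_B T \approx
2.494$~kJ~mol$^{-1}$. A free particle moving ballistically is
displaced by $v\,t$ along each dimension, where the velocity component
$v$ has variance $k_B T/m$. Assuming atomic masses of 1.008~u for
hydrogen and 15.999~u for oxygen, and using 1~kJ~mol$^{-1}$ =
1~u~nm$^2$~ps$^{-2}$, the standard deviations of the pair distance are
\bea
\sigma_\mathrm{HH} &=& t \sqrt{k_B T \left(\frac{1}{m_\mathrm{H}} + \frac{1}{m_\mathrm{H}}\right)}
 ~\approx~ 0.018~\mathrm{ps} \times 2.22~\mathrm{nm~ps^{-1}} ~\approx~ 0.040~\mathrm{nm} \nonumber \\
\sigma_\mathrm{OO} &=& t \sqrt{k_B T \left(\frac{1}{m_\mathrm{O}} + \frac{1}{m_\mathrm{O}}\right)}
 ~\approx~ 0.018~\mathrm{ps} \times 0.558~\mathrm{nm~ps^{-1}} ~\approx~ 0.010~\mathrm{nm} \nonumber
\eea
Both values are small compared to a typical cut-off of about 1~nm,
which is the regime in which the approximations above are expected to
hold.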
 670
 671 When (bond) constraints are present, some particles will have fewer
 672 degrees of freedom. This will reduce the energy drift. The
 673 displacement in an arbitrary direction of a particle with 2 degrees of
 674 freedom is not Gaussian, but rather follows the complementary error
 675 function: \beq
 676 \frac{\sqrt{\pi}}{2\sqrt{2}\sigma}\,\mathrm{erfc}\left(\frac{|r|}{\sqrt{2}\,\sigma}\right)
\eeq where $\sigma^2$ is again the free-particle displacement variance defined above.  This distribution can no
 678 longer be integrated analytically to obtain the energy drift. But we
 679 can generate a tight upper bound using a scaled and shifted Gaussian
 680 distribution (not shown). This Gaussian distribution can then be used
 681 to calculate the energy drift as described above. We consider
 682 particles constrained, i.e. having 2 degrees of freedom or fewer, when
 683 they are connected by constraints to particles with a total mass of at
least 1.5 times the mass of the particle itself. For a particle with
a single constraint this would give a total mass along the constraint
direction of at least 2.5 times its own mass, which leads to a reduction
in the variance of the displacement along that direction by at least a
factor of 6.25.
 688 As the Gaussian distribution decays very rapidly, this effectively
 689 removes one degree of freedom from the displacement. Multiple
 690 constraints would reduce the displacement even more, but as this gets
 691 very complex, we consider those as particles with 2 degrees of
 692 freedom.
 693
 694 There is one important implementation detail that reduces the energy
 695 drift caused by the finite Verlet buffer list size. The derivation
 696 above assumes a particle pair-list. However, the {\gromacs}
 697 implementation uses a cluster pair-list for efficiency. The pair list
 698 consists of pairs of clusters of 4 particles in most cases, also
 699 called a $4 \times 4$ list, but the list can also be $4 \times 8$ (GPU
 700 CUDA kernels and AVX 256-bit single precision kernels) or $4 \times 2$
 701 (SSE double-precision kernels). This means that the pair-list is
 702 effectively much larger than the corresponding $1 \times 1$ list. Thus
 703 slightly beyond the pair-list cut-off there will still be a large
 704 fraction of particle pairs present in the list. This fraction can be
 705 determined in a simulation and accurately estimated under some
 706 reasonable assumptions. The fraction decreases with increasing
 707 pair-list range, meaning that a smaller buffer can be used. For
 708 typical all-atom simulations with a cut-off of 0.9 nm this fraction is
 709 around 0.9, which gives a reduction in the energy drift of a factor of
 710 10. This reduction is taken into account during the automatic Verlet
 711 buffer calculation and results in a smaller buffer size.
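
The factor of 10 follows from simple bookkeeping. As a sketch, under
the simplifying assumption that the fraction $f$ of particle pairs
still present in the cluster pair list is constant over the relevant
range just beyond the pair-list cut-off, only the missing fraction
$1-f$ of those pairs can contribute to the drift:
\beq
\frac{\langle \Delta V \rangle_{4 \times 4}}{\langle \Delta V \rangle_{1 \times 1}}
~\approx~ 1 - f ~=~ 1 - 0.9 ~=~ 0.1
\eeq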
 712
 713 \begin{figure}
 714 \centerline{\includegraphics[width=9cm]{plots/verlet-drift}}
\caption {Energy drift per atom for an SPC/E water system at 300~K with
 716   a time step of 2 fs and a pair-list update period of 10 steps
 717   (pair-list life time: 18 fs). PME was used with {\tt ewald-rtol} set
 718   to 10$^{-5}$; this parameter affects the shape of the potential at
 719   the cut-off. Drift estimates due to finite Verlet buffer size are
 720   shown for a $1 \times 1$ atom pair list and $4 \times 4$ atom pair
 721   list without and with (dashed line) cancellation of positive and
 722   negative drift. Real energy drift is shown for double- and
 723   single-precision simulations. Single-precision rounding errors in
 724   the SETTLE constraint algorithm cause the drift to become negative
 725   at large buffer size. Note that at zero buffer size, the real drift
  is small because the positive (H-H) and negative (O-H) drift
  cancel.}
 728 \label{fig:verletdrift}
 729 \end{figure}
 730
 731 In \figref{verletdrift} one can see that for water with a pair-list
 732 life time of 18 fs, the drift estimate is a factor of 6 higher than
 733 the real drift, or alternatively the buffer estimate is 0.024 nm too
 734 large.  This is because the protons don't move freely over 18 fs, but
 735 rather vibrate.
 736 %At a buffer size of zero there is cancellation of
 737 %drift due to repulsive (H-H) and attractive (O-H) interactions.
 738
 739 \subsubsection{Cut-off artifacts and switched interactions}
 740 With the Verlet scheme, the pair potentials are shifted to be zero at
 741 the cut-off, such that the potential is the integral of the force.
 742 Note that in the group scheme this is not possible, because no exact
 743 cut-off distance is used. There can still be energy drift from
 744 non-zero forces at the cut-off. This effect is extremely small and
 745 often not noticeable, as other integration errors may dominate. To
 746 completely avoid cut-off artifacts, the non-bonded forces can be
 747 switched exactly to zero at some distance smaller than the neighbor
 748 list cut-off (there are several ways to do this in {\gromacs}, see
 749 \secref{mod_nb_int}). One then has a buffer with the size equal to the
 750 neighbor list cut-off less the longest interaction cut-off. With the
 751 group cut-off scheme, one can then also choose to let {\tt mdrun} only
update the neighbor list when required, that is, when one or more
 753 particles have moved more than half the buffer size from the center of
 754 geometry of the \swapindex{charge}{group} to which they belong (see
 755 \secref{chargegroup}), as determined at the previous neighbor search.
 756 This option guarantees that there are no cut-off artifacts.  {\bf
 757   Note} that for larger systems this comes at a high computational
 758 cost, since the neighbor list update frequency will be determined by
 759 just one or two particles moving slightly beyond the half buffer
length (which does not even necessarily imply that the neighbor list is
invalid), while 99.99\% of the particles are fine.
} % Brace matches ifthenelse test for gmxlite
 763
 764 \subsubsection{Simple search\swapindexquiet{simple}{search}}
 765 Due to \eqnsref{box_rot}{simplerc}, the vector $\rvij$
 766 connecting images within the cut-off $R_c$ can be found by constructing:
 767 \bea
 768 \ve{r}'''   & = & \ve{r}_j-\ve{r}_i \\
 769 \ve{r}''    & = & \ve{r}''' - {\bf c}*\verb'round'(r'''_z/c_z) \\
 770 \ve{r}'     & = & \ve{r}'' - {\bf b}*\verb'round'(r''_y/b_y) \\
 771 \ve{r}_{ij} & = & \ve{r}' - {\bf a}*\verb'round'(r'_x/a_x)
 772 \eea
 773 When distances between two particles in a triclinic box are needed
 774 that do not obey \eqnref{box_rot},
 775 many shifts of combinations of box vectors need to be considered to find
 776 the nearest image.
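
As a worked example with made-up numbers (lengths in nm), assume a
triclinic box with ${\bf a} = (3,0,0)$, ${\bf b} = (1,3,0)$ and
${\bf c} = (0,0,3)$, which is taken here to satisfy the box
restrictions referred to above, and a difference vector
$\ve{r}''' = \ve{r}_j-\ve{r}_i = (2.0,\,-1.7,\,0.4)$. The three
steps then give
\bea
\verb'round'(0.4/3) = 0 &\Rightarrow& \ve{r}'' = \ve{r}''' = (2.0,\,-1.7,\,0.4) \nonumber \\
\verb'round'(-1.7/3) = -1 &\Rightarrow& \ve{r}' = \ve{r}'' + {\bf b} = (3.0,\,1.3,\,0.4) \nonumber \\
\verb'round'(3.0/3) = 1 &\Rightarrow& \ve{r}_{ij} = \ve{r}' - {\bf a} = (0.0,\,1.3,\,0.4) \nonumber
\eea
so that $|\ve{r}_{ij}| \approx 1.36$. Note that the shift along
${\bf b}$ also changes the $x$ component, which is why the
corrections have to be applied in the order $z$, $y$, $x$.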
 777
 778 \ifthenelse{\equal{\gmxlite}{1}}{}{
 779
 780 \begin{figure}
 781 \centerline{\includegraphics[width=8cm]{plots/nstric}}
 782 \caption {Grid search in two dimensions. The arrows are the box vectors.}
 783 \label{fig:grid}
 784 \end{figure}
 785
 786 \subsubsection{Grid search\swapindexquiet{grid}{search}}
 787 \label{sec:nsgrid}
 788 The grid search is schematically depicted in \figref{grid}.  All
 789 particles are put on the {\nsgrid}, with the smallest spacing $\ge$
 790 $R_c/2$ in each of the directions.  In the direction of each box
 791 vector, a particle $i$ has three images. For each direction the image
 792 may be -1,0 or 1, corresponding to a translation over -1, 0 or +1 box
 793 vector. We do not search the surrounding {\nsgrid} cells for neighbors
 794 of $i$ and then calculate the image, but rather construct the images
 795 first and then search neighbors corresponding to that image of $i$.
 796 As \figref{grid} shows, some grid cells may be searched more than once
 797 for different images of $i$. This is not a problem, since, due to the
 798 minimum image convention, at most one image will ``see'' the
 799 $j$-particle.  For every particle, fewer than 125 (5$^3$) neighboring
 800 cells are searched.  Therefore, the algorithm scales linearly with the
 801 number of particles.  Although the prefactor is large, the scaling
behavior makes the algorithm far superior to the standard $O(N^2)$
 803 algorithm when there are more than a few hundred particles.  The
 804 grid search is equally fast for rectangular and triclinic boxes.  Thus
 805 for most protein and peptide simulations the rhombic dodecahedron will
 806 be the preferred box shape.
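
The bound of $5^3$ cells can be checked with a short estimate (added
here as a sketch, writing $s$ for the grid spacing along one
direction). Since $s \ge R_c/2$, a neighbor within $R_c$ of a particle
lies at most $R_c/s \le 2$ cells away along that direction, so only
cell offsets $-2,\ldots,+2$ need to be considered:
\beq
(2 \times 2 + 1)^3 ~=~ 5^3 ~=~ 125
\eeq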
 807 } % Brace matches ifthenelse test for gmxlite
 808
 809 \ifthenelse{\equal{\gmxlite}{1}}{}{
 810 \subsubsection{Charge groups}
 811 \label{sec:chargegroup}\swapindexquiet{charge}{group}%
 812 Charge groups were originally introduced to reduce cut-off artifacts
 813 of Coulomb interactions. When a plain cut-off is used, significant
 814 jumps in the potential and forces arise when atoms with (partial) charges
 815 move in and out of the cut-off radius. When all chemical moieties have
 816 a net charge of zero, these jumps can be reduced by moving groups
 817 of atoms with net charge zero, called charge groups, in and
 818 out of the neighbor list. This reduces the cut-off effects from
the charge-charge level to the dipole-dipole level, which decays
 820 much faster. With the advent of full range electrostatics methods,
 821 such as particle mesh Ewald (\secref{pme}), the use of charge groups is
 822 no longer required for accuracy. It might even have a slight negative effect
 823 on the accuracy or efficiency, depending on how the neighbor list is made
 824 and the interactions are calculated.
 825
 826 But there is still an important reason for using ``charge groups'': efficiency.
 827 Where applicable, neighbor searching is carried out on the basis of
 828 charge groups which are defined in the molecular topology.
 829 If the nearest image distance between the {\em
 830 geometrical centers} of the atoms of two charge groups is less than
 831 the cut-off radius, all atom pairs between the charge groups are
 832 included in the pair list.
 833 The neighbor searching for a water system, for instance,
 834 is $3^2=9$ times faster when each molecule is treated as a charge group.
 835 Also the highly optimized water force loops (see \secref{waterloops})
 836 only work when all atoms in a water molecule form a single charge group.
 837 Currently the name {\em neighbor-search group} would be more appropriate,
 838 but the name charge group is retained for historical reasons.
 839 When developing a new force field, the advice is to use charge groups
 840 of 3 to 4 atoms for optimal performance. For all-atom force fields
this is relatively easy, as one can simply put hydrogen atoms, and in some
cases oxygen atoms, in the same charge group as the heavy atom they
 843 are connected to; for example: CH$_3$, CH$_2$, CH, NH$_2$, NH, OH, CO$_2$, CO.
 844 } % Brace matches ifthenelse test for gmxlite
 845
 846 \subsection{Compute forces}
 847 \label{subsec:forces}
 848
 849 \subsubsection{Potential energy}
 850 When forces are computed, the \swapindex{potential}{energy} of each
 851 interaction term is computed as well. The total potential energy is
 852 summed for various contributions, such as Lennard-Jones, Coulomb, and
 853 bonded terms. It is also possible to compute these contributions for
 854 {\em energy-monitor groups} of atoms that are separately defined (see
 855 \secref{groupconcept}).
 856
 857 \subsubsection{Kinetic energy and temperature}
 858 The \normindex{temperature} is given by the total
 859 \swapindex{kinetic}{energy} of the $N$-particle system:
 860 \beq
 861 E_{kin} = \half \sum_{i=1}^N m_i v_i^2
 862 \eeq
 863 From this the absolute temperature $T$ can be computed using:
 864 \beq
 865 \half N_{df} kT = E_{kin}
 866 \label{eqn:E-T}
 867 \eeq
 868 where $k$ is Boltzmann's constant and $N_{df}$ is the number of
 869 degrees of freedom which can be computed from:
 870 \beq
 871 N_{df}  ~=~     3 N - N_c - N_{com}
 872 \eeq
 873 Here $N_c$ is the number of {\em \normindex{constraints}} imposed on the system.
 874 When performing molecular dynamics $N_{com}=3$ additional degrees of
 875 freedom must be removed, because the three
 876 center-of-mass velocities are constants of the motion, which are usually
 877 set to zero. When simulating in vacuo, the rotation around the center of mass
can also be removed; in this case $N_{com}=6$.
 879 When more than one temperature-coupling group\index{temperature-coupling group} is used, the number of degrees
 880 of freedom for group $i$ is:
 881 \beq
 882 N^i_{df}  ~=~  (3 N^i - N^i_c) \frac{3 N - N_c - N_{com}}{3 N - N_c}
 883 \eeq
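As an illustrative example (the system and numbers are chosen here for
illustration only), consider 1000 rigid three-site water molecules
with center-of-mass motion removal. Then $N = 3000$, $N_c = 3000$
(three constraints per molecule) and $N_{com} = 3$, so that
\beq
N_{df} ~=~ 9000 - 3000 - 3 ~=~ 5997
\eeq
A total kinetic energy of $E_{kin} = 7479$~kJ~mol$^{-1}$ then
corresponds, with $k = 0.0083145$~kJ~mol$^{-1}$~K$^{-1}$, to
\beq
T ~=~ \frac{2 E_{kin}}{N_{df}\,k} ~=~ \frac{2 \times 7479}{5997 \times 0.0083145}~\mathrm{K} ~\approx~ 300~\mathrm{K}
\eeq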
 884
 885 The kinetic energy can also be written as a tensor, which is necessary
 886 for pressure calculation in a triclinic system, or systems where shear
 887 forces  are imposed:
 888 \beq
 889 {\bf E}_{kin} = \half \sum_i^N m_i \vvi \otimes \vvi
 890 \eeq
 891
 892 \subsubsection{Pressure and virial}
 893 The \normindex{pressure}
tensor {\bf P} is calculated from the difference between the
kinetic energy tensor ${\bf E}_{kin}$ and the \normindex{virial} ${\bf \Xi}$:
 896 \beq
 897 {\bf P} = \frac{2}{V} ({\bf E}_{kin}-{\bf \Xi})
 898 \label{eqn:P}
 899 \eeq
 900 where $V$ is the volume of the computational box.
 901 The scalar pressure $P$, which can be used for pressure coupling in the case
 902 of isotropic systems, is computed as:
 903 \beq
 904 P       = {\rm trace}({\bf P})/3
 905 \eeq
 906
 907 The virial ${\bf \Xi}$ tensor is defined as:
 908 \beq
 909 {\bf \Xi} = -\half \sum_{i<j} \rvij \otimes \Fvij
 910 \label{eqn:Xi}
 911 \eeq
 912
 913 \ifthenelse{\equal{\gmxlite}{1}}{}{
 914 The {\gromacs} implementation of the virial computation is described
 915 in \secref{virial}.
 916 } % Brace matches ifthenelse test for gmxlite
 917
 918
 919 \subsection{The \swapindex{leap-frog}{integrator}}
 920 \label{subsec:update}
 921 \begin{figure}
 922 \centerline{\includegraphics[width=8cm]{plots/leapfrog}}
 923 \caption[The Leap-Frog integration method.]{The Leap-Frog integration method. The algorithm is called Leap-Frog because $\ve{r}$ and $\ve{v}$ are leaping
 924 like  frogs over each other's backs.}
 925 \label{fig:leapfrog}
 926 \end{figure}
 927
 928 The default MD integrator in {\gromacs} is the so-called {\em leap-frog}
 929 algorithm~\cite{Hockney74} for the integration of the equations of
 930 motion.  When extremely accurate integration with temperature
 931 and/or pressure coupling is required, the velocity Verlet integrators
 932 are also present and may be preferable (see \ssecref{vverlet}). The leap-frog
 933 algorithm uses positions $\ve{r}$ at time $t$ and
 934 velocities $\ve{v}$ at time $t-\hDt$; it updates positions and
 935 velocities using the forces
 936 $\ve{F}(t)$ determined by the positions at time $t$ using these relations:
 937 \bea
 938 \label{eqn:leapfrogv}
 939 \ve{v}(t+\hDt)  &~=~&   \ve{v}(t-\hDt)+\frac{\Dt}{m}\ve{F}(t)   \\
 940 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\Dt\ve{v}(t+\hDt)
 941 \eea
 942 The algorithm is visualized in \figref{leapfrog}.
 943 It produces trajectories that are identical to the Verlet~\cite{Verlet67} algorithm,
 944 whose position-update relation is
 945 \beq
 946 \ve{r}(t+\Dt)~=~2\ve{r}(t) - \ve{r}(t-\Dt) + \frac{1}{m}\ve{F}(t)\Dt^2+O(\Dt^4)
 947 \eeq
 948 The algorithm is of third order in $\ve{r}$ and is time-reversible.
 949 See ref.~\cite{Berendsen86b} for the merits of this algorithm and comparison
 950 with other time integration algorithms.
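
The equivalence with the Verlet algorithm can be verified by
eliminating the velocities. Writing the leap-frog position update at
two successive steps,
\bea
\ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\Dt\,\ve{v}(t+\hDt) \nonumber \\
\ve{r}(t)       &~=~&   \ve{r}(t-\Dt)+\Dt\,\ve{v}(t-\hDt) \nonumber
\eea
subtracting the second line from the first, and inserting
\eqnref{leapfrogv} for the velocity difference gives
\beq
\ve{r}(t+\Dt) - 2\ve{r}(t) + \ve{r}(t-\Dt) ~=~ \Dt\left[\ve{v}(t+\hDt)-\ve{v}(t-\hDt)\right] ~=~ \frac{\Dt^2}{m}\ve{F}(t)
\eeq
which is the Verlet position update without the truncation term.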
 951
 952 The \swapindex{equations of}{motion} are modified for temperature
 953 coupling and pressure coupling, and extended to include the
 954 conservation of constraints, all of which are described below.
 955
 956 \subsection{The \swapindex{velocity Verlet}{integrator}}
 957 \label{subsec:vverlet}
 958 The velocity Verlet algorithm~\cite{Swope82} is also implemented in
 959 {\gromacs}, though it is not yet fully integrated with all sets of
 960 options.  In velocity Verlet, positions $\ve{r}$ and velocities
 961 $\ve{v}$ at time $t$ are used to integrate the equations of motion;
 962 velocities at the previous half step are not required.  \bea
 963 \label{eqn:velocityverlet1}
 964 \ve{v}(t+\hDt)  &~=~&   \ve{v}(t)+\frac{\Dt}{2m}\ve{F}(t)   \\
 965 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\Dt\,\ve{v}(t+\hDt) \\
 966 \ve{v}(t+\Dt)   &~=~&   \ve{v}(t+\hDt)+\frac{\Dt}{2m}\ve{F}(t+\Dt)
 967 \eea
 968 or, equivalently,
 969 \bea
 970 \label{eqn:velocityverlet2}
\ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+ \Dt\,\ve{v}(t) + \frac{\Dt^2}{2m}\ve{F}(t) \\
 972 \ve{v}(t+\Dt)   &~=~&   \ve{v}(t)+ \frac{\Dt}{2m}\left[\ve{F}(t) + \ve{F}(t+\Dt)\right]
 973 \eea
 974 With no temperature or pressure coupling, and with {\em corresponding}
 975 starting points, leap-frog and velocity Verlet will generate identical
 976 trajectories, as can easily be verified by hand from the equations
 977 above.  Given a single starting file with the {\em same} starting
 978 point $\ve{x}(0)$ and $\ve{v}(0)$, leap-frog and velocity Verlet will
 979 {\em not} give identical trajectories, as leap-frog will interpret the
 980 velocities as corresponding to $t=-\hDt$, while velocity Verlet will
 981 interpret them as corresponding to the timepoint $t=0$.
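
The correspondence can be made explicit with a short derivation (added
here as a sketch). Applying the final velocity half step of velocity
Verlet one step earlier, and then the first half step of the current
step, gives
\bea
\ve{v}(t)       &~=~&   \ve{v}(t-\hDt)+\frac{\Dt}{2m}\ve{F}(t) \nonumber \\
\ve{v}(t+\hDt)  &~=~&   \ve{v}(t)+\frac{\Dt}{2m}\ve{F}(t) ~=~ \ve{v}(t-\hDt)+\frac{\Dt}{m}\ve{F}(t) \nonumber
\eea
which is the leap-frog velocity update of \eqnref{leapfrogv}. The
starting points therefore correspond when the leap-frog half-step
velocity is chosen as
\beq
\ve{v}(-\hDt) ~=~ \ve{v}(0) - \frac{\Dt}{2m}\ve{F}(0)
\eeq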
 982
 983 \subsection{Understanding reversible integrators: The Trotter decomposition}
 984 To further understand the relationship between velocity Verlet and
 985 leap-frog integration, we introduce the reversible Trotter formulation
of dynamics, which is also useful for understanding implementations of
 987 thermostats and barostats in {\gromacs}.
 988
 989 A system of coupled, first-order differential equations can be evolved
 990 from time $t = 0$ to time $t$ by applying the evolution operator
 991 \bea
 992 \Gamma(t) &=& \exp(iLt) \Gamma(0) \nonumber \\
 993 iL &=& \dot{\Gamma}\cdot \nabla_{\Gamma},
 994 \eea
 995 where $L$ is the Liouville operator, and $\Gamma$ is the
 996 multidimensional vector of independent variables (positions and
 997 velocities).
 998 A short-time approximation to the true operator, accurate at time $\Dt
 999 = t/P$, is applied $P$ times in succession to evolve the system as
1000 \beq
1001 \Gamma(t) = \prod_{i=1}^P \exp(iL\Dt) \Gamma(0)
1002 \eeq
1003 For NVE dynamics, the Liouville operator is
1004 \bea
1005 iL = \sum_{i=1}^{N} \vv_i \cdot \nabla_{\rv_i} + \sum_{i=1}^N \frac{1}{m_i}\F(r_i) \cdot \nabla_{\vv_i}.
1006 \eea
This can be split into two additive operators
\bea
iL_1 &=& \sum_{i=1}^{N} \vv_i \cdot \nabla_{\rv_i} \nonumber \\
iL_2 &=& \sum_{i=1}^N \frac{1}{m_i}\F(r_i) \cdot \nabla_{\vv_i}
\eea
Then a short-time, symmetric, and thus reversible approximation of the true dynamics will be
\bea
\exp(iL\Dt) = \exp(iL_2\hDt) \exp(iL_1\Dt) \exp(iL_2\hDt) + \mathcal{O}(\Dt^3).
\label{eq:NVE_Trotter}
\eea
This corresponds to velocity Verlet integration.  The first
exponential term over $\hDt$ corresponds to a velocity half-step, the
second exponential term over $\Dt$ corresponds to a full position
step, and the last exponential term over $\hDt$ is the final velocity
half step.  For future times $t = n\Dt$, this becomes
1022 \bea
1023 \exp(iLn\Dt) &\approx&  \left(\exp(iL_2\hDt) \exp(iL_1\Dt) \exp(iL_2\hDt)\right)^n \nonumber \\
1024              &\approx&  \exp(iL_2\hDt) \bigg(\exp(iL_1\Dt) \exp(iL_2\Dt)\bigg)^{n-1} \nonumber \\
1025              &       &  \;\;\;\; \exp(iL_1\Dt) \exp(iL_2\hDt)
1026 \eea
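The second form follows because adjacent half-step factors of the same
operator, which meet at the boundary between two consecutive steps,
commute with each other and can be merged into a single full step:
\beq
\exp(iL_2\hDt)\,\exp(iL_2\hDt) ~=~ \exp(iL_2\Dt)
\eeq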
1027 This formalism allows us to easily see the difference between the
1028 different flavors of Verlet integrators.  The leap-frog integrator can
be seen as starting with Eq.~\ref{eq:NVE_Trotter} with the
$\exp\left(iL_1 \Dt\right)$ term, instead of the half-step velocity
term, yielding
\bea
\exp(iL\Dt) &=& \exp\left(iL_1 \Dt\right) \exp\left(iL_2 \Dt \right) + \mathcal{O}(\Dt^3).
\eea
Here, the full step in velocity is between $t-\hDt$ and $t+\hDt$,
since it is a combination of the velocity half steps in velocity
Verlet. For future times $t = n\Dt$, this becomes
\bea
\exp(iLn\Dt) &\approx& \bigg(\exp\left(iL_1 \Dt\right) \exp\left(iL_2 \Dt \right)  \bigg)^{n}.
\eea
1041 Although at first this does not appear symmetric, as long as the full velocity
1042 step is between $t-\hDt$ and $t+\hDt$, then this is simply a way of
1043 starting velocity Verlet at a different place in the cycle.
1044
1045 Even though the trajectory and thus potential energies are identical
1046 between leap-frog and velocity Verlet, the kinetic energy and
1047 temperature will not necessarily be the same.  Standard velocity
Verlet uses the velocities at time $t$ to calculate the kinetic energy
and thus the temperature only at time $t$; the kinetic energy is then a sum over all particles
\bea
KE_{\mathrm{full}}(t) &=& \sum_i \frac{1}{2}m_i\ve{v}_i(t)^2 \nonumber\\
      &=& \sum_i \frac{1}{2}m_i\left(\frac{1}{2}\ve{v}_i(t-\hDt)+\frac{1}{2}\ve{v}_i(t+\hDt)\right)^2,
\eea
with the square on the {\em outside} of the average.  Standard
leap-frog calculates the kinetic energy at time $t$ based on the
average kinetic energies at the timesteps $t+\hDt$ and $t-\hDt$, or
the sum over all particles
\bea
KE_{\mathrm{average}}(t) &=& \sum_i \frac{1}{2}m_i\left(\frac{1}{2}\ve{v}_i(t-\hDt)^2+\frac{1}{2}\ve{v}_i(t+\hDt)^2\right),
\eea
1061 where the square is {\em inside} the average.
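
The size of the difference between the two definitions can be
estimated with a short calculation (added here as a sketch, for
dynamics without temperature or pressure coupling). For any two
velocities $\ve{a}$ and $\ve{b}$,
\beq
\frac{1}{2}\left(\ve{a}^2+\ve{b}^2\right) - \left(\frac{1}{2}\ve{a}+\frac{1}{2}\ve{b}\right)^2 ~=~ \frac{1}{4}\left(\ve{b}-\ve{a}\right)^2 ~\ge~ 0
\eeq
Setting $\ve{a}=\ve{v}_i(t-\hDt)$ and $\ve{b}=\ve{v}_i(t+\hDt)$,
multiplying by the kinetic prefactor $\frac{1}{2}m_i$, summing over
particles, and using $\ve{b}-\ve{a} = \Dt\,\ve{F}_i(t)/m_i$ from
\eqnref{leapfrogv} gives
\beq
KE_{\mathrm{average}}(t) - KE_{\mathrm{full}}(t) ~=~ \sum_i \frac{\Dt^2}{8 m_i}\,\ve{F}_i(t)^2
\eeq
so the half-step-averaged kinetic energy is never smaller than the
full-step value, and the two differ at order $\Dt^2$.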
1062
1063 A non-standard variant of velocity Verlet which averages the kinetic
1064 energies $KE(t+\hDt)$ and $KE(t-\hDt)$, exactly like leap-frog, is also
1065 now implemented in {\gromacs} (as {\tt .mdp} file option {\tt md-vv-avek}).  Without
1066 temperature and pressure coupling, velocity Verlet with
1067 half-step-averaged kinetic energies and leap-frog will be identical up
1068 to numerical precision.  For temperature- and pressure-control schemes,
1069 however, velocity Verlet with half-step-averaged kinetic energies and
leap-frog will be different, as will be discussed in the section on
1071 thermostats and barostats.
1072
The half-step-averaged kinetic energy and temperature are slightly more
accurate for a given step size; the average kinetic energy obtained
with half-step averaging ({\em md} and {\em md-vv-avek}) will be
closer to the kinetic energy obtained in the limit of small step size
than will the full-step kinetic energy (using {\em md-vv}).  For NVE
simulations, this difference is usually not significant, since the
positions and velocities of the particles are still identical; it
makes a difference in the way the temperature of the simulation is
{\em interpreted}, but {\em not} in the trajectories that are
produced.  Although the kinetic energy is more
1083 accurate with the half-step-averaged method, meaning that it changes
1084 less as the timestep gets large, it is also more noisy.  The RMS deviation
1085 of the total energy of the system (sum of kinetic plus
1086 potential) in the half-step-averaged kinetic energy case will be
higher (about twice as high in most cases) than in the full-step kinetic
energy case.  The drift will still be the same, however, as again, the
1089 trajectories are identical.
1090
1091 For NVT simulations, however, there {\em will} be a difference, as
1092 discussed in the section on temperature control, since the velocities
1093 of the particles are adjusted such that kinetic energies of the
1094 simulations, which can be calculated either way, reach the
1095 distribution corresponding to the set temperature.  In this case, the
1096 three methods will not give identical results.
1097
Because the velocity and position are both defined at the same time
$t$, the velocity Verlet integrator can be used for some methods,
1100 especially rigorously correct pressure control methods, that are not
1101 actually possible with leap-frog.  The integration itself takes
1102 negligibly more time than leap-frog, but twice as many communication
1103 calls are currently required.  In most cases, and especially for large
1104 systems where communication speed is important for parallelization and
1105 differences between thermodynamic ensembles vanish in the $1/N$ limit,
1106 and when only NVT ensembles are required, leap-frog will likely be the
1107 preferred integrator.  For pressure control simulations where the fine
1108 details of the thermodynamics are important, only velocity Verlet
1109 allows the true ensemble to be calculated.  In either case, simulation
1110 with double precision may be required to get fine details of
1111 thermodynamics correct.
1112
1113 \subsection{Twin-range cut-offs\index{twin-range!cut-off}}
1114 To save computation time, slowly varying forces can be calculated
1115 less often than rapidly varying forces. In {\gromacs}
1116 such a \normindex{multiple time step} splitting is possible between
1117 short and long range non-bonded interactions.
1118 In {\gromacs} versions up to 4.0, an irreversible integration scheme
1119 was used which is also used by the {\gromos} simulation package:
1120 every $n$ steps the long range forces are determined and these are
1121 then also used (without modification) for the next $n-1$ integration steps
1122 in \eqnref{leapfrogv}. Such an irreversible scheme can result in bad energy
1123 conservation and, possibly, bad sampling.
1124 Since version 4.5, a leap-frog version of the reversible Trotter decomposition scheme~\cite{Tuckerman1992a} is used.
1125 In this integrator the long-range forces are determined every $n$ steps
1126 and are then integrated into the velocity in \eqnref{leapfrogv} using
1127 a time step of $\Dt_\mathrm{LR} = n \Dt$:
1128 \beq
1129 \ve{v}(t+\hDt) =
1130 \left\{ \begin{array}{lll} \displaystyle
1131   \ve{v}(t-\hDt) + \frac{1}{m}\left[\ve{F}_\mathrm{SR}(t) + n \ve{F}_\mathrm{LR}(t)\right] \Dt &,& \mathrm{step} ~\%~ n = 0  \\ \noalign{\medskip} \displaystyle
1132   \ve{v}(t-\hDt) + \frac{1}{m}\ve{F}_\mathrm{SR}(t)\Dt &,& \mathrm{step} ~\%~ n \neq 0  \\
1133 \end{array} \right.
1134 \eeq
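As a small numerical illustration (values chosen here for illustration
only), $n = 4$ and $\Dt = 2$~fs give $\Dt_\mathrm{LR} = 8$~fs. Over
one cycle of $n$ steps, the long-range force then contributes a single
impulse
\beq
\frac{1}{m}\, n\, \ve{F}_\mathrm{LR}(t)\, \Dt ~=~ \frac{1}{m}\, \ve{F}_\mathrm{LR}(t)\, \Dt_\mathrm{LR}
\eeq
applied at the step where $\mathrm{step} ~\%~ n = 0$, whereas the
short-range force contributes a separate impulse
$\ve{F}_\mathrm{SR}\Dt/m$ at each of the $n$ steps.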
1135
The parameter $n$ is equal to the neighbor list update frequency. In
version 4.5, the velocity Verlet version of multiple time-stepping is
not yet fully implemented.
1139
Several other simulation packages use multiple time stepping for
1141 bonds and/or the PME mesh forces. In {\gromacs} we have not implemented
1142 this (yet), since we use a different philosophy. Bonds can be constrained
1143 (which is also a more sound approximation of a physical quantum
1144 oscillator), which allows the smallest time step to be increased
1145 to the larger one. This not only halves the number of force calculations,
1146 but also the update calculations. For even larger time steps, angle vibrations
1147 involving hydrogen atoms can be removed using virtual interaction
1148 \ifthenelse{\equal{\gmxlite}{1}}
1149 {sites,}
1150 {sites (see \secref{rmfast}),}
which brings the shortest time step up to the
PME mesh update frequency of a multiple time stepping scheme.
1153
1154 As an example we show the energy conservation for integrating
1155 the equations of motion for SPC/E water at 300 K. To avoid cut-off
1156 effects, reaction-field electrostatics with $\epsilon_{RF}=\infty$ and
1157 shifted Lennard-Jones interactions are used, both with a buffer region.
1158 The long-range interactions were evaluated between 1.0 and 1.4 nm.
In \figref{twinrangeener} one can see that for electrostatics the Trotter scheme
does an order of magnitude better up to $\Dt_{LR}$ = 16 fs.
1161 The electrostatics depends strongly on the orientation of the water molecules,
1162 which changes rapidly.
1163 For Lennard-Jones interactions, the energy drift is linear in $\Dt_{LR}$
1164 and roughly two orders of magnitude smaller than for the electrostatics.
1165 Lennard-Jones forces are smaller than Coulomb forces and
1166 they are mainly affected by translation of water molecules, not rotation.
1167
1168 \begin{figure}
1169 \centerline{\includegraphics[width=12cm]{plots/drift-all}}
1170 \caption{Energy drift per degree of freedom in SPC/E water
1171 with twin-range cut-offs
1172 for reaction field (left) and Lennard-Jones interaction (right)
1173 as a function of the long-range time step length for the irreversible
1174 ``\gromos'' scheme and a reversible Trotter scheme.}
1175 \label{fig:twinrangeener}
1176 \end{figure}
1177
1178 \subsection{Temperature coupling\index{temperature coupling}}
While direct use of molecular dynamics gives rise to the NVE (constant
number, constant volume, constant energy) ensemble, most quantities
1181 that we wish to calculate are actually from a constant temperature
1182 (NVT) ensemble, also called the canonical ensemble. {\gromacs} can use
1183 the {\em weak-coupling} scheme of Berendsen~\cite{Berendsen84},
1184 stochastic randomization through the Andersen
1185 thermostat~\cite{Andersen80}, the extended ensemble Nos{\'e}-Hoover
1186 scheme~\cite{Nose84,Hoover85}, or a velocity-rescaling
1187 scheme~\cite{Bussi2007a} to simulate constant temperature, with
1188 advantages of each of the schemes laid out below.
1189
1190 There are several other reasons why it might be necessary to control
1191 the temperature of the system (drift during equilibration, drift as a
1192 result of force truncation and integration errors, heating due to
1193 external or frictional forces), but this is not entirely correct to do
1194 from a thermodynamic standpoint, and in some cases only masks the
1195 symptoms (increase in temperature of the system) rather than the
1196 underlying problem (deviations from correct physics in the dynamics).
1197 For larger systems, errors in ensemble averages and structural
1198 properties incurred by using temperature control to remove slow drifts
1199 in temperature appear to be negligible, but no completely
1200 comprehensive comparisons have been carried out, and some caution must
be taken in interpreting the results.
1202
1203 \subsubsection{Berendsen temperature coupling\pawsindexquiet{Berendsen}{temperature coupling}\index{weak coupling}}
1204 The Berendsen algorithm mimics weak coupling with first-order
1205 kinetics to an external heat bath with given temperature $T_0$.
1206 See ref.~\cite{Berendsen91} for a comparison with the
1207 Nos{\'e}-Hoover scheme. The effect of this algorithm is
1208 that a deviation of the system temperature from $T_0$ is slowly
1209 corrected according to:
1210 \beq
1211 \frac{\de T}{\de t} = \frac{T_0-T}{\tau}
1212 \label{eqn:Tcoupling}
1213 \eeq
1214 which means that a temperature deviation decays exponentially with a
1215 time constant $\tau$.
1216 This method of coupling has the advantage that the strength of the
1217 coupling can be varied and adapted to the user requirement: for
1218 equilibration purposes the coupling time can be taken quite short
1219 ({\eg} 0.01 ps), but for reliable equilibrium runs it can be taken much
1220 longer ({\eg} 0.5 ps) in which case it hardly influences the
1221 conservative dynamics.
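
For reference, \eqnref{Tcoupling} can be integrated directly when
$T_0$ and $\tau$ are constant, which makes the exponential decay
explicit:
\beq
T(t) = T_0 + \left[ T(0) - T_0 \right] e^{-t/\tau} .
\eeq
As an illustrative case (the values are assumed for the example), with
$\tau = 0.5$ ps an initial deviation of 10 K decays to
$10/e \approx 3.7$ K after 0.5 ps and to $10/e^2 \approx 1.4$ K after
1 ps.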
1222
1223 The Berendsen thermostat suppresses the fluctuations of the kinetic
1224 energy.  This means that one does not generate a proper canonical
1225 ensemble, so rigorously, the sampling will be incorrect.  This
1226 error scales with $1/N$, so for very large systems most ensemble
1227 averages will not be affected significantly, except for the
1228 distribution of the kinetic energy itself.  However, fluctuation
1229 properties, such as the heat capacity, will be affected.  A similar
1230 thermostat which does produce a correct ensemble is the velocity
1231 rescaling thermostat~\cite{Bussi2007a} described below.
1232
The heat flow into or out of the system is effected by scaling the
1234 velocities of each particle every step, or every $n_\mathrm{TC}$ steps,
1235 with a time-dependent factor $\lambda$, given by:
1236 \beq
1237 \lambda = \left[ 1 + \frac{n_\mathrm{TC} \Delta t}{\tau_T}
1238 \left\{\frac{T_0}{T(t -  \hDt)} - 1 \right\} \right]^{1/2}
1239 \label{eqn:lambda}
1240 \eeq
1241 The parameter $\tau_T$ is close, but not exactly equal, to the time constant
1242 $\tau$ of the temperature coupling (\eqnref{Tcoupling}):
1243 \beq
1244 \tau = 2 C_V \tau_T / N_{df} k
1245 \eeq
1246 where $C_V$ is the total heat capacity of the system, $k$ is Boltzmann's
1247 constant, and $N_{df}$ is the total number of degrees of freedom. The
1248 reason that $\tau \neq \tau_T$ is that the kinetic energy change
1249 caused by scaling the velocities is partly redistributed between
1250 kinetic and potential energy and hence the change in temperature is
1251 less than the scaling energy.  In practice, the ratio $\tau / \tau_T$
1252 ranges from 1 (gas) to 2 (harmonic solid) to 3 (water). When we use
1253 the term ``temperature coupling time constant,'' we mean the parameter
1254 \normindex{$\tau_T$}.
1255 {\bf Note} that in practice the scaling factor $\lambda$ is limited to
the range $0.8 \leq \lambda \leq 1.25$, to avoid scaling by very large
1257 numbers which may crash the simulation. In normal use,
1258 $\lambda$ will always be much closer to 1.0.
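
Squaring \eqnref{lambda} shows what one scaling step does to the
kinetic temperature, which is proportional to the squared velocities:
\beq
\lambda^2\, T(t-\hDt) = T(t-\hDt) + \frac{n_\mathrm{TC} \Delta t}{\tau_T}
\left[ T_0 - T(t-\hDt) \right] .
\eeq
As a worked example with assumed values $\Delta t = 0.002$ ps,
$n_\mathrm{TC} = 1$, $\tau_T = 0.1$ ps, $T_0 = 300$ K and
$T(t-\hDt) = 310$ K:
\beq
\lambda = \left[ 1 + 0.02 \left( \frac{300}{310} - 1 \right) \right]^{1/2}
\approx 0.99968 ,
\eeq
so the kinetic temperature after scaling is
$\lambda^2 \times 310$ K $= 309.8$ K. In this example 2\% of the 10 K
deviation is removed in a single scaling step, and $\lambda$ stays far
inside the allowed range.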
1259
1260 \subsubsection{Velocity-rescaling temperature coupling\pawsindexquiet{velocity-rescaling}{temperature coupling}}
1261 The velocity-rescaling thermostat~\cite{Bussi2007a} is essentially a Berendsen
1262 thermostat (see above) with an additional stochastic term that ensures
1263 a correct kinetic energy distribution by modifying it according to
1264 \beq
1265 \de K = (K_0 - K) \frac{\de t}{\tau_T} + 2 \sqrt{\frac{K K_0}{N_f}} \frac{\de W}{\sqrt{\tau_T}},
1266 \label{eqn:vrescale}
1267 \eeq
1268 where $K$ is the kinetic energy, $N_f$ the number of degrees of freedom and $\de W$ a Wiener process.
1269 There are no additional parameters, except for a random seed.
1270 This thermostat produces a correct canonical ensemble and still has
1271 the advantage of the Berendsen thermostat: first order decay of
1272 temperature deviations and no oscillations.
1273 When an $NVT$ ensemble is used, the conserved energy quantity
1274 is written to the energy and log file.
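
To make the role of the two terms in \eqnref{vrescale} concrete, a
naive first-order (Euler-Maruyama) discretization over one time step
$\Dt$ is sketched below. This is only an illustration that assumes
$\Dt \ll \tau_T$; it is not claimed to be the update used in
{\gromacs}, and the symbol $R$ (a random number drawn from a standard
normal distribution) is introduced here for the example only:
\beq
K(t+\Dt) \approx K(t) + \left[ K_0 - K(t) \right] \frac{\Dt}{\tau_T}
+ 2 \sqrt{\frac{K(t) K_0}{N_f}} \sqrt{\frac{\Dt}{\tau_T}}\; R .
\eeq
The first correction is the Berendsen-like deterministic relaxation,
while the second adds fluctuations whose size relative to $K_0$ scales
as $1/\sqrt{N_f}$. In such a sketch the velocities would then be
scaled by $\sqrt{K(t+\Dt)/K(t)}$.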
1275
1276 \subsubsection{\normindex{Andersen thermostat}}
1277 One simple way to maintain a thermostatted ensemble is to take an
1278 $NVE$ integrator and periodically re-select the velocities of the
particles from a Maxwell-Boltzmann distribution~\cite{Andersen80}.
1280 This can either be done by randomizing all the velocities
1281 simultaneously (massive collision) every $\tau_T/\Dt$ steps, or by
1282 randomizing every particle with some small probability every timestep,
equal to $\Dt/\tau_T$, where in both cases $\Dt$ is the timestep and
1284 $\tau_T$ is a characteristic coupling time scale.
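
As a small worked example (the values are assumed for illustration),
with $\Dt = 0.002$ ps and $\tau_T = 1$ ps the two variants correspond
to
\beq
\frac{\tau_T}{\Dt} = \frac{1~\mathrm{ps}}{0.002~\mathrm{ps}} = 500
~\mbox{steps between massive collisions},
\qquad
\frac{\Dt}{\tau_T} = 0.002
~\mbox{per particle per step}.
\eeq
In both cases each particle has its velocity re-selected on average
once per $\tau_T$.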
1285
1286 Because of the way constraints operate, all particles in the same
1287 constraint group must be re-randomized simultaneously.  This
1288 thermostat is also only possible with velocity Verlet algorithms,
1289 because it operates directly on the velocities at each timestep.
1290
This algorithm avoids some of the ergodicity issues of other
algorithms, as energy cannot flow back and forth between energetically
decoupled components of the system, as it can with velocity-scaling
thermostats. However, it can slow down the kinetics of the system by
randomizing correlated motions, including slowing sampling when
$\tau_T$ is at moderate levels (less than 10 ps). This algorithm
should therefore generally not be used when examining kinetics of the
system, but can avoid the ergodicity problems of velocity-scaling
thermostats when examining thermodynamic properties.
1300
1301 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1302 \subsubsection{Nos{\'e}-Hoover temperature coupling\index{Nose-Hoover temperature coupling@Nos{\'e}-Hoover temperature coupling|see{temperature coupling, Nos{\'e}-Hoover}}{\index{temperature coupling Nose-Hoover@temperature coupling Nos{\'e}-Hoover}}\index{extended ensemble}}
1303
The Berendsen weak-coupling algorithm is
extremely efficient for relaxing a system to the target temperature,
but once the system has reached equilibrium it might be more
important to sample a correct canonical ensemble. Unfortunately, the
weak-coupling scheme does not generate one.
1309
1310 To enable canonical ensemble simulations, {\gromacs} also supports the
1311 extended-ensemble approach first proposed by Nos{\'e}~\cite{Nose84}
1312 and later modified by Hoover~\cite{Hoover85}. The system Hamiltonian is
1313 extended by introducing a thermal reservoir and a friction term in the
1314 equations of motion.  The friction force is proportional to the
1315 product of each particle's velocity and a friction parameter, $\xi$.
1316 This friction parameter (or ``heat bath'' variable) is a fully
1317 dynamic quantity with its own momentum ($p_{\xi}$) and equation of
1318 motion; the time derivative is calculated from the difference between
1319 the current kinetic energy and the reference temperature.
1320
1321 In this formulation, the particles' equations of motion in
1322 \figref{global} are replaced by:
1323 \beq
1324 \frac {\de^2\ve{r}_i}{\de t^2} = \frac{\ve{F}_i}{m_i} -
1325 \frac{p_{\xi}}{Q}\frac{\de \ve{r}_i}{\de t} ,
1326 \label{eqn:NH-eqn-of-motion}
\eeq
where the equation of motion for the heat bath parameter $\xi$ is:
\beq
\frac {\de p_{\xi}}{\de t} = \left( T - T_0 \right).
\eeq
The reference temperature is denoted $T_0$, while $T$ is the current
instantaneous temperature of the system. The strength of the coupling
is determined by the constant $Q$ (usually called the ``mass parameter''
of the reservoir) in combination with the reference
temperature.\footnote{Note that in some derivations, an alternative
  notation $\xi_{\mathrm{alt}} = v_{\xi} = p_{\xi}/Q$ is used.}
1335
1336 The conserved quantity for the Nos{\'e}-Hoover equations of motion is not
1337 the total energy, but rather
1338 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) +\frac{p_{\xi}^2}{2Q} + N_fkT\xi,
1340 \eea
1341 where $N_f$ is the total number of degrees of freedom.
1342
1343 In our opinion, the mass parameter is a somewhat awkward way of
1344 describing coupling strength, especially due to its dependence on
1345 reference temperature (and some implementations even include the
1346 number of degrees of freedom in your system when defining $Q$).  To
1347 maintain the coupling strength, one would have to change $Q$ in
1348 proportion to the change in reference temperature. For this reason, we
prefer to let the {\gromacs} user work with the period
$\tau_T$ of the oscillations of kinetic energy between the system and
the reservoir instead. It is directly related to $Q$ and $T_0$ via:
1352 \beq
1353 Q = \frac {\tau_T^2 T_0}{4 \pi^2}.
1354 \eeq
1355 This provides a much more intuitive way of selecting the
1356 Nos{\'e}-Hoover coupling strength (similar to the weak-coupling
1357 relaxation), and in addition $\tau_T$ is independent of system size
1358 and reference temperature.
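
As a numerical illustration (the values are assumed for the example,
and the units are those implied by the equations above, in which
$p_{\xi}/Q$ has the dimension of an inverse time), choosing
$\tau_T = 1$ ps at $T_0 = 300$ K corresponds to
\beq
Q = \frac{(1~\mathrm{ps})^2 \times 300~\mathrm{K}}{4 \pi^2}
\approx 7.6~\mathrm{K\,ps^2} .
\eeq
Raising the reference temperature to 600 K at the same $\tau_T$
doubles this to $Q \approx 15.2~\mathrm{K\,ps^2}$, which is the
rescaling of $Q$ that a user would otherwise have to do by hand.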
1359
1360 It is however important to keep the difference between the
1361 weak-coupling scheme and the Nos{\'e}-Hoover algorithm in mind:
1362 Using weak coupling you get a
1363 strongly damped {\em exponential relaxation},
1364 while the Nos{\'e}-Hoover approach
1365 produces an {\em oscillatory relaxation}.
1366 The actual time it takes to relax with Nos{\'e}-Hoover coupling is
1367 several times larger than the period of the
oscillations that you select. These oscillations (in contrast
to exponential relaxation) also mean that
1370 the time constant normally should be 4--5 times larger
1371 than the relaxation time used with weak coupling, but your
1372 mileage may vary.
1373
Nos{\'e}-Hoover dynamics in simple systems, such as collections of
harmonic oscillators, can be {\em nonergodic}, meaning that only a
subset of phase space is ever sampled, even if the simulations
1377 were to run for infinitely long.  For this reason, the Nos{\'e}-Hoover
1378 chain approach was developed, where each of the Nos{\'e}-Hoover
1379 thermostats has its own Nos{\'e}-Hoover thermostat controlling its
1380 temperature.  In the limit of an infinite chain of thermostats, the
1381 dynamics are guaranteed to be ergodic. Using just a few chains can
1382 greatly improve the ergodicity, but recent research has shown that the
1383 system will still be nonergodic, and it is still not entirely clear
what the practical effect of this is~\cite{Cooke2008}. Currently, the
1385 default number of chains is 10, but this can be controlled by the
1386 user.  In the case of chains, the equations are modified in the
1387 following way to include a chain of thermostatting
1388 particles~\cite{Martyna1992}:
1389
1390 \bea
1391 \frac {\de^2\ve{r}_i}{\de t^2} &~=~& \frac{\ve{F}_i}{m_i} - \frac{p_{{\xi}_1}}{Q_1} \frac{\de \ve{r}_i}{\de t} \nonumber \\
1392 \frac {\de p_{{\xi}_1}}{\de t} &~=~& \left( T - T_0 \right) - p_{{\xi}_1} \frac{p_{{\xi}_2}}{Q_2} \nonumber \\
\frac {\de p_{{\xi}_{i=2\ldots N-1}}}{\de t} &~=~& \left(\frac{p_{\xi_{i-1}}^2}{Q_{i-1}} -kT\right) - p_{\xi_i} \frac{p_{\xi_{i+1}}}{Q_{i+1}} \nonumber \\
1394 \frac {\de p_{\xi_N}}{\de t} &~=~& \left(\frac{p_{\xi_{N-1}}^2}{Q_{N-1}}-kT\right)
1395 \label{eqn:NH-chain-eqn-of-motion}
1396 \eea
1397 The conserved quantity for Nos{\'e}-Hoover chains is
1398 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) +\sum_{k=1}^M\frac{p^2_{\xi_k}}{2Q_k} + N_fkT\xi_1 + kT\sum_{k=2}^M \xi_k
1400 \eea
1401 The values and velocities of the Nos{\'e}-Hoover thermostat variables
1402 are generally not included in the output, as they take up a fair
1403 amount of space and are generally not important for analysis of
1404 simulations, but this can be overridden by defining the environment
variable {\tt GMX\_NOSEHOOVER\_CHAINS}, which will print the values of all
1406 the positions and velocities of all Nos{\'e}-Hoover particles in the
1407 chain to the {\tt .edr} file.  Leap-frog simulations currently can only have
Nos{\'e}-Hoover chain lengths of 1, but this will likely be updated in
a later version.
1410
1411 As described in the integrator section, for temperature coupling, the
1412 temperature that the algorithm attempts to match to the reference
1413 temperature is calculated differently in velocity Verlet and leap-frog
1414 dynamics.  Velocity Verlet ({\em md-vv}) uses the full-step kinetic
1415 energy, while leap-frog and {\em md-vv-avek} use the half-step-averaged
1416 kinetic energy.
1417
1418 We can examine the Trotter decomposition again to better understand
1419 the differences between these constant-temperature integrators.  In
1420 the case of Nos{\'e}-Hoover dynamics (for simplicity, using a chain
1421 with $N=1$, with more details in Ref.~\cite{Martyna1996}), we split
1422 the Liouville operator as
1423 \beq
1424 iL = iL_1 + iL_2 + iL_{\mathrm{NHC}},
1425 \eeq
1426 where
1427 \bea
1428 iL_1 &=& \sum_{i=1}^N \left[\frac{\pb_i}{m_i}\right]\cdot \frac{\partial}{\partial \rv_i} \nonumber \\
1429 iL_2 &=& \sum_{i=1}^N \F_i\cdot \frac{\partial}{\partial \pb_i} \nonumber \\
1430 iL_{\mathrm{NHC}} &=& \sum_{i=1}^N-\frac{p_{\xi}}{Q}\vv_i\cdot \nabla_{\vv_i} +\frac{p_{\xi}}{Q}\frac{\partial }{\partial \xi} + \left( T - T_0 \right)\frac{\partial }{\partial p_{\xi}}
1431 \eea
1432 For standard velocity Verlet with Nos{\'e}-Hoover temperature control, this becomes
1433 \bea
1434 \exp(iL\dt) &=& \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1435 &&\exp\left(iL_1 \dt\right) \exp\left(iL_2 \dt/2\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) + \mathcal{O}(\Dt^3).
1436 \eea
1437 For half-step-averaged temperature control using {\em md-vv-avek},
1438 this decomposition will not work, since we do not have the full step
1439 temperature until after the second velocity step.  However, we can
1440 construct an alternate decomposition that is still reversible, by
1441 switching the place of the NHC and velocity portions of the
1442 decomposition:
1443 \bea
1444 \exp(iL\dt) &=& \exp\left(iL_2 \dt/2\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right)\exp\left(iL_1 \dt\right)\nonumber \\
1445 &&\exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_2 \dt/2\right)+ \mathcal{O}(\Dt^3)
1446 \label{eq:half_step_NHC_integrator}
1447 \eea
1448 This formalism allows us to easily see the difference between the
1449 different flavors of velocity Verlet integrator.  The leap-frog
1450 integrator can be seen as starting with
1451 Eq.~\ref{eq:half_step_NHC_integrator} just before the $\exp\left(iL_1
1452 \dt\right)$ term, yielding:
1453 \bea
1454 \exp(iL\dt) &=&  \exp\left(iL_1 \dt\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \nonumber \\
1455 &&\exp\left(iL_2 \dt\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) + \mathcal{O}(\Dt^3)
1456 \eea
and then using some algebra tricks to solve for some quantities that are
required before they are actually calculated~\cite{Holian95}.
1459
1460 % }
1461
1462 \subsubsection{Group temperature coupling}\index{temperature-coupling group}%
1463 In {\gromacs} temperature coupling can be performed on groups of
1464 atoms, typically a protein and solvent. The reason such algorithms
1465 were introduced is that energy exchange between different components
1466 is not perfect, due to different effects including cut-offs etc. If
1467 now the whole system is coupled to one heat bath, water (which
1468 experiences the largest cut-off noise) will tend to heat up and the
1469 protein will cool down. Typically 100 K differences can be obtained.
With the use of proper electrostatic methods (PME) these differences
1471 are much smaller but still not negligible.  The parameters for
1472 temperature coupling in groups are given in the {\tt mdp} file.
1473 Recent investigation has shown that small temperature differences
1474 between protein and water may actually be an artifact of the way
1475 temperature is calculated when there are finite timesteps, and very
1476 large differences in temperature are likely a sign of something else
1477 seriously going wrong with the system, and should be investigated
1478 carefully~\cite{Eastwood2010}.
1479
1480 One special case should be mentioned: it is possible to temperature-couple only
1481 part of the system, leaving other parts without temperature
1482 coupling. This is done by specifying ${-1}$ for the time constant
1483 $\tau_T$ for the group that should not be thermostatted.  If only
1484 part of the system is thermostatted, the system will still eventually
1485 converge to an NVT system.  In fact, one suggestion for minimizing
1486 errors in the temperature caused by discretized timesteps is that if
1487 constraints on the water are used, then only the water degrees of
1488 freedom should be thermostatted, not protein degrees of freedom, as
1489 the higher frequency modes in the protein can cause larger deviations
1490 from the ``true'' temperature, the temperature obtained with small
1491 timesteps~\cite{Eastwood2010}.
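
A minimal sketch of what such group-wise settings could look like in
the {\tt mdp} file is given below. The group names {\tt Protein} and
{\tt SOL}, the choice of thermostat and all numerical values are
assumptions made for this example only; the groups must exist in the
system being simulated.
\begin{verbatim}
tcoupl   = v-rescale
tc-grps  = Protein  SOL
tau_t    = 0.1      0.1    ; coupling time constant in ps, one per group
ref_t    = 300      300    ; reference temperature in K, one per group
\end{verbatim}
Following the special case described above, replacing the first
{\tt tau\_t} entry by {\tt -1} would leave the first group without
temperature coupling while the second group remains thermostatted.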
1492
1493 \subsection{Pressure coupling\index{pressure coupling}}
1494 In the same spirit as the temperature coupling, the system can also be
coupled to a ``pressure bath.'' {\gromacs} supports the Berendsen
1496 algorithm~\cite{Berendsen84} that scales coordinates and box vectors
1497 every step, the extended-ensemble Parrinello-Rahman approach~\cite{Parrinello81,Nose83}, and for
1498 the velocity Verlet variants, the Martyna-Tuckerman-Tobias-Klein
1499 (MTTK) implementation of pressure
1500 control~\cite{Martyna1996}. Parrinello-Rahman and Berendsen can be
1501 combined with any of the temperature coupling methods above; MTTK can
1502 only be used with Nos{\'e}-Hoover temperature control.
1503
1504 \subsubsection{Berendsen pressure coupling\pawsindexquiet{Berendsen}{pressure coupling}\index{weak coupling}}
1505 \label{sec:berendsen_pressure_coupling}
1506 The Berendsen algorithm rescales the
1507 coordinates and box vectors every step, or every $n_\mathrm{PC}$ steps,
1508  with a matrix {\boldmath $\mu$},
1509 which has the effect of a first-order kinetic relaxation of the pressure
1510 towards a given reference pressure ${\bf P}_0$ according to
1511 \beq
1512 \frac{\de {\bf P}}{\de t} = \frac{{\bf P}_0-{\bf P}}{\tau_p}.
1513 \eeq
1514 The scaling matrix {\boldmath $\mu$} is given by
1515 \beq
1516 \mu_{ij}
1517 = \delta_{ij} - \frac{n_\mathrm{PC}\Delta t}{3\, \tau_p} \beta_{ij} \{P_{0ij} - P_{ij}(t) \}.
1518 \label{eqn:mu}
1519 \eeq
1520 \index{isothermal compressibility}
1521 \index{compressibility}
1522 Here, {\boldmath $\beta$} is the isothermal compressibility of the system.
1523 In most cases this will be a diagonal matrix, with equal elements on the
1524 diagonal, the value of which is generally not known.
1525 It suffices to take a rough estimate because the value of {\boldmath $\beta$}
1526 only influences the non-critical time constant of the
1527 pressure relaxation without affecting the average pressure itself.
1528 For water at 1 atm and 300 K
1529 $\beta = 4.6 \times 10^{-10}$ Pa$^{-1} = 4.6 \times 10^{-5}$ bar$^{-1}$,
1530 which is $7.6 \times 10^{-4}$ MD units (see \chref{defunits}).
1531 Most other liquids have similar values.
1532 When scaling completely anisotropically, the system has to be rotated in
1533 order to obey \eqnref{box_rot}.
1534 This rotation is approximated in first order in the scaling, which is usually
1535 less than $10^{-4}$. The actual scaling matrix {\boldmath $\mu'$} is
1536 \beq
1537 \mbox{\boldmath $\mu'$} =
1538 \left(\begin{array}{ccc}
1539 \mu_{xx} & \mu_{xy} + \mu_{yx} & \mu_{xz} + \mu_{zx} \\
1540 0        & \mu_{yy}            & \mu_{yz} + \mu_{zy} \\
1541 0        & 0                   & \mu_{zz}
1542 \end{array}\right).
1543 \eeq
1544 The velocities are neither scaled nor rotated.
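
To give a feeling for the magnitudes in \eqnref{mu}, consider an
isotropic case with assumed values $\Delta t = 0.002$ ps,
$n_\mathrm{PC} = 1$, $\tau_p = 1$ ps, the water compressibility
$\beta = 4.6 \times 10^{-5}$ bar$^{-1}$, a reference pressure of
1 bar and an instantaneous pressure of 101 bar:
\beq
\mu = 1 - \frac{0.002}{3 \times 1} \times 4.6 \times 10^{-5}
\times (1 - 101) \approx 1 + 3.1 \times 10^{-6} .
\eeq
The pressure is too high, so the box lengths grow, by about three
parts per million in this step. The compressibility in MD units
quoted above follows from
$1~\mathrm{kJ\,mol^{-1}\,nm^{-3}} \approx 16.6$ bar:
$4.6 \times 10^{-5}~\mathrm{bar}^{-1} \times 16.6 \approx 7.6 \times 10^{-4}$.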
1545
1546 In {\gromacs}, the Berendsen scaling can also be done isotropically,
1547 which means that instead of $\ve{P}$ a diagonal matrix with elements of size
1548 trace$(\ve{P})/3$ is used. For systems with interfaces, semi-isotropic
1549 scaling can be useful.
1550 In this case, the $x/y$-directions are scaled isotropically and the $z$
1551 direction is scaled independently. The compressibility in the $x/y$ or
1552 $z$-direction can be set to zero, to scale only in the other direction(s).
1553
1554 If you allow full anisotropic deformations and use constraints you
1555 might have to scale more slowly or decrease your timestep to avoid
1556 errors from the constraint algorithms.  It is important to note that
1557 although the Berendsen pressure control algorithm yields a simulation
1558 with the correct average pressure, it does not yield the exact NPT
1559 ensemble, and it is not yet clear exactly what errors this approximation
1560 may yield.
1561
1562 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1563 \subsubsection{Parrinello-Rahman pressure coupling\pawsindexquiet{Parrinello-Rahman}{pressure coupling}}
1564
1565 In cases where the fluctuations in pressure or volume are important
1566 {\em per se} ({\eg} to calculate thermodynamic properties), especially
1567 for small systems, it may be a problem that the exact ensemble is not
1568 well defined for the weak-coupling scheme, and that it does not
1569 simulate the true NPT ensemble.
1570
1571 {\gromacs} also supports constant-pressure simulations using the
1572 Parrinello-Rahman approach~\cite{Parrinello81,Nose83}, which is similar
1573 to the Nos{\'e}-Hoover temperature coupling, and in theory gives the
1574 true NPT ensemble.  With the Parrinello-Rahman barostat, the box
1575 vectors as represented by the matrix \ve{b} obey the matrix equation
1576 of motion\footnote{The box matrix representation \ve{b} in {\gromacs}
1577 corresponds to the transpose of the box matrix representation \ve{h}
1578 in the paper by Nos{\'e} and Klein. Because of this, some of our
1579 equations will look slightly different.}
1580 \beq
\frac{\de^2 \ve{b}}{\de t^2}= V \ve{W}^{-1} \ve{b}'^{-1} \left( \ve{P} - \ve{P}_{ref}\right).
1582 \eeq
1583
1584 The volume of the box is denoted $V$, and $\ve{W}$ is a matrix parameter that determines
1585 the strength of the coupling. The matrices \ve{P} and \ve{P}$_{ref}$ are the
1586 current and reference pressures, respectively.
1587
1588 The equations of motion for the particles are also changed, just as
1589 for the Nos{\'e}-Hoover coupling. In most cases you would combine the
1590 Parrinello-Rahman barostat with the Nos{\'e}-Hoover
1591 thermostat, but to keep it simple we only show the Parrinello-Rahman
1592 modification here:
1593
\bea
\frac {\de^2\ve{r}_i}{\de t^2} & = & \frac{\ve{F}_i}{m_i} -
\ve{M} \frac{\de \ve{r}_i}{\de t} , \\
\ve{M} & = & \ve{b}^{-1} \left[
  \ve{b} \frac{\de \ve{b}'}{\de t} + \frac{\de \ve{b}}{\de t} \ve{b}'
  \right] \ve{b}'^{-1}.
\eea
The (inverse) mass parameter matrix
$\ve{W}^{-1}$ determines the strength of the coupling, and how the box
can be deformed.  The box restriction (\ref{eqn:box_rot}) will be
fulfilled automatically if the corresponding elements of $\ve{W}^{-1}$
are zero. Since the coupling strength also depends on the size of your
box, we prefer to calculate it automatically in {\gromacs}.  You only
have to provide the approximate isothermal compressibilities
{\boldmath $\beta$} and the pressure time constant $\tau_p$ in the
input file ($L$ is the largest box matrix element):
\beq
\left( \ve{W}^{-1} \right)_{ij} = \frac{4 \pi^2 \beta_{ij}}{3 \tau_p^2 L}.
\eeq
Just as for the Nos{\'e}-Hoover thermostat, you should realize
1608 that the Parrinello-Rahman time constant is {\em not} equivalent to
1609 the relaxation time used in the Berendsen pressure coupling algorithm.
1610 In most cases you will need to use a 4--5 times larger time constant
1611 with Parrinello-Rahman coupling. If your pressure is very far from
1612 equilibrium, the Parrinello-Rahman coupling may result in very large
1613 box oscillations that could even crash your run.  In that case you
1614 would have to increase the time constant, or (better) use the weak-coupling
1615 scheme to reach the target pressure, and then switch to
1616 Parrinello-Rahman coupling once the system is in equilibrium.
1617 Additionally, using the leap-frog algorithm, the pressure at time $t$
1618 is not available until after the time step has completed, and so the
1619 pressure from the previous step must be used, which makes the algorithm
1620 not directly reversible, and may not be appropriate for high precision
1621 thermodynamic calculations.
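
As a numerical illustration of the automatically calculated coupling
strength (all values are assumed for the example), take
$\beta_{ij} = 4.6 \times 10^{-5}$ bar$^{-1}$, $\tau_p = 2$ ps and a
largest box matrix element $L = 5$ nm:
\beq
\left( \ve{W}^{-1} \right)_{ij}
= \frac{4 \pi^2 \times 4.6 \times 10^{-5}}{3 \times 2^2 \times 5}
\approx 3.0 \times 10^{-5}~\mathrm{bar^{-1}\,ps^{-2}\,nm^{-1}} .
\eeq
Because of the $\tau_p^{-2}$ dependence, doubling $\tau_p$ weakens the
coupling by a factor of four, whereas a larger box weakens it only
linearly in $L$.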
1622
1623 \subsubsection{Surface-tension coupling\pawsindexquiet{surface-tension}{pressure coupling}}
1624 When a periodic system consists of more than one phase, separated by
1625 surfaces which are parallel to the $xy$-plane,
1626 the surface tension and the $z$-component of the pressure can be coupled
1627 to a pressure bath. Presently, this only works with the Berendsen
1628 pressure coupling algorithm in {\gromacs}.
1629 The average surface tension $\gamma(t)$ can be calculated from
1630 the difference between the normal and the lateral pressure
1631 \bea
1632 \gamma(t) & = &
1633 \frac{1}{n} \int_0^{L_z}
1634 \left\{ P_{zz}(z,t) - \frac{P_{xx}(z,t) + P_{yy}(z,t)}{2} \right\} \mbox{d}z \\
1635 & = &
1636 \frac{L_z}{n} \left\{ P_{zz}(t) - \frac{P_{xx}(t) + P_{yy}(t)}{2} \right\},
1637 \eea
1638 where $L_z$ is the height of the box and $n$ is the number of surfaces.
The pressure in the $z$-direction is corrected by scaling the height of
the box with $\mu_{zz}$
1641 \beq
1642 \Delta P_{zz} = \frac{\Delta t}{\tau_p} \{ P_{0zz} - P_{zz}(t) \}
1643 \eeq
1644 \beq
1645 \mu_{zz} = 1 + \beta_{zz} \Delta P_{zz}
1646 \eeq
This is similar to normal pressure coupling (\eqnref{mu}), except that the
factor of $1/3$ is missing.
1649 The pressure correction in the $z$-direction is then used to get the
1650 correct convergence for the surface tension to the reference value $\gamma_0$.
1651 The correction factor for the box length in the $x$/$y$-direction is
1652 \beq
1653 \mu_{x/y} = 1 + \frac{\Delta t}{2\,\tau_p} \beta_{x/y}
1654         \left( \frac{n \gamma_0}{\mu_{zz} L_z}
1655         - \left\{ P_{zz}(t)+\Delta P_{zz} - \frac{P_{xx}(t) + P_{yy}(t)}{2} \right\}
1656         \right)
1657 \eeq
1658 The value of $\beta_{zz}$ is more critical than with normal pressure
1659 coupling. Normally an incorrect compressibility will just scale $\tau_p$,
1660 but with surface tension coupling it affects the convergence of the surface
1661 tension.
When $\beta_{zz}$ is set to zero (constant box height), $\Delta P_{zz}$ is also set
1663 to zero, which is necessary for obtaining the correct surface tension.
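
The constant-height case can be checked directly from the expressions
above; the following is a short consistency argument rather than a
description of the implementation. With $\beta_{zz} = 0$ one has
$\Delta P_{zz} = 0$ and $\mu_{zz} = 1$, so the lateral scaling stops
($\mu_{x/y} = 1$) exactly when
\beq
\frac{n \gamma_0}{L_z} = P_{zz}(t) - \frac{P_{xx}(t) + P_{yy}(t)}{2}
\quad \Longleftrightarrow \quad
\gamma_0 = \frac{L_z}{n} \left\{ P_{zz}(t) - \frac{P_{xx}(t) + P_{yy}(t)}{2} \right\}
= \gamma(t) ,
\eeq
which is the definition of the surface tension given at the start of
this section. The stationary point of the scaling is therefore the
state in which the instantaneous surface tension equals the reference
value $\gamma_0$.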
1664
1665 \subsubsection{MTTK pressure control algorithms}
1666
1667 As mentioned in the previous section, one weakness of leap-frog
1668 integration is in constant pressure simulations, since the pressure
1669 requires a calculation of both the virial and the kinetic energy at
1670 the full time step; for leap-frog, this information is not available
1671 until {\em after} the full timestep.  Velocity Verlet does allow the
1672 calculation, at the cost of an extra round of global communication,
and can compute, modulo any integration errors, the true NPT ensemble.
1674
1675 The full equations, combining both pressure coupling and temperature
1676 coupling, are taken from Martyna {\em et al.}~\cite{Martyna1996} and
1677 Tuckerman~\cite{Tuckerman2006} and are referred to here as MTTK
1678 equations (Martyna-Tuckerman-Tobias-Klein).  We introduce for
1679 convenience $\epsilon = (1/3)\ln (V/V_0)$, where $V_0$ is a reference
volume.  The velocity of $\epsilon$ is $\veps = p_{\epsilon}/W =
\dot{\epsilon} = \dot{V}/3V$, and we define $\alpha = 1 + 3/N_{dof}$ (see
Ref.~\cite{Tuckerman2006}).
1683
1684 The isobaric equations are
1685 \bea
1686 \dot{\rv}_i &=& \frac{\pb_i}{m_i} + \frac{\peps}{W} \rv_i \nonumber \\
1687 \frac{\dot{\pb}_i}{m_i} &=& \frac{1}{m_i}\F_i - \alpha\frac{\peps}{W} \frac{\pb_i}{m_i} \nonumber \\
1688 \dot{\epsilon} &=& \frac{\peps}{W} \nonumber \\
\frac{\dot{\peps}}{W} &=& \frac{3V}{W}(P_{\mathrm{int}} - P) + (\alpha-1)\left(\sum_{i=1}^N\frac{\pb_i^2}{m_i}\right),
1690 \eea
1691 where
1692 \bea
P_{\mathrm{int}} &=& P_{\mathrm{kin}} -P_{\mathrm{vir}} = \frac{1}{3V}\left[\sum_{i=1}^N \left(\frac{\pb_i^2}{2m_i} - \rv_i \cdot \F_i\right)\right].
1695 \eea
1696 The terms including $\alpha$ are required to make phase space
1697 incompressible~\cite{Tuckerman2006}. The $\epsilon$ acceleration term
1698 can be rewritten as
1699 \bea
1700 \frac{\dot{\peps}}{W} &=& \frac{3V}{W}\left(\alpha P_{\mathrm{kin}} - P_{\mathrm{vir}} - P\right)
1701 \eea
1702 In terms of velocities, these equations become
1703 \bea
1704 \dot{\rv}_i &=& \vv_i + \veps \rv_i \nonumber \\
1705 \dot{\vv}_i &=& \frac{1}{m_i}\F_i - \alpha\veps \vv_i \nonumber \\
1706 \dot{\epsilon} &=& \veps \nonumber \\
\dot{\veps} &=& \frac{3V}{W}(P_{\mathrm{int}} - P) + (\alpha-1)\left( \sum_{i=1}^N \frac{1}{2} m_i \vv_i^2\right)\nonumber \\
1708 P_{\mathrm{int}} &=& P_{\mathrm{kin}} - P_{\mathrm{vir}} = \frac{1}{3V}\left[\sum_{i=1}^N \left(\frac{1}{2} m_i\vv_i^2 - \rv_i \cdot \F_i\right)\right]
1709 \eea
1710 For these equations, the conserved quantity is
1711 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) + \frac{p_\epsilon^2}{2W} + PV
1713 \eea
1714 The next step is to add temperature control.  Adding Nos{\'e}-Hoover
1715 chains, including to the barostat degree of freedom, where we use
1716 $\eta$ for the barostat Nos{\'e}-Hoover variables, and $Q^{\prime}$
1717 for the coupling constants of the thermostats of the barostats, we get
1718 \bea
1719 \dot{\rv}_i &=& \frac{\pb_i}{m_i} + \frac{\peps}{W} \rv_i \nonumber \\
1720 \frac{\dot{\pb}_i}{m_i} &=& \frac{1}{m_i}\F_i - \alpha\frac{\peps}{W} \frac{\pb_i}{m_i} - \frac{p_{\xi_1}}{Q_1}\frac{\pb_i}{m_i}\nonumber \\
1721 \dot{\epsilon} &=& \frac{\peps}{W} \nonumber \\
\frac{\dot{\peps}}{W} &=& \frac{3V}{W}(\alpha P_{\mathrm{kin}} - P_{\mathrm{vir}} - P) -\frac{p_{\eta_1}}{Q^{\prime}_1}\frac{\peps}{W} \nonumber \\
1723 \dot{\xi}_k &=& \frac{p_{\xi_k}}{Q_k} \nonumber \\
1724 \dot{\eta}_k &=& \frac{p_{\eta_k}}{Q^{\prime}_k} \nonumber \\
1725 \dot{p}_{\xi_k} &=& G_k - \frac{p_{\xi_{k+1}}}{Q_{k+1}} \;\;\;\; k=1,\ldots, M-1 \nonumber \\
1726 \dot{p}_{\eta_k} &=& G^\prime_k - \frac{p_{\eta_{k+1}}}{Q^\prime_{k+1}} \;\;\;\; k=1,\ldots, M-1 \nonumber \\
1727 \dot{p}_{\xi_M} &=& G_M \nonumber \\
\dot{p}_{\eta_M} &=& G^\prime_M,
1729 \eea
1730 where
1731 \bea
1732 P_{\mathrm{int}} &=& P_{\mathrm{kin}} - P_{\mathrm{vir}} = \frac{1}{3V}\left[\sum_{i=1}^N \left(\frac{\pb_i^2}{2m_i} - \rv_i \cdot \F_i\right)\right] \nonumber \\
1733 G_1  &=& \sum_{i=1}^N \frac{\pb^2_i}{m_i} - N_f kT \nonumber \\
1734 G_k  &=&  \frac{p^2_{\xi_{k-1}}}{2Q_{k-1}} - kT \;\; k = 2,\ldots,M \nonumber \\
1735 G^\prime_1 &=& \frac{\peps^2}{2W} - kT \nonumber \\
1736 G^\prime_k &=& \frac{p^2_{\eta_{k-1}}}{2Q^\prime_{k-1}} - kT \;\; k = 2,\ldots,M
1737 \eea
1738 The conserved quantity is now
1739 \bea
H = \sum_{i=1}^{N} \frac{\pb_i^2}{2m_i} + U\left(\rv_1,\rv_2,\ldots,\rv_N\right) + \frac{p^2_\epsilon}{2W} + PV + \nonumber \\
\sum_{k=1}^M\frac{p^2_{\xi_k}}{2Q_k} +\sum_{k=1}^M\frac{p^2_{\eta_k}}{2Q^{\prime}_k} + N_fkT\xi_1 +  kT\sum_{k=2}^M \xi_k + kT\sum_{k=1}^M \eta_k
1742 \eea
1743 Returning to the Trotter decomposition formalism, for pressure control and temperature control~\cite{Martyna1996} we get:
1744 \bea
1745 iL = iL_1 + iL_2 + iL_{\epsilon,1} + iL_{\epsilon,2} + iL_{\mathrm{NHC-baro}} + iL_{\mathrm{NHC}}
1746 \eea
where ``NHC-baro'' corresponds to the Nos{\'e}-Hoover chain of the barostat,
1748 and NHC corresponds to the NHC of the particles,
1749 \bea
1750 iL_1 &=& \sum_{i=1}^N \left[\frac{\pb_i}{m_i} + \frac{\peps}{W}\rv_i\right]\cdot \frac{\partial}{\partial \rv_i} \\
iL_2 &=& \sum_{i=1}^N \left[\F_i - \alpha \frac{\peps}{W}\pb_i\right] \cdot \frac{\partial}{\partial \pb_i} \\
1752 iL_{\epsilon,1} &=& \frac{p_\epsilon}{W} \frac{\partial}{\partial \epsilon}\\
1753 iL_{\epsilon,2} &=& G_{\epsilon} \frac{\partial}{\partial p_\epsilon}
1754 \eea
1755 and where
1756 \bea
1757 G_{\epsilon} = 3V\left(\alpha P_{\mathrm{kin}} - P_{\mathrm{vir}} - P\right)
1758 \eea
1759 Using the Trotter decomposition, we get
1760 \bea
\exp(iL\dt) &=& \exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right)\exp\left(iL_{\mathrm{NHC}}\dt/2\right) \nonumber \\
&&\exp\left(iL_{\epsilon,2}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
&&\exp\left(iL_{\epsilon,1}\dt\right) \exp\left(iL_1 \dt\right) \nonumber \\
&&\exp\left(iL_2 \dt/2\right) \exp\left(iL_{\epsilon,2}\dt/2\right) \nonumber \\
1765 &&\exp\left(iL_{\mathrm{NHC}}\dt/2\right)\exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right) + \mathcal{O}(\dt^3)
1766 \eea
The action of $\exp\left(iL_1 \dt\right)$ comes from the solution of
the differential equation
$\dot{\rv}_i = \vv_i + \veps \rv_i$
with $\vv_i = \pb_i/m_i$ and $\veps$ constant with initial condition
$\rv_i(0)$, evaluated at $t=\Delta t$.  This yields the evolution
1772 \beq
1773 \rv_i(\dt) = \rv_i(0)e^{\veps \dt} + \Delta t \vv_i(0) e^{\veps \dt/2} \sinhx{\veps \dt/2}.
1774 \eeq
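As a check on this expression (a short sketch, in which the last factor
of the formula above is taken to stand for $\sinh(x)/x$), note that for
constant $\vv_i$ and $\veps$ the equation is linear in $\rv_i$ and can
be integrated directly:
\bea
\rv_i(\dt) &=& \rv_i(0)e^{\veps \dt} + \vv_i(0)\int_0^{\dt} e^{\veps (\dt - s)}\,\mathrm{d}s \nonumber \\
&=& \rv_i(0)e^{\veps \dt} + \vv_i(0)\,\frac{e^{\veps \dt} - 1}{\veps} \nonumber \\
&=& \rv_i(0)e^{\veps \dt} + \dt\, \vv_i(0)\, e^{\veps \dt/2}\, \frac{\sinh(\veps \dt/2)}{\veps \dt/2}, \nonumber
\eea
where the last line uses $e^{x} - 1 = 2 e^{x/2}\sinh(x/2)$. For
$\veps \rightarrow 0$ the factor $\sinh(x)/x$ tends to one and the
ordinary position update $\rv_i(0) + \dt\,\vv_i(0)$ is recovered.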
1775 The action of $\exp\left(iL_2 \dt/2\right)$ comes from the solution
1776 of the differential equation $\dot{\vv}_i = \frac{\F_i}{m_i} -
1777 \alpha\veps\vv_i$, yielding
1778 \beq
1779 \vv_i(\dt/2) = \vv_i(0)e^{-\alpha\veps \dt/2} + \frac{\Delta t}{2m_i}\F_i(0) e^{-\alpha\veps \dt/4}\sinhx{\alpha\veps \dt/4}.
1780 \eeq
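The same steps give a check of the velocity update (again a sketch that
reads the last factor as $\sinh(x)/x$): with $\F_i$ and $\veps$ held
constant over the half step,
\bea
\vv_i(\dt/2) &=& \vv_i(0)e^{-\alpha\veps \dt/2} + \frac{\F_i(0)}{m_i}\,\frac{1 - e^{-\alpha\veps \dt/2}}{\alpha\veps} \nonumber \\
&=& \vv_i(0)e^{-\alpha\veps \dt/2} + \frac{\dt}{2m_i}\F_i(0)\, e^{-\alpha\veps \dt/4}\, \frac{\sinh(\alpha\veps \dt/4)}{\alpha\veps \dt/4}, \nonumber
\eea
using $1 - e^{-x} = 2 e^{-x/2}\sinh(x/2)$. Without box motion
($\veps = 0$) this reduces to the usual velocity Verlet half-step kick
$\vv_i(0) + \frac{\dt}{2m_i}\F_i(0)$.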
1781 {\em md-vv-avek} uses the full step kinetic energies for determining the pressure with the pressure control,
1782 but the half-step-averaged kinetic energy for the temperatures, which can be written as a Trotter decomposition as
1783 \bea
\exp(iL\dt) &=& \exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right) \exp\left(iL_{\epsilon,2}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1785 &&\exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_{\epsilon,1}\dt\right) \exp\left(iL_1 \dt\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \nonumber \\
1786 &&\exp\left(iL_2 \dt/2\right) \exp\left(iL_{\epsilon,2}\dt/2\right) \exp\left(iL_{\mathrm{NHC-baro}}\dt/2\right) + \mathcal{O}(\dt^3)
1787 \eea
1788 With constraints, the equations become significantly more
complicated, in that each of these equations needs to be solved
1790 iteratively for the constraint forces.  The discussion of the details of the iteration
1791 is beyond the scope of this manual; readers are encouraged to see the
1792 implementation described in~\cite{Yu2010}.
1793
1794
1795 \subsubsection{Infrequent evaluation of temperature and pressure coupling}
1796
1797 Temperature and pressure control require global communication to
1798 compute the kinetic energy and virial, which can become costly if
1799 performed every step for large systems.  We can rearrange the Trotter
decomposition to give alternate symplectic, reversible integrators with
the coupling steps every $n$ steps instead of every step.  These new
1802 integrators will diverge if the coupling time step is too large, as
1803 the auxiliary variable integrations will not converge.  However, in
1804 most cases, long coupling times are more appropriate, as they disturb
1805 the dynamics less~\cite{Martyna1996}.
1806
1807 Standard velocity Verlet with Nos{\'e}-Hoover temperature control has a Trotter expansion
1808 \bea
1809 \exp(iL\dt) &\approx& \exp\left(iL_{\mathrm{NHC}}\dt/2\right) \exp\left(iL_2 \dt/2\right) \nonumber \\
1810 &&\exp\left(iL_1 \dt\right) \exp\left(iL_2 \dt/2\right) \exp\left(iL_{\mathrm{NHC}}\dt/2\right).
1811 \eea
1812 If the Nos{\'e}-Hoover chain is sufficiently slow with respect to the motions of the system, we can
1813 write an alternate integrator over $n$ steps for velocity Verlet as
1814 \bea
\exp(iL n\dt) &\approx& \exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right)\left[\exp\left(iL_2 \dt/2\right)\right. \nonumber \\
1816 &&\left.\exp\left(iL_1 \dt\right) \exp\left(iL_2 \dt/2\right)\right]^n \exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right).
1817 \eea
1818 For pressure control, this becomes
1819 \bea
\exp(iL n\dt) &\approx& \exp\left(iL_{\mathrm{NHC-baro}}(n\dt/2)\right)\exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right) \nonumber \\
&&\exp\left(iL_{\epsilon,2}(n\dt/2)\right) \left[\exp\left(iL_2 \dt/2\right)\right. \nonumber \\
&&\exp\left(iL_{\epsilon,1}\dt\right) \exp\left(iL_1 \dt\right) \nonumber \\
&&\left.\exp\left(iL_2 \dt/2\right)\right]^n \exp\left(iL_{\epsilon,2}(n\dt/2)\right) \nonumber \\
1824 &&\exp\left(iL_{\mathrm{NHC}}(n\dt/2)\right)\exp\left(iL_{\mathrm{NHC-baro}}(n\dt/2)\right),
1825 \eea
1826 where the box volume integration occurs every step, but the auxiliary variable
1827 integrations happen every $n$ steps.
1828
1829 % } % Brace matches ifthenelse test for gmxlite
1830
1831
1832 \subsection{The complete update algorithm}
1833 \begin{figure}
1834 \begin{center}
1835 \addtolength{\fboxsep}{0.5cm}
1836 \begin{shadowenv}[12cm]
1837 {\large \bf THE UPDATE ALGORITHM}
1838 \rule{\textwidth}{2pt} \\
1839 Given:\\
1840 Positions $\ve{r}$ of all atoms at time $t$ \\
1841 Velocities $\ve{v}$ of all atoms at time $t-\hDt$ \\
1842 Accelerations $\ve{F}/m$ on all atoms at time $t$.\\
1843 (Forces are computed disregarding any constraints)\\
1844 Total kinetic energy and virial at $t-\Dt$\\
1845 $\Downarrow$ \\
1846 {\bf 1.} Compute the scaling factors $\lambda$ and $\mu$\\
1847 according to \eqnsref{lambda}{mu}\\
1848 $\Downarrow$ \\
1849 {\bf 2.} Update and scale velocities: $\ve{v}' =  \lambda (\ve{v} +
1850 \ve{a} \Delta t)$ \\
1851 $\Downarrow$ \\
1852 {\bf 3.} Compute new unconstrained coordinates: $\ve{r}' = \ve{r} + \ve{v}'
1853 \Delta t$ \\
1854 $\Downarrow$ \\
1855 {\bf 4.} Apply constraint algorithm to coordinates: constrain($\ve{r}^{'} \rightarrow  \ve{r}'';
1856 \,  \ve{r}$) \\
1857 $\Downarrow$ \\
1858 {\bf 5.} Correct velocities for constraints: $\ve{v} = (\ve{r}'' -
1859 \ve{r}) / \Delta t$ \\
1860 $\Downarrow$ \\
1861 {\bf 6.} Scale coordinates and box: $\ve{r} = \mu \ve{r}''; \ve{b} =
1862 \mu  \ve{b}$ \\
1863 \end{shadowenv}
1864 \caption{The MD update algorithm with the leap-frog integrator}
1865 \label{fig:complete-update}
1866 \end{center}
1867 \end{figure}
1868 The complete algorithm for the update of velocities and coordinates is
1869 given using leap-frog in \figref{complete-update}. The SHAKE algorithm of step
1870 4 is explained below.
1871
1872 {\gromacs} has a provision to ``freeze''  (prevent motion of) selected
1873 particles\index{frozen atoms}, which must be defined as a ``\swapindex{freeze}{group}.'' This is implemented
1874 using a {\em freeze factor $\ve{f}_g$}, which is a vector, and differs for each
1875 freeze group (see \secref{groupconcept}). This vector contains only
1876 zero (freeze) or one (don't freeze).
When we take this freeze factor and the external acceleration $\ve{a}_h$ into
account, the update algorithm for the velocities becomes
1879 \beq
1880 \ve{v}(t+\hdt)~=~\ve{f}_g * \lambda * \left[ \ve{v}(t-\hdt) +\frac{\ve{F}(t)}{m}\Delta t + \ve{a}_h \Delta t \right],
1881 \eeq
1882 where $g$ and $h$ are group indices which differ per atom.
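
As a small illustration (the particular vector is chosen here only as an
example, and $*$ is read as a component-wise product), a freeze group with
$\ve{f}_g = (1,1,0)$ gives
\beq
v_z(t+\hdt) ~=~ 0 * \lambda * \left[ v_z(t-\hdt) +\frac{F_z(t)}{m}\Delta t + a_{h,z} \Delta t \right] ~=~ 0, \nonumber
\eeq
while the $x$ and $y$ components are updated as usual. Atoms in this group
therefore keep their $z$ coordinate and remain free to move in the
$xy$ plane; $\ve{f}_g = (0,0,0)$ freezes them completely.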
1883
1884 \subsection{Output step}
1885 The most important output of the MD run is the {\em
1886 \swapindex{trajectory}{file}}, which contains particle coordinates
1887 and (optionally) velocities at regular intervals.
1888 The trajectory file contains frames that could include positions,
1889 velocities and/or forces, as well as information about the dimensions
1890 of the simulation volume, integration step, integration time, etc. The
1891 interpretation of the time varies with the integrator chosen, as
1892 described above. For velocity-Verlet integrators, velocities labeled
1893 at time $t$ are for that time. For other integrators (e.g. leap-frog,
1894 stochastic dynamics), the velocities labeled at time $t$ are for time
1895 $t - \hDt$.
1896
1897 Since the trajectory
1898 files are lengthy, one should not save every step! To retain all
1899 information it suffices to write a frame every 15 steps, since at
1900 least 30 steps are made per period of the highest frequency in the
1901 system, and Shannon's \normindex{sampling} theorem states that two samples per
1902 period of the highest frequency in a band-limited signal contain all
1903 available information. But that still gives very long files! So, if
1904 the highest frequencies are not of interest, 10 or 20 samples per ps
1905 may suffice. Be aware of the distortion of high-frequency motions by
1906 the {\em stroboscopic effect}, called {\em aliasing}: higher frequencies
1907 are  mirrored with respect to the sampling frequency and appear as
1908 lower frequencies.
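
To put numbers on this (the symbols $T_{\min}$, $\tau$, $f_s$ and
$f_{\mathrm{alias}}$ are introduced here for the example): if the fastest
motion has a period $T_{\min} = 30\,\Dt$, two samples per period
correspond to an output interval of $T_{\min}/2 = 15\,\Dt$, which is the
figure quoted above. Conversely, writing a frame every $\tau$ gives a
sampling frequency $f_s = 1/\tau$, and only motions with a frequency below
$f_s/2$ are represented faithfully. A motion with a frequency $f$ between
$f_s/2$ and $f_s$ shows up at the mirrored frequency
\beq
f_{\mathrm{alias}} = f_s - f. \nonumber
\eeq
For example, with 10 frames per ps ($f_s = 10$~ps$^{-1}$) a vibration at
$f = 8$~ps$^{-1}$ appears in the trajectory as a motion at 2~ps$^{-1}$.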
1909
1910 {\gromacs} can also write reduced-precision coordinates for a subset of
1911 the simulation system to a special compressed trajectory file
1912 format. All the other tools can read and write this format. See
1913 \secref{mdpopt} for details on how to set up your {\tt .mdp} file
1914 to have {\tt mdrun} use this feature.
1915
1916 % \ifthenelse{\equal{\gmxlite}{1}}{}{
1917 \section{Shell molecular dynamics}
1918 {\gromacs} can simulate \normindex{polarizability} using the
1919 \normindex{shell model} of Dick and Overhauser~\cite{Dick58}. In such models
1920 a shell particle representing the electronic degrees of freedom is
1921 attached to a nucleus by a spring. The potential energy is minimized with
1922 respect to the shell position  at every step of the simulation (see below).
1923 Successful applications of shell models in {\gromacs} have been published
for N$_2$~\cite{Jordan95} and water~\cite{Maaren2001a}.
1925
1926 \subsection{Optimization of the shell positions}
1927 The force \ve{F}$_S$ on a shell particle $S$ can be decomposed into two
1928 components
1929 \begin{equation}
1930 \ve{F}_S ~=~ \ve{F}_{bond} + \ve{F}_{nb}
1931 \end{equation}
1932 where \ve{F}$_{bond}$ denotes the component representing the
1933 polarization energy, usually represented by a harmonic potential and
1934 \ve{F}$_{nb}$ is the sum of Coulomb and van der Waals interactions. If we
1935 assume that \ve{F}$_{nb}$ is almost constant we can analytically derive the
1936 optimal position of the shell, i.e. where \ve{F}$_S$ = 0. If we have the
1937 shell S connected to atom A we have
\begin{equation}
\ve{F}_{bond} ~=~ -k_b \left( \ve{x}_S - \ve{x}_A\right).
\end{equation}
In an iterative solver, we have positions \ve{x}$_S(n)$ where $n$ is
the iteration count. We now have at iteration $n$
\begin{equation}
\ve{F}_{nb} ~=~ \ve{F}_S + k_b \left( \ve{x}_S(n) - \ve{x}_A\right)
\end{equation}
and the optimal position for the shells $\ve{x}_S(n+1)$ thus follows from
\begin{equation}
\ve{F}_S + k_b \left( \ve{x}_S(n) - \ve{x}_A\right) - k_b \left( \ve{x}_S(n+1) - \ve{x}_A\right) = 0
\end{equation}
If we write
1951 \begin{equation}
1952 \Delta \ve{x}_S = \ve{x}_S(n+1) - \ve{x}_S(n)
1953 \end{equation}
1954 we finally obtain
1955 \begin{equation}
1956 \Delta \ve{x}_S = \ve{F}_S/k_b
1957 \end{equation}
1958 which then yields the algorithm to compute the next trial in the optimization
1959 of shell positions
1960 \begin{equation}
1961 \ve{x}_S(n+1) ~=~ \ve{x}_S(n) + \ve{F}_S/k_b.
1962 \end{equation}
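A short check of this update, as a sketch under the assumptions made
above (a restoring harmonic spring and an $\ve{F}_{nb}$ that does not
change when the shell moves; $\ve{F}_S(n)$ is shorthand introduced here for
the total shell force at iteration $n$):
\bea
\ve{F}_S(n+1) &=& \ve{F}_{nb} - k_b\left(\ve{x}_S(n+1) - \ve{x}_A\right) \nonumber \\
&=& \ve{F}_{nb} - k_b\left(\ve{x}_S(n) - \ve{x}_A\right) - \ve{F}_S(n) ~=~ 0. \nonumber
\eea
For a strictly constant $\ve{F}_{nb}$ a single step would therefore reach
the optimum. Since $\ve{F}_{nb}$ does depend, if only weakly, on the shell
position, the force after the step is small rather than zero, and the
step is repeated.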
1963 % } % Brace matches ifthenelse test for gmxlite
1964
1965 \section{Constraint algorithms\index{constraint algorithms}}
1966 Constraints can be imposed in {\gromacs} using LINCS (default) or
1967 the traditional SHAKE method.
1968
1969 \subsection{\normindex{SHAKE}}
1970 \label{subsec:SHAKE}
1971 The SHAKE~\cite{Ryckaert77} algorithm changes a set of unconstrained
1972 coordinates $\ve{r}^{'}$ to a set of coordinates $\ve{r}''$ that
fulfill a list of distance constraints, using a reference set $\ve{r}$,
as
1975 \beq
1976 {\rm SHAKE}(\ve{r}^{'} \rightarrow \ve{r}'';\, \ve{r})
1977 \eeq
This action is consistent with solving for a set of Lagrange multipliers
1979 in the constrained equations of motion. SHAKE needs a {\em relative tolerance};
1980 it will continue until all constraints are satisfied within
1981 that relative tolerance. An error message is
1982 given if SHAKE cannot reset the coordinates because the deviation is
1983 too large, or if a given number of iterations is surpassed.
1984
1985 Assume the equations of motion must fulfill $K$ holonomic constraints,
1986 expressed as
1987 \beq
1988 \sigma_k(\ve{r}_1 \ldots \ve{r}_N) = 0; \;\; k=1 \ldots K.
1989 \eeq
1990 For example, $(\ve{r}_1 - \ve{r}_2)^2 - b^2 = 0$.
1991 Then the forces are defined as
1992 \beq
1993 - \frac{\partial}{\partial \ve{r}_i} \left( V + \sum_{k=1}^K \lambda_k
1994 \sigma_k \right),
1995 \eeq
1996 where $\lambda_k$ are Lagrange multipliers which must be solved to
1997 fulfill the constraint equations. The second part of this sum
1998 determines the {\em constraint forces} $\ve{G}_i$, defined by
1999 \beq
2000 \ve{G}_i = -\sum_{k=1}^K \lambda_k \frac{\partial \sigma_k}{\partial
2001 \ve{r}_i}
2002 \eeq
2003 The displacement due to the constraint forces in the leap-frog or
Verlet algorithm is equal to $(\ve{G}_i/m_i)(\Dt)^2$. Solving for the
Lagrange multipliers (and hence the displacements) requires the
2006 solution of a set of coupled equations of the second degree. These are
2007 solved iteratively by SHAKE.
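
For a single bond between atoms 1 and 2 these equations can be written
out explicitly. The following is a sketch that uses only the definitions
above, with the shorthand (introduced here) $\ve{r}_{12} = \ve{r}_1 -
\ve{r}_2$ for the reference bond vector, $\ve{r}'_{12}$ for the
unconstrained one and $\ve{r}''_{12}$ for the constrained one. With
$\sigma = \ve{r}_{12}^2 - b^2$ the constraint forces are
\beq
\ve{G}_1 = -2\lambda\,\ve{r}_{12}, ~~~~ \ve{G}_2 = +2\lambda\,\ve{r}_{12}, \nonumber
\eeq
equal and opposite, and directed along the reference bond. Adding the
displacements $(\ve{G}_i/m_i)(\Dt)^2$ to the unconstrained positions and
demanding $|\ve{r}''_{12}| = b$ gives
\beq
\left| \ve{r}'_{12} - g\,\ve{r}_{12} \right|^2 = b^2, ~~~~
g = 2\lambda (\Dt)^2 \left(\frac{1}{m_1}+\frac{1}{m_2}\right), \nonumber
\eeq
which is of second degree in $g$, and hence in $\lambda$. Neglecting the
$g^2$ term yields the first-order estimate
\beq
g \approx \frac{(\ve{r}'_{12})^2 - b^2}{2\,\ve{r}'_{12}\cdot\ve{r}_{12}}, \nonumber
\eeq
which can be applied repeatedly, one constraint after the other, until
the tolerance is met. This linearized update is the usual textbook form
of a SHAKE iteration; it is shown here as an illustration and not as a
description of the actual implementation.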
2008 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2009 \label{subsec:SETTLE}
For the special case of rigid water molecules, which often make up more
than 80\% of the simulation system, we have implemented the
2012 \normindex{SETTLE}
2013 algorithm~\cite{Miyamoto92} (\secref{constraints}).
2014
2015 For velocity Verlet, an additional round of constraining must be
2016 done, to constrain the velocities of the second velocity half step,
2017 removing any component of the velocity parallel to the bond vector.
2018 This step is called RATTLE, and is covered in more detail in the
2019 original Andersen paper~\cite{Andersen1983a}.
2020
2021 % } % Brace matches ifthenelse test for gmxlite
2022
2023
2024
2025
2026 \newcommand{\fs}[1]{\begin{equation} \label{eqn:#1}}
2027 \newcommand{\fe}{\end{equation}}
2028 \newcommand{\p}{\partial}
2029 \newcommand{\Bm}{\ve{B}}
2030 \newcommand{\M}{\ve{M}}
2031 \newcommand{\iM}{\M^{-1}}
2032 \newcommand{\Tm}{\ve{T}}
2033 \newcommand{\Sm}{\ve{S}}
2034 \newcommand{\fo}{\ve{f}}
2035 \newcommand{\con}{\ve{g}}
2036 \newcommand{\lenc}{\ve{d}}
2037
2038 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2039 \subsection{\normindex{LINCS}}
2040 \label{subsec:lincs}
2041
2042 \subsubsection{The LINCS algorithm}
2043 LINCS is an algorithm that resets bonds to their correct lengths
2044 after an unconstrained update~\cite{Hess97}.
2045 The method is non-iterative, as it always uses two steps.
2046 Although LINCS is based on matrices, no matrix-matrix multiplications are
2047 needed. The method is more stable and faster than SHAKE,
2048 but it can only be used with bond constraints and
2049 isolated angle constraints, such as the proton angle in OH.
2050 Because of its stability, LINCS is especially useful for Brownian dynamics.
2051 LINCS has two parameters, which are explained in the subsection parameters.
2052 The parallel version of LINCS, P-LINCS, is described
2053 in subsection \ssecref{plincs}.
2054
2055 \subsubsection{The LINCS formulas}
2056 We consider a system of $N$ particles, with positions given by a
2057 $3N$ vector $\ve{r}(t)$.
2058 For molecular dynamics the equations of motion are given by Newton's Law
2059 \fs{c1}
2060 {\de^2 \ve{r} \over \de t^2} = \iM \ve{F},
2061 \fe
2062 where $\ve{F}$ is the $3N$ force vector
2063 and $\M$ is a $3N \times 3N$ diagonal matrix,
2064 containing the masses of the particles.
2065 The system is constrained by $K$ time-independent constraint equations
2066 \fs{c2}
2067 g_i(\ve{r}) = | \ve{r}_{i_1}-\ve{r}_{i_2} | - d_i = 0 ~~~~~~i=1,\ldots,K.
2068 \fe
2069
2070 In a numerical integration scheme, LINCS is applied after an
2071 unconstrained update, just like SHAKE. The algorithm works in two
2072 steps (see figure \figref{lincs}). In the first step, the projections
2073 of the new bonds on the old bonds are set to zero. In the second step,
2074 a correction is applied for the lengthening of the bonds due to
2075 rotation. The numerics for the first step and the second step are very
2076 similar. A complete derivation of the algorithm can be found in
2077 \cite{Hess97}. Only a short description of the first step is given
2078 here.
2079
2080 \begin{figure}
2081 \centerline{\includegraphics[height=50mm]{plots/lincs}}
2082 \caption[The three position updates needed for one time step.]{The
2083 three position updates needed for one time step. The dashed line is
2084 the old bond of length $d$, the solid lines are the new bonds. $l=d
2085 \cos \theta$ and $p=(2 d^2 - l^2)^{1 \over 2}$.}
2086 \label{fig:lincs}
2087 \end{figure}
2088
2089 A new notation is introduced for the gradient matrix of the constraint
2090 equations which appears on the right hand side of this equation:
2091 \fs{c3}
2092 B_{hi} = {\p g_h \over \p r_i}
2093 \fe
Notice that $\Bm$ is a $K \times 3N$ matrix; it contains the directions
2095 of the constraints.
2096 The following equation shows how the new constrained coordinates
2097 $\ve{r}_{n+1}$ are related to the unconstrained coordinates
2098 $\ve{r}_{n+1}^{unc}$ by
2099 \fs{m0}
2100 \begin{array}{c}
2101   \ve{r}_{n+1}=(\ve{I}-\Tm_n \ve{B}_n) \ve{r}_{n+1}^{unc} + \Tm_n \lenc=
2102   \\[2mm]
2103   \ve{r}_{n+1}^{unc} -
\iM \Bm_n^T (\Bm_n \iM \Bm_n^T)^{-1} (\Bm_n \ve{r}_{n+1}^{unc} - \lenc)
2105 \end{array}
2106 \fe
2107 where $\Tm = \iM \Bm^T (\Bm \iM \Bm^T)^{-1}$.
2108 The derivation of this equation from \eqnsref{c1}{c2} can be found
2109 in \cite{Hess97}.
2110
2111 This first step does not set the real bond lengths to the prescribed lengths,
2112 but the projection of the new bonds onto the old directions of the bonds.
2113 To correct for the rotation of bond $i$, the projection of the
2114 bond, $p_i$, on the old direction is set to
2115 \fs{m1a}
2116 p_i=\sqrt{2 d_i^2 - l_i^2},
2117 \fe
2118 where $l_i$ is the bond length after the first projection.
2119 The corrected positions are
2120 \fs{m1b}
2121 \ve{r}_{n+1}^*=(\ve{I}-\Tm_n \Bm_n)\ve{r}_{n+1} + \Tm_n \ve{p}.
2122 \fe
2123 This correction for rotational effects is actually an iterative process,
2124 but during MD only one iteration is applied.
2125 The relative constraint deviation after this procedure will be less than
2126 0.0001 for every constraint.
2127 In energy minimization, this might not be accurate enough, so the number
2128 of iterations is equal to the order of the expansion (see below).
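
The choice of $p_i$ in \eqnref{m1a} can be motivated with Pythagoras'
theorem. This is a sketch in which $q_i$, the component of the bond
perpendicular to its old direction, is a symbol introduced here. After the
first projection the bond has a component $d_i$ along the old direction,
so that $l_i^2 = d_i^2 + q_i^2$. If the correction changes only the
parallel component, to $p_i$, the new bond length is
\beq
\sqrt{p_i^2 + q_i^2} = \sqrt{\left(2 d_i^2 - l_i^2\right) + \left(l_i^2 - d_i^2\right)} = d_i, \nonumber
\eeq
which is the prescribed length. To the extent that the perpendicular
component itself changes during the correction, the result is not exact,
which is consistent with the remark that the correction is actually an
iterative process.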
2129
2130 Half of the CPU time goes to inverting the constraint coupling
2131 matrix $\Bm_n \iM \Bm_n^T$, which has to be done every time step.
2132 This $K \times K$ matrix
2133 has $1/m_{i_1} + 1/m_{i_2}$ on the diagonal.
The off-diagonal elements are only non-zero when two bonds are connected;
in that case the element is
2136 $\cos \phi /m_c$,  where $m_c$ is
2137 the mass of the atom connecting the
2138 two bonds and $\phi$ is the angle between the bonds.
2139
The coupling matrix $\Bm_n \iM \Bm_n^T$ is inverted through a power expansion.
2141 A $K \times K$ matrix $\ve{S}$ is
2142 introduced which is the inverse square root of
2143 the diagonal of $\Bm_n \iM \Bm_n^T$.
2144 This matrix is used to convert the diagonal elements
2145 of the coupling matrix to one:
2146 \fs{m2}
2147 \begin{array}{c}
2148 (\Bm_n \iM \Bm_n^T)^{-1}
2149 = \Sm \Sm^{-1} (\Bm_n \iM \Bm_n^T)^{-1} \Sm^{-1} \Sm  \\[2mm]
2150 = \Sm (\Sm \Bm_n \iM \Bm_n^T \Sm)^{-1} \Sm =
2151   \Sm (\ve{I} - \ve{A}_n)^{-1} \Sm
2152 \end{array}
2153 \fe
2154 The matrix $\ve{A}_n$ is symmetric and sparse and has zeros on the diagonal.
2155 Thus a simple trick can be used to calculate the inverse:
2156 \fs{m3}
2157 (\ve{I}-\ve{A}_n)^{-1}=
2158         \ve{I} + \ve{A}_n + \ve{A}_n^2 + \ve{A}_n^3 + \ldots
2159 \fe
2160
2161 This inversion method is only valid if the absolute values of all the
2162 eigenvalues of $\ve{A}_n$ are smaller than one.
2163 In molecules with only bond constraints, the connectivity is so low
2164 that this will always be true, even if ring structures are present.
2165 Problems can arise in angle-constrained molecules.
2166 By constraining angles with additional distance constraints,
2167 multiple small ring structures are introduced.
2168 This gives a high connectivity, leading to large eigenvalues.
2169 Therefore LINCS should NOT be used with coupled angle-constraints.
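
A minimal example shows how the expansion behaves; the symbols $a$ and $m$
are introduced here for illustration only. For two constraints that share
one atom, with all three masses equal to $m$, the coupling matrix has
$2/m$ on the diagonal and off-diagonal elements of magnitude
$|\cos \phi|/m$, so that $\Sm = \sqrt{m/2}\,\ve{I}$ and
\beq
\ve{A}_n = \left(\begin{array}{cc} 0 & a \\ a & 0 \end{array}\right), ~~~~
|a| = \frac{|\cos \phi|}{2} \leq \frac{1}{2}. \nonumber
\eeq
The eigenvalues are $\pm a$, so the series converges, and it sums to
\beq
(\ve{I}-\ve{A}_n)^{-1} = \frac{1}{1-a^2}
\left(\begin{array}{cc} 1 & a \\ a & 1 \end{array}\right). \nonumber
\eeq
Truncating the series after $\ve{A}_n^N$ leaves an error of order
$|a|^{N+1}$. For a tetrahedral angle, $|\cos \phi| = 1/3$ and $|a| = 1/6$,
so a fourth order expansion leaves an error of order
$(1/6)^5 \approx 1.3 \times 10^{-4}$ in this two-constraint case.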
2170
For molecules with all bonds constrained the eigenvalues of $\ve{A}$
2172 are around 0.4. This means that with each additional order
2173 in the expansion \eqnref{m3} the deviations decrease by a factor 0.4.
2174 But for relatively isolated triangles of constraints the largest
2175 eigenvalue is around 0.7.
2176 Such triangles can occur when removing hydrogen angle vibrations
2177 with an additional angle constraint in alcohol groups
2178 or when constraining water molecules with LINCS, for instance
2179 with flexible constraints.
The constraints in such triangles converge twice as slowly as
2181 the other constraints. Therefore, starting with {\gromacs} 4,
2182 additional terms are added to the expansion for such triangles
2183 \fs{m3_ang}
2184 (\ve{I}-\ve{A}_n)^{-1} \approx
2185         \ve{I} + \ve{A}_n + \ldots + \ve{A}_n^{N_i} +
2186         \left(\ve{A}^*_n + \ldots + {\ve{A}_n^*}^{N_i} \right) \ve{A}_n^{N_i}
2187 \fe
2188 where $N_i$ is the normal order of the expansion and
2189 $\ve{A}^*$ only contains the elements of $\ve{A}$ that couple
2190 constraints within rigid triangles, all other elements are zero.
2191 In this manner, the accuracy of angle constraints comes close
2192 to that of the other constraints, while the series of matrix vector
2193 multiplications required for determining the expansion
2194 only needs to be extended for a few constraint couplings.
This procedure is described in the P-LINCS paper~\cite{Hess2008a}.
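
The eigenvalues quoted above allow a rough estimate of what the extra
terms buy, if the deviation after an expansion of order $N$ is taken to
scale as the $N$-th power of the largest eigenvalue (a rule of thumb, not
a bound). Since $0.7^2 = 0.49$, two orders are needed in a triangle for
the reduction that one order gives elsewhere, and
\beq
0.4^4 \approx 0.026, ~~~~ 0.7^4 \approx 0.24, ~~~~ 0.7^8 \approx 0.058. \nonumber
\eeq
With $N_i = 4$ the triangle constraints would thus be left with a
deviation almost ten times larger than the others, whereas the additional
$N_i$ terms for the triangle couplings bring them to within roughly a
factor of two.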
2196
2197 \subsubsection{The LINCS Parameters}
2198 The accuracy of LINCS depends on the number of matrices used
2199 in the expansion \eqnref{m3}. For MD calculations a fourth order
2200 expansion is enough. For Brownian dynamics with
2201 large time steps an eighth order expansion may be necessary.
2202 The order is a parameter in the {\tt *.mdp} file.
2203 The implementation of LINCS is done in such a way that the
algorithm will never crash. Even when it is impossible
to reset the constraints, LINCS will generate a conformation
2206 which fulfills the constraints as well as possible.
2207 However, LINCS will generate a warning when in one step a bond
2208 rotates over more than a predefined angle.
2209 This angle is set by the user in the {\tt *.mdp} file.
2210
2211 % } % Brace matches ifthenelse test for gmxlite
2212
2213
2214 \section{Simulated Annealing}
2215 \label{sec:SA}
The well-known \swapindex{simulated}{annealing}
2217 (SA) protocol is supported in {\gromacs}, and you can even couple multiple
2218 groups of atoms separately with an arbitrary number of reference temperatures
2219 that change during the simulation. The annealing is implemented by simply
2220 changing the current reference temperature for each group in the temperature
coupling, so the actual relaxation and coupling properties depend on the
2222 type of thermostat you use and how hard you are coupling it. Since we are
2223 changing the reference temperature it is important to remember that the system
2224 will NOT instantaneously reach this value - you need to allow for the inherent
2225 relaxation time in the coupling algorithm too. If you are changing the
2226 annealing reference temperature faster than the temperature relaxation you
2227 will probably end up with a crash when the difference becomes too large.
2228
2229 The annealing protocol is specified as a series of corresponding times and
2230 reference temperatures for each group, and you can also choose whether you only
2231 want a single sequence (after which the temperature will be coupled to the
2232 last reference value), or if the annealing should be periodic and restart at
2233 the first reference point once the sequence is completed. You can mix and
2234 match both types of annealing and non-annealed groups in your simulation.
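
As an illustration of how such a protocol translates into a reference
temperature, suppose that the reference temperature is interpolated
linearly between consecutive annealing points $(t_k, T_k)$; both the
linear interpolation and the symbols are assumptions of this sketch, as
the text above does not specify them. Then, for $t_k \leq t \leq t_{k+1}$,
\beq
T_{\mathrm{ref}}(t) = T_k + \left(T_{k+1} - T_k\right)\frac{t - t_k}{t_{k+1} - t_k}. \nonumber
\eeq
For example, with the points (0~ps, 298~K) and (3~ps, 280~K) the reference
temperature at $t = 1$~ps would be $298 - 18 \times \frac{1}{3} = 292$~K.
For a single sequence $T_{\mathrm{ref}}$ stays at the last value once the
final point has been passed; in the periodic case the same formula applies
with $t$ counted from the most recent restart of the sequence.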
2235
2236 \newcommand{\vrond}{\stackrel{\circ}{\ve{r}}}
2237 \newcommand{\rond}{\stackrel{\circ}{r}}
2238 \newcommand{\ruis}{\ve{r}^G}
2239
2240 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2241 \section{Stochastic Dynamics\swapindexquiet{stochastic}{dynamics}}
2242 \label{sec:SD}
2243 Stochastic or velocity \swapindex{Langevin}{dynamics} adds a friction
2244 and a noise term to Newton's equations of motion, as
2245 \beq
2246 \label{SDeq}
2247 m_i {\de^2 \ve{r}_i \over \de t^2} =
2248 - m_i \gamma_i {\de \ve{r}_i \over \de t} + \ve{F}_i(\ve{r}) + \vrond_i,
2249 \eeq
2250 where $\gamma_i$ is the friction constant $[1/\mbox{ps}]$ and
2251 $\vrond_i\!\!(t)$  is a noise process with
2252 $\langle \rond_i\!\!(t) \rond_j\!\!(t+s) \rangle =
2253     2 m_i \gamma_i k_B T \delta(s) \delta_{ij}$.
2254 When $1/\gamma_i$ is large compared to the time scales present in the system,
2255 one could see stochastic dynamics as molecular dynamics with stochastic
2256 temperature-coupling. The advantage compared to MD with Berendsen
2257 temperature-coupling is that in case of SD the generated ensemble is known.
2258 For simulating a system in vacuum there is the additional advantage that there is no
2259 accumulation of errors for the overall translational and rotational
2260 degrees of freedom.
2261 When $1/\gamma_i$ is small compared to the time scales present in the system,
2262 the dynamics will be completely different from MD, but the sampling is
2263 still correct.
2264
2265 In {\gromacs} there are two algorithms to integrate equation (\ref{SDeq}):
2266 a simple and efficient one
2267 and a more complex leap-frog algorithm~\cite{Gunsteren88}.
2268 The accuracy of both integrators is equivalent to the normal MD leap-frog and
2269 velocity-Verlet integrator, except with constraints where the simple
2270 SD integrator is significantly less accurate. There is a proper way
2271 of applying constraints with the simple integrator, but that requires
2272 a second constraining step~\cite{Goga2012}, which diminishes the gain.
2273 The simple integrator is:
2274 \bea
2275 \label{eqn:sd_int1}
2276 \ve{v}(t+\hDt)  &~=~&   \alpha \, \ve{v}(t-\hDt) + \frac{1 - \alpha}{m \gamma}\ve{F}(t) + \sqrt{\frac{k_B T}{m}(1 - \alpha^2)} \, \ruis_i \\
2277 \ve{r}(t+\Dt)   &~=~&   \ve{r}(t)+\Dt \, \ve{v}(t+\hDt) \\
\alpha &~=~& \left(1 - \gamma \Dt \right)
\eea
where $\ruis_i$ is Gaussian distributed noise with $\mu = 0$, $\sigma = 1$.
With constraints you should only consider using the simple integrator when $\gamma \Dt \ll 0.01$.
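
The amplitude of the noise term can be checked for a free particle
($\ve{F} = 0$). This is a sketch for one Cartesian velocity component $v$,
assuming that the noise is uncorrelated with the velocity of the previous
step. Squaring the first update equation and averaging gives
\beq
\langle v^2(t+\hDt) \rangle = \alpha^2 \langle v^2(t-\hDt) \rangle + \frac{k_B T}{m}\left(1 - \alpha^2\right), \nonumber
\eeq
whose stationary solution is $\langle v^2 \rangle = k_B T/m$ for any
$|\alpha| < 1$. The velocities therefore satisfy equipartition at the
reference temperature, independent of the strength of the friction.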
2282
2283 In the complex algorithm four Gaussian random numbers are required
2284 per integration step per degree of freedom, and with constraints the
2285 coordinates need to be constrained twice per integration step.
2286 Depending on the computational cost of the force calculation,
2287 this can take a significant part of the simulation time.
2288 Exact continuation of a stochastic dynamics simulation is not possible,
2289 because the state of the random number generator is not stored.
2290 When using SD as a thermostat, an appropriate value for $\gamma$ is 0.5 ps$^{-1}$,
2291 since this results in a friction that is lower than the internal friction
2292 of water, while it is high enough to remove excess heat
2293 (unless plain cut-off or reaction-field electrostatics is used).
2294 With this value of $\gamma$ the efficient algorithm will usually be accurate
2295 enough.
2296
2297 \section{Brownian Dynamics\swapindexquiet{Brownian}{dynamics}}
2298 \label{sec:BD}
2299 In the limit of high friction, stochastic dynamics reduces to
2300 Brownian dynamics, also called position Langevin dynamics.
2301 This applies to over-damped systems,
2302 {\ie} systems in which the inertia effects are negligible.
2303 The equation is
2304 \beq
2305 {\de \ve{r}_i \over \de t} = \frac{1}{\gamma_i} \ve{F}_i(\ve{r}) + \vrond_i
2306 \eeq
2307 where $\gamma_i$ is the friction coefficient $[\mbox{amu/ps}]$ and
2308 $\vrond_i\!\!(t)$  is a noise process with
2309 $\langle \rond_i\!\!(t) \rond_j\!\!(t+s) \rangle =
2310     2 \delta(s) \delta_{ij} k_B T / \gamma_i$.
2311 In {\gromacs} the equations are integrated with a simple, explicit scheme
2312 \beq
2313 \ve{r}_i(t+\Delta t) = \ve{r}_i(t) +
2314         {\Delta t \over \gamma_i} \ve{F}_i(\ve{r}(t))
2315         + \sqrt{2 k_B T {\Delta t \over \gamma_i}}\, \ruis_i,
2316 \eeq
2317 where $\ruis_i$ is Gaussian distributed noise with $\mu = 0$, $\sigma = 1$.
2318 The friction coefficients $\gamma_i$ can be chosen the same for all
particles or as $\gamma_i = m_i\,\gamma_i$, where the friction constants
$\gamma_i$ on the right-hand side (in units of $1/\mbox{ps}$, as in stochastic dynamics) can be different for different groups of atoms.
2321 Because the system is assumed to be over-damped, large timesteps
2322 can be used. LINCS should be used for the constraints since SHAKE
2323 will not converge for large atomic displacements.
2324 BD is an option of the {\tt mdrun} program.
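
As an illustration only, the explicit update rule above can be sketched in a few lines of Python for a single coordinate. This is not the {\gromacs} implementation: the function names {\tt bd_step} and {\tt harmonic_force}, the harmonic test force and all parameter values are assumptions made for the example.

```python
import math
import random


def bd_step(r, force, gamma, dt, kT, rng):
    """One explicit Brownian dynamics step for a single coordinate.

    r      position
    force  force on the coordinate at position r
    gamma  friction coefficient (mass/time)
    dt     time step
    kT     thermal energy k_B*T
    rng    random.Random instance supplying unit-variance Gaussian noise
    """
    drift = dt / gamma * force
    noise = math.sqrt(2.0 * kT * dt / gamma) * rng.gauss(0.0, 1.0)
    return r + drift + noise


def harmonic_force(r, k=100.0):
    """Force of a harmonic well V = k*r^2/2 (hypothetical test potential)."""
    return -k * r


# Short over-damped trajectory in the harmonic well.
rng = random.Random(1)
r = 1.0
for _ in range(1000):
    r = bd_step(r, harmonic_force(r), gamma=50.0, dt=0.01, kT=2.5, rng=rng)
```

With {\tt kT} set to zero the noise term vanishes and the step reduces to a plain drift along the force, which is a convenient sanity check.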
2325 % } % Brace matches ifthenelse test for gmxlite
2326
2327 \section{Energy Minimization}
2328 \label{sec:EM}\index{energy minimization}%
Energy minimization in {\gromacs} can be done using steepest descent,
conjugate gradients, or L-BFGS (the limited-memory
Broyden-Fletcher-Goldfarb-Shanno quasi-Newton minimizer). EM is just an option of the {\tt mdrun}
program.
2334
2335 \subsection{Steepest Descent\index{steepest descent}}
2336 Although steepest descent is certainly not the most efficient
2337 algorithm for searching, it is robust and easy to implement.
2338
2339 We define the vector $\ve{r}$ as the vector of all $3N$ coordinates.
2340 Initially a maximum displacement $h_0$ ({\eg} 0.01 nm) must be given.
2341
2342 First the forces $\ve{F}$ and potential energy are calculated.
2343 New positions are calculated by
2344 \beq
2345 \ve{r}_{n+1} =  \ve{r}_n + \frac{\ve{F}_n}{\max (|\ve{F}_n|)} h_n,
2346 \eeq
2347 where $h_n$ is the maximum displacement and $\ve{F}_n$ is the force,
2348 or the negative gradient of the  potential $V$. The notation $\max
2349 (|\ve{F}_n|)$ means the largest of the absolute values of the force
components.  The forces and energy are again computed for the new positions. \\
If ($V_{n+1} < V_n$) the new positions are accepted and $h_{n+1} = 1.2
h_n$. \\
If ($V_{n+1} \geq V_n$) the new positions are rejected and the step is retried with $h_n$ reduced to $0.2 h_n$.
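
The accept/reject rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the {\gromacs} code: the function name {\tt steepest_descent}, its arguments, the force tolerance {\tt eps} and the quadratic test potential are all hypothetical.

```python
def steepest_descent(energy, forces, r, h0=0.01, eps=10.0, max_evals=100):
    """Steepest descent with an adaptive maximum displacement.

    energy(r) returns the potential energy, forces(r) the force components
    (negative gradient) for the coordinate list r.
    """
    r = list(r)
    h = h0
    v = energy(r)
    f = forces(r)
    for _ in range(max_evals):
        f_max = max(abs(c) for c in f)      # largest absolute force component
        if f_max < eps:                     # converged on the force criterion
            break
        trial = [x + c / f_max * h for x, c in zip(r, f)]
        v_trial = energy(trial)
        if v_trial < v:                     # accept and enlarge the step
            r, v, f = trial, v_trial, forces(trial)
            h *= 1.2
        else:                               # reject and shrink the step
            h *= 0.2
    return r, v


# Hypothetical test potential: V = 50 * sum(x^2), so F = -100 * x.
def energy(r):
    return sum(50.0 * x * x for x in r)


def forces(r):
    return [-100.0 * x for x in r]


r_min, v_min = steepest_descent(energy, forces, [0.1, -0.05], eps=0.5)
```

Because a rejected trial only shrinks $h_n$ and leaves the positions unchanged, the energy never increases from one accepted step to the next.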
2354
2355 The algorithm stops when either a user-specified number of force
2356 evaluations has been performed ({\eg} 100), or when the maximum of the absolute
2357 values of the force (gradient) components is smaller than a specified
2358 value $\epsilon$.
2359 Since force truncation produces some noise in the
2360 energy evaluation, the stopping criterion should not be made too tight
2361 to avoid endless iterations. A reasonable value for $\epsilon$ can be
2362 estimated from the root mean square force $f$ a harmonic oscillator would exhibit at a
2363 temperature $T$. This value is
2364 \beq
2365   f = 2 \pi \nu \sqrt{ 2mkT},
2366 \eeq
2367 where $\nu$ is the oscillator frequency, $m$ the (reduced) mass, and
2368 $k$ Boltzmann's constant. For a weak oscillator with a wave number of
2369 100 cm$^{-1}$ and a mass of 10 atomic units, at a temperature of 1 K,
$f=7.7$ kJ~mol$^{-1}$~nm$^{-1}$. A value for $\epsilon$ between 1 and
10 kJ~mol$^{-1}$~nm$^{-1}$ is acceptable.
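
As a check on the number quoted above, the arithmetic can be written out in {\gromacs} units (nm, ps, amu, kJ~mol$^{-1}$), assuming $\nu = c\,\tilde{\nu}$ with $c$ the speed of light and $\tilde{\nu}$ the wave number, and taking $k = 0.0083145$ kJ~mol$^{-1}$~K$^{-1}$:
\bea
\nu &=& 2.998\cdot 10^{10}~\mbox{cm s}^{-1} \times 100~\mbox{cm}^{-1} \approx 3.0~\mbox{ps}^{-1} \\
f &=& 2 \pi \times 3.0 \times \sqrt{2 \times 10 \times 0.0083145 \times 1}
  \approx 18.8 \times 0.408 \approx 7.7~\mbox{kJ mol}^{-1}~\mbox{nm}^{-1}
\eea
The units work out because 1 kJ~mol$^{-1}$ equals 1 amu~nm$^2$~ps$^{-2}$.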
2372
2373 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2374 \subsection{Conjugate Gradient\index{conjugate gradient}}
2375 Conjugate gradient is slower than steepest descent in the early stages
2376 of the minimization, but becomes more efficient closer to the energy
2377 minimum.  The parameters and stop criterion are the same as for
steepest descent.  In {\gromacs} conjugate gradient cannot be used
2379 with constraints, including the SETTLE algorithm for
2380 water~\cite{Miyamoto92}, as this has not been implemented. If water is
2381 present it must be of a flexible model, which can be specified in the
2382 {\tt *.mdp} file by {\tt define = -DFLEXIBLE}.
2383
2384 This is not really a restriction, since the accuracy of conjugate
2385 gradient is only required for minimization prior to a normal-mode
2386 analysis, which cannot be performed with constraints.  For most other
2387 purposes steepest descent is efficient enough.
2388 % } % Brace matches ifthenelse test for gmxlite
2389
2390 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2391 \subsection{\normindex{L-BFGS}}
2392 The original BFGS algorithm works by successively creating better
2393 approximations of the inverse Hessian matrix, and moving the system to
2394 the currently estimated minimum. The memory requirements for this are
2395 proportional to the square of the number of particles, so it is not
2396 practical for large systems like biomolecules. Instead, we use the
2397 L-BFGS algorithm of Nocedal~\cite{Byrd95a,Zhu97a}, which approximates
2398 the inverse Hessian by a fixed number of corrections from previous
2399 steps. This sliding-window technique is almost as efficient as the
original method, but the memory requirements are much lower:
proportional to the number of particles multiplied by the number of correction
steps. In practice we have found it to converge faster than conjugate
gradients, but due to the correction steps it is not yet parallelized.
2404 It is also noteworthy that switched or shifted interactions usually
2405 improve the convergence, since sharp cut-offs mean the potential
2406 function at the current coordinates is slightly different from the
2407 previous steps used to build the inverse Hessian approximation.
2408 % } % Brace matches ifthenelse test for gmxlite
2409
2410 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2411 \section{Normal-Mode Analysis\index{normal-mode analysis}\index{NMA}}
2412 Normal-mode analysis~\cite{Levitt83,Go83,BBrooks83b}
2413 can be performed using {\gromacs}, by diagonalization of the mass-weighted
2414 \normindex{Hessian} $H$:
2415 \bea
2416 R^T M^{-1/2} H M^{-1/2} R   &=& \mbox{diag}(\lambda_1,\ldots,\lambda_{3N})
2417 \\
2418 \lambda_i &=& (2 \pi \omega_i)^2
2419 \eea
2420 where $M$ contains the atomic masses, $R$ is a matrix that contains
2421 the eigenvectors as columns, $\lambda_i$ are the eigenvalues
2422 and $\omega_i$ are the corresponding frequencies.
2423
2424 First the Hessian matrix, which is a $3N \times 3N$ matrix where $N$
2425 is the number of atoms, needs to be calculated:
2426 \bea
2427 H_{ij}  &=&     \frac{\partial^2 V}{\partial x_i \partial x_j}
2428 \eea
2429 where $x_i$ and $x_j$ denote the atomic x, y or z coordinates.
2430 In practice, this equation is not used, but the Hessian is
2431 calculated numerically from the force as:
2432 \bea
2433 H_{ij} &=& -
2434   \frac{f_i({\bf x}+h{\bf e}_j) - f_i({\bf x}-h{\bf e}_j)}{2h}
2435 \\
2436 f_i     &=& - \frac{\partial V}{\partial x_i}
2437 \eea
2438 where ${\bf e}_j$ is the unit vector in direction $j$.
2439 It should be noted that
2440 for a usual normal-mode calculation, it is necessary to completely minimize
2441 the energy prior to computation of the Hessian.
2442 The tolerance required depends on the type of system,
2443 but a rough indication is 0.001 kJ mol$^{-1}$.
2444 Minimization should be done with conjugate gradients or L-BFGS in double precision.
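
The central-difference formula above can be sketched in Python as follows. This is only an illustration: the function name {\tt numerical_hessian}, the displacement {\tt h} and the two-coordinate spring used as a test are assumptions, and the mass weighting and diagonalization steps are not shown.

```python
def numerical_hessian(force, x, h=1e-4):
    """Hessian from central differences of the force.

    force(x) returns the force components f_i = -dV/dx_i for coordinates x.
    H[i][j] = -(f_i(x + h*e_j) - f_i(x - h*e_j)) / (2*h)
    """
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h      # displace coordinate j forward
        x_minus[j] -= h     # displace coordinate j backward
        f_plus = force(x_plus)
        f_minus = force(x_minus)
        for i in range(n):
            hessian[i][j] = -(f_plus[i] - f_minus[i]) / (2.0 * h)
    return hessian


# Hypothetical test: two coordinates joined by a spring, V = k*(x1 - x0)^2/2.
def spring_force(x, k=3.0):
    stretch = x[1] - x[0]
    return [k * stretch, -k * stretch]


H = numerical_hessian(spring_force, [0.0, 1.0])
```

For a purely quadratic potential such as this spring the central difference is exact up to rounding, so the result should be close to the analytical matrix with $k$ on the diagonal and $-k$ off the diagonal.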
2445
2446 A number of {\gromacs} programs are involved in these
2447 calculations. First, the energy should be minimized using {\tt mdrun}.
2448 Then, {\tt mdrun} computes the Hessian.  {\bf Note} that for generating
2449 the run input file, one should use the minimized conformation from
2450 the full precision trajectory file, as the structure file is not
2451 accurate enough.
2452 {\tt \normindex{g_nmeig}} does the diagonalization and
2453 the sorting of the normal modes according to their frequencies.
2454 Both {\tt mdrun} and {\tt g_nmeig} should be run in double precision.
2455 The normal modes can be analyzed with the program {\tt g_anaeig}.
2456 Ensembles of structures at any temperature and for any subset of
2457 normal modes can be generated with {\tt \normindex{g_nmens}}.
2458 An overview of normal-mode analysis and the related principal component
2459 analysis (see \secref{covanal}) can be found in~\cite{Hayward95b}.
2460 % } % Brace matches ifthenelse test for gmxlite
2461
2462 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2463
2464 \section{Free energy calculations\index{free energy calculations}}
2465 \label{sec:fecalc}
2466 \subsection{Slow-growth methods\index{slow-growth methods}}
2467 Free energy calculations can be performed
2468 in {\gromacs} using  a number of methods, including ``slow-growth.'' An example problem
2469 might be calculating the difference in free energy of binding of an inhibitor {\bf I}
2470 to an enzyme {\bf E} and to a mutated enzyme {\bf E$^{\prime}$}. It
is not feasible with computer simulations to perform a docking
calculation for such a large complex, or even to release the inhibitor from
the enzyme, in a reasonable amount of computer time with reasonable accuracy.
2474 However, if we consider the free energy cycle in~\figref{free}A
2475 we can write:
2476 \beq
2477 \Delta G_1 - \Delta G_2 =       \Delta G_3 - \Delta G_4
2478 \label{eqn:ddg}
2479 \eeq
2480 If we are interested in the left-hand term we can equally well compute
2481 the right-hand term.
2482 \begin{figure}
2483 \centerline{\includegraphics[width=6cm,angle=270]{plots/free1}\hspace{2cm}\includegraphics[width=6cm,angle=270]{plots/free2}}
\caption[Free energy cycles.]{Free energy cycles. {\bf A:} to
calculate $\Delta G_{12}$, the free energy difference between the
binding of inhibitor {\bf I} to enzymes {\bf E} and {\bf
E$^{\prime}$}, respectively. {\bf B:} to calculate $\Delta G_{12}$, the free energy
difference for binding of inhibitors {\bf I} and {\bf I$^{\prime}$}, respectively, to
enzyme {\bf E}.}
2490 \label{fig:free}
2491 \end{figure}
2492
2493 If we want to compute the difference in free energy of binding of two
2494 inhibitors {\bf I} and {\bf I$^{\prime}$} to an enzyme {\bf E} (\figref{free}B)
2495 we can again use \eqnref{ddg} to compute the desired property.
2496
2497 \newcommand{\sA}{^{\mathrm{A}}}
2498 \newcommand{\sB}{^{\mathrm{B}}}
2499 Free energy differences between two molecular species can
2500 be calculated in {\gromacs} using the ``slow-growth'' method.
2501 Such free energy differences between different molecular species are
2502 physically meaningless, but they can be used to obtain meaningful
2503 quantities employing a thermodynamic cycle.
2504 The method requires a simulation during which the Hamiltonian of the
2505 system changes slowly from that describing one system (A) to that
2506 describing the other system (B). The change must be so slow that the
2507 system remains in equilibrium during the process; if that requirement
2508 is fulfilled, the change is reversible and a slow-growth simulation from B to A
2509 will yield the same results (but with a different sign) as a slow-growth
2510 simulation from A to B. This is a useful check, but the user should be
2511 aware of the danger that equality of forward and backward growth results does
2512 not guarantee correctness of the results.
2513
2514 The required modification of the Hamiltonian $H$ is realized by making
2515 $H$ a function of a \textit{coupling parameter} $\lambda:
2516 H=H(p,q;\lambda)$ in such a way that $\lambda=0$ describes system A
2517 and $\lambda=1$ describes system B:
2518 \beq
2519   H(p,q;0)=H\sA (p,q);~~~~ H(p,q;1)=H\sB (p,q).
2520 \eeq
2521 In {\gromacs}, the functional form of the $\lambda$-dependence is
2522 different for the various force-field contributions and is described
2523 in section \secref{feia}.
2524
2525 The Helmholtz free energy $A$ is related to the
2526 partition function $Q$ of an $N,V,T$ ensemble, which is assumed to be
2527 the equilibrium ensemble generated by a MD simulation at constant
2528 volume and temperature. The generally more useful Gibbs free energy
2529 $G$ is related to the partition function $\Delta$ of an $N,p,T$
2530 ensemble, which is assumed to be the equilibrium ensemble generated by
2531 a MD simulation at constant pressure and temperature:
2532 \bea
2533  A(\lambda) &=&  -k_BT \ln Q \\
2534  Q &=& c \int\!\!\int \exp[-\beta H(p,q;\lambda)]\,dp\,dq \\
2535  G(\lambda) &=&  -k_BT \ln \Delta \\
2536  \Delta &=& c \int\!\!\int\!\!\int \exp[-\beta H(p,q;\lambda) -\beta
2537 pV]\,dp\,dq\,dV \\
2538 G &=& A + pV,
2539 \eea
2540 where $\beta = 1/(k_BT)$ and $c = (N! h^{3N})^{-1}$.
2541 These integrals over phase space cannot be evaluated from a
2542 simulation, but it is possible to evaluate the derivative with
2543 respect to $\lambda$ as an ensemble average:
2544 \beq
2545  \frac{dA}{d\lambda} =  \frac{\int\!\!\int (\partial H/ \partial
2546 \lambda) \exp[-\beta H(p,q;\lambda)]\,dp\,dq}{\int\!\!\int \exp[-\beta
2547 H(p,q;\lambda)]\,dp\,dq} =
2548 \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{NVT;\lambda},
2549 \eeq
2550 with a similar relation for $dG/d\lambda$ in the $N,p,T$
2551 ensemble.  The difference in free energy between A and B can be found
2552 by integrating the derivative over $\lambda$:
2553 \bea
2554   A\sB(V,T)-A\sA(V,T) &=& \int_0^1 \left\langle \frac{\partial
2555 H}{\partial \lambda} \right\rangle_{NVT;\lambda} \,d\lambda
2556 \label{eq:delA} \\
2557  G\sB(p,T)-G\sA(p,T) &=& \int_0^1 \left\langle \frac{\partial
2558 H}{\partial \lambda} \right\rangle_{NpT;\lambda} \,d\lambda.
2559 \label{eq:delG}
2560 \eea
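The ensemble-average form of $dA/d\lambda$ used above follows in one step from $A = -k_BT \ln Q$; the prefactor $c$ cancels between numerator and denominator and $k_BT\beta = 1$:
\bea
\frac{dA}{d\lambda} &=& -\frac{k_BT}{Q} \frac{\partial Q}{\partial \lambda} \\
 &=& -\frac{k_BT}{Q}\, c \int\!\!\int \left(-\beta \frac{\partial H}{\partial \lambda}\right)
     \exp[-\beta H(p,q;\lambda)]\,dp\,dq \\
 &=& \frac{c}{Q} \int\!\!\int \frac{\partial H}{\partial \lambda}
     \exp[-\beta H(p,q;\lambda)]\,dp\,dq .
\eea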
2561 If one wishes to evaluate $G\sB(p,T)-G\sA(p,T)$,
2562 the natural choice is a constant-pressure simulation. However, this
2563 quantity can also be obtained from a slow-growth simulation at
2564 constant volume, starting with system A at pressure $p$ and volume $V$
and ending with system B at pressure $p\sB$, by applying the following
2566 small (but, in principle, exact) correction:
2567 \beq
2568   G\sB(p)-G\sA(p) =
2569 A\sB(V)-A\sA(V) - \int_p^{p\sB}[V\sB(p')-V]\,dp'
2570 \eeq
2571 Here we omitted the constant $T$ from the notation. This correction is
roughly equal to $-\frac{1}{2} (p\sB-p)\Delta V=-(\Delta V)^2/(2
\kappa V)$, where $\Delta V$ is the volume change at $p$ and $\kappa$
2574 is the isothermal compressibility. This is usually
2575 small; for example, the growth of a water molecule from nothing
2576 in a bath of 1000 water molecules at constant volume would produce an
additional pressure of as much as 22 bar, but a correction to the
Helmholtz free energy of only about $-0.02$ kJ mol$^{-1}$.
2579
2580 In Cartesian coordinates, the kinetic energy term in the Hamiltonian
2581 depends only on the momenta, and can be separately integrated and, in
2582 fact, removed from the equations. When masses do not change, there is
2583 no contribution from the kinetic energy at all; otherwise the
2584 integrated contribution to the free energy is $-\frac{3}{2} k_BT \ln
2585 (m\sB/m\sA)$. {\bf Note} that this is only true in the absence of constraints.
2586
2587 \subsection{Thermodynamic integration\index{thermodynamic integration}\index{BAR}\index{Bennett's acceptance ratio}}
2588 {\gromacs} offers the possibility to integrate eq.~\ref{eq:delA} or
eq.~\ref{eq:delG} in one simulation over the full range from A to
2590 B. However, if the change is large and insufficient sampling can be
2591 expected, the user may prefer to determine the value of $\langle
2592 dG/d\lambda \rangle$ accurately at a number of well-chosen
2593 intermediate values of $\lambda$. This can easily be done by setting
2594 the stepsize {\tt delta_lambda} to zero. Each simulation can be
2595 equilibrated first, and a proper error estimate can be made for each
2596 value of $dG/d\lambda$ from the fluctuation of $\partial H/\partial
2597 \lambda$. The total free energy change is then determined afterward
2598 by an appropriate numerical integration procedure.
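
As one simple choice of numerical integration, the averages $\langle \partial H/\partial \lambda \rangle$ collected at fixed $\lambda$ values can be combined with the trapezoidal rule. The sketch below is an assumption-laden illustration: the function name {\tt integrate_dhdl} and the sample values are hypothetical, and other quadrature schemes may be more appropriate for strongly curved integrands.

```python
def integrate_dhdl(lambdas, mean_dhdl):
    """Trapezoidal estimate of the free energy difference.

    lambdas    increasing lambda values, e.g. from 0 to 1
    mean_dhdl  ensemble average of dH/dlambda at each lambda value
    """
    if len(lambdas) != len(mean_dhdl) or len(lambdas) < 2:
        raise ValueError("need at least two matching (lambda, dH/dlambda) points")
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * width * (mean_dhdl[k] + mean_dhdl[k + 1])
    return total


# Hypothetical averages from five simulations at fixed lambda (kJ/mol).
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
mean_dhdl = [1.0, 2.0, 3.0, 4.0, 5.0]
delta_g = integrate_dhdl(lambdas, mean_dhdl)
```

The statistical error of each average can be propagated through the same weights, since the estimate is a linear combination of the individual averages.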
2599
{\gromacs} now also supports the use of Bennett's Acceptance Ratio~\cite{Bennett1976}
for calculating values of $\Delta G$ for transformations from state A to state B using
the program {\tt \normindex{g_bar}}. The same data can also be used to calculate free
energies using MBAR~\cite{Shirts2008}, though the analysis currently requires
the external {\tt pymbar} package, at https://SimTK.org/home/pymbar.
2605
2606 The $\lambda$-dependence for the force-field contributions is
2607 described in detail in section \secref{feia}.
2608 % } % Brace matches ifthenelse test for gmxlite
2609
2610 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2611 \section{Replica exchange\index{replica exchange}}
2612 Replica exchange molecular dynamics (\normindex{REMD})
2613 is a method that can be used to speed up
2614 the sampling of any type of simulation, especially if
2615 conformations are separated by relatively high energy barriers.
2616 It involves simulating multiple replicas of the same system
2617 at different temperatures and randomly exchanging the complete state
2618 of two replicas at regular intervals with the probability:
2619 \beq
2620 P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
2621 \left(\frac{1}{k_B T_1} - \frac{1}{k_B T_2}\right)(U_1 - U_2)
2622  \right] \right)
2623 \eeq
2624 where $T_1$ and $T_2$ are the reference temperatures and $U_1$ and $U_2$
2625 are the instantaneous potential energies of replicas 1 and 2 respectively.
2626 After exchange the velocities are scaled by $(T_1/T_2)^{\pm0.5}$
and a neighbor search is performed on the next step.
2628 This combines the fast sampling and frequent barrier-crossing
2629 of the highest temperature with correct Boltzmann sampling at
2630 all the different temperatures~\cite{Hukushima96a,Sugita99}.
2631 We only attempt exchanges for neighboring temperatures as the probability
2632 decreases very rapidly with the temperature difference.
2633 One should not attempt exchanges for all possible pairs in one step.
2634 If, for instance, replicas 1 and 2 would exchange, the chance of
2635 exchange for replicas 2 and 3 not only depends on the energies of
2636 replicas 2 and 3, but also on the energy of replica 1.
2637 In {\gromacs} this is solved by attempting exchange for all ``odd''
2638 pairs on ``odd'' attempts and for all ``even'' pairs on ``even'' attempts.
2639 If we have four replicas: 0, 1, 2 and 3, ordered in temperature
2640 and we attempt exchange every 1000 steps, pairs 0-1 and 2-3
2641 will be tried at steps 1000, 3000 etc. and pair 1-2 at steps 2000, 4000 etc.
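
The exchange probability and the alternating pair scheme described above can be sketched in Python as follows. This is an illustration under assumptions, not the {\tt mdrun} implementation: the function names are hypothetical, and exchange attempts are assumed to be counted from 1, so that the attempt at step 1000 in the example is attempt 1.

```python
import math

KB = 0.0083144621  # Boltzmann constant in kJ mol^-1 K^-1


def exchange_probability(t1, t2, u1, u2, kb=KB):
    """Acceptance probability for swapping two temperature replicas.

    t1, t2  reference temperatures; u1, u2  instantaneous potential energies.
    """
    delta = (1.0 / (kb * t1) - 1.0 / (kb * t2)) * (u1 - u2)
    # min(1, exp(delta)), written so that exp() cannot overflow
    return 1.0 if delta >= 0.0 else math.exp(delta)


def pairs_for_attempt(attempt, n_replicas):
    """Neighbor pairs tried at a given exchange attempt (counted from 1).

    Odd attempts try pairs 0-1, 2-3, ...; even attempts try pairs 1-2, 3-4, ...
    """
    first = 0 if attempt % 2 == 1 else 1
    return [(i, i + 1) for i in range(first, n_replicas - 1, 2)]
```

For four replicas, {\tt pairs_for_attempt(1, 4)} gives the pairs 0-1 and 2-3, and {\tt pairs_for_attempt(2, 4)} gives the pair 1-2, matching the example in the text.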
2642
2643 How should one choose the temperatures?
2644 The energy difference can be written as:
2645 \beq
2646 U_1 - U_2 =  N_{df} \frac{c}{2} k_B (T_1 - T_2)
2647 \eeq
2648 where $N_{df}$ is the total number of degrees of freedom of one replica
2649 and $c$ is 1 for harmonic potentials and around 2 for protein/water systems.
2650 If $T_2 = (1+\epsilon) T_1$ the probability becomes:
2651 \beq
2652 P(1 \leftrightarrow 2)
2653   = \exp\left( -\frac{\epsilon^2 c\,N_{df}}{2 (1+\epsilon)} \right)
2654 \approx \exp\left(-\epsilon^2 \frac{c}{2} N_{df} \right)
2655 \eeq
2656 Thus for a probability of $e^{-2}\approx 0.135$
2657 one obtains $\epsilon \approx 2/\sqrt{c\,N_{df}}$.
2658 With all bonds constrained one has $N_{df} \approx 2\, N_{atoms}$
2659 and thus for $c$ = 2 one should choose $\epsilon$ as $1/\sqrt{N_{atoms}}$.
However, there is one problem when using pressure coupling. The density at
2661 higher temperatures will decrease, leading to higher energy~\cite{Seibert2005a},
2662 which should be taken into account. The {\gromacs} website features a
2663 so-called ``REMD calculator,'' that lets you type in the temperature range and
2664 the number of atoms, and based on that proposes a set of temperatures.
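
The estimate $\epsilon \approx 1/\sqrt{N_{atoms}}$ can be turned into a temperature ladder with a few lines of Python. The sketch below is a rough illustration only: the function name {\tt temperature_ladder} is hypothetical, and it assumes constrained bonds, $c = 2$, a target probability of $e^{-2}$ and no correction for the density change under pressure coupling.

```python
import math


def temperature_ladder(t_min, t_max, n_atoms):
    """Temperatures with T_(k+1) = (1 + eps) * T_k and eps = 1/sqrt(n_atoms)."""
    eps = 1.0 / math.sqrt(n_atoms)
    temps = [t_min]
    while temps[-1] * (1.0 + eps) <= t_max:
        temps.append(temps[-1] * (1.0 + eps))
    return temps


# Example: 10000 atoms gives eps = 0.01, i.e. steps of about 3 K near 300 K.
ladder = temperature_ladder(300.0, 310.0, 10000)
```

Because the spacing scales with $1/\sqrt{N_{atoms}}$, the number of replicas needed to span a fixed temperature range grows with the square root of the system size.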
2665
2666 An extension to the REMD for the isobaric-isothermal ensemble was
2667 proposed by Okabe {\em et al.}~\cite{Okabe2001a}. In this work the
2668 exchange probability is modified to:
2669 \beq
2670 P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
2671 \left(\frac{1}{k_B T_1} - \frac{1}{k_B T_2}\right)(U_1 - U_2) +
2672 \left(\frac{P_1}{k_B T_1} - \frac{P_2}{k_B T_2}\right)\left(V_1-V_2\right)
2673  \right] \right)
2674 \eeq
2675 where $P_1$ and $P_2$ are the respective reference pressures and $V_1$ and
2676 $V_2$ are the respective instantaneous volumes in the simulations.
2677 In most cases the differences in volume are so small that the second
2678 term is negligible. It only plays a role when the difference between
2679 $P_1$ and $P_2$ is large or in phase transitions.
2680
2681 Hamiltonian replica exchange is also supported in {\gromacs}.  In
2682 Hamiltonian replica exchange, each replica has a different
2683 Hamiltonian, defined by the free energy pathway specified for the simulation.  The
2684 exchange probability to maintain the correct ensemble probabilities is:
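\beq
P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
    -\frac{1}{k_B T} \left( (U_1(x_2) - U_1(x_1)) + (U_2(x_1) - U_2(x_2)) \right)
\right]
\right)
\eeq
where $U_i(x_j)$ is the potential energy of Hamiltonian $i$ evaluated at the
coordinates $x_j$ of replica $j$, and $T$ is the temperature shared by both replicas.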
2690 The separate Hamiltonians are defined by the free energy functionality
2691 of {\gromacs}, with swaps made between the different values of
2692 $\lambda$ defined in the mdp file.
2693
Hamiltonian and temperature replica exchange can also be performed
simultaneously, using the acceptance criterion:
\beq
P(1 \leftrightarrow 2)=\min\left(1,\exp\left[
-\left(\frac{U_1(x_2) - U_1(x_1)}{k_B T_1} + \frac{U_2(x_1) - U_2(x_2)}{k_B T_2}\right)
 \right] \right)
\eeq
For identical Hamiltonians this reduces to the temperature replica exchange
criterion given earlier in this section.
2701
2702 Gibbs sampling replica exchange has also been implemented in
2703 {\gromacs}~\cite{Chodera2011}.  In Gibbs sampling replica exchange, all
2704 possible pairs are tested for exchange, allowing swaps between
2705 replicas that are not neighbors.
2706
2707 Gibbs sampling replica exchange requires no additional potential
energy calculations.  However, there is an additional communication
2709 cost in Gibbs sampling replica exchange, as for some permutations,
2710 more than one round of swaps must take place.  In some cases, this
2711 extra communication cost might affect the efficiency.
2712
2713 All replica exchange variants are options of the {\tt mdrun}
program. They only work when MPI is installed, due to the inherent
2715 parallelism in the algorithm. For efficiency each replica can run on a
2716 separate node.  See the manual page of {\tt mdrun} on how to use these
2717 multinode features.
2718
2719 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2720
2721 \section{Essential Dynamics sampling\index{essential dynamics}\index{principal component analysis}\seeindexquiet{PCA}{covariance analysis}}
2722 The results from Essential Dynamics (see \secref{covanal})
2723 of a protein can be used to guide MD simulations. The idea is that
2724 from an initial MD simulation (or from other sources) a definition of
2725 the collective fluctuations with largest amplitude is obtained. The
2726 position along one or more of these collective modes can be
2727 constrained in a (second) MD simulation in a number of ways for
2728 several purposes. For example, the position along a certain mode may
2729 be kept fixed to monitor the average force (free-energy gradient) on
2730 that coordinate in that position. Another application is to enhance
2731 sampling efficiency with respect to usual MD
2732 \cite{Degroot96a,Degroot96b}. In this case, the system is encouraged
2733 to sample its available configuration space more systematically than
2734 in a diffusion-like path that proteins usually take.
2735
2736 Another possibility to enhance sampling is \normindex{flooding}.
2737 Here a flooding potential is added to certain
2738 (collective) degrees of freedom to expel the system out
2739 of a region of phase space \cite{Lange2006a}.
2740
2741 The procedure for essential dynamics sampling or flooding is as follows.
2742 First, the eigenvectors and eigenvalues need to be determined
2743 using covariance analysis ({\tt g_covar})
2744 or normal-mode analysis ({\tt g_nmeig}).
2745 Then, this information is fed into {\tt make_edi},
2746 which has many options for selecting vectors and setting parameters,
2747 see \appref{progman} for the manual page of {\tt make_edi}.
2748 The generated {\tt edi} input file is then passed to {\tt mdrun}.
2749
2750 % } % Brace matches ifthenelse test for gmxlite
2751
2752 % \ifthenelse{\equal{\gmxlite}{1}}{}{
2753 \section{\normindex{Expanded Ensemble}}
2754
2755 In an expanded ensemble simulation~\cite{Lyubartsev1992}, both the coordinates and the
2756 thermodynamic ensemble are treated as configuration variables that can
2757 be sampled over.  The probability of any given state can be written as:
2758 \beq
2759 P(\vec{x},k) \propto \exp\left(-\beta_k U_k + g_k\right),
2760 \eeq
2761 where $\beta_k = \frac{1}{k_B T_k}$ is the $\beta$ corresponding to the $k$th
2762 thermodynamic state, and $g_k$ is a user-specified weight factor corresponding
2763 to the $k$th state.  This space is therefore a {\em mixed}, {\em generalized}, or {\em
2764   expanded} ensemble which samples from multiple thermodynamic
2765 ensembles simultaneously. $g_k$ is chosen to give a specific weighting
2766 of each subensemble in the expanded ensemble, and can either be fixed,
2767 or determined by an iterative procedure. The set of $g_k$ is
2768 frequently chosen to give each thermodynamic ensemble equal
2769 probability, in which case $g_k$ is equal to the free energy in
2770 non-dimensional units, but they can be set to arbitrary values as
2771 desired.  Several different algorithms can be used to equilibrate
2772 these weights, described in the mdp option listings.
2773 % } % Brace matches ifthenelse test for gmxlite
2774
2775 In {\gromacs}, this space is sampled by alternating sampling in the $k$
2776 and $\vec{x}$ directions.  Sampling in the $\vec{x}$ direction is done
2777 by standard molecular dynamics sampling; sampling between the
different thermodynamic states is done by Monte Carlo, with several
2779 different Monte Carlo moves supported. The $k$ states can be defined
2780 by different temperatures, or choices of the free energy $\lambda$
2781 variable, or both.  Expanded ensemble simulations thus represent a
2782 serialization of the replica exchange formalism, allowing a single
2783 simulation to explore many thermodynamic states.
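
One possible Monte Carlo move between thermodynamic states is a plain Metropolis step on the weight $\exp(-\beta_k U_k + g_k)$ at fixed coordinates. The Python sketch below illustrates only this generic move; it is not claimed to be one of the moves implemented in {\gromacs}, and the function name and arguments are hypothetical.

```python
import math
import random


def metropolis_state_move(k, k_new, reduced_energy, g, rng):
    """Metropolis move from state k to state k_new at fixed coordinates.

    reduced_energy[j]  beta_j * U_j(x) for the current coordinates x
    g[j]               weight factor of state j
    rng                object with a random() method returning a float in [0, 1)
    Returns the index of the state after the move.
    """
    log_ratio = (-reduced_energy[k_new] + g[k_new]) - (-reduced_energy[k] + g[k])
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return k_new
    return k


# Example with two states and hypothetical reduced energies and weights.
state = metropolis_state_move(0, 1, [1.0, 3.0], [0.0, 2.0], random.Random(7))
```

In this example the weight $g_1 = 2$ exactly compensates the higher reduced energy of state 1, so the move is always accepted; with equal weights it would be accepted with probability $e^{-2}$.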
2784
2785
2786
2787 % This stuff is quite outdated now.
2788 %\ifthenelse{\equal{\gmxlite}{1}}{}{
2789 %\input{gmxpar}
2790 %} % Brace matches ifthenelse test for gmxlite
2791
2792 \section{Parallelization\index{parallelization}}
2793 The CPU time required for a simulation can be reduced by running the simulation
2794 in parallel over more than one processor or processor core.
2795 Ideally one would want to have linear scaling: running on $N$ processors/cores
2796 makes the simulation $N$ times faster. In practice this can only be
2797 achieved for a small number of processors. The scaling will depend
2798 a lot on the algorithms used. Also, different algorithms can have different
2799 restrictions on the interaction ranges between atoms.
2800 In {\gromacs} we have two types of parallelization: particle decomposition
2801 and domain decomposition. Particle decomposition is only useful for
2802 a few special cases. Domain decomposition, which is the default algorithm,
2803 will always be faster and scale better.
2804
2805 \section{Particle decomposition\index{particle decomposition}}
Particle decomposition, also called \normindex{force decomposition},
2807 is the simplest type of decomposition. At the start of the simulation,
2808 particles are assigned to processors. Then forces between particles
2809 need to be assigned to processors such that the force load is evenly balanced.
2810 This decomposition requires that each processor know the coordinates
2811 of at least half of the particles in the system.
2812 Thus for a high number of processors $N$, about $N \times N/2$ coordinates
2813 need to be communicated. Because of this quadratic relation
2814 particle decomposition does not scale well.
2815
2816 Particle decomposition was the only method available before version 4
2817 of {\gromacs}. Now it is only useful in cases where domain decomposition
2818 does not work, such as systems with long-range bonded interactions,
2819 especially NMR distance or orientation restraints.
2820 With particle decomposition only whole molecules can be assigned to a processor.
2821
2822 \section{Domain decomposition\index{domain decomposition}}
2823 Since most interactions in molecular simulations are local,
2824 domain decomposition is a natural way to decompose the system.
2825 In domain decomposition, a spatial domain is assigned to each processor,
2826 which will then integrate the equations of motion for the particles
2827 that currently reside in its local domain. With domain decomposition,
2828 there are two choices that have to be made: the division of the unit cell
2829 into domains and the assignment of the forces to processors.
2830 Most molecular simulation packages use the half-shell method for assigning
2831 the forces. But there are two methods that always require less communication:
2832 the eighth shell~\cite{Liem1991} and the midpoint~\cite{Shaw2006} method.
2833 {\gromacs} currently uses the eighth shell method, but for certain systems
2834 or hardware architectures it might be advantageous to use the midpoint
2835 method. Therefore, we might implement the midpoint method in the future.
2836 Most of the details of the domain decomposition can be found
2837 in the {\gromacs} 4 paper~\cite{Hess2008b}.
2838
2839 \subsection{Coordinate and force communication}
2840 In the most general case of a triclinic unit cell,
the space is divided with a 1-, 2-, or 3-D grid into parallelepipeds
2842 that we call domain decomposition cells.
2843 Each cell is assigned to a processor.
2844 The system is partitioned over the processors at the beginning
2845 of each MD step in which neighbor searching is performed.
2846 Since the neighbor searching is based on charge groups, charge groups
2847 are also the units for the domain decomposition.
2848 Charge groups are assigned to the cell where their center of geometry resides.
2849 Before the forces can be calculated, the coordinates from some
2850 neighboring cells need to be communicated,
2851 and after the forces are calculated, the forces need to be communicated
2852 in the other direction.
2853 The communication and force assignment is based on zones that
2854 can cover one or multiple cells.
2855 An example of a zone setup is shown in \figref{ddcells}.
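
The assignment of a charge group to a domain decomposition cell can be sketched in Python for the simplest case. This is an illustration under strong assumptions, not the {\gromacs} code: it handles only a rectangular box with a uniform, non-staggered grid, and the function names are hypothetical.

```python
def center_of_geometry(coords):
    """Unweighted average position of the atoms in one charge group."""
    n = len(coords)
    return tuple(sum(c[d] for c in coords) / n for d in range(3))


def dd_cell_index(coords, box, grid):
    """Grid cell (ix, iy, iz) that contains the center of geometry.

    coords  list of (x, y, z) atom positions of one charge group
    box     rectangular box lengths (lx, ly, lz)
    grid    number of cells along x, y and z
    """
    cog = center_of_geometry(coords)
    index = []
    for d in range(3):
        frac = (cog[d] / box[d]) % 1.0          # wrap into the unit cell
        index.append(min(int(frac * grid[d]), grid[d] - 1))
    return tuple(index)


# Example: a 3 x 2 x 2 grid in a 6 x 4 x 4 nm box.
cell = dd_cell_index([(4.9, 1.0, 3.0), (5.1, 1.2, 3.2)],
                     box=(6.0, 4.0, 4.0), grid=(3, 2, 2))
```

All atoms of a charge group follow its center of geometry, so individual atoms may lie slightly outside the cell that owns them.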
2856
2857 \begin{figure}
2858 \centerline{\includegraphics[width=6cm]{plots/dd-cells}}
2859 \caption{
2860 A non-staggered domain decomposition grid of 3$\times$2$\times$2 cells.
2861 Coordinates in zones 1 to 7 are communicated to the corner cell
2862 that has its home particles in zone 0.
2863 $r_c$ is the cut-off radius.
2864 \label{fig:ddcells}
2865 }
2866 \end{figure}
2867
2868 The coordinates are communicated by moving data along the ``negative''
2869 direction in $x$, $y$ or $z$ to the next neighbor. This can be done in one
2870 or multiple pulses. In \figref{ddcells} two pulses in $x$ are required,
2871 then one in $y$ and then one in $z$. The forces are communicated by
2872 reversing this procedure. See the {\gromacs} 4 paper~\cite{Hess2008b}
2873 for details on determining which non-bonded and bonded forces
2874 should be calculated on which node.
2875
2876 \subsection{Dynamic load balancing\swapindexquiet{dynamic}{load balancing}}
2877 When different processors have a different computational load
2878 (load imbalance), all processors will have to wait for the one
2879 that takes the most time. One would like to avoid such a situation.
2880 Load imbalance can occur due to three reasons:
2881 \begin{itemize}
2882 \item inhomogeneous particle distribution
2883 \item inhomogeneous interaction cost distribution (charged/uncharged,
2884   water/non-water due to {\gromacs} water innerloops)
2885 \item statistical fluctuation (only with small particle numbers)
2886 \end{itemize}
2887 So we need a dynamic load balancing algorithm
2888 where the volume of each domain decomposition cell
2889 can be adjusted {\em independently}.
2890 To achieve this, the 2- or 3-D domain decomposition grids need to be
2891 staggered. \figref{ddtric} shows the most general case in 2-D.
2892 Due to the staggering, one might require two distance checks
2893 for deciding if a charge group needs to be communicated:
2894 a non-bonded distance and a bonded distance check.
2895
2896 \begin{figure}
2897 \centerline{\includegraphics[width=7cm]{plots/dd-tric}}
2898 \caption{
2899 The zones to communicate to the processor of zone 0,
see the text for details. $r_c$ and $r_b$ are the non-bonded
and bonded cut-off radii, respectively; $d$ is an example
of a distance between consecutive, staggered cell boundaries.
2903 \label{fig:ddtric}
2904 }
2905 \end{figure}
2906
2907 By default, {\tt mdrun} automatically turns on the dynamic load
2908 balancing during a simulation when the total performance loss
2909 due to the force calculation imbalance is 5\% or more.
{\bf Note} that the reported force load imbalance numbers might be higher,
since the force calculation is only part of the work that needs to be done
during an integration step.
The load imbalance is reported in the log file at log output steps
and, when the {\tt -v} option is used, also on screen.
2915 The average load imbalance and the total performance loss
2916 due to load imbalance are reported at the end of the log file.
2917
2918 There is one important parameter for the dynamic load balancing,
2919 which is the minimum allowed scaling. By default, each dimension
2920 of the domain decomposition cell can scale down by at least
2921 a factor of 0.8. For 3-D domain decomposition this allows cells
2922 to change their volume by about a factor of 0.5, which should allow
2923 for compensation of a load imbalance of 100\%.
The minimum allowed scaling can be changed with the {\tt -dds} option of {\tt mdrun}.
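
As a rough check of these numbers, a sketch under the assumption that
the computational load of a cell is proportional to its volume:
with a minimum scaling factor $f$ per dimension (the symbol $f$ is
introduced here only for illustration), the smallest relative cell
volume in 3-D is
\beq
\frac{V_{min}}{V_0} = f^3 = 0.8^3 \approx 0.51
\eeq
so a cell whose load per unit volume is about twice the average
can shrink to roughly half its volume and end up with an average load.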
2925
2926 \subsection{Constraints in parallel\index{constraints}}
2927 \label{subsec:plincs}
2928 Since with domain decomposition parts of molecules can reside
2929 on different processors, bond constraints can cross cell boundaries.
2930 Therefore a parallel constraint algorithm is required.
2931 {\gromacs} uses the \normindex{P-LINCS} algorithm~\cite{Hess2008a},
2932 which is the parallel version of the \normindex{LINCS} algorithm~\cite{Hess97}
(see \ssecref{lincs}).
2936 The P-LINCS procedure is illustrated in \figref{plincs}.
When molecules cross the cell boundaries, atoms in such molecules
up to ({\tt lincs\_order + 1}) bonds away are communicated over the cell boundaries.
2939 Then, the normal LINCS algorithm can be applied to the local bonds
2940 plus the communicated ones. After this procedure, the local bonds
2941 are correctly constrained, even though the extra communicated ones are not.
2942 One coordinate communication step is required for the initial LINCS step
2943 and one for each iteration. Forces do not need to be communicated.
2944
2945 \begin{figure}
2946 \centerline{\includegraphics[width=6cm]{plots/par-lincs2}}
2947 \caption{
2948 Example of the parallel setup of P-LINCS with one molecule
2949 split over three domain decomposition cells, using a matrix
2950 expansion order of 3.
2951 The top part shows which atom coordinates need to be communicated
2952 to which cells. The bottom parts show the local constraints (solid)
2953 and the non-local constraints (dashed) for each of the three cells.
2954 \label{fig:plincs}
2955 }
2956 \end{figure}
2957
2958 \subsection{Interaction ranges}
2959 Domain decomposition takes advantage of the locality of interactions.
2960 This means that there will be limitations on the range of interactions.
2961 By default, {\tt mdrun} tries to find the optimal balance between
2962 interaction range and efficiency. But it can happen that a simulation
2963 stops with an error message about missing interactions,
2964 or that a simulation might run slightly faster with shorter
2965 interaction ranges. A list of interaction ranges
2966 and their default values is given in \tabref{dd_ranges}.
2967
2968 \begin{table}
2969 \centerline{
2970 \begin{tabular}{|c|c|ll|}
2971 \dline
2972 interaction & range & option & default \\
2973 \dline
2974 non-bonded        & $r_c$ = max($r_{list}$,$r_{VdW}$,$r_{Coul}$) & {\tt mdp} file & \\
2975 two-body bonded   & max($r_{mb}$,$r_c$) & {\tt mdrun -rdd} & starting conf. + 10\% \\
2976 multi-body bonded & $r_{mb}$ & {\tt mdrun -rdd} & starting conf. + 10\% \\
2977 constraints       & $r_{con}$ & {\tt mdrun -rcon} & est. from bond lengths \\
2978 virtual sites     & $r_{con}$ & {\tt mdrun -rcon} & 0 \\
2979 \dline
2980 \end{tabular}
2981 }
2982 \caption{The interaction ranges with domain decomposition.}
2983 \label{tab:dd_ranges}
2984 \end{table}
2985
In most cases the defaults of {\tt mdrun} should not cause the simulation
to stop with an error message about missing interactions.
The range for the bonded interactions is determined from the distance
between bonded charge groups in the starting configuration, with 10\% added
for headroom. For the constraints, the value of $r_{con}$ is determined by
taking the maximum distance that ({\tt lincs\_order + 1}) bonds can cover
when they all connect at angles of 120 degrees.
2993 The actual constraint communication is not limited by $r_{con}$,
2994 but by the minimum cell size $L_C$, which has the following lower limit:
2995 \beq
2996 L_C \geq \max(r_{mb},r_{con})
2997 \eeq
2998 Without dynamic load balancing the system is actually allowed to scale
2999 beyond this limit when pressure scaling is used.
{\bf Note} that for triclinic boxes, $L_C$ is not simply the box diagonal
component divided by the number of cells in that direction;
rather, it is the shortest distance between the triclinic cell borders.
For rhombic dodecahedra this is a factor of $\sqrt{3/2}$ shorter
along $x$ and $y$.
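
To illustrate the estimate of $r_{con}$, consider the simplified case,
assumed here for illustration only, of $n$ bonds of equal length $b$,
with $n$ equal to ({\tt lincs\_order + 1}), in a planar zigzag with all
bond angles at 120 degrees. Each bond then advances $b \sin 60^\circ$
along the chain axis, and for even $n$ the perpendicular components cancel,
giving
\beq
r_{con} \approx n \, b \, \frac{\sqrt{3}}{2}
\eeq
For odd $n$ one perpendicular component of $b/2$ remains, so that the
distance becomes $\frac{b}{2}\sqrt{3 n^2 + 1}$. The symbols $n$ and $b$
are not part of the original text, and the estimate that {\tt mdrun}
makes from the actual bond lengths may differ in detail.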
3005
When $r_{mb} > r_c$, {\tt mdrun} employs a smart algorithm to reduce
the communication. Simply communicating all charge groups within
$r_{mb}$ would increase the amount of communication enormously.
Therefore only charge groups that are connected by bonded interactions
to charge groups which are not locally present are communicated.
This leads to little extra communication, but also to a slightly
increased cost for the domain decomposition setup.
In some cases, {\eg} coarse-grained simulations with a very short cut-off,
one might want to set $r_{mb}$ by hand (with {\tt mdrun -rdd}) to reduce this cost.
3015
3016 \subsection{Multiple-Program, Multiple-Data PME parallelization\index{PME}}
3017 \label{subsec:mpmd_pme}
Electrostatic interactions are long-range; therefore special
algorithms are used to avoid summation over many atom pairs.
In {\gromacs} this is usually PME (\secref{pme}).
3024 Since with PME all particles interact with each other, global communication
3025 is required. This will usually be the limiting factor for
3026 scaling with domain decomposition.
3027 To reduce the effect of this problem, we have come up with
3028 a Multiple-Program, Multiple-Data approach~\cite{Hess2008b}.
3029 Here, some processors are selected to do only the PME mesh calculation,
3030 while the other processors, called particle-particle (PP) nodes,
3031 do all the rest of the work.
3032 For rectangular boxes the optimal PP to PME node ratio is usually 3:1,
3033 for rhombic dodecahedra usually 2:1.
3034 When the number of PME nodes is reduced by a factor of 4, the number
3035 of communication calls is reduced by about a factor of 16.
3036 Or put differently, we can now scale to 4 times more nodes.
In addition, for modern 4- or 8-core machines in a network,
3038 the effective network bandwidth for PME is quadrupled,
3039 since only a quarter of the cores will be using the network connection
3040 on each machine during the PME calculations.
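
The factor of 16 can be understood with a simple scaling argument,
given here as a sketch that assumes the number of communication calls $C$
of the global PME communication grows quadratically with the number of
participating nodes, as for an all-to-all exchange. With $M_{PME}$ PME nodes
(both symbols are introduced here for illustration):
\beq
C(M_{PME}) \propto M_{PME}^2 \quad \Rightarrow \quad
\frac{C(M_{PME}/4)}{C(M_{PME})} = \frac{(M_{PME}/4)^2}{M_{PME}^2} = \frac{1}{16}
\eeq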
3041
3042 \begin{figure}
3043 \centerline{\includegraphics[width=12cm]{plots/mpmd-pme}}
3044 \caption{
3045 Example of 8 nodes without (left) and with (right) MPMD.
3046 The PME communication (red arrows) is much higher on the left
3047 than on the right. For MPMD additional PP - PME coordinate
3048 and force communication (blue arrows) is required,
3049 but the total communication complexity is lower.
3050 \label{fig:mpmd_pme}
3051 }
3052 \end{figure}
3053
3054 {\tt mdrun} will by default interleave the PP and PME nodes.
If the processors are not numbered consecutively inside the machines,
one might want to use {\tt mdrun -ddorder pp\_pme}.
For machines with a real 3-D torus and proper communication software
that assigns the processors accordingly, one should use
{\tt mdrun -ddorder cartesian}.
3060
3061 To optimize the performance one should usually set up the cut-offs
3062 and the PME grid such that the PME load is 25 to 33\% of the total
3063 calculation load. {\tt grompp} will print an estimate for this load
3064 at the end and also {\tt mdrun} calculates the same estimate
3065 to determine the optimal number of PME nodes to use.
3066 For high parallelization it might be worthwhile to optimize
3067 the PME load with the {\tt mdp} settings and/or the number
3068 of PME nodes with the {\tt -npme} option of {\tt mdrun}.
For changing the electrostatics settings it is useful to know
that the accuracy of the electrostatics remains nearly constant
when the Coulomb cut-off and the PME grid spacing are scaled
by the same factor.
3073 {\bf Note} that it is usually better to overestimate than to underestimate
3074 the number of PME nodes, since the number of PME nodes is smaller
3075 than the number of PP nodes, which leads to less total waiting time.
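
The recommended PME load and the PP to PME node ratios quoted earlier
in this section are consistent with each other. As a sketch, assuming that
the work is balanced when the fraction of nodes doing PME equals the
PME fraction $p$ of the total calculation load (the symbols $p$, $M_{PP}$
and $M_{PME}$ are introduced here for illustration), the node ratio is
\beq
\frac{M_{PP}}{M_{PME}} = \frac{1-p}{p}
\eeq
which gives 3:1 for $p = 0.25$ and 2:1 for $p = 1/3$.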
3076
3077 The PME domain decomposition can be 1-D or 2-D along the $x$ and/or
3078 $y$ axis. 2-D decomposition is also known as \normindex{pencil decomposition} because of
3079 the shape of the domains at high parallelization.
1-D decomposition along the $y$ axis can only be used when
the PP decomposition has only 1 domain along $x$. With 2-D PME decomposition,
the number of domains along $x$ must be equal to that of
the PP decomposition. {\tt mdrun} automatically chooses 1-D or 2-D
PME decomposition (when possible with the total given number of nodes),
based on the minimum amount of communication for the coordinate redistribution
in PME plus the communication for the grid overlap and transposes.
To avoid superfluous communication of coordinates and forces
between the PP and PME nodes, the number of DD cells in the $x$
direction should ideally be equal to, or a multiple of, the number
of PME nodes. By default, {\tt mdrun} takes care of this issue.
3091
3092 \subsection{Domain decomposition flow chart}
3093 In \figref{dd_flow} a flow chart is shown for domain decomposition
3094 with all possible communication for different algorithms.
For simpler simulations, the same flow chart applies,
with the unused algorithms and their communication left out.
3098
3099 \begin{figure}
3100 \centerline{\includegraphics[width=12cm]{plots/flowchart}}
3101 \caption{
3102 Flow chart showing the algorithms and communication (arrows)
3103 for a standard MD simulation with virtual sites, constraints
3104 and separate PME-mesh nodes.
3105 \label{fig:dd_flow}
3106 }
3107 \end{figure}
3108
3109
3110 \section{Implicit solvation\index{implicit solvation}\index{Generalized Born methods}}
3111 \label{sec:gbsa}
Implicit solvent models provide an efficient way of representing
the electrostatic effects of solvent molecules, while avoiding a
large part of the computation involved in an accurate, explicit
description of the surrounding water in molecular dynamics simulations.
Implicit solvation models offer several advantages compared with
explicit solvation, including eliminating the need for the equilibration of water
around the solute, and the absence of viscosity, which allows the protein
to explore conformational space more quickly.
3120
Implicit solvent calculations in {\gromacs} can be done using the
generalized Born formalism; the Still~\cite{Still97}, HCT~\cite{Truhlar96},
and OBC~\cite{Case04} models are available for calculating the Born radii.
3124
3125 Here, the free energy $G_{solv}$ of solvation is the sum of three terms,
3126 a solvent-solvent cavity term ($G_{cav}$), a solute-solvent van der
3127 Waals term ($G_{vdw}$), and finally a solvent-solute electrostatics
3128 polarization term ($G_{pol}$).
3129
The sum of $G_{cav}$ and $G_{vdw}$ corresponds to the (non-polar)
free energy of solvation for a molecule from which all charges
have been removed, and is commonly called $G_{np}$.
It is calculated from the total solvent-accessible surface area
multiplied by a surface tension.
The total expression for the solvation free energy then becomes:
3136
3137 \beq
3138 G_{solv} = G_{np}  + G_{pol}
3139 \label{eqn:gb_solv}
3140 \eeq
3141
3142 Under the generalized Born model, $G_{pol}$ is calculated from the generalized Born equation~\cite{Still97}:
3143
3144 \beq
3145 G_{pol} = \left(1-\frac{1}{\epsilon}\right) \sum_{i=1}^n \sum_{j>i}^n \frac {q_i q_j}{\sqrt{r^2_{ij} + b_i b_j \exp\left(\frac{-r^2_{ij}}{4 b_i b_j}\right)}}
3146 \label{eqn:gb_still}
3147 \eeq
3148
3149 In {\gromacs}, we have introduced the substitution~\cite{Larsson10}:
3150
3151 \beq
3152 c_i=\frac{1}{\sqrt{b_i}}
3153 \label{eqn:gb_subst}
3154 \eeq
3155
3156 which makes it possible to introduce a cheap transformation to a new
3157 variable $x$ when evaluating each interaction, such that:
3158
3159 \beq
3160 x=\frac{r_{ij}}{\sqrt{b_i b_j }} = r_{ij} c_i c_j
3161 \label{eqn:gb_subst2}
3162 \eeq
3163
In the end, the full re-formulation of Equation~\ref{eqn:gb_still} becomes:
3165
3166 \beq
3167 G_{pol} = \left(1-\frac{1}{\epsilon}\right) \sum_{i=1}^n \sum_{j>i}^n \frac{q_i q_j}{\sqrt{b_i  b_j}} ~\xi (x) = \left(1-\frac{1}{\epsilon}\right) \sum_{i=1}^n q_i c_i \sum_{j>i}^n q_j c_j~\xi (x)
3168 \label{eqn:gb_final}
3169 \eeq
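
The function $\xi(x)$ is not written out explicitly here; its form follows
from comparing Equation~\ref{eqn:gb_final} with Equation~\ref{eqn:gb_still}.
Factoring $b_i b_j$ out of the square root and inserting $x$ from
Equation~\ref{eqn:gb_subst2} gives
\beq
\frac{1}{\sqrt{r^2_{ij} + b_i b_j \exp\left(\frac{-r^2_{ij}}{4 b_i b_j}\right)}}
= \frac{1}{\sqrt{b_i b_j}} \, \frac{1}{\sqrt{x^2 + \exp\left(-\frac{x^2}{4}\right)}}
\eeq
so that, for the two forms to agree,
\beq
\xi(x) = \frac{1}{\sqrt{x^2 + \exp\left(-\frac{x^2}{4}\right)}}
\eeq
This expression is inferred from the two equations rather than quoted
from the original text. It depends on the single variable $x$ only.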
3170
3171 The non-polar part ($G_{np}$) of Equation~\ref{eqn:gb_solv} is calculated
3172 directly from the Born radius of each atom using a simple ACE type
3173 approximation by Schaefer {\em et al.}~\cite{Karplus98}, including a
3174 simple loop over all atoms.
3175 This requires only one extra solvation parameter, independent of atom type,
3176 but differing slightly between the three Born radii models.
3177
3178 % LocalWords:  GROningen MAchine BIOSON Groningen GROMACS Berendsen der Spoel
3179 % LocalWords:  Drunen Comp Phys Comm ROck NS FFT pbc EM ifthenelse gmxlite ff
3180 % LocalWords:  octahedra triclinic Ewald PME PPPM trjconv xy solvated
3181 % LocalWords:  boxtypes boxshapes editconf Lennard mdpopt COM XTC kT defunits
3182 % LocalWords:  Boltzmann's Mueller nb int mdrun chargegroup simplerc prefactor
3183 % LocalWords:  pme waterloops CH NH CO df com virial integrator Verlet vverlet
3184 % LocalWords:  integrators ref timepoint timestep timesteps mdp md vv avek NVE
3185 % LocalWords:  NVT off's leapfrogv lll LR rmfast SPC fs Nos physicality ps GMX
3186 % LocalWords:  Tcoupling nonergodic thermostatting NOSEHOOVER algorithmes ij yx
3187 % LocalWords:  Parrinello Rahman rescales atm anisotropically ccc xz zx yy yz
3188 % LocalWords:  zy zz se barostat compressibilities MTTK NPT Martyna al isobaric
3189 % LocalWords:  Tuckerman vir PV fkT iLt iL Liouville NHC Eq baro mu trj mol bc
3190 % LocalWords:  freezegroup Shannon's polarizability Overhauser barostats iLn KE
3191 % LocalWords:  negligibly thermostatted Tobias  rhombic maxwell et xtc TC rlist
3192 % LocalWords:  waals LINCS holonomic plincs lincs unc ang SA Langevin SD amu BD
3193 % LocalWords:  bfgs Broyden Goldfarb Shanno mkT kJ DFLEXIBLE Nocedal diag nmeig
3194 % LocalWords:  diagonalization anaeig nmens covanal ddg feia BT dp dq pV dV dA
3195 % LocalWords:  NpT eq stepsize REMD constrainted website Okabe MPI covar edi dd
3196 % LocalWords:  progman NMR ddcells innerloops ddtric tric dds rdd conf rcon est
3197 % LocalWords:  mb PP MPMD ddorder pp cartesian grompp npme parallelizable edr
3198 % LocalWords:  macromolecule nstlist vacuo parallelization dof indices MBAR AVX
3199 % LocalWords:  TOL numerics parallelized eigenvectors dG parallelepipeds VdW np
3200 % LocalWords:  Coul multi solvation HCT OBC solv cav vdw Schaefer symplectic dt
3201 % LocalWords:  pymbar multinode subensemble Monte solute subst groupconcept GPU
3202 % LocalWords:  dodecahedron octahedron dodecahedra equilibration usinggroups nm
3203 % LocalWords:  topologies rlistlong CUDA GPUs rcoulomb SIMD BlueGene FPUs erfc
3204 % LocalWords:  cutoffschemesupport unbuffered bondeds AdResS OpenMP ewald rtol
3205 % LocalWords:  verletdrift peptide RMS rescaling ergodicity ergodic discretized
3206 % LocalWords:  isothermal compressibility isotropically anisotropic iteratively
3207 % LocalWords:  incompressible integrations translational biomolecules NMA PCA
3208 % LocalWords:  Bennett's equilibrated Hamiltonians covariance equilibrate
3209 % LocalWords:  inhomogeneous conformational online other's th