implement.tex

   1 % This file is part of the GROMACS molecular simulation package.
   2 %
   3 % Copyright (c) 2013, by the GROMACS development team, led by
   4 % David van der Spoel, Berk Hess, Erik Lindahl, and including many
   5 % others, as listed in the AUTHORS file in the top-level source
   6 % directory and at http://www.gromacs.org.
   7 %
   8 % GROMACS is free software; you can redistribute it and/or
   9 % modify it under the terms of the GNU Lesser General Public License
  10 % as published by the Free Software Foundation; either version 2.1
  11 % of the License, or (at your option) any later version.
  12 %
  13 % GROMACS is distributed in the hope that it will be useful,
  14 % but WITHOUT ANY WARRANTY; without even the implied warranty of
  15 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  16 % Lesser General Public License for more details.
  17 %
  18 % You should have received a copy of the GNU Lesser General Public
  19 % License along with GROMACS; if not, see
  20 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
  21 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA.
  22 %
  23 % If you want to redistribute modifications to GROMACS, please
  24 % consider that scientific software is very special. Version
  25 % control is crucial - bugs must be traceable. We will be happy to
  26 % consider code for inclusion in the official distribution, but
  27 % derived work must not be called official GROMACS. Details are found
  28 % in the README & COPYING files - if they are missing, get the
  29 % official version at http://www.gromacs.org.
  30 %
  31 % To help us fund GROMACS development, we humbly ask that you cite
  32 % the research papers on the package. Check out http://www.gromacs.org
  33
  34 \chapter{Some implementation details}
  35 In this chapter we will present some implementation details. This is
  36 far from complete, but we deemed it necessary to clarify some things
  37 that would otherwise be hard to understand.
  38
  39 \section{Single Sum Virial in {\gromacs}}
  40 \label{sec:virial}
  41 The \normindex{virial} $\Xi$ can be written in full tensor form as:
  42 \beq
  43 \Xi~=~-\half~\sum_{i < j}^N~\rvij\otimes\Fvij
  44 \eeq
  45 where $\otimes$ denotes the {\em direct product} of two vectors.\footnote
  46 {$({\bf u}\otimes{\bf v})^{\ab}~=~{\bf u}_{\al}{\bf v}_{\be}$} When this is
  47 computed in the inner loop of an MD program 9 multiplications and 9
  48 additions are needed.\footnote{The calculation of
  49 Lennard-Jones and Coulomb forces is about 50 floating point operations.}
  50
  51 Here it is shown how it is possible to extract the virial calculation
  52 from the inner loop~\cite{Bekker93b}.
  53
  54 \subsection{Virial}
  55 In a system with \index{periodic boundary conditions}, the
  56 periodicity must be taken into account for the virial:
  57 \beq
  58 \Xi~=~-\half~\sum_{i < j}^{N}~\rnij\otimes\Fvij
  59 \eeq
  60 where $\rnij$ denotes the distance vector of the
  61 {\em nearest image} of atom $i$ from atom $j$. In this definition we add
  62 a {\em shift vector} $\delta_i$ to the position vector $\rvi$
  63 of atom $i$. The difference vector $\rnij$ is thus equal to:
  64 \beq
  65 \rnij~=~\rvi+\delta_i-\rvj
  66 \eeq
  67 or in shorthand:
  68 \beq
  69 \rnij~=~\rni-\rvj
  70 \eeq
  71 In a triclinic system, there are 27 possible images of $i$; when a truncated
  72 octahedron is used, there are 15 possible images.
  73
  74 \subsection{Virial from non-bonded forces}
  75 Here the derivation for the single sum virial in the {\em non-bonded force}
  76 routine is given. $i \neq j$ in all formulae below.
  77 \newcommand{\di}{\delta_{i}}
  78 \newcommand{\qrt}{\frac{1}{4}}
  79 \bea
  80 \Xi
  81 &~=~&-\half~\sum_{i < j}^{N}~\rnij\otimes\Fvij                          \\
  82 &~=~&-\qrt\sum_{i=1}^N~\sum_{j=1}^N ~(\rvi+\di-\rvj)\otimes\Fvij        \\
  83 &~=~&-\qrt\sum_{i=1}^N~\sum_{j=1}^N ~(\rvi+\di)\otimes\Fvij-\rvj\otimes\Fvij    \\
  84 &~=~&-\qrt\left(\sum_{i=1}^N~\sum_{j=1}^N ~(\rvi+\di)\otimes\Fvij~-~\sum_{i=1}^N~\sum_{j=1}^N ~\rvj\otimes\Fvij\right)  \\
  85 &~=~&-\qrt\left(\sum_{i=1}^N~(\rvi+\di)\otimes\sum_{j=1}^N~\Fvij~-~\sum_{j=1}^N ~\rvj\otimes\sum_{i=1}^N~\Fvij\right)   \\
  86 &~=~&-\qrt\left(\sum_{i=1}^N~(\rvi+\di)\otimes\Fvi~+~\sum_{j=1}^N ~\rvj\otimes\Fvj\right)       \\
  87 &~=~&-\qrt\left(2~\sum_{i=1}^N~\rvi\otimes\Fvi+\sum_{i=1}^N~\di\otimes\Fvi\right)
  88 \eea
  89 In these formulae we introduced:
  90 \bea
  91 \Fvi&~=~&\sum_{j=1}^N~\Fvij                                     \\
  92 \Fvj&~=~&\sum_{i=1}^N~\Fvji
  93 \eea
  94 which is the total force on $i$ with respect to $j$. Because we use Newton's Third Law:
  95 \beq
  96 \Fvij~=~-\Fvji
  97 \eeq
  98 we must, in the implementation, double the term containing the shift $\delta_i$.
  99
 100 \subsection{The intra-molecular shift (mol-shift)}
 101 For the bonded forces and SHAKE it is possible to make a {\em mol-shift}
 102 list, in which the periodicity is stored. We simple have an array {\tt mshift}
 103 in which for each atom an index in the {\tt shiftvec} array is stored.
 104
 105 The algorithm to generate such a list can be derived from graph theory,
 106 considering each particle in a molecule as a bead in a graph, the bonds
 107 as edges.
 108 \begin{enumerate}
 109 \item[1]        Represent the bonds and atoms as bidirectional graph
 110 \item[2]        Make all atoms white
 111 \item[3]        Make one of the white atoms black (atom $i$) and put it in the
 112                 central box
 113 \item[4]        Make all of the neighbors of $i$ that are currently
 114                 white, gray
 115 \item[5]        Pick one of the gray atoms (atom $j$), give it the
 116                 correct periodicity with respect to any of
 117                 its black neighbors
 118                 and make it black
 119 \item[6]        Make all of the neighbors of $j$ that are currently
 120                 white, gray
 121 \item[7]        If any gray atom remains, go to [5]
 122 \item[8]        If any white atom remains, go to [3]
 123 \end{enumerate}
 124 Using this algorithm we can
 125 \begin{itemize}
 126 \item   optimize the bonded force calculation as well as SHAKE
 127 \item   calculate the virial from the bonded forces
 128         in the single sum method again
 129 \end{itemize}
 130
 131 Find a representation of the bonds as a bidirectional graph.
 132
 133 \subsection{Virial from Covalent Bonds}
 134 Since the covalent bond force gives a contribution to the virial, we have:
 135 \bea
 136 b       &~=~&   \|\rnij\|                                       \\
 137 V_b     &~=~&   \half k_b(b-b_0)^2                              \\
 138 \Fvi    &~=~&   -\nabla V_b                                     \\
 139         &~=~&   k_b(b-b_0)\frac{\rnij}{b}                       \\
 140 \Fvj    &~=~&   -\Fvi
 141 \eea
 142 The virial contribution from the bonds then is:
 143 \bea
 144 \Xi_b   &~=~&   -\half(\rni\otimes\Fvi~+~\rvj\otimes\Fvj)       \\
 145         &~=~&   -\half\rnij\otimes\Fvi
 146 \eea
 147
 148 \subsection{Virial from SHAKE}
 149 An important contribution to the virial comes from shake. Satisfying
 150 the constraints a force {\bf G} that is exerted on the particles ``shaken.'' If this
 151 force does not come out of the algorithm (as in standard SHAKE) it can be
 152 calculated afterward (when using {\em leap-frog}) by:
 153 \bea
 154 \Delta\rvi&~=~&\rvi(t+\Dt)-
 155 [\rvi(t)+{\bf v}_i(t-\frac{\Dt}{2})\Dt+\frac{\Fvi}{m_i}\Dt^2]   \\
 156 {\bf G}_i&~=~&\frac{m_i\Delta\rvi}{\Dt^2}
 157 \eea
 158 This does not help us in the general case. Only when no periodicity
 159 is needed (like in rigid water) this can be used, otherwise
 160 we must add the virial calculation in the inner loop of SHAKE.
 161
 162 When it {\em is} applicable the virial can be calculated in the single sum way:
 163 \beq
 164 \Xi~=~-\half\sum_i^{N_c}~\rvi\otimes\Fvi
 165 \eeq
 166 where $N_c$ is the number of constrained atoms.
 167
 168 %Another method is the Non-Iterative shake as proposed (and implemented)
 169 %by Yoneya. In this algorithm the Lagrangian multipliers are solved in a
 170 %matrix equation, and given these multipliers it is easy to get the periodicity
 171 %in the virial afterwards.
 172
 173 %More...
 174
 175
 176 \section{Optimizations}
 177 Here we describe some of the algorithmic optimizations used
 178 in {\gromacs}, apart from parallelism.
 179 One of these, the implementation of the
 180 1.0/sqrt(x) function is treated separately in \secref{sqrt}.
 181 The most important other optimizations are described below.
 182
 183 \subsection{Inner Loops for Water}
 184 \label{sec:waterloops}
 185 {\gromacs} uses special inner loops to calculate non-bonded
 186 interactions for water molecules with other atoms, and yet
 187 another set of loops for interactions between pairs of
 188 water molecules. There highly optimized loops for two types of water models.
 189 For three site models similar to
 190 SPC~\cite{Berendsen81}, {\ie}:
 191 \begin{enumerate}
 192 \item   There are three atoms in the molecule.
 193 \item   The whole molecule is a single charge group.
 194 \item   The first atom has Lennard-Jones (\secref{lj}) and
 195         Coulomb (\secref{coul}) interactions.
 196 \item   Atoms two and three have only Coulomb interactions,
 197         and equal charges.
 198 \end{enumerate}
 199 These loops also works for the SPC/E~\cite{Berendsen87} and
 200 TIP3P~\cite{Jorgensen83} water models.
 201 And for four site water models similar to TIP4P~\cite{Jorgensen83}:
 202 \begin{enumerate}
 203 \item   There are four atoms in the molecule.
 204 \item   The whole molecule is a single charge group.
 205 \item   The first atom has only Lennard-Jones (\secref{lj}) interactions.
 206 \item   Atoms two and three have only Coulomb (\secref{coul}) interactions,
 207         and equal charges.
 208 \item   Atom four has only Coulomb interactions.
 209 \end{enumerate}
 210
 211 The benefit of these implementations is that there are more floating-point
 212 operations in a single loop, which implies that some compilers
 213 can schedule the code better. However, it turns out that even
 214 some of the most advanced compilers have problems with scheduling,
 215 implying that manual tweaking is necessary to get optimum
 216 \normindex{performance}.
 217 This may include common-sub-expression elimination, or moving
 218 code around.
 219
 220 \subsection{Fortran Code}
 221 Unfortunately, on a few platforms \normindex{Fortran} compilers are
 222 still better than C-compilers. For some machines ({\eg} SGI
 223 Power Challenge) the difference may be up to a factor of 3, in the
 224 case of vector computers this may be even larger. Therefore, some of
 225 the routines that take up a lot of computer time have been translated
 226 into Fortran and even assembly code for Intel and AMD x86 processors.
 227 In most cases, the Fortran or assembly loops should be selected
 228 automatically by the {\tt configure} script when appropriate, but you can
 229 also tweak this by setting options to the {\tt configure} script.
 230
 231 \section{Computation of the 1.0/sqrt function}
 232 \label{sec:sqrt}
 233 \subsection{Introduction}
 234 The {\gromacs} project started with the development of a $1/\sqrt{x}$
 235 processor that calculates:
 236 \begin{equation}
 237 Y(x) = \frac{1}{ \sqrt{x} }
 238 \end{equation}
 239 As the project continued, the {\intel} processor was used to implement
 240 {\gromacs}, which now turned into almost a full software project.  The
 241 $1/\sqrt{x}$ processor was implemented using a Newton-Raphson
 242 iteration scheme for one step. For this it needed look-up tables to
 243 provide the initial approximation. The $1/\sqrt{x}$ function makes it
 244 possible to use two almost independent tables for the exponent seed
 245 and the fraction seed with the IEEE floating-point representation.
 246
 247 \subsection{General}
 248 According to~\cite{Bekker87} the $1/\sqrt{x}$ function can be evaluated using
 249 the Newton-Raphson iteration scheme. The inverse function is:
 250 \begin{equation}
 251 X(y) = \frac{1}{y^{2}}
 252 \end{equation}
 253 So instead of calculating:
 254 \begin{equation}
 255 Y(a) = q
 256 \end{equation}
 257 the equation:
 258 \begin{equation}
 259 X(q) - a = 0
 260 \label{eqn:inversef}
 261 \end{equation}
 262 can now be solved using Newton-Raphson. An iteration is performed by
 263 calculating:
 264 \begin{equation}
 265 y_{n+1} = y_{n} - \frac{f(y_{n})}{f'(y_{n})}
 266 \label{eqn:nr}
 267 \end{equation}
 268 The absolute error $\varepsilon$, in this approximation is defined by:
 269 \begin{equation}
 270 \varepsilon \equiv y_{n} - q
 271 \end{equation}
 272 Using Taylor series expansion to estimate the error results in:
 273 \begin{equation}
 274 \varepsilon _{n+1} = - \frac{\varepsilon _{n}^{2}}{2}
 275                        \frac{ f''(y_{n})}{f'(y_{n})}
 276 \label{eqn:taylor}
 277 \end{equation}
 278 according to~\cite{Bekker87} equation (3.2). This is an estimation of the
 279 absolute error.
 280
 281 \subsection{Applied to floating-point numbers}
 282 Floating-point numbers in IEEE 32 bit single-precision format have a nearly
 283 constant relative error of $\Delta x / x = 2^{-24}$. As seen earlier in the
 284 Taylor series expansion equation (\eqnref{taylor}), the error in every
 285 iteration step is absolute and in general dependent of $y$. If the error is
 286 expressed as a relative error $\varepsilon_{r}$ the following holds:
 287 \begin{equation}
 288 \varepsilon _{{r}_{n+1}} \equiv \frac{\varepsilon_{n+1}}{y}
 289 \end{equation}
 290 and so:
 291 \begin{equation}
 292 \varepsilon _{{r}_{n+1}} =
 293 - ( \frac{\varepsilon _{n}}{y} )^{2} y \frac{ f''}{2f'}
 294 \end{equation}
 295 For the function $f(y) = y^{-2}$ the term $y f''/2f'$ is constant (equal
 296 to $-3/2$) so the relative error $\varepsilon _{r_{n}}$ is independent of $y$.
 297 \begin{equation}
 298 \varepsilon _{{r}_{n+1}} =
 299 \frac{3}{2} (\varepsilon_{r_{n}})^{2}
 300 \label{eqn:epsr}
 301 \end{equation}
 302
 303 The conclusion of this is that the function $1/\sqrt{x}$ can be
 304 calculated with a specified accuracy.
 305
 306 \begin{figure}
 307 \begin{center}
 308 \newcommand{\twltt}{\tt}
 309 \setlength{\unitlength}{0.0080in}
 310 \begin{picture}(489,176)(40,390)
 311 \thicklines
 312 \put(180,505){$\underbrace{\hspace{2.68in}}$}
 313 \put( 60,505){$\underbrace{\hspace{0.88in}}$}
 314 \put( 40,510){\framebox(480,30){}}
 315 \put( 45,505){\vector( 0,-1){ 15}}
 316 \put( 40,540){\dashbox{4}(0,0){}}
 317 \multiput(250,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 318 \multiput(220,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 319 \multiput(190,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 320 \multiput(280,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 321 \multiput(310,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 322 \multiput(340,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 323 \multiput(370,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 324 \multiput(400,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 325 \multiput(430,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 326 \multiput(460,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 327 \multiput(490,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 328 \multiput(205,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 329 \multiput(235,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 330 \multiput(265,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 331 \multiput(295,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 332 \multiput(325,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 333 \multiput(355,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 334 \multiput(385,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 335 \multiput(415,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 336 \multiput(445,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 337 \multiput(475,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 338 \multiput(505,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 339 \put( 40,510){\framebox(480,30){}}
 340 \put( 40,540){\dashbox{4}(0,0){}}
 341 \multiput(250,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 342 \multiput(220,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 343 \multiput(190,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 344 \multiput(160,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 345 \multiput(130,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 346 \multiput(100,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 347 \multiput(280,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 348 \multiput(310,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 349 \multiput(340,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 350 \multiput(370,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 351 \multiput(400,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 352 \multiput(430,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 353 \multiput(460,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 354 \multiput(490,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 355 \multiput( 70,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 356 \put( 55,540){\line( 0,-1){ 30}}
 357 \multiput( 85,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 358 \multiput(115,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 359 \multiput(145,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 360 \multiput(205,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 361 \multiput(235,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 362 \multiput(265,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 363 \multiput(295,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 364 \multiput(325,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 365 \multiput(355,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 366 \multiput(385,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 367 \multiput(415,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 368 \multiput(445,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 369 \multiput(475,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 370 \multiput(505,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 371 \put(175,540){\line( 0,-1){ 30}}
 372 \put(175,540){\line( 0,-1){ 30}}
 373 \multiput(145,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 374 \multiput(115,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 375 \multiput( 85,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 376 \put( 55,540){\line( 0,-1){ 30}}
 377 \multiput( 70,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 378 \multiput(100,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 379 \multiput(130,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 380 \multiput(160,540)(0.00000,-8.57143){4}{\line( 0,-1){  4.286}}
 381 \put(345,470){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $F$}}}
 382 \put(110,470){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $E$}}}
 383 \put( 40,470){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $S$}}}
 384 \put(505,545){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $0$}}}
 385 \put(160,545){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $23$}}}
 386 \put( 40,545){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $31$}}}
 387 \put(140,400){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $Value=(-1)^{S}(2^{E-127})(1.F)$}}}
 388 \put(505,545){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $0$}}}
 389 \put(160,545){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $23$}}}
 390 \put( 40,545){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $31$}}}
 391 \put(140,400){\makebox(0,0)[lb]{\raisebox{0pt}[0pt][0pt]{\twltt $Value=(-1)^{S}(2^{E-127})(1.F)$}}}
 392 \end{picture}
 393 \end{center}
 394 \caption{IEEE single-precision floating-point format}
 395 \label{fig:ieee}
 396 \end{figure}
 397
 398 \subsection{Specification of the look-up table}
 399 To calculate the function $1/\sqrt{x}$ using the previously mentioned
 400 iteration scheme, it is clear that the first estimation of the solution must
 401 be accurate enough to get precise results. The requirements for the
 402 calculation are
 403 \begin{itemize}
 404 \item Maximum possible accuracy with the used IEEE format
 405 \item Use only one iteration step for maximum speed
 406 \end{itemize}
 407
 408 The first requirement states that the result of $1/\sqrt{x}$ may have a
 409 relative error $\varepsilon_{r}$ equal to the $\varepsilon_{r}$ of a IEEE 32
 410 bit single-precision floating-point number. From this, the $1/\sqrt{x}$
 411 of the initial approximation can be derived, rewriting the definition of
 412 the relative error for succeeding steps (\eqnref{epsr}):
 413 \begin{equation}
 414 \frac{\varepsilon_{n}}{y} =
 415 \sqrt{\varepsilon_ {r_{n+1}} \frac{2f'}{yf''}}
 416 \end{equation}
 417 So for the look-up table the needed accuracy is:
 418 \begin{equation}
 419 \frac{\Delta Y}{Y} = \sqrt{\frac{2}{3} 2^{-24}}
 420 \label{eqn:accy}
 421 \end{equation}
 422 which defines the width of the table that must be $\geq 13$ bit.
 423
 424 At this point the relative error, $\varepsilon_{r_{n}}$, of the look-up table
 425 is known. From this the maximum relative error in the argument can be
 426 calculated as follows. The absolute error $\Delta x$ is defined as:
 427 \begin{equation}
 428 \Delta x \equiv \frac{\Delta Y}{Y'}
 429 \end{equation}
 430 and thus:
 431 \begin{equation}
 432 \frac{\Delta x}{Y} = \frac{\Delta Y}{Y} (Y')^{-1}
 433 \end{equation}
 434 and thus:
 435 \begin{equation}
 436 \Delta x = constant \frac{Y}{Y'}
 437 \end{equation}
 438 For the $1/\sqrt{x}$ function, $Y / Y' \sim x$ holds, so
 439 $\Delta x / x = constant$. This is a property of the used floating-point
 440 representation as earlier mentioned. The needed accuracy of the argument of the
 441 look-up table follows from:
 442 \begin{equation}
 443 \frac{\Delta x}{x} = -2 \frac{\Delta Y}{Y}
 444 \end{equation}
 445 So, using the floating-point accuracy (\eqnref{accy}):
 446 \begin{equation}
 447 \frac{\Delta x}{x} = -2 \sqrt{\frac{2}{3} 2^{-24}}
 448 \end{equation}
 449 This defines the length of the look-up table which should be $\geq 12$ bit.
 450
 451 \subsection{Separate exponent and fraction computation}
 452 The used IEEE 32 bit single-precision floating-point format specifies
 453 that a number is represented by a exponent and a fraction. The previous
 454 section specifies for every possible floating-point number the look-up table
 455 length and width. Only the size of the fraction of a floating-point number
 456 defines the accuracy. The conclusion from this can be that the size of the
 457 look-up table is length of look-up table, earlier specified, times the size of
 458 the exponent ($2^{12}2^{8}, 1Mb$). The $1/\sqrt{x}$  function has the
 459 property that the exponent is independent of the fraction. This becomes clear
 460 if the floating-point representation is used. Define:
 461 \begin{equation}
 462 x \equiv (-1)^{S}(2^{E-127})(1.F)
 463 \label{eqn:fpdef}
 464 \end{equation}
 465 See \figref{ieee}, where $0 \leq S \leq 1$, $0 \leq E \leq 255$,
 466 $1 \leq 1.F < 2$ and $S$, $E$, $F$ integer (normalization conditions).
 467 The sign bit ($S$) can be omitted because $1/\sqrt{x}$ is only defined
 468 for $x > 0$. The $1/\sqrt{x}$ function applied to $x$ results in:
 469 \begin{equation}
 470 y(x) = \frac{1}{\sqrt{x}}
 471 \end{equation}
 472 or:
 473 \begin{equation}
 474 y(x) = \frac{1}{\sqrt{(2^{E-127})(1.F)}}
 475 \end{equation}
 476 This can be rewritten as:
 477 \begin{equation}
 478 y(x) = (2^{E-127})^{-1/2}(1.F)^{-1/2}
 479 \label{eqn:yx}
 480 \end{equation}
 481 Define:
 482 \begin{equation}
 483 (2^{E'-127}) \equiv (2^{E-127})^{-1/2}
 484 \end{equation}
 485 \begin{equation}
 486 1.F'\equiv (1.F)^{-1/2}
 487 \end{equation}
 488 Then $\frac{1}{\sqrt{2}} < 1.F' \leq 1$ holds, so the condition
 489 $1 \leq 1.F' < 2$, which is essential for normalized real representation, is
 490 not valid anymore. By introducing an extra term, this can be corrected.
 491 Rewrite the $1/\sqrt{x}$ function applied to floating-point numbers (\eqnref{yx}) as:
 492 \begin{equation}
 493 y(x) = (2^{\frac{127-E}{2}-1}) (2(1.F)^{-1/2})
 494 \end{equation}
 495 and:
 496 \begin{equation}
 497 (2^{E'-127}) \equiv (2^{\frac{127-E}{2}-1})
 498 \label{eqn:exp}
 499 \end{equation}
 500 \begin{equation}
 501 1.F'\equiv 2(1.F)^{-1/2}
 502 \label{eqn:frac}
 503 \end{equation}
 504 Then $\sqrt{2} < 1.F \leq 2$ holds. This is not the exact valid range as
 505 defined for normalized floating-point numbers in \eqnref{fpdef}.
 506 The value $2$  causes the problem. By mapping this value on the nearest
 507 representation $< 2$, this can be solved. The small error that is introduced
 508 by this approximation is within the allowable range.
 509
 510 The integer representation of the exponent is the next problem. Calculating
 511 $(2^{\frac{127-E}{2}-1})$ introduces a fractional result if $(127-E) = odd$.
 512 This is again easily accounted for by splitting up the calculation into an
 513 odd and an even part. For $(127-E) = even$ $E'$ in equation (\eqnref{exp})
 514 can be exactly calculated in integer arithmetic as a function of $E$.
 515 \begin{equation}
 516 E' = \frac{127-E}{2} + 126
 517 \end{equation}
 518
 519 For $(127-E) = odd$ equation (\eqnref{yx}) can be rewritten as:
 520 \begin{equation}
 521 y(x) = (2^{\frac{127-E-1}{2}})(\frac{1.F}{2})^{-1/2}
 522 \end{equation}
 523 Thus:
 524 \begin{equation}
 525 E' = \frac{126-E}{2} + 127
 526 \end{equation}
 527 which also can be calculated exactly in integer arithmetic.
 528 {\bf Note} that the fraction is automatically corrected for its range earlier
 529 mentioned, so the exponent does not need an extra correction.
 530
 531 The conclusions from this are:
 532 \begin{itemize}
 533 \item The fraction and exponent look-up table are independent. The fraction
 534 look-up table exists of two tables (odd and even exponent) so the odd/even
 535 information of the exponent (lsb bit) has to be used to select the right
 536 table.
 537 \item The exponent table is an 256 x 8 bit table, initialized for $odd$
 538 and $even$.% as specified before
 539 \end{itemize}
 540
 541 \subsection{Implementation}
 542 The look-up tables can be generated by a small C program, which uses
 543 floating-point numbers and operations with IEEE 32 bit single-precision format.
 544 Note that because of the $odd$/$even$ information that is needed, the
 545 fraction table is twice the size earlier specified (13 bit i.s.o. 12 bit).
 546 %\figref{expgen} in the appendix shows how to fill the exponent table,
 547 %\figref{fractgen} shows how to fill the fraction table.
 548
 549 The function according to~\eqnref{nr} has to be implemented.
 550 Applied to the $1/\sqrt{x}$ function, equation~\eqnref{inversef} leads to:
 551 \begin{equation}
 552 f = a - \frac{1}{y^{2}}
 553 \end{equation}
 554 and so:
 555 \begin{equation}
 556 f' = \frac{2}{y^{3}}
 557 \end{equation}
 558 so:
 559 \begin{equation}
 560 y_{n+1} = y_{n} - \frac{ a - \frac{1}{y^{2}_{n}} }{ \frac{2}{y^{3}_{n}} }
 561 \end{equation}
 562 or:
 563 \begin{equation}
 564 y_{n+1} = \frac{y_{n}}{2} (3 - a y^{2}_{n})
 565 \end{equation}
 566 Where $y_{0}$ can be found in the look-up tables, and $y_{1}$ gives the result
 567 to the maximum accuracy.
 568 %In \figref{as_implementation}) the assembler implementation is given.
 569 It is clear that only one iteration extra (in double
 570 precision) is needed for a double-precision result.
 571
 572 \section{Modifying GROMACS}
 573 The following files have to be edited in case you want to add a bonded
 574 potential of any type.
 575 \begin{enumerate}
 576 \item {\tt include/bondf.h}
 577 \item {\tt include/types/idef.h}
 578 \item {\tt include/types/nrnb.h}
 579 \item {\tt include/types/enums.h}
 580 \item {\tt include/grompp.h}
 581 \item {\tt src/kernel/topdirs.c}
 582 \item {\tt src/gmxlib/tpxio.c}
 583 \item {\tt src/gmxlib/bondfree.c}
 584 \item {\tt src/gmxlib/ifunc.c}
 585 \item {\tt src/gmxlib/nrnb.c}
 586 \item {\tt src/kernel/convparm.c}
 587 \item {\tt src/kernel/topdirs.c}
 588 \item {\tt src/kernel/topio.c}
 589 \end{enumerate}
 590
 591 % LocalWords:  Virial virial triclinic intra mol mshift shiftvec sqrt SPC lj yf
 592 % LocalWords:  coul Fortran SGI AMD Raphson IEEE taylor epsr accy ieee yx fpdef
 593 % LocalWords:  lsb nr inversef src formulae GROMACS