2 % This file is part of the GROMACS molecular simulation package.
4 % Copyright (c) 2013,2014, by the GROMACS development team, led by
5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
6 % and including many others, as listed in the AUTHORS file in the
7 % top-level source directory and at http://www.gromacs.org.
9 % GROMACS is free software; you can redistribute it and/or
10 % modify it under the terms of the GNU Lesser General Public License
11 % as published by the Free Software Foundation; either version 2.1
12 % of the License, or (at your option) any later version.
14 % GROMACS is distributed in the hope that it will be useful,
15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
17 % Lesser General Public License for more details.
19 % You should have received a copy of the GNU Lesser General Public
20 % License along with GROMACS; if not, see
21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
24 % If you want to redistribute modifications to GROMACS, please
25 % consider that scientific software is very special. Version
26 % control is crucial - bugs must be traceable. We will be happy to
27 % consider code for inclusion in the official distribution, but
28 % derived work must not be called official GROMACS. Details are found
29 % in the README & COPYING files - if they are missing, get the
30 % official version at http://www.gromacs.org.
32 % To help us fund GROMACS development, we humbly ask that you cite
33 % the research papers on the package. Check out http://www.gromacs.org.
35 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
37 % AVERAGES AND FLUCTUATIONS
39 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
40 \chapter{Averages and fluctuations
}
41 \section{Formulae for averaging
}
42 {\bf Note:
} this section was taken from ref~
\cite{Gunsteren94a
}.
44 When analyzing a MD trajectory averages $
\left<x
\right>$ and fluctuations
46 \left<(
\Delta x)^
2\right>^
{\half} ~=~
\left<
[x-
\left<x
\right>
]^
2\right>^
{\half}
49 of a quantity $x$ are to be computed.
50 The variance $
\sigma_x$ of a series of N$_x$ values,
51 \
{x$_i$\
}, can be computed from
53 \sigma_x~=~
\sum_{i=
1}^
{N_x
} x_i^
2 ~-~
\frac{1}{N_x
}\left(
\sum_{i=
1}^
{N_x
}x_i
\right)^
2
56 Unfortunately this formula is numerically not very accurate,
57 especially when $
\sigma_x^
{\half}$ is small compared to the values of $x_i$.
58 The following (equivalent) expression is numerically more accurate
60 \sigma_x ~=~
\sum_{i=
1}^
{N_x
} [x_i -
\left<x
\right>
]^
2
64 \left<x
\right> ~=~
\frac{1}{N_x
} \sum_{i=
1}^
{N_x
} x_i
67 Using ~
\eqnsref{var1
}{var2
} one has to go
68 through the series of $x_i$ values twice, once to determine
69 $
\left<x
\right>$ and again to
71 whereas
\eqnref{var0
} requires only one sequential scan of
72 the series \
{x$_i$\
}. However, one may cast
\eqnref{var1
} in
73 another form, containing partial sums, which allows for a sequential
74 update algorithm. Define the partial sum
76 X_
{n,m
} ~=~
\sum_{i=n
}^
{m
} x_i
78 and the partial variance
80 \sigma_{n,m
} ~=~
\sum_{i=n
}^
{m
} \left[x_i -
\frac{X_
{n,m
}}{m-n+
1}\right]^
2
85 X_
{n,m+k
} ~=~ X_
{n,m
} + X_
{m+
1,m+k
}
90 \sigma_{n,m+k
} &=&
\sigma_{n,m
} +
\sigma_{m+
1,m+k
} +
\left[~
\frac {X_
{n,m
}}{m-n+
1} -
\frac{X_
{n,m+k
}}{m+k-n+
1}~
\right]^
2~*
\nonumber\\
91 && ~
\frac{(m-n+
1)(m+k-n+
1)
}{k
}
92 \label{eqn:varpartial
}
96 \sigma_{1,m+k
} ~=~
\sigma_{1,m
} +
\sigma_{m+
1,m+k
}~+~
97 \left[~
\frac{X_
{1,m
}}{m
} -
\frac{X_
{1,m+k
}}{m+k
}~
\right]^
2~
\frac{m(m+k)
}{k
}
100 and for $n=
1$ and $k=
1$ ~(
\eqnref{varpartial
}) becomes
102 \sigma_{1,m+
1} &=&
\sigma_{1,m
} +
103 \left[\frac{X_
{1,m
}}{m
} -
\frac{X_
{1,m+
1}}{m+
1}\right]^
2 m(m+
1)\\
105 \frac {[~X_
{1,m
} - m x_
{m+
1}~
]^
2}{m(m+
1)
}
106 \label{eqn:simplevar0
}
108 where we have used the relation
110 X_
{1,m+
1} ~=~ X_
{1,m
} + x_
{m+
1}
111 \label{eqn:simplevar1
}
113 Using formulae~(
\eqnref{simplevar0
}) and ~(
\eqnref{simplevar1
}) the average
115 \left<x
\right> ~=~
\frac{X_
{1,N_x
}}{N_x
}
119 \left<(
\Delta x)^
2\right>^
{\half} =
\left[\frac {\sigma_{1,N_x
}}{N_x
}\right]^
{\half}
121 can be obtained by one sweep through the data.
123 \section{Implementation
}
124 In
{\gromacs} the instantaneous
125 energies $E(m)$ are stored in the
\swapindex{energy
}{file
}, along with the
126 values of $
\sigma_{1,m
}$ and $X_
{1,m
}$. Although the steps are counted from
0,
127 for the energy and fluctuations steps are counted from
1. This means that the
128 equations presented here are the ones that are implemented.
129 We give somewhat lengthy derivations in this section
130 to simplify checking of code and equations later on.
132 \subsection{Part of a Simulation
}
133 It is not uncommon to perform a simulation where the first part,
134 {\eg} 100 ps, is taken as
\normindex{equilibration
}. However, the
135 averages and fluctuations as printed in the
\swapindex{log
}{file
}
136 are computed over the whole simulation. The equilibration time,
137 which is now part of the simulation, may in such a case invalidate the
138 averages and fluctuations, because these numbers are now dominated
139 by the initial drift towards equilibrium.
141 Using~
\eqnsref{Xpartial
}{varpartial
} the average and
142 standard deviation over part of the trajectory can be computed as:
144 X_
{m+
1,m+k
} &=& X_
{1,m+k
} - X_
{1,m
} \\
145 \sigma_{m+
1,m+k
} &=&
\sigma_{1,m+k
}-
\sigma_{1,m
} -
\left[~
\frac{X_
{1,m
}}{m
} -
\frac{X_
{1,m+k
}}{m+k
}~
\right]^
{2}~
\frac{m(m+k)
}{k
}
148 or, more generally (with $p
\geq 1$ and $q
\geq p$):
150 X_
{p,q
} &=& X_
{1,q
} - X_
{1,p-
1} \\
151 \sigma_{p,q
} &=&
\sigma_{1,q
}-
\sigma_{1,p-
1} -
\left[~
\frac{X_
{1,p-
1}}{p-
1} -
\frac{X_
{1,q
}}{q
}~
\right]^
{2}~
\frac{(p-
1)q
}{q-p+
1}
153 {\bf Note
} that implementation of this is not entirely trivial, since energies
154 are not stored every time step of the simulation. We therefore have to construct
155 $X_
{1,p-
1}$ and $
\sigma_{1,p-
1}$ from the information at time $p$ using
156 \eqnsref{simplevar0
}{simplevar1
}:
158 X_
{1,p-
1} &=& X_
{1,p
} - x_p \\
159 \sigma_{1,p-
1} &=&
\sigma_{1,p
} -
\frac {[~X_
{1,p-
1} - (p-
1) x_
{p
}~
]^
2}{(p-
1)p
}
162 \subsection{Combining two simulations
}
163 Another frequently occurring problem is, that the fluctuations of two simulations
164 must be combined. Consider the following example: we have two simulations
165 (A) of $n$ and (B) of $m$ steps, in which the second simulation is a
166 continuation of the first. However, the second simulation starts numbering from
1
167 instead of from $n+
1$. For the partial sum
168 this is no problem, we have to add $X_
{1,n
}^A$ from run A:
170 X_
{1,n+m
}^
{AB
} ~=~ X_
{1,n
}^A + X_
{1,m
}^B
173 When we want to compute the partial variance from the two components we have to
174 make a correction $
\Delta\sigma$:
176 \sigma_{1,n+m
}^
{AB
} ~=~
\sigma_{1,n
}^A +
\sigma_{1,m
}^B +
\Delta\sigma
178 if we define $x_i^
{AB
}$ as the combined and renumbered set of data points we can
181 \sigma_{1,n+m
}^
{AB
} ~=~
\sum_{i=
1}^
{n+m
} \left[x_i^
{AB
} -
\frac{X_
{1,n+m
}^
{AB
}}{n+m
}\right]^
2
185 \sum_{i=
1}^
{n+m
} \left[x_i^
{AB
} -
\frac{X_
{1,n+m
}^
{AB
}}{n+m
}\right]^
2 ~=~
186 \sum_{i=
1}^
{n
} \left[x_i^
{A
} -
\frac{X_
{1,n
}^
{A
}}{n
}\right]^
2 +
187 \sum_{i=
1}^
{m
} \left[x_i^
{B
} -
\frac{X_
{1,m
}^
{B
}}{m
}\right]^
2 +
\Delta\sigma
191 \sum_{i=
1}^
{n+m
} \left[(x_i^
{AB
})^
2 -
2 x_i^
{AB
}\frac{X^
{AB
}_
{1,n+m
}}{n+m
} +
\left(
\frac{X^
{AB
}_
{1,n+m
}}{n+m
}\right)^
2 \right] &-&
\nonumber \\
192 \sum_{i=
1}^
{n
} \left[(x_i^
{A
})^
2 -
2 x_i^
{A
}\frac{X^A_
{1,n
}}{n
} +
\left(
\frac{X^A_
{1,n
}}{n
}\right)^
2 \right] &-&
\nonumber \\
193 \sum_{i=
1}^
{m
} \left[(x_i^
{B
})^
2 -
2 x_i^
{B
}\frac{X^B_
{1,m
}}{m
} +
\left(
\frac{X^B_
{1,m
}}{m
}\right)^
2 \right] &=&
\Delta\sigma
195 all the $x_i^
2$ terms drop out, and the terms independent of the summation
196 counter $i$ can be simplified:
198 \frac{\left(X^
{AB
}_
{1,n+m
}\right)^
2}{n+m
} \,-\,
199 \frac{\left(X^A_
{1,n
}\right)^
2}{n
} \,-\,
200 \frac{\left(X^B_
{1,m
}\right)^
2}{m
} &-&
\nonumber \\
201 2\,
\frac{X^
{AB
}_
{1,n+m
}}{n+m
}\sum_{i=
1}^
{n+m
}x_i^
{AB
} \,+\,
202 2\,
\frac{X^
{A
}_
{1,n
}}{n
}\sum_{i=
1}^
{n
}x_i^
{A
} \,+\,
203 2\,
\frac{X^
{B
}_
{1,m
}}{m
}\sum_{i=
1}^
{m
}x_i^
{B
} &=&
\Delta\sigma
205 we recognize the three partial sums on the second line and use
\eqnref{pscomb
}
208 \Delta\sigma ~=~
\frac{\left(mX^A_
{1,n
} - nX^B_
{1,m
}\right)^
2}{nm(n+m)
}
210 if we check this by inserting $m=
1$ we get back
\eqnref{simplevar0
}
212 \subsection{Summing energy terms
}
213 The
{\tt \normindex{g_energy
}} program can also sum energy terms into one,
{\eg}
214 potential + kinetic = total. For the partial averages this is again easy
215 if we have $S$ energy components $s$:
217 X_
{m,n
}^S ~=~
\sum_{i=m
}^n
\sum_{s=
1}^S x_i^s ~=~
\sum_{s=
1}^S
\sum_{i=m
}^n x_i^s ~=~
\sum_{s=
1}^S X_
{m,n
}^s
220 For the fluctuations it is less trivial again, considering for example
221 that the fluctuation in potential and kinetic energy should cancel.
222 Nevertheless we can try the same approach as before by writing:
224 \sigma_{m,n
}^S ~=~
\sum_{s=
1}^S
\sigma_{m,n
}^s +
\Delta\sigma
226 if we fill in
\eqnref{sigma
}:
228 \sum_{i=m
}^n
\left[\left(
\sum_{s=
1}^S x_i^s
\right) -
\frac{X_
{m,n
}^S
}{m-n+
1}\right]^
2 ~=~
229 \sum_{s=
1}^S
\sum_{i=m
}^n
\left[\left(x_i^s
\right) -
\frac{X_
{m,n
}^s
}{m-n+
1}\right]^
2 +
\Delta\sigma
230 \label{eqn:sigmaterms
}
232 which we can expand to:
234 &~&
\sum_{i=m
}^n
\left[\sum_{s=
1}^S (x_i^s)^
2 +
\left(
\frac{X_
{m,n
}^S
}{m-n+
1}\right)^
2 -
2\left(
\frac{X_
{m,n
}^S
}{m-n+
1}\sum_{s=
1}^S x_i^s +
\sum_{s=
1}^S
\sum_{s'=s+
1}^S x_i^s x_i^
{s'
} \right)
\right] \nonumber \\
235 &-&
\sum_{s=
1}^S
\sum_{i=m
}^n
\left[(x_i^s)^
2 -
2\,
\frac{X_
{m,n
}^s
}{m-n+
1}\,x_i^s +
\left(
\frac{X_
{m,n
}^s
}{m-n+
1}\right)^
2\right] ~=~
\Delta\sigma
237 the terms with $(x_i^s)^
2$ cancel, so that we can simplify to:
239 &~&
\frac{\left(X_
{m,n
}^S
\right)^
2}{m-n+
1} -
2 \frac{X_
{m,n
}^S
}{m-n+
1}\sum_{i=m
}^n
\sum_{s=
1}^S x_i^s -
2\sum_{i=m
}^n
\sum_{s=
1}^S
\sum_{s'=s+
1}^S x_i^s x_i^
{s'
}\, -
\nonumber \\
240 &~&
\sum_{s=
1}^S
\sum_{i=m
}^n
\left[-
2\,
\frac{X_
{m,n
}^s
}{m-n+
1}\,x_i^s +
\left(
\frac{X_
{m,n
}^s
}{m-n+
1}\right)^
2\right] ~=~
\Delta\sigma
244 -
\frac{\left(X_
{m,n
}^S
\right)^
2}{m-n+
1} -
2\sum_{i=m
}^n
\sum_{s=
1}^S
\sum_{s'=s+
1}^S x_i^s x_i^
{s'
}\, +
\sum_{s=
1}^S
\frac{\left(X_
{m,n
}^s
\right)^
2}{m-n+
1} ~=~
\Delta\sigma
246 If we now expand the first term using
\eqnref{sumterms
} we obtain:
248 -
\frac{\left(
\sum_{s=
1}^SX_
{m,n
}^s
\right)^
2}{m-n+
1} -
2\sum_{i=m
}^n
\sum_{s=
1}^S
\sum_{s'=s+
1}^S x_i^s x_i^
{s'
}\, +
\sum_{s=
1}^S
\frac{\left(X_
{m,n
}^s
\right)^
2}{m-n+
1} ~=~
\Delta\sigma
250 which we can reformulate to:
252 -
2\left[\sum_{s=
1}^S
\sum_{s'=s+
1}^S X_
{m,n
}^s X_
{m,n
}^
{s'
}\,+
\sum_{i=m
}^n
\sum_{s=
1}^S
\sum_{s'=s+
1}^S x_i^s x_i^
{s'
}\right] ~=~
\Delta\sigma
256 -
2\left[\sum_{s=
1}^S X_
{m,n
}^s
\sum_{s'=s+
1}^S X_
{m,n
}^
{s'
}\,+\,
\sum_{s=
1}^S
\sum_{i=m
}^nx_i^s
\sum_{s'=s+
1}^S x_i^
{s'
}\right] ~=~
\Delta\sigma
260 -
2\sum_{s=
1}^S
\left[X_
{m,n
}^s
\sum_{s'=s+
1}^S
\sum_{i=m
}^n x_i^
{s'
}\,+\,
\sum_{i=m
}^n x_i^s
\sum_{s'=s+
1}^S x_i^
{s'
}\right] ~=~
\Delta\sigma
262 Since we need all data points $i$ to evaluate this, in general this is not possible.
263 We can then make an estimate of $
\sigma_{m,n
}^S$ using only the data points that
264 are available using the left hand side of
\eqnref{sigmaterms
}. While the average can
265 be computed using all time steps in the simulation, the accuracy of the
266 fluctuations is thus limited by the frequency with which energies are saved.
267 Since this can be easily done with a program such as
\normindex{xmgr
} this is not
268 built-in in
{\gromacs}.
270 % LocalWords: Formulae varpartial formulae simplevar Xpartial pscomb mX nX SX
271 % LocalWords: sumterms nx sigmaterms xmgr ps nm