Updated cool quotes
[gromacs.git] / docs / manual / averages.tex
blob752cf5b4a61238b9b5b675ceaa027a805c5459fb
2 % This file is part of the GROMACS molecular simulation package.
4 % Copyright (c) 2013,2014, by the GROMACS development team, led by
5 % Mark Abraham, David van der Spoel, Berk Hess, and Erik Lindahl,
6 % and including many others, as listed in the AUTHORS file in the
7 % top-level source directory and at http://www.gromacs.org.
9 % GROMACS is free software; you can redistribute it and/or
10 % modify it under the terms of the GNU Lesser General Public License
11 % as published by the Free Software Foundation; either version 2.1
12 % of the License, or (at your option) any later version.
14 % GROMACS is distributed in the hope that it will be useful,
15 % but WITHOUT ANY WARRANTY; without even the implied warranty of
16 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
17 % Lesser General Public License for more details.
19 % You should have received a copy of the GNU Lesser General Public
20 % License along with GROMACS; if not, see
21 % http://www.gnu.org/licenses, or write to the Free Software Foundation,
22 % Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
24 % If you want to redistribute modifications to GROMACS, please
25 % consider that scientific software is very special. Version
26 % control is crucial - bugs must be traceable. We will be happy to
27 % consider code for inclusion in the official distribution, but
28 % derived work must not be called official GROMACS. Details are found
29 % in the README & COPYING files - if they are missing, get the
30 % official version at http://www.gromacs.org.
32 % To help us fund GROMACS development, we humbly ask that you cite
33 % the research papers on the package. Check out http://www.gromacs.org.
35 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
37 % AVERAGES AND FLUCTUATIONS
39 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
40 \chapter{Averages and fluctuations}
41 \section{Formulae for averaging}
42 {\bf Note:} this section was taken from ref~\cite{Gunsteren94a}.
44 When analyzing a MD trajectory averages $\left<x\right>$ and fluctuations
45 \beq
46 \left<(\Delta x)^2\right>^{\half} ~=~ \left<[x-\left<x\right>]^2\right>^{\half}
47 \label{eqn:var0}
48 \eeq
49 of a quantity $x$ are to be computed.
50 The variance $\sigma_x$ of a series of N$_x$ values,
51 \{x$_i$\}, can be computed from
52 \beq
53 \sigma_x~=~ \sum_{i=1}^{N_x} x_i^2 ~-~ \frac{1}{N_x}\left(\sum_{i=1}^{N_x}x_i\right)^2
54 \label{eqn:var1}
55 \eeq
56 Unfortunately this formula is numerically not very accurate,
57 especially when $\sigma_x^{\half}$ is small compared to the values of $x_i$.
58 The following (equivalent) expression is numerically more accurate
59 \beq
60 \sigma_x ~=~ \sum_{i=1}^{N_x} [x_i - \left<x\right>]^2
61 \eeq
62 with
63 \beq
64 \left<x\right> ~=~ \frac{1}{N_x} \sum_{i=1}^{N_x} x_i
65 \label{eqn:var2}
66 \eeq
67 Using ~\eqnsref{var1}{var2} one has to go
68 through the series of $x_i$ values twice, once to determine
69 $\left<x\right>$ and again to
70 compute $\sigma_x$,
71 whereas \eqnref{var0} requires only one sequential scan of
72 the series \{x$_i$\}. However, one may cast \eqnref{var1} in
73 another form, containing partial sums, which allows for a sequential
74 update algorithm. Define the partial sum
75 \beq
76 X_{n,m} ~=~ \sum_{i=n}^{m} x_i
77 \eeq
78 and the partial variance
79 \beq
80 \sigma_{n,m} ~=~ \sum_{i=n}^{m} \left[x_i - \frac{X_{n,m}}{m-n+1}\right]^2
81 \label{eqn:sigma}
82 \eeq
83 It can be shown that
84 \beq
85 X_{n,m+k} ~=~ X_{n,m} + X_{m+1,m+k}
86 \label{eqn:Xpartial}
87 \eeq
88 and
89 \bea
90 \sigma_{n,m+k} &=& \sigma_{n,m} + \sigma_{m+1,m+k} + \left[~\frac {X_{n,m}}{m-n+1} - \frac{X_{n,m+k}}{m+k-n+1}~\right]^2~* \nonumber\\
91 && ~\frac{(m-n+1)(m+k-n+1)}{k}
92 \label{eqn:varpartial}
93 \eea
94 For $n=1$ one finds
95 \beq
96 \sigma_{1,m+k} ~=~ \sigma_{1,m} + \sigma_{m+1,m+k}~+~
97 \left[~\frac{X_{1,m}}{m} - \frac{X_{1,m+k}}{m+k}~\right]^2~ \frac{m(m+k)}{k}
98 \label{eqn:sig1}
99 \eeq
100 and for $n=1$ and $k=1$ ~(\eqnref{varpartial}) becomes
101 \bea
102 \sigma_{1,m+1} &=& \sigma_{1,m} +
103 \left[\frac{X_{1,m}}{m} - \frac{X_{1,m+1}}{m+1}\right]^2 m(m+1)\\
104 &=& \sigma_{1,m} +
105 \frac {[~X_{1,m} - m x_{m+1}~]^2}{m(m+1)}
106 \label{eqn:simplevar0}
107 \eea
108 where we have used the relation
109 \beq
110 X_{1,m+1} ~=~ X_{1,m} + x_{m+1}
111 \label{eqn:simplevar1}
112 \eeq
113 Using formulae~(\eqnref{simplevar0}) and ~(\eqnref{simplevar1}) the average
114 \beq
115 \left<x\right> ~=~ \frac{X_{1,N_x}}{N_x}
116 \eeq
117 and the fluctuation
118 \beq
119 \left<(\Delta x)^2\right>^{\half} = \left[\frac {\sigma_{1,N_x}}{N_x}\right]^{\half}
120 \eeq
121 can be obtained by one sweep through the data.
123 \section{Implementation}
124 In {\gromacs} the instantaneous
125 energies $E(m)$ are stored in the \swapindex{energy}{file}, along with the
126 values of $\sigma_{1,m}$ and $X_{1,m}$. Although the steps are counted from 0,
127 for the energy and fluctuations steps are counted from 1. This means that the
128 equations presented here are the ones that are implemented.
129 We give somewhat lengthy derivations in this section
130 to simplify checking of code and equations later on.
132 \subsection{Part of a Simulation}
133 It is not uncommon to perform a simulation where the first part,
134 {\eg} 100 ps, is taken as \normindex{equilibration}. However, the
135 averages and fluctuations as printed in the \swapindex{log}{file}
136 are computed over the whole simulation. The equilibration time,
137 which is now part of the simulation, may in such a case invalidate the
138 averages and fluctuations, because these numbers are now dominated
139 by the initial drift towards equilibrium.
141 Using~\eqnsref{Xpartial}{varpartial} the average and
142 standard deviation over part of the trajectory can be computed as:
143 \bea
144 X_{m+1,m+k} &=& X_{1,m+k} - X_{1,m} \\
145 \sigma_{m+1,m+k} &=& \sigma_{1,m+k}-\sigma_{1,m} - \left[~\frac{X_{1,m}}{m} - \frac{X_{1,m+k}}{m+k}~\right]^{2}~ \frac{m(m+k)}{k}
146 \eea
148 or, more generally (with $p \geq 1$ and $q \geq p$):
149 \bea
150 X_{p,q} &=& X_{1,q} - X_{1,p-1} \\
151 \sigma_{p,q} &=& \sigma_{1,q}-\sigma_{1,p-1} - \left[~\frac{X_{1,p-1}}{p-1} - \frac{X_{1,q}}{q}~\right]^{2}~ \frac{(p-1)q}{q-p+1}
152 \eea
153 {\bf Note} that implementation of this is not entirely trivial, since energies
154 are not stored every time step of the simulation. We therefore have to construct
155 $X_{1,p-1}$ and $\sigma_{1,p-1}$ from the information at time $p$ using
156 \eqnsref{simplevar0}{simplevar1}:
157 \bea
158 X_{1,p-1} &=& X_{1,p} - x_p \\
159 \sigma_{1,p-1} &=& \sigma_{1,p} - \frac {[~X_{1,p-1} - (p-1) x_{p}~]^2}{(p-1)p}
160 \eea
162 \subsection{Combining two simulations}
163 Another frequently occurring problem is, that the fluctuations of two simulations
164 must be combined. Consider the following example: we have two simulations
165 (A) of $n$ and (B) of $m$ steps, in which the second simulation is a
166 continuation of the first. However, the second simulation starts numbering from 1
167 instead of from $n+1$. For the partial sum
168 this is no problem, we have to add $X_{1,n}^A$ from run A:
169 \beq
170 X_{1,n+m}^{AB} ~=~ X_{1,n}^A + X_{1,m}^B
171 \label{eqn:pscomb}
172 \eeq
173 When we want to compute the partial variance from the two components we have to
174 make a correction $\Delta\sigma$:
175 \beq
176 \sigma_{1,n+m}^{AB} ~=~ \sigma_{1,n}^A + \sigma_{1,m}^B +\Delta\sigma
177 \eeq
178 if we define $x_i^{AB}$ as the combined and renumbered set of data points we can
179 write:
180 \beq
181 \sigma_{1,n+m}^{AB} ~=~ \sum_{i=1}^{n+m} \left[x_i^{AB} - \frac{X_{1,n+m}^{AB}}{n+m}\right]^2
182 \eeq
183 and thus
184 \beq
185 \sum_{i=1}^{n+m} \left[x_i^{AB} - \frac{X_{1,n+m}^{AB}}{n+m}\right]^2 ~=~
186 \sum_{i=1}^{n} \left[x_i^{A} - \frac{X_{1,n}^{A}}{n}\right]^2 +
187 \sum_{i=1}^{m} \left[x_i^{B} - \frac{X_{1,m}^{B}}{m}\right]^2 +\Delta\sigma
188 \eeq
190 \bea
191 \sum_{i=1}^{n+m} \left[(x_i^{AB})^2 - 2 x_i^{AB}\frac{X^{AB}_{1,n+m}}{n+m} + \left(\frac{X^{AB}_{1,n+m}}{n+m}\right)^2 \right] &-& \nonumber \\
192 \sum_{i=1}^{n} \left[(x_i^{A})^2 - 2 x_i^{A}\frac{X^A_{1,n}}{n} + \left(\frac{X^A_{1,n}}{n}\right)^2 \right] &-& \nonumber \\
193 \sum_{i=1}^{m} \left[(x_i^{B})^2 - 2 x_i^{B}\frac{X^B_{1,m}}{m} + \left(\frac{X^B_{1,m}}{m}\right)^2 \right] &=& \Delta\sigma
194 \eea
195 all the $x_i^2$ terms drop out, and the terms independent of the summation
196 counter $i$ can be simplified:
197 \bea
198 \frac{\left(X^{AB}_{1,n+m}\right)^2}{n+m} \,-\,
199 \frac{\left(X^A_{1,n}\right)^2}{n} \,-\,
200 \frac{\left(X^B_{1,m}\right)^2}{m} &-& \nonumber \\
201 2\,\frac{X^{AB}_{1,n+m}}{n+m}\sum_{i=1}^{n+m}x_i^{AB} \,+\,
202 2\,\frac{X^{A}_{1,n}}{n}\sum_{i=1}^{n}x_i^{A} \,+\,
203 2\,\frac{X^{B}_{1,m}}{m}\sum_{i=1}^{m}x_i^{B} &=& \Delta\sigma
204 \eea
205 we recognize the three partial sums on the second line and use \eqnref{pscomb}
206 to obtain:
207 \beq
208 \Delta\sigma ~=~ \frac{\left(mX^A_{1,n} - nX^B_{1,m}\right)^2}{nm(n+m)}
209 \eeq
210 if we check this by inserting $m=1$ we get back \eqnref{simplevar0}
212 \subsection{Summing energy terms}
213 The {\tt \normindex{g_energy}} program can also sum energy terms into one, {\eg}
214 potential + kinetic = total. For the partial averages this is again easy
215 if we have $S$ energy components $s$:
216 \beq
217 X_{m,n}^S ~=~ \sum_{i=m}^n \sum_{s=1}^S x_i^s ~=~ \sum_{s=1}^S \sum_{i=m}^n x_i^s ~=~ \sum_{s=1}^S X_{m,n}^s
218 \label{eqn:sumterms}
219 \eeq
220 For the fluctuations it is less trivial again, considering for example
221 that the fluctuation in potential and kinetic energy should cancel.
222 Nevertheless we can try the same approach as before by writing:
223 \beq
224 \sigma_{m,n}^S ~=~ \sum_{s=1}^S \sigma_{m,n}^s + \Delta\sigma
225 \eeq
226 if we fill in \eqnref{sigma}:
227 \beq
228 \sum_{i=m}^n \left[\left(\sum_{s=1}^S x_i^s\right) - \frac{X_{m,n}^S}{m-n+1}\right]^2 ~=~
229 \sum_{s=1}^S \sum_{i=m}^n \left[\left(x_i^s\right) - \frac{X_{m,n}^s}{m-n+1}\right]^2 + \Delta\sigma
230 \label{eqn:sigmaterms}
231 \eeq
232 which we can expand to:
233 \bea
234 &~&\sum_{i=m}^n \left[\sum_{s=1}^S (x_i^s)^2 + \left(\frac{X_{m,n}^S}{m-n+1}\right)^2 -2\left(\frac{X_{m,n}^S}{m-n+1}\sum_{s=1}^S x_i^s + \sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'} \right)\right] \nonumber \\
235 &-&\sum_{s=1}^S \sum_{i=m}^n \left[(x_i^s)^2 - 2\,\frac{X_{m,n}^s}{m-n+1}\,x_i^s + \left(\frac{X_{m,n}^s}{m-n+1}\right)^2\right] ~=~\Delta\sigma
236 \eea
237 the terms with $(x_i^s)^2$ cancel, so that we can simplify to:
238 \bea
239 &~&\frac{\left(X_{m,n}^S\right)^2}{m-n+1} -2 \frac{X_{m,n}^S}{m-n+1}\sum_{i=m}^n\sum_{s=1}^S x_i^s -2\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\, - \nonumber \\
240 &~&\sum_{s=1}^S \sum_{i=m}^n \left[- 2\,\frac{X_{m,n}^s}{m-n+1}\,x_i^s + \left(\frac{X_{m,n}^s}{m-n+1}\right)^2\right] ~=~\Delta\sigma
241 \eea
243 \beq
244 -\frac{\left(X_{m,n}^S\right)^2}{m-n+1} -2\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\, + \sum_{s=1}^S \frac{\left(X_{m,n}^s\right)^2}{m-n+1} ~=~\Delta\sigma
245 \eeq
246 If we now expand the first term using \eqnref{sumterms} we obtain:
247 \beq
248 -\frac{\left(\sum_{s=1}^SX_{m,n}^s\right)^2}{m-n+1} -2\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\, + \sum_{s=1}^S \frac{\left(X_{m,n}^s\right)^2}{m-n+1} ~=~\Delta\sigma
249 \eeq
250 which we can reformulate to:
251 \beq
252 -2\left[\sum_{s=1}^S \sum_{s'=s+1}^S X_{m,n}^s X_{m,n}^{s'}\,+\sum_{i=m}^n\sum_{s=1}^S \sum_{s'=s+1}^S x_i^s x_i^{s'}\right] ~=~\Delta\sigma
253 \eeq
255 \beq
256 -2\left[\sum_{s=1}^S X_{m,n}^s \sum_{s'=s+1}^S X_{m,n}^{s'}\,+\,\sum_{s=1}^S \sum_{i=m}^nx_i^s \sum_{s'=s+1}^S x_i^{s'}\right] ~=~\Delta\sigma
257 \eeq
258 which gives
259 \beq
260 -2\sum_{s=1}^S \left[X_{m,n}^s \sum_{s'=s+1}^S \sum_{i=m}^n x_i^{s'}\,+\,\sum_{i=m}^n x_i^s \sum_{s'=s+1}^S x_i^{s'}\right] ~=~\Delta\sigma
261 \eeq
262 Since we need all data points $i$ to evaluate this, in general this is not possible.
263 We can then make an estimate of $\sigma_{m,n}^S$ using only the data points that
264 are available using the left hand side of \eqnref{sigmaterms}. While the average can
265 be computed using all time steps in the simulation, the accuracy of the
266 fluctuations is thus limited by the frequency with which energies are saved.
267 Since this can be easily done with a program such as \normindex{xmgr} this is not
268 built-in in {\gromacs}.
270 % LocalWords: Formulae varpartial formulae simplevar Xpartial pscomb mX nX SX
271 % LocalWords: sumterms nx sigmaterms xmgr ps nm