/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=8 sts=2 et sw=2 tw=80: */
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

#ifndef mozilla_FastBernoulliTrial_h
#define mozilla_FastBernoulliTrial_h

#include "mozilla/Assertions.h"
#include "mozilla/XorShift128PlusRNG.h"

#include <cmath>
#include <stddef.h>
#include <stdint.h>

namespace mozilla {

/**
 * class FastBernoulliTrial: Efficient sampling with uniform probability
 *
 * When gathering statistics about a program's behavior, we may be observing
 * events that occur very frequently (e.g., function calls or memory
 * allocations) and we may be gathering information that is somewhat expensive
 * to produce (e.g., call stacks). Sampling all the events could have a
 * significant impact on the program's performance.
 *
 * Why not just sample every N'th event? This technique is called "systematic
 * sampling"; it's simple and efficient, and it's fine if we imagine a
 * patternless stream of events. But what if we're sampling allocations, and
 * the program happens to have a loop where each iteration does exactly N
 * allocations? You would end up sampling the same allocation every time
 * through the loop; the entire rest of the loop becomes invisible to your
 * measurements! More generally, if each iteration does M allocations, and M
 * and N have any common divisor at all, most allocation sites will never be
 * sampled. If they're both even, say, the odd-numbered allocations disappear
 * from your measurements.
 *
 * Ideally, we'd like each event to have some probability P of being sampled,
 * independent of its neighbors and of its position in the sequence. This is
 * called "Bernoulli sampling", and it doesn't suffer from any of the problems
 * mentioned above.
 *
 * One disadvantage of Bernoulli sampling is that you can't be sure exactly how
 * many samples you'll get: technically, it's possible that you might sample
 * none of them, or all of them. But if the number of events N is large, these
 * aren't likely outcomes; you can generally expect somewhere around P * N
 * events to be sampled.
 *
 * The other disadvantage of Bernoulli sampling is that you have to generate a
 * random number for every event, which can be slow.
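 *
 * (For contrast, a minimal sketch of that naive approach, where |rng| and
 * |sample| are hypothetical placeholders for a random number generator and
 * the expensive sampling work:
 *
 *   if (rng.nextDouble() < P)  // a fresh random number on *every* event
 *     sample();
 *
 * It is exactly this per-event random number generation that the class below
 * avoids.)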
 *
 * BUT NOT WITH THIS CLASS! FastBernoulliTrial lets you do true Bernoulli
 * sampling, while generating a fresh random number only when we do decide to
 * sample an event, not on every trial. When it decides not to sample, a call
 * to |FastBernoulliTrial::trial| is nothing but decrementing a counter and
 * comparing it to zero. So the lower your sampling probability is, the less
 * overhead FastBernoulliTrial imposes.
 *
 * Probabilities of 0 and 1 are handled efficiently. (In neither case need we
 * ever generate a random number at all.)
 *
 * - FastBernoulliTrial(double P, uint64_t state0, uint64_t state1)
 *   Construct an instance that selects events with probability P, seeding the
 *   random number generator with state0 and state1 (which must not both be
 *   zero).
 *
 * - FastBernoulliTrial::trial()
 *   Return true with probability P. Call this each time an event occurs, to
 *   decide whether to sample it or not.
 *
 * - FastBernoulliTrial::trial(size_t n)
 *   Equivalent to calling trial() |n| times, and returning true if any of
 *   those calls do. However, like trial, this runs in fast constant time.
 *
 * What is this good for? In some applications, some events are "bigger" than
 * others. For example, large allocations are more significant than small
 * allocations. Perhaps we'd like to imagine that we're drawing allocations
 * from a stream of bytes, and performing a separate Bernoulli trial on every
 * byte from the stream. We can accomplish this by calling |t.trial(S)| for
 * the number of bytes S, and sampling the event if that returns true.
 *
 * Of course, this style of sampling needs to be paired with analysis and
 * presentation that makes the size of the event apparent, lest trials with
 * large values for |n| appear to be indistinguishable from those with small
 * values for |n|, despite being potentially much more likely to be sampled.
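 *
 * (A minimal usage sketch of that byte-weighted idea, where |seed0|, |seed1|,
 * and |recordAllocation| are hypothetical stand-ins for real seeds and real
 * sampling work:
 *
 *   FastBernoulliTrial sampler(0.01, seed0, seed1);
 *   ...
 *   if (sampler.trial(numBytes))   // one trial per byte, in constant time
 *     recordAllocation();
 *
 * This samples each allocation with probability 1 - 0.99^numBytes.)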
 */
class FastBernoulliTrial {
  /*
   * This comment should just read, "Generate skip counts with a geometric
   * distribution", and leave everyone to go look that up and see why it's the
   * right thing to do, if they don't know already.
   *
   * BUT IF YOU'RE CURIOUS, COMMENTS ARE FREE...
   *
   * Instead of generating a fresh random number for every trial, we can
   * randomly generate a count of how many times we should return false before
   * the next time we return true. We call this a "skip count". Once we've
   * returned true, we generate a fresh skip count, and begin counting down
   * again.
   *
   * Here's an awesome fact: by exercising a little care in the way we generate
   * skip counts, we can produce results indistinguishable from those we would
   * get "rolling the dice" afresh for every trial.
   *
   * In short, skip counts in Bernoulli trials of probability P obey a
   * geometric distribution. If a random variable X is uniformly distributed
   * from [0..1), then std::floor(std::log(X) / std::log(1-P)) has the
   * appropriate geometric distribution for the skip counts.
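   *
   * (A minimal sketch of that recipe, with |X| standing in for a uniform
   * deviate in (0, 1) and |P| for the sampling probability; the real code
   * below also caches 1/std::log(1-P) and clamps the result, both of which
   * this sketch omits:
   *
   *   double skip = std::floor(std::log(X) / std::log(1 - P));
   *
   * Smaller values of X land deeper in the tail, yielding larger skips.)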
   *
   * Suppose we're to return |true| with some probability P, say, 0.3. Spread
   * all possible futures along a line segment of length 1. In portion P of
   * those cases, we'll return true on the next call to |trial|; the skip count
   * is 0. For the remaining portion 1-P of cases, the skip count is 1 or more.
   *
   * skip:             0                          1 or more
   *      |------------------^-----------------------------------------|
   * portion:        0.3                             0.7
   *
   * But the "1 or more" section of the line is subdivided the same way: *within
   * that section*, in portion P the second call to |trial()| returns true, and
   * in portion 1-P it returns false a second time; the skip count is two or
   * more. So we return true on the second call in proportion 0.7 * 0.3, and
   * skip at least the first two in proportion 0.7 * 0.7.
   *
   * skip:             0               1              2 or more
   *      |------------------^------------^----------------------------|
   * portion:        0.3         0.7 * 0.3           0.7 * 0.7
   *
   * We can continue to subdivide:
   *
   * skip >= 0: |------------------------------------------------- (1-P)^0 --|
   * skip >= 1: |                  ------------------------------- (1-P)^1 --|
   * skip >= 2: |                               ------------------ (1-P)^2 --|
   * skip >= 3: |                                 ^     ---------- (1-P)^3 --|
   * skip >= 4: |                                 .            --- (1-P)^4 --|
   *                                              .
   *                                              X
   *
   * In other words, the likelihood of the next n calls to |trial| returning
   * false is (1-P)^n. The longer a run we require, the more the likelihood
   * drops. Further calls may return false too, but this is the probability
   * we'll skip at least n.
   *
   * This is interesting, because we can pick a point along this line segment
   * and see which skip count's range it falls within; the point X above, for
   * example, is within the ">= 2" range, but not within the ">= 3" range, so it
   * designates a skip count of 2. So if we pick points on the line at random
   * and use the skip counts they fall under, that will be indistinguishable
   * from generating a fresh random number between 0 and 1 for each trial and
   * comparing it to P.
   *
   * So to find the skip count for a point X, we must ask: To what whole power
   * must we raise 1-P such that we include X, but the next power would exclude
   * it? This is exactly std::floor(std::log(X) / std::log(1-P)).
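   *
   * (A worked example: with P = 0.3 and X = 0.4, we get
   * std::log(0.4) / std::log(0.7) = -0.916... / -0.357... = 2.57..., whose
   * floor is 2. And indeed (1-P)^2 = 0.49 still includes 0.4, while
   * (1-P)^3 = 0.343 excludes it: a skip count of 2.)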
   *
   * Our algorithm is, then, simply: When constructed, compute an initial skip
   * count. Return false from |trial| that many times, and then compute a new
   * skip count and return true.
   *
   * For a call to |trial(n)|, if the skip count is greater than n, return false
   * and subtract n from the skip count. If the skip count is less than n,
   * return true and compute a new skip count. Since each trial is independent,
   * it doesn't matter by how much n overshoots the skip count; we can actually
   * compute a new skip count at *any* time without affecting the distribution.
   * This is really beautiful.
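   *
   * (Concretely: with a current skip count of 5, trial(3) returns false and
   * leaves the skip count at 2; a following trial(4) overshoots, returns true,
   * and draws a fresh skip count.)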
   */
 public:
  /**
   * Construct a fast Bernoulli trial generator. Calls to |trial()| return true
   * with probability |aProbability|. Use |aState0| and |aState1| to seed the
   * random number generator; they must not both be zero.
   */
  FastBernoulliTrial(double aProbability, uint64_t aState0, uint64_t aState1)
      : mProbability(0),
        mInvLogNotProbability(0),
        mGenerator(aState0, aState1),
        mSkipCount(0) {
    setProbability(aProbability);
  }

  /**
   * Return true with probability |mProbability|. Call this each time an event
   * occurs, to decide whether to sample it or not. The lower |mProbability| is,
   * the faster this function runs.
   */
  bool trial() {
    if (mSkipCount) {
      mSkipCount--;
      return false;
    }
    return chooseSkipCount();
  }

  /**
   * Equivalent to calling trial() |n| times, and returning true if any of
   * those calls do. However, like trial, this runs in fast constant time.
   *
   * What is this good for? In some applications, some events are "bigger" than
   * others. For example, large allocations are more significant than small
   * allocations. Perhaps we'd like to imagine that we're drawing allocations
   * from a stream of bytes, and performing a separate Bernoulli trial on every
   * byte from the stream. We can accomplish this by calling |t.trial(S)| for
   * the number of bytes S, and sampling the event if that returns true.
   *
   * Of course, this style of sampling needs to be paired with analysis and
   * presentation that makes the "size" of the event apparent, lest trials with
   * large values for |n| appear to be indistinguishable from those with small
   * values for |n|, despite being potentially much more likely to be sampled.
   */
  bool trial(size_t aCount) {
    if (mSkipCount > aCount) {
      mSkipCount -= aCount;
      return false;
    }
    return chooseSkipCount();
  }

  void setRandomState(uint64_t aState0, uint64_t aState1) {
    mGenerator.setState(aState0, aState1);
  }

  void setProbability(double aProbability) {
    MOZ_ASSERT(0 <= aProbability && aProbability <= 1);
    mProbability = aProbability;
    if (0 < mProbability && mProbability < 1) {
      /*
       * Let's look carefully at how this calculation plays out in floating-
       * point arithmetic. We'll assume IEEE, but the final C++ code we arrive
       * at would still be fine if our numbers were mathematically perfect. So,
       * while we've considered IEEE's edge cases, we haven't done anything that
       * should be actively bad when using other representations.
       *
       * (In the below, read comparisons as exact mathematical comparisons: when
       * we say something "equals 1", that means it's exactly equal to 1. We
       * treat approximation using intervals with open boundaries: saying a
       * value is in (0,1) doesn't specify how close to 0 or 1 the value gets.
       * When we use closed boundaries like [2**-53, 1], we're careful to ensure
       * the boundary values are actually representable.)
       *
       * - After the comparison above, we know mProbability is in (0,1).
       *
       * - The gaps below 1 are 2**-53, so that interval is (0, 1-2**-53].
       *
       * - Because the floating-point gaps near 1 are wider than those near
       *   zero, there are many small positive doubles ε such that 1-ε rounds to
       *   exactly 1. However, 2**-53 can be represented exactly. So
       *   1-mProbability is in [2**-53, 1].
       *
       * - log(1 - mProbability) is thus in (-37, 0].
       *
       *   That range includes zero, but when we use mInvLogNotProbability, it
       *   would be helpful if we could trust that it's negative. So when log(1
       *   - mProbability) is 0, we'll just set mProbability to 0, so that
       *   mInvLogNotProbability is not used in chooseSkipCount.
       *
       * - How much of the range of mProbability does this cause us to ignore?
       *   The only value for which log returns 0 is exactly 1; the slope of log
       *   at 1 is 1, so for small ε such that 1 - ε != 1, log(1 - ε) is -ε,
       *   never 0. The gaps near one are larger than the gaps near zero, so if
       *   1 - ε wasn't 1, then -ε is representable. So if log(1 - mProbability)
       *   isn't 0, then 1 - mProbability isn't 1, which means that mProbability
       *   is at least 2**-53, as discussed earlier. This is a sampling
       *   likelihood of roughly one in ten quadrillion, which is unlikely to be
       *   distinguishable from zero in practice.
       *
       *   So by forbidding zero, we've tightened our range to (-37, -2**-53].
       *
       * - Finally, 1 / log(1 - mProbability) is in [-2**53, -1/37). This all
       *   falls readily within the range of an IEEE double.
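       *
       *   (Checking those endpoints: std::log(2**-53) is -53 * std::log(2),
       *   about -36.74, which indeed lies in (-37, 0]; and 1 / -(2**-53) is
       *   exactly -2**53 = -9007199254740992.0, exactly representable as a
       *   double.)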
       *
       * ALL THAT HAVING BEEN SAID: here are the five lines of actual code:
       */
      double logNotProbability = std::log(1 - mProbability);
      if (logNotProbability == 0.0)
        mProbability = 0.0;
      else
        mInvLogNotProbability = 1 / logNotProbability;
    }
  }

 private:
  /* The likelihood that any given call to |trial| should return true. */
  double mProbability;

  /**
   * The value of 1/std::log(1 - mProbability), cached for repeated use.
   *
   * If mProbability is exactly 0 or exactly 1, we don't use this value.
   * Otherwise, we guarantee this value is in the range [-2**53, -1/37), i.e.
   * definitely negative, as required by chooseSkipCount. See setProbability for
   * the details.
   */
  double mInvLogNotProbability;

  /* Our random number generator. */
  non_crypto::XorShift128PlusRNG mGenerator;

  /* The number of times |trial| should return false before next returning
   * true. */
  size_t mSkipCount;

  /*
   * Choose the next skip count. This also returns the value that |trial| should
   * return, since we have to check for the extreme values of mProbability
   * anyway, and |trial| should never return true at all when mProbability is 0.
   */
  bool chooseSkipCount() {
    /*
     * If the probability is 1.0, every call to |trial| returns true. Make sure
     * mSkipCount is 0.
     */
    if (mProbability == 1.0) {
      mSkipCount = 0;
      return true;
    }

    /*
     * If the probability is zero, |trial| never returns true. Don't bother us
     * for a while.
     */
    if (mProbability == 0.0) {
      mSkipCount = SIZE_MAX;
      return false;
    }

    /*
     * What sorts of values can this call to std::floor produce?
     *
     * Since mGenerator.nextDouble returns a value in [0, 1-2**-53], std::log
     * returns a value in the range [-infinity, -2**-53], all negative. Since
     * mInvLogNotProbability is negative (see its comments), the product is
     * positive and possibly infinite. std::floor returns +infinity unchanged.
     * So the result will always be positive.
     *
     * Converting a double to an integer that is out of range for that integer
     * is undefined behavior, so we must clamp our result to SIZE_MAX, to
     * ensure we get an acceptable value for mSkipCount.
     *
     * The clamp is written carefully. Note that if we had said:
     *
     *   if (skipCount > double(SIZE_MAX))
     *     mSkipCount = SIZE_MAX;
     *   else
     *     mSkipCount = skipCount;
     *
     * that would lead to undefined behavior on 64-bit machines: SIZE_MAX
     * coerced to double can equal 2^64, so if skipCount equaled 2^64,
     * converting it to size_t would induce undefined behavior.
     *
     * Jakob Olesen cleverly suggested flipping the sense of the comparison to
     * skipCount < double(SIZE_MAX). The conversion will evaluate to 2^64 or
     * the double just below it: either way, skipCount is guaranteed to have a
     * value that's safely convertible to size_t.
     *
     * (On 32-bit machines, all size_t values can be represented exactly in
     * double, so all is well.)
     */
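
    /*
     * (Spelled out with numbers: SIZE_MAX is 2^64 - 1, which is not exactly
     * representable as a double and rounds to 2^64. If skipCount were exactly
     * 2^64, the rejected test above would compare 2^64 > 2^64, take its false
     * branch, and convert 2^64 to size_t: undefined behavior. The flipped test
     * compares 2^64 < 2^64, also false, and correctly assigns SIZE_MAX.)
     */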
    double skipCount =
        std::floor(std::log(mGenerator.nextDouble()) * mInvLogNotProbability);
    if (skipCount < double(SIZE_MAX))
      mSkipCount = skipCount;
    else
      mSkipCount = SIZE_MAX;

    return true;
  }
};

} /* namespace mozilla */

#endif /* mozilla_FastBernoulliTrial_h */