Bumping manifests a=b2g-bump
[gecko.git] / dom / canvas / WebGLElementArrayCache.cpp
blob0c2502809a37a794f0a35d8578cc4fbac806e494
1 /* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*- */
2 /* This Source Code Form is subject to the terms of the Mozilla Public
3 * License, v. 2.0. If a copy of the MPL was not distributed with this
4 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
6 #include "WebGLElementArrayCache.h"
8 #include <algorithm>
9 #include <cstdlib>
10 #include <cstring>
11 #include <limits>
12 #include "mozilla/Assertions.h"
13 #include "mozilla/MathAlgorithms.h"
14 #include "mozilla/MemoryReporting.h"
16 namespace mozilla {
18 static void
19 UpdateUpperBound(uint32_t* const out_upperBound, uint32_t newBound)
21 MOZ_ASSERT(out_upperBound);
22 *out_upperBound = std::max(*out_upperBound, newBound);
25 /* WebGLElementArrayCacheTree contains most of the implementation of
26 * WebGLElementArrayCache, which performs WebGL element array buffer validation
27 * for drawElements.
29 * Attention: Here lie nontrivial data structures, bug-prone algorithms, and
30 * non-canonical tweaks! Whence the explanatory comments, and compiled unit
31 * test.
33 * *** What problem are we solving here? ***
35 * WebGL::DrawElements has to validate that the elements are in range wrt the
36 * current vertex attribs. This boils down to the problem, given an array of
37 * integers, of computing the maximum in an arbitrary sub-array. The naive
38 * algorithm has linear complexity; this has been a major performance problem,
39 * see bug 569431. In that bug, we took the approach of caching the max for the
40 * whole array, which does cover most cases (DrawElements typically consumes the
41 * whole element array buffer) but doesn't help in other use cases:
42 * - when doing "partial DrawElements" i.e. consuming only part of the element
43 * array buffer
44 * - when doing frequent "partial buffer updates" i.e. bufferSubData calls
45 * updating parts of the element array buffer
47 * *** The solution: A binary tree ***
49 * The solution implemented here is to use a binary tree as the cache data
50 * structure. Each tree node contains the max of its two children nodes. In this
51 * way, finding the maximum in any contiguous sub-array has log complexity
52 * instead of linear complexity.
54 * Simplistically, if the element array is:
56 * [1 4 3 2]
58 * then the corresponding tree is:
60 * 4
61 * _/ \_
62 * 4 3
63 * / \ / \
64 * 1 4 3 2
66 * In practice, the bottom-most levels of the tree are both the largest to store
67 * (because they have more nodes), and the least useful performance-wise
68 * (because each node in the bottom levels concerns only few entries in the
69 * elements array buffer, it is cheap to compute).
71 * For this reason, we stop the tree a few levels above, so that each tree leaf
72 * actually corresponds to more than one element array entry.
74 * The number of levels that we "drop" is |kSkippedBottomTreeLevels| and the
75 * number of element array entries that each leaf corresponds to, is
76 * |kElementsPerLeaf|. This being a binary tree, we have:
78 * kElementsPerLeaf = 2 ^ kSkippedBottomTreeLevels.
80 * *** Storage layout of the binary tree ***
82 * We take advantage of the specifics of the situation to avoid generalist tree
83 * storage and instead store the tree entries in a vector, mTreeData.
85 * TreeData is always a vector of length:
87 * 2 * (number of leaves).
89 * Its data layout is as follows: mTreeData[0] is unused, mTreeData[1] is the
90 * root node, then at offsets 2..3 is the tree level immediately below the root
91 * node, then at offsets 4..7 is the tree level below that, etc.
93 * The figure below illustrates this by writing at each tree node the offset
94 * into mTreeData at which it is stored:
96 * 1
97 * _/ \_
98 * 2 3
99 * / \ / \
100 * 4 5 6 7
101 * ...
103 * Thus, under the convention that the root level is level 0, we see that level
104 * N is stored at offsets:
106 * [ 2^n .. 2^(n+1) - 1 ]
108 * in mTreeData. Likewise, all the usual tree operations have simple
109 * mathematical expressions in terms of mTreeData offsets, see all the methods
110 * such as ParentNode, LeftChildNode, etc.
112 * *** Design constraint: Element types aren't known at buffer-update time ***
114 * Note that a key constraint that we're operating under, is that we don't know
115 * the types of the elements by the time WebGL bufferData/bufferSubData methods
116 * are called. The type of elements is only specified in the drawElements call.
117 * This means that we may potentially have to store caches for multiple element
118 * types, for the same element array buffer. Since we don't know yet how many
119 * element types we'll eventually support (extensions add more), the concern
120 * about memory usage is serious. This is addressed by kSkippedBottomTreeLevels
121 * as explained above. Of course, in the typical case where each element array
122 * buffer is only ever used with one type, this is also addressed by having
123 * WebGLElementArrayCache lazily create trees for each type only upon first use.
125 * Another consequence of this constraint is that when updating the trees, we
126 * have to update all existing trees. So if trees for types uint8_t, uint16_t
127 * and uint32_t have ever been constructed for this buffer, every subsequent
128 * update will have to update all trees even if one of the types is never used
129 * again. That's inefficient, but content should not put indices of different
130 * types in the same element array buffer anyways. Different index types can
131 * only be consumed in separate drawElements calls, so nothing particular is
132 * to be achieved by lumping them in the same buffer object.
134 template<typename T>
135 struct WebGLElementArrayCacheTree
137 /* A too-high kSkippedBottomTreeLevels would harm the performance of small
138 * drawElements calls. A too-low kSkippedBottomTreeLevels would cause undue
139 * memory usage. The current value has been validated by some benchmarking.
140 * See bug 732660.
142 static const size_t kSkippedBottomTreeLevels = 3;
143 static const size_t kElementsPerLeaf = 1 << kSkippedBottomTreeLevels;
144 // Since kElementsPerLeaf is POT:
145 static const size_t kElementsPerLeafMask = kElementsPerLeaf - 1;
147 private:
148 // The WebGLElementArrayCache that owns this tree:
149 WebGLElementArrayCache& mParent;
151 // The tree's internal data storage. Its length is 2 * (number of leaves)
152 // because of its data layout explained in the above class comment.
153 FallibleTArray<T> mTreeData;
155 public:
156 // Constructor. Takes a reference to the WebGLElementArrayCache that is to be
157 // the parent. Does not initialize the tree. Should be followed by a call
158 // to Update() to attempt initializing the tree.
159 explicit WebGLElementArrayCacheTree(WebGLElementArrayCache& value)
160 : mParent(value)
164 T GlobalMaximum() const {
165 return mTreeData[1];
168 // returns the index of the parent node; if treeIndex=1 (the root node),
169 // the return value is 0.
170 static size_t ParentNode(size_t treeIndex) {
171 MOZ_ASSERT(treeIndex > 1);
172 return treeIndex >> 1;
175 static bool IsRightNode(size_t treeIndex) {
176 MOZ_ASSERT(treeIndex > 1);
177 return treeIndex & 1;
180 static bool IsLeftNode(size_t treeIndex) {
181 MOZ_ASSERT(treeIndex > 1);
182 return !IsRightNode(treeIndex);
185 static size_t SiblingNode(size_t treeIndex) {
186 MOZ_ASSERT(treeIndex > 1);
187 return treeIndex ^ 1;
190 static size_t LeftChildNode(size_t treeIndex) {
191 MOZ_ASSERT(treeIndex);
192 return treeIndex << 1;
195 static size_t RightChildNode(size_t treeIndex) {
196 MOZ_ASSERT(treeIndex);
197 return SiblingNode(LeftChildNode(treeIndex));
200 static size_t LeftNeighborNode(size_t treeIndex, size_t distance = 1) {
201 MOZ_ASSERT(treeIndex > 1);
202 return treeIndex - distance;
205 static size_t RightNeighborNode(size_t treeIndex, size_t distance = 1) {
206 MOZ_ASSERT(treeIndex > 1);
207 return treeIndex + distance;
210 size_t NumLeaves() const {
211 // See class comment for why we the tree storage size is 2 * numLeaves.
212 return mTreeData.Length() >> 1;
215 size_t LeafForElement(size_t element) const {
216 size_t leaf = element / kElementsPerLeaf;
217 MOZ_ASSERT(leaf < NumLeaves());
218 return leaf;
221 size_t LeafForByte(size_t byte) const {
222 return LeafForElement(byte / sizeof(T));
225 // Returns the index, into the tree storage, where a given leaf is stored.
226 size_t TreeIndexForLeaf(size_t leaf) const {
227 // See above class comment. The tree storage is an array of length
228 // 2 * numLeaves. The leaves are stored in its second half.
229 return leaf + NumLeaves();
232 static size_t LastElementUnderSameLeaf(size_t element) {
233 return element | kElementsPerLeafMask;
236 static size_t FirstElementUnderSameLeaf(size_t element) {
237 return element & ~kElementsPerLeafMask;
240 static size_t NextMultipleOfElementsPerLeaf(size_t numElements) {
241 MOZ_ASSERT(numElements >= 1);
242 return ((numElements - 1) | kElementsPerLeafMask) + 1;
245 bool Validate(T maxAllowed, size_t firstLeaf, size_t lastLeaf,
246 uint32_t* const out_upperBound)
248 size_t firstTreeIndex = TreeIndexForLeaf(firstLeaf);
249 size_t lastTreeIndex = TreeIndexForLeaf(lastLeaf);
251 while (true) {
252 // Given that we tweak these values in nontrivial ways, it doesn't
253 // hurt to do this sanity check.
254 MOZ_ASSERT(firstTreeIndex <= lastTreeIndex);
256 // Final case where there is only one node to validate at the
257 // current tree level:
258 if (lastTreeIndex == firstTreeIndex) {
259 const T& curData = mTreeData[firstTreeIndex];
260 UpdateUpperBound(out_upperBound, curData);
261 return curData <= maxAllowed;
264 // If the first node at current tree level is a right node, handle
265 // it individually and replace it with its right neighbor, which is
266 // a left node.
267 if (IsRightNode(firstTreeIndex)) {
268 const T& curData = mTreeData[firstTreeIndex];
269 UpdateUpperBound(out_upperBound, curData);
270 if (curData > maxAllowed)
271 return false;
273 firstTreeIndex = RightNeighborNode(firstTreeIndex);
276 // If the last node at current tree level is a left node, handle it
277 // individually and replace it with its left neighbor, which is a
278 // right node.
279 if (IsLeftNode(lastTreeIndex)) {
280 const T& curData = mTreeData[lastTreeIndex];
281 UpdateUpperBound(out_upperBound, curData);
282 if (curData > maxAllowed)
283 return false;
285 lastTreeIndex = LeftNeighborNode(lastTreeIndex);
288 /* At this point it can happen that firstTreeIndex and lastTreeIndex
289 * "crossed" eachother. That happens if firstTreeIndex was a right
290 * node and lastTreeIndex was its right neighor: In that case, both
291 * above tweaks happened and as a result, they ended up being
292 * swapped: LastTreeIndex is now the _left_ neighbor of
293 * firstTreeIndex. When that happens, there is nothing left to
294 * validate.
296 if (lastTreeIndex == LeftNeighborNode(firstTreeIndex))
297 return true;
299 // Walk up one level.
300 firstTreeIndex = ParentNode(firstTreeIndex);
301 lastTreeIndex = ParentNode(lastTreeIndex);
305 // Updates the tree from the parent's buffer contents. Fallible, as it
306 // may have to resize the tree storage.
307 bool Update(size_t firstByte, size_t lastByte);
309 size_t SizeOfIncludingThis(mozilla::MallocSizeOf mallocSizeOf) const
311 return mallocSizeOf(this) + mTreeData.SizeOfExcludingThis(mallocSizeOf);
315 // TreeForType: just a template helper to select the right tree object for a given
316 // element type.
317 template<typename T>
318 struct TreeForType {};
320 template<>
321 struct TreeForType<uint8_t>
323 static ScopedDeletePtr<WebGLElementArrayCacheTree<uint8_t>>&
324 Value(WebGLElementArrayCache* b) {
325 return b->mUint8Tree;
329 template<>
330 struct TreeForType<uint16_t>
332 static ScopedDeletePtr<WebGLElementArrayCacheTree<uint16_t>>&
333 Value(WebGLElementArrayCache* b) {
334 return b->mUint16Tree;
338 template<>
339 struct TreeForType<uint32_t>
341 static ScopedDeletePtr<WebGLElementArrayCacheTree<uint32_t>>&
342 Value(WebGLElementArrayCache* b) {
343 return b->mUint32Tree;
347 // Calling this method will 1) update the leaves in this interval
348 // from the raw buffer data, and 2) propagate this update up the tree.
349 template<typename T>
350 bool
351 WebGLElementArrayCacheTree<T>::Update(size_t firstByte, size_t lastByte)
353 MOZ_ASSERT(firstByte <= lastByte);
354 MOZ_ASSERT(lastByte < mParent.mBytes.Length());
356 size_t numberOfElements = mParent.mBytes.Length() / sizeof(T);
357 size_t requiredNumLeaves = 0;
358 if (numberOfElements > 0) {
359 /* If we didn't require the number of leaves to be a power of two, then
360 * it would just be equal to
362 * ceil(numberOfElements / kElementsPerLeaf)
364 * The way we implement this (division+ceil) operation in integer
365 * arithmetic
366 * is as follows:
368 size_t numLeavesNonPOT = (numberOfElements + kElementsPerLeaf - 1) / kElementsPerLeaf;
369 // It only remains to round that up to the next power of two:
370 requiredNumLeaves = RoundUpPow2(numLeavesNonPOT);
373 // Step #0: If needed, resize our tree data storage.
374 if (requiredNumLeaves != NumLeaves()) {
375 // See class comment for why we the tree storage size is 2 * numLeaves.
376 if (!mTreeData.SetLength(2 * requiredNumLeaves)) {
377 mTreeData.SetLength(0);
378 return false;
380 MOZ_ASSERT(NumLeaves() == requiredNumLeaves);
382 if (NumLeaves()) {
383 // When resizing, update the whole tree, not just the subset
384 // corresponding to the part of the buffer being updated.
385 memset(mTreeData.Elements(), 0, mTreeData.Length() * sizeof(T));
386 firstByte = 0;
387 lastByte = mParent.mBytes.Length() - 1;
391 if (NumLeaves() == 0)
392 return true;
394 lastByte = std::min(lastByte, NumLeaves() * kElementsPerLeaf * sizeof(T) - 1);
395 if (firstByte > lastByte)
396 return true;
398 size_t firstLeaf = LeafForByte(firstByte);
399 size_t lastLeaf = LeafForByte(lastByte);
401 MOZ_ASSERT(firstLeaf <= lastLeaf && lastLeaf < NumLeaves());
403 size_t firstTreeIndex = TreeIndexForLeaf(firstLeaf);
404 size_t lastTreeIndex = TreeIndexForLeaf(lastLeaf);
406 // Step #1: Initialize the tree leaves from plain buffer data.
407 // That is, each tree leaf must be set to the max of the |kElementsPerLeaf|
408 // corresponding buffer entries.
410 // Condition-less scope to prevent leaking this scope's variables into the
411 // code below:
413 // TreeIndex is the index of the tree leaf we're writing, i.e. the
414 // destination index.
415 size_t treeIndex = firstTreeIndex;
416 // srcIndex is the index in the source buffer.
417 size_t srcIndex = firstLeaf * kElementsPerLeaf;
418 while (treeIndex <= lastTreeIndex) {
419 T m = 0;
420 size_t a = srcIndex;
421 size_t srcIndexNextLeaf = std::min(a + kElementsPerLeaf, numberOfElements);
422 for (; srcIndex < srcIndexNextLeaf; srcIndex++) {
423 m = std::max(m, mParent.Element<T>(srcIndex));
425 mTreeData[treeIndex] = m;
426 treeIndex++;
430 // Step #2: Propagate the values up the tree. This is simply a matter of
431 // walking up the tree and setting each node to the max of its two children.
432 while (firstTreeIndex > 1) {
433 // Move up one level.
434 firstTreeIndex = ParentNode(firstTreeIndex);
435 lastTreeIndex = ParentNode(lastTreeIndex);
437 // Fast-exit case where only one node is updated at the current level.
438 if (firstTreeIndex == lastTreeIndex) {
439 mTreeData[firstTreeIndex] = std::max(mTreeData[LeftChildNode(firstTreeIndex)], mTreeData[RightChildNode(firstTreeIndex)]);
440 continue;
443 size_t child = LeftChildNode(firstTreeIndex);
444 size_t parent = firstTreeIndex;
445 while (parent <= lastTreeIndex) {
446 T a = mTreeData[child];
447 child = RightNeighborNode(child);
448 T b = mTreeData[child];
449 child = RightNeighborNode(child);
450 mTreeData[parent] = std::max(a, b);
451 parent = RightNeighborNode(parent);
455 return true;
458 WebGLElementArrayCache::WebGLElementArrayCache()
462 WebGLElementArrayCache::~WebGLElementArrayCache()
466 bool
467 WebGLElementArrayCache::BufferData(const void* ptr, size_t byteLength)
469 if (mBytes.Length() != byteLength) {
470 if (!mBytes.SetLength(byteLength)) {
471 mBytes.SetLength(0);
472 return false;
475 MOZ_ASSERT(mBytes.Length() == byteLength);
476 return BufferSubData(0, ptr, byteLength);
479 bool
480 WebGLElementArrayCache::BufferSubData(size_t pos, const void* ptr,
481 size_t updateByteLength)
483 MOZ_ASSERT(pos + updateByteLength <= mBytes.Length());
484 if (!updateByteLength)
485 return true;
487 if (ptr)
488 memcpy(mBytes.Elements() + pos, ptr, updateByteLength);
489 else
490 memset(mBytes.Elements() + pos, 0, updateByteLength);
491 return UpdateTrees(pos, pos + updateByteLength - 1);
494 bool
495 WebGLElementArrayCache::UpdateTrees(size_t firstByte, size_t lastByte)
497 bool result = true;
498 if (mUint8Tree)
499 result &= mUint8Tree->Update(firstByte, lastByte);
500 if (mUint16Tree)
501 result &= mUint16Tree->Update(firstByte, lastByte);
502 if (mUint32Tree)
503 result &= mUint32Tree->Update(firstByte, lastByte);
504 return result;
507 template<typename T>
508 bool
509 WebGLElementArrayCache::Validate(uint32_t maxAllowed, size_t firstElement,
510 size_t countElements,
511 uint32_t* const out_upperBound)
513 *out_upperBound = 0;
515 // If maxAllowed is >= the max T value, then there is no way that a T index
516 // could be invalid.
517 uint32_t maxTSize = std::numeric_limits<T>::max();
518 if (maxAllowed >= maxTSize) {
519 UpdateUpperBound(out_upperBound, maxTSize);
520 return true;
523 T maxAllowedT(maxAllowed);
525 // Integer overflow must have been handled earlier, so we assert that
526 // maxAllowedT is exactly the max allowed value.
527 MOZ_ASSERT(uint32_t(maxAllowedT) == maxAllowed);
529 if (!mBytes.Length() || !countElements)
530 return true;
532 ScopedDeletePtr<WebGLElementArrayCacheTree<T>>& tree = TreeForType<T>::Value(this);
533 if (!tree) {
534 tree = new WebGLElementArrayCacheTree<T>(*this);
535 if (mBytes.Length()) {
536 bool valid = tree->Update(0, mBytes.Length() - 1);
537 if (!valid) {
538 // Do not assert here. This case would happen if an allocation
539 // failed. We've already settled on fallible allocations around
540 // here.
541 tree = nullptr;
542 return false;
547 size_t lastElement = firstElement + countElements - 1;
549 // Fast-exit path when the global maximum for the whole element array buffer
550 // falls in the allowed range:
551 T globalMax = tree->GlobalMaximum();
552 if (globalMax <= maxAllowedT) {
553 UpdateUpperBound(out_upperBound, globalMax);
554 return true;
557 const T* elements = Elements<T>();
559 // Before calling tree->Validate, we have to validate ourselves the
560 // boundaries of the elements span, to round them to the nearest multiple of
561 // kElementsPerLeaf.
562 size_t firstElementAdjustmentEnd = std::min(lastElement,
563 tree->LastElementUnderSameLeaf(firstElement));
564 while (firstElement <= firstElementAdjustmentEnd) {
565 const T& curData = elements[firstElement];
566 UpdateUpperBound(out_upperBound, curData);
567 if (curData > maxAllowedT)
568 return false;
570 firstElement++;
572 size_t lastElementAdjustmentEnd = std::max(firstElement,
573 tree->FirstElementUnderSameLeaf(lastElement));
574 while (lastElement >= lastElementAdjustmentEnd) {
575 const T& curData = elements[lastElement];
576 UpdateUpperBound(out_upperBound, curData);
577 if (curData > maxAllowedT)
578 return false;
580 lastElement--;
583 // at this point, for many tiny validations, we're already done.
584 if (firstElement > lastElement)
585 return true;
587 // general case
588 return tree->Validate(maxAllowedT, tree->LeafForElement(firstElement),
589 tree->LeafForElement(lastElement), out_upperBound);
592 bool
593 WebGLElementArrayCache::Validate(GLenum type, uint32_t maxAllowed,
594 size_t firstElement, size_t countElements,
595 uint32_t* const out_upperBound)
597 if (type == LOCAL_GL_UNSIGNED_BYTE)
598 return Validate<uint8_t>(maxAllowed, firstElement, countElements,
599 out_upperBound);
600 if (type == LOCAL_GL_UNSIGNED_SHORT)
601 return Validate<uint16_t>(maxAllowed, firstElement, countElements,
602 out_upperBound);
603 if (type == LOCAL_GL_UNSIGNED_INT)
604 return Validate<uint32_t>(maxAllowed, firstElement, countElements,
605 out_upperBound);
607 MOZ_ASSERT(false, "Invalid type.");
608 return false;
611 template<typename T>
612 static size_t
613 SizeOfNullable(mozilla::MallocSizeOf mallocSizeOf, const T& obj)
615 if (!obj)
616 return 0;
617 return obj->SizeOfIncludingThis(mallocSizeOf);
620 size_t
621 WebGLElementArrayCache::SizeOfIncludingThis(mozilla::MallocSizeOf mallocSizeOf) const
623 return mallocSizeOf(this) +
624 mBytes.SizeOfExcludingThis(mallocSizeOf) +
625 SizeOfNullable(mallocSizeOf, mUint8Tree) +
626 SizeOfNullable(mallocSizeOf, mUint16Tree) +
627 SizeOfNullable(mallocSizeOf, mUint32Tree);
630 bool
631 WebGLElementArrayCache::BeenUsedWithMultipleTypes() const
633 // C++ Standard ($4.7)
634 // "If the source type is bool, the value false is converted to zero and
635 // the value true is converted to one."
636 const int num_types_used = (mUint8Tree != nullptr) +
637 (mUint16Tree != nullptr) +
638 (mUint32Tree != nullptr);
639 return num_types_used > 1;
642 } // end namespace mozilla