Hash Hc<T> using its address
Summary:
To get the full benefit of hash-consing, hashing a tree should not require traversing the entire tree. If we have the term data structure from [this paper](https://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf):
```
type Term = Hc<TermNode>;
enum TermNode {
    Var(u64),
    Lam(Term),
    App(Term, Term),
}
```
Then, when hashing App, we should only need to shallowly combine the memoized hashes of the two Terms inside it (rather than traversing those subtrees to hash them). In the paper, this is achieved by storing the hash of the value inside Hc.
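For illustration, here is a minimal sketch of the paper's approach (storing the hash next to the value). The `Memo` struct, the sized-only `Hc<T>` wrapper over `Rc`, and `Hc::new` are hypothetical names for this sketch, not the actual implementation, and the consing table itself is omitted:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical metadata struct: the value plus its hash, computed once.
pub struct Memo<T> {
    pub hash: u64,
    pub value: T,
}

/// Hypothetical sized-only stand-in for `Hc<T>`.
pub struct Hc<T>(pub Rc<Memo<T>>);

impl<T: Hash> Hc<T> {
    /// Hashes `value` once and stores the result behind the pointer.
    pub fn new(value: T) -> Self {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        Hc(Rc::new(Memo { hash: hasher.finish(), value }))
    }
}

impl<T> Hash for Hc<T> {
    /// Writes only the memoized hash; the value is never traversed here.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.0.hash);
    }
}

impl<T> Clone for Hc<T> {
    fn clone(&self) -> Self {
        Hc(Rc::clone(&self.0))
    }
}

pub type Term = Hc<TermNode>;

/// The derived `Hash` combines the discriminant with each child's
/// memoized hash, so hashing `App` is shallow.
#[derive(Hash)]
pub enum TermNode {
    Var(u64),
    Lam(Term),
    App(Term, Term),
}
```

With this layout, constructing `Hc::new(TermNode::App(a, b))` hashes a discriminant and two stored `u64` values, regardless of how deep `a` and `b` are.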
Since we currently support unsized types for Hc, it's not so easy to put additional metadata behind the pointer (see the summary of D33698453 (https://github.com/facebook/hhvm/commit/745a1884bb4de1bca4300078bf1873a35c454610), where we removed the metadata struct). As I see it, we have two choices:
1. Give up on T: ?Sized in Hc<T>, add a metadata struct, and write the hash of T to that metadata. This requires us to box unsized types when hash-consing them (so Hc<str> becomes Hc<Box<str>>, and Hc<[FunParam]> becomes Hc<Box<[FunParam]>>, etc.), adding an allocation and an indirection.
2. Hash Hc<T> values using their address (or the address and length, for unsized T). This is the strategy implemented in this diff.
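A rough sketch of what option 2 could look like, assuming `Hc<T>` is a thin wrapper over a shared pointer (an `Rc` here, as a hypothetical stand-in for the real type). The byte size from `size_of_val` stands in for the length of unsized values; this is an illustration of the idea rather than the code in this diff:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Hypothetical stand-in for `Hc<T>`; the real type is produced by a consing
/// table, which guarantees that equal values share one allocation.
pub struct Hc<T: ?Sized>(pub Rc<T>);

impl<T: ?Sized> Hash for Hc<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Thin address of the pointee (any fat-pointer metadata is dropped).
        let addr = &*self.0 as *const T as *const () as usize;
        state.write_usize(addr);
        // Byte size of the pointee; for `str` and slices this tracks the length.
        state.write_usize(std::mem::size_of_val(&*self.0));
    }
}

impl<T: ?Sized> PartialEq for Hc<T> {
    /// Pointer equality is enough once values are hash-consed.
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl<T: ?Sized> Eq for Hc<T> {}

impl<T: ?Sized> Clone for Hc<T> {
    fn clone(&self) -> Self {
        Hc(Rc::clone(&self.0))
    }
}

pub type Term = Hc<TermNode>;

/// Deriving `Hash` here never recurses into subterms: each child
/// contributes only its address and size.
#[derive(PartialEq, Eq, Hash)]
pub enum TermNode {
    Var(u64),
    Lam(Term),
    App(Term, Term),
}

/// Helper for experimenting: hash any value with the std default hasher.
pub fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Note that no `T: Hash` bound is needed on the `Hash` impl, which is what lets the same code serve `Hc<str>` and `Hc<[FunParam]>` without boxing.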
To avoid producing hashes with identical upper bytes (the problem edwinsmith described on D33698451 (https://github.com/facebook/hhvm/commit/e5c35478272115f2cf8ea3aacf9a808239a0ab91)), I've run the pointer value through fnv instead of directly writing the pointer value to the hasher. Possible alternative: consistently use a hasher which tolerates pointer values well, like HN's intern crate does with IdHasher (and the associated Map and Set types).
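As a sketch of the mixing step, here is a hand-written 64-bit FNV-1a applied to the address. Which FNV variant and byte order the diff actually uses is not stated here, so FNV-1a over little-endian bytes is an assumption, and `fnv1a` and `mixed_address` are hypothetical helper names:

```rust
/// Standard 64-bit FNV parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: xor each byte into the state, then multiply by the prime.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Mixes the address of `value` so that the bits that vary between
/// neighboring allocations also influence the upper bytes of the result.
pub fn mixed_address<T: ?Sized>(value: &T) -> u64 {
    let addr = value as *const T as *const () as usize as u64;
    fnv1a(&addr.to_le_bytes())
}
```

A `Hash` impl for `Hc<T>` would then write `mixed_address(&*ptr)` to the hasher instead of the raw address. Raw heap addresses tend to share their upper bytes, which matters for hash tables that select buckets or tags from the high bits.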
A downside of this approach is that hash(Hc<T>) != hash(T). I don't know whether this has negative consequences in practice, but it may be surprising.
Reviewed By: edwinsmith
Differential Revision: D33825575
fbshipit-source-id: 9aab74c4cdd23275796c67f8480e5b31729a4d4c