2 * Copyright (c) 2015, Facebook, Inc.
5 * This source code is licensed under the MIT license found in the
6 * LICENSE file in the "hack" directory of this source tree.
11 +---------------------------------+
12 | Let's talk about naming tables! |
13 +---------------------------------+
15 Every typechecker or compiler that's more complicated than a toy project needs some way to look
16 up the location of a given symbol, especially one that operates at the same scale as Hack and
17 double especially as Hack moves towards generating data lazily on demand instead of generating
18 all of the data up front during strictly defined typechecking phases. In Hack, the data
19 structures that map from symbols to filenames and from filenames to symbol names are referred to
20 as the "naming table," with the "forward naming table" mapping from filenames to symbol names
21 and the "reverse naming table" mapping from symbol names to file names. We store the forward
22 naming table as a standard OCaml map since we only read and write it from the main process, and
23 we store the reverse naming table in shared memory since we want to allow rapid reads and writes
24 from worker processes.
27 - Forward naming table: OCaml map, maps filename -> symbols in file
28 - Reverse naming table: shared memory, maps symbol name to filename it's defined in
30 Seems simple enough, right? Unfortunately, this is where the real world and all of its attendant
31 complexity comes in. The naming table (forward + reverse) is big. And not only is it big, its
32 size is on the order of O(repo). This conflicts with our goal of making Hack use a flat amount
33 of memory regardless of the amount of code we're actually typechecking, so it needs to go. In
34 this case, we've chosen to use our existing saved state infrastructure and dump the naming
35 table to a SQLite file which we can use to serve queries without having to store anything in
36 memory other than just the differences between what's in the saved state and what's on disk.
38 Right now, only the reverse naming table supports falling back to SQLite, so let's go through
39 each operation that the reverse naming table supports and cover how it handles falling back to
43 +---------------------------------+
44 | Reverse naming table operations |
45 +---------------------------------+
47 The reverse naming table supports three operations: [put], [get], and [delete]. [Put] is pretty
48 simple, conceptually, but there's some trickiness with [delete] that would benefit from being
49 covered in more detail, and that trickiness bleeds into how [get] works as well. One important
50 consideration is that *the SQLite database is read-only.* The general idea of these operations
51 is that shared memory stores only entries that have changed.
54 [Put] is simple. Since the SQLite database is read-only and we use shared memory for storing any
55 changes, we just put the value in shared memory.
57 +------------------+ +------------+
59 | Naming Heap | | SQLite |
61 [put key value] -----> [put key value] | | |
63 +------------------+ +------------+
66 We're going to cover [delete] next because it's the only reason that [get] has any complexity
67 beyond "look for the value in shared memory and if it's not there look for it in SQLite." At
68 first glance, [delete] doesn't seem too difficult. Just delete the entry from shared memory,
69 right? Well, that's fine... unless that key also exists in SQLite. If the value is in SQLite,
70 then all that deleting it from shared memory does is make us start returning the stored SQLite
71 value! This is bad, because if we just deleted a key of course we want [get] operations for it
72 to return [None]. But okay, we can work around that. Instead of storing direct values in shared
73 memory, we'll store an enum of [[Added of 'a | Deleted]]. Easy peasy!
75 ...except what do we do if we want to delete a value in the main process, then add it again in a
76 worker process? That would require changing a key's value in shared memory, which is something
77 that we don't support due to data integrity concerns with multiple writers and readers. We could
78 remove and re-add, except removal can only be done by master because of the same integrity
79 concerns. So master would somehow need to know which entries will be added and prematurely
80 remove the [Deleted] sentinel from them. This is difficult enough as to be effectively
83 So what we're left needing is some way to delete values which can be undone by child processes,
84 which means that undeleting a value needs to consist only of [add] operations to shared memory,
85 and have no dependency on [remove]. Enter: the blocked entries heap.
87 For each of the reverse naming table heaps (types, functions, and constants) we also create a
88 heap that only stores values of a single-case enum: [Blocked]. The crucial difference between
89 [Blocked] and [[Added of 'a | Deleted]] is that *we only check for [Blocked] if the value is
90 not in the shared memory naming heap.* This means that we can effectively undelete an entry by
91 using an only an [add] operation. Exactly what we need! Thus, [delete] becomes:
93 +-----------------+ +---------------------+ +------------+
95 | Naming Heap | | Blocked Entries | | SQLite |
97 [delete key] --+-----> [delete key] | | | | |
99 +----------------------------> [add key Blocked] | | |
101 +-----------------+ +---------------------+ +------------+
104 Wow, what a ride, huh? Now that we know how to add and remove entries, let's make this actually
105 useful and talk about how to read them! The general idea is that we first check to see if the
106 value is in the shared memory naming heap. If so, we can return that immediately. If it's not
107 in the naming heap, then we check to see if it's blocked. If it is, we immediately return
108 [None]. If it's not in the naming heap, and it's not in the blocked entries, then and only then
109 do we read the value from SQLite:
111 +-----------------+ +---------------------+ +------------------+
113 | Naming Heap | | Blocked Entries | | SQLite |
115 [get key] -----------> [get key] is: | | | | |
116 [Some value] <---------- [Some value] | | | | |
117 | [None] -------+ | | | |
119 | | +--> [get key] is: | | |
120 [None] <--------------------------------------- [Some Blocked] | | |
121 | | | [None] -----------+ | |
123 | | | | +--> [get key] is: |
124 [Some value] <------------------------------------------------------------ [Some value] |
125 [None] <------------------------------------------------------------------ [None] |
127 +-----------------+ +---------------------+ +------------------+
129 And we're done! I hope this was easy to understand and as fun to read as it was to write :)
132 Q: When do we delete entries from the Blocked Entries heaps?
133 A: Never! Once an entry has been removed at least once we never want to retrieve the SQLite
134 version for the rest of the program execution.
136 Q: Won't we just end up with a heap full of [Blocked] entries now?
137 A: Not really. We only have to do deletions once to remove entries for dirty files, and after
138 that it's not a concern. Plus [Blocked] entries are incredibly small in shared memory
139 (although they do still occupy a hash slot).
142 (* Changes since baseline can be None if there was no baseline to begin with.
143 The scenario where we apply changes since baseline instead of relying on
144 the packaged local changes in the LOCAL_CHANGES table is this:
145 1) Load the naming table baseline (may include local changes if it was
146 incrementally updated at some point - not currently done in practice,
147 but possible and likely to happen in the future)
148 2) Load the changes since baseline that include naming changes processed
149 at another time (perhaps, on another host)
150 In the scenario where the naming table is saved to SQLite from an Unbacked
151 naming table, there is no baseline to speak of
153 type changes_since_baseline
= Naming_sqlite.local_changes
option
156 | Unbacked
of FileInfo.t
Relative_path.Map.t
157 | Backed
of Naming_sqlite.local_changes
160 type fast
= FileInfo.names
Relative_path.Map.t
162 type saved_state_info
= FileInfo.saved
Relative_path.Map.t
164 (*****************************************************************************)
165 (* Forward naming table functions *)
166 (*****************************************************************************)
168 let empty = Unbacked
Relative_path.Map.empty
172 | Unbacked a
-> Unbacked
(Relative_path.Map.filter a f
)
173 | Backed local_changes
->
174 let file_deltas = local_changes
.Naming_sqlite.file_deltas in
178 Naming_sqlite.file_deltas =
187 Relative_path.Map.add
190 ~data
:Naming_sqlite.Deleted
195 let fold a ~init ~f
=
197 | Unbacked a
-> Relative_path.Map.fold a ~init ~f
198 | Backed local_changes
->
202 ~
file_deltas:local_changes
.Naming_sqlite.file_deltas
206 | Unbacked a
-> Relative_path.Map.keys a
207 | Backed local_changes
->
208 (* Reverse at the end to preserve ascending sort order. *)
211 ~f
:(fun path _ acc
-> path
:: acc
)
212 ~
file_deltas:local_changes
.Naming_sqlite.file_deltas
215 let get_files_changed_since_baseline
216 (changes_since_baseline
: changes_since_baseline
) : Relative_path.t list
=
217 match changes_since_baseline
with
218 | Some changes_since_baseline
->
219 Relative_path.Map.keys changes_since_baseline
.Naming_sqlite.file_deltas
222 let get_file_info a key
=
224 | Unbacked a
-> Relative_path.Map.find_opt a key
225 | Backed local_changes
->
227 Relative_path.Map.find_opt local_changes
.Naming_sqlite.file_deltas key
229 | Some
(Naming_sqlite.Modified fi
) -> Some fi
230 | Some
Naming_sqlite.Deleted
-> None
231 | None
-> Naming_sqlite.get_file_info key
)
233 let get_file_info_unsafe a key
=
234 Core_kernel.Option.value_exn
(get_file_info a key
)
237 match get_file_info a key
with
243 | Unbacked a
-> Relative_path.Map.iter a ~f
244 | Backed local_changes
->
247 ~f
:(fun path fi
() -> f path fi
)
248 ~
file_deltas:local_changes
.Naming_sqlite.file_deltas
252 | Unbacked a
-> Unbacked
(Relative_path.Map.remove a key
)
253 | Backed local_changes
->
257 Naming_sqlite.file_deltas =
258 Relative_path.Map.add
259 local_changes
.Naming_sqlite.file_deltas
261 ~data
:Naming_sqlite.Deleted
;
264 let update a key data
=
266 | Unbacked a
-> Unbacked
(Relative_path.Map.add a ~key ~data
)
267 | Backed local_changes
->
271 Naming_sqlite.file_deltas =
272 Relative_path.Map.add
273 local_changes
.Naming_sqlite.file_deltas
275 ~data
:(Naming_sqlite.Modified data
);
278 let update_many a updates
=
281 (* Reverse the order because union always takes the first value. *)
282 Unbacked
(Relative_path.Map.union updates a
)
283 | Backed local_changes
->
285 Relative_path.Map.map updates ~f
:(fun data
-> Naming_sqlite.Modified data
)
290 Naming_sqlite.file_deltas =
291 Relative_path.Map.union
293 local_changes
.Naming_sqlite.file_deltas;
298 | Unbacked b
-> update_many a b
301 "SQLite-backed naming tables cannot be the second argument to combine."
304 This function saves the local changes structure as a binary blob.
305 Local changes represents the (forward) naming table changes since
306 the SQLite naming table baseline. Therefore, this function is most meaningful
307 when the naming table is backed by SQLite. If the naming table is NOT backed
308 by SQLite, meaning that it was computed from scratch by parsing the whole
309 repo, then this baseline-to-current difference is an empty set.
310 The resulting snapshot can be applied when loading a SQLite naming table.
312 let save_changes_since_baseline naming_table ~destination_path
=
314 match naming_table
with
316 | Backed local_changes
-> Some local_changes
318 let contents = Marshal.to_string
snapshot [Marshal.No_sharing
] in
319 Disk.write_file ~file
:destination_path ~
contents
321 let save naming_table db_name
=
322 match naming_table
with
323 | Unbacked naming_table
->
324 let t = Unix.gettimeofday
() in
325 (* Ideally, we would have a commit hash, which would result in the same
326 content version given the same version of source files. However,
327 using a unique version every time we save the naming table from scratch
328 is good enough to enable us to check that the local changes diff
329 provided as an argument to load_from_sqlite_with_changes_since_baseline
330 is compatible with the underlying SQLite table. *)
331 let base_content_version =
332 Printf.sprintf
"%d-%s" (int_of_float
t) (Random_id.short_string
())
335 Naming_sqlite.save_file_infos db_name naming_table ~
base_content_version
338 Hh_logger.log_duration
340 "Inserted %d files and %d symbols"
341 save_result.Naming_sqlite.files_added
342 save_result.Naming_sqlite.symbols_added
)
346 | Backed local_changes
->
347 let t = Unix.gettimeofday
() in
348 let old_path = Naming_sqlite.get_db_path
() in
349 (* Don't overwrite. *)
351 ~force
:(FileUtil.Ask
(fun _
-> false))
352 [Core_kernel.Option.value_exn
(Naming_sqlite.get_db_path
())]
354 Naming_sqlite.set_db_path
(Some db_name
);
355 Naming_sqlite.update_file_infos db_name local_changes
;
357 Hh_logger.log_duration
359 "Updated a blob with %d entries"
360 (Relative_path.Map.cardinal local_changes
.Naming_sqlite.file_deltas))
363 Naming_sqlite.set_db_path
old_path;
364 { Naming_sqlite.empty_save_result
with Naming_sqlite.files_added
= 1 }
366 (*****************************************************************************)
367 (* Conversion functions *)
368 (*****************************************************************************)
370 let from_saved saved
=
371 Hh_logger.log
"Loading naming table from marshalled blob...";
372 let t = Unix.gettimeofday
() in
375 (Relative_path.Map.fold
377 ~init
:Relative_path.Map.empty
378 ~f
:(fun fn saved acc
->
379 let file_info = FileInfo.from_saved fn saved
in
380 Relative_path.Map.add acc fn
file_info))
382 let _t = Hh_logger.log_duration
"Loaded naming table from blob" t in
387 | Unbacked a
-> Relative_path.Map.map a
FileInfo.to_saved
389 fold a ~init
:Relative_path.Map.empty ~f
:(fun path fi acc
->
390 Relative_path.Map.add acc ~key
:path ~data
:(FileInfo.to_saved fi
))
394 | Unbacked a
-> Relative_path.Map.map a
FileInfo.simplify
396 fold a ~init
:Relative_path.Map.empty ~f
:(fun path fi acc
->
397 Relative_path.Map.add acc ~key
:path ~data
:(FileInfo.simplify fi
))
399 let saved_to_fast saved
= Relative_path.Map.map saved
FileInfo.saved_to_names
401 (*****************************************************************************)
402 (* Forward naming table creation functions *)
403 (*****************************************************************************)
405 let create a
= Unbacked a
407 let update_reverse_entries file_deltas =
408 Relative_path.Map.iter file_deltas ~f
:(fun path delta
->
410 match Naming_sqlite.get_file_info path
with
412 Naming_provider.remove_type_batch
413 (fi
.FileInfo.classes
|> List.map snd
|> SSet.of_list
);
414 Naming_provider.remove_type_batch
415 (fi
.FileInfo.typedefs
|> List.map snd
|> SSet.of_list
);
416 Naming_provider.remove_fun_batch
417 (fi
.FileInfo.funs
|> List.map snd
|> SSet.of_list
);
418 Naming_provider.remove_const_batch
419 (fi
.FileInfo.consts
|> List.map snd
|> SSet.of_list
)
423 | Naming_sqlite.Modified fi
->
425 (fun (pos
, name
) -> Naming_provider.add_class name pos
)
428 (fun (pos
, name
) -> Naming_provider.add_typedef name pos
)
429 fi
.FileInfo.typedefs
;
431 (fun (pos
, name
) -> Naming_provider.add_fun name pos
)
434 (fun (pos
, name
) -> Naming_provider.add_const name pos
)
436 | Naming_sqlite.Deleted
-> ())
438 let choose_local_changes ~local_changes ~custom_local_changes
=
439 match custom_local_changes
with
440 | None
-> local_changes
441 | Some custom_local_changes
->
443 custom_local_changes
.Naming_sqlite.base_content_version
444 = local_changes
.Naming_sqlite.base_content_version
450 "%s\nSQLite content version: %s\nLocal changes content version: %s"
451 "Incompatible local changes diff supplied."
452 local_changes
.Naming_sqlite.base_content_version
453 custom_local_changes
.Naming_sqlite.base_content_version)
456 Loads the naming table from a SQLite database. The optional custom local
457 changes represents the naming table changes that occurred since the original
458 SQLite database was created.
459 To recap, the SQLite-based naming table, once loaded, is not mutated. All the
460 forward naming table changes are kept in an in-memory map. When we save an
461 update to a naming table that is backed by SQLite, we store the changes
462 since the baseline as a blob is a separate table and we rehydrate those
463 changes into an in-memory map when we load the updated naming table later.
464 The custom local changes is an alternative to the changes serialized as a
465 blob into the SQLite table. This enables scenarios that require repeatedly
466 loading the same base SQLite naming table with different versions of
469 let load_from_sqlite_for_type_checking
470 ~
(should_update_reverse_entries
: bool)
471 ~
(custom_local_changes
: Naming_sqlite.local_changes
option)
472 (db_path
: string) : t =
473 Hh_logger.log
"Loading naming table from SQLite...";
474 let t = Unix.gettimeofday
() in
475 Naming_sqlite.set_db_path
(Some db_path
);
478 ~
local_changes:(Naming_sqlite.get_local_changes
())
479 ~custom_local_changes
481 let t = Hh_logger.log_duration
"Loaded local naming table changes" t in
482 if should_update_reverse_entries
then begin
483 update_reverse_entries local_changes.Naming_sqlite.file_deltas;
484 let _t = Hh_logger.log_duration
"Updated reverse naming table entries" t in
489 let load_from_sqlite_with_changes_since_baseline
490 (changes_since_baseline
: Naming_sqlite.local_changes option)
491 (db_path
: string) : t =
492 load_from_sqlite_for_type_checking
493 ~should_update_reverse_entries
:true
494 ~custom_local_changes
:changes_since_baseline
497 let load_from_sqlite_for_batch_update (db_path
: string) : t =
498 load_from_sqlite_for_type_checking
499 ~should_update_reverse_entries
:false
500 ~custom_local_changes
:None
503 let load_from_sqlite (db_path
: string) : t =
504 load_from_sqlite_for_type_checking
505 ~should_update_reverse_entries
:true
506 ~custom_local_changes
:None
509 let get_reverse_naming_fallback_path () : string option =
510 Naming_sqlite.get_db_path
()
512 let get_forward_naming_fallback_path a
: string option =
515 | Backed _
-> Naming_sqlite.get_db_path
()
517 (*****************************************************************************)
518 (* Testing functions *)
519 (*****************************************************************************)
521 let assert_is_backed a backed
=
523 | Unbacked _
-> assert (not backed
)
524 | Backed _
-> assert backed