Support typedefs in `Naming_provider`
[hiphop-php.git] / hphp / hack / src / naming / naming_table.ml
blob8ff130500b89b1d19542e26232afc1d363ac4ac2
1 (*
2 * Copyright (c) 2015, Facebook, Inc.
3 * All rights reserved.
5 * This source code is licensed under the MIT license found in the
6 * LICENSE file in the "hack" directory of this source tree.
8 *)
10 (**
11 +---------------------------------+
12 | Let's talk about naming tables! |
13 +---------------------------------+
15 Every typechecker or compiler that's more complicated than a toy project needs some way to look
16 up the location of a given symbol, especially one that operates at the same scale as Hack and
17 double especially as Hack moves towards generating data lazily on demand instead of generating
18 all of the data up front during strictly defined typechecking phases. In Hack, the data
19 structures that map from symbols to filenames and from filenames to symbol names are referred to
20 as the "naming table," with the "forward naming table" mapping from filenames to symbol names
21 and the "reverse naming table" mapping from symbol names to file names. We store the forward
22 naming table as a standard OCaml map since we only read and write it from the main process, and
23 we store the reverse naming table in shared memory since we want to allow rapid reads and writes
24 from worker processes.
26 So, to recap:
27 - Forward naming table: OCaml map, maps filename -> symbols in file
28 - Reverse naming table: shared memory, maps symbol name to filename it's defined in
30 Seems simple enough, right? Unfortunately, this is where the real world and all of its attendant
31 complexity comes in. The naming table (forward + reverse) is big. And not only is it big, its
32 size is on the order of O(repo). This conflicts with our goal of making Hack use a flat amount
33 of memory regardless of the amount of code we're actually typechecking, so it needs to go. In
34 this case, we've chosen to use our existing saved state infrastructure and dump the naming
35 table to a SQLite file which we can use to serve queries without having to store anything in
36 memory other than just the differences between what's in the saved state and what's on disk.
38 Right now, only the reverse naming table supports falling back to SQLite, so let's go through
39 each operation that the reverse naming table supports and cover how it handles falling back to
40 SQLite.
43 +---------------------------------+
44 | Reverse naming table operations |
45 +---------------------------------+
47 The reverse naming table supports three operations: [put], [get], and [delete]. [Put] is pretty
48 simple, conceptually, but there's some trickiness with [delete] that would benefit from being
49 covered in more detail, and that trickiness bleeds into how [get] works as well. One important
50 consideration is that *the SQLite database is read-only.* The general idea of these operations
51 is that shared memory stores only entries that have changed.
53 PUT:
54 [Put] is simple. Since the SQLite database is read-only and we use shared memory for storing any
55 changes, we just put the value in shared memory.
57 +------------------+ +------------+
58 | | | |
59 | Naming Heap | | SQLite |
60 | | | |
61 [put key value] -----> [put key value] | | |
62 | | | |
63 +------------------+ +------------+
65 DELETE:
66 We're going to cover [delete] next because it's the only reason that [get] has any complexity
67 beyond "look for the value in shared memory and if it's not there look for it in SQLite." At
68 first glance, [delete] doesn't seem too difficult. Just delete the entry from shared memory,
69 right? Well, that's fine... unless that key also exists in SQLite. If the value is in SQLite,
70 then all that deleting it from shared memory does is make us start returning the stored SQLite
71 value! This is bad, because if we just deleted a key of course we want [get] operations for it
72 to return [None]. But okay, we can work around that. Instead of storing direct values in shared
73 memory, we'll store an enum of [[Added of 'a | Deleted]]. Easy peasy!
75 ...except what do we do if we want to delete a value in the main process, then add it again in a
76 worker process? That would require changing a key's value in shared memory, which is something
77 that we don't support due to data integrity concerns with multiple writers and readers. We could
78 remove and re-add, except removal can only be done by master because of the same integrity
79 concerns. So master would somehow need to know which entries will be added and prematurely
80 remove the [Deleted] sentinel from them. This is difficult enough as to be effectively
81 impossible.
83 So what we're left needing is some way to delete values which can be undone by child processes,
84 which means that undeleting a value needs to consist only of [add] operations to shared memory,
85 and have no dependency on [remove]. Enter: the blocked entries heap.
87 For each of the reverse naming table heaps (types, functions, and constants) we also create a
88 heap that only stores values of a single-case enum: [Blocked]. The crucial difference between
89 [Blocked] and [[Added of 'a | Deleted]] is that *we only check for [Blocked] if the value is
90 not in the shared memory naming heap.* This means that we can effectively undelete an entry by
91 using an only an [add] operation. Exactly what we need! Thus, [delete] becomes:
93 +-----------------+ +---------------------+ +------------+
94 | | | | | |
95 | Naming Heap | | Blocked Entries | | SQLite |
96 | | | | | |
97 [delete key] --+-----> [delete key] | | | | |
98 | | | | | | |
99 +----------------------------> [add key Blocked] | | |
100 | | | | | |
101 +-----------------+ +---------------------+ +------------+
103 GET:
104 Wow, what a ride, huh? Now that we know how to add and remove entries, let's make this actually
105 useful and talk about how to read them! The general idea is that we first check to see if the
106 value is in the shared memory naming heap. If so, we can return that immediately. If it's not
107 in the naming heap, then we check to see if it's blocked. If it is, we immediately return
108 [None]. If it's not in the naming heap, and it's not in the blocked entries, then and only then
109 do we read the value from SQLite:
111 +-----------------+ +---------------------+ +------------------+
112 | | | | | |
113 | Naming Heap | | Blocked Entries | | SQLite |
114 | | | | | |
115 [get key] -----------> [get key] is: | | | | |
116 [Some value] <---------- [Some value] | | | | |
117 | [None] -------+ | | | |
118 | | \ | | | |
119 | | +--> [get key] is: | | |
120 [None] <--------------------------------------- [Some Blocked] | | |
121 | | | [None] -----------+ | |
122 | | | | \ | |
123 | | | | +--> [get key] is: |
124 [Some value] <------------------------------------------------------------ [Some value] |
125 [None] <------------------------------------------------------------------ [None] |
126 | | | | | |
127 +-----------------+ +---------------------+ +------------------+
129 And we're done! I hope this was easy to understand and as fun to read as it was to write :)
131 MINUTIAE:
132 Q: When do we delete entries from the Blocked Entries heaps?
133 A: Never! Once an entry has been removed at least once we never want to retrieve the SQLite
134 version for the rest of the program execution.
136 Q: Won't we just end up with a heap full of [Blocked] entries now?
137 A: Not really. We only have to do deletions once to remove entries for dirty files, and after
138 that it's not a concern. Plus [Blocked] entries are incredibly small in shared memory
139 (although they do still occupy a hash slot).
142 (* Changes since baseline can be None if there was no baseline to begin with.
143 The scenario where we apply changes since baseline instead of relying on
144 the packaged local changes in the LOCAL_CHANGES table is this:
145 1) Load the naming table baseline (may include local changes if it was
146 incrementally updated at some point - not currently done in practice,
147 but possible and likely to happen in the future)
148 2) Load the changes since baseline that include naming changes processed
149 at another time (perhaps, on another host)
150 In the scenario where the naming table is saved to SQLite from an Unbacked
151 naming table, there is no baseline to speak of
153 type changes_since_baseline = Naming_sqlite.local_changes option
155 type t =
156 | Unbacked of FileInfo.t Relative_path.Map.t
157 | Backed of Naming_sqlite.local_changes
158 [@@deriving show]
160 type fast = FileInfo.names Relative_path.Map.t
162 type saved_state_info = FileInfo.saved Relative_path.Map.t
164 (*****************************************************************************)
165 (* Forward naming table functions *)
166 (*****************************************************************************)
168 let empty = Unbacked Relative_path.Map.empty
170 let filter a ~f =
171 match a with
172 | Unbacked a -> Unbacked (Relative_path.Map.filter a f)
173 | Backed local_changes ->
174 let file_deltas = local_changes.Naming_sqlite.file_deltas in
175 Backed
177 local_changes with
178 Naming_sqlite.file_deltas =
179 Naming_sqlite.fold
180 ~init:file_deltas
182 begin
183 fun path fi acc ->
184 if f path fi then
186 else
187 Relative_path.Map.add
189 ~key:path
190 ~data:Naming_sqlite.Deleted
192 ~file_deltas;
195 let fold a ~init ~f =
196 match a with
197 | Unbacked a -> Relative_path.Map.fold a ~init ~f
198 | Backed local_changes ->
199 Naming_sqlite.fold
200 ~init
202 ~file_deltas:local_changes.Naming_sqlite.file_deltas
204 let get_files a =
205 match a with
206 | Unbacked a -> Relative_path.Map.keys a
207 | Backed local_changes ->
208 (* Reverse at the end to preserve ascending sort order. *)
209 Naming_sqlite.fold
210 ~init:[]
211 ~f:(fun path _ acc -> path :: acc)
212 ~file_deltas:local_changes.Naming_sqlite.file_deltas
213 |> List.rev
215 let get_files_changed_since_baseline
216 (changes_since_baseline : changes_since_baseline) : Relative_path.t list =
217 match changes_since_baseline with
218 | Some changes_since_baseline ->
219 Relative_path.Map.keys changes_since_baseline.Naming_sqlite.file_deltas
220 | None -> []
222 let get_file_info a key =
223 match a with
224 | Unbacked a -> Relative_path.Map.find_opt a key
225 | Backed local_changes ->
226 (match
227 Relative_path.Map.find_opt local_changes.Naming_sqlite.file_deltas key
228 with
229 | Some (Naming_sqlite.Modified fi) -> Some fi
230 | Some Naming_sqlite.Deleted -> None
231 | None -> Naming_sqlite.get_file_info key)
233 let get_file_info_unsafe a key =
234 Core_kernel.Option.value_exn (get_file_info a key)
236 let has_file a key =
237 match get_file_info a key with
238 | Some _ -> true
239 | None -> false
241 let iter a ~f =
242 match a with
243 | Unbacked a -> Relative_path.Map.iter a ~f
244 | Backed local_changes ->
245 Naming_sqlite.fold
246 ~init:()
247 ~f:(fun path fi () -> f path fi)
248 ~file_deltas:local_changes.Naming_sqlite.file_deltas
250 let remove a key =
251 match a with
252 | Unbacked a -> Unbacked (Relative_path.Map.remove a key)
253 | Backed local_changes ->
254 Backed
256 local_changes with
257 Naming_sqlite.file_deltas =
258 Relative_path.Map.add
259 local_changes.Naming_sqlite.file_deltas
260 ~key
261 ~data:Naming_sqlite.Deleted;
264 let update a key data =
265 match a with
266 | Unbacked a -> Unbacked (Relative_path.Map.add a ~key ~data)
267 | Backed local_changes ->
268 Backed
270 local_changes with
271 Naming_sqlite.file_deltas =
272 Relative_path.Map.add
273 local_changes.Naming_sqlite.file_deltas
274 ~key
275 ~data:(Naming_sqlite.Modified data);
278 let update_many a updates =
279 match a with
280 | Unbacked a ->
281 (* Reverse the order because union always takes the first value. *)
282 Unbacked (Relative_path.Map.union updates a)
283 | Backed local_changes ->
284 let local_updates =
285 Relative_path.Map.map updates ~f:(fun data -> Naming_sqlite.Modified data)
287 Backed
289 local_changes with
290 Naming_sqlite.file_deltas =
291 Relative_path.Map.union
292 local_updates
293 local_changes.Naming_sqlite.file_deltas;
296 let combine a b =
297 match b with
298 | Unbacked b -> update_many a b
299 | _ ->
300 failwith
301 "SQLite-backed naming tables cannot be the second argument to combine."
304 This function saves the local changes structure as a binary blob.
305 Local changes represents the (forward) naming table changes since
306 the SQLite naming table baseline. Therefore, this function is most meaningful
307 when the naming table is backed by SQLite. If the naming table is NOT backed
308 by SQLite, meaning that it was computed from scratch by parsing the whole
309 repo, then this baseline-to-current difference is an empty set.
310 The resulting snapshot can be applied when loading a SQLite naming table.
312 let save_changes_since_baseline naming_table ~destination_path =
313 let snapshot =
314 match naming_table with
315 | Unbacked _ -> None
316 | Backed local_changes -> Some local_changes
318 let contents = Marshal.to_string snapshot [Marshal.No_sharing] in
319 Disk.write_file ~file:destination_path ~contents
321 let save naming_table db_name =
322 match naming_table with
323 | Unbacked naming_table ->
324 let t = Unix.gettimeofday () in
325 (* Ideally, we would have a commit hash, which would result in the same
326 content version given the same version of source files. However,
327 using a unique version every time we save the naming table from scratch
328 is good enough to enable us to check that the local changes diff
329 provided as an argument to load_from_sqlite_with_changes_since_baseline
330 is compatible with the underlying SQLite table. *)
331 let base_content_version =
332 Printf.sprintf "%d-%s" (int_of_float t) (Random_id.short_string ())
334 let save_result =
335 Naming_sqlite.save_file_infos db_name naming_table ~base_content_version
337 let (_ : float) =
338 Hh_logger.log_duration
339 (Printf.sprintf
340 "Inserted %d files and %d symbols"
341 save_result.Naming_sqlite.files_added
342 save_result.Naming_sqlite.symbols_added)
345 save_result
346 | Backed local_changes ->
347 let t = Unix.gettimeofday () in
348 let old_path = Naming_sqlite.get_db_path () in
349 (* Don't overwrite. *)
350 FileUtil.cp
351 ~force:(FileUtil.Ask (fun _ -> false))
352 [Core_kernel.Option.value_exn (Naming_sqlite.get_db_path ())]
353 db_name;
354 Naming_sqlite.set_db_path (Some db_name);
355 Naming_sqlite.update_file_infos db_name local_changes;
356 let (_ : float) =
357 Hh_logger.log_duration
358 (Printf.sprintf
359 "Updated a blob with %d entries"
360 (Relative_path.Map.cardinal local_changes.Naming_sqlite.file_deltas))
363 Naming_sqlite.set_db_path old_path;
364 { Naming_sqlite.empty_save_result with Naming_sqlite.files_added = 1 }
366 (*****************************************************************************)
367 (* Conversion functions *)
368 (*****************************************************************************)
370 let from_saved saved =
371 Hh_logger.log "Loading naming table from marshalled blob...";
372 let t = Unix.gettimeofday () in
373 let naming_table =
374 Unbacked
375 (Relative_path.Map.fold
376 saved
377 ~init:Relative_path.Map.empty
378 ~f:(fun fn saved acc ->
379 let file_info = FileInfo.from_saved fn saved in
380 Relative_path.Map.add acc fn file_info))
382 let _t = Hh_logger.log_duration "Loaded naming table from blob" t in
383 naming_table
385 let to_saved a =
386 match a with
387 | Unbacked a -> Relative_path.Map.map a FileInfo.to_saved
388 | Backed _ ->
389 fold a ~init:Relative_path.Map.empty ~f:(fun path fi acc ->
390 Relative_path.Map.add acc ~key:path ~data:(FileInfo.to_saved fi))
392 let to_fast a =
393 match a with
394 | Unbacked a -> Relative_path.Map.map a FileInfo.simplify
395 | Backed _ ->
396 fold a ~init:Relative_path.Map.empty ~f:(fun path fi acc ->
397 Relative_path.Map.add acc ~key:path ~data:(FileInfo.simplify fi))
399 let saved_to_fast saved = Relative_path.Map.map saved FileInfo.saved_to_names
401 (*****************************************************************************)
402 (* Forward naming table creation functions *)
403 (*****************************************************************************)
405 let create a = Unbacked a
407 let update_reverse_entries file_deltas =
408 Relative_path.Map.iter file_deltas ~f:(fun path delta ->
409 begin
410 match Naming_sqlite.get_file_info path with
411 | Some fi ->
412 Naming_provider.remove_type_batch
413 (fi.FileInfo.classes |> List.map snd |> SSet.of_list);
414 Naming_provider.remove_type_batch
415 (fi.FileInfo.typedefs |> List.map snd |> SSet.of_list);
416 Naming_provider.remove_fun_batch
417 (fi.FileInfo.funs |> List.map snd |> SSet.of_list);
418 Naming_provider.remove_const_batch
419 (fi.FileInfo.consts |> List.map snd |> SSet.of_list)
420 | None -> ()
421 end;
422 match delta with
423 | Naming_sqlite.Modified fi ->
424 List.iter
425 (fun (pos, name) -> Naming_provider.add_class name pos)
426 fi.FileInfo.classes;
427 List.iter
428 (fun (pos, name) -> Naming_provider.add_typedef name pos)
429 fi.FileInfo.typedefs;
430 List.iter
431 (fun (pos, name) -> Naming_provider.add_fun name pos)
432 fi.FileInfo.funs;
433 List.iter
434 (fun (pos, name) -> Naming_provider.add_const name pos)
435 fi.FileInfo.consts
436 | Naming_sqlite.Deleted -> ())
438 let choose_local_changes ~local_changes ~custom_local_changes =
439 match custom_local_changes with
440 | None -> local_changes
441 | Some custom_local_changes ->
443 custom_local_changes.Naming_sqlite.base_content_version
444 = local_changes.Naming_sqlite.base_content_version
445 then
446 custom_local_changes
447 else
448 failwith
449 (Printf.sprintf
450 "%s\nSQLite content version: %s\nLocal changes content version: %s"
451 "Incompatible local changes diff supplied."
452 local_changes.Naming_sqlite.base_content_version
453 custom_local_changes.Naming_sqlite.base_content_version)
456 Loads the naming table from a SQLite database. The optional custom local
457 changes represents the naming table changes that occurred since the original
458 SQLite database was created.
459 To recap, the SQLite-based naming table, once loaded, is not mutated. All the
460 forward naming table changes are kept in an in-memory map. When we save an
461 update to a naming table that is backed by SQLite, we store the changes
462 since the baseline as a blob is a separate table and we rehydrate those
463 changes into an in-memory map when we load the updated naming table later.
464 The custom local changes is an alternative to the changes serialized as a
465 blob into the SQLite table. This enables scenarios that require repeatedly
466 loading the same base SQLite naming table with different versions of
467 the local changes.
469 let load_from_sqlite_for_type_checking
470 ~(should_update_reverse_entries : bool)
471 ~(custom_local_changes : Naming_sqlite.local_changes option)
472 (db_path : string) : t =
473 Hh_logger.log "Loading naming table from SQLite...";
474 let t = Unix.gettimeofday () in
475 Naming_sqlite.set_db_path (Some db_path);
476 let local_changes =
477 choose_local_changes
478 ~local_changes:(Naming_sqlite.get_local_changes ())
479 ~custom_local_changes
481 let t = Hh_logger.log_duration "Loaded local naming table changes" t in
482 if should_update_reverse_entries then begin
483 update_reverse_entries local_changes.Naming_sqlite.file_deltas;
484 let _t = Hh_logger.log_duration "Updated reverse naming table entries" t in
486 end;
487 Backed local_changes
489 let load_from_sqlite_with_changes_since_baseline
490 (changes_since_baseline : Naming_sqlite.local_changes option)
491 (db_path : string) : t =
492 load_from_sqlite_for_type_checking
493 ~should_update_reverse_entries:true
494 ~custom_local_changes:changes_since_baseline
495 db_path
497 let load_from_sqlite_for_batch_update (db_path : string) : t =
498 load_from_sqlite_for_type_checking
499 ~should_update_reverse_entries:false
500 ~custom_local_changes:None
501 db_path
503 let load_from_sqlite (db_path : string) : t =
504 load_from_sqlite_for_type_checking
505 ~should_update_reverse_entries:true
506 ~custom_local_changes:None
507 db_path
509 let get_reverse_naming_fallback_path () : string option =
510 Naming_sqlite.get_db_path ()
512 let get_forward_naming_fallback_path a : string option =
513 match a with
514 | Unbacked _ -> None
515 | Backed _ -> Naming_sqlite.get_db_path ()
517 (*****************************************************************************)
518 (* Testing functions *)
519 (*****************************************************************************)
521 let assert_is_backed a backed =
522 match a with
523 | Unbacked _ -> assert (not backed)
524 | Backed _ -> assert backed