Detect and kill orphaned workers using heartbeat mechanism
commit f5adffa6a2cbaecfbc5a26464e51fa3dd436e9e7
author Bob Ren <bobren@fb.com>
Tue, 9 Mar 2021 02:57:07 +0000 (8 18:57 -0800)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Tue, 9 Mar 2021 02:59:38 +0000 (8 18:59 -0800)
tree a804e8f302aab6c3d652f45e3148a05fbb4be52e
parent 8933ea09d816915e8f5563058a957d8edcb10d83

Summary:
This diff builds on D26241563, but instead of each worker having its own heartbeat, only the central controller writes heartbeats. Each worker simply checks the latest heartbeat and decides whether to terminate.

Currently the heartbeat timeout is hard-coded to 20 seconds. In my testing, this worked robustly.
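
The worker-side check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this diff: the names `heartbeat_timeout_seconds`, `decision`, and `check_heartbeat` are hypothetical, and it assumes the worker already has the time of the controller's latest heartbeat as seconds since the epoch.

```ocaml
(* Hypothetical sketch: names and types are illustrative, not taken from the diff. *)

(* The hard-coded timeout mentioned in the summary, in seconds. *)
let heartbeat_timeout_seconds = 20.0

type decision =
  | Keep_running
  | Terminate

(* [last_heartbeat] is the time (seconds since the epoch) of the most recent
   heartbeat written by the central controller; [now] is the current time.
   The worker terminates only when the heartbeat is older than the timeout. *)
let check_heartbeat ~(now : float) ~(last_heartbeat : float) : decision =
  if now -. last_heartbeat > heartbeat_timeout_seconds then Terminate
  else Keep_running

let () =
  let now = 1000.0 in
  (* Heartbeat 5 seconds old: the controller is considered alive. *)
  assert (check_heartbeat ~now ~last_heartbeat:995.0 = Keep_running);
  (* Heartbeat 30 seconds old: the worker treats itself as orphaned. *)
  assert (check_heartbeat ~now ~last_heartbeat:970.0 = Terminate)
```

Passing `now` as an argument keeps the check pure, so it can be tested without a real clock.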

Since we don't have an OCaml SQL library, and parsing timestamps seemed non-trivial, I decided to delegate the timestamp-to-seconds conversion to the SQL query.
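
One way to picture this delegation is sketched below. Everything specific in it is an assumption made for illustration: the table `heartbeats`, the columns `heartbeat_time` and `controller_id`, the MySQL-style `TIMESTAMPDIFF` call, and the function names are all hypothetical. The diff's actual query and schema are not shown here.

```ocaml
(* Hypothetical sketch: the schema, SQL dialect, and names are assumptions. *)

(* Build a query that returns the heartbeat age as a plain integer number of
   seconds, so the OCaml side never has to parse a timestamp. *)
let heartbeat_age_query ~(controller_id : int) : string =
  Printf.sprintf
    "SELECT TIMESTAMPDIFF(SECOND, MAX(heartbeat_time), NOW()) \
     FROM heartbeats WHERE controller_id = %d"
    controller_id

(* The query result arrives as text; only an integer needs to be parsed. *)
let parse_age_seconds (raw : string) : int option =
  int_of_string_opt (String.trim raw)

(* Treating an unparsable result as "not orphaned" is a conservative choice
   made for this sketch, not something the summary specifies. *)
let is_orphaned ~(timeout_seconds : int) (raw : string) : bool =
  match parse_age_seconds raw with
  | Some age -> age > timeout_seconds
  | None -> false

let () =
  assert (parse_age_seconds " 25\n" = Some 25);
  assert (is_orphaned ~timeout_seconds:20 "25");
  assert (not (is_orphaned ~timeout_seconds:20 "5"));
  assert (not (is_orphaned ~timeout_seconds:20 "NULL"))
```

With the database doing the arithmetic, the OCaml code only ever compares two integers.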

You can find more context on the problem in T84570409.

Differential Revision: D26443087

fbshipit-source-id: 10e4af5f48efc23d44f084f6a64046362aa60f4e
hphp/hack/src/remote/jobRunner_sig.ml
hphp/hack/src/server/serverInit.ml
hphp/hack/src/server/serverInit.mli
hphp/hack/src/server/serverInitTypes.ml
hphp/hack/src/server/serverLocalConfig.ml
hphp/hack/src/server/serverMain.ml
hphp/hack/src/server/serverRemoteInit.ml
hphp/hack/src/server/serverRemoteInit.mli
hphp/hack/src/stubs/remoteWorker.ml
hphp/hack/src/typing/service/typing_service_types.ml