Fix tc-print's handling of bytecode regions
Summary: tc-print has always treated a translation as a single, contiguous
range of bytecode instructions. This was almost never true: we've been tracing
through forward Jmps since before tc-print was born, and inlining and PGO have
only made the problem worse. I hacked in rough inlining support a while ago;
this diff improves on it.
The TC dump now has a list of bytecode blocks contained in the translation
rather than just a starting and stopping offset. Blocks contain a unit md5 (may
differ due to inlining) and a range of bytecode offsets. This is used to print
the correct bytecodes at the beginning of the translation's disassembly. I also
changed the bytecode that we print inline with the assembly to only print
bytecodes that have code directly assigned to them. The previous logic of
attempting to fill in the gaps didn't work when we had more than one block in a
translation, and as a result it often printed some bytecodes that weren't
actually in the translation.
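
For illustration, a minimal sketch of the block list described above. All
names here (BytecodeBlock, TranslationRecord, containsBytecode) are
hypothetical and are not the actual tc-print types. The half-open offset range
is also an assumption, not something this diff specifies.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical type: one range of bytecode within a single unit.
struct BytecodeBlock {
  std::string unitMd5;  // md5 of the owning unit (may differ due to inlining)
  uint32_t start;       // first bytecode offset in the block (inclusive)
  uint32_t pastEnd;     // one past the last offset (exclusive; assumed)
};

// Hypothetical type: a translation holds a list of blocks instead of a
// single starting and stopping offset.
struct TranslationRecord {
  std::vector<BytecodeBlock> blocks;
};

// Returns true if the bytecode at (md5, offset) lies inside any block of the
// translation. A printer could use a check like this to avoid showing
// bytecodes that fall in the gaps between blocks.
bool containsBytecode(const TranslationRecord& t,
                      const std::string& md5,
                      uint32_t offset) {
  for (const auto& b : t.blocks) {
    if (b.unitMd5 == md5 && offset >= b.start && offset < b.pastEnd) {
      return true;
    }
  }
  return false;
}
```

With a single start/stop pair, every offset between the first and last block
would appear to belong to the translation. Checking each block separately
excludes those gaps.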
Two more small improvements: guards are now printed in tc-print, and time
spent in the guards should no longer be falsely attributed to the first
bytecode instruction of the translation.
Reviewed By: @ottoni
Differential Revision: D1362151