<!DOCTYPE html>
<html>
<head>
<title>Profiler</title>
<meta charset="utf-8">
<meta name="Copyright" content="Copyright (C) 2005-2023">
<meta name="Language" content="en">
<link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
<link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
</head>
<body>
<div id="site">
<a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
</div>
<div id="head">
<h1>Profiler</h1>
</div>
<div id="nav">
<ul><li>
<a href="luajit.html">LuaJIT</a>
<ul><li>
<a href="https://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
</li><li>
<a href="install.html">Installation</a>
</li><li>
<a href="running.html">Running</a>
</li></ul>
</li><li>
<a href="extensions.html">Extensions</a>
<ul><li>
<a href="ext_ffi.html">FFI Library</a>
<ul><li>
<a href="ext_ffi_tutorial.html">FFI Tutorial</a>
</li><li>
<a href="ext_ffi_api.html">ffi.* API</a>
</li><li>
<a href="ext_ffi_semantics.html">FFI Semantics</a>
</li></ul>
</li><li>
<a href="ext_buffer.html">String Buffers</a>
</li><li>
<a href="ext_jit.html">jit.* Library</a>
</li><li>
<a href="ext_c_api.html">Lua/C API</a>
</li><li>
<a class="current" href="ext_profiler.html">Profiler</a>
</li></ul>
</li><li>
<a href="https://luajit.org/status.html">Status <span class="ext">&raquo;</span></a>
</li><li>
<a href="https://luajit.org/faq.html">FAQ <span class="ext">&raquo;</span></a>
</li><li>
<a href="https://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
</li></ul>
</div>
<div id="main">
<p>
LuaJIT has an integrated statistical profiler with very low overhead. It
allows sampling the currently executing stack and other parameters at
regular intervals.
</p>
<p>
The integrated profiler can be accessed from three levels:
</p>
<ul>
<li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
<a href="#j_p"><tt>-jp</tt></a> command line option.</li>
<li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
<li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
</ul>
<h2 id="hl_profiler">High-Level Profiler</h2>
<p>
The bundled high-level profiler offers basic profiling functionality. It
generates simple textual summaries or source code annotations. It can be
accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
or from Lua code by loading the underlying <tt>jit.p</tt> module.
</p>
<p>
To cut to the chase &mdash; run this to get a CPU usage profile by
function name:
</p>
<pre class="code">
luajit -jp myapp.lua
</pre>
<p>
It's <em>not</em> a stated goal of the bundled profiler to add every
possible option or to cater for special profiling needs. The low-level
profiler APIs are documented below. They may be used by third-party
authors to implement advanced functionality, e.g. IDE integration or
graphical profilers.
</p>
<p>
Note: Sampling works for both interpreted and JIT-compiled code. The
results for JIT-compiled code may sometimes be surprising. LuaJIT
heavily optimizes and inlines Lua code &mdash; there's no simple
one-to-one correspondence between source code lines and the sampled
machine code.
</p>

<h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
<p>
The <tt>-jp</tt> command line option starts the high-level profiler.
When the application run by the command line terminates, the profiler
stops and writes the results to <tt>stdout</tt> or to the specified
<tt>output</tt> file.
</p>
<p>
The <tt>options</tt> argument specifies how the profiling is to be
performed:
</p>
<ul>
<li><tt>f</tt> &mdash; Stack dump: function name, otherwise module:line.
This is the default mode.</li>
<li><tt>F</tt> &mdash; Stack dump: ditto, but dump module:name.</li>
<li><tt>l</tt> &mdash; Stack dump: module:line.</li>
<li><tt>&lt;number&gt;</tt> &mdash; Stack dump depth (callee &larr;
caller). Default: 1.</li>
<li><tt>-&lt;number&gt;</tt> &mdash; Inverse stack dump depth (caller
&rarr; callee).</li>
<li><tt>s</tt> &mdash; Split stack dump after first stack level. Implies
depth&nbsp;&ge;&nbsp;2 or depth&nbsp;&le;&nbsp;-2.</li>
<li><tt>p</tt> &mdash; Show full path for module names.</li>
<li><tt>v</tt> &mdash; Show VM states.</li>
<li><tt>z</tt> &mdash; Show <a href="#jit_zone">zones</a>.</li>
<li><tt>r</tt> &mdash; Show raw sample counts. Default: show percentages.</li>
<li><tt>a</tt> &mdash; Annotate excerpts from source code files.</li>
<li><tt>A</tt> &mdash; Annotate complete source code files.</li>
<li><tt>G</tt> &mdash; Produce raw output suitable for graphical tools.</li>
<li><tt>m&lt;number&gt;</tt> &mdash; Minimum sample percentage to be shown.
Default: 3%.</li>
<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds.
Default: 10ms.<br>
Note: The actual sampling precision is OS-dependent.</li>
</ul>
<p>
The default output for <tt>-jp</tt> is a list of the most CPU consuming
spots in the application. Increasing the stack dump depth with (say)
<tt>-jp=2</tt> may help to point out the main callers or callees of
hotspots. But sample aggregation is still flat per unique stack dump.
</p>
<p>
To get a two-level view (split view) of callers/callees, use
<tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
level are relative to the first level.
</p>
<p>
To see how much time is spent in each line relative to a function, use
<tt>-jp=fl</tt>.
</p>
<p>
To see how much time is spent in different VM states or
<a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
</p>
<p>
Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
spent in a VM state or zone vs. hotspots. This can be used to answer
questions like "Which time-consuming functions are only interpreted?" or
"What's the garbage collector overhead for a specific function?".
</p>
<p>
Multiple options can be combined &mdash; but not all combinations make
sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels
deep in 4ms intervals and shows a split view of the CPU consuming
functions and their callers with a 1% threshold.
</p>
<p>
Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
always flat and at the line level. Obviously, the source code files need
to be readable by the profiler script.
</p>
<p>
The high-level profiler can also be started and stopped from Lua code with:
</p>
<pre class="code">
require("jit.p").start(options, output)
-- The code to be profiled runs here.
require("jit.p").stop()
</pre>
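<p>
As a minimal sketch of profiling only one region of a program, the
following starts the profiler with the <tt>fl</tt> options and writes
the report to a file. The <tt>busy()</tt> function and the
<tt>profile.txt</tt> file name are assumptions made up for this
illustration, not part of LuaJIT:
</p>
<pre class="code">
local p = require("jit.p")

-- Hypothetical workload; replace it with the code to be measured.
local function busy(n)
  local sum = 0
  for i = 1, n do sum = sum + i % 7 end
  return sum
end

p.start("fl", "profile.txt")  -- per-function view, split by line
local result = busy(1e8)
p.stop()                      -- stops sampling and writes the report
</pre>
<p>
Only code that runs between the two calls is sampled. Leaving out the
second argument should send the report to <tt>stdout</tt> instead of a
file.
</p>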

<h3 id="jit_zone"><tt>jit.zone</tt> &mdash; Zones</h3>
<p>
Zones can be used to provide information about different parts of an
application to the high-level profiler. E.g. a game could make use of an
<tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
organized as a stack.
</p>
<p>
The <tt>jit.zone</tt> module needs to be loaded explicitly:
</p>
<pre class="code">
local zone = require("jit.zone")
</pre>
<ul>
<li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
<li><tt>zone()</tt> pops the current zone from the zone stack and
returns its name.</li>
<li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
<li><tt>zone:flush()</tt> flushes the zone stack.</li>
</ul>
<p>
To show the time spent in each zone use <tt>-jp=z</tt>. To show the time
spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
</p>
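<p>
The following sketch shows one way zones might be placed around the
subsystems of a main loop. The <tt>update_ai()</tt> and
<tt>update_physics()</tt> functions are hypothetical stand-ins for real
application code:
</p>
<pre class="code">
local zone = require("jit.zone")

-- Hypothetical subsystems; stand-ins for real application code.
local function update_ai()
  local n = 0
  for i = 1, 1e5 do n = n + i % 3 end
  return n
end

local function update_physics()
  local x = 0
  for i = 1, 2e5 do x = x + math.sqrt(i) end
  return x
end

for frame = 1, 500 do
  zone("AI")         -- push "AI"; it is now the current zone
  update_ai()
  zone()             -- pop "AI"

  zone("PHYS")       -- push "PHYS"
  update_physics()
  zone()             -- pop "PHYS"
end
</pre>
<p>
Every push needs a matching pop. If an error can escape between the two
calls, the zone stack is left unbalanced; calling <tt>zone:flush()</tt>
in the error handler is one way to reset it. Run the script with
<tt>-jp=z</tt> to see the per-zone breakdown.
</p>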

<h2 id="ll_lua_api">Low-level Lua API</h2>
<p>
The <tt>jit.profile</tt> module gives access to the low-level API of the
profiler from Lua code. This module needs to be loaded explicitly:
</p>
<pre class="code">
local profile = require("jit.profile")
</pre>
<p>
This module can be used to implement your own higher-level profiler.
A typical profiling run starts the profiler, captures stack dumps in
the profiler callback, adds them to a hash table to aggregate the number
of samples, stops the profiler and then analyzes all captured
stack dumps. Other parameters can be sampled in the profiler callback,
too. But it's important not to spend too much time in the callback,
since this may skew the statistics.
</p>
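<p>
The typical run described above could look roughly like the following
sketch. It uses only the three functions documented in the sections
that follow. The <tt>counts</tt> table, the <tt>on_sample()</tt>
callback and the workload loop are assumptions chosen for illustration:
</p>
<pre class="code">
local profile = require("jit.profile")

local counts = {}  -- stack dump string -&gt; accumulated sample count

-- Keep the callback short: one stack dump and one table update.
local function on_sample(thread, samples, vmstate)
  local key = profile.dumpstack(thread, "f", 1)
  counts[key] = (counts[key] or 0) + samples
end

profile.start("f", on_sample)

-- Hypothetical workload; replace it with the code to be measured.
local acc = 0
for i = 1, 1e8 do acc = acc + i % 7 end

profile.stop()

-- Analyze after the run: sort by sample count, highest first.
local rows = {}
for key, n in pairs(counts) do rows[#rows + 1] = { key = key, n = n } end
table.sort(rows, function(a, b) return a.n &gt; b.n end)
for _, row in ipairs(rows) do print(row.n, row.key) end
</pre>
<p>
All sorting and printing happens after <tt>profile.stop()</tt>, so the
analysis itself does not show up in the samples.
</p>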

<h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
&mdash; Start profiler</h3>
<p>
This function starts the profiler. The <tt>mode</tt> argument is a
string holding options:
</p>
<ul>
<li><tt>f</tt> &mdash; Profile with precision down to the function level.</li>
<li><tt>l</tt> &mdash; Profile with precision down to the line level.</li>
<li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds (default
10ms).<br>
Note: The actual sampling precision is OS-dependent.
</li>
</ul>
<p>
The <tt>cb</tt> argument is a callback function which is called with
three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
called on a separate coroutine. The <tt>thread</tt> argument is the
state that holds the stack to sample for profiling. Note: do
<em>not</em> modify the stack of that state or call functions on it.
</p>
<p>
<tt>samples</tt> gives the number of accumulated samples since the last
callback (usually 1).
</p>
<p>
<tt>vmstate</tt> holds the VM state at the time the profiling timer
triggered. This may or may not correspond to the state of the VM when
the profiling callback is called. The state is one of <tt>'N'</tt>
(native, i.e. compiled, code), <tt>'I'</tt> (interpreted code),
<tt>'C'</tt> (C&nbsp;code), <tt>'G'</tt> (the garbage collector), or
<tt>'J'</tt> (the JIT compiler).
</p>
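<p>
As a small sketch of how the <tt>samples</tt> and <tt>vmstate</tt>
arguments might be used together, the following tallies samples per VM
state. The <tt>names</tt> and <tt>by_state</tt> tables and the workload
are assumptions made up for this example:
</p>
<pre class="code">
local profile = require("jit.profile")

-- Readable labels for the one-character VM state codes listed above.
local names = {
  N = "native", I = "interpreted", C = "C code",
  G = "garbage collector", J = "JIT compiler",
}
local by_state = {}  -- VM state code -&gt; accumulated sample count

profile.start("f", function(thread, samples, vmstate)
  by_state[vmstate] = (by_state[vmstate] or 0) + samples
end)

-- Hypothetical workload that also creates some garbage.
local t = {}
for i = 1, 2e6 do t[i % 1000 + 1] = tostring(i) end

profile.stop()

for state, n in pairs(by_state) do
  print(names[state] or state, n)
end
</pre>
<p>
Adding <tt>samples</tt> rather than 1 keeps the totals right when
several samples have accumulated since the previous callback.
</p>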

<h3 id="profile_stop"><tt>profile.stop()</tt>
&mdash; Stop profiler</h3>
<p>
This function stops the profiler.
</p>

<h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
&mdash; Dump stack</h3>
<p>
This function allows taking stack dumps in an efficient manner. It
returns a string with a stack dump for the <tt>thread</tt> (coroutine),
formatted according to the <tt>fmt</tt> argument:
</p>
<ul>
<li><tt>p</tt> &mdash; Preserve the full path for module names. Otherwise,
only the file name is used.</li>
<li><tt>f</tt> &mdash; Dump the function name if it can be derived. Otherwise,
use module:line.</li>
<li><tt>F</tt> &mdash; Ditto, but dump module:name.</li>
<li><tt>l</tt> &mdash; Dump module:line.</li>
<li><tt>Z</tt> &mdash; Zap the following characters for the last dumped
frame.</li>
<li>All other characters are added verbatim to the output string.</li>
</ul>
<p>
The <tt>depth</tt> argument gives the number of frames to dump, starting
at the topmost frame of the thread. A negative number dumps the frames in
inverse order.
</p>
<p>
The first example prints a list of the current module names and line
numbers of up to 10 frames in separate lines. The second example prints
semicolon-separated function names for all frames (up to 100) in inverse
order:
</p>
<pre class="code">
print(profile.dumpstack(thread, "l\n", 10))
print(profile.dumpstack(thread, "fZ;", -100))
</pre>

<h2 id="ll_c_api">Low-level C API</h2>
<p>
The profiler can be controlled directly from C&nbsp;code, e.g. for
use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
<a href="ext_c_api.html">Lua/C API</a> extensions).
</p>

<h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
&mdash; Start profiler</h3>
<p>
This function starts the profiler. <a href="#profile_start">See
above</a> for a description of the <tt>mode</tt> argument.
</p>
<p>
The <tt>cb</tt> argument is a callback function with the following
declaration:
</p>
<pre class="code">
typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
                                        int samples, int vmstate);
</pre>
<p>
<tt>data</tt> is available for use by the callback. <tt>L</tt> is the
state that holds the stack to sample for profiling. Note: do
<em>not</em> modify this stack or call functions on this stack &mdash;
use a separate coroutine for this purpose. <a href="#profile_start">See
above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
</p>

<h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
&mdash; Stop profiler</h3>
<p>
This function stops the profiler.
</p>

<h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
&mdash; Dump stack</h3>
<p>
This function allows taking stack dumps in an efficient manner.
<a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
and <tt>depth</tt>.
</p>
<p>
This function returns a <tt>const&nbsp;char&nbsp;*</tt> pointing to a
private string buffer of the profiler. The <tt>int&nbsp;*len</tt>
argument returns the length of the output string. The buffer is
overwritten on the next call and deallocated when the profiler stops.
You either need to consume the content immediately or copy it for later
use.
</p>
<br class="flush">
</div>
<div id="foot">
<hr class="hide">
Copyright &copy; 2005-2023
<span class="noprint">
&middot;
<a href="contact.html">Contact</a>
</span>
</div>
</body>
</html>