1 \input texinfo @c -*- texinfo -*-
3 @settitle Tiny C Compiler Reference Documentation
6 @center @titlefont{Tiny C Compiler Reference Documentation}
TinyCC (aka TCC) is a small but extremely fast C compiler. Unlike other C
compilers, it is meant to be self-sufficient: you do not need an
external assembler or linker because TCC does that for you.
TCC compiles so @emph{fast} that even for big projects @code{Makefile}s may
not be necessary.
TCC not only supports ANSI C, but also most of the new ISO C99
standard and many GNU C extensions.
TCC can also be used to make @emph{C scripts}, i.e. pieces of C source
that you run like a Perl or Python script. Compilation is so fast that
your script will be as fast as if it were an executable.
TCC can also automatically generate memory and bound checks
(@pxref{bounds}) while allowing all C pointer operations. TCC can do
these checks even if unpatched libraries are used.
With @code{libtcc}, you can use TCC as a backend for dynamic code
generation (@pxref{libtcc}).
34 @chapter Command line invocation
37 usage: tcc [-Idir] [-Dsym[=val]] [-Usym] [-llib] [-g] [-b]
38 [-i infile] infile [infile_args...]
Specify an additional include path. The default ones are:
44 @file{/usr/include}, @code{prefix}@file{/lib/tcc/include} (@code{prefix}
45 is usually @file{/usr} or @file{/usr/local}).
48 Define preprocessor symbol 'sym' to
49 val. If val is not present, its value is '1'. Function-like macros can
50 also be defined: @code{'-DF(a)=a+1'}
53 Undefine preprocessor symbol 'sym'.
56 Dynamically link your program with library
57 libxxx.so. Standard library paths are checked, including those
58 specified with LD_LIBRARY_PATH.
Generate run-time debug information so that you get clear run-time
error messages: @code{ test.c:68: in function 'test5()': dereferencing
invalid pointer} instead of the laconic @code{Segmentation fault}.
Generate additional support code to check
memory allocations and array/pointer bounds. '-g' is implied. Note
that the generated code is slower and bigger in this case.
Compile C source 'file' before the main C source. With this
option, multiple C files can be compiled and linked together.
77 Note: the @code{-o file} option to generate an ELF executable is
78 currently unsupported.
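As a small illustration of the function-like form of the define option described above: passing @code{'-DF(a)=a+1'} should behave like a @code{#define F(a) a+1} placed before the first line of the source. The sketch below writes the definitions out by hand so that it compiles without any command-line option. The macro @code{G} and both helper functions are made up for this example. It shows why such macro bodies are usually parenthesized.

```c
/* Equivalent of passing '-DF(a)=a+1' on the command line. */
#define F(a) a+1

/* Safer variant, as if '-DG(a)=((a)+1)' had been passed (hypothetical). */
#define G(a) ((a)+1)

/* F(2) * 3 expands to 2+1 * 3, so the multiplication binds first. */
int f_times_three(void) { return F(2) * 3; }

/* G(2) * 3 expands to ((2)+1) * 3. */
int g_times_three(void) { return G(2) * 3; }
```

When the macro is given on a shell command line, the quotes around the whole option keep the shell from interpreting the parentheses.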
80 @chapter C language support
TCC implements all of the ANSI C standard, including structure bit fields
and floating point numbers (@code{long double}, @code{double}, and
@code{float} are fully supported). The following limitations are known:
@item The preprocessor tokens are the same as the C tokens. This means that in some
rare cases, preprocessed numbers are not handled exactly as in ANSI
C. This approach has the advantage of being simpler and faster.
@section ISO C99 extensions

TCC implements many features of the new C standard: ISO C99. Currently
missing items are: complex and imaginary numbers and variable length
arrays.

Currently implemented ISO C99 features:
104 @item 64 bit @code{'long long'} types are fully supported.
106 @item The boolean type @code{'_Bool'} is supported.
@item @code{'__func__'} is a string variable containing the current
function name.
@item Variadic macros: @code{__VA_ARGS__} can be used for
function-like macros:
#define dprintf(level, ...) printf(__VA_ARGS__)
@code{dprintf} can then be used with a variable number of parameters.
118 @item Declarations can appear anywhere in a block (as in C++).
@item Array and struct/union elements can be initialized in any order by
using designators:
123 struct { int x, y; } st[10] = { [0].x = 1, [0].y = 2 };
125 int tab[10] = { 1, 2, [5] = 5, [9] = 9};
128 @item Compound initializers are supported:
130 int *p = (int []){ 1, 2, 3 };
132 to initialize a pointer pointing to an initialized array. The same
133 works for structures and strings.
135 @item Hexadecimal floating point constants are supported:
137 double d = 0x1234p10;
139 is the same as writing
141 double d = 4771840.0;
144 @item @code{'inline'} keyword is ignored.
146 @item @code{'restrict'} keyword is ignored.
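The ISO C99 features listed above can be exercised together in a short translation unit. The following is only a sketch: the function names and the @code{dbg_printf} macro name are invented for the example (the name differs from @code{dprintf} to avoid clashing with the POSIX function of that name declared in @file{stdio.h}).

```c
#include <stdio.h>

/* C99 variadic macro; 'level' is accepted but ignored, as in the text. */
#define dbg_printf(level, ...) printf(__VA_ARGS__)

/* __func__ names the enclosing function. */
const char *current_function(void) { return __func__; }

/* Designated initializers: elements that are not named are zero. */
int sum_tab(void)
{
    int tab[10] = { 1, 2, [5] = 5, [9] = 9 };
    int i;
    i = 0;
    int sum = 0;                 /* declaration after a statement */
    for (; i < 10; i++)
        sum += tab[i];
    return sum;
}

/* Compound initializer used to initialize a pointer. */
int second_of_literal(void)
{
    int *p = (int []){ 1, 2, 3 };
    return p[1];
}

/* 64 bit 'long long' arithmetic. */
long long big_value(void) { return 1LL << 40; }

/* Any nonzero value converts to 1 when stored in a _Bool. */
_Bool is_nonzero(int v) { return v; }

/* Hexadecimal floating point constant: 0x1234 * 2^10. */
double hex_float(void) { return 0x1234p10; }

/* Uses the variadic macro; returns the number of characters printed. */
int report_sum(void) { return dbg_printf(1, "sum = %d\n", sum_tab()); }
```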
149 @section GNU C extensions
151 TCC implements some GNU C extensions:
155 @item array designators can be used without '=':
157 int a[10] = { [0] 1, [5] 2, 3, 4 };
160 @item Structure field designators can be a label:
162 struct { int x, y; } st = { x: 1, y: 1};
166 struct { int x, y; } st = { .x = 1, .y = 1};
169 @item @code{'\e'} is ASCII character 27.
@item Case ranges: ranges can be used in @code{case} labels:
175 printf("range 1 to 9\n");
178 printf("unexpected\n");
183 @item The keyword @code{__attribute__} is handled to specify variable or
184 function attributes. The following attributes are supported:
186 @item @code{aligned(n)}: align data to n bytes (must be a power of two).
188 @item @code{section(name)}: generate function or data in assembly
189 section name (name is a string containing the section name) instead
190 of the default section.
192 @item @code{unused}: specify that the variable or the function is unused.
194 @item @code{cdecl}: use standard C calling convention.
196 @item @code{stdcall}: use Pascal-like calling convention.
200 Here are some examples:
202 int a __attribute__ ((aligned(8), section(".mysection")));
205 align variable @code{'a'} to 8 bytes and put it in section @code{.mysection}.
208 int my_add(int a, int b) __attribute__ ((section(".mycodesection")))
214 generate function @code{'my_add'} in section @code{.mycodesection}.
216 @item GNU style variadic macros:
218 #define dprintf(fmt, args...) printf(fmt, ## args)
221 dprintf("one arg %d\n", 1);
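Several of the GNU C extensions described in this section can be combined in one small file. This is a sketch under the assumption that the compiler accepts these GNU extensions (TCC, GCC and Clang do). All names, including @code{gnu_dprintf}, are invented for the example.

```c
#include <stdio.h>

/* GNU style variadic macro: 'args' names the variable part, and '##'
   removes the preceding comma when no extra argument is given. */
#define gnu_dprintf(fmt, args...) printf(fmt, ## args)

/* Case range: returns 1 for values 1 to 9, 0 otherwise. */
int in_range_1_to_9(int v)
{
    switch (v) {
    case 1 ... 9:
        return 1;
    default:
        return 0;
    }
}

/* aligned(n) attribute applied to a variable. */
int aligned_var __attribute__ ((aligned(8)));

/* '\e' is the ASCII escape character. */
int escape_code(void) { return '\e'; }

/* Calls the macro with zero and with one extra argument; returns the
   number of characters printed by the second call. */
int gnu_demo(void)
{
    gnu_dprintf("no arg\n");
    return gnu_dprintf("one arg %d\n", 1);
}
```

Note the spaces around @code{...} in the case range: without them, @code{1...9} would be read as a malformed number.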
226 @section TinyCC extensions
@item @code{__TINYC__} is predefined to @code{'1'} to
indicate that you are using TCC.
233 @item @code{'#!'} at the start of a line is ignored to allow scripting.
@item Binary digits can be entered (@code{'0b101'} instead of
@code{'5'}).
238 @item @code{__BOUNDS_CHECKING_ON} is defined if bound checking is activated.
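A minimal sketch of how a program might use the macros and the binary constants described above. The function names are hypothetical. The code is written so that it also builds with compilers other than TCC that accept binary constants (GCC and Clang accept them as an extension).

```c
/* Report which compiler built this translation unit. */
const char *compiler_name(void)
{
#ifdef __TINYC__
    return "tcc";
#else
    return "other";
#endif
}

/* Binary constant: 0b101 is 5. */
int binary_five(void) { return 0b101; }

/* 1 when compiled with bound checking activated, 0 otherwise. */
int bounds_checking_enabled(void)
{
#ifdef __BOUNDS_CHECKING_ON
    return 1;
#else
    return 0;
#endif
}
```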
243 @chapter TinyCC Memory and Bound checks
This feature is activated with the @code{'-b'} option (@pxref{invoke}).
247 Note that pointer size is @emph{unchanged} and that code generated
248 with bound checks is @emph{fully compatible} with unchecked
249 code. When a pointer comes from unchecked code, it is assumed to be
250 valid. Even very obscure C code with casts should work correctly.
For more information about the ideas behind this method, see
@url{http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html}.
Here are some examples of caught errors:
259 @item Invalid range with standard string function:
267 @item Bound error in global or local arrays:
277 @item Bound error in allocated data:
281 tab = malloc(20 * sizeof(int));
289 @item Access to a freed region:
293 tab = malloc(20 * sizeof(int));
301 @item Freeing an already freed region:
305 tab = malloc(20 * sizeof(int));
314 @chapter The @code{libtcc} library
316 The @code{libtcc} library enables you to use TCC as a backend for
317 dynamic code generation.
Read @file{libtcc.h} for an overview of the API. Read
@file{libtcc_test.c} for a very simple example.

The idea is to pass a C string containing the program you want
to compile directly to @code{libtcc}. Then the @code{main()} function of
the compiled string can be launched.
@chapter Developer's guide
328 This chapter gives some hints to understand how TCC works. You can skip
329 it if you do not intend to modify the TCC code.
331 @section File reading
333 The @code{BufferedFile} structure contains the context needed to read a
334 file, including the current line number. @code{tcc_open()} opens a new
file and @code{tcc_close()} closes it. @code{inp()} returns the next
character.
@code{next()} reads the next token in the current
file. @code{next_nomacro()} reads the next token without macro
expansion.
@code{tok} contains the current token (see the @code{TOK_xxx}
constants). Identifiers and keywords are also tokens. @code{tokc}
contains additional information about the token (for example a constant value
if it is a number or string token).
The parser is hardcoded (yacc is not necessary). It does only one pass,
except:
356 @item For initialized arrays with unknown size, a first pass
357 is done to count the number of elements.
359 @item For architectures where arguments are evaluated in
360 reverse order, a first pass is done to reverse the argument order.
The types are stored in a single 'int' variable. This representation was chosen in the
first stages of development when tcc was much simpler. Now, it may not
be the best solution.
371 #define VT_INT 0 /* integer type */
372 #define VT_BYTE 1 /* signed byte type */
373 #define VT_SHORT 2 /* short type */
374 #define VT_VOID 3 /* void type */
375 #define VT_PTR 4 /* pointer */
376 #define VT_ENUM 5 /* enum definition */
377 #define VT_FUNC 6 /* function type */
378 #define VT_STRUCT 7 /* struct/union definition */
379 #define VT_FLOAT 8 /* IEEE float */
380 #define VT_DOUBLE 9 /* IEEE double */
381 #define VT_LDOUBLE 10 /* IEEE long double */
382 #define VT_BOOL 11 /* ISOC99 boolean type */
383 #define VT_LLONG 12 /* 64 bit integer */
#define VT_LONG 13 /* long integer (NEVER USED as type, only
                      during parsing) */
386 #define VT_BTYPE 0x000f /* mask for basic type */
387 #define VT_UNSIGNED 0x0010 /* unsigned type */
388 #define VT_ARRAY 0x0020 /* array type (also has VT_PTR) */
389 #define VT_BITFIELD 0x0040 /* bitfield modifier */
391 #define VT_STRUCT_SHIFT 16 /* structure/enum name shift (16 bits left) */
394 When a reference to another type is needed (for pointers, functions and
395 structures), the @code{32 - VT_STRUCT_SHIFT} high order bits are used to
396 store an identifier reference.
The @code{VT_UNSIGNED} flag can be set for chars, shorts, ints and long
longs.
Arrays are considered as pointers @code{VT_PTR} with the flag
@code{VT_ARRAY} set.
404 The @code{VT_BITFIELD} flag can be set for chars, shorts, ints and long
405 longs. If it is set, then the bitfield position is stored from bits
406 VT_STRUCT_SHIFT to VT_STRUCT_SHIFT + 5 and the bit field size is stored
407 from bits VT_STRUCT_SHIFT + 6 to VT_STRUCT_SHIFT + 11.
409 @code{VT_LONG} is never used except during parsing.
During parsing, the storage of an object is also stored in the type
integer:
415 #define VT_EXTERN 0x00000080 /* extern definition */
416 #define VT_STATIC 0x00000100 /* static variable */
417 #define VT_TYPEDEF 0x00000200 /* typedef definition */
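To make the bit layout described above concrete, here is a small sketch that packs and unpacks a bitfield type. The constants are copied from the listings above. The helper functions are hypothetical and are not claimed to exist in the TCC sources.

```c
/* Constants copied from the listings above. */
#define VT_INT            0
#define VT_BTYPE     0x000f
#define VT_UNSIGNED  0x0010
#define VT_BITFIELD  0x0040
#define VT_STATIC    0x00000100
#define VT_STRUCT_SHIFT  16

/* Hypothetical helper: build the type of an unsigned int bitfield.
   The position uses bits VT_STRUCT_SHIFT .. VT_STRUCT_SHIFT + 5 and
   the size uses bits VT_STRUCT_SHIFT + 6 .. VT_STRUCT_SHIFT + 11. */
int make_bitfield_type(int bit_pos, int bit_size)
{
    return VT_INT | VT_UNSIGNED | VT_BITFIELD
         | (bit_pos  << VT_STRUCT_SHIFT)
         | (bit_size << (VT_STRUCT_SHIFT + 6));
}

/* Hypothetical accessors for the packed fields. */
int basic_type(int t)    { return t & VT_BTYPE; }
int is_unsigned(int t)   { return (t & VT_UNSIGNED) != 0; }
int is_bitfield(int t)   { return (t & VT_BITFIELD) != 0; }
int bitfield_pos(int t)  { return (t >> VT_STRUCT_SHIFT) & 0x3f; }
int bitfield_size(int t) { return (t >> (VT_STRUCT_SHIFT + 6)) & 0x3f; }
```

Because the storage flags such as @code{VT_STATIC} occupy bits outside @code{VT_BTYPE}, OR-ing one of them into a type leaves the basic type unchanged.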
422 All symbols are stored in hashed symbol stacks. Each symbol stack
423 contains @code{Sym} structures.
@code{Sym.v} contains the symbol name (remember that
an identifier is also a token, so a string is never necessary to store
it). @code{Sym.t} gives the type of the symbol. @code{Sym.r} is usually
the register in which the corresponding variable is stored. @code{Sym.c} is
usually a constant associated with the symbol.
Five main symbol stacks are defined:
436 for the macros (@code{#define}s).
439 for the global variables, functions and types.
442 for the external symbols shared between files.
445 for the local variables, functions and types.
448 for the local labels (for @code{goto}).
@code{sym_push()} is used to add a new symbol to the local symbol
stack. If no local symbol stack is active, it is added to the global
symbol stack.
@code{sym_pop(st,b)} pops symbols from the symbol stack @var{st} until
the symbol @var{b} is on the top of stack. If @var{b} is NULL, the stack
is emptied.
@code{sym_find(v)} returns the symbol associated with the identifier
@var{v}. The local stack is searched first from top to bottom, then the
global stack.
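The push, pop and find operations described above can be sketched with a plain linked list. This is a deliberately simplified model, not the TCC implementation: it uses a single stack, omits the hashing, and every name starting with @code{toy_} or @code{Toy} is made up for the example.

```c
#include <stdlib.h>

/* Toy symbol: 'v' is the identifier token, 't' its type, 'c' a constant. */
typedef struct ToySym {
    int v, t, c;
    struct ToySym *prev;   /* next symbol down the stack */
} ToySym;

/* Push a symbol on top of '*st'; returns NULL on allocation failure. */
ToySym *toy_sym_push(ToySym **st, int v, int t, int c)
{
    ToySym *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->v = v; s->t = t; s->c = c;
    s->prev = *st;
    *st = s;
    return s;
}

/* Pop symbols until 'b' is on top; b == NULL empties the stack. */
void toy_sym_pop(ToySym **st, ToySym *b)
{
    while (*st != b && *st != NULL) {
        ToySym *s = *st;
        *st = s->prev;
        free(s);
    }
}

/* Search from top to bottom so inner declarations shadow outer ones. */
ToySym *toy_sym_find(ToySym *st, int v)
{
    for (; st; st = st->prev)
        if (st->v == v)
            return st;
    return NULL;
}

/* Usage: an inner scope shadows an outer symbol, then is popped. */
int toy_scope_demo(void)
{
    ToySym *stack = NULL;
    ToySym *outer = toy_sym_push(&stack, 300, 0, 1);
    if (!outer)
        return -1;
    if (!toy_sym_push(&stack, 300, 0, 2)) {
        toy_sym_pop(&stack, NULL);
        return -1;
    }
    int seen_inner = toy_sym_find(stack, 300)->c;   /* 2 */
    toy_sym_pop(&stack, outer);                     /* leave inner scope */
    int seen_outer = toy_sym_find(stack, 300)->c;   /* 1 */
    toy_sym_pop(&stack, NULL);
    return seen_inner * 10 + seen_outer;            /* 21 */
}
```

Popping back to a saved symbol is what makes leaving a block cheap: the caller only has to remember the top of the stack at block entry.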
The generated code and data are written in sections. The structure
@code{Section} contains all the necessary information for a given
section. @code{new_section()} creates a new section. ELF file semantics
are assumed for each section.
471 The following sections are predefined:
476 is the section containing the generated code. @var{ind} contains the
477 current position in the code section.
480 contains initialized data
483 contains uninitialized data
486 @itemx lbounds_section
487 are used when bound checking is activated
490 @itemx stabstr_section
are used when debugging is activated to store debug information
494 @itemx strtab_section
495 contain the exported symbols (currently only used for debugging).
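As a rough picture of what writing into a section means, the sketch below models a section as a growable byte buffer with a current write position. It is an assumption-laden toy: the fields of the real @code{Section} structure are not described in this chapter, and in TCC @var{ind} is described only for the code section. All @code{toy_} names are invented.

```c
#include <stdlib.h>
#include <string.h>

/* Toy section: a named, growable byte buffer. */
typedef struct ToySection {
    char name[32];
    unsigned char *data;
    size_t size;   /* allocated bytes */
    size_t ind;    /* current write position */
} ToySection;

/* Create an empty section; returns NULL on allocation failure. */
ToySection *toy_new_section(const char *name)
{
    ToySection *s = calloc(1, sizeof *s);
    if (!s)
        return NULL;
    strncpy(s->name, name, sizeof s->name - 1);
    return s;
}

/* Append one byte, growing the buffer; returns 0 on success, -1 on failure. */
int toy_emit_byte(ToySection *s, unsigned char b)
{
    if (s->ind == s->size) {
        size_t n = s->size ? s->size * 2 : 16;
        unsigned char *p = realloc(s->data, n);
        if (!p)
            return -1;
        s->data = p;
        s->size = n;
    }
    s->data[s->ind++] = b;
    return 0;
}

void toy_free_section(ToySection *s)
{
    if (s) {
        free(s->data);
        free(s);
    }
}

/* Usage: emit an x86 'ret' (0xc3) and report the new write position. */
size_t toy_section_demo(void)
{
    ToySection *text = toy_new_section(".text");
    size_t ind;
    if (!text)
        return 0;
    if (toy_emit_byte(text, 0xc3) != 0) {
        toy_free_section(text);
        return 0;
    }
    ind = text->ind;
    toy_free_section(text);
    return ind;
}
```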
499 @section Code generation
501 @subsection Introduction
The TCC code generator directly generates linked binary code in one
pass. This is rather unusual these days (see gcc for example, which
generates text assembly), but it allows TCC to be very fast and surprisingly
simple.
508 The TCC code generator is register based. Optimization is only done at
509 the expression level. No intermediate representation of expression is
510 kept except the current values stored in the @emph{value stack}.
512 On x86, three temporary registers are used. When more registers are
513 needed, one register is flushed in a new local variable.
515 @subsection The value stack
517 When an expression is parsed, its value is pushed on the value stack
518 (@var{vstack}). The top of the value stack is @var{vtop}. Each value
519 stack entry is the structure @code{SValue}.
@code{SValue.t} is the type. @code{SValue.r} indicates how the value is
currently stored in the generated code. It is usually a CPU register
index (@code{REG_xxx} constants), but additional values and flags are
defined:
527 #define VT_CONST 0x00f0 /* constant in vc
528 (must be first non register value) */
529 #define VT_LLOCAL 0x00f1 /* lvalue, offset on stack */
530 #define VT_LOCAL 0x00f2 /* offset on stack */
531 #define VT_CMP 0x00f3 /* the value is stored in processor flags (in vc) */
532 #define VT_JMP 0x00f4 /* value is the consequence of jmp true (even) */
533 #define VT_JMPI 0x00f5 /* value is the consequence of jmp false (odd) */
534 #define VT_LVAL 0x0100 /* var is an lvalue */
535 #define VT_FORWARD 0x0200 /* value is forward reference */
536 #define VT_MUSTCAST 0x0400 /* value must be casted to be correct (used for
537 char/short stored in integer registers) */
538 #define VT_MUSTBOUND 0x0800 /* bound checking must be done before
539 dereferencing value */
540 #define VT_BOUNDED 0x8000 /* value is bounded. The address of the
541 bounding function call point is in vc */
542 #define VT_LVAL_BYTE 0x1000 /* lvalue is a byte */
543 #define VT_LVAL_SHORT 0x2000 /* lvalue is a short */
544 #define VT_LVAL_UNSIGNED 0x4000 /* lvalue is unsigned */
545 #define VT_LVAL_TYPE (VT_LVAL_BYTE | VT_LVAL_SHORT | VT_LVAL_UNSIGNED)
551 indicates that the value is a constant. It is stored in the union
552 @code{SValue.c}, depending on its type.
indicates a local variable pointer at offset @code{SValue.c.i} in the
stack.
indicates that the value is actually stored in the CPU flags (i.e. the
value is the consequence of a test). The value is either 0 or 1. The
CPU flag actually used is indicated in @code{SValue.c.i}.
565 indicates that the value is the consequence of a jmp. For VT_JMP, it is
566 1 if the jump is taken, 0 otherwise. For VT_JMPI it is inverted.
These values are used to compile the @code{||} and @code{&&} logical
operators.
is a flag indicating that the value is actually an lvalue (left value of
an assignment). It means that the value stored is actually a pointer to
the wanted value.

Understanding the use of @code{VT_LVAL} is very important if you want to
understand how TCC works.
581 @itemx VT_LVAL_UNSIGNED
if the lvalue has an integer type, then these flags give its real
type. The type alone is not sufficient in case of cast optimisations.
is a saved lvalue on the stack. @code{VT_LLOCAL} should be eliminated
as soon as possible because its semantics are rather complicated.
590 indicates that a cast to the value type must be performed if the value
591 is used (lazy casting).
594 indicates that the value is a forward reference to a variable or a function.
598 are only used for optional bound checking.
602 @subsection Manipulating the value stack
@code{vsetc()} and @code{vset()} push a new value on the value
stack. If the previous @code{vtop} was stored in a very unsafe place (for
example in the CPU flags), then some code is generated to put the
previous @code{vtop} in safe storage.
@code{vpop()} pops @code{vtop}. In some cases, it also generates cleanup
code (for example if stacked floating point registers are used as on
x86).
613 The @code{gv(rc)} function generates code to evaluate @code{vtop} (the
614 top value of the stack) into registers. @var{rc} selects in which
615 register class the value should be put. @code{gv()} is the @emph{most
616 important function} of the code generator.
@code{gv2()} is the same as @code{gv()} but for the top two stack
entries.
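The interplay between the value stack and lvalues can be illustrated with a toy evaluator. Where a real code generator would emit a load instruction, this sketch performs the load itself, so it models only the bookkeeping. The two flag values are copied from the listing above. Everything else (the @code{ToyValue} layout and all @code{toy_} functions) is an assumption made for the example and does not mirror the real @code{SValue} or @code{gv()}.

```c
#define TOY_CONST 0x00f0   /* same value as VT_CONST above */
#define TOY_LVAL  0x0100   /* same value as VT_LVAL above */
#define TOY_VSTACK_SIZE 16

typedef struct ToyValue {
    int r;     /* storage kind and flags */
    int i;     /* the value when TOY_LVAL is clear */
    int *p;    /* address of the value when TOY_LVAL is set */
} ToyValue;

static ToyValue toy_vstack[TOY_VSTACK_SIZE];
static int toy_depth;      /* number of entries; the top is toy_depth - 1 */

/* Push a constant; returns 0 on success, -1 if the stack is full. */
int toy_vset_const(int i)
{
    if (toy_depth == TOY_VSTACK_SIZE)
        return -1;
    toy_vstack[toy_depth].r = TOY_CONST;
    toy_vstack[toy_depth].i = i;
    toy_vstack[toy_depth].p = 0;
    toy_depth++;
    return 0;
}

/* Push an lvalue: the entry holds a pointer to the wanted value. */
int toy_vset_lval(int *p)
{
    if (toy_depth == TOY_VSTACK_SIZE)
        return -1;
    toy_vstack[toy_depth].r = TOY_CONST | TOY_LVAL;
    toy_vstack[toy_depth].i = 0;
    toy_vstack[toy_depth].p = p;
    toy_depth++;
    return 0;
}

void toy_vpop(void)
{
    if (toy_depth > 0)
        toy_depth--;
}

/* Stand-in for gv(): turn the top entry into a plain value,
   dereferencing it first if it is an lvalue. */
int toy_gv(void)
{
    ToyValue *top;
    if (toy_depth == 0)
        return 0;
    top = &toy_vstack[toy_depth - 1];
    if (top->r & TOY_LVAL) {
        top->i = *top->p;
        top->r &= ~TOY_LVAL;
    }
    return top->i;
}

/* Usage: evaluate "x + 3" where x is a variable holding 39. */
int toy_demo(void)
{
    int x = 39, a, b;
    toy_vset_lval(&x);
    toy_vset_const(3);
    b = toy_gv(); toy_vpop();
    a = toy_gv(); toy_vpop();
    return a + b;
}
```

The point of the flag is that the same stack entry can later be used either as the target of a store (keeping the pointer) or as an operand (after the load), without deciding which when it is pushed.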
621 @subsection CPU dependent code generation
See the @file{i386-gen.c} file for an example.
628 must generate the code needed to load a stack value into a register.
631 must generate the code needed to store a register into a stack value
637 should generate a function call
640 @itemx gfunc_epilog()
641 should generate a function prolog/epilog.
must generate the binary integer operation @var{op} on the two top
entries of the stack which are guaranteed to contain integer types.
647 The result value should be put on the stack.
same as @code{gen_opi()} for floating point operations. The two top
entries of the stack are guaranteed to contain floating point values of
the same type.
655 integer to floating point conversion.
658 floating point to integer conversion.
661 floating point to floating point of different size conversion.
663 @item gen_bounded_ptr_add()
@itemx gen_bounded_ptr_deref()
665 are only used for bound checking.
669 @section Optimizations done
671 Constant propagation is done for all operations. Multiplications and
672 divisions are optimized to shifts when appropriate. Comparison
673 operators are optimized by maintaining a special cache for the
674 processor flags. &&, || and ! are optimized by maintaining a special
'jump target' value. No other jump optimization is currently performed
because it would require storing the code in a more abstract fashion.
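The rewriting of multiplications into shifts mentioned above can be sketched as follows. This is an illustration of the general technique on unsigned operands, not a transcription of the TCC code. Both function names are hypothetical.

```c
/* Returns n such that v == 1 << n, or -1 if v is not a power of two. */
int toy_log2_exact(unsigned v)
{
    int n = 0;
    if (v == 0 || (v & (v - 1)) != 0)
        return -1;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Multiply x by the constant k the way a code generator might decide:
   use a shift when k is a power of two, a real multiply otherwise. */
unsigned toy_mul_const(unsigned x, unsigned k)
{
    int n = toy_log2_exact(k);
    if (n >= 0)
        return x << n;
    return x * k;
}
```

Unsigned operands are used so that the shift is always well defined. Signed division by a power of two needs extra care because an arithmetic shift rounds toward negative infinity rather than toward zero.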