\input texinfo @c -*- texinfo -*-
@settitle Tiny C Compiler Reference Documentation
@center @titlefont{Tiny C Compiler Reference Documentation}
TinyCC (aka TCC) is a small but hyper fast C compiler. Unlike other C
compilers, it is meant to be self-sufficient: you do not need an
external assembler or linker because TCC does that for you.

TCC compiles so @emph{fast} that even for big projects @code{Makefile}s may
not be necessary.

TCC not only supports ANSI C, but also most of the new ISO C99
standard and many GNU C extensions including inline assembly.

TCC can also be used to make @emph{C scripts}, i.e. pieces of C source
that you run as a Perl or Python script. Compilation is so fast that
your script will be as fast as if it were an executable.

TCC can also automatically generate memory and bound checks
(@pxref{bounds}) while allowing all C pointer operations. TCC can do
these checks even if unpatched libraries are used.

With @code{libtcc}, you can use TCC as a backend for dynamic code
generation (@pxref{libtcc}).
@chapter Command line invocation

@example
usage: tcc [-c] [-o outfile] [-Bdir] [-bench] [-Idir] [-Dsym[=val]] [-Usym]
           [-g] [-b] [-bt N] [-Ldir] [-llib] [-shared] [-static]
           [--] infile1 [infile2... --] [infile_args...]
@end example

TCC options are very much like gcc options. The main difference is that TCC
can also directly execute the resulting program and give it runtime
arguments.

Here are some examples to understand the logic:
@table @code
@item tcc a.c
Compile a.c and execute it directly.

@item tcc a.c arg1
Compile a.c and execute it directly. arg1 is given as first argument to
the @code{main()} of a.c.

@item tcc -- a.c b.c -- arg1
Compile a.c and b.c, link them together and execute them. arg1 is given
as first argument to the @code{main()} of the resulting program. Because
multiple C files are specified, @code{--} is necessary to clearly separate the
program arguments from the TCC options.

@item tcc -o myprog a.c b.c
Compile a.c and b.c, link them and generate the executable myprog.

@item tcc -o myprog a.o b.o
Link a.o and b.o together and generate the executable myprog.

@item tcc -c a.c
Compile a.c and generate object file a.o.

@item tcc -c asmfile.S
Preprocess asmfile.S with the C preprocessor, assemble it and generate
object file asmfile.o.

@item tcc -c asmfile.s
Assemble (but do not preprocess) asmfile.s and generate object file
asmfile.o.

@item tcc -r -o ab.o a.c b.c
Compile a.c and b.c, link them together and generate the object file ab.o.
@end table
TCC can be invoked from @emph{scripts}, just like shell scripts. You just
need to add @code{#!/usr/local/bin/tcc} at the start of your C source:
97 printf("Hello World\n");
@section Option summary

General options:

@table @samp
@item -c
Generate an object file (@samp{-o} option must also be given).

@item -o outfile
Put object file, executable, or dll into output file @file{outfile}.

@item -Bdir
Set the path where the tcc internal libraries can be found (default is
@file{PREFIX/lib/tcc}).

@item -bench
Output compilation statistics.
@end table
Preprocessor options:

@table @samp
@item -Idir
Specify an additional include path. Include paths are searched in the
order they are specified.

System include paths are always searched after. The default system
include paths are: @file{/usr/local/include}, @file{/usr/include}
and @file{PREFIX/lib/tcc/include}. (@code{PREFIX} is usually
@file{/usr} or @file{/usr/local}).

@item -Dsym[=val]
Define preprocessor symbol 'sym' to
val. If val is not present, its value is '1'. Function-like macros can
also be defined: @code{'-DF(a)=a+1'}

@item -Usym
Undefine preprocessor symbol 'sym'.
@end table
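
To make the effect of @samp{-D} concrete, the sketch below writes out the
@code{#define} lines that two such options would correspond to. The names
@code{DEBUG}, @code{debug_enabled} and @code{expand_example} are purely
illustrative. Since a macro body is substituted textually, the second function
also shows why a body such as @code{a+1} is usually better written with
parentheses.

```c
/* Equivalent of passing -DDEBUG on the command line: the value defaults to 1. */
#define DEBUG 1

/* Equivalent of passing '-DF(a)=a+1': a function-like macro. */
#define F(a) a+1

/* Returns 1 because DEBUG was defined with the default value. */
static int debug_enabled(void)
{
    return DEBUG;
}

/* F(2) * 3 expands textually to 2+1 * 3, which evaluates to 5, not 9. */
static int expand_example(void)
{
    return F(2) * 3;
}
```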
Linker options:

@table @samp
@item -Ldir
Specify an additional static library path for the @samp{-l} option. The
default library paths are @file{/usr/local/lib}, @file{/usr/lib} and @file{/lib}.

@item -lxxx
Link your program with dynamic library libxxx.so or static library
libxxx.a. The library is searched in the paths specified by the
@samp{-L} option.

@item -shared
Generate a shared library instead of an executable (@samp{-o} option
must also be given).

@item -static
Generate a statically linked executable (default is a dynamically linked
executable) (@samp{-o} option must also be given).

@item -r
Generate an object file combining all input files (@samp{-o} option must
also be given).
@end table
Debugger options:

@table @samp
@item -g
Generate run time debug information so that you get clear run time
error messages: @code{ test.c:68: in function 'test5()': dereferencing
invalid pointer} instead of the laconic @code{Segmentation
fault}.

@item -b
Generate additional support code to check
memory allocations and array/pointer bounds. @samp{-g} is implied. Note
that the generated code is slower and bigger in this case.

@item -bt N
Display N callers in stack traces. This is useful with @samp{-g} or
@samp{-b}.
@end table

Note: GCC options @samp{-Ox}, @samp{-Wx}, @samp{-fx} and @samp{-mx} are
ignored.
191 @chapter C language support
195 TCC implements all the ANSI C standard, including structure bit fields
196 and floating point numbers (@code{long double}, @code{double}, and
197 @code{float} fully supported).
@section ISOC99 extensions

TCC implements many features of the new C standard: ISO C99. Currently
missing items are: complex and imaginary numbers and variable length
arrays.

Currently implemented ISOC99 features:

@item 64 bit @code{'long long'} types are fully supported.

@item The boolean type @code{'_Bool'} is supported.

@item @code{'__func__'} is a string variable containing the current
function name.

@item Variadic macros: @code{__VA_ARGS__} can be used for
function-like macros:
@example
    #define dprintf(level, ...) printf(__VA_ARGS__)
@end example

@code{dprintf} can then be used with a variable number of parameters.

@item Declarations can appear anywhere in a block (as in C++).

@item Array and struct/union elements can be initialized in any order by
using designators:
@example
    struct @{ int x, y; @} st[10] = @{ [0].x = 1, [0].y = 2 @};

    int tab[10] = @{ 1, 2, [5] = 5, [9] = 9@};
@end example

@item Compound initializers are supported:
@example
    int *p = (int [])@{ 1, 2, 3 @};
@end example
to initialize a pointer pointing to an initialized array. The same
works for structures and strings.

@item Hexadecimal floating point constants are supported:
@example
    double d = 0x1234p10;
@end example

@noindent
is the same as writing
@example
    double d = 4771840.0;
@end example

@item The @code{'inline'} keyword is ignored.

@item The @code{'restrict'} keyword is ignored.
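
The C99 features listed above can be exercised together in a few small
functions. This is only a sketch: the names @code{log_at}, @code{point},
@code{sum_designated}, @code{second_of_literal}, @code{make_point},
@code{hex_float} and @code{report} are hypothetical and chosen for
illustration.

```c
#include <stdio.h>

/* C99 variadic macro: the level argument is dropped, the rest goes to printf. */
#define log_at(level, ...) printf(__VA_ARGS__)

struct point { int x, y; };

/* Sums a table built with array designators: 1 + 2 + 5 + 9 = 17. */
static int sum_designated(void)
{
    int tab[10] = { 1, 2, [5] = 5, [9] = 9 };
    int sum = 0;
    int i;

    for (i = 0; i < 10; i++)
        sum += tab[i];
    return sum;
}

/* Uses a compound literal as an unnamed, initialized array. */
static int second_of_literal(void)
{
    int *p = (int []){ 1, 2, 3 };
    return p[1];
}

/* Struct designators may appear in any order. */
static struct point make_point(void)
{
    struct point pt = { .y = 2, .x = 1 };
    return pt;
}

/* 0x1234p10 means 0x1234 * 2^10 = 4660 * 1024 = 4771840. */
static double hex_float(void)
{
    return 0x1234p10;
}

/* Prints the enclosing function name via __func__; returns printf's result. */
static int report(void)
{
    return log_at(1, "%s: sum=%d\n", __func__, sum_designated());
}
```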
254 @section GNU C extensions
256 TCC implements some GNU C extensions:
@item Array designators can be used without '=':
@example
    int a[10] = @{ [0] 1, [5] 2, 3, 4 @};
@end example

@item Structure field designators can be a label:
@example
    struct @{ int x, y; @} st = @{ x: 1, y: 1@};
@end example
instead of
@example
    struct @{ int x, y; @} st = @{ .x = 1, .y = 1@};
@end example
274 @item @code{'\e'} is ASCII character 27.
@item Case ranges: ranges can be used in @code{case}s:
@example
    switch(a) @{
    case 1 ... 9: printf("range 1 to 9\n"); break;
    default:      printf("unexpected\n"); break;
    @}
@end example
288 @item The keyword @code{__attribute__} is handled to specify variable or
289 function attributes. The following attributes are supported:
291 @item @code{aligned(n)}: align data to n bytes (must be a power of two).
293 @item @code{section(name)}: generate function or data in assembly
294 section name (name is a string containing the section name) instead
295 of the default section.
297 @item @code{unused}: specify that the variable or the function is unused.
299 @item @code{cdecl}: use standard C calling convention.
301 @item @code{stdcall}: use Pascal-like calling convention.
Here are some examples:
@example
    int a __attribute__ ((aligned(8), section(".mysection")));
@end example

@noindent
aligns variable @code{'a'} to 8 bytes and puts it in section @code{.mysection}.

@example
    int my_add(int a, int b) __attribute__ ((section(".mycodesection")));
@end example

@noindent
generates function @code{'my_add'} in section @code{.mycodesection}.
321 @item GNU style variadic macros:
323 #define dprintf(fmt, args...) printf(fmt, ## args)
326 dprintf("one arg %d\n", 1);
@item @code{__FUNCTION__} is interpreted as C99 @code{__func__}
(so it does not have exactly the same semantics as in GNU C,
where it is a string literal).

@item The @code{__alignof__} keyword can be used like @code{sizeof}
to get the alignment of a type or an expression.

@item @code{typeof(x)} returns the type of @code{x}.
@code{x} is an expression or a type.

@item Computed gotos: @code{&&label} returns a pointer of type
@code{void *} to the goto label @code{label}. @code{goto *expr} can be
used to jump to the address resulting from @code{expr}.
343 @item Inline assembly with asm instruction:
345 static inline void * my_memcpy(void * to, const void * from, size_t n)
348 __asm__ __volatile__(
353 "1:\ttestb $1,%b4\n\t"
357 : "=&c" (d0), "=&D" (d1), "=&S" (d2)
358 :"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
364 TCC includes its own x86 inline assembler with a @code{gas}-like (GNU
365 assembler) syntax. No intermediate files are generated. GCC 3.x named
366 operands are supported.
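
Some of the GNU C extensions described in this section can be tried out with
the short sketch below, which should compile with any compiler that accepts
them. The function names (@code{in_range_1_to_9}, @code{dispatch},
@code{char_alignment}) are hypothetical and serve only as an illustration.

```c
/* Returns 1 for values in 1..9 using a GNU case range, 0 otherwise. */
static int in_range_1_to_9(int a)
{
    switch (a) {
    case 1 ... 9:
        return 1;
    default:
        return 0;
    }
}

/* Computed goto: op 0 increments x, op 1 decrements it,
   any other op returns x unchanged. */
static int dispatch(int op, int x)
{
    /* &&label yields the address of a label as a void pointer. */
    void *targets[] = { &&do_inc, &&do_dec };

    if (op < 0 || op > 1)
        return x;
    goto *targets[op];

do_inc:
    return x + 1;
do_dec:
    return x - 1;
}

/* __alignof__ is used like sizeof; the alignment of char is 1. */
static int char_alignment(void)
{
    return (int)__alignof__(char);
}
```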
370 @section TinyCC extensions
@item @code{__TINYC__} is a macro predefined to @code{'1'} to
indicate that you use TCC.

@item @code{'#!'} at the start of a line is ignored to allow scripting.

@item Binary digits can be entered (@code{'0b101'} instead of
@code{'5'}).

@item @code{__BOUNDS_CHECKING_ON} is defined if bound checking is activated.
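
As a small illustration of these extensions, the following sketch tests the two
predefined macros and uses a binary constant. The function names are
hypothetical. Binary constants are also accepted as an extension by some other
compilers, so the code is not specific to TCC.

```c
/* Returns 1 when compiled by TCC (which predefines __TINYC__), 0 otherwise. */
static int built_with_tcc(void)
{
#ifdef __TINYC__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when bound checking is activated, 0 otherwise. */
static int bounds_checking_active(void)
{
#ifdef __BOUNDS_CHECKING_ON
    return 1;
#else
    return 0;
#endif
}

/* 0b101 is the binary notation for 5. */
static int binary_five(void)
{
    return 0b101;
}
```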
386 @chapter TinyCC Assembler
Since version 0.9.16, TinyCC integrates its own assembler. TinyCC
assembler supports a gas-like syntax (GNU assembler). You can
deactivate assembler support if you want a smaller TinyCC executable
(the C compiler does not rely on the assembler).
393 TinyCC Assembler is used to handle files with @file{.S} (C
394 preprocessed assembler) and @file{.s} extensions. It is also used to
395 handle the GNU inline assembler with the @code{asm} keyword.
TinyCC Assembler supports most of the gas syntax. The tokens are the
same as C.
404 @item C and C++ comments are supported.
406 @item Identifiers are the same as C, so you cannot use '.' or '$'.
408 @item Only 32 bit integer numbers are supported.
@item Integers in decimal, octal and hexadecimal are supported.
418 @item Unary operators: +, -, ~.
420 @item Binary operators in decreasing priority order:
428 @item A value is either an absolute number or a label plus an offset.
429 All operators accept absolute values except '+' and '-'. '+' or '-' can be
430 used to add an offset to a label. '-' supports two labels only if they
431 are the same or if they are both defined and in the same section.
439 @item All labels are considered as local, except undefined ones.
441 @item Numeric labels can be used as local @code{gas}-like labels.
442 They can be defined several times in the same source. Use 'b'
443 (backward) or 'f' (forward) as suffix to reference them:
447 jmp 1b /* jump to '1' label before */
448 jmp 1f /* jump to '1' label after */
All directives are preceded by a '.'. The following directives are
supported:
460 @item .align n[,value]
461 @item .skip n[,value]
462 @item .space n[,value]
463 @item .byte value1[,value2...]
464 @item .word value1[,value2...]
465 @item .short value1[,value2...]
466 @item .int value1[,value2...]
467 @item .long value1[,value2...]
470 @section X86 Assembler
All X86 opcodes are supported. Only AT&T syntax is supported (source
then destination operand order). If no size suffix is given, TinyCC
tries to guess it from the operand sizes.
476 Currently, MMX opcodes are supported but not SSE ones.
478 @chapter TinyCC Linker
480 @section ELF file generation
TCC can directly output relocatable ELF files (object files),
executable ELF files and dynamic ELF libraries without relying on an
external linker.

Dynamic ELF libraries can be output but the C compiler does not generate
position independent code (PIC). It means that the dynamic library
code generated by TCC cannot be shared among processes yet.
490 TCC linker cannot currently suppress unused object code. But TCC
491 will soon integrate a novel feature not found in GNU tools: unused code
492 will be suppressed at the function or variable level, provided you only
493 use TCC to compile your files.
495 @section ELF file loader
TCC can load ELF object files, archives (.a files) and dynamic
libraries (.so).
500 @section GNU Linker Scripts
Because on many Linux systems some dynamic libraries (such as
@file{/usr/lib/libc.so}) are in fact GNU ld link scripts (horrible!),
the TCC linker also supports a subset of GNU ld scripts.
506 The @code{GROUP} and @code{FILE} commands are supported.
508 Example from @file{/usr/lib/libc.so}:
511 Use the shared library, but some functions are only in
512 the static library, so try that secondarily. */
513 GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a )
517 @chapter TinyCC Memory and Bound checks
This feature is activated with the @code{'-b'} option (@pxref{invoke}).
521 Note that pointer size is @emph{unchanged} and that code generated
522 with bound checks is @emph{fully compatible} with unchecked
523 code. When a pointer comes from unchecked code, it is assumed to be
524 valid. Even very obscure C code with casts should work correctly.
For more information about the ideas behind this method, see
@url{http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html}.

Here are some examples of caught errors:
533 @item Invalid range with standard string function:
541 @item Bound error in global or local arrays:
551 @item Bound error in allocated data:
555 tab = malloc(20 * sizeof(int));
563 @item Access to a freed region:
567 tab = malloc(20 * sizeof(int));
575 @item Freeing an already freed region:
579 tab = malloc(20 * sizeof(int));
588 @chapter The @code{libtcc} library
590 The @code{libtcc} library enables you to use TCC as a backend for
591 dynamic code generation.
Read @file{libtcc.h} for an overview of the API and
@file{libtcc_test.c} for a very simple example.

The idea is to pass a C string containing the program you want
to compile directly to @code{libtcc}. Then the @code{main()} function of
the compiled string can be launched.
@chapter Developer's guide
602 This chapter gives some hints to understand how TCC works. You can skip
603 it if you do not intend to modify the TCC code.
605 @section File reading
607 The @code{BufferedFile} structure contains the context needed to read a
608 file, including the current line number. @code{tcc_open()} opens a new
file and @code{tcc_close()} closes it. @code{inp()} returns the next
character.
@code{next()} reads the next token in the current
file. @code{next_nomacro()} reads the next token without macro
expansion.

@code{tok} contains the current token (see the @code{TOK_xxx}
constants). Identifiers and keywords are also tokens. @code{tokc}
contains additional information about the token (for example a constant
value if it is a number or string token).
The parser is hardcoded (yacc is not necessary). It does only one pass,
except:
630 @item For initialized arrays with unknown size, a first pass
631 is done to count the number of elements.
633 @item For architectures where arguments are evaluated in
634 reverse order, a first pass is done to reverse the argument order.
The types are stored in a single 'int' variable. It was chosen in the
first stages of development when tcc was much simpler. Now, it may not
be the best solution.
645 #define VT_INT 0 /* integer type */
646 #define VT_BYTE 1 /* signed byte type */
647 #define VT_SHORT 2 /* short type */
648 #define VT_VOID 3 /* void type */
649 #define VT_PTR 4 /* pointer */
650 #define VT_ENUM 5 /* enum definition */
651 #define VT_FUNC 6 /* function type */
652 #define VT_STRUCT 7 /* struct/union definition */
653 #define VT_FLOAT 8 /* IEEE float */
654 #define VT_DOUBLE 9 /* IEEE double */
655 #define VT_LDOUBLE 10 /* IEEE long double */
656 #define VT_BOOL 11 /* ISOC99 boolean type */
657 #define VT_LLONG 12 /* 64 bit integer */
#define VT_LONG 13 /* long integer (NEVER USED as type, only
                      during parsing) */
660 #define VT_BTYPE 0x000f /* mask for basic type */
661 #define VT_UNSIGNED 0x0010 /* unsigned type */
662 #define VT_ARRAY 0x0020 /* array type (also has VT_PTR) */
663 #define VT_BITFIELD 0x0040 /* bitfield modifier */
665 #define VT_STRUCT_SHIFT 16 /* structure/enum name shift (16 bits left) */
668 When a reference to another type is needed (for pointers, functions and
669 structures), the @code{32 - VT_STRUCT_SHIFT} high order bits are used to
670 store an identifier reference.
The @code{VT_UNSIGNED} flag can be set for chars, shorts, ints and long
longs.

Arrays are considered as pointers @code{VT_PTR} with the flag
@code{VT_ARRAY} set.
678 The @code{VT_BITFIELD} flag can be set for chars, shorts, ints and long
679 longs. If it is set, then the bitfield position is stored from bits
680 VT_STRUCT_SHIFT to VT_STRUCT_SHIFT + 5 and the bit field size is stored
681 from bits VT_STRUCT_SHIFT + 6 to VT_STRUCT_SHIFT + 11.
683 @code{VT_LONG} is never used except during parsing.
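
As an illustration of the encoding described above, the following sketch packs
and unpacks a bit-field type word. Only the @code{VT_xxx} constants come from
the listing above; the helper names (@code{make_bitfield_type},
@code{basic_type}, @code{is_unsigned}, @code{bitfield_pos},
@code{bitfield_size}) are hypothetical and are not TCC functions. The 6-bit
masks follow from the stated bit ranges.

```c
#define VT_INT            0       /* integer type */
#define VT_BTYPE     0x000f       /* mask for basic type */
#define VT_UNSIGNED  0x0010       /* unsigned type */
#define VT_BITFIELD  0x0040       /* bitfield modifier */
#define VT_STRUCT_SHIFT  16       /* start of the high order field */

/* Hypothetical helper: builds a bit-field type word from a type word,
   a bit position and a bit size (each must fit in 6 bits). */
static int make_bitfield_type(int t, int pos, int size)
{
    return t | VT_BITFIELD
             | (pos << VT_STRUCT_SHIFT)
             | (size << (VT_STRUCT_SHIFT + 6));
}

/* Extracts the basic type (VT_INT, VT_BYTE, ...). */
static int basic_type(int t)
{
    return t & VT_BTYPE;
}

/* Returns 1 if the VT_UNSIGNED flag is set. */
static int is_unsigned(int t)
{
    return (t & VT_UNSIGNED) != 0;
}

/* Bits VT_STRUCT_SHIFT .. VT_STRUCT_SHIFT + 5 hold the position. */
static int bitfield_pos(int t)
{
    return (t >> VT_STRUCT_SHIFT) & 0x3f;
}

/* Bits VT_STRUCT_SHIFT + 6 .. VT_STRUCT_SHIFT + 11 hold the size. */
static int bitfield_size(int t)
{
    return (t >> (VT_STRUCT_SHIFT + 6)) & 0x3f;
}
```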
During parsing, the storage of an object is also stored in the type
integer:
689 #define VT_EXTERN 0x00000080 /* extern definition */
690 #define VT_STATIC 0x00000100 /* static variable */
691 #define VT_TYPEDEF 0x00000200 /* typedef definition */
696 All symbols are stored in hashed symbol stacks. Each symbol stack
697 contains @code{Sym} structures.
@code{Sym.v} contains the symbol name (remember
an identifier is also a token, so a string is never necessary to store
it). @code{Sym.t} gives the type of the symbol. @code{Sym.r} is usually
the register in which the corresponding variable is stored. @code{Sym.c} is
usually a constant associated with the symbol.
705 Four main symbol stacks are defined:
710 for the macros (@code{#define}s).
713 for the global variables, functions and types.
716 for the local variables, functions and types.
719 for the local labels (for @code{goto}).
@code{sym_push()} is used to add a new symbol in the local symbol
stack. If no local symbol stack is active, it is added in the global
symbol stack.

@code{sym_pop(st,b)} pops symbols from the symbol stack @var{st} until
the symbol @var{b} is on the top of stack. If @var{b} is NULL, the stack
is emptied.

@code{sym_find(v)} returns the symbol associated with the identifier
@var{v}. The local stack is searched first from top to bottom, then the
global stack.
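
The push, pop and find behaviour described here can be modelled with a plain
linked list. The sketch below is a deliberately simplified toy, not TCC's
implementation: it does no hashing, and the @code{ToySym} type and the
@code{toy_} functions are hypothetical names with signatures invented for the
example.

```c
#include <stdlib.h>

typedef struct ToySym {
    int v;                 /* identifier, stored as a token number */
    int t;                 /* type word */
    int c;                 /* associated constant */
    struct ToySym *prev;   /* next symbol down the stack */
} ToySym;

/* Pushes a new symbol on top of the stack *st; returns NULL on failure. */
static ToySym *toy_sym_push(ToySym **st, int v, int t, int c)
{
    ToySym *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->v = v;
    s->t = t;
    s->c = c;
    s->prev = *st;
    *st = s;
    return s;
}

/* Pops symbols until b is on top; a NULL b empties the stack. */
static void toy_sym_pop(ToySym **st, ToySym *b)
{
    while (*st != NULL && *st != b) {
        ToySym *s = *st;
        *st = s->prev;
        free(s);
    }
}

/* Searches the local stack from top to bottom, then the global one. */
static ToySym *toy_sym_find(ToySym *local, ToySym *global, int v)
{
    ToySym *s;

    for (s = local; s != NULL; s = s->prev)
        if (s->v == v)
            return s;
    for (s = global; s != NULL; s = s->prev)
        if (s->v == v)
            return s;
    return NULL;
}
```

Saving the top of the local stack on entry to a block and popping back to it on
exit is enough to make inner declarations shadow outer ones and then disappear.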
The generated code and data are written in sections. The structure
@code{Section} contains all the necessary information for a given
section. @code{new_section()} creates a new section. ELF file semantics
are assumed for each section.
742 The following sections are predefined:
747 is the section containing the generated code. @var{ind} contains the
748 current position in the code section.
751 contains initialized data
754 contains uninitialized data
757 @itemx lbounds_section
758 are used when bound checking is activated
761 @itemx stabstr_section
are used when debugging is activated to store debug information
765 @itemx strtab_section
766 contain the exported symbols (currently only used for debugging).
770 @section Code generation
772 @subsection Introduction
774 The TCC code generator directly generates linked binary code in one
775 pass. It is rather unusual these days (see gcc for example which
776 generates text assembly), but it allows to be very fast and surprisingly
779 The TCC code generator is register based. Optimization is only done at
780 the expression level. No intermediate representation of expression is
781 kept except the current values stored in the @emph{value stack}.
783 On x86, three temporary registers are used. When more registers are
784 needed, one register is flushed in a new local variable.
786 @subsection The value stack
788 When an expression is parsed, its value is pushed on the value stack
789 (@var{vstack}). The top of the value stack is @var{vtop}. Each value
790 stack entry is the structure @code{SValue}.
@code{SValue.t} is the type. @code{SValue.r} indicates how the value is
currently stored in the generated code. It is usually a CPU register
index (@code{REG_xxx} constants), but additional values and flags are
defined:
798 #define VT_CONST 0x00f0
799 #define VT_LLOCAL 0x00f1
800 #define VT_LOCAL 0x00f2
801 #define VT_CMP 0x00f3
802 #define VT_JMP 0x00f4
803 #define VT_JMPI 0x00f5
804 #define VT_LVAL 0x0100
805 #define VT_SYM 0x0200
806 #define VT_MUSTCAST 0x0400
807 #define VT_MUSTBOUND 0x0800
808 #define VT_BOUNDED 0x8000
809 #define VT_LVAL_BYTE 0x1000
810 #define VT_LVAL_SHORT 0x2000
811 #define VT_LVAL_UNSIGNED 0x4000
812 #define VT_LVAL_TYPE (VT_LVAL_BYTE | VT_LVAL_SHORT | VT_LVAL_UNSIGNED)
@table @code

@item VT_CONST
indicates that the value is a constant. It is stored in the union
@code{SValue.c}, depending on its type.

@item VT_LOCAL
indicates a local variable pointer at offset @code{SValue.c.i} in the
stack.

@item VT_CMP
indicates that the value is actually stored in the CPU flags (i.e. the
value is the consequence of a test). The value is either 0 or 1. The
actual CPU flags used are indicated in @code{SValue.c.i}.

If any code is generated which destroys the CPU flags, this value MUST be
put in a normal register.

@item VT_JMP
@itemx VT_JMPI
indicates that the value is the consequence of a jmp. For VT_JMP, it is
1 if the jump is taken, 0 otherwise. For VT_JMPI it is inverted.

These values are used to compile the @code{||} and @code{&&} logical
operators.

If any code is generated, this value MUST be put in a normal
register. Otherwise, the generated code won't be executed if the jump is
taken.

@item VT_LVAL
is a flag indicating that the value is actually an lvalue (left value of
an assignment). It means that the value stored is actually a pointer to
the wanted value.

Understanding the use of @code{VT_LVAL} is very important if you want to
understand how TCC works.

@item VT_LVAL_BYTE
@itemx VT_LVAL_SHORT
@itemx VT_LVAL_UNSIGNED
if the lvalue has an integer type, then these flags give its real
type. The type alone is not sufficient in case of cast optimisations.

@item VT_LLOCAL
is a saved lvalue on the stack. @code{VT_LLOCAL} should be suppressed
ASAP because its semantics are rather complicated.

@item VT_MUSTCAST
indicates that a cast to the value type must be performed if the value
is used (lazy casting).

@item VT_SYM
indicates that the symbol @code{SValue.sym} must be added to the constant.

@item VT_MUSTBOUND
@itemx VT_BOUNDED
are only used for optional bound checking.

@end table
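
To show how a value of @code{SValue.r} might be taken apart, the sketch below
separates the storage location from the flags. It rests on two assumptions that
the text does not state: that the low 8 bits select the location (the
hypothetical mask @code{TOY_VALMASK}), and that every location number below
@code{VT_CONST} is a CPU register index. The helper names are illustrative only.

```c
#define VT_CONST   0x00f0
#define VT_LOCAL   0x00f2
#define VT_CMP     0x00f3
#define VT_LVAL    0x0100
#define VT_SYM     0x0200

/* Assumption: the low 8 bits of r select where the value is stored. */
#define TOY_VALMASK 0x00ff

/* Returns the storage location: a register index or a VT_xxx value. */
static int storage_kind(int r)
{
    return r & TOY_VALMASK;
}

/* Returns 1 if the value is an lvalue, i.e. a pointer to the wanted value. */
static int is_lvalue(int r)
{
    return (r & VT_LVAL) != 0;
}

/* Assumption: locations below VT_CONST are CPU register indexes. */
static int is_register(int r)
{
    return storage_kind(r) < VT_CONST;
}
```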
876 @subsection Manipulating the value stack
@code{vsetc()} and @code{vset()} push a new value on the value
stack. If the previous @code{vtop} was stored in a very unsafe place (for
example in the CPU flags), then some code is generated to put the
previous @code{vtop} in a safe storage.

@code{vpop()} pops @code{vtop}. In some cases, it also generates cleanup
code (for example if stacked floating point registers are used as on
x86).

The @code{gv(rc)} function generates code to evaluate @code{vtop} (the
top value of the stack) into registers. @var{rc} selects in which
register class the value should be put. @code{gv()} is the @emph{most
important function} of the code generator.

@code{gv2()} is the same as @code{gv()} but for the top two stack
entries.
895 @subsection CPU dependent code generation
See the @file{i386-gen.c} file for an example.
902 must generate the code needed to load a stack value into a register.
905 must generate the code needed to store a register into a stack value
911 should generate a function call
914 @itemx gfunc_epilog()
915 should generate a function prolog/epilog.
918 must generate the binary integer operation @var{op} on the two top
entries of the stack which are guaranteed to contain integer types.
921 The result value should be put on the stack.
same as @code{gen_opi()} for floating point operations. The two top
entries of the stack are guaranteed to contain floating point values of
the same type.
929 integer to floating point conversion.
932 floating point to integer conversion.
935 floating point to floating point of different size conversion.
937 @item gen_bounded_ptr_add()
938 @item gen_bounded_ptr_deref()
939 are only used for bound checking.
943 @section Optimizations done
945 Constant propagation is done for all operations. Multiplications and
946 divisions are optimized to shifts when appropriate. Comparison
947 operators are optimized by maintaining a special cache for the
948 processor flags. &&, || and ! are optimized by maintaining a special
'jump target' value. No other jump optimization is currently performed
because it would require storing the code in a more abstract fashion.
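
The rewriting of a multiplication as a shift can be sketched as follows. This
is not TCC's code, only a minimal illustration of the idea: when the constant
operand is a power of two, the product equals a left shift by its base-2
logarithm. The names @code{shift_for} and @code{mul_by_const} are hypothetical.

```c
/* Returns log2(n) if n is a power of two, -1 otherwise. */
static int shift_for(unsigned n)
{
    int shift = 0;

    if (n == 0 || (n & (n - 1)) != 0)
        return -1;
    while (n > 1) {
        n >>= 1;
        shift++;
    }
    return shift;
}

/* Multiplies x by n, using a shift when n is a power of two. */
static unsigned mul_by_const(unsigned x, unsigned n)
{
    int shift = shift_for(n);

    if (shift >= 0)
        return x << shift;
    return x * n;
}
```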