1 \input texinfo @c -*-texinfo-*-
3 @c NOTE THIS IS NOT A GOOD EXAMPLE OF HOW TO DO A MANUAL. FIXME!!!
8 @setfilename treelang.info
10 @include gcc-common.texi
12 @set copyrights-treelang 1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005
14 @set email-general gcc@@gcc.gnu.org
15 @set email-bugs gcc-bugs@@gcc.gnu.org or bug-gcc@@gnu.org
16 @set email-patches gcc-patches@@gcc.gnu.org
17 @set path-treelang gcc/gcc/treelang
19 @set which-treelang GCC-@value{version-GCC}
22 @set email-josling tej@@melbpc.org.au
23 @set www-josling http://www.geocities.com/timjosling
25 @c This tells @include'd files that they're part of the overall TREELANG doc
26 @c set. (They might be part of a higher-level doc set too.)
29 @c @setfilename usetreelang.info
30 @c @setfilename maintaintreelang.info
31 @c To produce the full manual, use the "treelang.info" setfilename, and
32 @c make sure the following do NOT begin with '@c' (and the @clear lines DO)
35 @c To produce a user-only manual, use the "usetreelang.info" setfilename, and
36 @c make sure the following does NOT begin with '@c':
38 @c To produce a maintainer-only manual, use the "maintaintreelang.info" setfilename,
39 @c and make sure the following does NOT begin with '@c':
44 @settitle Using and Maintaining GNU Treelang
47 @c seems reasonable to assume at least one of INTERNALS or USING is set...
49 @settitle Using GNU Treelang
52 @settitle Maintaining GNU Treelang
54 @c then again, have some fun
57 @settitle Doing Very Little at all with GNU Treelang
65 @c Cause even numbered pages to be printed on the left hand side of
66 @c the page and odd numbered pages to be printed on the right hand
67 @c side of the page. Using this, you can print on both sides of a
68 @c sheet of paper and have the text on the same part of the sheet.
70 @c The text on right hand pages is pushed towards the right hand
71 @c margin and the text on left hand pages is pushed toward the left
73 @c (To provide the reverse effect, set bindingoffset to -0.75in.)
76 @c \global\bindingoffset=0.75in
77 @c \global\normaloffset =0.75in
81 Copyright @copyright{} @value{copyrights-treelang} Free Software Foundation, Inc.
83 Permission is granted to copy, distribute and/or modify this document
84 under the terms of the GNU Free Documentation License, Version 1.2 or
85 any later version published by the Free Software Foundation; with the
86 Invariant Sections being ``GNU General Public License'', the Front-Cover
87 texts being (a) (see below), and with the Back-Cover Texts being (b)
88 (see below). A copy of the license is included in the section entitled
89 ``GNU Free Documentation License''.
91 (a) The FSF's Front-Cover Text is:
95 (b) The FSF's Back-Cover Text is:
97 You have freedom to copy and modify this GNU Manual, like GNU
98 software. Copies published by the Free Software Foundation raise
99 funds for GNU development.
103 @dircategory Software development
105 * treelang: (treelang). The GNU Treelang compiler.
109 This file documents the use and the internals of the GNU Treelang
110 (@code{treelang}) compiler. At the moment this manual is not
111 incorporated into the main GCC manual as it is incomplete. It
112 corresponds to the @value{which-treelang} version of @code{treelang}.
116 This file documents the internals of the GNU Treelang (@code{treelang}) compiler.
117 It corresponds to the @value{which-treelang} version of @code{treelang}.
120 This file documents the use of the GNU Treelang (@code{treelang}) compiler.
121 It corresponds to the @value{which-treelang} version of @code{treelang}.
124 Published by the Free Software Foundation
125 51 Franklin Street, Fifth Floor
126 Boston, MA 02110-1301 USA
131 @setchapternewpage odd
136 @title Using and Maintaining GNU Treelang
140 @title Using GNU Treelang
143 @title Maintaining GNU Treelang
148 @vskip 0pt plus 1filll
149 Published by the Free Software Foundation @*
150 51 Franklin Street, Fifth Floor@*
151 Boston, MA 02110-1301, USA@*
152 @c Last printed ??ber, 19??.@*
153 @c Printed copies are available for $? each.@*
162 @node Top, Copying,, (dir)
168 This manual documents how to run, install and maintain @code{treelang}.
169 It also documents the features and incompatibilities in the @value{which-treelang}
170 version of @code{treelang}.
175 This manual documents how to run and install @code{treelang}.
176 It also documents the features and incompatibilities in the @value{which-treelang}
177 version of @code{treelang}.
180 This manual documents how to maintain @code{treelang}.
181 It also documents the features and incompatibilities in the @value{which-treelang}
182 version of @code{treelang}.
190 * GNU Free Documentation License::
193 * What is GNU Treelang?::
196 * Compiler Overview::
200 * treelang internals::
208 --- The Detailed Node Listing ---
212 * Interoperating with C and C++::
217 * treelang compiler interfaces::
220 treelang compiler interfaces
223 * treelang main compiler::
225 treelang main compiler
227 * Interfacing to toplev.c::
228 * Interfacing to the garbage collection::
229 * Interfacing to the code generation code. ::
244 @unnumbered Contributors to GNU Treelang
248 Treelang was based on 'toy' by Richard Kenner, and also uses code from
249 the GCC core code tree. Tim Josling first created the language and
250 documentation, based on the GCC Fortran compiler's documentation
251 framework. Treelang was updated to use the TreeSSA infrastructure by
256 The packaging and compiler portions of GNU Treelang are based largely
258 @xref{Contributors,,Contributors to GCC,GCC,Using and Maintaining GCC},
259 for more information.
262 There is no specific run-time library for treelang, other than the
266 It would have been difficult to build treelang without access to Joachim
267 Nadler's guide to writing a front end to GCC (written in German). A
268 translation of this document into English is available via the
269 CobolForGCC project or via the documentation links from the GCC home
270 page @uref{http://gcc.gnu.org}.
273 @include funding.texi
275 @node Getting Started
276 @chapter Getting Started
277 @cindex getting started
282 Treelang is a sample language, useful only to help people understand how
283 to implement a new language front end to GCC. It is not a useful
284 language in itself other than as an example or basis for building a new
language. Therefore only language developers are likely to have an
interest in it.
288 This manual assumes familiarity with GCC, which you can obtain by using
289 it and by reading the manuals @samp{Using the GNU Compiler Collection (GCC)}
290 and @samp{GNU Compiler Collection (GCC) Internals}.
292 To install treelang, follow the GCC installation instructions,
293 taking care to ensure you specify treelang in the configure step by adding
294 treelang to the list of languages specified by @option{--enable-languages},
295 e.g.@: @samp{--enable-languages=all,treelang}.
297 If you're generally curious about the future of
298 @code{treelang}, see @ref{Projects}.
299 If you're curious about its past,
300 see @ref{Contributors}.
302 To see a few of the questions maintainers of @code{treelang} have,
303 and that you might be able to answer,
304 see @ref{Open Questions}.
307 @node What is GNU Treelang?, Lexical Syntax, Getting Started, Top
308 @chapter What is GNU Treelang?
309 @cindex concepts, basic
310 @cindex basic concepts
GNU Treelang, or @code{treelang}, was designed initially as a free
replacement for, or alternative to, the 'toy' language, but one that is
amenable to inclusion within the GCC source tree.
316 @code{treelang} is largely a cut down version of C, designed to showcase
317 the features of the GCC code generation back end. Only those features
318 that are directly supported by the GCC code generation back end are
implemented. Each feature is implemented in whichever way is easiest and
clearest. Not all, or even most, of the code generation back end
features are implemented. The intention is to add features incrementally
322 until most features of the GCC back end are implemented in treelang.
324 The main features missing are structures, arrays and pointers.
326 A sample program follows:
329 // @r{function prototypes}
330 // @r{function 'add' taking two ints and returning an int}
331 external_definition int add(int arg1, int arg2);
332 external_definition int subtract(int arg3, int arg4);
333 external_definition int first_nonzero(int arg5, int arg6);
334 external_definition int double_plus_one(int arg7);
336 // @r{function definition}
339 // @r{return the sum of arg1 and arg2}
351 // @r{aaa is a variable, of type integer and allocated at the start of}
354 // @r{set aaa to the value returned from add, when passed arg7 and arg7 as}
355 // @r{the two parameters}
358 aaa=subtract(subtract(aaa, arg7), arg7) + 1;
364 // @r{C-like if statement}
376 @node Lexical Syntax, Parsing Syntax, What is GNU Treelang?, Top
377 @chapter Lexical Syntax
378 @cindex Lexical Syntax
380 Treelang programs consist of whitespace, comments, keywords and names.
384 Whitespace consists of the space character, a tab, and the end of line
385 character. Line terminations are as defined by the
386 standard C library. Whitespace is ignored except within comments,
387 and where it separates parts of the program. In the example below, A and
388 B are two separate names separated by whitespace.
395 Comments consist of @samp{//} followed by any characters up to the end
396 of the line. C style comments (/* */) are not supported. For example,
397 the assignment below is followed by a not very helpful comment.
400 x = 1; // @r{Set X to 1}
404 Keywords consist of any of the following reserved words or symbols:
408 used to start the statements in a function
410 used to end the statements in a function
412 start list of function arguments, or to change the precedence of operators in
415 end list or prioritized operators in expression
417 used to separate parameters in a function prototype or in a function call
419 used to end a statement
421 addition, or unary plus for signed literals
423 subtraction, or unary minus for signed literals
431 begin 'else' portion of IF statement
433 indicate variable is permanent, or function has file scope only
435 indicate that variable is allocated for the life of the current scope
436 @item external_reference
437 indicate that variable or function is defined in another file
438 @item external_definition
439 indicate that variable or function is to be accessible from other files
441 variable is an integer (same as C int)
443 variable is a character (same as C char)
445 variable is unsigned. If this is not present, the variable is signed
447 start function return statement
449 used as function type to indicate function returns nothing
454 Names consist of any letter or "_" followed by any number of letters,
455 numbers, or "_". "$" is not allowed in a name. All names must be globally
unique, i.e.@: they may not be used twice in any context, and must
457 not be a keyword. Names and keywords are case sensitive. For example:
463 are all different names.
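
The rule for names can be sketched in C as follows. This is only an
illustration and is not taken from the treelang lexer: the function name
@code{is_treelang_name} is hypothetical, ``letter'' is assumed to mean
an ASCII letter as classified by @code{isalpha} in the C locale, and the
reserved words are the ones that appear in the grammar in this manual.
The sketch does not check the global uniqueness requirement, which needs
a symbol table.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Reserved words that appear in the grammar in this manual. */
static const char *const treelang_keywords[] = {
    "if", "else", "static", "automatic", "external_reference",
    "external_definition", "int", "char", "unsigned", "return", "void"
};

/* Hypothetical helper: true if S is a letter or '_' followed by any
   number of letters, digits or '_', and is not a keyword.  The
   comparison is case sensitive, so "Int" is a name but "int" is not. */
bool is_treelang_name(const char *s)
{
    size_t i;

    if (s == NULL || !(isalpha((unsigned char) s[0]) || s[0] == '_'))
        return false;

    for (i = 1; s[i] != '\0'; i++)
        if (!(isalnum((unsigned char) s[i]) || s[i] == '_'))
            return false;          /* rejects '$' and other symbols */

    for (i = 0; i < sizeof treelang_keywords / sizeof treelang_keywords[0]; i++)
        if (strcmp(s, treelang_keywords[i]) == 0)
            return false;          /* keywords are reserved */

    return true;
}
```

A lexer generated by flex would normally express the same rule as a
regular expression; the hand-written form is shown here only because it
makes each condition explicit.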
467 @node Parsing Syntax, Compiler Overview, Lexical Syntax, Top
468 @chapter Parsing Syntax
469 @cindex Parsing Syntax
471 Declarations are built up from the lexical elements described above. A
file may contain one or more declarations.
477 declaration: variable declaration OR function prototype OR function declaration
480 Function Prototype: storage type NAME ( optional_parameter_list )
483 static int add (int a, int b)
487 variable_declaration: storage type NAME initial;
495 A variable declaration can be outside a function, or at the start of a
499 storage: automatic OR static OR external_reference OR external_definition
501 This defines the scope, duration and visibility of a function or variable
506 automatic: This means a variable is allocated at start of the current scope and
507 released when the current scope is exited. This can only be used for variables
508 within functions. It cannot be used for functions.
511 static: This means a variable is allocated at start of program and
512 remains allocated until the program as a whole ends. For a function, it
513 means that the function is only visible within the current file.
516 external_definition: For a variable, which must be defined outside a
517 function, it means that the variable is visible from other files. For a
518 function, it means that the function is visible from another file.
521 external_reference: For a variable, which must be defined outside a
522 function, it means that the variable is defined in another file. For a
523 function, it means that the function is defined in another file.
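
For readers who know C, the four storage classes correspond roughly to
the C constructs below. This is a sketch of the analogy only, assuming
the descriptions above; all identifiers are hypothetical, and the code is
C, not treelang.

```c
/* Like external_reference: declared here, defined in another file.
   It is never used below, so nothing extra needs to be linked. */
extern int defined_elsewhere;

/* Like external_definition for a variable: visible from other files. */
int calls_seen = 0;

/* Like static for a variable: allocated for the life of the program. */
static int running_total = 0;

/* Like static for a function: visible only within this file. */
static int add_to_total(int n)
{
    /* Like automatic: allocated when the scope is entered and released
       when it is exited, so no value survives between calls. */
    int doubled = n + n;

    running_total = running_total + doubled;
    return running_total;
}

/* Like external_definition for a function: callable from other files. */
int record(int n)
{
    calls_seen = calls_seen + 1;
    return add_to_total(n);
}
```

Calling @code{record} twice shows the difference in duration:
@code{running_total} keeps its value between calls, while
@code{doubled} is created afresh each time.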
528 type: int OR unsigned int OR char OR unsigned char OR void
530 This defines the data type of a variable or the return type of a function.
535 int: The variable is a signed integer. The function returns a signed integer.
538 unsigned int: The variable is an unsigned integer. The function returns an unsigned integer.
541 char: The variable is a signed character. The function returns a signed character.
544 unsigned char: The variable is an unsigned character. The function returns an unsigned character.
parameter_list: parameter [, parameter]...
552 parameter: variable_declaration ,
554 The variable declarations must not have initializations.
560 value: integer_constant
562 Values without a unary plus or minus are considered to be unsigned.
568 function_declaration: name @{ variable_declarations statements @}
570 A function consists of the function name then the declarations (if any)
571 and statements (if any) within one pair of braces.
573 The details of the function arguments come from the function
574 prototype. The function prototype must precede the function declaration
578 statement: if_statement OR expression_statement OR return_statement
581 if_statement: if ( expression ) @{ variable_declarations statements @}
582 else @{ variable_declarations statements @}
584 The first lot of statements is executed if the expression is
585 nonzero. Otherwise the second lot of statements is executed. Either
586 list of statements may be empty, but both sets of braces and the else must be present.
600 expression_statement: expression;
602 The expression is executed, including any side effects.
605 return_statement: return expression_opt;
607 Returns from the function. If the function is void, the expression must
be absent, and if the function is not void the expression must be
present.
612 expression: variable OR integer_constant OR expression + expression
613 OR expression - expression OR expression == expression OR ( expression )
614 OR variable = expression OR function_call
616 An expression can be a constant or a variable reference or a
617 function_call. Expressions can be combined as a sum of two expressions
618 or the difference of two expressions, or an equality test of two
619 expressions. An assignment is also an expression. Expressions and operator
620 precedence work as in C.
623 function_call: function_name ( optional_comma_separated_expressions )
625 This invokes the function, passing to it the values of the expressions
626 as actual parameters.
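
Because expressions and operator precedence are stated above to work as
in C, the grouping of the shared operators can be checked with a small C
program. The sketch below uses only @code{+}, @code{-}, @code{==},
@code{=} and parentheses; the function names are hypothetical, and the
code illustrates the C rules rather than the treelang parser itself.

```c
/* '+' binds more tightly than '==', so this is a == (b + c). */
int equality_binds_looser(int a, int b, int c)
{
    return a == b + c;
}

/* Parentheses change the grouping: the comparison happens first and its
   result (0 or 1) is then added to c. */
int parentheses_override(int a, int b, int c)
{
    return (a == b) + c;
}

/* '-' groups from the left, so this is (a - b) - c. */
int subtraction_groups_left(int a, int b, int c)
{
    return a - b - c;
}

/* Assignment is itself an expression, has the lowest precedence, and
   groups from the right, so this is x = (y = (a + b)). */
int assignment_is_an_expression(int a, int b)
{
    int x;
    int y;

    x = y = a + b;
    return x + y;
}
```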
631 @node Compiler Overview, TREELANG and GCC, Parsing Syntax, Top
632 @chapter Compiler Overview
633 treelang is run as part of the GCC compiler.
641 It reads a user's program, stored in a file and containing instructions
642 written in the appropriate language (Treelang, C, and so on). This file
643 contains @dfn{source code}.
645 @cindex translation of user programs
647 @cindex code, machine
650 It translates the user's program into instructions a computer can carry
651 out more quickly than it takes to translate the instructions in the
652 first place. These instructions are called @dfn{machine code}---code
653 designed to be efficiently translated and processed by a machine such as
a computer. Humans usually aren't as good at writing machine code as they
655 are at writing Treelang or C, because it is easy to make tiny mistakes
656 writing machine code. When writing Treelang or C, it is easy to make
657 big mistakes. But you can only make one mistake, because the compiler
658 stops after it finds any problem.
661 @cindex bugs, finding
662 @cindex @code{gdb}, command
663 @cindex commands, @code{gdb}
665 It provides information in the generated machine code
666 that can make it easier to find bugs in the program
667 (using a debugging tool, called a @dfn{debugger},
672 @cindex @code{ld} command
673 @cindex commands, @code{ld}
675 It locates and gathers machine code already generated to perform actions
676 requested by statements in the user's program. This machine code is
677 organized into @dfn{libraries} and is located and gathered during the
678 @dfn{link} phase of the compilation process. (Linking often is thought
679 of as a separate step, because it can be directly invoked via the
680 @code{ld} command. However, the @code{gcc} command, as with most
681 compiler commands, automatically performs the linking step by calling on
682 @code{ld} directly, unless asked to not do so by the user.)
684 @cindex language, incorrect use of
685 @cindex incorrect use of language
687 It attempts to diagnose cases where the user's program contains
688 incorrect usages of the language. The @dfn{diagnostics} produced by the
689 compiler indicate the problem and the location in the user's source file
690 where the problem was first noticed. The user can use this information
691 to locate and fix the problem.
693 The compiler stops after the first error. There are no plans to fix
694 this, ever, as it would vastly complicate the implementation of treelang
695 to little or no benefit.
697 @cindex diagnostics, incorrect
698 @cindex incorrect diagnostics
699 @cindex error messages, incorrect
700 @cindex incorrect error messages
701 (Sometimes an incorrect usage of the language leads to a situation where
702 the compiler can not make any sense of what it reads---while a human
703 might be able to---and thus ends up complaining about an incorrect
704 ``problem'' it encounters that, in fact, reflects a misunderstanding of
705 the programmer's intention.)
708 @cindex questionable instructions
There are a few warnings in treelang. For example, an unused static function
generates a warning when @option{-Wunused-function} is specified; similarly, an
unused static variable generates a warning when @option{-Wunused-variable} is
specified.
713 The only treelang specific warning is a warning when an expression is in a
714 return statement for functions that return void.
717 @cindex components of treelang
718 @cindex @code{treelang}, components of
719 @code{treelang} consists of several components:
721 @cindex @code{gcc}, command
722 @cindex commands, @code{gcc}
725 A modified version of the @code{gcc} command, which also might be
726 installed as the system's @code{cc} command.
727 (In many cases, @code{cc} refers to the
728 system's ``native'' C compiler, which
729 might be a non-GNU compiler, or an older version
730 of @code{GCC} considered more stable or that is
731 used to build the operating system kernel.)
733 @cindex @code{treelang}, command
734 @cindex commands, @code{treelang}
736 The @code{treelang} command itself.
739 The @code{libc} run-time library. This library contains the machine
740 code needed to support capabilities of the Treelang language that are
741 not directly provided by the machine code generated by the
742 @code{treelang} compilation phase. This is the same library that the
743 main C compiler uses (libc).
745 @cindex @code{tree1}, program
746 @cindex programs, @code{tree1}
748 @cindex @code{as} command
749 @cindex commands, @code{as}
750 @cindex assembly code
751 @cindex code, assembly
The compiler itself is internally named @code{tree1}.
755 Note that @code{tree1} does not generate machine code directly---it
756 generates @dfn{assembly code} that is a more readable form
757 of machine code, leaving the conversion to actual machine code
758 to an @dfn{assembler}, usually named @code{as}.
761 @code{GCC} is often thought of as ``the C compiler'' only,
762 but it does more than that.
763 Based on command-line options and the names given for files
764 on the command line, @code{gcc} determines which actions to perform, including
preprocessing, compiling (in a variety of possible languages), assembling,
and linking.
768 @cindex driver, gcc command as
769 @cindex @code{gcc}, command as driver
770 @cindex executable file
771 @cindex files, executable
773 @cindex programs, cc1
776 @cindex programs, cpp
777 For example, the command @samp{gcc foo.c} @dfn{drives} the file
778 @file{foo.c} through the preprocessor @code{cpp}, then
779 the C compiler (internally named
780 @code{cc1}), then the assembler (usually @code{as}), then the linker
781 (@code{ld}), producing an executable program named @file{a.out} (on
784 @cindex treelang program
785 @cindex programs, treelang
786 As another example, the command @samp{gcc foo.tree} would do much the
787 same as @samp{gcc foo.c}, but instead of using the C compiler named
788 @code{cc1}, @code{gcc} would use the treelang compiler (named
789 @code{tree1}). However there is no preprocessor for treelang.
791 @cindex @code{tree1}, program
792 @cindex programs, @code{tree1}
793 In a GNU Treelang installation, @code{gcc} recognizes Treelang source
794 files by name just like it does C and C++ source files. It knows to use
795 the Treelang compiler named @code{tree1}, instead of @code{cc1} or
796 @code{cc1plus}, to compile Treelang files. If a file's name ends in
797 @code{.tree} then GCC knows that the program is written in treelang. You
798 can also manually override the language.
800 @cindex @code{gcc}, not recognizing Treelang source
801 @cindex unrecognized file format
802 @cindex file format not recognized
803 Non-Treelang-related operation of @code{gcc} is generally
804 unaffected by installing the GNU Treelang version of @code{gcc}.
805 However, without the installed version of @code{gcc} being the
806 GNU Treelang version, @code{gcc} will not be able to compile
807 and link Treelang programs.
809 @cindex printing version information
810 @cindex version information, printing
811 The command @samp{gcc -v x.tree} where @samp{x.tree} is a file which
812 must exist but whose contents are ignored, is a quick way to display
813 version information for the various programs used to compile a typical
814 Treelang source file.
816 The @code{tree1} program represents most of what is unique to GNU
817 Treelang; @code{tree1} is a combination of two rather large chunks of
820 @cindex GCC Back End (GBE)
822 @cindex @code{GCC}, back end
823 @cindex back end, GCC
824 @cindex code generator
825 One chunk is the so-called @dfn{GNU Back End}, or GBE,
826 which knows how to generate fast code for a wide variety of processors.
827 The same GBE is used by the C, C++, and Treelang compiler programs @code{cc1},
828 @code{cc1plus}, and @code{tree1}, plus others.
829 Often the GBE is referred to as the ``GCC back end'' or
830 even just ``GCC''---in this manual, the term GBE is used
831 whenever the distinction is important.
833 @cindex GNU Treelang Front End (TFE)
835 @cindex @code{treelang}, front end
836 @cindex front end, @code{treelang}
837 The other chunk of @code{tree1} is the majority of what is unique about
838 GNU Treelang---the code that knows how to interpret Treelang programs to
839 determine what they are intending to do, and then communicate that
840 knowledge to the GBE for actual compilation of those programs. This
841 chunk is called the @dfn{Treelang Front End} (TFE). The @code{cc1} and
842 @code{cc1plus} programs have their own front ends, for the C and C++
languages, respectively. These front ends are responsible for
diagnosing incorrect usage of their respective languages by the programs
they process, and are responsible for most of the warnings about
846 questionable constructs as well. (The GBE in principle handles
847 producing some warnings, like those concerning possible references to
848 undefined variables, but these warnings should not occur in treelang
849 programs as the front end is meant to pick them up first).
851 Because so much is shared among the compilers for various languages,
852 much of the behavior and many of the user-selectable options for these
853 compilers are similar.
854 For example, diagnostics (error messages and
855 warnings) are similar in appearance; command-line
856 options like @samp{-Wall} have generally similar effects; and the quality
857 of generated code (in terms of speed and size) is roughly similar
858 (since that work is done by the shared GBE).
860 @node TREELANG and GCC, Compiler, Compiler Overview, Top
861 @chapter Compile Treelang, C, or Other Programs
862 @cindex compiling programs
863 @cindex programs, compiling
865 @cindex @code{gcc}, command
866 @cindex commands, @code{gcc}
A GNU Treelang installation includes a modified version of the @code{gcc}
command.
870 In a non-Treelang installation, @code{gcc} recognizes C, C++,
871 and Objective-C source files.
873 In a GNU Treelang installation, @code{gcc} also recognizes Treelang source
874 files and accepts Treelang-specific command-line options, plus some
875 command-line options that are designed to cater to Treelang users
876 but apply to other languages as well.
878 @xref{G++ and GCC,,Programming Languages Supported by GCC,GCC,Using
879 the GNU Compiler Collection (GCC)},
880 for information on the way different languages are handled
881 by the GCC compiler (@code{gcc}).
883 You can use this, combined with the output of the @samp{gcc -v x.tree}
884 command to get the options applicable to treelang. Treelang programs
885 must end with the suffix @samp{.tree}.
889 Treelang programs are not by default run through the C
890 preprocessor by @code{gcc}. There is no reason why they cannot be run through the
891 preprocessor manually, but you would need to prevent the preprocessor
892 from generating #line directives, using the @samp{-P} option, otherwise
893 tree1 will not accept the input.
895 @node Compiler, Other Languages, TREELANG and GCC, Top
896 @chapter The GNU Treelang Compiler
898 The GNU Treelang compiler, @code{treelang}, supports programs written
899 in the GNU Treelang language.
901 @node Other Languages, treelang internals, Compiler, Top
902 @chapter Other Languages
905 * Interoperating with C and C++::
908 @node Interoperating with C and C++, , Other Languages, Other Languages
909 @section Tools and advice for interoperating with C and C++
911 The output of treelang programs looks like C program code to the linker
912 and everybody else, so you should be able to freely mix treelang and C
913 (and C++) code, with one proviso.
915 C promotes small integer types to 'int' when used as function parameters and
916 return values in non-prototyped functions. Since treelang has no
917 non-prototyped functions, the treelang compiler does not do this.
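
The difference can be seen from the C side. The sketch below is an
illustration under stated assumptions: @code{low_byte} is a hypothetical
function standing in for one that a treelang file would provide, and it
is given a C definition here only so that the example links on its own.
Variadic arguments are used to show the promotion, because they follow
the same default argument promotions as calls to non-prototyped
functions.

```c
#include <stdarg.h>

/* Prototype a C caller should have in scope for a treelang function
   taking and returning unsigned char.  With the prototype visible, C
   passes and returns unsigned char with no promotion to int. */
unsigned char low_byte(unsigned char c);

/* Stand-in definition (an assumption for this sketch); in a mixed
   program the treelang object file would supply it. */
unsigned char low_byte(unsigned char c)
{
    return c;
}

/* Without parameter type information, C applies the default argument
   promotions, so each unsigned char argument arrives as an int. */
int sum_promoted(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);   /* must be read as int */
    va_end(ap);
    return total;
}
```

In practice this means every treelang function called from C, and every
C function called from treelang, should be declared with a full
prototype on the C side.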
920 @node treelang internals, Open Questions, Other Languages, Top
921 @chapter treelang internals
925 * treelang compiler interfaces::
929 @node treelang files, treelang compiler interfaces, treelang internals, treelang internals
930 @section treelang files
To create a compiler that integrates into GCC, you need to create many
933 files. Some of the files are integrated into the main GCC makefile, to
934 build the various parts of the compiler and to run the test
935 suite. Others are incorporated into various GCC programs such as
936 @file{gcc.c}. Finally you must provide the actual programs comprising your
946 COPYING. This is the copyright file, assuming you are going to use the
947 GNU General Public License. You probably need to use the GPL because if
948 you use the GCC back end your program and the back end are one program,
949 and the back end is GPLed.
951 This need not be present if the language is incorporated into the main
952 GCC tree, as the main GCC directory has this file.
955 COPYING.LIB. This is the copyright file for those parts of your program
956 that are not to be covered by the GPL, but are instead to be covered by
957 the LGPL (Library or Lesser GPL). This license may be appropriate for
958 the library routines associated with your compiler. These are the
959 routines that are linked with the @emph{output} of the compiler. Using
960 the LGPL for these programs allows programs written using your compiler
961 to be closed source. For example LIBC is under the LGPL.
963 This need not be present if the language is incorporated into the main
964 GCC tree, as the main GCC directory has this file.
967 ChangeLog. Record all the changes to your compiler. Use the same format
968 as used in treelang as it is supported by an emacs editing mode and is
969 part of the FSF coding standard. Normally each directory has its own
changelog. The FSF standard allows but does not require a meaningful
comment on @emph{why} the changes were made, above and beyond @emph{what}
was changed. In the author's opinion it is useful to provide this
information.
976 treelang.texi. The manual, written in texinfo. Your manual would have a
977 different file name. You need not write it in texinfo if you don't want
to, but a lot of GNU software does use texinfo.
Make-lang.in. This file is part of the make file which is incorporated
983 with the GCC make file skeleton (Makefile.in in the GCC directory) to
984 make Makefile, as part of the configuration process.
986 Makefile in turn is the main instruction to actually build
987 everything. The build instructions are held in the main GCC manual and
988 web site so they are not repeated here.
990 There are some comments at the top which will help you understand what
993 There are make commands to build things, remove generated files with
994 various degrees of thoroughness, count the lines of code (so you know
995 how much progress you are making), build info and html files from the
996 texinfo source, run the tests etc.
999 README. Just a brief informative text file saying what is in this
1002 @cindex config-lang.in
config-lang.in. This file is read by the configuration process and must
be present. In it you specify the name of your language, the name(s) of the
compiler(s), including preprocessors, that you are going to build, and whether
any (usually generated) files should be excluded from diffs (i.e.@: when making
1008 diff files to send in patches). Whether the equate 'stagestuff' is used
1013 lang.opt. This file is included into @file{gcc.c}, the main GCC driver, and
1014 tells it what options your language supports. This is also used to
1017 @cindex lang-specs.h
1019 lang-specs.h. This file is also included in @file{gcc.c}. It tells
1020 @file{gcc.c} when to call your programs and what options to send them. The
1021 mini-language 'specs' is documented in the source of @file{gcc.c}. Do not
1022 attempt to write a specs file from scratch - use an existing one as the base
1026 Your texi files. Texinfo can be used to build documentation in HTML,
1027 info, dvi and postscript formats. It is a tagged language, is documented
1028 in its own manual, and has its own emacs mode.
1031 Your programs. The relationships between all the programs are explained
1032 in the next section. You need to write or use the following programs:
1037 lexer. This breaks the input into words and passes these to the
1038 parser. This is @file{lex.l} in treelang, which is passed through flex, a lex
1039 variant, to produce C code @file{lex.c}. Note there is a school of thought
1040 that says real men hand code their own lexers. However, you may prefer to
1041 write far less code and use flex, as was done with treelang.
1044 parser. This breaks the program into recognizable constructs such as
1045 expressions, statements etc. This is @file{parse.y} in treelang, which is
1046 passed through bison, which is a yacc variant, to produce C code
1050 back end interface. This interfaces to the code generation back end. In
1051 treelang, this is @file{tree1.c} which mainly interfaces to @file{toplev.c} and
1052 @file{treetree.c} which mainly interfaces to everything else. Many languages
1053 mix up the back end interface with the parser, as in the C compiler for
1054 example. It is a matter of taste which way to do it, but with treelang
1055 it is separated out to make the back end interface cleaner and easier to
1059 header files. For function prototypes and common data items. One point
1060 to note here is that bison can generate a header file with all the
1061 numbers it has assigned to the keywords and symbols, and you can include
1062 the same header in your lexer. This technique is demonstrated in
1066 compiler main file. GCC comes with a file @file{toplev.c} which is a
1067 perfectly serviceable main program for your compiler. GNU Treelang uses
1068 @file{toplev.c} but other languages have been known to replace it with their
1069 own main program. Again this is a matter of taste and how much code you
1076 @node treelang compiler interfaces, Hints and tips, treelang files, treelang internals
1077 @section treelang compiler interfaces
1084 * treelang main compiler::
1087 @node treelang driver, treelang main compiler, treelang compiler interfaces, treelang compiler interfaces
1088 @subsection treelang driver
1090 The GCC compiler consists of a driver, which then executes the various
1091 compiler phases based on the instructions in the specs files.
1093 Typically a program's language will be identified from its suffix
1094 (e.g., @file{.tree} for treelang programs).
1096 The driver (@file{gcc.c}) will then drive (exec) in turn a preprocessor,
1097 the main compiler, the assembler and the link editor. Options to GCC allow you
1098 to override all of this. In the case of treelang programs there is no
1099 preprocessor, and mostly these days the C preprocessor is run within the
1100 main C compiler rather than as a separate process, apparently for reasons of speed.
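The suffix lookup described above can be pictured with a small C sketch.
This is an illustration only, not the code in @file{gcc.c}: the table
contents and the names @code{suffix_entry} and @code{language_for_file}
are assumptions made for this example.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one file suffix and the language it selects.  */
struct suffix_entry
{
  const char *suffix;
  const char *language;
};

/* Illustrative table only; the real driver takes this from its specs.  */
static const struct suffix_entry suffix_table[] = {
  { ".tree", "treelang" },
  { ".c", "c" },
  { ".s", "assembler" }
};

/* Return the language selected by the last suffix of FILENAME,
   or NULL if the file has no suffix or the suffix is unknown.  */
const char *
language_for_file (const char *filename)
{
  const char *dot;
  size_t i;

  assert (filename != NULL);
  dot = strrchr (filename, '.');
  if (dot == NULL)
    return NULL;
  for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
    if (strcmp (dot, suffix_table[i].suffix) == 0)
      return suffix_table[i].language;
  return NULL;
}
```
@end verbatim

With this sketch, @code{language_for_file ("hello.tree")} selects
treelang, while a file without a recognized suffix yields @code{NULL}.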
1102 You will be using the standard assembler and linkage editor so these are
1103 ignored from now on.
1105 You have to write your own preprocessor if you want one. This is usually
1106 totally language specific. The main point to be aware of is to ensure
1107 that you find some way to pass file name and line number information
1108 through to the main compiler so that it can tell the back end this
1109 information and so the debugger can find the right source line for each
1110 piece of code. That is all there is to say about the preprocessor except
1111 that the preprocessor will probably not be the slowest part of the
1112 compiler and will probably not use the most memory so don't waste too
1113 much time tuning it until you know you need to do so.
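One common way to pass file name and line number information is the
line marker convention used in C preprocessor output, where a line such
as @samp{# 12 "foo.tree"} says that the following line is line 12 of
@file{foo.tree}. The sketch below shows how a main compiler might read
such a marker. It assumes your preprocessor emits markers in that form;
the names @code{source_location} and @code{parse_line_marker} are
hypothetical and not part of treelang.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the current position in the original source.  */
struct source_location
{
  char file[256];
  int line;
};

/* If TEXT is a marker of the form  # LINE "FILE"  then store the file
   name and line number in LOC and return 1.  Otherwise leave LOC
   unchanged and return 0.  */
int
parse_line_marker (const char *text, struct source_location *loc)
{
  char file[256];
  int line;

  assert (text != NULL && loc != NULL);
  /* %255[^"] reads up to 255 characters that are not a double quote.  */
  if (sscanf (text, "# %d \"%255[^\"]\"", &line, file) != 2)
    return 0;
  strcpy (loc->file, file);
  loc->line = line;
  return 1;
}
```
@end verbatim

The lexer would call such a routine for each line beginning with
@samp{#} and use the stored location when reporting positions to the
back end.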
1115 @node treelang main compiler, , treelang driver, treelang compiler interfaces
1116 @subsection treelang main compiler
1118 The main compiler for treelang consists of @file{toplev.c} from the main GCC
1119 compiler, the parser, lexer and back end interface routines, and the
1120 back end routines themselves, of which there are many.
1122 @file{toplev.c} does a lot of work for you and you should almost certainly
1125 Writing this code is the hard part of creating a compiler using GCC. The
1126 back end interface documentation is incomplete and the interface is
1129 There are three main aspects to interfacing to the other GCC code.
1132 * Interfacing to toplev.c::
1133 * Interfacing to the garbage collection::
1134 * Interfacing to the code generation code. ::
1137 @node Interfacing to toplev.c, Interfacing to the garbage collection, treelang main compiler, treelang main compiler
1138 @subsubsection Interfacing to toplev.c
1140 In treelang this is handled mainly in @file{tree1.c}
1141 and partly in @file{treetree.c}. Peruse @file{toplev.c} for details of what you need
1144 @node Interfacing to the garbage collection, Interfacing to the code generation code. , Interfacing to toplev.c, treelang main compiler
1145 @subsubsection Interfacing to the garbage collection
1147 In treelang this is mainly in
1150 Memory allocation in the compiler should be done using the @code{ggc_alloc} and
1151 kindred routines in @file{ggc*.*}. At the end of every 'function' in your language, @file{toplev.c} calls
1152 the garbage collection several times. The garbage collection calls mark
1153 routines which go through the memory which is still used, telling the
1154 garbage collection not to free it. Then all the memory not used is
1157 What this means is that you need a way to hook into this marking
1158 process. This is done by calling @code{ggc_add_root}. This provides the address
1159 of a callback routine which will be called during garbage collection and
1160 which can call @code{ggc_mark} to save the storage. If storage is only
1161 used within the parsing of a function, you do not need to provide a way
1164 Note that you can also call @code{ggc_mark_tree} to mark any of the back end
1165 internal 'tree' nodes. This routine will follow the branches of the
1166 trees and mark all the subordinate structures. This is useful for
1167 example when you have created a variable declaration that will be used
1168 across multiple functions, or for a function declaration (from a
1169 prototype) that may be used later on. See the next item for more on the
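
The root and mark scheme described in this section can be illustrated
with a toy collector. This is a sketch of the general technique only
and does not use the real @code{ggc} routines or their signatures;
every name beginning with @code{toy_} is invented for this example, as
are @code{global_decl} and @code{mark_global_decl}.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Toy heap object.  All objects are chained so the sweep can find them.  */
struct toy_object
{
  struct toy_object *next;
  int marked;
  int value;
};

typedef void (*toy_mark_fn) (void);

static struct toy_object *all_objects;
static toy_mark_fn roots[16];
static int n_roots;

/* Allocate an object that the toy collector owns.  */
struct toy_object *
toy_alloc (int value)
{
  struct toy_object *obj = malloc (sizeof *obj);

  assert (obj != NULL);
  obj->next = all_objects;
  obj->marked = 0;
  obj->value = value;
  all_objects = obj;
  return obj;
}

/* Register a callback to be run at the start of every collection.  */
void
toy_add_root (toy_mark_fn fn)
{
  assert (n_roots < 16);
  roots[n_roots++] = fn;
}

/* Tell the collector that OBJ is still in use.  */
void
toy_mark (struct toy_object *obj)
{
  if (obj != NULL)
    obj->marked = 1;
}

/* Run the root callbacks, then free every object left unmarked.
   Return the number of objects freed.  */
int
toy_collect (void)
{
  struct toy_object **link = &all_objects;
  int freed = 0;
  int i;

  for (i = 0; i < n_roots; i++)
    roots[i] ();
  while (*link != NULL)
    {
      struct toy_object *obj = *link;

      if (obj->marked)
        {
          obj->marked = 0;      /* Reset for the next collection.  */
          link = &obj->next;
        }
      else
        {
          *link = obj->next;    /* Unlink and free unreachable storage.  */
          free (obj);
          freed++;
        }
    }
  return freed;
}

/* Front end side: a declaration that must survive across functions.  */
struct toy_object *global_decl;

void
mark_global_decl (void)
{
  toy_mark (global_decl);
}
```
@end verbatim

A front end using this sketch would call
@code{toy_add_root (mark_global_decl)} once at start-up. Objects
allocated only while parsing one function and never marked are freed
by the next call to @code{toy_collect}.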
1172 @node Interfacing to the code generation code. , , Interfacing to the garbage collection, treelang main compiler
1173 @subsubsection Interfacing to the code generation code.
1175 In treelang this is done in @file{treetree.c}. A typedef called @code{tree}, which is
1176 defined in @file{tree.h} and @file{tree.def} in the GCC directory and largely
1177 implemented in @file{tree.c} and @file{stmt.c}, forms the basic interface to the
1180 In general you call various tree routines to generate code, either
1181 directly or through toplev.c. You build up data structures and
1182 expressions in similar ways.
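
To give a feel for what building up expressions as trees means, here is
a self-contained toy. It is not the GCC interface: the types and
routines @code{toy_tree}, @code{toy_build_int}, @code{toy_build2} and
@code{toy_eval} are invented for this sketch, and evaluating the tree
stands in for generating code from it. The real node codes and
builders are those declared in @file{tree.def} and @file{tree.h}.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Toy node codes, loosely modelled on the idea of tree codes.  */
enum toy_code
{
  TOY_INTEGER_CST,
  TOY_PLUS_EXPR,
  TOY_MULT_EXPR
};

struct toy_node
{
  enum toy_code code;
  long value;                   /* Used by TOY_INTEGER_CST only.  */
  struct toy_node *op0;         /* Operands of binary expressions.  */
  struct toy_node *op1;
};

typedef struct toy_node *toy_tree;

/* Build a constant node.  */
toy_tree
toy_build_int (long value)
{
  toy_tree t = calloc (1, sizeof *t);

  assert (t != NULL);
  t->code = TOY_INTEGER_CST;
  t->value = value;
  return t;
}

/* Build a binary expression node with operands OP0 and OP1.  */
toy_tree
toy_build2 (enum toy_code code, toy_tree op0, toy_tree op1)
{
  toy_tree t = calloc (1, sizeof *t);

  assert (t != NULL && op0 != NULL && op1 != NULL);
  assert (code == TOY_PLUS_EXPR || code == TOY_MULT_EXPR);
  t->code = code;
  t->op0 = op0;
  t->op1 = op1;
  return t;
}

/* Walk the tree bottom-up.  A real back end would emit code here.  */
long
toy_eval (toy_tree t)
{
  assert (t != NULL);
  switch (t->code)
    {
    case TOY_INTEGER_CST:
      return t->value;
    case TOY_PLUS_EXPR:
      return toy_eval (t->op0) + toy_eval (t->op1);
    case TOY_MULT_EXPR:
      return toy_eval (t->op0) * toy_eval (t->op1);
    }
  abort ();
}
```
@end verbatim

A parser action for @samp{2 + 3 * 4} would, in this toy, call
@code{toy_build2} once for the multiplication and once for the
addition, passing the inner node as an operand of the outer one.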
1184 You can read some documentation on this which can be found via the GCC
1185 main web page. In particular, the documentation produced by Joachim
1186 Nadler and translated by Tim Josling can be quite useful. The C compiler
1187 also has documentation in the main GCC manual (particularly the current
1188 CVS version) which is useful on a lot of the details.
1190 In time it is hoped to enhance this document to provide a more
1191 comprehensive overview of this topic. The main gap is in explaining how
1192 it all works together.
1194 @node Hints and tips, , treelang compiler interfaces, treelang internals
1195 @section Hints and tips
1200 TAGS: Use the make ETAGS commands to create TAGS files which can be used in
1201 emacs to jump to any symbol quickly.
1204 GREP: grep is also a useful way to find all uses of a symbol.
1207 TREE: The main files to look at are @file{tree.h} and @file{tree.def}. You will
1208 probably want a hardcopy of these.
1211 SAMPLE: Look at the sample interfacing code in @file{treetree.c}. You can use
1212 gdb to trace through the code and learn about how it all works.
1215 GDB: The GCC back end works well with gdb. It traps @code{abort()} and allows
1216 you to trace back what went wrong.
1219 Error Checking: The compiler back end does some error and consistency
1220 checking. Often the result of an error is just no code being
1221 generated. You will then need to trace through and find out what is
1222 going wrong. The rtl dump files can help here also.
1225 rtl dump files: The main compiler documents these files which are dumps
1226 of the rtl (intermediate code) which is manipulated during the code
1227 generation process. This can provide useful clues about what is going
1228 wrong. The rtl 'language' is documented in the main GCC manual.
1234 @node Open Questions, Bugs, treelang internals, Top
1235 @chapter Open Questions
1237 If you know GCC well, please consider looking at the file @file{treetree.c} and
1238 resolving any questions marked "???".
1240 @node Bugs, Service, Open Questions, Top
1241 @chapter Reporting Bugs
1243 @cindex reporting bugs
1245 You can report bugs to @email{@value{email-bugs}}. Please make
1246 sure bugs are real before reporting them. Follow the guidelines in the
1247 main GCC manual for submitting bug reports.
1253 @node Sending Patches, , Bugs, Bugs
1254 @section Sending Patches for GNU Treelang
1256 If you would like to write bug fixes or improvements for the GNU
1257 Treelang compiler, that is very helpful. Send suggested fixes to
1258 @email{@value{email-patches}}.
1260 @node Service, Projects, Bugs, Top
1261 @chapter How To Get Help with GNU Treelang
1263 If you need help installing, using or changing GNU Treelang, there are two
1269 Look in the service directory for someone who might help you for a fee.
1270 The service directory is found in the file named @file{SERVICE} in the
1274 Send a message to @email{@value{email-general}}.
1281 @node Projects, Index, Service, Top
1285 If you want to contribute to @code{treelang} by doing research,
1286 design, specification, documentation, coding, or testing,
1287 the following information should give you some ideas.
1289 Send a message to @email{@value{email-general}} if you plan to add a
1292 The main requirement for treelang is to add features and to add
1293 documentation. Features are things that the GCC back end can do but
1294 which are not reflected in treelang. Examples include structures,
1295 unions, pointers, arrays.
1299 @node Index, , Projects, Top