@c gm2-internals.texi describes the internals of gm2.
@c Copyright @copyright{} 2000-2023 Free Software Foundation, Inc.
@c This is part of the GM2 manual.
@c For copying conditions, see the file gcc/doc/include/fdl.texi.
@chapter GNU Modula-2 Internals
This document is a small step in the long journey of documenting the GNU
Modula-2 compiler and how it integrates with GCC.
The document is still in its infancy.
* History:: How GNU Modula-2 came about.
* Overview:: Overview of the structure of GNU Modula-2.
* Integrating:: How the front end integrates with gcc.
* Passes:: What gets processed during each pass.
* Run time:: Integration of run time modules with the compiler.
* Scope rules:: Clarification of some of the scope rules.
* Done list:: Progression of the GNU Modula-2 project.
* To do list:: Outstanding issues.
@node History, Overview, , Internals
This document is out of date and needs to be rewritten.

The Modula-2 compiler sources are derived from the m2f compiler, which
runs under GNU/Linux. The original m2f compiler was written in Modula-2
and was bootstrapped via a modified version of p2c 1.20. The m2f
compiler was a recursive descent compiler which generated quadruples as
intermediate code. It also used the C calling convention wherever
possible and used a C structure to represent dynamic arrays.
@node Overview, Integrating, History, Internals
GNU Modula-2 uses flex and a machine-generated recursive descent
parser. Most of the source code is written in Modula-2 and
bootstrapping is achieved via a modified version of p2c-1.20.
The modified p2c-1.20 is contained in the GNU Modula-2 source
tree, as are a number of other tools necessary for bootstrapping.
The changes to p2c include:
allowing @code{DEFINITION MODULE FOR "C"}.
fixes to abstract data types.
making p2c understand the 2nd Edition dialect of Modula-2.
introducing the @code{UNQUALIFIED} keyword.
allowing varargs (@code{...}) inside @code{DEFINITION MODULE FOR "C"} modules.
fixing the parser to understand commented @code{FORWARD} prototypes,
which are ignored by GNU Modula-2.
fixes to the @code{CASE} syntax for 2nd Edition Modula-2.
fixes to @code{FOR} loops counting down to zero using a @code{CARDINAL}.
introducing an initialization section for each implementation module.
various porting improvements and general tidying up so that
it compiles cleanly with the gcc option @code{-Wall}.
GNU Modula-2 comes with PIM and ISO style libraries. The compiler
is built using the PIM libraries, and the compiler source is written
in the PIM dialect together with a few @code{C}
library calling extensions.
The compiler is a four pass compiler. The first pass tokenizes
the source code and creates scope and enumeration type symbols.
All tokens are placed into a dynamic buffer, and the subsequent passes
reread these tokens to build types and quadruples and to resolve hidden types.
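The token buffer described above can be pictured as a growable array that pass 1 fills once and that each later pass rewinds and rereads. The following C sketch is only a model under stated assumptions: the names @code{token_buffer}, @code{tb_append}, @code{tb_rewind} and @code{tb_next} are hypothetical, tokens are reduced to non-negative integer codes, and the GNU Modula-2 sources (written in Modula-2) use their own routines.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dynamic token buffer: filled once by pass 1 and reread
   by every later pass.  Tokens are simplified to non-negative codes.  */
struct token_buffer {
  int *tokens;
  size_t count;     /* number of tokens stored */
  size_t capacity;  /* number of allocated slots */
  size_t cursor;    /* read position of the current pass */
};

/* Pass 1: append a token, doubling the storage when it is full.
   Returns 0 on success, -1 if memory is exhausted.  */
static int tb_append (struct token_buffer *tb, int token)
{
  if (tb->count == tb->capacity)
    {
      size_t newcap = tb->capacity ? tb->capacity * 2 : 16;
      int *p = realloc (tb->tokens, newcap * sizeof *p);
      if (p == NULL)
        return -1;
      tb->tokens = p;
      tb->capacity = newcap;
    }
  tb->tokens[tb->count++] = token;
  return 0;
}

/* Start of a later pass: reread from the first token.  */
static void tb_rewind (struct token_buffer *tb)
{
  tb->cursor = 0;
}

/* Return the next token, or -1 once the buffer is exhausted.  */
static int tb_next (struct token_buffer *tb)
{
  return tb->cursor < tb->count ? tb->tokens[tb->cursor++] : -1;
}
```

Doubling the capacity on overflow keeps appends amortized constant time, and because the tokens are stored rather than rescanned, every pass sees exactly the same token stream.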
GNU Modula-2 uses a technique of double bookkeeping @footnote{See the
excellent tutorial by Joachim Nadler, translated by Tim Josling}.
@xref{Back end Access to Symbol Table, , , gcc}.
The front end builds a complete symbol table and a list of quadruples.
Each symbol is translated into a @code{gcc} equivalent, after which
each quadruple is translated into a @code{gcc} @code{tree}.
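The ordering this implies (every symbol first, then every quadruple) can be pictured with the following C sketch. It assumes a hypothetical two-operator quadruple set and uses an integer @code{backend_decl} as a stand-in for the @code{gcc} tree built for each symbol; none of these names come from the GNU Modula-2 sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of the two front-end data structures.  The
   integer backend_decl stands in for the gcc tree built for a symbol.  */
struct symbol {
  const char *name;
  int backend_decl;   /* 0 until the symbol has been translated */
};

enum opcode { OP_BECOMES, OP_ADD };

struct quad {
  enum opcode op;
  int op1, op2, op3;  /* operands: indices into the symbol table */
};

/* Step 1: translate every symbol into its back-end equivalent.  */
static void translate_symbols (struct symbol *syms, int n)
{
  for (int i = 0; i < n; i++)
    syms[i].backend_decl = i + 1;
}

/* Step 2: translate one quadruple, referring only to back-end decls.
   Returns -1 if an operand has not been translated yet.  */
static int translate_quad (const struct symbol *syms, const struct quad *q,
                           char *buf, size_t len)
{
  int a = syms[q->op1].backend_decl;
  int b = syms[q->op2].backend_decl;

  if (a == 0 || b == 0)
    return -1;
  if (q->op == OP_BECOMES)
    return snprintf (buf, len, "D%d = D%d", a, b);
  int c = syms[q->op3].backend_decl;
  if (c == 0)
    return -1;
  return snprintf (buf, len, "D%d = D%d + D%d", a, b, c);
}
```

Because every symbol is translated before any quadruple, @code{translate_quad} never meets an operand without a back-end equivalent; the @code{-1} return exists only to make that invariant visible.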
@node Integrating, Passes, Overview, Internals
@section How the front end integrates with gcc
The @code{M2Base} and @code{M2System} modules contain the base types
and system types respectively; these types map onto GCC back-end
data types.
@node Passes, Run time, Integrating, Internals
This section describes the general actions of each pass. The key to
building up the symbol table correctly is to ensure that the symbols
are only created in the scope where they were declared. This may seem
obvious (and easy) but it is complicated by two issues: firstly, GNU
Modula-2 does not generate @code{.sym} files and so all imported
definition modules are parsed after the module itself is parsed; secondly, the
import/export rules might mean that you can see and use a symbol
before it is declared in a completely different scope.
Here is a brief description of the lists of symbols maintained within
@code{DefImp} and @code{Module} symbols. It is these lists, together with
the actions at each pass which manipulate them, that resolve the scoping
and visibility of all symbols.

The @code{DefImp} symbol maintains the following lists: @code{ExportQualified},
@code{ExportUnQualified}, @code{ExportRequest}, @code{IncludeList},
@code{ImportTree}, @code{ExportUndeclared},
@code{NeedToBeImplemented}, @code{LocalSymbols},
@code{EnumerationScopeList}, @code{Unresolved}, @code{ListOfVars},
@code{ListOfProcs} and @code{ListOfModules}.

The @code{Module} symbol maintains the following lists: @code{LocalSymbols},
@code{ExportTree}, @code{IncludeList}, @code{ImportTree},
@code{ExportUndeclared}, @code{EnumerationScopeList},
@code{Unresolved}, @code{ListOfVars}, @code{ListOfProcs} and
@code{ListOfModules}.

We first discuss the lists which are common to both @code{DefImp}
and @code{Module} symbols; thereafter the lists peculiar to each of
@code{DefImp} and @code{Module} symbols are discussed.
The @code{ListOfVars}, @code{ListOfProcs} and @code{ListOfModules}
lists (common to both symbols) simply contain the variables,
procedures and inner modules which are declared within this
definition/implementation or program module.

The @code{LocalSymbols} list (common to both symbols) contains a
complete list of symbols visible in this module's scope. The symbols in
this list may have been imported or exported from an inner module.

The @code{EnumerationScopeList} (common to both symbols) defines all
visible enumeration symbols. When this module is parsed the contents
of these enumeration types are marked as visible. Internally, GNU
Modula-2 treats these as a pseudo scope (rather like a @code{WITH}
statement, which temporarily makes the fields of a record visible).
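The pseudo scope formed by the @code{EnumerationScopeList} can be sketched as a lookup that searches the module's local symbols first and then the constants of each visible enumeration type. The structures and the @code{lookup} function below are hypothetical C models, not the data structures in @file{SymbolTable.mod}, and the search order is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an enumeration type and its constants.  */
struct enumeration {
  const char *name;
  const char *const *values;
  size_t nvalues;
};

/* Hypothetical module scope: local symbols plus the
   EnumerationScopeList of visible enumeration types.  */
struct scope {
  const char *const *locals;
  size_t nlocals;
  const struct enumeration *const *enum_scopes;
  size_t nenums;
};

enum lookup_result { NOT_FOUND, LOCAL_SYMBOL, ENUM_CONSTANT };

/* Search the local symbols first; then treat every visible enumeration
   type as a pseudo scope, much as WITH makes record fields visible.
   For an enumeration constant, *owner (if non-NULL) receives its type.  */
static enum lookup_result lookup (const struct scope *s, const char *id,
                                  const struct enumeration **owner)
{
  for (size_t i = 0; i < s->nlocals; i++)
    if (strcmp (s->locals[i], id) == 0)
      return LOCAL_SYMBOL;
  for (size_t e = 0; e < s->nenums; e++)
    for (size_t v = 0; v < s->enum_scopes[e]->nvalues; v++)
      if (strcmp (s->enum_scopes[e]->values[v], id) == 0)
        {
          if (owner != NULL)
            *owner = s->enum_scopes[e];
          return ENUM_CONSTANT;
        }
  return NOT_FOUND;
}
```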
The @code{ExportUndeclared} list (common to both symbols) contains
all symbols which are marked as exported but are as yet undeclared.

The @code{IncludeList} (common to both symbols) contains a list of
all modules imported by the @code{IMPORT modulename ;} construct.

The @code{ImportTree} (common to both symbols) contains a tree of all
imported identifiers.

The @code{ExportQualified} and @code{ExportUnQualified} trees (only
present in the @code{DefImp} symbol) contain identifiers which are
marked as @code{EXPORT QUALIFIED} and @code{EXPORT UNQUALIFIED}
respectively.

The @code{NeedToBeImplemented} list (only present in the @code{DefImp}
symbol) contains a list of all unresolved symbols which are exported.
During pass 1 each @code{DefImp} and @code{Module} symbol is
created. These are also placed into a list of outstanding sources to
be parsed. The import and export lists are recorded: each imported
object is created in the module from which it is exported and added
to the import list of the current module. Any exported objects are
placed into the export list and marked as qualified or unqualified.
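The list of outstanding sources behaves like a worklist in which each module is queued once, however many modules import it. Here is a minimal C sketch under that assumption; @code{worklist}, @code{queue_module}, @code{next_module} and the fixed @code{MAX_MODULES} bound are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_MODULES 64

/* Hypothetical worklist of outstanding sources.  Each module name is
   queued at most once, however many modules import it.  */
struct worklist {
  const char *names[MAX_MODULES];
  int count;   /* modules ever queued */
  int next;    /* index of the next module to parse */
};

/* Called whenever a DefImp or Module symbol is created.
   Returns 1 if newly queued, 0 if already known, -1 if full.  */
static int queue_module (struct worklist *w, const char *name)
{
  for (int i = 0; i < w->count; i++)
    if (strcmp (w->names[i], name) == 0)
      return 0;
  if (w->count == MAX_MODULES)
    return -1;
  w->names[w->count++] = name;
  return 1;
}

/* Fetch the next source to parse, or NULL when none remain.  */
static const char *next_module (struct worklist *w)
{
  return w->next < w->count ? w->names[w->next++] : NULL;
}
```

Parsing continues until @code{next_module} returns @code{NULL}; modules discovered while parsing are simply appended, so they are picked up later in the same loop.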
Inner module symbols are also created and their import and export
lists are processed as well. An import list results in a symbol being
fetched (or created if it does not exist) from the outer scope and
placed into the scope of the inner module. An export list results in
each symbol being fetched or created in the current inner scope and
added to the outer scope. If the symbol has not yet been declared then
it is added to the current module's @code{ExportUndeclared} list.
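The following C sketch models how an inner module's @code{EXPORT} list can make a symbol visible in the outer scope before it is declared, and how the @code{ExportUndeclared} list tracks that symbol until its declaration is seen. All names (@code{fetch_or_create}, @code{export_symbol}, @code{declare_symbol}) and the fixed-size arrays are hypothetical, and the sketch assumes that the outer scope does not already hold a symbol of the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 32

/* Hypothetical symbol: created on first mention, declared later.  */
struct sym {
  const char *name;
  bool declared;
};

/* Hypothetical module scope with its ExportUndeclared list.  */
struct module {
  struct sym *symbols[MAX_SYMS];          /* symbols visible in this scope */
  int nsymbols;
  struct sym *export_undeclared[MAX_SYMS];
  int nundeclared;
};

static struct sym pool[MAX_SYMS];         /* storage for all symbols */
static int npool;

/* Fetch name from scope m, creating an undeclared symbol if absent.  */
static struct sym *fetch_or_create (struct module *m, const char *name)
{
  for (int i = 0; i < m->nsymbols; i++)
    if (strcmp (m->symbols[i]->name, name) == 0)
      return m->symbols[i];
  assert (npool < MAX_SYMS && m->nsymbols < MAX_SYMS);
  struct sym *s = &pool[npool++];
  s->name = name;
  s->declared = false;
  m->symbols[m->nsymbols++] = s;
  return s;
}

/* Pass 1, EXPORT list of an inner module: the same symbol is made
   visible in the outer scope and, if still undeclared, remembered.  */
static void export_symbol (struct module *inner, struct module *outer,
                           const char *name)
{
  struct sym *s = fetch_or_create (inner, name);
  assert (outer->nsymbols < MAX_SYMS);
  outer->symbols[outer->nsymbols++] = s;
  if (!s->declared)
    {
      assert (inner->nundeclared < MAX_SYMS);
      inner->export_undeclared[inner->nundeclared++] = s;
    }
}

/* Later, the declaration is seen: mark it and drop it from the list.  */
static void declare_symbol (struct module *m, const char *name)
{
  struct sym *s = fetch_or_create (m, name);
  s->declared = true;
  for (int i = 0; i < m->nundeclared; i++)
    if (m->export_undeclared[i] == s)
      {
        m->export_undeclared[i] = m->export_undeclared[--m->nundeclared];
        break;
      }
}
```

Because the outer scope holds the same symbol object, the declaration becomes visible there too. In this sketch, a symbol still left on the list after the module has been processed was exported but never declared, which is a natural point at which to report an error.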
Procedure symbols are created (the parameters are parsed but no further
symbols are created). Enumerated types are created and hidden types in
the definition modules are marked as such. All the rest of the Modula-2
syntax is parsed but no symbols are created.
This section discusses variant records and their representation within
the front end @file{gm2/gm2-compiler/SymbolTable.mod}. Records and
variant records are declared in pass 2.

Ordinary records are represented by the following symbol table entries:
| Name = this | SymRecordField [2]
| ListOfSons | +-------------------+
| +--------| | Name = foo |
| | [2] [3]| | Parent = [1] |
+-------------+ | Type = [Cardinal] |
| LocalSymbols| +-------------------+
+-------------------+
| Type = [Cardinal] |
+-------------------+
Variant records, by contrast, are represented by the following symbol table entries:
| Name = this | SymRecordField [2]
| ListOfSons | +-------------------+
| +--------| | Name = tag |
| | [2] [3]| | Parent = [1] |
| +--------+ | Type = [CHAR] |
| LocalSymbols| +-------------------+
SymVarient [3] SymFieldVarient [4]
+-------------------+ +-------------------+
| Parent = [1] | | Parent = [1] |
| ListOfSons | | ListOfSons |
| +--------------| | +--------------|
| | [4] [5] | | | [6] [7] |
+-------------------+ +-------------------+
+-------------------+
+-------------------+
SymRecordField [6] SymRecordField [7]
+-------------------+ +-------------------+
| Name = foo | | Name = bar |
| Parent = [1] | | Parent = [1] |
| Type = [CARDINAL] | | Type = [CHAR] |
+-------------------+ +-------------------+
+-------------------+
+-------------------+
Variant records which have nested @code{CASE} statements are
represented by the following symbol table entries:
| Name = this | SymRecordField [2]
| ListOfSons | +-------------------+
| +--------| | Name = tag |
| | [2] [3]| | Parent = [1] |
| +--------+ | Type = [CHAR] |
| LocalSymbols| +-------------------+
('1st CASE') ('a' selector)
SymVarient [3] SymFieldVarient [4]
+-------------------+ +-------------------+
| Parent = [1] | | Parent = [1] |
| ListOfSons | | ListOfSons |
| +--------------| | +--------------|
| | [4] [5] | | | [6] [7] [8] |
+-------------------+ +-------------------+
+-------------------+
+-------------------+
SymRecordField [6] SymRecordField [7]
+-------------------+ +-------------------+
| Name = foo | | Name = bar |
| Parent = [1] | | Parent = [1] |
| Type = [CARDINAL] | | Type = [BOOLEAN] |
+-------------------+ +-------------------+
+-------------------+
+-------------------+
+-------------------+
+-------------------+
SymRecordField [10] SymRecordField [11]
+-------------------+ +-------------------+
| Name = bt | | Name = bf |
| Parent = [1] | | Parent = [1] |
| Type = [REAL] | | Type = [REAL] |
+-------------------+ +-------------------+
(TRUE selector) (FALSE selector)
SymFieldVarient [12] SymFieldVarient [13]
+-------------------+ +-------------------+
| Parent = [1] | | Parent = [1] |
| ListOfSons | | ListOfSons |
| +--------------| | +--------------|
| | [10] | | | [11] |
+-------------------+ +-------------------+
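The diagrams can be read as a tree of sons in which every node's @code{Parent} is the enclosing record [1]. The following C sketch models the second (single @code{CASE}) diagram with hypothetical structures named after the symbol kinds; it is not the representation used in @file{SymbolTable.mod}. The field varient [5], whose contents are not shown above, is modelled as an empty @code{SymFieldVarient}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the four kinds of symbol in the diagrams.  */
enum sym_kind { SYM_RECORD, SYM_RECORD_FIELD, SYM_VARIENT, SYM_FIELD_VARIENT };

#define MAX_SONS 8

struct sym {
  enum sym_kind kind;
  const char *name;          /* NULL for SymVarient and SymFieldVarient */
  struct sym *parent;        /* always the enclosing SymRecord [1] */
  struct sym *sons[MAX_SONS];
  int nsons;
};

/* Append son to the ListOfSons of s; the son's Parent is the record,
   not s itself, matching the Parent = [1] entries in the diagrams.  */
static void add_son (struct sym *s, struct sym *son, struct sym *record)
{
  assert (s->nsons < MAX_SONS);
  s->sons[s->nsons++] = son;
  son->parent = record;
}

/* Find a field by name, descending through varient and field varient
   nodes, since every field is visible at record level.  */
static struct sym *find_field (struct sym *s, const char *name)
{
  for (int i = 0; i < s->nsons; i++)
    {
      struct sym *son = s->sons[i];
      if (son->kind == SYM_RECORD_FIELD)
        {
          if (strcmp (son->name, name) == 0)
            return son;
        }
      else
        {
          struct sym *f = find_field (son, name);
          if (f != NULL)
            return f;
        }
    }
  return NULL;
}
```

Because every field, however deeply it is nested inside @code{SymVarient} and @code{SymFieldVarient} nodes, records the outer record as its parent, a field such as @code{bar} can be found by a search starting at the record.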
@subsection Declaration ordering
This section gives a few stress testing examples and walks through
the mechanics of the passes and how the lists of symbols are created.

The first example contains a nested module in which an enumeration
type is created and exported. A procedure declared before the nested
module uses the enumeration type.
PROCEDURE make (VAR c: colours) ;
colours = (red, blue, yellow, white) ;
@node Run time, Scope rules, Passes, Internals
This section describes how the GNU Modula-2 compiler interfaces with
the run time system. The modules which must be common to all library
collections are @code{M2RTS} and @code{SYSTEM}. The PIM library
collection provides implementations of @code{M2RTS} and @code{SYSTEM};
likewise, the ISO and ULM library collections each provide these modules.
The @code{M2RTS} module contains many of the base runtime features
required by the GNU Modula-2 compiler. For example, @code{M2RTS}
contains all the low-level exception handling routines. These
include exception handlers for run time range checks for: assignments,
increments, decrements, static array access, dynamic array access,
@code{FOR} loop begin, @code{FOR} loop to, @code{FOR} loop increment,
pointer dereference via @code{NIL}, function without return, case value
not specified and no exception. The @code{M2RTS} module also contains the
@code{HALT} and @code{LENGTH} procedures. The ISO @code{SYSTEM} module
contains a number of @code{SHIFT} and @code{ROTATE} procedures which GNU
Modula-2 calls when it needs to shift or rotate multi-word set types.
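Shifting a set that spans several words means moving bits across word boundaries. The following C sketch assumes a hypothetical layout in which element @var{i} is bit @var{i} mod 32 of word @var{i}/32, with eight 32-bit words giving room for @code{SET OF CHAR}; @code{bigset}, @code{set_incl}, @code{set_in} and @code{set_shift_up} are illustrative names, not the ISO @code{SHIFT} interface or the code GNU Modula-2 generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SET_WORDS 8   /* 8 x 32 bits = 256 elements, enough for SET OF CHAR */

/* Hypothetical multi-word set layout: element i is bit i % 32 of
   word i / 32.  */
typedef struct {
  uint32_t w[SET_WORDS];
} bigset;

static void set_incl (bigset *s, unsigned e)
{
  assert (e < 32 * SET_WORDS);
  s->w[e / 32] |= UINT32_C (1) << (e % 32);
}

static int set_in (const bigset *s, unsigned e)
{
  return e < 32 * SET_WORDS && ((s->w[e / 32] >> (e % 32)) & 1u);
}

/* Move every member up by n positions (n < 32 * SET_WORDS); members
   pushed past the last element are lost.  Each result word combines
   the shifted source word with the bits carried out of the word below.  */
static bigset set_shift_up (bigset s, unsigned n)
{
  bigset r;
  int wd = (int) (n / 32);        /* whole words to move */
  unsigned bits = n % 32;         /* remaining bit offset */

  assert (n < 32 * SET_WORDS);
  memset (&r, 0, sizeof r);
  for (int i = SET_WORDS - 1; i >= wd; i--)
    {
      uint32_t v = s.w[i - wd] << bits;
      if (bits != 0 && i - wd - 1 >= 0)
        v |= s.w[i - wd - 1] >> (32 - bits);
      r.w[i] = v;
    }
  return r;
}
```

A rotation can be built from two such shifts, one in each direction, combined with a bitwise OR.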
@subsection Exception handling
This section describes how exception handling is implemented in GNU
Modula-2. We begin with a simple Modula-2 program which uses
exception handling and then give the same program written in C++. The
compiler translates the Modula-2 into the same trees that the C++
front end would generate for the equivalent C++ program. This ensures
that the Modula-2 front end does not do anything that the middle and
back end cannot process, so migration to later GCC releases should be smooth.

Here is an example of Modula-2 using exception handling:
FROM libc IMPORT printf ;
FROM Storage IMPORT ALLOCATE, DEALLOCATE ;
printf("fly main body\n") ;
printf("yes it worked\n")
printf("no it failed\n")
PROCEDURE tryFlying ;
printf("tryFlying main body\n");
printf("inside tryFlying exception routine\n") ;
IF (ip#NIL) AND (ip^=0)
PROCEDURE keepFlying ;
printf("keepFlying main body\n") ;
printf("inside keepFlying exception routine\n") ;
ip: POINTER TO INTEGER ;
Now the same program implemented in GNU C++:
// a C++ example of Modula-2 exception handling
static int *ip = NULL;
printf("fly main body\n") ;
printf("yes it worked\n");
printf("no it failed\n");
* a C++ version of the Modula-2 example given in the ISO standard.
void tryFlying (void)
printf("tryFlying main body\n");
printf("inside tryFlying exception routine\n") ;
if ((ip != NULL) && ((*ip) == 0)) @{
goto again_tryFlying;
printf("didn't handle exception here so we will call the next exception routine\n") ;
throw; // unhandled therefore call previous exception handler
void keepFlying (void)
printf("keepFlying main body\n") ;
printf("inside keepFlying exception routine\n");
ip = (int *)malloc(sizeof(int));
goto again_keepFlying;
throw; // unhandled therefore call previous exception handler
printf("all done\n");
The equivalent program in GNU C is given below. However, the
@code{setjmp}/@code{longjmp} mechanism it uses to implement exception
handlers is not used by GNU C++ or GNU Java.
The GNU exception handling ABI uses @code{TRY_CATCH_EXPR} tree
nodes. Thus GNU Modula-2 generates trees which model the C++
code above, rather than the C code shown below. The C code
serves as a mental model (for readers who are familiar with C
but not with C++) of what is happening in the C++ code above.
typedef enum jmpstatus @{
struct setjmp_stack @{
struct setjmp_stack *next;
void pushsetjmp (void)
struct setjmp_stack *p = (struct setjmp_stack *)
malloc (sizeof (struct setjmp_stack));
void exception (void)
printf("invoking exception handler\n");
longjmp (head->env, jmp_exception);
longjmp (head->env, jmp_retry);
void popsetjmp (void)
struct setjmp_stack *p = head;
static int *ip = NULL;
printf("fly main body\n");
printf("ip == NULL\n");
printf("*ip == 0\n");
if ((4 / (*ip)) == 4)
printf("yes it worked\n");
printf("no it failed\n");
void tryFlying (void)
void tryFlying_m2_exception () @{
printf("inside tryFlying exception routine\n");
if ((ip != NULL) && ((*ip) == 0)) @{
t = setjmp (head->env);
@} while (t == jmp_retry);
if (t == jmp_exception) @{
/* exception called */
tryFlying_m2_exception ();
/* exception has not been handled, invoke previous handler */
printf("exception not handled here\n");
printf("tryFlying main body\n");
void keepFlying (void)
void keepFlying_m2_exception () @{
printf("inside keepFlying exception routine\n");
ip = (int *)malloc (sizeof (int));
t = setjmp (head->env);
@} while (t == jmp_retry);
if (t == jmp_exception) @{
/* exception called */
keepFlying_m2_exception ();
/* exception has not been handled, invoke previous handler */
printf("keepFlying main body\n");
printf("all done\n");
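The C model above is fragmentary. For experimentation, here is a self-contained sketch of the same handler stack under these assumptions: function pointers replace the GNU C nested functions, the @code{NIL} and division checks are made explicit by calling a hypothetical @code{raise_exception}, and a static variable replaces @code{malloc}. A handler returning nonzero requests a retry; returning zero passes the exception to the previous handler, like @code{throw;} in the C++ version. All names are illustrative, not those of the GNU Modula-2 run time.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical handler stack: one jmp_buf per active protected block.  */
struct handler {
  jmp_buf env;
  struct handler *next;
};

static struct handler *head = NULL;

/* Raise an exception by jumping to the innermost handler.  The sketch
   assumes that a handler is always installed.  */
static void raise_exception (void)
{
  assert (head != NULL);
  longjmp (head->env, 1);
}

/* Run body under handler.  A nonzero return from handler means RETRY;
   zero passes the exception on to the previous handler.  */
static void protect (void (*body) (void), int (*handler) (void))
{
  struct handler h;

  h.next = head;
  head = &h;
  if (setjmp (h.env) != 0)
    {
      /* An exception arrived; h is still the innermost handler.  */
      if (!handler ())
        {
          head = h.next;   /* pop, then re-raise in the caller's handler */
          raise_exception ();
        }
      /* Otherwise fall through and retry the body.  */
    }
  body ();
  head = h.next;           /* normal completion: pop the handler */
}

/* The example program, with the NIL and division checks made explicit.  */
static int storage;        /* replaces malloc in the original */
static int *ip = NULL;
static int result;

static void fly (void)
{
  if (ip == NULL || *ip == 0)
    raise_exception ();
  result = 4 / *ip;
}

static int try_flying_handler (void)
{
  if (ip != NULL && *ip == 0)
    {
      *ip = 1;
      return 1;            /* RETRY */
    }
  return 0;                /* not handled here */
}

static int keep_flying_handler (void)
{
  if (ip == NULL)
    {
      ip = &storage;
      *ip = 0;
      return 1;            /* RETRY */
    }
  return 0;
}

static void try_flying (void)
{
  protect (fly, try_flying_handler);
}

static void keep_flying (void)
{
  protect (try_flying, keep_flying_handler);
}
```

Calling @code{keep_flying} follows what appears to be the path of the example: the inner handler declines, the outer handler provides @code{ip} and retries, then the inner handler sets @code{*ip} to 1 and retries, so @code{result} ends up as 4.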
@node Scope rules, Done list, Run time, Internals
This section describes my understanding of the Modula-2 scope rules
with respect to enumerated types. If it is incorrect, please send
corrections by email to @email{gaius@@gnu.org}. It also serves to
document the behaviour of GNU Modula-2 in these circumstances.

In GNU Modula-2 the syntax for a type declaration is defined as:
TypeDeclaration := Ident "=" Type =:
Type := SimpleType | ArrayType
SimpleType := Qualident | Enumeration | SubrangeType =:
If the @code{TypeDeclaration} rule is satisfied by
@code{SimpleType} and @code{Qualident}, that is:
then @code{foo} is said to be equivalent to @code{bar}. Thus
variables, parameters and record fields declared with either type will
be compatible with each other.

If, however, the @code{TypeDeclaration} rule is satisfied by any of the
alternative clauses @code{ArrayType}, @code{RecordType},
@code{SetType}, @code{PointerType}, @code{ProcedureType},
@code{Enumeration} or @code{SubrangeType}, then a new
type is created which is distinct from all other types. It will be
incompatible with all other user defined types.

This has a further consequence: if @code{bar} was defined as an
enumerated type and @code{foo} is imported by another module, then the
enumerated values are also visible in that module.
Consider the following modules:
DEFINITION MODULE impc ;
C = (red, blue, green) ;
DEFINITION MODULE impb ;
Here we see that the type @code{C} defined in module @code{impb} is
equivalent to the type @code{C} in module @code{impc}. Module
@code{impa} imports the type @code{C} from module @code{impb}
and at that point the enumeration values @code{red, blue, green}
(declared in module @code{impc}) are also visible.
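The import closure described here can be modelled by following a type's equivalence back to the original enumeration and collecting its constants. In the following C sketch, @code{ident}, its @code{same_as} link (standing for the equivalence between @code{impb.C} and @code{impc.C}) and @code{import_closure} are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exported identifier.  An enumeration type lists its
   constants; a type declared equivalent to another points at it.  */
struct ident {
  const char *name;
  const struct ident *same_as;     /* non-NULL for an equivalent type */
  const char *const *enum_values;  /* constants of an enumeration type */
  size_t nvalues;
};

/* Write the import closure of id into out[0..max-1]: the identifier
   itself plus, if it denotes an enumeration type (possibly through a
   chain of equivalent types), all of its constants.  Returns the
   number of names written, or 0 if out is too small.  */
static size_t import_closure (const struct ident *id, const char **out,
                              size_t max)
{
  const struct ident *t = id;

  while (t->same_as != NULL)   /* follow equivalence to the original */
    t = t->same_as;
  if (max < 1 + t->nvalues)
    return 0;
  out[0] = id->name;
  for (size_t i = 0; i < t->nvalues; i++)
    out[1 + i] = t->enum_values[i];
  return 1 + t->nvalues;
}
```

Importing @code{C} from @code{impb} therefore yields @code{C}, @code{red}, @code{blue} and @code{green}, even though the constants are declared in @code{impc}.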
Section 6.1.8 (Import Lists) of the ISO Standard (p.41) states:

``Following the module heading, a module may have a sequence of import
lists. An import list includes a list of the identifiers that are to
be explicitly imported into the module. Explicit import of an
enumeration type identifier implicitly imports the enumeration
constant identifiers of the enumeration type.

Imported identifiers are introduced into the module, thus extending
their scope, but they have a defining occurrence that appears elsewhere.

Every kind of module may include a sequence of import lists, whether it
is a program module, a definition module, an implementation module or
a local module. In the case of any other kind of module, the imported
identifiers may be used in the block of the module.''
These statements confirm that the previous example is legal. But they
prompt the question: what about implicit imports, otherwise known
as qualified references?

Section 6.10 (Implicit Import and Export) of the ISO Modula-2 standard states:
``The set of identifiers that is imported or exported if an identifier
is explicitly imported or exported is called the (import and export)
closure of that identifier. Normally, the closure includes only the
explicitly imported or exported identifier. However, in the case
of the explicit import or export of an identifier of an enumeration
type, the closure also includes the identifiers of the values of that type.

Implicit export applies to the identifiers that are exported (qualified)
from separate modules, by virtue of their being the subject of a
definition module, as well as to export from a local module that
uses an export list.''

Clearly this means that the following is legal:
It also means that the following code is legal:
And also this code is legal:
Likewise, this code is legal:
DEFINITION MODULE impg ;
IMPLEMENTATION MODULE impg ;
Furthermore, the following code is also legal, as the new type @code{C}
is declared and exported. Once exported, all its enumerated fields
are also exported.
DEFINITION MODULE imph;
Here we see that the current scope is populated with the enumeration
fields @code{red, blue, green} and also that it is possible to reference
these values via a qualified identifier.
IMPLEMENTATION MODULE imph;
@node Done list, To do list, Scope rules, Internals
Coroutines have been implemented. The @code{SYSTEM} module in
PIM-[234] now includes @code{TRANSFER}, @code{IOTRANSFER} and
@code{NEWPROCESS}. This module is available in the directory
@file{gm2/gm2-libs-coroutines}. Users of this module also have to
link with GNU Pth (@code{-lpth}).
GM2 now works on the @code{opteron} 64-bit architecture. @code{make
gm2.paranoid} and @code{make check-gm2} pass.
GM2 can now be built as a cross compiler to the MinGW platform under
GM2 now works on the @code{sparc} architecture. @code{make
gm2.paranoid} and @code{make check-gm2} pass.
converted the regression test suite into the GNU DejaGnu format.
In turn this can be grafted onto the GCC testsuite and can be
invoked as @code{make check-gm2}. GM2 should now pass all
provided access to a few compiler built-in constants
and twenty-seven built-in C functions.
definition modules no longer have to @code{EXPORT QUALIFIED}
objects (as per PIM-3, PIM-4 and ISO).
implemented ISO Modula-2 sets. Large sets are now allowed,
with no size limit imposed. The comparison operators
@code{# = <= >= < >} all behave as specified by the ISO standard.
The obvious use for large sets is
@code{SET OF CHAR}. These work well with gdb once it has been
patched to understand Modula-2 sets.
added the @code{DEFINITION MODULE FOR "C"} method of linking
to C. Also added varargs handling in C definition modules.
cpp can be run on definition and implementation modules.
@samp{-fmakell} generates a temporary @code{Makefile} and
will build all dependent modules.
the compiler will bootstrap itself and three generations of the
compiler all produce the same code.
the back end will generate code and assembly declarations for
modules containing global variables of all types. Procedure
prologues and epilogues are created.
all loop constructs, @code{IF THEN ELSE}, @code{CASE} statements and expressions.
nested module initialization.
pointers, arrays, procedure calls, nested procedures.
front end @samp{gm2} can now compile and link modules.
the ability to insert GNU asm statements within GNU Modula-2.
built-in functions @code{SIZE}, @code{ADR}, @code{TSIZE}, @code{HIGH} etc.
block assignments and complex procedure parameters (unbounded arrays, strings).
the front end now utilizes GCC tree constants and types and is no
longer tied to a 32-bit architecture, but reflects the target machine
description selected by @code{configure}.
fixed all C compiler warnings when gcc compiles the p2c-generated C
built a new parser which implements error recovery.
added a mechanism to invoke cpp to support conditional compilation if required.
all @samp{Makefile}s are generated via @samp{./configure}.
@node To do list, , Done list, Internals
What needs to be done:
ISO library implementation needs to be completed and debugged.
Easy access to other libraries using @code{-flibs=} so that libraries
can be added into the @file{/usr/.../gcc-lib/gm2/...} structure.
improve documentation, specifically this document, which should
also include a synopsis of 2nd Edition Modula-2.
modifying @file{SymbolTable.mod} to make all the data structures dynamic.
testing and fixing bugs.