This is Info file suif1.info, produced by Makeinfo version 1.68 from
the input file suif1.texi.

   This file documents the SUIF library.

   Copyright (C) 1994 Stanford University. All rights reserved.

   Permission is given to use, copy, and modify this documentation for
any non-commercial purpose as long as this copyright notice is not
removed. All other uses, including redistribution in whole or in part,
are forbidden without prior written permission.

File: suif1.info, Node: Load Constant Instructions, Next: Call Instructions, Prev: Branch and Jump Instructions, Up: Instructions

Load Constant Instructions
==========================

   Rather than allowing constant values to be used directly as operands,
SUIF uses separate `ldc' instructions to load constant values. The
`in_ldc' class holds these instructions. Instead of the usual source
operands, this class has an immediate value field (*note Immeds::.).
The `value' and `set_value' methods may be used to access this field.

   Only certain kinds of immediate values are supported in an `ldc'
instruction:

Symbolic addresses (*Note Symbolic Addresses::)
     The result type of the instruction must be a pointer type.

Integers
     The result type must be an integer or pointer type. Pointer types
     are allowed so that the null pointer can be loaded as the integer
     value zero.

Floating-point values
     The result type must be a floating-point type.

   Other kinds of immediate values may be stored in the `value' field of
an `ldc' instruction, but most SUIF passes and certain library
functions will not be able to handle them.
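
   The pairing of immediate kinds and result types can be sketched as a
small predicate. This is only an illustration: the `ImmedKind' and
`ResultKind' enumerations and the `ldc_result_type_ok' function are
hypothetical stand-ins and are not part of the SUIF library.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not SUIF library types.
enum class ImmedKind { SymbolicAddress, Integer, FloatingPoint, Other };
enum class ResultKind { Pointer, Integer, FloatingPoint, Void };

// Returns true when the result type is one of those the text allows
// for the given kind of immediate value.
bool ldc_result_type_ok(ImmedKind value, ResultKind result) {
    switch (value) {
    case ImmedKind::SymbolicAddress:
        return result == ResultKind::Pointer;
    case ImmedKind::Integer:
        // A pointer result is allowed so that the null pointer can be
        // loaded as the integer value zero.
        return result == ResultKind::Integer || result == ResultKind::Pointer;
    case ImmedKind::FloatingPoint:
        return result == ResultKind::FloatingPoint;
    case ImmedKind::Other:
        // Other immediates can be stored, but are not supported kinds.
        return false;
    }
    return false;
}
```

   A pass that builds `ldc' instructions could run a check like this
before creating the instruction, rather than relying on later passes to
cope with an unsupported combination.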

File: suif1.info, Node: Call Instructions, Next: Array Instructions, Prev: Load Constant Instructions, Up: Instructions

Call Instructions
=================

   SUIF uses a special `cal' instruction to represent procedure calls.
This high-level representation hides the details of various linkage
conventions. The `in_cal' class is used to represent these call
instructions. A call instruction contains a source operand to hold a
pointer to the procedure to be called. The `addr_op' and `set_addr_op'
methods access this operand field.

   The actual parameters for the procedure are stored in an array of
operands. The `num_args' method returns the number of elements in this
array. The size of the array can be changed at any time using the
`set_num_args' method. If necessary, the array will be reallocated.
Elements of the argument array may be accessed using the `argument' and
`set_argument' methods. You must specify the array index. The first
argument is at index zero.

   Call instructions must obey some conventions on the types of the
operands. The `addr' operand must hold a pointer to a function type
which is compatible with the type of the procedure being called. The
result type of the call instruction must match the return type of the
procedure. The restrictions on instruction result types (*note Result
Types::.) guarantee that the return type will either be void or have
known, non-zero size. If the function type specifies the number of
arguments, it must match the number of actual parameters (unless the
function takes a variable number of arguments). Moreover, each operand
in the argument array must be compatible with the type of the
corresponding formal parameter. Whether or not the function type
specifies the argument types, the restrictions on instruction result
types (*note Result Types::.) and variables (*note Variable Symbols::.)
guarantee that all arguments will have known, non-zero size.
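
   The rule on argument counts can be sketched as follows. The
`FuncTypeInfo' structure and the `call_arg_count_ok' function are
hypothetical and only summarize the rule stated above; they do not
correspond to SUIF classes, and the per-argument type compatibility
check is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a function type; not a SUIF class.
struct FuncTypeInfo {
    bool args_known;       // does the type specify its arguments?
    bool varargs;          // variable number of arguments?
    std::size_t num_args;  // number of formal parameters, if known
};

// Checks only the count rule: when the function type specifies the
// number of arguments and is not variadic, the counts must match.
bool call_arg_count_ok(const FuncTypeInfo &f, std::size_t num_actuals) {
    if (!f.args_known) return true;  // nothing to compare against
    if (f.varargs) return true;      // the text exempts this case
    return num_actuals == f.num_args;
}
```

   For a function type with two known parameters,
`call_arg_count_ok(f, 2)' holds and `call_arg_count_ok(f, 3)' does not.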

File: suif1.info, Node: Array Instructions, Next: Multi-way Branch Instructions, Prev: Call Instructions, Up: Instructions

Array Instructions
==================

   Because many SUIF passes focus on analyzing and optimizing Fortran
code, a high-level representation of array references is crucial. SUIF
provides `array' instructions which retain all of the high-level
information in combination with other fields needed to generate code for
the address computations. The `in_array' class is used to hold these
instructions.

   Array instructions include a number of fields. First, a pointer to
the base of the array is specified in an operand field that can be
accessed with the `base_op' and `set_base_op' methods. If the array
elements are structures, a constant offset within the selected element
may be included. This optional integer offset can be accessed using the
`offset' and `set_offset' methods. The element size is needed to
generate low-level code for the array address calculation. The
`elem_size' method returns the element size in bits. The
`set_elem_size' method may be used to change the element size.

   Because Fortran arrays do not always begin with index zero, an
optional operand, which is referenced using the `offset_op' and
`set_offset_op' methods, is provided to specify an offset. Since there
is a single offset operand, the offsets for all of the dimensions must
be combined into a single value. The arrays are stored in row-major
form, so the offset for the first dimension is multiplied by the size
of the remaining dimensions, etc. If the offset operand is provided,
it must have an integer type.
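
   As an illustration, the per-dimension offsets can be folded into one
value in the same way a row-major index is linearized. The
`combined_offset' function below is a hypothetical helper, not a SUIF
routine, and it assumes that the combined offset is counted in array
elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Folds the lower bound of each dimension into a single offset.
// lower[i] is the first valid index of dimension i and extent[i] is the
// number of elements in dimension i (dimension 0 is the outermost).
long combined_offset(const std::vector<long> &lower,
                     const std::vector<long> &extent) {
    assert(lower.size() == extent.size());
    long offset = 0;
    for (std::size_t i = 0; i < lower.size(); ++i) {
        // Everything accumulated so far is scaled by the size of this
        // dimension, so lower[0] ends up multiplied by the sizes of all
        // of the remaining dimensions.
        offset = offset * extent[i] + lower[i];
    }
    return offset;
}
```

   For a two-dimensional array with 10 by 20 elements whose indexes
both start at one, the combined offset is 1 * 20 + 1 = 21.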

   Array instructions can treat arrays of arrays as multidimensional
arrays, even though the type system does not support that directly.
Each array instruction includes a field to specify the number of
dimensions in the array. This field may be accessed with the `dims'
and `set_dims' methods. The indexes for the array reference are stored
in an array of source operands, one for each dimension. These index
operands can be accessed using the `index' and `set_index' methods.
The dimensions are numbered beginning with zero. Similarly, the number
of elements in each dimension is stored in another array of source
operands, which can be accessed with the `bound' and `set_bound'
methods.

   The result type of an array instruction must be a pointer. However,
it need not be a pointer to the element type. If the elements are
structures, the result type may be a pointer to one of the structure
fields. SUIF does not actually require that the result type match
anything within the array element type, although that is highly
recommended. The `elem_type' method can be used to determine the
actual type of the element being addressed.

   The types of the array instruction operands must follow some
conventions. The index and bound operands must all have integer types.
The base operand must be a pointer to an array. If the array
instruction has multiple dimensions, the base must point to a nested
array (an array of arrays of arrays...) with the same depth as the
number of dimensions. For each dimension, if the bound operand is a
constant, it must match the number of elements specified in the
corresponding array type. (If the lower and upper bounds in the array
type are not both constant, then the bound operand may have any value.)
The bound operand for the first dimension is optional and may be null.
Finally, the element size must match the size of the elements in the
array type.
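
   One way to see why the bound for the first dimension can be left out
is to write the row-major linearization of the indexes. The sketch
below uses plain integers in place of operands; `linear_index' is a
hypothetical helper and is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major linearization of the index values.  bound[i] is the number
// of elements in dimension i.  Note that bound[0] is never read.
long linear_index(const std::vector<long> &index,
                  const std::vector<long> &bound) {
    assert(!index.empty() && index.size() == bound.size());
    long linear = index[0];
    for (std::size_t i = 1; i < index.size(); ++i) {
        linear = linear * bound[i] + index[i];
    }
    return linear;
}
```

   With indexes (2, 3) and a second dimension of 20 elements the result
is 2 * 20 + 3 = 43, whatever the size of the first dimension. How this
value is then combined with the offset operand and the element size to
form an address is not shown here.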

File: suif1.info, Node: Multi-way Branch Instructions, Next: Label Instructions, Prev: Array Instructions, Up: Instructions

Multi-way Branch Instructions
=============================

   Fortran computed `goto' statements and C `switch' statements are
represented in SUIF by multi-way branch (`mbr') instructions. These
are easier to analyze than the equivalent series of conditional
branches, and they can easily be used to generate efficient jump table
code. The `in_mbr' class holds these multi-way branch instructions.

   The `in_mbr' class contains a field with a pointer to an array of
label symbols. The `num_labs' method returns the number of labels in
the array. The size of the array can be changed at any time using the
`set_num_labs' method; if necessary the array will be reallocated. A
particular element within the array can be accessed using the `label'
and `set_label' methods. You must specify the array index, and, as
usual, the elements are numbered beginning with zero.

   A multi-way branch instruction transfers control to one of the target
labels depending on the value in the source operand. This operand must
have an integer type. It can be accessed using the `src_op' and
`set_src' methods. The value of the source operand is combined with an
integer offset to determine the target label. The offset can be
accessed with the `lower' and `set_lower' methods. The offset is
subtracted from the value in the source operand and the result is used
to index into the array of target labels. If the index is within the
range of the array, the instruction branches to the label at that
position in the array; otherwise, it branches to the default target
label. The `default_lab' and `set_default_lab' methods access this
default label field. The destination operand of a multi-way branch is
unused and trying to set it will cause an error. The result type
should always be the SUIF `void' type.
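
   The selection of the target label can be sketched with ordinary
data structures. The `MultiWayBranch' structure and `select_target'
function are hypothetical; labels are modeled as strings rather than
label symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the fields described above; not the in_mbr class.
struct MultiWayBranch {
    long lower;                       // offset subtracted from the source
    std::vector<std::string> labels;  // target labels, numbered from zero
    std::string default_label;        // taken when the index is out of range
};

// Returns the label that control is transferred to for a source value.
const std::string &select_target(const MultiWayBranch &mbr, long src) {
    long index = src - mbr.lower;
    if (index >= 0 && index < static_cast<long>(mbr.labels.size())) {
        return mbr.labels[static_cast<std::size_t>(index)];
    }
    return mbr.default_label;
}
```

   With `lower' equal to 10 and three labels, source values 10 through
12 select the three labels in order and every other value selects the
default label.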

File: suif1.info, Node: Label Instructions, Next: Generic Instructions, Prev: Multi-way Branch Instructions, Up: Instructions

Label Instructions
==================

   SUIF uses special pseudo-instructions to mark the positions of labels
within the lists of instructions. These label (`lab') instructions are
represented by the `in_lab' class, which contains a single field
holding the symbol for a label. The `label' and `set_label' methods
access this field.

   No operation is performed by a label instruction. Its only purpose
is to mark the location of a label symbol in an instruction list. The
`label' field must be a pointer to the symbol for the label, which must
be defined within the scope where the label instruction occurs. The
destination operand is unused and trying to set it will cause an error.
The result type should always be the SUIF `void' type.

File: suif1.info, Node: Generic Instructions, Prev: Label Instructions, Up: Instructions

Generic Instructions
====================

   To help support special-purpose extensions to SUIF, we have provided
a generic class of instructions. This is implemented in the `in_gen'
class. These generic instructions contain arbitrarily large arrays of
source operands and a character string field to hold user-defined names
that function as "sub-opcodes". Generic instructions are not part of
standard SUIF and most SUIF passes will not handle them.

   Because it is difficult to add new opcodes to SUIF at run-time, the
generic instructions all share the same `gen' opcode. Instead, they
are distinguished by user-defined names. The `name' and `set_name'
methods may be used to access these character string fields. The
`set_name' method automatically enters the name in the lexicon.

   A generic instruction contains a pointer to an array of source
operands. The base class `num_srcs' method may be used to determine
the size of this array. The size may be changed at any time using the
`set_num_srcs' method. If necessary, the array will be reallocated.
The elements of the source operand array can be accessed using the
standard base class `src_op' and `set_src_op' methods.

File: suif1.info, Node: Symbol Tables, Next: Annotations, Prev: Types, Up: Top

Symbol Tables
*************

   Symbol tables contain the definitions of the symbols and types used
within a SUIF program. Each symbol table is associated with an object
corresponding to a particular scope. For example, a procedure symbol
table is attached to the abstract syntax tree representing the body of
the procedure. The symbol tables can be reached through the
corresponding objects and vice versa.

   This section describes the symbol table hierarchy and the details of
the symbol table operations, such as looking up symbol table entries and
adding new entries. It also explains how the symbol tables handle the
task of assigning unique ID numbers to the symbols and types. The
`symtab.h' and `symtab.cc' files contain the code for symbol tables.

* Menu:

* Symbol Table Hierarchy::      Different kinds of symbol tables.
* Basic Symtab Features::       Basic features common to all symbol tables.
* Lookup Methods::              Finding symbol table entries.
* Creating New Entries::        Creating new objects in a symbol table.
* Adding and Removing Entries:: Changing the symbol table contents.
* Numbering Types and Symbols:: Assigning ID numbers to types and symbols.

File: suif1.info, Node: Symbol Table Hierarchy, Next: Basic Symtab Features, Up: Symbol Tables

Symbol Table Hierarchy
======================

   The SUIF symbol tables are organized in a hierarchy of nested scopes
and maintained internally within a tree structure. Every table
contains a list of the symbol tables that are its children, and each
table also has a pointer back to its parent in the tree (except for the
global symbol table, which does not have a parent). The `children'
method returns a pointer to the list of children and the `parent'
method gets the pointer to the parent symbol table. Thus, to search
through all of the enclosing scopes, one can follow the parent pointers
back to the global symbol table, visiting all of the symbol tables
along the way. The `is_ancestor' method provides an easy way to check
if a given symbol table is an ancestor (i.e. an enclosing scope) of the
current table.
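
   The walk along the parent pointers can be sketched as follows. The
`Scope' structure and this free-standing `is_ancestor' function are
hypothetical models of the behavior described above, not the library
classes. The sketch assumes that a table does not count as its own
ancestor.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a symbol table node; not the base_symtab class.
struct Scope {
    std::string name;
    Scope *parent;  // null for the global table
    explicit Scope(const std::string &n, Scope *p = nullptr)
        : name(n), parent(p) {}
};

// Follows the parent pointers from `table' toward the global table and
// reports whether `candidate' is one of the enclosing scopes.
bool is_ancestor(const Scope *table, const Scope *candidate) {
    for (const Scope *s = table->parent; s != nullptr; s = s->parent) {
        if (s == candidate) return true;
    }
    return false;
}
```

   Given a chain of global, file, procedure and block tables, the
global table is an ancestor of the block table, but not the other way
around.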

   Note that the symbol table hierarchy is not independent. The primary
objects in a SUIF program are the files and the abstract syntax trees
for the procedures. The symbol tables are always attached to these
primary objects and are generally treated as if they are parts of those
objects. For example, when a block of code is deleted the associated
symbol table is automatically removed from the hierarchy and deleted.

   The `base_symtab' class is the base class from which the other
symbol table classes are derived, but it is an abstract class and cannot
be used directly. There are four different derived symbol table
classes. They have much in common, but each is used at a different
level in the hierarchy and thus has slightly different features.

* Menu:

* Global Symbol Table::         Global scope (shared across files).
* File Symbol Tables::          File-level global scopes.
* Procedure Symbol Tables::     Top-level procedure scopes.
* Block Symbol Tables::         Nested scopes within procedures.

File: suif1.info, Node: Global Symbol Table, Next: File Symbol Tables, Up: Symbol Table Hierarchy

The Global Symbol Table
-----------------------

   The global symbol table is at the top of the symbol table hierarchy
and corresponds to the outermost global scope. It contains objects
that are visible across source files (i.e. shared types and global
symbols with external linkage). For this reason, it is associated with
the `file_set' object. *Note File Set::.

   The advantage of using a shared global symbol table appears when
performing interprocedural analyses and transformations. Without a
common symbol table, it can be quite a burden to deal with references to
symbols that are defined in some files but not in others. Even trying
to determine which symbols from different files correspond to the same
objects is difficult. In essence, each interprocedural pass would need
to do the work of a linker! The shared global symbol table avoids all
of these problems and makes interprocedural optimization relatively
easy.

   Along with the benefits of the global symbol table come a few
difficulties. Sharing the global symbol table across files makes it
difficult to support separate compilation. Each file must contain a
copy of the global symbol table, and if these files are manipulated
individually, their copies of the global symbol table will not be
consistent. Thus, before a group of files can be combined in a SUIF
file set, their global symbol tables must be "linked" together using
the SUIF linker pass. Whether this is preferable to just combining all
of the source files into one big SUIF file is debatable.

   The `global_symtab' class is used to represent the global symbol
table. It is also used as the base class for file symbol tables.
Because procedure symbols may only be entered in global and file symbol
tables, this class contains the methods to deal with them. The
`new_proc' method creates a new procedure symbol and enters it in the
table (*note Creating New Entries::.), and the `lookup_proc' method
searches for an existing procedure symbol (*note Lookup Methods::.).
The `number_globals' method in this class handles the task of assigning
ID numbers to the symbols and types in global and file symbol tables
(*note Numbering Types and Symbols::.).

File: suif1.info, Node: File Symbol Tables, Next: Procedure Symbol Tables, Prev: Global Symbol Table, Up: Symbol Table Hierarchy

File Symbol Tables
------------------

   A file symbol table corresponds to the global scope for a source
file. It contains procedure symbols and global variable symbols with
static linkage, as well as types that are only used within the file.
Each file symbol table is associated with a particular file set entry.
*Note File Set Entries::.

   The `file_symtab' class is derived from the `global_symtab' class to
implement the file symbol tables. Besides the features that this class
inherits from its base class, it also contains a field to record the
file set entry with which it is associated. This field is set
automatically when the file symbol table is created by the file set
entry. The `fse' method retrieves the value of this field.

File: suif1.info, Node: Procedure Symbol Tables, Next: Block Symbol Tables, Prev: File Symbol Tables, Up: Symbol Table Hierarchy

Procedure Symbol Tables
-----------------------

   Procedure symbol tables represent the top-level scopes within
procedures and are associated with the `tree_proc' objects at the roots
of the abstract syntax trees for the procedures. *Note Procedure
Nodes::. Because the procedure symbol tables provide a superset of the
block symbol table functions, they are implemented by deriving the
`proc_symtab' class from the `block_symtab' class. Thus, all of the
`block_symtab' methods can also be applied to `proc_symtab' objects.

   Besides the inherited methods, the procedure symbol tables have some
added features. Each procedure symbol table contains a list of the
formal parameters for the procedure. The `params' method returns a
pointer to this list. The entries on this list must be pointers to
symbols for variables that are contained within the procedure symbol
table. (Formal parameters cannot be global variables or local variables
in inner scopes.) The symbols are listed in order. If the function
type for the procedure specifies the parameter types, they should match
the types of the variables on the parameter list.

   The procedure symbol table also records the next instruction ID
number for the procedure (*note ID Numbers::.). The `number_locals'
method handles the task of assigning ID numbers to the symbols and
types in symbol tables within the procedure (*note Numbering Types and
Symbols::.).

File: suif1.info, Node: Block Symbol Tables, Prev: Procedure Symbol Tables, Up: Symbol Table Hierarchy

Block Symbol Tables
-------------------

   The `block_symtab' class is used for nested block symbol tables and
as the base class for procedure symbol tables. Each one is associated
with a particular `tree_block' (or `tree_proc') node in an abstract
syntax tree. *Note Block Nodes::. Each block symbol table contains a
pointer to the corresponding `tree_block' node. The `block' method
retrieves the value of this pointer. When a symbol table is connected
to a `tree_block', its `block' pointer is set automatically.

   Since label symbols may not be declared in global scopes, the
`block_symtab' class is the natural place to provide methods for
working with labels. The `new_label' method creates a new label symbol
and enters it in the table (*note Creating New Entries::.). The
`new_unique_label' method does the same thing, but it first makes sure
that the label will have a unique name. The `lookup_label' method
searches for an existing label symbol (*note Lookup Methods::.).

   Block symbol tables also provide a method to create a new child
symbol table, i.e. an inner scope. The `new_unique_child' method can be
used to create a new child block symtab with a unique name (*note
Creating New Entries::.). This method is not provided for global
symbol tables, because their children must correspond to procedures,
which already have unique names.

File: suif1.info, Node: Basic Symtab Features, Next: Lookup Methods, Prev: Symbol Table Hierarchy, Up: Symbol Tables

Basic Symtab Features
=====================

   Symbol tables contain three different kinds of objects: types,
symbols, and variable definitions. The entries within a symbol table
may only be referenced within the corresponding scope. This includes
references within registered annotations. Violating this condition may
lead to strange and unexpected errors.

   For simplicity, the symbol table entries are stored on lists instead
of using hash tables. In theory, the actual implementation (lists or
hash tables) should not be visible in the symbol table interface.
Unfortunately that is not completely true for the current implementation
of SUIF--the lists can be accessed directly. The `types', `symbols',
and `var_defs' methods return pointers to the lists. However, these
lists should only be accessed to examine the entries and should never
be modified directly. The symbol table classes provide other methods
to add and remove entries from the lists and those methods should
always be used. If the list implementation becomes a performance
bottleneck, we may need to switch to hash tables, and code that
modifies the lists directly will be relatively hard to convert.

   To distinguish the symbol tables nested within a particular scope,
each table is given a name. The `name' and `set_name' methods retrieve
and modify this name. If a scope in the source program has a name
associated with it, that name may be used for the corresponding symbol
table. For example, the name of a procedure-level symbol table should
generally be the same as the name of the procedure. On the other hand,
nested scopes within procedures are typically unnamed, and names must
be generated for the corresponding symbol tables.

   The symbol table names are used when printing a reference to a
symbol or named type. Because the symbol or type name alone may not be
sufficient to identify it uniquely, the `chain_name' method is used to
identify the symbol table. The chain name of a symbol table includes
the names of all of the symbol tables from the procedure-level downward,
separated by slashes (as in a Unix path). The file-level name is not
included since it should always be clear from the context. The chain
name for a global or file symbol table is the empty string.
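
   The construction of a chain name can be sketched as a walk from a
table up to the procedure level. The `NamedScope' structure and this
`chain_name' function are hypothetical, and details such as the absence
of a leading slash are assumptions rather than documented behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the symbol table levels; not the library classes.
enum class ScopeKind { Global, File, Procedure, Block };

struct NamedScope {
    std::string name;
    ScopeKind kind;
    const NamedScope *parent;  // null for the global table
};

// Joins the table names from the procedure level downward with slashes.
// Global and file tables contribute nothing to the chain name.
std::string chain_name(const NamedScope *s) {
    if (s == nullptr || s->kind == ScopeKind::Global ||
        s->kind == ScopeKind::File) {
        return "";
    }
    std::string prefix = chain_name(s->parent);
    return prefix.empty() ? s->name : prefix + "/" + s->name;
}
```

   For a block table named `block1' inside the procedure `main', this
sketch yields `main/block1'. For the file and global tables it yields
the empty string.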

   Duplicate names within a symbol table should be avoided whenever
possible. Each kind of symbol has a separate name space. A variable,
for example, may have the same name as a label in the same symbol table.
Named types and child symbol table names are also in separate name
spaces. Duplicate names may be temporarily introduced but to avoid
problems they should be renamed as soon as possible. The
`rename_duplicates' method is provided to check for and rename any
duplicates in a symbol table. This method is automatically called
before writing out each symbol table.

File: suif1.info, Node: Lookup Methods, Next: Creating New Entries, Prev: Basic Symtab Features, Up: Symbol Tables

Lookup Methods
==============

   SUIF symbol tables provide a number of methods to search for and
retrieve particular types, symbols, and variable definitions. Most of
these lookup methods will optionally search all the ancestor symbol
tables, making it easy to determine if an object is defined in the
current scope.

   The `lookup_type' method is available at all levels in the symbol
table hierarchy to search for SUIF types. Given an existing type, the
method searches for a type that is the same. It uses the `is_same'
method from the `type_node' class to perform these comparisons. If a
matching type is not found within the current symbol table,
`lookup_type' will continue searching in the ancestor symbol tables by
default. However, if the optional `up' parameter is set to `FALSE', it
will give up after searching the first table.

   Several methods are provided to look up symbols. Each different kind
of symbol (variable, procedure, and label) has its own name space, so
the `lookup_sym' method requires that you specify both the name and the
kind of symbol for which to search. This method may be used with all
symbol tables. For convenience, other methods are defined as wrappers
around `lookup_sym'. Each of these wrappers searches for a particular
kind of symbol: `lookup_var' searches for variables, `lookup_proc'
searches for procedures, and `lookup_label' searches for labels.
Because procedure symbols may only be defined in global symbol tables,
the `lookup_proc' method is declared in the `global_symtab' class.
Similarly, the `lookup_label' method is declared in the `block_symtab'
class, because labels may only be defined within procedures. By
default, all of these methods search the current symbol table and, if
unsuccessful, proceed to search the ancestor symbol tables. The
optional `up' parameters may be set to `FALSE' to turn off this default
behavior and only search the current symbol table.
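
   The effect of the `up' parameter and of the separate name spaces can
be sketched with a toy table. The `SymTable' structure below is
hypothetical: symbols are reduced to integer IDs, and the parameter
order of its `lookup_sym' is an assumption, not the library signature.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model; each kind of symbol has its own name space.
enum class SymKind { Var, Proc, Label };

struct SymTable {
    SymTable *parent = nullptr;
    std::map<std::pair<SymKind, std::string>, int> entries;

    void add(SymKind kind, const std::string &name, int id) {
        entries[std::make_pair(kind, name)] = id;
    }

    // Searches this table and, unless `up' is false, each ancestor in
    // turn.  Returns null when no matching entry is found.
    const int *lookup_sym(const std::string &name, SymKind kind,
                          bool up = true) const {
        for (const SymTable *t = this; t != nullptr;
             t = up ? t->parent : nullptr) {
            auto it = t->entries.find(std::make_pair(kind, name));
            if (it != t->entries.end()) return &it->second;
        }
        return nullptr;
    }
};
```

   A variable `x' in an enclosing table is found from an inner table by
default, is not found when `up' is false, and does not clash with a
label that is also named `x'.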

   A symbol for a global variable is just a declaration of that variable
and does not automatically have any storage allocated. Variable
definitions are required to allocate storage and to specify alignment
requirements and any initial data for the variable. Since the variable
definitions are not directly connected to the variable symbols, the
`lookup_var_def' method is provided to search a symbol table for the
definition of a particular variable symbol. This method does not
search the parent symbol table. In general the `definition' method in
the `var_sym' class is a better way to locate a variable definition.

   Symbols and types are assigned ID numbers (*note Numbering Types and
Symbols::.) that uniquely identify them within a particular context.
The `lookup_type_id' method searches the types defined within a symbol
table and its ancestors for a type with the specified ID number. The
`lookup_sym_id' method does the same thing for symbols.

   Besides searching for one of the entries in a symbol table, you can
also search for one of its children in the symbol table hierarchy. The
`lookup_child' method searches through the list of children for a
symbol table with a given name. This may not be very useful, but it is
included for completeness.

File: suif1.info, Node: Creating New Entries, Next: Adding and Removing Entries, Prev: Lookup Methods, Up: Symbol Tables

Creating New Entries
====================

   To make it easier to add new entries, the symbol tables provide
methods that combine the steps of creating new objects and then
entering them in the tables. Some of these methods automatically make
sure that the new entries have unique names, which is particularly
useful.

   New variables can be added to tables anywhere in the symbol table
hierarchy. The `new_var' method creates a new variable with a given
name and type and then enters the new variable symbol in the table.
The `new_unique_var' method is similar, but it also checks that the
name of the new variable is unique. If not, it appends a number to the
specified name until it is unique. With this method, the base name is
optional; the default value is `suif_tmp'.
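
   The naming scheme can be sketched as follows. The `unique_name'
function is hypothetical; in particular, starting the numbering at zero
and the exact form of the generated names are assumptions, since the
text only says that a number is appended until the name is unique.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns `base' if it is unused; otherwise appends 0, 1, 2, ... to
// `base' until a name that is not in `used' is found.
std::string unique_name(const std::set<std::string> &used,
                        const std::string &base = "suif_tmp") {
    if (used.count(base) == 0) return base;
    for (unsigned long n = 0;; ++n) {
        std::string candidate = base + std::to_string(n);
        if (used.count(candidate) == 0) return candidate;
    }
}
```

   If the names `suif_tmp' and `suif_tmp0' are already taken, this
sketch produces `suif_tmp1'. The same idea applies to any other base
name.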

   Procedure symbols can be created in global and file symbol tables
using the `new_proc' method. The name of the procedure, its type, and
the source language must be specified. There is currently no method to
automatically create a new procedure symbol with a unique name.

   Because label symbols may only be declared within procedures, the
`new_label' and `new_unique_label' methods are provided in the
`block_symtab' class. The only parameter of these methods is the name
of the label. The name is optional for `new_unique_label'; its default
value is `L'. Just as with variables, unique label names are created
by adding a number to the end of the base names.

   Within a procedure, new inner scopes may be created to be used with
new `tree_block' nodes. The `block_symtab' class provides the
`new_unique_child' method to create a new symbol table, give it a
unique name, and add it to the list of children. The unique name is
created by appending a number to the optional base name. If the base
name is not given, it defaults to `block'. This method is not needed
at the global level, because the child symbol tables there correspond
to procedures which should already have unique names.

   Finally, new variable definitions can be added to any symbol table
using the `define_var' method. The parameters are the variable symbol
and the alignment for the storage to be defined. It returns a pointer
to the new variable definition object, so that you can attach initial
data annotations to it.

File: suif1.info, Node: Adding and Removing Entries, Next: Numbering Types and Symbols, Prev: Creating New Entries, Up: Symbol Tables

Adding and Removing Entries
===========================

   Entries in symbol tables should always be added and removed using the
methods provided by the symbol tables. Although it is possible to add
and remove entries by directly manipulating the lists, that should never
be done. The methods for adding and removing entries hide the
underlying representation and using them will make it much easier to
update your code if that representation changes. Even more importantly,
most symbol table entries contain back pointers to the tables which hold
them, and the adding and removing methods are responsible for
maintaining those pointers and for performing a few other automatic
updates.

   Types, symbols, and child symbol tables may be added using the
`add_type', `add_sym', and `add_child' methods, respectively. Each of
these entries contains a pointer back to the parent symbol table, and
these methods automatically set those back pointers. They do not,
however, perform any other sanity checks, such as checking for
duplicate names. Similarly, the `remove_type', `remove_sym', and
`remove_child' methods remove types, symbols, and child symbol table
entries. These methods clear the parent pointers but do not delete the
entries that are removed.
601 Variable definitions are treated a bit differently from other kinds
602 of symbol table entries. They do not have parent pointers so the
603 `add_def' and `remove_def' methods do not have to deal with that.
604 However, adding and removing variable definitions change some
605 attributes of the corresponding variables, and those attributes must be
606 automatically updated. First, each variable has a flag to indicate
607 whether a variable definition exists for it. A variable cannot have
608 more than one definition, so the `add_def' method will fail if this
609 flag is already set. Otherwise, it sets the flag when the new
610 definition is added. Second, variable symbols also have a flag to
611 indicate whether they are actual definitions or just declarations of
612 symbols with external linkage. This `extern' flag must be set to
613 `FALSE' when a variable definition is added for a global variable.
614 When removing a variable definition, these flags must be reversed.
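The flag updates for variable definitions can be sketched as follows. This is a toy model under stated assumptions: the `toy_var' structure and the `toy_add_def' and `toy_remove_def' functions are hypothetical, and the sketch reports a duplicate definition by returning `false', which is not necessarily how the library's `add_def' fails.

```cpp
#include <cassert>

// Hypothetical model of the two flags discussed above; NOT the SUIF API.
struct toy_var {
    bool is_global;         // lives in a global or file symbol table
    bool has_def = false;   // set while a variable definition exists
    bool is_extern;         // declaration only, no definition yet

    // A global variable starts out extern; other variables never are.
    explicit toy_var(bool global) : is_global(global), is_extern(global) {}
};

// Add a definition. Fails (returns false) if one already exists.
inline bool toy_add_def(toy_var &v) {
    if (v.has_def) return false;
    v.has_def = true;
    if (v.is_global) v.is_extern = false;
    return true;
}

// Remove the definition and reverse both flag changes.
inline void toy_remove_def(toy_var &v) {
    v.has_def = false;
    if (v.is_global) v.is_extern = true;
}
```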
616 Unlike symbol nodes, which always define separate symbols, multiple
617 type nodes can represent the same type. The basic `add_type' method
618 will add a new type even if an equivalent type was already defined in
619 the same scope. In most cases, what is actually needed is a method to
620 first check if an equivalent type exists and if so to throw away the
621 duplicate and return the existing type. The `install_type' method
622 provides this functionality. It first checks if a type has already been
623 entered in the symbol table or one of its ancestors using the
624 `lookup_type' method. If so, it deletes the new type and returns the
625 existing one. If a type is not found, it is entered into the symbol
626 table and returned. All of the components of a type are recursively
627 installed before the type itself. This makes it easy to create new
628 types without worrying about duplicate entries in the symbol tables.
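The install-or-reuse behavior can be illustrated with a simplified model. Everything below is an assumption made for the sketch: `toy_type' has only an operator name and one component, equivalence is a plain field comparison, and ownership is handled with `new' and `delete'. The method names `lookup_type' and `install_type' are borrowed from the text, but the real methods work on SUIF type nodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type node: an operator name plus one optional component.
struct toy_type {
    std::string op;     // e.g. "int" or "ptr"
    toy_type *ref;      // component type for "ptr", otherwise null
    toy_type(const std::string &o, toy_type *r = nullptr) : op(o), ref(r) {}
};

// Hypothetical symbol table holding types; NOT the SUIF class.
struct toy_symtab {
    toy_symtab *parent = nullptr;
    std::vector<toy_type *> types;

    ~toy_symtab() { for (toy_type *t : types) delete t; }

    // Search this table, then its ancestors, for an equivalent type.
    // Components are compared by pointer, which is valid here only
    // because components are installed before the types that use them.
    toy_type *lookup_type(const toy_type *t) {
        for (toy_symtab *s = this; s; s = s->parent)
            for (toy_type *c : s->types)
                if (c->op == t->op && c->ref == t->ref) return c;
        return nullptr;
    }

    // Takes ownership of t and returns the canonical node for it.
    toy_type *install_type(toy_type *t) {
        if (t->ref) t->ref = install_type(t->ref);  // components first
        if (toy_type *found = lookup_type(t)) {
            if (found != t) delete t;               // discard duplicate
            return found;
        }
        types.push_back(t);
        return t;
    }
};
```

In this model a caller must keep using the returned pointer, since the node passed in may have been deleted.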
631 File: suif1.info, Node: Numbering Types and Symbols, Prev: Adding and Removing Entries, Up: Symbol Tables
633 Numbering Types and Symbols
634 ===========================
636 Every symbol and type is assigned an ID number that uniquely
637 identifies it within a particular context. These ID numbers should be
638 used to refer to symbols and types in annotations that will be written
639 to the output files and in other situations where pointers to the
640 symbol and type nodes cannot be used. The `sym_id' method retrieves
641 the ID number for a symbol, and the `type_id' method gets the number for a type.
644 For symbols and types within a procedure, the ID numbers are only
645 unique within that procedure. Similarly, the ID numbers for symbols
646 and types in a file symbol table are only unique within that file.
647 Only in the global symbol table are the ID numbers truly unique. This
648 is implemented by dividing the ID numbers into three ranges. Each
649 range is reserved for a particular level in the symbol table hierarchy.
650 To make it easier to read an ID number, the `print_id_number' function
651 prints it as a character to identify the range (`g' for global, `f' for
652 file, `p' for procedure) combined with the offset of the number within that range.
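The range scheme can be sketched as below. The range boundaries, the treatment of zero as "no ID", and the exact output text are all assumptions made for this illustration; the values used by the library and the precise format produced by `print_id_number' may differ. The hypothetical `format_id' function only shows the idea of a range letter followed by an offset.

```cpp
#include <cassert>
#include <string>

// Hypothetical range starts, chosen for this sketch only.
const unsigned PROC_BASE   = 1;            // procedure-level IDs
const unsigned FILE_BASE   = 0x10000000;   // file-level IDs
const unsigned GLOBAL_BASE = 0x20000000;   // global IDs

// Render an ID as a range letter plus the offset within that range.
// Zero is treated as "no ID assigned" in this sketch.
inline std::string format_id(unsigned id) {
    if (id == 0) return "none";
    if (id >= GLOBAL_BASE) return "g" + std::to_string(id - GLOBAL_BASE);
    if (id >= FILE_BASE)   return "f" + std::to_string(id - FILE_BASE);
    return "p" + std::to_string(id - PROC_BASE);
}
```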
655 The symbol and type ID numbers cannot be assigned individually, but
656 the symbol tables provide methods to set them. The `number_globals'
657 method is defined in the `global_symtab' class to number the entries in
658 global and file symbol tables, and the `number_locals' method is
659 defined in the `proc_symtab' class to number all of the entries in the
660 procedure symbol table and its descendants. These methods only assign
661 ID numbers to symbols and types that do not already have numbers.
662 These methods are called automatically before writing things out to
663 files, but they can also be called whenever you want to assign numbers
664 to new symbols and types.
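The rule that only unnumbered entries receive new ID numbers can be sketched as follows. The `toy_entry' structure and the `number_entries' function are hypothetical, and the way the next free number is chosen here (one past the largest number in use) is an assumption, not a description of the library's allocation strategy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical entry; an id of 0 means "not yet numbered".
struct toy_entry {
    unsigned id = 0;
};

// Assign IDs to unnumbered entries only, starting above every ID that
// is already in use so that existing numbers are never disturbed.
inline void number_entries(std::vector<toy_entry> &entries,
                           unsigned first_id) {
    unsigned next = first_id;
    for (const toy_entry &e : entries)
        if (e.id >= next) next = e.id + 1;
    for (toy_entry &e : entries)
        if (e.id == 0) e.id = next++;
}
```

Calling the function a second time changes nothing, which is why numbering can safely be repeated whenever new entries have been added.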
666 The `clear_sym_id' symbol method and `clear_type_id' type method are
667 provided to reset the ID numbers to zero manually, but as far as the
668 library itself is concerned, this is never necessary. The library
669 automatically changes ID numbers when necessary, such as when moving
670 from one symbol table to another.
673 File: suif1.info, Node: Symbols, Next: Types, Prev: Instructions, Up: Top
675 Symbols
676 *******
678 SUIF symbols are stored in the symbol tables (*note Symbol
679 Tables::.) to represent variables, labels, and procedures. The
680 `sym_node' class is the base class for all SUIF symbols. This is an
681 abstract class so it cannot be used directly. The library also defines
682 the `sym_node_list' class for lists of pointers to symbols. Classes
683 are derived from the `sym_node' class for each kind of symbol. Given
684 an arbitrary symbol, the `kind' method identifies the kind of symbol
685 and thus the derived class to which it belongs. This method returns a
686 value from the `sym_kinds' enumerated type. The following values are
687 defined in that enumeration:
690 Variable symbol. The `var_sym' class represents variable symbols.
693 Label symbol. The `label_sym' class represents label symbols.
696 Procedure symbol. The `proc_sym' class represents procedure symbols.
699 All symbols share some common fields including the symbol names.
700 These are described in the first section below. Each kind of symbol
701 also uses additional fields that are specific to that kind. For
702 example, variable symbols specify the types of the variables. The
703 subsequent sections describe the specific features of each kind of symbol.
706 The `symbols.h' and `symbols.cc' files contain the source code for symbols.
709 * Menu:
711 * Symbol Features:: Basic features of all symbols.
712 * Procedure Symbols:: Procedures.
713 * Label Symbols:: Labels.
714 * Variable Symbols:: Variables.
717 File: suif1.info, Node: Symbol Features, Next: Procedure Symbols, Up: Symbols
719 Basic Features of Symbols
720 =========================
722 The `sym_node' class defines several fields that are used by all
723 kinds of symbols. The most obvious of these is the symbol name. Each
724 symbol has a name that should be unique within the symbol table where it
725 is defined. The `name' and `set_name' methods access this field. The
726 names are automatically entered in the lexicon (*note Lexicon::.) by
727 `set_name'. Because the name of a symbol alone is generally
728 insufficient to uniquely identify it, the symbols are also given ID
729 numbers. *Note Numbering Types and Symbols::.
731 When a symbol is entered in a symbol table, it automatically records
732 a pointer to that parent table. Similarly, when the symbol is removed
733 from the symbol table, its parent pointer is cleared. The `parent'
734 method retrieves this parent pointer.
736 All symbols contain flags to specify various attributes. The
737 `sym_node' class provides methods to access these flags. The
738 `is_userdef' method tests whether the symbol is user-defined (that is,
739 taken from the source code) or was introduced by the compiler. The
740 `set_userdef' and `reset_userdef' methods change the value of this flag.
742 Another flag is used to mark symbols that are only declarations of
743 external symbols, rather than actual definitions. This flag is set
744 automatically. The `is_extern' method retrieves its value. Label
745 symbols are never `extern'. A procedure symbol is `extern' unless the
746 procedure body is defined in the input file(s). A global variable
747 symbol is `extern' unless it has a separate definition (*note Variable
748 Definitions::.); no other variables are `extern'.
750 Since symbols may be treated differently depending on their scopes,
751 the `sym_node' class includes methods to determine which kind of symbol
752 table contains a symbol. The `is_global' method checks if the parent
753 table is a global or file symbol table. This is really only useful for
754 variable symbols, because procedures are always global and labels are
755 never global. The `is_private' method checks if a symbol is global but
756 private to one source file by checking if the parent symbol table is a
757 file symbol table. This is obviously irrelevant for label symbols.
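The two scope tests reduce to checks on the kind of the parent table. The sketch below uses a hypothetical `toy_table_kind' enumeration and `toy_sym' structure to show the relationship; the real methods inspect the symbol's actual parent symbol table.

```cpp
#include <cassert>

// Hypothetical table kinds for this sketch; NOT the SUIF enumeration.
enum toy_table_kind { TABLE_GLOBAL, TABLE_FILE, TABLE_PROC, TABLE_BLOCK };

// Hypothetical symbol that records only the kind of its parent table.
struct toy_sym {
    toy_table_kind parent_kind;

    // Global means the parent is the global table or a file table.
    bool is_global() const {
        return parent_kind == TABLE_GLOBAL || parent_kind == TABLE_FILE;
    }

    // Private means global but visible in only one source file.
    bool is_private() const { return parent_kind == TABLE_FILE; }
};
```

Every private symbol is therefore also global in this model, but not the reverse.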
759 The `add_to_table' and `remove_from_table' methods are provided for
760 convenience when adding or removing symbols from symbol tables. In the
761 case of variable symbols, the entire hierarchy of sub-variables (*note
762 Sub-Variables::.) is added or removed at one time by this method.
764 The `copy' method makes a copy of a symbol. This is a virtual
765 function so it copies the fields that are specific to each kind of
766 symbol. However, it only copies the symbol itself: copying a procedure
767 symbol will not copy the procedure body and copying a variable symbol
768 will not copy the variable definition. The `copy' method does not copy
769 annotations on the symbol, either. Since the copy will have the same
770 name as the original symbol, it should generally be renamed or used in
771 a different symbol table.
773 Two different methods are available for printing symbols. The
774 `print' method just prints the name of the symbol. Label symbols are
775 prefixed by `L:' and procedure symbols by `P:' to distinguish them from
776 variable symbols. The `print_full' method is used by the library when
777 listing the contents of symbol tables. It includes all the fields from the symbol.
781 File: suif1.info, Node: Procedure Symbols, Next: Label Symbols, Prev: Symbol Features, Up: Symbols
783 Procedure Symbols
784 =================
786 Procedure symbols are represented by objects of the `proc_sym'
787 class. SUIF does not support nested procedures, so these symbols may
788 only be entered in global and file symbol tables. The fields in a
789 procedure symbol hold information about the procedure, including a
790 pointer to the body if it is in memory. The `proc_sym' class also
791 provides methods to read procedure bodies from input files, write them
792 to the output files, and flush them from memory.
794 Each procedure symbol contains a field to record the source language
795 for the procedure. The `src_lang' and `set_src_lang' methods access
796 this field, which holds a value from the `src_lang_type' enumeration:
797 `src_unknown', `src_c', `src_fortran', or `src_verilog'. Other values
798 may be added in the future.
800 A procedure symbol also has a field that specifies the type of the
801 procedure. The `type' and `set_type' methods retrieve and change this
802 field. The type must be a function type. *Note Function Types::.
804 The body of a procedure is represented by its abstract syntax tree.
805 *Note Trees::. The procedure symbol contains a pointer to the root node
806 of this tree. The `block' and `set_block' methods access this pointer.
807 If the body is not in memory, the `block' pointer will be `NULL'; the
808 `is_in_memory' method is provided to check this condition.
810 The `proc_sym' class contains the methods to read procedure bodies
811 from binary SUIF files and to write them out again. The details of SUIF
812 I/O are thus hidden from users; only entire procedures can be read and
813 written. If one of the input files contains the body for a procedure, a
814 pointer to the file set entry (*note File Set Entries::.) is recorded in
815 the procedure symbol. The `file' method retrieves this pointer for a
816 particular `proc_sym'. The same procedure can be read in and flushed
817 from memory many times, but once it has been written out it can no
818 longer be read or written again. The procedure symbol contains a flag
819 to indicate if it has been written out yet. The `is_written' method
820 returns the value of this flag. The `is_readable' method checks if the
821 procedure body exists in one of the input files and if it has not yet
822 been written out. If this method returns `TRUE', the `read_proc'
823 method can be used to read the body of the procedure. By default,
824 `read_proc' also converts the procedure to expression tree form (*note
825 Expression Trees::.) but it does not convert to Fortran form (*note
826 Fortran::.). The `exp_trees' and `use_fortran_form' parameters to
827 `read_proc' can be used to override these defaults.
829 After a procedure body has been read in and possibly modified, it
830 can be written to the output file using the procedure symbol's
831 `write_proc' method. You must specify the file set entry to which the
832 procedure should be written. In most cases, the input and output file
833 set entry will be the same, and you will just use the `file' method to
834 determine the output file set entry. As mentioned above, once a
835 procedure has been written out it cannot be rewritten or read in again.
836 Obviously, it should not be changed after that point because the
837 changes could not be saved. Besides avoiding changes directly to the
838 procedure, however, you must also be careful to avoid certain changes to
839 the global symbol tables. The symbols and types within the procedure
840 are written out using their ID numbers. *Note Numbering Types and
841 Symbols::. Thus, you must not do anything to the global symbol tables
842 that would cause the ID numbers for those symbols and types to change.
843 For example, moving a symbol from a file symbol table to the global
844 symbol table would require that its ID number change. The best solution
845 to this is to not write out the procedures until you are certain that
846 such changes to the symbol tables will not be needed.
848 When a procedure body is no longer needed, typically after it has
849 been written out, call the `flush_proc' method for the procedure symbol
850 to deallocate the storage used by the procedure. In some cases, you may
851 want to flush the procedure before it is written. For example,
852 interprocedural analysis requires that all procedures be read in and
853 analyzed together. To save space, the procedures can be summarized for
854 the purpose of the particular analysis and then flushed. After the
855 analysis is complete, they can be re-read and the results can be
856 attached to the code.
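The life cycle of a procedure body (read, optionally flush and re-read, write once, flush) can be summarized as a small state model. The `toy_proc' class below is a hypothetical sketch: it borrows the method names from the text but performs no I/O, takes no parameters, and signals misuse by returning `false', none of which should be taken as the behavior of the real `proc_sym' methods. It also ignores procedures created by the compiler that have no body in any input file.

```cpp
#include <cassert>

// Hypothetical state model of a procedure body; NOT the SUIF class.
class toy_proc {
    bool has_input_body;     // body exists in one of the input files
    bool in_memory = false;
    bool written = false;

public:
    explicit toy_proc(bool has_body) : has_input_body(has_body) {}

    bool is_in_memory() const { return in_memory; }
    bool is_written() const { return written; }

    // Readable only if an input body exists and it is not yet written.
    bool is_readable() const { return has_input_body && !written; }

    // Bring the body into memory; may be repeated after a flush.
    bool read_proc() {
        if (!is_readable() || in_memory) return false;
        in_memory = true;
        return true;
    }

    // Write the body out; allowed at most once.
    bool write_proc() {
        if (!in_memory || written) return false;
        written = true;
        return true;
    }

    // Release the in-memory body, before or after writing.
    void flush_proc() { in_memory = false; }
};
```

An interprocedural pass would, in this model, call `read_proc' and `flush_proc' once per procedure while summarizing, then `read_proc' again before the final `write_proc'.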
859 File: suif1.info, Node: Label Symbols, Next: Variable Symbols, Prev: Procedure Symbols, Up: Symbols
861 Label Symbols
862 =============
864 The `label_sym' class represents label symbols. Labels are used
865 within procedures to specify targets of branch and jump instructions.
866 They may not be entered in global or file symbol tables. The position
867 of a label is usually indicated by a label instruction (*note Label
868 Instructions::.), but for labels associated with high-level AST nodes,
869 the label positions may be implicit. The `label_sym' class contains no
870 extra fields beyond those in the base `sym_node' class.
873 File: suif1.info, Node: Variable Symbols, Prev: Label Symbols, Up: Symbols
875 Variable Symbols
876 ================
878 In SUIF, variable symbols represent data objects. Variable symbols
879 are represented by objects of the `var_sym' class. This class adds a
880 field to specify the type of the variable as well as some additional
881 flags. Unlike procedures and labels, variables may be defined in any kind of symbol table.
884 SUIF provides optional "sub-variables" to make it easier to deal
885 with pieces of aggregate objects that may or may not overlap, in
886 particular Fortran equivalences and reshaped common blocks. Instead of
887 referring to a piece of an aggregate by an offset combined with the
888 aggregate symbol, a sub-variable can be created to represent the data at
889 a particular offset within the aggregate, so that it can be referenced
890 in the same way as if it were not contained within an aggregate object.
893 * Menu:
895 * Variable Features:: Basic features of variables.
896 * Sub-Variables:: Variables contained within aggregates.
897 * Variable Definitions:: Definitions of global and static variables.