gcc/cp/gxxint.texi

\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename g++int.info
@settitle G++ internals
@setchapternewpage odd
@c %**end of header

@node Top, Limitations of g++, (dir), (dir)
@chapter Internal Architecture of the Compiler

This is meant to describe the C++ front-end for gcc in detail.
Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}.

@menu
* Limitations of g++::
* Routines::
* Implementation Specifics::
* Glossary::
* Macros::
* Typical Behavior::
* Coding Conventions::
* Templates::
* Access Control::
* Error Reporting::
* Parser::
* Exception Handling::
* Free Store::
* Mangling::  Function name mangling for C++ and Java
* Concept Index::
@end menu

@node Limitations of g++, Routines, Top, Top
@section Limitations of g++

@itemize @bullet
@item
Limitations on input source code: with the parser stack size
(@code{YYSTACKSIZE}) set to 500 (the default), input is limited to 240
nesting levels, and each nesting level requires around 16.4k of swap
space.  The parser needs about 2.09 * (number of nesting levels) worth
of stack space.

@cindex pushdecl_class_level
@item
I suspect there are other uses of @code{pushdecl_class_level} that do
not call @code{set_identifier_type_value} in tandem with the call to
@code{pushdecl_class_level}.  This would seem to be an omission.

@cindex access checking
@item
Access checking is unimplemented for nested types.

@cindex @code{volatile}
@item
@code{volatile} is not implemented in general.

@end itemize

@node Routines, Implementation Specifics, Limitations of g++, Top
@section Routines

This section describes some of the routines used in the C++ front-end.

@code{build_vtable} and @code{prepare_fresh_vtable} are used only within
the @file{cp-class.c} file, and only in @code{finish_struct} and
@code{modify_vtable_entries}.

@code{build_vtable}, @code{prepare_fresh_vtable}, and
@code{finish_struct} are the only routines that set @code{DECL_VPARENT}.

@code{finish_struct} can steal the virtual function table from parents;
this prevents @code{related_vslot} from working.  When
@code{finish_struct} steals, we know that

@example
get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
@end example

@noindent
will get the related binfo.

@code{layout_basetypes} does something with the VIRTUALS.

Supposedly (according to Tiemann) most of the breadth-first searching
done, as in @code{get_base_distance} and @code{get_binfo}, was not the
result of any design decision.  I have since found out that at least one
part of the compiler needs the notion of depth-first binfo searching, so
I am going to try to convert the whole thing; it should just work.  The
term left-most refers to the depth-first left-most node.  It uses
@code{MAIN_VARIANT == type} as the condition to get left-most, because
the things that have @code{BINFO_OFFSET}s of zero are shared and will
have themselves as their own @code{MAIN_VARIANT}s.  The non-shared right
ones are copies of the left-most one; hence if a binfo is its own
@code{MAIN_VARIANT}, we know it @emph{is} a left-most one, and if it is
not, it is a non-left-most one.
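
To make the breadth-first versus depth-first distinction concrete, here
is a minimal C++ sketch over a toy inheritance graph.  It is a
hypothetical illustration, not front-end code: @code{ClassNode},
@code{depth_first} and @code{breadth_first} are invented names, and the
real searches walk binfos rather than plain class nodes.  It shows that a
depth-first, left-to-right walk follows the left-most chain of bases to
the bottom before visiting any right-hand sibling, whereas a
breadth-first walk visits all direct bases first.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical toy model: a class and its direct bases, in declaration
// order (left to right).  This is not the front-end's binfo structure,
// and shared (virtual) bases are not deduplicated.
struct ClassNode {
    std::string name;
    std::vector<const ClassNode*> bases;
};

// Depth-first, left-to-right: descend the left-most base completely
// before looking at the next base to its right.
void depth_first_into(const ClassNode* c, std::string& out) {
    out += c->name;
    for (const ClassNode* b : c->bases)
        depth_first_into(b, out);
}

std::string depth_first(const ClassNode* root) {
    std::string out;
    depth_first_into(root, out);
    return out;
}

// Breadth-first: every direct base is visited before any indirect one.
std::string breadth_first(const ClassNode* root) {
    std::string out;
    std::queue<const ClassNode*> pending;
    pending.push(root);
    while (!pending.empty()) {
        const ClassNode* c = pending.front();
        pending.pop();
        out += c->name;
        for (const ClassNode* b : c->bases)
            pending.push(b);
    }
    return out;
}
```

For @code{struct D : B, C}, @code{struct B : A} and @code{struct C : E},
the depth-first order is D, B, A, C, E (the left-most chain D, B, A comes
first), while the breadth-first order is D, B, C, A, E.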

@code{get_base_distance}'s path and distance matter in its use in:

@itemize @bullet
@item
@code{prepare_fresh_vtable} (the code is probably wrong)
@item
@code{init_vfields} depends upon distance, probably in a safe way.
@code{build_offset_ref} might use partial paths to do further lookups.
@code{hack_identifier} is probably not properly checking access.

@item
@code{get_first_matching_virtual} probably should check for
@code{get_base_distance} returning -2.

@item
@code{resolve_offset_ref} should be called in a more deterministic
manner.  Right now, it is called in some random contexts, like for
arguments at @code{build_method_call} time, @code{default_conversion}
time, @code{convert_arguments} time, @code{build_unary_op} time,
@code{build_c_cast} time, @code{build_modify_expr} time,
@code{convert_for_assignment} time, and
@code{convert_for_initialization} time.

But there are still more contexts it needs to be called in; one was the
ever-simple:

@example
if (obj.*pmi != 7)
   @dots{}
@end example

It seems that the problems were due to the fact that @code{TREE_TYPE} of
the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
of the referent (like @code{INTEGER_TYPE}).  This problem was fixed by
changing @code{default_conversion} to check @code{TREE_CODE (x)},
instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
was @code{OFFSET_TYPE}.

@end itemize
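
For reference, the problematic construct is an ordinary comparison
through a pointer to data member.  The self-contained C++ snippet below
exercises the same form as the @code{if (obj.*pmi != 7)} example and
could serve as a small regression check; the names @code{S}, @code{i},
@code{j} and @code{differs_from_seven} are made up for illustration.

```cpp
#include <cassert>

struct S {
    int i;
    int j;
};

// pmi is a pointer to an int data member of S; obj.*pmi reads that
// member of obj, which is the expression form that the front-end
// represents with an OFFSET_REF.
bool differs_from_seven(const S& obj, int S::*pmi) {
    return obj.*pmi != 7;
}
```

Calling @code{differs_from_seven} with @code{&S::i} or @code{&S::j}
selects which member is compared at run time.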

@node Implementation Specifics, Glossary, Routines, Top
@section Implementation Specifics

@itemize @bullet
@item Explicit Initialization

The global list @code{current_member_init_list} contains the list of
mem-initializers specified in a constructor declaration.  For example:

@example
foo::foo() : a(1), b(2) @{@}
@end example

@noindent
will initialize @samp{a} with 1 and @samp{b} with 2.
@code{expand_member_init} places each initialization (@samp{a} with 1)
on the global list.  Then, when the fndecl is being processed,
@code{emit_base_init} runs down the list, initializing them.  It used to
be the case that g++ first ran down @code{current_member_init_list},
then ran down the list of members, initializing the ones that weren't
explicitly initialized.  Things were rewritten to perform the
initializations in order of declaration in the class.  So, for the above
example, @samp{a} and @samp{b} will be initialized in the order that
they were declared:

@example
class foo @{ public: int b; int a; foo (); @};
@end example

@noindent
Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
initialized with 1, regardless of how they're listed in the mem-initializer.

@item The Explicit Keyword

When @code{explicit} is used on a constructor, @code{grokdeclarator}
sets the field @code{DECL_NONCONVERTING_P}.  That value is used by
@code{build_method_call} and @code{build_user_type_conversion_1} to decide
whether a particular constructor should be used as a candidate for
conversions.

@end itemize
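
Both points above can be observed from the user's side.  The sketch
below is ordinary C++ (the @code{record} helper and the @code{Meters}
and @code{Feet} classes are illustrative names, not part of the
front-end): it logs the order in which @code{foo}'s members are
initialized, and uses @code{std::is_convertible} to show that an
@code{explicit} constructor is not a candidate for implicit conversions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Records the order in which members are initialized.
std::string init_log;

int record(char name, int value) {
    init_log += name;
    return value;
}

class foo {
public:
    int b;  // declared first, so initialized first
    int a;
    // The mem-initializer list names a before b, but the members are
    // still initialized in declaration order: b, then a.
    foo() : a(record('a', 1)), b(record('b', 2)) {}
};

struct Meters {
    explicit Meters(double v) : value(v) {}  // not a conversion candidate
    double value;
};

struct Feet {
    Feet(double v) : value(v) {}  // usable for implicit conversions
    double value;
};

static_assert(!std::is_convertible<double, Meters>::value,
              "explicit constructor must not convert implicitly");
static_assert(std::is_convertible<double, Feet>::value,
              "non-explicit constructor converts implicitly");
```

Starting from an empty @code{init_log}, constructing a @code{foo} leaves
it equal to @code{"ba"}.  Current GCC releases also diagnose the
mismatched ordering with @code{-Wreorder}.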

@node Glossary, Macros, Implementation Specifics, Top
@section Glossary

@table @r
@item binfo
The main data structure in the compiler used to represent the
inheritance relationships between classes.  The data in the binfo can be
accessed by the @code{BINFO_} accessor macros.

@item vtable
@itemx virtual function table

The virtual function table holds information used in virtual function
dispatching.  In the compiler, they are usually referred to as vtables,
or vtbls.  The first index is not used in the normal way; I believe it
is probably used for the virtual destructor.

@item vfield

vfields can be thought of as the base information needed to build
vtables.  For every vtable that exists for a class, there is a vfield.
See also vtable and virtual function table pointer.  When a type is used
as a base class to another type, the virtual function table for the
derived class can be based upon the vtable for the base class, just
extended to include the additional virtual methods declared in the
derived class.  The virtual function table from a virtual base class is
never reused in a derived class.  @code{is_normal} depends upon this.

@item virtual function table pointer

These are @code{FIELD_DECL}s that are pointer types that point to
vtables.  See also vtable and vfield.
@end table
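
The relationship between vtables, vtable pointers and derived-class
extension can be modelled by hand.  The following C++ sketch is a
hypothetical illustration, not the layout g++ actually emits: a vtable
is an array of function pointers, slot 0 is left unused (echoing the
note above that the first index is not used in the normal way), and the
derived vtable reuses the base layout as a prefix and appends one new
slot.  @code{Object}, @code{Slot} and the slot constants are invented
names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model only; real object and vtable layouts differ.
struct Object;
using Slot = int (*)(const Object*);

struct Object {
    const Slot* vptr;  // plays the role of the virtual function table pointer
    int value;
};

int base_f(const Object* o) { return o->value; }
int derived_f(const Object* o) { return o->value * 2; }
int derived_g(const Object* o) { return o->value + 100; }

// Base vtable: [unused, f]
const Slot base_vtable[] = {nullptr, base_f};
// Derived vtable keeps the base layout as a prefix and extends it:
// [unused, f (overridden), g (new in the derived class)]
const Slot derived_vtable[] = {nullptr, derived_f, derived_g};

constexpr std::size_t kSlotF = 1;  // first real index is 1
constexpr std::size_t kSlotG = 2;

// Code that only knows the base layout can still dispatch f on a
// derived object, because slot kSlotF means the same thing in both.
int call_f(const Object* o) { return o->vptr[kSlotF](o); }
int call_g(const Object* o) { return o->vptr[kSlotG](o); }
```

Calling @code{call_f} on an @code{Object} whose @code{vptr} is
@code{derived_vtable} reaches @code{derived_f}; only derived objects may
be passed to @code{call_g}, since the base vtable has no slot 2.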

@node Macros, Typical Behavior, Glossary, Top
@section Macros

This section describes some of the macros used on trees.  The list
should be alphabetical.  Eventually all macros should be documented
here.

@table @code
@item BINFO_BASETYPES
A vector of additional binfos for the types inherited by this basetype.
The binfos are fully unshared (except for virtual bases, in which
case the binfo structure is shared).

If this basetype describes type D as inherited in C,
and if the basetypes of D are E and F,
then this vector contains binfos for inheritance of E and F by C.

Has values of:

        TREE_VECs


@item BINFO_INHERITANCE_CHAIN
Temporarily used to represent specific inheritances.  It usually points
to the binfo associated with the lesser derived type, but it can be
reversed by reverse_path.  For example:

@example
        Z ZbY   least derived
        |
        Y YbX
        |
        X Xb    most derived

TYPE_BINFO (X) == Xb
BINFO_INHERITANCE_CHAIN (Xb) == YbX
BINFO_INHERITANCE_CHAIN (YbX) == ZbY
BINFO_INHERITANCE_CHAIN (ZbY) == 0
@end example

Not sure if the above is really true; get_base_distance has it point
towards the most derived type, opposite from above.

Set by build_vbase_path, recursive_bounded_basetype_p,
get_base_distance, lookup_field, lookup_fnfields, and reverse_path.

What things can this be used on:

        TREE_VECs that are binfos


@item BINFO_OFFSET
The offset where this basetype appears in its containing type.
The BINFO_OFFSET slot holds the offset (in bytes) from the base of the
complete object to the base of the part of the object that is allocated
on behalf of this `type'.  This is always 0 except when there is
multiple inheritance.

Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.


@item BINFO_VIRTUALS
A unique list of functions for the virtual function table.  See also
TYPE_BINFO_VIRTUALS.

What things can this be used on:

        TREE_VECs that are binfos


@item BINFO_VTABLE
Used to find the VAR_DECL that is the virtual function table associated
with this binfo.  See also TYPE_BINFO_VTABLE.  To get the virtual
function table pointer, see CLASSTYPE_VFIELD.

What things can this be used on:

        TREE_VECs that are binfos

Has values of:

        VAR_DECLs that are virtual function tables


@item BLOCK_SUPERCONTEXT
In the outermost scope of each function, it points to the FUNCTION_DECL
node.  It aids in better DWARF support of inline functions.

@item CLASSTYPE_TAGS
CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
class.  TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
these and calls pushtag on them).

finish_struct scans these to produce TYPE_DECLs to add to the
TYPE_FIELDS of the type.

It is expected that the name found in the TREE_PURPOSE slot is unique;
resolve_scope_to_name is one place that depends upon this
uniqueness.


@item CLASSTYPE_METHOD_VEC
The following is true after finish_struct has been called (on the
class?) but not before.  Before finish_struct is called, things are
different to some extent.  Contains a TREE_VEC of methods of the class.
The TREE_VEC_LENGTH is the number of differently named methods plus one
for the 0th entry.  The 0th entry is always allocated, and reserved for
ctors and dtors.  If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
Each entry of the TREE_VEC is a FUNCTION_DECL.  For each FUNCTION_DECL,
there is a DECL_CHAIN slot.  If the FUNCTION_DECL is the last one with a
given name, the DECL_CHAIN slot is NULL_TREE.  Otherwise it is the next
method that has the same name (but a different signature).  Although the
DECL_CHAIN slot is used in this way, it would seem that we can still call
pushdecl to put the method in the global scope: pushdecl uses the
TREE_CHAIN slot, and because the two are different _CHAINs, one does not
overwrite the other.  finish_struct_methods sets up one version of the
TREE_CHAIN slots on the FUNCTION_DECLs.

Friends are kept in TREE_LISTs, so that there's no need to use their
TREE_CHAIN slot for anything.

Has values of:

        TREE_VECs


@item CLASSTYPE_VFIELD
Seems to be in the process of being renamed TYPE_VFIELD.  Use on types
to get the main virtual function table pointer.  To get the virtual
function table use BINFO_VTABLE (TYPE_BINFO ()).

Has values of:

        FIELD_DECLs that are virtual function table pointers

What things can this be used on:

        RECORD_TYPEs


@item DECL_CLASS_CONTEXT
Identifies the context that the _DECL was found in.  For virtual function
tables, it points to the type associated with the virtual function
table.  See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.

The difference between this and DECL_CONTEXT shows up for virtual
functions like:

@example
struct A
@{
  virtual int f ();
@};

struct B : A
@{
  int f ();
@};

DECL_CONTEXT (A::f) == A
DECL_CLASS_CONTEXT (A::f) == A

DECL_CONTEXT (B::f) == A
DECL_CLASS_CONTEXT (B::f) == B
@end example

Has values of:

        RECORD_TYPEs, or UNION_TYPEs

What things can this be used on:

        TYPE_DECLs, _DECLs


@item DECL_CONTEXT
Identifies the context that the _DECL was found in.  Can be used on
virtual function tables to find the type associated with the virtual
function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
better access method.  Internally the same as DECL_FIELD_CONTEXT, so
don't use both.  See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
DECL_CLASS_CONTEXT.

Has values of:

        RECORD_TYPEs


What things can this be used on:

@display
VAR_DECLs that are virtual function tables
_DECLs
@end display

@item DECL_FIELD_CONTEXT
Identifies the context that the FIELD_DECL was found in.  Internally the
same as DECL_CONTEXT, so don't use both.  See also DECL_CONTEXT,
DECL_FCONTEXT and DECL_CLASS_CONTEXT.

Has values of:

        RECORD_TYPEs

What things can this be used on:

@display
FIELD_DECLs that are virtual function pointers
FIELD_DECLs
@end display


@item DECL_NAME

Has values of:

@display
0 for things that don't have names
IDENTIFIER_NODEs for TYPE_DECLs
@end display

@item DECL_IGNORED_P
A bit that can be set to inform the debug information output routines in
the back-end that a certain _DECL node should be totally ignored.

Used in cases where it is known that the debugging information will be
output in another file, or where a sub-type is known not to be needed
because the enclosing type is not needed.

A compiler-constructed virtual destructor in a derived class that does
not define an explicit destructor, when a base class does define one
explicitly, has this bit set as well.  Also used on __FUNCTION__ and
__PRETTY_FUNCTION__ to mark that they are ``compiler generated.''  c-decl
and c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
and ``user-invisible variable.''

Functions built by the C++ front-end such as default destructors,
virtual destructors and default constructors want to be marked as
compiler generated, but it is unclear why.

Currently, it is used in an absolute way in the C++ front-end, as an
optimization, to tell the debug information output routines to not
generate debugging information that will be output by another separately
compiled file.


@item DECL_VIRTUAL_P
A flag used on FIELD_DECLs and VAR_DECLs.  (Documentation in tree.h is
wrong.)  Used in VAR_DECLs to indicate that the variable is a vtable.
It is also used in FIELD_DECLs for vtable pointers.

What things can this be used on:

        FIELD_DECLs and VAR_DECLs


@item DECL_VPARENT
Used to point to the parent type of the vtable if there is one, else it
is just the type associated with the vtable.  Because of the sharing of
virtual function tables that goes on, this slot is not very useful, and
is, in fact, not used in the compiler at all.  It can be removed.

What things can this be used on:

        VAR_DECLs that are virtual function tables

Has values of:

        RECORD_TYPEs maybe UNION_TYPEs


@item DECL_FCONTEXT
Used to find the first baseclass in which this FIELD_DECL is defined.
See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.

How it is used:

        Used when writing out debugging information about vfield and
        vbase decls.

What things can this be used on:

        FIELD_DECLs that are virtual function pointers
        FIELD_DECLs


@item DECL_REFERENCE_SLOT
Used to hold the initializer for the reference.

What things can this be used on:

        PARM_DECLs and VAR_DECLs that have a reference type


@item DECL_VINDEX
Used for FUNCTION_DECLs in two different ways.  Before the structure
containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
FUNCTION_DECL will replace as a virtual function.  When the class is
laid out, this pointer is changed to an INTEGER_CST node which is
suitable for finding an index into the virtual function table.  See
get_vtable_entry as to how one can find the right index into the virtual
function table.  The first index, 0, of a virtual function table is not
used in the normal way, so the first real index is 1.

DECL_VINDEX may be a TREE_LIST, which would seem to be a list of
overridden FUNCTION_DECLs.  add_virtual_function has code to deal with
this when it uses the variable base_fndecl_list, but it would seem that
somehow, it is possible for the TREE_LIST to persist until method_call,
and it should not.


What things can this be used on:

        FUNCTION_DECLs


@item DECL_SOURCE_FILE
Identifies what source file a particular declaration was found in.

Has values of:

        "<built-in>" on TYPE_DECLs to mean the typedef is built in


@item DECL_SOURCE_LINE
Identifies what source line number in the source file the declaration
was found at.

Has values of:

@display
0 for an undefined label

0 for TYPE_DECLs that are internally generated

0 for FUNCTION_DECLs for functions generated by the compiler
        (not yet, but should be)

0 for ``magic'' arguments to functions, that the user has no
        control over
@end display

@item TREE_USED

Has values of:

        0 for unused labels


@item TREE_ADDRESSABLE
A flag that is set for any type that has a constructor.


@item TREE_COMPLEXITY
They seem to be a kludgy way to track recursion, popping, and pushing.
They only appear in cp-decl.c and cp-decl2.c, so they are good
candidates for proper fixing and removal.


@item TREE_HAS_CONSTRUCTOR
A flag to indicate when a CALL_EXPR represents a call to a constructor.
If set, we know that the type of the object is the complete type of the
object, and that the value returned is nonnull.  When used in this
fashion, it is an optimization.  Can also be used on SAVE_EXPRs to
indicate when they are of fixed type and nonnull.  Can also be used on
INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.


@item TREE_PRIVATE
Set for FIELD_DECLs by finish_struct, but not uniformly.

The following routines do something with PRIVATE access:
build_method_call, alter_access, finish_struct_methods,
finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
GNU_xref_member, dbxout_type_fields, dbxout_type_method_1


@item TREE_PROTECTED
The following routines do something with PROTECTED access:
build_method_call, alter_access, finish_struct, convert_to_aggr,
CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
dbxout_type_method_1


@item TYPE_BINFO
Used to get the binfo for the type.

Has values of:

        TREE_VECs that are binfos

What things can this be used on:

        RECORD_TYPEs


@item TYPE_BINFO_BASETYPES
See also BINFO_BASETYPES.

@item TYPE_BINFO_VIRTUALS
A unique list of functions for the virtual function table.  See also
BINFO_VIRTUALS.

What things can this be used on:

        RECORD_TYPEs


@item TYPE_BINFO_VTABLE
Points to the virtual function table associated with the given type.
See also BINFO_VTABLE.

What things can this be used on:

        RECORD_TYPEs

Has values of:

        VAR_DECLs that are virtual function tables


@item TYPE_NAME
Names the type.

Has values of:

@display
0 for things that don't have names.
should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
        ENUM_TYPEs.
TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
        shouldn't be.
TYPE_DECL for typedefs, unsure why.
@end display

What things can one use this on:

@display
TYPE_DECLs
RECORD_TYPEs
UNION_TYPEs
ENUM_TYPEs
@end display

History:

        It currently points to the TYPE_DECL for RECORD_TYPEs,
        UNION_TYPEs and ENUM_TYPEs, but it should be history soon.


@item TYPE_METHODS
Synonym for @code{CLASSTYPE_METHOD_VEC}.  Chained together with
@code{TREE_CHAIN}.  @file{dbxout.c} uses this to get at the methods of a
class.


@item TYPE_DECL
Used to represent typedefs, and used to represent binding layers.

Components:

        DECL_NAME is the name of the typedef.  For example, foo would
        be found in the DECL_NAME slot when @code{typedef int foo;} is
        seen.

        DECL_SOURCE_LINE identifies what source line number in the
        source file the declaration was found at.  A value of 0
        indicates that this TYPE_DECL is just an internal binding layer
        marker, and does not correspond to a user supplied typedef.

        DECL_SOURCE_FILE

@item TYPE_FIELDS
A linked list (via @code{TREE_CHAIN}) of member types of a class.  The
list can contain @code{TYPE_DECL}s, but there can also be other things
in the list apparently.  See also @code{CLASSTYPE_TAGS}.

@item TYPE_VIRTUAL_P
When used on a @code{FIELD_DECL} or a @code{VAR_DECL}, this flag
indicates that it is a virtual function table or a pointer to one.  When
used on a @code{FUNCTION_DECL}, it indicates that it is a virtual
function.  When used on an @code{IDENTIFIER_NODE}, it indicates that a
function with this same name exists and has been declared virtual.

When used on types, it indicates that the type has virtual functions, or
is derived from one that does.

Not sure if the above about virtual function tables is still true.  See
also info on @code{DECL_VIRTUAL_P}.

What things can this be used on:

        FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs


@item VF_BASETYPE_VALUE
Get the associated type from the binfo that caused the given vfield to
exist.  This is the least derived class (the most parent class) that
needed a virtual function table.  It is probably the case that all uses
of this field are misguided, but they need to be examined on a
case-by-case basis.  See history for more information on why the
previous statement was made.

Set at @code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields

History:

        This field was used to determine if a virtual function table's
        slot should be filled in with a certain virtual function, by
        checking to see if the type returned by VF_BASETYPE_VALUE was a
        parent of the context in which the old virtual function existed.
        This incorrectly assumes that a given type _could_ not appear as
        a parent twice in a given inheritance lattice.  For single
        inheritance, this would in fact work, because a type could not
        possibly appear more than once in an inheritance lattice, but
        with multiple inheritance, a type can appear more than once.


@item VF_BINFO_VALUE
Identifies the binfo that caused this vfield to exist.  If this vfield
is from the first direct base class that has a virtual function table,
then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
direct base where the vfield came from.  Can use @code{TREE_VIA_VIRTUAL}
on result to find out if it is a virtual base class.  Related to the
binfo found by

@example
get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
@end example

@noindent
where @samp{t} is the type that has the given vfield; that call will
return the binfo for the given vfield.

May or may not be set at @code{modify_vtable_entries} time.  Set at
@code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields


@item VF_DERIVED_VALUE
Identifies the type of the most derived class of the vfield, excluding
the class this vfield is for.

Set at @code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields


@item VF_NORMAL_VALUE
Identifies the type of the most derived class of the vfield, including
the class this vfield is for.

Set at @code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields


@item WRITABLE_VTABLES
This is an option that can be defined when building the compiler, that
will cause the compiler to output vtables into the data segment so that
the vtables may be written.  This is undefined by default, because
normally the vtables should be unwritable.  People who implement object
I/O facilities, or who want to change the dynamic type of objects, may
want to have the vtables writable.  Another way of achieving this would
be to make a copy of the vtable into writable memory, but the drawback
there is that that method only changes the type for one object.

@end table
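
The layout described under @code{CLASSTYPE_METHOD_VEC} (slot 0 reserved
for constructors and destructors, then one slot per distinct method
name, each heading a @code{DECL_CHAIN} of same-named overloads) can be
sketched as a small C++ data structure.  Everything here
(@code{FunctionDecl}, @code{MethodVec}, @code{add}, @code{overloads}) is
a hypothetical model for illustration; the real front-end stores trees
in a @code{TREE_VEC}, and this sketch makes no claim about the order of
entries within a chain.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a FUNCTION_DECL: a name, a signature, and a
// link to the next method with the same name (the DECL_CHAIN role).
struct FunctionDecl {
    std::string name;
    std::string signature;
    FunctionDecl* next_same_name = nullptr;
};

// Toy method vector: slot 0 is always allocated and reserved for
// constructors/destructors (nullptr if there are none); every other
// slot heads the chain of overloads for one distinct method name.
struct MethodVec {
    std::vector<FunctionDecl*> slots{nullptr};

    void add(FunctionDecl* fn, bool is_ctor_or_dtor) {
        if (is_ctor_or_dtor) {
            fn->next_same_name = slots[0];
            slots[0] = fn;
            return;
        }
        for (std::size_t i = 1; i < slots.size(); ++i) {
            if (slots[i]->name == fn->name) {
                fn->next_same_name = slots[i];  // prepend to this name's chain
                slots[i] = fn;
                return;
            }
        }
        slots.push_back(fn);  // first method with this name gets a new slot
    }

    // Number of methods chained from slot i.
    std::size_t overloads(std::size_t i) const {
        std::size_t n = 0;
        for (const FunctionDecl* f = slots[i]; f; f = f->next_same_name)
            ++n;
        return n;
    }
};
```

With one constructor, two overloads of @code{f} and one @code{g}, the
vector has length 3: the number of differently named methods (2) plus
one for the reserved 0th entry.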

@node Typical Behavior, Coding Conventions, Macros, Top
@section Typical Behavior

@cindex parse errors

Whenever seemingly normal code fails with errors like
@code{syntax error at `\@{'}, it's highly likely that @code{grokdeclarator}
is returning a @code{NULL_TREE} for whatever reason.

@node Coding Conventions, Templates, Typical Behavior, Top
@section Coding Conventions

It should never be the case that trees are modified in-place by the
back-end, @emph{unless} it is guaranteed that the semantics are the same
no matter how shared the tree structure is.  @file{fold-const.c} still
has some cases where this is not true, but rms hypothesizes that this
will never be a problem.

@node Templates, Access Control, Coding Conventions, Top
@section Templates

A template is represented by a @code{TEMPLATE_DECL}.  The specific
fields used are:

@table @code
@item DECL_TEMPLATE_RESULT
The generic decl on which instantiations are based.  This looks just
like any other decl.

@item DECL_TEMPLATE_PARMS
The parameters to this template.
@end table

The generic decl is parsed as much like any other decl as possible,
given the parameterization.  The template decl is not built up until the
generic decl has been completed.  For template classes, a template decl
is generated for each member function and static data member, as well.

Template members of template classes are represented by a TEMPLATE_DECL
for the class's parameters around another TEMPLATE_DECL for the member's
parameters.
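
In source terms, a template member of a template class looks like the
following C++ sketch (the names @code{Box} and @code{convert} are
illustrative).  The out-of-class definition needs both parameter lists,
outer first, which mirrors the nesting of the member's
@code{TEMPLATE_DECL} inside the one for the class's parameters.

```cpp
#include <cassert>

// Outer template: T is a parameter of the class template.
template <typename T>
struct Box {
    T value;

    // Member template: U is a parameter of the member alone.
    template <typename U>
    U convert() const;
};

// Out-of-class definition: the class's parameter list (T) comes first,
// then the member's own list (U).
template <typename T>
template <typename U>
U Box<T>::convert() const {
    return static_cast<U>(value);
}
```

For example, @code{Box<double>@{3.7@}.convert<int>()} instantiates the
outer template with @code{double} and the member with @code{int}, and
yields 3.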

All declarations that are instantiations or specializations of templates
refer to their template and parameters through DECL_TEMPLATE_INFO.

How should I handle parsing member functions with the proper param
decls?  Set them up again or try to use the same ones?  Currently we do
the former.  We can probably do this without any extra machinery in
store_pending_inline, by deducing the parameters from the decl in
do_pending_inlines.  PRE_PARSED_TEMPLATE_DECL?

If a base is a parm, we can't check anything about it.  If a base is not
a parm, we need to check it for name binding.  Do finish_base_struct if
no bases are parameterized (only if none, including indirect, are
parms).  Nah, don't bother trying to do any of this until instantiation
-- we only need to do name binding in advance.

Always set up method vec and fields, inc. synthesized methods.  Really?
We can't know the types of the copy folks, or whether we need a
destructor, or can have a default ctor, until we know our bases and
fields.  Otherwise, we can assume and fix ourselves later.  Hopefully.

@node Access Control, Error Reporting, Templates, Top
@section Access Control
The function compute_access returns one of three values:

@table @code
@item access_public
means that the field can be accessed by the current lexical scope.

@item access_protected
means that the field cannot be accessed by the current lexical scope
because it is protected.

@item access_private
means that the field cannot be accessed by the current lexical scope
because it is private.
@end table

DECL_ACCESS is used for access declarations; alter_access creates a list
of types and accesses for a given decl.
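
As a user-level illustration of what an access declaration does, the
sketch below re-exposes one member of a privately inherited base.  It
uses the standard C++ using-declaration; the older ARM-style access
declaration (writing @code{Base::get;} inside the derived class) had the
same effect and was removed from the language in C++11.  The class names
are illustrative.

```cpp
#include <cassert>

class Base {
public:
    int get() const { return value; }
    int set(int v) { return value = v; }
protected:
    int value = 0;
};

// Private inheritance makes all of Base's members private in Derived...
class Derived : private Base {
public:
    // ...but an access declaration (here, a using-declaration) restores
    // public access to one chosen member.
    using Base::get;

    explicit Derived(int v) { set(v); }  // set() stays private to users
};
```

Outside the class, @code{d.get()} compiles for a @code{Derived d}, while
@code{d.set(1)} is rejected as an access violation.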
 886
 887 Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
 888 codes of compute_access and were used as a cache for compute_access.
 889 Now they are not used at all.
 890
 891 TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
 892 granted by the containing class.  BEWARE: TREE_PUBLIC means something
 893 completely unrelated to access control!
 894
 895 @node Error Reporting, Parser, Access Control, Top
 896 @section Error Reporting
 897
 898 The C++ front-end uses a call-back mechanism to allow functions to print
 899 out reasonable strings for types and functions without putting extra
 900 logic in the functions where errors are found.  The interface is through
 901 the @code{cp_error} function (or @code{cp_warning}, etc.).  The
 902 syntax is exactly like that of @code{error}, except that a few more
 903 conversions are supported:
 904
 905 @itemize @bullet
 906 @item
 907 %C indicates a value of `enum tree_code'.
 908 @item
 909 %D indicates a *_DECL node.
 910 @item
 911 %E indicates a *_EXPR node.
 912 @item
 913 %L indicates a value of `enum languages'.
 914 @item
%P indicates the name of a parameter (e.g., "this", "1", "2", ...).
 916 @item
 917 %T indicates a *_TYPE node.
 918 @item
 919 %O indicates the name of an operator (MODIFY_EXPR -> "operator =").
 920
 921 @end itemize
 922
 923 There is some overlap between these; for instance, any of the node
 924 options can be used for printing an identifier (though only @code{%D}
 925 tries to decipher function names).
 926
For a more verbose message (@code{class foo} as opposed to just @code{foo},
including the return type for functions), add a @code{#} flag to the
conversion, as in @code{%#T} or @code{%#D}.
To have the line number on the error message indicate the line of the
DECL, use @code{cp_error_at} and its ilk.  To choose which argument
supplies that line number, use @code{%+D}; otherwise the first argument
is used.
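
The following is a minimal sketch of the call-back idea.  It uses a hypothetical @code{toy_decl} node and a hypothetical @code{toy_format} function, not the real @code{cp_error} machinery.  The caller passes only the node, and the formatter decides how to spell it, adding the kind of declaration when the @code{#} flag is present.

@verbatim
// Hypothetical sketch of a cp_error-style formatter; not the real code.
#include <cstdarg>
#include <string>

struct toy_decl {
  const char *kind;  // e.g. "class"
  const char *name;  // e.g. "foo"
};

// A tiny printf: %D prints a toy_decl, %#D prints its kind first,
// %% prints a percent sign; other conversions are copied as written.
std::string toy_format (const char *fmt, ...)
{
  std::string out;
  va_list ap;
  va_start (ap, fmt);
  for (const char *p = fmt; *p != '\0'; ++p)
    {
      if (*p != '%')
        {
          out += *p;
          continue;
        }
      const char *conv = p;  // where this conversion began
      ++p;
      bool verbose = (*p == '#');
      if (verbose)
        ++p;
      if (*p == '\0')
        break;
      if (*p == 'D')
        {
          toy_decl *d = va_arg (ap, toy_decl *);
          if (verbose)
            {
              out += d->kind;
              out += ' ';
            }
          out += d->name;
        }
      else if (*p == '%' && !verbose)
        out += '%';
      else
        out.append (conv, p + 1);  // unknown conversion: copy it unchanged
    }
  va_end (ap);
  return out;
}
@end verbatim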
 932
 933 @node Parser, Exception Handling, Error Reporting, Top
 934 @section Parser
 935
 936 Some comments on the parser:
 937
 938 The @code{after_type_declarator} / @code{notype_declarator} hack is
 939 necessary in order to allow redeclarations of @code{TYPENAME}s, for
 940 instance
 941
 942 @example
 943 typedef int foo;
 944 class A @{
 945   char *foo;
 946 @};
 947 @end example
 948
 949 In the above, the first @code{foo} is parsed as a @code{notype_declarator},
and the second as an @code{after_type_declarator}.
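
The same situation can be checked against a present-day C++ compiler.  The sketch below illustrates only the language rule being parsed and says nothing about how the g++ grammar is organized.  The class is the one from the example above, with an access specifier added.

@verbatim
// Checks the language rule only, not g++'s grammar.
#include <type_traits>

typedef int foo;  // foo is a type name here...

class A {
public:
  char *foo;      // ...yet it may be redeclared as a member
};

// Outside A, foo still names the typedef; A::foo is a data member.
static_assert (std::is_same<foo, int>::value, "foo is still int");
static_assert (std::is_same<decltype (A::foo), char *>::value,
               "A::foo is a char * data member");
@end verbatim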
 951
 952 Ambiguities:
 953
 954 There are currently four reduce/reduce ambiguities in the parser.  They are:
 955
 956 1) Between @code{template_parm} and
 957 @code{named_class_head_sans_basetype}, for the tokens @code{aggr
 958 identifier}.  This situation occurs in code looking like
 959
 960 @example
 961 template <class T> class A @{ @};
 962 @end example
 963
 964 It is ambiguous whether @code{class T} should be parsed as the
 965 declaration of a template type parameter named @code{T} or an unnamed
 966 constant parameter of type @code{class T}.  Section 14.6, paragraph 3 of
 967 the January '94 working paper states that the first interpretation is
 968 the correct one.  This ambiguity results in two reduce/reduce conflicts.
 969
 970 2) Between @code{primary} and @code{type_id} for code like @samp{int()}
 971 in places where both can be accepted, such as the argument to
 972 @code{sizeof}.  Section 8.1 of the pre-San Diego working paper specifies
 973 that these ambiguous constructs will be interpreted as @code{typename}s.
 974 This ambiguity results in six reduce/reduce conflicts between
 975 @samp{absdcl} and @samp{functional_cast}.
 976
 977 3) Between @code{functional_cast} and
 978 @code{complex_direct_notype_declarator}, for various token strings.
 979 This situation occurs in code looking like
 980
 981 @example
 982 int (*a);
 983 @end example
 984
This code is ambiguous; it could be a declaration of the variable
@samp{a} as a pointer to @samp{int}, or it could be a functional cast of
@samp{*a} to @samp{int}.  Section 6.8 specifies that the former
interpretation is correct.  This ambiguity results in 7 reduce/reduce
conflicts.  Another aspect of this ambiguity is code like
@samp{int (x[2]);}, which is resolved at the @samp{[} and accounts for 6
reduce/reduce conflicts between @samp{direct_notype_declarator} and
@samp{primary}/@samp{overqualified_id}.  Finally, there are 4
reduce/reduce conflicts between @samp{expr_or_declarator} and
@samp{primary} over code like @samp{int (a);}, which could probably be
resolved but would also probably be more trouble than it's worth.  In
all, this situation accounts for 17 conflicts.  Ack!
 997
The second case above is responsible for the failure to parse
@samp{LinppFile ppfile (String (argv[1]), &outs, argc, argv);} (from
Rogue Wave Math.h++) as an object declaration, and must be fixed so that
it does not resolve until later.
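
The resolution required by Section 6.8 can be observed in C++.  Inside a function body, each statement below must be parsed as a declaration.  If any of them were parsed as an expression, the sketch would not compile, because the names would be undeclared.  The function name @code{parsed_as_declarations} is hypothetical.

@verbatim
// Each statement below is parsed as a declaration (Section 6.8).
#include <type_traits>

bool parsed_as_declarations ()
{
  int (*a);    // a is a pointer to int, not a cast of *a to int
  int (x[2]);  // x is an array of two ints
  int (b);     // b is an int; the parentheses are redundant
  return std::is_same<decltype (a), int *>::value
         && std::is_same<decltype (x), int[2]>::value
         && std::is_same<decltype (b), int>::value;
}
@end verbatim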
1002
1003 4) Indirectly between @code{after_type_declarator} and @code{parm}, for
1004 type names.  This occurs in (as one example) code like
1005
1006 @example
1007 typedef int foo, bar;
1008 class A @{
1009   foo (bar);
1010 @};
1011 @end example
1012
1013 What is @code{bar} inside the class definition?  We currently interpret
1014 it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
1015 @code{after_type_declarator}.  I believe that xlC is correct, in light
1016 of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
1017 could possibly be a type name is taken as the @i{decl-specifier-seq} of
1018 a @i{declaration}."  However, it seems clear that this rule must be
1019 violated in the case of constructors.  This ambiguity accounts for 8
1020 conflicts.
1021
1022 Unlike the others, this ambiguity is not recognized by the Working Paper.
1023
1024 @node  Exception Handling, Free Store, Parser, Top
1025 @section Exception Handling
1026
1027 Note, exception handling in g++ is still under development.
1028
This section describes how the C++ front-end maps C++ exceptions onto
the back-end exception handling framework.

The basic mechanism of exception handling in the back-end is
unwind-protect a la elisp.  This is a general, robust, and
language-independent representation for exceptions.

The C++ front-end maps C++ exceptions onto these unwind-protect
semantics, as described below.
1038
When @samp{-frtti} is used, RTTI is used to check the type of exception
objects.  When it is not used, the encoded name of the type of the thrown
object is used instead.  All code that originates exceptions must be
compiled consistently, either with @samp{-frtti} or with
@samp{-fno-rtti}.  This includes code that throws exceptions as a side
effect, such as dynamic casting.  The same applies to all code that
catches exceptions.  It is not possible to mix RTTI-based exception
handling objects with code that does not use RTTI.  The exceptions to
this rule are code that neither catches nor throws exceptions,
@code{catch (...)}, and code that just rethrows an exception.
1048
Currently we use the normal mangling used in building function names
(@code{int} is @samp{i}, @code{const char *} is @samp{PCc}) to build the
non-RTTI type descriptors for exception handling.  These descriptors are
just plain null-terminated strings, and internally they are passed around
as @code{char *}.
1054
In C++, all cleanups should be protected by exception regions.  The
region starts just after the action that made the cleanup necessary has
completed.  For an automatic variable that has a constructor, for
example, the region starts right after the constructor has run.  The
region ends just before the finalization is expanded.

The backend may expand the cleanup multiple times along different paths:
once for the normal end of the region, once for non-local gotos, once
for returns, and so on.  It must therefore take special care to protect
any expansion of the finalization that is made for a reason other than
the normal region end and that is `inline' (inside the exception
region).  The backend can either move such expansions out of line, or
create an exception region over the finalization to protect it.  The
handler for that protecting region would not run the finalization as it
otherwise would have.  Instead, it would just rethrow to the outer
handler, taking care to skip the normal handler for the original region.
1070
In Ada, they will use the more run-time-intensive approach of having
fewer regions, at the cost of additional work at run time to keep a
list of things that need cleanups.  When a variable has finished
construction, its cleanup is added to the list.  When the variable's
lifetime ends, the list is run down.  If an exception is raised before
the section finishes normally, the list is examined for actions to
perform.  I hope this logic is added to the back-end, as it would be
nice to have that alternative approach available in C++.
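
The run-time list approach can be sketched as follows.  This is a hypothetical C++ model, not the Ada front-end's code; the class @code{toy_cleanup_list} is invented for illustration.  A cleanup is pushed once construction finishes and is popped and run at the normal end of the lifetime.  On an exception, every pending cleanup is run, newest first.

@verbatim
// Hypothetical run-time cleanup list; not the Ada front-end's code.
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class toy_cleanup_list {
public:
  // Register a cleanup once a variable has finished construction.
  void push (std::function<void ()> cleanup)
  {
    pending_.push_back (std::move (cleanup));
  }

  // Normal end of the newest variable's lifetime: run its cleanup.
  // Precondition: at least one cleanup is pending.
  void pop_and_run ()
  {
    std::function<void ()> cleanup = std::move (pending_.back ());
    pending_.pop_back ();
    cleanup ();
  }

  // An exception arrived: run every pending cleanup, newest first.
  void run_all ()
  {
    while (!pending_.empty ())
      pop_and_run ();
  }

  std::size_t size () const { return pending_.size (); }

private:
  std::vector<std::function<void ()>> pending_;
};
@end verbatim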
1079
On an rs6000, xlC stores exception objects on the stack, under the try
block.  When it unwinds down into a handler, the frame pointer is
adjusted back to the normal value for the frame in which the handler
resides.  The stack pointer is left unchanged from the time at which the
object was thrown.  This guarantees that there is always somewhere to
put the exception object, and that nothing can overwrite it once we
start throwing.  The only drawback is that the stack remains large.
1087
The following points out some things that work in g++'s exception handling.

All completely constructed temporaries and local variables are cleaned
up in all unwound scopes.  Completely constructed parts of partially
constructed objects are cleaned up, including partially built arrays.
Exception specifications are now handled.  Thrown objects are now always
cleaned up.  We can now tell whether there is an active exception being
thrown (@code{__eh_type != 0}).  We use this to call @code{terminate}
if someone executes @code{throw;} without an active exception object.
@code{uncaught_exception ()} works.  Exception handling should work
correctly when optimizing, and with @samp{-fpic} or @samp{-fPIC}.
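
The array case can be checked with a small C++ test.  With a conforming compiler, when the third element's constructor throws, the two completed elements are destroyed before the exception propagates.  The @code{tracked} type and the helper function are hypothetical test scaffolding.

@verbatim
// Hypothetical test scaffolding: the third element's constructor throws.
#include <stdexcept>

struct tracked {
  static int live;         // constructed and not yet destroyed
  static int constructed;  // constructors that completed
  tracked ()
  {
    if (constructed == 2)
      throw std::runtime_error ("third element fails");
    ++constructed;
    ++live;
  }
  ~tracked () { --live; }
};

int tracked::live = 0;
int tracked::constructed = 0;

// Number of tracked objects still alive after a failed new tracked[3].
int live_after_failed_array_new ()
{
  tracked::live = 0;
  tracked::constructed = 0;
  try
    {
      tracked *p = new tracked[3];
      delete[] p;  // not reached: constructing p[2] throws
    }
  catch (const std::runtime_error &)
    {
    }
  return tracked::live;
}
@end verbatim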
1100
The following points out some flaws in g++'s exception handling, as it
now stands.

Only exact type matching or reference matching of throw types works when
@samp{-fno-rtti} is used.

Exception handling only works on SPARC (like Suns; both the @samp{-mflat}
and @samp{-mno-flat} models work), SPARClite, Hitachi SH, i386, arm,
rs6000, PowerPC, Alpha, mips, VAX, m68k and z8k machines.  SPARC v9 may
not work.  HPPA is mostly done, but throwing between a shared library
and user code doesn't yet work.  Some targets have support for
data-driven unwinding.

Partial support is in for all other machines, but a stack unwinder
called @code{__unwind_function} has to be written and added to libgcc2
for them.  The new EH code doesn't rely upon @code{__unwind_function} for
C++ code; instead, it creates per-function unwinders right inside the
function.  Unfortunately, on many platforms the definition of
@code{RETURN_ADDR_RTX} in the @file{tm.h} file for the machine port is
wrong.  See below for details on @code{__unwind_function}.

@code{RTL_EXPR}s for EH cond variables for @code{&&} and @code{||}
expressions should probably be wrapped in @code{UNSAVE_EXPR}s, and
@code{RTL_EXPR}s tweaked so that they can be unsaved.
1119
With @samp{-frtti}, the only conversions we do in exception matching are
pointer conversions, as in 15.3 p2 case 3: `A handler with type T,
const T, T&, or const T& is a match for a throw-expression with an
object of type E if [3]T is a pointer type and E is a pointer type that
can be converted to T by a standard pointer conversion (_conv.ptr_) not
involving conversions to pointers to private or protected base classes.'
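
The rule can be exercised directly with the following hypothetical test, where all type and function names are invented.  A thrown @code{derived *} is caught by a handler for @code{base *}, because @code{base} is a public base.  A pointer to a class with a private @code{base} is not caught by that handler and falls through to @code{catch (...)}.

@verbatim
// Hypothetical test of 15.3 p2 case 3 pointer matching.
struct base { virtual ~base () {} };
struct derived : base {};         // public base
struct hidden : private base {};  // private base

// Returns 1 if a thrown derived * is caught as base *.
int caught_derived_as_base ()
{
  static derived d;
  try
    {
      throw &d;
    }
  catch (base *)
    {
      return 1;
    }
  catch (...)
    {
      return 0;
    }
  return -1;  // not reached
}

// Returns 1 if a thrown hidden * is caught as base *.
int caught_hidden_as_base ()
{
  static hidden h;
  try
    {
      throw &h;
    }
  catch (base *)
    {
      return 1;
    }
  catch (...)
    {
      return 0;
    }
  return -1;  // not reached
}
@end verbatim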
1126
We don't call @code{operator delete} to free the storage of a
new-expression whose constructor throws an exception.  See except/18 for
a test case.
1129
15.2 para 13: The exception being handled should be rethrown if control
reaches the end of a handler of the function-try-block of a constructor
or destructor; currently, it is not.
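
For reference, the required behavior is shown below.  A conforming compiler reports that the exception escaped the constructor even though the handler did not rethrow it explicitly.  The types @code{member} and @code{widget} and the helper function are hypothetical.

@verbatim
// Hypothetical types illustrating the function-try-block rule.
#include <stdexcept>

struct member {
  member () { throw std::runtime_error ("member failed"); }
};

struct widget {
  member m;
  widget ()
  try : m ()
    {
    }
  catch (const std::runtime_error &)
    {
      // No explicit throw here: reaching the end of this handler
      // rethrows the exception being handled.
    }
};

// True if the exception escaped widget's constructor.
bool constructor_handler_rethrows ()
{
  try
    {
      widget w;
    }
  catch (const std::runtime_error &)
    {
      return true;
    }
  return false;
}
@end verbatim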
1133
15.2 para 12: If a return statement appears in a handler of the
function-try-block of a constructor, the program is ill-formed, but this
isn't diagnosed.
1137
1138 15.2 para 11: If the handlers of a function-try-block contain a jump
1139 into the body of a constructor or destructor, the program is ill-formed,
1140 but this isn't diagnosed.
1141
1142 15.2 para 9: Check that the fully constructed base classes and members
1143 of an object are destroyed before entering the handler of a
1144 function-try-block of a constructor or destructor for that object.
1145
@code{build_exception_variant} should sort the incoming list, so that it
implements set comparison, not exact list equality.  Type smashing should
smash exception specifications using set union.
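
The following is a minimal sketch of the suggested approach.  It assumes a specification can be modeled as a list of type names; the names @code{toy_spec}, @code{same_exception_set} and @code{spec_union} are hypothetical.  Sorting and removing duplicates gives a canonical form, so equal sets compare equal regardless of order.  Smashing two specifications then becomes a set union.

@verbatim
// Hypothetical model: an exception specification as a list of type names.
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using toy_spec = std::vector<std::string>;

// Sort and drop duplicates so that lists can be compared as sets.
static toy_spec canonicalize (toy_spec spec)
{
  std::sort (spec.begin (), spec.end ());
  spec.erase (std::unique (spec.begin (), spec.end ()), spec.end ());
  return spec;
}

// Set comparison rather than exact list equality.
bool same_exception_set (const toy_spec &a, const toy_spec &b)
{
  return canonicalize (a) == canonicalize (b);
}

// Smashing two specifications together: their set union.
toy_spec spec_union (const toy_spec &a, const toy_spec &b)
{
  toy_spec ca = canonicalize (a);
  toy_spec cb = canonicalize (b);
  toy_spec out;
  std::set_union (ca.begin (), ca.end (), cb.begin (), cb.end (),
                  std::back_inserter (out));
  return out;
}
@end verbatim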
1149
Thrown objects are usually allocated on the heap, in the usual way.  If
one runs out of heap space, throwing an object will probably never work.
This could be relaxed somewhat by passing an @code{__in_chrg} parameter
to track who has control over the exception object.  Thrown objects are
not allocated on the heap when they are pointer to object types.  We
should extend this so that all small objects (less than
@code{4*sizeof(void*)} bytes) are stored directly, instead of being
allocated on the heap.
1157
When the backend returns a value, it can create new exception regions
that need protecting.  The new region should rethrow the object in the
context of the last associated cleanup that ran to completion.
1161
The structure of the code generated for C++ exception handling is shown
below:
1164
1165 @example
1166 Ln:                                     throw value;
1167         copy value onto heap
1168         jump throw (Ln, id, address of copy of value on heap)
1169
1170                                         try @{
1171 +Lstart:        the start of the main EH region
1172 |...                                            ...
1173 +Lend:          the end of the main EH region
1174                                         @} catch (T o) @{
1175                                                 ...1
1176                                         @}
1177 Lresume:
1178         nop     used to make sure there is something before
1179                 the next region ends, if there is one
1180 ...                                     ...
1181
1182         jump Ldone
1183 [
1184 Lmainhandler:    handler for the region Lstart-Lend
1185         cleanup
1186 ] zero or more, depending upon automatic vars with dtors
1187 +Lpartial:
1188 |        jump Lover
1189 +Lhere:
1190         rethrow (Lhere, same id, same obj);
1191 Lterm:          handler for the region Lpartial-Lhere
1192         call terminate
1193 Lover:
1194 [
1195  [
1196         call throw_type_match
1197         if (eq) @{
1198  ] these lines disappear when there is no catch condition
1199 +Lsregion2:
1200 |       ...1
1201 |       jump Lresume
1202 |Lhandler:      handler for the region Lsregion2-Leregion2
1203 |       rethrow (Lresume, same id, same obj);
1204 +Leregion2
1205         @}
1206 ] there are zero or more of these sections, depending upon how many
1207   catch clauses there are
1208 ----------------------------- expand_end_all_catch --------------------------
1209                 here we have fallen off the end of all catch
1210                 clauses, so we rethrow to outer
1211         rethrow (Lresume, same id, same obj);
1212 ----------------------------- expand_end_all_catch --------------------------
1213 [
1214 L1:     maybe throw routine
1215 ] depending upon if we have expanded it or not
1216 Ldone:
1217         ret
1218
1219 start_all_catch emits labels: Lresume,
1220
1221 @end example
1222
The @code{__unwind_function} takes a pointer to the throw handler.  It
is expected to pop the stack frame that was built to call it, as well as
the frame underneath, and then jump to the throw handler.  It must
restore all registers to their proper values, as well as all other
machine state, as determined by the context into which we are
unwinding.  The way I normally start is to compile

@example
void *g;
void foo (void *a) @{ g = a; @}
@end example

@noindent
with @samp{-S}.  Then change the instruction that alters the PC (usually
@code{return} or @code{ret}) so that it no longer alters the PC, making
sure to leave all of its other effects in place (like adjusting the
stack pointer or frame pointer).  After that, replicate the epilogue once
more at the end, again changing the PC-altering instructions.  Finally,
at the very end, jump to the address held in @code{g}.
1238
It takes about a week to write this routine.  If someone wants to
volunteer to write it for any architecture, exception support for that
architecture will be added to g++.  Please send in those code donations.
One other thing that needs to be done is to double-check that
@code{__builtin_return_address (0)} works.
1244
1245 @subsection Specific Targets
1246
1247 For the alpha, the __unwind_function will be something resembling:
1248
1249 @example
1250 void
1251 __unwind_function(void *ptr)
1252 @{
1253   /* First frame */
1254   asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
1255   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1256
1257   /* Second frame */
1258   asm ("ldq $15, 8($30)"); /* fp */
1259   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1260
1261   /* Return */
1262   asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
1263 @}
1264 @end example
1265
1266 @noindent
1267 However, there are a few problems preventing it from working.  First of
1268 all, the gcc-internal function @code{__builtin_return_address} needs to
1269 work given an argument of 0 for the alpha.  As it stands as of August
1270 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
1271 will definitely not work on the alpha.  Instead, we need to define
1272 the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
1273 @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
1274 definition for @code{RETURN_ADDR_RTX}.
1275
1276 In addition (and more importantly), we need a way to reliably find the
1277 frame pointer on the alpha.  The use of the value 8 above to restore the
1278 frame pointer (register 15) is incorrect.  On many systems, the frame
1279 pointer is consistently offset to a specific point on the stack.  On the
1280 alpha, however, the frame pointer is pushed last.  First the return
1281 address is stored, then any other registers are saved (e.g., @code{s0}),
1282 and finally the frame pointer is put in place.  So @code{fp} could have
1283 an offset of 8, but if the calling function saved any registers at all,
1284 they add to the offset.
1285
1286 The only places the frame size is noted are with the @samp{.frame}
1287 directive, for use by the debugger and the OSF exception handling model
1288 (useless to us), and in the initial computation of the new value for
1289 @code{sp}, the stack pointer.  For example, the function may start with:
1290
1291 @example
1292 lda $30,-32($30)
1293 .frame $15,32,$26,0
1294 @end example
1295
1296 @noindent
The 32 above is exactly the value we need.  With it, we can be sure
that the frame pointer is stored 8 bytes below that, in this case at
24(sp).  The drawback is that there is no way that I (Brendan) have found
to let us discover the size of a previous frame @emph{inside} the
definition of @code{__unwind_function}.
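
Once the directive is available, the arithmetic is simple, as the hypothetical helper below shows.  The catch is that the directive exists only in the assembler source, not at run time when @code{__unwind_function} executes.  The function @code{saved_fp_offset} is invented for illustration.  It assumes, as the text does, that the frame pointer sits 8 bytes below the top of the frame.

@verbatim
// Hypothetical helper: parse an Alpha ".frame" directive and return the
// sp-relative offset of the saved frame pointer, assuming it is stored
// 8 bytes below the top of the frame.  Returns -1 if parsing fails.
#include <cstdio>

long saved_fp_offset (const char *directive)
{
  int fp_reg, ra_reg, extra;
  long frame_size;
  // e.g. ".frame $15,32,$26,0": fp register, frame size, ra register, ...
  if (std::sscanf (directive, ".frame $%d,%ld,$%d,%d",
                   &fp_reg, &frame_size, &ra_reg, &extra) != 4)
    return -1;
  return frame_size - 8;
}
@end verbatim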
1302
So to accomplish exception handling support on the alpha, we need two
things: first, a way to figure out where the frame pointer was stored,
and second, a functional @code{__builtin_return_address} implementation
for @file{except.c} to be able to use.

Alternatively, we could just support DWARF 2 unwind info.
1309
1310 @subsection New Backend Exception Support
1311
1312 This subsection discusses various aspects of the design of the
1313 data-driven model being implemented for the exception handling backend.
1314
1315 The goal is to generate enough data during the compilation of user code,
1316 such that we can dynamically unwind through functions at run time with a
1317 single routine (@code{__throw}) that lives in libgcc.a, built by the
1318 compiler, and dispatch into associated exception handlers.
1319
1320 This information is generated by the DWARF 2 debugging backend, and
1321 includes all of the information __throw needs to unwind an arbitrary
1322 frame.  It specifies where all of the saved registers and the return
1323 address can be found at any point in the function.
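
The following is a heavily simplified sketch of the idea.  It assumes each table row covers a PC range and records only two things: how to compute the canonical frame address (CFA) from the stack pointer, and where the return address was saved relative to the CFA.  Real DWARF 2 call frame information describes every saved register and is far richer.  All names are hypothetical, a @code{std::map} stands in for stack memory, and the caller's stack pointer is assumed to equal the CFA.

@verbatim
// Hypothetical, heavily simplified unwind table; not DWARF 2 itself.
#include <cstdint>
#include <map>
#include <vector>

struct toy_unwind_row {
  std::uintptr_t pc_begin, pc_end;  // covers [pc_begin, pc_end)
  long cfa_offset_from_sp;          // CFA = sp + cfa_offset_from_sp
  long ra_offset_from_cfa;          // return address saved at CFA + this
};

struct toy_frame {
  std::uintptr_t pc;
  std::uintptr_t sp;
};

// Simulated stack memory: address -> saved word.
using toy_memory = std::map<std::uintptr_t, std::uintptr_t>;

// Replace FRAME by its caller's frame.  Returns false if no row covers
// the PC or the return-address slot is not in MEMORY.
bool toy_unwind_step (const std::vector<toy_unwind_row> &table,
                      const toy_memory &memory, toy_frame &frame)
{
  for (const toy_unwind_row &row : table)
    if (frame.pc >= row.pc_begin && frame.pc < row.pc_end)
      {
        std::uintptr_t cfa = frame.sp + row.cfa_offset_from_sp;
        toy_memory::const_iterator slot
          = memory.find (cfa + row.ra_offset_from_cfa);
        if (slot == memory.end ())
          return false;
        frame.pc = slot->second;  // resume in the caller...
        frame.sp = cfa;           // ...whose sp is assumed to be the CFA
        return true;
      }
  return false;
}
@end verbatim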
1324
The major disadvantage of enabling exceptions is:

@itemize @bullet
@item
Code into which control can be transferred from an exception handler
cannot keep values in caller-saved registers.  This should not usually
be the case in high-performance code, so the effect should be minimal.

@end itemize
1334
1335 @subsection Backend Exception Support
1336
The backend must be extended to fully support exceptions.  Right now,
exception handling works in g++ through a few hooks into the alpha
exception handling backend, which resides in the C++ frontend.  An
exception region is a segment of generated code that has a handler
associated with it.  Exception regions are represented in the generated
code as address ranges, given by a starting PC value and an ending PC
value for the region.  Some of the limitations of this scheme are:
1345
1346 @itemize @bullet
1347 @item
1348 The backend replicates insns for such things as loop unrolling and
1349 function inlining.  Right now, there are no hooks into the frontend's
1350 exception handling backend to handle the replication of insns.  When
1351 replication happens, a new exception region descriptor needs to be
1352 generated for the new region.
1353
1354 @item
The backend expects to be able to rearrange code, for things like jump
optimization.  Any rearranging of the code needs to have the exception
region descriptors updated appropriately.
1358
1359 @item
1360 The backend can eliminate dead code.  Any associated exception region
1361 descriptor that refers to fully contained code that has been eliminated
1362 should also be removed, although not doing this is harmless in terms of
1363 semantics.
1364
1365 @end itemize
1366
1367 The above is not meant to be exhaustive, but does include all things I
1368 have thought of so far.  I am sure other limitations exist.
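
These limitations follow from how regions are looked up.  A handler is found purely by PC, so any insn that is copied, moved, or deleted without updating the descriptors is no longer covered correctly.  The hypothetical sketch below finds the innermost region containing a PC, assuming regions are properly nested.  The names @code{toy_region} and @code{toy_find_handler} are invented.

@verbatim
// Hypothetical region table lookup; not the real backend.
#include <cstdint>
#include <vector>

struct toy_region {
  std::uintptr_t start, end;  // covers [start, end)
  int handler;                // stands in for the handler's label
};

// Handler of the innermost (smallest) region containing PC, or -1 if
// PC is in no region.  Assumes regions are properly nested.
int toy_find_handler (const std::vector<toy_region> &regions,
                      std::uintptr_t pc)
{
  const toy_region *best = nullptr;
  for (const toy_region &r : regions)
    if (pc >= r.start && pc < r.end
        && (best == nullptr || r.end - r.start < best->end - best->start))
      best = &r;
  return best != nullptr ? best->handler : -1;
}
@end verbatim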
1369
Below are some notes on migrating the exception handling backend code
from the C++ frontend into the backend proper.
1372
NOTEs are to be used to denote the start and the end of an exception
region.  I presume that the interface used to generate these notes in
the backend would be two functions, @code{start_exception_region} and
@code{end_exception_region} (or something like that).  The frontends are
required to call them in pairs.  When marking the end of a region, an
argument can be passed to indicate the handler for the marked region.
This can be passed in many ways; currently a tree is used.  Another
possibility would be insns for the handler, or a label that denotes a
handler.  I have a feeling insns might be the best way to pass it.

The semantics are as follows.  If an exception is thrown inside the
region, control is transferred unconditionally to the handler.  If
control passes through the handler, then the backend is to rethrow the
exception, in the context of the end of the original region.  The
handler is protected by the conventional mechanisms; it is the
frontend's responsibility to protect the handler, if special semantics
are required.
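
The pairing discipline can be modeled with a stack of open regions.  In the hypothetical sketch below, the method names follow the tentative @code{start_exception_region} and @code{end_exception_region} names above.  The handler is a plain integer, and an unmatched end is reported by throwing an exception.

@verbatim
// Hypothetical model of paired region markers; not the real backend.
#include <cstdint>
#include <stdexcept>
#include <vector>

class toy_region_recorder {
public:
  struct region {
    std::uintptr_t start, end;
    int handler;  // stands in for a tree, insns, or a label
  };

  // Frontends call these in properly nested pairs.
  void start_exception_region (std::uintptr_t pc) { open_.push_back (pc); }

  // Close the innermost open region, recording its handler.
  void end_exception_region (std::uintptr_t pc, int handler)
  {
    if (open_.empty ())
      throw std::logic_error ("end without matching start");
    regions_.push_back (region{open_.back (), pc, handler});
    open_.pop_back ();
  }

  bool balanced () const { return open_.empty (); }
  const std::vector<region> &regions () const { return regions_; }

private:
  std::vector<std::uintptr_t> open_;  // starts of still-open regions
  std::vector<region> regions_;       // in the order they were closed
};
@end verbatim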
1388
This is a very low level view, and it would be nice if the backend
supported a somewhat higher level view in addition to this one.  This
higher level could include the source line number, the name of the
source file, the name of the language that threw the exception, and
possibly the name of the exception.

Kenner may want to rope you into doing more than just the basics
required by C++, and you will have to resolve this.  He may want you to
support non-local gotos.  He may also want the backend to first scan for
an exception handler and, if none is found, allow the debugger to be
entered without any cleanups being done.  To do this, the backend would
have to know the difference between a cleanup-rethrower and a real
handler.  It would also need a way to know whether a handler `matches' a
thrown exception, and this is frontend specific.
1401
The stack unwinder is one of the hardest parts to do.  It is highly
machine dependent.  The form that Kenner seemed to like was a couple of
macros that would do the machine dependent grunt work.  One preexisting
function that might be of some use is @code{__builtin_return_address ()}.
One macro he seemed to want was @code{__builtin_return_address}.  The
other would do the hard work of fixing up the registers and adjusting the
stack pointer, frame pointer, arg pointer and so on.
1409
1410
1411 @node Free Store, Mangling, Exception Handling, Top
1412 @section Free Store
1413
1414 @code{operator new []} adds a magic cookie to the beginning of arrays
1415 for which the number of elements will be needed by @code{operator delete
1416 []}.  These are arrays of objects with destructors and arrays of objects
1417 that define @code{operator delete []} with the optional size_t argument.
1418 This cookie can be examined from a program as follows:
1419
@example
#include <stddef.h>
#include <stdio.h>

size_t nelts (void *p)
@{
  struct cookie @{
    size_t nelts __attribute__ ((aligned (sizeof (double))));
  @};

  cookie *cp = (cookie *)p;
  --cp;

  return cp->nelts;
@}

struct A @{
  ~A() @{ @}
@};

int main ()
@{
  A *ap = new A[3];
  printf ("%lu\n", (unsigned long) nelts (ap));
  delete [] ap;
  return 0;
@}
@end example
1446
1447 @section Linkage
1448 The linkage code in g++ is horribly twisted in order to meet two design goals:
1449
1450 1) Avoid unnecessary emission of inlines and vtables.
1451
1452 2) Support pedantic assemblers like the one in AIX.
1453
1454 To meet the first goal, we defer emission of inlines and vtables until
1455 the end of the translation unit, where we can decide whether or not they
1456 are needed, and how to emit them if they are.
1457
1458 @node Mangling, Concept Index, Free Store, Top
1459 @section Function name mangling for C++ and Java
1460
Both C++ and Java provide overloaded functions and methods,
which are methods with the same name but different parameter lists.
Selecting the correct version is done at compile time.
Though the overloaded functions have the same name in the source code,
they need to be translated into different assembler-level names,
since typical assemblers and linkers cannot handle overloading.
This process of encoding the parameter types with the method name
into a unique name is called @dfn{name mangling}.  The inverse
process is called @dfn{demangling}.

It is convenient that C++ and Java use compatible mangling schemes,
since this makes life easier for tools such as gdb, and it eases
integration between C++ and Java.

Note there is also a standard "Java Native Interface" (JNI) which
implements a different calling convention, and uses a different
mangling scheme.  The JNI is a rather abstract ABI that lets Java call
methods written in C or C++.
Here we are concerned with a lower-level interface primarily
intended for methods written in Java, but that can also be used for C++
(and less easily C).
1482
1483 Note that on systems that follow BSD tradition, a C identifier @code{var}
1484 would get "mangled" into the assembler name @samp{_var}.  On such
1485 systems, all other mangled names are also prefixed by a @samp{_}
1486 which is not shown in the following examples.
1487
1488 @subsection Method name mangling
1489
1490 C++ mangles a method by emitting the function name, followed by @code{__},
1491 followed by encodings of any method qualifiers (such as @code{const}),
1492 followed by the mangling of the method's class,
1493 followed by the mangling of the parameters, in order.
1494
For example, @code{Foo::bar(int, long) const} is mangled
as @samp{bar__C3Fooil}.

For a constructor, the method name is left out.
That is, @code{Foo::Foo(int, long) const} is mangled
as @samp{__C3Fooil}.
1501
1502 GNU Java does the same.
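
The rule is mechanical enough to sketch.  The hypothetical helper below assumes the parameter codes are already mangled and that the class name is a plain ASCII identifier.  An empty method name models a constructor.

@verbatim
// Hypothetical helper for the method-name rule above (ASCII names only).
#include <string>

// NAME is the method name ("" for a constructor), CLS a plain ASCII class
// name, and PARAMS the already-mangled parameter codes, e.g. "il".
std::string toy_mangle_method (const std::string &name, bool is_const,
                               const std::string &cls,
                               const std::string &params)
{
  std::string out = name + "__";
  if (is_const)
    out += 'C';                               // method qualifier
  out += std::to_string (cls.size ()) + cls;  // e.g. "3Foo"
  out += params;                              // e.g. "il"
  return out;
}
@end verbatim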
1503
1504 @subsection Primitive types
1505
1506 The C++ types @code{int}, @code{long}, @code{short}, @code{char},
1507 and @code{long long} are mangled as @samp{i}, @samp{l},
1508 @samp{s}, @samp{c}, and @samp{x}, respectively.
1509 The corresponding unsigned types have @samp{U} prefixed
1510 to the mangling.  The type @code{signed char} is mangled @samp{Sc}.
1511
1512 The C++ and Java floating-point types @code{float} and @code{double}
1513 are mangled as @samp{f} and @samp{d} respectively.
1514
1515 The C++ @code{bool} type and the Java @code{boolean} type are
1516 mangled as @samp{b}.
1517
1518 The C++ @code{wchar_t} and the Java @code{char} types are
1519 mangled as @samp{w}.
1520
1521 The Java integral types @code{byte}, @code{short}, @code{int}
1522 and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
1523 and @samp{x}, respectively.
1524
1525 C++ code that has included @code{javatypes.h} will mangle
the typedefs @code{jbyte}, @code{jshort}, @code{jint}
1527 and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
1528 and @samp{x}.  (This has not been implemented yet.)
1529
1530 @subsection Mangling of simple names
1531
1532 A simple class, package, template, or namespace name is
1533 encoded as the number of characters in the name, followed by
1534 the actual characters.  Thus the class @code{Foo}
1535 is encoded as @samp{3Foo}.
1536
1537 If any of the characters in the name are not alphanumeric
(i.e., not one of the standard ASCII letters, digits, or '_'),
1539 or the initial character is a digit, then the name is
1540 mangled as a sequence of encoded Unicode letters.
1541 A Unicode encoding starts with a @samp{U} to indicate
1542 that Unicode escapes are used, followed by the number of
1543 bytes used by the Unicode encoding, followed by the bytes
representing the encoding.  ASCII letters and
1545 non-initial digits are encoded without change.  However, all
1546 other characters (including underscore and initial digits) are
1547 translated into a sequence starting with an underscore,
1548 followed by the big-endian 4-hex-digit lower-case encoding of the character.
1549
1550 If a method name contains Unicode-escaped characters, the
1551 entire mangled method name is followed by a @samp{U}.
1552
1553 For example, the method @code{X\u0319::M\u002B(int)} is encoded as
1554 @samp{M_002b__U6X_0319iU}.
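
The following hypothetical C++ sketch implements the rules above for names given as Unicode code points.  It assumes all characters are in the Basic Multilingual Plane, since the escape uses exactly four hex digits, and it takes parameter codes already mangled.  All function names are invented for illustration.

@verbatim
// Hypothetical encoder for the rules above; BMP code points only.
#include <cstddef>
#include <cstdio>
#include <string>

static bool is_ascii_alnum (char32_t c)
{
  return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z')
         || (c >= U'0' && c <= U'9');
}

// True if NAME needs the Unicode-escaped form: it starts with a digit
// or contains something other than ASCII letters, digits, and '_'.
static bool needs_escapes (const std::u32string &name)
{
  if (!name.empty () && name[0] >= U'0' && name[0] <= U'9')
    return true;
  for (char32_t c : name)
    if (!is_ascii_alnum (c) && c != U'_')
      return true;
  return false;
}

// In escaped form, ASCII letters and non-initial digits are copied and
// every other character becomes '_' plus four lower-case hex digits.
static std::string encode_chars (const std::u32string &name, bool escaped)
{
  std::string out;
  for (std::size_t i = 0; i < name.size (); ++i)
    {
      char32_t c = name[i];
      bool initial_digit = (i == 0 && c >= U'0' && c <= U'9');
      if (!escaped || (is_ascii_alnum (c) && !initial_digit))
        out += static_cast<char> (c);
      else
        {
          char buf[16];
          std::snprintf (buf, sizeof buf, "_%04x", static_cast<unsigned> (c));
          out += buf;
        }
    }
  return out;
}

// Class, package, template, or namespace name: "3Foo" or "U6X_0319".
std::string toy_mangle_simple_name (const std::u32string &name)
{
  bool escaped = needs_escapes (name);
  std::string body = encode_chars (name, escaped);
  return (escaped ? "U" : "") + std::to_string (body.size ()) + body;
}

// Method NAME of class CLS with already-mangled parameter codes PARAMS;
// a trailing "U" marks a method name that needed escapes.
std::string toy_mangle_method_name (const std::u32string &name,
                                    const std::u32string &cls,
                                    const std::string &params)
{
  bool escaped = needs_escapes (name);
  return encode_chars (name, escaped) + "__" + toy_mangle_simple_name (cls)
         + params + (escaped ? "U" : "");
}
@end verbatim

With these helpers, the example above corresponds to @code{toy_mangle_method_name (U"M+", U"X\u0319", "i")}.  The @code{+} is @code{\u002B}.  The call yields @samp{M_002b__U6X_0319iU}.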
1555
1556
1557 @subsection Pointer and reference types
1558
1559 A C++ pointer type is mangled as @samp{P} followed by the
1560 mangling of the type pointed to.
1561
A C++ reference type is mangled as @samp{R} followed by the
1563 mangling of the type referenced.
1564
1565 A Java object reference type is equivalent
to a C++ pointer parameter, so we mangle such a parameter type
1567 as @samp{P} followed by the mangling of the class name.
1568
1569 @subsection Squangled type compression
1570
Squangling (enabled with the @samp{-fsquangle} option) uses the
@samp{B} code to indicate reuse of a previously seen type within an
identifier.  Types are recognized in a left-to-right manner and given
increasing values, which are appended to the code in the standard
manner; that is, multiple-digit numbers are delimited by @samp{_}
characters.  A type is considered to be any non-primitive type,
regardless of whether it is a parameter, a template parameter, or an
entire template.  Certain codes are considered modifiers of a type, and
are not included as part of the type.  These are the @samp{C}, @samp{V},
@samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting
constant, volatile, pointer, array, reference, unsigned, and restrict.
These codes may precede a @samp{B} type in order to make the required
modifications to the type.
1584
1585 For example:
1586 @example
1587 template <class T> class class1 @{ @};
1588
1589 template <class T> class class2 @{ @};
1590
1591 class class3 @{ @};
1592
int f(class2<class1<class3> > a, int b, const class1<class3>& c, class3 *d) @{ @}

    B0 -> class2<class1<class3> >
1596     B1 -> class1<class3>
1597     B2 -> class3
1598 @end example
This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
The @code{int} parameter is a basic type, so it does not receive a @samp{B} code.
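
The numbering of @samp{B} codes can be sketched with a small table of previously seen types. This is an illustration under simplifying assumptions, not the front end's algorithm:
@itemize @bullet
@item
@code{struct ctype}, @code{struct btable}, and @code{squangle_type} are hypothetical.
@item
A class has at most one template type argument.
@item
An outer type is numbered before its arguments, matching the order in the example's table.
@end itemize
The sketch reproduces only the parameter part after @samp{f__FG}; the leading @samp{G} is not modeled.

@verbatim
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: a class type with at most one template type argument.  */
struct ctype
{
  const char *name;
  const struct ctype *arg;   /* template argument, or NULL */
};

/* Previously seen non-primitive types; seen[i] is referenced as "Bi".  */
struct btable
{
  const struct ctype *seen[64];
  int n;
};

/* Structural equality, so separately built copies of a type match.  */
static int
same_type (const struct ctype *a, const struct ctype *b)
{
  if (a == NULL || b == NULL)
    return a == b;
  return strcmp (a->name, b->name) == 0 && same_type (a->arg, b->arg);
}

/* Append N in the standard manner: bare if 0..9, else "_N_".  */
static void
put_number (int n, char *out)
{
  if (n > 9)
    sprintf (out + strlen (out), "_%d_", n);
  else
    sprintf (out + strlen (out), "%d", n);
}

/* Append the squangled encoding of T to OUT.  */
void
squangle_type (struct btable *tab, const struct ctype *t, char *out)
{
  int i;

  for (i = 0; i < tab->n; i++)
    if (same_type (tab->seen[i], t))
      {
        strcat (out, "B");
        put_number (i, out);
        return;
      }

  /* First occurrence: number it before its argument, so an outer
     template gets a smaller index than the types inside it.  */
  if (tab->n < 64)
    tab->seen[tab->n++] = t;

  if (t->arg)
    {
      /* 't', template name, one parameter, 'Z' for a type parameter.  */
      sprintf (out + strlen (out), "t%u%s", (unsigned) strlen (t->name),
               t->name);
      put_number (1, out);
      strcat (out, "Z");
      squangle_type (tab, t->arg, out);
    }
  else
    sprintf (out + strlen (out), "%u%s", (unsigned) strlen (t->name),
             t->name);
}
@end verbatim

The sketch is fed three types in order:
@enumerate
@item
@code{class2<class1<class3> >}.
@item
@code{class1<class3>}, after @samp{iRC} is appended.
@item
@code{class3}, after @samp{P} is appended.
@end enumerate
This yields @samp{t6class21Zt6class11Z6class3iRCB1PB2}.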
1601
1602 @subsection Qualified names
1603
Both C++ and Java allow a class to be lexically nested inside another
class.  C++ also supports namespaces, and Java supports packages.
1607
1608 These are all mangled the same way:  First the letter @samp{Q}
1609 indicates that we are emitting a qualified name.
1610 That is followed by the number of parts in the qualified name.
1611 If that number is 9 or less, it is emitted with no delimiters.
1612 Otherwise, an underscore is written before and after the count.
1613 Then follows each part of the qualified name, as described above.
1614
For example, @code{Foo::\u0319::Bar} is encoded as
@samp{Q33FooU5_03193Bar}.  This is a three-part name whose middle part
is Unicode-escaped as @samp{U5_0319}.
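
A sketch of the @samp{Q} rule in C follows. It assumes every part is a plain ASCII name that needs no Unicode escapes. The function names are hypothetical.

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append the part count: bare if 9 or less, else between underscores.  */
static void
put_count (int n, char *out)
{
  if (n <= 9)
    sprintf (out + strlen (out), "%d", n);
  else
    sprintf (out + strlen (out), "_%d_", n);
}

/* Hypothetical sketch: append the mangling of PARTS[0]::...::PARTS[N-1]
   to OUT.  Parts are assumed to be plain ASCII names; a name with a
   single part is written without the 'Q'.  */
void
mangle_qualified (const char *const *parts, int n, char *out)
{
  int i;

  if (n > 1)
    {
      strcat (out, "Q");
      put_count (n, out);
    }
  for (i = 0; i < n; i++)
    sprintf (out + strlen (out), "%u%s", (unsigned) strlen (parts[i]),
             parts[i]);
}
@end verbatim

For @code{java::lang::String}, this produces @samp{Q34java4lang6String}. A ten-part name starts with @samp{Q_10_}.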
1617
Squangling uses the letter @samp{K} to indicate a
remembered portion of a qualified name.  As the qualified names in an
identifier are processed, they are numbered and remembered in a
manner similar to the @samp{B} type compression code.
Names are recognized from left to right and given increasing numbers, which
are appended to the code in the standard manner; that is, numbers with more
than one digit are delimited by @samp{_} characters.
1625
For example:
1627 @example
1628 class Andrew
1629 @{
1630   class WasHere
1631   @{
1632       class AndHereToo
1633       @{
1634       @};
1635   @};
1636 @};
1637
void f(Andrew& r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}
1639
1640    K0 ->  Andrew
1641    K1 ->  Andrew::WasHere
1642    K2 ->  Andrew::WasHere::AndHereToo
1643 @end example
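Function @code{f} would be mangled as
@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}.
In the second parameter, @samp{Q2K07WasHere} is a two-part qualified name
whose first part is the remembered name @samp{K0}.  In the third parameter,
@samp{K1} stands for @code{Andrew::WasHere}.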
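
The @samp{K} bookkeeping can be sketched as a table of remembered name prefixes. The sketch makes three assumptions beyond what is stated above:
@itemize @bullet
@item
Every new prefix of a name is remembered as it is emitted.
@item
The longest remembered prefix is reused.
@item
A name remembered in full is written as a bare @samp{K} code with no @samp{Q}.
@end itemize
@code{struct ktable} and @code{squangle_qualified} are hypothetical, and all parts are plain ASCII.

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_K 64
#define MAX_NAME 256

/* Hypothetical table of remembered names, stored as "A::B" strings;
   names[i] is referenced as "Ki".  */
struct ktable
{
  char names[MAX_K][MAX_NAME];
  int n;
};

/* Append N bare if 0..9, else as "_N_".  */
static void
put_number (int n, char *out)
{
  if (n > 9)
    sprintf (out + strlen (out), "_%d_", n);
  else
    sprintf (out + strlen (out), "%d", n);
}

/* Join the first LEN parts with "::" into KEY.  */
static void
join_parts (const char *const *parts, int len, char *key)
{
  int i;

  key[0] = '\0';
  for (i = 0; i < len; i++)
    {
      if (i > 0)
        strcat (key, "::");
      strcat (key, parts[i]);
    }
}

/* Index of the first LEN parts in TAB, or -1 if not remembered.  */
static int
k_lookup (const struct ktable *tab, const char *const *parts, int len)
{
  char key[MAX_NAME];
  int i;

  join_parts (parts, len, key);
  for (i = 0; i < tab->n; i++)
    if (strcmp (tab->names[i], key) == 0)
      return i;
  return -1;
}

/* Append the squangled form of PARTS[0]::...::PARTS[N-1] to OUT.  */
void
squangle_qualified (struct ktable *tab, const char *const *parts, int n,
                    char *out)
{
  char body[MAX_NAME] = "";
  int known = 0, k, len, i, count;

  /* Reuse the longest prefix that has already been remembered.  */
  for (len = n; len >= 1; len--)
    if ((k = k_lookup (tab, parts, len)) >= 0)
      {
        known = len;
        strcat (body, "K");
        put_number (k, body);
        break;
      }

  /* Emit the remaining parts, remembering each new prefix.  */
  for (i = known; i < n; i++)
    {
      sprintf (body + strlen (body), "%u%s", (unsigned) strlen (parts[i]),
               parts[i]);
      if (tab->n < MAX_K)
        join_parts (parts, i + 1, tab->names[tab->n++]);
    }

  /* A K reference counts as one part of the qualified name.  */
  count = (known > 0) + (n - known);
  if (count > 1)
    {
      strcat (out, "Q");
      put_number (count, out);
    }
  strcat (out, body);
}
@end verbatim

Processing the three parameter types in order, each prefixed with @samp{R}, after @samp{f__F} reproduces @samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}.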
1646
Occasionally either a @samp{B} or a @samp{K} code could be used; in that
case the @samp{B} code is always preferred.  For instance, the example
in the section on @samp{B} mangling could have used a @samp{K} code
instead of @samp{B2}.
1651
1652 @subsection Templates
1653
A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters.  If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type.
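
The class template rule can be sketched as follows. The sketch is hypothetical in three respects:
@itemize @bullet
@item
It handles only type arguments.
@item
It takes the arguments already mangled.
@item
It assumes the parameter count uses the same underscore convention as qualified-name counts.
@end itemize

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append N bare if 9 or less, else as "_N_" (assumed convention).  */
static void
put_count (int n, char *out)
{
  if (n <= 9)
    sprintf (out + strlen (out), "%d", n);
  else
    sprintf (out + strlen (out), "_%d_", n);
}

/* Hypothetical sketch: append the mangling of NAME<ARGS[0], ...> to OUT,
   where NAME is a plain ASCII name and each ARGS[i] is the mangling of
   a type argument.  */
void
mangle_class_template (const char *name, const char *const *args,
                       int nargs, char *out)
{
  int i;

  sprintf (out + strlen (out), "t%u%s", (unsigned) strlen (name), name);
  put_count (nargs, out);
  for (i = 0; i < nargs; i++)
    /* 'Z' marks a type argument.  */
    sprintf (out + strlen (out), "Z%s", args[i]);
}
@end verbatim

For example, @code{JArray} with the single argument @samp{PQ34java4lang6String} gives @samp{t6JArray1ZPQ34java4lang6String}. That is the template part of the Java array example in the next section.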
1659
1660 A function template specialization (either an instantiation or an
1661 explicit specialization) is encoded by an @samp{H} followed by the
1662 encoding of the template parameters, as described above, followed by an
1663 @samp{_}, the encoding of the argument types to the template function
1664 (not the specialization), another @samp{_}, and the return type.  (Like
1665 the argument types, the return type is the return type of the function
template, not the specialization.)  Template parameters in the argument
and return types are encoded by an @samp{X} for type parameters or a
@samp{Y} for constant parameters, followed by an index indicating their
position in the template parameter list declaration and by their template
depth.
1670
1671 @subsection Arrays
1672
1673 C++ array types are mangled by emitting @samp{A}, followed by
1674 the length of the array, followed by an @samp{_}, followed by
the mangling of the element type.  Array parameter types normally
decay into pointer types, so this encoding is rarely seen.
1678
1679 Java arrays are objects.  A Java type @code{T[]} is mangled
1680 as if it were the C++ type @code{JArray<T>}.
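For example, @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.  The outer @samp{P} appears
because an array, like any Java object, is accessed through an object
reference, which is mangled like a C++ pointer.  The inner @samp{P}
appears for the same reason, since the @code{String} elements are
object references too.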
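
Both array rules reduce to string composition, as sketched below. The function names are hypothetical. @code{elem} is assumed to be the already-mangled element type, and the C++ form follows the length-based description above.

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: C++ array of LENGTH elements of the type mangled as
   ELEM: 'A', the length, '_', then the element type.  */
void
mangle_cxx_array (unsigned long length, const char *elem, char *out)
{
  sprintf (out + strlen (out), "A%lu_%s", length, elem);
}

/* Hypothetical: Java ELEM[] is mangled like a pointer to JArray<ELEM>.  */
void
mangle_java_array (const char *elem, char *out)
{
  sprintf (out + strlen (out), "Pt6JArray1Z%s", elem);
}
@end verbatim

With an @code{int} element (@samp{i}) and a length of 10, @code{mangle_cxx_array} appends @samp{A10_i}. @code{mangle_java_array} applied to @samp{PQ34java4lang6String} reproduces the example above.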
1683
1684 @subsection Static fields
1685
1686 Both C++ and Java classes can have static fields.
1687 These are allocated statically, and are shared among all instances.
1688
The mangling starts with a prefix (@samp{_} on most systems), which is
followed by the mangling of the class name, followed by the "joiner",
and finally the field name.
The joiner (see @code{JOINER} in @file{cp-tree.h}) is a special
separator character.  For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems).  If the joiner is @samp{_}, then the prefix
is @samp{__static_} instead of just @samp{_}.
1697
For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
1699 would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
1700 (or rarely @samp{__static_Q23Foo3Bar_var}).
1701
If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the "joiner".
Thus, with @samp{.} as the joiner, @code{\u1234Foo::var\u3445} becomes
@samp{_U8_1234FooU.var_3445}.
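
A sketch of the static-field rule follows. The function @code{mangle_static_field} is hypothetical. It takes four inputs:
@itemize @bullet
@item
the class mangling, as a string;
@item
the already-escaped field name, as a string;
@item
the target's joiner character;
@item
a flag saying whether the field name needed Unicode escapes.
@end itemize

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: CLASS_MANGLED is the class-name mangling,
   FIELD the already-escaped field name, JOINER the target's joiner
   ('$', '.' or '_'), and UNICODE nonzero if the field name needed
   Unicode escapes.  */
void
mangle_static_field (const char *class_mangled, const char *field,
                     char joiner, int unicode, char *out)
{
  /* A '_' joiner uses the longer "__static_" prefix.  */
  const char *prefix = joiner == '_' ? "__static_" : "_";

  /* The Unicode indicator 'U', if any, goes just before the joiner.  */
  sprintf (out + strlen (out), "%s%s%s%c%s", prefix, class_mangled,
           unicode ? "U" : "", joiner, field);
}
@end verbatim

The function reproduces all four encodings given above: @samp{_Q23Foo3Bar$var}, @samp{_Q23Foo3Bar.var}, @samp{__static_Q23Foo3Bar_var}, and @samp{_U8_1234FooU.var_3445}.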
1705
1706 @subsection Table of demangling code characters
1707
1708 The following special characters are used in mangling:
1709
1710 @table @samp
1711 @item A
1712 Indicates a C++ array type.
1713
1714 @item b
1715 Encodes the C++ @code{bool} type,
1716 and the Java @code{boolean} type.
1717
1718 @item B
Used for squangling; similar in concept to the non-squangled @samp{T} code.
1720
1721 @item c
1722 Encodes the C++ @code{char} type, and the Java @code{byte} type.
1723
1724 @item C
1725 A modifier to indicate a @code{const} type.
1726 Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).
1728
1729 @item d
1730 Encodes the C++ and Java @code{double} types.
1731
1732 @item e
Indicates unspecified extra arguments (@code{...}).
1734
1735 @item E
1736 Indicates the opening parenthesis of an expression.
1737
1738 @item f
1739 Encodes the C++ and Java @code{float} types.
1740
1741 @item F
1742 Used to indicate a function type.
1743
1744 @item H
1745 Used to indicate a template function.
1746
1747 @item i
1748 Encodes the C++ and Java @code{int} types.
1749
1750 @item I
1751 Encodes typedef names of the form @code{int@var{n}_t}, where @var{n} is a
1752 positive decimal number.  The @samp{I} is followed by either two
hexadecimal digits, which encode the value of @var{n}, or by an
arbitrary number of hexadecimal digits between underscores.  For
1755 example, @samp{I40} encodes the type @code{int64_t}, and @samp{I_200_}
1756 encodes the type @code{int512_t}.
1757
1758 @item J
1759 Indicates a complex type.
1760
1761 @item K
1762 Used by squangling to compress qualified names.
1763
1764 @item l
1765 Encodes the C++ @code{long} type.
1766
1767 @item n
1768 Immediate repeated type. Followed by the repeat count.
1769
1770 @item N
1771 Repeated type. Followed by the repeat count of the repeated type,
1772 followed by the type index of the repeated type. Due to a bug in
g++ 2.7.2, this is only generated if the index is 0.  Superseded by
1774 @samp{n} when squangling.
1775
1776 @item P
1777 Indicates a pointer type.  Followed by the type pointed to.
1778
1779 @item Q
1780 Used to mangle qualified names, which arise from nested classes.
1781 Also used for namespaces.
1782 In Java used to mangle package-qualified names, and inner classes.
1783
1784 @item r
1785 Encodes the GNU C++ @code{long double} type.
1786
1787 @item R
1788 Indicates a reference type.  Followed by the referenced type.
1789
1790 @item s
Encodes the C++ and Java @code{short} types.
1792
1793 @item S
1794 A modifier that indicates that the following integer type is signed.
1795 Only used with @code{char}.
1796
1797 Also used as a modifier to indicate a static member function.
1798
1799 @item t
1800 Indicates a template instantiation.
1801
1802 @item T
1803 A back reference to a previously seen type.
1804
1805 @item U
1806 A modifier that indicates that the following integer type is unsigned.
1807 Also used to indicate that the following class or namespace name
1808 is encoded using Unicode-mangling.
1809
1810 @item u
1811 The @code{restrict} type qualifier.
1812
1813 @item v
1814 Encodes the C++ and Java @code{void} types.
1815
1816 @item V
1817 A modifier for a @code{volatile} type or method.
1818
1819 @item w
Encodes the C++ @code{wchar_t} type and the Java @code{char} type.
1821
1822 @item W
1823 Indicates the closing parenthesis of an expression.
1824
1825 @item x
1826 Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.
1827
1828 @item X
1829 Encodes a template type parameter, when part of a function type.
1830
1831 @item Y
1832 Encodes a template constant parameter, when part of a function type.
1833
1834 @item Z
1835 Used for template type parameters.
1836
1837 @end table
1838
1839 The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
1840 also seem to be used for obscure purposes ...
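
The sketch below shows how the table is used. It maps a few built-in types to their code letters, applies the @samp{U} modifier, and forms @samp{I} codes. The enumeration and function names are hypothetical. The sketch also assumes two things about @samp{I} codes: the two-digit form is used exactly when @var{n} fits in two hexadecimal digits, and the hex digits are lower case.

@verbatim
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enumeration of some built-in types.  */
enum builtin_type
{
  BT_VOID, BT_BOOL, BT_CHAR, BT_SHORT, BT_INT, BT_LONG,
  BT_LONG_LONG, BT_FLOAT, BT_DOUBLE, BT_LONG_DOUBLE, BT_WCHAR
};

/* Code letter for each built-in type, taken from the table above.  */
char
builtin_code (enum builtin_type t)
{
  switch (t)
    {
    case BT_VOID: return 'v';
    case BT_BOOL: return 'b';
    case BT_CHAR: return 'c';
    case BT_SHORT: return 's';
    case BT_INT: return 'i';
    case BT_LONG: return 'l';
    case BT_LONG_LONG: return 'x';
    case BT_FLOAT: return 'f';
    case BT_DOUBLE: return 'd';
    case BT_LONG_DOUBLE: return 'r';
    case BT_WCHAR: return 'w';
    }
  return '?';
}

/* Append a built-in type, preceded by the 'U' modifier if unsigned.  */
void
mangle_builtin (enum builtin_type t, int is_unsigned, char *out)
{
  sprintf (out + strlen (out), "%s%c", is_unsigned ? "U" : "",
           builtin_code (t));
}

/* Append the 'I' code for int<N>_t: two hex digits when N fits
   (an assumption), otherwise hex digits between underscores.  */
void
mangle_intN (unsigned n, char *out)
{
  if (n <= 0xff)
    sprintf (out + strlen (out), "I%02x", n);
  else
    sprintf (out + strlen (out), "I_%x_", n);
}
@end verbatim

For example, @code{unsigned int} gives @samp{Ui}. @code{int64_t} gives @samp{I40}, and @code{int512_t} gives @samp{I_200_}.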
1841
1842 @node Concept Index,  , Mangling, Top
1843
1844 @section Concept Index
1845
1846 @printindex cp
1847
1848 @bye