gcc/cp/gxxint.texi

   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename g++int.info
   4 @settitle G++ internals
   5 @setchapternewpage odd
   6 @c %**end of header
   7
   8 @node Top, Limitations of g++, (dir), (dir)
   9 @chapter Internal Architecture of the Compiler
  10
  11 This is meant to describe the C++ front-end for gcc in detail.
  12 Questions and comments to Jason Merrill @email{jason@@redhat.com} and
  13 Mark Mitchell @email{mark@@codesourcery.com}.
  14
  15 @menu
  16 * Limitations of g++::
  17 * Routines::
  18 * Implementation Specifics::
  19 * Glossary::
  20 * Macros::
  21 * Typical Behavior::
  22 * Coding Conventions::
  23 * Templates::
  24 * Access Control::
  25 * Error Reporting::
  26 * Parser::
  27 * Exception Handling::
  28 * Free Store::
  29 * Mangling::  Function name mangling for C++ and Java
  30 * Concept Index::
  31 @end menu
  32
  33 @node Limitations of g++, Routines, Top, Top
  34 @section Limitations of g++
  35
  36 @itemize @bullet
  37 @item
  38 Limitations on input source code: at most 240 nesting levels with the
  39 parser stack size (@code{YYSTACKSIZE}) set to 500 (the default); each
  40 nesting level requires around 16.4k of swap space.  The parser needs
  41 about 2.09 * number of nesting levels worth of stack space.
  42
  43 @cindex pushdecl_class_level
  44 @item
  45 I suspect there are other uses of pushdecl_class_level that do not call
  46 set_identifier_type_value in tandem with the call to
  47 pushdecl_class_level.  It would seem to be an omission.
  48
  49 @end itemize
  50
  51 @node Routines, Implementation Specifics, Limitations of g++, Top
  52 @section Routines
  53
  54 This section describes some of the routines used in the C++ front-end.
  55
  56 @code{build_vtable} and @code{prepare_fresh_vtable} are used only within
  57 the @file{cp-class.c} file, and only in @code{finish_struct} and
  58 @code{modify_vtable_entries}.
  59
  60 @code{build_vtable}, @code{prepare_fresh_vtable}, and
  61 @code{finish_struct} are the only routines that set @code{DECL_VPARENT}.
  62
  63 @code{finish_struct} can steal the virtual function table from parents;
  64 this prevents @code{related_vslot} from working.  When
  65 @code{finish_struct} steals, we know that
  66
  67 @example
  68 get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
  69 @end example
  70
  71 @noindent
  72 will get the related binfo.
  73
  74 @code{layout_basetypes} does something with the VIRTUALS.
  75
  76 Supposedly (according to Tiemann) most of the breadth-first searching
  77 done, as in @code{get_base_distance} and in @code{get_binfo}, was not
  78 the result of any design decision.  I have since found out that at least
  79 one part of the compiler needs the notion of depth-first binfo searching,
  80 so I am going to try to convert the whole thing; it should just work.
  81 The term left-most refers to the depth-first left-most node.  It uses
  82 @code{MAIN_VARIANT == type} as the condition to get left-most, because
  83 the things that have @code{BINFO_OFFSET}s of zero are shared and will
  84 have themselves as their own @code{MAIN_VARIANT}s.  The non-shared right
  85 ones are copies of the left-most one; hence if it is its own
  86 @code{MAIN_VARIANT}, we know it IS a left-most one, and if it is not, it
  87 is a non-left-most one.
  88
  89 @code{get_base_distance}'s path and distance matter in its use in:
  90
  91 @itemize @bullet
  92 @item
  93 @code{prepare_fresh_vtable} (the code is probably wrong)
  94 @item
  95 @code{init_vfields} depends upon distance, probably in a safe way;
  96 @code{build_offset_ref} might use partial paths to do further lookups;
  97 @code{hack_identifier} is probably not properly checking access.
  98
  99 @item
 100 @code{get_first_matching_virtual} probably should check for
 101 @code{get_base_distance} returning -2.
 102
 103 @item
 104 @code{resolve_offset_ref} should be called in a more deterministic
 105 manner.  Right now, it is called in some random contexts, like for
 106 arguments at @code{build_method_call} time, @code{default_conversion}
 107 time, @code{convert_arguments} time, @code{build_unary_op} time,
 108 @code{build_c_cast} time, @code{build_modify_expr} time,
 109 @code{convert_for_assignment} time, and
 110 @code{convert_for_initialization} time.
 111
 112 But there are still more contexts it needs to be called in; one was the
 113 ever simple:
 114
 115 @example
 116 if (obj.*pmi != 7)
 117    @dots{}
 118 @end example
 119
 120 It seems that the problems were due to the fact that @code{TREE_TYPE} of
 121 the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
 122 of the referent (like @code{INTEGER_TYPE}).  This problem was fixed by
 123 changing @code{default_conversion} to check @code{TREE_CODE (x)},
 124 instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
 125 was @code{OFFSET_TYPE}.
 126
 127 @end itemize
 128
 129 @node Implementation Specifics, Glossary, Routines, Top
 130 @section Implementation Specifics
 131
 132 @itemize @bullet
 133 @item Explicit Initialization
 134
 135 The global list @code{current_member_init_list} contains the list of
 136 mem-initializers specified in a constructor declaration.  For example:
 137
 138 @example
 139 foo::foo() : a(1), b(2) @{@}
 140 @end example
 141
 142 @noindent
 143 will initialize @samp{a} with 1 and @samp{b} with 2.
 144 @code{expand_member_init} places each initialization (a with 1) on the
 145 global list.  Then, when the fndecl is being processed,
 146 @code{emit_base_init} runs down the list, initializing them.  It used to
 147 be the case that g++ first ran down @code{current_member_init_list},
 148 then ran down the list of members initializing the ones that weren't
 149 explicitly initialized.  Things were rewritten to perform the
 150 initializations in order of declaration in the class.  So, for the above
 151 example, @samp{a} and @samp{b} will be initialized in the order that
 152 they were declared:
 153
 154 @example
 155 class foo @{ public: int b; int a; foo (); @};
 156 @end example
 157
 158 @noindent
 159 Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
 160 initialized with 1, regardless of how they're listed in the mem-initializer.
 161
 162 @item The Explicit Keyword
 163
 164 Using @code{explicit} on a constructor causes @code{grokdeclarator}
 165 to set the field @code{DECL_NONCONVERTING_P}.  That value is used by
 166 @code{build_method_call} and @code{build_user_type_conversion_1} to decide
 167 if a particular constructor should be used as a candidate for conversions.
 168
 169 @end itemize
 170
 171 @node Glossary, Macros, Implementation Specifics, Top
 172 @section Glossary
 173
 174 @table @r
 175 @item binfo
 176 The main data structure in the compiler used to represent the
 177 inheritance relationships between classes.  The data in the binfo can be
 178 accessed by the BINFO_ accessor macros.
 179
 180 @item vtable
 181 @itemx virtual function table
 182
 183 The virtual function table holds information used in virtual function
 184 dispatching.  In the compiler, they are usually referred to as vtables,
 185 or vtbls.  The first index is not used in the normal way; I believe it
 186 is probably used for the virtual destructor.
 187
 188 @item vfield
 189
 190 vfields can be thought of as the base information needed to build
 191 vtables.  For every vtable that exists for a class, there is a vfield.
 192 See also vtable and virtual function table pointer.  When a type is used
 193 as a base class to another type, the virtual function table for the
 194 derived class can be based upon the vtable for the base class, just
 195 extended to include the additional virtual methods declared in the
 196 derived class.  The virtual function table from a virtual base class is
 197 never reused in a derived class.  @code{is_normal} depends upon this.
 198
 199 @item virtual function table pointer
 200
 201 These are @code{FIELD_DECL}s that are pointer types that point to
 202 vtables.  See also vtable and vfield.
 203 @end table
 204
 205 @node Macros, Typical Behavior, Glossary, Top
 206 @section Macros
 207
 208 This section describes some of the macros used on trees.  The list
 209 should be alphabetical.  Eventually all macros should be documented
 210 here.
 211
 212 @table @code
 213 @item BINFO_BASETYPES
 214 A vector of additional binfos for the types inherited by this basetype.
 215 The binfos are fully unshared (except for virtual bases, in which
 216 case the binfo structure is shared).
 217
 218    If this basetype describes type D as inherited in C,
 219    and if the basetypes of D are E and F,
 220    then this vector contains binfos for inheritance of E and F by C.
 221
 222 Has values of:
 223
 224         TREE_VECs
 225
 226
 227 @item BINFO_INHERITANCE_CHAIN
 228 Temporarily used to represent specific inheritances.  It usually points
 229 to the binfo associated with the lesser derived type, but it can be
 230 reversed by reverse_path.  For example:
 231
 232 @example
 233         Z ZbY   least derived
 234         |
 235         Y YbX
 236         |
 237         X Xb    most derived
 238
 239 TYPE_BINFO (X) == Xb
 240 BINFO_INHERITANCE_CHAIN (Xb) == YbX
 241 BINFO_INHERITANCE_CHAIN (YbX) == ZbY
 242 BINFO_INHERITANCE_CHAIN (ZbY) == 0
 243 @end example
 244
 245 Not sure if the above is really true; get_base_distance has it point
 246 towards the most derived type, opposite from above.
 247
 248 Set by build_vbase_path, recursive_bounded_basetype_p,
 249 get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
 250
 251 What things can this be used on:
 252
 253         TREE_VECs that are binfos
 254
 255
 256 @item BINFO_OFFSET
 257 The offset where this basetype appears in its containing type.
 258 BINFO_OFFSET slot holds the offset (in bytes) from the base of the
 259 complete object to the base of the part of the object that is allocated
 260 on behalf of this `type'.  This is always 0 except when there is
 261 multiple inheritance.
 262
 263 Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.
 264
 265
 266 @item BINFO_VIRTUALS
 267 A unique list of functions for the virtual function table.  See also
 268 TYPE_BINFO_VIRTUALS.
 269
 270 What things can this be used on:
 271
 272         TREE_VECs that are binfos
 273
 274
 275 @item BINFO_VTABLE
 276 Used to find the VAR_DECL that is the virtual function table associated
 277 with this binfo.  See also TYPE_BINFO_VTABLE.  To get the virtual
 278 function table pointer, see CLASSTYPE_VFIELD.
 279
 280 What things can this be used on:
 281
 282         TREE_VECs that are binfos
 283
 284 Has values of:
 285
 286         VAR_DECLs that are virtual function tables
 287
 288
 289 @item BLOCK_SUPERCONTEXT
 290 In the outermost scope of each function, it points to the FUNCTION_DECL
 291 node.  It aids in better DWARF support of inline functions.
 292
 293
 294 @item CLASSTYPE_TAGS
 295 CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
 296 class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
 297 these and calls pushtag on them.)
 298
 299 finish_struct scans these to produce TYPE_DECLs to add to the
 300 TYPE_FIELDS of the type.
 301
 302 It is expected that the name found in the TREE_PURPOSE slot is unique;
 303 resolve_scope_to_name is one such place that depends upon this
 304 uniqueness.
 305
 306
 307 @item CLASSTYPE_METHOD_VEC
 308 The following is true after finish_struct has been called (on the
 309 class?) but not before.  Before finish_struct is called, things are
 310 different to some extent.  Contains a TREE_VEC of methods of the class.
 311 The TREE_VEC_LENGTH is the number of differently named methods plus one
 312 for the 0th entry.  The 0th entry is always allocated, and reserved for
 313 ctors and dtors.  If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
 314 Each entry of the TREE_VEC is a FUNCTION_DECL.  For each FUNCTION_DECL,
 315 there is a DECL_CHAIN slot.  If the FUNCTION_DECL is the last one with a
 316 given name, the DECL_CHAIN slot is NULL_TREE.  Otherwise it is the next
 317 method that has the same name (but a different signature).  Using the
 318 DECL_CHAIN slot in this way would not seem to prevent us from calling
 319 pushdecl to put the method in the global scope (which would overwrite
 320 the TREE_CHAIN slot), because the two use different _CHAINs.
 321 finish_struct_methods sets up one version of the TREE_CHAIN slots on
 322 the FUNCTION_DECLs.
 323
 324 friends are kept in TREE_LISTs, so that there's no need to use their
 325 TREE_CHAIN slot for anything.
 326
 327 Has values of:
 328
 329         TREE_VECs
 330
 331
 332 @item CLASSTYPE_VFIELD
 333 Seems to be in the process of being renamed TYPE_VFIELD.  Use on types
 334 to get the main virtual function table pointer.  To get the virtual
 335 function table use BINFO_VTABLE (TYPE_BINFO ()).
 336
 337 Has values of:
 338
 339         FIELD_DECLs that are virtual function table pointers
 340
 341 What things can this be used on:
 342
 343         RECORD_TYPEs
 344
 345
 346 @item DECL_CLASS_CONTEXT
 347 Identifies the context that the _DECL was found in.  For virtual function
 348 tables, it points to the type associated with the virtual function
 349 table.  See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.
 350
 351 The difference between this and DECL_CONTEXT is that for virtual
 352 functions like:
 353
 354 @example
 355 struct A
 356 @{
 357   virtual int f ();
 358 @};
 359
 360 struct B : A
 361 @{
 362   int f ();
 363 @};
 364
 365 DECL_CONTEXT (A::f) == A
 366 DECL_CLASS_CONTEXT (A::f) == A
 367
 368 DECL_CONTEXT (B::f) == A
 369 DECL_CLASS_CONTEXT (B::f) == B
 370 @end example
 371
 372 Has values of:
 373
 374         RECORD_TYPEs, or UNION_TYPEs
 375
 376 What things can this be used on:
 377
 378         TYPE_DECLs, _DECLs
 379
 380
 381 @item DECL_CONTEXT
 382 Identifies the context that the _DECL was found in.  Can be used on
 383 virtual function tables to find the type associated with the virtual
 384 function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
 385 better access method.  Internally the same as DECL_FIELD_CONTEXT, so
 386 don't use both.  See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
 387 DECL_CLASS_CONTEXT.
 388
 389 Has values of:
 390
 391         RECORD_TYPEs
 392
 393
 394 What things can this be used on:
 395
 396 @display
 397 VAR_DECLs that are virtual function tables
 398 _DECLs
 399 @end display
 400
 401
 402 @item DECL_FIELD_CONTEXT
 403 Identifies the context that the FIELD_DECL was found in.  Internally the
 404 same as DECL_CONTEXT, so don't use both.  See also DECL_CONTEXT,
 405 DECL_FCONTEXT and DECL_CLASS_CONTEXT.
 406
 407 Has values of:
 408
 409         RECORD_TYPEs
 410
 411 What things can this be used on:
 412
 413 @display
 414 FIELD_DECLs that are virtual function pointers
 415 FIELD_DECLs
 416 @end display
 417
 418
 419 @item DECL_NAME
 420
 421 Has values of:
 422
 423 @display
 424 0 for things that don't have names
 425 IDENTIFIER_NODEs for TYPE_DECLs
 426 @end display
 427
 428 @item DECL_IGNORED_P
 429 A bit that can be set to inform the debug information output routines in
 430 the back-end that a certain _DECL node should be totally ignored.
 431
 432 Used in cases where it is known that the debugging information will be
 433 output in another file, or where a sub-type is known not to be needed
 434 because the enclosing type is not needed.
 435
 436 A compiler constructed virtual destructor in derived classes that do not
 437 define an explicit destructor that was defined explicitly in a base
 438 class has this bit set as well.  Also used on __FUNCTION__ and
 439 __PRETTY_FUNCTION__ to mark that they are ``compiler generated.''  c-decl and
 440 c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
 441 and ``user-invisible variable.''
 442
 443 Functions built by the C++ front-end such as default destructors,
 444 virtual destructors and default constructors want to be marked that
 445 they are compiler generated, but unsure why.
 446
 447 Currently, it is used in an absolute way in the C++ front-end, as an
 448 optimization, to tell the debug information output routines to not
 449 generate debugging information that will be output by another separately
 450 compiled file.
 451
 452
 453 @item DECL_VIRTUAL_P
 454 A flag used on FIELD_DECLs and VAR_DECLs.  (Documentation in tree.h is
 455 wrong.)  Used in VAR_DECLs to indicate that the variable is a vtable.
 456 It is also used in FIELD_DECLs for vtable pointers.
 457
 458 What things can this be used on:
 459
 460         FIELD_DECLs and VAR_DECLs
 461
 462
 463 @item DECL_VPARENT
 464 Used to point to the parent type of the vtable if there is one, else it
 465 is just the type associated with the vtable.  Because of the sharing of
 466 virtual function tables that goes on, this slot is not very useful, and
 467 is in fact, not used in the compiler at all.  It can be removed.
 468
 469 What things can this be used on:
 470
 471         VAR_DECLs that are virtual function tables
 472
 473 Has values of:
 474
 475         RECORD_TYPEs maybe UNION_TYPEs
 476
 477
 478 @item DECL_FCONTEXT
 479 Used to find the first baseclass in which this FIELD_DECL is defined.
 480 See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.
 481
 482 How it is used:
 483
 484         Used when writing out debugging information about vfield and
 485         vbase decls.
 486
 487 What things can this be used on:
 488
 489         FIELD_DECLs that are virtual function pointers
 490         FIELD_DECLs
 491
 492
 493 @item DECL_REFERENCE_SLOT
 494 Used to hold the initializer for the reference.
 495
 496 What things can this be used on:
 497
 498         PARM_DECLs and VAR_DECLs that have a reference type
 499
 500
 501 @item DECL_VINDEX
 502 Used for FUNCTION_DECLs in two different ways.  Before the structure
 503 containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
 504 FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
 505 FUNCTION_DECL will replace as a virtual function.  When the class is
 506 laid out, this pointer is changed to an INTEGER_CST node which is
 507 suitable to find an index into the virtual function table.  See
 508 get_vtable_entry as to how one can find the right index into the virtual
 509 function table.  The first index, 0, of a virtual function table is not
 510 used in the normal way, so the first real index is 1.
 511
 512 DECL_VINDEX may be a TREE_LIST, that would seem to be a list of
 513 overridden FUNCTION_DECLs.  add_virtual_function has code to deal with
 514 this when it uses the variable base_fndecl_list, but it would seem that
 515 somehow, it is possible for the TREE_LIST to persist until method_call,
 516 and it should not.
 517
 518
 519 What things can this be used on:
 520
 521         FUNCTION_DECLs
 522
 523
 524 @item DECL_SOURCE_FILE
 525 Identifies what source file a particular declaration was found in.
 526
 527 Has values of:
 528
 529         "<built-in>" on TYPE_DECLs to mean the typedef is built in
 530
 531
 532 @item DECL_SOURCE_LINE
 533 Identifies what source line number in the source file the declaration
 534 was found at.
 535
 536 Has values of:
 537
 538 @display
 539 0 for an undefined label
 540
 541 0 for TYPE_DECLs that are internally generated
 542
 543 0 for FUNCTION_DECLs for functions generated by the compiler
 544         (not yet, but should be)
 545
 546 0 for ``magic'' arguments to functions, that the user has no
 547         control over
 548 @end display
 549
 550
 551 @item TREE_USED
 552
 553 Has values of:
 554
 555         0 for unused labels
 556
 557
 558 @item TREE_ADDRESSABLE
 559 A flag that is set for any type that has a constructor.
 560
 561
 562 @item TREE_COMPLEXITY
 563 They seem to be a kludgy way to track recursion, popping, and pushing.
 564 They only appear in cp-decl.c and cp-decl2.c, so they are good candidates
 565 for proper fixing, and removal.
 566
 567
 568 @item TREE_HAS_CONSTRUCTOR
 569 A flag to indicate when a CALL_EXPR represents a call to a constructor.
 570 If set, we know that the type of the object is the complete type of the
 571 object, and that the value returned is nonnull.  When used in this
 572 fashion, it is an optimization.  Can also be used on SAVE_EXPRs to
 573 indicate when they are of fixed type and nonnull.  Can also be used on
 574 INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.
 575
 576
 577 @item TREE_PRIVATE
 578 Set for FIELD_DECLs by finish_struct.  But not uniformly set.
 579
 580 The following routines do something with PRIVATE access:
 581 build_method_call, alter_access, finish_struct_methods,
 582 finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
 583 CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
 584 GNU_xref_member, dbxout_type_fields, dbxout_type_method_1
 585
 586
 587 @item TREE_PROTECTED
 588 The following routines do something with PROTECTED access:
 589 build_method_call, alter_access, finish_struct, convert_to_aggr,
 590 CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
 591 compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
 592 dbxout_type_method_1
 593
 594
 595 @item TYPE_BINFO
 596 Used to get the binfo for the type.
 597
 598 Has values of:
 599
 600         TREE_VECs that are binfos
 601
 602 What things can this be used on:
 603
 604         RECORD_TYPEs
 605
 606
 607 @item TYPE_BINFO_BASETYPES
 608 See also BINFO_BASETYPES.
 609
 610 @item TYPE_BINFO_VIRTUALS
 611 A unique list of functions for the virtual function table.  See also
 612 BINFO_VIRTUALS.
 613
 614 What things can this be used on:
 615
 616         RECORD_TYPEs
 617
 618
 619 @item TYPE_BINFO_VTABLE
 620 Points to the virtual function table associated with the given type.
 621 See also BINFO_VTABLE.
 622
 623 What things can this be used on:
 624
 625         RECORD_TYPEs
 626
 627 Has values of:
 628
 629         VAR_DECLs that are virtual function tables
 630
 631
 632 @item TYPE_NAME
 633 Names the type.
 634
 635 Has values of:
 636
 637 @display
 638 0 for things that don't have names.
 639 should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
 640         ENUM_TYPEs.
 641 TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
 642         shouldn't be.
 643 TYPE_DECL for typedefs, unsure why.
 644 @end display
 645
 646 What things can one use this on:
 647
 648 @display
 649 TYPE_DECLs
 650 RECORD_TYPEs
 651 UNION_TYPEs
 652 ENUM_TYPEs
 653 @end display
 654
 655 History:
 656
 657         It currently points to the TYPE_DECL for RECORD_TYPEs,
 658         UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
 659
 660
 661 @item TYPE_METHODS
 662 Synonym for @code{CLASSTYPE_METHOD_VEC}.  Chained together with
 663 @code{TREE_CHAIN}.  @file{dbxout.c} uses this to get at the methods of a
 664 class.
 665
 666
 667 @item TYPE_DECL
 668 Used to represent typedefs, and used to represent bindings layers.
 669
 670 Components:
 671
 672         DECL_NAME is the name of the typedef.  For example, foo would
 673         be found in the DECL_NAME slot when @code{typedef int foo;} is
 674         seen.
 675
 676         DECL_SOURCE_LINE identifies what source line number in the
 677         source file the declaration was found at.  A value of 0
 678         indicates that this TYPE_DECL is just an internal binding layer
 679         marker, and does not correspond to a user supplied typedef.
 680
 681         DECL_SOURCE_FILE
 682
 683 @item TYPE_FIELDS
 684 A linked list (via @code{TREE_CHAIN}) of member types of a class.  The
 685 list can contain @code{TYPE_DECL}s, but there can also be other things
 686 in the list apparently.  See also @code{CLASSTYPE_TAGS}.
 687
 688
 689 @item TYPE_VIRTUAL_P
 690 A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
 691 a virtual function table or a pointer to one.  When used on a
 692 @code{FUNCTION_DECL}, indicates that it is a virtual function.  When
 693 used on an @code{IDENTIFIER_NODE}, indicates that a function with this
 694 same name exists and has been declared virtual.
 695
 696 When used on types, it indicates that the type has virtual functions, or
 697 is derived from one that does.
 698
 699 Not sure if the above about virtual function tables is still true.  See
 700 also info on @code{DECL_VIRTUAL_P}.
 701
 702 What things can this be used on:
 703
 704         FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
 705
 706
 707 @item VF_BASETYPE_VALUE
 708 Get the associated type from the binfo that caused the given vfield to
 709 exist.  This is the least derived class (the most parent class) that
 710 needed a virtual function table.  It is probably the case that all uses
 711 of this field are misguided, but they need to be examined on a
 712 case-by-case basis.  See history for more information on why the
 713 previous statement was made.
 714
 715 Set at @code{finish_base_struct} time.
 716
 717 What things can this be used on:
 718
 719         TREE_LISTs that are vfields
 720
 721 History:
 722
 723         This field was used to determine if a virtual function table's
 724         slot should be filled in with a certain virtual function, by
 725         checking to see if the type returned by VF_BASETYPE_VALUE was a
 726         parent of the context in which the old virtual function existed.
 727         This incorrectly assumes that a given type _could_ not appear as
 728         a parent twice in a given inheritance lattice.  For single
 729         inheritance, this would in fact work, because a type could not
 730         possibly appear more than once in an inheritance lattice, but
 731         with multiple inheritance, a type can appear more than once.
 732
 733
 734 @item VF_BINFO_VALUE
 735 Identifies the binfo that caused this vfield to exist.  If this vfield
 736 is from the first direct base class that has a virtual function table,
 737 then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
 738 direct base where the vfield came from.  Can use @code{TREE_VIA_VIRTUAL}
 739 on result to find out if it is a virtual base class.  Related to the
 740 binfo found by
 741
 742 @example
 743 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
 744 @end example
 745
 746 @noindent
 747 where @samp{t} is the type that has the given vfield.
 748
 749 @example
 750 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
 751 @end example
 752
 753 @noindent
 754 will return the binfo for the given vfield.
 755
 756 May or may not be set at @code{modify_vtable_entries} time.  Set at
 757 @code{finish_base_struct} time.
 758
 759 What things can this be used on:
 760
 761         TREE_LISTs that are vfields
 762
 763
 764 @item VF_DERIVED_VALUE
 765 Identifies the type of the most derived class of the vfield, excluding
 766 the class this vfield is for.
 767
 768 Set at @code{finish_base_struct} time.
 769
 770 What things can this be used on:
 771
 772         TREE_LISTs that are vfields
 773
 774
 775 @item VF_NORMAL_VALUE
 776 Identifies the type of the most derived class of the vfield, including
 777 the class this vfield is for.
 778
 779 Set at @code{finish_base_struct} time.
 780
 781 What things can this be used on:
 782
 783         TREE_LISTs that are vfields
 784
 785
 786 @item WRITABLE_VTABLES
 787 This is an option that can be defined when building the compiler, that
 788 will cause the compiler to output vtables into the data segment so that
 789 the vtables may be written.  This is undefined by default, because
 790 normally the vtables should be unwritable.  People who implement object
 791 I/O facilities, or who want to change the dynamic type of objects, may
 792 want to have the vtables writable.  Another way of achieving this would
 793 be to make a copy of the vtable into writable memory, but the drawback
 794 there is that that method only changes the type for one object.
 795
 796 @end table
 797
 798 @node Typical Behavior, Coding Conventions, Macros, Top
 799 @section Typical Behavior
 800
 801 @cindex parse errors
 802
 803 Whenever seemingly normal code fails with errors like
 804 @code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
 805 returning a NULL_TREE for whatever reason.
 806
 807 @node Coding Conventions, Templates, Typical Behavior, Top
 808 @section Coding Conventions
 809
 810 It should never be the case that trees are modified in-place by the
 811 back-end, @emph{unless} it is guaranteed that the semantics are the same
 812 no matter how shared the tree structure is.  @file{fold-const.c} still
 813 has some cases where this is not true, but rms hypothesizes that this
 814 will never be a problem.
 815
 816 @node Templates, Access Control, Coding Conventions, Top
 817 @section Templates
 818
 819 A template is represented by a @code{TEMPLATE_DECL}.  The specific
 820 fields used are:
 821
 822 @table @code
 823 @item DECL_TEMPLATE_RESULT
 824 The generic decl on which instantiations are based.  This looks just
 825 like any other decl.
 826
 827 @item DECL_TEMPLATE_PARMS
 828 The parameters to this template.
 829 @end table
 830
 831 The generic decl is parsed as much like any other decl as possible,
 832 given the parameterization.  The template decl is not built up until the
 833 generic decl has been completed.  For template classes, a template decl
 834 is generated for each member function and static data member, as well.
 835
 836 Template members of template classes are represented by a TEMPLATE_DECL
 837 for the class' parameters around another TEMPLATE_DECL for the member's
 838 parameters.
 839
 840 All declarations that are instantiations or specializations of templates
 841 refer to their template and parameters through DECL_TEMPLATE_INFO.
 842
 843 How should I handle parsing member functions with the proper param
 844 decls?  Set them up again or try to use the same ones?  Currently we do
 845 the former.  We can probably do this without any extra machinery in
 846 store_pending_inline, by deducing the parameters from the decl in
 847 do_pending_inlines.  PRE_PARSED_TEMPLATE_DECL?
 848
 849 If a base is a parm, we can't check anything about it.  If a base is not
 850 a parm, we need to check it for name binding.  Do finish_base_struct if
 851 no bases are parameterized (only if none, including indirect, are
 852 parms).  Nah, don't bother trying to do any of this until instantiation
 853 -- we only need to do name binding in advance.
 854
 855 Always set up method vec and fields, inc. synthesized methods.  Really?
 856 We can't know the types of the copy folks, or whether we need a
 857 destructor, or can have a default ctor, until we know our bases and
 858 fields.  Otherwise, we can assume and fix ourselves later.  Hopefully.
 859
 860 @node Access Control, Error Reporting, Templates, Top
 861 @section Access Control
 862 The function compute_access returns one of three values:
 863
 864 @table @code
 865 @item access_public
 866 means that the field can be accessed by the current lexical scope.
 867
 868 @item access_protected
 869 means that the field cannot be accessed by the current lexical scope
 870 because it is protected.
 871
 872 @item access_private
 873 means that the field cannot be accessed by the current lexical scope
 874 because it is private.
 875 @end table
 876
 877 DECL_ACCESS is used for access declarations; alter_access creates a list
 878 of types and accesses for a given decl.
 879
 880 Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
 881 codes of compute_access and were used as a cache for compute_access.
 882 Now they are not used at all.
 883
 884 TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
 885 granted by the containing class.  BEWARE: TREE_PUBLIC means something
 886 completely unrelated to access control!
 887
 888 @node Error Reporting, Parser, Access Control, Top
 889 @section Error Reporting
 890
 891 The C++ front-end uses a call-back mechanism to allow functions to print
 892 out reasonable strings for types and functions without putting extra
 893 logic in the functions where errors are found.  The interface is through
 894 the @code{cp_error} function (or @code{cp_warning}, etc.).  The
 895 syntax is exactly like that of @code{error}, except that a few more
 896 conversions are supported:
 897
 898 @itemize @bullet
 899 @item
 900 %C indicates a value of `enum tree_code'.
 901 @item
 902 %D indicates a *_DECL node.
 903 @item
 904 %E indicates a *_EXPR node.
 905 @item
 906 %L indicates a value of `enum languages'.
 907 @item
 908 %P indicates the name of a parameter (i.e. "this", "1", "2", ...)
 909 @item
 910 %T indicates a *_TYPE node.
 911 @item
 912 %O indicates the name of an operator (MODIFY_EXPR -> "operator =").
 913
 914 @end itemize
 915
 916 There is some overlap between these; for instance, any of the node
 917 options can be used for printing an identifier (though only @code{%D}
 918 tries to decipher function names).
 919
 920 For a more verbose message (@code{class foo} as opposed to just @code{foo},
 921 including the return type for functions), use @code{%#c}.
 922 To have the line number on the error message indicate the line of the
 923 DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
 924 use @code{%+D}, or it will default to the first.
 925
 926 @node Parser, Exception Handling, Error Reporting, Top
 927 @section Parser
 928
 929 Some comments on the parser:
 930
 931 The @code{after_type_declarator} / @code{notype_declarator} hack is
 932 necessary in order to allow redeclarations of @code{TYPENAME}s, for
 933 instance
 934
 935 @example
 936 typedef int foo;
 937 class A @{
 938   char *foo;
 939 @};
 940 @end example
 941
 942 In the above, the first @code{foo} is parsed as a @code{notype_declarator},
 943 and the second as an @code{after_type_declarator}.
 944
 945 Ambiguities:
 946
 947 There are currently four reduce/reduce ambiguities in the parser.  They are:
 948
 949 1) Between @code{template_parm} and
 950 @code{named_class_head_sans_basetype}, for the tokens @code{aggr
 951 identifier}.  This situation occurs in code looking like
 952
 953 @example
 954 template <class T> class A @{ @};
 955 @end example
 956
 957 It is ambiguous whether @code{class T} should be parsed as the
 958 declaration of a template type parameter named @code{T} or an unnamed
 959 constant parameter of type @code{class T}.  Section 14.6, paragraph 3 of
 960 the January '94 working paper states that the first interpretation is
 961 the correct one.  This ambiguity results in two reduce/reduce conflicts.
 962
 963 2) Between @code{primary} and @code{type_id} for code like @samp{int()}
 964 in places where both can be accepted, such as the argument to
 965 @code{sizeof}.  Section 8.1 of the pre-San Diego working paper specifies
 966 that these ambiguous constructs will be interpreted as @code{typename}s.
 967 This ambiguity results in six reduce/reduce conflicts between
 968 @samp{absdcl} and @samp{functional_cast}.
 969
 970 3) Between @code{functional_cast} and
 971 @code{complex_direct_notype_declarator}, for various token strings.
 972 This situation occurs in code looking like
 973
 974 @example
 975 int (*a);
 976 @end example
 977
 978 This code is ambiguous; it could be a declaration of the variable
 979 @samp{a} as a pointer to @samp{int}, or it could be a functional cast of
 980 @samp{*a} to @samp{int}.  Section 6.8 specifies that the former
 981 interpretation is correct.  This ambiguity results in 7 reduce/reduce
 982 conflicts.  Another aspect of this ambiguity is code like 'int (x[2]);',
 983 which is resolved at the '[' and accounts for 6 reduce/reduce conflicts
 984 between @samp{direct_notype_declarator} and
 985 @samp{primary}/@samp{overqualified_id}.  Finally, there are 4 r/r
 986 conflicts between @samp{expr_or_declarator} and @samp{primary} over code
 987 like 'int (a);', which could probably be resolved but would also
 988 probably be more trouble than it's worth.  In all, this situation
 989 accounts for 17 conflicts.  Ack!
 990
 991 The second case above is responsible for the failure to parse 'LinppFile
 992 ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave
 993 Math.h++) as an object declaration, and must be fixed so that it does
 994 not resolve until later.
 995
 996 4) Indirectly between @code{after_type_declarator} and @code{parm}, for
 997 type names.  This occurs in (as one example) code like
 998
 999 @example
1000 typedef int foo, bar;
1001 class A @{
1002   foo (bar);
1003 @};
1004 @end example
1005
1006 What is @code{bar} inside the class definition?  We currently interpret
1007 it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
1008 @code{after_type_declarator}.  I believe that xlC is correct, in light
1009 of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
1010 could possibly be a type name is taken as the @i{decl-specifier-seq} of
1011 a @i{declaration}."  However, it seems clear that this rule must be
1012 violated in the case of constructors.  This ambiguity accounts for 8
1013 conflicts.
1014
1015 Unlike the others, this ambiguity is not recognized by the Working Paper.
1016
1017 @node  Exception Handling, Free Store, Parser, Top
1018 @section Exception Handling
1019
1020 Note, exception handling in g++ is still under development.
1021
1022 This section describes the mapping of C++ exceptions in the C++
1023 front-end, into the back-end exception handling framework.
1024
1025 The basic mechanism of exception handling in the back-end is
1026 unwind-protect a la elisp.  This is a general, robust, and language
1027 independent representation for exceptions.
1028
1029 The C++ front-end exceptions are mapped into the unwind-protect
1030 semantics by the C++ front-end.  The mapping is described below.
1031
1032 When -frtti is used, rtti is used to do exception object type checking;
1033 when it isn't used, the encoded name for the type of the object being
1034 thrown is used instead.  All code that originates exceptions, even code
1035 that throws exceptions as a side effect, like dynamic casting, and all
1036 code that catches exceptions must be compiled consistently with either
1037 -frtti or -fno-rtti.  It is not possible to mix rtti-based exception
1038 handling objects with code that doesn't use rtti.  The exceptions to this
1039 are code that doesn't catch or throw exceptions, catch (...), and code
1040 that just rethrows an exception.
1041
1042 Currently we use the normal mangling used in building function names
1043 (int is "i", const char * is "PCc") to build the non-rtti-based type
1044 descriptors for exception handling.  These descriptors are just plain
1045 NUL-terminated strings, and internally they are passed around as char
1046 *.
1047
1048 In C++, all cleanups should be protected by exception regions.  The
1049 region starts just after the reason why the cleanup is created has
1050 ended.  For example, with an automatic variable, that has a constructor,
1051 it would be right after the constructor is run.  The region ends just
1052 before the finalization is expanded.  Since the backend may expand the
1053 cleanup multiple times along different paths, once for normal end of the
1054 region, once for non-local gotos, once for returns, etc, the backend
1055 must take special care to protect the finalization expansion, if the
1056 expansion is for any other reason than normal region end, and it is
1057 `inline' (it is inside the exception region).  The backend can either
1058 choose to move them out of line, or it can create an exception region
1059 over the finalization to protect it, and in the handler associated with
1060 it, it would not run the finalization as it otherwise would have, but
1061 rather just rethrow to the outer handler, careful to skip the normal
1062 handler for the original region.
1063
1064 In Ada, they will use the more runtime intensive approach of having
1065 fewer regions, but at the cost of additional work at run time, to keep a
1066 list of things that need cleanups.  When a variable has finished
1067 construction, they add the cleanup to the list; when they come to the end
1068 of the lifetime of the variable, they run the list down.  If they take a
1069 hit before the section finishes normally, they examine the list for
1070 actions to perform.  I hope they add this logic into the back-end, as it
1071 would be nice to get that alternative approach in C++.
1072
1073 On an rs6000, xlC stores exception objects on the stack, under the try
1074 block.  When it unwinds down into a handler, the frame pointer is
1075 adjusted back to the normal value for the frame in which the handler
1076 resides, and the stack pointer is left unchanged from the time at which
1077 the object was thrown.  This is so that there is always someplace for
1078 the exception object, and nothing can overwrite it, once we start
1079 throwing.  The only bad part is that the stack remains large.
1080
1081 The following are some things that work in g++'s exception handling.
1082
1083 All completely constructed temps and local variables are cleaned up in
1084 all unwound scopes.  Completely constructed parts of partially
1085 constructed objects are cleaned up.  This includes partially built
1086 arrays.  Exception specifications are now handled.  Thrown objects are
1087 now cleaned up all the time.  We can now tell if we have an active
1088 exception being thrown or not (__eh_type != 0).  We use this to call
1089 terminate if someone does a throw; without there being an active
1090 exception object.  uncaught_exception () works.  Exception handling
1091 should work right if you optimize.  Exception handling should work with
1092 -fpic or -fPIC.
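
The partially built array case can be illustrated with a small test
program.  This is only a sketch written for this document, not a test
from the testsuite; the names @code{Elem}, @code{build_partial_array}
and the two counters are made up for the example.  The third element's
constructor throws, so only the two completely constructed elements
should be destroyed during unwinding.

@example
static int constructed = 0;   // constructors that ran to completion
static int destroyed = 0;     // destructors that ran

struct Elem @{
  Elem ()
  @{
    if (constructed == 2)
      throw 42;               // the third element fails to construct
    ++constructed;
  @}
  ~Elem () @{ ++destroyed; @}
@};

// Returns true if the exception escaped the array construction.
bool build_partial_array ()
@{
  try @{
    Elem arr[3];              // elements 0 and 1 complete, element 2 throws
    (void) arr;
  @} catch (int) @{
    return true;
  @}
  return false;
@}
@end example

@noindent
After a call to @code{build_partial_array}, both counters should hold 2
if the cleanups for the partially built array were run as described.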
1093
1094 The following are some flaws in g++'s exception handling, as it now
1095 stands.
1096
1097 Only exact type matching or reference matching of throw types works when
1098 -fno-rtti is used.  Exception handling only works on SPARC (like Suns) (both -mflat and
1099 -mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000,
1100 PowerPC, Alpha, mips, VAX, m68k and z8k machines.  SPARC v9 may not
1101 work.  HPPA is mostly done, but throwing between a shared library and
1102 user code doesn't yet work.  Some targets have support for data-driven
1103 unwinding.  Partial support is in for all other machines, but a stack
1104 unwinder called __unwind_function has to be written, and added to
1105 libgcc2 for them.  The new EH code doesn't rely upon the
1106 __unwind_function for C++ code; instead it creates per-function
1107 unwinders right inside the function.  Unfortunately, on many platforms
1108 the definition of RETURN_ADDR_RTX in the tm.h file for the machine port
1109 is wrong.  See below for details on __unwind_function.  RTL_EXPRs for EH
1110 cond variables for && and || exprs should probably be wrapped in
1111 UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.
1112
1113 We only do pointer conversions on exception matching a la 15.3 p2 case
1114 3: `A handler with type T, const T, T&, or const T& is a match for a
1115 throw-expression with an object of type E if [3]T is a pointer type and
1116 E is a pointer type that can be converted to T by a standard pointer
1117 conversion (_conv.ptr_) not involving conversions to pointers to private
1118 or protected base classes.' when -frtti is given.
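
A minimal sketch of the pointer conversion case follows.  The class
names and the function @code{throw_derived_pointer} are hypothetical and
exist only for this illustration.  A @code{Derived *} is thrown and the
first handler names @code{Base *}, so the match depends on the standard
pointer conversion quoted above.

@example
struct Base @{ virtual ~Base () @{ @} @};
struct Derived : public Base @{ @};

// Returns 1 if the Base * handler was chosen, 2 for catch (...).
int throw_derived_pointer ()
@{
  static Derived d;
  try @{
    throw &d;                 // the thrown object has type Derived *
  @} catch (Base *) @{          // matches through a pointer conversion
    return 1;
  @} catch (...) @{
    return 2;
  @}
@}
@end example

@noindent
With the conversion in effect the function returns 1; a result of 2
would mean that only exact type matching was performed.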
1119
1120 We don't call delete on new expressions that die because the ctor threw
1121 an exception.  See except/18 for a test case.
1122
1123 15.2 para 13: The exception being handled should be rethrown if control
1124 reaches the end of a handler of the function-try-block of a constructor
1125 or destructor; right now, it is not.
1126
1127 15.2 para 12: If a return statement appears in a handler of the
1128 function-try-block of a constructor, the program is ill-formed, but this
1129 isn't diagnosed.
1130
1131 15.2 para 11: If the handlers of a function-try-block contain a jump
1132 into the body of a constructor or destructor, the program is ill-formed,
1133 but this isn't diagnosed.
1134
1135 15.2 para 9: Check that the fully constructed base classes and members
1136 of an object are destroyed before entering the handler of a
1137 function-try-block of a constructor or destructor for that object.
1138
1139 build_exception_variant should sort the incoming list, so that it
1140 implements set compares, not exact list equality.  Type smashing should
1141 smash exception specifications using set union.
1142
1143 Thrown objects are usually allocated on the heap, in the usual way.  If
1144 one runs out of heap space, throwing an object will probably never work.
1145 This could be relaxed some by passing an __in_chrg parameter to track
1146 who has control over the exception object.  Thrown objects are not
1147 allocated on the heap when they are pointer to object types.  We should
1148 extend it so that all small (<4*sizeof(void*)) objects are stored
1149 directly, instead of allocated on the heap.
1150
1151 When the backend returns a value, it can create new exception regions
1152 that need protecting.  The new region should rethrow the object in
1153 context of the last associated cleanup that ran to completion.
1154
1155 The structure of the code that is generated for C++ exception handling
1156 is shown below:
1157
1158 @example
1159 Ln:                                     throw value;
1160         copy value onto heap
1161         jump throw (Ln, id, address of copy of value on heap)
1162
1163                                         try @{
1164 +Lstart:        the start of the main EH region
1165 |...                                            ...
1166 +Lend:          the end of the main EH region
1167                                         @} catch (T o) @{
1168                                                 ...1
1169                                         @}
1170 Lresume:
1171         nop     used to make sure there is something before
1172                 the next region ends, if there is one
1173 ...                                     ...
1174
1175         jump Ldone
1176 [
1177 Lmainhandler:    handler for the region Lstart-Lend
1178         cleanup
1179 ] zero or more, depending upon automatic vars with dtors
1180 +Lpartial:
1181 |        jump Lover
1182 +Lhere:
1183         rethrow (Lhere, same id, same obj);
1184 Lterm:          handler for the region Lpartial-Lhere
1185         call terminate
1186 Lover:
1187 [
1188  [
1189         call throw_type_match
1190         if (eq) @{
1191  ] these lines disappear when there is no catch condition
1192 +Lsregion2:
1193 |       ...1
1194 |       jump Lresume
1195 |Lhandler:      handler for the region Lsregion2-Leregion2
1196 |       rethrow (Lresume, same id, same obj);
1197 +Leregion2
1198         @}
1199 ] there are zero or more of these sections, depending upon how many
1200   catch clauses there are
1201 ----------------------------- expand_end_all_catch --------------------------
1202                 here we have fallen off the end of all catch
1203                 clauses, so we rethrow to outer
1204         rethrow (Lresume, same id, same obj);
1205 ----------------------------- expand_end_all_catch --------------------------
1206 [
1207 L1:     maybe throw routine
1208 ] depending upon if we have expanded it or not
1209 Ldone:
1210         ret
1211
1212 start_all_catch emits labels: Lresume,
1213
1214 @end example
1215
1216 The __unwind_function takes a pointer to the throw handler, and is
1217 expected to pop the stack frame that was built to call it, as well as
1218 the frame underneath and then jump to the throw handler.  It must
1219 restore all registers to their proper values as well as all other
1220 machine state as determined by the context into which we are
1221 unwinding.  The way I normally start is to compile:
1222 @example
1223 void *g;
1224 void foo (void *a) @{ g = a; @}
1225 @end example
1226 with -S, and change the thing that alters the PC (return, or ret
1227 usually) to not alter the PC, making sure to leave all other semantics
1228 (like adjusting the stack pointer, or frame pointers) in.  After that,
1229 replicate the prologue once more at the end, again, changing the PC
1230 altering instructions, and finally, at the very end, jump to `g'.
1231
1232 It takes about a week to write this routine.  If someone wants to
1233 volunteer to write this routine for any architecture, exception support
1234 for that architecture will be added to g++.  Please send in those code
1235 donations.  One other thing that needs to be done is to double check
1236 that __builtin_return_address (0) works.
1237
1238 @subsection Specific Targets
1239
1240 For the alpha, the __unwind_function will be something resembling:
1241
1242 @example
1243 void
1244 __unwind_function(void *ptr)
1245 @{
1246   /* First frame */
1247   asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
1248   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1249
1250   /* Second frame */
1251   asm ("ldq $15, 8($30)"); /* fp */
1252   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1253
1254   /* Return */
1255   asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
1256 @}
1257 @end example
1258
1259 @noindent
1260 However, there are a few problems preventing it from working.  First of
1261 all, the gcc-internal function @code{__builtin_return_address} needs to
1262 work given an argument of 0 for the alpha.  As it stands as of August
1263 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
1264 will definitely not work on the alpha.  Instead, we need to define
1265 the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
1266 @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
1267 definition for @code{RETURN_ADDR_RTX}.
1268
1269 In addition (and more importantly), we need a way to reliably find the
1270 frame pointer on the alpha.  The use of the value 8 above to restore the
1271 frame pointer (register 15) is incorrect.  On many systems, the frame
1272 pointer is consistently offset to a specific point on the stack.  On the
1273 alpha, however, the frame pointer is pushed last.  First the return
1274 address is stored, then any other registers are saved (e.g., @code{s0}),
1275 and finally the frame pointer is put in place.  So @code{fp} could have
1276 an offset of 8, but if the calling function saved any registers at all,
1277 they add to the offset.
1278
1279 The only places the frame size is noted are with the @samp{.frame}
1280 directive, for use by the debugger and the OSF exception handling model
1281 (useless to us), and in the initial computation of the new value for
1282 @code{sp}, the stack pointer.  For example, the function may start with:
1283
1284 @example
1285 lda $30,-32($30)
1286 .frame $15,32,$26,0
1287 @end example
1288
1289 @noindent
1290 The 32 above is exactly the value we need.  With this, we can be sure
1291 that the frame pointer is stored 8 bytes less---in this case, at 24(sp).
1292 The drawback is that there is no way that I (Brendan) have found to let
1293 us discover the size of a previous frame @emph{inside} the definition
1294 of @code{__unwind_function}.
1295
1296 So to accomplish exception handling support on the alpha, we need two
1297 things: first, a way to figure out where the frame pointer was stored,
1298 and second, a functional @code{__builtin_return_address} implementation
1299 for except.c to be able to use it.
1300
1301 Or just support DWARF 2 unwind info.
1302
1303 @subsection New Backend Exception Support
1304
1305 This subsection discusses various aspects of the design of the
1306 data-driven model being implemented for the exception handling backend.
1307
1308 The goal is to generate enough data during the compilation of user code,
1309 such that we can dynamically unwind through functions at run time with a
1310 single routine (@code{__throw}) that lives in libgcc.a, built by the
1311 compiler, and dispatch into associated exception handlers.
1312
1313 This information is generated by the DWARF 2 debugging backend, and
1314 includes all of the information __throw needs to unwind an arbitrary
1315 frame.  It specifies where all of the saved registers and the return
1316 address can be found at any point in the function.
1317
1318 Major disadvantages when enabling exceptions are:
1319
1320 @itemize @bullet
1321 @item
1322 Code cannot use caller-saved registers when flow can be
1323 transferred into that code from an exception handler.  In high performance
1324 code this should not usually be true, so the effects should be minimal.
1325
1326 @end itemize
1327
1328 @subsection Backend Exception Support
1329
1330 The backend must be extended to fully support exceptions.  Right now
1331 there are a few hooks from the backend into the alpha exception handling
1332 backend that resides in the C++ frontend; these hooks allow exception
1333 handling to work in g++.  An exception region is a segment of generated
1334 code that has a handler associated with it.  The exception regions are
1335 denoted in the generated code as address ranges, each given by a starting
1336 PC value and an ending PC value of the region.  Some of the limitations
1337 with this scheme are:
1338
1339 @itemize @bullet
1340 @item
1341 The backend replicates insns for such things as loop unrolling and
1342 function inlining.  Right now, there are no hooks into the frontend's
1343 exception handling backend to handle the replication of insns.  When
1344 replication happens, a new exception region descriptor needs to be
1345 generated for the new region.
1346
1347 @item
1348 The backend expects to be able to rearrange code, for things like jump
1349 optimization.  Any rearranging of the code needs to have exception region
1350 descriptors updated appropriately.
1351
1352 @item
1353 The backend can eliminate dead code.  Any associated exception region
1354 descriptor that refers to fully contained code that has been eliminated
1355 should also be removed, although not doing this is harmless in terms of
1356 semantics.
1357
1358 @end itemize
1359
1360 The above is not meant to be exhaustive, but does include all things I
1361 have thought of so far.  I am sure other limitations exist.
1362
1363 Below are some notes on the migration of the exception handling code
1364 backend from the C++ frontend to the backend.
1365
1366 NOTEs are to be used to denote the start of an exception region, and the
1367 end of the region.  I presume that the interface used to generate these
1368 notes in the backend would be two functions, start_exception_region and
1369 end_exception_region (or something like that).  The frontends are
1370 required to call them in pairs.  When marking the end of a region, an
1371 argument can be passed to indicate the handler for the marked region.
1372 This can be passed in many ways, currently a tree is used.  Another
1373 possibility would be insns for the handler, or a label that denotes a
1374 handler.  I have a feeling insns might be the best way to pass it.
1375 Semantics are, if an exception is thrown inside the region, control is
1376 transferred unconditionally to the handler.  If control passes through
1377 the handler, then the backend is to rethrow the exception, in the
1378 context of the end of the original region.  The handler is protected by
1379 the conventional mechanisms; it is the frontend's responsibility to
1380 protect the handler, if special semantics are required.
1381
1382 This is a very low level view, and it would be nice if the backend
1383 supported a somewhat higher level view in addition to this view.  This
1384 higher level could include source line number, name of the source file,
1385 name of the language that threw the exception and possibly the name of
1386 the exception.  Kenner may want to rope you into doing more than just
1387 the basics required by C++.  You will have to resolve this.  He may want
1388 you to do support for non-local gotos, first scan for exception handler,
1389 if none is found, allow the debugger to be entered, without any cleanups
1390 being done.  To do this, the backend would have to know the difference
1391 between a cleanup-rethrower, and a real handler; it would also have to
1392 have a way to know if a handler `matches' a thrown exception, and this
1393 is frontend specific.
1394
1395 The stack unwinder is one of the hardest parts to do.  It is highly
1396 machine dependent.  The form that Kenner seems to like was a couple of
1397 macros, that would do the machine dependent grunt work.  One preexisting
1398 function that might be of some use is __builtin_return_address ().  One
1399 macro he seemed to want was __builtin_return_address, and the other
1400 would do the hard work of fixing up the registers, adjusting the stack
1401 pointer, frame pointer, arg pointer and so on.
1402
1403
1404 @node Free Store, Mangling, Exception Handling, Top
1405 @section Free Store
1406
1407 @code{operator new []} adds a magic cookie to the beginning of arrays
1408 for which the number of elements will be needed by @code{operator delete
1409 []}.  These are arrays of objects with destructors and arrays of objects
1410 that define @code{operator delete []} with the optional size_t argument.
1411 This cookie can be examined from a program as follows:
1412
1413 @example
1414 typedef unsigned long size_t;
1415 extern "C" int printf (const char *, ...);
1416
1417 size_t nelts (void *p)
1418 @{
1419   struct cookie @{
1420     size_t nelts __attribute__ ((aligned (sizeof (double))));
1421   @};
1422
1423   cookie *cp = (cookie *)p;
1424   --cp;
1425
1426   return cp->nelts;
1427 @}
1428
1429 struct A @{
1430   ~A() @{ @}
1431 @};
1432
1433 int main ()
1434 @{
1435   A *ap = new A[3];
1436   printf ("%lu\n", nelts (ap));
1437 @}
1438 @end example
1439
1440 @section Linkage
1441 The linkage code in g++ is horribly twisted in order to meet two design goals:
1442
1443 1) Avoid unnecessary emission of inlines and vtables.
1444
1445 2) Support pedantic assemblers like the one in AIX.
1446
1447 To meet the first goal, we defer emission of inlines and vtables until
1448 the end of the translation unit, where we can decide whether or not they
1449 are needed, and how to emit them if they are.
1450
1451 @node Mangling, Concept Index, Free Store, Top
1452 @section Function name mangling for C++ and Java
1453
1454 Both C++ and Java provide overloaded functions and methods,
1455 which are methods with the same name but different parameter lists.
1456 Selecting the correct version is done at compile time.
1457 Though the overloaded functions have the same name in the source code,
1458 they need to be translated into different assembler-level names,
1459 since typical assemblers and linkers cannot handle overloading.
1460 This process of encoding the parameter types with the method name
1461 into a unique name is called @dfn{name mangling}.  The inverse
1462 process is called @dfn{demangling}.
1463
1464 It is convenient that C++ and Java use compatible mangling schemes,
1465 since that makes life easier for tools such as gdb, and it eases
1466 integration between C++ and Java.
1467
1468 Note there is also a standard "Java Native Interface" (JNI) which
1469 implements a different calling convention, and uses a different
1470 mangling scheme.  The JNI is a rather abstract ABI so Java can call methods
1471 written in C or C++;
1472 we are concerned here about a lower-level interface primarily
1473 intended for methods written in Java, but that can also be used for C++
1474 (and less easily C).
1475
1476 Note that on systems that follow BSD tradition, a C identifier @code{var}
1477 would get "mangled" into the assembler name @samp{_var}.  On such
1478 systems, all other mangled names are also prefixed by a @samp{_}
1479 which is not shown in the following examples.
1480
1481 @subsection Method name mangling
1482
1483 C++ mangles a method by emitting the function name, followed by @code{__},
1484 followed by encodings of any method qualifiers (such as @code{const}),
1485 followed by the mangling of the method's class,
1486 followed by the mangling of the parameters, in order.
1487
1488 For example @code{Foo::bar(int, long) const} is mangled
1489 as @samp{bar__C3Fooil}.
1490
1491 For a constructor, the method name is left out.
1492 That is @code{Foo::Foo(int, long) const}  is mangled
1493 as @samp{__C3Fooil}.
1494
1495 GNU Java does the same.
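
The rule above can be sketched as a small helper.  This is an
illustration only, not the code used by the compiler: the functions
@code{mangle_class} and @code{mangle_method} are hypothetical, the
parameter types are assumed to be passed in already encoded (for
example @samp{il} for @code{(int, long)}), only the @code{const}
qualifier is handled, and class names are assumed to be simple names
encoded as a length followed by the characters.

@example
#include <string>

// Encode a simple class name: its length followed by its characters.
std::string mangle_class (const std::string &name)
@{
  return std::to_string (name.size ()) + name;
@}

// Build NAME__<qualifiers><class><parameters>.  An empty NAME gives
// the form used for constructors.
std::string mangle_method (const std::string &name,
                           bool is_const,
                           const std::string &class_name,
                           const std::string &params)
@{
  std::string result = name + "__";
  if (is_const)
    result += "C";            // encoding of the const qualifier
  result += mangle_class (class_name);
  result += params;
  return result;
@}
@end example

@noindent
With these definitions, @code{mangle_method ("bar", true, "Foo", "il")}
yields @samp{bar__C3Fooil}, and passing an empty method name yields
@samp{__C3Fooil}, matching the two examples above.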
1496
1497 @subsection Primitive types
1498
1499 The C++ types @code{int}, @code{long}, @code{short}, @code{char},
1500 and @code{long long} are mangled as @samp{i}, @samp{l},
1501 @samp{s}, @samp{c}, and @samp{x}, respectively.
1502 The corresponding unsigned types have @samp{U} prefixed
1503 to the mangling.  The type @code{signed char} is mangled @samp{Sc}.
1504
1505 The C++ and Java floating-point types @code{float} and @code{double}
1506 are mangled as @samp{f} and @samp{d} respectively.
1507
1508 The C++ @code{bool} type and the Java @code{boolean} type are
1509 mangled as @samp{b}.
1510
1511 The C++ @code{wchar_t} and the Java @code{char} types are
1512 mangled as @samp{w}.
1513
1514 The Java integral types @code{byte}, @code{short}, @code{int}
1515 and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
1516 and @samp{x}, respectively.
1517
1518 C++ code that has included @code{javatypes.h} will mangle
1519 the typedefs  @code{jbyte}, @code{jshort}, @code{jint}
1520 and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
1521 and @samp{x}.  (This has not been implemented yet.)
1522
1523 @subsection Mangling of simple names
1524
1525 A simple class, package, template, or namespace name is
1526 encoded as the number of characters in the name, followed by
1527 the actual characters.  Thus the class @code{Foo}
1528 is encoded as @samp{3Foo}.
1529
1530 If any of the characters in the name are not alphanumeric
1531 (i.e., not one of the standard ASCII letters, digits, or '_'),
1532 or the initial character is a digit, then the name is
1533 mangled as a sequence of encoded Unicode letters.
1534 A Unicode encoding starts with a @samp{U} to indicate
1535 that Unicode escapes are used, followed by the number of
1536 bytes used by the Unicode encoding, followed by the bytes
1537 representing the encoding.  ASCII letters and
1538 non-initial digits are encoded without change.  However, all
1539 other characters (including underscore and initial digits) are
1540 translated into a sequence starting with an underscore,
1541 followed by the big-endian 4-hex-digit lower-case encoding of the character.
1542
1543 If a method name contains Unicode-escaped characters, the
1544 entire mangled method name is followed by a @samp{U}.
1545
1546 For example, the method @code{X\u0319::M\u002B(int)} is encoded as
1547 @samp{M_002b__U6X_0319iU}.
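
The encoding of a simple name can be sketched as follows.  This is a
hedged illustration rather than the compiler's implementation: the
helper names and the @code{code_points} typedef are made up here, the
input is assumed to be a sequence of Unicode code points that each fit
in 16 bits, and the trailing @samp{U} added to a whole mangled method
name is not modeled.

@example
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

typedef std::vector<unsigned> code_points;   // one entry per character

static bool is_letter (unsigned c)
@{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
@}

static bool is_digit (unsigned c)
@{
  return c >= '0' && c <= '9';
@}

// True if NAME cannot be emitted as a plain length-prefixed name.
static bool needs_escape (const code_points &name)
@{
  if (!name.empty () && is_digit (name[0]))
    return true;
  for (unsigned c : name)
    if (!is_letter (c) && !is_digit (c) && c != '_')
      return true;
  return false;
@}

// Escaped spelling: letters and non-initial digits are unchanged,
// anything else becomes '_' plus four lower-case hex digits.
std::string escape_name (const code_points &name)
@{
  std::string out;
  for (std::size_t i = 0; i < name.size (); ++i)
    @{
      unsigned c = name[i];
      if (is_letter (c) || (i > 0 && is_digit (c)))
        out += static_cast<char> (c);
      else
        @{
          char buf[8];
          std::snprintf (buf, sizeof buf, "_%04x", c & 0xffffu);
          out += buf;
        @}
    @}
  return out;
@}

// Encode a simple class, package, template, or namespace name.
std::string mangle_simple_name (const code_points &name)
@{
  if (!needs_escape (name))
    @{
      std::string plain;
      for (unsigned c : name)
        plain += static_cast<char> (c);
      return std::to_string (plain.size ()) + plain;
    @}
  std::string body = escape_name (name);
  return "U" + std::to_string (body.size ()) + body;
@}
@end example

@noindent
Under these assumptions the class name @code{Foo} is encoded as
@samp{3Foo}, the class name @code{X\u0319} as @samp{U6X_0319}, and the
method name @code{M\u002B} is spelled @samp{M_002b} by
@code{escape_name}.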
1548
1549
1550 @subsection Pointer and reference types
1551
1552 A C++ pointer type is mangled as @samp{P} followed by the
1553 mangling of the type pointed to.
1554
1555 A C++ reference type is mangled as @samp{R} followed by the
1556 mangling of the type referenced.
1557
1558 A Java object reference type is equivalent
1559 to a C++ pointer parameter, so we mangle such a parameter type
1560 as @samp{P} followed by the mangling of the class name.
1561
1562 @subsection Squangled type compression
1563
1564 Squangling (enabled with the @samp{-fsquangle} option) utilizes the
1565 @samp{B} code to indicate reuse of a previously seen type within an
1566 identifier.  Types are recognized in a left to right manner and given
1567 increasing values, which are appended to the code in the standard
1568 manner; that is, multiple digit numbers are delimited by @samp{_}
1569 characters.  A type is considered to be any non-primitive type,
1570 regardless of whether it's a parameter, template parameter, or entire
1571 template.  Certain codes are considered modifiers of a type, and are not
1572 included as part of the type.  These are the @samp{C}, @samp{V},
1573 @samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting
1574 constant, volatile, pointer, array, reference, unsigned, and restrict.
1575 These codes may precede a @samp{B} type in order to make the required
1576 modifications to the type.
1577
For example:
@example
template <class T> class class1 @{ @};

template <class T> class class2 @{ @};

class class3 @{ @};

int f(class2<class1<class3> > a, int b, const class1<class3>& c, class3 *d) @{ @}

    B0 -> class2<class1<class3> >
    B1 -> class1<class3>
    B2 -> class3
@end example
This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
The @code{int} parameter is a basic type, and does not receive a @samp{B} encoding.
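
The bookkeeping behind the @samp{B} codes can be sketched as a table of
types in the order they are first seen.  The Python below is a hypothetical
illustration, not the compiler's data structure: the names
@code{number_code} and @code{TypeTable} are invented here, types are
identified by their source spelling, and the form @samp{_@var{n}_} for
multiple-digit indices is an assumption based on the delimiter rule stated
above.

```python
def number_code(letter, index):
    """Return a code letter plus index; multi-digit indices get delimiters."""
    if index < 10:
        return "%s%d" % (letter, index)
    return "%s_%d_" % (letter, index)


class TypeTable:
    """Remembers non-primitive types in left-to-right order."""

    def __init__(self):
        self.seen = []

    def lookup_or_add(self, type_name):
        """Return a B code for a repeated type, or None on first sight."""
        if type_name in self.seen:
            return number_code("B", self.seen.index(type_name))
        self.seen.append(type_name)
        return None
```

Registering @code{class2<class1<class3> >}, @code{class1<class3>} and
@code{class3} in that order and then looking up @code{class1<class3>} again
gives @samp{B1}, matching the third parameter in the example above.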
1594
1595 @subsection Qualified names
1596
1597 Both C++ and Java allow a class to be lexically nested inside another
1598 class.  C++ also supports namespaces.
1599 Java also supports packages.
1600
1601 These are all mangled the same way:  First the letter @samp{Q}
1602 indicates that we are emitting a qualified name.
1603 That is followed by the number of parts in the qualified name.
1604 If that number is 9 or less, it is emitted with no delimiters.
1605 Otherwise, an underscore is written before and after the count.
1606 Then follows each part of the qualified name, as described above.
1607
For example, @code{Foo::\u0319::Bar} is encoded as
@samp{Q33FooU5_03193Bar}.
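
A minimal Python sketch of the @samp{Q} encoding follows.  The function
name @code{mangle_qualified} is hypothetical, and for brevity the sketch
assumes every part is a plain ASCII name that needs no Unicode escapes.

```python
def mangle_qualified(parts):
    """Mangle a qualified name given as a list of plain ASCII parts."""
    count = len(parts)
    # Counts above 9 are wrapped in underscores so they stay unambiguous.
    count_text = "%d" % count if count <= 9 else "_%d_" % count
    # Each part uses the usual length-prefixed form.
    encoded = "".join("%d%s" % (len(part), part) for part in parts)
    return "Q" + count_text + encoded


print(mangle_qualified(["java", "lang", "String"]))  # prints Q34java4lang6String
```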
1610
Squangling uses the letter @samp{K} to indicate a
remembered portion of a qualified name.  As qualified names are processed
for an identifier, the names are numbered and remembered in a
manner similar to the @samp{B} type compression code.
Names are recognized left to right, and given increasing values, which are
appended to the code in the standard manner, i.e., multiple-digit numbers
are delimited by @samp{_} characters.
1618
For example:
1620 @example
1621 class Andrew
1622 @{
1623   class WasHere
1624   @{
1625       class AndHereToo
1626       @{
1627       @};
1628   @};
1629 @};
1630
1631 f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}
1632
1633    K0 ->  Andrew
1634    K1 ->  Andrew::WasHere
1635    K2 ->  Andrew::WasHere::AndHereToo
1636 @end example
Function @code{f()} would be mangled as
@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}.
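
The following Python sketch reproduces the three parameter encodings from
this example.  It is inferred from the example rather than taken from the
compiler: the names @code{number_code} and @code{NameTable} are
hypothetical, the longest remembered prefix is assumed to be the one
replaced by a @samp{K} code, and all name parts are assumed to be plain
ASCII.

```python
def number_code(letter, index):
    """Return a code letter plus number; multi-digit numbers get delimiters."""
    if index < 10:
        return "%s%d" % (letter, index)
    return "%s_%d_" % (letter, index)


class NameTable:
    """Remembers qualified-name prefixes in the order they are first seen."""

    def __init__(self):
        self.remembered = []  # tuples of name parts; position is the K index

    def mangle(self, parts):
        parts = tuple(parts)
        pieces = []
        start = 0
        # Replace the longest remembered prefix, if any, by its K code.
        for length in range(len(parts), 0, -1):
            if parts[:length] in self.remembered:
                index = self.remembered.index(parts[:length])
                pieces.append(number_code("K", index))
                start = length
                break
        # Spell out the remaining parts and remember each new prefix.
        for length in range(start + 1, len(parts) + 1):
            part = parts[length - 1]
            pieces.append("%d%s" % (len(part), part))
            self.remembered.append(parts[:length])
        if len(pieces) == 1:
            return pieces[0]
        # A K code counts as one part of the qualified name.
        return number_code("Q", len(pieces)) + "".join(pieces)
```

Mangling @code{Andrew}, @code{Andrew::WasHere} and
@code{Andrew::WasHere::AndHereToo} in turn with one @code{NameTable} gives
@samp{6Andrew}, @samp{Q2K07WasHere} and @samp{Q2K110AndHereToo}.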
1639
On some occasions either a @samp{B} or a @samp{K} code could
be chosen; preference is always given to the @samp{B} code.  For instance, the example
in the section on @samp{B} mangling could have used a @samp{K} code
instead of @samp{B2}.
1644
1645 @subsection Templates
1646
A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters.  If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type.  If it is a
template, it is encoded as @samp{z} followed by the parameter
of the template template parameter and the template name.
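
For the common case where every template parameter is a type, the encoding
can be sketched in Python as below.  The function name
@code{mangle_class_template} is hypothetical, the type arguments are
assumed to be already-mangled encodings, and template template parameters
(the @samp{z} case) are deliberately left out.

```python
def mangle_class_template(template_name, type_arguments):
    """Encode a class template instantiation whose arguments are all types.

    type_arguments holds already-mangled type encodings.
    """
    pieces = [
        "t",
        "%d%s" % (len(template_name), template_name),  # template name
        "%d" % len(type_arguments),                    # parameter count
    ]
    for argument in type_arguments:
        pieces.append("Z" + argument)                  # one Z per type
    return "".join(pieces)


inner = mangle_class_template("class1", ["6class3"])
print(mangle_class_template("class2", [inner]))  # prints t6class21Zt6class11Z6class3
```

The printed string is the encoding of @code{class2<class1<class3> >} that
appears inside the mangled name in the squangling example.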
1654
1655 A function template specialization (either an instantiation or an
1656 explicit specialization) is encoded by an @samp{H} followed by the
1657 encoding of the template parameters, as described above, followed by an
1658 @samp{_}, the encoding of the argument types to the template function
1659 (not the specialization), another @samp{_}, and the return type.  (Like
1660 the argument types, the return type is the return type of the function
1661 template, not the specialization.)  Template parameters in the argument
1662 and return types are encoded by an @samp{X} for type parameters,
1663 @samp{zX} for template parameters,
1664 or a @samp{Y} for constant parameters, an index indicating their position
1665 in the template parameter list declaration, and their template depth.
1666
1667 @subsection Arrays
1668
C++ array types are mangled by emitting @samp{A}, followed by
the length of the array, followed by an @samp{_}, followed by
the mangling of the element type.  Of course, array parameter
types normally decay into pointer types, so you
don't see this.
1674
Java arrays are objects.  A Java type @code{T[]} is mangled
as if it were the C++ type @code{JArray<T>}.
For example, @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.
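
Both array forms can be sketched in Python as follows.  The function names
are hypothetical and the element types are assumed to be already mangled.
The C++ helper follows the wording above literally (it emits the array
length); it has not been checked against the compiler's output.

```python
def mangle_cxx_array(length, element):
    """Emit A, the array length, an underscore, then the element type."""
    return "A%d_%s" % (length, element)


def mangle_java_array(element):
    """Treat T[] as a pointer to the class template instance JArray<T>."""
    return "Pt6JArray1Z" + element


print(mangle_java_array("PQ34java4lang6String"))
# prints Pt6JArray1ZPQ34java4lang6String
```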
1679
1680 @subsection Static fields
1681
1682 Both C++ and Java classes can have static fields.
1683 These are allocated statically, and are shared among all instances.
1684
The mangling starts with a prefix (@samp{_} on most systems), which is
followed by the mangling
of the class name, followed by the ``joiner'' and finally the field name.
The joiner (see @code{JOINER} in @file{cp-tree.h}) is a special
separator character.  For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems).  If the joiner is @samp{_} then the prefix
is @samp{__static_} instead of just @samp{_}.
1693
For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
(or rarely @samp{__static_Q23Foo3Bar_var}).
1697
If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the ``joiner''.
Thus @code{\u1234Foo::var\u3445} becomes @samp{_U8_1234FooU.var_3445}.
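
Putting the prefix, joiner and Unicode indicator together gives the
following Python sketch.  The function name @code{mangle_static_field} and
its parameters are hypothetical; the class and field encodings are assumed
to have been produced already, and the caller states whether Unicode
escapes were used.

```python
def mangle_static_field(class_encoding, field_encoding, joiner="$",
                        unicode_escaped=False):
    """Join an encoded class name and field name into a static field symbol."""
    # An underscore joiner needs the longer prefix to stay recognizable.
    prefix = "__static_" if joiner == "_" else "_"
    # The Unicode indicator, when needed, goes just before the joiner.
    marker = "U" if unicode_escaped else ""
    return prefix + class_encoding + marker + joiner + field_encoding


print(mangle_static_field("Q23Foo3Bar", "var", "."))  # prints _Q23Foo3Bar.var
```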
1701
1702 @subsection Table of demangling code characters
1703
1704 The following special characters are used in mangling:
1705
1706 @table @samp
1707 @item A
1708 Indicates a C++ array type.
1709
1710 @item b
1711 Encodes the C++ @code{bool} type,
1712 and the Java @code{boolean} type.
1713
1714 @item B
Used for squangling.  Similar in concept to the @samp{T} non-squangled code.
1716
1717 @item c
1718 Encodes the C++ @code{char} type, and the Java @code{byte} type.
1719
1720 @item C
1721 A modifier to indicate a @code{const} type.
1722 Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).
1724
1725 @item d
1726 Encodes the C++ and Java @code{double} types.
1727
1728 @item e
1729 Indicates extra unknown arguments @code{...}.
1730
1731 @item E
1732 Indicates the opening parenthesis of an expression.
1733
1734 @item f
1735 Encodes the C++ and Java @code{float} types.
1736
1737 @item F
1738 Used to indicate a function type.
1739
1740 @item H
1741 Used to indicate a template function.
1742
1743 @item i
1744 Encodes the C++ and Java @code{int} types.
1745
1746 @item I
Encodes typedef names of the form @code{int@var{n}_t}, where @var{n} is a
positive decimal number.  The @samp{I} is followed by either two
hexadecimal digits, which encode the value of @var{n}, or by an
arbitrary number of hexadecimal digits between underscores.  For
example, @samp{I40} encodes the type @code{int64_t}, and @samp{I_200_}
encodes the type @code{int512_t}.
1753
1754 @item J
1755 Indicates a complex type.
1756
1757 @item K
1758 Used by squangling to compress qualified names.
1759
1760 @item l
1761 Encodes the C++ @code{long} type.
1762
1763 @item n
1764 Immediate repeated type. Followed by the repeat count.
1765
1766 @item N
Repeated type.  Followed by the repeat count of the repeated type,
followed by the type index of the repeated type.  Due to a bug in
g++ 2.7.2, this is only generated if the index is 0.  Superseded by
@samp{n} when squangling.
1771
1772 @item P
1773 Indicates a pointer type.  Followed by the type pointed to.
1774
1775 @item Q
1776 Used to mangle qualified names, which arise from nested classes.
1777 Also used for namespaces.
1778 In Java used to mangle package-qualified names, and inner classes.
1779
1780 @item r
1781 Encodes the GNU C++ @code{long double} type.
1782
1783 @item R
1784 Indicates a reference type.  Followed by the referenced type.
1785
1786 @item s
Encodes the C++ and Java @code{short} types.
1788
1789 @item S
1790 A modifier that indicates that the following integer type is signed.
1791 Only used with @code{char}.
1792
1793 Also used as a modifier to indicate a static member function.
1794
1795 @item t
1796 Indicates a template instantiation.
1797
1798 @item T
1799 A back reference to a previously seen type.
1800
1801 @item U
1802 A modifier that indicates that the following integer type is unsigned.
1803 Also used to indicate that the following class or namespace name
1804 is encoded using Unicode-mangling.
1805
1806 @item u
1807 The @code{restrict} type qualifier.
1808
1809 @item v
1810 Encodes the C++ and Java @code{void} types.
1811
1812 @item V
1813 A modifier for a @code{volatile} type or method.
1814
1815 @item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.
1817
1818 @item W
1819 Indicates the closing parenthesis of an expression.
1820
1821 @item x
1822 Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.
1823
1824 @item X
1825 Encodes a template type parameter, when part of a function type.
1826
1827 @item Y
1828 Encodes a template constant parameter, when part of a function type.
1829
1830 @item z
1831 Used for template template parameters.
1832
1833 @item Z
1834 Used for template type parameters.
1835
1836 @end table
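
The two forms of the @samp{I} code in the table above can be sketched in
Python as follows.  The function name @code{mangle_sized_int} is
hypothetical, and the use of lower-case hexadecimal letters is an
assumption, since both examples in the table contain only decimal digits.

```python
def mangle_sized_int(bits):
    """Encode the typedef int<bits>_t using the I code."""
    digits = "%x" % bits
    if len(digits) <= 2:
        # Short form: exactly two hexadecimal digits.
        return "I%02x" % bits
    # Long form: any number of hexadecimal digits between underscores.
    return "I_%s_" % digits


print(mangle_sized_int(64))   # prints I40
print(mangle_sized_int(512))  # prints I_200_
```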
1837
1838 The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
1839 also seem to be used for obscure purposes ...
1840
1841 @node Concept Index,  , Mangling, Top
1842
1843 @section Concept Index
1844
1845 @printindex cp
1846
1847 @bye