1 Notes on the GNU Implementation of DWARF Debugging Information
2 --------------------------------------------------------------
3 Last Updated: Sun Jul 17 08:17:42 PDT 1994 by rfg@segfault.us.com
4 ------------------------------------------------------------
6 This file describes special and unique aspects of the GNU implementation
7 of the DWARF debugging information language, as provided in the GNU version
10 For general information about the DWARF debugging information language,
11 you should obtain the DWARF version 1 specification document (and perhaps
12 also the DWARF version 2 draft specification document) developed by the
13 UNIX International Programming Languages Special Interest Group. A copy
14 of the DWARF version 1 specification (in PostScript form) may be
15 obtained either from me <rfg@netcom.com> or from the main Data General
16 FTP server. (See below.) The file you are looking at now only describes
17 known deviations from the DWARF version 1 specification, together with
18 those things which are allowed by the DWARF version 1 specification but
19 which are known to cause interoperability problems (e.g. with SVR4 SDB).
21 To obtain a copy of the DWARF Version 1 and/or DWARF Version 2 specification
22 from Data General's FTP server, use the following procedure:
24 ---------------------------------------------------------------------------
25 ftp to machine: "dg-rtp.dg.com" (128.222.1.2).
29 get any of the following files you are interested in:
34 ---------------------------------------------------------------------------
36 The generation of DWARF debugging information by the GNU version 2.x C
37 compiler has now been tested rather extensively for m88k, i386, i860, and
38 Sparc targets. The DWARF output of the GNU C compiler appears to inter-
39 operate well with the standard SVR4 SDB debugger on these kinds of target
40 systems (but of course, there are no guarantees).
42 DWARF generation for the GNU g++ compiler is still not operable. This is
43 due primarily to the many remaining cases where the g++ front end does not
44 conform to the conventions used in the GNU C front end for representing
45 various kinds of declarations in the TREE data structure. It is not clear
46 at this time how these problems will be addressed.
48 Future plans for the dwarfout.c module of the GNU compiler(s) include the
49 addition of full support for GNU FORTRAN. (This should, in theory, be a
50 lot simpler to add than adding support for g++... but we'll see.)
52 Many features of the DWARF version 2 specification have been adapted to
53 (and used in) the GNU implementation of DWARF (version 1). In most of
54 these cases, a DWARF version 2 approach is used in place of (or in addition
55 to) DWARF version 1 stuff simply because it is apparent that DWARF version
56 1 is not sufficiently expressive to provide the kinds of information which
57 may be necessary to support really robust debugging. In all of these cases
58 however, the use of DWARF version 2 features should not interfere in any
59 way with the interoperability (of GNU compilers) with generally available
60 "classic" (pre version 1) DWARF consumer tools (e.g. SVR4 SDB).
62 The DWARF generation enhancement for the GNU compiler(s) was initially
63 donated to the Free Software Foundation by Network Computing Devices.
64 (Thanks NCD!) Additional development and maintenance of dwarfout.c has
65 been largely supported (i.e. funded) by Intel Corporation. (Thanks Intel!)
67 If you have questions or comments about the DWARF generation feature, please
68 send mail to me <rfg@netcom.com>. I will be happy to investigate any bugs
69 reported and I may even provide fixes (but of course, I can make no promises).
71 The DWARF debugging information produced by GCC may deviate in a few minor
72 (but perhaps significant) respects from the DWARF debugging information
73 currently produced by other C compilers. A serious attempt has been made
74 however to conform to the published specifications, to existing practice,
75 and to generally accepted norms in the GNU implementation of DWARF.
77 ** IMPORTANT NOTE ** ** IMPORTANT NOTE ** ** IMPORTANT NOTE **
79 Under normal circumstances, the DWARF information generated by the GNU
80 compilers (in an assembly language file) is essentially impossible for
81 a human being to read. This fact can make it very difficult to debug
82 certain DWARF-related problems. In order to overcome this difficulty,
83 a feature has been added to dwarfout.c (enabled by the -fverbose-asm
84 option) which causes additional comments to be placed into the assembly
85 language output file, out to the right-hand side of most bits of DWARF
86 material. The comments indicate (far more clearly than the obscure
87 DWARF hex codes do) what is actually being encoded in DWARF. Thus, the
88 -fverbose-asm option can be highly useful for those who must study the
89 DWARF output from the GNU compilers in detail.
93 (Footnote: Within this file, the term `Debugging Information Entry' will
94 be abbreviated as `DIE'.)
97 Release Notes (aka known bugs)
98 -------------------------------
100 In one very obscure case involving dynamically sized arrays, the DWARF
101 "location information" for such an array may make it appear that the
102 array has been totally optimized out of existence, when in fact it
103 *must* actually exist. (This only happens when you are using *both* -g
104 *and* -O.) This is due to aggressive dead store elimination in the
105 compiler, and to the fact that the DECL_RTL expressions associated with
106 variables are not always updated to correctly reflect the effects of
107 GCC's aggressive dead store elimination.
109 -------------------------------
111 When attempting to set a breakpoint at the "start" of a function compiled
112 with -g1, the debugger currently has no way of knowing exactly where the
113 end of the prologue code for the function is. Thus, for most targets,
114 all the debugger can do is to set the breakpoint at the AT_low_pc address
115 for the function. But if you stop there and then try to look at one or
116 more of the formal parameter values, they may not have been "homed" yet,
117 so you may get inaccurate answers (or perhaps even addressing errors).
119 Some people may consider this simply a non-feature, but I consider it a
120 bug, and I hope to provide some GNU-specific attributes (on function
121 DIEs) which will specify the address of the end of the prologue and the
122 address of the beginning of the epilogue in a future release.
124 -------------------------------
126 It is believed at this time that old bugs relating to the AT_bit_offset
127 values for bit-fields have been fixed.
129 There may still be some very obscure bugs relating to the DWARF description
130 of type `long long' bit-fields for target machines (e.g. 80x86 machines)
131 where the alignment of type `long long' data objects is different from
132 (and less than) the size of a type `long long' data object.
134 Please report any problems with the DWARF description of bit-fields as you
135 would any other GCC bug. (Procedures for bug reporting are given in the
136 GNU C compiler manual.)
138 --------------------------------
140 At this time, GCC does not know how to handle the GNU C "nested functions"
141 extension. (See the GCC manual for more info on this extension to ANSI C.)
143 --------------------------------
145 The GNU compilers now represent inline functions (and inlined instances
146 thereof) in exactly the manner described by the current DWARF version 2
147 (draft) specification. The version 1 specification for handling inline
148 functions (and inlined instances) was known to be brain-damaged (by the
149 PLSIG) when the version 1 spec was finalized, but it was simply too late
150 in the cycle to get it removed before the version 1 spec was formally
151 released to the public (by UI).
153 --------------------------------
155 At this time, GCC does not generate the kind of really precise information
156 about the exact declared types of entities with signed integral types which
157 is required by the current DWARF draft specification.
159 Specifically, the current DWARF draft specification seems to require that
160 the type of a non-unsigned integral bit-field member of a struct or union
161 type be represented as either a "signed" type or as a "plain" type,
162 depending upon the exact set of keywords that were used in the
163 type specification for the given bit-field member. It was felt (by the
164 UI/PLSIG) that this distinction between "plain" and "signed" integral types
165 could have some significance (in the case of bit-fields) because ANSI C
166 does not constrain the signedness of a plain bit-field, whereas it does
167 constrain the signedness of an explicitly "signed" bit-field. For this
168 reason, the current DWARF specification calls for compilers to produce
169 type information (for *all* integral typed entities... not just bit-fields)
170 which explicitly indicates the signedness of the relevant type to be
171 "signed" or "plain" or "unsigned".
173 Unfortunately, the GNU DWARF implementation is currently incapable of making
176 --------------------------------
179 Known Interoperability Problems
180 -------------------------------
182 Although the GNU implementation of DWARF conforms (for the most part) with
183 the current UI/PLSIG DWARF version 1 specification (with many compatible
184 version 2 features added in as "vendor specific extensions" just for good
185 measure) there are a few known cases where GCC's DWARF output can cause
186 some confusion for "classic" (pre version 1) DWARF consumers such as the
187 System V Release 4 SDB debugger. These cases are described in this section.
189 --------------------------------
191 The DWARF version 1 specification includes the fundamental type codes
192 FT_ext_prec_float, FT_complex, FT_dbl_prec_complex, and FT_ext_prec_complex.
193 Since GNU C is only a C compiler (and since C doesn't provide any "complex"
194 data types) the only one of these fundamental type codes which GCC ever
195 generates is FT_ext_prec_float. This fundamental type code is generated
196 by GCC for the `long double' data type. Unfortunately, due to an apparent
197 bug in the SVR4 SDB debugger, SDB can become very confused wherever any
198 attempt is made to print a variable, parameter, or field whose type was
199 given in terms of FT_ext_prec_float.
201 (Actually, SVR4 SDB fails to understand *any* of the four fundamental type
202 codes mentioned here. This fact will cause additional problems when
203 there is a GNU FORTRAN front-end.)
205 --------------------------------
207 In general, it appears that SVR4 SDB is not able to effectively ignore
208 fundamental type codes in the "implementation defined" range. This can
209 cause problems when a program being debugged uses the `long long' data
210 type (or the signed or unsigned varieties thereof) because these types
211 are not defined by ANSI C, and thus, GCC must use its own private fundamental
212 type codes (from the implementation-defined range) to represent these types.
214 --------------------------------
217 General GNU DWARF extensions
218 ----------------------------
220 In the current DWARF version 1 specification, no mechanism is specified by
221 which accurate information about executable code from include files can be
222 properly (and fully) described. (The DWARF version 2 specification *does*
223 specify such a mechanism, but it is about 10 times more complicated than
224 it needs to be so I'm not terribly anxious to try to implement it right now.)
227 In the GNU implementation of DWARF version 1, a fully downward-compatible
228 extension has been implemented which permits the GNU compilers to specify
229 which executable lines come from which files. This extension places
230 additional information (about source file names) in GNU-specific sections
231 (which should be totally ignored by all non-GNU DWARF consumers) so that
232 this extended information can be provided (to GNU DWARF consumers) in a way
233 which is totally transparent (and invisible) to non-GNU DWARF consumers
234 (e.g. the SVR4 SDB debugger). The additional information is placed *only*
235 in specialized GNU-specific sections, where it should never even be seen
236 by non-GNU DWARF consumers.
238 To understand this GNU DWARF extension, imagine that the sequence of entries
239 in the .lines section is broken up into several subsections. Each contiguous
240 sequence of .line entries which relates to a sequence of lines (or statements)
241 from one particular file (either a `base' file or an `include' file) could
242 be called a `line entries chunk' (LEC).
244 For each LEC there is one entry in the .debug_srcinfo section.
246 Each normal entry in the .debug_srcinfo section consists of two 4-byte
247 words of data as follows:
249 (1) The starting address (relative to the entire .line section)
250 of the first .line entry in the relevant LEC.
252 (2) The starting address (relative to the entire .debug_sfnames
253 section) of a NUL terminated string representing the
254 relevant filename. (This filename may be either a
255 relative or an absolute filename, depending upon how the
256 given source file was located during compilation.)
258 Obviously, each .debug_srcinfo entry allows you to find the relevant filename,
259 and it also points you to the first .line entry that was generated as a result
260 of having compiled a given source line from the given source file.
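
As a rough illustration (not taken from dwarfout.c or from any GNU debugger),
a consumer might decode one normal .debug_srcinfo entry along the following
lines. The function name, the sample section contents, and the choice of
little-endian byte order are all assumptions made for this sketch; the real
byte order is whatever the target machine uses.

```python
import struct

def parse_srcinfo_entry(srcinfo, offset, sfnames, byte_order="<"):
    """Decode one normal 8-byte .debug_srcinfo entry found at `offset`.

    Returns (offset of the first .line entry of the LEC, filename).
    `byte_order` is a struct prefix; little-endian is only an assumption.
    """
    # Word 1: offset into the .line section.
    # Word 2: offset into the .debug_sfnames section.
    line_off, name_off = struct.unpack_from(byte_order + "II", srcinfo, offset)
    # The filename is a NUL terminated string inside .debug_sfnames.
    end = sfnames.index(b"\0", name_off)
    filename = sfnames[name_off:end].decode("ascii", errors="replace")
    return line_off, filename

# Made-up section contents, for illustration only.
sfnames = b"/src/\0foobar.c\0foobar.h\0"
srcinfo = struct.pack("<II", 10, sfnames.index(b"foobar.c"))
print(parse_srcinfo_entry(srcinfo, 0, sfnames))
```

The filename offset is looked up with index() in the sample data rather than
being hard-coded, so the sketch does not depend on counting bytes by hand.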
262 Each subsequent .line entry should also be assumed to have been produced
263 as a result of compiling yet more lines from the same file. The end of
264 any given LEC is easily found by looking at the first 4-byte pointer in
265 the *next* .debug_srcinfo entry. That next .debug_srcinfo entry points
266 to a new and different LEC, so the preceding LEC (implicitly) must have
267 ended with the last .line section entry which occurs at the 2 1/2 words
268 just before the address given in the first pointer of the new .debug_srcinfo entry.
271 The following picture may help to clarify this feature. Let's assume that
272 `LE' stands for `.line entry'. Also, assume that `*' stands for a pointer.
275 .line section    .debug_srcinfo section    .debug_sfnames section
276 -----------------------------------------------------------------
278 LE <---------------------- *
279 LE                         * -----------------> "foobar.c" <---
282 LE <---------------------- *                                  |
283 LE                         * -----------------> "foobar.h" <| |
286 LE <---------------------- *                                | |
287 LE                         * -----------------> "inner.h"   | |
289 LE <---------------------- *                                | |
290 LE                         * -------------------------------- |
295 LE <---------------------- *                                  |
296 LE                         * ----------------------------------
301 In effect, each entry in the .debug_srcinfo section points to *both* a
302 filename (in the .debug_sfnames section) and to the start of a block of
303 consecutive LEs (in the .line section).
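
The following sketch shows one way a debugger might use this arrangement to
find the source file for a given .line entry. It assumes the normal
.debug_srcinfo entries have already been decoded into (offset, filename)
pairs; the helper names and the sample offsets are invented for the example
and do not come from any real object file.

```python
import bisect

def build_lec_table(entries, end_of_lines):
    """Turn decoded .debug_srcinfo entries into LEC ranges.

    entries:      (line_section_offset, filename) pairs, in section order.
    end_of_lines: offset at which the last LEC stops.
    """
    starts = [off for off, _ in entries]
    names = [name for _, name in entries]
    # Each LEC ends where the next one begins.
    ends = starts[1:] + [end_of_lines]
    return starts, ends, names

def file_for_line_entry(table, le_offset):
    """Return the filename whose LEC contains `le_offset`, or None."""
    starts, ends, names = table
    i = bisect.bisect_right(starts, le_offset) - 1
    if i < 0 or le_offset >= ends[i]:
        return None
    return names[i]

# Same nesting as the picture: foobar.c includes foobar.h, which
# includes inner.h.  All offsets here are hypothetical.
table = build_lec_table(
    [(8, "foobar.c"), (28, "foobar.h"), (48, "inner.h"),
     (68, "foobar.h"), (88, "foobar.c")],
    end_of_lines=118,
)
print(file_for_line_entry(table, 58))
print(file_for_line_entry(table, 78))
```

A binary search is used here only because the start offsets are already in
ascending order; a linear scan would work equally well for small tables.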
305 Note that just like in the .line section, there are specialized first and
306 last entries in the .debug_srcinfo section for each object file. These
307 special first and last entries for the .debug_srcinfo section are very
308 different from the normal .debug_srcinfo section entries. They provide
309 additional information which may be helpful to a debugger when it is
310 interpreting the data in the .debug_srcinfo, .debug_sfnames, and .line sections.
313 The first entry in the .debug_srcinfo section for each compilation unit
314 consists of five 4-byte words of data. The contents of these five words
315 should be interpreted (by debuggers) as follows:
317 (1) The starting address (relative to the entire .line section)
318 of the .line section for this compilation unit.
320 (2) The starting address (relative to the entire .debug_sfnames
321 section) of the .debug_sfnames section for this compilation unit.
324 (3) The starting address (in the execution virtual address space)
325 of the .text section for this compilation unit.
327 (4) The ending address plus one (in the execution virtual address
328 space) of the .text section for this compilation unit.
330 (5) The date/time (in seconds since midnight 1/1/70) at which the
331 compilation of this compilation unit occurred. This value
332 should be interpreted as an unsigned quantity because gcc
333 might be configured to generate a default value of 0xffffffff
334 in this field (in cases where it is desired to have object
335 files created at different times from identical source files
336 be byte-for-byte identical). By default, these timestamps
337 are *not* generated by dwarfout.c (so that object files
338 compiled at different times will be byte-for-byte identical).
339 If you wish to enable this "timestamp" feature however, you
340 can simply place a #define for the symbol `DWARF_TIMESTAMPS'
341 in your target configuration file and then rebuild the GNU compiler(s).
344 Note that the first string placed into the .debug_sfnames section for each
345 compilation unit is the name of the directory in which compilation occurred.
346 This string ends with a `/' (to help indicate that it is the pathname of a
347 directory). Thus, the second word of each specialized initial .debug_srcinfo
348 entry for each compilation unit may be used as a pointer to the (string)
349 name of the compilation directory, and that string may in turn be used to
350 "absolutize" any relative pathnames which may appear later on in the
351 .debug_sfnames section entries for the same compilation unit.
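
A minimal sketch of this "absolutizing" step follows. It assumes the five
words of the initial .debug_srcinfo entry have already been unpacked into a
tuple, and that pathnames use `/' as the separator. The function names and
the sample data are hypothetical.

```python
import posixpath
import struct

def read_cstring(section, offset):
    """Return the NUL terminated string starting at `offset`."""
    end = section.index(b"\0", offset)
    return section[offset:end].decode("ascii", errors="replace")

def absolutize(sfnames, header_words, name_offset):
    """Resolve a .debug_sfnames filename against the compilation directory.

    header_words: the five words of the initial .debug_srcinfo entry.
                  Word 2 (index 1) locates the compilation directory.
    """
    comp_dir = read_cstring(sfnames, header_words[1])
    name = read_cstring(sfnames, name_offset)
    if posixpath.isabs(name):
        return name                        # already absolute, leave alone
    return posixpath.join(comp_dir, name)  # comp_dir already ends in '/'

# Invented sample: the directory string comes first, as described above.
sfnames = b"/work/build/\0foobar.c\0/usr/include/stdio.h\0"
header_words = struct.unpack(
    "<5I", struct.pack("<5I", 0, 0, 0x1000, 0x1400, 0xFFFFFFFF))
print(absolutize(sfnames, header_words, sfnames.index(b"foobar.c")))
print(absolutize(sfnames, header_words, sfnames.index(b"/usr/include")))
```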
353 The fifth and last word of each specialized starting entry for a compilation
354 unit in the .debug_srcinfo section may (depending upon your configuration)
355 indicate the date/time of compilation, and this may be used (by a debugger)
356 to determine if any of the source files which contributed code to this
357 compilation unit are newer than the object code for the compilation unit
358 itself. If so, the debugger may wish to print an "out-of-date" warning
359 about the compilation unit.
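
One possible form of such a check is sketched below. It treats 0xffffffff as
"no timestamp recorded" and expects the caller to supply the modification
times of the source files (for instance from os.stat). The function name and
the sample values are assumptions made for illustration.

```python
NO_TIMESTAMP = 0xFFFFFFFF  # value used when timestamps are not generated

def stale_sources(compile_stamp, source_mtimes):
    """Return the source files that look newer than the object code.

    compile_stamp: fifth word of the initial .debug_srcinfo entry,
                   read as an unsigned quantity.
    source_mtimes: mapping of pathname -> modification time in seconds.
    """
    if compile_stamp == NO_TIMESTAMP:
        return []          # nothing to compare against
    return sorted(path for path, mtime in source_mtimes.items()
                  if mtime > compile_stamp)

mtimes = {"/work/build/foobar.c": 774500000, "/work/build/foobar.h": 774400000}
print(stale_sources(774450000, mtimes))
print(stale_sources(NO_TIMESTAMP, mtimes))
```

A debugger that gets a non-empty list back could then print its
"out-of-date" warning for the compilation unit.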
361 The .debug_srcinfo section associated with each compilation will also have
362 a specialized terminating entry. This terminating .debug_srcinfo section
363 entry will consist of the following two 4-byte words of data:
365 (1) The offset, measured from the start of the .line section to
366 the beginning of the terminating entry for the .line section.
368 (2) A word containing the value 0xffffffff.
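
Putting the three kinds of entries together, the .debug_srcinfo data for one
compilation unit could be walked roughly as follows. This is only a sketch:
the function name is invented, little-endian byte order is assumed, and the
code relies on no real filename offset ever being equal to 0xffffffff.

```python
import struct

def walk_srcinfo(srcinfo, offset=0, byte_order="<"):
    """Read the .debug_srcinfo entries of one compilation unit.

    Returns (header_words, normal_entries, line_terminator_offset,
    offset_just_past_this_unit).
    """
    # Specialized first entry: five 4-byte words.
    header = struct.unpack_from(byte_order + "5I", srcinfo, offset)
    offset += 20
    entries = []
    while True:
        first, second = struct.unpack_from(byte_order + "II", srcinfo, offset)
        offset += 8
        if second == 0xFFFFFFFF:
            # Terminating entry: `first` locates the terminating
            # entry of the .line section.
            return header, entries, first, offset
        entries.append((first, second))

# Invented data: a header, two normal entries, and the terminator.
sample = (struct.pack("<5I", 0, 0, 0x1000, 0x1400, 0xFFFFFFFF)
          + struct.pack("<II", 8, 13)
          + struct.pack("<II", 28, 22)
          + struct.pack("<II", 48, 0xFFFFFFFF))
header, entries, line_end, next_offset = walk_srcinfo(sample)
print(entries, line_end, next_offset)
```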
370 --------------------------------
372 In the current DWARF version 1 specification, no mechanism is specified by
373 which information about macro definitions and un-definitions may be provided
374 to the DWARF consumer.
376 The DWARF version 2 (draft) specification does specify such a mechanism.
377 That specification was based on the GNU ("vendor specific extension")
378 which provided some support for macro definitions and un-definitions,
379 but the "official" DWARF version 2 (draft) specification mechanism for
380 handling macros and the GNU implementation have diverged somewhat. I
381 plan to update the GNU implementation to conform to the "official"
382 DWARF version 2 (draft) specification as soon as I get time to do that.
384 Note that in the GNU implementation, additional information about macro
385 definitions and un-definitions is *only* provided when the -g3 level of
386 debug-info production is selected. (The default level is -g2 and the
387 plain old -g option is considered to be identical to -g2.)
389 GCC records information about macro definitions and undefinitions primarily
390 in a section called the .debug_macinfo section. Normal entries in the
391 .debug_macinfo section consist of the following three parts:
393 (1) A special "type" byte.
395 (2) A 3-byte line-number/filename-offset field.
397 (3) A NUL terminated string.
399 The interpretation of the second and third parts is dependent upon the
400 value of the leading (type) byte.
402 The type byte may have one of four values depending upon the type of the
403 .debug_macinfo entry which follows. The 1-byte MACINFO type codes presently
404 used, and their meanings are as follows:
406 MACINFO_start    A base file or an include file starts here.
407 MACINFO_resume   The current base or include file ends here.
408 MACINFO_define   A #define directive occurs here.
409 MACINFO_undef    A #undef directive occurs here.
411 (Note that the MACINFO_... codes mentioned here are simply symbolic names
412 for constants which are defined in the GNU dwarf.h file.)
414 For MACINFO_define and MACINFO_undef entries, the second (3-byte) field
415 contains the number of the source line (relative to the start of the current
416 base source file or the current include file) where the #define or #undef
417 directive appears. For a MACINFO_define entry, the following string field
418 contains the name of the macro which is defined, followed by its definition.
419 Note that the definition is always separated from the name of the macro
420 by at least one whitespace character. For a MACINFO_undef entry, the
421 string which follows the 3-byte line number field contains just the name
422 of the macro which is being undef'ed.
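
A consumer that wants the macro name and its definition separately has to
split the MACINFO_define string itself. The sketch below splits at the first
whitespace character that lies outside any parentheses, so that a parameter
list containing spaces stays attached to the name. Whether GCC ever emits
such spaces inside a parameter list is not stated here, so that part is a
defensive assumption, and the function name is hypothetical.

```python
def split_macro_definition(text):
    """Split a MACINFO_define string into (name, definition).

    For function-like macros the parameter list is kept with the name.
    """
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch.isspace() and depth == 0:
            # First separator outside parentheses: name ends here.
            return text[:i], text[i:].lstrip()
    return text, ""   # no separator found; treat as an empty definition

print(split_macro_definition("BUFSIZ 512"))
print(split_macro_definition("MAX(a, b) ((a) > (b) ? (a) : (b))"))
```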
424 For a MACINFO_start entry, the 3-byte field following the type byte contains
425 the offset, relative to the start of the .debug_sfnames section for the
426 current compilation unit, of a string which names the new source file which
427 is beginning its inclusion at this point. Following that 3-byte field,
428 each MACINFO_start entry always contains a zero length NUL terminated string.
431 For a MACINFO_resume entry, the 3-byte field following the type byte contains
432 the line number WITHIN THE INCLUDING FILE at which the inclusion of the
433 current file (whose inclusion ends here) was initiated. Following that
434 3-byte field, each MACINFO_resume entry always contains a zero length NUL terminated string.
437 Each set of .debug_macinfo entries for each compilation unit is terminated
438 by a special .debug_macinfo entry consisting of a 4-byte zero value followed
439 by a single NUL byte.
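
The entry layout described above can be walked with a few lines of code. In
the sketch below the numeric values of the type codes ('s', 'r', 'd', 'u')
are an assumption and should be checked against the GNU dwarf.h file. The
byte order of the 3-byte field is likewise assumed to be little-endian, and
the function and table names are invented.

```python
# ASSUMED type-code values; the authoritative ones live in GNU dwarf.h.
MACINFO_NAMES = {
    ord("s"): "start",
    ord("r"): "resume",
    ord("d"): "define",
    ord("u"): "undef",
}

def walk_macinfo(data, offset=0, byte_order="little"):
    """Yield (kind, field, text) for each .debug_macinfo entry of one unit."""
    while True:
        kind = data[offset]                                  # 1-byte type
        field = int.from_bytes(data[offset + 1:offset + 4], byte_order)
        end = data.index(b"\0", offset + 4)                  # string's NUL
        text = data[offset + 4:end].decode("ascii", errors="replace")
        offset = end + 1
        if kind == 0 and field == 0 and text == "":
            return     # 4-byte zero value followed by a single NUL byte
        yield MACINFO_NAMES.get(kind, hex(kind)), field, text

def make_entry(kind, field, text=b""):
    """Build one sample entry (helper for this example only)."""
    return kind + field.to_bytes(3, "little") + text + b"\0"

sample = (make_entry(b"s", 0)
          + make_entry(b"d", 3, b"BUFSIZ 512")
          + make_entry(b"u", 9, b"BUFSIZ")
          + b"\0\0\0\0\0")
for entry in walk_macinfo(sample):
    print(entry)
```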
441 --------------------------------
443 In the current DWARF draft specification, no provision is made for providing
444 a separate level of (limited) debugging information necessary to support
445 tracebacks (only) through fully-debugged code (e.g. code in system libraries).
447 A proposal to define such a level was submitted (by me) to the UI/PLSIG.
448 This proposal was rejected by the UI/PLSIG for inclusion into the DWARF
449 version 1 specification for two reasons. First, it was felt (by the PLSIG)
450 that the issues involved in supporting a "traceback only" subset of DWARF
451 were not well understood. Second, and perhaps more importantly, the PLSIG
452 is already having enough trouble agreeing on what it means to be "conforming"
453 to the DWARF specification, and it was felt that trying to specify multiple
454 different *levels* of conformance would only complicate our discussions of
455 this already divisive issue. Nonetheless, the GNU implementation of DWARF
456 provides an abbreviated "traceback only" level of debug-info production for
457 use with fully-debugged "system library" code. This level should only be
458 used for fully debugged system library code, and even then, it should only
459 be used where there is a very strong need to conserve disk space. This
460 abbreviated level of debug-info production can be used by specifying the
461 -g1 option on the compilation command line.
463 --------------------------------
465 As mentioned above, the GNU implementation of DWARF currently uses the DWARF
466 version 2 (draft) approach for inline functions (and inlined instances
467 thereof). This is used in preference to the version 1 approach because
468 (quite simply) the version 1 approach is highly brain-damaged and probably
471 --------------------------------
474 GNU DWARF Representation of GNU C Extensions to ANSI C
475 ------------------------------------------------------
477 The file dwarfout.c has been designed and implemented so as to provide
478 some reasonable DWARF representation for each and every declarative
479 construct which is accepted by the GNU C compiler. Since the GNU C
480 compiler accepts a superset of ANSI C, this means that there are some
481 cases in which the DWARF information produced by GCC must take some
482 liberties in improvising DWARF representations for declarations which
483 are only valid in (extended) GNU C.
485 In particular, GNU C provides at least three significant extensions to
486 ANSI C when it comes to declarations. These are (1) inline functions,
487 (2) dynamic arrays, and (3) incomplete enum types. (See the GCC
488 manual for more information on these GNU extensions to ANSI C.) When
489 used, these GNU C extensions are represented (in the generated DWARF
490 output of GCC) in the most natural and intuitively obvious ways.
492 In the case of inline functions, the DWARF representation is exactly as
493 called for in the DWARF version 2 (draft) specification for an identical
494 function written in C++; i.e. we "reuse" the representation of inline
495 functions which has been defined for C++ to support this GNU C extension.
497 In the case of dynamic arrays, we use the most obvious representational
498 mechanism available; i.e. an array type in which the upper bound of
499 some dimension (usually the first and only dimension) is a variable
500 rather than a constant. (See the DWARF version 1 specification for more
503 In the case of incomplete enum types, such types are represented simply
504 as TAG_enumeration_type DIEs which DO NOT contain either AT_byte_size
505 attributes or AT_element_list attributes.
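
From a consumer's point of view, the rule for incomplete enum types amounts
to a simple test on the DIE. The sketch below models a DIE as a plain
dictionary with a tag name and a set of attribute names. That model is purely
hypothetical and is not how any particular debugger stores DIEs.

```python
def is_incomplete_enum(die):
    """True if `die` looks like an incomplete enum type as described above.

    `die` is a hypothetical model: {"tag": str, "attrs": set of str}.
    """
    return (die["tag"] == "TAG_enumeration_type"
            and "AT_byte_size" not in die["attrs"]
            and "AT_element_list" not in die["attrs"])

complete = {"tag": "TAG_enumeration_type",
            "attrs": {"AT_name", "AT_byte_size", "AT_element_list"}}
incomplete = {"tag": "TAG_enumeration_type", "attrs": {"AT_name"}}
print(is_incomplete_enum(complete), is_incomplete_enum(incomplete))
```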
507 --------------------------------
513 The codes, formats, and other paraphernalia necessary to provide proper
514 support for symbolic debugging for the C++ language are still being worked
515 on by the UI/PLSIG. The vast majority of the additions to DWARF which will
516 be needed to completely support C++ have already been hashed out and agreed
517 upon, but a few small issues (e.g. anonymous unions, access declarations)
518 are still being discussed. Also, we in the PLSIG are still discussing
519 whether or not we need to do anything special for C++ templates. (At this
520 time it is not yet clear whether we even need to do anything special for
523 Unfortunately, as mentioned above, there are quite a few problems in the
524 g++ front end itself, and these are currently responsible for severely
525 restricting the progress which can be made on adding DWARF support
526 specifically for the g++ front-end. Furthermore, Richard Stallman has
527 expressed the view that C++ friendships might not be important enough to
528 describe (in DWARF). This view directly conflicts with both the DWARF
529 version 1 and version 2 (draft) specifications, so until this small
530 misunderstanding is cleared up, DWARF support for g++ is unlikely.
532 With regard to FORTRAN, the UI/PLSIG has defined what is believed to be a
533 complete and sufficient set of codes and rules for adequately representing
534 all of FORTRAN 77, and most of Fortran 90 in DWARF. While some support for
535 this has been implemented in dwarfout.c, further implementation and testing
536 will have to await the arrival of the GNU Fortran front-end (which is
537 currently in early alpha test as of this writing).
539 GNU DWARF support for other languages (i.e. Pascal and Modula) is a moot
540 issue until there are GNU front-ends for these other languages.
542 GNU DWARF support for DWARF version 2 will probably not be attempted until
543 such time as the version 2 specification is finalized. (More work needs
544 to be done on the version 2 specification to make the new "abbreviations"
545 feature of version 2 more easily implementable. Until then, it will be
546 a royal pain in the ass to implement version 2 "abbreviations".) For the
547 time being, version 2 features will be added (in a version 1 compatible
548 manner) when and where these features seem necessary or extremely desirable.
550 As currently defined, DWARF only describes a (binary) language which can
551 be used to communicate symbolic debugging information from a compiler
552 through an assembler and a linker, to a debugger. There is no clear
553 specification of what processing should be (or must be) done by the
554 assembler and/or the linker. Fortunately, the role of the assembler
555 is easily inferred (by anyone knowledgeable about assemblers) just by
556 looking at examples of assembly-level DWARF code. Sadly though, the
557 allowable (or required) processing steps performed by a linker are
558 harder to infer and (perhaps) even harder to agree upon. There are
559 several forms of very useful `post-processing' steps which intelligent
560 linkers *could* (in theory) perform on object files containing DWARF,
561 but any and all such link-time transformations are currently both disallowed
564 In particular, possible link-time transformations of DWARF code which could
565 provide significant benefits include (but are not limited to):
567 Commonization of duplicate DIEs obtained from multiple input
570 Cross-compilation type checking based upon DWARF type information
571 for objects and functions.
573 Other possible `compacting' transformations designed to save disk
574 space and to reduce linker & debugger I/O activity.