coregrind/m_gdbserver/README_DEVELOPERS

This file contains various notes, ideas and history related
to gdbserver in valgrind.

How to use Valgrind gdbserver ?
-------------------------------
This is described in the Valgrind user manual.
Read the user manual before reading the notes below.

What is gdbserver ?
-------------------
The gdb debugger is typically used to debug a process running
on the same machine: gdb uses system calls (such as ptrace)
to fetch data from the process being debugged,
to change data in the process,
to interrupt the process,
and so on.

gdb can also debug processes running on a different computer
(e.g. it can debug a process running on a small real-time
board).

gdb does this by sending commands (e.g. over TCP/IP) to a piece
of code running on the remote computer. This piece of code (called a
gdb stub on small boards, or gdbserver when the remote computer runs
an OS such as GNU/Linux) provides a set of commands allowing gdb
to remotely debug the process.  Examples of commands are: "get the
registers", "get the list of running threads", "read xxx bytes at
address yyyyyyyy", etc.  The definition of all these commands and the
associated replies is the gdb remote serial protocol, which is
documented in Appendix D of the gdb user manual.
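
To make the packet format concrete, here is a minimal sketch of how a
command or reply is framed in the remote serial protocol: '$', the
payload, '#', and a two hex digit checksum (the sum of the payload bytes
modulo 256). The helper name rsp_frame is hypothetical and is not a
gdbserver function; escaping of special characters in binary payloads
is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Frame `payload` as "$<payload>#<checksum>".
   Returns the packet length, or -1 if `out` is too small.
   rsp_frame is a hypothetical name used for illustration only. */
static int rsp_frame(const char *payload, char *out, size_t outsz)
{
    size_t len;
    unsigned char sum = 0;

    assert(payload != NULL && out != NULL);
    len = strlen(payload);
    if (outsz < len + 5)      /* '$' + payload + '#' + 2 digits + NUL */
        return -1;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + (unsigned char)payload[i]);
    return snprintf(out, outsz, "$%s#%02x", payload, (unsigned)sum);
}
```

For example, the "get the registers" request has the payload "g", whose
single byte is 0x67, so the framed packet is "$g#67".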

The standard gdb distribution has a standalone gdbserver (a small
executable) which implements this protocol and the system calls needed
to allow gdb to remotely debug a process running on Linux, MacOS,
Solaris, ...

Activation of gdbserver code inside valgrind
--------------------------------------------
The gdbserver code (from gdb 6.6, GPL2+) has been modified so as to
link it with valgrind and allow the valgrind guest process to be
debugged by a gdb speaking to this gdbserver embedded in valgrind.
The ptrace system calls inside gdbserver have been replaced by reading
the state of the guest.

The gdbserver functionality is activated with valgrind command line
options. If gdbserver is not enabled, the impact on valgrind
runtime is minimal: the command line option is checked at startup,
and afterwards there is only an "if gdbserver is active" check in the
translate function of translate.c and an "if" in the valgrind scheduler.
If the valgrind gdbserver is activated (--vgdb=yes), the impact
is still small (from time to time, the valgrind scheduler checks a counter
in memory). Option --vgdb-poll=yyyyy controls how often the scheduler
does a (somewhat) heavier check to see if gdbserver needs to
stop execution of the guest to allow debugging.
If valgrind gdbserver is activated with --vgdb=full, then
each instruction is instrumented with an additional call to a dirty
helper.
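
The cost model of --vgdb=yes can be pictured with the following sketch.
It is only a model written for this note: the structure, the field
names and the function poll_due are assumptions, and the real scheduler
code is organised differently. The point is that the common path is one
counter comparison, and the heavier check runs once per poll interval.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the scheduler-side bookkeeping. */
struct poll_state {
    unsigned long bbs_done;      /* basic blocks executed so far */
    unsigned long next_poll;     /* bbs_done value at which to poll next */
    unsigned long poll_interval; /* value given to --vgdb-poll */
};

/* Called once per executed basic block.  Returns true when the heavier
   "does gdbserver have something to do ?" check should run now. */
static bool poll_due(struct poll_state *s)
{
    assert(s->poll_interval > 0);
    s->bbs_done++;
    if (s->bbs_done < s->next_poll)
        return false;            /* cheap path: one counter comparison */
    s->next_poll = s->bbs_done + s->poll_interval;
    return true;
}
```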

How does gdbserver code interact with valgrind ?
------------------------------------------------
When an error is reported, the gdbserver code is called.  It reads
commands from gdb using the read system call on a FIFO (e.g. a command
such as "get the registers").  It executes the command (e.g. fetches
the registers from the guest state) and writes the reply (e.g. a
packet containing the register data).  When gdb instructs gdbserver to
"continue", control is returned to valgrind, which then continues
to execute guest code.  The FIFOs used for communication between
valgrind and gdb are created at startup if gdbserver is activated
according to the --vgdb=no/yes/full command line option.

How are signals "handled" ?
---------------------------
When a signal is to be given to the guest, valgrind core first calls
gdbserver if a gdb is currently connected to valgrind; otherwise the
signal is delivered immediately. If gdb instructs gdbserver to give the
signal to the process, the signal is delivered to the guest.  Otherwise,
the signal is ignored (not given to the guest). The user can further
decide in gdb whether a given signal is passed (or not passed).
Note that some (fatal) signals cannot be ignored.
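
The decision described above can be summarised by the small sketch
below. All names (sig_action, decide_signal and the three flags) are
made up for this note; the sketch only restates the prose and is not
the valgrind signal handling code.

```c
#include <stdbool.h>

enum sig_action { ACT_DELIVER, ACT_DROP };

/* Decide what happens to a signal destined for the guest.
   Hypothetical helper mirroring the description above. */
static enum sig_action decide_signal(bool gdb_connected,
                                     bool gdb_says_pass,
                                     bool fatal_unignorable)
{
    if (!gdb_connected)
        return ACT_DELIVER;   /* no gdb: deliver immediately */
    if (fatal_unignorable)
        return ACT_DELIVER;   /* some fatal signals cannot be ignored */
    return gdb_says_pass ? ACT_DELIVER : ACT_DROP;
}
```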

How are "break/step/stepi/next/..." implemented ?
-------------------------------------------------
When gdb puts a break on an instruction, a command is sent to the
gdbserver in valgrind. This causes the basic block containing this
instruction to be discarded and then re-instrumented so as to insert
calls to a dirty helper which calls the gdbserver code.  When a block is
instrumented for gdbserver, all the "jump targets" of this block are
invalidated, so as to allow step/stepi/next to work properly: these
blocks will themselves automatically be re-instrumented for gdbserver
if they are jumped to.
The valgrind gdbserver remembers which blocks have been instrumented
due to this "lazy 'jump targets' debugging instrumentation" so as to
discard these "debugging translations" when gdb instructs to continue
the execution normally.
The blocks in which an explicit break has been put by the user
are kept instrumented for gdbserver.
Note however that by default, gdb removes all breaks when the
process is stopped, and re-inserts all breaks when the process
is continued. This behaviour can be changed using the gdb
command 'set breakpoint always-inserted'.
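
The bookkeeping of "lazy" versus "explicit" instrumentation can be
sketched as follows. The record layout and the function name are
hypothetical and were invented for this note; the sketch only shows
which blocks lose their debugging translation on "continue".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one translated block; not the real structure. */
struct dbg_block {
    unsigned long addr;     /* guest address of the block */
    bool explicit_break;    /* the user put a break in this block */
    bool instrumented;      /* block currently carries gdbserver calls */
};

/* On "continue": drop the debugging translation of every block that was
   instrumented only as a jump target.  Returns how many were discarded. */
static size_t discard_lazy_blocks(struct dbg_block *blocks, size_t n)
{
    size_t discarded = 0;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].instrumented && !blocks[i].explicit_break) {
            blocks[i].instrumented = false;
            discarded++;
        }
    }
    return discarded;
}
```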

How are watchpoints implemented ?
---------------------------------
Watchpoints require support from the tool to detect that
a location is read and/or written. Currently, only memcheck
supports this: when a watchpoint is placed, memcheck changes
the addressability bits of the watched memory zone to make it
inaccessible. Before an access, memcheck then detects an error, but
sees that this error is due to a watchpoint and gives control back to gdb.
Stopping on the exact instruction for a write watchpoint requires
--vgdb=full. This is because the error is detected by memcheck
before the value is modified. gdb checks the value, sees that it has not
changed, and so "does not believe" the information that the write
watchpoint was triggered, and continues the execution. At the next
watchpoint occurrence, gdb sees that the value has changed. But the
watchpoints are then all reported "off by one". To avoid this, Valgrind
gdbserver must terminate the current instruction before reporting the
write watchpoint.
Terminating precisely the current instruction requires that all the
instructions of the block have been instrumented for gdbserver, even
if there is no break in this block. This is ensured by --vgdb=full.
See Bool VG_(is_watched) in m_gdbserver.c, where watchpoint handling
is implemented.
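
The "off by one" effect can be reproduced with a toy model. Everything
below is an assumption made for illustration (the structure, the
function name, and the simplification that gdb stops only when the
watched value differs from the one it remembered); it is not memcheck
or gdb code.

```c
#include <stdbool.h>

/* Hypothetical model: `mem` is the watched location, `gdb_old` is the
   value gdb remembered the last time it looked. */
struct watch_model {
    int mem;
    int gdb_old;
};

/* Perform one guest store of `v`.  Returns true if gdb would stop.
   With after_store == false the watchpoint is reported before the
   store is done; with true, it is reported after the store. */
static bool guest_store(struct watch_model *w, int v, bool after_store)
{
    bool stop;

    if (after_store)
        w->mem = v;
    stop = (w->mem != w->gdb_old);   /* gdb only believes a changed value */
    if (stop)
        w->gdb_old = w->mem;
    if (!after_store)
        w->mem = v;
    return stop;
}
```

Reporting before the store, a first store of 1 is not believed by gdb,
and a second store of 2 makes gdb stop while showing the value 1.
Reporting after the store, gdb stops on the first store and shows 1.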

How is the Valgrind gdbserver receiving commands/packets from gdb ?
-------------------------------------------------------------------
The embedded gdbserver reads gdb commands on a named pipe having
(by default) the name   /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST
where PID, USER, and HOST will be replaced by the actual pid, the user id,
and the host name, respectively.
The embedded gdbserver will reply to gdb commands on a named pipe
/tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST

gdb does not speak directly with gdbserver in valgrind: a relay application
called vgdb is needed between gdb and the valgrind-ified process.
gdb writes commands on the stdin of vgdb. vgdb reads these
commands and writes them on FIFO /tmp/vgdb-pipe-from-vgdb-to-PID-by-USER-on-HOST.
vgdb reads replies on FIFO /tmp/vgdb-pipe-to-vgdb-from-PID-by-USER-on-HOST
and writes them on its stdout.
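
The naming scheme of the two FIFOs can be written down as a small
helper. vgdb_fifo_names is a hypothetical function name used only in
this note, and the prefix is passed as a parameter because it can be
changed on the command line.

```c
#include <stdio.h>

/* Build the two FIFO names for a given prefix, pid, user and host.
   Returns 0 on success, -1 if the buffers (each of size sz) are too
   small.  Hypothetical helper, not a valgrind or vgdb function. */
static int vgdb_fifo_names(const char *prefix, int pid,
                           const char *user, const char *host,
                           char *from_vgdb, char *to_vgdb, size_t sz)
{
    int n1 = snprintf(from_vgdb, sz, "%s-from-vgdb-to-%d-by-%s-on-%s",
                      prefix, pid, user, host);
    int n2 = snprintf(to_vgdb, sz, "%s-to-vgdb-from-%d-by-%s-on-%s",
                      prefix, pid, user, host);

    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sz || (size_t)n2 >= sz)
        return -1;
    return 0;
}
```

With the default prefix /tmp/vgdb-pipe, pid 1234, user alice and host
myhost, the command FIFO is
/tmp/vgdb-pipe-from-vgdb-to-1234-by-alice-on-myhost.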

Note: named pipes were preferred to TCP/IP connections because they
allow discovering which valgrind-ified processes are ready to accept
commands by looking at files starting with the /tmp/vgdb-pipe- prefix
(changeable by a command line option).
Also, the usual unix protections protect
the valgrind process against other users sending commands.
The relay process also takes care of waking up the valgrind
process in case all threads are blocked in a system call.
The relay process can also be used in a shell to send commands
without a gdb (this provides a standard mechanism to control
valgrind tools from the command line, rather than a specialized mechanism
such as the one in callgrind).

How is gdbserver activated if all Valgrind threads are blocked in a syscall ?
-----------------------------------------------------------------------------
vgdb relays characters from gdb to valgrind. The scheduler will from
time to time check if gdbserver has to handle incoming characters
(the check is efficient, i.e. most of the time it consists in checking
a counter in (shared) memory).

However, it might be that all the threads in the valgrind process are
blocked in a system call. In such a case, no polling will be done by
the valgrind scheduler (as no activity takes place).  By default, vgdb
will check after 100ms if the characters it has written have been read
by valgrind. If not, vgdb will force the invocation of the gdbserver
code inside the valgrind process.

On Linux, this forced invocation is implemented using the ptrace system call:
using ptrace, vgdb will cause the valgrind process to call the
gdbserver code.

This wake up is *not* done using signals as this would require
implementing a syscall restart logic in valgrind for all system
calls. When using ptrace as above, the linux kernel is responsible for
restarting the system call.

This wake up is also *not* implemented by having a "system thread"
started by valgrind as this would transform all non-threaded programs
into threaded programs when running under valgrind. Also, such a 'system
thread' for gdbserver was tried by Greg Parker in the early MacOS
port, and was unreliable.

So, the ptrace based solution was chosen instead.

There used to be some bugs in the kernel when using ptrace on
a process blocked in a system call: the symptom is that the system
call fails with an unknown errno 512. This typically happens
with a 64-bit vgdb ptrace-ing a 32-bit process.
A bypass for old kernels has been integrated in vgdb.c (sign extend
register rax).
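
Why sign extension helps can be seen with a little arithmetic. On
Linux, 512 is the kernel-internal ERESTARTSYS code. A 32-bit process
holds -512 as 0xfffffe00; read as a 64-bit register without sign
extension, this is a large positive number rather than -512. The helper
below is a hypothetical illustration of the idea and is not the code
found in vgdb.c.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit register value.
   Hypothetical helper written for this note. */
static int64_t sign_extend_32(uint64_t reg)
{
    uint32_t low = (uint32_t)reg;

    if (low & 0x80000000u)
        return (int64_t)low - ((int64_t)1 << 32);
    return (int64_t)low;
}
```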

At least on Fedora Core 12 (kernel 2.6.32), syscall restart of read
and select works OK, and on Red Hat 5.3 (an old kernel), everything
works properly.

Need to investigate if darwin can similarly do syscall
restart with ptrace.

The vgdb argument --max-invoke-ms=xxx controls the number of
milliseconds after which vgdb will force the invocation of gdbserver
code.  If xxx is 0, the forced invocation is disabled.
Disabling this ptrace mechanism is also necessary in case you are
debugging the valgrind code at the same time as debugging the guest
process using gdbserver.
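
The decision vgdb takes can be sketched as below. The structure, its
field names and the function must_force_invoke are assumptions made for
this note; the real vgdb keeps its counters in shared memory with a
different layout.

```c
#include <stdbool.h>

/* Hypothetical view of the counters shared by vgdb and valgrind. */
struct vgdb_counters {
    unsigned written_by_vgdb;    /* characters vgdb has written */
    unsigned seen_by_valgrind;   /* characters valgrind has read */
};

/* Decide, `elapsed_ms` after the last write, whether vgdb should force
   the invocation of gdbserver.  max_invoke_ms == 0 disables forcing. */
static bool must_force_invoke(const struct vgdb_counters *c,
                              unsigned elapsed_ms, unsigned max_invoke_ms)
{
    if (max_invoke_ms == 0)
        return false;
    if (elapsed_ms < max_invoke_ms)
        return false;            /* give the scheduler time to poll */
    return c->seen_by_valgrind != c->written_by_vgdb;
}
```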

Do not kill -9 vgdb while it has interrupted the valgrind process,
otherwise the valgrind process will very probably stay stopped or die.

On Solaris, this forced invocation is implemented via an agent thread.
The process is first stopped (all the threads at once), and a special agent
thread is created which will force gdbserver invocation. After its
work is done, the agent thread is destroyed and the process is resumed.
Agent thread functionality is a Solaris OS feature, also used by debuggers.
Therefore the vgdb-invoker-solaris implementation is really small.

Implementation is based on the gdbserver code from gdb 6.6
----------------------------------------------------------
The gdbserver implementation is derived from the gdbserver included
in the gdb distribution.
The files originating from gdb are: inferiors.c, regcache.[ch],
regdef.h, remote-utils.c, server.[ch], signals.c, target.[ch], utils.c,
version.c.
The valgrind-low-* files are inspired by gdb files.

This code had to be changed to integrate properly within valgrind
(e.g. no libc usage).  Some of these changes have been made by
using the preprocessor to replace calls by their valgrind equivalent,
e.g. #define strcmp(...) VG_(strcmp) (...).
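
The preprocessor trick can be tried outside valgrind with the sketch
below. The definition of VG_() and the body of VG_(strcmp) are
stand-ins written for this note so that the example compiles on its
own; is_continue_packet is a made-up example of unmodified
gdbserver-style code.

```c
/* Stand-in for valgrind's symbol-prefixing macro (an assumption made
   so that the sketch is self-contained). */
#define VG_(name) vgPlain_##name

/* Stand-in for the libc-free implementation provided by the core. */
static int VG_(strcmp)(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Redirect every later call to strcmp to the valgrind equivalent. */
#undef strcmp
#define strcmp(...) VG_(strcmp)(__VA_ARGS__)

/* Code written against the libc name now calls VG_(strcmp). */
static int is_continue_packet(const char *pkt)
{
    return strcmp(pkt, "c") == 0;
}
```

The benefit of this approach is that the imported gdb sources need very
few textual changes, which keeps them comparable with the originals.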

Some "control flow" changes are due to the fact that gdbserver inside
valgrind must return control to valgrind when the 'debugged'
process has to run, while in a classical gdbserver usage, the
gdbserver process waits for a debugged process to stop on a break or
similar.  This required some variables to remember the
state of gdbserver before returning to valgrind (search for
resume_packet_needed in server.c) and a "goto" to the place where
gdbserver expects a stopped process to return control to gdbserver.

How does a tool need to be changed to be "debuggable" ?
-------------------------------------------------------
There is no need to modify a tool to have it "debuggable" via
gdbserver: e.g. reports of errors, breaks etc. will work "out of the
box".  If an interactive usage of tool client requests or similar is
desired for a tool, then simple code can be written for that via the
specific client request code VG_USERREQ__GDB_MONITOR_COMMAND. The tool
function "handle_client_request" must then parse the string received
as argument and call the expected valgrind or tool code.  See
e.g. massif ms_handle_client_request as an example.
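
The parsing step can be as simple as the sketch below. It uses libc
calls for brevity (real tool code must use the valgrind equivalents),
and the function name, the commands "ping" and "limit" and the reply
texts are all invented for this note.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tool-side parser for a monitor command string.
   Returns true if the command was recognised and handled. */
static bool handle_monitor_command(const char *req, char *reply, size_t sz)
{
    char cmd[32];
    unsigned long arg = 0;
    int n = sscanf(req, "%31s %lu", cmd, &arg);

    if (n < 1)
        return false;                    /* empty request */
    if (strcmp(cmd, "ping") == 0) {
        snprintf(reply, sz, "pong");
        return true;
    }
    if (strcmp(cmd, "limit") == 0 && n == 2) {
        snprintf(reply, sz, "limit set to %lu", arg);
        return true;
    }
    return false;                        /* not a command of this tool */
}
```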


Automatic regression tests:
---------------------------
Automatic Valgrind gdbserver tests are in the directory
$(top_srcdir)/gdbserver_tests.
Read $(top_srcdir)/gdbserver_tests/README_DEVELOPERS for more
info about testing.

How to integrate support for a new architecture xxx?
----------------------------------------------------
Let's imagine a new architecture hal9000 has to be supported.

Mandatory:
The main thing to do is to make a file valgrind-low-hal9000.c.
Start from an existing file (e.g. valgrind-low-x86.c).
The data structures 'struct reg regs'
and 'const char *expedite_regs' are built from files
in the gdb sources, e.g. for a new arch hal9000
   cd gdb/regformats
   sh ./regdat.sh reg-hal9000.dat hal9000

From the generated file hal9000, copy/paste into
valgrind-low-hal9000.c the two needed data structures and change their
names to 'regs' and 'expedite_regs'.

Then adapt the set of functions needed to initialize the structure
'static struct valgrind_target_ops low_target'.
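
As a rough picture of what the two data structures look like once
pasted and renamed, here is a sketch. The register names, offsets and
sizes are invented for the imaginary hal9000, the layout of struct reg
(name, offset in bits, size in bits) is assumed from gdbserver's
regdef.h and should be checked against the generated file, and
find_regno is a hypothetical helper added only to exercise the table.

```c
#include <stddef.h>
#include <string.h>

/* Assumed shape of gdbserver's register description. */
struct reg {
    const char *name;
    int offset;     /* in bits */
    int size;       /* in bits */
};

/* Invented register set for the imaginary hal9000 architecture. */
static struct reg regs[] = {
    { "r0",  0, 32 },
    { "r1", 32, 32 },
    { "sp", 64, 32 },
    { "pc", 96, 32 },
};

/* Registers sent eagerly in stop replies; NULL-terminated list. */
static const char *expedite_regs[] = { "sp", "pc", NULL };

#define NUM_REGS (sizeof regs / sizeof regs[0])

/* Look up a register index by name, or return -1. */
static int find_regno(const char *name)
{
    for (size_t i = 0; i < NUM_REGS; i++)
        if (strcmp(regs[i].name, name) == 0)
            return (int)i;
    return -1;
}
```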

Optional but heavily recommended:
To have a proper wake up of a Valgrind process with all threads
blocked in a system call, some architecture specific code
has to be written in vgdb-invoker-*.c.
Typically, for a linux system supporting ptrace, you have to modify
vgdb-invoker-ptrace.c.

For Linux based platforms, all the ptrace calls in vgdb-invoker-ptrace.c
should be ok.
The only thing needed is the code to "push a dummy call" on the stack,
i.e. assign the relevant registers in the struct user_regs_struct, and push
values on the stack according to the ABI.

For other platforms (e.g. MacOS), more work is needed as the ptrace calls
on MacOS are different and/or incomplete (and so, 'Mach' specific
things are needed e.g. to attach to threads etc).
A courageous Mac aficionado is welcome on this aspect.

For Solaris, only architecture specific functionality in vgdb-invoker-solaris.c
needs to be implemented, similar to Linux above.

Optional:
To let gdb see the Valgrind shadow registers, xml description
files have to be provided, and valgrind-low-hal9000.c has
to give the top xml file.
Start from the xml files found in the gdb distribution directory
gdb/features. You need to duplicate and modify these files to provide
the shadow1 and shadow2 register set descriptions.

Modify coregrind/Makefile.am:
    add valgrind-low-hal9000.c
    If you have target xml descriptions, also add them to GDBSERVER_XML_FILES


TODO and/or additional nice things to have
------------------------------------------
* many options can be changed on-line without problems.
  => would be nice to have a v.option command that would evaluate
  its arguments like the startup options of m_main.c and tool clo processing.

* have a memcheck monitor command
  show_dangling_pointers [last_n_recently_released_blocks]
  showing which of the n most recently released blocks are still
  referenced. These references are (potential) dangling pointers.

* some GDBTD in the code

(GDBTD = GDB To Do = something still to look at and/or a question)

* All architectures and platforms are done.
  But there are still some "GDBTD" to convert between gdb registers
  and VEX registers:
  e.g. some registers in x86 or amd64 that I could not
  translate to VEX registers. Someone with a good knowledge
  of these architectures might complete this
  (see the GDBTD in valgrind-low-*.c)

* Currently, at least on recent linux kernels, vgdb can properly wake
  up a valgrind process which is blocked in system calls. Maybe we
  need to find out up to which kernel version the ptrace + syscall
  restart is broken, and set the default value of --max-invoke-ms to 0
  in this case.

* more client requests can be programmed in various tools.  Currently,
  there are only a few standard valgrind or memcheck client requests
  implemented.
  v.suppression [generate|add|delete] might be an interesting command:
     generate would output a suppression, add/delete would add/delete
     a suppression in memory for the last (or selected?) error.
  v.break on fn calls/entry/exit + commands associated to it
    (such as search leaks)?


* currently jump(s) and inferior call(s) are somewhat dangerous
  when called from a block not yet instrumented: instead
  of continuing till the next Imark, where there will be a
  debugger call that can properly jump at an instruction boundary,
  the jump/call will quit in the "middle" of an instruction.
  We could detect if the current block is instrumented by a trick
  like this:
     /* Each time helperc_CallDebugger is called, we will store
        the address from which it is called and the nr of bbs_done
        when called. This allows to detect that gdbserver is called
        from a block which is instrumented. */
     static HWord CallDebugger_addr;
     static ULong CallDebugger_bbs_done;

     Bool VG_(gdbserver_current_IP_instrumented) (ThreadId tid)
     {
        if (VG_(get_IP) (tid) != CallDebugger_addr
            || CallDebugger_bbs_done != VG_(bbs_done)())
           return False;
        return True;
     }

  Alternatively, we ensure we can re-instrument the current
  block for gdbserver while executing it.
  Something like:
  keep current block till the end of the current instruction, then
  go back to scheduler.
  Unsure if and how this is do-able.


* ensure that all non static symbols of gdbserver files are #define
  xxxxx VG_(xxxxx) ???? Is this really needed ? I have tried to put in
  a test program variables and functions with the same name as valgrind
  stuff, and everything seems to be ok.
  I see that all exported symbols in valgrind have a unique prefix
  created with VG_ or MC_ or ...
  This is not done for the "gdb gdbserver code", where I have kept
  the original names. Is this a problem ? I could not create
  a "symbol" collision between the user symbol and the valgrind
  core gdbserver symbol.

* currently, gdbserver can only stop/continue the whole process. It
  might be interesting to have a fine-grained thread control (vCont
  packet) maybe for tools such as helgrind, drd.  This would allow the
  user to stop/resume specific threads.  Also, maybe this would solve
  the following problem: wait for a breakpoint to be encountered,
  switch thread, next. This sometimes causes an internal error in gdb,
  probably because gdb believes the current thread will be continued ?

* would be nice to have some more tests.

* better valgrind target support in gdb (see comments of Tom Tromey).


-------- description of how gdb invokes a function in the inferior
to call a function in the inferior (below is for x86):
gdb writes ESP and EBP to have some more stack space
pushes a return address equal to  0x8048390 <_start>
puts a break                at  0x8048390
puts the address of the function to call in EIP (e.g. hello_world at 0x8048444)
continue
break encountered at 0x8048391 (90 after decrement)
  => report stop to gdb
  => gdb restores esp/ebp/eip to what they were (e.g. 0x804848C)
  => gdb "s" => causes the EIP to go to the new EIP (i.e. 0x804848C)
     gdbserver tells "resuming from 0x804848c"
                     "stop pc is 0x8048491" => informed gdb of this
 417                      "stop pc is 0x8048491" => informed gdb of this
 418