README-DEVELOPERS

# vim: tw=65

General help and instructions on writing code for Rubinius.


0. Further Reading
==================
At some point, you should read everything in doc/. It is not
necessary to understand or memorise everything but it will
help with the big picture at least!


1. Files and Directories
========================
Get to know your way around the place!

* .load_order.txt
  Explains the dependencies between files so the VM can load them
  in the correct order.

* kernel/
  The Ruby half of the implementation. The classes, methods etc.
  that make up the Ruby language environment are defined here.
  Further divided into:

* kernel/platform.conf
  kernel/platform/
  Platform-dependent code wrappers that can then be used in other
  kernel code. platform.conf is an autogenerated file that defines
  various platform-dependent constants, offsets etc.

* kernel/bootstrap/
  Minimal set of incomplete core classes that is used to load up
  the rest of the system. Any code that requires Rubinius' special
  abilities needs to be here too.

* kernel/core/
  Complete implementation of the core classes. Builds on and/or
  overrides bootstrap/. Theoretically this code should be portable
  so all Rubinius-dependent stuff such as primitives goes in
  bootstrap/ also.

* runtime/
  Contains run-time compiled files for Rubinius. You'll use these
  files when running shotgun/rubinius.

* runtime/stable/*
  Known-good versions of the Ruby libraries that are used by the
  compiler to make sure you can recompile in case you break one
  of the core classes.

* shotgun/
  The C parts. This top-level directory contains most of the build
  process configuration as well as the very short main.c.

* shotgun/lib/
  All of the C code that implements the VM as well as the extremely
  bare-bones versions of some Ruby constructs.

* shotgun/external_libs/
  Libraries required by Rubinius, bundled for convenience.

* lib/
  All Ruby Stdlib libraries that are verified to work as well as
  any Rubinius-specific standard libraries. Of special interest
  here are three subdirectories:

* lib/bin/
  Some utility programs such as lib/bin/compile.rb which is used
  to compile files during the build process.

* lib/ext/
  C extensions that use Subtend.

* lib/compiler/
  This is the compiler (implemented completely in Ruby.)

* stdlib/
  This is the Ruby Stdlib, copied straight from the distribution.
  These libraries do not yet work on Rubinius (or have not been
  tried.) When a library is verified to work, it is copied to
  lib/ instead.

* bin/
  Various utility programs like bin/mspec and bin/ci.

* spec/ and test/
  These contain the behaviour specification and verification files.
  See section 3 for information about specs. The test/ directory is
  deprecated but some old test code and benchmarks live there.


Note: When working in kernel/ you may occasionally see classes that
      are not completely defined or look strange. Remember that
      some classes are set up in the VM and we are basically just
      reopening those classes.


2. Working with Kernel classes
==============================

Any time you make a change here -- or anywhere else for that
matter -- make sure you do a full rebuild to pick up the changes,
then run the related specs, and then run bin/ci to make sure
that the *unrelated* specs also still work (minimal-seeming
changes may have broad consequences.)

There are a few special forms that are used in bootstrap/ as well
as core/ such as @ivar_as_index@ (see 2.2) which maps instance
variable names to internal fields. These impose special restrictions
on their usage so it is best to follow the example of existing
code when dealing with these. Broadly speaking, if something looks
"unrubyish", there is probably a good reason for it so make sure
to ask before doing any "cosmetic" changes -- and to run CI after.

If you modify a kernel class, you need to run `rake build` afterwards
to have the changes picked up. With some exceptions, you should not
regenerate the stable files; in most cases they will work just fine
even without the newest code. If you do need to regenerate them,
`rake build:stable` is the command for that.

If you create a new file in one of the kernel subdirectories, it
will be necessary to regenerate the .load_order.txt file in the
equivalent runtime subdirectory in order to get your class loaded
when Rubinius starts up. Use the rake task build:load_order to
regenerate the .load_order.txt files.

Due to the dependencies inherent in writing the Core in Ruby, there
is one idiom used that may be confusing at first sight. Many methods
are called #some_method_cv and the _cv stands for 'core version,' not
one of the other things you thought it might be. The idea is that
a simple version of a given method is used until everything is
safely loaded, at which point it is replaced by the real version.
This happens in WhateverClass.after_loaded (and it is NOT automated.)
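
The idiom can be sketched in plain Ruby as follows. This is only an
illustration and not code from kernel/: the class Registry and its
methods are hypothetical names, and the explicit call near the end
stands in for whatever the loader does once the system is up.

```ruby
# Hypothetical class; Registry, lookup and lookup_cv are
# illustrative names, not part of Rubinius.
class Registry
  def initialize
    @entries = { :a => 1 }
  end

  # Simple, bootstrap-safe version used while the kernel loads.
  def lookup(key)
    @entries[key]
  end

  # 'Core version': the full implementation, which may rely on
  # classes that only exist once everything is loaded.
  def lookup_cv(key)
    @entries.fetch(key) { raise ArgumentError, "unknown key: #{key}" }
  end

  # Not called automatically; something has to invoke it.
  def self.after_loaded
    alias_method :lookup, :lookup_cv
  end
end

registry = Registry.new
registry.lookup(:missing)   # simple version: returns nil
Registry.after_loaded       # swap in the core version
registry.lookup(:a)         # core version now answers to #lookup
```

Nothing changes until @Registry.after_loaded@ is actually called,
which mirrors the "NOT automated" caveat above.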


2.1 Safe Math Compiler Plugin
-----------------------------

Since the core libraries are built of the same blocks as any other
Ruby code and since Ruby is a dynamic language with open classes and
late binding, it is possible to change fundamental classes like
Fixnum in ways that violate the semantics that other classes depend
on. For example, imagine we did the following:

    class Fixnum
      # Keep the original addition; calling + inside the new
      # definition would otherwise recurse forever.
      alias_method :original_plus, :+

      def +(other)
        original_plus(other) % 5
      end
    end

While it is certainly possible to redefine integer addition to be
modulo 5, doing so will cause some class like Array to be unable to
calculate the correct length when it needs to. The dynamic
nature of Ruby is one of its cherished features but it is also truly a
double-edged sword in some respects.

In Stdlib, the 'mathn' library redefines Fixnum#/ in an unsafe and
incompatible manner. The library aliases Fixnum#/ to Fixnum#quo,
which returns a Float by default.

Because of this there is a special compiler plugin that emits a different
method name when it encounters the #/ method. The compiler emits #divide
instead of #/. The numeric classes Fixnum, Bignum, Float, and Numeric all
define this method.

The `-frbx-safe-math` switch is used during the compilation of the Core
libraries to enable the plugin. During regular 'user code' compilation,
the plugin is not enabled. This enables us to support mathn without
breaking the core libraries or forcing inconvenient practices.
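
The effect of the plugin can be imitated in plain Ruby. The sketch
below does not involve the compiler at all: Count is a hypothetical
stand-in for a numeric class, and the alias plays the role of the
#divide method that core code ends up calling instead of #/.

```ruby
# Hypothetical stand-in for a numeric class; not Rubinius code.
class Count
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Original behaviour: integer division.
  def /(other)
    Count.new(@value / other)
  end

  # The name core code would call once #/ has been rewritten.
  # The alias keeps pointing at the original method body.
  alias_method :divide, :/
end

# Later, user code redefines #/ in terms of #quo, as mathn does.
class Count
  def /(other)
    Count.new(@value.quo(other))
  end
end

seven = Count.new(7)
seven.divide(2).value   # still integer division: 3
(seven / 2).value       # the redefined, non-integer result
```

Code that calls #divide is unaffected by the later redefinition,
while code that calls #/ sees the new behaviour.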


2.2 ivar_as_index
-----------------

As described above, you'll see calls to @ivar_as_index@ in kernel code.
This maps the class's numbered fields to ivar names, but ONLY for
that file.

You can NOT access those names using the @name syntax outside of that
file. (Doing so will cause maddeningly odd behavior and errors.)

For instance, if you make a subclass of IO, you can NOT access @descriptor
directly in your subclass. You must access it only through methods.
Notably, you can NOT just use the @#attr_*@ methods for this. The methods
must be completely written out so that the instance variable label can
be picked up to be translated.
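
A minimal sketch of the accessor pattern, in plain Ruby. Stream and
LoggedStream are hypothetical names standing in for a kernel class
such as IO and a subclass of it. The @ivar_as_index@ declaration
itself is left out because it only exists inside Rubinius.

```ruby
# Hypothetical stand-in for a kernel class such as IO.
class Stream
  # In real kernel code an ivar_as_index declaration would sit
  # here; it is omitted because plain Ruby does not provide it.

  def initialize(descriptor)
    @descriptor = descriptor
  end

  # Written out in full rather than generated with attr_reader,
  # so that the @descriptor label appears in this file.
  def descriptor
    @descriptor
  end
end

# A subclass, possibly defined in another file.
class LoggedStream < Stream
  def describe
    # Goes through the method, never through @descriptor.
    "descriptor #{descriptor}"
  end
end

LoggedStream.new(3).describe
```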


2.3 Kernel- and user-land
-------------------------

Rubinius is in many ways architected like an operating system, so
terms from the OS world are the easiest way to describe the two modes
that Rubinius operates under:

'Kernel-land' describes how code in kernel/ is executed. Everything else
is 'user-land.'

Kernel-land has a number of restrictions to keep things sane and simple:

* #public, #private, #protected, #module_function require method names
  as arguments. The 0-argument version that allows toggling visibility
  in a class or module body is not available.

* Restricted use of executable code in class, module and script (file)
  bodies. @SOME_CONSTANT = :foo@ is perfectly fine, of course, but
  'memoizations' or other calculations, for example, should not be
  present. Code inside methods has no restrictions, broadly speaking,
  but keep dependency issues in mind for methods that may get called
  during the instantiation of the rest of the kernel code.

* @#after_loaded@ hooks can be used to perform more complex/extended
  setup or calculations for kernel classes. The @_cv@ methods mentioned
  above, for example, replace the simpler bootstrap versions
  in the @#after_loaded@ hooks of the respective classes. @#after_loaded@
  is not magic, and will not be automatically called. If adding a new
  one, have kernel/loader.rb call it (at this point the system is
  fully up.)

* Kernel-land code does not handle defining methods through
  @Module#__add_method__@ or @MetaClass#attach_method@. It adds
  and attaches methods directly in the VM. This is necessary for
  bootstrapping.

* Any use of string-based eval in the kernel must go through discussion.
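
The first three restrictions can be illustrated with a small class
written in the kernel-land style. This is a sketch in plain Ruby.
Greeter and its methods are hypothetical names, and the explicit
@after_loaded@ call is assumed to come from the loader.

```ruby
# Hypothetical class written in the kernel-land style.
class Greeter
  DEFAULT_NAME = :world   # plain constant assignment is fine

  def greet(name = DEFAULT_NAME)
    format_greeting(name)
  end

  def format_greeting(name)
    "hello, #{name}"
  end
  # Visibility is set with an explicit method name; the bare,
  # 0-argument form of private is avoided.
  private :format_greeting

  # More involved setup waits for an explicit call.
  def self.after_loaded
    @loaded = true
  end

  def self.loaded?
    @loaded == true
  end
end

Greeter.new.greet   # works before any hook has run
```

Until something calls @Greeter.after_loaded@, @Greeter.loaded?@
stays false, since the hook is never run automatically.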


3. Specs (Specifications)
=========================

Probably the first or second thing you hear about Rubinius when
speaking to any of the developers is a mention of The Specs. They
are a crucial part of Rubinius.

Rubinius itself is being developed using the Behaviour-Driven
Development approach (a refinement of Test-Driven Development) where
each aspect of the behaviour of the code is first specified using
the spec format and only then implemented to pass those specs.

In addition to this, we have undertaken the ambitious task of
specifying the entirety of the Ruby language as well as its
Core and Stdlib libraries in this format which both allows us
to ensure our implementation is conformant with the Ruby standard
and, more importantly, to actually *define* that standard since
there currently is no formal specification of Ruby.

The de facto standard of BDD is set by "RSpec":http://rspec.info,
the project conceived to implement the then-new way of coding.
Their website is fairly useful as a tutorial as well, although
the spec syntax (particularly as used in Rubinius) is not very
complex at all.

Currently we actually use a compatible but vastly simpler
implementation specifically developed as a part of Rubinius
called MSpec (for mini-RSpec, as it was originally needed
because the code in RSpec was too complex to be run on our
not-yet-complete Ruby implementation.)

Specs live in the spec/ directory. spec/ruby/ specifies our
current target implementation, Ruby 1.8.6-p111, and is
further split into various subdirectories such as language/
for language-level constructs (for example, the @if@
statement) and core/ for Core library code such as
@Array@.

Parallel to this the top-level spec/ directory itself has the
subdirectories for Rubinius-specific specs: additions and/or
deviations from the standard, Rubinius language constructs
etc. For example, the standard @String@ specs live under the
spec/ruby/1.8/core/string/ directory and if Rubinius implements
an additional method @String#to_morse@, the specs for it can
be found in spec/core/string/. Completely new classes such as
@CompiledMethod@ find their specs here as well.

The way to run the specs is contained in two small programs:
bin/mspec and bin/ci. The former is the "full" version that
allows a wider range of options and the latter is a streamlined
way of running Continuous Integration (CI) testing. CI is a
set of "known-good" specs picked out from the entirety of
them (which is what bin/mspec works with) using an automatic
exclusion mechanism. CI is very important for any Rubinius
developer: before each commit, bin/ci should be run and found
to finish without error. It makes it very easy to ensure that
your change did not break other, seemingly unrelated things
because it exercises all areas of specs. A clean bin/ci run
gives confidence that your code is correct.

For a deeper overview, tutorials, help and other information
about Rubinius' specs, start here:

http://rubinius.lighthouseapp.com/projects/5089/specs-overview


4. Libraries and C: Primitives vs. FFI
======================================

There are two ways to "drop to C" in Rubinius. Firstly, primitives
are special instructions that are specifically defined in the VM.
In general they are operations that are impossible to do in the
Ruby layer such as opening a file. Primitives should be used to
access the functionality of the VM from inside Ruby.

FFI or Foreign Function Interface, on the other hand, is meant as
a generalised method of accessing system libraries. FFI is able to
automatically generate the bridge code needed to call out to some
library and get the result back into Ruby. FFI functions at runtime
as real machine code generation so that it is not necessary to have
anything compiled beforehand. FFI should be used to access the code
outside of Rubinius, whether it is system libraries or some type of
extension code, for example.

There is also a specific Rubinius extension layer called Subtend.
It emulates the extension interface of Ruby to allow old Ruby
extensions to work with Rubinius.


4.1 Primitives
--------------
Using the above rationale, if you need to implement a primitive:

* Give the primitive a sane name.
* Implement the primitive in shotgun/lib/primitives.rb using the
  name you chose as the method name.
* Enter the primitive name as a symbol at the BOTTOM of the Array
  in shotgun/lib/primitive_names.rb.
* Run `rake build`.

This makes your primitive available in the Ruby layer using the
special form @Ruby.primitive :primitive_name@. Primitives have a
few rules and chief among them is that a primitive must be the
first instruction in the method that it appears in. Partially for
this reason all primitives should reside in a wrapper method in
bootstrap/ (the other part is that core/ should be implementation
independent and primitives are not.)

In addition to this, primitives have another property that may
seem unintuitive: anything that appears below the primitive form
in the wrapper method is executed if the primitive FAILS and only
if it fails. There is no exception handling syntax involved. So
this is a typical pattern:

    # kernel/bootstrap/whatever.rb
    def self.prim_primitive_name()
      Ruby.primitive :primitive_name
      raise SomeError, "Whatever I was doing just failed."
    end

    # kernel/core/whatever.rb
    def self.primitive_name()
      self.prim_primitive_name
      ...
    end

To have a primitive fail, the primitive body (in primitives.rb)
should return FALSE; this will cause the code following the
Ruby.primitive line to be run. This provides a fallback so that
the operation can be retried in Ruby.
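
The retry-in-Ruby behaviour can be imitated outside Rubinius. The
following is a plain-Ruby simulation only: FakeVM and prim_add are
hypothetical names, and the explicit check on the return value
stands in for what the @Ruby.primitive@ form does implicitly.

```ruby
# Plain-Ruby simulation. FakeVM is a hypothetical stand-in for
# the VM side of a primitive.
module FakeVM
  # Fast path: handles Integer arguments only and returns false
  # to signal that the primitive failed.
  def self.add(a, b)
    return false unless a.is_a?(Integer) && b.is_a?(Integer)
    a + b
  end
end

def prim_add(a, b)
  result = FakeVM.add(a, b)
  return result unless result == false   # primitive succeeded

  # Reached only when the primitive failed: retry in Ruby.
  unless b.respond_to?(:to_int)
    raise TypeError, "cannot convert #{b.inspect}"
  end
  a + b.to_int
end

prim_add(1, 2)     # handled by the simulated primitive
prim_add(1, 2.0)   # primitive fails, fallback converts 2.0
```

In a real wrapper there is no explicit check: the code below the
@Ruby.primitive@ line simply runs when the primitive fails.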

If a primitive cannot be retried in Ruby or if there is some
additional information that needs to be passed along to create
the exception, it may raise an exception using a couple of macros:

* RAISE(exc_class, msg) will raise an exception of type exc_class
  and with a message of msg, e.g.

    RAISE("ArgumentError", "Invalid argument");

* RAISE_FROM_ERRNO(msg) will raise an Errno exception with the
  specified msg.

If you need to change the signature of a primitive, follow this
procedure:
  1. change the signature of the kernel method that calls the
     VM primitive
  2. change any calls to the kernel method in the kernel/**
     code to use the new signature, then recompile
  3. run rake build:stable
  4. change the actual primitive in the VM and recompile again
  5. run bin/ci


4.2 FFI
-------

Module#attach_function allows a C function to be called from Ruby
code using FFI.

Module#attach_function takes the C function name, the ruby module
function to bind it to, the C argument types, and the C return type.
For a list of C argument types, see kernel/platform/ffi.rb.

Currently, FFI does not support C functions with more than 6
arguments.

When the C function will be filling in a String, be sure the Ruby
String is large enough. For the C function rbx_Digest_MD5_Finish,
the digest string is allocated with a length of 16 characters. The
string is passed to md5_finish, which calls rbx_Digest_MD5_Finish,
which fills in the string with the digest.

    class Digest::MD5
      attach_function nil, 'rbx_Digest_MD5_Finish', :md5_finish,
                      [:pointer, :string], :void

      def finish
        digest = ' ' * 16
        self.class.md5_finish @context, digest
        digest
      end
    end

For a complete additional example, see digest/md5.rb.


5. Debugging: debugger, GDB, valgrind
=====================================

With Rubinius, there are two distinct things that may need
debugging (sometimes at the same time.) There is the Ruby
code, for which 'debugger' exists. debugger is a full-speed
debugger, which means that no extra compilation or flags are
needed to enable it and yet code normally does not suffer a
performance penalty from the infrastructure.
This is achieved using a combination of bytecode substitution
and Rubinius' Channel IO interface. Multithreaded debugging
is supported (credit for the debugger goes to Adam Gardiner.)

On the C side, the trusty workhorse is the GNU Debugger or
GDB. In addition there is support built in for Valgrind, a
memory checker/lint/debugger/analyzer hybrid.


5.1 debugger
------------
The nonchalantly named debugger is specifically the debugger
for Ruby code, although it does also allow examining the VM
as it runs. The easiest way to start it is to insert either
a @breakpoint@ or @debugger@ method call anywhere in your
source code. Upon running this method, the debugger starts
up and awaits your command at the instruction where the
@breakpoint@ or @debugger@ method used to be. For a full
explanation of the debugger, refer to [currently the source
but hopefully docs shortly.] You will see this prompt and
there is a trusty command you can try to get started:

    rbx:debug> help


5.2 GDB
-------
To really be able to use GDB, make sure that you build Rubinius
with DEV=1 set. This disables optimisations and adds debugging
symbols.

There are two ways to access GDB for Rubinius. You can simply
run shotgun/rubinius with gdb (use the builtin support so you
do not need to worry about linking etc.):

* Run `shotgun/rubinius --gdb`, place a breakpoint (break main,
  for example) and then r(un.)
* Alternatively, you can run and then hit ^C to interrupt.

You can also drop into GDB from Ruby code with @Kernel#yield_gdb@
which uses a rather rude but very effective method of stopping
execution to start up GDB. To continue past the @yield_gdb@,
j(ump) to one line after the line that you have stopped on.

Useful gdb commands and functions (remember, using the p(rint)
command in GDB you can access pretty much any C function in
Rubinius):

* rbt
  Prints the backtrace of the Ruby side of things. Use this in
  conjunction with gdb's own bt which shows the C backtrace.

* p _inspect(OBJECT)
  Useful information about a given Ruby object.


5.3 Valgrind
------------
Valgrind is a program for debugging, profiling and memory-checking
programs. The invocation is just `shotgun/rubinius --valgrind`.
See http://valgrind.org for usage information.


=== END ===