doc/internals-notes/compact-instance

   1 The following outlines an approach by which the core image size of
   2 a particularly memory-hungry application can be reduced by about 100MB,
   3 with a corresponding reduction in runtime object allocation of course.
   4 This represents 10% of the space consumed by all structure-objects
   5 within pseudo-static space of this core image.
   6 The idea is to squash one word out of all structures
   7 by storing their instance-layout in their header words.
   8
   9 There are (at least) two plausible ways to do this as detailed below.
  10
  11 Alternative A [as proposed on sbcl-devel]
  12 =============
  13 If all layouts are aligned on a 256-byte boundary, then the lowest byte
  14 of a descriptor known to be a layout pointer is arbitrary,
  15 in the same way that the low 3 or 4 bits of a descriptor can be recovered
  16 from the object pointed to.
  17
  18 Suppose than an instance's layout is at memory address #x1000CBAD00,
  19 so that its tagged pointer is #x1000CBAD03.
  20
  21 Existing in-memory representation:
  22   word 0 : #x000000NN59 ; header. #x59 is the widetag, NN is the length
  23   word 1 : #x1000CBAD03 ; pointer to LAYOUT
  24   word 2 : first user-specified slot
  25   word 3 : second user-specified slot
  26   ...
  27
  28 Alternative representation A:
  29   word 0 : #x1000CBAD59 ; header
  30   word 1 : first user-specified slot
  31   word 2 : second user-specified slot
  32   ...
  33
  34 Word 0 is a "strange" interior pointer that has the correct widetag
  35 for an INSTANCE, from which can be obtained a LAYOUT by subtracting #x56.
  36 This approach is slightly inefficient in that %INSTANCE-LENGTH can only
  37 be determined by following the LAYOUT and reading a slot from it.
  38 It is therefore inappropriate for the legacy raw-slot implementation,
  39 in which %INSTANCE-LENGTH plays an important role in slot access.
  40
  41 Alternative B [applicable only to 64-bit architecture]
  42 =============
  43 If all layouts are located in low memory (< 4GB), then the upper 4 bytes
  44 of an instance header can convey the tagged pointer, with room to spare.
  45
  46 Suppose now that the layout has tagged pointer #x20CBAD03.
  47 Alternative representation B:
  48   word 0 : #x20CBAD03zzzzNN59 ; header
  49   word 1 : first user-specified slot
  50   word 2 : second user-specified slot
  51   ...
  52
  53 "zzzzNN" indicate the three bytes that may be used for length.
  54 Two bytes are adequate for all practical purposes.
  55
  56 Alternative B requires no funny pointer arithmetic, only a small
  57 change when reading INSTANCE-LAYOUT to look in the high 4 bytes
  58 (half-Lisp-word) when accessing the zeroth word.
  59 And of course, it needs a segregated dynamic heap in low memory.
  60
  61 While it is possible to implement a range of low addresses managed
  62 by the existing generational GC, for various reasons it is attractive to
  63 implement this in a non-moving GC. For one thing, type checks would be
  64 able to reference a LAYOUT as a 32-bit immediate operand to CMP,
  65 provided that the containing function also keeps the layout object
  66 live by a reference from its code header, which is not a problem.
  67
  68 Compatibility
  69 =============
  70 In either approach, there is a quesion of what to report for %INSTANCE-LENGTH.
  71 Existing code assumes that (%INSTANCE-REF 0) is equivalent to %INSTANCE-LAYOUT.
  72 It is possible to preserve that appearance, so that %INSTANCE-REF 1 returns
  73 what is in "word 1" depicted above, and so on, but this consistency actually
  74 imparts an unacceptable degree of inefficiency to the %INSTANCE-REF vop,
  75 as well as making the meaning of %INSTANCE-LENGTH unusual for Lisp.
  76 It is inefficient because the vop for non-constant index would have to test
  77 for index 0 and access it differently. It is unconventional because it
  78 allows access to one more index than implied by the length.
  79 For example, suppose an %INSTANCE-LENGTH is 3 (4 physical words in the
  80 object). This means that you should only be able to supply indices
  81 of 0, 1, or 2; but in fact the last user-visible slot would be index 3
  82 in the above schemes because word 0 is stashed in a "hidden" place.
  83 The three slots of data are indexed in Lisp as 1, 2, 3 instead of 0, 1, 2.
  84 Hence, the non-adherence to a standard meaning of LENGTH.
  85
  86 If the length in the header instead indicated 4, this would be strange
  87 for GC, and would require change there. Or, it could indicate 3, and Lisp
  88 could add 1 to it upon reading the value.  This is all quite messy
  89 and adds to the already-confusing bit of housekeeping necessary
  90 with regard to this LENGTH field.
  91 [Technically, the interleaved-raw-slot backend feature would be happier
  92 if we never had to adhere to the round-to-odd convention either.
  93 See the ample commentary at the 1 line DD-INSTANCE-LENGTH function]
  94
  95 To best straighten out this conundrum, we shall:
  96 (A) deem that %INSTANCE-LAYOUT be the only abstraction for getting
  97     the layout, and
  98 (B) define a new constant SB-VM:INSTANCE-DATA-START which is the
  99     index of the first user-visible data slot.
 100 The latter constant is either 1 or 0 depending on the presence
 101 of the compact-header feature. Iteration over defined slots is performed
 102 by scanning from INSTANCE-DATA-START to %INSTANCE-LENGTH.
 103 And SB-VM:INSTANCE-SLOTS-OFFSET does not change. This is the constant
 104 that when added to an index as specified by Lisp to the INSTANCE-REF
 105 function, gets the proper physical index into the object.
 106 So with or without the feature, INSTANCE-REF 0 gets physical word
 107 index 1 relative to the object base.
 108
 109 Compactifying STANDARD-OBJECT
 110 =============================
 111 The above shrinks every structure by 1 word, which in reality means that
 112 structures which had one word of waste shrink by 2, and structures with
 113 no wasted cells gain one padding word.
 114 But a STANDARD-OBJECT has exactly 4 physical slots:
 115   <header, layout, slot-vector, hash-code>
 116 suggesting that it would acquire one word of slack, and not shrink.
 117
 118 We can do better than that:
 119 1) suppose the clos-hash were stored in the 0th cell of the slot-vector.
 120 There's nothing about the standard instance protocol that precludes this.
 121 And it is a win: the "primitive" CLOS object would be
 122   <compact-header, slot-vector>
 123 and the slot-vector would in turn either gain 2 physical slots,
 124 or gain no physical slots depending on whether it had a padding word.
 125 So once again we have each object netting either -2 cells or -0 cells
 126 of memory, considering the primitive object plus its data vector.
 127 This is exactly the same advantage as for STRUCTURE-OBJECT.
 128 Moreover, we gain a beautiful advantange: atomic update of
 129 the LAYOUT and SLOTS with CMPXCHG16B (for CHANGE-CLASS etc) which
 130 at present can not be done, as they do not satisfy that instruction's
 131 strict memory alignment requirement.
 132
 133 2) Supposing we don't store the clos-hash at all, but use a non-moving GC.
 134 Then the clos-hash of a standard-object is just a mixing of its address.
 135 Each CLOS primitive object will shrink by exactly 2 words.
 136
 137 Unifying PCL instance access
 138 ============================
 139 STANDARD-INSTANCE-ACCESS and FUNCALLABLE-STANDARD-INSTANCE-ACCESS
 140 can be made to emit identical assembly code, which simplifies
 141 or eliminates some tests throughout the logic for PCL in deciding
 142 what metaclass of instance is in hand (funcallable or not)
 143 when getting the slot vector and layout.
 144
 145 1) By placing the layout of a funcallable instance to its header word,
 146 the assembly code for reading a layout of an object that is either a
 147 standard-instance or funcallable-instance would mask off the lowtag
 148 (essentially converting the descriptor to a native pointer),
 149 and then read the layout from the high half of the header word.
 150
 151 2) By placing the slot vector of a funcallable instance so that
 152 it becomes the first slot after the trampoline slot,
 153 access to the slot vector of either kind of instance can
 154 be done with "mov result, [ptr+5]" from the tagged pointer.
 155 This relies on the fact that the difference between the lowtags
 156 is exactly 8 and that the instance-pointer-lowtag is 3.
 157 Therefore, in the case of standard instance, [ptr+5] reads the
 158 physical word which is one word after the header,
 159 and in the case of funcallable-standard-instance,
 160 [ptr+5] reads the physical word which is 2 words after the header,
 161 skipping over the trampoline word.
 162
 163 3) By placing a layout in every header of all 3 subtypes of FUNCTION,
 164 then LAYOUT-OF can consistently access the layout in the same way
 165 for any object that has either FUN-POINTER- or INSTANCE-POINTER-LOWTAG.