bugs/issue-2a9ae9d6db1fda0097a62dd69e52e33b33754e26.yaml

   1 --- !ditz.rubyforge.org,2008-03-06/issue
   2 title: Investigate median calculation that will allow the 0th frame to hold the current median and weight.
   3 desc: |-
   4   Median calculation implementation (in ALE) currently requires minor processing
   5   to calculate weights and values.  Eliminating this requirement would probably
   6   be desirable.
   7 type: :task
   8 component: libale
   9 release:
  10 reporter: David Hilvert <dhilvert@auricle.dyndns.org>
  11 status: :unstarted
  12 disposition:
  13 creation_time: 2009-02-06 21:56:37.111191 Z
  14 references: []
  15
  16 id: 2a9ae9d6db1fda0097a62dd69e52e33b33754e26
  17 log_events:
  18 - - 2009-02-06 21:56:37.743981 Z
  19   - David Hilvert <dhilvert@auricle.dyndns.org>
  20   - created
  21   - ""
  22 - - 2009-02-06 22:21:41.445266 Z
  23   - David Hilvert <dhilvert@auricle.dyndns.org>
  24   - commented
  25   - |-
  26     Values and weights could probably be stored in a tree packed into an array,
  27     with zero weights indicating termination, and with re-balancing occurring as
  28     necessary to maintain the median position at the root.  E.g.,
  29
  30     node value(summation of node weight with node-inferior weights):
  31
  32                                 10(45)
  33                             5(12)    13(4)
  34                         2(3)  0(0) 0(0)  0(0)
  35
  36     Packed as something like:
  37
  38         10(45) 5(12) 13(4) 2(3) 0(0) 0(0) 0(0)
  39
  40     Better, and easier, however, might be to just store a shifted list of values,
  41     with the median value at the head of the list, and with each weight the
  42     summation of the current weight with all following weights.  E.g.,
  43
  44     node value(summation of item weight with all following weights):
  45
  46         10(45) 13(16) 2(12) 5(9)
  47
  48     Note that the weights associated with each value can be fairly easily
  49     calculated from either of the above representations:
  50
  51         2(3)
  52         5(9)
  53         10(29)
  54         13(4)
  55
  56     The most obvious disadvantage of either of the above approaches relative to
  57     current code is increased complexity.
  58 - - 2009-02-06 22:30:39.306116 Z
  59   - David Hilvert <dhilvert@auricle.dyndns.org>
  60   - commented
  61   - |-
  62     More general approaches are possible, perhaps in combination with
  63     earlier-commented approaches.  In particular, an additional plane could be
  64     allocated having indices noting the current shift value (in the case of the
  65     second earlier-commented approach), or, perhaps better, lists could be stored
  66     linearly in memory (perhaps with the sort of shift [rotation] commented
  67     earlier) in a single, very large, plane, with median calculation performed with
  68     each addition, and the result transferred at every step to the result plane.
  69     (Result transfer obviously being simpler in the case that shift [rotation] is
  70     being performed.)
  71 - - 2009-02-06 23:43:57.302348 Z
  72   - David Hilvert <dhilvert@auricle.dyndns.org>
  73   - commented
  74   - |-
  75     As an addendum to the prior comment, note that there is no need to store the
  76     median value twice in either the shifted or unshifted case, as the median item
  77     can always be stored outside of the main part of the list; as a consequence,
  78     there is also no need to perform any sort of transfer from a calculation plane
  79     to a result plane, as the result plane can also be used for calculation, in
  80     conjunction with the additional, larger plane.  The alternatives of shifted and
  81     unshifted would appear roughly so:
  82
  83     Unshifted:
  84
  85         Result plane    Calculation plane
  86         ------------    -----------------
  87         10(45)          2(16) 5(13) 13(4)
  88
  89     Shifted:
  90
  91         Result plane    Calculation plane
  92         ------------    -----------------
  93         10(45)          13(16) 2(12) 5(9)
  94
  95     with the former approach apparently the easier.  Since result values are always
  96     maintained, there is no particular need to store the calculation plane in a
  97     manner allowing for easy extraction of results (merely for easy calculation
  98     overall).  Hence, the following representations, not summing over calculation
  99     plane items, may also be acceptable.
 100
 101     Unshifted:
 102
 103         Result plane    Calculation plane
 104         ------------    -----------------
 105         10(45)          2(3) 5(9) 13(4)
 106
 107     Shifted:
 108
 109         Result plane    Calculation plane
 110         ------------    -----------------
 111         10(45)          13(4) 2(3) 5(9)
 112
 113     This has the nice property of not requiring continual re-calculation of weight
 114     sums, leading to possible imprecision.  Another possibility would be the
 115     following, which would require (as suggested in previous comments on this
 116     issue) a transfer to the result plane following calculation, but which would
 117     always maintain all item weights independently of any sum:
 118
 119     Unshifted:
 120
 121         Result plane    Calculation plane
 122         ------------    -----------------
 123         10(45)          2(3) 5(9) 10(29) 13(4)
 124
 125     Shifted:
 126
 127         Result plane    Calculation plane
 128         ------------    -----------------
 129         10(45)          10(29) 13(4) 2(3) 5(9)
 130
 131     Perhaps the most straightforward, and simplest, approach would be the
 132     following, which combines transfer with summation:
 133
 134         Result plane    Calculation plane
 135         ------------    -----------------
 136         10(45)          2(3) 5(12) 10(41) 13(45*)
 137
 138     A nice side effect of this is that the median can be located in a
 139     straightforward manner, and also that the final weight value can be dropped,
 140     since it is present in the result plane.  Furthermore, if weights are stored
 141     separately from values, then the duplication of the 10 can also be dropped,
 142     like so:
 143
 144         Result plane    Calculation plane
 145         ------------    -----------------
 146         10(45)          2 5 13 (3 12 41)
 147
 148     Note in this case that the weights and values in the calculation plane do not
 149     correspond, but rather the weights and values of the combination of the result
 150     and calculation planes correspond (if not sequentially).
 151
 152     Zero weights can be used to indicate values (if any) having no contribution.
 153     E.g.,
 154
 155         Result plane    Calculation plane
 156         ------------    -----------------
 157         10(45)          0 2 5 13 (0 3 12 41)
 158
 159     (In this case, the first weight -- for the first value '0' -- is zero.  The
 160     value obviously has no impact on the median result.)
 161
 162     Note that the grouping of weights and values above is primarily for clarity.
 163     Interleaving these might be most efficient; e.g.,
 164
 165         Result plane    Calculation plane
 166         ------------    -----------------
 167         10 45           0 0 2 3 5 12 13 41
 168
 169     (But note that weights and values in the calculation plane still do not
 170     correspond, despite the fact that each value is paired with a following
 171     weight.)
 172 - - 2009-02-06 23:55:03.060874 Z
 173   - David Hilvert <dhilvert@auricle.dyndns.org>
 174   - commented
 175   - |-
 176     As a minor addendum, note that 'calculation plane' could be split into several
 177     planes (e.g., to facilitate more dynamic augmentation of calculation space), or
 178     could merely be stored as a single plane.  This point should be fairly obvious,
 179     as should the mappings between single-plane and multi-plane representations.