1 --- !ditz.rubyforge.org,2008-03-06/issue
2 title: Investigate median calculation that will allow the 0th frame to hold the current median and weight.
4 Median calculation implementation (in ALE) currently requires minor processing
5 to calculate weights and values. Eliminating this requirement would probably
10 reporter: David Hilvert <dhilvert@auricle.dyndns.org>
13 creation_time: 2009-02-06 21:56:37.111191 Z
16 id: 2a9ae9d6db1fda0097a62dd69e52e33b33754e26
18 - - 2009-02-06 21:56:37.743981 Z
19 - David Hilvert <dhilvert@auricle.dyndns.org>
22 - - 2009-02-06 22:21:41.445266 Z
23 - David Hilvert <dhilvert@auricle.dyndns.org>
26 Values and weights could probably be stored in a tree packed into an array,
27 with zero weights indicating termination, and with re-balancing occurring as
28 necessary to maintain the median position at the root. E.g.,
30 node value(summation of node weight with node-inferior weights):
36 Packed as something like:
38 10(45) 5(12) 13(4) 2(3) 0(0) 0(0) 0(0)
40 Better, and easier, however, might be to just store a shifted list of values,
41 with the median value at the head of the list, and with each weight the
42 summation of the current weight with all following weights. E.g.,
44 node value(summation of item weight with all following weights):
46 10(45) 13(16) 2(12) 5(9)
48 Note that the weights associated with each value can be fairly easily
49 calculated from either of the above representations:
56 The most obvious disadvantage of either of the above approaches relative to
57 current code is increased complexity.
58 - - 2009-02-06 22:30:39.306116 Z
59 - David Hilvert <dhilvert@auricle.dyndns.org>
62 More general approaches are possible, perhaps in combination with
63 earlier-commented approaches. In particular, an additional plane could be
64 allocated having indices noting the current shift value (in the case of the
65 second earlier-commented approach), or, perhaps better, lists could be stored
66 linearly in memory (perhaps with the sort of shift [rotation] commented
67 earlier) in a single, very large, plane, with median calculation performed with
68 each addition, and the result transferred at every step to the result plane.
69 (Result transfer obviously being simpler in the case that shift [rotation] is
71 - - 2009-02-06 23:43:57.302348 Z
72 - David Hilvert <dhilvert@auricle.dyndns.org>
75 As an addendum to the prior comment, note that there is no need to store the
76 median value twice in either the shifted or unshifted case, as the median item
77 can always be stored outside of the main part of the list; as a consequence,
78 there is also no need to perform any sort of transfer from a calculation plane
79 to a result plane, as the result plane can also be used for calculation, in
80 conjunction with the additional, larger plane. The alternatives of shifted and
81 unshifted would appear roughly so:
85 Result plane Calculation plane
86 ------------ -----------------
87 10(45) 2(16) 5(13) 13(4)
91 Result plane Calculation plane
92 ------------ -----------------
93 10(45) 13(16) 2(12) 5(9)
95 with the former approach apparently the easier. Since result values are always
96 maintained, there is no particular need to store the calculation plane in a
97 manner allowing for easy extraction of results (merely for easy calculation
98 overall). Hence, the following representations, not summing over calculation
99 plane items, may also be acceptable.
103 Result plane Calculation plane
104 ------------ -----------------
105 10(45) 2(3) 5(9) 13(4)
109 Result plane Calculation plane
110 ------------ -----------------
111 10(45) 13(4) 2(3) 5(9)
113 This has the nice property of not requiring continual re-calculation of weight
114 sums, leading to possible imprecision. Another possibility would be the
115 following, which would require (as suggested in previous comments on this
116 issue) a transfer to the result plane following calculation, but which would
117 always maintain all item weights independently of any sum:
121 Result plane Calculation plane
122 ------------ -----------------
123 10(45) 2(3) 5(9) 10(29) 13(4)
127 Result plane Calculation plane
128 ------------ -----------------
129 10(45) 10(29) 13(4) 2(3) 5(9)
131 Perhaps the most straightforward, and simplest, approach would be the
132 following, which combines transfer with summation:
134 Result plane Calculation plane
135 ------------ -----------------
136 10(45) 2(3) 5(12) 10(41) 13(45*)
138 A nice side effect of this is that the median can be located in a
139 straightforward manner, and also that the final weight value can be dropped,
140 since it is present in the result plane. Furthermore, if weights are stored
141 separately from values, then the duplication of the 10 can also be dropped,
144 Result plane Calculation plane
145 ------------ -----------------
146 10(45) 2 5 13 (3 12 41)
148 Note in this case that the weights and values in the calculation plane do not
149 correspond, but rather the weights and values of the combination of the result
150 and calculation planes correspond (if not sequentially).
152 Zero weights can be used to indicate values (if any) having no contribution.
155 Result plane Calculation plane
156 ------------ -----------------
157 10(45) 0 2 5 13 (0 3 12 41)
159 (In this case, the first weight -- for the first value '0' -- is zero. The
160 value obviously has no impact on the median result.)
162 Note that the grouping of weights and values above is primarily for clarity.
163 Interleaving these might be most efficient; e.g.,
165 Result plane Calculation plane
166 ------------ -----------------
167 10 45 0 0 2 3 5 12 13 41
169 (But note that weights and values in the calculation plane still do not
170 correspond, despite the fact that each value is paired with a following
172 - - 2009-02-06 23:55:03.060874 Z
173 - David Hilvert <dhilvert@auricle.dyndns.org>
176 As a minor addendum, note that 'calculation plane' could be split into several
177 planes (e.g., to facilitate more dynamic augmentation of calculation space), or
178 could merely be stored as a single plane. This point should be fairly obvious,
179 as should the mappings between single-plane and multi-plane representations.