Time-stamp: <2011-01-24 20:10:44 tony>
Creation: <2008-09-08 08:06:30 tony>
Author: AJ Rossini <blindglobe@gmail.com>
Copyright: (c) 2007-2010, AJ Rossini <blindglobe@gmail.com>. BSD.
Purpose: Stuff that needs to be made working sits inside the

This file contains the current challenges to solve,
including a description of the setup and the work to
solve. Solutions welcome.
  What is this talk of 'release'? Klingons do not make software
  'releases'. Our software 'escapes', leaving a bloody trail of
  designers and quality assurance people in its wake.
** (Internal) Package and (External) System Hierarchy
*** Singletons (primary building blocks)

  These are packages as well as

  | asdf       | common system loader                           |
  | xarray     | common access structure to array-like          |
  |            | (matrix, vector) structures.                   |
  | cls-config | initialization of Lisp state, variables, etc., |
  |            | localization to the particular Lisp.           |
  | lift       | unit-testing                                   |
  | cffi       | foreign function library                       |
*** Dependency structure

  | lisp-matrix     | general purpose matrix package, linking to lapack |      |
  |                 | for numerics. Depends on:                         |      |
  |                 | cl-blapack                                        | cffi |
  | cls-dataframe   | in the same spirit as lisp-matrix, a means to     |      |
  |                 | create tables. Perhaps better called datatables?  |      |
  | cls-probability | depends on gsll, cl-variates, cl-? initially,     |      |
  Usually, we need to load CLS before going on.

  (asdf:oos 'asdf:load-op :cls)
  - State "DONE" from "CURR" [2010-10-12 Tue 13:48] \\
    setup is mostly complete
  - State "CURR" from "TODO" [2010-10-12 Tue 13:47]
  - State "TODO" from "" [2010-10-12 Tue 13:47]

  This is an example of a custom setup, not really interesting at
  this point except to remind Tony how to program.
  (defun init-CLS (&key (compile nil))
    "Load every system in the list; force a recompile first when COMPILE is true."
    (let ((packages-to-load
            (list ;; core system
                  :lift :lisp-matrix :cls
                  ;; :cl-cairo2-x11 :iterate
                  :cl-pdf :cl-typesetting
                  :asdf-system-connections :xarray
                  :metatilities-base :anaphora :tinaa
                  :cl-ppcre :cl-markdown :docudown
                  ;; version and validate CLOS objects
                  ;; :versioned-objects :validations
                  ;; :cl-glu :cl-glut :cl-glut-examples
                  )))
      (dolist (x packages-to-load)
        ;; compile only on request, then always load
        (when compile
          (asdf:oos 'asdf:compile-op x :force T))
        (asdf:oos 'asdf:load-op x))))

  (init-CLS) ;; vs (init-CLS :compile T)

  | | #<PACKAGE "COMMON-LISP-USER"> |
** TODO [#A] Integrate with Quicklisp support.
  - State "TODO" from "" [2010-11-30 Tue 18:00]

  It is important to merge with Quicklisp's system loader support.
** CURR [#A] Testing: unit, regression, examples. [0/3]
  - State "CURR" from "TODO" [2010-10-12 Tue 13:51]
  - State "TODO" from "" [2010-10-12 Tue 13:51]

  Testing consists of unit tests, which internally verify subsets of
  code, regression tests, and functional tests (in increasing order
*** CURR [#B] Unit tests
  - State "CURR" from "TODO" [2010-11-04 Thu 18:33]
  - State "CURR" from "TODO" [2010-10-12 Tue 13:48]
  - State "TODO" from "" [2010-10-12 Tue 13:48]

  Unit tests have been started using LIFT. We need to consider some
  of the other systems that provide testing as people add them to the
  mix of libraries that we need, along with examples of how to use
  them.
  (in-package :lisp-stat-unittests)
  (run-tests :suite 'lisp-stat-ut)
  ;; => tests = 78, failures = 7, errors = 20
  (asdf:oos 'asdf:test-op 'cls)
  ;; which runs (describe (run-tests :suite 'lisp-stat-ut))

  and check documentation to see if it is useful.

  (in-package :lisp-stat-unittests)

  (describe 'lisp-stat-ut)
  (documentation 'lisp-stat-ut 'type)

  ;; FIXME: Example: currently not relevant, yet
  ;; (describe (lift::run-test :test-case 'lisp-stat-unittests::create-proto
  ;;                           :suite 'lisp-stat-unittests::lisp-stat-ut-proto))

  (describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
  (lift::run-tests :suite 'lisp-stat-ut-dataframe)

  (describe (lift::run-test
             :test-case 'lisp-stat-unittests::create-proto
             :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
*** TODO [#B] Regression Tests
  - State "TODO" from "" [2010-10-12 Tue 13:54]

*** TODO [#B] Functional Tests
  - State "TODO" from "" [2010-10-12 Tue 13:54]

** CURR [#B] Functional Examples that need to work [1/2]
  - State "CURR" from "TODO" [2010-11-30 Tue 17:57]
  - State "TODO" from "" [2010-10-12 Tue 13:55]

  These examples should be working forms within CLS that demonstrate
  functionality needed for day-to-day work.
*** TODO [#B] Scoping with datasets
  - State "TODO" from "" [2010-11-04 Thu 18:46]

  The following needs to work. Resampling and similar synthetic-data
  approaches (bootstrapping, imputation) ought to use a closely
  related syntax.
#+srcname: DataSetNameScoping

  (in-package :ls-user)

  ;; Syntax examples using lexical scope, closures, and bindings to
  ;; ensure a clean communication of results
  ;; This is actually a bit tricky, since we need to clarify whether
  ;; it is line-at-a-time that we are considering or if there is
  ;; another mapping strategy. In particular, one could imagine a
  ;; looping-over-observations function, or a
  ;; looping-over-independent-observations function which leverages a
  ;; grouping variable which provides guidance for what is considered
  ;; independent from the sampling frame being considered. The frame
  ;; itself (definable via some form of metadata to clarify scope?)
  ;; could clearly provide a bit of relativity for clarifying what
  ;; statistical independence means.

  (with-data dataset ((dsvarname1 [usevarname1])
                      (dsvarname2 [usevarname2]))

  (looping-over-observations
    dataset ((dsvarname1 [usevarname1])
             (dsvarname2 [usevarname2]))

  (looping-over-independent-observations
    dataset independence-defining-variable
    ((dsvarname1 [usevarname1])
     (dsvarname2 [usevarname2]))
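
  As one possible reading of the =with-data= syntax above, here is a
  minimal sketch. It assumes a toy dataset stored as an alist from
  column name to vector, which is not the CLS dataframe
  representation; the macro body and =*toy*= are hypothetical and
  only illustrate how the optional second name in each binding could
  rename a column inside the body.

#+begin_src lisp
;; Hypothetical sketch: DATASET is an alist of (column-name . vector),
;; standing in for a real CLS dataframe.
(defmacro with-data (dataset bindings &body body)
  "Evaluate BODY with each (COLUMN [LOCAL-NAME]) in BINDINGS bound to
that column of DATASET. LOCAL-NAME defaults to COLUMN."
  (let ((ds (gensym "DATASET")))
    `(let* ((,ds ,dataset)
            ,@(mapcar (lambda (binding)
                        (destructuring-bind (col &optional (var col)) binding
                          `(,var (cdr (assoc ',col ,ds)))))
                      bindings))
       ,@body)))

(defparameter *toy*
  (list (cons 'height (vector 160 180))
        (cons 'weight (vector 60 80))))

;; HEIGHT is renamed to H; WEIGHT keeps its own name.
(with-data *toy* ((height h) (weight))
  (list (length h) (aref weight 1)))
;; => (2 80)
#+end_src

  Because the bindings are ordinary lexical variables, closures
  created in the body capture the columns cleanly. A
  =looping-over-observations= form could be built the same way by
  binding one row at a time instead of whole columns.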
*** DONE [#B] Dataframe variable typing
  - State "DONE" from "CURR" [2010-11-30 Tue 17:56] \\
    check-type approach works, we would just have to throw a catchable
    error if we want to use it in a reliable fashion.
  - State "CURR" from "TODO" [2010-11-30 Tue 17:56]
  - State "TODO" from "" [2010-11-04 Thu 18:48]

  Seems to generally work, need to ensure that we use this for
#+srcname: DFvarTyping

  (in-package :ls-user)
  (defparameter *df-test*
    (make-instance 'dataframe-array
                   :storage #2A (('a "test0" 0 0d0)
                   :case-labels (list "0" "1" 2 "3" "4")
                   :var-labels (list "symbol" "string" "integer" "double-float")
                   :var-types (list 'symbol 'string 'integer 'double-float)))

  ;; with SBCL, ints become floats? Need to adjust output
  ;; representation appropriately..
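  (defun check-var (df colnum)
    "Signal a TYPE-ERROR unless every entry of column COLNUM has that column's declared type."
    (let ((nobs (xdim (dataset df) 0))
          (type (elt (var-types df) colnum))) ; types are stored per column
      (dotimes (i nobs)
        ;; CHECK-TYPE does not evaluate its type argument, so a type
        ;; computed at run time has to be tested with TYPEP instead.
        (unless (typep (xref df i colnum) type)
          (error 'type-error :datum (xref df i colnum)
                             :expected-type type)))))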
  (xdim (dataset *df-test*) 1)
  (xdim (dataset *df-test*) 0)

  (check-var *df-test* 0)

  (xref *df-test* 1 1))

  (check-type (xref *df-test* 1 1)
              string) ;; => nil, so good.
  (check-type (xref *df-test* 1 1)
              vector) ;; => nil, so good.
  (check-type (xref *df-test* 1 1)
              real) ;; => simple-error type thrown, so good.

  ;; How to nest errors within errors?
  (check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.

  (check-type *df-test*
              dataframe-array) ; nil is good.

  (integerp (xref *df-test* 1 2))
  (floatp (xref *df-test* 1 2))
  (integerp (xref *df-test* 1 3))
  (type-of (xref *df-test* 1 3))
  (floatp (xref *df-test* 1 3))

  (type-of (vector 1 1d0))

  (xref *df-test* 1 '*)
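
  The note above about throwing a catchable error can be sketched
  with plain Common Lisp, independent of the dataframe classes. The
  names =check-column= and =column-ok-p= are hypothetical and a column
  is assumed to be any sequence; the point is only that a
  =type-error= signalled by the check can be trapped with
  =handler-case= and turned into an ordinary return value.

#+begin_src lisp
;; Hypothetical helpers: a column is any sequence of values.
(defun check-column (values type)
  "Return T when every element of VALUES is of TYPE; otherwise signal
a TYPE-ERROR naming the first offending element."
  (map nil (lambda (v)
             (unless (typep v type)
               (error 'type-error :datum v :expected-type type)))
       values)
  t)

(defun column-ok-p (values type)
  "Non-signalling wrapper around CHECK-COLUMN. Returns T on success,
or NIL plus the offending value as a second value."
  (handler-case (check-column values type)
    (type-error (c)
      (values nil (type-error-datum c)))))

(column-ok-p (vector 1 2 3) 'integer)     ; => T
(column-ok-p (vector 1 "two" 3) 'integer) ; => NIL, "two"
#+end_src

  Using =typep= rather than =check-type= matters here because the
  type comes from data (the =:var-types= list), and =check-type=
  expects a literal type specifier.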
** CURR [#A] Random Numbers [2/6]
  - State "CURR" from "TODO" [2010-11-05 Fri 15:41]
  - State "TODO" from "" [2010-10-14 Thu 00:12]

  We need to select a probability system (probability functions,
  random numbers). The goal is to have a general framework
  for representing probability functions, functionals on
  probabilities, and reproducible random streams based on such
*** CURR [#B] CL-VARIATES system evaluation [2/3]
  - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
  - State "TODO" from "" [2010-10-12 Tue 14:16]

  CL-VARIATES is a system developed by Gary W. King. It uses streams
  with seeds, and is hence reproducible. (Random comment: why do CL
  programmers as a class ignore computational reproducibility?)
**** DONE [#B] load and verify
  - State "DONE" from "CURR" [2010-11-04 Thu 18:59] \\
    load, init, and verify performance.
  - State "CURR" from "TODO" [2010-11-04 Thu 18:58]
  - State "TODO" from "" [2010-11-04 Thu 18:58]

  <2010-11-30 Tue> : just modified cls.asd to ensure that we load
  the correct random variate package, as appropriate.
#+srcname: Loading-CL-VARIATES

  (in-package :cl-user)
  (asdf:oos 'asdf:load-op 'cl-variates)
  (asdf:oos 'asdf:load-op 'cl-variates-test)

#+srcname: CL-VARIATES-UNITTESTS

  (in-package :cl-variates-test)

  (run-tests :suite 'cl-variates-test)
  (describe (run-tests :suite 'cl-variates-test))
**** DONE [#B] Examples of use
  - State "DONE" from "CURR" [2010-11-05 Fri 15:39] \\
    basic example of reproducible draws from the uniform and normal random
  - State "CURR" from "TODO" [2010-11-05 Fri 15:39]
  - State "TODO" from "" [2010-11-04 Thu 19:01]
#+srcname: CL-VARIATES-REPRO

  (in-package :cl-variates-user)

  (defparameter state (make-random-number-generator))
  (setf (random-seed state) 44)

  (loop for i from 1 to 10 collect
        (random-range state 0 10))
  ;; => (1 5 1 0 7 1 2 2 8 10)
  (setf (random-seed state) 44)
  (loop for i from 1 to 10 collect
        (random-range state 0 10))
  ;; => (1 5 1 0 7 1 2 2 8 10)

  (setf (random-seed state) 44)

  (loop for i from 1 to 10 collect
        (normal-random state 0 1))

  ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
  ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
  ;; 0.20750134211656893 -0.14501914108452274)

  (setf (random-seed state) 44)
  (loop for i from 1 to 10 collect
        (normal-random state 0 1))

  ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
  ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
  ;; 0.20750134211656893 -0.14501914108452274)
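
  To make explicit why reseeding reproduces a stream, here is a
  self-contained sketch of a seeded generator. It is a toy linear
  congruential generator, not the algorithm used by cl-variates, and
  every name in it (=toy-rng=, =toy-next=, =toy-random-range=,
  =toy-draws=) is hypothetical. The generator's whole state is its
  seed slot, so setting that slot back to the same value must replay
  the same draws.

#+begin_src lisp
;; Toy generator for illustration only; not suitable for real work.
(defstruct (toy-rng (:constructor make-toy-rng (&optional (seed 1))))
  (seed 1 :type (unsigned-byte 32)))

(defun toy-next (rng)
  "Advance RNG one step and return the new 32-bit state."
  (setf (toy-rng-seed rng)
        (mod (+ (* 1664525 (toy-rng-seed rng)) 1013904223)
             (expt 2 32))))

(defun toy-random-range (rng low high)
  "Return an integer between LOW and HIGH inclusive."
  (+ low (floor (* (toy-next rng) (1+ (- high low)))
                (expt 2 32))))

(defun toy-draws (seed n)
  "Collect N draws in [0, 10] from a generator seeded with SEED."
  (let ((rng (make-toy-rng seed)))
    (loop repeat n collect (toy-random-range rng 0 10))))

;; Same seed, same stream:
(equal (toy-draws 44 10) (toy-draws 44 10)) ; => T
#+end_src

  Resetting the slot with =(setf (toy-rng-seed rng) 44)= plays the
  same role as =(setf (random-seed state) 44)= in the cl-variates
  example.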
**** CURR [#B] Full example of general usage
  - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
  - State "TODO" from "" [2010-11-05 Fri 15:40]

  What we want to do here is describe the basic API that is
  available. While the previous work describes the basic
  reproducibility approach in terms of generating lists of
  reproducible pRNG streams, we still need the full range of
  probability laws that are present.

  One of the good things about cl-variates is that it provides for
  reproducibility. One of the bad things is that it has a mixed
*** TODO [#B] CL-RANDOM system evaluation
  - State "TODO" from "" [2010-11-05 Fri 15:40]

  1. no seed setting for random numbers
  2. contamination of a probability support with optimization and

  2. nice design for generics.

*** TODO [#B] Native CLS (from XLS)
  - State "TODO" from "" [2010-11-05 Fri 15:40]
** TODO [#B] Numerical Linear Algebra
  - State "TODO" from "" [2010-10-14 Thu 00:12]

*** TODO [#B] LLA evaluation
  - State "TODO" from "" [2010-10-12 Tue 14:13]

  ;;; experiments with LLA
  (in-package :cl-user)
  (asdf:oos 'asdf:load-op 'lla)
  (in-package :lla-user)

*** CURR [#B] Lisp-Matrix system evaluation
  - State "CURR" from "TODO" [2010-10-12 Tue 14:13]
  - State "TODO" from "" [2010-10-12 Tue 14:13]

*** TODO [#B] LispLab system evaluation
  - State "TODO" from "" [2010-10-12 Tue 14:13]

** TODO [#B] Statistical Procedures to implement
  - State "TODO" from "" [2010-10-14 Thu 00:12]
  (in-package :cls-user)

  ;; population design eval and opt

  number of samples/cost of lab analysis and collection

  (defun pfim (&key model ( constraints ( summary-function )
    (list num-subjects num-times list-times))))

  Each individual has a design \psi_i
  number of samples n_i and sampling times t_{i1} ... t_{i n_i}
  individuals can differ

  individual-level model

  (=model y_i (+ (f \theta_i \psi_i) \epsilon_i ))
  (=var \epsilon_i \sigma_between \sigma_within )

  ;; Information Matrix for pop design

  (defparameter IM (sum (i 1 N) (MF \psi_i \phi_i)))

  For nonlinear structural models, expand around RE=0

  Cramer-Rao : MF^{-1} is lower bound for estimation variance.

  - smallest SE, but is a matrix, so
  - criteria for matrix comparison
  -- D-opt, (power (determinant MF) (/ 1 P))

  find design maximizing D-opt, (power (determinant MF) (/ 1 P))

  -- continuous vars for sampling times within interval or set
  -- number of groups for categorical vars

  Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.

  Example binary covariate C

  (if (= i reference-class)

  (=model (log \theta) ( ))

  PFIM provides for a given design and values of \beta:

  SE/RSE for \beta of each class of each covar
  eval influence of design on SE(\beta)

  inter-occasion variability (IOV)
  - patients sampled more than once, H occasions

  - additional vars to estimate

  ;;; comparison criteria

  functional of conc/time curve which is used for comparison, i.e.
  (AUC conc/time-curve)
  (Cmax conc/time-curve)
  (Tmax conc/time-curve)

  ;; T is a constant in Common Lisp and cannot be a parameter name,
  ;; so the argument is called TIME here.
  (defun conc/time-curve (time)

    (let ((conc (exp (* time \beta1))))

  (url-get "www.pfim.biostat.fr")

  ;;; Thinking of generics...
  (information-matrix model parameters)
  (information-matrix variance-matrix)
  (information-matrix model data)
  (information-matrix list-of-individual-IMs)
  (defun IM (loglikelihood parameters times)
    "Does double work. Sum up the resulting IMs to form a full IM."
    (let ((IM (make-matrix (length parameters)
                           :initial-value 0.0d0)))
      (dolist (parameterI parameters)
        (dolist (parameterJ parameters)
          (differentiate (differentiate loglikelihood parameterI) parameterJ))))))
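
  The notes above (information matrix from second derivatives of the
  log-likelihood, Cramer-Rao bound, D-optimality) can be sketched in
  plain Common Lisp. This is an illustration under assumptions, not
  the planned CLS or PFIM implementation: it replaces the symbolic
  =differentiate= with central finite differences, and the names
  =numeric-hessian=, =observed-information=, =determinant=,
  =d-criterion= and =toy-loglik= are all hypothetical.

#+begin_src lisp
(defun numeric-hessian (f theta &key (h 1d-3))
  "Central-difference Hessian of F, a function of a list of numbers, at THETA."
  (let* ((p (length theta))
         (hess (make-array (list p p) :initial-element 0d0)))
    (flet ((f-at (i di j dj)
             ;; evaluate F with coordinates I and J shifted by DI and DJ
             (let ((x (copy-list theta)))
               (incf (nth i x) di)
               (incf (nth j x) dj)
               (funcall f x))))
      (dotimes (i p hess)
        (dotimes (j p)
          (setf (aref hess i j)
                (/ (- (+ (f-at i h j h) (f-at i (- h) j (- h)))
                      (+ (f-at i h j (- h)) (f-at i (- h) j h)))
                   (* 4 h h))))))))

(defun observed-information (loglik theta)
  "Negative Hessian of the log-likelihood LOGLIK at THETA."
  (let* ((hess (numeric-hessian loglik theta))
         (p (array-dimension hess 0))
         (info (make-array (list p p))))
    (dotimes (i p info)
      (dotimes (j p)
        (setf (aref info i j) (- (aref hess i j)))))))

(defun determinant (m)
  "Determinant of square matrix M by Gaussian elimination with partial pivoting."
  (let* ((n (array-dimension m 0))
         (a (make-array (list n n)))
         (det 1d0))
    (dotimes (i n)
      (dotimes (j n)
        (setf (aref a i j) (aref m i j))))
    (dotimes (k n det)
      (let ((piv k))
        ;; pick the row with the largest entry in column K
        (loop for r from (1+ k) below n
              when (> (abs (aref a r k)) (abs (aref a piv k)))
                do (setf piv r))
        (when (zerop (aref a piv k))
          (return 0d0))               ; singular matrix
        (unless (= piv k)
          (dotimes (j n) (rotatef (aref a k j) (aref a piv j)))
          (setf det (- det)))         ; a row swap flips the sign
        (setf det (* det (aref a k k)))
        (loop for r from (1+ k) below n
              do (let ((factor (/ (aref a r k) (aref a k k))))
                   (loop for j from k below n
                         do (decf (aref a r j)
                                  (* factor (aref a k j))))))))))

(defun d-criterion (info)
  "D-optimality criterion: (determinant INFO) raised to the power 1/P."
  (expt (determinant info) (/ 1d0 (array-dimension info 0))))

;; Toy quadratic log-likelihood whose exact information matrix is
;; ((2 1) (1 6)), with determinant 11.
(defun toy-loglik (theta)
  (destructuring-bind (a b) theta
    (- (+ (* a a) (* 3 b b) (* a b)))))

(d-criterion (observed-information #'toy-loglik (list 0.5d0 -0.2d0)))
;; close to (sqrt 11d0), up to floating-point rounding
#+end_src

  A design search would then compare =d-criterion= values across
  candidate designs and keep the largest. By the Cramer-Rao bound,
  a larger information matrix means a smaller lower bound on the
  estimation variance.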
*** difference between empirical, fisherian, and ...? information.
*** Example of Integration with CL-GENOMIC
  - State "TODO" from "" [2010-10-12 Tue 14:03]

  CL-GENOMIC is a very interesting data-structure strategy for
  manipulating sequence data.

  (in-package :cl-user)
  (asdf:oos 'asdf:compile-op :ironclad)
  (asdf:oos 'asdf:load-op :cl-genomic)

  (in-package :bio-sequence)
  (make-dna "agccg") ;; fine
  (make-aa "agccg") ;; fine
  (make-aa "agc9zz") ;; error expected
** TODO [#B] Documentation and Examples [0/3]
  - State "TODO" from "" [2010-10-14 Thu 00:12]

*** TODO [#B] Docudown
  - State "TODO" from "" [2010-11-05 Fri 15:34]

  - State "TODO" from "" [2010-11-05 Fri 15:34]

*** TODO [#B] CLPDF, and literate data analysis
  - State "TODO" from "" [2010-11-05 Fri 15:34]