2 Time-stamp: <2012-07-01 11:14:12 tony>
3 Creation: <2008-09-08 08:06:30 tony>
8 Author: AJ Rossini <blindglobe@gmail.com>
9 Copyright: (c) 2007-2010, AJ Rossini <blindglobe@gmail.com>. BSD.
10 Purpose: Stuff that needs to be made working sits inside the
13 This file contains the current challenges to solve,
14 including a description of the setup and the work to
15 solve. Solutions welcome.
17 What is this talk of 'release'? Klingons do not make software
18 'releases'. Our software 'escapes', leaving a bloody trail of
19 designers and quality assurance people in its wake.
23 ** (Internal) Package and (External) System Hierarchy
24 *** Singletons (primary building blocks)
26 These are packages as well as
28 | asdf | common system loader |
29 | xarray | common access structure to array-like |
30 | | (matrix, vector) structures. |
31 | cls-config | initialization of Lisp state, variables, etc, |
32 | | localization to the particular lisp. |
33 | lift | unit-testing |
34 | cffi | foreign function library |
40 As of now, all are in the QL (Quicklisp) hierarchy.
42 *** Dependency structure
44 | lisp-matrix | general purpose matrix package, linking to lapack | |
45 | | for numerics. Depends on: | |
48 | | cl-blapack | cffi |
50 | cls-dataframe | in the same spirit as lisp-matrix, a means to | |
51 | | create tables. Perhaps better called datatables? | |
52 | cls-probability | depends on gsll, cl-variates, cl-? initially, | |
67 Usually, we need to load it before going on.
74 though sometimes we might want to recompile fully. (Can this be
75 done via QL? Need to check)
77 #+name: compile-it-all
79 (asdf:oos 'asdf:compile-op :cls :force T)
84 - State "DONE" from "CURR" [2010-10-12 Tue 13:48] \\
85 setup is mostly complete
86 - State "CURR" from "TODO" [2010-10-12 Tue 13:47]
87 - State "TODO" from "" [2010-10-12 Tue 13:47]
88 This is an example of a custom setup, not really interesting at
89 this point except to remind Tony how to program.
94 (defun init-CLS (&key (compile 'nil))
95 (let ((packagesToLoad (list ;; core system
96 :lift :lisp-matrix :cls
98 ;; :cl-cairo2-x11 :iterate
101 :cl-pdf :cl-typesetting
103 :asdf-system-connections :xarray
105 :metatilities-base :anaphora :tinaa
106 :cl-ppcre :cl-markdown :docudown
107 ;; version and validate CLOS objects
108 ;; :versioned-objects :validations
111 ;; :cl-glu :cl-glut :cl-glut-examples
116 (mapcar #'(lambda (x)
118 (asdf:oos 'asdf:compile-op x :force T)
119 (asdf:oos 'asdf:load-op x)))
122 (init-CLS)) ;; vs (init-CLS :compile T)
126 | | #<PACKAGE "COMMON-LISP-USER"> |
128 ** TODO [#A] Integrate with Quicklisp support.
129 - State "TODO" from "" [2010-11-30 Tue 18:00]
131 It is important to merge with the Quicklisp system loader support.
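
A minimal sketch of what such a merge could look like, assuming ASDF is available and Quicklisp may or may not be installed. The function name =load-cls= is hypothetical and not part of CLS. It looks up =ql:quickload= at run time so that the form still reads on a Lisp image without Quicklisp.

#+begin_src lisp
(require :asdf)

(defun load-cls (&key (system :cls) force)
  "Load SYSTEM through Quicklisp when it is present, otherwise through ASDF.
With FORCE true, also recompile SYSTEM itself via ASDF."
  (let ((quickload (and (find-package :ql)
                        (find-symbol "QUICKLOAD" :ql))))
    ;; Quicklisp fetches and loads any missing dependencies.
    (when quickload
      (funcall quickload system))
    ;; Plain ASDF is used as the fallback, and for forced recompiles.
    (when (or force (not quickload))
      (asdf:load-system system :force force))))
#+end_src

Calling =(load-cls)= would then replace the explicit =asdf:oos= forms, and =(load-cls :force t)= would request a recompile of the named system.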
132 ** CURR [#A] Testing: unit, regression, examples. [0/3]
133 - State "CURR" from "TODO" [2010-10-12 Tue 13:51]
134 - State "TODO" from "" [2010-10-12 Tue 13:51]
135 Testing consists of unit tests, which internally verify subsets of
136 code, regression tests, and functional tests (in increasing order
138 *** CURR [#B] Unit tests
139 - State "CURR" from "TODO" [2010-11-04 Thu 18:33]
140 - State "CURR" from "TODO" [2010-10-12 Tue 13:48]
141 - State "TODO" from "" [2010-10-12 Tue 13:48]
142 Unit tests have been started using LIFT. As people add other
143 testing systems to the mix of libraries that we need, we should
144 consider those as well, along with examples of how to use them.
146 #+srcname: ex-cls-unittest
148 (in-package :lisp-stat-unittests)
149 (run-tests :suite 'lisp-stat-ut)
153 : #<Results for LISP-STAT-UT 78 Tests, 7 Failures, 20 Errors>
155 ;; => tests = 78, failures = 7, errors = 20
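
For readers new to LIFT, a small sketch of defining and running a suite, assuming LIFT is already loaded. The package =cls-lift-sketch=, the suite =mean-sketch-ut=, and the function =mean= are hypothetical and exist only for this illustration; they are not part of the CLS test suites.

#+begin_src lisp
(defpackage :cls-lift-sketch
  (:use :common-lisp :lift))
(in-package :cls-lift-sketch)

;; Hypothetical function under test.
(defun mean (xs)
  (/ (reduce #'+ xs) (length xs)))

;; A suite with no parent suites and no slots.
(deftestsuite mean-sketch-ut () ())

;; One named test case attached to the suite.
(addtest (mean-sketch-ut)
  mean-of-integers
  (ensure-same (mean '(1 2 3)) 2 :test #'=))

;; Returns a results object similar to the one shown above.
(run-tests :suite 'mean-sketch-ut)
#+end_src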
159 The following needs to be solved in order to have a decent
160 installation qualification (IQ) and performance qualification (PQ)
162 #+srcname: cls-unittest
164 (in-package :lisp-stat-unittests)
165 (asdf:oos 'asdf:test-op 'cls)
166 ;; which runs (describe (run-tests :suite 'lisp-stat-ut))
171 and check documentation to see if it is useful.
174 (in-package :lisp-stat-unittests)
176 (describe 'lisp-stat-ut)
177 (documentation 'lisp-stat-ut 'type)
179 ;; FIXME: Example: not yet relevant
180 ;; (describe (lift::run-test :test-case 'lisp-stat-unittests::create-proto
181 ;; :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
183 (describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
184 (lift::run-tests :suite 'lisp-stat-ut-dataframe)
186 (describe (lift::run-test
187 :test-case 'lisp-stat-unittests::create-proto
188 :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
191 *** TODO [#B] Regression Tests
192 - State "TODO" from "" [2010-10-12 Tue 13:54]
194 *** TODO [#B] Functional Tests
195 - State "TODO" from "" [2010-10-12 Tue 13:54]
197 ** CURR [#B] Functional Examples that need to work [1/2]
198 - State "CURR" from "TODO" [2010-11-30 Tue 17:57]
199 - State "TODO" from "" [2010-10-12 Tue 13:55]
201 These examples should be functional forms within CLS, describing
202 working functionality which is needed for work.
204 *** TODO [#B] Scoping with datasets
205 - State "TODO" from "" [2010-11-04 Thu 18:46]
207 The following needs to work. Resampling and similar synthetic-data
208 approaches (bootstrapping, imputation) ought to use a similar
209 syntax.
210 #+srcname: DataSetNameScoping
212 (in-package :ls-user)
214 ;; Syntax examples using lexical scope, closures, and bindings to
215 ;; ensure a clean communication of results
216 ;; This is actually a bit tricky, since we need to clarify whether
217 ;; it is line-at-a-time that we are considering or if there is
218 ;; another mapping strategy. In particular, one could imagine a
219 ;; looping-over-observations function, or a
220 ;; looping-over-independent-observations function which leverages a
221 ;; grouping variable which provides guidance for what is considered
222 ;; independent from the sampling frame being considered. The frame
223 ;; itself (definable via some form of metadata to clarify scope?)
224 ;; could clearly provide a bit of relativity for clarifying what
225 ;; statistical independence means.
227 (with-data dataset ((dsvarname1 [usevarname1])
228 (dsvarname2 [usevarname2]))
231 ;; SAS-centric approach to spec'ing work
232 (looping-over-observations
233 dataset ((dsvarname1 [usevarname1])
234 (dsvarname2 [usevarname2]))
237 ;; SAS plus "statistical sensibility"... for example, if an
238 ;; independent observation actually consists of many observations so
239 ;; that a dataframe of independence results -- for example,
240 ;; longitudinal data or spatial data or local-truncated network data
241 ;; are clean examples of such happening -- then we get the data
242 ;; frame or row representing the independent result.
243 (looping-over-independent-observations
244 dataset independence-defining-variable
245 ((dsvarname1 [usevarname1])
246 (dsvarname2 [usevarname2]))
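
One way the observation-at-a-time form above could be prototyped is sketched here. It assumes, purely for illustration, that a dataset is a list of property lists, one per observation; the real CLS dataframe classes are not used, and the =[usevarname]= brackets are read as "optional local name".

#+begin_src lisp
(defmacro looping-over-observations (dataset bindings &body body)
  "Evaluate BODY once per observation in DATASET (a list of plists).
Each binding is (DS-NAME [USE-NAME]): the value stored under the keyword
DS-NAME is bound to USE-NAME, or to DS-NAME when USE-NAME is omitted."
  (let ((row (gensym "ROW")))
    `(loop for ,row in ,dataset
           collect (let ,(loop for (ds-name use-name) in bindings
                               collect `(,(or use-name ds-name)
                                         (getf ,row ,(intern (symbol-name ds-name)
                                                             :keyword))))
                     ,@body))))

;; Hypothetical toy data: two observations, two variables.
(defparameter *toy-data*
  '((:dose 10 :response 1.5d0)
    (:dose 20 :response 2.5d0)))

(looping-over-observations *toy-data* ((dose d) (response))
  (* d response))
;; => (15.0d0 50.0d0)
#+end_src

The variables are lexically bound per observation, so results leave the loop only through the value of the body. A grouped variant would iterate over sub-datasets instead of rows.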
251 *** DONE [#B] Dataframe variable typing
252 - State "DONE" from "CURR" [2010-11-30 Tue 17:56] \\
253 check-type approach works, we would just have to throw a catchable
254 error if we want to use it in a reliable fashion.
255 - State "CURR" from "TODO" [2010-11-30 Tue 17:56]
256 - State "TODO" from "" [2010-11-04 Thu 18:48]
258 Seems to generally work, need to ensure that we use this for
261 #+srcname: DFvarTyping
263 (in-package :ls-user)
264 (defparameter *df-test*
265 (make-instance 'dataframe-array
266 :storage #2A (('a "test0" 0 0d0)
272 :case-labels (list "0" "1" 2 "3" "4")
273 :var-labels (list "symbol" "string" "integer" "double-float")
274 :var-types (list 'symbol 'string 'integer 'double-float)))
276 ;; with SBCL, ints become floats? Need to adjust output
277 ;; representation appropriately..
280 (defun check-var (df colnum)
281 (let ((nobs (xdim (dataset df) 0)))
283 (assert (typep (xref df i colnum) (elt (var-types df) colnum)))))) ; typep evaluates its type argument; check-type does not
285 (xdim (dataset *df-test*) 1)
286 (xdim (dataset *df-test*) 0)
288 (check-var *df-test* 0)
291 (xref *df-test* 1 1))
293 (check-type (xref *df-test* 1 1)
294 string) ;; => nil, so good.
295 (check-type (xref *df-test* 1 1)
296 vector) ;; => nil, so good.
297 (check-type (xref *df-test* 1 1)
298 real) ;; => type-error signaled, so good.
300 ;; How to nest errors within errors?
301 (check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.
305 (check-type *df-test*
306 dataframe-array) ; nil is good.
308 (integerp (xref *df-test* 1 2))
309 (floatp (xref *df-test* 1 2))
310 (integerp (xref *df-test* 1 3))
311 (type-of (xref *df-test* 1 3))
312 (floatp (xref *df-test* 1 3))
314 (type-of (vector 1 1d0))
320 (xref *df-test* 1 '*)
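
The "catchable error" idea from the state note can be sketched without the dataframe classes. This assumes a plain 2D array as storage and a list of type specifiers, one per column; =check-column= is a hypothetical name, not a CLS function.

#+begin_src lisp
(defun check-column (storage var-types colnum)
  "Return T if every entry in column COLNUM of the 2D array STORAGE has
the type declared for that column in VAR-TYPES; otherwise signal a
TYPE-ERROR that callers can handle."
  (let ((expected (elt var-types colnum)))
    (dotimes (i (array-dimension storage 0) t)
      (let ((value (aref storage i colnum)))
        (unless (typep value expected)
          (error 'type-error :datum value :expected-type expected))))))

;; Hypothetical data: the second column should hold strings, but row 1 does not.
(defparameter *storage*
  (make-array '(2 2) :initial-contents '((a "test0") (b 1))))
(defparameter *var-types* '(symbol string))

(check-column *storage* *var-types* 0)
;; => T

(handler-case (check-column *storage* *var-types* 1)
  (type-error (c) (list :bad-value (type-error-datum c))))
;; => (:BAD-VALUE 1)
#+end_src

Unlike =check-type=, =typep= evaluates its type argument, so the expected type can come from the dataframe metadata at run time.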
323 ** CURR [#A] Random Numbers [2/6]
324 - State "CURR" from "TODO" [2010-11-05 Fri 15:41]
325 - State "TODO" from "" [2010-10-14 Thu 00:12]
327 Need to select a probability system (probability
328 functions, random numbers). Goal is to have a general framework
329 for representing probability functions, functionals on
330 probabilities, and reproducible random streams based on such
332 *** CURR [#B] CL-VARIATES system evaluation [2/3]
333 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
334 - State "TODO" from "" [2010-10-12 Tue 14:16]
336 CL-VARIATES is a system developed by Gary W King. It uses streams
337 with seeds, and is hence reproducible. (Random comment: why do CL
338 programmers as a class ignore computational reproducibility?)
339 **** DONE [#B] load and verify
340 - State "DONE" from "CURR" [2010-11-04 Thu 18:59] \\
341 load, init, and verify performance.
342 - State "CURR" from "TODO" [2010-11-04 Thu 18:58]
343 - State "TODO" from "" [2010-11-04 Thu 18:58]
345 <2010-11-30 Tue> : just modified cls.asd to ensure that we load
346 the appropriate random variate package.
348 #+srcname: Loading-CL-VARIATES
350 (in-package :cl-user)
351 (asdf:oos 'asdf:load-op 'cl-variates)
352 (asdf:oos 'asdf:load-op 'cl-variates-test)
356 : #<ASDF:LOAD-OP NIL {C2C30E1}>
360 #+srcname: CL-VARIATES-UNITTESTS
363 (in-package :cl-variates-test)
365 (run-tests :suite 'cl-variates-test)
366 (describe (run-tests :suite 'cl-variates-test))
370 **** DONE [#B] Examples of use
371 - State "DONE" from "CURR" [2010-11-05 Fri 15:39] \\
372 basic example of reproducible draws from the uniform and normal random
374 - State "CURR" from "TODO" [2010-11-05 Fri 15:39]
375 - State "TODO" from "" [2010-11-04 Thu 19:01]
377 #+srcname: CL-VARIATES-REPRO
380 (in-package :cl-variates-user)
382 (defparameter state (make-random-number-generator))
383 (setf (random-seed state) 44)
386 (loop for i from 1 to 10 collect
387 (random-range state 0 10))
388 ;; => (1 5 1 0 7 1 2 2 8 10)
389 (setf (random-seed state) 44)
390 (loop for i from 1 to 10 collect
391 (random-range state 0 10))
392 ;; => (1 5 1 0 7 1 2 2 8 10)
394 (setf (random-seed state) 44)
396 (loop for i from 1 to 10 collect
397 (normal-random state 0 1))
399 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
400 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
401 ;; 0.20750134211656893 -0.14501914108452274)
403 (setf (random-seed state) 44)
404 (loop for i from 1 to 10 collect
405 (normal-random state 0 1))
407 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
408 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
409 ;; 0.20750134211656893 -0.14501914108452274)
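
For comparison, the same reseed-and-redraw pattern can be sketched with only the standard Common Lisp random-state facility, not CL-VARIATES. The standard offers no portable integer seed, so this sketch copies a saved state instead; =*base-state*= and =reproducible-draws= are hypothetical names.

#+begin_src lisp
;; A freshly seeded state, saved once and never advanced directly.
(defparameter *base-state* (make-random-state t))

(defun reproducible-draws (n)
  "Return N integers in [0, 10], always restarting from a copy of *BASE-STATE*."
  (let ((state (make-random-state *base-state*))) ; copy, do not mutate the base
    (loop repeat n collect (random 11 state))))

(equal (reproducible-draws 10) (reproducible-draws 10))
;; => T
#+end_src

The draws differ between Lisp sessions because the base state is seeded afresh each time, which is one reason a seedable generator such as the one above is attractive.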
413 **** CURR [#B] Full example of general usage
414 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
415 - State "TODO" from "" [2010-11-05 Fri 15:40]
417 Here we want to describe the available API. The previous section
418 showed the basic approach to generating reproducible pRNG streams;
419 what remains is to cover the full range of probability laws that
420 cl-variates provides.
423 One of the good things about cl-variates is that it provides for
424 reproducibility. One of the bad things is that it has a mixed
427 *** TODO [#B] CL-RANDOM system evaluation
428 - State "TODO" from "" [2010-11-05 Fri 15:40]
431 1. no seed setting for random numbers
432 2. contamination of a probability support with optimization and
437 2. nice design for generics.
439 *** TODO [#B] Native CLS (from XLS)
440 - State "TODO" from "" [2010-11-05 Fri 15:40]
442 ** TODO [#B] Numerical Linear Algebra
443 - State "TODO" from "" [2010-10-14 Thu 00:12]
445 *** TODO [#B] LLA evaluation
446 - State "TODO" from "" [2010-10-12 Tue 14:13]
447 ;;; experiments with LLA
448 (in-package :cl-user)
449 (asdf:oos 'asdf:load-op 'lla)
450 (in-package :lla-user)
452 *** CURR [#B] Lisp-Matrix system evaluation
453 - State "CURR" from "TODO" [2010-10-12 Tue 14:13]
454 - State "TODO" from "" [2010-10-12 Tue 14:13]
456 *** TODO [#B] LispLab system evaluation
457 - State "TODO" from "" [2010-10-12 Tue 14:13]
459 ** TODO [#B] Statistical Procedures to implement
460 - State "TODO" from "" [2010-10-14 Thu 00:12]
463 (in-package :cls-user)
468 ;; population design eval and opt
476 number of samples/cost of lab analysis and collection
480 (defun pfim (&key model ( constraints ( summary-function )
482 (list num-subjects num-times list-times))))
486 Each individual has a design psi_i
487 number of samples n_i and sampling times t_{i,1}, ..., t_{i,n_i}
488 individuals can differ
492 individual-level model
495 (=model y_i (+ (f \theta_i \psi_i) \epsilon_i ))
496 (=var \epsilon_i \sigma_between \sigma_within )
498 ;; Information Matrix for population design
500 (defparameter IM (sum (i 1 N) (MF \psi_i \phi_i)))
503 For nonlinear structural models, expand around RE=0
505 Cramer-Rao : MF^{-1} is lower bound for estimation variance.
509 - smallest SE, but is a matrix, so
510 - criteria for matrix comparison
511 -- D-opt, (power (determinant MF) (/ 1 P))
514 find design maxing D opt, (power (determinant MF) (/ 1 P))
516 -- continuous vars for sampling times within interval or set -- number of groups for categorical vars
518 Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.
520 Example binary covariate C
523 (if (= i reference-class)
528 (=model (log \theta) ( ))
535 PFIM provides for a given design and values of \beta:
537 SE/RSE for \beta of each class of each covar
538 eval influence of design on SE(\beta)
540 inter-occasion variability (IOV)
541 - patients sampled more than once, H occasions
543 - additional vars to estimate
547 ;;; comparison criteria
549 functional of conc/time curve which is used for comparison, i.e.
550 (AUC conc/time-curve)
551 (Cmax conc/time-curve)
552 (Tmax conc/time-curve)
556 (defun conc/time-curve (time) ; T is a constant in Common Lisp, so it cannot name a parameter
559 (let ((conc (exp (* time \beta1))))
565 (url-get "www.pfim.biostat.fr")
568 ;;; Thinking of generics...
569 (information-matrix model parameters)
570 (information-matrix variance-matrix)
571 (information-matrix model data)
572 (information-matrix list-of-individual-IMs)
575 (defun IM (loglikelihood parameters times)
576 "Does double work. Sum up the resulting IMs to form a full IM."
577 (let ((IM (make-matrix (length parameters)
579 :initial-value 0.0d0)))
580 (dolist (parameterI parameters)
581 (dolist (parameterJ parameters)
583 (differentiate (differentiate loglikelihood parameterI) parameterJ))))))
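
The population information matrix as a sum of individual matrices, and the D-criterion (power (determinant MF) (/ 1 P)) from the notes above, can be sketched numerically. This is an illustration under strong assumptions: matrices are plain 2D arrays, the determinant is written out for the 2x2 case only, and all function names are hypothetical.

#+begin_src lisp
(defun sum-matrices (matrices)
  "Elementwise sum of a non-empty list of equally sized 2D arrays."
  (let* ((dims (array-dimensions (first matrices)))
         (total (make-array dims :initial-element 0d0)))
    (dolist (m matrices total)
      (dotimes (k (array-total-size total))
        (incf (row-major-aref total k) (row-major-aref m k))))))

(defun det-2x2 (m)
  "Determinant of a 2x2 array."
  (- (* (aref m 0 0) (aref m 1 1))
     (* (aref m 0 1) (aref m 1 0))))

(defun d-criterion (mf)
  "D-optimality criterion det(MF)^(1/P) for a 2x2 information matrix MF."
  (expt (det-2x2 mf) (/ 1 (array-dimension mf 0))))

;; Two hypothetical individual information matrices.
(defparameter *individual-ims*
  (list (make-array '(2 2) :initial-contents '((2d0 0d0) (0d0 4d0)))
        (make-array '(2 2) :initial-contents '((2d0 0d0) (0d0 5d0)))))

;; The summed matrix is diag(4, 9), so the determinant is 36 and the
;; criterion is close to 6.
(d-criterion (sum-matrices *individual-ims*))
#+end_src

Comparing two candidate designs then reduces to comparing two real numbers, which is the point of using a scalar criterion on a matrix-valued quantity.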
585 *** Difference between empirical, Fisherian, and ...? information.
586 *** Example of Integration with CL-GENOMIC
587 - State "TODO" from "" [2010-10-12 Tue 14:03]
589 CL-GENOMIC is a very interesting data-structure strategy for
590 manipulating sequence data.
594 (in-package :cl-user)
595 (asdf:oos 'asdf:compile-op :ironclad)
596 (asdf:oos 'asdf:load-op :cl-genomic)
598 (in-package :bio-sequence)
599 (make-dna "agccg") ;; fine
600 (make-aa "agccg") ;; fine
601 (make-aa "agc9zz") ;; error expected
604 ** TODO [#B] Documentation and Examples [0/3]
605 - State "TODO" from "" [2010-10-14 Thu 00:12]
607 *** TODO [#B] Docudown
608 - State "TODO" from "" [2010-11-05 Fri 15:34]
611 - State "TODO" from "" [2010-11-05 Fri 15:34]
613 *** TODO [#B] CLPDF, and literate data analysis
614 - State "TODO" from "" [2010-11-05 Fri 15:34]
617 Place proposals for features, work, etc here...
618 ** <2011-12-29 Thu> new stuff
619 First new proposal is to track proposals.
621 This project is dedicated to all the lisp hackers out there who
622 provided the basic infrastructure to get so far so fast with minimal