2 Time-stamp: <2012-10-04 02:40:24 tony>
3 Creation: <2008-09-08 08:06:30 tony>
8 Author: AJ Rossini <blindglobe@gmail.com>
9 Copyright: (c) 2007-2010, AJ Rossini <blindglobe@gmail.com>. BSD.
10 Purpose: Stuff that needs to be made working sits inside the
13 This file contains the current challenges to solve,
14 including a description of the setup and the work to
15 solve. Solutions welcome.
17 What is this talk of 'release'? Klingons do not make software
18 'releases'. Our software 'escapes', leaving a bloody trail of
19 designers and quality assurance people in its wake.
21 * Design Documentation
23 This section provides some of the details regarding infrastructure of
24 the system, as a means of including TODO statements.
26 ** (Internal) Package and (External) System Hierarchy
28 current (possibly incomplete) set of lisp dependencies <2012-10-04 Thu>
30 0 ~/sandbox/xarray.git
31 1 ~/sandbox/foreign-numeric-vector.git
32 2 ~/sandbox/trivial-features.git
33 3 ~/sandbox/alexandria.git
36 6 ~/sandbox/cl-utilities
37 7 ~/sandbox/metabang-bind.git
38 8 ~/sandbox/iterate.git
39 9 ~/sandbox/array-operations.git
42 *** Singletons (primary building blocks)
44 These are packages as well as
46 | asdf | common system loader |
47 | xarray | common access structure to array-like |
48 | | (matrix, vector) structures. |
49 | cls-config | initialization of Lisp state, variables, etc, |
50 | | localization to the particular lisp. |
51 | lift | unit-testing |
52 | cffi | foreign function library |
58 as of now, all are in QL hierarchy
60 *** Dependency structure
62 | lisp-matrix | general purpose matrix package, linking to lapack | |
63 | | for numerics. Depends on: | |
66 | | cl-blapack | cffi |
68 | cls-dataframe | in the same spirit as lisp-matrix, a means to | |
69 | | create tables. Perhaps better called datatables? | |
70 | cls-probability | depends on gsll, cl-variates, cl-? initially, | |
74 **** Random number streams and probability calculus
77 Something for random numbers, that has a settable seed. However,
78 we could pass on the "right thing" in favor of something that
82 **** import of data in text files
84 CSV and similar specialized imports
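
As a rough illustration of the simplest case, a delimited line can be split into fields using only standard Common Lisp. This is a sketch, not CLS code: split-delimited-line and read-delimited-stream are hypothetical names, and quoted fields or embedded delimiters are deliberately not handled.

#+BEGIN_SRC lisp
(defun split-delimited-line (line &optional (delimiter #\,))
  "Split LINE at each DELIMITER and return the fields as a list of strings.
Hypothetical helper; quoting rules are NOT handled."
  (loop with start = 0
        for pos = (position delimiter line :start start)
        collect (subseq line start pos)   ; POS is NIL for the last field
        while pos
        do (setf start (1+ pos))))

(defun read-delimited-stream (stream &optional (delimiter #\,))
  "Read STREAM line by line and return a list of field lists."
  (loop for line = (read-line stream nil nil)
        while line
        collect (split-delimited-line line delimiter)))

;; Usage on an in-memory stream rather than a file:
(with-input-from-string (s (format nil "x,y~%1,2"))
  (read-delimited-stream s))
;; => (("x" "y") ("1" "2"))
#+END_SRC

The result is a list of lists of strings; converting fields to numbers and feeding them to a dataframe constructor would be a separate step.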
98 Usually, we need to load it before going on.
105 though sometimes we might want to recompile fully. (Can this be
106 done via QL? Need to check)
108 #+name: compile-it-all
110 (asdf:oos 'asdf:compile-op :cls :force T)
115 - State "DONE" from "CURR" [2010-10-12 Tue 13:48] \\
116 setup is mostly complete
117 - State "CURR" from "TODO" [2010-10-12 Tue 13:47]
118 - State "TODO" from "" [2010-10-12 Tue 13:47]
119 This is an example of a custom setup, not really interesting at
120 this point except to remind Tony how to program.
123 (in-package :cl-user)
125 (defun init-CLS (&key (compile 'nil))
126 (let ((packagesToLoad (list ;; core system
127 :lift :lisp-matrix :cls
129 ;; :cl-cairo2-x11 :iterate
132 :cl-pdf :cl-typesetting
134 :asdf-system-connections :xarray
136 :metatilities-base :anaphora :tinaa
137 :cl-ppcre :cl-markdown :docudown
138 ;; version and validate CLOS objects
139 ;; :versioned-objects :validations
142 ;; :cl-glu :cl-glut :cl-glut-examples
147 (mapcar #'(lambda (x)
149 (asdf:oos 'asdf:compile-op x :force T)
150 (asdf:oos 'asdf:load-op x)))
153 (init-CLS)) ;; vs (init-CLS :compile T)
157 | | #<PACKAGE "COMMON-LISP-USER"> |
159 ** TODO [#A] Integrate with quicklisp support.
160 - State "TODO" from "" [2010-11-30 Tue 18:00]
162 important to merge with quicklisp system loader support.
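
One possible shape for such a merge, offered only as a sketch: prefer Quicklisp when it is present in the running image and fall back to plain ASDF otherwise. The function names preferred-loader and load-system-sketch are hypothetical; ql:quickload and asdf:load-system are the existing entry points they delegate to.

#+BEGIN_SRC lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))

(defun preferred-loader ()
  "Return :QUICKLISP when the QL package exists in this image, else :ASDF."
  (if (find-package :ql) :quicklisp :asdf))

(defun load-system-sketch (system)
  "Load SYSTEM through Quicklisp when available, otherwise through ASDF.
Hypothetical helper, not part of CLS."
  (ecase (preferred-loader)
    ;; Look the symbol up at run time so this file still reads
    ;; in an image where Quicklisp is not installed.
    (:quicklisp (funcall (find-symbol "QUICKLOAD" :ql) system))
    (:asdf (asdf:load-system system))))
#+END_SRC

With this in place, something like (load-system-sketch :cls) could stand in for the explicit asdf:oos calls, assuming the system is findable by whichever loader is chosen.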
163 ** CURR [#A] Testing: unit, regression, examples. [0/3]
164 - State "CURR" from "TODO" [2010-10-12 Tue 13:51]
165 - State "TODO" from "" [2010-10-12 Tue 13:51]
166 Testing consists of unit tests, which internally verify subsets of
167 code, regression tests, and functional tests (in increasing order
169 *** CURR [#B] Unit tests
170 - State "CURR" from "TODO" [2010-11-04 Thu 18:33]
171 - State "CURR" from "TODO" [2010-10-12 Tue 13:48]
172 - State "TODO" from "" [2010-10-12 Tue 13:48]
173 Unit tests have been started using LIFT. Need to consider some of
174 the other systems that provide testing, when people add them to the
175 mix of libraries that we need, along with examples of how to use.
177 #+srcname: ex-cls-unittest
179 (in-package :lisp-stat-unittests)
180 (run-tests :suite 'lisp-stat-ut)
184 : #<Results for LISP-STAT-UT 78 Tests, 7 Failures, 20 Errors>
186 ;; => tests = 78, failures = 7, errors = 20
190 The following needs to be solved in order to have a decent
191 installation qualification (IQ) and performance qualification (PQ)
193 #+srcname: cls-unittest
195 (in-package :lisp-stat-unittests)
196 (asdf:oos 'asdf:test-op 'cls)
197 ;; which runs (describe (run-tests :suite 'lisp-stat-ut))
202 and check documentation to see if it is useful.
205 (in-package :lisp-stat-unittests)
207 (describe 'lisp-stat-ut)
208 (documentation 'lisp-stat-ut 'type)
210 ;; FIXME: Example: currently not relevant, yet
211 ;; (describe (lift::run-test :test-case 'lisp-stat-unittests::create-proto
212 ;; :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
214 (describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
215 (lift::run-tests :suite 'lisp-stat-ut-dataframe)
217 (describe (lift::run-test
218 :test-case 'lisp-stat-unittests::create-proto
219 :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
222 *** TODO [#B] Regression Tests
223 - State "TODO" from "" [2010-10-12 Tue 13:54]
225 *** TODO [#B] Functional Tests
226 - State "TODO" from "" [2010-10-12 Tue 13:54]
228 ** CURR [#B] Functional Examples that need to work [1/3]
229 - State "CURR" from "TODO" [2010-11-30 Tue 17:57]
230 - State "TODO" from "" [2010-10-12 Tue 13:55]
232 These examples should be functional forms within CLS, describing
233 working functionality which is needed for work.
234 *** TODO [#A] Dataframe creation
235 Illustration via a file, that we need to get working so that we
236 can get data in-and-out of CLS structures.
238 #+BEGIN_SRC lisp :export examples/example-DF-creation.lisp
239 ;;; -*- mode: lisp -*-
240 ;;; Copyright (c) 2006-2012, by A.J. Rossini <blindglobe@gmail.com>
241 ;;; See COPYRIGHT file for any additional restrictions (BSD license).
242 ;;; Since 1991, ANSI was finally finished. Edited for ANSI Common Lisp.
244 ;;; Time-stamp: <2012-10-04 02:16:45 tony>
245 ;;; Creation: <2012-07-01 11:29:42 tony>
246 ;;; File: example.lisp
247 ;;; Author: AJ Rossini <blindglobe@gmail.com>
248 ;;; Copyright: (c) 2012, AJ Rossini. BSD.
249 ;;; Purpose: example of possible usage.
251 ;;; What is this talk of 'release'? Klingons do not make software
252 ;;; 'releases'. Our software 'escapes', leaving a bloody trail of
253 ;;; designers and quality assurance people in its wake.
259 ;; use the example package...
260 (in-package :cls-user)
263 ;; or better yet, create a package/namespace for the particular problem being attacked.
264 (defpackage :my-package-user
265 (:documentation "Demo of how serious work should be placed in
266 a similar package elsewhere for reproducibility. This hints at
267 what needs to be done for a user- or analysis-package.")
268 (:nicknames :my-clswork-user)
269 (:use :common-lisp ; always needed for user playgrounds!
270 :lisp-matrix ; we only need the packages that we need...
271 :common-lisp-statistics
272 :lisp-stat-data-examples) ;; this ensures access to a data package
273 (:export summarize-data summarize-results this-data this-report)
274 (:shadowing-import-from :lisp-stat call-method call-next-method
276 expt + - * / ** mod rem abs 1+ 1- log exp sqrt sin cos tan
277 asin acos atan sinh cosh tanh asinh acosh atanh float random
278 truncate floor ceiling round minusp zerop plusp evenp oddp
279 < <= = /= >= > > ;; complex
280 conjugate realpart imagpart phase
281 min max logand logior logxor lognot ffloor fceiling
282 ftruncate fround signum cis
286 (in-package :my-clswork-user)
288 ;; create some data by hand using arrays, and demonstrate access.
290 (let ((myArray #2A((1 2 3)(4 5 6)))
291 (myDF (make-dataframe #2A((1 2 3)(4 5 6))))
292 (myLOL (list (list 1 2 3) (list 4 5 6)))
293 ;; FIXME: listoflist conversion does not work.
294 ;; (myDFlol (make-dataframe '(list ((1 2 3)(4 5 6)))))
297 (= (xref myArray 1 1)
302 *** TODO [#B] Scoping with datasets
303 - State "TODO" from "" [2010-11-04 Thu 18:46]
305 The following needs to work, and a related syntax for resampling
306 and similar synthetic data approaches (bootstrapping, imputation)
307 ought to use similar syntax as well.
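
To make the intended scoping concrete, here is a minimal sketch of a macro that binds dataset variables to local names. It is not the CLS implementation: with-data-sketch is a hypothetical name, and the dataset is assumed to be a plain alist of (name . values) rather than a dataframe.

#+BEGIN_SRC lisp
(defmacro with-data-sketch (dataset specs &body body)
  "Bind dataset columns to local variables around BODY.
Each element of SPECS is (COLUMN-NAME [LOCAL-NAME]); LOCAL-NAME defaults
to COLUMN-NAME.  DATASET is assumed to be an alist of (name . values)."
  (let ((ds (gensym "DATASET")))
    `(let ((,ds ,dataset))
       (let ,(loop for (column local) in specs
                   collect `(,(or local column)
                             (cdr (assoc ',column ,ds))))
         ,@body))))

;; Toy data, purely illustrative.
(defparameter *toy-dataset*
  (list (cons 'dose (list 1 2 3))
        (cons 'response (list 10 20 30))))

(with-data-sketch *toy-dataset* ((dose) (response y))
  (mapcar #'+ dose y))
;; => (11 22 33)
#+END_SRC

Because the bindings are ordinary lexical variables, the body cannot leak names back into the dataset, which is the clean communication of results that the notes ask for.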
308 #+srcname: DataSetNameScoping
310 (in-package :ls-user)
312 ;; Syntax examples using lexical scope, closures, and bindings to
313 ;; ensure a clean communication of results
314 ;; This is actually a bit tricky, since we need to clarify whether
315 ;; it is line-at-a-time that we are considering or if there is
316 ;; another mapping strategy. In particular, one could imagine a
317 ;; looping-over-observations function, or a
318 ;; looping-over-independent-observations function which leverages a
319 ;; grouping variable which provides guidance for what is considered
320 ;; independent from the sampling frame being considered. The frame
321 ;; itself (definable via some form of metadata to clarify scope?)
322 ;; could clearly provide a bit of relativity for clarifying what
323 ;; statistical independence means.
325 (with-data dataset ((dsvarname1 [usevarname1])
326 (dsvarname2 [usevarname2]))
329 ;; SAS-centric approach to spec'ing work
330 (looping-over-observations
331 dataset ((dsvarname1 [usevarname1])
332 (dsvarname2 [usevarname2]))
335 ;; SAS plus "statistical sensibility"... for example, if an
336 ;; independent observation actually consists of many observations so
337 ;; that a dataframe of independence results -- for example,
338 ;; longitudinal data or spatial data or local-truncated network data
339 ;; are clean examples of such happening -- then we get the data
340 ;; frame or row representing the independent result.
341 (looping-over-independent-observations
342 dataset independence-defining-variable
343 ((dsvarname1 [usevarname1])
344 (dsvarname2 [usevarname2]))
349 *** DONE [#B] Dataframe variable typing
350 - State "DONE" from "CURR" [2010-11-30 Tue 17:56] \\
351 check-type approach works, we would just have to throw a catchable
352 error if we want to use it in a reliable fashion.
353 - State "CURR" from "TODO" [2010-11-30 Tue 17:56]
354 - State "TODO" from "" [2010-11-04 Thu 18:48]
356 Seems to generally work, need to ensure that we use this for
359 #+srcname: DFvarTyping
361 (in-package :ls-user)
362 (defparameter *df-test*
363 (make-instance 'dataframe-array
364 :storage #2A (('a "test0" 0 0d0)
370 :case-labels (list "0" "1" 2 "3" "4")
371 :var-labels (list "symbol" "string" "integer" "double-float")
372 :var-types (list 'symbol 'string 'integer 'double-float)))
374 ;; with SBCL, ints become floats? Need to adjust output
375 ;; representation appropriately..
378 (defun check-var (df colnum)
379 (let ((nobs (xdim (dataset df) 0)))
381 (assert (typep (xref df i colnum) (elt (var-types df) colnum))))))
383 (xdim (dataset *df-test*) 1)
384 (xdim (dataset *df-test*) 0)
386 (check-var *df-test* 0)
389 (xref *df-test* 1 1))
391 (check-type (xref *df-test* 1 1)
392 string) ;; => nil, so good.
393 (check-type (xref *df-test* 1 1)
394 vector) ;; => nil, so good.
395 (check-type (xref *df-test* 1 1)
396 real) ;; => simple-error type thrown, so good.
398 ;; How to nest errors within errors?
399 (check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.
403 (check-type *df-test*
404 dataframe-array) ; nil is good.
406 (integerp (xref *df-test* 1 2))
407 (floatp (xref *df-test* 1 2))
408 (integerp (xref *df-test* 1 3))
409 (type-of (xref *df-test* 1 3))
410 (floatp (xref *df-test* 1 3))
412 (type-of (vector 1 1d0))
418 (xref *df-test* 1 '*)
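
The note above about needing a catchable error can be sketched as follows. Since check-type is a macro that does not evaluate its type argument, a type stored at run time (as in var-types) has to go through typep instead. The function names here are hypothetical and operate on a plain list of values, not on a CLS dataframe.

#+BEGIN_SRC lisp
(defun check-values-of-type (values type)
  "Return T when every element of VALUES satisfies TYPE, a type specifier
evaluated at run time; otherwise signal a TYPE-ERROR for the first offender."
  (dolist (value values t)
    (unless (typep value type)
      (error 'type-error :datum value :expected-type type))))

(defun column-type-ok-p (values type)
  "Like CHECK-VALUES-OF-TYPE, but return NIL instead of signalling."
  (handler-case (check-values-of-type values type)
    (type-error () nil)))

(column-type-ok-p '(1 2 3) 'integer)      ; => T
(column-type-ok-p '(1 "two" 3) 'integer)  ; => NIL
#+END_SRC

Signalling the standard type-error condition keeps the datum and expected type available to any handler further up the stack.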
421 ** CURR [#A] Random Numbers [2/6]
422 - State "CURR" from "TODO" [2010-11-05 Fri 15:41]
423 - State "TODO" from "" [2010-10-14 Thu 00:12]
425 Need to select a probability system (probability
426 functions, random numbers). Goal is to have a general framework
427 for representing probability functions, functionals on
428 probabilities, and reproducible random streams based on such
430 *** CURR [#B] CL-VARIATES system evaluation [2/3]
431 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
432 - State "TODO" from "" [2010-10-12 Tue 14:16]
434 CL-VARIATES is a system developed by Gary W King. It uses streams
435 with seeds, and is hence reproducible. (Random comment: why do CL
436 programmers as a class ignore computational reproducibility?)
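
Why a seeded stream gives reproducibility can be shown with a toy generator. This is not how CL-VARIATES is implemented; toy-rng, toy-next, and toy-draws are hypothetical names, and the linear congruential constants are a commonly cited textbook choice used here only for illustration.

#+BEGIN_SRC lisp
(defstruct toy-rng
  ;; The whole generator state is one 32-bit integer, i.e. the seed.
  (state 1 :type (unsigned-byte 32)))

(defun toy-next (rng)
  "Advance RNG by one linear congruential step; return an integer in [0, 2^32)."
  (setf (toy-rng-state rng)
        (mod (+ (* 1664525 (toy-rng-state rng)) 1013904223)
             (expt 2 32))))

(defun toy-draws (seed n)
  "Return N successive draws from a fresh generator seeded with SEED."
  (let ((rng (make-toy-rng :state seed)))
    (loop repeat n collect (toy-next rng))))

;; Same seed, same stream:
(equal (toy-draws 44 5) (toy-draws 44 5))  ; => T
#+END_SRC

Because every draw is a pure function of the previous state, resetting the state replays the stream exactly, which is the property the examples further down rely on.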
437 **** DONE [#B] load and verify
438 - State "DONE" from "CURR" [2010-11-04 Thu 18:59] \\
439 load, init, and verify performance.
440 - State "CURR" from "TODO" [2010-11-04 Thu 18:58]
441 - State "TODO" from "" [2010-11-04 Thu 18:58]
443 <2010-11-30 Tue> : just modified cls.asd to ensure that we load
444 as appropriate the correct random variate package.
446 #+srcname: Loading-CL-VARIATES
448 (in-package :cl-user)
449 (asdf:oos 'asdf:load-op 'cl-variates)
450 (asdf:oos 'asdf:load-op 'cl-variates-test)
454 : #<ASDF:LOAD-OP NIL {C2C30E1}>
458 #+srcname: CL-VARIATES-UNITTESTS
461 (in-package :cl-variates-test)
463 (run-tests :suite 'cl-variates-test)
464 (describe (run-tests :suite 'cl-variates-test))
468 **** DONE [#B] Examples of use
469 - State "DONE" from "CURR" [2010-11-05 Fri 15:39] \\
470 basic example of reproducible draws from the uniform and normal random
472 - State "CURR" from "TODO" [2010-11-05 Fri 15:39]
473 - State "TODO" from "" [2010-11-04 Thu 19:01]
475 #+srcname: CL-VARIATES-REPRO
478 (in-package :cl-variates-user)
480 (defparameter state (make-random-number-generator))
481 (setf (random-seed state) 44)
484 (loop for i from 1 to 10 collect
485 (random-range state 0 10))
486 ;; => (1 5 1 0 7 1 2 2 8 10)
487 (setf (random-seed state) 44)
488 (loop for i from 1 to 10 collect
489 (random-range state 0 10))
490 ;; => (1 5 1 0 7 1 2 2 8 10)
492 (setf (random-seed state) 44)
494 (loop for i from 1 to 10 collect
495 (normal-random state 0 1))
497 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
498 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
499 ;; 0.20750134211656893 -0.14501914108452274)
501 (setf (random-seed state) 44)
502 (loop for i from 1 to 10 collect
503 (normal-random state 0 1))
505 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
506 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
507 ;; 0.20750134211656893 -0.14501914108452274)
511 **** CURR [#B] Full example of general usage
512 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
513 - State "TODO" from "" [2010-11-05 Fri 15:40]
515 What we want to do here is describe the basic available API that
516 is present. So while the previous work describes what the basic
517 reproducibility approach would be in terms of generating lists of
518 reproducible pRNG streams, we need the full range of possible
519 probability laws that are present.
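
One way such an API could be organized, given here only as a sketch, is a small CLOS protocol in which each probability law is a class and operations are generic functions. Nothing below is taken from cl-variates; the class names, accessors, and law-density are hypothetical.

#+BEGIN_SRC lisp
(defclass uniform-law ()
  ((lower :initarg :lower :reader law-lower)
   (upper :initarg :upper :reader law-upper))
  (:documentation "Continuous uniform law on [lower, upper]."))

(defclass normal-law ()
  ((mean :initarg :mean :reader law-mean)
   (sd   :initarg :sd   :reader law-sd))
  (:documentation "Normal law with the given mean and standard deviation."))

(defgeneric law-density (law x)
  (:documentation "Probability density of LAW at the real number X."))

(defmethod law-density ((law uniform-law) x)
  (if (<= (law-lower law) x (law-upper law))
      (/ 1d0 (- (law-upper law) (law-lower law)))
      0d0))

(defmethod law-density ((law normal-law) x)
  (let ((z (/ (- x (law-mean law)) (law-sd law))))
    (/ (exp (* -0.5d0 z z))
       (* (law-sd law) (sqrt (* 2 pi))))))

(law-density (make-instance 'uniform-law :lower 0 :upper 2) 1)
;; => 0.5d0
#+END_SRC

Further generics (a distribution function, a quantile function, a draw taking a seeded generator) would follow the same pattern, one method per law.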
521 One of the good things about cl-variates is that it provides for
522 reproducibility. One of the bad things is that it has a mixed
525 *** TODO [#B] CL-RANDOM system evaluation
526 - State "TODO" from "" [2010-11-05 Fri 15:40]
529 1. no seed setting for random numbers
530 2. contamination of a probability support with optimization and
535 2. nice design for generics.
537 *** TODO [#B] Native CLS (from XLS)
538 - State "TODO" from "" [2010-11-05 Fri 15:40]
540 ** TODO [#B] Numerical Linear Algebra
541 - State "TODO" from "" [2010-10-14 Thu 00:12]
543 *** TODO [#B] LLA evaluation
544 - State "TODO" from "" [2010-10-12 Tue 14:13]
545 ;;; experiments with LLA
546 (in-package :cl-user)
547 (asdf:oos 'asdf:load-op 'lla)
548 (in-package :lla-user)
550 *** CURR [#B] Lisp-Matrix system evaluation
551 - State "CURR" from "TODO" [2010-10-12 Tue 14:13]
552 - State "TODO" from "" [2010-10-12 Tue 14:13]
554 *** TODO [#B] LispLab system evaluation
555 - State "TODO" from "" [2010-10-12 Tue 14:13]
557 ** TODO [#B] Statistical Procedures to implement
558 - State "TODO" from "" [2010-10-14 Thu 00:12]
561 (in-package :cls-user)
566 ;; population design eval and opt
574 number of samples/cost of lab analysis and collection
578 (defun pfim (&key model ( constraints ( summary-function )
580 (list num-subjects num-times list-times))))
584 Each individual has a design psi_i
585 number of samples n_i and sampling times t_{i,1}, ..., t_{i,n_i}
586 individuals can differ
590 individual-level model
593 (=model y_i (+ (f \theta_i \psi_i) \epsilon_i ))
594 (=var \epsilon_i \sigma_between \sigma_within )
596 ;; Information Matrix for pop design
598 (defparameter IM (sum (i 1 N) (MF \psi_i \phi_i)))
601 For nonlinear structural models, expand around RE=0
603 Cramer-Rao : MF^{-1} is lower bound for estimation variance.
607 - smallest SE, but is a matrix, so
608 - criteria for matrix comparison
609 -- D-opt, (power (determinant MF) (/ 1 P))
612 find design maxing D opt, (power (determinant MF) (/ 1 P))
614 -- contin vars for sampling times within interval or set -- number of groups for cat vars
616 Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.
618 Example binary covariate C
621 (if (= i reference-class)
626 (=model (log \theta) ( ))
633 PFIM provides for a given design and values of \beta:
635 SE/RSE for \beta of each class of each covar
636 eval influence of design on SE(\beta)
638 inter-occasion variability (IOV)
639 - patients sampled more than once, H occasions
641 - additional vars to estimate
645 ;;; comparison criteria
647 functional of conc/time curve which is used for comparison, i.e.
648 (AUC conc/time-curve)
649 (Cmax conc/time-curve)
650 (Tmax conc/time-curve)
654 (defun conc/time-curve (time)
657 (let ((conc (exp (* time \beta1))))
663 (url-get "www.pfim.biostat.fr")
666 ;;; Thinking of generics...
667 (information-matrix model parameters)
668 (information-matrix variance-matrix)
669 (information-matrix model data)
670 (information-matrix list-of-individual-IMs)
673 (defun IM (loglikelihood parameters times)
674 "Does double work. Sum up the resulting IMs to form a full IM."
675 (let ((IM (make-matrix (length parameters)
677 :initial-value 0.0d0)))
678 (dolist (parameterI parameters)
679 (dolist (parameterJ parameters)
681 (differentiate (differentiate loglikelihood parameterI) parameterJ))))))
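
The D-optimality criterion mentioned in the notes above, (power (determinant MF) (/ 1 P)), can be tried out on small cases. The sketch below is restricted to 2x2 matrices held in plain Lisp arrays so that it stays self-contained; determinant-2x2 and d-criterion-2x2 are hypothetical names, and a general version would need a determinant routine from a matrix library.

#+BEGIN_SRC lisp
(defun determinant-2x2 (m)
  "Determinant of the 2x2 array M."
  (- (* (aref m 0 0) (aref m 1 1))
     (* (aref m 0 1) (aref m 1 0))))

(defun d-criterion-2x2 (fim)
  "D-criterion det(FIM)^(1/P) for a 2x2 information matrix, so P = 2."
  (let ((p (array-dimension fim 0)))
    (expt (determinant-2x2 fim) (/ 1 p))))

;; Two made-up information matrices standing in for two candidate designs.
(defparameter *design-a* #2A((4d0 0d0) (0d0 9d0)))  ; det = 36
(defparameter *design-b* #2A((2d0 0d0) (0d0 8d0)))  ; det = 16

;; The design with the larger criterion is preferred under D-optimality.
(> (d-criterion-2x2 *design-a*) (d-criterion-2x2 *design-b*))
;; => T
#+END_SRC

Taking the P-th root puts the criterion on the scale of a single parameter's information, which makes designs with different numbers of parameters easier to compare.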
683 *** difference between empirical, fisherian, and ...? information.
684 *** Example of Integration with CL-GENOMIC
685 - State "TODO" from "" [2010-10-12 Tue 14:03]
687 CL-GENOMIC is a very interesting data-structure strategy for
688 manipulating sequence data.
692 (in-package :cl-user)
693 (asdf:oos 'asdf:compile-op :ironclad)
694 (asdf:oos 'asdf:load-op :cl-genomic)
696 (in-package :bio-sequence)
697 (make-dna "agccg") ;; fine
698 (make-aa "agccg") ;; fine
699 (make-aa "agc9zz") ;; error expected
702 ** TODO [#B] Documentation and Examples [0/3]
703 - State "TODO" from "" [2010-10-14 Thu 00:12]
705 *** TODO [#B] Docudown
706 - State "TODO" from "" [2010-11-05 Fri 15:34]
709 - State "TODO" from "" [2010-11-05 Fri 15:34]
711 *** TODO [#B] CLPDF, and literate data analysis
712 - State "TODO" from "" [2010-11-05 Fri 15:34]
715 Place proposals for features, work, etc here...
716 ** <2011-12-29 Thu> new stuff
717 First new proposal is to track proposals.
719 This project is dedicated to all the lisp hackers out there who
720 provided the basic infrastructure to get so far so fast with minimal