1 #+TODO: TODO CURR | DONE
2 #+TODO: POSTPONED | CANCELED
5 Time-stamp: <2012-10-12 09:00:45 tony>
6 Creation: <2008-09-08 08:06:30 tony>
11 Author: AJ Rossini <blindglobe@gmail.com>
12 Copyright: (c) 2007-2010, AJ Rossini <blindglobe@gmail.com>. BSD.
13 Purpose: Stuff that needs to be made working sits inside the
16 This file contains the current challenges to solve,
17 including a description of the setup and the work to
18 solve. Solutions welcome.
20 What is this talk of 'release'? Klingons do not make software
21 'releases'. Our software 'escapes', leaving a bloody trail of
22 designers and quality assurance people in its wake.
24 * Approach and Design, Strategy and Tactics
28 Please develop possible high level API code in examples
29 subdirectory -- when we get it "reasonable", migrate into
30 appropriate core code directories in the src subdirectory.
32 ** (Internal) Package and (External) System Hierarchy
34 This section provides some of the details regarding infrastructure of
37 current (possibly incomplete) set of lisp dependencies <2012-10-04 Thu>
39 0 ~/sandbox/xarray.git
40 1 ~/sandbox/foreign-numeric-vector.git
41 2 ~/sandbox/trivial-features.git
42 3 ~/sandbox/alexandria.git
45 6 ~/sandbox/cl-utilities
46 7 ~/sandbox/metabang-bind.git
47 8 ~/sandbox/iterate.git
48 9 ~/sandbox/array-operations.git
51 *** Singletons (primary building blocks)
53 These are packages as well as
55 | asdf | common system loader |
56 | xarray | common access structure to array-like |
57 | | (matrix, vector) structures. |
58 | cls-config | initialization of Lisp state, variables, etc, |
59 | | localization to the particular lisp. |
60 | lift | unit-testing |
| cffi | foreign function library |
as of now, all are in the QL hierarchy
69 *** Dependency structure
71 | lisp-matrix | general purpose matrix package, linking to lapack | |
72 | | for numerics. Depends on: | |
75 | | cl-blapack | cffi |
77 | cls-dataframe | in the same spirit as lisp-matrix, a means to | |
78 | | create tables. Perhaps better called datatables? | |
79 | cls-probability | depends on gsll, cl-variates, cl-? initially, | |
83 **** Random number streams and probability calculus
86 Something for random numbers, that has a settable seed. However,
87 we could pass on the "right thing" in favor of something that
91 **** import of data in text files
93 CSV and similar specialized imports
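As a starting point for the CSV import, a minimal sketch in plain Common Lisp might look like the following. The names split-csv-line and read-csv-file are hypothetical (they are not part of CLS), and quoted fields or embedded separators are deliberately not handled.

#+begin_src lisp
(defun split-csv-line (line &key (separator #\,))
  "Split LINE on SEPARATOR and return a list of space-trimmed field strings.
Assumption: no quoted fields, no embedded separators."
  (loop with start = 0
        for pos = (position separator line :start start)
        collect (string-trim " " (subseq line start pos))
        while pos
        do (setf start (1+ pos))))

(defun read-csv-file (path &key (separator #\,))
  "Return the rows of the file at PATH as a list of lists of strings."
  (with-open-file (in path :direction :input)
    (loop for line = (read-line in nil)
          while line
          collect (split-csv-line line :separator separator))))

(split-csv-line "1, 2.5,abc")   ; => ("1" "2.5" "abc")
#+end_src

Every field comes back as a string; converting columns to numbers or symbols would be a separate step, driven by the variable types of the target dataframe.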
Usually, we need to load everything before going on.
117 and sometimes we might want to recompile fully:
119 #+name: recompile-it-all
121 (asdf:oos 'asdf:compile-op :cls :force T)
Currently <2012-10-10 Wed> QuickLisp does not provide a
recompilation facility. However, QL is built on top of, and partially
extends, ASDF, so the ASDF call above should be fine for now.
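Until QuickLisp grows such a facility, a forced rebuild can go through ASDF directly. The sketch below wraps asdf:load-system with its :force argument; the wrapper name recompile-system is hypothetical, and it assumes the system has already been installed (for example by an earlier ql:quickload).

#+begin_src lisp
(require :asdf)

(defun recompile-system (system &key (all-dependencies nil))
  "Reload SYSTEM through ASDF, forcing recompilation.
With ALL-DEPENDENCIES true, the systems it depends on are rebuilt too."
  (asdf:load-system system :force (if all-dependencies :all t)))

;; Typical use, assuming :cls is already installed:
;; (recompile-system :cls)
;; (recompile-system :cls :all-dependencies t)
#+end_src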
128 ** DONE [#B] Example of Custom Data analysis set up
129 - State "DONE" from "CURR" [2010-10-12 Tue 13:48] \\
130 setup is mostly complete
131 - State "CURR" from "TODO" [2010-10-12 Tue 13:47]
132 - State "TODO" from "" [2010-10-12 Tue 13:47]
134 This is an example of a custom setup, not really interesting at
135 this point (it will hopefully be obsolete by the first release)
136 except to remind Tony how to program. Pointy-headed managers need
137 any support they can find in order to regress to their
140 The only point of this section is to illustrate that we could want
141 to load additional modules that are not a central part of the core
145 #+begin_src lisp :tangle "examples/CustomLoader.lisp"
146 ;; always ensure we are in the right package to leave droppings and access functionality
147 (in-package :cl-user)
149 (defun init-CLS (&key (compile 'nil))
150 (let ((packagesToLoad (list ;; core system
151 :lift :lisp-matrix :cls
153 ;; :cl-cairo2-x11 :iterate
156 :cl-pdf :cl-typesetting
158 :asdf-system-connections :xarray
160 :metatilities-base :anaphora :tinaa
161 :cl-ppcre :cl-markdown :docudown
162 ;; version and validate CLOS objects
163 ;; :versioned-objects :validations
166 ;; :cl-glu :cl-glut :cl-glut-examples
171 (mapcar #'(lambda (x)
173 (asdf:oos 'asdf:compile-op x :force T)
174 (asdf:oos 'asdf:load-op x)))
176 ;; (init-CLS :compile T) vs:
181 | | #<PACKAGE "COMMON-LISP-USER"> |
** CURR [#A] Integrate with quicklisp support.
It is important to merge with quicklisp system loader support. We
currently have some of this work integrated, but I think there are
a few systems which are not auto-installable.
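One way to find the systems that are not auto-installable is to split the dependency list by a "known to the loader" predicate. The helper below is a hypothetical sketch (partition-systems is not an existing CLS or Quicklisp function); the predicate is passed in so that the code runs without Quicklisp present.

#+begin_src lisp
(defun partition-systems (names known-p)
  "Split NAMES into two lists: those for which KNOWN-P returns true
and those for which it returns false. Order is preserved."
  (loop for name in names
        if (funcall known-p name) collect name into known
        else collect name into unknown
        finally (return (values known unknown))))

;; With Quicklisp loaded, QL-DIST:FIND-SYSTEM should return NIL for a
;; system name that no enabled dist provides, so it can act as KNOWN-P:
;; (partition-systems '("alexandria" "xarray" "cls") #'ql-dist:find-system)
#+end_src

The second return value is then the list of systems that still need a manual checkout.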
189 ** CURR [#A] Testing: unit, regression, examples. [0/3]
190 - State "CURR" from "TODO" [2010-10-12 Tue 13:51]
191 - State "TODO" from "" [2010-10-12 Tue 13:51]
192 Testing consists of unit tests, which internally verify subsets of
193 code, regression tests, and functional tests (in increasing order
195 *** CURR [#B] Unit tests
196 - State "CURR" from "TODO" [2010-11-04 Thu 18:33]
197 - State "CURR" from "TODO" [2010-10-12 Tue 13:48]
198 - State "TODO" from "" [2010-10-12 Tue 13:48]
199 Unit tests have been started using LIFT. Need to consider some of
200 the other systems that provide testing, when people add them to the
mix of libraries that we need, along with examples of how to use them.
203 #+name: ex-cls-unittest
205 (in-package :lisp-stat-unittests)
206 (run-tests :suite 'lisp-stat-ut)
210 : #<Results for LISP-STAT-UT 78 Tests, 7 Failures, 20 Errors>
212 ;; => tests = 78, failures = 7, errors = 20
216 The following needs to be solved in order to have a decent
217 installation qualification (IQ) and performance qualification (PQ)
221 (in-package :lisp-stat-unittests)
222 (asdf:oos 'asdf:test-op 'cls)
223 ;; which runs (describe (run-tests :suite 'lisp-stat-ut))
228 and check documentation to see if it is useful.
231 (in-package :lisp-stat-unittests)
233 (describe 'lisp-stat-ut)
234 (documentation 'lisp-stat-ut 'type)
236 ;; FIXME: Example: currently not relevant, yet
237 ;; (describe (lift::run-test :test-case 'lisp-stat-unittests::create-proto
238 ;; :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
240 (describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
241 (lift::run-tests :suite 'lisp-stat-ut-dataframe)
243 (describe (lift::run-test
244 :test-case 'lisp-stat-unittests::create-proto
245 :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
248 *** TODO [#B] Regression Tests
249 - State "TODO" from "" [2010-10-12 Tue 13:54]
251 *** TODO [#B] Functional Tests
252 - State "TODO" from "" [2010-10-12 Tue 13:54]
254 ** CURR [#B] Functional Examples that need to work [1/3]
255 - State "CURR" from "TODO" [2010-11-30 Tue 17:57]
256 - State "TODO" from "" [2010-10-12 Tue 13:55]
258 These examples should be functional forms within CLS, describing
259 working functionality which is needed for work.
260 *** TODO [#A] Dataframe creation
261 Illustration via a file, that we need to get working so that we
262 can get data in-and-out of CLS structures.
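The example file notes that list-of-list conversion does not work yet. One possible workaround, sketched here in plain Common Lisp, is to turn the list of lists into a 2D array first and hand that array to make-dataframe, which the same file already calls on a #2A literal. The function name list-of-lists->array is hypothetical.

#+begin_src lisp
(defun list-of-lists->array (rows)
  "Convert ROWS, a list of equal-length lists, into a 2D array."
  (let ((nrow (length rows))
        (ncol (length (first rows))))
    ;; refuse ragged input rather than let MAKE-ARRAY fail obscurely
    (assert (every (lambda (row) (= (length row) ncol)) rows)
            (rows) "All rows must have ~D elements." ncol)
    (make-array (list nrow ncol) :initial-contents rows)))

(list-of-lists->array (list (list 1 2 3) (list 4 5 6)))
;; => #2A((1 2 3) (4 5 6))
#+end_src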
#+BEGIN_SRC lisp :tangle "examples/example-DF-creation.lisp"
265 ;;; -*- mode: lisp -*-
266 ;;; Copyright (c) 2006-2012, by A.J. Rossini <blindglobe@gmail.com>
267 ;;; See COPYRIGHT file for any additional restrictions (BSD license).
268 ;;; Since 1991, ANSI was finally finished. Edited for ANSI Common Lisp.
270 ;;; Time-stamp: <2012-10-04 02:16:45 tony>
271 ;;; Creation: <2012-07-01 11:29:42 tony>
272 ;;; File: example.lisp
273 ;;; Author: AJ Rossini <blindglobe@gmail.com>
274 ;;; Copyright: (c) 2012, AJ Rossini. BSD.
275 ;;; Purpose: example of possible usage.
277 ;;; What is this talk of 'release'? Klingons do not make software
278 ;;; 'releases'. Our software 'escapes', leaving a bloody trail of
279 ;;; designers and quality assurance people in its wake.
285 ;; use the example package...
286 (in-package :cls-user)
289 ;; or better yet, create a package/namespace for the particular problem being attacked.
290 (defpackage :my-package-user
(:documentation "Demo package: serious work should be placed in a
similar package elsewhere for reproducibility. This hints at what
needs to be done for a user- or analysis-package.")
294 (:nicknames :my-clswork-user)
295 (:use :common-lisp ; always needed for user playgrounds!
296 :lisp-matrix ; we only need the packages that we need...
297 :common-lisp-statistics
298 :lisp-stat-data-examples) ;; this ensures access to a data package
299 (:export summarize-data summarize-results this-data this-report)
300 (:shadowing-import-from :lisp-stat call-method call-next-method
302 expt + - * / ** mod rem abs 1+ 1- log exp sqrt sin cos tan
303 asin acos atan sinh cosh tanh asinh acosh atanh float random
304 truncate floor ceiling round minusp zerop plusp evenp oddp
305 < <= = /= >= > > ;; complex
306 conjugate realpart imagpart phase
307 min max logand logior logxor lognot ffloor fceiling
308 ftruncate fround signum cis
312 (in-package :my-clswork-user)
314 ;; create some data by hand using arrays, and demonstrate access.
316 (let ((myArray #2A((1 2 3)(4 5 6)))
317 (myDF (make-dataframe #2A((1 2 3)(4 5 6))))
318 (myLOL (list (list 1 2 3) (list 4 5 6)))
319 ;; FIXME: listoflist conversion does not work.
320 ;; (myDFlol (make-dataframe '(list ((1 2 3)(4 5 6)))))
323 (= (xref myArray 1 1)
328 *** TODO [#B] Scoping with datasets
329 - State "TODO" from "" [2010-11-04 Thu 18:46]
The following needs to work. Resampling and similar synthetic-data
approaches (bootstrapping, imputation) ought to use a related
syntax as well.
334 #+name: DataSetNameScoping
336 (in-package :ls-user)
338 ;; Syntax examples using lexical scope, closures, and bindings to
339 ;; ensure a clean communication of results
340 ;; This is actually a bit tricky, since we need to clarify whether
341 ;; it is line-at-a-time that we are considering or if there is
342 ;; another mapping strategy. In particular, one could imagine a
343 ;; looping-over-observations function, or a
344 ;; looping-over-independent-observations function which leverages a
345 ;; grouping variable which provides guidance for what is considered
346 ;; independent from the sampling frame being considered. The frame
347 ;; itself (definable via some form of metadata to clarify scope?)
348 ;; could clearly provide a bit of relativity for clarifying what
349 ;; statistical independence means.
351 (with-data dataset ((dsvarname1 [usevarname1])
352 (dsvarname2 [usevarname2]))
355 ;; SAS-centric approach to spec'ing work
356 (looping-over-observations
357 dataset ((dsvarname1 [usevarname1])
358 (dsvarname2 [usevarname2]))
361 ;; SAS plus "statistical sensibility"... for example, if an
362 ;; independent observation actually consists of many observations so
363 ;; that a dataframe of independence results -- for example,
364 ;; longitudinal data or spatial data or local-truncated network data
365 ;; are clean examples of such happening -- then we get the data
366 ;; frame or row representing the independent result.
367 (looping-over-independent-observations
368 dataset independence-defining-variable
369 ((dsvarname1 [usevarname1])
370 (dsvarname2 [usevarname2]))
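To make the with-data syntax above concrete, here is a minimal macro sketch. It is an illustration only: the dataset is assumed to be an alist of (name . vector) pairs rather than a CLS dataframe, and the names with-data-sketch and column-of are hypothetical. The optional second element of each binding plays the role of [usevarname].

#+begin_src lisp
(defun column-of (dataset name)
  "Return the column called NAME from DATASET, an alist of (name . vector)."
  (let ((entry (assoc name dataset)))
    (unless entry
      (error "No variable named ~S in dataset." name))
    (cdr entry)))

(defmacro with-data-sketch (dataset bindings &body body)
  "Evaluate BODY with each (DSVAR [USEVAR]) in BINDINGS bound lexically.
USEVAR defaults to DSVAR when it is omitted."
  (let ((ds (gensym "DATASET")))
    `(let* ((,ds ,dataset)
            ,@(loop for (dsvar usevar) in bindings
                    collect `(,(or usevar dsvar) (column-of ,ds ',dsvar))))
       ,@body)))

(defparameter *measurements*
  (list (cons 'height #(150 160 170))
        (cons 'weight #(50 60 70))))

(with-data-sketch *measurements* ((height h) (weight))
  (+ (aref h 0) (aref weight 0)))   ; => 200
#+end_src

Because the bindings are ordinary lexical variables, results computed in the body cannot leak into, or be overwritten by, another dataset's scope.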
375 *** DONE [#B] Dataframe variable typing
376 - State "DONE" from "CURR" [2010-11-30 Tue 17:56] \\
377 check-type approach works, we would just have to throw a catchable
378 error if we want to use it in a reliable fashion.
379 - State "CURR" from "TODO" [2010-11-30 Tue 17:56]
380 - State "TODO" from "" [2010-11-04 Thu 18:48]
382 Seems to generally work, need to ensure that we use this for
387 (in-package :ls-user)
388 (defparameter *df-test*
389 (make-instance 'dataframe-array
390 :storage #2A (('a "test0" 0 0d0)
396 :case-labels (list "0" "1" 2 "3" "4")
397 :var-labels (list "symbol" "string" "integer" "double-float")
398 :var-types (list 'symbol 'string 'integer 'double-float)))
400 ;; with SBCL, ints become floats? Need to adjust output
401 ;; representation appropriately..
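(defun check-var (df colnum)
  (let ((nobs (xdim (dataset df) 0)))
    (dotimes (i nobs)
      ;; CHECK-TYPE does not evaluate its type argument, so use TYPEP;
      ;; the declared type is indexed by column, not by row.
      (assert (typep (xref df i colnum) (elt (var-types df) colnum))))))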
409 (xdim (dataset *df-test*) 1)
410 (xdim (dataset *df-test*) 0)
412 (check-var *df-test* 0)
415 (xref *df-test* 1 1))
417 (check-type (xref *df-test* 1 1)
418 string) ;; => nil, so good.
419 (check-type (xref *df-test* 1 1)
420 vector) ;; => nil, so good.
421 (check-type (xref *df-test* 1 1)
422 real) ;; => simple-error type thrown, so good.
424 ;; How to nest errors within errors?
425 (check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.
429 (check-type *df-test*
430 dataframe-array) ; nil is good.
432 (integerp (xref *df-test* 1 2))
433 (floatp (xref *df-test* 1 2))
434 (integerp (xref *df-test* 1 3))
435 (type-of (xref *df-test* 1 3))
436 (floatp (xref *df-test* 1 3))
438 (type-of (vector 1 1d0))
444 (xref *df-test* 1 '*)
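The state note for this item says a catchable error is needed to use the type check reliably. A sketch of that idea follows, using a dedicated condition class and handler-case. A plain 2D array stands in for the dataframe storage, and the names column-type-error and check-column-types are hypothetical.

#+begin_src lisp
(define-condition column-type-error (error)
  ((row      :initarg :row      :reader column-type-error-row)
   (column   :initarg :column   :reader column-type-error-column)
   (value    :initarg :value    :reader column-type-error-value)
   (expected :initarg :expected :reader column-type-error-expected))
  (:report (lambda (c stream)
             (format stream "Row ~D, column ~D: ~S is not of type ~S."
                     (column-type-error-row c)
                     (column-type-error-column c)
                     (column-type-error-value c)
                     (column-type-error-expected c)))))

(defun check-column-types (storage column type)
  "Return T if every element in COLUMN of the 2D array STORAGE is of TYPE;
otherwise signal COLUMN-TYPE-ERROR for the first offending row."
  (dotimes (i (array-dimension storage 0) t)
    (let ((value (aref storage i column)))
      (unless (typep value type)
        (error 'column-type-error
               :row i :column column :value value :expected type)))))

;; The caller decides what to do with a bad value:
(handler-case (check-column-types #2A((a "x" 0) (b "y" 1.5)) 2 'integer)
  (column-type-error (c) (column-type-error-row c)))   ; => 1
#+end_src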
447 ** CURR [#A] Random Numbers [2/6]
448 - State "CURR" from "TODO" [2010-11-05 Fri 15:41]
449 - State "TODO" from "" [2010-10-14 Thu 00:12]
Need to select a probability system (probability
452 functions, random numbers). Goal is to have a general framework
453 for representing probability functions, functionals on
454 probabilities, and reproducible random streams based on such
456 *** CURR [#B] CL-VARIATES system evaluation [2/3]
457 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
458 - State "TODO" from "" [2010-10-12 Tue 14:16]
460 CL-VARIATES is a system developed by Gary W King. It uses streams
461 with seeds, and is hence reproducible. (Random comment: why do CL
462 programmers as a class ignore computational reproducibility?)
464 The main problem with this system is licensing. It has a weird
465 licensing schema which prevents
467 #+name: Loading-CL-VARIATES
469 (in-package :cl-user)
470 (ql:quickload :cl-variates)
471 ;;(ql:quickload :cl-variates-test)
474 #+name: CL-VARIATES-UNITTESTS
476 (in-package :cl-variates-test)
478 (run-tests :suite 'cl-variates-test)
479 (describe (run-tests :suite 'cl-variates-test))
482 basic example of reproducible draws from the uniform and normal
483 random number streams.
485 #+name: CL-VARIATES-REPRO
488 (in-package :cl-variates-user)
490 (defparameter state (make-random-number-generator))
491 (setf (random-seed state) 44)
494 (loop for i from 1 to 10 collect
495 (random-range state 0 10))
496 ;; => (1 5 1 0 7 1 2 2 8 10)
497 (setf (random-seed state) 44)
498 (loop for i from 1 to 10 collect
499 (random-range state 0 10))
500 ;; => (1 5 1 0 7 1 2 2 8 10)
502 (setf (random-seed state) 44)
504 (loop for i from 1 to 10 collect
505 (normal-random state 0 1))
507 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
508 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
509 ;; 0.20750134211656893 -0.14501914108452274)
511 (setf (random-seed state) 44)
512 (loop for i from 1 to 10 collect
513 (normal-random state 0 1))
515 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
516 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
517 ;; 0.20750134211656893 -0.14501914108452274)
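For comparison, ANSI Common Lisp has no portable way to build a random state from an integer seed, but a saved random-state object can be copied and replayed. The sketch below shows that approach; replay-draws and *saved-state* are hypothetical names, and the actual numbers drawn depend on the implementation.

#+begin_src lisp
(defun replay-draws (n limit state)
  "Return N draws of (random LIMIT) taken from a private copy of STATE.
STATE itself is not advanced, so repeated calls return the same draws."
  (let ((stream (make-random-state state)))   ; copies STATE
    (loop repeat n collect (random limit stream))))

;; Capture one freshly initialized state and keep it for the analysis.
(defparameter *saved-state* (make-random-state t))

(equal (replay-draws 10 11 *saved-state*)
       (replay-draws 10 11 *saved-state*))   ; => T
#+end_src

This gives reproducibility within a session; carrying the state across sessions or implementations is exactly what a seed-based system such as cl-variates adds.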
521 **** CURR [#B] Full example of general usage
522 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
523 - State "TODO" from "" [2010-11-05 Fri 15:40]
525 What we want to do here is describe the basic available API that
526 is present. So while the previous work describes what the basic
527 reproducibility approach would be in terms of generating lists of
528 reproducible pRNG streams, we need the full range of possible
529 probability laws that are present.
531 One of the good things about cl-variates is that it provides for
532 reproducibility. One of the bad things is that it has a mixed
535 *** TODO [#B] CL-RANDOM system evaluation
536 - State "TODO" from "" [2010-11-05 Fri 15:40]
539 1. no seed setting for random numbers
540 2. contamination of a probability support with optimization and
545 2. nice design for generics.
547 *** TODO [#B] Native CLS (from XLS)
548 - State "TODO" from "" [2010-11-05 Fri 15:40]
550 ** TODO [#B] Numerical Linear Algebra [0/3]
551 - State "TODO" from "" [2010-10-14 Thu 00:12]
553 *** TODO [#B] LLA evaluation
554 - State "TODO" from "" [2010-10-12 Tue 14:13]
LLA is an SBCL-targeted linear algebra library from Tamas Papp
#+NAME: LLA-experiments
560 (in-package :cl-user)
561 (asdf:oos 'asdf:load-op 'lla)
562 (in-package :lla-user)
566 *** CURR [#B] Lisp-Matrix system evaluation
567 - State "CURR" from "TODO" [2010-10-12 Tue 14:13]
568 - State "TODO" from "" [2010-10-12 Tue 14:13]
572 *** TODO [#B] LispLab system evaluation
573 - State "TODO" from "" [2010-10-12 Tue 14:13]
LL is an SBCL-targeted linear algebra library from ---
577 ** TODO [#B] Numerical Statistical Procedures to implement
579 By this, I mean procedures which provide numerical quantitative or
580 precise categorical qualitative results (for example, excluding
581 visualizations, which tend to produce very useful but relatively
582 imprecise actionable insights).
584 *** CURR [#A] Basic Descriptives
590 (in-package :cls-user)
595 ;; population design eval and opt
603 number of samples/cost of lab analysis and collection
607 (defun pfim (&key model ( constraints ( summary-function )
609 (list num-subjects num-times list-times))))
Each individual has a design \psi_i:
number of samples n_i and sampling times t_{i,1}, ..., t_{i,n_i};
individuals can differ
619 individual-level model
(=model y_i (+ (f \theta_i \psi_i) \epsilon_i))
(=var \epsilon_i \sigma_between \sigma_within)
;; Information Matrix for pop design
627 (defparameter IM (sum (i 1 N) (MF \psi_i \phi_i)))
For nonlinear structural models, expand around RE=0
Cramer-Rao: MF^{-1} is the lower bound for the estimation variance.
636 - smallest SE, but is a matrix, so
637 - criteria for matrix comparison
638 -- D-opt, (power (determinant MF) (/ 1 P))
641 find design maxing D opt, (power (determinant MF) (/ 1 P))
-- continuous vars for sampling times within an interval or set -- number of groups for categorical vars
645 Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.
647 Example binary covariate C
650 (if (= i reference-class)
655 (=model (log \theta) ( ))
662 PFIM provides for a given design and values of \beta:
664 SE/RSE for \beta of each class of each covar
665 eval influence of design on SE(\beta)
inter-occasion variability (IOV)
- patients sampled more than once, H occasions
670 - additional vars to estimate
674 ;;; comparison criteria
676 functional of conc/time curve which is used for comparison, i.e.
677 (AUC conc/time-curve)
678 (Cmax conc/time-curve)
679 (Tmax conc/time-curve)
;; T names a constant in Common Lisp, so the argument is called TIME
(defun conc/time-curve (time)
(let ((conc (exp (* time \beta1))))
692 (url-get "www.pfim.biostat.fr")
695 ;;; Thinking of generics...
696 (information-matrix model parameters)
697 (information-matrix variance-matrix)
698 (information-matrix model data)
699 (information-matrix list-of-individual-IMs)
702 (defun IM (loglikelihood parameters times)
703 "Does double work. Sum up the resulting IMs to form a full IM."
704 (let ((IM (make-matrix (length parameters)
706 :initial-value 0.0d0)))
707 (dolist (parameterI parameters)
708 (dolist (parameterJ parameters)
710 (differentiate (differentiate loglikelihood parameterI) parameterJ))))))
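The IM function and the D-optimality note above, (power (determinant MF) (/ 1 P)), can be tried out numerically in plain Common Lisp. The following is only a sketch under stated assumptions: the names numeric-information, det-by-elimination and d-criterion are hypothetical, the information matrix is taken to be minus the Hessian of the log-likelihood (observed information), and the second derivatives come from central finite differences instead of a symbolic differentiate.

#+begin_src lisp
(defun numeric-information (loglik theta &key (h 1d-4))
  "Minus the Hessian of LOGLIK at THETA (a list of numbers), as a 2D array.
LOGLIK takes one argument, a list of parameter values."
  (let* ((p (length theta))
         (info (make-array (list p p) :initial-element 0d0)))
    (flet ((shifted (i di j dj)
             ;; LOGLIK at THETA with element I moved by DI and element J by DJ
             (let ((x (copy-list theta)))
               (incf (nth i x) di)
               (incf (nth j x) dj)
               (funcall loglik x))))
      (dotimes (i p info)
        (dotimes (j p)
          (setf (aref info i j)
                (- (/ (+ (shifted i h j h)
                         (- (shifted i h j (- h)))
                         (- (shifted i (- h) j h))
                         (shifted i (- h) j (- h)))
                      (* 4 h h)))))))))

(defun det-by-elimination (m)
  "Determinant of the square 2D array M, by Gaussian elimination with
partial pivoting. M is not modified."
  (let* ((n (array-dimension m 0))
         (a (make-array (list n n)))
         (det 1d0))
    (dotimes (i n) (dotimes (j n) (setf (aref a i j) (aref m i j))))
    (dotimes (k n det)
      (let ((pivot k))
        ;; choose the row with the largest entry in column K
        (loop for r from (1+ k) below n
              when (> (abs (aref a r k)) (abs (aref a pivot k)))
                do (setf pivot r))
        (when (zerop (aref a pivot k))
          (return 0d0))                 ; singular matrix
        (unless (= pivot k)
          (dotimes (j n) (rotatef (aref a k j) (aref a pivot j)))
          (setf det (- det)))           ; a row swap flips the sign
        (setf det (* det (aref a k k)))
        (loop for r from (1+ k) below n
              do (let ((factor (/ (aref a r k) (aref a k k))))
                   (loop for j from k below n
                         do (decf (aref a r j) (* factor (aref a k j))))))))))

(defun d-criterion (info)
  "D-optimality criterion det(INFO)^(1/P) for a P x P information matrix.
Assumes INFO is positive definite."
  (expt (det-by-elimination info) (/ 1 (array-dimension info 0))))

;; Toy log-likelihood with information matrix diag(2, 4):
(d-criterion
 (numeric-information (lambda (th)
                        (- (+ (expt (first th) 2)
                              (* 2 (expt (second th) 2)))))
                      (list 1d0 2d0)))
;; close to (sqrt 8), up to finite-difference rounding
#+end_src

A design search would then compare d-criterion across candidate designs and keep the largest.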
713 *** TODO [#C] difference between empirical, fisherian, and ...? information.
714 *** TODO [#C] Example of Integration with CL-GENOMIC
715 - State "TODO" from "" [2010-10-12 Tue 14:03]
717 CL-GENOMIC is a very interesting data-structure strategy for
718 manipulating sequence data.
722 (in-package :cl-user)
723 (asdf:oos 'asdf:compile-op :ironclad)
724 (asdf:oos 'asdf:load-op :cl-genomic)
726 (in-package :bio-sequence)
727 (make-dna "agccg") ;; fine
728 (make-aa "agccg") ;; fine
729 (make-aa "agc9zz") ;; error expected
732 ** TODO [#A] Visual data analytic methods [0/10]
733 *** TODO [#B] Evaluate Graphics toolkits [0/3]
735 **** TODO [#B] QT and similar tools
Pros: Insight from Deepayan Sarkar and Mike -- super fast plot
routines for dynamic interactive graphics. Cross-platform.
742 **** TODO [#B] Cairo-based
744 Pros: actually have example lattice/trellis plotting system with
745 Tamas Papp's cl-2d based on cl-cairo2.
747 Con: cross-platform? setup on a mac?
749 **** TODO [#C] Others?
751 increase priority if someone cares enough to code
753 *** TODO [#A] Evaluate APIs, methods, designs, back-end into framework [0/2]
By this, I mean that we need a good proposal, and it should be
based on history. I need to email Paul Murrell, Deepayan, and
Hadley for a "lessons learned in statistical graphics systems".
757 **** TODO [#B] Paul Murrell's core R system (grid?)
**** TODO [#B] Peter Siebel's Grammar of Graphics javascript implementation
760 Thanks Peter Schmiedeskamp for pointing this out.
762 *** TODO [#B] Implement Visualization routines [0/2]
This should happen once or twice. Remember, with the package
764 approach, we can try out new packages and continually build newer
765 ones, as long as we appropriately version the interface for user
767 **** TODO [#A] actual statistical graphics
we need functions for x-y plots, bar charts, and need the API to
769 describe in terms of statistical quantities, scatter plots,
772 Also, will be important to get prototypes working ASAP to get
773 testing and feedback. But remember, not all users want what is
774 good for them, just like not all people "honestly prefer"
775 completely healthy approaches to life.
777 See file:README.org and the Philosophy for background for the
780 **** TODO [#C] Statistical toolkit and pipeline, ala ORCA
Orca (Sutherland, Cook, Lumley, Rossini, et al.) was a Java-based
783 toolkit for pipelined DAG representations of interactive dynamic
786 ** TODO [#B] Documentation and Examples [0/3]
787 - State "TODO" from "" [2010-10-14 Thu 00:12]
789 I've started putting examples of use in function documentation. If
you are a lisp'er, you'll find this pedantic and insulting. Many
791 of the uses are trivial. However, this has been tested out on a
792 number of research statisticians (the primary user audience) and
795 Still need to write the
798 (evaluate-documentation-example 'function-name)
801 function, which would print out the example and run it live.
Hopefully with the same results. Need to set up the infrastructure,
803 but basically, we'd like something like:
805 #+name: Example-InLineDoc
808 (example-code-for-function-1)
809 (example-code-for-function-...)
810 (example-code-for-function-n))
813 and have this within the doc-string. Then the doc-string would be
814 parsed for the appropriate code and we'd get the results, evaluated
815 in a special name space derived from the object (function, class)
816 name, possibly with the corresponding functions and environment
817 set up that would be required. OR, it could just work in cl-user
(which is the default starting location).
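A minimal sketch of such a function, working in the current package, might look like this. It rests on assumptions that are not settled in the text above: the example forms are taken to follow a marker line ("Examples:" here) at the end of the doc-string, and each form is read and evaluated in turn. The function add-one exists only for the demonstration.

#+begin_src lisp
(defun evaluate-documentation-example (function-name &key (marker "Examples:"))
  "Print and evaluate every form that follows MARKER in the doc-string of
FUNCTION-NAME. Return the list of primary values, or NIL if there is
no doc-string or no marker."
  (let* ((doc (documentation function-name 'function))
         (start (and doc (search marker doc))))
    (when start
      (with-input-from-string (in doc :start (+ start (length marker)))
        ;; the stream object itself serves as a unique end-of-file marker
        (loop for form = (read in nil in)
              until (eq form in)
              collect (let ((value (eval form)))
                        (format t "~&~S~%  => ~S~%" form value)
                        value))))))

(defun add-one (x)
  "Return X plus one.
Examples:
(add-one 1)
(add-one 41)"
  (1+ x))

(evaluate-documentation-example 'add-one)   ; => (2 42)
#+end_src

Evaluating in a name space derived from the documented object, as suggested above, would amount to binding *package* around the read and eval calls.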
820 Here are some possible common lisp systems that could be
823 *** TODO [#B] Docudown
824 - State "TODO" from "" [2010-11-05 Fri 15:34]
827 - State "TODO" from "" [2010-11-05 Fri 15:34]
829 *** TODO [#B] CLPDF, and literate data analysis
830 - State "TODO" from "" [2010-11-05 Fri 15:34]
833 Place proposals for features, work, etc here...
834 ** <2011-12-29 Thu> new stuff
835 First new proposal is to track proposals.
839 This project is dedicated to all the lisp hackers out there who
840 provided the basic infrastructure to get so far so fast with minimal
843 And to all the people trying to help to get this off the ground.