1 #+TODO: TODO CURR | DONE
2 #+TODO: POSTPONED | CANCELED
5 Time-stamp: <2012-10-10 05:33:29 tony>
6 Creation: <2008-09-08 08:06:30 tony>
11 Author: AJ Rossini <blindglobe@gmail.com>
12 Copyright: (c) 2007-2010, AJ Rossini <blindglobe@gmail.com>. BSD.
13 Purpose: Stuff that needs to be made working sits inside the
16 This file contains the current challenges to solve,
17 including a description of the setup and the work to
18 solve. Solutions welcome.
20 What is this talk of 'release'? Klingons do not make software
21 'releases'. Our software 'escapes', leaving a bloody trail of
22 designers and quality assurance people in its wake.
24 * Design Documentation
26 This section provides some of the details regarding infrastructure of
27 the system, as a means of including TODO statements.
29 ** (Internal) Package and (External) System Hierarchy
31 current (possibly incomplete) set of lisp dependencies <2012-10-04 Thu>
33 0 ~/sandbox/xarray.git
34 1 ~/sandbox/foreign-numeric-vector.git
35 2 ~/sandbox/trivial-features.git
36 3 ~/sandbox/alexandria.git
39 6 ~/sandbox/cl-utilities
40 7 ~/sandbox/metabang-bind.git
41 8 ~/sandbox/iterate.git
42 9 ~/sandbox/array-operations.git
45 *** Singletons (primary building blocks)
47 These are packages as well as
49 | asdf | common system loader |
50 | xarray | common access structure to array-like |
51 | | (matrix, vector) structures. |
52 | cls-config | initialization of Lisp state, variables, etc, |
53 | | localization to the particular lisp. |
54 | lift | unit-testing |
| cffi | foreign function interface |
As of now, all are in the Quicklisp hierarchy.
63 *** Dependency structure
65 | lisp-matrix | general purpose matrix package, linking to lapack | |
66 | | for numerics. Depends on: | |
69 | | cl-blapack | cffi |
71 | cls-dataframe | in the same spirit as lisp-matrix, a means to | |
72 | | create tables. Perhaps better called datatables? | |
73 | cls-probability | depends on gsll, cl-variates, cl-? initially, | |
77 **** Random number streams and probability calculus
80 Something for random numbers, that has a settable seed. However,
81 we could pass on the "right thing" in favor of something that
85 **** import of data in text files
87 CSV and similar specialized imports
Usually, we need to load everything before going on.
111 and sometimes we might want to recompile fully:
113 #+name: recompile-it-all
115 (asdf:oos 'asdf:compile-op :cls :force T)
As of <2012-10-10 Wed>, Quicklisp does not provide a recompilation
facility of its own. Since Quicklisp is built on top of ASDF and
partially extends it, the ASDF call above should be fine for now.
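
A minimal sketch of wrapping the load and forced-recompile steps in
one helper. The name =reload-system= is hypothetical and not part of
CLS; it relies only on =asdf:load-system= and its =:force= keyword,
and assumes the named system is visible to ASDF.

#+begin_src lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf"))

;; RELOAD-SYSTEM is a hypothetical helper, not part of CLS or ASDF.
(defun reload-system (system &key (recompile nil))
  "Load SYSTEM with ASDF. When RECOMPILE is true, force SYSTEM itself
to be rebuilt from source before loading."
  (asdf:load-system system :force (and recompile t)))

;; Usage, assuming the :cls system can be found by ASDF:
;; (reload-system :cls)               ; ordinary load
;; (reload-system :cls :recompile t)  ; full recompile, then load
#+end_src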
122 ** DONE [#B] Example of Custom Data analysis set up
123 - State "DONE" from "CURR" [2010-10-12 Tue 13:48] \\
124 setup is mostly complete
125 - State "CURR" from "TODO" [2010-10-12 Tue 13:47]
126 - State "TODO" from "" [2010-10-12 Tue 13:47]
128 This is an example of a custom setup, not really interesting at
129 this point (it will hopefully be obsolete by the first release)
130 except to remind Tony how to program. Pointy-headed managers need
131 any support they can find in order to regress to their
134 The only point of this section is to illustrate that we could want
135 to load additional modules that are not a central part of the core
139 #+begin_src lisp :tangle "examples/CustomLoader.lisp"
140 ;; always ensure we are in the right package to leave droppings and access functionality
141 (in-package :cl-user)
143 (defun init-CLS (&key (compile 'nil))
144 (let ((packagesToLoad (list ;; core system
145 :lift :lisp-matrix :cls
147 ;; :cl-cairo2-x11 :iterate
150 :cl-pdf :cl-typesetting
152 :asdf-system-connections :xarray
154 :metatilities-base :anaphora :tinaa
155 :cl-ppcre :cl-markdown :docudown
156 ;; version and validate CLOS objects
157 ;; :versioned-objects :validations
160 ;; :cl-glu :cl-glut :cl-glut-examples
165 (mapcar #'(lambda (x)
167 (asdf:oos 'asdf:compile-op x :force T)
168 (asdf:oos 'asdf:load-op x)))
170 ;; (init-CLS :compile T) vs:
175 | | #<PACKAGE "COMMON-LISP-USER"> |
** CURR [#A] Integrate with quicklisp support.
It is important to merge with quicklisp system loader support. We
180 currently have some of this work integrated, but I think there are
181 a few systems which are not auto-installable.
183 ** CURR [#A] Testing: unit, regression, examples. [0/3]
184 - State "CURR" from "TODO" [2010-10-12 Tue 13:51]
185 - State "TODO" from "" [2010-10-12 Tue 13:51]
186 Testing consists of unit tests, which internally verify subsets of
187 code, regression tests, and functional tests (in increasing order
189 *** CURR [#B] Unit tests
190 - State "CURR" from "TODO" [2010-11-04 Thu 18:33]
191 - State "CURR" from "TODO" [2010-10-12 Tue 13:48]
192 - State "TODO" from "" [2010-10-12 Tue 13:48]
Unit tests have been started using LIFT. As people add other
testing systems to the mix of libraries that we need, we should
evaluate those too, along with examples of how to use them.
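
Test summaries distinguish failures (a check returned false) from
errors (a check signalled a condition). The following is a sketch of
that distinction in plain Common Lisp; it is not LIFT's
implementation, and =run-checks= is a hypothetical name.

#+begin_src lisp
;; RUN-CHECKS is a hypothetical stand-in, not a LIFT function.
(defun run-checks (checks)
  "Run CHECKS, an alist of (NAME . THUNK), and tally the outcomes.
A thunk returning NIL counts as a failure; a thunk that signals an
ERROR counts as an error. Returns a plist of counts."
  (let ((passed 0) (failures 0) (errors 0))
    (dolist (check checks)
      (handler-case
          (if (funcall (cdr check))
              (incf passed)
              (incf failures))
        (error () (incf errors))))
    (list :tests (length checks)
          :passed passed :failures failures :errors errors)))

(run-checks
 (list (cons 'addition  (lambda () (= (+ 1 2) 3)))
       (cons 'wrong-sum (lambda () (= (+ 1 2) 4)))
       (cons 'bad-call  (lambda () (error "boom")))))
;; => (:TESTS 3 :PASSED 1 :FAILURES 1 :ERRORS 1)
#+end_src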
197 #+name: ex-cls-unittest
199 (in-package :lisp-stat-unittests)
200 (run-tests :suite 'lisp-stat-ut)
204 : #<Results for LISP-STAT-UT 78 Tests, 7 Failures, 20 Errors>
206 ;; => tests = 78, failures = 7, errors = 20
210 The following needs to be solved in order to have a decent
211 installation qualification (IQ) and performance qualification (PQ)
215 (in-package :lisp-stat-unittests)
216 (asdf:oos 'asdf:test-op 'cls)
217 ;; which runs (describe (run-tests :suite 'lisp-stat-ut))
222 and check documentation to see if it is useful.
225 (in-package :lisp-stat-unittests)
227 (describe 'lisp-stat-ut)
228 (documentation 'lisp-stat-ut 'type)
230 ;; FIXME: Example: currently not relevant, yet
231 ;; (describe (lift::run-test :test-case 'lisp-stat-unittests::create-proto
232 ;; :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
234 (describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
235 (lift::run-tests :suite 'lisp-stat-ut-dataframe)
237 (describe (lift::run-test
238 :test-case 'lisp-stat-unittests::create-proto
239 :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
242 *** TODO [#B] Regression Tests
243 - State "TODO" from "" [2010-10-12 Tue 13:54]
245 *** TODO [#B] Functional Tests
246 - State "TODO" from "" [2010-10-12 Tue 13:54]
248 ** CURR [#B] Functional Examples that need to work [1/3]
249 - State "CURR" from "TODO" [2010-11-30 Tue 17:57]
250 - State "TODO" from "" [2010-10-12 Tue 13:55]
252 These examples should be functional forms within CLS, describing
253 working functionality which is needed for work.
254 *** TODO [#A] Dataframe creation
255 Illustration via a file, that we need to get working so that we
256 can get data in-and-out of CLS structures.
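
Part of getting data in and out is moving between nested lists and
2-D arrays. A minimal sketch in plain Common Lisp follows; both
function names are hypothetical and are not CLS API. It assumes the
rows are non-empty and of equal length.

#+begin_src lisp
;; Both functions are hypothetical helpers, not part of CLS.
(defun list-of-lists->array (rows)
  "Convert ROWS, a non-empty list of equal-length lists, into a 2-D array."
  (let ((ncols (length (first rows))))
    (unless (every (lambda (row) (= (length row) ncols)) rows)
      (error "All rows must have the same length (~D)." ncols))
    (make-array (list (length rows) ncols) :initial-contents rows)))

(defun array->list-of-lists (array)
  "Convert a 2-D ARRAY back into a list of row lists."
  (loop for i below (array-dimension array 0)
        collect (loop for j below (array-dimension array 1)
                      collect (aref array i j))))

(list-of-lists->array '((1 2 3) (4 5 6)))
;; => #2A((1 2 3) (4 5 6))
#+end_src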
#+BEGIN_SRC lisp :tangle "examples/example-DF-creation.lisp"
259 ;;; -*- mode: lisp -*-
260 ;;; Copyright (c) 2006-2012, by A.J. Rossini <blindglobe@gmail.com>
261 ;;; See COPYRIGHT file for any additional restrictions (BSD license).
262 ;;; Since 1991, ANSI was finally finished. Edited for ANSI Common Lisp.
264 ;;; Time-stamp: <2012-10-04 02:16:45 tony>
265 ;;; Creation: <2012-07-01 11:29:42 tony>
266 ;;; File: example.lisp
267 ;;; Author: AJ Rossini <blindglobe@gmail.com>
268 ;;; Copyright: (c) 2012, AJ Rossini. BSD.
269 ;;; Purpose: example of possible usage.
271 ;;; What is this talk of 'release'? Klingons do not make software
272 ;;; 'releases'. Our software 'escapes', leaving a bloody trail of
273 ;;; designers and quality assurance people in its wake.
279 ;; use the example package...
280 (in-package :cls-user)
283 ;; or better yet, create a package/namespace for the particular problem being attacked.
284 (defpackage :my-package-user
  (:documentation "Demo package: serious work should be placed in a
 similar package elsewhere for reproducibility. This hints at
 what needs to be done for a user- or analysis-package.")
288 (:nicknames :my-clswork-user)
289 (:use :common-lisp ; always needed for user playgrounds!
290 :lisp-matrix ; we only need the packages that we need...
291 :common-lisp-statistics
292 :lisp-stat-data-examples) ;; this ensures access to a data package
293 (:export summarize-data summarize-results this-data this-report)
294 (:shadowing-import-from :lisp-stat call-method call-next-method
296 expt + - * / ** mod rem abs 1+ 1- log exp sqrt sin cos tan
297 asin acos atan sinh cosh tanh asinh acosh atanh float random
298 truncate floor ceiling round minusp zerop plusp evenp oddp
299 < <= = /= >= > > ;; complex
300 conjugate realpart imagpart phase
301 min max logand logior logxor lognot ffloor fceiling
302 ftruncate fround signum cis
306 (in-package :my-clswork-user)
308 ;; create some data by hand using arrays, and demonstrate access.
310 (let ((myArray #2A((1 2 3)(4 5 6)))
311 (myDF (make-dataframe #2A((1 2 3)(4 5 6))))
312 (myLOL (list (list 1 2 3) (list 4 5 6)))
313 ;; FIXME: listoflist conversion does not work.
314 ;; (myDFlol (make-dataframe '(list ((1 2 3)(4 5 6)))))
317 (= (xref myArray 1 1)
322 *** TODO [#B] Scoping with datasets
323 - State "TODO" from "" [2010-11-04 Thu 18:46]
The following needs to work. Resampling and similar synthetic-data
approaches (bootstrapping, imputation) ought to use a closely
related syntax.
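
One way such scoping could be prototyped is a macro that binds local
names to columns of a dataset. The sketch below assumes, purely for
illustration, that a dataset is an alist from keyword column names to
lists of values; the real CLS dataframe representation differs, and
the optional bracketed names in the target syntax are not handled.

#+begin_src lisp
;; Sketch only: DATASET is assumed to be an alist such as
;; ((:x 1 2 3) (:y 4 5 6)); this is not the CLS dataframe class.
(defmacro with-data (dataset bindings &body body)
  "Evaluate BODY with each (LOCAL-NAME COLUMN-KEY) pair in BINDINGS
bound to the list of values stored under COLUMN-KEY in DATASET."
  (let ((ds (gensym "DATASET")))
    `(let* ((,ds ,dataset)
            ,@(loop for (local key) in bindings
                    collect `(,local (cdr (assoc ,key ,ds)))))
       ,@body)))

(with-data '((:x 1 2 3) (:y 4 5 6))
    ((a :x) (b :y))
  (mapcar #'+ a b))
;; => (5 7 9)
#+end_src

Because the bindings are lexical, results computed inside the body
cannot leak dataset variables into the surrounding environment.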
328 #+name: DataSetNameScoping
330 (in-package :ls-user)
332 ;; Syntax examples using lexical scope, closures, and bindings to
333 ;; ensure a clean communication of results
334 ;; This is actually a bit tricky, since we need to clarify whether
335 ;; it is line-at-a-time that we are considering or if there is
336 ;; another mapping strategy. In particular, one could imagine a
337 ;; looping-over-observations function, or a
338 ;; looping-over-independent-observations function which leverages a
339 ;; grouping variable which provides guidance for what is considered
340 ;; independent from the sampling frame being considered. The frame
341 ;; itself (definable via some form of metadata to clarify scope?)
342 ;; could clearly provide a bit of relativity for clarifying what
343 ;; statistical independence means.
345 (with-data dataset ((dsvarname1 [usevarname1])
346 (dsvarname2 [usevarname2]))
349 ;; SAS-centric approach to spec'ing work
350 (looping-over-observations
351 dataset ((dsvarname1 [usevarname1])
352 (dsvarname2 [usevarname2]))
355 ;; SAS plus "statistical sensibility"... for example, if an
356 ;; independent observation actually consists of many observations so
357 ;; that a dataframe of independence results -- for example,
358 ;; longitudinal data or spatial data or local-truncated network data
359 ;; are clean examples of such happening -- then we get the data
360 ;; frame or row representing the independent result.
361 (looping-over-independent-observations
362 dataset independence-defining-variable
363 ((dsvarname1 [usevarname1])
364 (dsvarname2 [usevarname2]))
369 *** DONE [#B] Dataframe variable typing
370 - State "DONE" from "CURR" [2010-11-30 Tue 17:56] \\
371 check-type approach works, we would just have to throw a catchable
372 error if we want to use it in a reliable fashion.
373 - State "CURR" from "TODO" [2010-11-30 Tue 17:56]
374 - State "TODO" from "" [2010-11-04 Thu 18:48]
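
A sketch of the "catchable error" idea from the note above: test with
=typep= and signal a dedicated condition that callers can handle. The
condition and function names are hypothetical, and plain lists of
rows stand in for a dataframe.

#+begin_src lisp
;; Hypothetical condition; not defined by CLS.
(define-condition variable-type-mismatch (error)
  ((value :initarg :value :reader mismatch-value)
   (expected :initarg :expected :reader mismatch-expected))
  (:report (lambda (c stream)
             (format stream "~S is not of type ~S."
                     (mismatch-value c) (mismatch-expected c)))))

(defun check-column-types (rows var-types)
  "Signal VARIABLE-TYPE-MISMATCH for the first entry of ROWS whose type
disagrees with the matching entry of VAR-TYPES; return T otherwise."
  (dolist (row rows t)
    (loop for value in row
          for type in var-types
          unless (typep value type)
            do (error 'variable-type-mismatch :value value :expected type))))

(handler-case
    (check-column-types '((a "test0" 0 0d0) (b "test1" 1.5 1d0))
                        '(symbol string integer double-float))
  (variable-type-mismatch (c)
    (list :bad-value (mismatch-value c))))
;; => (:BAD-VALUE 1.5)
#+end_src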
376 Seems to generally work, need to ensure that we use this for
381 (in-package :ls-user)
382 (defparameter *df-test*
383 (make-instance 'dataframe-array
384 :storage #2A (('a "test0" 0 0d0)
390 :case-labels (list "0" "1" 2 "3" "4")
391 :var-labels (list "symbol" "string" "integer" "double-float")
392 :var-types (list 'symbol 'string 'integer 'double-float)))
394 ;; with SBCL, ints become floats? Need to adjust output
395 ;; representation appropriately..
398 (defun check-var (df colnum)
399 (let ((nobs (xdim (dataset df) 0)))
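      ;; CHECK-TYPE does not evaluate its type argument, so test with TYPEP;
      ;; the declared type is indexed by column, not by row.
      (assert (typep (xref df i colnum) (elt (var-types df) colnum))))))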
403 (xdim (dataset *df-test*) 1)
404 (xdim (dataset *df-test*) 0)
406 (check-var *df-test* 0)
409 (xref *df-test* 1 1))
411 (check-type (xref *df-test* 1 1)
412 string) ;; => nil, so good.
413 (check-type (xref *df-test* 1 1)
414 vector) ;; => nil, so good.
415 (check-type (xref *df-test* 1 1)
            real) ;; => signals a type-error, so good.
418 ;; How to nest errors within errors?
419 (check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.
423 (check-type *df-test*
424 dataframe-array) ; nil is good.
426 (integerp (xref *df-test* 1 2))
427 (floatp (xref *df-test* 1 2))
428 (integerp (xref *df-test* 1 3))
429 (type-of (xref *df-test* 1 3))
430 (floatp (xref *df-test* 1 3))
432 (type-of (vector 1 1d0))
438 (xref *df-test* 1 '*)
441 ** CURR [#A] Random Numbers [2/6]
442 - State "CURR" from "TODO" [2010-11-05 Fri 15:41]
443 - State "TODO" from "" [2010-10-14 Thu 00:12]
Need to select a probability system (probability
446 functions, random numbers). Goal is to have a general framework
447 for representing probability functions, functionals on
448 probabilities, and reproducible random streams based on such
450 *** CURR [#B] CL-VARIATES system evaluation [2/3]
451 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
452 - State "TODO" from "" [2010-10-12 Tue 14:16]
454 CL-VARIATES is a system developed by Gary W King. It uses streams
455 with seeds, and is hence reproducible. (Random comment: why do CL
456 programmers as a class ignore computational reproducibility?)
458 The main problem with this system is licensing. It has a weird
459 licensing schema which prevents
461 #+name: Loading-CL-VARIATES
463 (in-package :cl-user)
464 (ql:quickload :cl-variates)
465 ;;(ql:quickload :cl-variates-test)
468 #+name: CL-VARIATES-UNITTESTS
470 (in-package :cl-variates-test)
472 (run-tests :suite 'cl-variates-test)
473 (describe (run-tests :suite 'cl-variates-test))
476 basic example of reproducible draws from the uniform and normal
477 random number streams.
479 #+name: CL-VARIATES-REPRO
482 (in-package :cl-variates-user)
484 (defparameter state (make-random-number-generator))
485 (setf (random-seed state) 44)
488 (loop for i from 1 to 10 collect
489 (random-range state 0 10))
490 ;; => (1 5 1 0 7 1 2 2 8 10)
491 (setf (random-seed state) 44)
492 (loop for i from 1 to 10 collect
493 (random-range state 0 10))
494 ;; => (1 5 1 0 7 1 2 2 8 10)
496 (setf (random-seed state) 44)
498 (loop for i from 1 to 10 collect
499 (normal-random state 0 1))
501 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
502 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
503 ;; 0.20750134211656893 -0.14501914108452274)
505 (setf (random-seed state) 44)
506 (loop for i from 1 to 10 collect
507 (normal-random state 0 1))
509 ;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
510 ;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
511 ;; 0.20750134211656893 -0.14501914108452274)
515 **** CURR [#B] Full example of general usage
516 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
517 - State "TODO" from "" [2010-11-05 Fri 15:40]
519 What we want to do here is describe the basic available API that
520 is present. So while the previous work describes what the basic
521 reproducibility approach would be in terms of generating lists of
522 reproducible pRNG streams, we need the full range of possible
523 probability laws that are present.
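
To make the reproducibility requirement concrete without depending
on any library, here is a small seeded generator in plain Common
Lisp. It is a Park-Miller style multiplicative congruential
generator, chosen only for brevity; it is not how cl-variates is
implemented, and every name in it is hypothetical.

#+begin_src lisp
;; Illustration only; not the cl-variates API.
(defparameter *modulus* 2147483647 "The prime 2^31 - 1.")
(defparameter *multiplier* 16807 "Park-Miller multiplier.")

(defstruct (seeded-rng (:constructor make-seeded-rng (&optional (state 1))))
  "Generator state; STATE must lie in [1, 2^31 - 2]."
  (state 1 :type (integer 1 2147483646)))

(defun next-uniform (rng)
  "Advance RNG and return a double-float in the open interval (0, 1)."
  (let ((next (mod (* *multiplier* (seeded-rng-state rng)) *modulus*)))
    (setf (seeded-rng-state rng) next)
    (/ next (float *modulus* 1d0))))

(defun draw (seed n)
  "Return N uniform draws from a fresh generator seeded with SEED."
  (let ((rng (make-seeded-rng seed)))
    (loop repeat n collect (next-uniform rng))))

;; The same seed reproduces the same stream:
(equal (draw 44 10) (draw 44 10))
;; => T
#+end_src

Other probability laws can then be layered on top of the uniform
stream, so that a single seed fixes the whole simulation.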
525 One of the good things about cl-variates is that it provides for
526 reproducibility. One of the bad things is that it has a mixed
529 *** TODO [#B] CL-RANDOM system evaluation
530 - State "TODO" from "" [2010-11-05 Fri 15:40]
533 1. no seed setting for random numbers
534 2. contamination of a probability support with optimization and
539 2. nice design for generics.
541 *** TODO [#B] Native CLS (from XLS)
542 - State "TODO" from "" [2010-11-05 Fri 15:40]
544 ** TODO [#B] Numerical Linear Algebra [0/3]
545 - State "TODO" from "" [2010-10-14 Thu 00:12]
547 *** TODO [#B] LLA evaluation
548 - State "TODO" from "" [2010-10-12 Tue 14:13]
LLA is an SBCL-targeted linear algebra library from Tamas Papp.
#+name: LLA-experiments
554 (in-package :cl-user)
555 (asdf:oos 'asdf:load-op 'lla)
556 (in-package :lla-user)
560 *** CURR [#B] Lisp-Matrix system evaluation
561 - State "CURR" from "TODO" [2010-10-12 Tue 14:13]
562 - State "TODO" from "" [2010-10-12 Tue 14:13]
566 *** TODO [#B] LispLab system evaluation
567 - State "TODO" from "" [2010-10-12 Tue 14:13]
LL is an SBCL-targeted linear algebra library from ---
571 ** TODO [#B] Numerical Statistical Procedures to implement
573 By this, I mean procedures which provide numerical quantitative or
574 precise categorical qualitative results (for example, excluding
575 visualizations, which tend to produce very useful but relatively
576 imprecise actionable insights).
578 *** CURR [#A] Basic Descriptives
584 (in-package :cls-user)
589 ;; population design eval and opt
597 number of samples/cost of lab analysis and collection
601 (defun pfim (&key model ( constraints ( summary-function )
603 (list num-subjects num-times list-times))))
Each individual has a design \psi_i:
number of samples n_i and sampling times t_{i1}, ..., t_{i n_i};
individuals can differ.
613 individual-level model
(=model y_i (+ (f \theta_i \psi_i) \epsilon_i))
(=var \epsilon_i \sigma_between \sigma_within)
;; Information Matrix for pop design
621 (defparameter IM (sum (i 1 N) (MF \psi_i \phi_i)))
For nonlinear structural models, expand around RE=0
626 Cramer-Rao : MF^{-1} is lower bound for estimation variance.
630 - smallest SE, but is a matrix, so
631 - criteria for matrix comparison
632 -- D-opt, (power (determinant MF) (/ 1 P))
635 find design maxing D opt, (power (determinant MF) (/ 1 P))
-- continuous vars for sampling times within interval or set
-- number of groups for categorical vars
639 Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.
641 Example binary covariate C
644 (if (= i reference-class)
649 (=model (log \theta) ( ))
656 PFIM provides for a given design and values of \beta:
658 SE/RSE for \beta of each class of each covar
659 eval influence of design on SE(\beta)
inter-occasion variability (IOV)
- patients sampled more than once, H occasions
664 - additional vars to estimate
668 ;;; comparison criteria
670 functional of conc/time curve which is used for comparison, i.e.
671 (AUC conc/time-curve)
672 (Cmax conc/time-curve)
673 (Tmax conc/time-curve)
(defun conc/time-curve (time) ; T is a constant in Common Lisp and cannot name a parameter
  (let ((conc (exp (* time \beta1))))
686 (url-get "www.pfim.biostat.fr")
689 ;;; Thinking of generics...
690 (information-matrix model parameters)
691 (information-matrix variance-matrix)
692 (information-matrix model data)
693 (information-matrix list-of-individual-IMs)
696 (defun IM (loglikelihood parameters times)
697 "Does double work. Sum up the resulting IMs to form a full IM."
698 (let ((IM (make-matrix (length parameters)
700 :initial-value 0.0d0)))
701 (dolist (parameterI parameters)
702 (dolist (parameterJ parameters)
704 (differentiate (differentiate loglikelihood parameterI) parameterJ))))))
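
The second-derivative idea above can be tried numerically. The
sketch below approximates the Hessian of a log-likelihood by central
differences and negates it to get an observed information matrix. It
swaps symbolic differentiation for finite differences, represents
parameters as a plain list of numbers, and uses hypothetical function
names throughout.

#+begin_src lisp
;; Finite-difference sketch; not the symbolic DIFFERENTIATE used above.
(defun numeric-hessian (f point &optional (h 1d-4))
  "Approximate the Hessian of F, a function of a list of numbers,
at POINT using central differences with step H."
  (let* ((n (length point))
         (hess (make-array (list n n) :initial-element 0d0)))
    (flet ((shifted (i di j dj)
             ;; Copy of POINT with coordinates I and J moved by DI and DJ.
             (let ((p (copy-list point)))
               (incf (nth i p) di)
               (incf (nth j p) dj)
               p)))
      (dotimes (i n hess)
        (dotimes (j n)
          (setf (aref hess i j)
                (/ (- (+ (funcall f (shifted i h j h))
                         (funcall f (shifted i (- h) j (- h))))
                      (+ (funcall f (shifted i h j (- h)))
                         (funcall f (shifted i (- h) j h))))
                   (* 4 h h))))))))

(defun observed-information (loglik point &optional (h 1d-4))
  "Return minus the numeric Hessian of LOGLIK at POINT."
  (let* ((hess (numeric-hessian loglik point h))
         (info (make-array (array-dimensions hess))))
    (dotimes (k (array-total-size hess) info)
      (setf (row-major-aref info k) (- (row-major-aref hess k))))))
#+end_src

For a normal mean with unit variance and three observations, the
single entry of the result should be close to 3, the sample size.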
707 *** TODO [#C] difference between empirical, fisherian, and ...? information.
708 *** TODO [#C] Example of Integration with CL-GENOMIC
709 - State "TODO" from "" [2010-10-12 Tue 14:03]
711 CL-GENOMIC is a very interesting data-structure strategy for
712 manipulating sequence data.
716 (in-package :cl-user)
717 (asdf:oos 'asdf:compile-op :ironclad)
718 (asdf:oos 'asdf:load-op :cl-genomic)
720 (in-package :bio-sequence)
721 (make-dna "agccg") ;; fine
722 (make-aa "agccg") ;; fine
723 (make-aa "agc9zz") ;; error expected
726 ** TODO [#A] Visual data analytic methods [0/10]
727 *** TODO [#B] Evaluate Graphics toolkits [0/3]
729 **** TODO [#B] QT and similar tools
Pros: Insight from Deepayan Sarkar and Mike -- super fast plot
routines for dynamic interactive graphics. Cross-platform.
736 **** TODO [#B] Cairo-based
738 Pros: actually have example lattice/trellis plotting system with
739 Tamas Papp's cl-2d based on cl-cairo2.
741 Con: cross-platform? setup on a mac?
743 **** TODO [#C] Others?
745 increase priority if someone cares enough to code
747 *** TODO [#A] Evaluate APIs, methods, designs, back-end into framework [0/2]
748 By this, I mean that we need a good proposal, and it should be
based on history. I need to email Paul Murrell and Deepayan and
750 Hadley for a "lessons learned in statistical graphics systems".
751 **** TODO [#B] Paul Murrell's core R system (grid?)
**** TODO [#B] Peter Siebel's Grammar of Graphics JavaScript implementation
754 Thanks Peter Schmiedeskamp for pointing this out.
756 *** TODO [#B] Implement Visualization routines [0/2]
This should happen once or twice. Remember, with the package
758 approach, we can try out new packages and continually build newer
759 ones, as long as we appropriately version the interface for user
761 **** TODO [#A] actual statistical graphics
We need functions for x-y plots, bar charts, and need the API to
763 describe in terms of statistical quantities, scatter plots,
766 Also, will be important to get prototypes working ASAP to get
767 testing and feedback. But remember, not all users want what is
768 good for them, just like not all people "honestly prefer"
769 completely healthy approaches to life.
771 See file:README.org and the Philosophy for background for the
774 **** TODO [#C] Statistical toolkit and pipeline, ala ORCA
Orca (Sutherland, Cook, Lumley, Rossini, et al.) was a Java-based
777 toolkit for pipelined DAG representations of interactive dynamic
780 ** TODO [#B] Documentation and Examples [0/3]
781 - State "TODO" from "" [2010-10-14 Thu 00:12]
783 I've started putting examples of use in function documentation. If
you are a lisp'er, you'll find this pedantic and insulting. Many
785 of the uses are trivial. However, this has been tested out on a
786 number of research statisticians (the primary user audience) and
789 Still need to write the
792 (evaluate-documentation-example 'function-name)
795 function, which would print out the example and run it live.
796 Hopefully with the same results. Need to setup the infrastructure,
797 but basically, we'd like something like:
799 #+name: Example-InLineDoc
802 (example-code-for-function-1)
803 (example-code-for-function-...)
804 (example-code-for-function-n))
807 and have this within the doc-string. Then the doc-string would be
808 parsed for the appropriate code and we'd get the results, evaluated
809 in a special name space derived from the object (function, class)
810 name, possibly with the corresponding functions and environment
811 set up that would be required. OR, it could just work in cl-user
(which is the default starting location).
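
A minimal sketch of how this could work, assuming (as a convention
invented here) that example forms follow an "Example:" marker in the
doc-string and are evaluated in the current package. The marker, the
helper =documentation-example-forms=, and the demo function =add-one=
are all hypothetical.

#+begin_src lisp
(defparameter *example-marker* "Example:"
  "Hypothetical marker separating prose from example forms in a doc-string.")

(defun documentation-example-forms (function-name)
  "Return the forms found after *EXAMPLE-MARKER* in the doc-string of
FUNCTION-NAME, or NIL when there is no doc-string or no marker."
  (let* ((doc (documentation function-name 'function))
         (start (and doc (search *example-marker* doc))))
    (when start
      (let ((*read-eval* nil))          ; refuse #. forms while reading
        (with-input-from-string
            (in doc :start (+ start (length *example-marker*)))
          (loop for form = (read in nil in)
                until (eq form in)
                collect form))))))

(defun evaluate-documentation-example (function-name)
  "Print each example form of FUNCTION-NAME with its value, and
return the list of values."
  (loop for form in (documentation-example-forms function-name)
        for value = (eval form)
        do (format t "~S~%;; => ~S~%" form value)
        collect value))

(defun add-one (x)
  "Return X plus one.
Example:
(add-one 41)"
  (1+ x))

(evaluate-documentation-example 'add-one)
;; prints the form and its value, and returns (42)
#+end_src

This relies on the implementation retaining doc-strings at run time.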
814 Here are some possible common lisp systems that could be
817 *** TODO [#B] Docudown
818 - State "TODO" from "" [2010-11-05 Fri 15:34]
821 - State "TODO" from "" [2010-11-05 Fri 15:34]
823 *** TODO [#B] CLPDF, and literate data analysis
824 - State "TODO" from "" [2010-11-05 Fri 15:34]
827 Place proposals for features, work, etc here...
828 ** <2011-12-29 Thu> new stuff
829 First new proposal is to track proposals.
833 This project is dedicated to all the lisp hackers out there who
834 provided the basic infrastructure to get so far so fast with minimal
837 And to all the people trying to help to get this off the ground.