#+TODO: TODO CURR | DONE
Time-stamp: <2012-10-08 05:26:30 tony>
Creation: <2008-09-08 08:06:30 tony>
Author: AJ Rossini <blindglobe@gmail.com>
Copyright: (c) 2007-2010, AJ Rossini <blindglobe@gmail.com>. BSD.
13 Purpose: Stuff that needs to be made working sits inside the
16 This file contains the current challenges to solve,
17 including a description of the setup and the work to
18 solve. Solutions welcome.
20 What is this talk of 'release'? Klingons do not make software
21 'releases'. Our software 'escapes', leaving a bloody trail of
22 designers and quality assurance people in its wake.
24 * Design Documentation
26 This section provides some of the details regarding infrastructure of
27 the system, as a means of including TODO statements.
29 ** (Internal) Package and (External) System Hierarchy
31 current (possibly incomplete) set of lisp dependencies <2012-10-04 Thu>
33 0 ~/sandbox/xarray.git
34 1 ~/sandbox/foreign-numeric-vector.git
35 2 ~/sandbox/trivial-features.git
36 3 ~/sandbox/alexandria.git
39 6 ~/sandbox/cl-utilities
40 7 ~/sandbox/metabang-bind.git
41 8 ~/sandbox/iterate.git
42 9 ~/sandbox/array-operations.git
45 *** Singletons (primary building blocks)
47 These are packages as well as
| asdf       | common system loader                           |
| xarray     | common access structure to array-like          |
|            | (matrix, vector) structures.                   |
| cls-config | initialization of Lisp state, variables, etc., |
|            | localization to the particular lisp.           |
| lift       | unit-testing                                   |
| cffi       | foreign function library                       |
As of now, all are in the QL hierarchy.
63 *** Dependency structure
65 | lisp-matrix | general purpose matrix package, linking to lapack | |
66 | | for numerics. Depends on: | |
69 | | cl-blapack | cffi |
71 | cls-dataframe | in the same spirit as lisp-matrix, a means to | |
72 | | create tables. Perhaps better called datatables? | |
73 | cls-probability | depends on gsll, cl-variates, cl-? initially, | |
77 **** Random number streams and probability calculus
80 Something for random numbers, that has a settable seed. However,
81 we could pass on the "right thing" in favor of something that
85 **** import of data in text files
87 CSV and similar specialized imports
Usually, we need to load everything before going on.
111 and sometimes we might want to recompile fully:
113 #+name: recompile-it-all
115 (asdf:oos 'asdf:compile-op :cls :force T)
Quicklisp support currently does not provide a recompilation
facility. However, QL is built over, and partially extends, ASDF, so
the ASDF call should be fine for now.
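
As a hedged aside: newer ASDF versions also provide asdf:load-system,
which accepts :force t and can stand in for the asdf:oos call. The
sketch below assumes the :cls system is visible to ASDF; the helper
name recompile-cls is hypothetical and not part of CLS.

#+begin_src lisp
(require :asdf)

(defun recompile-cls (&key (system :cls))
  "Force a full recompile and reload of SYSTEM (default :cls)."
  ;; :force t rebuilds the named system even if it looks up to date.
  (asdf:load-system system :force t))
#+end_src

Calling (recompile-cls) would then play the role of the forced
compile-op, followed by a load.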
123 - State "DONE" from "CURR" [2010-10-12 Tue 13:48] \\
124 setup is mostly complete
125 - State "CURR" from "TODO" [2010-10-12 Tue 13:47]
126 - State "TODO" from "" [2010-10-12 Tue 13:47]
128 This is an example of a custom setup, not really interesting at
129 this point (it's obsolete) except to remind Tony how to program.
130 Pointy-headed managers need any support they can find in order to
131 regress to their hacker-childhood.
133 The only point of this section is to illustrate that we could want
134 to load additional modules that are not a central part of the core
#+begin_src lisp :tangle "examples/CustomLoader.lisp"
139 (in-package :cl-user) ; always ensure we are in the right package to leave droppings and access functionality
141 (defun init-CLS (&key (compile 'nil))
142 (let ((packagesToLoad (list ;; core system
143 :lift :lisp-matrix :cls
145 ;; :cl-cairo2-x11 :iterate
148 :cl-pdf :cl-typesetting
150 :asdf-system-connections :xarray
152 :metatilities-base :anaphora :tinaa
153 :cl-ppcre :cl-markdown :docudown
154 ;; version and validate CLOS objects
155 ;; :versioned-objects :validations
158 ;; :cl-glu :cl-glut :cl-glut-examples
163 (mapcar #'(lambda (x)
165 (asdf:oos 'asdf:compile-op x :force T)
166 (asdf:oos 'asdf:load-op x)))
169 (init-CLS)) ;; vs (init-CLS :compile T)
173 | | #<PACKAGE "COMMON-LISP-USER"> |
** CURR [#A] Integrate with quicklisp support.

It is important to merge with the quicklisp system loader support. We
currently have some of this work integrated, but I think there are
a few systems which are not auto-installable.
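
One way to find out which systems still need manual installation is to
ask ASDF which ones it cannot locate. This is a minimal sketch,
assuming ASDF is loaded; the function name missing-systems is
illustrative and not part of CLS, and it only reports what is visible
locally, not what Quicklisp could download.

#+begin_src lisp
(require :asdf)

(defun missing-systems (system-names)
  "Return the members of SYSTEM-NAMES that ASDF cannot locate."
  ;; Passing NIL as the second argument makes ASDF:FIND-SYSTEM return
  ;; NIL for an unknown system instead of signalling an error.
  (remove-if (lambda (name) (asdf:find-system name nil))
             system-names))

;; Possible call, with an illustrative list of dependencies:
;; (missing-systems '(:lift :lisp-matrix :xarray :cffi))
#+end_src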
182 ** CURR [#A] Testing: unit, regression, examples. [0/3]
183 - State "CURR" from "TODO" [2010-10-12 Tue 13:51]
184 - State "TODO" from "" [2010-10-12 Tue 13:51]
185 Testing consists of unit tests, which internally verify subsets of
186 code, regression tests, and functional tests (in increasing order
188 *** CURR [#B] Unit tests
189 - State "CURR" from "TODO" [2010-11-04 Thu 18:33]
190 - State "CURR" from "TODO" [2010-10-12 Tue 13:48]
191 - State "TODO" from "" [2010-10-12 Tue 13:48]
Unit tests have been started using LIFT. We need to consider some of
the other systems that provide testing, as people add them to the
mix of libraries that we need, along with examples of how to use them.
196 #+name: ex-cls-unittest
198 (in-package :lisp-stat-unittests)
199 (run-tests :suite 'lisp-stat-ut)
203 : #<Results for LISP-STAT-UT 78 Tests, 7 Failures, 20 Errors>
205 ;; => tests = 78, failures = 7, errors = 20
209 The following needs to be solved in order to have a decent
210 installation qualification (IQ) and performance qualification (PQ)
214 (in-package :lisp-stat-unittests)
215 (asdf:oos 'asdf:test-op 'cls)
216 ;; which runs (describe (run-tests :suite 'lisp-stat-ut))
221 and check documentation to see if it is useful.
(in-package :lisp-stat-unittests)

(describe 'lisp-stat-ut)
(documentation 'lisp-stat-ut 'type)

;; FIXME: Example: currently not relevant, yet
;; (describe (lift::run-test :test-case 'lisp-stat-unittests::create-proto
;;                           :suite 'lisp-stat-unittests::lisp-stat-ut-proto))

(describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
(lift::run-tests :suite 'lisp-stat-ut-dataframe)

(describe (lift::run-test
           :test-case 'lisp-stat-unittests::create-proto
           :suite 'lisp-stat-unittests::lisp-stat-ut-proto))
241 *** TODO [#B] Regression Tests
242 - State "TODO" from "" [2010-10-12 Tue 13:54]
244 *** TODO [#B] Functional Tests
245 - State "TODO" from "" [2010-10-12 Tue 13:54]
247 ** CURR [#B] Functional Examples that need to work [1/3]
248 - State "CURR" from "TODO" [2010-11-30 Tue 17:57]
249 - State "TODO" from "" [2010-10-12 Tue 13:55]
251 These examples should be functional forms within CLS, describing
252 working functionality which is needed for work.
253 *** TODO [#A] Dataframe creation
254 Illustration via a file, that we need to get working so that we
255 can get data in-and-out of CLS structures.
257 #+BEGIN_SRC lisp :export examples/example-DF-creation.lisp
258 ;;; -*- mode: lisp -*-
259 ;;; Copyright (c) 2006-2012, by A.J. Rossini <blindglobe@gmail.com>
260 ;;; See COPYRIGHT file for any additional restrictions (BSD license).
261 ;;; Since 1991, ANSI was finally finished. Edited for ANSI Common Lisp.
263 ;;; Time-stamp: <2012-10-04 02:16:45 tony>
264 ;;; Creation: <2012-07-01 11:29:42 tony>
265 ;;; File: example.lisp
266 ;;; Author: AJ Rossini <blindglobe@gmail.com>
267 ;;; Copyright: (c) 2012, AJ Rossini. BSD.
268 ;;; Purpose: example of possible usage.
270 ;;; What is this talk of 'release'? Klingons do not make software
271 ;;; 'releases'. Our software 'escapes', leaving a bloody trail of
272 ;;; designers and quality assurance people in its wake.
278 ;; use the example package...
279 (in-package :cls-user)
282 ;; or better yet, create a package/namespace for the particular problem being attacked.
283 (defpackage :my-package-user
  (:documentation "Demo package: serious work should be placed in a
  similar package elsewhere for reproducibility.  This hints at
  what needs to be done for a user- or analysis-package.")
287 (:nicknames :my-clswork-user)
288 (:use :common-lisp ; always needed for user playgrounds!
289 :lisp-matrix ; we only need the packages that we need...
290 :common-lisp-statistics
291 :lisp-stat-data-examples) ;; this ensures access to a data package
292 (:export summarize-data summarize-results this-data this-report)
293 (:shadowing-import-from :lisp-stat call-method call-next-method
295 expt + - * / ** mod rem abs 1+ 1- log exp sqrt sin cos tan
296 asin acos atan sinh cosh tanh asinh acosh atanh float random
297 truncate floor ceiling round minusp zerop plusp evenp oddp
298 < <= = /= >= > > ;; complex
299 conjugate realpart imagpart phase
300 min max logand logior logxor lognot ffloor fceiling
301 ftruncate fround signum cis
305 (in-package :my-clswork-user)
307 ;; create some data by hand using arrays, and demonstrate access.
309 (let ((myArray #2A((1 2 3)(4 5 6)))
310 (myDF (make-dataframe #2A((1 2 3)(4 5 6))))
311 (myLOL (list (list 1 2 3) (list 4 5 6)))
312 ;; FIXME: listoflist conversion does not work.
313 ;; (myDFlol (make-dataframe '(list ((1 2 3)(4 5 6)))))
316 (= (xref myArray 1 1)
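
The FIXME above notes that list-of-list conversion does not work yet.
One possible route, sketched here with standard Common Lisp only, is to
turn the list of lists into a 2D array first, since the example
already calls make-dataframe on a 2D array. The helper name
listoflist->array is hypothetical, and whether this is the right fix
inside CLS is an assumption.

#+begin_src lisp
(defun listoflist->array (lol)
  "Convert a rectangular list of lists LOL into a 2D array."
  (let ((nrow (length lol))
        (ncol (length (first lol))))
    ;; Every row is expected to have NCOL elements.
    (make-array (list nrow ncol) :initial-contents lol)))

(listoflist->array (list (list 1 2 3) (list 4 5 6)))
;; => #2A((1 2 3) (4 5 6))
#+end_src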
321 *** TODO [#B] Scoping with datasets
322 - State "TODO" from "" [2010-11-04 Thu 18:46]
324 The following needs to work, and a related syntax for resampling
325 and similar synthetic data approaches (bootstrapping, imputation)
326 ought to use similar syntax as well.
327 #+name: DataSetNameScoping
329 (in-package :ls-user)
331 ;; Syntax examples using lexical scope, closures, and bindings to
332 ;; ensure a clean communication of results
333 ;; This is actually a bit tricky, since we need to clarify whether
334 ;; it is line-at-a-time that we are considering or if there is
335 ;; another mapping strategy. In particular, one could imagine a
336 ;; looping-over-observations function, or a
337 ;; looping-over-independent-observations function which leverages a
338 ;; grouping variable which provides guidance for what is considered
339 ;; independent from the sampling frame being considered. The frame
340 ;; itself (definable via some form of metadata to clarify scope?)
341 ;; could clearly provide a bit of relativity for clarifying what
342 ;; statistical independence means.
344 (with-data dataset ((dsvarname1 [usevarname1])
345 (dsvarname2 [usevarname2]))
348 ;; SAS-centric approach to spec'ing work
349 (looping-over-observations
350 dataset ((dsvarname1 [usevarname1])
351 (dsvarname2 [usevarname2]))
354 ;; SAS plus "statistical sensibility"... for example, if an
355 ;; independent observation actually consists of many observations so
356 ;; that a dataframe of independence results -- for example,
357 ;; longitudinal data or spatial data or local-truncated network data
358 ;; are clean examples of such happening -- then we get the data
359 ;; frame or row representing the independent result.
360 (looping-over-independent-observations
361 dataset independence-defining-variable
362 ((dsvarname1 [usevarname1])
363 (dsvarname2 [usevarname2]))
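
The syntax above could be prototyped with ordinary macros. This is a
minimal sketch, assuming a dataset is just an association list of
(variable-name . list-of-values); real CLS dataframes are richer, and
these definitions are illustrative rather than the project's
implementation. The bracketed [usevarname] is read here as an
optional local name.

#+begin_src lisp
(defmacro with-data (dataset bindings &body body)
  "Bind each dataset variable (or its optional local name) to its column."
  (let ((ds (gensym "DATASET")))
    `(let ((,ds ,dataset))
       (let ,(loop for (dsvar usevar) in bindings
                   collect `(,(or usevar dsvar)
                             (cdr (assoc ',dsvar ,ds))))
         ,@body))))

(defmacro looping-over-observations (dataset bindings &body body)
  "Evaluate BODY once per observation (row); collect the results in a list."
  (let ((ds (gensym "DATASET"))
        (names (loop for (dsvar usevar) in bindings
                     collect (or usevar dsvar))))
    `(let ((,ds ,dataset))
       (mapcar (lambda ,names ,@body)
               ,@(loop for (dsvar) in bindings
                       collect `(cdr (assoc ',dsvar ,ds)))))))

;; Toy dataset in the assumed alist representation.
(defparameter *demo-data* '((height 1 2 3) (weight 10 20 30)))

(with-data *demo-data* ((height h) (weight))
  (list (first h) (first weight)))
;; => (1 10)

(looping-over-observations *demo-data* ((height h) (weight w))
  (* h w))
;; => (10 40 90)
#+end_src

A looping-over-independent-observations variant would additionally
need to group rows by the independence-defining variable before
mapping; that step is left out of this sketch.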
368 *** DONE [#B] Dataframe variable typing
369 - State "DONE" from "CURR" [2010-11-30 Tue 17:56] \\
370 check-type approach works, we would just have to throw a catchable
371 error if we want to use it in a reliable fashion.
372 - State "CURR" from "TODO" [2010-11-30 Tue 17:56]
373 - State "TODO" from "" [2010-11-04 Thu 18:48]
375 Seems to generally work, need to ensure that we use this for
380 (in-package :ls-user)
381 (defparameter *df-test*
382 (make-instance 'dataframe-array
383 :storage #2A (('a "test0" 0 0d0)
389 :case-labels (list "0" "1" 2 "3" "4")
390 :var-labels (list "symbol" "string" "integer" "double-float")
391 :var-types (list 'symbol 'string 'integer 'double-float)))
393 ;; with SBCL, ints become floats? Need to adjust output
394 ;; representation appropriately..
397 (defun check-var (df colnum)
398 (let ((nobs (xdim (dataset df) 0)))
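      ;; check-type does not evaluate its type argument, so use typep;
      ;; the declared type is looked up per column, not per row.
      (assert (typep (xref df i colnum) (elt (var-types df) colnum))))))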
402 (xdim (dataset *df-test*) 1)
403 (xdim (dataset *df-test*) 0)
405 (check-var *df-test* 0)
408 (xref *df-test* 1 1))
410 (check-type (xref *df-test* 1 1)
411 string) ;; => nil, so good.
412 (check-type (xref *df-test* 1 1)
413 vector) ;; => nil, so good.
414 (check-type (xref *df-test* 1 1)
415 real) ;; => simple-error type thrown, so good.
417 ;; How to nest errors within errors?
418 (check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.
422 (check-type *df-test*
423 dataframe-array) ; nil is good.
425 (integerp (xref *df-test* 1 2))
426 (floatp (xref *df-test* 1 2))
427 (integerp (xref *df-test* 1 3))
428 (type-of (xref *df-test* 1 3))
429 (floatp (xref *df-test* 1 3))
431 (type-of (vector 1 1d0))
437 (xref *df-test* 1 '*)
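
The log entry for this item says a catchable error would be needed to
use the type check reliably. The sketch below shows one way to do
that in standard Common Lisp, using a plain 2D array in place of a
dataframe. The condition name variable-type-error and the function
check-column-types are hypothetical, not part of CLS.

#+begin_src lisp
(define-condition variable-type-error (error)
  ((row :initarg :row :reader error-row)
   (column :initarg :column :reader error-column)
   (expected :initarg :expected :reader error-expected))
  (:report (lambda (c stream)
             (format stream "Entry (~D, ~D) is not of type ~S."
                     (error-row c) (error-column c) (error-expected c)))))

(defun check-column-types (data var-types)
  "Return T if every entry of the 2D array DATA matches its column type.
VAR-TYPES holds one type specifier per column."
  (dotimes (i (array-dimension data 0) t)
    (dotimes (j (array-dimension data 1))
      (let ((expected (elt var-types j)))
        ;; TYPEP takes an evaluated type specifier, unlike CHECK-TYPE.
        (unless (typep (aref data i j) expected)
          (error 'variable-type-error
                 :row i :column j :expected expected))))))

;; The caller can catch the condition and decide what to do.
(handler-case
    (check-column-types #2A((a "test0" 0 0d0) (b "test1" 1 1d0))
                        '(symbol string integer double-float))
  (variable-type-error (c)
    (format t "~A~%" c)
    nil))
;; => T
#+end_src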
440 ** CURR [#A] Random Numbers [2/6]
441 - State "CURR" from "TODO" [2010-11-05 Fri 15:41]
442 - State "TODO" from "" [2010-10-14 Thu 00:12]
444 Need to select and choose a probability system (probability
445 functions, random numbers). Goal is to have a general framework
446 for representing probability functions, functionals on
447 probabilities, and reproducible random streams based on such
449 *** CURR [#B] CL-VARIATES system evaluation [2/3]
450 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
451 - State "TODO" from "" [2010-10-12 Tue 14:16]
453 CL-VARIATES is a system developed by Gary W King. It uses streams
454 with seeds, and is hence reproducible. (Random comment: why do CL
455 programmers as a class ignore computational reproducibility?)
457 The main problem with this system is licensing. It has a weird
458 licensing schema which prevents
460 #+name: Loading-CL-VARIATES
462 (in-package :cl-user)
463 (ql:quickload :cl-variates)
464 ;;(ql:quickload :cl-variates-test)
467 #+name: CL-VARIATES-UNITTESTS
469 (in-package :cl-variates-test)
471 (run-tests :suite 'cl-variates-test)
472 (describe (run-tests :suite 'cl-variates-test))
475 basic example of reproducible draws from the uniform and normal
476 random number streams.
478 #+name: CL-VARIATES-REPRO
(in-package :cl-variates-user)

(defparameter state (make-random-number-generator))
(setf (random-seed state) 44)

(loop for i from 1 to 10 collect
  (random-range state 0 10))
;; => (1 5 1 0 7 1 2 2 8 10)
(setf (random-seed state) 44)
(loop for i from 1 to 10 collect
  (random-range state 0 10))
;; => (1 5 1 0 7 1 2 2 8 10)

(setf (random-seed state) 44)

(loop for i from 1 to 10 collect
  (normal-random state 0 1))

;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
;; 0.20750134211656893 -0.14501914108452274)

(setf (random-seed state) 44)
(loop for i from 1 to 10 collect
  (normal-random state 0 1))

;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
;; 1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
;; 0.20750134211656893 -0.14501914108452274)
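
For comparison, a similar replay can be sketched with standard Common
Lisp alone. The standard offers no portable integer seed, so this
sketch copies a random-state object instead of setting a seed; it is
not the CL-VARIATES API, and the function names are illustrative.

#+begin_src lisp
(defun uniform-draws (n state)
  "Collect N integers in [0, 10] drawn from the random state STATE."
  (loop repeat n collect (random 11 state)))

(defun reproducible-draws-p (&optional (n 10))
  "Draw twice from copies of one saved state and compare the streams."
  (let* ((saved (make-random-state t))
         ;; (make-random-state saved) returns a fresh copy of SAVED,
         ;; so both runs start from the same point.
         (first-run (uniform-draws n (make-random-state saved)))
         (second-run (uniform-draws n (make-random-state saved))))
    (values (equal first-run second-run) first-run)))

(reproducible-draws-p)
#+end_src

The drawback relative to a settable seed is that the saved state has
to be stored (for example printed and read back) to reproduce results
across sessions.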
514 **** CURR [#B] Full example of general usage
515 - State "CURR" from "TODO" [2010-11-05 Fri 15:40]
516 - State "TODO" from "" [2010-11-05 Fri 15:40]
518 What we want to do here is describe the basic available API that
519 is present. So while the previous work describes what the basic
520 reproducibility approach would be in terms of generating lists of
521 reproducible pRNG streams, we need the full range of possible
522 probability laws that are present.
524 One of the good things about cl-variates is that it provides for
525 reproducibility. One of the bad things is that it has a mixed
528 *** TODO [#B] CL-RANDOM system evaluation
529 - State "TODO" from "" [2010-11-05 Fri 15:40]
532 1. no seed setting for random numbers
533 2. contamination of a probability support with optimization and
538 2. nice design for generics.
540 *** TODO [#B] Native CLS (from XLS)
541 - State "TODO" from "" [2010-11-05 Fri 15:40]
543 ** TODO [#B] Numerical Linear Algebra
544 - State "TODO" from "" [2010-10-14 Thu 00:12]
546 *** TODO [#B] LLA evaluation
547 - State "TODO" from "" [2010-10-12 Tue 14:13]
LLA is an SBCL-targeted linear algebra library from Tamas Papp

#+NAME: LLA-experiments
553 (in-package :cl-user)
554 (asdf:oos 'asdf:load-op 'lla)
555 (in-package :lla-user)
559 *** CURR [#B] Lisp-Matrix system evaluation
560 - State "CURR" from "TODO" [2010-10-12 Tue 14:13]
561 - State "TODO" from "" [2010-10-12 Tue 14:13]
565 *** TODO [#B] LispLab system evaluation
566 - State "TODO" from "" [2010-10-12 Tue 14:13]
LL is an SBCL-targeted linear algebra library from ---
570 ** TODO [#B] Statistical Procedures to implement
571 - State "TODO" from "" [2010-10-14 Thu 00:12]
577 (in-package :cls-user)
582 ;; population design eval and opt
590 number of samples/cost of lab analysis and collection
594 (defun pfim (&key model ( constraints ( summary-function )
596 (list num-subjects num-times list-times))))
Each individual has a design psi_i
number of samples n_i and sampling times t_{i,1} ... t_{i,n_i}
602 individuals can differ
606 individual-level model
(=model y_i (+ (f \theta_i \psi_i) epsilon_i ))
(=var \epsilon_i \sigma_between \sigma_within )
;; Information Matrix for pop design
614 (defparameter IM (sum (i 1 N) (MF \psi_i \phi_i)))
For nonlinear structural models, expand around RE=0
619 Cramer-Rao : MF^{-1} is lower bound for estimation variance.
623 - smallest SE, but is a matrix, so
624 - criteria for matrix comparison
625 -- D-opt, (power (determinant MF) (/ 1 P))
628 find design maxing D opt, (power (determinant MF) (/ 1 P))
-- contin vars for sampling times within interval or set
-- number of groups for cat vars
632 Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.
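
The D-optimality criterion noted above, (power (determinant MF) (/ 1 P)),
can be sketched in standard Common Lisp. The function names
determinant and d-criterion are illustrative, P is taken to be the
dimension of MF, and MF is assumed to be a positive definite matrix
stored as a plain 2D array.

#+begin_src lisp
(defun determinant (matrix)
  "Determinant of a square 2D array, by Gaussian elimination with
partial pivoting."
  (let* ((n (array-dimension matrix 0))
         (a (make-array (list n n)))
         (det 1d0))
    ;; Work on a double-float copy so MATRIX is left untouched.
    (dotimes (i n)
      (dotimes (j n)
        (setf (aref a i j) (coerce (aref matrix i j) 'double-float))))
    (dotimes (k n det)
      (let ((pivot k))
        ;; Pick the row with the largest entry in column K.
        (loop for i from (1+ k) below n
              when (> (abs (aref a i k)) (abs (aref a pivot k)))
                do (setf pivot i))
        (when (zerop (aref a pivot k))
          (return 0d0))                 ; singular matrix
        (unless (= pivot k)
          (dotimes (j n)
            (rotatef (aref a k j) (aref a pivot j)))
          (setf det (- det)))           ; a row swap flips the sign
        (setf det (* det (aref a k k)))
        ;; Eliminate column K below the diagonal.
        (loop for i from (1+ k) below n
              do (let ((factor (/ (aref a i k) (aref a k k))))
                   (loop for j from k below n
                         do (decf (aref a i j)
                                  (* factor (aref a k j))))))))))

(defun d-criterion (mf)
  "D-optimality criterion det(MF)^(1/P), with P the dimension of MF."
  (let ((p (array-dimension mf 0)))
    (expt (determinant mf) (/ 1 p))))

;; det = 36 and P = 2, so the criterion is 6 up to rounding.
(d-criterion #2A((4 0) (0 9)))
#+end_src

Comparing two candidate designs then amounts to comparing their
d-criterion values, the larger one being preferred.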
634 Example binary covariate C
637 (if (= i reference-class)
642 (=model (log \theta) ( ))
649 PFIM provides for a given design and values of \beta:
651 SE/RSE for \beta of each class of each covar
652 eval influence of design on SE(\beta)
inter-occasion variability (IOV)
- patients sampled more than once, H occasions
657 - additional vars to estimate
661 ;;; comparison criteria
663 functional of conc/time curve which is used for comparison, i.e.
664 (AUC conc/time-curve)
665 (Cmax conc/time-curve)
666 (Tmax conc/time-curve)
(defun conc/time-curve (time) ; T is a constant in Common Lisp, so it cannot name a parameter
  (let ((conc (exp (* time \beta1))))
679 (url-get "www.pfim.biostat.fr")
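
The comparison criteria listed above (AUC, Cmax, Tmax) can be sketched
for a sampled curve. This version assumes the concentration/time curve
is given as two parallel lists, sampling times and concentrations,
rather than as a function; the names auc, cmax and tmax follow the
notes, but the signatures are assumptions.

#+begin_src lisp
(defun auc (times concs)
  "Area under the sampled curve, by the trapezoidal rule."
  (loop for (t0 t1) on times
        for (c0 c1) on concs
        while t1
        sum (* 0.5d0 (- t1 t0) (+ c0 c1))))

(defun cmax (concs)
  "Largest observed concentration."
  (reduce #'max concs))

(defun tmax (times concs)
  "Sampling time at which the largest concentration is first observed."
  (nth (position (cmax concs) concs) times))

(let ((times '(0 1 2 4))
      (concs '(0 4 3 1)))
  (list (auc times concs) (cmax concs) (tmax times concs)))
;; => (9.5d0 4 1)
#+end_src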
682 ;;; Thinking of generics...
683 (information-matrix model parameters)
684 (information-matrix variance-matrix)
685 (information-matrix model data)
686 (information-matrix list-of-individual-IMs)
689 (defun IM (loglikelihood parameters times)
690 "Does double work. Sum up the resulting IMs to form a full IM."
691 (let ((IM (make-matrix (length parameters)
693 :initial-value 0.0d0)))
694 (dolist (parameterI parameters)
695 (dolist (parameterJ parameters)
697 (differentiate (differentiate loglikelihood parameterI) parameterJ))))))
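
The double differentiation above can be approximated numerically when
no symbolic differentiate is available. This is a minimal sketch using
central finite differences; it assumes the log-likelihood is a
function of one vector of parameter values, and the name
observed-information and the step size H are illustrative choices.

#+begin_src lisp
(defun observed-information (loglik params &key (h 1d-4))
  "Negative Hessian of LOGLIK at PARAMS, by central finite differences.
LOGLIK takes a vector of parameter values; PARAMS is a sequence of numbers."
  (let* ((p (length params))
         (im (make-array (list p p) :initial-element 0d0)))
    (flet ((at (i di j dj)
             ;; Evaluate LOGLIK with parameter I shifted by DI and
             ;; parameter J shifted by DJ (both shifts add up if I = J).
             (let ((x (map 'vector
                           (lambda (v) (coerce v 'double-float))
                           params)))
               (incf (aref x i) di)
               (incf (aref x j) dj)
               (funcall loglik x))))
      (dotimes (i p im)
        (dotimes (j p)
          (setf (aref im i j)
                (- (/ (+ (at i h j h)
                         (- (at i h j (- h)))
                         (- (at i (- h) j h))
                         (at i (- h) j (- h)))
                      (* 4 h h)))))))))

;; Quadratic toy log-likelihood -(a^2 + 2 b^2 + a b); its information
;; matrix is ((2 1) (1 4)) up to rounding.
(observed-information
 (lambda (x)
   (let ((a (aref x 0)) (b (aref x 1)))
     (- (+ (* a a) (* 2 b b) (* a b)))))
 '(1 1))
#+end_src

Summing such matrices over individuals gives the population-level
matrix described in the docstring above.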
700 *** difference between empirical, fisherian, and ...? information.
701 *** TODO [#C] Example of Integration with CL-GENOMIC
702 - State "TODO" from "" [2010-10-12 Tue 14:03]
704 CL-GENOMIC is a very interesting data-structure strategy for
705 manipulating sequence data.
(in-package :cl-user)
(asdf:oos 'asdf:compile-op :ironclad)
(asdf:oos 'asdf:load-op :cl-genomic)

(in-package :bio-sequence)
(make-dna "agccg") ;; fine
(make-aa "agccg") ;; fine
(make-aa "agc9zz") ;; error expected
719 ** TODO [#B] Documentation and Examples [0/3]
720 - State "TODO" from "" [2010-10-14 Thu 00:12]
722 I've started putting examples of use in function documentation. If
you are a lisp'er, you'll find this pedantic and insulting. Many
724 of the uses are trivial. However, this has been tested out on a
725 number of research statisticians (the primary user audience) and
728 Still need to write the
731 (evaluate-documentation-example 'function-name)
function, which would print out the example and run it live,
hopefully with the same results. Need to set up the infrastructure,
but basically, we'd like something like:
738 #+name: Example-InLineDoc
741 (example-code-for-function-1)
742 (example-code-for-function-...)
743 (example-code-for-function-n))
746 and have this within the doc-string. Then the doc-string would be
747 parsed for the appropriate code and we'd get the results, evaluated
748 in a special name space derived from the object (function, class)
749 name, possibly with the corresponding functions and environment
750 set up that would be required. OR, it could just work in cl-user
(which is the default starting location).
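
A first cut of evaluate-documentation-example could look like the
sketch below. It assumes a convention that does not exist yet in CLS:
the doc-string contains the marker "Example:" followed by one Lisp
form. The example is read and evaluated in the current package, which
corresponds to the simpler "just work in cl-user" option. It also
assumes the implementation keeps doc-strings at run time.

#+begin_src lisp
(defun evaluate-documentation-example (function-name
                                       &key (marker "Example:"))
  "Print and evaluate the first form after MARKER in the doc-string of
FUNCTION-NAME.  Return NIL if there is no doc-string or no marker."
  (let* ((doc (documentation function-name 'function))
         (start (and doc (search marker doc))))
    (when start
      (let ((form (read-from-string doc t nil
                                    :start (+ start (length marker)))))
        (format t "~&~S~%" form)        ; show the example
        (eval form)))))                 ; then run it live

;; Hypothetical function documented with the assumed convention.
(defun double-it (x)
  "Return twice X.
Example: (double-it 21)"
  (* 2 x))

(evaluate-documentation-example 'double-it)
;; returns 42
#+end_src

Supporting several examples per doc-string, or a dedicated namespace
per documented object, would need a richer marker syntax than this
sketch provides.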
753 Here are some possible common lisp systems that could be
756 *** TODO [#B] Docudown
757 - State "TODO" from "" [2010-11-05 Fri 15:34]
760 - State "TODO" from "" [2010-11-05 Fri 15:34]
762 *** TODO [#B] CLPDF, and literate data analysis
763 - State "TODO" from "" [2010-11-05 Fri 15:34]
766 Place proposals for features, work, etc here...
767 ** <2011-12-29 Thu> new stuff
768 First new proposal is to track proposals.
772 This project is dedicated to all the lisp hackers out there who
773 provided the basic infrastructure to get so far so fast with minimal
776 And to all the people trying to help to get this off the ground.