;;; -*- mode: lisp -*-

;;; Time-stamp: <2008-10-03 02:15:49 tony>
;;; Creation: <2008-03-12 17:18:42 blindglobe@gmail.com>
;;; File: data-clos.lisp
;;; Author: AJ Rossini <blindglobe@gmail.com>
;;; Copyright: (c)2008, AJ Rossini. BSD, LLGPL, or GPLv2, depending
;;; on how it arrives.
;;; Purpose: data package for lispstat. redoing data structures
;;; in a CLOS based framework.
;;;
;;; No real basis for work, there is a bit of new-ness and R-ness to
;;; this work. In particular, the notion of relation is key and
;;; integral to the analysis. Tables are related and matched
;;; vectors, for example. "column" vectors are related observations
;;; (by measure/recording) while "row" vectors are related readings
;;; (by case).


;;; What is this talk of 'release'? Klingons do not make software
;;; 'releases'. Our software 'escapes', leaving a bloody trail of
;;; designers and quality assurance people in its wake.

;;; This organization and structure is new to the 21st Century
;;; version.

(in-package :lisp-stat-data-clos)

;;; Relational structure -- can we capture a completely unnormalized
;;; data structure to propose possible modeling approaches, and
;;; propose appropriate models and inferential strategies?

;; verb-driven schema for data collection. Should encode independence
;; or the lack of it, when possible.

#+nil(progn
       (def-statschema MyDB
           :tables (list (list t1 )
                         (list t2 )
                         (list t4 ))
           :unique-key key
           :stat-relation '(t1 (:nest-within t2) (:nest-within t3))
           :))

;; Need to figure out typed vectors. We then map a series of typed
;; vectors over to tables where columns are equal typed. In a sense,
;; this is a relation (1-1) of equal-typed arrays. For the most part,
;; this ends up making the R data.frame into a relational building
;; block (considering 1-1 mappings using row ID as a relation).
;; Is this a worthwhile generalization?
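
;;; A minimal sketch of the typed-vector idea above; it is not part of the
;;; original design. MAKE-TYPED-COLUMN and COLUMNS-CONFORMABLE-P are
;;; hypothetical names introduced only for illustration. Each column is
;;; a specialized vector, and a "table" is assumed to be a list of such
;;; columns of equal length, with the row index acting as the 1-1 relation.

(defun make-typed-column (element-type values)
  "Return a vector specialized to ELEMENT-TYPE holding VALUES (a list)."
  (make-array (length values)
              :element-type element-type
              :initial-contents values))

(defun columns-conformable-p (columns)
  "True when every column in COLUMNS has the same number of rows."
  (or (null columns)
      (let ((n (length (first columns))))
        (every (lambda (col) (= (length col) n)) (rest columns)))))

;;; Possible usage, pairing a double-float column with a fixnum column:
;;;   (columns-conformable-p
;;;    (list (make-typed-column 'double-float '(1d0 2d0 3d0))
;;;          (make-typed-column 'fixnum '(10 20 30))))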

;;; verbs vs semantics for dt conversion -- consider the possibility of
;;; how adverbs and verbs relate, where to put which semantically to
;;; allow for a general approach.

;;; e.g. Kasper's talk on the FUSION collection of parsers.


;;; Need to consider modification APIs
;;; actions are:
;;; - import
;;; - get/set row names (case names)
;;; - get/set column names (variable names)
;;; - get/set dataset values
;;; - get/set annotation/metadata
;;; - make sure that we do coherency checking in the exported
;;;   functions.
;;; - ...
;;; - reshapeData/reformat/reshape: a reformed version of the dataset (no
;;;   additional input).
;;; - either overwriting or not, i.e. with or without copy.
;;; - check consistency of resulting data with metadata and related
;;;   data information.

(defclass data-pointer ()
  ((store :initform nil
          :initarg :storage
          :accessor dataset
          :documentation "Data storage: typed as table, array,
                          relation, or pointer/reference to such.")
   (documentation-string :initform nil
                         :initarg :doc
                         :accessor doc-string
                         :documentation "uncomputable information
                                         about statistical-dataset
                                         instance.")

   ;; the rest of this is metadata. In particular, we should find a
   ;; more flexible, compact way to store this.
   (case-labels :initform nil
                :initarg :case-labels
                :accessor case-labels
                :documentation "labels used for describing cases (doc
                                metadata), possibly used for merging.")
   (var-labels :initform nil
               :initarg :var-labels
               :accessor var-labels
               :documentation "Variable names."))
  (:documentation "Standard Cases by Variables Statistical-Dataset,
                   i.e. an S data.frame."))
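
;;; A hedged sketch of the coherency checking called for in the action
;;; list above. LABELS-MATCH-STORE-P is a hypothetical name, and the check
;;; assumes (the source does not say so) that the store is a rank-2 array
;;; laid out cases by variables and that both label slots hold lists.

(defun labels-match-store-p (store case-labels var-labels)
  "True when the label counts agree with the dimensions of STORE.
Missing (NIL) labels are treated as consistent."
  (and (arrayp store)
       (= 2 (array-rank store))
       (or (null case-labels)
           (= (length case-labels) (array-dimension store 0)))
       (or (null var-labels)
           (= (length var-labels) (array-dimension store 1)))))

;;; For a DATA-POINTER instance DS, the check could be applied through the
;;; accessors defined above:
;;;   (labels-match-store-p (dataset ds) (case-labels ds) (var-labels ds))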


(defgeneric get-variable-matrix (dataset-pointer list-of-variable-names)
  (:documentation "Retrieve a matrix whose columns are the named
  variables, in the order specified."))

(defgeneric get-variable-vector (dataset-pointer variable-name)
  (:documentation "Retrieve the vector of values for the variable
  named VARIABLE-NAME."))
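
;;; One way the two generic functions above might be backed, offered only
;;; as a sketch. VARIABLE-INDEX, EXTRACT-COLUMN and EXTRACT-COLUMNS are
;;; hypothetical helpers, and they assume a rank-2 array store (cases by
;;; variables) with variable labels kept in a list and compared by EQUAL.

(defun variable-index (name var-labels)
  "Position of NAME in VAR-LABELS, or signal an error if it is absent."
  (or (position name var-labels :test #'equal)
      (error "Unknown variable ~S" name)))

(defun extract-column (store j)
  "Return column J of the rank-2 array STORE as a fresh vector."
  (let* ((n (array-dimension store 0))
         (col (make-array n)))
    (dotimes (i n col)
      (setf (aref col i) (aref store i j)))))

(defun extract-columns (store js)
  "Return a fresh matrix whose columns are columns JS of STORE, in order."
  (let* ((n (array-dimension store 0))
         (m (make-array (list n (length js)))))
    (loop for j in js
          for k from 0
          do (dotimes (i n)
               (setf (aref m i k) (aref store i j))))
    m))

;;; Methods could then delegate to the helpers. They are left disabled,
;;; since the store may also be a table, relation, or pointer.
#+nil
(progn
  (defmethod get-variable-vector ((ds data-pointer) variable-name)
    (extract-column (dataset ds)
                    (variable-index variable-name (var-labels ds))))
  (defmethod get-variable-matrix ((ds data-pointer) list-of-variable-names)
    (extract-columns (dataset ds)
                     (mapcar (lambda (name)
                               (variable-index name (var-labels ds)))
                             list-of-variable-names))))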

;; statistical-dataset is the basic cases by variables framework.
;; Need to embed this within other structures which allow for
;; generalized relations. Goal is to ensure that relations imply and
;; drive the potential for statistical relatedness such as
;; correlation, interference, and similar concepts.

;; Actions on a statistical data structure.