\documentclass{article}

\title{CLS: an approach for a new statistical system}
\author{AJ Rossini}
\date{\today}

\begin{document}

\maketitle

\section{Introduction}
\label{sec:intro}

Statisticians who use a computer for data analysis invariably take one
of two approaches (considered here at their extremes, for illustration):
\begin{enumerate}
\item the \emph{FORTRAN} approach, in which the numerical and algorithmic
  details are coded directly into the program used for the data
  analysis, or
\item the \emph{GUI} approach, via Microsoft Excel, SPSS, Minitab, and
  similar tools, in which tasks are facilitated, sometimes with
  accompanying workflow support.
\end{enumerate}
Both approaches have coexisted since the early 1980s, with the FORTRAN
approach dating back to the dawn of the computing era.

\section{Components of a procedure}
\label{sec:components}

Statistics consists of a range of procedures that can be applied to
make decisions. However, the breadth of these procedures, and of the
interpretations that result, makes a cookbook-style approach difficult
to follow. Still, there remains a palette of procedures that can be
used to support decision making from collected datasets.

We define a statistical procedure as a decision-making approach which
entails the intertwining of formal and informal structure.

Components, first pass:
\begin{enumerate}
\item \label{statproc-decision} Decision to make.
\item \label{statproc-assessment} Assessment approach to use. Some
  approaches are inherently different, while others only look
  different. For example, the t-test is a simplified ANOVA, so the two
  only look different. Likewise, maximum likelihood (ML) fit by
  hill-climbing and ML fit by least squares (LS) only look different,
  whereas Bayesian linear regression is inherently different.
\item \label{statproc-normalization} Normalization of the problem for
  assessment/comparison with other reference behaviours or
  probabilistic processes.
\item \label{conclusion} Type of conclusion desired, and the instance of
  that conclusion (when data is present).
\end{enumerate}

This forms an \textit{abstract class} of a procedure, which can be
represented by a real class, which can then be instantiated through
the application of data.

Components from Gelman, used to characterize the application of a
statistical procedure:
\begin{enumerate}
\item design or analysis state (0th order)
\item measurement
\item comparison
\item ??normalization??
\end{enumerate}

\subsection{Decision}
\label{sec:components:decision}

By example, consider the t-test as an instance of a procedure,
representing the general class of testing hypotheses about two
means. Related procedures include formal likelihood tests with
distributions, and the broader classes of regression and ANOVA that
contain the t-test.
Questions could be:
\begin{itemize}
\item are the two means the same?
\item what is the difference?
\item what is the strength of the difference?
\end{itemize}

One component (from Gelman's dichotomy) is comparison. The decision
could then be the pair (comparison, extremeness from expectation).
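
One possible formalization of the three questions above is sketched
here. The symbols are introduced only for illustration and are not
part of the original notation: $\mu_1$ and $\mu_0$ are the two group
means, $\delta$ is their difference, and $\sigma$ is an assumed common
standard deviation.
\begin{displaymath}
  H_0 : \delta = 0 , \qquad
  \delta = \mu_1 - \mu_0 , \qquad
  \delta / \sigma .
\end{displaymath}
Read left to right, these are a test of equality of the means, the
difference itself as an estimation target, and a standardized
difference as one reading of the ``strength'' of the difference.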

\subsection{Core Assessment}
\label{sec:components:assessment}

This is the construction of the model and parameters that are
used to form the term on which the assessment is based. Here, we could
consider
\begin{equation}
  \label{eq:assess:ex:1}
  \hat{E}[Y|G=1] - \hat{E}[Y|G=0]
\end{equation}
as the fundamental quantity to compare. This can arise from many
sources, such as the regression model
\begin{equation}
  \label{eq:assess:ex:2}
  Y = \mu + \beta G + \epsilon, \qquad E[\epsilon] = 0,
\end{equation}
or, taking the error to have mean zero given $G$, the conditional-mean
model
\begin{equation}
  \label{eq:assess:ex:3}
  E[Y|G] = \mu + \beta G.
\end{equation}
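
As a worked step, assume (for illustration) that $G$ is a 0/1 group
indicator and that the error has mean zero given $G$. The
conditional-mean form of the regression model above then gives
\begin{displaymath}
  E[Y|G=1] - E[Y|G=0] = (\mu + \beta) - \mu = \beta ,
\end{displaymath}
so the quantity in (\ref{eq:assess:ex:1}) is an estimate of $\beta$.
If the estimated conditional means are taken to be the group sample
means, it coincides with the least-squares estimate $\hat\beta$ from a
fit that includes an intercept.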

\subsection{Normalized Behavior}
\label{sec:components:normbeh}

Let $X=(Y,G)$ from above denote the whole data, and write
$\hat\mu_g = \hat{E}[Y|G=g]$ for $g = 0, 1$. The quantity to compare
can be normalized by an empirical adjustment:
\begin{equation}
  \label{eq:norm:ex:1}
  \frac{\hat\mu_1 - \hat\mu_0}%
       {\hat{SE}(\hat\mu_1 - \hat\mu_0)}
\end{equation}
or regression-model-based:
\begin{equation}
  \label{eq:norm:ex:2}
  \frac{\hat\beta}%
       {\hat{SE}(\hat\beta)}
\end{equation}
or likelihood-model-based: (FIXME!)
\begin{equation}
  \label{eq:norm:ex:3}
  -2 \log \frac{L(0|X)}%
               {L(\hat\beta|X)}
\end{equation}
or score-model-based:
\begin{equation}
  \label{eq:norm:ex:4}
  \mathcal{I}^{-1}(\beta=0,X) \, S(\beta=0,X)
\end{equation}
where $S$ denotes the score and $\mathcal{I}$ the information, both
evaluated at $\beta=0$.
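
As a worked check that the first two forms above can coincide, assume
(for illustration only) that $G$ is a 0/1 indicator with $n_1$ and
$n_0$ observations in the two groups, that $n = n_1 + n_0$, that
$\hat\mu_1$ and $\hat\mu_0$ are the group sample means, and that the
regression model is fit by ordinary least squares with a common
residual standard deviation estimate $s$ on $n-2$ degrees of freedom.
Then
\begin{displaymath}
  \hat\beta = \hat\mu_1 - \hat\mu_0 ,
  \qquad
  \hat{SE}(\hat\beta)
  = \frac{s}{\sqrt{\sum_{i=1}^{n} (G_i - \bar{G})^2}}
  = s \sqrt{\frac{n}{n_1 n_0}}
  = s \sqrt{\frac{1}{n_1} + \frac{1}{n_0}} ,
\end{displaymath}
since $\sum_{i} (G_i - \bar{G})^2 = n_1 n_0 / n$. Under these
assumptions (\ref{eq:norm:ex:2}) is the pooled-variance two-sample $t$
statistic, and it equals (\ref{eq:norm:ex:1}) whenever the empirical
standard error also pools the variance. An unpooled (Welch-type)
standard error in (\ref{eq:norm:ex:1}) would in general give a
different value.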

\subsection{Conclusion Desired}
\label{sec:component:conclusion}

Value or range on the target scale (an existing parameter describing
the data-oriented substantive model).

Translation of the value or range onto the decision scale (what to do,
what to decide about the problem, i.e.\ in a testing framework).

\section{Class Implementation}
\label{sec:class}

\section{Discussion}
\label{sec:disc}

\end{document}