% Doc/papers/CLS-philosophy.tex

\documentclass{article}

\title{CLS: an approach for a new statistical system}
\author{AJ Rossini}
\date{\today}

\begin{document}

\maketitle

\section{Introduction}
\label{sec:intro}

Statisticians who use a computer for data analysis invariably take one
of two approaches (considered in the extremes here for illustration):
\begin{enumerate}
\item the \emph{FORTRAN} approach of coding numerical and algorithmic
  information into the computer program code used for the data
  analysis, or
\item the \emph{GUI} approach, via Microsoft Excel, SPSS, Minitab, and
  similar packages, where tasks are facilitated, sometimes with
  accompanying workflow support.
\end{enumerate}
Both approaches have co-existed since the early 1980s, with the FORTRAN
approach dating back to the dawn of the computing era.

\section{Components of a procedure}
\label{sec:components}

Statistics consists of a range of procedures that can be applied to
make decisions.  However, the breadth of these procedures and of the
resulting interpretations makes it difficult to follow a cookbook-style
approach.  There nonetheless remains a palette of procedures
which can be used to support decision making from collected
datasets.

We define a statistical procedure as a decision-making approach which
entails the intertwining of formal and informal structure.

Components, first pass:
\begin{enumerate}
\item \label{statproc-decision} Decision to make.
\item \label{statproc-assessment} Assessment approach to use.  Some
  approaches are inherently different, while others merely look
  different.  For example, the t-test is a simplified ANOVA, so the
  two only look different.  Likewise, ML-based fitting with a
  hill-climb and ML with a least-squares (LS) fit only look
  different, whereas Bayesian linear regression is inherently
  different.
\item \label{statproc-normalization} Normalization of the problem for
  assessment/comparison with other reference behaviours or
  probabilistic processes.
\item \label{conclusion} Type of conclusion desired, and instance of
  that conclusion (when data is present).
\end{enumerate}

These components form an \textit{abstract class} of procedure, which
can be represented by a concrete class that is in turn instantiated
by applying it to data.

Gelman's components, used to characterize the application of a
statistical procedure:
\begin{enumerate}
\item design or analysis state (0th order)
\item measurement
\item comparison
\item ??normalization??
\end{enumerate}

\subsection{Decision}
\label{sec:components:decision}

By way of example, consider the t-test as an instance of a procedure,
representing the general class of tests of hypotheses about two
means.  Related procedures include formal likelihood-based tests
under distributional assumptions, and the enclosing classes of
regression and ANOVA.
Questions could be:
\begin{itemize}
\item are the two means the same?
\item what is the difference?
\item what is the strength of the difference?
\end{itemize}
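
These questions can be written a little more formally.  As a sketch
only, assume an outcome $Y$ and a binary group indicator $G$, write
$\mu_g = E[Y|G=g]$ for the mean of group $g \in \{0,1\}$, and let
$\sigma$ be a common within-group standard deviation.  This notation,
and the choice of a standardized difference as the measure of
strength, are illustrative assumptions rather than part of the list
above.  The three questions could then correspond to
\[
  H_0 : \mu_1 = \mu_0 , \qquad
  \delta = \mu_1 - \mu_0 , \qquad
  \delta / \sigma ,
\]
that is, a test of equality, the difference on the original scale,
and the difference expressed on a unit-free scale.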

One component (from Gelman's dichotomy) is comparison.  The decision
could then be expressed as the pair (comparison, extremeness from
expectation).

\subsection{Core Assessment}
\label{sec:components:assessment}

This is the construction of the model and parameters that would be
used to form the term on which the assessment is based.  Here, we
could consider
\begin{equation}
  \label{eq:assess:ex:1}
  \hat{E}[Y|G=1] - \hat{E}[Y|G=0]
\end{equation}
as the fundamental quantity to compare.  This can arise from many
sources, such as the regression model
\begin{equation}
  \label{eq:assess:ex:2}
  Y = \mu + \beta G + \epsilon, \qquad
  E[\epsilon] = 0
\end{equation}
or the conditional mean model
\begin{equation}
  \label{eq:assess:ex:3}
  E[Y|G] = \mu + \beta G
\end{equation}
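
The link between (\ref{eq:assess:ex:1}) and the regression forms can
be made explicit.  As a short sketch, assume that $G$ is a binary
group indicator taking values in $\{0,1\}$ (an assumption; the coding
of $G$ is not stated above).  Evaluating the conditional mean model
$E[Y|G] = \mu + \beta G$ at each group gives
\[
  E[Y|G=1] - E[Y|G=0] = (\mu + \beta) - \mu = \beta ,
\]
so that estimating the quantity in (\ref{eq:assess:ex:1}) amounts to
estimating $\beta$, with $\mu$ the mean of the reference group $G=0$.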

\subsection{Normalized Behavior}
\label{sec:components:normbeh}
Let $X=(Y,G)$ denote the whole data, with $Y$ and $G$ as above.

The comparison can be normalized by an empirical adjustment, with
$\hat\mu_g = \hat{E}[Y|G=g]$:
\begin{equation}
  \label{eq:norm:ex:1}
  \frac{ \hat\mu_1 - \hat\mu_0}%
  {\widehat{\mathrm{SE}}(\hat\mu_1 - \hat\mu_0)}
\end{equation}
or regression-model-based:
\begin{equation}
  \label{eq:norm:ex:2}
  \frac{ \hat\beta}%
  {\widehat{\mathrm{SE}}(\hat\beta)}
\end{equation}
or likelihood-model-based:
\begin{equation}
  \label{eq:norm:ex:3}
  -2 \log \frac{ L(0|X)}%
  {L(\hat\beta|X)}
\end{equation}
or score-model-based:
\begin{equation}
  \label{eq:norm:ex:4}
  \mathcal{I}^{-1/2}(\beta=0,X) \, S(\beta=0,X)
\end{equation}
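
These forms are closely related.  The following sketch makes the
connection explicit under assumptions that are not stated above: $G$
is a binary indicator with $n_1$ and $n_0$ observations in the two
groups, $n = n_1 + n_0$, $\hat\mu_g = \bar{Y}_g$ is the sample mean
in group $g$, the regression is fit by ordinary least squares, and
the standard error uses the pooled variance $s_p^2$ with $n-2$
degrees of freedom.  Then
\[
  \hat\beta = \bar{Y}_1 - \bar{Y}_0 , \qquad
  \widehat{\mathrm{SE}}(\hat\beta)
  = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_0}} ,
\]
so that (\ref{eq:norm:ex:1}) and (\ref{eq:norm:ex:2}) give the same
value $t$, the equal-variance two-sample t statistic.  If in addition
the errors are taken to be normal, with the remaining parameters
(including the variance) estimated by maximum likelihood under each
model, then the likelihood-ratio statistic, twice the gain in
maximized log-likelihood of the fitted model over the model with
$\beta = 0$, is a monotone function of $t^2$:
\[
  2 \left\{ \log L(\hat\beta|X) - \log L(0|X) \right\}
  = n \log \left( 1 + \frac{t^2}{n-2} \right) .
\]
As a toy illustration with made-up numbers, take $Y = (1,2,3)$ for
$G=0$ and $Y = (3,4,5)$ for $G=1$.  Then $\hat\beta = 4 - 2 = 2$,
$s_p^2 = 1$, $\widehat{\mathrm{SE}}(\hat\beta) = \sqrt{2/3} \approx
0.816$, $t = \sqrt{6} \approx 2.45$, and the likelihood-ratio
statistic is $6 \log(1 + 6/4) = 6 \log 2.5 \approx 5.50$.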

\subsection{Conclusion Desired}
\label{sec:component:conclusion}

The conclusion is expressed at two levels:
\begin{itemize}
\item a value or range on the target scale, that is, for an existing
  parameter describing the data-oriented substantive model;
\item a translation of that value or range onto the decision scale
  (what to do, or what to decide about the problem, for example in a
  testing framework).
\end{itemize}

\section{Class Implementation}
\label{sec:class}


\section{Discussion}
\label{sec:disc}



\end{document}