%% Given in Seattle, at the National Research Center for Statistics
%% and the Environment, and at MathSoft, October 1997.

\documentclass[semhelv]{seminar}
%% comment out the above and use below, if you have font troubles
%%\documentclass[semlcmss]{seminar}
%%\documentclass{seminar}

\usepackage{slidesec}
\usepackage[dvips]{graphicx}
%\usepackage{html,heqn,htmllist} % LaTeX2HTML support

\begin{document}

\typeout{ }
\typeout{If you have font troubles, comment out the semhelv documentclass
line at the top of this file and uncomment one of the alternatives below it}
\typeout{ }

\begin{slide}
\slideheading{ESS and Literate Programming: \\
Tools for Efficient Statistical Programming and Data Analysis}

\begin{center}
Dr. A.J. Rossini \\
Statistics Department \\
University of South Carolina \\
rossini@stat.sc.edu \\
% \htmladdnormallink{http://www.stat.sc.edu/\~{}rossini/rsrch/seattle-nrcse/}
% {http://www.stat.sc.edu/\~{}rossini/rsrch/seattle-nrcse/}
\end{center}

Note: \textbf{ESS} is joint work with Martin Maechler (ETHZ), Kurt
Hornik (TU-Wien), and Richard M. Heiberger (Temple).
\end{slide}

\begin{itemize}
\item Thanks also to Bates, Kademan, Ritter (initial versions), and
  David Smith (3.x, 4.x).
\end{itemize}

Recently I have been giving talks, like this one, that focus on the
interface between the computer and the statistician. They tend to get
strange responses. The situation is not too different from the split
between applied and theoretical computer science, where journals cover
both software practice and theory, and practitioners in both camps are
often just as confused.

So today we will focus on one area of statistical practice, as
opposed to statistical theory (methods/mathematics/data analysis).

\begin{slide}
\slideheading{Introduction}

Topics:
\begin{itemize}
\item Computing Environments
\item ESS (=\emph{EMACS Speaks Statistics}) and its capabilities
\item Literate Programming and Literate Data Analysis
\item Links between ESS and Lit Prog
\item Future (planned?) Extensions
\end{itemize}

\end{slide}

\begin{itemize}
\item Outline of Talk
\end{itemize}

\begin{slide}
\slideheading{Computing Environments}

Computer / Statistician Interface:
\begin{itemize}
\item Command line: keyboard-based, minimal mouse usage (SAS (old
  style), S-PLUS 3.x, R, XLispStat, Minitab (old style))
\item Graphical: pointer (mouse)-based, ``point and click'', minimal
  keyboard usage (S-PLUS 4, ViSta, Minitab)
\item Mixed Interfaces: S-PLUS 4, SAS, ViSta, Minitab
\end{itemize}

Which \emph{interface} is used depends on the particular user's
practice (note that several packages appear in more than one category).
\end{slide}

\begin{itemize}
\item Importance to Statisticians.
\item Issues of efficiency
\end{itemize}

\begin{slide}
\slideheading{Computing Environments: Design Considerations}

\begin{itemize}
\item Keyboard entry is faster than mouse entry (eventually?)
\item Mouse entry is easier than keyboard entry (initially?)
\item At present, keyboard interfaces allow for more
  complex/powerful methods, but this is more a function of current
  software than a design limitation.
\item Keyboard interfaces generally require memorization of some
  commands (at the minimum: those required to find others!).
\end{itemize}
\end{slide}

\begin{itemize}
\item We are not discussing the use of the keyboard as a substitute
  for a mouse/pointer-based interface (i.e., keys to move the pointer
  and to press the pointer buttons)!
\end{itemize}

\begin{slide}
\slideheading{Emacs}

\begin{itemize}
\item Text Editor (not a word processor)
\item Fully Configurable and Extensible:
  \begin{itemize}
  \item Communication with system processes
  \item Automation of procedures (version control, file comparison)
  \item Language-specific customizations
  \end{itemize}
\end{itemize}

Currently comes in two main flavors:
\begin{itemize}
\item GNU Emacs (developed by the FSF/RMS)
\item XEmacs (derivative of the above, more features, more bloat)
\end{itemize}

\end{slide}

Some current modes:
\begin{itemize}
\item \LaTeX
\item C, Fortran, SQL
\item Makefiles
\item version control
\end{itemize}

XEmacs features:
\begin{itemize}
\item Embeddable images
\item mixed fonts and rich-text (near-WYSIWYG \LaTeX{} document
  construction)
\end{itemize}

\begin{slide}
\slideheading{ESS (=\emph{EMACS Speaks Statistics})}

A package for Emacs, with the following features:
\begin{itemize}
\item Modes for editing program code for statistical packages
\item Statistical packages run as inferior processes controlled
  through Emacs.
\item Interface to the on-line help of statistical packages (via
  calls to the package, standard WWW documentation, etc.)
\item Interface for generating, documenting, and reusing transcripts
  of data analysis and development sessions.
\end{itemize}

\end{slide}

\begin{itemize}
\item Discuss the README
\item What we are trying to do
\end{itemize}

\begin{slide}
\slideheading{ESS capabilities: programming}

\begin{itemize}
\item Syntax-based color and font highlighting
\item Access to on-line help facilities when inferior statistical
  processes are running, and to WWW-based facilities when they are not.
\item Connection to statistical processes, and the ability to switch
  between them (several instances of the same dialect, or different
  dialects)
\end{itemize}
\end{slide}

\begin{itemize}
\item Highlighting: to improve code readability.
\item On-line help, split off into a different buffer.
\item Connection to different processes:
  \begin{itemize}
  \item to compare different results
  \item to get different help information
  \end{itemize}
\end{itemize}

\begin{slide}
\slideheading{ESS capabilities: inferior process}

\begin{itemize}
\item Syntax-based color and font highlighting
\item Searchable Command History
\item Separate buffers for help output
\item Log (Transcript) of commands
\item Debugging assistance (of source)
\end{itemize}
\end{slide}

\begin{itemize}
\item Highlighting: to improve code readability.
\item ``Splus -e'' improved.
\item Output split up by context (help, etc.).
\end{itemize}

\begin{slide}
\slideheading{ESS capabilities: transcript}

\begin{itemize}
\item Reuse of transcripts in different dialects, for different
  sessions.
\item Commenting of transcripts
\item Comparison of output
\end{itemize}
\end{slide}

\begin{itemize}
\item One use: authors who are targeting several dialects of S
  (S, S-PLUS, and R).
\item Readability
\item Comparison of output among different statistical packages.
\end{itemize}
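
Reusing a transcript in another session or dialect essentially means
stripping it back down to the commands that produced it. The Python
sketch below illustrates only that idea. It is not how ESS (written in
Emacs Lisp) does it. It assumes the default S/R prompts, \verb|> | for
a new command and \verb|+ | for a continuation line, and the function
name \verb|extract_commands| is hypothetical.

\begin{verbatim}
def extract_commands(transcript, prompt="> ", continuation="+ "):
    """Return the commands typed in an S-style transcript.

    Lines starting with `prompt` begin a new command and lines starting
    with `continuation` extend the previous one; every other line is
    treated as output and dropped.  The prompt strings are assumptions
    matching the usual S/R defaults.
    """
    commands = []
    for line in transcript.splitlines():
        if line.startswith(prompt):
            commands.append(line[len(prompt):])
        elif line.startswith(continuation) and commands:
            # Keep a multi-line command together, one source line per line.
            commands[-1] += "\n" + line[len(continuation):]
    return commands


session = """> x <- c(1, 2, 3)
> mean(x)
[1] 2
> f <- function(y)
+   y^2
> f(3)
[1] 9"""

for cmd in extract_commands(session):
    print(cmd)
\end{verbatim}

Output lines that happen to begin with \verb|> | or \verb|+ | would be
misread as commands, so a real tool needs more context than prompts
alone.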

%\begin{slide}
% \slideheading{ESS: examples}

% %%\includegraphics[height=3\semin,width=3\semin]{ex1.ps}

% %%\includegraphics[height=3\semin,width=3\semin]{ex2.ps}

%\end{slide}

%Discuss example screen shots.

\begin{slide}
\slideheading{ESS: future plans}

\begin{itemize}
\item Improved support for current languages
\item Additional support for other languages (Fiasco/SPSS)
\item Inlined statistical graphics (XEmacs)
\item Object browser (a partial implementation exists for SAS;
  eventually for the S and XLispStat languages).
\item Language-independent UI/GUI for data analysis.
\item Language-independent UI/GUI for statistical instruction.
\end{itemize}
\end{slide}

\begin{itemize}
\item ESS as a general-purpose statistical GUI
\item ESS as a general-purpose statistical instruction/assistance tool
  \begin{itemize}
  \item Statistical help via WWW
  \item Download similar analyses
  \item Guidance via literate data analysis
  \end{itemize}
\end{itemize}

\begin{slide}
\slideheading{Why bother?}

\begin{itemize}
\item No real need to have a unifying mode (both too much and too
  little).
\item Different languages have different syntax
\end{itemize}
\textbf{BUT:}
\begin{itemize}
\item Common, generic interface.
\item Improved documentation is important.
\item To this end, we want to encourage documentation and revision
  logs:
  \begin{itemize}
  \item Revision and Source Control Systems (SCCS, RCS, CVS, PRCS),
    automated via Emacs or run manually (not discussed)
  \item Literate Programming (discussed)
  \end{itemize}
\end{itemize}
\end{slide}

\begin{itemize}
\item Why two topics?
\item A single common mode for Literate Programming and Literate Data
  Analysis.
\item Revision/source control is extremely useful for keeping track of
  document and program changes, but will not be discussed today.
\end{itemize}

\begin{slide}
\slideheading{Literate Programming: Introduction}

\begin{itemize}
\item Introduced by D.~E. Knuth for developing \TeX{} (1984)
\item Intent: the task of programming should shift from telling a
  computer what to do to explaining to human beings what we are
  doing.
\item Remark: throughout this talk, every use of the word
  \textbf{programming} can be replaced by the words \textbf{data
    analysis}.
\end{itemize}
\end{slide}

\begin{itemize}
\item Origin
\item Goal: a means to document results.
\item \emph{Data Analysis} can be thought of as the implementation of
  a particular type/style of \emph{Programming}.
\end{itemize}

\begin{slide}
\slideheading{Lit Prog: Example}
{\tiny
\begin{verbatim}
\section{Introduction}\label{intro}

For regression models with interval-censored data, there is generally no simple means of
conditioning out the nuisance parameter. This nuisance parameter is generally a function of the
baseline cumulative distribution function, such as $\Lambda(t)$, the cumulative hazard function, or
$\Phi(t) = F(t)/S(t)$, the odds function.

However, with right-censored data under the proportional hazards model, we can condition out the
nuisance parameter. This simplifies and speeds up the resulting estimation.

The main body of code will be the simulation routine, given by

<<bag.sim.S>>=
bag.sim <- function(beta,N,nobs,numBSsamp) {
<<simulation routine.S>> ;
<<report and summarize results.S>> ;
}
@ %def bag.sim N nobs beta numBSsamp

This can be divided into dataset generation and analysis sections.

<<simulation routine.S>>=
for (i in 1:N) {
<<Generate data set.S>> ;
<<Analyse data set using imputation and bootstrap.S>> ;
}
@ %def i
\end{verbatim}
}
\end{slide}

\begin{itemize}
\item Basic example of literate programming (noweb syntax).
\item Documentation chunks start with \verb|@|.
\item Program (code) chunks start with \verb|<<chunk name>>=|.
\end{itemize}
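
These two markers are all that is needed to split a web into its
pieces. The minimal Python sketch below assumes only the two rules
above: a line \verb|<<name>>=| opens a code chunk, and a line starting
with \verb|@| opens a documentation chunk. The name \verb|split_chunks|
is hypothetical, and noweb itself handles more cases than this.

\begin{verbatim}
import re

# "<<name>>=" alone on a line opens a code chunk.
CODE_START = re.compile(r"^<<(.+)>>=\s*$")


def split_chunks(web):
    """Split a noweb-style web into (kind, name, text) triples."""
    chunks = [("doc", None, [])]          # a web starts in documentation
    for line in web.splitlines():
        m = CODE_START.match(line)
        if m:
            chunks.append(("code", m.group(1), []))
        elif line.startswith("@"):
            # The rest of an "@" line (e.g. "%def ...") stays with the docs.
            chunks.append(("doc", None, [line[1:].strip()]))
        else:
            chunks[-1][2].append(line)
    return [(kind, name, "\n".join(body)) for kind, name, body in chunks]


web = r"""The main body of code will be the simulation routine.
<<bag.sim.S>>=
bag.sim <- function(beta, N, nobs, numBSsamp) {
<<simulation routine.S>> ;
}
@ %def bag.sim N nobs beta numBSsamp
This can be divided into dataset generation and analysis sections."""

for kind, name, text in split_chunks(web):
    print(kind, name, repr(text))
\end{verbatim}

A reference such as \verb|<<simulation routine.S>> ;| lacks the
trailing \verb|=|, so it stays inside its code chunk rather than
opening a new one.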

\begin{slide}
\slideheading{Web Processing}

\begin{itemize}
\item To get the documentation: \emph{weave} a web (output forms:
  \TeX, HTML, etc.).
\item To get the code: \emph{tangle} a web.
\end{itemize}
Idea: one should not need to read or change the raw (tangled) code,
and doing so is discouraged; the documentation should tell you
everything.
\end{slide}

\begin{itemize}
\item ``Web'' here has nothing to do with the WWW.
\item Weaving and tangling are basically text processing.
\end{itemize}
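
Tangling amounts to collecting the code chunks by name and, starting
from a root chunk, expanding every \verb|<<name>>| reference
recursively. The Python sketch below assumes noweb-like conventions:
definitions with the same name are concatenated, and the indentation
in front of a reference is applied to the expanded lines. The function
names are hypothetical, and the chunk bodies in the example are
placeholders rather than the code from the talk.

\begin{verbatim}
import re

DEF = re.compile(r"^<<(.+)>>=\s*$")        # start of a code chunk
USE = re.compile(r"^(\s*)<<(.+?)>>(.*)$")  # reference to another chunk


def code_chunks(web):
    """Map each chunk name to its lines; repeated names are appended."""
    chunks, current = {}, None
    for line in web.splitlines():
        m = DEF.match(line)
        if m:
            current = chunks.setdefault(m.group(1), [])
        elif line.startswith("@"):
            current = None                  # back in documentation
        elif current is not None:
            current.append(line)
    return chunks


def tangle(chunks, root, indent=""):
    """Expand chunk `root`, replacing references by the chunks' code."""
    out = []
    for line in chunks[root]:
        m = USE.match(line)
        if m:
            pad, name, rest = m.groups()
            body = tangle(chunks, name, indent + pad)
            if body and rest.strip():
                body[-1] += rest            # e.g. a trailing " ;"
            out.extend(body)
        else:
            out.append(indent + line)
    return out


web = r"""
<<bag.sim.S>>=
bag.sim <- function(beta, N, nobs, numBSsamp) {
  <<simulation routine.S>> ;
}
@ %def bag.sim
<<simulation routine.S>>=
for (i in 1:N) {
  <<Generate data set.S>>
}
@ %def i
<<Generate data set.S>>=
## placeholder: data generation code goes here
"""

print("\n".join(tangle(code_chunks(web), "bag.sim.S")))
\end{verbatim}

In this sketch, an undefined chunk name raises \verb|KeyError| and a
chunk that refers to itself recurses until Python gives up. A real
tool should report both as errors.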

\begin{slide}
\slideheading{Statistical Programming and Data Analysis}

Is there a difference?
\end{slide}

\begin{itemize}
\item Why literate programming is of interest for data analysis
  documentation.
\item At most, the difference lies in the intended use of the code.
\end{itemize}

\begin{slide}
\slideheading{Requirements in a Literate Programming Tool}
(for statisticians)

Program:
\begin{itemize}
\item Language independence
\item Platform independence
\end{itemize}

Statistician:
\begin{itemize}
\item discipline
\item interest
\end{itemize}
\end{slide}

\begin{itemize}
\item Language independence (e.g., S-PLUS, C, and Fortran code mixed
  freely in one document).
\item Flexible and platform-independent.
\end{itemize}

\begin{slide}
\slideheading{Relationship between ESS and Lit Prog}

Good:
\begin{itemize}
\item Documentation (using AUC\TeX), ease of programmability
\item Ability to test out real code (send it straight to a running
  process).
\end{itemize}
Bad:
\begin{itemize}
\item Not fully automated (yet!).
\item Need to learn yet another language (or meta-language).
\end{itemize}

\end{slide}

Lack of automation means that one has to go to a command line to
actually produce results.

\begin{itemize}
\item Current Links
\end{itemize}

\begin{slide}
\slideheading{Future Extensions: ESS/Lit Prog}

\end{slide}

\begin{itemize}
\item Templates
\item Software packaging
\end{itemize}

\begin{slide}
\slideheading{Where to find}

\begin{itemize}
\item ESS: http://ess.stat.wisc.edu/
%%\htmladdnormallink{ESS} {http://www.stat.sc.edu/~rossini/projects/}:
\item Literate Programming:
  \begin{itemize}
  \item General Information:
    http://www.desy.de/ftp/pub/userwww/projects/LitProg.html
%%\htmladdnormallink{General Information} {http://www.desy.de/ftp/pub/userwww/projects/LitProg.html}
%%\htmladdnormallink{http://www.desy.de/ftp/pub/userwww/projects/LitProg.html} {http://www.desy.de/ftp/pub/userwww/projects/LitProg.html}
  \item %%\htmladdnormallink{Frequently Asked Questions about Lit
    %%Prog}{http://shelob.ce.ttu.edu/daves/faq.html}:
    Frequently Asked Questions about Lit Prog: http://shelob.ce.ttu.edu/daves/faq.html
%%\htmladdnormallink{http://shelob.ce.ttu.edu/daves/faq.html} {http://shelob.ce.ttu.edu/daves/faq.html}
  \item %% \htmladdnormallink{Noweb} {http://www.cs.virginia.edu/~nr/noweb/intro.html}:
%%\htmladdnormallink{http://www.cs.virginia.edu/~nr/noweb/intro.html} {http://www.cs.virginia.edu/~nr/noweb/intro.html}
    Noweb: http://www.cs.virginia.edu/\~{}nr/noweb/intro.html
  \end{itemize}
\end{itemize}
\end{slide}

\begin{itemize}
\item Where to get it.
\end{itemize}

\end{document}