% $Header$
\chapter{Module datafile: reading a datafile}
\label{datafile}

\section{Reading a table from a file}

The module datafile contains the class \verb|datafile|, which can be
used to read a table from a file. You just have to construct an
instance and provide the filename as the parameter,
e.g.\ \verb|datafile("testdata")|. The file is parsed, i.e.\ split
into the columns of the table, by matching regular expressions. These
expressions can be modified, as they are additional named arguments of
the constructor:

\medskip
\begin{tabularx}{\linewidth}{ll>{\raggedright\arraybackslash}X}
argument name&default&description\\
\hline
\texttt{commentpattern}&\texttt{re.compile(r"(\#+|!+|\%+)\textbackslash s*")}&start of a comment line\\
\texttt{stringpattern}&\texttt{re.compile(r"\textbackslash"(.*?)\textbackslash"(\textbackslash s+|\$)")}&a string column\\
\texttt{columnpattern}&\texttt{re.compile(r"(.*?)(\textbackslash s+|\$)")}&any other column\\
\texttt{parser}&\texttt{mathtree.parser()}&expression parser for \texttt{addcolumn}, see section~\ref{datafile:cumulate}\\
\end{tabularx}
\medskip
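
The following Python sketch shows one plausible way in which the
default string and column patterns from the table could be applied to
split a single, already stripped line into entries. The helper name
\verb|splitline| and the strategy of trying the string pattern first
at each position are assumptions made for illustration; this is not
the code of \verb|datafile| itself.

\begin{quote}
\begin{verbatim}
import re

# Default patterns from the table above, written as plain Python.
stringpattern = re.compile(r"\"(.*?)\"(\s+|$)")
columnpattern = re.compile(r"(.*?)(\s+|$)")

def splitline(line):
    """Hypothetical helper: split a stripped line into
    (text, is_string) pairs."""
    entries = []
    pos = 0
    while pos < len(line):
        # Try the string pattern first; a match forces a string column.
        match = stringpattern.match(line, pos)
        if match:
            entries.append((match.group(1), True))
        else:
            # Otherwise take an ordinary column, ended by whitespace
            # or by the end of the line.
            match = columnpattern.match(line, pos)
            entries.append((match.group(1), False))
        pos = match.end()
    return entries

print(splitline('1 2.5 "New York" x'))
\end{verbatim}
\end{quote}

The quoted column \verb|New York| comes out as a single entry marked
as a string, while the other entries are ordinary columns.
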

The input file is processed line by line. First, leading and trailing
whitespace is stripped from each line. Then it is checked whether the
line matches the comment pattern. If it does and no data has been read
so far, the rest of the line is analysed like a table line and the
result is interpreted as column titles; comment lines after the first
data line are simply discarded. Since every further comment line
preceding the data overwrites the titles, the last non-empty comment
line before the data determines the column titles.

It remains to explain how data lines are read. Each line is split into
a list of column entries; a line resulting in an empty list (e.g.\ an
empty line) is ignored. As shown in the table above, there is a special
pattern for string columns: when it matches, the column is always
interpreted as a string. Otherwise \verb|datafile| tries to convert the
columns automatically into floats, except in the title line. When the
conversion fails, the string is kept.
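
The reading procedure described in the last two paragraphs can be
sketched in Python as follows. This is an illustration under stated
assumptions and not the code of \verb|datafile|: the function names
\verb|splitline| and \verb|read_table| are hypothetical, the file is
passed as a list of lines, and the line-number column as well as the
padding with \verb|None| are left out.

\begin{quote}
\begin{verbatim}
import re

commentpattern = re.compile(r"(#+|!+|%+)\s*")
stringpattern = re.compile(r"\"(.*?)\"(\s+|$)")
columnpattern = re.compile(r"(.*?)(\s+|$)")

def splitline(line, tofloat):
    """Split a stripped line; optionally convert columns to float."""
    entries = []
    pos = 0
    while pos < len(line):
        match = stringpattern.match(line, pos)
        if match:
            entries.append(match.group(1))  # quoted column stays a string
        else:
            match = columnpattern.match(line, pos)
            value = match.group(1)
            if tofloat:
                try:
                    value = float(value)
                except ValueError:
                    pass  # conversion failed: keep the string
            entries.append(value)
        pos = match.end()
    return entries

def read_table(lines):
    """Hypothetical reader returning the titles and the data rows."""
    titles, rows = [], []
    for line in lines:
        line = line.strip()
        match = commentpattern.match(line)
        if match:
            # Comment lines only define titles before any data was read.
            if not rows:
                newtitles = splitline(line[match.end():], tofloat=False)
                if newtitles:
                    titles = newtitles
        else:
            entries = splitline(line, tofloat=True)
            if entries:  # an empty list (e.g. empty line) is ignored
                rows.append(entries)
    return titles, rows

sample = [
    "# an introductory remark",
    "# x y label",
    "",
    '1 2.5 "first point"',
    "2 n/a second",
    "# ignored: data was already read",
]
print(read_table(sample))
\end{verbatim}
\end{quote}

For this sample input, the second comment line provides the titles,
the empty line is skipped, \verb|n/a| stays a string because it cannot
be converted, and the final comment line is ignored because data had
already been read.
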

The string pattern allows columns to contain whitespace. It matches a
column starting with a quote (\verb|"|) and ends it at the first
following quote which is immediately followed by whitespace or by the
end of the line. Hence a quote inside the string that is not followed
by whitespace is simply kept as part of the string, and no kind of
escaping is needed. The only disadvantage is that a string cannot
contain a quote immediately followed by whitespace. Since this
implementation usually does exactly what the user wants without any
special escaping, it is considered to be the best solution. Moreover,
a \TeX{} expression with a quote immediately followed by whitespace is
a really rare construct. As strings are expected to be used in \TeX{}
expressions later on, this syntactical limitation should normally not
be relevant.
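
The behaviour of the default string pattern can be checked directly
with Python's \verb|re| module. The small helper \verb|firststring|
below is hypothetical; it only reports what the pattern from the table
captures at the start of a line.

\begin{quote}
\begin{verbatim}
import re

# Default string pattern from the table of constructor arguments.
stringpattern = re.compile(r"\"(.*?)\"(\s+|$)")

def firststring(line):
    """Hypothetical helper: string column captured at the line start."""
    match = stringpattern.match(line)
    return match.group(1) if match else None

# A quote not followed by whitespace (a TeX umlaut) stays in the string.
print(firststring(r'"G\"odel" 1930'))   # G\"odel
# A quote followed by whitespace always ends the string.
print(firststring('"say" hello" 3'))    # say
\end{verbatim}
\end{quote}

The second call shows the limitation mentioned above: the intended
string \verb|say" hello| cannot be expressed.
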

Note that the patterns can easily be reconfigured. For example, you
may change the characters that separate the columns.
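
As an example of such a reconfiguration, the sketch below replaces the
default column pattern by one that separates columns by commas,
optionally surrounded by whitespace. With \verb|datafile| itself, such
a pattern would be passed as the \verb|columnpattern| argument of the
constructor; to keep the example self-contained, it is applied here by
a hypothetical helper \verb|splitcolumns| instead.

\begin{quote}
\begin{verbatim}
import re

# Columns separated by commas (with optional surrounding whitespace)
# instead of the default separation by whitespace.
commacolumnpattern = re.compile(r"(.*?)(\s*,\s*|$)")

def splitcolumns(line, pattern):
    """Hypothetical helper: apply a column pattern from left to right."""
    entries = []
    pos = 0
    while pos < len(line):
        match = pattern.match(line, pos)
        entries.append(match.group(1))
        pos = match.end()
    return entries

print(splitcolumns("1, 2.5 ,New York,x", commacolumnpattern))
\end{verbatim}
\end{quote}

Since whitespace no longer separates columns, \verb|New York| forms a
single column even without quotes.
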

Finally, \verb|datafile| stores the table in two variables. The
variable \verb|titles| holds a list of the column titles. It starts
with the entry \verb|None|, indicating an empty title for the column
of the line number. The variable \verb|data| contains a list of the
lines, where each line is a list of the entries of that line, starting
with an integer entry for the line number (counting from 1). All lines
have the same number of entries, which also equals the length of the
\verb|titles| list. To fulfil this condition, entries with the value
\verb|None| may be inserted.
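
The layout of \verb|titles| and \verb|data| can be illustrated by the
following sketch. It starts from plain lists of titles and rows and
adds the leading \verb|None| title, the line numbers and the padding
with \verb|None|. The function name \verb|finish| is hypothetical, and
it is assumed that only data lines are counted for the line number and
that padding entries are appended at the end of a line.

\begin{quote}
\begin{verbatim}
def finish(rawtitles, rawrows):
    """Hypothetical helper building titles and data as in the text."""
    titles = [None] + list(rawtitles)  # None: title of line-number column
    data = [[number] + list(row)
            for number, row in enumerate(rawrows, start=1)]
    # Pad the titles and every line with None up to a common width.
    width = max([len(titles)] + [len(line) for line in data])
    titles += [None] * (width - len(titles))
    for line in data:
        line += [None] * (width - len(line))
    return titles, data

titles, data = finish(["x", "y"], [[1.0, 2.5], [2.0, 3.5, "extra"], [3.0]])
print(titles)  # [None, 'x', 'y', None]
print(data)
\end{verbatim}
\end{quote}
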

\section{Accessing columns}

The method \verb|getcolumnno| takes a column description as its
parameter. If the parameter matches exactly one entry of the titles
list, the index of this entry is returned. Otherwise the parameter
should be an integer, and it is checked whether it is a valid column
index; it may also be negative, counting the columns from the end.
When an error occurs, the exception \verb|ColumnError| is raised.
Examples are \verb|getcolumnno(1)| and \verb|getcolumnno("title")|.

The method \verb|getcolumn| takes the same argument as the method
\verb|getcolumnno| described above, but returns a list with the values
of that column.
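
The two access methods can be sketched on a hypothetical class
\verb|Table| holding \verb|titles| and \verb|data| in the layout
described in the previous section. The sketch follows the rules given
above; that a negative index is returned unchanged (and then used with
Python's negative list indexing) is an assumption.

\begin{quote}
\begin{verbatim}
class ColumnError(Exception):
    """Raised when a column description cannot be resolved."""

class Table:
    """Hypothetical stand-in holding titles and data as in the text."""

    def __init__(self, titles, data):
        self.titles = titles
        self.data = data

    def getcolumnno(self, column):
        # A title occurring exactly once is resolved to its index.
        if self.titles.count(column) == 1:
            return self.titles.index(column)
        # Otherwise accept a valid integer index; negative values count
        # from the end.
        if (isinstance(column, int)
                and -len(self.titles) <= column < len(self.titles)):
            return column
        raise ColumnError(column)

    def getcolumn(self, column):
        # Collect the entry of the selected column from every line.
        columnno = self.getcolumnno(column)
        return [line[columnno] for line in self.data]

table = Table([None, "min", "max"], [[1, 1.0, 3.0], [2, 2.0, 5.0]])
print(table.getcolumnno("max"))  # 2
print(table.getcolumn(-1))       # [3.0, 5.0]
\end{verbatim}
\end{quote}
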

\section{Mathematics on columns}

The method \verb|addcolumn| appends a new column to the data. Its
first parameter is a string, which is interpreted as an expression. If
the expression contains an equal sign (\verb|=|), everything left of
the last equal sign becomes the title of the new column. If no equal
sign is found, the title is set to \verb|None|. The part right of the
last equal sign is interpreted as a mathematical expression. A list of
functions and operators can be found in appendix~\ref{mathtree}. The
expression may contain variable names. These can directly be column
titles, but only for titles which are valid variable names (e.g.\ they
should start with a letter and contain only letters, digits and the
underscore). Alternatively, temporary names for columns can be given
as additional named parameters of the \verb|addcolumn| method; these
take precedence over column titles when the appropriate column is
determined. The name of a named parameter is the variable name in the
expression, and its value should be a valid parameter of the
\verb|getcolumnno| method. The data referenced by variables in the
expression need to be floats; otherwise the result for that data line
is \verb|None|. Examples are \verb|addcolumn("av=(min+max)/2")| and
\verb|addcolumn("av=(a+b)/2", a=1, b="max")|.
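
The rules for \verb|addcolumn| can be illustrated by the following
Python sketch. It is not the \PyX{} implementation: the \verb|mathtree|
parser is replaced by a small evaluator built on Python's \verb|ast|
module that only understands numbers, variables, the operators
\verb|+|, \verb|-|, \verb|*|, \verb|/|, \verb|**| and unary signs, and
\verb|addcolumn|, \verb|resolve| and \verb|evaluate| are hypothetical
free functions working on plain \verb|titles| and \verb|data| lists.

\begin{quote}
\begin{verbatim}
import ast
import operator

# Operators understood by this small stand-in for the mathtree parser.
BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv,
          ast.Pow: operator.pow}
UNOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def evaluate(node, variables):
    """Evaluate a parsed arithmetic expression using the variables."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, variables)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return BINOPS[type(node.op)](evaluate(node.left, variables),
                                     evaluate(node.right, variables))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNOPS:
        return UNOPS[type(node.op)](evaluate(node.operand, variables))
    raise ValueError("unsupported expression element")

def resolve(titles, column):
    """Minimal stand-in for getcolumnno: unique title or integer index."""
    if titles.count(column) == 1:
        return titles.index(column)
    if isinstance(column, int) and -len(titles) <= column < len(titles):
        return column
    raise KeyError(column)

def addcolumn(titles, data, expression, **columns):
    """Hypothetical addcolumn; modifies titles and data in place."""
    title = None
    if "=" in expression:
        # Everything left of the last equal sign is the new title.
        title, expression = expression.rsplit("=", 1)
        title = title.strip()
    tree = ast.parse(expression.strip(), mode="eval")
    names = {node.id for node in ast.walk(tree)
             if isinstance(node, ast.Name)}
    # Named parameters take precedence over column titles.
    columnnos = {name: resolve(titles, columns.get(name, name))
                 for name in names}
    for line in data:
        values = {name: line[no] for name, no in columnnos.items()}
        if all(isinstance(value, float) for value in values.values()):
            line.append(evaluate(tree, values))
        else:
            line.append(None)  # a non-float input gives None
    titles.append(title)

titles = [None, "min", "max"]
data = [[1, 1.0, 3.0], [2, 2.0, "n/a"]]
addcolumn(titles, data, "av=(min+max)/2")
addcolumn(titles, data, "d=a-b", a="max", b=1)
print(titles)  # [None, 'min', 'max', 'av', 'd']
print(data)
\end{verbatim}
\end{quote}

In the second line of the example data the \verb|max| column holds the
string \verb|n/a|, so both new columns get \verb|None| there.
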

\section{Dirty tricks for mathematics on columns}
\label{datafile:cumulate}

I want to present a solution for cumulating data in a new column. As
the title of this section says, it is considered to be a dirty trick,
because it relies on side effects in the calculation of the new
column: the results are summed up in a hidden variable. In a way, this
is exactly what is wanted; on the other hand, it is nothing I would
ever consider doing officially (don't expect the following lines to
become part of \PyX{} in the future). Still, nothing speaks against
using this trick, and I don't expect this feature to break in future
versions. Something like this is needed, and the possibility to
implement it has to stay.

What we want to do is to add another function to the expression
syntax available for mathematics on columns. We supply a class which
can later be hooked into the mathematical expression parser:

\begin{quote}
\begin{verbatim}
from pyx import mathtree

class Cumulate(mathtree.MathTreeFunc1):

    def __init__(self, *args):
        mathtree.MathTreeFunc1.__init__(self, "cumulate", *args)
        self.sum = 0  # hidden running total, kept between evaluations

    def Calc(self, VarDict):
        # add the argument value of the current line to the running total
        self.sum += self.ArgV[0].Calc(VarDict)
        return self.sum

# the default functions extended by the new cumulate function
MyFuncs = mathtree.DefaultMathTreeFuncs + (Cumulate,)
\end{verbatim}
\end{quote}

Please note that you have to import \verb|mathtree| explicitly: it is
not usually needed in \PyX{} applications and is therefore not
imported by \verb|from pyx import *|.

To finally use \verb|cumulate|, you have to supply the datafile with a
new parser that uses the newly generated list of available functions:
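
\begin{quote}
\begin{verbatim}
from pyx import datafile

# parse column expressions with the extended function list MyFuncs
df = datafile.datafile("mydata",
                       parser=mathtree.parser(MathTreeFuncs=MyFuncs))
# new column "sum" holding the running total of the column "costs"
df.addcolumn("sum=cumulate(costs)")
\end{verbatim}
\end{quote}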
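
The side effect behind \verb|cumulate| can be observed without \PyX{}
in the following sketch: a function object keeping a hidden sum
returns cumulated values as long as it is called once per data line,
in order, and each new column starts with a fresh object. The class
\verb|RunningSum| is hypothetical; \verb|itertools.accumulate| from the
Python standard library computes the same totals without hidden state.

\begin{quote}
\begin{verbatim}
from itertools import accumulate

class RunningSum:
    """Hypothetical stateful function adding each argument to a sum."""

    def __init__(self):
        self.sum = 0

    def __call__(self, value):
        self.sum += value
        return self.sum

costs = [10.0, 2.5, 7.5]
cumulate = RunningSum()
cumulated = [cumulate(value) for value in costs]
print(cumulated)                # [10.0, 12.5, 20.0]
print(list(accumulate(costs)))  # the same totals
\end{verbatim}
\end{quote}

This also shows why the trick depends on the evaluation order: if the
lines were evaluated in another order, or more than once, the hidden
sum would give different results.
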