\chapter{Module data}
\label{datafile}

\section{Reading a table from a file}

The module datafile contains the class \verb|datafile|, which reads a
table from a file. To use it, construct an instance and pass the
filename as the parameter, e.g. \verb|datafile("testdata")|. The file
is parsed, and in particular split into table columns, by matching
regular expressions. These expressions can be changed through
additional named arguments of the constructor:

\medskip
\begin{tabularx}{\linewidth}{ll>{\raggedright\arraybackslash}X}
argument name&default&description\\
\hline
\texttt{commentpattern}&\texttt{re.compile(r"(\#+|!+|\%+)\textbackslash s*")}&start of a comment line\\
\texttt{stringpattern}&\texttt{re.compile(r"\textbackslash"(.*?)\textbackslash"(\textbackslash s+|\$)")}&a string column\\
\texttt{columnpattern}&\texttt{re.compile(r"(.*?)(\textbackslash s+|\$)")}&any other column\\
\end{tabularx}
\medskip

The input file is processed line by line. Leading and trailing
whitespace is stripped from each line first. Then the line is checked
against the comment pattern. If it matches and no data has been read
yet, the rest of the line is parsed like a table line and the result
is interpreted as column titles; a comment line that follows data is
discarded. Since every comment line preceding the data overwrites the
titles, the last non-empty comment line before the data determines the
column titles.
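
The title handling described above can be sketched as follows. This is
a minimal illustration, not the code used by \verb|datafile|: the
helper \verb|read_titles| is hypothetical, and it splits the title
line with \verb|str.split| instead of the column patterns to keep the
example short.

```python
import re

# Default comment pattern as listed in the table of constructor arguments.
commentpattern = re.compile(r"(#+|!+|%+)\s*")


def read_titles(lines):
    """Hypothetical helper: return (titles, data_lines) for the given lines."""
    titles = []
    data_lines = []
    for line in lines:
        line = line.strip()
        match = commentpattern.match(line)
        if match:
            # Comment lines only count as titles before any data was read.
            if not data_lines:
                candidate = line[match.end():].split()  # simplification
                if candidate:  # empty comment lines do not overwrite titles
                    titles = candidate
        elif line:
            data_lines.append(line)
    return titles, data_lines


titles, data_lines = read_titles([
    "# first attempt",
    "#",
    "# x y",
    "1 2",
    "# ignored after data",
    "3 4",
])
```

Here the titles end up as \verb|x| and \verb|y|, because the empty
comment line does not overwrite them and the comment after the first
data line is dropped.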

It remains to explain how data lines are read. Each line is split into
a list of entries, one per column. A line resulting in an empty list
(e.g. an empty line) is ignored. As shown in the table above, there is
a special string column pattern. When it matches, the column is always
interpreted as a string. Otherwise \verb|datafile| tries to convert
the column into a float, except in the title line. When the conversion
fails, the string is kept.

The string pattern allows columns to contain whitespace. It matches
whenever a column starts with a quote (\verb|"|) and then looks for
the end of the string, which is another quote immediately followed by
whitespace or the end of the line. Hence a quote within a string is
just part of the string and no escaping is needed. The only
disadvantage is that a string cannot contain a quote immediately
followed by whitespace. However, you can always supply a different
string pattern that uses another quote character.
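
The splitting of a data line can be sketched with the two default
patterns from the table above. The function \verb|split_line| is a
hypothetical helper written for this illustration; it is not claimed
to be the loop used inside \verb|datafile|.

```python
import re

# Default patterns as listed in the table of constructor arguments.
stringpattern = re.compile(r"\"(.*?)\"(\s+|$)")
columnpattern = re.compile(r"(.*?)(\s+|$)")


def split_line(line, convert=True):
    """Hypothetical helper: split one line into a list of column entries."""
    line = line.strip()
    entries = []
    while line:
        match = stringpattern.match(line)
        if match:
            # A quoted column always stays a string.
            entries.append(match.group(1))
        else:
            match = columnpattern.match(line)
            value = match.group(1)
            if convert:  # switched off for the title line
                try:
                    value = float(value)
                except ValueError:
                    pass  # keep the string when conversion fails
            entries.append(value)
        # Continue behind the column and its trailing whitespace.
        line = line[match.end():]
    return entries


plain = split_line('1 "a b" 2.5 foo')
quoted = split_line('"say "hi"" "1"')
```

In the second call the inner quotes are kept, since they are not
followed by whitespace, and the quoted \verb|"1"| remains a string
instead of becoming a float.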

Finally, the number of columns is fixed to the maximum number of
entries found in any line of the file. Lines with fewer entries are
padded with \verb|None|, and the titles list is truncated to this
maximum number of columns.
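
A minimal sketch of this final step, assuming the rows are already
available as lists of entries. The helper name \verb|normalize| is
hypothetical and only serves the illustration.

```python
def normalize(titles, rows):
    """Hypothetical helper: pad short rows with None and cut surplus titles."""
    width = max((len(row) for row in rows), default=0)
    padded = [row + [None] * (width - len(row)) for row in rows]
    return titles[:width], padded


titles, rows = normalize(["a", "b", "c", "d"], [[1.0], [1.0, 2.0, 3.0]])
```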

\section{Accessing columns}

The method \verb|getcolumnno| takes a column description as its
parameter. If the parameter matches exactly one entry in the titles
list, the index of that entry is returned. Otherwise the parameter
must be an integer, which is checked to be a valid column index. As
with other Python indices, a column number may be negative, counting
the columns from the end. When an error occurs, the exception
\verb|ColumnError| is raised. Note that \verb|datafile| inserts a
first column with index 0, which contains the line number (starting at
1 and counting only data lines). Examples are \verb|getcolumnno(1)|
and \verb|getcolumnno("title")|.

The method \verb|getcolumn| takes the same argument as the method
\verb|getcolumnno| described above, but returns a list with the values
of that column.
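
The lookup rules can be sketched with a small stand-in class. The
class \verb|Table| below is hypothetical and only mimics the behaviour
described in this section. Two details are assumptions of the sketch:
the line number column gets the title \verb|None|, and a valid integer
is returned unchanged.

```python
class ColumnError(Exception):
    """Raised when a column description cannot be resolved."""


class Table:
    """Hypothetical stand-in that mimics the described column lookup."""

    def __init__(self, titles, rows):
        # Column 0 holds the line number; its title None is an assumption.
        self.titles = [None] + list(titles)
        self.data = [[number] + list(row)
                     for number, row in enumerate(rows, start=1)]

    def getcolumnno(self, column):
        # A description matching exactly one title wins.
        if column is not None and self.titles.count(column) == 1:
            return self.titles.index(column)
        # Otherwise it has to be a valid, possibly negative, index.
        size = len(self.titles)
        if isinstance(column, int) and -size <= column < size:
            return column
        raise ColumnError(repr(column))

    def getcolumn(self, column):
        number = self.getcolumnno(column)
        return [row[number] for row in self.data]


table = Table(["x", "title"], [[1.0, 2.0], [3.0, 4.0]])
line_numbers = table.getcolumn(0)
last_column = table.getcolumn(-1)
```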

\section{Mathematics on columns}

The method \verb|addcolumn| appends a new column. Its first parameter
is a string which is interpreted as an expression. When the expression
contains an equal sign (\verb|=|), everything to the left of the last
equal sign becomes the title of the new column. If no equal sign is
found, the title is set to \verb|None|. The part to the right of the
last equal sign is interpreted as a mathematical expression. A list of
functions, predefined variables and operators can be found in
appendix~\ref{mathtree}. The list of available functions and
predefined variables can be extended by a dictionary passed as the
argument \verb|extern| to the constructor.

The expression may contain variable names. These names are
interpreted in the following way:
\begin{itemize}
\item A name can be a column title, but only if the title is a valid
variable name (i.e. it starts with a letter and contains only letters,
digits and the underscore).
\item A temporary name can be used, which is given by an additional
named parameter of the \verb|addcolumn| method. Named parameters take
precedence over the column titles of the previous point when the
column is determined. The name of a named parameter is the variable
name in the expression. The value of a named parameter must be a valid
parameter of the \verb|getcolumnno| method.
\item A variable name can start with the dollar symbol (\verb|$|); the
integer that follows refers directly to a column number.
\end{itemize}
The data referenced by variables in the expression must be floats;
otherwise the result for that data line is \verb|None|. Examples are
\verb|addcolumn("av=(min+max)/2")| and
\verb|addcolumn("av=(a+b+$3)/2", a=1, b="max")|.
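
The two rules above, splitting off the title at the last equal sign
and resolving variable names to columns, can be sketched as follows.
Both helpers are hypothetical and do not evaluate the expression; the
titles list with a \verb|None| entry for the line number column is an
assumption of the sketch.

```python
def split_expression(expression):
    """Hypothetical helper: split at the last equal sign into (title, formula)."""
    title, separator, formula = expression.rpartition("=")
    if not separator:
        return None, expression
    return title, formula


def resolve_variable(name, titles, named):
    """Hypothetical helper: map a variable name to a column index."""
    if name.startswith("$"):
        return int(name[1:])  # direct reference to a column number
    if name in named:  # named parameters take precedence over titles
        column = named[name]
        return column if isinstance(column, int) else titles.index(column)
    return titles.index(name)


titles = [None, "min", "max", "err"]  # index 0: line number column
named = {"a": 1, "b": "max"}
title, formula = split_expression("av=(a+b+$3)/2")
columns = [resolve_variable(name, titles, named) for name in ("a", "b", "$3")]
```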

% \section{Dirty tricks for mathematics on columns}
% \label{datafile:cumulate}

% I want to present a solution for cumulating data in a new column.
% As told in the title of this section, it is considered to be a dirty
% trick, because it relies on side effects in the calculation of the new
% column, namely it sums up the result in a hidden variable. While
% that is wanted, it is nothing I would ever consider to do officially
% (don't expect the following lines to become part of \PyX{} in the
% future). On the other hand, nothing can be said against using this
% trick and I don't expect that this feature will break in future
% versions. Somehow, it is needed and the possibility to implement this
% trick has to stay.

% What we want to do is to add another function within the allowed
% expression syntax for doing mathematics on columns. We supply a class
% which we can hook into the mathematical expression parser later on:

% \begin{quote}
% \begin{verbatim}
% from pyx import mathtree

% class Cumulate(mathtree.MathTreeFunc1):

%     def __init__(self, *args):
%         mathtree.MathTreeFunc1.__init__(self, "cumulate", *args)
%         self.sum = 0

%     def Calc(self, VarDict):
%         self.sum += self.ArgV[0].Calc(VarDict)
%         return self.sum

% MyFuncs = mathtree.DefaultMathTreeFuncs + (Cumulate,)
% \end{verbatim}
% \end{quote}

% Please note that you explicitly have to import \verb|mathtree|,
% because it is not usually needed in \PyX{} applications and thus it is
% not imported by \verb|from pyx import *|.

% To finally use \verb|cumulate|, you have to supply a new parser to the
% datafile, in which the newly generated list of available functions is
% placed:

% \begin{quote}
% \begin{verbatim}
% df = datafile.datafile("mydata",
%                        parser=mathtree.parser(
%                            MathTreeFuncs=MyFuncs,
%                            MathTreeVals=datafile.MathTreeValsWithCol))
% df.addcolumn("sum=cumulate(costs)")
% \end{verbatim}
% \end{quote}

% The explicit setting of \verb|MathTreeVals| is needed in order to keep
% column variables working. Variables starting with the dollar symbol
% (\verb|$|) are not allowed within the original mathtree.

\section{Reading data from a sectioned config file}

The class \verb|sectionfile| provides a reader for files in the
ConfigFile format (see \verb|ConfigParser| from the Python standard
library).

\section{Custom datafile readers}

Other datafile readers should be derived from the helper class
\verb|data| by inheritance. The methods \verb|getcolumnno|,
\verb|getcolumn|, and \verb|addcolumn| are then immediately available
and the cooperation with other parts of \PyX{} is assured. All that
has to be done is to call the inherited constructor, supplying the
title list and a list of data points. A data point itself is a list of
floats or strings. The number of entries per data point must equal the
number of titles provided.
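
The pattern can be sketched as follows. Since the exact constructor
signature of \verb|data| is not given here, the sketch uses a
hypothetical stand-in base class \verb|DataStandIn| that only stores
the titles and data points; a real reader would inherit from
\verb|data| instead. The reader \verb|CsvReader| and the helper
\verb|to_float| are likewise made up for this illustration.

```python
import csv
import io


class DataStandIn:
    """Hypothetical stand-in for the helper class; it only stores its input."""

    def __init__(self, titles, data):
        for point in data:
            if len(point) != len(titles):
                raise ValueError("entries per data point must match the titles")
        self.titles = titles
        self.data = data


def to_float(entry):
    """Convert an entry to a float, keeping the string when that fails."""
    try:
        return float(entry)
    except ValueError:
        return entry


class CsvReader(DataStandIn):
    """Example reader: first row holds the titles, other rows hold data."""

    def __init__(self, stream):
        rows = list(csv.reader(stream))
        titles, body = rows[0], rows[1:]
        data = [[to_float(entry) for entry in row] for row in body]
        # The only obligation: call the inherited constructor.
        super().__init__(titles, data)


reader = CsvReader(io.StringIO("x,name\n1,a\n2.5,b\n"))
```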