From d283be7bddb2101b6d0edd311ecadd90415fc632 Mon Sep 17 00:00:00 2001
From: Petr Baudis
Date: Tue, 9 Mar 2010 15:46:14 +0100
Subject: [PATCH] tex: Style, grammar corrections and other minor changes all
 over the map

---
 tex/gostyle.tex | 209 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 123 insertions(+), 86 deletions(-)

diff --git a/tex/gostyle.tex b/tex/gostyle.tex
index 1fc9397..f3a6efa 100644
--- a/tex/gostyle.tex
+++ b/tex/gostyle.tex
@@ -435,7 +435,7 @@ not statistically significant.
 We have chosen an intuitive and simple approach inspired by pattern features
 used when computing ELO ratings for candidate patterns in Computer Go play. \cite{ELO}
 Each pattern is a~combination of several {\em pattern features}
-(name-value pairs) matched at the position of the played move.
+(name--value pairs) matched at the position of the played move.
 We use these features:
 
 \begin{itemize}
@@ -450,7 +450,7 @@ We use these features:
 The spatial patterns are normalized (using a dictionary)
 to be always black-to-play and maintain translational and rotational symmetry.
-Configurations with diameter between 2 and 9 in the gridcular metric%
+Configurations of radius between 2 and 9 in the gridcular metric%
 \footnote{The {\em gridcular} metric $d(x,y) = |\delta x| + |\delta y| + \max(|\delta x|, |\delta y|)$
 defines a circle-like structure on the Go board square grid. \cite{SpatPat} }
@@ -462,8 +462,10 @@ We have implemented the data extraction by making use of the pattern features
 matching implementation within the Pachi go-playing program (TODO).
 We extract information on players by converting the SGF game
 records to GTP (TODO) stream that feeds Pachi's {\tt patternscan}
-engine which outputs a~single patternspec per move. We can then gather
-all encountered patternspecs belonging to a~given player and summarize
+engine which outputs a~single {\em patternspec} (a~string representation
+of the particular combination of pattern features) per move.
+
+We can then gather all patternspecs played by a~given player and summarize
 them; the $\vec p$ vector then consists of normalized counts of
 the given $n$ most frequent patternspecs.
@@ -471,44 +473,60 @@ the given $n$ most frequent patternspecs.
 
 \section{Data Mining}
 \label{data-mining}
 
-To assess the properties of gathered \emph{pattern vectors} and their influence on playing styles,
-we have analysed the data by a~few basic data minining techniques.
-
-First two methods rely purely on data gathered from the GoGoD database. Principal component
-analysis finds orthogonal vector components that have biggest variance. Reversing the process then
-indicates which patterns correlate with each style. Additionally, PCA can be used as a vector-preprocessing
-for methods that are negatively sensitive to \emph{pattern vector} component correlations.
-
-A~second method -- Kohonen's networks -- is based on the theory of self-organizing maps of neurons that
-compete against each other for representation of the input space. Because neurons in the network are
-organized in a two-dimensional plane, the trained network virtually spreads vectors to the 2D plane,
-allowing for simple visualization.
-
-In addition to methods operating just upon the set of \emph{pattern vectors}, we have used and compared two methods
-that approximate an unknown \emph{style vector} $\vec S$ of a player with known \emph{pattern vector} $\vec P$.
-
-First of them is called $k$-Nearest Neighbor (kNN) classification.
-Simpy said kNN approximates $\vec S$ by the ``average'' of \emph{style vectors} of $k$ nearest \emph{pattern vectors}.
-The last method is based on the theory of Neural Networks -- networks of artificial neurons, that are used
-for their generalization abilities. Neural network can learn correlations between input and output vectors and
-generalize the ``knowledge'' to unknown vectors sufficiently good.
-
-TODO rozdelit na algo/results??
+To assess the properties of gathered \emph{pattern vectors}
+and their influence on playing styles,
+we have processed the data using a~few basic data mining techniques.
+
+The first two methods ({\em analytic}) rely purely on data gathered
+from the game collection
+and serve to show the internal structure and correlations within the data set.
+Principal component analysis finds orthogonal vector components that
+have the largest variance.
+Reversing the process then indicates which patterns correlate with each style.
+Additionally, PCA can be used as a vector-preprocessing step for methods
+that are negatively sensitive to \emph{pattern vector} component correlations.
+
+The second method -- Kohonen maps -- is based on the theory of self-organizing maps of abstract neurons that
+compete against each other for representation of the input space.
+Because neurons in the network are organized in a two-dimensional plane,
+the trained network virtually spreads vectors to the 2D plane,
+allowing for simple visualization of clusters of players with similar styles.
+
+TODO: style vector -> output vector?
+
+Furthermore, we have used and compared two \emph{classification} methods
+that approximate a~well-defined but unknown \emph{style vector} $\vec S$
+based on an input \emph{pattern vector} $\vec P$.
+The methods are calibrated based on expert or prior knowledge about
+the training pattern vectors, and their error is then measured on a testing
+set of pattern vectors.
+
+One of the methods is the $k$-Nearest Neighbor (kNN) classifier:
+we approximate $\vec S$ by combining the \emph{style vectors} of the $k$ nearest \emph{pattern vectors}.
+The other is based on a multi-layer feed-forward Artificial Neural Network:
+the neural network can learn correlations between input and output vectors
+and generalize the ``knowledge'' to unknown vectors; it can be more flexible
+in the interpretation of different pattern vector elements and discern more
+complex relations than the kNN classifier, but it requires, e.g., a~larger training sample.
+
+TODO: Does the last sentence make any sense?!
 
 \subsection{Principal Component Analysis}
 \label{data-mining}
 
-Principal Component Analysis \emph{PCA} \cite{Jolliffe1986} is a~method we use to reduce the dimensions of
-\emph{pattern vectors} while preserving as much information as possible.
-
-Shortly, PCA is an eigenvalue decomposition of a~covariance matrix of centered \emph{pattern vectors}.
-It can be thought of as a~mapping $o$ from $n$-dimensional vector space to a~reduced $m$-dimensional vector space.
-The base of this reduced vector space comprises $m$ eigenvectors of original vectors' covariance matrix.
-We choose them to be the eigenvectors with biggest eigenvalues.
-Ordered by decreasing eigenvalues, the eigenvectors form rows of the transformation matrix $W$.
-
-Finally, we represent reduced \emph{pattern vectors} as a vector of coeficients of this eigenvector-base.
-For each original \emph{pattern vector} $\vec p_i$, we obtain its new representation $\vec r_i$ as shown
-in the following equation:
+We use Principal Component Analysis \emph{PCA} \cite{Jolliffe1986}
+to reduce the dimensions of the \emph{pattern vectors} while preserving
+as much information as possible.
+
+Briefly, PCA is an eigenvalue decomposition of a~covariance matrix of centered \emph{pattern vectors},
+producing a~linear mapping $o$ from $n$-dimensional vector space
+to a~reduced $m$-dimensional vector space.
+The $m$ eigenvectors of the original vectors' covariance matrix
+with the largest eigenvalues are used as the base of the reduced vector space;
+the eigenvectors form the rows of the transformation matrix $W$.
+
+For each original \emph{pattern vector} $\vec p_i$,
+we obtain its new representation $\vec r_i$ in the PCA base
+as shown in the following equation:
 \begin{equation}
 \vec r_i = W \cdot \vec p_i
 \end{equation}
@@ -516,7 +534,7 @@ in the following equation:
 The whole process is described in the Algorithm \ref{alg:pca}.
 
 \begin{algorithm}
-\caption{PCA -- Principal Component analysis}
+\caption{PCA -- Principal Component Analysis}
 \begin{algorithmic}
 \label{alg:pca}
 \REQUIRE{$m > 0$, set of players $R$ with \emph{pattern vectors} $p_r$}
@@ -528,8 +546,8 @@ The whole process is described in the Algorithm \ref{alg:pca}.
 \STATE $\mathit{Cov}[i,j] \leftarrow 1/|R| \cdot \sum_{r \in R}{\vec p_{ri} \cdot \vec p_{rj}}$
 \ENDFOR
 \STATE Compute Eigenvalue Decomposition of $\mathit{Cov}$ matrix
-\STATE Get $m$ biggest eigenvalues
-\STATE According eigenvectors ordered by decreasing eigenvalues form rows of matrix $W$
+\STATE Get the $m$ largest eigenvalues
+\STATE The corresponding eigenvectors, ordered by decreasing eigenvalues, form the rows of matrix $W$
 \FOR{ $r \in R$}
 \STATE $\vec r_r\leftarrow W \vec p_r$
 \ENDFOR
@@ -538,21 +556,27 @@ The whole process is described in the Algorithm \ref{alg:pca}.
 
 \subsection{Kohonen Maps}
 \label{koh}
 
-Kohonen map is a self-organizing network with neurons organized in a two-dimensional plane.
-Neurons in the map compete for representation of input vector space. Each neuron $\vec n$ represents a vector
-and the network is trained so that the neurons that are topologically close tend to represent vectors that
-are close as well. That is realised as follows. Initially random network is sequentially trained;
-in each iterations we choose a random training vector $\vec t$ and find neuron $\vec w$ that is closest
-to $\vec t$ in Euclidean metric (we call $\vec w$ a \emph{winner neuron}).
+A Kohonen map is a self-organizing network with neurons spread over a two-dimensional plane.
+Neurons in the map compete for representation of portions of the input vector space.
+Each neuron $\vec n$ represents a vector
+and the network is trained so that the neurons that are topologically close
+tend to represent vectors that are close as well.
+
+This is realised as follows: a randomly initialized network is sequentially trained;
+in each iteration, we choose a random training vector $\vec t$
+and find the neuron $\vec w$ that is closest to $\vec t$ in Euclidean metric
+(we call $\vec w$ a \emph{winner neuron}).
 We then adapt neurons from the neighbourhood of $\vec w$ employing an equation:
 \begin{equation}
 \vec n = \vec n + \alpha \cdot \mathit{Influence}(\vec w, \vec n) \cdot (\vec t - \vec n)
 \end{equation}
-where $\alpha$ is a learning parameter, usually decreasing in time. $Influence()$ is a function that forces neurons
-to spread. Such function is usually realised using a mexican hat function or a difference-of-gaussians (see \cite{TODO}
-for details). A state of the network can be valued by calculating mean square difference between each $\vec t \in T$
-and its corresponding \emph{winner neuron} $\vec w_t$:
+where $\alpha$ is a learning parameter, usually decreasing in time.
+$Influence()$ is a function that forces neurons to spread.
+Such a~function is usually realised using a Mexican hat function or a difference of Gaussians
+(see \cite{TODO} for details).
+The state of the network can be evaluated by calculating the mean square difference
+between each $\vec t \in T$ and its corresponding \emph{winner neuron} $\vec w_t$:
 \begin{equation}
 \mathit{Error}(N,T) = \sum_{\vec t \in T}{|\vec w_t - \vec t|}
 \end{equation}
@@ -582,17 +606,18 @@ and its corresponding \emph{winner neuron} $\vec w_t$:
 
 \subsection{k-nearest Neighbors Classifier}
 \label{knn}
 
-K-nearest neigbors is an essential classification technique.
-We use it to approximate player's \emph{style vector} $\vec S$, assuming that his \emph{pattern vector} $\vec P$ is known.
-To achieve this, we utilize \emph{reference style vectors} (see section \ref{style-vectors}).
-
-The idea is based on a assumption that similarities in players' \emph{pattern vectors}
-correlate with similarities in players' \emph{style vectors}. We try to approximate $\vec S$
-as a weighted average of \emph{style vectors}
+Our goal is to approximate the player's \emph{style vector} $\vec S$
+based on their \emph{pattern vector} $\vec P$.
+To achieve this, we require prior knowledge of \emph{reference style vectors}
+(see section \ref{style-vectors}).
+
+In this method, we assume that similarities in players' \emph{pattern vectors}
+uniformly correlate with similarities in players' \emph{style vectors}.
+We try to approximate $\vec S$ as a weighted average of \emph{style vectors}
 $\vec s_i$ of $k$ players with \emph{pattern vectors} $\vec p_i$ closest to $\vec P$.
 This is illustrated in the Algorithm \ref{alg:knn}.
 Note that the weight is a function of distance and it is not explicitly defined in Algorithm \ref{alg:knn}.
-During our research, exponentialy decreasing weight has proven to be sufficient.
+During our research, an exponentially decreasing weight has proven to be sufficient.
 
 \begin{algorithm}
 \caption{k-Nearest Neighbors}
@@ -613,45 +638,57 @@ During our research, exponentialy decreasing weight has proven to be sufficient.
 
 \subsection{Neural Network Classifier}
 \label{neural-net}
 
-As an alternative to the k-Nearest Neigbors algorithm (section \ref{knn}), we have used
-a classificator based on feed-forward artificial neural networks \cite{TODO}.
-Neural networks (NN) are known for their ability to generalize and find correlations and patterns between
-input and output data. Neural network is an adaptive system and it
-must undergo a certain training before it can be reasonably used. Basically, we use
-information for \emph{reference players} (for which either \emph{pattern vectors} and
-\emph{style vectors} are known) as training data.
+As an alternative to the k-Nearest Neighbors algorithm (section \ref{knn}),
+we have used a classifier based on feed-forward artificial neural networks \cite{TODO}.
+Neural networks (NN) are known for their ability to generalize
+and find correlations and patterns between input and output data.
+A neural network is an adaptive system that must undergo a training
+period before it can be reasonably used, similarly to the requirement
+of reference vectors for the k-Nearest Neighbors algorithm above.
 
 \subsubsection{Computation and activation of the NN}
 Technically, neural network is a network of interconnected computational units called neurons.
-A feedforward neural network has a layered topology; it usually has one \emph{input layer}, one \emph{output
-layer} and an arbitrary number of \emph{hidden layers} between.
+A feedforward neural network has a layered topology;
+it usually has one \emph{input layer}, one \emph{output layer}
+and an arbitrary number of \emph{hidden layers} in between.
 
 Each neuron $i$ is connected to all neurons in the previous layer and each connection has its weight $w_{ij}$
 
-The computation proceeds in a discrete time steps.
-In the first step, \emph{activation} of neurons in the \emph{input layer} is set according to the \emph{input vector}.
-Then, we iteratively compute output of each neuron in next layer until the output layer is reached. The activity of
-output layer is then presented as a result.
+The computation proceeds in discrete time steps.
+In the first step, the neurons in the \emph{input layer}
+are \emph{activated} according to the \emph{input vector}.
+Then, we iteratively compute the output of each neuron in the next layer
+until the output layer is reached.
+The activity of the output layer is then presented as the result.
 
-The activation $y_i$ of neuron $i$ from the layer $I$ is computed using the following equation:
+The activation $y_i$ of neuron $i$ from the layer $I$ is computed as
 \begin{equation}
 y_i = f(\sum_{j \in J}{w_{ij} y_j})
 \end{equation}
-where $J$ is a previous layer, while $y_j$ is the activation for neurons from $J$ layer. Function $f()$ is
-called \emph{activation function} and its purpose is to bound outputs of neurons. A typical example of an activation
-function is a sigmoid function.\footnote{The sigmoid function is a special case of the logistic function; it is defined by the formula
-$\sigma(x)=\frac{1}{1+e^{-(rx+k)}}$, parameters control the growth rate ($r$) and the x-position ($k$).}
+where $J$ is the previous layer, while $y_j$ is the activation of neurons from layer $J$.
+Function $f()$ is the so-called \emph{activation function}
+and its purpose is to bound the outputs of neurons.
+A typical example of an activation function is the sigmoid function.%
+\footnote{A special case of the logistic function, defined by the formula
+$\sigma(x)=\frac{1}{1+e^{-(rx+k)}}$; parameters control the growth rate ($r$)
+and the x-position ($k$).}
 
 \subsubsection{Training}
-The training of the feedforward neural network usually involves some
+The training of the feed-forward neural network usually involves some
 modification of supervised Backpropagation learning algorithm. \cite{TODO}
 We use first-order optimization algorithm called RPROP \cite{Riedmiller1993}.
 
-Because the \emph{reference set} is not usually very large, we have devised a simple method for its extension.
-This enhancement is based upon adding random linear combinations of \emph{style and pattern vectors} to the training set.
+Because the \emph{reference set} is usually not very large,
+we have devised a simple method for its extension.
+This enhancement is based upon adding random linear combinations
+of \emph{style and pattern vectors} to the training set.
+
+TODO: Is this an original idea?
-As insinuated above, the training set consist of pairs of input vectors (\emph{pattern vectors}) and
-desired output vectors (\emph{style vectors}). The training set $T$ is then enlarged by adding linear combinations:
+As outlined above, the training set consists of pairs of
+input vectors (\emph{pattern vectors}) and
+desired output vectors (\emph{style vectors}).
+The training set $T$ is then extended by adding the linear combinations:
 \begin{equation}
 T_\mathit{base} = \{(\vec p_r, \vec s_r) | r \in R\}\\
 \end{equation}
@@ -659,8 +696,8 @@
 T_\mathit{ext} = \{(\vec p, \vec s) | \exists D \subseteq R : \vec p = \sum_{d \in D}{g_d \vec p_d}, \vec s = \sum_{d \in D}{g_d \vec s_d}\}
 \end{equation}
 TODO zabudovat $g_d$ dovnitr?
-where $g_d, d \in D$ are random coeficients, so that $\sum_{d \in D}{g_d} = 1$. The training set
-is then constructed as:
+where $g_d, d \in D$ are random coefficients, so that $\sum_{d \in D}{g_d} = 1$.
+The training set is then constructed as:
 \begin{equation}
 T = T_\mathit{base} \cup \mathit{SomeFiniteSubset}(T_\mathit{ext})
 \end{equation}
@@ -700,7 +737,7 @@ TODO num layers, num neurons, ..
 
 \subsection{Implementation}
 
 We have implemented the data mining methods as an open-source project ``gostyle'' \cite{TODO},
-licensed under GNU GPL.
+made available under the GNU GPL licence.
 
 PCA: In our implementation, we use a~library called MDP \cite{MDP}.
 TODO libfann
-- 
2.11.4.GIT
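
For readers who want to experiment with the PCA reduction described in the patch above (Algorithm alg:pca), a minimal sketch in Python with NumPy follows. This is an illustration only, not the gostyle/MDP implementation; the function and variable names are made up.

    # Illustrative sketch of Algorithm alg:pca (not the gostyle/MDP code); assumes NumPy.
    import numpy as np

    def pca_reduce(pattern_vectors, m):
        """Reduce n-dimensional pattern vectors to m dimensions; rows of W are eigenvectors."""
        P = np.asarray(pattern_vectors, dtype=float)   # shape (|R|, n)
        P = P - P.mean(axis=0)                         # center the pattern vectors
        cov = P.T @ P / P.shape[0]                     # covariance matrix, shape (n, n)
        eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric matrix -> real eigendecomposition
        order = np.argsort(eigvals)[::-1][:m]          # indices of the m largest eigenvalues
        W = eigvecs[:, order].T                        # eigenvectors as rows, ordered by eigenvalue
        return P @ W.T, W                              # r_i = W . p_i for every player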
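The Kohonen map update rule from the "Kohonen Maps" subsection can be sketched similarly. Here a Gaussian neighbourhood stands in for the Influence() function, and the grid size, learning-rate schedule and iteration count are arbitrary illustrative choices, not values used in gostyle.

    # Minimal sketch of Kohonen map training; assumes NumPy.
    import numpy as np

    def train_som(T, grid=(8, 8), dim=10, iters=2000, alpha0=0.5, sigma0=3.0, seed=0):
        """T: array of training vectors, shape (num_vectors, dim)."""
        rng = np.random.default_rng(seed)
        neurons = rng.random((grid[0], grid[1], dim))             # randomly initialized network
        gy, gx = np.mgrid[0:grid[0], 0:grid[1]]
        for it in range(iters):
            t = T[rng.integers(len(T))]                           # random training vector
            dist = np.linalg.norm(neurons - t, axis=2)            # Euclidean distance to every neuron
            wy, wx = np.unravel_index(dist.argmin(), dist.shape)  # winner neuron
            alpha = alpha0 * (1 - it / iters)                     # learning parameter decreasing in time
            sigma = sigma0 * (1 - it / iters) + 0.5
            grid_d2 = (gy - wy) ** 2 + (gx - wx) ** 2
            influence = np.exp(-grid_d2 / (2 * sigma ** 2))       # neighbourhood of the winner
            neurons += alpha * influence[..., None] * (t - neurons)
        return neurons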
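The kNN approximation of Algorithm alg:knn with an exponentially decreasing weight fits in a few lines as well; the decay constant and the default value of k below are illustrative assumptions, not values taken from the paper.

    # Sketch of the kNN style approximation with an exponentially decreasing weight; assumes NumPy.
    import numpy as np

    def knn_style(P, ref_patterns, ref_styles, k=4, decay=10.0):
        """Approximate the style vector S for pattern vector P from the k nearest reference players."""
        ref_patterns = np.asarray(ref_patterns, dtype=float)
        ref_styles = np.asarray(ref_styles, dtype=float)
        d = np.linalg.norm(ref_patterns - np.asarray(P, dtype=float), axis=1)
        nearest = np.argsort(d)[:k]               # k closest pattern vectors
        w = np.exp(-decay * d[nearest])           # weight as a function of distance
        w /= w.sum()
        return w @ ref_styles[nearest]            # weighted average of their style vectors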
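The feed-forward activation y_i = f(sum_j w_ij y_j) with a sigmoid f can be sketched as below. The layer sizes and random weights are placeholders for a network that would actually be trained (e.g. with RPROP through libfann); none of these numbers come from the paper.

    # Sketch of the feed-forward pass from the "Computation and activation of the NN" subsection.
    import numpy as np

    def sigmoid(x, r=1.0, k=0.0):
        return 1.0 / (1.0 + np.exp(-(r * x + k)))

    def forward(layers, x):
        """layers: list of weight matrices, one per non-input layer; x: input (pattern) vector."""
        y = np.asarray(x, dtype=float)
        for W in layers:
            y = sigmoid(W @ y)        # y_i = f(sum_j w_ij * y_j)
        return y                      # activity of the output layer

    rng = np.random.default_rng(0)
    net = [rng.normal(size=(20, 400)), rng.normal(size=(6, 20))]  # 400-d pattern vector -> 6-d output; sizes are illustrative
    print(forward(net, rng.random(400)))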
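Finally, a sketch of the training-set extension by random linear combinations with sum(g_d) = 1. Drawing the coefficients from a Dirichlet distribution is one convenient way to satisfy that constraint; this choice, the subset size and the helper name are assumptions of the sketch, not something stated in the text.

    # Sketch of the training-set extension T = T_base + random linear combinations; assumes NumPy.
    import numpy as np

    def extend_training_set(patterns, styles, n_extra, subset_size=3, seed=0):
        rng = np.random.default_rng(seed)
        patterns = np.asarray(patterns, dtype=float)
        styles = np.asarray(styles, dtype=float)
        extra_p, extra_s = [], []
        for _ in range(n_extra):
            D = rng.choice(len(patterns), size=subset_size, replace=False)
            g = rng.dirichlet(np.ones(subset_size))    # g_d >= 0, sum_d g_d = 1
            extra_p.append(g @ patterns[D])            # sum_d g_d * p_d
            extra_s.append(g @ styles[D])              # sum_d g_d * s_d
        return np.vstack([patterns, extra_p]), np.vstack([styles, extra_s])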