gostyle_old, commit old
[gostyle.git] / clanek_go_congress / clanek.tex
blobb6211897f7fb379f2ff9ba05831534d5cdc485eb
1 \documentclass[12pt,a4paper,notitlepage]{article}
3 \usepackage[a4paper,vmargin={20mm,20mm},hmargin={20mm,20mm}]{geometry}
5 %% Použité kódování znaků: obvykle latin2, cp1250 nebo utf8:
6 \usepackage[utf8]{inputenc}
8 %% Ostatní balíčky
9 \usepackage[titletoc]{appendix}
10 \usepackage{graphicx}
11 \usepackage{wrapfig}
12 \usepackage{color}
13 \usepackage[multiple]{footmisc}
14 \usepackage{amsthm}
15 \usepackage{amsmath}
16 \usepackage{threeparttable}
17 \usepackage{longtable}
18 \usepackage{tabularx}
19 \usepackage{amsfonts}
20 \usepackage{caption}
21 \usepackage[lined, ruled, boxed, linesnumbered]{algorithm2e}
23 \usepackage[round]{natbib} % sazba pouzite literatury
25 \usepackage{psfrag}
27 \usepackage{psgo,array}
28 \usepackage{url} % sazba URL
30 \usepackage[ps2pdf,unicode]{hyperref} % Musí být za všemi ostatními balíčky
31 \usepackage{breakurl}
34 %\hypersetup{pdftitle=Meta-learning methods for analyzing Go playing trends}
35 %\hypersetup{pdfauthor=Josef Moudřík}
37 \begin{document}
39 % paper title
40 % can use linebreaks \\ within to get better formatting as desired
41 %\title{On Move Pattern Trends\\in Large Go Games Corpus}
42 \title{Evaluating Go Game Records\\for Prediction of Player Attributes }
44 % use \thanks{} to gain access to the first footnote area
45 % a separate \thanks must be used for each paragraph as LaTeX2e's \thanks
46 % was not built to handle multiple paragraphs
47 \author{Josef~Moud\v{r}\'{i}k%
48 \thanks{J. Moud\v{r}\'{i}k is student at the Faculty of Math and Physics, Charles University, Prague, CZ.},~Petr~Baudi\v{s}%
49 \thanks{P. Baudi\v{s} is student at the Faculty of Math and Physics,
50 Charles University, Prague, CZ, and also does some of his Computer
51 Go research as an employee of SUSE Labs Prague, Novell CZ.}}
52 \maketitle
54 \begin{abstract}
55 We propose a~way of extracting a per-move evaluation of sets of Go game records.
56 The evaluations capture different aspects of the games such as patterns played
57 or statistics of sente/gote sequences (among others); using machine learning
58 algorithms, they can be used to predict arbitrary relevant target variables.
59 We apply this methodology to predict strength and playing style (e.g.
60 territoriality or aggressivity) of a player and make our predictor
61 available as an online tool, a part of the GoStyle project.
62 %% No, na tohle neni v clanku misto, pze to ma mit jen 8 stranek
63 % navic bych tyhle veci chtel zverejnit i samy o sobe, nejak dukladnejc,
65 %By inspecting the dependencies between the evaluations and the target variable,
66 %we are able to tell which patterns are bad or good (in case of strength as the
67 %target variable), or which moves e.g. constitute the territorial style of play.
69 We propose a number of possible applications including seeding real-work ranks
70 of internet players, aiding in Go study and tuning of Go-playing programs, or
71 contribution to Go-theoretical discussion on the scope of ``playing style''.
72 \end{abstract}
75 \section{Introduction}
76 The field of Computer Go usually focuses on the problem
77 of creating a~program to play the game, finding the best move from a~given
78 board position \cite{GellySilver2008}. We focus on analyzing existing game
79 records with the aim of helping humans to play and understand the game better
80 instead.
82 Go is a~two-player full-information board game played
83 on a~square grid (usually $19\times19$ lines) with black and white
84 stones; the goal of the game is to surround the most territory and
85 capture enemy stones. We assume basic familiarity with the game.
87 Since the game has a worldwide popularity, there exist large collections
88 of Go game records, both for amateur players and professionals
89 e.g. \citep{KGS,GoGoD}.
90 So far, not much has been done in analysing these records using computers.
91 There are programs that serve as tools to study the opening phase of the game
92 by giving simple statistics of next move from professional
93 games~\citep{Kombilo,MoyoGo}.
94 The professional games have also been used in computer Go;
95 patterns from the professional games
96 are used as a heuristic to improve the tree
97 searching, e.g.~\citep{PatElo}. Apart from these, we are not aware of
98 any other uses.
100 Following up our initial research \citep{GoStyleArxiv},
101 we present a deeper approach. We extract different
102 kinds of information from the records to create a complex
103 evaluation of the game sample. The \emph{evaluation} is a vector
104 composed of independent features -- each of the features
105 captures different aspect of the sample. For example,
106 we use statistics of most frequent
107 local patterns played, statistics of high and low plays
108 in different game stages, etc.
110 Using machine learning, the evaluation of the sample
111 can be used to predict relevant variables. In this work
112 for instance,
113 the sample consists of games of a player
114 and we predict his strength or playing style.
116 This paper is organized as follows. Section~\ref{sec:feat}
117 presents the features comprising the evaluation.
118 Section~\ref{sec:mach} gives details about the machine
119 learning method we have used.
120 In Section~\ref{sec:expe} we give details about our
121 datasets -- for prediction of strength and style -- and
122 show how precisely can the prediction be conducted.
123 Section~\ref{sec:disc} discusses applications and future work.
125 \section{Feature Extraction}
126 \label{sec:feat}
127 This section presents the methods for extracting the evaluation
128 vector (call it $ev$) from a set of games. Because we should
129 distinguish between both players in any particular game,
130 each game in the
131 set is accompanied by the color which specifies our player of
132 interest. The sample is therefore is regarded as a \emph{set
133 of colored games}, $GC = \{ (game_1, color_1), ...\}$. For example,
134 the $color_1$ specifies the player of interest in $game_1$.
136 The evaluation vector $ev$ is composed by concatenating several
137 sub-vectors we call \emph{features} -- examples include the
138 aforementioned local patterns or statistics of sente and gote
139 sequences. These will be detailed in the rest of this section.
140 Some of the explanations are simplified to fit the size of
141 the paper, please see \citep{Moudrik13} for precise details and algorithms.
143 \subsection{Raw Game Processing}
144 Firstly, we need to specify how do we process the games.
145 \footnote{
146 We use the standard \emph{.sgf} file format as input, \cite{SGF}.
148 We have used the Pachi Go
149 Engine~\citep{Pachi} which -- apart
150 from being quite a good performing Go Bot -- allows to extract
151 raw information from each game on a per-move basis.
152 For each move,
153 Pachi outputs a list of key-value pairs regarding the current move:
155 \begin{itemize}
156 \item \textbf{atari flag} --- whether the move put enemy stones in atari,
157 \item \textbf{atari escape flag} --- whether the move saved own stones from atari,
158 \item \textbf{capture} --- number of enemy stones the move captured,
159 \item \textbf{contiguity to last move} --- the gridcular
160 distance \footnotemark{\ref{grid}} from the last move,
161 \item \textbf{board edge distance} --- the distance from
162 the nearest edge of the board,
163 \item \textbf{spatial pattern} --- configuration of stones around the played move.
164 \end{itemize}
166 We use this information to compute the higher level features given below.
167 The spatial pattern pictures positions of stones around the current move up to
168 a certain distance.\footnote{
169 \label{grid}
170 The distance is given by the {\em gridcular} metric
171 $d(x,y) = |\delta x| + |\delta y| + \max(|\delta x|, |\delta y|)$, which produces
172 a circle-like structure on the Go board square grid \cite{SpatPat}.
173 Spatial patterns of sizes 2 to 6 are regarded.
176 \subsection{Patterns}
177 The first feature collects a statistics of $N = 400$ most frequently ocurring
178 spatial patterns (together with both atari flags). The list of the $N$ most frequently
179 played patterns is computed beforehand from the whole database of games.
181 Given a set of of colored games $GC$ we then count how many times was each of the $N$
182 patterns played -- thus obtaining a vector $c$ of counts ($|c| = 400$).
183 With simple occurences count however, particular counts $c_i$ increase proportionally to
184 number of games in $GC$. To maintain invariancy under the number of games in the sample,
185 a normalization is needed. We do this by dividing the $c$ by $|GC|$, though other schemes
186 are possible, see \citep{Moudrik13}.
188 \subsection{$\omega$-local Sente and Gote Sequences}
189 Because the concept of sente and gote is very important in real games, we devised
190 a statistics which tries to capture distribution of sente and gote plays in the games
191 from the sample. Because deciding what moves are sente or gote can be hard even
192 for human players, we restricted ourselves to what we call $\omega$-local (sente
193 and gote) sequences. The simplification has a clear assumption -- the responses to
194 a sente move are always local. We say, that a move is $\omega$-local (with respect
195 to the previous move) if its gridcular distance from previous move
196 is smaller than a fixed number $\omega$ (in this work, we used $\omega=10$\footnote{
197 Of course, different values are possible, see \citep{Moudrik13} for details.
199 Of course, this assumption might not always hold, but
200 the feature proves to be useful nonetheless.
202 We than partition each game into $\omega$-local sequences (that is, each move in the
203 sequence is $\omega$-local with respect to its directly previous move) and observe
204 whether the player who started the sequence is different from the player who ended it.
205 If it is so, the $\omega$-local sequence is said to be sente for player who started it
206 because he gets to play somewhere else first (tenuki). Similarly if the player who
207 started the sequence had to respond at last we say that the sequence is gote for him.
208 Based on this partitioning, we can count the average number of sente and gote
209 sequences per game from the sample $GC$. These two numbers, along with their difference,
210 form the second feature.
212 \subsection{Border Distance}
213 The third feature is a two dimensional histogram, counting the average number of moves
214 in the sample played low or high in different game stages. The original idea was to help
215 to distinguish between territorial and influence based moves in the opening.
217 The first dimension is specified by
218 the move's border distance, the second one by the number of the current move. The size of each
219 dimension is given by intervals dividing the domains.
220 We use
221 $$ByMoves = \{ \langle1, 10\rangle, \langle 11, 64\rangle, \langle 65,200\rangle, \langle 201, \infty)\}$$
222 for the move coordinate -- the motivation is to (very roughly) distinguish
223 between opening (say first ten moves), early middle game (moves 11-64), middle game
224 and endgame.
225 The border distance dimension is given by
226 $$ByDist = \{ \langle1, 2\rangle, \langle 3 \rangle, \langle4\rangle, \langle 5, \infty)\}$$
227 (distinguishing between first 2 lines, 3rd line of territory, 4th line of influence and
228 higher plays for the rest).
230 If we use the $ByMoves$ and $ByDist$ intervals to divide the domains, we obtain a histogram
231 of total $|ByMoves| * |ByDist| = 16$ field. For each move, we increase the count in the
232 appropriate histogram fields. In the end, the whole histogram is normalized
233 to establish invariancy under the number of games scanned by dividing the
234 histogram elements by $|GC|$. These 16 numbers form the third feature.
236 \subsection{Captured Stones}
237 Apart from the border distance feature, we realized a two-dimensional histogram
238 which counts numbers of captured stones in different game stages. The motivation is
239 simple -- especially beginners tend to capture stones because ``they could'' instead of
240 because it is the ''best move''. For example, in the opening such capture might
241 be a grave mistake.
243 As before, one of the dimensions is given by intervals
244 $$ByMoves = \{ \langle1, 60\rangle, \langle 61, 240\rangle, \langle 241, \infty)\}$$
245 which try to specify the game stages (opening, middle game, endgame).
246 The second dimension has a fixed size of three bins. Along the number of captives
247 of the player of interest (the first bin), we also count the number of his
248 opponent's captives (the second bin) and a difference between the two numbers
249 (the third bin). Together, we obtain a histogram of $|ByMoves| * 3 = 9$ elements.
250 These 9 numbers (again normalized by dividing by $|GC|$) are the output of the fourth
251 feature.
253 \subsection{Win/Loss Statistics}
254 Finally, we came up with a simple feature which makes statistics of
255 wins and losses and whether they were by points or by resignation\footnote{
256 We disregard forfeited, unfinished or jigo games in this feature
257 because the frequency of these events is so small it would
258 require a very large dataset to utilize them reliably.
260 For example, quite a lot of weak players continues playing already lost games
261 until the end, mainly because their counting is not very good (they do not
262 know there is no way to win), while professionals do not hesitate to resign
263 if they think that nothing can be done.
265 For the colored games of $GC$ we count how many times did the player of interest:
266 \begin{itemize}
267 \item win standardly,
268 \item win by resignation,
269 \item lost standardly,
270 \item and lost by resignation.
271 \end{itemize}
272 Again, we divide these four numbers by $|GC|$ to maintain the invariancy under number of games
273 in $GC$. Furthermore, for the games won or lost standardly we count:
274 \begin{itemize}
275 \item average number of points the player won by for won games,
276 \item average number of points he lost by for lost games.
277 \end{itemize}
278 The six numbers form the last feature.
280 \section{Machine Learning}
281 \label{sec:mach}
282 So far, we have learned how we can turn a set of coloured games $GC$ into
283 an evaluation. Now, we are going to study how to utilize the evaulation.
284 If we are to predict various player attributes, we need some input data
285 to learn from. Suppose we have a dataset $D$ consisting
286 of pairs $D=\{ (GC_i, y_i),...\}$, where $GC_i$
287 corresponds to a set of colored games of $i$-th player and $y_i$ is the
288 target attribute. The $y_i$ might be fairly arbitrary, as long as it has
289 \emph{some} relation to the $GC_i$. For example, $y_i$ might be $i$'s strength.
291 Now, lets denote our evaluation process we presented before as $eval$ and
292 let $ev_i$ be evaluation of $i$-th player, $ev_i = eval(GC_i)$. Then,
293 we can transform $D$ into $D_{ev} = \{(ev_i, y_i),. \}$, which forms
294 our training data.
295 The task of our machine learning algorithm is to generalize the knowledge
296 from the dataset $D_{ev}$ to predict correct $y_X$ even to previously unseen $GC_X$.
297 In the case of strength, we might therefore be able to predict strength $y_X$
298 of an unknown player $X$ given a set of his games $GC_X$ (from which we can
299 compute the evaluation $ev_X$).
301 In this work, we have used a bagged artificial neural network
302 to learn the dependency.
303 Neural networks are a standard technique in machine learning. The network is
304 composed of simple computational units which are organized in a layered topology.
305 Please see the monograph by \citet{haykin_nn} to learn more.
306 We have used a simple feedforward neural network with 20 hidden units, trained
307 using the RPROP algorithm \citep{Riedmiller1993}.
309 The bagging \citep{breimanbag96} is a method which combines an ensemble of
310 $N$ models (trained on differently sampled data) to improve their
311 performance and robustness. In this work, we used $N=20$. Please refer to the
312 paper to learn more about bagging.
314 \subsection{Measuring the Performance}
315 Given a dataset $D_{ev}$, it would be nice to estimate performance of a given machine
316 learning algorithm (in our case the bagged neural network). A performance measure
317 allows to compare different algorithms and give estimates of method precision for
318 unseen inputs. A standard way to do this is to divide the $D_{ev}$ in to training
319 and testing parts and compute the error of the method on the testing part.
321 A commonly used measure is the mean square error ($MSE$) which estimates variance of
322 the error distribution. We use its square root ($RMSE$) which is an estimate of
323 standard deviation of the predictions.
325 $$ MSE = \frac{1}{|Ts|} \sum_{(ev, y) \in Ts}{ (predict(ev) - y)^2} $$
327 Where the machine learning model $predict$ is trained on the
328 training data $Tr$ and $Ts$ denotes the testing data.
329 Now we will describe how do we split the data into testing and training for the
330 error estimation to be robust.
332 \subsubsection*{Cross-Validation}
334 Cross-validation is a standard statistical technique for robust estimation of parameters.
335 The idea is to split the data into $k$ disjunct subsets (called \emph{folds}), and then
336 iteratively compose the training and testing sets and measure errors.
337 %In each of the $k$ iterations, $k$-th fold is chosen as the testing data, and
338 %all the remaining $k-1$ folds form the training data. The division into the folds is
339 %done randomly, and so that the folds have approximately the
340 %same size.
341 Refer to~\citep{crossval} to learn more. In this work, we have used 10-fold
342 cross validation.
344 \section{Experiments and Results}
345 \label{sec:expe}
347 \subsection{Strength}
348 One of two major domains we have tested our framework on is the prediction of player
349 strengths.
350 \subsubsection*{Dataset}
351 We have collected a large sample of games from the publicly available
352 archives of the Kiseido Go server~\citep{KGSArchives}.
353 The sample consists of over 100 000 records of games in the \emph{.sgf} format~\citep{SGF}.
355 For each rank $r$ in the range of 6-dan to 20-kyu, we gathered a
356 list of players $P_r$ of the particular rank. To avoid biases caused by
357 different strategies, the sample only consists of games played on $19 \times 19$ goban without
358 handicap stones.
359 The set of colored games $GC_p$ for a~player $p \in P_r$ consists of the games player $p$
360 played when he had the rank $r$. We only use the $GC_p$ if the number of
361 games is not smaller than 10 games; if the sample is larger than 50 games, we
362 randomly choose a subset of the sample (the size of subset is uniformly randomly
363 chosen from interval $\langle 10, 50\rangle$).\footnote{
364 By cutting the number of games to a fixed number (say 50) for large
365 samples, we would create an artificial disproportion in sizes of $GC_p$,
366 which could introduce bias into the process.
369 For each of the 26 ranks, we gathered 120 such $GC_p$.
370 The target variable $y$ we learn directly corresponds to the ranks:
371 $y=20$ for rank of 20-kyu, $y=1$ for 1-kyu, $y=0$ for 1-dan, $y=-5$
372 for 6-dan, other values similarly. (With increasing strength, the $y$
373 decreases.)
375 \subsubsection*{Results}
377 \subsection{Style}
379 The second domain is the prediction of different aspects of player styles.
381 \subsubsection*{Dataset}
382 The collection of games in this dataset comes from the Games of Go on Disk database by \citet{GoGoD}.
383 This database contains more than 70 000 games, spanning from the ancient times
384 to the present.
386 We chose a small subset of well known players (mainly from the 20th century) and
387 asked some experts (professional and strong amateur players)
388 to evaluate these players using a questionnaire. The experts (Alexander
389 Dinerchtein 3-pro, Motoki Noguchi 7-dan,
390 Vladim\'{i}r Dan\v{e}k 5-dan and V\'{i}t Brunner 4-dan)
391 were asked to value the players on four scales, each ranging from 1 to 10.
393 %\begin{table}[h!]
394 \begin{center}
395 %\caption{Styles}
396 \begin{tabular}{|c|c|c|}
397 \hline
398 \textbf{Style} & \textbf{1} & \textbf{10}\\ \hline
399 Territoriality & Moyo & Territory \\
400 Orthodoxity & Classic & Novel \\
401 Aggressivity& Calm & Fighting \\
402 Thickness & Safe & Shinogi \\ \hline
403 \end{tabular}
404 \end{center}
405 %\caption[Definition of the style scales]{
406 %The definition of the style scales.
408 %\label{tab:style_def}
409 %\end{table}
411 The scales try to reflect
412 some of the traditionally perceived playing styles.\footnote{
413 Refer to~\citet{GoGoD:styles}, or~\citet{senseis:styles} to grasp the concept deeper.
415 For example, the first scale (\emph{territoriality})
416 stresses whether a player prefers safe, yet inherently smaller territory (number 10 on the scale),
417 or roughly sketched large territory (\emph{moyo}, 1 on the scale), which is however insecure.
418 For each of the selected professionals, we took 192 of his games from the GoGoD database
419 at random. We divided these games (at random) into 12 colored sets $GC$ of 16 games.
421 The target variable (for each of the four styles) $y$ is given by average of the answers of
422 the experts. Results of the questionnaire are published in~\citep{style_quest}.
424 \subsubsection*{Results}
427 \section{Discussion}
428 \label{sec:disc}
430 The results in both the domains showed, that our evaluations are useful in predicting
431 different kinds of player attributes. This might have a number of possible applications.
433 So far, we have utilized some of our findings in an online web
434 application\footnote{\url{http://gostyle.j2m.cz/webapp.html}}. Based on data submitted
435 by an user, it computes his evaluation and predicts his playing style
436 and recommends relevant professional players to review.
438 Other possible applications include helping the ranking algorithms to converge faster ---
439 usually, the ranking of a player is determined from his opponents' ranking by looking
440 at numbers of wins and losses (e.g. by computing an ELO rating). Our methods might improve this
441 by including the domain knowledge.
442 Similarly, a computer Go program can quickly classify the level of its
443 human opponent based on the evaluation from their previous games
444 and auto-adjust its difficulty settings accordingly
445 to provide more even games for beginners.
447 Also, it is possible to study dependencies between single elements of the evaluation vector
448 and the target variable $y$ directly. By pinpointing e.g. the patterns
449 that correlate most strongly with small strength (players who play them are weak), we can
450 warn the user not to play these. We have made some initial research into this in~\citep{Moudrik13},
451 we do not present these results here, because of space constraints.
453 \section{Conclusion}
454 \label{sec:conc}
455 This article presents a method for evaluating a player based on a sample of his games.
456 These summary evaluations turn out to be useful in many cases --- they allow us to predict
457 different player attributes (such as strength, or playing style) with reasonable accuracy.
458 We hope, that the applications of these findings can help to improve both human and computer
459 understanding in the game of Go.
461 \section{Implementation}
462 \label{sec:impl}
464 The code used in this work
465 is released online as a part of GoStyle project~\citep{GoStyleWeb}.
466 The majority of the source code is implemented in
467 the Python programming language~\citep{Python27}.
469 The machine learnin part was realized using the
470 Orange Datamining suite~\citep{curk05}, with the exception of
471 the Fast Artificial Neural Network library FANN~\citep{Nissen2003}.
472 We used the Pachi Go engine~\citep{Pachi} for the raw game processing.
474 \bibliographystyle{abbrvnat}
475 \bibliography{clanek}
477 \end{document}