\documentclass[12pt,a4paper,notitlepage]{article}

\usepackage[a4paper,vmargin={20mm,20mm},hmargin={20mm,20mm}]{geometry}

%% Character encoding used: usually latin2, cp1250 or utf8:
\usepackage[utf8]{inputenc}

%% Other packages
\usepackage[titletoc]{appendix}
\usepackage{graphicx}
\usepackage{wrapfig}
\usepackage{color}
\usepackage[multiple]{footmisc}
\usepackage{amsthm}
\usepackage{amsmath}
\usepackage{threeparttable}
\usepackage{longtable}
\usepackage{tabularx}
\usepackage{amsfonts}
\usepackage{caption}
\usepackage[lined, ruled, boxed, linesnumbered]{algorithm2e}

\usepackage[round]{natbib}             % bibliography typesetting

\usepackage{psfrag}

\usepackage{psgo,array}
\usepackage{url}                % URL typesetting

\usepackage[ps2pdf,unicode]{hyperref}   % Must be loaded after all other packages
\usepackage{breakurl}


%\hypersetup{pdftitle=Meta-learning methods for analyzing Go playing trends}
%\hypersetup{pdfauthor=Josef Moudřík}

\begin{document}
%
% paper title
% can use linebreaks \\ within to get better formatting as desired
%\title{On Move Pattern Trends\\in  Large Go Games Corpus}
\title{Evaluating Go Game Records\\for Prediction of Player Attributes}

% use \thanks{} to gain access to the first footnote area
% a separate \thanks must be used for each paragraph as LaTeX2e's \thanks
% was not built to handle multiple paragraphs
\author{Josef~Moud\v{r}\'{i}k%
\thanks{J. Moud\v{r}\'{i}k is a student at the Faculty of Mathematics and Physics, Charles University, Prague, CZ.},~Petr~Baudi\v{s}%
\thanks{P. Baudi\v{s} is a student at the Faculty of Mathematics and Physics,
Charles University, Prague, CZ, and also does some of his Computer
Go research as an employee of SUSE Labs Prague, Novell CZ.}}
\maketitle

\begin{abstract}
We propose a~way of extracting an evaluation of a~set of Go game records
from per-move information. The evaluation captures different aspects of the games,
such as the patterns played or statistics of sente and gote sequences; using machine
learning algorithms, it can be used to predict arbitrary relevant target variables.
We apply this methodology to predict the strength and playing style (e.g.\
territoriality or aggressivity) of a~player, and we make our predictor
available as an online tool, a~part of the GoStyle project.
%% Well, there is no room for this in the article, since it may only have 8 pages;
% besides, I would like to publish these things on their own, in more depth,
%
%By inspecting the dependencies between the evaluations and the target variable,
%we are able to tell which patterns are bad or good (in case of strength as the
%target variable), or which moves e.g. constitute the territorial style of play.
%%
We propose a~number of possible applications, including seeding real-world ranks
of internet players, aiding in Go study and tuning of Go-playing programs, and
contributing to the Go-theoretical discussion on the scope of ``playing style''.
\end{abstract}


\section{Introduction}
The field of Computer Go usually focuses on the problem
of creating a~program to play the game, that is, finding the best move from a~given
board position~\citep{GellySilver2008}. Instead, we focus on analyzing existing game
records with the aim of helping humans to play and understand the game better.

Go is a~two-player full-information board game played
on a~square grid (usually $19\times19$ lines) with black and white
stones; the goal of the game is to surround the most territory and
capture enemy stones. We assume basic familiarity with the game.

Since the game is popular worldwide, there exist large collections
of Go game records, both of amateur and professional games~\citep[e.g.][]{KGS,GoGoD}.
So far, not much has been done in analyzing these records using computers.
There are programs that serve as tools to study the opening phase of the game
by giving simple statistics of the next move from professional
games~\citep{Kombilo,MoyoGo}.
The professional games have also been used in computer Go;
patterns from the professional games
are used as a~heuristic to improve the tree
search~\citep[e.g.][]{PatElo}. Apart from these, we are not aware of
any other uses.

Following up on our initial research~\citep{GoStyleArxiv},
we present a~deeper approach. We extract different
kinds of information from the records to create a~complex
evaluation of the game sample. The \emph{evaluation} is a~vector
composed of separate features -- each of the features
captures a~different aspect of the sample. For example,
we use statistics of the most frequent
local patterns played, statistics of high and low plays
in different game stages, etc.

Using machine learning, the evaluation of the sample
can be used to predict relevant variables. In this work,
for instance,
the sample consists of the games of a~player
and we predict his strength or playing style.

This paper is organized as follows. Section~\ref{sec:feat}
presents the features comprising the evaluation.
Section~\ref{sec:mach} gives details about the machine
learning method we have used.
In Section~\ref{sec:expe} we give details about our
datasets -- for prediction of strength and style -- and
show how precisely the prediction can be made.
Section~\ref{sec:disc} discusses applications and future work,
Section~\ref{sec:conc} concludes the paper and Section~\ref{sec:impl}
describes the implementation.

\section{Feature Extraction}
\label{sec:feat}
This section presents the methods for extracting the evaluation
vector (call it $ev$) from a~set of games. Because we need to
distinguish between the two players in any particular game,
each game in the
set is accompanied by the color which specifies our player of
interest. The sample is therefore regarded as a~\emph{set
of colored games}, $GC = \{ (game_1, color_1), \ldots\}$. For example,
$color_1$ specifies the player of interest in $game_1$.

The evaluation vector $ev$ is formed by concatenating several
sub-vectors we call \emph{features} -- examples include the
aforementioned local patterns or statistics of sente and gote
sequences. These will be detailed in the rest of this section.
Some of the explanations are simplified to fit the size of
the paper; please see~\citep{Moudrik13} for precise details and algorithms.

\subsection{Raw Game Processing}
Firstly, we need to specify how we process the games.\footnote{We use the standard \emph{.sgf} file format~\citep{SGF} as input.}
We have used the Pachi Go
engine~\citep{Pachi}, which -- apart
from being a~well-performing Go-playing program -- allows extracting
raw information from each game on a~per-move basis.
For each move,
Pachi outputs a~list of key-value pairs regarding the current move:

\begin{itemize}
    \item \textbf{atari flag} -- whether the move put enemy stones in atari,
    \item \textbf{atari escape flag} -- whether the move saved own stones from atari,
    \item \textbf{capture} -- the number of enemy stones the move captured,
    \item \textbf{contiguity to last move} -- the gridcular
        distance\footref{grid} from the last move,
    \item \textbf{board edge distance} -- the distance from
        the nearest edge of the board,
    \item \textbf{spatial pattern} -- the configuration of stones around the played move.
\end{itemize}

We use this information to compute the higher-level features given below.
The spatial pattern describes the positions of stones around the current move up to
a~certain distance.\footnote{
    \label{grid}
    The distance is given by the \emph{gridcular} metric
$d(x,y) = |\delta x| + |\delta y| + \max(|\delta x|, |\delta y|)$, which produces
a~circle-like structure on the square grid of the Go board~\citep{SpatPat}.
Spatial patterns of sizes 2 to 6 are considered.
}
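
The gridcular metric is easy to state in code. The following Python sketch only illustrates the formula above (it is not taken from Pachi or GoStyle); representing a~board point as a~pair of integer coordinates $(x, y)$ is our assumption.

\begin{verbatim}
def gridcular_distance(a, b):
    """Gridcular distance between board points a = (x1, y1), b = (x2, y2).

    Implements d = |dx| + |dy| + max(|dx|, |dy|); the (x, y) tuple
    representation of a point is an assumption of this sketch.
    """
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return dx + dy + max(dx, dy)


# All points of a 19x19 board within gridcular distance 4 of (9, 9).
centre = (9, 9)
disc = [(x, y) for x in range(19) for y in range(19)
        if gridcular_distance(centre, (x, y)) <= 4]
\end{verbatim}

The list \texttt{disc} contains 13 points: the centre, its four neighbours, the four diagonal points and the four points two lines away, a~small instance of the circle-like shape.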

\subsection{Patterns}
The first feature collects statistics of the $N = 400$ most frequently occurring
spatial patterns (together with both atari flags). The list of the $N$ most frequently
played patterns is computed beforehand from the whole database of games.

Given a~set of colored games $GC$, we then count how many times each of the $N$
patterns was played -- thus obtaining a~vector $c$ of counts ($|c| = N = 400$).
With a~simple count of occurrences, however, the individual counts $c_i$ grow (roughly)
proportionally to the number of games in $GC$. To make the feature invariant to the
number of games in the sample, a~normalization is needed. We do this by dividing $c$
by $|GC|$, though other schemes are possible, see~\citep{Moudrik13}.
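
The pattern feature can be sketched in a~few lines of Python. In this hypothetical sketch (not the GoStyle code), each game is reduced to the list of pattern identifiers played by the player of interest; how patterns are encoded and how the other player's moves are filtered out are assumptions left outside the sketch.

\begin{verbatim}
from collections import Counter


def most_frequent_patterns(all_games, n=400):
    """The n most frequent patterns over the whole game database.

    Each game is assumed to be an iterable of hashable pattern
    identifiers, one per move (a hypothetical encoding).
    """
    counts = Counter()
    for patterns in all_games:
        counts.update(patterns)
    return [pattern for pattern, _ in counts.most_common(n)]


def pattern_feature(sample, top_patterns):
    """Counts of the top patterns in the sample, divided by |GC|.

    `sample` holds, for each colored game, the pattern identifiers of
    the moves played by the player of interest.
    """
    counts = Counter()
    for patterns in sample:
        counts.update(patterns)
    return [counts[p] / len(sample) for p in top_patterns]


top = most_frequent_patterns([["a", "b", "a"], ["b", "c"], ["a"]], n=2)
ev_patterns = pattern_feature([["a", "b", "a"], ["c"]], top)
\end{verbatim}

Dividing by the number of games rather than by the number of moves follows the normalization described above; patterns from the top-$N$ list that never occur in the sample simply contribute zeros.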

\subsection{$\omega$-local Sente and Gote Sequences}
Because the concept of sente and gote is very important in real games, we devised
a~statistic which tries to capture the distribution of sente and gote plays in the games
from the sample. Since deciding which moves are sente or gote can be hard even
for human players, we restricted ourselves to what we call $\omega$-local (sente
and gote) sequences. The simplification rests on a~clear assumption -- the responses to
a~sente move are always local. We say that a~move is $\omega$-local (with respect
to the previous move) if its gridcular distance from the previous move
is smaller than a~fixed number $\omega$ (in this work, we used $\omega=10$\footnote{
    Of course, different values are possible, see~\citep{Moudrik13} for details.
}).
This assumption might not always hold, but
the feature turns out to be useful nonetheless.

We then partition each game into $\omega$-local sequences (that is, each move in a~sequence
is $\omega$-local with respect to its directly previous move) and observe
whether the player who started the sequence is different from the player who ended it.
If so, the $\omega$-local sequence is said to be sente for the player who started it,
because he gets to play somewhere else first (tenuki). Similarly, if the player who
started the sequence also played its last move, his opponent is the one who gets to
tenuki, and we say that the sequence is gote for him.
Based on this partitioning, we can count the average number of sente and gote
sequences per game in the sample $GC$. These two numbers, along with their difference,
form the second feature.
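
The partitioning into $\omega$-local sequences can be sketched as follows. This Python sketch is illustrative only: the move encoding (color, $x$, $y$) and the choice to count only the sequences started by the player of interest are our assumptions, and passes or the end of the game are not treated specially.

\begin{verbatim}
def gridcular_distance(a, b):
    """d = |dx| + |dy| + max(|dx|, |dy|) for points a, b = (x, y)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return dx + dy + max(dx, dy)


def local_sequences(moves, omega=10):
    """Split a game into omega-local sequences.

    `moves` is a list of (color, x, y) tuples in game order (a
    hypothetical encoding); a move joins the current sequence when its
    gridcular distance from the directly previous move is < omega.
    """
    sequences = []
    for move in moves:
        if sequences and gridcular_distance(
                sequences[-1][-1][1:], move[1:]) < omega:
            sequences[-1].append(move)
        else:
            sequences.append([move])
    return sequences


def sente_gote_feature(sample, omega=10):
    """Average sente/gote sequence counts per game, and their difference.

    `sample` is a list of (moves, color) pairs; only sequences started
    by the player of interest (`color`) are counted (an assumption).
    """
    sente = gote = 0
    for moves, color in sample:
        for seq in local_sequences(moves, omega):
            if seq[0][0] != color:
                continue
            if seq[-1][0] != color:  # opponent ended it: we may tenuki
                sente += 1
            else:                    # we ended it: the opponent may tenuki
                gote += 1
    n = len(sample)
    return [sente / n, gote / n, (sente - gote) / n]


game = [("B", 3, 3), ("W", 4, 3),
        ("B", 16, 16), ("W", 16, 15), ("B", 16, 14)]
feature = sente_gote_feature([(game, "B")])
\end{verbatim}

In the example, the exchange in the lower corner ends with White's answer (sente for Black), while the three-move sequence in the upper corner ends with Black's own move (gote for Black).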

\subsection{Border Distance}
The third feature is a~two-dimensional histogram counting the average number of moves
in the sample played low or high in different game stages. The original idea was to help
distinguish between territorial and influence-based moves in the opening.

The first dimension is specified by
the move's border distance, the second one by the number of the current move. Each
dimension is divided into bins given by intervals over its domain.
We use
\[ByMoves = \{ \langle 1, 10\rangle, \langle 11, 64\rangle, \langle 65, 200\rangle, \langle 201, \infty)\}\]
for the move coordinate -- the motivation is to (very roughly) distinguish
between the opening (say, the first ten moves), the early middle game (moves 11--64),
the middle game and the endgame.
The border distance dimension is given by
\[ByDist = \{ \langle 1, 2\rangle, \langle 3, 3\rangle, \langle 4, 4\rangle, \langle 5, \infty)\}\]
(distinguishing between the first two lines, the third line of territory, the fourth line
of influence, and higher plays for the rest).

Using the $ByMoves$ and $ByDist$ intervals to divide the domains, we obtain a~histogram
of $|ByMoves| \cdot |ByDist| = 4 \cdot 4 = 16$ fields. For each move, we increase the count
in the appropriate histogram field. In the end, the whole histogram is normalized by
dividing its elements by $|GC|$, which makes it invariant to the number of games scanned.
These 16 numbers form the third feature.
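
A~possible Python sketch of this histogram follows. The encoding of a~move as a~pair (move number, line), with line~1 being the edge, and the row-by-row flattening of the histogram into a~vector are assumptions of the sketch, not details given by Pachi or GoStyle.

\begin{verbatim}
INF = float("inf")
# Intervals from the text: move numbers and border-distance lines.
BY_MOVES = [(1, 10), (11, 64), (65, 200), (201, INF)]
BY_DIST = [(1, 2), (3, 3), (4, 4), (5, INF)]


def bin_index(value, intervals):
    """Index of the closed interval <lo, hi> that contains `value`."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= value <= hi:
            return i
    raise ValueError(f"{value} is outside all intervals")


def border_distance_feature(sample):
    """Flattened 4x4 histogram of moves, normalized by |GC|.

    `sample` holds, for each game, (move_number, line) pairs of the
    moves of the player of interest, where line 1 is the edge
    (a hypothetical encoding of the board edge distance).
    """
    hist = [[0] * len(BY_DIST) for _ in BY_MOVES]
    for moves in sample:
        for move_number, line in moves:
            hist[bin_index(move_number, BY_MOVES)][
                bin_index(line, BY_DIST)] += 1
    return [count / len(sample) for row in hist for count in row]


feature = border_distance_feature([[(1, 3), (2, 4), (70, 1)], [(5, 3)]])
\end{verbatim}

Here the two third-line moves in the opening give the value $2/|GC| = 1$ in the corresponding field, while the fourth-line opening move and the first-line middle-game move each contribute $0.5$.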

\subsection{Captured Stones}
Similarly to the border distance feature, we built a~two-dimensional histogram
which counts the numbers of captured stones in different game stages. The motivation is
simple -- especially beginners tend to capture stones because ``they could'' instead of
because it is the ``best move''. For example, in the opening such a~capture might
be a~grave mistake.

As before, one of the dimensions is given by intervals, this time
\[ByMoves = \{ \langle 1, 60\rangle, \langle 61, 240\rangle, \langle 241, \infty)\}\]
which try to specify the game stages (opening, middle game, endgame).
The second dimension has a~fixed size of three bins. Along with the number of captives
of the player of interest (the first bin), we also count the number of his
opponent's captives (the second bin) and the difference between the two numbers
(the third bin). Together, we obtain a~histogram of $|ByMoves| \cdot 3 = 9$ elements.
These 9 numbers (again normalized by dividing by $|GC|$) are the output of the fourth
feature.
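
The captured-stones histogram can be sketched analogously. In this hypothetical Python sketch each move is a~triple (move number, color of the mover, number of stones captured), and the third bin is taken as the player's captives minus the opponent's captives; this sign convention is our assumption.

\begin{verbatim}
INF = float("inf")
# Game-stage intervals used by this feature (opening, middle, endgame).
BY_MOVES = [(1, 60), (61, 240), (241, INF)]


def stage(move_number):
    """Index of the ByMoves interval containing `move_number`."""
    for i, (lo, hi) in enumerate(BY_MOVES):
        if lo <= move_number <= hi:
            return i
    raise ValueError(f"invalid move number {move_number}")


def captured_stones_feature(sample):
    """Flattened 3x3 histogram of captures, normalized by |GC|.

    `sample` is a list of (moves, color) pairs, each move being a
    hypothetical (move_number, mover_color, stones_captured) triple.
    Per stage: [own captures, opponent captures, own - opponent].
    """
    hist = [[0, 0, 0] for _ in BY_MOVES]
    for moves, color in sample:
        for move_number, mover, captured in moves:
            column = 0 if mover == color else 1
            hist[stage(move_number)][column] += captured
    for row in hist:
        row[2] = row[0] - row[1]
    return [value / len(sample) for row in hist for value in row]


game = [(10, "B", 1), (100, "W", 3), (101, "B", 2)]
feature = captured_stones_feature([(game, "B")])
\end{verbatim}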

\subsection{Win/Loss Statistics}
Finally, we came up with a~simple feature which collects statistics of
wins and losses and whether they were by points or by resignation\footnote{
    We disregard forfeited, unfinished or jigo games in this feature
    because the frequency of these events is so small that it would
    require a~very large dataset to utilize them reliably.
}.
For example, quite a~lot of weak players continue playing already lost games
until the end, mainly because their counting is not very good (they do not
know there is no way to win), while professionals do not hesitate to resign
if they think that nothing can be done.

For the colored games of $GC$, we count how many times the player of interest:
\begin{itemize}
    \item won by points,
    \item won by resignation,
    \item lost by points,
    \item and lost by resignation.
\end{itemize}
Again, we divide these four numbers by $|GC|$ to make them invariant to the number of games
in $GC$. Furthermore, for the games won or lost by points we compute:
\begin{itemize}
    \item the average number of points the player won by in won games,
    \item the average number of points he lost by in lost games.
\end{itemize}
These six numbers form the last feature.
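
A~direct Python transcription of the six statistics might look as follows. The per-game triple (won, by resignation, point margin) is a~hypothetical input format, and returning 0 for the average of an empty group (e.g.\ a~player who never lost by points) is our own choice.

\begin{verbatim}
def win_loss_feature(results):
    """Six win/loss statistics of the player of interest.

    `results` holds one hypothetical (won, by_resignation, margin)
    triple per game; `margin` is the point difference for games decided
    by counting and None for resignations.  Jigo, forfeited and
    unfinished games are assumed to be filtered out beforehand.
    """
    n = len(results)
    won_pts = [m for won, resign, m in results if won and not resign]
    lost_pts = [m for won, resign, m in results if not won and not resign]
    won_res = sum(1 for won, resign, _ in results if won and resign)
    lost_res = sum(1 for won, resign, _ in results if not won and resign)

    def average(values):
        # Assumption: an empty group contributes 0.0.
        return sum(values) / len(values) if values else 0.0

    return [len(won_pts) / n, won_res / n, len(lost_pts) / n,
            lost_res / n, average(won_pts), average(lost_pts)]


feature = win_loss_feature([(True, False, 5.5), (True, True, None),
                            (False, False, 10.5), (False, False, 2.5)])
\end{verbatim}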

\section{Machine Learning}
\label{sec:mach}
So far, we have shown how to turn a~set of colored games $GC$ into
an evaluation. Now, we are going to study how to utilize the evaluation.
If we are to predict various player attributes, we need some input data
to learn from. Suppose we have a~dataset $D$ consisting
of pairs $D=\{ (GC_i, y_i), \ldots\}$, where $GC_i$
corresponds to a~set of colored games of the $i$-th player and $y_i$ is the
target attribute. The $y_i$ might be fairly arbitrary, as long as it has
\emph{some} relation to $GC_i$. For example, $y_i$ might be the $i$-th player's strength.

Now, let us denote the evaluation process presented above as $eval$ and
let $ev_i$ be the evaluation of the $i$-th player, $ev_i = eval(GC_i)$. Then,
we can transform $D$ into $D_{ev} = \{(ev_i, y_i), \ldots\}$, which forms
our training data.
The task of our machine learning algorithm is to generalize the knowledge
from the dataset $D_{ev}$ to predict the correct $y_X$ even for a~previously unseen $GC_X$.
In the case of strength, we might therefore be able to predict the strength $y_X$
of an unknown player $X$ given a~set of his games $GC_X$ (from which we can
compute the evaluation $ev_X$).

In this work, we have used a~bagged artificial neural network
to learn the dependency.
Neural networks are a~standard technique in machine learning. The network is
composed of simple computational units which are organized in a~layered topology.
Please see the monograph by~\citet{haykin_nn} to learn more.
We have used a~simple feedforward neural network with 20 hidden units, trained
using the RPROP algorithm~\citep{Riedmiller1993}.

Bagging~\citep{breimanbag96} is a~method which combines an ensemble of
$M$ models, each trained on a~different bootstrap sample of the training data
(drawn with replacement), to improve their performance and robustness;
for regression, the predictions of the individual models are averaged.
In this work, we used $M=20$. Please refer to~\citet{breimanbag96} to learn more about bagging.
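
To illustrate the bagging procedure itself, the following Python sketch trains $M$ copies of a~base learner on bootstrap samples and averages their predictions. The base learner here is a~1-nearest-neighbour regressor chosen only to keep the sketch self-contained; it stands in for, and is not, the RPROP-trained neural network used in this work.

\begin{verbatim}
import random
from statistics import mean


def bagging(learner, X, y, n_models=20, seed=0):
    """Bootstrap aggregating for regression.

    `learner(X, y)` must return a prediction function; each of the
    n_models copies is trained on a bootstrap sample (n draws with
    replacement) and the ensemble predicts the average of their outputs.
    """
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(learner([X[i] for i in idx], [y[i] for i in idx]))
    return lambda ev: mean(model(ev) for model in models)


def nearest_neighbour(X, y):
    """Stand-in base learner (1-NN), NOT the paper's neural network."""
    def predict(ev):
        def sq_dist(i):
            return sum((a - b) ** 2 for a, b in zip(X[i], ev))
        return y[min(range(len(X)), key=sq_dist)]
    return predict


model = bagging(nearest_neighbour, [[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
prediction = model([1.1])
\end{verbatim}

Any learner with the same interface, for instance a~wrapper around a~neural network library, could be passed instead of \texttt{nearest\_neighbour}.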

\subsection{Measuring the Performance}
Given a~dataset $D_{ev}$, we would like to estimate the performance of a~given machine
learning algorithm (in our case, the bagged neural network). A~performance measure
allows us to compare different algorithms and to estimate the precision of a~method on
unseen inputs. A~standard way to do this is to divide $D_{ev}$ into training
and testing parts and compute the error of the method on the testing part.

A~commonly used measure is the mean square error
\[ MSE = \frac{1}{|Ts|} \sum_{(ev, y) \in Ts}{ (predict(ev) - y)^2}, \]
where the machine learning model $predict$ is trained on the
training data $Tr$ and $Ts$ denotes the testing data.
We use its square root, $RMSE = \sqrt{MSE}$, which is expressed in the same units as
the target variable (e.g.\ ranks) and can be read as the typical size of a~prediction
error; for an unbiased predictor, it estimates the standard deviation of the errors.
Now we will describe how we split the data into the testing and training parts for the
error estimation to be robust.
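
For concreteness, $RMSE$ on a~testing set can be computed as below; the three prediction/target pairs in the usage line are made-up numbers, not results of this work.

\begin{verbatim}
import math


def rmse(predictions, targets):
    """Root mean square error sqrt(MSE) of predictions on a test set Ts."""
    if not targets or len(predictions) != len(targets):
        raise ValueError("need equally long, non-empty sequences")
    sse = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    return math.sqrt(sse / len(targets))


# Predicted vs. true targets y of three hypothetical test players.
error = rmse([3.0, -1.0, 12.0], [5.0, 1.0, 12.0])
\end{verbatim}

Here $MSE = (4 + 4 + 0)/3$, so the $RMSE$ is about 1.63, i.e.\ the predictions are off by roughly one and a~half ranks.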

\subsubsection*{Cross-Validation}

Cross-validation is a~standard statistical technique for robustly estimating how well
a~method performs on unseen data.
The idea is to split the data into $k$ disjoint subsets (called \emph{folds}) of approximately
the same size; in each of the $k$ iterations, one of the folds is used as the testing data,
the remaining $k-1$ folds form the training data, and the error is measured.
%In each of the $k$ iterations, $i$-th fold is chosen as the testing data, and
%all the remaining $k-1$ folds form the training data. The division into the folds is
%done randomly, and so that the folds have approximately the
%same size.
Refer to~\citep{crossval} to learn more. In this work, we have used 10-fold
cross-validation.
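
A~minimal Python sketch of $k$-fold cross-validation under these definitions is given below. The random assignment of samples to folds, the pooling of squared errors over all folds before taking the square root, and the trivial mean-predicting learner used for illustration are our assumptions; averaging the per-fold errors instead is an equally common choice.

\begin{verbatim}
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k disjoint, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_rmse(learner, X, y, k=10, seed=0):
    """k-fold cross-validated RMSE of `learner` on the dataset D_ev.

    Each fold is used once as the test set Ts while the other k - 1
    folds form the training set Tr.  Squared errors are pooled over all
    folds (one of several possible aggregation choices).
    """
    squared_errors = []
    for fold in k_fold_indices(len(X), k, seed):
        test = set(fold)
        train = [i for i in range(len(X)) if i not in test]
        predict = learner([X[i] for i in train], [y[i] for i in train])
        squared_errors.extend((predict(X[i]) - y[i]) ** 2 for i in fold)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_learner(X, y):
    """Trivial baseline that always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda ev: m


folds = k_fold_indices(23, k=10)
\end{verbatim}

With pooling, every sample is tested exactly once, so the result is the $RMSE$ over the whole dataset.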

\section{Experiments and Results}
\label{sec:expe}

\subsection{Strength}
One of the two major domains we have tested our framework on is the prediction of player
strength.
\subsubsection*{Dataset}
We have collected a~large sample of games from the publicly available
archives of the Kiseido Go Server~\citep{KGSArchives}.
The sample consists of over 100\,000 records of games in the \emph{.sgf} format~\citep{SGF}.

For each rank $r$ in the range of 6-dan to 20-kyu, we gathered a~list
of players $P_r$ of that rank. To avoid biases caused by
different strategies, the sample only consists of games played on a~$19 \times 19$ goban without
handicap stones.
The set of colored games $GC_p$ for a~player $p \in P_r$ consists of the games player $p$
played when he had the rank $r$. We only use $GC_p$ if it contains at least 10 games;
if it contains more than 50 games, we
randomly choose a~subset of it (the size of the subset is chosen uniformly at random
from the interval $\langle 10, 50\rangle$).\footnote{
   By cutting the number of games to a~fixed number (say 50) for large
   samples, we would create an artificial disproportion in the sizes of $GC_p$,
   which could introduce bias into the process.
}
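
The selection rule for $GC_p$ can be sketched in Python as follows; the function and parameter names are hypothetical. Note that \texttt{randint} includes both end points, matching the closed interval $\langle 10, 50\rangle$.

\begin{verbatim}
import random


def sample_games(games, rng, lo=10, hi=50):
    """Select the colored games GC_p used for one player at one rank.

    Returns None when the player has fewer than `lo` games; if there are
    more than `hi`, a random subset is drawn whose size is chosen
    uniformly from <lo, hi> (randint includes both end points).
    """
    if len(games) < lo:
        return None
    if len(games) > hi:
        return rng.sample(games, rng.randint(lo, hi))
    return list(games)


rng = random.Random(0)
subset = sample_games(list(range(120)), rng)
\end{verbatim}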

For each of the 26 ranks, we gathered 120 such $GC_p$.
The target variable $y$ we learn directly corresponds to the ranks:
$y=20$ for the rank of 20-kyu, $y=1$ for 1-kyu, $y=0$ for 1-dan, $y=-5$
for 6-dan, other values similarly. (With increasing strength, $y$
decreases.)
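
The mapping from ranks to the target variable is a~simple shift; the following Python sketch assumes that ranks are written in the common suffix notation such as \texttt{20k} or \texttt{6d}.

\begin{verbatim}
def rank_to_target(rank):
    """Map a rank such as '20k' or '6d' to the target variable y.

    Kyu ranks map to their number (20k -> 20, 1k -> 1) and dan ranks to
    1 - number (1d -> 0, 6d -> -5), so y decreases with strength.  The
    'k'/'d' suffix notation is an assumption of this sketch.
    """
    number, kind = int(rank[:-1]), rank[-1].lower()
    if kind == "k" and 1 <= number <= 20:
        return number
    if kind == "d" and 1 <= number <= 6:
        return 1 - number
    raise ValueError(f"rank outside 20k..6d: {rank}")


ranks = ([f"{n}k" for n in range(20, 0, -1)]
         + [f"{n}d" for n in range(1, 7)])
targets = [rank_to_target(r) for r in ranks]
\end{verbatim}

The 26 ranks thus map to the consecutive integers from 20 down to $-5$, so a~prediction error of 1 corresponds to one rank.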

\subsubsection*{Results}

\subsection{Style}

The second domain is the prediction of different aspects of player styles.

\subsubsection*{Dataset}
The collection of games in this dataset comes from the Games of Go on Disk database by~\citet{GoGoD}.
This database contains more than 70\,000 games, spanning from ancient times
to the present.

We chose a~small subset of well-known players (mainly from the 20th century) and
asked several experts (professional and strong amateur players)
to evaluate these players using a~questionnaire. The experts (Alexander
Dinerchtein 3-pro, Motoki Noguchi 7-dan,
Vladim\'{i}r Dan\v{e}k 5-dan and V\'{i}t Brunner 4-dan)
were asked to rate the players on four scales, each ranging from 1 to 10.

%\begin{table}[h!]
\begin{center}
%\caption{Styles}
\begin{tabular}{|c|c|c|}
\hline
\textbf{Style scale} & \textbf{Value 1} & \textbf{Value 10}\\ \hline
Territoriality & Moyo & Territory \\
Orthodoxity & Classic & Novel \\
Aggressivity & Calm & Fighting \\
Thickness & Safe & Shinogi \\ \hline
\end{tabular}
\end{center}
%\caption[Definition of the style scales]{
%The definition of the style scales.
%}
%\label{tab:style_def}
%\end{table}

The scales try to reflect
some of the traditionally perceived playing styles.\footnote{
    Refer to~\citet{GoGoD:styles} or~\citet{senseis:styles} for a~deeper treatment of the concept.
}
For example, the first scale (\emph{territoriality})
expresses whether a~player prefers safe, yet inherently smaller territory (10 on the scale),
or a~roughly sketched large territory (\emph{moyo}, 1 on the scale), which is, however, insecure.
For each of the selected professionals, we took 192 of his games from the GoGoD database
at random. We divided these games (at random) into 12 colored sets $GC$ of 16 games each.

The target variable $y$ (for each of the four styles) is given by the average of the answers of
the experts. The results of the questionnaire are published in~\citep{style_quest}.
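
For illustration, the random division of one professional's games and the computation of a~style target can be sketched in Python as below; the function names, the fixed random seed and the four example answers are hypothetical.

\begin{verbatim}
import random
from statistics import mean


def split_into_sets(games, n_sets=12, set_size=16, seed=0):
    """Draw n_sets * set_size games at random and cut them into sets.

    With the defaults this takes 192 games of one professional and
    returns 12 disjoint sets of 16 games each; random.sample raises
    ValueError if fewer games are available.
    """
    chosen = random.Random(seed).sample(games, n_sets * set_size)
    return [chosen[i * set_size:(i + 1) * set_size] for i in range(n_sets)]


def style_target(answers):
    """Target y for one style scale: the mean of the experts' answers."""
    return mean(answers)


sets = split_into_sets(list(range(300)))
target = style_target([7, 8, 6, 9])  # four hypothetical expert answers
\end{verbatim}

Because \texttt{random.sample} returns the chosen games in random order, cutting the result into consecutive blocks already yields a~random division.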

\subsubsection*{Results}


\section{Discussion}
\label{sec:disc}

The results in both domains show that our evaluations are useful in predicting
different kinds of player attributes. This might have a~number of possible applications.

So far, we have utilized some of our findings in an online web
application\footnote{\url{http://gostyle.j2m.cz/webapp.html}}. Based on data submitted
by a~user, it computes his evaluation, predicts his playing style
and recommends relevant professional players to review.

Other possible applications include helping ranking algorithms to converge faster.
Usually, the rank of a~player is determined from his opponents' ranks by looking
at the numbers of wins and losses (e.g.\ by computing an Elo rating). Our methods might
improve this by including domain knowledge.
Similarly, a~computer Go program could quickly classify the level of its
human opponent based on the evaluation of their previous games
and auto-adjust its difficulty settings accordingly
to provide more even games for beginners.

It is also possible to study dependencies between single elements of the evaluation vector
and the target variable $y$ directly. By pinpointing, e.g., the patterns
that correlate most strongly with low strength (players who play them are weak), we can
warn the user not to play these. We have made some initial research into this
in~\citep{Moudrik13}; we do not present these results here because of space constraints.

\section{Conclusion}
\label{sec:conc}
This article presents a~method for evaluating a~player based on a~sample of his games.
These summary evaluations turn out to be useful in many cases -- they allow us to predict
different player attributes (such as strength or playing style) with reasonable accuracy.
We hope that the applications of these findings can help to improve both human and computer
understanding of the game of Go.

\section{Implementation}
\label{sec:impl}

The code used in this work
is released online as a~part of the GoStyle project~\citep{GoStyleWeb}.
The majority of the source code is implemented in
the Python programming language~\citep{Python27}.

The machine learning part was realized using the
Orange data mining suite~\citep{curk05}, with the exception of the neural networks,
which use the Fast Artificial Neural Network library FANN~\citep{Nissen2003}.
We used the Pachi Go engine~\citep{Pachi} for the raw game processing.

\bibliographystyle{abbrvnat}
\bibliography{clanek}

\end{document}