
%% Character encoding used: usually latin2, cp1250, or utf8:

%% Other packages

%\hypersetup{pdftitle=Meta-learning methods for analyzing Go playing trends}
%\hypersetup{pdfauthor=Josef Moudřík}

%
% paper title
% can use linebreaks \\ within to get better formatting as desired
%\title{On Move Pattern Trends\\in Large Go Games Corpus}

% use \thanks{} to gain access to the first footnote area
% a separate \thanks must be used for each paragraph as LaTeX2e's \thanks
% was not built to handle multiple paragraphs

\thanks{J. Moud\v{r}\'{i}k is a student at the Faculty of Mathematics and Physics, Charles University, Prague, CZ.},~Petr~Baudi\v{s}%
in this area has been supported by the SUSE Labs and the Department
of Applied Mathematics, Faculty of Mathematics and Physics, Charles University.}}
\maketitle

We propose a~way of extracting and aggregating per-move evaluations from sets of Go game records.
The evaluations capture different aspects of the games, such as the patterns played
or statistics of sente/gote sequences (among others); using machine learning
algorithms, they can be used to predict arbitrary relevant target variables.
We apply this methodology to predict the strength and playing style (e.g.
territoriality or aggressivity) of a player, and make our predictor
available as an online tool as part of the GoStyle project.

%% No, there is no room for this in the article, since it is limited to 8 pages
% moreover, I would like to publish these things separately and in more detail,
%
%By inspecting the dependencies between the evaluations and the target variable,
%we are able to tell which patterns are bad or good (in case of strength as the
%target variable), or which moves e.g. constitute the territorial style of play.
%%

We propose a number of possible applications, including seeding real-world ranks
of internet players, aiding in Go study and the tuning of Go-playing programs, or
contributing to the Go-theoretical discussion on the scope of ``playing style''.

The field of Computer Go usually focuses on the problem
of creating a~program to play the game by finding the best move in a~given
position. In this work, we instead analyze existing game
records with the aim of helping humans to play and understand the game better.

Go is a~two-player full-information board game played on a~square grid
(usually $19\times19$ lines) with black and white
stones; the goal of the game is to surround the most territory and
capture enemy stones. We assume basic familiarity with the game.

Since the game enjoys worldwide popularity, large collections
of Go game records have been compiled, covering both amateur and professional games.
So far, not much has been done to analyze these records using computers.

There are programs that serve as tools to study the opening phase of the game
by giving simple statistics of next-move candidates based on professional
games. Professional games have also been used in computer Go:
patterns from professional games
are used as a heuristic to improve the tree search. We are, however, not aware of
any principally different usage.

In this paper, we present a deeper approach. We extract different
kinds of information from the records to create a complex evaluation of a game sample,
composed of independent features -- each of the features
captures a different aspect of the sample. For example,
we use statistics of the most frequent
local patterns played, statistics of high and low plays
in different game stages, etc.

Using machine learning, the evaluation of the sample
can be used to predict relevant variables. In this work,
for instance,
the sample consists of the games of a player,
and we predict his strength or playing style.

The rest of this paper is organized as follows. First, we
present the features comprising the evaluation. Next, we describe the machine
learning method we have used. Finally, we present the two
datasets -- for prediction of strength and style -- and
show how precisely the prediction can be conducted.

This section presents the methods for extracting the evaluation
vector (we call it $ev$) from a set of games. Because we aggregate
data by player, each game in the set is accompanied by the color
which specifies our player of interest.

The evaluation vector $ev$ is composed by concatenating several features, such as the
aforementioned local patterns or statistics of sente and gote
sequences. These will be detailed in the rest of this section.
Some of the details are omitted for brevity,

} are processed by the Pachi Go
program, which outputs information
about each move in the game.
For each move,
Pachi outputs a list of key-value pairs regarding the current move:

the nearest edge of the board,

We use this information to compute the higher-level features given below.

The spatial pattern pictures the positions of stones around the current move up to
a certain distance.
}

The first feature collects statistics of the $N = 400$ most frequently occurring
spatial patterns (together with both atari flags). The list of the $N$ most frequently
played patterns is computed beforehand from the whole database of games. The patterns
are normalized to be black to play and are invariant under rotation and mirroring.

Given a set of colored games $GC$, we then count how many times each of the $N$
patterns was played.
With a simple occurrence count, however, the particular counts $c_i$ increase proportionally to
the number of games in $GC$. To maintain invariance under the number of games in the sample,
normalization is needed. We do this by dividing the vector $\vec c$ by $|GC|$, though other schemes
are possible.
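As a concrete sketch of this step (the move representation, pattern identifiers, and game iterator below are illustrative assumptions; in the actual pipeline the pattern matching is done by Pachi), the counting and $|GC|$-normalization could look like:

```python
from collections import Counter

def pattern_feature(colored_games, top_patterns):
    """Count how often each of the N pre-selected spatial patterns was
    played by the player of interest, normalized by the sample size |GC|.
    colored_games: list of (game, color); each game is a list of
    (player, pattern_id) pairs -- a placeholder representation."""
    counts = Counter()
    for game, color in colored_games:
        for player, pattern_id in game:
            if player == color and pattern_id in top_patterns:
                counts[pattern_id] += 1
    # divide by |GC| so the feature is invariant to the sample size
    n_games = len(colored_games)
    return [counts[p] / n_games for p in top_patterns]
```

The returned vector corresponds to $\vec c / |GC|$ for the $N$ pre-selected patterns.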

Because the concept of sente and gote is very important in real games, we devised
a statistic which tries to capture the distribution of sente and gote plays in the games
from the sample. Because deciding which moves are sente or gote can be hard even
for human players, we restricted ourselves to what we call $\omega$-local (sente
and gote) sequences. We say that a move is $\omega$-local (with respect
to the previous move) if its gridcular distance from the previous move
does not exceed $\omega$.
The simplification rests on a clear assumption -- the responses to
a sente move are always local.
Of course, this assumption might not always hold, but
the feature proves to be useful nonetheless.

We then partition each game into $\omega$-local sequences (that is, each move in the
sequence is $\omega$-local with respect to the directly preceding move) and observe
whether the player who started the sequence differs from the player who ended it.
If so, the $\omega$-local sequence is said to be sente for the player who started it,
because he gets to play somewhere else first (tenuki). Similarly, if the player who
started the sequence had to respond last, we say that the sequence is gote for him.
Based on this partitioning, we can count the average numbers of sente and gote
sequences per game in the sample $GC$. These two numbers, along with their difference,
form the second feature.
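The partitioning described above can be sketched as follows; the gridcular formula in the comment, the move representation, and the default value of $\omega$ are assumptions for illustration:

```python
def gridcular(a, b):
    # one common definition of the "gridcular" metric (an assumption here):
    # d = |dx| + |dy| + max(|dx|, |dy|)
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return dx + dy + max(dx, dy)

def sente_gote_feature(games, omega=10):  # omega value is hypothetical
    """Average numbers of sente and gote sequences per game.
    games: list of games; each game is a list of (player, (x, y)) moves."""
    sente = gote = 0
    for moves in games:
        i = 0
        while i < len(moves):
            j = i
            # extend the omega-local sequence while consecutive moves stay close
            while j + 1 < len(moves) and gridcular(moves[j][1], moves[j + 1][1]) <= omega:
                j += 1
            starter, ender = moves[i][0], moves[j][0]
            if starter != ender:
                sente += 1  # the starter got to play elsewhere first (tenuki)
            else:
                gote += 1   # the starter had to answer last (lone moves count here too)
            i = j + 1
    n = len(games)
    return (sente / n, gote / n, (sente - gote) / n)
```

Note that this simple partition also classifies single-move "sequences"; how such moves should be counted is a design choice not restated in the text.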

The third feature is a two-dimensional histogram counting the average number of moves
in the sample played low or high in different game stages. The original idea was to help
distinguish between territorial and influence-based moves in the opening.
The first dimension is specified by the move's border distance,
the second one by the number of the current move from the beginning of the game.
The granularity of each dimension is given by intervals dividing the domains.

We use a four-interval $ByDist$ division for the border distance dimension
(the first intervals capture low plays close to the border, with
higher plays for the rest).

The move number division is given by
$$ByMoves = \{ \langle1, 10\rangle, \langle 11, 64\rangle, \langle 65,200\rangle, \langle 201, \infty)\}$$
with the motivation to (very roughly) distinguish between the opening, middle game,
and endgame.

If we use the $ByMoves$ and $ByDist$ intervals to divide the domains, we obtain a histogram
of $4 \times 4 = 16$ fields. For each move, we increase the count in the
appropriate histogram field. In the end, the whole histogram is normalized
to establish invariance under the number of games scanned by dividing the
histogram elements by $|GC|$. These 16 numbers form the third feature.
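A minimal sketch of the histogram computation follows; $ByMoves$ matches the definition above, while the $ByDist$ intervals and the move representation are hypothetical placeholders:

```python
BY_MOVES = [(1, 10), (11, 64), (65, 200), (201, float("inf"))]  # from the text
BY_DIST = [(1, 2), (3, 3), (4, 4), (5, float("inf"))]           # hypothetical

def bucket(value, intervals):
    """Index of the interval containing value, or None."""
    for k, (lo, hi) in enumerate(intervals):
        if lo <= value <= hi:
            return k
    return None

def border_histogram(colored_games):
    """4x4 histogram: average per-game counts of the player's moves at each
    (border distance, game stage) combination, normalized by |GC|.
    Each move is a (player, border_dist, move_no) tuple -- a placeholder."""
    hist = [[0.0] * len(BY_MOVES) for _ in BY_DIST]
    for moves, color in colored_games:
        for player, dist, move_no in moves:
            if player != color:
                continue
            i, j = bucket(dist, BY_DIST), bucket(move_no, BY_MOVES)
            if i is not None and j is not None:
                hist[i][j] += 1
    n = len(colored_games)  # normalize by |GC|
    return [[c / n for c in row] for row in hist]
```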

Apart from the border distance feature, we also maintain a two-dimensional histogram
which counts the numbers of captured stones in different game stages. The motivation is
simple -- beginners especially tend to capture stones because ``they could'' instead of
because it is the ``best move''. For example, such a capture might
be a grave mistake in the opening.

As before, one of the dimensions is given by intervals
which try to specify the game stages (roughly: opening, middle game, endgame).
The second dimension has a fixed size of three bins. Along with the number of captives
of the player of interest (the first bin), we also count the number of his
opponent's captives (the second bin) and the difference between the two numbers
(the third bin).

Again, the colored games $GC$ are processed move by move, increasing
the counts of captured stones (or adding 0) in the appropriate field.
The 9 numbers (again normalized by dividing by $|GC|$) are the output of the fourth
feature.
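The fourth feature can be sketched analogously; the three game-stage intervals and the move representation below are hypothetical:

```python
STAGES = [(1, 60), (61, 180), (181, float("inf"))]  # hypothetical stage split

def capture_feature(colored_games):
    """3 stages x 3 bins: player's captives, opponent's captives, and their
    difference, averaged over the games in GC. Each move is a
    (player, move_no, stones_captured) tuple -- a placeholder."""
    hist = [[0.0, 0.0, 0.0] for _ in STAGES]
    for moves, color in colored_games:
        for player, move_no, captured in moves:
            for s, (lo, hi) in enumerate(STAGES):
                if lo <= move_no <= hi:
                    if player == color:
                        hist[s][0] += captured  # our captives
                    else:
                        hist[s][1] += captured  # opponent's captives
    for s in range(len(STAGES)):
        hist[s][2] = hist[s][0] - hist[s][1]    # the difference bin
    n = len(colored_games)                      # normalize by |GC|
    return [[c / n for c in row] for row in hist]
```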

Finally, we used a simple feature that keeps statistics of wins and losses\footnote{
We disregard forfeited, unfinished, or jigo games in this feature,
because the frequency of these events is so small that it would
require a very large dataset to utilize them reliably.
}.

For example, many weak players continue playing games that are already lost
until the end, mainly because their counting is not very good (they do not
know there is no way to win), while professionals do not hesitate to resign
if they think that nothing can be done.

In the colored games of $GC$, we count how many times the player of interest:
\begin{itemize}
\item won by counting,
\item won by resignation,
\item lost by counting,
\item and lost by resignation.
\end{itemize}

Again, we divide these four numbers by $|GC|$ to maintain invariance under the number of games
in $GC$. Furthermore, for the games won or lost by counting, we record the average
size of the win or loss in points. These six numbers form the last feature.
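A sketch of this feature, assuming a simple per-game result record (the representation is illustrative):

```python
def winloss_feature(results):
    """Six numbers: frequencies of wins/losses by counting/resignation,
    plus the average point margins of games decided by counting.
    results: list of (won: bool, by_counting: bool, margin or None)."""
    n = len(results)
    win_cnt = sum(1 for w, c, _ in results if w and c)
    win_res = sum(1 for w, c, _ in results if w and not c)
    loss_cnt = sum(1 for w, c, _ in results if not w and c)
    loss_res = sum(1 for w, c, _ in results if not w and not c)
    # average margins only over games decided by counting
    win_margins = [m for w, c, m in results if w and c]
    loss_margins = [m for w, c, m in results if not w and c]
    avg_win = sum(win_margins) / len(win_margins) if win_margins else 0.0
    avg_loss = sum(loss_margins) / len(loss_margins) if loss_margins else 0.0
    return (win_cnt / n, win_res / n, loss_cnt / n, loss_res / n,
            avg_win, avg_loss)
```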

So far, we have considered how to turn a set of colored games $GC$ into
an evaluation vector. Now, we are going to show how to utilize the evaluation.
To predict various player attributes, we start with an input dataset $D$ consisting
of pairs $(GC_i, y_i)$, where $GC_i$
corresponds to the set of colored games of the $i$-th player and $y_i$ is the
target attribute. The $y_i$ might be fairly arbitrary, as long as it has
a numerical representation.

Now, let us denote the evaluation process presented above as $eval$ and
let $ev_i$ be the evaluation of the $i$-th player, $ev_i = eval(GC_i)$. Then,
the pairs $(ev_i, y_i)$ form
our training dataset.

The task of our machine learning algorithm is to generalize the knowledge
from the training dataset to unseen samples.
In the case of strength, we might therefore be able to predict the strength $y_X$
of an unknown player $X$ given a set of his games $GC_X$ (from which we can
compute the evaluation $ev_X$).

After testing multiple approaches, we settled on using a bagged artificial neural network
as the predictor.
Neural networks are a standard technique in machine learning. The network is
composed of simple computational units which are organized in a layered topology.
We have used a simple feedforward neural network with 20 hidden units.
Bagging combines
$N$ models (trained on differently sampled data) to improve their
performance and robustness. In this work, we used a bag of $N=20$ neural networks
(as described in the previous paragraph).
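A bagged predictor along these lines can be sketched as follows; scikit-learn's MLPRegressor stands in for the paper's network, and the training parameters shown (iteration limit, seeding) are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_bagged_nn(X, y, n_models=20, hidden=20, seed=0):
    """Bagging: train n_models networks, each on a bootstrap resample
    of the training data (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    models = []
    for i in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
        net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000,
                           random_state=i)
        net.fit(X[idx], y[idx])
        models.append(net)
    return models

def predict_bagged(models, X):
    # the bag's prediction is the average of the individual predictions
    return np.mean([m.predict(X) for m in models], axis=0)
```

Averaging over differently resampled networks reduces the variance of a single network's predictions, which is the point of bagging here.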

To assess the efficiency of our method and give estimates of its precision for
unseen inputs, we divide the dataset into training
and testing parts and compute the error of the method on the testing part.

A commonly used measure is the mean square error ($MSE$), which estimates the variance of
the error distribution. We use its square root ($RMSE$), which is an estimate of the
standard deviation of the predictions:
$$RMSE = \sqrt{\frac{1}{|T_s|} \sum_{(ev, y) \in T_s} \left( predict(ev) - y \right)^2}$$
where the machine learning model $predict$ is trained on the
training data $T_r$ and $T_s$ denotes the testing data.

To obtain robust error estimates, we use $k$-fold cross-validation,
which is a standard statistical technique for robust estimation of parameters:
we iteratively compose the training and testing sets and measure the errors.
%In each of the $k$ iterations, the $k$-th fold is chosen as the testing data, and
%all the remaining $k-1$ folds form the training data. The division into the folds is
%done randomly, and so that the folds have approximately the
%same size.
In this work, we have used 5-fold cross-validation.
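The evaluation procedure can be sketched as follows; the mean-regression model in the usage example mirrors the reference baseline discussed in the experiments:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: an estimate of the standard deviation
    of the prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def kfold_rmse(X, y, train_and_predict, k=5, seed=0):
    """k-fold cross-validation: each fold serves once as the testing set,
    the remaining k-1 folds form the training set; fold errors are averaged.
    train_and_predict(X_tr, y_tr, X_ts) -> predictions for X_ts."""
    X, y = np.asarray(X), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)  # random, roughly equal-sized folds
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        preds = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(rmse(y[test_idx], preds))
    return float(np.mean(errors))
```

For instance, a mean-regression baseline is `lambda X_tr, y_tr, X_ts: np.full(len(X_ts), y_tr.mean())`.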

One of the two major domains we have tested our framework on is the prediction of player
strength.
We have collected a large sample of games from a public Go server; for each rank $r$, we have a
list of players $P_r$ of the particular rank. To avoid biases caused by
handicap games, we only use games played without
handicap stones.

The set of colored games $GC_p$ for a~player $p \in P_r$ consists of the games player $p$
played when he had the rank $r$. We only use $GC_p$ if the number of
games is sufficiently large; for larger samples, we
randomly choose a subset of the sample (the size of the subset is chosen uniformly
at random)\footnote{
By cutting the number of games to a fixed number (say 50) for large
samples, we would create an artificial disproportion in the sizes of $GC_p$,
which could introduce bias into the process.
}

The target variable $y$ to learn from directly corresponds to the ranks: $y = -5$
for 6-dan, other values similarly. (With increasing strength, the $y$
decreases.)

We compared our predictor
with a simple reference method of mean regression, which constantly
predicts the average of the strengths in the dataset regardless of the evaluation vector.
The results show that the prediction of strength has a standard deviation $\sigma$
(estimated by the $RMSE$ error)
of approximately $2.7$ ranks. Under the assumption of normality
of errors, we can say that 68\% of predictions fall within a distance of
$\sigma$ from the real strength.

\hline
\hline
Bagged NN & 2.712 \\
\hline

\caption{
$RMSE$ performance of the strength prediction. The mean regression is
a reference model which predicts a constant value (the average of the
strengths in the dataset) regardless of the set of games. The results
are computed by 5-fold cross-validation.
}

The second domain is the prediction of different aspects of player styles.
The collection of games in this dataset comes from the Games of Go on Disk database by \citet{GoGoD}.
This database contains more than 70 000 professional games, spanning from ancient times
to the present.

We chose a small subset of popular professional players (mainly from the 20th century) and
asked several experts (professional and strong amateur players)
to evaluate these players using a questionnaire. The experts (Alexander

%\begin{table}[h!]
%\caption{Styles}
\hline
Territoriality & Moyo & Territory \\
Orthodoxity & Classic & Novel \\
Aggressivity & Calm & Fighting \\
Thickness & Safe & Shinogi \\ \hline

%\caption[Definition of the style scales]{
%The definition of the style scales.
%}
%\label{tab:style_def}
%\end{table}

The scales try to reflect some traditionally perceived aspects of playing style.
}
For example, the scale of territoriality
stresses whether a player prefers safe, yet inherently smaller territory (number 10 on the scale),
or a~large, more risky moyo (number 1).

For each of the selected professionals, we took 192 of his games from the GoGoD database.
The target variable $y$ (for each of the four styles) is given by the average of the answers of
the experts. We consider a
standard deviation from the correct answers of around 1.6 to be good precision.

We should note that the mean regression has a very small $RMSE$
for the scale of thickness.
This stems from the fact that the experts' answers in the questionnaire
themselves have very little variance --- our conclusion is that the scale of
thickness was probably not defined clearly enough.

\hline
&
\hline
\hline
\hline

\caption{
$RMSE$ performance for the prediction of different styles. The mean regression is
a reference model which predicts a constant value (the average of the
values of each particular style) regardless of the set of games. The results
are computed by 5-fold cross-validation.
}

The results in both domains show that our evaluations are useful in predicting
different kinds of player attributes. This can have a number of possible applications.

So far, we have utilized some of our findings in an online web application. Based on the games submitted
by the users, it evaluates their games, predicts their playing style,
and recommends relevant professional players to review. Of course,
our methods for style estimation are trained on very strong players and might
thus not fully generalize to ordinary players. Weak players might not have a consistent
style, or the whole concept of style might not even be applicable to them. This
effect is, however, not easy to estimate, since we do not have data about weak players' styles.
Our webapp allows the users to submit their own opinion about their style, so we should
be able to account for this effect in future research.

Other possible applications include helping ranking algorithms to converge faster ---
usually, the ranking of a player is determined from his opponents' rankings by looking
at the numbers of wins and losses (e.g. by computing an Elo rating). Our methods might improve this
by including domain knowledge.
Similarly, a computer Go program can quickly classify the level of its
human opponent based on the evaluation of their previous games
and auto-adjust its difficulty settings accordingly,
to provide more even games for beginners.

It is also possible to study dependencies between single elements of the evaluation vector
and the target variable $y$ directly. By pinpointing e.g. the patterns
that correlate most strongly with weak strength (players who play them are weak), we can
warn users not to play them. We have made some initial research into this in~\citep{Moudrik13};
we do not present these results here because of space constraints.

We presented a method for evaluating players based on a sample of their games.
These summary evaluations turn out to be useful in many cases --- they allow us to predict
different player attributes (such as strength or playing style) with reasonable accuracy.
We believe that the applications of these findings can help to improve both human and computer
understanding of the game of Go.

The code used in this work
The majority of the source code is implemented in
The machine learning was implemented using the