
%% Character encoding used: usually latin2, cp1250 or utf8:

%% Other packages

34 %\hypersetup{pdftitle=Meta-learning methods for analyzing Go playing trends}

35 %\hypersetup{pdfauthor=Josef Moudřík}

38 %

39 % paper title

40 % can use linebreaks \\ within to get better formatting as desired

41 %\title{On Move Pattern Trends\\in Large Go Games Corpus}

44 % use \thanks{} to gain access to the first footnote area

45 % a separate \thanks must be used for each paragraph as LaTeX2e's \thanks

46 % was not built to handle multiple paragraphs

\thanks{J. Moud\v{r}\'{i}k is a student at the Faculty of Mathematics and Physics, Charles University, Prague, CZ.},~Petr~Baudi\v{s}%

50 Charles University, Prague, CZ, and also does some of his Computer

51 Go research as an employee of SUSE Labs Prague, Novell CZ.}}

52 \maketitle

55 We propose a~way of extracting a per-move evaluation of sets of Go game records.

56 The evaluations capture different aspects of the games such as patterns played

57 or statistics of sente/gote sequences (among others); using machine learning

58 algorithms, they can be used to predict arbitrary relevant target variables.

59 We apply this methodology to predict strength and playing style (e.g.

60 territoriality or aggressivity) of a player and make our predictor

61 available as an online tool, a part of the GoStyle project.

%% Well, there is no room for this in the article, since it is supposed to have only 8 pages;
% besides, I would like to publish these things on their own, somewhat more thoroughly,

64 %

65 %By inspecting the dependencies between the evaluations and the target variable,

66 %we are able to tell which patterns are bad or good (in case of strength as the

67 %target variable), or which moves e.g. constitute the territorial style of play.

68 %%

We propose a number of possible applications including seeding real-world ranks
of internet players, aiding in Go study and tuning of Go-playing programs, or
contribution to Go-theoretical discussion on the scope of ``playing style''.

The field of Computer Go usually focuses on the problem
of creating a~program to play the game, finding the best move from a~given
board position. We instead focus on analyzing existing game
records with the aim of helping humans to play and understand the game better.

Go is a~two-player full-information board game played
on a~square grid (usually $19\times19$ lines) with black and white
stones; the goal of the game is to surround the most territory and
capture enemy stones. We assume basic familiarity with the game.

Since the game is popular worldwide, there exist large collections
of Go game records, both for amateur players and professionals.

So far, not much has been done in analysing these records using computers.
There are programs that serve as tools to study the opening phase of the game
by giving simple statistics of the next move from professional games.
The professional games have also been used in computer Go;
patterns from the professional games
are used as a heuristic to improve the tree search.
We are not aware of any other uses.

In this work, we present a deeper approach. We extract different
kinds of information from the records to create a complex
evaluation of the game sample. The evaluation is a vector
composed of independent features -- each of the features
captures a different aspect of the sample. For example,

106 we use statistics of most frequent

107 local patterns played, statistics of high and low plays

108 in different game stages, etc.

Using machine learning, the evaluation of the sample
can be used to predict relevant variables. In this work,
for instance,
the sample consists of the games of a player
and we predict his strength or playing style.

117 presents the features comprising the evaluation.

119 learning method we have used.

121 datasets -- for prediction of strength and style -- and

show how precisely the prediction can be conducted.

127 This section presents the methods for extracting the evaluation

128 vector (call it $ev$) from a set of games. Because we should

129 distinguish between both players in any particular game,

130 each game in the

131 set is accompanied by the color which specifies our player of

134 the $color_1$ specifies the player of interest in $game_1$.

136 The evaluation vector $ev$ is composed by concatenating several

138 aforementioned local patterns or statistics of sente and gote

139 sequences. These will be detailed in the rest of this section.

140 Some of the explanations are simplified to fit the size of

First, we need to specify how we process the games.

147 }

We have used the Pachi Go engine which -- apart
from being a fairly strong Go bot -- allows us to extract
raw information from each game on a per-move basis.

152 For each move,

153 Pachi outputs a list of key-value pairs regarding the current move:

162 the nearest edge of the board,

166 We use this information to compute the higher level features given below.

167 The spatial pattern pictures positions of stones around the current move up to

174 }

The first feature collects statistics of the $N = 400$ most frequently occurring
spatial patterns (together with both atari flags). The list of the $N$ most frequently
played patterns is computed beforehand from the whole database of games.

Given a set of colored games $GC$, we then count how many times each of the $N$
patterns was played -- thus obtaining a vector $c$ of counts ($|c| = 400$).
With a simple occurrence count, however, the particular counts $c_i$ increase proportionally to
the number of games in $GC$. To maintain invariance under the number of games in the sample,
a normalization is needed. We do this by dividing $c$ by $|GC|$, though other schemes
are possible.

Because the concept of sente and gote is very important in real games, we devised
a statistic which tries to capture the distribution of sente and gote plays in the games
from the sample. Because deciding which moves are sente or gote can be hard even
for human players, we restricted ourselves to what we call $\omega$-local (sente
and gote) sequences. The simplification rests on a clear assumption -- the responses to
a sente move are always local. We say that a move is $\omega$-local (with respect
to the previous move) if its gridcular distance from the previous move

198 }).

199 Of course, this assumption might not always hold, but

200 the feature proves to be useful nonetheless.

We then partition each game into $\omega$-local sequences (that is, each move in the
sequence is $\omega$-local with respect to the move directly preceding it) and observe
whether the player who started the sequence is different from the player who ended it.
If so, the $\omega$-local sequence is said to be sente for the player who started it
because he gets to play somewhere else first (tenuki). Similarly, if the player who
started the sequence also had to play the last move in it, we say that the sequence is gote for him.
Based on this partitioning, we can count the average number of sente and gote
sequences per game from the sample $GC$. These two numbers, along with their difference,
form the second feature.
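The partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in this work: the gridcular distance is written as $d_x + d_y + \max(d_x, d_y)$, which is how we recall Pachi defining it; the threshold \texttt{omega=10} and the non-strict comparison are placeholders; and only sequences started by the player of interest are counted.

```python
from typing import List, Tuple

Move = Tuple[str, int, int]  # (color, x, y); passes are assumed to be filtered out


def gridcular(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Gridcular distance, assumed here to be dx + dy + max(dx, dy)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return dx + dy + max(dx, dy)


def split_local_sequences(moves: List[Move], omega: int) -> List[List[Move]]:
    """Partition a game into maximal runs in which every move is
    omega-local with respect to the move directly preceding it."""
    sequences: List[List[Move]] = []
    for move in moves:
        prev = sequences[-1][-1] if sequences else None
        if prev is not None and gridcular(prev[1:], move[1:]) <= omega:
            sequences[-1].append(move)   # still local: extend the current run
        else:
            sequences.append([move])     # tenuki: start a new run
    return sequences


def sente_gote_feature(colored_games, omega: int = 10):
    """colored_games: list of (moves, color_of_interest) pairs.
    Returns (avg sente, avg gote, avg sente - avg gote) per game.
    omega=10 is a hypothetical threshold, not a value taken from the text."""
    sente = gote = 0
    for moves, color in colored_games:
        for seq in split_local_sequences(moves, omega):
            if seq[0][0] != color:
                continue                 # sequence started by the opponent
            if seq[-1][0] != color:
                sente += 1               # opponent answered last
            else:
                gote += 1                # our player had to play last
    n = len(colored_games)
    return (sente / n, gote / n, (sente - gote) / n)
```

Note that, read literally, a one-move sequence counts as gote for the player who played it; whether such sequences should be discarded is a design choice the sketch does not settle.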

The third feature is a two-dimensional histogram, counting the average number of moves
in the sample played low or high in different game stages. The original idea was to help
distinguish between territorial and influence-based moves in the opening.

217 The first dimension is specified by

218 the move's border distance, the second one by the number of the current move. The size of each

219 dimension is given by intervals dividing the domains.

220 We use

221 $$ByMoves = \{ \langle1, 10\rangle, \langle 11, 64\rangle, \langle 65,200\rangle, \langle 201, \infty)\}$$

222 for the move coordinate -- the motivation is to (very roughly) distinguish

224 and endgame.

225 The border distance dimension is given by

228 higher plays for the rest).

If we use the $ByMoves$ and $ByDist$ intervals to divide the domains, we obtain a histogram
with a total of $|ByMoves| \times |ByDist| = 16$ fields. For each move, we increase the count in the
appropriate histogram field. In the end, the whole histogram is normalized
to establish invariance under the number of games scanned by dividing the
histogram elements by $|GC|$. These 16 numbers form the third feature.
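As an illustration, the histogram can be computed along the following lines. The \texttt{BY\_MOVES} intervals are the $ByMoves$ intervals given above; the \texttt{BY\_DIST} intervals are hypothetical placeholders (only the fact that there are four of them follows from the text), and the input format, a list of (move number, border distance) pairs for the moves of the player of interest, is likewise an assumption of this sketch.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]

BY_MOVES: List[Interval] = [(1, 10), (11, 64), (65, 200), (201, math.inf)]
# Hypothetical border-distance bins; the actual ByDist is not reproduced here.
BY_DIST: List[Interval] = [(1, 2), (3, 3), (4, 4), (5, math.inf)]


def bin_index(value: float, intervals: Sequence[Interval]) -> int:
    """Index of the interval that contains value."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= value <= hi:
            return i
    raise ValueError(f"{value} falls outside all intervals")


def border_distance_histogram(games: List[List[Tuple[int, int]]]) -> List[float]:
    """games: one list per game of (move_number, border_distance) pairs.
    Returns the 16 histogram fields, normalized by the number of games."""
    hist = [[0.0] * len(BY_MOVES) for _ in BY_DIST]
    for game in games:
        for move_number, border_distance in game:
            row = bin_index(border_distance, BY_DIST)
            col = bin_index(move_number, BY_MOVES)
            hist[row][col] += 1
    n = len(games)
    # Flatten row by row and divide by |GC|.
    return [count / n for row in hist for count in row]
```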

Apart from the border distance feature, we implemented a two-dimensional histogram
which counts the numbers of captured stones in different game stages. The motivation is
simple -- beginners especially tend to capture stones because ``they could'' instead of
because it is the ``best move''. For example, in the opening such a capture might
be a grave mistake.

243 As before, one of the dimensions is given by intervals

245 which try to specify the game stages (opening, middle game, endgame).

The second dimension has a fixed size of three bins. Along with the number of captives
of the player of interest (the first bin), we also count the number of his
opponent's captives (the second bin) and the difference between the two numbers
(the third bin).
These 9 numbers (again normalized by dividing by $|GC|$) are the output of the fourth
feature.

254 Finally, we came up with a simple feature which makes statistics of

256 We disregard forfeited, unfinished or jigo games in this feature

257 because the frequency of these events is so small it would

258 require a very large dataset to utilize them reliably.

259 }.

For example, quite a lot of weak players continue playing already lost games
until the end, mainly because their counting is not very good (they do not
know there is no way to win), while professionals do not hesitate to resign
if they think that nothing can be done.

For the colored games of $GC$ we count how many times the player of interest:
\item won standardly,
\item won by resignation,
\item lost standardly,
\item and lost by resignation.

Again, we divide these four numbers by $|GC|$ to maintain invariance under the number of games
in $GC$. Furthermore, for the games won or lost standardly we count:

275 \item average number of points the player won by for won games,

276 \item average number of points he lost by for lost games.

278 The six numbers form the last feature.
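A minimal sketch of how these six numbers could be assembled follows. The \texttt{Result} tuple and the choice of reporting an average margin of $0$ when no game was decided by counting are assumptions of the example, not details taken from our implementation.

```python
from typing import List, Optional, Tuple

# (won, by_resignation, margin); margin is the point difference for games
# decided by counting and None for resignations. Hypothetical record format.
Result = Tuple[bool, bool, Optional[float]]


def win_loss_feature(results: List[Result]) -> List[float]:
    """Four normalized outcome counts followed by two average margins."""
    n = len(results)
    win_margins = [m for won, resigned, m in results if won and not resigned]
    loss_margins = [m for won, resigned, m in results if not won and not resigned]
    wins_by_resignation = sum(1 for won, resigned, _ in results if won and resigned)
    losses_by_resignation = sum(1 for won, resigned, _ in results if not won and resigned)

    def mean(xs: List[float]) -> float:
        # Assumption: report 0.0 when there is nothing to average.
        return sum(xs) / len(xs) if xs else 0.0

    return [
        len(win_margins) / n,        # won standardly
        wins_by_resignation / n,     # won by resignation
        len(loss_margins) / n,       # lost standardly
        losses_by_resignation / n,   # lost by resignation
        mean(win_margins),           # average winning margin
        mean(loss_margins),          # average losing margin
    ]
```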

So far, we have learned how we can turn a set of colored games $GC$ into
an evaluation. Now, we are going to study how to utilize the evaluation.

284 If we are to predict various player attributes, we need some input data

285 to learn from. Suppose we have a dataset $D$ consisting

287 corresponds to a set of colored games of $i$-th player and $y_i$ is the

288 target attribute. The $y_i$ might be fairly arbitrary, as long as it has

Now, let us denote the evaluation process presented above as $eval$ and
let $ev_i$ be the evaluation of the $i$-th player, $ev_i = eval(GC_i)$. Then,

294 our training data.

295 The task of our machine learning algorithm is to generalize the knowledge

297 In the case of strength, we might therefore be able to predict strength $y_X$

298 of an unknown player $X$ given a set of his games $GC_X$ (from which we can

299 compute the evaluation $ev_X$).

301 In this work, we have used a bagged artificial neural network

302 to learn the dependency.

303 Neural networks are a standard technique in machine learning. The network is

304 composed of simple computational units which are organized in a layered topology.

306 We have used a simple feedforward neural network with 20 hidden units, trained

310 $N$ models (trained on differently sampled data) to improve their

311 performance and robustness. In this work, we used $N=20$. Please refer to the

312 paper to learn more about bagging.
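To make the idea concrete, the following sketch shows bagging for a regression target: each model is trained on a bootstrap resample of the training data and the predictions are averaged. The base learner here is a 1-nearest-neighbour stand-in chosen to keep the example self-contained; it is not the neural network used in this work, and the function names are illustrative only.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]      # (evaluation vector, target value)
Model = Callable[[Sequence[float]], float]  # maps an evaluation to a prediction


def train_nearest_neighbour(data: List[Sample]) -> Model:
    """Stand-in base learner: predict the target of the closest training sample."""
    def predict(ev: Sequence[float]) -> float:
        closest = min(data, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], ev)))
        return closest[1]
    return predict


def train_bagged(data: List[Sample],
                 train: Callable[[List[Sample]], Model],
                 n_models: int = 20,
                 seed: int = 0) -> Model:
    """Train n_models base learners on bootstrap resamples and average them."""
    rng = random.Random(seed)
    models = [
        train([rng.choice(data) for _ in data])  # sample |data| items with replacement
        for _ in range(n_models)
    ]
    return lambda ev: mean(m(ev) for m in models)
```

Because every resample leaves out some of the training examples, the individual models differ slightly, and averaging them smooths out part of that variance.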

learning algorithm (in our case the bagged neural network). A performance measure
allows us to compare different algorithms and to estimate the precision of the method for

319 and testing parts and compute the error of the method on the testing part.

A commonly used measure is the mean square error ($MSE$) which estimates the variance of
the error distribution. We use its square root ($RMSE$) which is an estimate of
the standard deviation of the prediction errors.
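Written out in the notation of this section (a reconstruction of the standard definition, assuming each testing example is a pair $(ev, y)$ of an evaluation vector and its target value):
$$RMSE = \sqrt{\frac{1}{|Ts|} \sum_{(ev, y) \in Ts} \left( predict(ev) - y \right)^2}$$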

327 Where the machine learning model $predict$ is trained on the

328 training data $Tr$ and $Ts$ denotes the testing data.

Now we describe how we split the data into training and testing parts so that the
error estimate is robust.

334 Cross-validation is a standard statistical technique for robust estimation of parameters.

336 iteratively compose the training and testing sets and measure errors.

337 %In each of the $k$ iterations, $k$-th fold is chosen as the testing data, and

338 %all the remaining $k-1$ folds form the training data. The division into the folds is

339 %done randomly, and so that the folds have approximately the

340 %same size.

342 cross validation.
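The procedure can be sketched as follows: the data are shuffled and divided into $k$ folds of approximately the same size, each fold serves once as the testing part, and the $RMSE$ values are combined. The value \texttt{k=5}, the averaging of the per-fold errors and the trivial mean predictor used as the model are assumptions made for the sake of a self-contained example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]      # (evaluation vector, target value)
Model = Callable[[Sequence[float]], float]


def rmse(model: Model, test: List[Sample]) -> float:
    """Root mean square error of model on the testing data."""
    return math.sqrt(sum((model(ev) - y) ** 2 for ev, y in test) / len(test))


def cross_validate(data: List[Sample],
                   train: Callable[[List[Sample]], Model],
                   k: int = 5,
                   seed: int = 0) -> float:
    """Average RMSE over k folds; requires len(data) >= k."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # sizes differ by at most one
    errors = []
    for i, test in enumerate(folds):
        # All folds except the i-th one form the training data.
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        errors.append(rmse(train(training), test))
    return sum(errors) / k


def train_mean(data: List[Sample]) -> Model:
    """Trivial baseline model: always predict the mean training target."""
    avg = sum(y for _, y in data) / len(data)
    return lambda ev: avg
```

A baseline such as \texttt{train\_mean} is also a useful reference point: a learned model is only informative if its cross-validated error is clearly below that of the mean predictor.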

One of the two major domains on which we have tested our framework is the prediction of player
strengths.

351 We have collected a large sample of games from the publicly available

356 list of players $P_r$ of the particular rank. To avoid biases caused by

358 handicap stones.

359 The set of colored games $GC_p$ for a~player $p \in P_r$ consists of the games player $p$

360 played when he had the rank $r$. We only use the $GC_p$ if the number of

362 randomly choose a subset of the sample (the size of subset is uniformly randomly

364 By cutting the number of games to a fixed number (say 50) for large

365 samples, we would create an artificial disproportion in sizes of $GC_p$,

366 which could introduce bias into the process.

367 }

370 The target variable $y$ we learn directly corresponds to the ranks:

372 for 6-dan, other values similarly. (With increasing strength, the $y$

373 decreases.)

379 The second domain is the prediction of different aspects of player styles.

382 The collection of games in this dataset comes from the Games of Go on Disk database by \citet{GoGoD}.

383 This database contains more than 70 000 games, spanning from the ancient times

384 to the present.

386 We chose a small subset of well known players (mainly from the 20th century) and

387 asked some experts (professional and strong amateur players)

388 to evaluate these players using a questionnaire. The experts (Alexander

393 %\begin{table}[h!]

395 %\caption{Styles}

397 \hline

399 Territoriality & Moyo & Territory \\

400 Orthodoxity & Classic & Novel \\

401 Aggressivity& Calm & Fighting \\

402 Thickness & Safe & Shinogi \\ \hline

405 %\caption[Definition of the style scales]{

406 %The definition of the style scales.

407 %}

408 %\label{tab:style_def}

409 %\end{table}

411 The scales try to reflect

414 }

416 stresses whether a player prefers safe, yet inherently smaller territory (number 10 on the scale),

418 For each of the selected professionals, we took 192 of his games from the GoGoD database

421 The target variable (for each of the four styles) $y$ is given by average of the answers of

The results in both domains show that our evaluations are useful in predicting
different kinds of player attributes. This might have a number of possible applications.

433 So far, we have utilized some of our findings in an online web

by a user, it computes his evaluation, predicts his playing style,
and recommends relevant professional players to review.

Other possible applications include helping the ranking algorithms to converge faster.
Usually, the ranking of a player is determined from his opponents' ranking by looking
at numbers of wins and losses (e.g. by computing an Elo rating). Our methods might improve this
by including the domain knowledge.

442 Similarly, a computer Go program can quickly classify the level of its

443 human opponent based on the evaluation from their previous games

444 and auto-adjust its difficulty settings accordingly

445 to provide more even games for beginners.

447 Also, it is possible to study dependencies between single elements of the evaluation vector

and the target variable $y$ directly. By pinpointing e.g. the patterns
that correlate most strongly with low strength (players who play them are weak), we can
warn the user not to play these. We have done some initial research into this in~\citep{Moudrik13};
we do not present these results here because of space constraints.

455 This article presents a method for evaluating a player based on a sample of his games.

These summary evaluations turn out to be useful in many cases: they allow us to predict
different player attributes (such as strength or playing style) with reasonable accuracy.
We hope that the applications of these findings can help to improve both human and computer
understanding of the game of Go.

464 The code used in this work

466 The majority of the source code is implemented in

The machine learning part was implemented using the