From bce7226ff25ddd110137ea4fd0927c4d881da444 Mon Sep 17 00:00:00 2001
From: hellboy
Date: Mon, 15 Mar 2010 02:05:09 +0100
Subject: [PATCH] gostyle.tex: strength estimation for different number of games

---
 tex/gostyle.tex | 57 ++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 11 deletions(-)

diff --git a/tex/gostyle.tex b/tex/gostyle.tex
index 8ea42ec..dd23c80 100644
--- a/tex/gostyle.tex
+++ b/tex/gostyle.tex
@@ -793,7 +793,7 @@ The sociomap has been visualised using the Team Profile Analyzer \cite{TPA}
 which is part of the Sociomap suite \cite{SociomapSite}.
 
 
-\section{Strength Estimator}
+\section{Strength Estimation}
 
 \begin{figure*}[!t]
 \centering
@@ -818,12 +818,12 @@ Multiple independent real-world ranking scales exist
 the difference between scales can be up to several ranks
 and the rank distributions also differ. \cite{RankComparison}
 
+\subsection{Data used}
 As the source game collection, we use Go Teaching Ladder reviews archive%
 \footnote{The reviews contain comments and variations --- we consider only the main
 variation with the actual played game.}
 \cite{GTL} --- this collection contains 7700 games of players with strength ranging
-from 30-kyu to 4-dan; we consider only even games with clear rank information,
-and then randomly separate 770 games as a testing set.
+from 30-kyu to 4-dan; we consider only even games with clear rank information.
 Since the rank information is provided by the users and may not be consistent,
 we are forced to take a simplified look at the ranks,
 discarding the differences between various systems and thus somewhat
@@ -831,6 +831,7 @@ increasing error in our model.\footnote{Since our results seem satisfying,
 we did not pursue to try another collection; one could e.g.
 look at game archives of some Go server.}
 
+\subsection{PCA analysis}
 First, we have created a single pattern vector for each rank, from 30-kyu to 4-dan;
 we have performed PCA analysis on the pattern vectors, achieving near-perfect
 rank correspondence in the first PCA dimension%
@@ -847,16 +848,49 @@ reasonably satisfying accuracy by itself.%
 \footnote{Extended vector normalization (sec. \ref{xnorm})
 produced noticeably less clear-cut results.}
 
-To further enhance the strength estimator accuracy,
-we have tried to train a NN classifier on our train set, consisting
-of one $(\vec p, {\rm rank})$ pair per player --- we use the pattern vector
-for activation of input neurons and rank number as result of the output
-neuron. We then proceeded to test the NN on per-player pattern vectors built
-from the games in the test set, yielding MSE of TODO with TODO games per player
-on average.
+\subsection{Strength classifier}
+To further enhance the usability of the strength estimator,
+we have tried to use a $k$-NN classifier on the testing set, consisting
+of one $(\vec p, {\rm rank})$ pair per player. The testing set was
+randomly selected as $10\%$ of the game database.
+We want to find the smallest number of games from which a~player's strength can be reasonably estimated.
+The player files within each rank in the database were
+merged to increase the size of files and to make the pattern distribution more reliable.
+
+TODO mention this already at the beginning?
+Moreover, we discarded \emph{ldist} and \emph{lldist} from the pattern files, since they increase
+the granularity of patterns (note that we are dealing with small input files, so the benefit of
+having more features -- which increase variability -- is negligible in this case).
+
+The results for different file sizes are shown in table~\ref{table-str-class}. The smaller the
+file, the smaller the number of games needed; however, smaller files also yield a larger error.
+The error is listed as MSE and as standard deviation $\sigma$ in percent (meaning
+the average difference from the real rank).
+
+TODO is the file size OK?? (it depends on the representation..)
+TODO if only games, then redo it above as well
+\begin{table}[!t]
+% increase table row spacing, adjust to taste
+\renewcommand{\arraystretch}{1.3}
+\caption{TODO}
+\label{table-str-class}
+\centering
+\begin{tabular}{|c|c|c|}
+\hline
+File size ($\sim$ number of games) & MSE & $\sigma \%$ \\ \hline
+$500 \mathrm{kB} (\sim85)$& $0.007$ & $4\%$ \\
+$250 \mathrm{kB} (\sim43)$& $0.029$ & $8\%$ \\
+$100 \mathrm{kB} (\sim17)$& $0.081$ & $14\%$ \\
+$50 \mathrm{kB} (\sim9)$& $0.131$ & $18\%$ \\
+$10 \mathrm{kB} (\sim2)$& $0.187$ & $22\%$ \\\hline
+\end{tabular}
+\end{table}
+Finally, we used $8$-fold cross-validation (see section \ref{crossval}) on one-file-per-rank files,
+yielding an MSE of $0.085$ (rank encoded linearly in $[-1,1]$), which is equivalent to
+a standard deviation of $15\%$.
 
 
-\section{Style Estimator}
+\section{Style Estimation}
 \label{styleest}
 
 As a~second case study for our pattern analysis,
@@ -1316,6 +1350,7 @@ data. The network's output was afterwards rescaled back to allow for MSE compari
 All input (pattern) vectors were preprocessed using PCA, reducing the input dimension from $400$ to $23$.
 
 \subsubsection{Cross-validation}
+\label{crossval}
 To compare and evaluate all methods, we have performed $5$-fold cross validation
 and compared each method's performance with a~random classifier. In the $5$-fold
 cross-validation, we randomly divide the training set
-- 
2.11.4.GIT