author | InigoGutierrez <inigogf.95@gmail.com> | 2023-06-12 20:16:04 +0200 |
---|---|---|
committer | InigoGutierrez <inigogf.95@gmail.com> | 2023-06-12 20:16:04 +0200 |
commit | d4a81490bf1396089eb3dac5955a3a8e4cb26e37 (patch) | |
tree | f96febc7950c2742bc36f04ab13bff56851f2388 /doc/tex/results.tex | |
parent | b08408d23186205e71dfc68634021e3236bfb45c (diff) | |
parent | 65ac3a6b050dcb88688cdc2654b1ed6693e9a160 (diff) | |
Diffstat (limited to 'doc/tex/results.tex')
-rw-r--r-- | doc/tex/results.tex | 116 |
1 files changed, 62 insertions, 54 deletions
diff --git a/doc/tex/results.tex b/doc/tex/results.tex
index 3c586e4..e166873 100644
--- a/doc/tex/results.tex
+++ b/doc/tex/results.tex
@@ -30,13 +30,16 @@
 
 \section{Results}
 
+This section shows an analysis of the successes and failures of the final
+product and the strength of play of its various algorithms.
+
 \subsection{Monte Carlo Tree Search Evaluation}
 
 The Monte Carlo Algorithm tries to explore the tree of possibilities as
 efficiently as possible. With this approach, it can be expected to fail when
-alone on a problem such big as the game of Go. Nonetheless, there are some areas
-where it can be useful. It will be evaluated by its capabilities while playing
-games but also when presented with go problems.
+alone on a problem such as big as the game of Go. Nonetheless, there are some
+areas where it can be useful. It will be evaluated by its capabilities while
+playing games but also when presented with Go problems.
 
 The Monte Carlo algorithm has been set to do 5 explorations with 10 simulations
 each when it is asked for a move. In the hardware used this makes it think for
@@ -57,7 +60,8 @@ Some moves could be debated as sensible, but many of them are on the second or
 fist line, which so early on the game are the two worst places to play at.
 
 It seems clear that this algorithm, at least with this specific configuration,
-can't handle the size of an empty Go board, even on the 9x9 size.
+can't handle the tree of possibilities of an empty Go board even at the 9x9
+size.
 
 A record of the game is shown in \fref{fig:mctsVSmcts}.
 
@@ -72,56 +76,63 @@ A record of the game is shown in \fref{fig:mctsVSmcts}.
 Since tree exploration on smaller boards or advanced games with little empty
 spaces should be easier the algorithm has also been tested on some Go problems.
 
-A Go problem or tsumego is a predefined layout of the board, or of a section of
-the board, for which the player must find some beneficial move. Life and death
-problems are a subset of tsumegos in which the survival of a group depends on
-finding the correct sequence to save or kill the group. One collection of such
-tsumegos is \textit{Cho Chikun's Encyclopedia of Life and Death}, part of which
-are available on OGS\cite{ogsLifeAndDeath}, an online go server.
+A Go problem or \gls{tsumego} is a predefined layout of the board, or of a
+section of the board, for which the player must find some beneficial move. Life
+and death problems are a subset of \gls{tsumego}s in which the aim is to find
+the correct sequence to save or kill a group. One collection of such
+\gls{tsumego}s is \textit{Cho Chikun Life and Death Dictionary}
+\cite{choLifeAndDeath}, part of which are available on OGS
+\cite{ogsLifeAndDeath}, an online Go server.
 
 The first of these problems and what the algorithm suggested as moves is shown
 in \fref{fig:mctsProblem01}.
 
 Black makes the first move, which means the solution is to find some favorable
 outcome for black, which in this case is killing the white group. The white
-group has a critical point on B1. If white plays on B1 they make two eyes and
-live, but if black plays there first white can't make two eyes and dies, so B1
-is the solution to the tsumego. This is one of the easiest Go problems.
+group has a \gls{criticalPoint} on B1. If white plays on B1 they make two
+\glspl{eye} and live, but if black plays there first white can't make two
+\glspl{eye} and dies, so B1 is the solution to the \gls{tsumego}. This is one of
+the easiest Go problems.
 
 The algorithm neglects this solution. While asked five times to generate a move
 for the starting position it suggested B1 only once.
 
 But notably, after making another move, it consistently suggested B1 for white,
-which is the solution now that white has to play. So in the end it was able to
-solve the tsumego, probably because after making a move it had already explored
-part of the tree but it was difficult that it explored the solution for the
-first move.
+which is now the correct move for white. So in the end it was able to solve the
+\gls{tsumego}, probably because after making a move it had already explored part
+of the tree but it was difficult that it explored the solution for the first
+move, that is, in its first exploration cycle.
 
-The engine was tested against other tsumegos but it was not able to solve them,
-so no more are shown here.
+The engine was tested against other \gls{tsumego}s but it was not able to solve
+them, so no more are shown here. This was the one simple enough for an engine of
+this characteristics and running on the testing hardware to solve.
 
 \subsection{Neural Network Training}
 
-Each network has been training by receiving batches of SGF files which are first
-converted to lists of moves by the Training System and then provided to the
-train function of the networks. This has been executed with the following
-command:
+Each network has been training by receiving batches of \acrshort{sgf} files
+which are first converted to lists of moves by the Training System and then
+provided to the train function of the networks. This has been executed with the
+following command:
 
 \inputminted[fontsize=\small]{bash}{listings/trainCommand.sh}
 
-Which lists the contents of a folder containing multiple SGF files, shuffles the
-list, takes some of them and passes them to train.py as arguments. The combined
-list of game positions and moves made from them in all the games selected make
-up the sample of the training. The number of files selected can of course be
-adapted to the situation.
+Which lists the contents of a folder containing multiple \acrshort{sgf} files,
+shuffles the list, takes some of them and passes them to train.py as arguments.
+The combined list of game positions and moves made from them in all the games
+selected make up the sample of the training. The number of files selected can of
+course be adapted to the situation.
 
 The networks were configured to train for 20 epochs, that is, processing the
 full batch and updating the weights on the network 20 times. 10\% of the sample
-were used as validation. This means they were not used for training but to
-check the accuracy and loss of the network after training with the rest of
-the batch. This technique of validation can help detect overfitting, which
-happens when a network does a very good job on the population it has been
-trained on but it fails when receiving unseen data.
+was used as validation. This means they were not used for training but to check
+the accuracy and loss of the network after training with the rest of the batch.
+This technique of validation can help detect overfitting, which happens when a
+network does a very good job on the population it has been trained on but it
+fails when receiving unseen data.
+
+The outputs from this training process can be seen on \lref{code:denseTraining}
+and \lref{code:convTraining} for the dense network and the convolutional
+network respectively.
 
 The training shows loss decrementing and accuracy incrementing on each epoch for
 both training and validation samples, which suggests there is learning happening
@@ -134,10 +145,6 @@ as with batches of 10 games an epoch of training could be completed in one
 minute, in three minutes batches of 20 games, and in up to an hour with batches
 of 100 games.
 
-The outputs from this training process can be seen on \flist{code:denseTraining}
-and \flist{code:convTraining} for the dense network and the convolutional
-network respectively.
-
 \begin{listing}[h]
 \inputminted[fontsize=\scriptsize]{text}{listings/denseTraining.txt}
 \caption{Dense neural network training log.}
@@ -163,8 +170,8 @@ of predicting human moves through the training epochs.
 
 Another way is to make the network play either against itself, another network,
 or a human player, and analyze the score it provides to each possible play. The
-output of the network is a vector with the likelihood of making each move, so
-we can take this values as how strongly the engine suggests each of them.
+output of the network is a vector with the likelihood of making each move, so we
+can take these values as how strongly the engine suggests each of them.
 
 \subsection{Neural Networks Against Themselves}
 
@@ -184,11 +191,11 @@ move), can be seen on Figs.~\ref{fig:denseVSdense01}, \ref{fig:denseVSdense02},
 
 The dense network starts on the center of the board, which is one of the
 standard openings in the 9x9 board. It starts on a very good track, but we must
-acknowledge that the empty board is a position present on every go match it has
-trained on and so it should know it well. It probably means the center was the
-most played opening in the sample. It is interesting to check the heatmap of
-this move, since the selected move has only a score of 0.27. Other common
-openings on the 9x9 are represented on the vertices with some significant score.
+acknowledge that the empty board is a position present on every Go match it has
+trained on, so it should know it well. It probably means the center was the most
+played opening in the sample. It is interesting to check the heatmap of this
+move, since the selected move has only a score of 0.27. Other common openings on
+the 9x9 are represented on the vertices with some significant score.
 
 The heatmap of the response to the center play is also interesting. The four
 highest scored moves are equivalent, since the first move has not broken any
@@ -201,12 +208,13 @@ second black move is suggested with more likelihood than the previous one. This
 is also a very common sequence of play.
 
 The following moves are a white attack followed by black's sensible response,
-and a strong white extension (arguably to the wrong side, but the minutia of why
+and a strong white extension (arguably to the wrong side, but the details of why
 the other side would be better is probably not grasped by the engine). The
-following moves could be called looking for complication, which is a strategy.
+following moves could be classified as ``looking for complication'', which is a
+strategy.
 
-Overall some moves could be criticised but the engine is showing at least a
-basic knowledge of the game's strategy.
+Overall some moves can be criticised but the engine is showing at least a basic
+knowledge of the game's strategy.
 
 \subsubsection{Convolutional Network Against Itself}
 
@@ -219,21 +227,21 @@ more certain of the move than the dense network. Its next two plays are also the
 same, but the match starts to diverge from the fourth move.
 
 Instead of attacking the black stone on the side, white plays directly under at
-B5, keeping the edge of symmetry. This is strategically important because while
-symmetry is present playing on either side makes no difference, while playing
-after the opponent has broken symmetry gives the opportunity to respond and
-capitalize on the now better side to play.
+B5, keeping the edge of symmetry. This is strategically important because as
+long as symmetry is present playing on either side makes no difference, while
+playing after the opponent has broken symmetry gives the opportunity to respond
+and capitalize on the now better side to play.
 
 Black responds to the white stone by attacking it at B4 and white then extends
 to the more open side with F7.
 
 The match up to this point is discussed on Sensei's Library as an interesting
-sequence deriving from the center first stone\cite{sl_sword}. The discussion
+sequence deriving from the center first stone \cite{sl_sword}. The discussion
 there states that white should have extended to the other side.
 
 The following moves are some sensible continuations, with approaches and
 responses on each side. They are probably not the best moves, but the idea
-behind them can be read from the board.
+behind them can be grasped from the board.
 
 \subsubsection{Dense Network Against Convolutional}
 