From b08408d23186205e71dfc68634021e3236bfb45c Mon Sep 17 00:00:00 2001
From: InigoGutierrez
Date: Fri, 1 Jul 2022 16:10:15 +0200
Subject: First version.

---
 doc/tex/results.tex | 416 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 416 insertions(+)
 create mode 100644 doc/tex/results.tex

diff --git a/doc/tex/results.tex b/doc/tex/results.tex
new file mode 100644
index 0000000..3c586e4
--- /dev/null
+++ b/doc/tex/results.tex
@@ -0,0 +1,416 @@

\newcommand{\boards}[5]{
\begin{figure}[h]
	\begin{center}
		\includegraphics[width=0.4\textwidth]{#1}
		\includegraphics[width=0.4\textwidth]{#2}
		\caption{#3}
		\label{#4}
		#5
	\end{center}
\end{figure}
}

\newcommand{\moveHeatmap}[9]{
\begin{figure}[h]
	\begin{center}
		\includegraphics[width=0.35\textwidth]{#1}
		\includegraphics[width=0.5\textwidth]{#2}
		\\[\smallskipamount]
		\includegraphics[width=0.35\textwidth]{#3}
		\includegraphics[width=0.5\textwidth]{#4}
		\\[\smallskipamount]
		\includegraphics[width=0.35\textwidth]{#5}
		\includegraphics[width=0.5\textwidth]{#6}
		\caption{#7}
		\label{#8}
		#9
	\end{center}
\end{figure}
}

\section{Results}

\subsection{Monte Carlo Tree Search Evaluation}

The Monte Carlo Tree Search algorithm tries to explore the tree of
possibilities as efficiently as possible. With this approach alone it can be
expected to struggle on a problem as big as the game of Go. Nonetheless, there
are some areas where it can be useful. It is evaluated here both on its
playing capabilities and on how it handles Go problems.

The Monte Carlo algorithm has been configured to perform 5 explorations with
10 simulations each whenever it is asked for a move. On the hardware used,
this makes it think for 40 seconds on an empty 9x9 board.

\subsubsection{Monte Carlo Tree Search against itself}

\boards
{img/matches/mctsVSmcts01.jpg}
{img/matches/mctsVSmcts02.jpg}
{Monte Carlo Tree Search against itself.}
{fig:mctsVSmcts}
{Both boards are from the same match. The first shows only the first 12 moves
for clarity. The second shows the finished game.}

When set to play against itself, the algorithm shows what looks like random
play. Some moves could be argued to be sensible, but many of them are on the
first or second line, which so early in the game are the two worst places to
play.

It seems clear that this algorithm, at least with this specific configuration,
cannot handle the size of an empty Go board, even at the 9x9 size.

A record of the game is shown in \fref{fig:mctsVSmcts}.

\subsubsection{Monte Carlo Tree Search against Go problems}

\boards
{img/problems/mctsProblem01.jpg}
{img/problems/mctsProblem02.jpg}
{Monte Carlo against the first of Cho Chikun's problems.}
{fig:mctsProblem01}
{}

Since tree exploration should be easier on smaller boards or in advanced games
with few empty spaces, the algorithm has also been tested on some Go problems.

A Go problem or tsumego is a predefined layout of the board, or of a section
of the board, for which the player must find some beneficial move. Life and
death problems are a subset of tsumegos in which the survival of a group
depends on finding the correct sequence to save or kill the group. One
collection of such tsumegos is \textit{Cho Chikun's Encyclopedia of Life and
Death}, part of which is available on OGS\cite{ogsLifeAndDeath}, an online Go
server.

The first of these problems, together with the moves the algorithm suggested
for it, is shown in \fref{fig:mctsProblem01}.
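The test itself is simple: the engine is given the problem position, is asked
for a move several times, and its answers are compared with the known
solution. A minimal sketch of such a driver is shown below; the engine class
and its methods are hypothetical stand-ins for the engine's actual interface,
not the project's real names.

\begin{minted}[fontsize=\small]{python}
# Sketch of querying the engine several times on a fixed problem position.
# MctsEngine, play() and generate_move() are hypothetical stand-ins for the
# real interface; the configuration mirrors the one described above.
from collections import Counter

def count_suggestions(engine_factory, setup_moves, queries=5):
    """Ask a freshly created engine for a move several times and tally answers."""
    tally = Counter()
    for _ in range(queries):
        engine = engine_factory()    # new engine, so no tree is carried over
        for move in setup_moves:     # replay the problem layout
            engine.play(move)
        tally[engine.generate_move()] += 1
    return tally

# Example use (names hypothetical):
# tally = count_suggestions(lambda: MctsEngine(explorations=5, simulations=10),
#                           problem_01_setup)
\end{minted}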
Black moves first, which means the solution is some favorable outcome for
black, in this case killing the white group. The white group has a critical
point at B1: if white plays at B1 it makes two eyes and lives, but if black
plays there first white cannot make two eyes and dies, so B1 is the solution
to the tsumego. This is one of the easiest Go problems.

The algorithm misses this solution. When asked five times to generate a move
for the starting position, it suggested B1 only once.

Notably, however, after another move had been made it consistently suggested
B1 for white, which is the solution now that white is to play. So in the end
it was able to solve the tsumego, probably because by then it had already
explored part of the tree, whereas it was unlikely to have explored the
solution when choosing the first move.

The engine was also tested on other tsumegos, but it was not able to solve
them, so no more are shown here.

\subsection{Neural Network Training}

Each network has been trained by feeding it batches of SGF files, which are
first converted to lists of moves by the Training System and then passed to
the networks' train function. This has been done with the following command:

\inputminted[fontsize=\small]{bash}{listings/trainCommand.sh}

This command lists the contents of a folder containing multiple SGF files,
shuffles the list, takes some of the files and passes them to train.py as
arguments. The positions and moves from all the selected games combined make
up the training sample. The number of files selected can of course be adapted
to the situation.

The networks were configured to train for 20 epochs, that is, to process the
full batch and update the weights of the network 20 times. 10\% of the sample
was set aside for validation: these positions were not used for training, but
to check the accuracy and loss of the network after it had trained on the rest
of the batch. This validation technique helps detect overfitting, which
happens when a network performs very well on the population it has been
trained on but fails on unseen data.

The training shows the loss decreasing and the accuracy increasing on each
epoch for both the training and the validation samples, which suggests that
learning is happening and no overfitting is occurring.

The number of files to train on was chosen to keep the resulting training time
manageable for the developer. Increasing the number of files per batch was
found not to increase the training time linearly: with batches of 10 games an
epoch of training completed in about one minute, with batches of 20 games in
about three minutes, and with batches of 100 games in up to an hour.

The output of this training process can be seen in \flist{code:denseTraining}
and \flist{code:convTraining} for the dense network and the convolutional
network respectively.

\begin{listing}[h]
	\inputminted[fontsize=\scriptsize]{text}{listings/denseTraining.txt}
	\caption{Dense neural network training log.}
	\label{code:denseTraining}
\end{listing}

\begin{listing}[h]
	\inputminted[fontsize=\scriptsize]{text}{listings/convTraining.txt}
	\caption{Convolutional neural network training log.}
	\label{code:convTraining}
\end{listing}

\subsection{Neural Network Evaluation}

One way of studying the results of the training is to programmatically test
whether the network is able to predict human moves when provided with a game
file.
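A rough sketch of this kind of check is the following; it assumes the game
file has already been parsed into arrays of encoded board positions and of the
moves actually played, and that the network is exposed as a Keras-style model
with a predict method. These names are illustrative, not the project's actual
ones.

\begin{minted}[fontsize=\small]{python}
import numpy as np

def prediction_accuracy(model, positions, played_moves):
    """Fraction of recorded human moves that are the network's top suggestion.

    positions: encoded boards from a game file (one row per position).
    played_moves: index of the move actually played in each position.
    Both inputs are assumed to come from an SGF parser; the model is assumed
    to return one score vector per position, as a Keras model would.
    """
    scores = model.predict(positions)        # one score vector per position
    predicted = np.argmax(scores, axis=1)    # highest-scored move per position
    return float(np.mean(predicted == np.asarray(played_moves)))
\end{minted}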
Such a check has the drawback of not directly testing the strength of the
network, since the network would be judged on predicting the moves of some
human player, whose level could vary greatly, rather than on its playing
capabilities alone. It has also already been covered to some extent by the
validation portion of the training samples, whose results show the network
increasing its accuracy at predicting human moves over the training epochs.

Another way is to make the network play against itself, against another
network, or against a human player, and to analyze the score it assigns to
each possible move. The output of the network is a vector with the likelihood
of playing each move, so these values can be read as how strongly the engine
suggests each of them.

\subsection{Neural Networks Against Themselves}

Neural networks playing against themselves are an interesting case in that
they show moves with high certainty, that is, the network strongly suggests
one specific move. This can be thought of as the most common line of play the
engine has seen, with one confident move leading into the next.

This happened with both the dense neural network and the convolutional
network.

\subsubsection{Dense Network Against Itself}

The initial moves of a match of the dense network against itself, along with
heatmaps showing the output of the network (the chance it gives to each move),
can be seen in Figs.~\ref{fig:denseVSdense01}, \ref{fig:denseVSdense02},
\ref{fig:denseVSdense03} and \ref{fig:denseVSdense04}.

The dense network starts at the center of the board, which is one of the
standard openings on the 9x9 board. This is a very good start, but it must be
acknowledged that the empty board is a position present in every Go match it
has trained on, so it should know it well; it probably means the center was
the most played opening in the sample. The heatmap of this move is
interesting, since the selected move has a score of only 0.27, while other
common 9x9 openings, in the corners, also receive significant scores.

The heatmap of the response to the center play is also interesting. The four
highest-scored moves are equivalent, since the first move has not broken any
symmetry. The move they represent, two spaces away from the first stone, is
also the consensus response. The three moves that follow in score are likewise
equivalent to one another.

The first white stone has now broken the symmetry in one direction, and so the
second black move is suggested with higher likelihood than the previous one
was. This is also a very common sequence of play.

The next moves are a white attack followed by a sensible black response, and
then a strong white extension (arguably to the wrong side, but the minutiae of
why the other side would be better are probably beyond the engine). The moves
after that could be described as looking for complications, which is a
legitimate strategy.

Overall, some moves could be criticised, but the engine shows at least a basic
grasp of the game's strategy.

\subsubsection{Convolutional Network Against Itself}

The same information on a match of the convolutional network against itself
can be seen in Figs.~\ref{fig:convVSconv01}, \ref{fig:convVSconv02},
\ref{fig:convVSconv03} and \ref{fig:convVSconv04}.

The convolutional network also likes to start at the center of the board, and
it is more certain of this move than the dense network.
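The heatmaps accompanying these boards are direct renderings of the network's
output vector over the board points. A minimal sketch of how such a rendering
might be produced, assuming a flat 81-entry output for the 9x9 board (any pass
entry ignored) and using matplotlib, is shown below; it is an illustration,
not the exact script used for the figures in this section.

\begin{minted}[fontsize=\small]{python}
import numpy as np
import matplotlib.pyplot as plt

def plot_move_heatmap(score_vector, size=9, out_file="heatmap.png"):
    """Render a network output vector as a board-sized heatmap image."""
    scores = np.asarray(score_vector)[:size * size].reshape(size, size)
    plt.figure(figsize=(4, 4))
    plt.imshow(scores, cmap="hot", interpolation="nearest")
    plt.colorbar(label="suggested move score")
    plt.savefig(out_file, bbox_inches="tight")
    plt.close()
\end{minted}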
The convolutional network's next two plays are also the same as in the dense
network's match, but the game starts to diverge from the fourth move.

Instead of attacking the black stone on the side, white plays directly
underneath it at B5, keeping the axis of symmetry. This is strategically
relevant: as long as the symmetry holds, playing on either side makes no
difference, whereas waiting until the opponent has broken it gives the chance
to respond and capitalize on whichever side has become the better one to play.

Black responds to the white stone by attacking it at B4, and white then
extends to the more open side with F7.

The match up to this point is discussed on Sensei's Library as an interesting
sequence arising from the center opening\cite{sl_sword}. The discussion there
states that white should have extended to the other side.

The following moves are sensible continuations, with approaches and responses
on each side. They are probably not the best moves, but the idea behind them
can be read from the board.

\subsubsection{Dense Network Against Convolutional}

The first 12 and the last 6 moves of a match of the dense network against the
convolutional network can be seen in Figs.~\ref{fig:denseVSconv01},
\ref{fig:denseVSconv02}, \ref{fig:denseVSconv03}, \ref{fig:denseVSconv04},
\ref{fig:denseVSconv05} and \ref{fig:denseVSconv06}.

The dense network plays black and the convolutional network plays white.

The first three moves are known territory for both networks, since both play
the same sequence when matched against themselves.

After white plays under the black stone as the fourth move, black makes a
perhaps overly careful extension, though it seemed very sure about it. Now
white has some doubts: it likes the extensions to both sides, with a slight
preference for the more open one, a preference both engines have previously
shown at this point of the match. The move it would prefer most, though, is to
play under black again; as that point is already occupied, it settles for the
extension to the open side with F3.

Black now makes the mistake of over-attacking the lone white stone at B5 with
B6, and white capitalizes on it by extending again with D3. B5 still has a
non-negligible score, but it is no longer the highest.

The following moves are questionable, with black continuing to build a very
tight group that gains no territory, and white attacking too directly by
playing a lone stone into that strong group.

In the final moves of the game black has built a living group on the right
side, and things are looking good for it on the rest of the board. White has
passed several times, starting too early, which means it gives passing too
much importance. Since the networks learned only by seeing moves in response
to board positions, white has not learnt that it should not pass while there
is still something to gain. It is also worth noting that passing was not the
move the engine scored highest: it just happens that the move it would most
like to make is already taken, and the highest-scored available move is to
pass.
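This behaviour follows directly from how a move is chosen from the network's
output: points that are already occupied are unavailable, so the engine takes
the best-scored option among the remaining ones, which late in the game is
often the pass entry. A minimal sketch of such a selection rule, assuming an
output vector with one score per point plus a trailing pass score (the names
and layout are illustrative, not the project's actual code), is the following.

\begin{minted}[fontsize=\small]{python}
import numpy as np

def select_move(scores, occupied_mask):
    """Pick the highest-scored available move; the last entry stands for pass.

    scores has one entry per board point plus a trailing pass score, and
    occupied_mask is True for points that already hold a stone.
    """
    scores = np.asarray(scores, dtype=float).copy()
    scores[:-1][np.asarray(occupied_mask)] = -np.inf   # occupied points unavailable
    best = int(np.argmax(scores))
    return "pass" if best == len(scores) - 1 else best  # index of the chosen point
\end{minted}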
\moveHeatmap
{img/matches/denseVSdense_board_01.jpg}
{img/matches/denseVSdense_heatmap_01.png}
{img/matches/denseVSdense_board_02.jpg}
{img/matches/denseVSdense_heatmap_02.png}
{img/matches/denseVSdense_board_03.jpg}
{img/matches/denseVSdense_heatmap_03.png}
{Dense network against itself, moves 1 through 3.}
{fig:denseVSdense01}
{}

\moveHeatmap
{img/matches/denseVSdense_board_04.jpg}
{img/matches/denseVSdense_heatmap_04.png}
{img/matches/denseVSdense_board_05.jpg}
{img/matches/denseVSdense_heatmap_05.png}
{img/matches/denseVSdense_board_06.jpg}
{img/matches/denseVSdense_heatmap_06.png}
{Dense network against itself, moves 4 through 6.}
{fig:denseVSdense02}
{}

\moveHeatmap
{img/matches/denseVSdense_board_07.jpg}
{img/matches/denseVSdense_heatmap_07.png}
{img/matches/denseVSdense_board_08.jpg}
{img/matches/denseVSdense_heatmap_08.png}
{img/matches/denseVSdense_board_09.jpg}
{img/matches/denseVSdense_heatmap_09.png}
{Dense network against itself, moves 7 through 9.}
{fig:denseVSdense03}
{}

\moveHeatmap
{img/matches/denseVSdense_board_10.jpg}
{img/matches/denseVSdense_heatmap_10.png}
{img/matches/denseVSdense_board_11.jpg}
{img/matches/denseVSdense_heatmap_11.png}
{img/matches/denseVSdense_board_12.jpg}
{img/matches/denseVSdense_heatmap_12.png}
{Dense network against itself, moves 10 through 12.}
{fig:denseVSdense04}
{}

\moveHeatmap
{img/matches/convVSconv_board_01.jpg}
{img/matches/convVSconv_heatmap_01.png}
{img/matches/convVSconv_board_02.jpg}
{img/matches/convVSconv_heatmap_02.png}
{img/matches/convVSconv_board_03.jpg}
{img/matches/convVSconv_heatmap_03.png}
{Convolutional network against itself, moves 1 through 3.}
{fig:convVSconv01}
{}

\moveHeatmap
{img/matches/convVSconv_board_04.jpg}
{img/matches/convVSconv_heatmap_04.png}
{img/matches/convVSconv_board_05.jpg}
{img/matches/convVSconv_heatmap_05.png}
{img/matches/convVSconv_board_06.jpg}
{img/matches/convVSconv_heatmap_06.png}
{Convolutional network against itself, moves 4 through 6.}
{fig:convVSconv02}
{}

\moveHeatmap
{img/matches/convVSconv_board_07.jpg}
{img/matches/convVSconv_heatmap_07.png}
{img/matches/convVSconv_board_08.jpg}
{img/matches/convVSconv_heatmap_08.png}
{img/matches/convVSconv_board_09.jpg}
{img/matches/convVSconv_heatmap_09.png}
{Convolutional network against itself, moves 7 through 9.}
{fig:convVSconv03}
{}

\moveHeatmap
{img/matches/convVSconv_board_10.jpg}
{img/matches/convVSconv_heatmap_10.png}
{img/matches/convVSconv_board_11.jpg}
{img/matches/convVSconv_heatmap_11.png}
{img/matches/convVSconv_board_12.jpg}
{img/matches/convVSconv_heatmap_12.png}
{Convolutional network against itself, moves 10 through 12.}
{fig:convVSconv04}
{}

\moveHeatmap
{img/matches/denseVSconv_board_01.jpg}
{img/matches/denseVSconv_heatmap_01.png}
{img/matches/denseVSconv_board_02.jpg}
{img/matches/denseVSconv_heatmap_02.png}
{img/matches/denseVSconv_board_03.jpg}
{img/matches/denseVSconv_heatmap_03.png}
{Dense network against convolutional, moves 1 through 3.}
{fig:denseVSconv01}
{The dense network plays as black.}

\moveHeatmap
{img/matches/denseVSconv_board_04.jpg}
{img/matches/denseVSconv_heatmap_04.png}
{img/matches/denseVSconv_board_05.jpg}
{img/matches/denseVSconv_heatmap_05.png}
{img/matches/denseVSconv_board_06.jpg}
{img/matches/denseVSconv_heatmap_06.png}
{Dense network against convolutional, moves 4 through 6.}
{fig:denseVSconv02}
{}

\moveHeatmap
{img/matches/denseVSconv_board_07.jpg}
{img/matches/denseVSconv_heatmap_07.png}
{img/matches/denseVSconv_board_08.jpg}
{img/matches/denseVSconv_heatmap_08.png}
{img/matches/denseVSconv_board_09.jpg}
{img/matches/denseVSconv_heatmap_09.png}
{Dense network against convolutional, moves 7 through 9.}
{fig:denseVSconv03}
{}

\moveHeatmap
{img/matches/denseVSconv_board_10.jpg}
{img/matches/denseVSconv_heatmap_10.png}
{img/matches/denseVSconv_board_11.jpg}
{img/matches/denseVSconv_heatmap_11.png}
{img/matches/denseVSconv_board_12.jpg}
{img/matches/denseVSconv_heatmap_12.png}
{Dense network against convolutional, moves 10 through 12.}
{fig:denseVSconv04}
{}

\moveHeatmap
{img/matches/denseVSconv_board_72.jpg}
{img/matches/denseVSconv_heatmap_72.png}
{img/matches/denseVSconv_board_73.jpg}
{img/matches/denseVSconv_heatmap_73.png}
{img/matches/denseVSconv_board_74.jpg}
{img/matches/denseVSconv_heatmap_74.png}
{Dense network against convolutional, moves 72 through 74.}
{fig:denseVSconv05}
{}

\moveHeatmap
{img/matches/denseVSconv_board_75.jpg}
{img/matches/denseVSconv_heatmap_75.png}
{img/matches/denseVSconv_board_76.jpg}
{img/matches/denseVSconv_heatmap_76.png}
{img/matches/denseVSconv_board_77.jpg}
{img/matches/denseVSconv_heatmap_77.png}
{Dense network against convolutional, moves 75 through 77.}
{fig:denseVSconv06}
{}

% Needed line so the processing of the last command doesn't find an EOF