\newcommand{\boards}[5]{ \begin{figure}[h] \begin{center} \includegraphics[width=0.4\textwidth]{#1} \includegraphics[width=0.4\textwidth]{#2} \caption{#3} \label{#4} #5 \end{center} \end{figure} } \newcommand{\moveHeatmap}[9]{ \begin{figure}[h] \begin{center} \includegraphics[width=0.35\textwidth]{#1} \includegraphics[width=0.5\textwidth]{#2} \\[\smallskipamount] \includegraphics[width=0.35\textwidth]{#3} \includegraphics[width=0.5\textwidth]{#4} \\[\smallskipamount] \includegraphics[width=0.35\textwidth]{#5} \includegraphics[width=0.5\textwidth]{#6} \caption{#7} \label{#8} #9 \end{center} \end{figure} } \section{Results} \subsection{Monte Carlo Tree Search Evaluation} The Monte Carlo Algorithm tries to explore the tree of possibilities as efficiently as possible. With this approach, it can be expected to fail when alone on a problem such big as the game of Go. Nonetheless, there are some areas where it can be useful. It will be evaluated by its capabilities while playing games but also when presented with Go problems. The Monte Carlo algorithm has been set to do 5 explorations with 10 simulations each when it is asked for a move. In the hardware used this makes it think for 40 seconds on an empty 9x9 board. \subsubsection{Monte Carlo Tree Search against itself} \boards {img/matches/mctsVSmcts01.jpg} {img/matches/mctsVSmcts02.jpg} {Monte Carlo Tree Search against itself.} {fig:mctsVSmcts} {Both boards are from the same match. The first shows only the first 12 moves for clarity. The second shows the finished game.} When set to play against itself the algorithm shows what looks like random play. Some moves could be debated as sensible, but many of them are on the second or fist line, which so early on the game are the two worst places to play at. It seems clear that this algorithm, at least with this specific configuration, can't handle the size of an empty Go board, even on the 9x9 size. A record of the game is shown in \fref{fig:mctsVSmcts}. \subsubsection{Monte Carlo Tree Search against Go problems} \boards {img/problems/mctsProblem01.jpg} {img/problems/mctsProblem02.jpg} {Monte Carlo against the first of Cho Chikun's problems.} {fig:mctsProblem01} Since tree exploration on smaller boards or advanced games with little empty spaces should be easier the algorithm has also been tested on some Go problems. A Go problem or \gls{tsumego} is a predefined layout of the board, or of a section of the board, for which the player must find some beneficial move. Life and death problems are a subset of \gls{tsumego}s in which the survival of a group depends on finding the correct sequence to save or kill the group. One collection of such \gls{tsumego}s is \textit{Cho Chikun's Encyclopedia of Life and Death}, part of which are available on OGS \cite{ogsLifeAndDeath}, an online Go server. The first of these problems and what the algorithm suggested as moves is shown in \fref{fig:mctsProblem01}. Black makes the first move, which means the solution is to find some favorable outcome for black, which in this case is killing the white group. The white group has a critical point on B1. If white plays on B1 they make two eyes and live, but if black plays there first white can't make two eyes and dies, so B1 is the solution to the \gls{tsumego}. This is one of the easiest Go problems. The algorithm neglects this solution. While asked five times to generate a move for the starting position it suggested B1 only once. But notably, after making another move, it consistently suggested B1 for white, which is the solution now that white has to play. So in the end it was able to solve the \gls{tsumego}, probably because after making a move it had already explored part of the tree but it was difficult that it explored the solution for the first move. The engine was tested against other \gls{tsumego}s but it was not able to solve them, so no more are shown here. \subsection{Neural Network Training} Each network has been training by receiving batches of \acrshort{sgf} files which are first converted to lists of moves by the Training System and then provided to the train function of the networks. This has been executed with the following command: \inputminted[fontsize=\small]{bash}{listings/trainCommand.sh} Which lists the contents of a folder containing multiple \acrshort{sgf} files, shuffles the list, takes some of them and passes them to train.py as arguments. The combined list of game positions and moves made from them in all the games selected make up the sample of the training. The number of files selected can of course be adapted to the situation. The networks were configured to train for 20 epochs, that is, processing the full batch and updating the weights on the network 20 times. 10\% of the sample were used as validation. This means they were not used for training but to check the accuracy and loss of the network after training with the rest of the batch. This technique of validation can help detect overfitting, which happens when a network does a very good job on the population it has been trained on but it fails when receiving unseen data. The training shows loss decrementing and accuracy incrementing on each epoch for both training and validation samples, which suggests there is learning happening and no overfitting occurring. The number of files to train on has been selected by trying to keep the resulting training time manageable for the developer. It was found that incrementing the files per batch did not increment the training time linearly, as with batches of 10 games an epoch of training could be completed in one minute, in three minutes batches of 20 games, and in up to an hour with batches of 100 games. The outputs from this training process can be seen on \lref{code:denseTraining} and \lref{code:convTraining} for the dense network and the convolutional network respectively. \begin{listing}[h] \inputminted[fontsize=\scriptsize]{text}{listings/denseTraining.txt} \caption{Dense neural network training log.} \label{code:denseTraining} \end{listing} \begin{listing}[h] \inputminted[fontsize=\scriptsize]{text}{listings/convTraining.txt} \caption{Convolutional neural network training log.} \label{code:convTraining} \end{listing} \subsection{Neural Network Evaluation} One way of studying the results of the training could be to programmatically test if the network is able to predict human moves when provided with a game file. This has the drawback of not directly testing the strength of the network, since it would be tested on predicting the moves of some human player, whose level could greatly vary, and not on its playing capabilities alone. This also has already been tested to some extent by the validation portion of the samples providing for training, and the results show the network increasing its accuracy of predicting human moves through the training epochs. Another way is to make the network play either against itself, another network, or a human player, and analyze the score it provides to each possible play. The output of the network is a vector with the likelihood of making each move, so we can take this values as how strongly the engine suggests each of them. \subsection{Neural Networks Against Themselves} The case of neural networks playing against themselves is interesting in that it shows moves with high certainty, that is, the network strongly suggests a specific move. It can be thought that this would be the most common line of play the engine has seen, with one certain move leading into another. This happened for both the dense neural network and the convolutional network. \subsubsection{Dense network Against Itself} The initial moves of a match of the dense network against itself, along with heatmaps showing the output from the network (which chance it gives for each move), can be seen on Figs.~\ref{fig:denseVSdense01}, \ref{fig:denseVSdense02}, \ref{fig:denseVSdense03} and \ref{fig:denseVSdense04}. The dense network starts on the center of the board, which is one of the standard openings in the 9x9 board. It starts on a very good track, but we must acknowledge that the empty board is a position present on every Go match it has trained on and so it should know it well. It probably means the center was the most played opening in the sample. It is interesting to check the heatmap of this move, since the selected move has only a score of 0.27. Other common openings on the 9x9 are represented on the vertices with some significant score. The heatmap of the response to the center play is also interesting. The four highest scored moves are equivalent, since the first move has not broken any symmetry. The move they represent, two spaces from the first stone, is also the consensual response. The three moves following in score are also equivalent between them. The first white stone has broken the symmetry in one direction, and so the second black move is suggested with more likelihood than the previous one. This is also a very common sequence of play. The following moves are a white attack followed by black's sensible response, and a strong white extension (arguably to the wrong side, but the minutia of why the other side would be better is probably not grasped by the engine). The following moves could be called looking for complication, which is a strategy. Overall some moves could be criticised but the engine is showing at least a basic knowledge of the game's strategy. \subsubsection{Convolutional Network Against Itself} The same information on a match of the convolutional network against itself can be seen on Figs.~\ref{fig:convVSconv01}, \ref{fig:convVSconv02}, \ref{fig:convVSconv03} and \ref{fig:convVSconv04}. The convolutional network also likes to start on the center of the board and is more certain of the move than the dense network. Its next two plays are also the same, but the match starts to diverge from the fourth move. Instead of attacking the black stone on the side, white plays directly under at B5, keeping the edge of symmetry. This is strategically important because while symmetry is present playing on either side makes no difference, while playing after the opponent has broken symmetry gives the opportunity to respond and capitalize on the now better side to play. Black responds to the white stone by attacking it at B4 and white then extends to the more open side with F7. The match up to this point is discussed on Sensei's Library as an interesting sequence deriving from the center first stone \cite{sl_sword}. The discussion there states that white should have extended to the other side. The following moves are some sensible continuations, with approaches and responses on each side. They are probably not the best moves, but the idea behind them can be read from the board. \subsubsection{Dense Network Against Convolutional} The first 12 and the last 6 moves of a match of the dense network against the convolutional network can be seen on be seen on Figs.~\ref{fig:denseVSconv01}, \ref{fig:denseVSconv02}, \ref{fig:denseVSconv03}, \ref{fig:denseVSconv04}, \ref{fig:denseVSconv05} and \ref{fig:denseVSconv06}. The dense network is given black. The convolutional network plays then as white. The first three moves are known territory for both networks, since both play the same sequence when matched against themselves. After white plays under the black stone as the fourth move, black makes a maybe too careful extension, but it seemed very sure about it. Now white has some doubts. It seems to like the extensions to both sides, if a bit more to the more open side, a preference both engines have previously shown at this point of the match. The move it prefers the most, though, would be to play again under black. So, as this move is impossible, it defers to the extension to the open side with F3. Black now makes the mistake of overattacking the lone white stone on B5 with B6, and white capitalizes on it by keeping extending with D3. B5 is still a move with a not negligible score, but it is no longer the highest. The following moves are questionable, with black keeping building a very tight group and getting no territory and white attacking too directly with a lone stone into that strong group. In the final moves of the game we see that black has built a living group on the right side and things are looking good for it on the rest of the board. White has passed a bunch of times, starting too early, which means it gives too high an importance to passing. Since the networks learned only by seeing moves in response to boards, it has not learnt that it should not pass until there is no more to gain. But it is also important to note that passing was not the most likely move for the engine to make, it just happens that the move it would like to make is already taken and the highest chance available move is to pass. \moveHeatmap {img/matches/denseVSdense_board_01.jpg} {img/matches/denseVSdense_heatmap_01.png} {img/matches/denseVSdense_board_02.jpg} {img/matches/denseVSdense_heatmap_02.png} {img/matches/denseVSdense_board_03.jpg} {img/matches/denseVSdense_heatmap_03.png} {Dense network against itself, moves 1 through 3.} {fig:denseVSdense01} \moveHeatmap {img/matches/denseVSdense_board_04.jpg} {img/matches/denseVSdense_heatmap_04.png} {img/matches/denseVSdense_board_05.jpg} {img/matches/denseVSdense_heatmap_05.png} {img/matches/denseVSdense_board_06.jpg} {img/matches/denseVSdense_heatmap_06.png} {Dense network against itself, moves 4 through 6.} {fig:denseVSdense02} \moveHeatmap {img/matches/denseVSdense_board_07.jpg} {img/matches/denseVSdense_heatmap_07.png} {img/matches/denseVSdense_board_08.jpg} {img/matches/denseVSdense_heatmap_08.png} {img/matches/denseVSdense_board_09.jpg} {img/matches/denseVSdense_heatmap_09.png} {Dense network against itself, moves 7 through 9.} {fig:denseVSdense03} \moveHeatmap {img/matches/denseVSdense_board_10.jpg} {img/matches/denseVSdense_heatmap_10.png} {img/matches/denseVSdense_board_11.jpg} {img/matches/denseVSdense_heatmap_11.png} {img/matches/denseVSdense_board_12.jpg} {img/matches/denseVSdense_heatmap_12.png} {Dense network against itself, moves 10 through 12.} {fig:denseVSdense04} \moveHeatmap {img/matches/convVSconv_board_01.jpg} {img/matches/convVSconv_heatmap_01.png} {img/matches/convVSconv_board_02.jpg} {img/matches/convVSconv_heatmap_02.png} {img/matches/convVSconv_board_03.jpg} {img/matches/convVSconv_heatmap_03.png} {Convolutional network against itself, moves 1 through 3.} {fig:convVSconv01} \moveHeatmap {img/matches/convVSconv_board_04.jpg} {img/matches/convVSconv_heatmap_04.png} {img/matches/convVSconv_board_05.jpg} {img/matches/convVSconv_heatmap_05.png} {img/matches/convVSconv_board_06.jpg} {img/matches/convVSconv_heatmap_06.png} {Convolutional network against itself, moves 4 through 6.} {fig:convVSconv02} \moveHeatmap {img/matches/convVSconv_board_07.jpg} {img/matches/convVSconv_heatmap_07.png} {img/matches/convVSconv_board_08.jpg} {img/matches/convVSconv_heatmap_08.png} {img/matches/convVSconv_board_09.jpg} {img/matches/convVSconv_heatmap_09.png} {Convolutional network against itself, moves 7 through 9.} {fig:convVSconv03} \moveHeatmap {img/matches/convVSconv_board_10.jpg} {img/matches/convVSconv_heatmap_10.png} {img/matches/convVSconv_board_11.jpg} {img/matches/convVSconv_heatmap_11.png} {img/matches/convVSconv_board_12.jpg} {img/matches/convVSconv_heatmap_12.png} {Convolutional network against itself, moves 10 through 12.} {fig:convVSconv04} \moveHeatmap {img/matches/denseVSconv_board_01.jpg} {img/matches/denseVSconv_heatmap_01.png} {img/matches/denseVSconv_board_02.jpg} {img/matches/denseVSconv_heatmap_02.png} {img/matches/denseVSconv_board_03.jpg} {img/matches/denseVSconv_heatmap_03.png} {Dense network against convolutional, moves 1 through 3.} {fig:denseVSconv01} {The dense network plays as black.} \moveHeatmap {img/matches/denseVSconv_board_04.jpg} {img/matches/denseVSconv_heatmap_04.png} {img/matches/denseVSconv_board_05.jpg} {img/matches/denseVSconv_heatmap_05.png} {img/matches/denseVSconv_board_06.jpg} {img/matches/denseVSconv_heatmap_06.png} {Dense network against convolutional, moves 4 through 6.} {fig:denseVSconv02} \moveHeatmap {img/matches/denseVSconv_board_07.jpg} {img/matches/denseVSconv_heatmap_07.png} {img/matches/denseVSconv_board_08.jpg} {img/matches/denseVSconv_heatmap_08.png} {img/matches/denseVSconv_board_09.jpg} {img/matches/denseVSconv_heatmap_09.png} {Dense network against convolutional, moves 7 through 9.} {fig:denseVSconv03} \moveHeatmap {img/matches/denseVSconv_board_10.jpg} {img/matches/denseVSconv_heatmap_10.png} {img/matches/denseVSconv_board_11.jpg} {img/matches/denseVSconv_heatmap_11.png} {img/matches/denseVSconv_board_12.jpg} {img/matches/denseVSconv_heatmap_12.png} {Dense network against convolutional, moves 10 through 12.} {fig:denseVSconv04} \moveHeatmap {img/matches/denseVSconv_board_72.jpg} {img/matches/denseVSconv_heatmap_72.png} {img/matches/denseVSconv_board_73.jpg} {img/matches/denseVSconv_heatmap_73.png} {img/matches/denseVSconv_board_74.jpg} {img/matches/denseVSconv_heatmap_74.png} {Dense network against convolutional, moves 72 through 74.} {fig:denseVSconv05} \moveHeatmap {img/matches/denseVSconv_board_75.jpg} {img/matches/denseVSconv_heatmap_75.png} {img/matches/denseVSconv_board_76.jpg} {img/matches/denseVSconv_heatmap_76.png} {img/matches/denseVSconv_board_77.jpg} {img/matches/denseVSconv_heatmap_77.png} {Dense network against convolutional, moves 75 through 77.} {fig:denseVSconv06} % Needed line so the processing of the last command doesn't find an EOF