\section{Conclusions}

The problems found during the project, an evaluation of the results, and what
could be done in the future to improve the system are discussed here.

\subsection{Problems with the Implementation of the Game}

While Go has a simple set of rules, these rules lead to many borderline cases
which must be specifically addressed. For example, the ko rule obliges the
engine to check the previous board positions after each move so that no
position is repeated.

These problems have been solved by designing the tree of moves of a match so
that each move stores the full board layout resulting from playing it. This of
course increases the memory needed to run the application, but it was chosen
over the alternative of recreating the board by walking the tree backwards for
each move to check whether ko is occurring, whether a move makes a capture,
which stones have already been captured, and so on. It is the old trade-off of
memory versus processing time, which in this case has been answered with
memory. Determining which solution is best would require a deeper analysis,
out of the scope of this project.
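As an illustration of this design, consider the following sketch (the class
and field names are illustrative, not the actual code of the project): each
node of the move tree stores the board layout it produces, so checking for ko
reduces to comparing a candidate board against the boards stored in its
ancestors.

\begin{verbatim}
# Sketch: each node stores the full board after its move, so a ko
# check is a walk up the ancestors, with no need to replay moves.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MoveNode:
    board: tuple          # layout after this move, as a tuple of rows
    parent: Optional["MoveNode"] = None
    children: list = field(default_factory=list)

    def repeats_previous_board(self, candidate: tuple) -> bool:
        """True if the candidate layout repeats an earlier position."""
        node = self
        while node is not None:
            if node.board == candidate:
                return True
            node = node.parent
        return False
\end{verbatim}

The cost is one full board per node, which is the memory overhead discussed
above.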
\subsection{Problems with the Monte Carlo Tree Search Algorithm}

The implementation of the Monte Carlo Tree Search algorithm was not precise by
itself when asked for moves. This could be attributed to various factors:

\begin{itemize}

    \item The way the score is counted makes it difficult to evaluate who is
        winning at the beginning or middle of the game, so the value of a move
        has to be estimated by running playout simulations. These simulations
        do not analyze the result precisely, since the moves in them are
        played at random.

    \item The configuration used, 5 explorations with 10 simulations each per
        cycle, does not allow many good moves to be found.

\end{itemize}

To solve these problems, either a good score estimation engine would have to
be implemented, which is another machine learning problem by itself, or the
explorations and simulations of the engine would have to be tuned up to values
unable to be run on the development hardware.
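For reference, the cycle these figures refer to can be outlined as follows.
This is a generic Monte Carlo Tree Search sketch over an assumed game-state
interface (\texttt{moves}, \texttt{play}, \texttt{random\_playout}), not the
actual implementation of the project.

\begin{verbatim}
# Generic MCTS cycle: selection, expansion, random playouts and
# backpropagation. The game-state interface is assumed.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state      # offers .moves(), .play(move)
        self.parent = parent    # and .random_playout()
        self.children = []
        self.visits = 0
        self.wins = 0

    def ucb1(self, c=1.41):
        if self.visits == 0:
            return math.inf     # always try unvisited nodes first
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits)
                                / self.visits))

def mcts(root, explorations=5, simulations=10):
    for _ in range(explorations):
        node = root                  # selection: descend by UCB1
        while node.children:
            node = max(node.children, key=Node.ucb1)
        node.children = [Node(node.state.play(m), node)  # expansion
                         for m in node.state.moves()]
        leaf = random.choice(node.children) if node.children else node
        for _ in range(simulations):
            result = leaf.state.random_playout()  # 1 win, 0 loss
            n = leaf                 # backpropagation to the root
            while n is not None:
                n.visits += 1
                n.wins += result
                n = n.parent
    return max(root.children, key=lambda child: child.visits)
\end{verbatim}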
\subsection{Problems with Neural Networks}

Some concerns with the design and results of the neural networks are discussed
in this section.

\subsubsection{Finding the Correct Design}

When approaching neural network design with a basic understanding of their
inner workings, it is easy to get lost in all the different factors that can
be tuned. The number and kind of layers, their numbers of nodes, their
activation functions, the learning rate: they can all be tweaked, and doing so
in favour of the design requires a good knowledge of what is being done.

The basic approach was to design the input and output shapes and then adjust
the network based on them. At first, the desired output was a $9\times9$
two-dimensional matrix with the probability of playing on each point. A
heatmap generated from a failed solution of this approach can be seen in
\fref{fig:oldHM}, where the network distributed the probability across each
row instead of across the whole board. Fitting the shapes of the input, the
internal layers and the output proved to be tricky at first.

\begin{figure}[h]
    \begin{center}
        \includegraphics[width=\textwidth]{img/matches/oldLineByLineHeatmap.png}
        \caption{A heatmap from an old design of the dense neural network.}
        \label{fig:oldHM}
        Note that each row adds up to one, meaning that the engine is
        selecting the best move for each row and not for the whole board.
    \end{center}
\end{figure}

Finally, based on the design of classification networks, where each possible
class is mapped to a value in a one-dimensional output vector, it was decided
that the output should be not a two-dimensional matrix but a one-dimensional
vector. This also helped address the possibility of passing, treating it as
just another possible move; it had previously been left out because it did not
fit in the matrix of board positions. The vector is reshaped into a matrix
representing the board when needed, for example to generate the heatmaps used
for evaluation.
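A minimal sketch of this final design is shown below, assuming a Keras-style
dense network (the layer sizes are illustrative, not the actual architecture
of the project): a single softmax vector with one entry per board point plus
one for passing, reshaped back to a $9\times9$ matrix when a heatmap is
needed.

\begin{verbatim}
# Sketch of the output design: 81 board points plus 1 pass entry
# in a single softmax vector; layer sizes are illustrative.
import numpy as np
from tensorflow import keras

SIDE = 9
OUTPUT_SIZE = SIDE * SIDE + 1      # 81 points plus the pass "move"

model = keras.Sequential([
    keras.Input(shape=(SIDE, SIDE)),           # board layout as input
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(OUTPUT_SIZE, activation="softmax"),
])

board = np.zeros((1, SIDE, SIDE), dtype=np.float32)
probabilities = model.predict(board)[0]            # shape: (82,)
heatmap = probabilities[:-1].reshape(SIDE, SIDE)   # back to 9x9
\end{verbatim}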
\subsubsection{Mistakes in the Networks' Play Patterns}

One notable mistake made by the networks, especially the convolutional
network, was passing too much. Passing is treated as just another move, so the
networks have no grasp of the idea that they should not pass while valuable
moves remain to be made. Designing a specific passing policy could be a new
problem to address.

Another observation is that the engines seem to follow traditional openings
and make reasonable decisions at the start of the game, with some interesting
variations, but as the game progresses their moves make less sense. This could
be because they have seen many examples of opening moves, while the
complications of the middle and end game are much more diverse and the
networks have not been trained on similar situations.

\subsection{Viable Extensions}

There are several ways in which this project could be improved that were left
out for lack of resources or time. Some interesting ones are explored in this
section.

\subsubsection{Joining Monte Carlo Tree Search and Neural Networks}

The tree search algorithm is better at the end of the game, where the tree to
explore is much more manageable. The neural networks, on the other hand, are
better at the beginning, where the positions are more common and the patterns
are simpler. The logical next step would be to join both: the network could
select the best candidate moves, and the tree search algorithm could explore
them to select the best one, growing its tree in meaningful directions.

This is also what AlphaGo did, and in the light of the results obtained in
this project that decision makes sense.
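As a minimal sketch of this combination, reusing the \texttt{Node} class from
the previous tree search sketch and assuming a
\texttt{predict\_probabilities} function as a stand-in for the trained
network, the expansion step could keep only the moves the network rates
highest:

\begin{verbatim}
# Sketch: the network's move probabilities prune and order the
# expansion, so the tree only grows along promising moves.
def expand_with_priors(node, predict_probabilities, top_k=5):
    """Add children only for the k moves the network likes best."""
    # Assumes moves are indices into the probability vector.
    probs = predict_probabilities(node.state)
    best_moves = sorted(node.state.moves(),
                        key=lambda move: probs[move], reverse=True)
    node.children = [Node(node.state.play(move), node)
                     for move in best_moves[:top_k]]
\end{verbatim}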
\subsubsection{Train the Engine by Itself}

By training a game engine on human matches, it can at best become as good as
the human players it has learned from.

Top engines learn not from humans but by playing against themselves. This
would be an interesting problem to tackle, although it would require a new
approach to the design of the networks and a lot of training resources.

\subsubsection{Presenting the Engine to the World}

Some Go servers allow bot players to be registered. This would provide a
source of training against players from all around the world, as well as an
evaluation of the engine's strength over time.
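Such servers usually communicate with bots through the Go Text Protocol (GTP).
The following sketch shows the command loop a minimal GTP bot would need; the
\texttt{engine} interface is an assumed stand-in for the engine of the
project.

\begin{verbatim}
# Minimal GTP command loop; the engine interface is assumed.
import sys

def respond(result=""):
    # GTP success responses are "= result" plus a blank line.
    print(f"= {result}\n", flush=True)

def gtp_loop(engine):
    for line in sys.stdin:
        parts = line.strip().split()
        if not parts:
            continue
        command, args = parts[0].lower(), parts[1:]
        if command == "protocol_version":
            respond("2")
        elif command == "name":
            respond("sample-bot")   # placeholder name
        elif command == "version":
            respond("0.1")
        elif command in ("boardsize", "clear_board", "komi"):
            respond()               # a real bot would update state here
        elif command == "play":     # e.g. "play black D4"
            engine.play(args[0], args[1])
            respond()
        elif command == "genmove":  # e.g. "genmove white"
            respond(engine.genmove(args[0]))
        elif command == "quit":
            respond()
            break
        else:
            print("? unknown command\n", flush=True)
\end{verbatim}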