Diffstat (limited to 'doc/tex/conclusions.tex')
-rw-r--r--  doc/tex/conclusions.tex  128
1 files changed, 128 insertions, 0 deletions
diff --git a/doc/tex/conclusions.tex b/doc/tex/conclusions.tex
new file mode 100644
index 0000000..ebbf7fd
--- /dev/null
+++ b/doc/tex/conclusions.tex
@@ -0,0 +1,128 @@
+\section{Conclusions}
+
+This section discusses some problems found during the project, evaluates the
+results obtained, and suggests what could be done in the future to improve the
+system.
+
+\subsection{Problems with the Implementation of the Game}
+
+While Go has a simple set of rules, they lead to many borderline cases which
+must be specifically addressed. For example, the ko rule requires checking
+previous board positions after each move so that no position is repeated.
+
+These problems have been solved by designing the tree of moves of a match so
+that each node stores the full board layout after its move has been played.
+This of course increases the memory needed to run the application, but it was
+chosen over the alternative of recreating the board by walking the tree
+backwards on each move to check whether ko is occurring, whether a move makes
+a capture, which stones have already been captured, and so on. It is the old
+trade-off of memory versus processing, which in this case has been answered
+with memory. Determining which solution is best would require an analysis
+beyond the scope of the project.
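+
+As an illustration, this design can be sketched as follows. The names are
+hypothetical, not the project's actual classes: each node keeps the board
+that resulted from its move, so the ko check reduces to comparing boards
+along the path back to the root.
+
+\begin{verbatim}
+class MoveNode:
+    def __init__(self, coords, board, parent=None):
+        self.coords = coords  # (row, col), or None for the root
+        self.board = board    # full layout after the move is played
+        self.parent = parent
+        self.children = []
+
+    def violates_ko(self, new_board):
+        """Illegal by ko: the move recreates a previous position."""
+        node = self
+        while node is not None:
+            if node.board == new_board:
+                return True
+            node = node.parent
+        return False
+\end{verbatim}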
+
+\subsection{Problems with the Monte Carlo Tree Search Algorithm}
+
+The implemented Monte Carlo Tree Search algorithm was not precise by itself
+when asked for moves. This can be attributed to several factors:
+
+\begin{itemize}
+
+    \item The way the score is counted makes it difficult to evaluate who is
+        winning at the beginning or middle of the game, so the value of a move
+        has to be estimated by running playout simulations. These simulations
+        give imprecise results because the moves in them are played at random.
+
+    \item The configuration used, 5 explorations with 10 simulations each per
+        cycle, does not allow many good moves to be found.
+
+\end{itemize}
+
+To solve these problems, either a good score evaluation engine has to be
+implemented, which is a machine learning problem in itself, or the numbers of
+explorations and simulations of the engine have to be raised to values that
+could not be run on the development hardware.
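+
+For reference, the search cycle just described can be sketched in Python as
+follows. This is a minimal sketch, not the project's actual code: the
+\texttt{state} interface (legal moves, playing a move, detecting the end of
+the game) is an assumed stand-in for the real board representation, and only
+the two constants mirror the configuration mentioned above.
+
+\begin{verbatim}
+import math
+import random
+
+EXPLORATIONS = 5   # tree expansions per search cycle
+SIMULATIONS = 10   # random playouts used to score each expansion
+
+class Node:
+    def __init__(self, state, parent=None):
+        self.state, self.parent = state, parent
+        self.children, self.wins, self.visits = [], 0, 0
+
+def ucb(node, c=1.4):
+    """Balance winning branches against little-explored ones."""
+    if node.visits == 0:
+        return math.inf
+    return (node.wins / node.visits
+            + c * math.sqrt(math.log(node.parent.visits) / node.visits))
+
+def random_playout(state, player):
+    """Play uniformly random legal moves until the game ends."""
+    while not state.is_over():
+        state = state.play(random.choice(state.legal_moves()))
+    return 1 if state.winner() == player else 0
+
+def search(root, player):
+    for _ in range(EXPLORATIONS):
+        node = root
+        while node.children:                    # selection
+            node = max(node.children, key=ucb)
+        for move in node.state.legal_moves():   # expansion
+            node.children.append(Node(node.state.play(move), node))
+        if node.children:
+            node = random.choice(node.children)
+        wins = sum(random_playout(node.state, player)
+                   for _ in range(SIMULATIONS))  # simulation
+        while node is not None:                 # backpropagation
+            node.wins += wins
+            node.visits += SIMULATIONS
+            node = node.parent
+    return max(root.children, key=lambda n: n.visits)
+\end{verbatim}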
+
+\subsection{Problems with Neural Networks}
+
+Some concerns with the design and results of the neural networks are discussed
+in this section.
+
+\subsubsection{Finding the correct design}
+
+When approaching neural network design with only a basic understanding of
+their inner workings, it is easy to get lost among all the factors that can be
+tuned. The number and kind of layers, their numbers of nodes, their activation
+functions, the learning rate: all of them can be tweaked, and doing so in
+favour of the design requires a good knowledge of what is being done.
+
+The basic approach was to design the input and output shapes and then adjust
+the network based on them. At first, the desired output was a $9 \times 9$
+two-dimensional matrix with the probability of playing on each point. A
+heatmap generated from a failed solution of this approach can be seen in
+\fref{fig:oldHM}, where the network distributed the probability across each
+row instead of across the whole board. Fitting the shapes of the input, the
+internal layers and the output proved tricky at first.
+
+\begin{figure}[h]
+ \begin{center}
+ \includegraphics[width=\textwidth]{img/matches/oldLineByLineHeatmap.png}
+ \caption{A heatmap from an old design of the dense neural network.}
+ \label{fig:oldHM}
+ Note that each row adds up to one, meaning that the engine is selecting
+ the best move for each row and not for the whole board.
+ \end{center}
+\end{figure}
+
+Finally, following the design of classification networks, where each possible
+class is mapped to a value in a one-dimensional output vector, it was decided
+that the output should be a one-dimensional vector instead of a
+two-dimensional matrix. This also helped address the possibility of passing,
+treating it as just another possible move; it had previously been left out
+because it did not fit in the matrix of board moves. The vector is reshaped
+into a matrix representing the board when needed, for example to generate the
+heatmaps used for evaluation.
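+
+A minimal sketch of this final design is shown below. Keras is assumed here
+only for illustration and the hidden layer is arbitrary; the part that
+reflects the decision above is the output: one entry per board point plus one
+for passing, reshaped back to the board's shape only for display.
+
+\begin{verbatim}
+import numpy as np
+from tensorflow import keras
+
+BOARD_SIDE = 9
+MOVES = BOARD_SIDE * BOARD_SIDE + 1  # 81 board points plus pass
+
+model = keras.Sequential([
+    keras.layers.Flatten(input_shape=(BOARD_SIDE, BOARD_SIDE, 1)),
+    keras.layers.Dense(128, activation="relu"),
+    # one probability per possible move, as in classification
+    keras.layers.Dense(MOVES, activation="softmax"),
+])
+
+def heatmap(prediction):
+    """Drop the pass entry and reshape to the board's shape."""
+    return np.reshape(prediction[:-1], (BOARD_SIDE, BOARD_SIDE))
+\end{verbatim}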
+
+\subsubsection{Mistakes on Networks' Play Patterns}
+
+One notable mistake made by the networks, especially the convolutional
+network, was passing too much. Passing is treated as just another move, so the
+networks have no grasp that they should not pass while there are still
+valuable moves to be made. Designing a specific passing policy could be a
+problem for future work.
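+
+One hypothetical direction, sketched below, would be to post-process the
+network's output so that the pass probability is scaled down while the board
+still has room to play. The threshold and factor are invented values for
+illustration, and the pass entry is assumed to be the last one of the vector.
+
+\begin{verbatim}
+import numpy as np
+
+def discourage_pass(prediction, legal_moves_left,
+                    threshold=10, factor=0.1):
+    """Scale down the pass entry unless the game is ending."""
+    probs = np.array(prediction, dtype=float)
+    if legal_moves_left > threshold:
+        probs[-1] *= factor
+    return probs / probs.sum()  # renormalize to a distribution
+\end{verbatim}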
+
+Another observation is that the engines seem to follow traditional openings
+and make reasonable decisions at the start of the game, with some interesting
+variations, but as the game progresses their moves make less sense. This could
+be because they have seen many examples of opening moves, while positions in
+the middle and end game are much more diverse, so the networks had not been
+trained on similar situations.
+
+\subsection{Viable Extensions}
+
+There are several ways in which this project could be improved that were left
+out for lack of resources or time. Some interesting ones are explored in this
+section.
+
+\subsubsection{Joining Monte Carlo Tree Search and Neural Networks}
+
+The tree search algorithm is better at the end of the game, when the tree to
+explore is much more manageable. The neural networks, on the other hand, are
+better at the beginning, where positions are more common and patterns are
+simpler. The logical next step would be to join both: the network could select
+the best candidate moves, and the tree search algorithm could explore them to
+pick the best one, growing its explored tree in meaningful directions.
+
+This is also what AlphaGo did. In light of the results obtained in this
+project, that decision makes sense.
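+
+Such a combination could be sketched as follows, reusing the \texttt{Node}
+class from the tree search sketch above with an added \texttt{prior}
+attribute holding the network's probability for the node's move. The
+selection rule is in the spirit of the one used by AlphaGo, not a
+reproduction of it.
+
+\begin{verbatim}
+import math
+
+def guided_score(node, c=1.0):
+    """Playout value plus a bonus weighted by the network prior."""
+    value = node.wins / node.visits if node.visits else 0.0
+    bonus = (c * node.prior
+             * math.sqrt(node.parent.visits) / (1 + node.visits))
+    return value + bonus
+
+def select_child(node):
+    # Branches the network likes are explored first; repeated
+    # visits shrink the bonus, letting playout results take over.
+    return max(node.children, key=guided_score)
+\end{verbatim}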
+
+\subsubsection{Train the Engine by Itself}
+
+By training a game engine on human matches, it can at best become as good as
+the human players it has learned from.
+
+Top engines learn not from humans but by playing against themselves. This
+would be an interesting problem to tackle, though it would require a new
+approach to the design of the networks and a lot of training resources.
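+
+The overall shape of such a loop is sketched below. Every name here is a
+placeholder: a real loop would also need a search component, evaluation of
+candidate networks and careful management of the generated data.
+
+\begin{verbatim}
+def self_play_training(network, games_per_iteration, iterations):
+    for _ in range(iterations):
+        examples = []
+        for _ in range(games_per_iteration):
+            # the network plays both sides; each position and the
+            # final outcome of its game become a training example
+            examples.extend(play_one_game(network, network))
+        network.train(examples)  # placeholder training step
+    return network
+\end{verbatim}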
+
+\subsubsection{Presenting the Engine to the World}
+
+Some Go servers allow bot players to be registered. This would provide a
+source of training games against players from all around the world, as well
+as an evaluation of the engine's strength over time.