\section{Conclusions}

This section discusses problems found during the project, evaluates the
results, and outlines what could be done in the future to improve the system.

\subsection{Problems with the Implementation of the Game}

While Go has a simple set of rules, they lead to many borderline cases which
must be specifically addressed. As an example, the \gls{ko} rule obliges the
engine to check previous board positions after each move so that no board
position is repeated.

These problems have been solved by designing the tree of moves of a match so
that each move stores the full board layout resulting from it. This of course
increases the memory needed to run the application, but it was chosen over the
alternative of recreating the board by parsing the tree backwards on every move
to check whether \gls{ko} is occurring, whether a move makes a capture, which
stones have already been captured, and so on. It is the old trade-off of memory
versus processing, which in this case has been answered with memory.
Determining which solution is best would require a deep analysis outside the
scope of this project.

\subsection{Problems with the Monte Carlo Tree Search Algorithm}

The implemented Monte Carlo Tree Search algorithm was not precise by itself
when asked for moves. This can be attributed to several factors:

\begin{itemize}

	\item The way the score is counted makes it difficult to evaluate who is
		winning at the beginning or middle of the game, so the score of a move
		has to be estimated by running playing simulations. These simulations
		are not precise in evaluating the result, since the moves in them are
		played at random.

	\item The configuration used, 5 explorations with 10 simulations each per
		cycle, does not allow many good moves to be found.

\end{itemize}

To solve these problems, either a good score evaluation engine has to be
implemented, which is another machine learning problem in itself, or the number
of explorations and simulations has to be increased to values that could not be
run on the development hardware.
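The exploration and simulation cycle described above can be sketched as
follows. The helpers (\texttt{legal\_moves}, \texttt{play},
\texttt{playout\_winner}) are assumed placeholders, not the engine's real API:

```python
import random

EXPLORATIONS = 5   # candidate moves expanded per cycle
SIMULATIONS = 10   # random playouts used to score each candidate

def score_move(board, move, play, playout_winner, simulations=SIMULATIONS):
    """Fraction of random playouts after `move` that black wins."""
    wins = 0
    for _ in range(simulations):
        if playout_winner(play(board, move)) == "black":
            wins += 1
    return wins / simulations

def best_move(board, legal_moves, play, playout_winner,
              explorations=EXPLORATIONS):
    """Explore a few random candidates and keep the best-scoring one."""
    moves = legal_moves(board)
    candidates = random.sample(moves, k=min(explorations, len(moves)))
    return max(candidates,
               key=lambda m: score_move(board, m, play, playout_winner))
```

With only 5 candidates scored by 10 random playouts each, the noise of the
playouts dominates, which matches the imprecision observed above.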

\subsection{Problems with Neural Networks}

Some concerns with the design and results of the neural networks are discussed
in this section.

\subsubsection{Finding the correct design}

When approaching neural network design with only a basic understanding of its
inner workings, it is easy to get lost in all the different factors that can be
tuned. The number and type of layers, their number of nodes, their activation
functions, the learning rate: they can all be tweaked, and doing so to the
benefit of the design requires a good knowledge of what is being done.

The basic approach was to design the input and output shapes and then adjust
the network based on them. At first, the desired output was a 9x9
two-dimensional matrix with the probabilities of playing on each point. A
heatmap generated from a failed solution of this approach can be seen in
\fref{fig:oldHM}, where the network distributed the probability across each row
instead of across the whole board. Fitting the shapes of the input, the
internal layers and the output was found to be tricky at first.

\begin{figure}[h]
	\begin{center}
		\includegraphics[width=\textwidth]{img/matches/oldLineByLineHeatmap.png}
		\caption{A heatmap from an old design of the dense neural network.}
		\label{fig:oldHM}
		Note that each row adds up to one, meaning that the engine is selecting
		the best move for each row and not for the whole board.
	\end{center}
\end{figure}

Finally, based on the design of classification networks, where each possible
class is mapped to a value in a one-dimensional output vector, it was decided
that the output should be not a two-dimensional matrix but a one-dimensional
vector. This also helped address the possibility of passing, treating it as
just another possible move; it had previously been left out because it did not
fit in the matrix of possible moves. This vector is then reshaped to a matrix
representing the board when needed, for example to generate the heatmaps used
for evaluation.
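The encoding can be sketched as follows, assuming a 9x9 board with the pass
move appended as the last entry of the flat vector (the names are illustrative,
not the project's actual code):

```python
import numpy as np

BOARD_SIDE = 9
OUTPUT_SIZE = BOARD_SIDE * BOARD_SIDE + 1  # 81 points plus pass

def to_heatmap(output_vector):
    """Drop the pass entry and reshape the move probabilities to 9x9."""
    assert output_vector.shape == (OUTPUT_SIZE,)
    return output_vector[:-1].reshape(BOARD_SIDE, BOARD_SIDE)

def decode_move(output_vector):
    """Coordinates of the most probable move, or the string 'pass'."""
    idx = int(np.argmax(output_vector))
    if idx == OUTPUT_SIZE - 1:
        return "pass"
    return divmod(idx, BOARD_SIDE)  # (row, col)
```

Treating pass as one more vector entry is what lets a standard classification
output cover every legal action with no special case in the network itself.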

\subsubsection{Mistakes on Networks' Play Patterns}

One notable mistake made by the networks, especially the convolutional network,
was passing too often. Passing is treated as just another move, so the networks
have no grasp that they should not pass until there are no more valuable moves
to be made. Creating a specific passing policy could be a new design problem.

Another observation is that the engines seem to follow traditional openings and
make reasonable decisions at the start of the game, with some interesting
variations. But as the game progresses their moves make less sense. This could
be because they have seen many examples of opening moves, while middle- and
end-game complications are much more diverse and the networks have not been
trained on similar situations. These stages of the game are probably where a
strong Monte Carlo Tree Search algorithm would help the most.

\subsection{Viable Extensions}

There are several ways in which this project could be improved which were left
out for lack of resources or time. Some interesting ones are explored in this
section.

\subsubsection{Joining Monte Carlo Tree Search and Neural Networks}

The tree search algorithm is better at the end of the game, where the tree to
explore is much more manageable. On the other hand, the neural networks are
better at the beginning, where the positions are more common and patterns are
simpler. The logical next step would be to join both: the network could select
the best moves to be made, and the tree search algorithm could explore them to
select the best and grow its explored tree in meaningful directions.
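One way to sketch this combination is to let the network prune the candidate
moves before the tree search expands them. The helper names
(\texttt{predict\_probs}, \texttt{legal\_moves}) are assumptions for
illustration, not the project's API:

```python
def candidate_moves(board, predict_probs, legal_moves, top_k=5):
    """Return the network's top-k legal moves for the tree search to expand."""
    probs = predict_probs(board)      # one probability per move index
    legal = legal_moves(board)
    return sorted(legal, key=lambda m: probs[m], reverse=True)[:top_k]
```

The tree search then explores only these few moves, so its budget of
explorations and simulations is spent where the network already sees promise.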

This is also what AlphaGo did. Going by the results obtained in this project,
it makes sense that its creators went with that design.

\subsubsection{Training the Engine Against Itself}

By training a game engine on human matches, it can at best get as good as the
human players it has learned from.

Top engines learn not from humans but by playing against themselves. This would
be an interesting problem to tackle, though it would require a new approach to
the design of the networks and a lot of training resources.

\subsubsection{Presenting the Engine to the World}

Some Go servers allow bot players to be registered. This would provide a source
of training against players from all around the world, as well as an evaluation
of the engine's strength over time.