aboutsummaryrefslogtreecommitdiff
path: root/doc/tex/results.tex
blob: 9c1e52c954e669808bd9276a48fea2de73ba824c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
\newcommand{\boards}[5]{
\begin{figure}[h]
	\begin{center}
		\includegraphics[width=0.4\textwidth]{#1}
		\includegraphics[width=0.4\textwidth]{#2}
		\caption{#3}
		\label{#4}
		#5
	\end{center}
\end{figure}
}

\newcommand{\moveHeatmap}[9]{
\begin{figure}[h]
	\begin{center}
		\includegraphics[width=0.35\textwidth]{#1}
		\includegraphics[width=0.5\textwidth]{#2}
		\\[\smallskipamount]
		\includegraphics[width=0.35\textwidth]{#3}
		\includegraphics[width=0.5\textwidth]{#4}
		\\[\smallskipamount]
		\includegraphics[width=0.35\textwidth]{#5}
		\includegraphics[width=0.5\textwidth]{#6}
		\caption{#7}
		\label{#8}
		#9
	\end{center}
\end{figure}
}

\section{Results}

\subsection{Monte Carlo Tree Search Evaluation}

The Monte Carlo Algorithm tries to explore the tree of possibilities as
efficiently as possible. With this approach, it can be expected to fail when
alone on a problem such big as the game of Go. Nonetheless, there are some areas
where it can be useful. It will be evaluated by its capabilities while playing
games but also when presented with Go problems.

The Monte Carlo algorithm has been set to do 5 explorations with 10 simulations
each when it is asked for a move. In the hardware used this makes it think for
40 seconds on an empty 9x9 board.

\subsubsection{Monte Carlo Tree Search against itself}

\boards
{img/matches/mctsVSmcts01.jpg}
{img/matches/mctsVSmcts02.jpg}
{Monte Carlo Tree Search against itself.}
{fig:mctsVSmcts}
{Both boards are from the same match. The first shows only the first 12 moves
for clarity. The second shows the finished game.}

When set to play against itself the algorithm shows what looks like random play.
Some moves could be debated as sensible, but many of them are on the second or
fist line, which so early on the game are the two worst places to play at.

It seems clear that this algorithm, at least with this specific configuration,
can't handle the size of an empty Go board, even on the 9x9 size.

A record of the game is shown in \fref{fig:mctsVSmcts}.

\subsubsection{Monte Carlo Tree Search against Go problems}

\boards
{img/problems/mctsProblem01.jpg}
{img/problems/mctsProblem02.jpg}
{Monte Carlo against the first of Cho Chikun's problems.}
{fig:mctsProblem01}

Since tree exploration on smaller boards or advanced games with little empty
spaces should be easier the algorithm has also been tested on some Go problems.

A Go problem or tsumego is a predefined layout of the board, or of a section of
the board, for which the player must find some beneficial move. Life and death
problems are a subset of tsumegos in which the survival of a group depends on
finding the correct sequence to save or kill the group. One collection of such
tsumegos is \textit{Cho Chikun's Encyclopedia of Life and Death}, part of which
are available on OGS\cite{ogsLifeAndDeath}, an online Go server.

The first of these problems and what the algorithm suggested as moves is shown
in \fref{fig:mctsProblem01}.

Black makes the first move, which means the solution is to find some favorable
outcome for black, which in this case is killing the white group. The white
group has a critical point on B1. If white plays on B1 they make two eyes and
live, but if black plays there first white can't make two eyes and dies, so B1
is the solution to the tsumego. This is one of the easiest Go problems.

The algorithm neglects this solution. While asked five times to generate a move
for the starting position it suggested B1 only once.

But notably, after making another move, it consistently suggested B1 for white,
which is the solution now that white has to play. So in the end it was able to
solve the tsumego, probably because after making a move it had already explored
part of the tree but it was difficult that it explored the solution for the
first move.

The engine was tested against other tsumegos but it was not able to solve them,
so no more are shown here.

\subsection{Neural Network Training}

Each network has been training by receiving batches of SGF files which are first
converted to lists of moves by the Training System and then provided to the
train function of the networks. This has been executed with the following
command:

\inputminted[fontsize=\small]{bash}{listings/trainCommand.sh}

Which lists the contents of a folder containing multiple SGF files, shuffles the
list, takes some of them and passes them to train.py as arguments. The combined
list of game positions and moves made from them in all the games selected make
up the sample of the training. The number of files selected can of course be
adapted to the situation.

The networks were configured to train for 20 epochs, that is, processing the
full batch and updating the weights on the network 20 times. 10\% of the sample
were used as validation. This means they were not used for training but to
check the accuracy and loss of the network after training with the rest of
the batch. This technique of validation can help detect overfitting, which
happens when a network does a very good job on the population it has been
trained on but it fails when receiving unseen data.

The training shows loss decrementing and accuracy incrementing on each epoch for
both training and validation samples, which suggests there is learning happening
and no overfitting occurring.

The number of files to train on has been selected by trying to keep the
resulting training time manageable for the developer. It was found that
incrementing the files per batch did not increment the training time linearly,
as with batches of 10 games an epoch of training could be completed in one
minute, in three minutes batches of 20 games, and in up to an hour with batches
of 100 games.

The outputs from this training process can be seen on \flist{code:denseTraining}
and \flist{code:convTraining} for the dense network and the convolutional
network respectively.

\begin{listing}[h]
	\inputminted[fontsize=\scriptsize]{text}{listings/denseTraining.txt}
	\caption{Dense neural network training log.}
	\label{code:denseTraining}
\end{listing}

\begin{listing}[h]
	\inputminted[fontsize=\scriptsize]{text}{listings/convTraining.txt}
	\caption{Convolutional neural network training log.}
	\label{code:convTraining}
\end{listing}

\subsection{Neural Network Evaluation}

One way of studying the results of the training could be to programmatically
test if the network is able to predict human moves when provided with a game
file. This has the drawback of not directly testing the strength of the network,
since it would be tested on predicting the moves of some human player, whose
level could greatly vary, and not on its playing capabilities alone. This also
has already been tested to some extent by the validation portion of the samples
providing for training, and the results show the network increasing its accuracy
of predicting human moves through the training epochs.

Another way is to make the network play either against itself, another network,
or a human player, and analyze the score it provides to each possible play. The
output of the network is a vector with the likelihood of making each move, so
we can take this values as how strongly the engine suggests each of them.

\subsection{Neural Networks Against Themselves}

The case of neural networks playing against themselves is interesting in that it
shows moves with high certainty, that is, the network strongly suggests a
specific move. It can be thought that this would be the most common line of play
the engine has seen, with one certain move leading into another.

This happened for both the dense neural network and the convolutional network.

\subsubsection{Dense network Against Itself}

The initial moves of a match of the dense network against itself, along with
heatmaps showing the output from the network (which chance it gives for each
move), can be seen on Figs.~\ref{fig:denseVSdense01}, \ref{fig:denseVSdense02},
\ref{fig:denseVSdense03} and \ref{fig:denseVSdense04}.

The dense network starts on the center of the board, which is one of the
standard openings in the 9x9 board. It starts on a very good track, but we must
acknowledge that the empty board is a position present on every Go match it has
trained on and so it should know it well. It probably means the center was the
most played opening in the sample. It is interesting to check the heatmap of
this move, since the selected move has only a score of 0.27. Other common
openings on the 9x9 are represented on the vertices with some significant score.

The heatmap of the response to the center play is also interesting. The four
highest scored moves are equivalent, since the first move has not broken any
symmetry. The move they represent, two spaces from the first stone, is also the
consensual response. The three moves following in score are also equivalent
between them.

The first white stone has broken the symmetry in one direction, and so the
second black move is suggested with more likelihood than the previous one. This
is also a very common sequence of play.

The following moves are a white attack followed by black's sensible response,
and a strong white extension (arguably to the wrong side, but the minutia of why
the other side would be better is probably not grasped by the engine). The
following moves could be called looking for complication, which is a strategy.

Overall some moves could be criticised but the engine is showing at least a
basic knowledge of the game's strategy.

\subsubsection{Convolutional Network Against Itself}

The same information on a match of the convolutional network against itself can
be seen on Figs.~\ref{fig:convVSconv01}, \ref{fig:convVSconv02},
\ref{fig:convVSconv03} and \ref{fig:convVSconv04}.

The convolutional network also likes to start on the center of the board and is
more certain of the move than the dense network. Its next two plays are also the
same, but the match starts to diverge from the fourth move.

Instead of attacking the black stone on the side, white plays directly under at
B5, keeping the edge of symmetry. This is strategically important because while
symmetry is present playing on either side makes no difference, while playing
after the opponent has broken symmetry gives the opportunity to respond and
capitalize on the now better side to play.

Black responds to the white stone by attacking it at B4 and white then extends
to the more open side with F7.

The match up to this point is discussed on Sensei's Library as an interesting
sequence deriving from the center first stone\cite{sl_sword}. The discussion
there states that white should have extended to the other side.

The following moves are some sensible continuations, with approaches and
responses on each side. They are probably not the best moves, but the idea
behind them can be read from the board.

\subsubsection{Dense Network Against Convolutional}

The first 12 and the last 6 moves of a match of the dense network against the
convolutional network can be seen on be seen on Figs.~\ref{fig:denseVSconv01},
\ref{fig:denseVSconv02}, \ref{fig:denseVSconv03}, \ref{fig:denseVSconv04},
\ref{fig:denseVSconv05} and \ref{fig:denseVSconv06}.

The dense network is given black. The convolutional network plays then as white.

The first three moves are known territory for both networks, since both play the
same sequence when matched against themselves.

After white plays under the black stone as the fourth move, black makes a maybe
too careful extension, but it seemed very sure about it. Now white has some
doubts. It seems to like the extensions to both sides, if a bit more to the more
open side, a preference both engines have previously shown at this point of the
match. The move it prefers the most, though, would be to play again under black.
So, as this move is impossible, it defers to the extension to the open side with
F3.

Black now makes the mistake of overattacking the lone white stone on B5 with B6,
and white capitalizes on it by keeping extending with D3. B5 is still a move
with a not negligible score, but it is no longer the highest.

The following moves are questionable, with black keeping building a very tight
group and getting no territory and white attacking too directly with a lone
stone into that strong group.

In the final moves of the game we see that black has built a living group on the
right side and things are looking good for it on the rest of the board. White
has passed a bunch of times, starting too early, which means it gives too high
an importance to passing. Since the networks learned only by seeing moves in
response to boards, it has not learnt that it should not pass until there is no
more to gain. But it is also important to note that passing was not the most
likely move for the engine to make, it just happens that the move it would like
to make is already taken and the highest chance available move is to pass.

\moveHeatmap
{img/matches/denseVSdense_board_01.jpg}
{img/matches/denseVSdense_heatmap_01.png}
{img/matches/denseVSdense_board_02.jpg}
{img/matches/denseVSdense_heatmap_02.png}
{img/matches/denseVSdense_board_03.jpg}
{img/matches/denseVSdense_heatmap_03.png}
{Dense network against itself, moves 1 through 3.}
{fig:denseVSdense01}

\moveHeatmap
{img/matches/denseVSdense_board_04.jpg}
{img/matches/denseVSdense_heatmap_04.png}
{img/matches/denseVSdense_board_05.jpg}
{img/matches/denseVSdense_heatmap_05.png}
{img/matches/denseVSdense_board_06.jpg}
{img/matches/denseVSdense_heatmap_06.png}
{Dense network against itself, moves 4 through 6.}
{fig:denseVSdense02}

\moveHeatmap
{img/matches/denseVSdense_board_07.jpg}
{img/matches/denseVSdense_heatmap_07.png}
{img/matches/denseVSdense_board_08.jpg}
{img/matches/denseVSdense_heatmap_08.png}
{img/matches/denseVSdense_board_09.jpg}
{img/matches/denseVSdense_heatmap_09.png}
{Dense network against itself, moves 7 through 9.}
{fig:denseVSdense03}

\moveHeatmap
{img/matches/denseVSdense_board_10.jpg}
{img/matches/denseVSdense_heatmap_10.png}
{img/matches/denseVSdense_board_11.jpg}
{img/matches/denseVSdense_heatmap_11.png}
{img/matches/denseVSdense_board_12.jpg}
{img/matches/denseVSdense_heatmap_12.png}
{Dense network against itself, moves 10 through 12.}
{fig:denseVSdense04}

\moveHeatmap
{img/matches/convVSconv_board_01.jpg}
{img/matches/convVSconv_heatmap_01.png}
{img/matches/convVSconv_board_02.jpg}
{img/matches/convVSconv_heatmap_02.png}
{img/matches/convVSconv_board_03.jpg}
{img/matches/convVSconv_heatmap_03.png}
{Convolutional network against itself, moves 1 through 3.}
{fig:convVSconv01}

\moveHeatmap
{img/matches/convVSconv_board_04.jpg}
{img/matches/convVSconv_heatmap_04.png}
{img/matches/convVSconv_board_05.jpg}
{img/matches/convVSconv_heatmap_05.png}
{img/matches/convVSconv_board_06.jpg}
{img/matches/convVSconv_heatmap_06.png}
{Convolutional network against itself, moves 4 through 6.}
{fig:convVSconv02}

\moveHeatmap
{img/matches/convVSconv_board_07.jpg}
{img/matches/convVSconv_heatmap_07.png}
{img/matches/convVSconv_board_08.jpg}
{img/matches/convVSconv_heatmap_08.png}
{img/matches/convVSconv_board_09.jpg}
{img/matches/convVSconv_heatmap_09.png}
{Convolutional network against itself, moves 7 through 9.}
{fig:convVSconv03}

\moveHeatmap
{img/matches/convVSconv_board_10.jpg}
{img/matches/convVSconv_heatmap_10.png}
{img/matches/convVSconv_board_11.jpg}
{img/matches/convVSconv_heatmap_11.png}
{img/matches/convVSconv_board_12.jpg}
{img/matches/convVSconv_heatmap_12.png}
{Convolutional network against itself, moves 10 through 12.}
{fig:convVSconv04}

\moveHeatmap
{img/matches/denseVSconv_board_01.jpg}
{img/matches/denseVSconv_heatmap_01.png}
{img/matches/denseVSconv_board_02.jpg}
{img/matches/denseVSconv_heatmap_02.png}
{img/matches/denseVSconv_board_03.jpg}
{img/matches/denseVSconv_heatmap_03.png}
{Dense network against convolutional, moves 1 through 3.}
{fig:denseVSconv01}
{The dense network plays as black.}

\moveHeatmap
{img/matches/denseVSconv_board_04.jpg}
{img/matches/denseVSconv_heatmap_04.png}
{img/matches/denseVSconv_board_05.jpg}
{img/matches/denseVSconv_heatmap_05.png}
{img/matches/denseVSconv_board_06.jpg}
{img/matches/denseVSconv_heatmap_06.png}
{Dense network against convolutional, moves 4 through 6.}
{fig:denseVSconv02}

\moveHeatmap
{img/matches/denseVSconv_board_07.jpg}
{img/matches/denseVSconv_heatmap_07.png}
{img/matches/denseVSconv_board_08.jpg}
{img/matches/denseVSconv_heatmap_08.png}
{img/matches/denseVSconv_board_09.jpg}
{img/matches/denseVSconv_heatmap_09.png}
{Dense network against convolutional, moves 7 through 9.}
{fig:denseVSconv03}

\moveHeatmap
{img/matches/denseVSconv_board_10.jpg}
{img/matches/denseVSconv_heatmap_10.png}
{img/matches/denseVSconv_board_11.jpg}
{img/matches/denseVSconv_heatmap_11.png}
{img/matches/denseVSconv_board_12.jpg}
{img/matches/denseVSconv_heatmap_12.png}
{Dense network against convolutional, moves 10 through 12.}
{fig:denseVSconv04}

\moveHeatmap
{img/matches/denseVSconv_board_72.jpg}
{img/matches/denseVSconv_heatmap_72.png}
{img/matches/denseVSconv_board_73.jpg}
{img/matches/denseVSconv_heatmap_73.png}
{img/matches/denseVSconv_board_74.jpg}
{img/matches/denseVSconv_heatmap_74.png}
{Dense network against convolutional, moves 72 through 74.}
{fig:denseVSconv05}

\moveHeatmap
{img/matches/denseVSconv_board_75.jpg}
{img/matches/denseVSconv_heatmap_75.png}
{img/matches/denseVSconv_board_76.jpg}
{img/matches/denseVSconv_heatmap_76.png}
{img/matches/denseVSconv_board_77.jpg}
{img/matches/denseVSconv_heatmap_77.png}
{Dense network against convolutional, moves 75 through 77.}
{fig:denseVSconv06}

% Needed line so the processing of the last command doesn't find an EOF