Designing The Path of Go
Decisions, decisions.
A tricky enough challenge, but that's only half the story. Remember how easy it is to evaluate who's winning in Chess? You can often do it with just a quick glance at the board – a very quick glance if I happen to be playing – but that doesn't work for Go.
"So the size aspect of Go is a problem," says Graepel, "but there's also this evaluation problem to take into account. The computer has to work out whose position is better in order to choose the right moves to make next, and with no difference between knights and pawns, this gets very hard.
"Each of the Go stones is the same: they only take their value from their position on the board and how they inter-relate to all the other stones on the board. That means it's almost impossible to look at the board the same way and come up with the same kind of evaluation."
So how have computer Go researchers tackled these two issues? This is where the science behind The Path of Go gets brilliant – and where the game's modest loading bar comes into its own.
"Let's talk about the evaluation problem first," Graepel says. "Something called Monte Carlo sampling has proved very useful. It's quite an amazing fact, but if you take a Go position in which Black is in a better position than White, how can the computer find out about that?
"A way that appears to work is this: you take that position and you play randomly to the end of the game. By that, I mean that Black and White still make legal moves, but those legal moves are determined by just throwing a die or using a random number generator.
"Do that once and the outcome will be random, of course. But it turns out that if you do that often enough – you always start from the same position, and then you play the game to completion with random moves, say, 10,000 times, you'll find that if Black has an advantage in that position, even through random play, Black will win slightly more often than White.
"This is a very weak statistical signal which is hard to pick up," admits Graepel. "But people in the Go community have advanced this, and discovered that if the computer simulations make moves which have done better in earlier samples – if you effectively bias your random games towards good moves – then the signal gets much stronger.
"That way, you are randomly exploring the game tree, but putting more of your attention on promising moves, and it allows you to evaluate who is winning much more successfully."
Cripes. So every time that little loading bar pops up – every time the computer makes a move in The Path of Go – it's first playing a series of games randomly through to their completion?
"Exactly," laughs Graepel. "That's exactly right. It's a technique called UCT: Upper Confidence Intervals in Trees, and it's become one of the very exciting research areas. Although we use it, we didn't invent, so I don't want to take credit.
"So that's problem number one," he continues. "Now we know how to evaluate a position, the second problem is the size of the tree: there are too many different moves available at each turn. We get around this in part by cutting down the board size for much of the campaign in The Path of Go.
"The original game is played on a 19x19 board, which allows for 361 different points. We've cut that down to a 9x9 board, which allows for only 81 points. It's roughly as complex as Chess, but it makes it less intimidating for players and allows the AI to work much better.
"On 9x9 boards, Go programmes are almost competitive with the best human players now, while on bigger boards they're still very much away from that."
This smaller board is then interpreted using a different technique using pattern recognition, explains Graepel. "Here, the idea is to train a machine learning system that learns to imitate a professional Go player.