Teaching a Computer to Play Video Games
One of the coolest new videos making the rounds is one by SethBling, in which he introduces a scheme for using machine learning to teach a computer to play a video game. More accurately, using a topological genetic neural network algorithm, he teaches a computer to teach itself how to play a video game. That’s a lot of words, though… let’s break it down!
Topological
Computers can’t see the same way that you or I can, but they can detect occlusion. When one thing overlaps another, you can tell a computer to act based on that overlap. Video games don’t necessarily expose an interface that you can program against, but they are built of predictable units that have predictable behaviors. The first step is to identify which blocks are good and which blocks are bad. In this case, you could say that blocks you can walk on are good and blocks that correspond to enemies are bad. Easy enough.
So, you know that you need to create a framework that can teach itself to avoid the bad blocks and stick to the good blocks. You also know that you need to do it on the basis of occlusion. When the block that represents your character occludes another block, it is assumed that you are interacting with that block and receive the status associated with it (i.e., nothing happens for a good block, but you die or take damage from a bad one).
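As a minimal sketch of this occlusion-based "sensing," suppose every on-screen object is an axis-aligned rectangle of (x, y, width, height). The function and variable names here are illustrative assumptions, not taken from SethBling's actual code:

```python
def overlaps(a, b):
    """Return True if rectangles a and b occlude each other (AABB test)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def sense(player, blocks):
    """Return the status of every block the player currently occludes.

    blocks is a list of (rect, status) pairs: status 1 = good (walkable),
    -1 = bad (enemy). These codes are an assumption for illustration.
    """
    return [status for rect, status in blocks if overlaps(player, rect)]

player = (4, 4, 2, 2)
blocks = [((5, 5, 3, 3), -1),   # enemy rectangle overlapping the player
          ((0, 0, 2, 2), 1)]    # ground tile well out of reach
print(sense(player, blocks))    # only the enemy block registers
```

This is all the "vision" the system needs: a frame of rectangles and a test for which ones the character is touching.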
Neural Network
This isn’t the next word in the concept, but it’s the next one that’s important to cover. In this application, the neural network is really just a set of nodes that respond to occlusion, with some guiding logic that specifies the order in which nodes should be hit. Going back to the video game example, you might have a node that projects a straight line and instructs the player to jump whenever a “bad” block crosses that line. In the same sense, you might have another node that projects a line and makes the character walk right whenever a “good” block crosses that line. In a game with a fixed frame of reference, like Mario, that means you need a node projecting a line to a block that will reliably exist in order to make the character move right through the level (except when jumping; this is a bit of a simplification).
The final component is some logic that determines where these nodes are generated.
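The node idea above can be sketched as a lookup: each node watches a fixed offset from the player and votes for a button whenever the expected block type appears there. The offsets, block labels, and button names below are assumptions for illustration, not the actual network from the video:

```python
def evaluate(nodes, grid, player_pos):
    """Collect the buttons pressed this frame.

    nodes: list of (dx, dy, block_type, button) -- a node fires its button
           when the tile at the player's position plus (dx, dy) matches.
    grid:  dict mapping (x, y) tile coordinates to a block label.
    """
    px, py = player_pos
    pressed = set()
    for dx, dy, block_type, button in nodes:
        if grid.get((px + dx, py + dy)) == block_type:
            pressed.add(button)
    return pressed

nodes = [
    (1, 0, "bad", "jump"),    # enemy directly ahead -> jump over it
    (1, 1, "good", "right"),  # solid ground ahead-and-below -> keep walking
]
grid = {(6, 5): "bad", (6, 6): "good"}   # hypothetical screen state
print(evaluate(nodes, grid, (5, 5)))     # both conditions fire here
```

The open question, as the text says, is where those (dx, dy) sensors come from in the first place, and that's what the genetic part answers.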
Genetic
Remember how you need to have a “move right” node pointing to a block that reliably exists? You could start by saying “on load of the algorithm, initialize with this node”, but that doesn’t count as the computer teaching itself because you’ve hardcoded the instruction. This is where the genetic algorithm comes in (some people also refer to these as evolutionary algorithms). In a simple evolutionary model, the fittest creatures live to reproduce and pass on some of their attributes to their offspring. Each new generation is basically a combination of its parents, plus some random mutations that occur along the way.
In Mario, fitness is measured as a function of how long Mario has survived. In each generation, the computer runs a number of different simulations, takes the ones that are the fittest (i.e., where Mario survived the longest), combines their attributes, and adds a little random deviation. For example, if one simulation lasted especially long because Mario jumped on a particular square, and another because Mario crouched shortly afterwards, you would expect their offspring to have Mario jump, move one block to the right, and then crouch. The random mutation is important because it prevents the evolution from getting stalled on a single bad decision.
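One generation step can be sketched like this, assuming a genome is just a list of numbers and fitness is scored externally (say, how far right Mario got). The operators and rates below are a generic genetic-algorithm sketch, simpler than NEAT's actual crossover:

```python
import random

def crossover(a, b):
    """Child takes each gene from one parent or the other at random."""
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(genome, rate=0.1, scale=0.5):
    """Randomly perturb a few genes, so the search never stalls
    on a single bad decision."""
    return [g + random.gauss(0, scale) if random.random() < rate else g
            for g in genome]

def next_generation(scored, size):
    """scored: list of (fitness, genome) pairs. Breed the fittest half."""
    ranked = [g for _, g in sorted(scored, key=lambda s: s[0], reverse=True)]
    survivors = ranked[:max(2, len(ranked) // 2)]
    return [mutate(crossover(*random.sample(survivors, 2)))
            for _ in range(size)]
```

Each call replaces the population with children of the top performers plus a sprinkling of noise, exactly the "combine and deviate" step described above.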
Pulling it all together
So, you have an algorithm that randomly tries out a bunch of different variations, picks out the best ones in each generation, mates them together with a little added randomness, and runs the cycle all over again. Once you’ve achieved that, all you need to do is let it run. According to evolutionary principles, eventually a valid solution will be found. Because the fittest variants are mated together, logically this process should take less time than if you did a new run with random inputs every time.
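The whole loop fits in a few lines. As a stand-in for "survive longer in the level," this toy run evolves a genome toward a fixed target vector; everything here (population size, mutation scale, number of generations) is an illustrative assumption:

```python
import random

random.seed(0)
TARGET = [1.0, -2.0, 0.5]   # stand-in for "the behavior that survives longest"

def fitness(genome):
    """Higher is better; 0 means a perfect match to the target."""
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))

def breed(a, b):
    child = [random.choice(pair) for pair in zip(a, b)]  # crossover
    return [g + random.gauss(0, 0.1) for g in child]     # small mutation

# Start from pure randomness, then select-and-breed each generation.
population = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(20)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                      # only the fittest reproduce
    population = [breed(random.choice(parents), random.choice(parents))
                  for _ in range(20)]

best = max(population, key=fitness)
```

Run it and the best genome ends up close to the target despite starting random, which is the same principle that eventually carries Mario through the level.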
This is an incredibly accessible implementation of a neural network, and I applaud the creator for bringing it together so clearly. You can view the source code here. Some other interesting reading is available in the form of the paper describing the evolutionary principle, NeuroEvolution of Augmenting Topologies (NEAT).