One of my favorite scenarios to experiment with is the ‘Deadly Corridor’ challenge. In this scenario, our AI agent’s score increases the farther it gets from its starting position, while ‘monsters’ try to take it down. The layout of the map also makes the task harder for our AI. Another scenario which I think might be useful for us is the ‘Defend The Center’ challenge. It's not as intricate as the first example, but still an interesting challenge for any agent.
Training our agents to beat these challenges is fun, but you will see that the skills learned do not translate well to beating a deathmatch. Why? The simple reason is that these scenarios limit the number of actions the agent can perform. This makes it easier for the AI to learn, but it also hinders it in learning to play more complicated scenarios, such as a deathmatch. In this blog post I want to go over some of my efforts to use these challenges to train our agent to perform in a deathmatch situation. I do not promise I will succeed, but I will try.
As I mentioned, one of the issues with using the Deadly Corridor and Defend the Center scenarios is that the AI just does not have enough actions available. Well, this is actually something we can remedy quite easily by creating a unified set of standard actions we think are essential for competing in a deathmatch environment.
MOVE_LEFT MOVE_RIGHT ATTACK MOVE_FORWARD MOVE_BACKWARD TURN_LEFT TURN_RIGHT
Source 1: These are what I consider the minimum required actions to perform any scenario, including a deathmatch.
You can of course decide on your own set of actions, but keep in mind that the more parameters a network has, the longer it will take for the model to properly converge. In order to use these actions you can either include them in the scenario's configuration file, or add them in the Python code.
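As a rough sketch of both routes (this is not the original code from this post), the snippet below builds an available_buttons entry of the kind found in ViZDoom scenario .cfg files, and the list of button combinations the agent gets to choose from. The helper names config_line, all_actions and one_hot_actions are made up for illustration.

```python
import itertools

# The seven buttons listed in Source 1.
BUTTONS = ["MOVE_LEFT", "MOVE_RIGHT", "ATTACK", "MOVE_FORWARD",
           "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT"]


def config_line(buttons):
    """Build an available_buttons entry for a scenario .cfg file."""
    return "available_buttons = { " + " ".join(buttons) + " }"


def all_actions(n_buttons):
    """Every on/off combination of the buttons, as lists of 0/1 flags."""
    return [list(combo) for combo in itertools.product([0, 1], repeat=n_buttons)]


def one_hot_actions(n_buttons):
    """Smaller alternative: exactly one button pressed per action."""
    return [[int(i == j) for j in range(n_buttons)] for i in range(n_buttons)]


# In-code route (assumption: the vizdoom package is installed and `game`
# is a DoomGame instance); left as a comment so this sketch runs without it:
#   for name in BUTTONS:
#       game.add_available_button(getattr(vizdoom.Button, name))

actions = all_actions(len(BUTTONS))
print(config_line(BUTTONS))
print(len(actions), "combined actions vs", len(one_hot_actions(len(BUTTONS))), "one-hot actions")
```

Seven buttons give 2^7 = 128 combinations if every combination is allowed, and each one becomes an output of the network. That is the extra cost hinted at above; restricting the agent to one button per step is one way to keep it down.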
Adding these options to your AI will undoubtedly increase the complexity of the task. What can we do to improve our chances of succeeding? One clear option is to increase the number of epochs our AI trains for. This is a bit tricky, as we need to train just long enough for our AI to learn, but not so long that we waste resources and risk overfitting. Overfitting occurs in machine learning when the trained weights specialize to a specific data set, which prevents the model from making proper predictions if the scenario or environment changes. We want an AI agent with generalized knowledge, yet a proper understanding of the task at hand. Try setting Epochs = 50 in the training program for starters; if there is a noticeable improvement, we know we are heading in the right direction. This is still a very short training time, so go ahead and keep increasing the number of epochs.
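If you would rather not pick the epoch count by hand, one common alternative is a simple early-stopping check on the per-epoch test score. This is not what I do in this post (I just raise Epochs manually), so treat it as a minimal sketch: should_stop, patience and min_delta are hypothetical names, and the scores in the usage loop are made up.

```python
def should_stop(scores, patience=10, min_delta=0.0):
    """Return True when the best score of the last `patience` epochs
    is no better than the best score seen before them."""
    if len(scores) <= patience:
        return False  # not enough history to judge yet
    best_before = max(scores[:-patience])
    recent_best = max(scores[-patience:])
    return recent_best <= best_before + min_delta


# Made-up mean test scores per epoch, purely for illustration.
fake_scores = [10, 40, 80, 85, 84, 83, 85, 82]

history = []
stopped_at = None
for epoch, score in enumerate(fake_scores, start=1):
    history.append(score)
    if should_stop(history, patience=3):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at)
```

Be generous with patience. A model can sit on a plateau for many epochs before it starts improving again, and a small value would cut the run off before it gets there.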
Graph 1: 50 epochs represented in steps of five.
In the above graph there is a huge jump in scores between epoch 20 and epoch 25, and this is something to keep in mind when training your AI. Sometimes giving the model just a bit more training time is enough to overcome some sort of barrier. In all likelihood, at around epoch 25 our model has started to grasp the basic concept of the challenge and is beginning to develop its own strategy to maximize its score. In the following graph we take it a step further and train our AI just a bit longer.
Graph 2: 150 epoch training session.
The above graph is a great graph, one of the best graphs. Trust me, its results are huge. Believe it. We should have more graphs like it, and build a big firewall to keep other graphs out (/s). But yeah, I really like this graph. Although the numbers in the graph above show a steady improvement in performance, what it does not clearly represent is the paradigm shift the model went through in order to break through and continue improving. Hints of this shift can be glimpsed by looking between epoch 80 and 100. This plateau, and specifically the dip in epoch 80, are ‘ok’ representations of the model attempting a new strategy. Later in post 4 when we test this AI we will grasp its abilities better.
Increasing the amount of training, although effective, is a very brute-force solution. A more subtle trick might be to increase the resolution of the image fed into our network, as well as the depth of the network itself. The logic here is that a deeper network might be able to grasp more detail from the game world (the learning method stays fairly standard, though). This is a gamble of course, as we don’t know if the deeper network will perform this task better. It could just be a waste of resources. Let’s go ahead and deepen our network just a bit and see how it affects our results.
Change the section defining the network and replace the existing Conv2DLayer calls with the code below.
dqn = Conv2DLayer(dqn, num_filters=8, filter_size=[21,21], nonlinearity=rectify, W=HeUniform("relu"), b=Constant(.1), stride=3)
dqn = Conv2DLayer(dqn, num_filters=8, filter_size=[11,11], nonlinearity=rectify, W=HeUniform("relu"), b=Constant(.1), stride=2)
dqn = Conv2DLayer(dqn, num_filters=8, filter_size=[6,6], nonlinearity=rectify, W=HeUniform("relu"), b=Constant(.1), stride=2)
dqn = Conv2DLayer(dqn, num_filters=8, filter_size=[3,3], nonlinearity=rectify, W=HeUniform("relu"), b=Constant(.1), stride=2)
Graph 3: A 50 epoch training session of a slightly deeper network.
The above graph (Graph 3) is actually fairly surprising. Although I was not expecting the network to create a sentient AI, I did expect somewhat better performance. This graph tells me that either the network was incapable of learning (probable) or that it was initialized sooo badly that it just got stuck in that configuration. Honestly, it’s most likely the first one: the deeper convolutions probably failed to create a proper representation of the image, which would be catastrophic to the learning process. I will revisit this in the upcoming post and try to mend some of the mistakes made here.
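One quick sanity check on a stack like this is to trace how the feature map shrinks from layer to layer. The sketch below does that arithmetic for the four filter sizes and strides used above, assuming unpadded convolutions (none of the calls pass a pad argument) and a hypothetical 120x160 input. I have not stated the resolution I trained with here, so the numbers are only illustrative; conv_out and trace_shapes are made-up helper names.

```python
# (filter_size, stride) for the four Conv2DLayer calls above.
LAYERS = [(21, 3), (11, 2), (6, 2), (3, 2)]


def conv_out(size, filter_size, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    if size < filter_size:
        raise ValueError(
            "input of length %d is smaller than filter %d" % (size, filter_size))
    return (size - filter_size) // stride + 1


def trace_shapes(height, width, layers=LAYERS):
    """Return the (height, width) of the feature map after each layer."""
    shapes = [(height, width)]
    for filter_size, stride in layers:
        height = conv_out(height, filter_size, stride)
        width = conv_out(width, filter_size, stride)
        shapes.append((height, width))
    return shapes


# Hypothetical input resolution, chosen only for illustration.
shapes = trace_shapes(120, 160)
print(shapes)  # [(120, 160), (34, 47), (12, 19), (4, 7), (1, 3)]
```

With that assumed input, the last layer is left with a 1x3 map per filter, so 8 filters hand only 24 values to the rest of the network. That does not prove anything about the run in Graph 3, but it shows how quickly large filters and strides can squeeze the image down, and it is a cheap thing to check before spending 50 epochs on a new architecture.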
These two methods are fairly straightforward and mostly deal with giving our training process either more time or more horsepower. There are several other methods that can also help. One of them, described in detail by the team that built the ViZDoom API, looks particularly interesting. In the next post I hope to have this network architecture set up and test out how well it works. Other methods, such as an asynchronous actor-critic network, have also shown great performance in this task. Next post I will also test some of the AI agents trained today and check out what their strong and weak points are.
See ya next time.