This article is the first in a series of posts that will follow an exploratory journey into reinforcement-based deep learning using the ViZDoom platform. My goal is to create a Doom AI capable of thriving in a Deathmatch environment (woohoo, killer AI). In this post I will outline the initial setup of both Theano, an open source deep learning framework, and the ViZDoom environment. ViZDoom is not the only platform available for reinforcement learning, but it does provide us with an iconic and complicated world for our AI to navigate. Once our environment is properly set up, we will dip our toes into reinforcement learning by running a basic example and then use an optimization technique to improve the performance of our AI agent.
Image 1: Doom agent tracking its target. You dead, demon thing
Before we begin setting up our environment, let’s first discuss what reinforcement learning is, how it works, and why it is of interest to us. My preferred way of explaining it is that the bot is performing unsupervised learning while we give it hints about what we want it to do. This explanation isn’t 100% accurate, but it pushes our train of thought in the right direction. We don’t tell the AI how to play or what to do (that would be closer to supervised learning); instead, we hint at what the end result should be, such as killing monsters or staying alive.
To lead our network in the right direction, it must know how well it is doing compared to what we want it to do. For this we create a scoring system relevant to what we are trying to achieve; an example can be seen in Table 1.
| Event | Score |
|---|---|
| +/- ammo | +/- 5 |
| + armor | + 10 |
| +/- health | +/- 5 |
Table 1: Example scores for an AI agent trained to pick up armor and ammo using the shortest available path.
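To make the scoring idea concrete, here is a minimal Python sketch of how the values in Table 1 could be turned into a per-step reward. It assumes, hypothetically, that each score is awarded once per change (one pickup or one loss) rather than per unit, and that armor is only rewarded when it goes up; the `Status` type and `table1_reward` function are illustrative names, not part of ViZDoom.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical snapshot of the agent's inventory at one time step."""
    ammo: int
    armor: int
    health: int


def sign(x: int) -> int:
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def table1_reward(prev: Status, curr: Status) -> int:
    """Score the transition prev -> curr with the values from Table 1.

    Assumption: each score is awarded once per change, not per unit.
    """
    reward = 5 * sign(curr.ammo - prev.ammo)       # +/- 5 for gaining/losing ammo
    if curr.armor > prev.armor:                    # + 10 for picking up armor
        reward += 10
    reward += 5 * sign(curr.health - prev.health)  # +/- 5 for gaining/losing health
    return reward


before = Status(ammo=10, armor=0, health=100)
after = Status(ammo=15, armor=50, health=90)
print(table1_reward(before, after))  # → 10  (5 + 10 - 5)
```

In a real agent this number would be computed every step and fed back to the learner; the exact weighting is a design choice, and changing it changes what behavior the agent is pushed toward.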
I should mention that this is only one way of creating a learning system based on reinforcement. For example, we can also use adversarial neural networks: essentially a two-network system in which the networks actively try to beat each other at a task, but without an explicit scoring system. There, performing ‘better’ is measured in a different way.
Regardless of the method used, training AI through reinforcement learning is a wonderful opportunity to be both curious and amazed at how AI learns. Unlike supervised learning methods, where we feed a network pre-classified data (an extremely time-consuming process), reinforcement learning (as well as other ‘unsupervised’ learning methods) can produce some genuinely surprising results. Thinking back to some of the supervised learning projects I have done, I can’t help but wish a viable unsupervised method had been available; it would have saved me many sleepless nights of manually classifying items.
Although we are currently unable to fully understand how an AI arrives at a decision (and yes, it’s a probabilistic model, but one far too convoluted for us to fully dissect), we can explore the basic principles of the system rather effortlessly. To start, let’s imagine that instead of Doom we want to create an AI to completely destroy Flappy Bird. For human players, Flappy Bird has one input: flap (by tapping). Our options are therefore to tap or not to tap, so we’ll give the AI these same two options. Once the AI has its two possible actions (chosen on a per-frame basis or some similar increment), we need a model that tells it which action to take.
To guide the AI we create a simple scoring system: [Crash: -1,000, Pipe passed: +5]. This is just one way of setting up the scores; as long as there is a penalty for crashing and a reward for passing between the pipes, all will work out well. How the computer vision side of this network is handled depends largely on the language and libraries used, but generally, as long as you can process frame images and feed them to the network, it should be fine (we will go over this in more detail in a follow-up post). With the setup complete, we start training our AI using a network architecture of our choosing. In the beginning the network makes random choices about which input to select, but as it gets feedback it adjusts its input strategy and starts performing better and better. To simplify: the AI starts to see that every time the agent (the bird) hits a pipe the score plummets, so it starts trying to avoid pipes.
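The “random at first, better with feedback” behavior can be illustrated without any neural network. The sketch below is a deliberately tiny, hypothetical stand-in for Flappy Bird: instead of pixels, the agent only sees whether the bird is below or above the gap, and each decision immediately either passes the pipe (+5) or crashes (-1,000), using the scores above. The learner is a simple epsilon-greedy table of average rewards (a bandit-style learner, not the deep network used later with ViZDoom); the state names and the `toy_reward` rule are assumptions made purely for illustration.

```python
import random

ACTIONS = ("flap", "no_flap")
STATES = ("below_gap", "above_gap")
CRASH, PIPE_PASSED = -1000, 5


def toy_reward(state: str, action: str) -> int:
    """Hypothetical rule: flap when below the gap, glide when above it."""
    correct = "flap" if state == "below_gap" else "no_flap"
    return PIPE_PASSED if action == correct else CRASH


def train(episodes: int = 200, epsilon: float = 0.1, seed: int = 0):
    """Learn the average reward of each (state, action) pair by trial and error."""
    rng = random.Random(seed)
    value = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # estimates start at 0
    count = {(s, a): 0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore: try a random input
        else:
            action = max(ACTIONS, key=lambda a: value[(state, a)])  # exploit
        r = toy_reward(state, action)
        count[(state, action)] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[(state, action)] += (r - value[(state, action)]) / count[(state, action)]
    return value


def greedy_policy(value):
    """Pick the best-scoring action for every state."""
    return {s: max(ACTIONS, key=lambda a: value[(s, a)]) for s in STATES}


print(greedy_policy(train()))
# → {'below_gap': 'flap', 'above_gap': 'no_flap'}
```

After a single crash, the -1,000 penalty makes the wrong action look far worse than the untried one, which is exactly the “score plummets, so avoid pipes” effect described above, just without the vision and network parts.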
Unlike many tools out there, I would say that the ViZDoom installation is fairly pain-free. To get it working you will need to meet the dependencies of ZDoom, the Doom source port upon which ViZDoom was built. We will go over the setup process for Linux; Windows and OS X readers can take a look at the GitHub repo, where the setup process is very well explained.
To start, let’s first create a working directory for all of our AI work.
$ mkdir ~/AI_Projects
$ cd ~/AI_Projects
This is for nothing more than organization, which will be key to maintaining your sanity as you start working on more and more AI projects. Next we will go ahead and fetch the ViZDoom repo.
$ git clone https://github.com/Marqt/ViZDoom.git
You should now have a ViZDoom directory inside your AI_Projects folder.
$ cd ViZDoom
Before we can start playing around, we need to install the ZDoom dependencies as well as the libraries used by the language bindings.
$ sudo apt-get install build-essential zlib1g-dev libsdl2-dev libjpeg-dev nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev libopenal-dev timidity libwildmidi-dev
$ sudo apt-get install python-pip python3-pip
$ pip install numpy   # just for the Python 2 binding
$ pip3 install numpy  # just for the Python 3 binding
$ sudo apt-get install libboost-all-dev
$ sudo apt-get install liblua5.1-dev
These steps are taken from the ViZDoom git repo.
(I have not had much luck getting the Python 3 Boost bindings to work.)
Finally, run the following build commands inside the ViZDoom directory: cmake generates the build files and make compiles them.
$ cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_PYTHON=ON -DBUILD_JAVA=ON -DBUILD_LUA=ON .
$ make
Or, if you want to give Python 3 a chance:
$ cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_PYTHON=ON -DBUILD_PYTHON3=ON -DBUILD_JAVA=ON -DBUILD_LUA=ON .
$ make
Another install method is to use pip, though you will still need the git repo for the example scripts used below.
$ sudo pip install vizdoom
Now that we have a stable version of ViZDoom installed, we need to install our AI framework. Although the ViZDoom training examples are written for both Theano and TensorFlow, you could very well write your own training program using any other deep learning framework (Caffe, for instance).
Let’s go ahead and set up Theano.
$ sudo pip install Theano
That should be it for the core setup. I would also recommend installing CUDA if possible, but I will not go over the install process, as it’s a bit more hands-on for each system. Now that we have an ecosystem to play with, we should run the example training program to ensure everything is working as intended.
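Before launching a potentially long training run, it can save time to confirm that Python can actually see the packages installed above. The helper below is a small, hypothetical sketch using only the standard library (`importlib.util.find_spec`); `numpy` comes from the dependency step and `theano` and `vizdoom` from the pip installs. It only checks that the packages can be located, not that they import cleanly or that CUDA is working.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # numpy: from the dependency step; theano and vizdoom: from the pip installs.
    for name, found in check_modules(["numpy", "theano", "vizdoom"]).items():
        print(f"{name:<8} {'found' if found else 'MISSING'}")
```

Run it with the same interpreter you plan to use for the examples (for instance `python` vs `python3`), since each interpreter has its own set of installed packages.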
From inside the ViZDoom root directory, change into the Python examples folder.
$ cd examples/python
And run the following command.
$ python learning_theano.py
This will start the learning process and begin training your AI to perform the simple task of killing a defenceless NPC demon. Depending on your hardware, and on whether CUDA is installed and working, this could take anywhere from 15 minutes to upwards of an hour (or two). As training progresses, you should see a clear trend of the AI slowly performing better and better (see Image 2 below). Once the AI has been trained for 20 epochs (each epoch containing 2000 individual episodes), it will be put to the test and you will see 10 scores displayed in the terminal, as in Table 2.
Image 2: AI training and testing process (left). Beginning of training file (right).
Table 2: Scoring after AI training
The performance of our AI in this session was a fairly ‘standard’ one. It was capable of completing its task, even though not always at an optimal level (see the lower scores). Altering the neural network might improve performance, but the task is simple enough, and the network used is good enough, to perform it. There are, however, other ways to try to improve AI performance without altering the network. One is to reduce the learning rate as training progresses. Reducing the learning rate keeps the network from ‘training’ itself out of a good configuration it has already found, and lets it fine-tune itself toward a better end result.
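Before touching the training script, the idea can be checked in isolation. The sketch below assumes a step decay: with an initial rate η₀, a decay factor γ and a 0-indexed epoch k, the rate used during epoch k is η₀ · γ^⌊k/5⌋, so it is multiplied by γ at epochs 5, 10, 15 and so on. The code replays that rule the way a per-epoch loop would and compares it with the closed form; `initial_rate = 0.001` is a placeholder, not the value used by the ViZDoom example, and γ = 0.1 matches the `learning_step` constant used in this post.

```python
import math


def step_decay_rates(initial_rate, epochs, learning_step=0.1, interval=5):
    """Replay a per-epoch loop and return the learning rate used in each epoch."""
    rates = []
    rate = initial_rate
    for epoch in range(epochs):
        # Drop the rate at the start of epochs 5, 10, 15, ... but never at epoch 0.
        if epoch > 0 and epoch % interval == 0:
            rate = learning_step * rate
        rates.append(rate)
    return rates


def step_decay_closed_form(initial_rate, epoch, learning_step=0.1, interval=5):
    """Same schedule in closed form: eta_k = eta_0 * gamma ** floor(k / interval)."""
    return initial_rate * learning_step ** (epoch // interval)


initial_rate = 0.001  # placeholder value, for illustration only
rates = step_decay_rates(initial_rate, epochs=20)

# The loop and the closed form agree (up to floating-point rounding).
assert all(
    math.isclose(rate, step_decay_closed_form(initial_rate, epoch))
    for epoch, rate in enumerate(rates)
)

# Epochs at which the rate actually changed.
print([epoch for epoch in range(1, len(rates)) if rates[epoch] < rates[epoch - 1]])
# → [5, 10, 15]
```

Note the explicit `epoch % interval == 0`: a bare `if epoch % 5:` is truthy whenever the epoch is *not* a multiple of 5, so it would shrink the rate on 16 of 20 epochs instead of 3. Also, depending on how the training script builds its network: if the learning rate is handed to Lasagne’s update function (such as `rmsprop`) as a plain Python float when the Theano training function is compiled, reassigning `learning_rate` afterwards will not change the compiled update; keeping the rate in a `theano.shared` variable and calling its `set_value` method is one way around that.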
To show how this works, let’s implement it in the training application. First, we will make a copy of the training script.
$ cd ~/AI_Projects/ViZDoom/examples/python
$ cp learning_theano.py learning_step_theano.py
At around line 26 of the new script, we will add a constant: the factor by which the learning rate is multiplied each time we reduce it.
learning_step = 0.1
Then, inside the ‘for epoch’ loop, include the following logic.
if epoch > 0 and epoch % 5 == 0:
    # at epochs 5, 10, 15, ... scale the learning rate by learning_step
    learning_rate = learning_step * learning_rate
Although not the most elegant piece of code, this ensures that the learning rate drops every 5 epochs. One benefit of including something like this is that it gives us more options for customizing our training, and with them the opportunity to make small performance improvements. Below, I have included a series of graphs comparing the performance of 3 AI agents trained using the unchanged example program (vanilla) and 3 AI agents trained using a step decay of the learning rate.
Graph 1: Performance comparison 1
Graph 2: Performance comparison 2
Graph 3: Performance comparison 3
To keep the experiment fair, the 6 individual AI agents were all trained using the same number of epochs, episodes, and batch size; in fact, nothing changed other than the step function mentioned above. Although the improvement is not ‘yuuhge’, as some would say, there is still a clear trend showing that the AI agents trained using the step method perform more consistently (except step-learning AI agent 2… you disappoint me, son). Table 3 and Graph 4 give a side-by-side comparison of the average score attained by each of the AI agents trained; the table gives the clearer view of the benefits gained by using a step learning method.
| Vanilla AI averages | Step-learning AI averages |
Table 3: AI Score averages
Graph 4: A graphical representation of the AI average scores
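If you want to build a comparison like Table 3 for your own runs, the per-agent test scores can be summarized with Python’s standard `statistics` module. The helper below is a hypothetical sketch: it expects a mapping from an agent label (for example `"vanilla_1"` or `"step_2"`, names chosen for illustration) to the list of scores printed at the end of testing, and reports the mean and sample standard deviation for each agent. The standard deviation is one simple way to put a number on the “more consistent performance” discussed above. No scores from this post are included; fill in your own.

```python
from statistics import mean, stdev


def summarize(scores_by_agent):
    """Return {agent: (mean, sample std dev)} for each agent's test scores."""
    summary = {}
    for agent, scores in scores_by_agent.items():
        # stdev needs at least two values; report 0.0 for a single score.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[agent] = (mean(scores), spread)
    return summary


def print_summary(summary):
    """Print one aligned line per agent."""
    for agent, (avg, spread) in summary.items():
        print(f"{agent:<12} mean={avg:8.2f}  std={spread:7.2f}")
```

With only three agents per group, differences in the averages can easily come from run-to-run noise, so looking at the spread alongside the mean is a cheap sanity check before drawing conclusions.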
I hope this first post has shown you that getting started in deep learning is not so difficult. I know many people are interested in a deeper explanation of how the AI actually trains and works, so this will be addressed in a future post. In the upcoming post we will compare several neural network architectures on the ‘Deadly_corridor’ scenario. In the meantime, have fun experimenting on your own.