Reinforcement Learning (A2C) / by Marshall Yeung

With MT-D*-Lite yet to be fully functional, I have decided to take a pause from the pathing algorithm and help a teammate create a neural network. The intention is ultimately to have the AI decide which action to perform, creating less predictable behavior. Such behavior is difficult to achieve by programming the AI the traditional way, where its actions are deterministic and therefore predictable. Other options could potentially produce the desired result, such as fuzzy logic, GOAP, or even genetic algorithms; however, we have decided to take on the challenge of creating a neural network that produces the behavior we imagined.

Neural networks and machine learning are vast subjects, with many different implementations and models suited to different needs. We started by learning the basics of a neural network: the inputs, the neurons, the activation functions, and the layers. These basic concepts are not too difficult to understand. The difficult part is deciding which activation function to use and understanding the mathematics behind these formulas. Understanding the basics is important, but on its own it will not produce the outcome we want. For that, we needed to dig deeper into more complex neural network models, which led us to the actor-critic model.
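As a minimal sketch of those basics, the NumPy example below passes one input vector through a hidden layer (each neuron takes a weighted sum of the inputs plus a bias, then applies a ReLU activation) and an output layer whose softmax turns raw scores into action probabilities. The layer sizes, the random weights, and the example state are hypothetical; nothing here is trained yet.

```python
import numpy as np

def relu(x):
    # Activation function: keeps positive values, zeroes out negatives
    return np.maximum(0.0, x)

def softmax(x):
    # Turns raw output scores into probabilities that sum to 1
    shifted = x - np.max(x)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def forward(inputs, w1, b1, w2, b2):
    # Hidden layer: weighted sum of the inputs plus a bias, then the activation
    hidden = relu(inputs @ w1 + b1)
    # Output layer: one score per possible action, converted to probabilities
    return softmax(hidden @ w2 + b2)

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_actions = 4, 8, 3  # hypothetical layer sizes
w1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, n_actions))
b2 = np.zeros(n_actions)

state = np.array([0.5, -1.0, 0.25, 2.0])  # hypothetical game-state features
probs = forward(state, w1, b1, w2, b2)
print(probs)  # one probability per action
```

Sampling an action from `probs` rather than always taking the largest one is what gives the less predictable behavior described earlier.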

For an AI to decide its actions, it first needs to be trained. One way to train it is through reinforcement learning, the family of methods the actor-critic model belongs to. The basic principle of the model is a system of two networks that influence one another. The actor network decides which action to take in a given state (the policy), while the critic network estimates how good that state is in terms of the total future reward it expects. The critic's estimate tells the actor whether an action turned out better or worse than expected; in A2C (Advantage Actor-Critic), this difference is called the advantage. The two networks complement each other, each improving the other, to ultimately converge toward an optimal policy for the problem at hand.
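To make the division of labor concrete, a common one-step formulation of A2C is sketched below. This is the textbook form, not necessarily the exact variant we will end up using; many implementations also use multi-step returns and add an entropy bonus. Here π(a|s) is the actor's probability of choosing action a in state s, V(s) is the critic's estimate of the expected future reward from s, r_t is the reward received, and γ is a discount factor between 0 and 1. Minimizing the actor loss (with A_t held constant) makes better-than-expected actions more likely, while minimizing the critic loss (with the target r_t + γV(s_{t+1}) held constant) pulls V(s_t) toward what actually happened.

```latex
\begin{aligned}
A_t &= r_t + \gamma V(s_{t+1}) - V(s_t) \\
\mathcal{L}_{\text{actor}} &= -\log \pi(a_t \mid s_t)\, A_t \\
\mathcal{L}_{\text{critic}} &= \bigl(r_t + \gamma V(s_{t+1}) - V(s_t)\bigr)^2 = A_t^2
\end{aligned}
```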

Understanding the concept is completely different from understanding the underlying math. We researched the algorithm and existing implementations of the model in hopes of producing our own. Unfortunately, most of our searches ended in Python implementations that abstract away the very math and underlying calculations we were looking for. We spent most of the week implementing the network and got as far as we could. The remaining issue is how to perform the necessary calculations, which will require deeper research.
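The calculations that libraries hide mostly come down to gradients. In a common one-step A2C formulation, the advantage is A = r + γV(s′) − V(s), the actor minimizes −log π(a|s)·A, and the critic minimizes A². As a sketch of what those gradients look like when worked out by hand, the NumPy example below performs one update while assuming, hypothetically, a linear softmax actor and a linear critic instead of full neural networks so the derivatives stay short. The function name `a2c_step`, the learning rates, and the example states are illustrative assumptions, not our implementation.

```python
import numpy as np

def softmax(x):
    shifted = x - np.max(x)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def a2c_step(theta, w, s, a, r, s_next, done,
             gamma=0.99, lr_actor=0.01, lr_critic=0.05):
    """One hand-derived A2C update (hypothetical linear actor and critic).

    Actor:  pi(.|s) = softmax(s @ theta)
    Critic: V(s)    = s @ w
    """
    v = s @ w
    v_next = 0.0 if done else s_next @ w  # no future reward after a terminal state
    # Advantage (TD error): how much better things went than the critic expected
    advantage = r + gamma * v_next - v

    # Critic: gradient descent on A**2 / 2 with the target held constant;
    # the gradient is -advantage * s, so descending means adding advantage * s
    w = w + lr_critic * advantage * s

    # Actor: d log pi(a|s) / d theta = outer(s, onehot(a) - pi(.|s))
    probs = softmax(s @ theta)
    onehot = np.zeros_like(probs)
    onehot[a] = 1.0
    grad_log_pi = np.outer(s, onehot - probs)
    # Descending on -log pi * A means ascending along advantage * grad_log_pi
    theta = theta + lr_actor * advantage * grad_log_pi
    return theta, w, advantage

# Usage: one transition with hypothetical features and all-zero starting weights
n_features, n_actions = 4, 3
theta = np.zeros((n_features, n_actions))
w = np.zeros(n_features)
s = np.array([1.0, 0.0, 0.5, -0.5])
s_next = np.array([0.0, 1.0, 0.0, 0.5])
theta, w, adv = a2c_step(theta, w, s, a=1, r=1.0, s_next=s_next, done=False)
print(adv, w, softmax(s @ theta))
```

With all-zero weights the critic initially predicts 0, so a reward of 1 gives an advantage of 1; the update raises V(s) toward that target and makes action 1 slightly more likely in state s. In a real multi-layer network, backpropagation carries these same chain-rule steps through the hidden layers.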