AI Warehouse | AI Learns to Walk | Deep Reinforcement Learning
In this video an AI Warehouse agent named Albert learns how to walk to escape 5 rooms I created.
The AI was trained using Deep Reinforcement Learning, a method of Machine Learning which involves rewarding the agent for doing something correctly, and punishing it for doing anything incorrectly.
Albert's actions are controlled by a Neural Network that's updated after each attempt in order to try to give Albert more rewards and less punishments over time.
In every āAI learns to walkā video Iāve seen, the AI either learns to walk in a weird, non-human way, or they use motion capture of a real person walking and simply train the AI to imitate that. I thought it was weird that nobody tried to train it to walk properly from scratch (without any external data), so I wanted to give it a shot!
Thatās what I said 4 months ago. Itās been really difficult, but Iāve finally managed to do it, so please watch the whole video! The final result ended up being awesome :)
THE BASICS
I created everything using Unity and ML-Agents. Albert is controlled entirely by an artificial brain (neural network) which has 5 layers, the first layer consists of the inputs (the information Albert is given before taking action, like his limb positions and velocities), the last layer tells him what actions to take and the middle 3 layers, called hidden layers, are where the calculations are performed to convert the inputs into actions. His brain was trained using the standard algorithm in reinforcement learning; proximal policy optimization (PPO).
For each of Albertās limbs Iāve given him (as an input) the position, velocity, angular velocity, contacts (if itās touching the ground, wall or obstacle) and the strength applied to it. Iāve also given him the distance from each foot to the ground, direction of the closest target, the direction his bodyās moving, his bodyās velocity, the distance from his chest to his feet and the amount of time one foot has been in front of the other. As for his actions, we allow Albert to control each body partās rotation and strength (with some limitations so his arm canāt bend backwards, for example).
Just like the last videos, Albert was trained using reinforcement learning. For each of Albert's attempts, we calculate a score for how 'good' it was and make small, calculated adjustments to his brain to try to encourage the behaviors that led to a higher score and avoid those that led to a lower score. You can think of increasing Albertās score as rewarding him and decreasing his score as punishing him, or you can think about it like natural selection where the best performing Alberts are most likely to reproduce. For this video there are 13 different types of rewards (ways to calculate Albert's score), we start off with only a couple and with each new room add more, always in an attempt to get him to walk.
REWARD FUNCTION
We start off very simple in the first room, we reward him based on how much he moved to the target and we punish him for moving in the wrong direction. This led to Albert doing the worm towards the target, since he figured out that was the easiest way for him to move the quickest/get the highest score.
In the second room we start checking if his limbs hit the ground. If the limb that hits the ground is a foot we reward him (but only if it's in front of his other foot, more on that later), if it isnāt, we punish him. I also made it so Albert wasnāt rewarded at all unless his chest was high enough to force it to at least be partially standing. As seen in the video, this encourages him to not fall over and encourages him to use his feet to do it. We also introduced a new reward designed to encourage smoother movement; if he approaches the maximum strength allowed on a limb he's punished, and he's rewarded if he uses a strength of almost 0. This encourages him to opt for the more human-like movement of using a bit of strength from many limbs as opposed to a lot of strength from one limb.
This is where we start to polish Albertās gait that developed in room 2 and teach him to turn. From here on we start using the chest height calculation as another direct reward where the higher his chest is the more heās rewarded in an attempt to get him to stand up as straight as possible. These rewards so far give Albert a decent gait, however heās still not using both of his feet (which was by far the hardest part of this project), so room 4 is designed to do exactly that.
We get Albert to take more steps from a few additional rewards. To start, we introduce a 2 second timer that resets when one foot goes in front of the other. We reward Albert whenever this timer is above 0 (the front foot has been in front for < 2 seconds), and we punish him whenever the timer goes below 0 (the front foot has been in front > 2 seconds). We add another reward proportional to the distance of his steps to encourage him to take larger steps. To smooth out the movement, we also add a punishment every frame proportional to the difference in his bodyās velocity from the previous frame to the current frame, so if heās moving at a perfectly consistent velocity he isnāt punished at all, and if he makes very quick erratic movements heās punished a lot.
For the final room the only change I made to the reward function was to go back to an earlier version of a reward. Throughout the other rooms I had been tinkering with how I should reward Albertās feet being grounded, my initial thought was to only reward the front foot for being grounded to try to get him to put more weight on his front foot when taking steps, but somewhere along the way I changed it to just rewarding Albert for any foot being grounded, and that was the version Albert trained with in rooms 3 and 4. For this final room I switched back to the old front foot grounded reward which resulted in a much nicer looking walk. Also, the video makes it seem like I never reset Albertās brain, that isn't entirely true, I had to occasionally reset it because of something called decaying plasticity.
OTHER
For rooms 1 to 4 I only allowed Albert to make a decision every 5 game ticks, but for the final room I removed that constraint and let him make decisions every frame. I found if Albert makes a decision every game tick itās too difficult for him to commit to any proper movements, he ends up just making very small movements like slightly pushing his front foot forward when he should be taking a full step.
The 5 game tick decision time forces him to commit to his decision for at least 5 game ticks so he ends up being more careful when moving a limb. When I recorded him beating the final room I removed this limitation because heās already learned to commit to his actions so allowing him to make a decision every tick just results in a smoother motion.
Thank you so much for watching, this video took me 4 months to make, so please, if you enjoyed it or learned something from it, share it with someone you think will also enjoy it! :)