Mission-Ready Reinforcement Learning

We are using reinforcement learning to train artificial intelligence to team with humans and carry out complex military operations.

Reinforcement learning (RL) is a machine learning technique that trains artificial intelligence (AI) to solve complex decision problems, such as finding the optimal strategy for playing chess. Lincoln Laboratory and the Department of Defense believe that RL will be a key technology for human-machine collaborative tasks, such as U.S. Air Force autonomous wingmen and U.S. Army human-robot ground teams for clearing buildings. However, for AI to be an effective teammate in highly complex and nuanced real-world scenarios, we must first demonstrate AI teaming effectiveness in a simplified, constrained task.

Our Mission-Ready Reinforcement Learning (MeRLin) project paired human players with various AI teammates in the collaborative card game Hanabi. Our results showed that the human participants had a strong adverse subjective reaction toward a state-of-the-art RL agent across nearly all axes of trust, interpretability, and predictability. We hypothesized that the current human-AI technology gap exists because RL is optimizing for the wrong metrics. We then developed a new RL training paradigm for collaborative settings in which agents are trained based on a metric known as diversity, which teaches AI how to be a good teammate by training it with a mathematically diverse set of AI counterparts.
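This summary does not define the diversity metric, so the following is only an illustrative sketch of what "mathematically diverse" could mean for a set of AI counterparts, not the MeRLin method itself. It assumes each partner policy can be written as a table from game state to action probabilities, and it scores a population by the mean pairwise Jensen-Shannon divergence between those action distributions. The names `js_divergence` and `population_diversity` and the toy states `s0` and `s1` are hypothetical.

```python
import math
from itertools import combinations


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    # Mixture distribution; strictly positive wherever p or q is positive.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def population_diversity(policies):
    """Mean pairwise JS divergence over the states each pair of policies shares.

    policies: list of dicts mapping state -> list of action probabilities.
    This scoring rule is an assumption made for illustration only.
    """
    if len(policies) < 2:
        raise ValueError("need at least two policies to measure diversity")
    pairs = list(combinations(policies, 2))
    total = 0.0
    for a, b in pairs:
        shared = a.keys() & b.keys()
        if not shared:
            raise ValueError("policies must share at least one state")
        total += sum(js_divergence(a[s], b[s]) for s in shared) / len(shared)
    return total / len(pairs)


# Toy partner policies over two states and two actions (hypothetical data).
always_first = {"s0": [1.0, 0.0], "s1": [1.0, 0.0]}
always_second = {"s0": [0.0, 1.0], "s1": [0.0, 1.0]}

identical_score = population_diversity([always_first, always_first])
opposite_score = population_diversity([always_first, always_second])
```

Under this assumed measure, a population of identical partners scores zero and partners that always choose different actions score the maximum of one, so a training loop could in principle favor partner populations with higher scores.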

Next, we plan to integrate this new algorithm with the wargaming simulator Command: Modern Operations (CommandMO) and an RL framework called RLlib. This software engineering effort will enable a large suite of RL algorithms to be trained within the wide range of joint-domain military scenarios that can be simulated in CommandMO. We believe this work will provide a valuable asset to military sponsors who seek to discover tactics and counter-tactics for nascent technologies before they are deployed on the battlefield.