Mission-Ready Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique that trains artificial intelligence (AI) to solve complex decision problems, such as finding the optimal strategy for playing chess. Lincoln Laboratory and the Department of Defense believe that RL will be a key technology for human-machine collaborative tasks, such as U.S. Air Force autonomous wingmen and U.S. Army human-robot ground teams for clearing buildings. However, for AI to be an effective teammate in highly complex and nuanced real-world scenarios, we must first demonstrate AI teaming effectiveness in a simplified, constrained task.
Our Mission-Ready Reinforcement Learning (MeRLin) project paired human players with various AI teammates in the collaborative card game Hanabi. Our results showed that the human participants had a strong adverse subjective reaction toward a state-of-the-art RL agent across nearly all axes of trust, interpretability, and predictability. We hypothesized that the current human-AI technology gap exists because RL is optimizing for the wrong metrics. We then developed a new RL training paradigm for collaborative settings in which agents are trained on a metric known as diversity: the AI learns to be a good teammate by training with a mathematically diverse set of AI counterparts.
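To make the notion of a "mathematically diverse" set of counterparts concrete, the sketch below scores a population of partner policies by their mean pairwise Jensen-Shannon divergence over a shared set of probe states. This is only an illustration: the text above does not define the diversity metric, so the choice of Jensen-Shannon divergence, the function names, and the tabular policy representation are all assumptions, not the MeRLin method.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two action distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def population_diversity(policies):
    """Mean pairwise JS divergence across a population (hypothetical metric).

    Each policy is a list of action distributions, one per probe state,
    and all policies are assumed to use the same probe states in the same order.
    """
    pairs = list(combinations(policies, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        # Average the divergence over the probe states for this pair.
        total += sum(js_divergence(pa, pb) for pa, pb in zip(a, b)) / len(a)
    return total / len(pairs)


# Two probe states, two actions; values are made up for illustration.
cautious = [[0.7, 0.3], [0.5, 0.5]]
always_first = [[1.0, 0.0], [1.0, 0.0]]
always_second = [[0.0, 1.0], [0.0, 1.0]]

identical_score = population_diversity([cautious, cautious])
opposite_score = population_diversity([always_first, always_second])
```

Under this assumed metric, a population of identical policies scores zero, while two deterministic policies that always choose different actions reach the maximum of ln 2 per pair. A training loop could use such a score to select or regularize the partner population, though how diversity enters the actual training objective is not specified here.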
As next steps, we plan to integrate this new algorithm with the wargaming simulator Command: Modern Operations (CommandMO) and the RL framework RLlib. This software engineering effort will enable a large suite of RL algorithms to be trained within the wide range of joint-domain military scenarios that can be simulated in CommandMO. We believe this work will provide a valuable asset to military sponsors who seek to discover tactics and counter-tactics for nascent technologies before they are deployed on the battlefield.