Publications

Health-informed policy gradients for multi-agent reinforcement learning

Summary

This paper proposes a definition of system health in the context of multiple agents optimizing a joint reward function. We use this definition as a credit assignment term in a policy gradient algorithm to distinguish the contributions of individual agents to the global reward. The health-informed credit assignment is then extended to a multi-agent variant of the proximal policy optimization algorithm and demonstrated on simple particle environments that have elements of system health, risk-taking, semi-expendable agents, and partial observability. We show significant improvement in learning performance compared to policy gradient methods that do not perform multi-agent credit assignment.

Learning emergent discrete message communication for cooperative reinforcement learning

Published in:
37th Conf. on Uncertainty in Artificial Intelligence, UAI 2021, early access, 26-30 July 2021.

Summary

Communication is an important factor that enables agents to work cooperatively in multi-agent reinforcement learning (MARL). Most previous work uses continuous message communication, whose high representational capacity comes at the expense of interpretability. Allowing agents to learn their own discrete message communication protocol, emerging from a variety of domains, can increase interpretability for human designers and other agents. This paper proposes a method to generate discrete messages analogous to human languages and achieves communication through a broadcast-and-listen mechanism based on self-attention. We show that discrete message communication has performance comparable to continuous message communication but with a much smaller vocabulary size. Furthermore, we propose an approach that allows humans to interactively send discrete messages to agents.

Adaptive stress testing: finding likely failure events with reinforcement learning

Published in:
J. Artif. Intell. Res., Vol. 69, 2020, pp. 1165-1201.

Summary

Finding the most likely path to a set of failure states is important to the analysis of safety-critical systems that operate over a sequence of time steps, such as aircraft collision avoidance systems and autonomous cars. In many applications such as autonomous driving, failures cannot be completely eliminated due to the complex stochastic environment in which the system operates. As a result, safety validation is concerned not only with whether a failure can occur, but also with which failures are most likely to occur. This article presents adaptive stress testing (AST), a framework for finding the most likely path to a failure event in simulation. We consider a general black-box setting for partially observable and continuous-valued systems operating in an environment with stochastic disturbances. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, making it suitable for black-box testing of large systems. We present different formulations depending on whether the state is fully observable or partially observable. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where we are concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision.
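
The seed-based idea in this summary can be illustrated with a toy sketch. It is not the AST algorithm from the article: plain enumeration over seeds stands in for Monte Carlo tree search and reinforcement learning, and the simulator (a one-dimensional Gaussian random walk that "fails" when it crosses a limit) and every name below are hypothetical. The only access to the simulator is through the seed of its pseudorandom number generator, and failure paths are ranked by the log-likelihood of their disturbances.

```python
import math
import random


def rollout(seed, steps=10, sigma=1.0, limit=3.0):
    """Run a hypothetical black-box simulator controlled only by its RNG seed.

    Returns (failed, loglik), where loglik is the log-likelihood of the
    disturbances sampled up to the end of the episode. The episode ends
    early as soon as the failure condition x > limit is met.
    """
    rng = random.Random(seed)
    x, loglik = 0.0, 0.0
    for _ in range(steps):
        d = rng.gauss(0.0, sigma)  # stochastic disturbance
        loglik += -0.5 * (d / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        x += d
        if x > limit:
            return True, loglik
    return False, loglik


def most_likely_failure(seeds):
    """Return (seed, loglik) of the most likely failing rollout, or None."""
    best = None
    for seed in seeds:
        failed, loglik = rollout(seed)
        if failed and (best is None or loglik > best[1]):
            best = (seed, loglik)
    return best


best = most_likely_failure(range(500))
```

Because a rollout is fully determined by its seed, the failure found this way can be replayed exactly by calling `rollout` again with the returned seed.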

Deep implicit coordination graphs for multi-agent reinforcement learning [e-print]

Summary

Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination-graph-based formalization allows reasoning about the joint action based on the structure of interactions. However, coordination graphs often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph-neural-network-based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with a large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.

Optimized airborne collision avoidance in mixed equipage environments

Published in:
MIT Lincoln Laboratory Report ATC-408

Summary

Developing robust collision avoidance logic that reliably prevents collision without excessive alerting is challenging due to sensor error and uncertainty in the future paths of the aircraft. Over the past few years, research has focused on the use of a computational method known as dynamic programming for producing an optimized decision logic for airborne collision avoidance. This report focuses on recent research on coordination, interoperability, and multiple-threat encounters. The methodology presented in this report results in logic that is safer and performs better than legacy TCAS. Modeling and simulation indicate that the proposed methodology can bring significant benefit to the current airspace and can support the need for safe, non-disruptive collision protection as the airspace continues to evolve.

Next-generation airborne collision avoidance system

Published in:
Lincoln Laboratory Journal, Vol. 19, No. 1, 2012, pp. 17-33.

Summary

In response to a series of midair collisions involving commercial airliners, Lincoln Laboratory was directed by the Federal Aviation Administration in the 1970s to participate in the development of an onboard collision avoidance system. In its current manifestation, the Traffic Alert and Collision Avoidance System is mandated worldwide on all large aircraft and has significantly improved the safety of air travel, but major changes to the airspace planned over the coming years will require substantial modification to the system. Recently, Lincoln Laboratory has been pioneering the development of a new approach to collision avoidance systems that completely rethinks how such systems are engineered, allowing the system to provide a higher degree of safety without interfering with normal, safe operations.

Hazard alerting based on probabilistic models

Published in:
J. Guidance, Control, Dynamics, Vol. 35, No. 2, March-April 2012, pp. 442-450.

Summary

Hazard alerting systems alert operators to potential future undesirable events so that action may be taken to mitigate risk. One way to develop a hazard alerting system based on probabilistic models is by using a threshold-based approach, where the probability of the undesirable event without mitigation is compared against a threshold. Another way to develop such a system is to model the system as a Markov decision process and solve for the hazard alerting strategy that maximizes expected utility. Experiments reveal that an expected utility approach performs better than threshold-based approaches when the dynamic stochasticity is high, where accounting for delays or changes in the alert becomes more important. However, for certain system parameters and operating environments, a threshold-based approach may provide comparable performance.
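
The threshold-based rule described in this summary can be sketched in a few lines. This is a minimal illustration under assumed dynamics, not the model used in the paper: the system is a hypothetical one-dimensional Gaussian random walk, the undesirable event is leaving the band `[-limit, limit]`, and the event probability without mitigation is estimated by Monte Carlo sampling. All function and parameter names are illustrative.

```python
import random


def event_probability(position, steps, noise, limit, samples=2000, seed=0):
    """Monte Carlo estimate of P(|x| > limit within `steps`) with no mitigation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = position
        for _ in range(steps):
            x += rng.gauss(0.0, noise)  # unmitigated stochastic dynamics
            if abs(x) > limit:
                hits += 1
                break
    return hits / samples


def should_alert(position, threshold=0.1, **model):
    """Threshold-based rule: alert when the estimated event probability
    without mitigation reaches the threshold."""
    return event_probability(position, **model) >= threshold


far_from_hazard = should_alert(0.0, steps=5, noise=0.1, limit=10.0)
near_hazard = should_alert(9.9, steps=20, noise=1.0, limit=10.0)
```

An expected utility approach would instead assign costs to the event and to alerting, and choose the action with the highest expected utility at each step.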

A new approach for designing safer collision avoidance systems

Published in:
Air Traffic Control Q., Vol. 20, No. 1, January 2012, pp. 27-45.

Summary

The Traffic Alert and Collision Avoidance System significantly reduces the risk of mid-air collision and is mandated worldwide on transport aircraft. Engineering the avoidance logic was costly and spanned decades. The development followed an iterative process where the logic was specified using pseudocode, evaluated in simulation, and revised based on performance against a set of metrics. Modifying the logic is difficult because the pseudocode contains many heuristic rules that interact in complex ways. With the introduction of next-generation air traffic control procedures and surveillance systems, the logic will require significant revision to prevent unnecessary alerts. Recent work has explored an approach for designing collision avoidance systems that will shorten the development cycle, improve maintainability, and enhance safety with fewer false alerts. The approach involves computationally deriving optimized logic from encounter models and performance metrics. This paper outlines the approach and discusses the anticipated impact on development, safety, and operation.

Decomposition methods for optimized collision avoidance with multiple threats

Published in:
DASC 2011, 30th IEEE/AIAA Digital Avionics Systems Conference, 16-20 October 2011, pp. 1D2.

Summary

Aircraft collision avoidance systems assist in the resolution of collision threats from nearby aircraft by issuing avoidance maneuvers to pilots. Encounters where multiple aircraft pose a threat, though rare, can be difficult to resolve because a maneuver that might resolve a conflict with one aircraft might induce conflicts with others. Recent efforts to develop robust collision avoidance systems for single-threat encounters have involved modeling the problem as a Markov decision process and applying dynamic programming to solve for the optimal avoidance strategy. Because this methodology does not scale well to multiple threats, this paper evaluates a variety of decomposition methods that leverage the optimal avoidance strategy for single-threat encounters.
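
The summary does not say which decomposition methods are evaluated. As one possible illustration, the sketch below fuses per-threat action values: each threat is scored independently with a single-threat solution, and the scores are combined per action by taking either the worst case (`min`) or the total (`sum`). The action names and values are made up for the example.

```python
def fuse_and_select(per_threat_values, fuse=min):
    """Pick the action whose fused per-threat value is highest.

    per_threat_values: list of dicts mapping action -> value, one dict per
    threat, each assumed to come from a single-threat solution.
    fuse: function that combines the per-threat values of one action,
    e.g. min (worst case) or sum (total).
    """
    actions = per_threat_values[0].keys()
    return max(actions, key=lambda a: fuse(q[a] for q in per_threat_values))


# Hypothetical single-threat values (higher is better) for two intruders.
threat_a = {"climb": 0.0, "descend": -5.0, "level": -3.0}
threat_b = {"climb": -4.0, "descend": -1.0, "level": -3.0}

worst_case_choice = fuse_and_select([threat_a, threat_b], fuse=min)
total_choice = fuse_and_select([threat_a, threat_b], fuse=sum)
```

In this example the two fusion rules disagree: the worst-case rule selects `level`, while the sum rule selects `climb`, which shows why the choice of decomposition matters.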

Collision avoidance for general aviation

Published in:
30th AIAA/IEEE Digital Avionics Systems Conf., 16-20 October 2011.

Summary

The Traffic Alert and Collision Avoidance System (TCAS) is mandated on all large transport aircraft to reduce mid-air collision risk. Since its introduction, no mid-air collisions between TCAS-equipped aircraft have occurred in the United States. However, General Aviation (GA) aircraft are generally not equipped with TCAS and experience collisions several times per year. There is interest in low-cost collision avoidance systems for GA aircraft to reduce collision risk with other GA aircraft as well as with TCAS-equipped aircraft. Since TCAS was designed for large aircraft that can achieve greater vertical rates, the assumptions made by the system and the associated advisories are not always appropriate for GA aircraft. Modifying the TCAS logic to accommodate GA aircraft is far from straightforward. Even minor changes to TCAS to correct operational issues are difficult to implement due to the interaction of the complex rules defining the logic. Recent work has explored an alternative to the TCAS logic based on optimization with respect to a probabilistic model of aircraft behavior. The model encodes performance constraints of GA aircraft, and a computational technique called dynamic programming allows the optimal collision avoidance strategy to be computed efficiently. Prior work has focused on systems that meet the performance assumptions of the existing TCAS logic. However, these assumptions are not always appropriate for GA aircraft. This paper will present simulation results comparing the existing logic to logic that has been optimized to operate onboard GA aircraft. If both aircraft are equipped with collision avoidance logic, it is important that the advisories be coordinated to prevent both aircraft from climbing or descending. The TCAS logic has a built-in coordination mechanism with which a GA system must maintain compatibility. Several coordination strategies, both with the optimized logic and the current logic, are evaluated in simulation.
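
To show what computing a strategy with dynamic programming looks like, here is a toy backward-induction sketch. It is not the model from the paper: the state is a hypothetical integer relative altitude, the intruder moves by random noise, and the costs are invented. Starting from the terminal cost at closest approach, each pass adds one more step to go and records the best action for every state.

```python
ACTIONS = {"descend": -1, "level": 0, "climb": 1}  # change in relative altitude
NOISE = {-1: 0.25, 0: 0.5, 1: 0.25}                # assumed intruder disturbance
H_MAX = 4                                          # relative altitude is clipped to [-4, 4]


def clip(h):
    return max(-H_MAX, min(H_MAX, h))


def solve(horizon, collision_cost=1.0, alert_cost=0.01):
    """Backward induction over time-to-go.

    Returns (policy, value), where policy[k][h] is the best action in state h
    with k + 1 steps to go, and value[h] is the optimal expected utility with
    `horizon` steps to go.
    """
    states = range(-H_MAX, H_MAX + 1)
    # Terminal utility: a collision occurs if relative altitude is zero.
    value = {h: (-collision_cost if h == 0 else 0.0) for h in states}
    policy = []
    for _ in range(horizon):
        q = {
            h: {
                a: (-alert_cost if dh else 0.0)
                + sum(p * value[clip(h + dh + w)] for w, p in NOISE.items())
                for a, dh in ACTIONS.items()
            }
            for h in states
        }
        policy.append({h: max(q[h], key=q[h].get) for h in states})
        value = {h: max(q[h].values()) for h in states}
    return policy, value


policy, value = solve(horizon=5)
```

With one step to go, the computed policy maneuvers when the aircraft are co-altitude and stays level when they are well separated, since maneuvering carries a small cost.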