From tools to teammates: Integrating robots on human teams
Advanced capabilities and algorithms developed for autonomous systems could streamline human-robot teaming in military operations
Consider a scenario in which a six-member dismounted Marine squad is tasked with raiding a compound that recent intelligence reports indicate is harboring a terrorist group. While it is unlikely that the Marines will be directly confronted by hostiles, it is possible they could be targeted by snipers. As the squad assesses the terrorist presence in the area, a companion robot equipped with a novel infrared sensor scans the horizon and ground for unexpected heat signatures and activity. Meanwhile, a swarm of unmanned aerial vehicles generates three-dimensional (3D) colored maps of the compound in real time.
Compound raid vignettes such as the one described above are being played out in Lincoln Laboratory’s Autonomous Systems Laboratory as part of a basic research program in collaborative robotics, a new trend in the robotics industry that seeks to advance robot-human synergy. The program, which is being carried out by a team of technical staff, military fellows, and interns from Lincoln Laboratory’s Control Systems Engineering, Informatics and Decision Support, and Embedded and Open Systems Groups, is motivated by the Department of Defense’s growing interest in utilizing autonomous systems to enhance warfighter situational awareness. Begun in February 2014, the program is funded by the Office of Naval Research (ONR), which is investing in autonomy and unmanned systems research and development under their Naval Science and Technology Strategic Plan.
“The aim of the program is to make robots that can autonomously and seamlessly collaborate with humans as part of a team,” says Mark Donahue, program manager. Current autonomous systems neither share a common language with humans nor operate within the same cognitive context (beliefs, knowledge base, cultural perspective, and mental state) or under the same cognitive load (i.e., the amount of information an individual can process and retain at any one time). Because of these gaps, human-robot communication and interaction remain limited. Robots lack the social intelligence to understand human goals and intentions, to adapt their behavior when circumstances or perspectives change, and to recognize human emotion; thus, they are often controlled and supervised by human operators. For robots to be seen as partners instead of sophisticated tools, robotics technology must mature to a point at which robots are capable of learning, reasoning, and making decisions as human beings do.
The Laboratory’s program in collaborative robotics focuses on developing (1) enabling technologies for autonomous systems and (2) algorithms and cognitive models that autonomously present warfighters with mission-relevant data acquired by autonomous systems. With the goal of enhancing situational awareness among squad members, the program comprises four research objectives:
- Develop techniques for translating raw, calibrated sensor data from autonomous systems into actionable information (e.g., two-dimensional [2D] navigation maps, visual and semantic labels for objects within a map)
- Develop machine-learning algorithms to identify and prioritize information that is relevant given external contexts (mission objective and environmental factors, including weather and terrain) and internal contexts (e.g., human cognitive load, health, skill level)
- Assess the effectiveness of different augmented reality devices—technologies that enhance human perception of and interaction with the real world through computer-generated sensory input such as video, graphics, audio, or tactile data (e.g., heads-up displays, vibrating vests, earbuds, armbands)
- Evaluate whether the technology can be developed into a prototype and deployed in the field
“The motivation is to think several years into the future for how we ideally would interact with robots,” says Donahue. As illustrated in the figure above, the larger vision of the program is to acquire data from multiple autonomous systems and transform those data into actionable intelligence that can provide situational awareness among the squad for effective mission execution.
To function in the real world, autonomous systems must interact with people and physical elements in their environment. Because of the complexity of these interactions, robotics researchers often conduct initial research by investigating a constrained scenario in a theater setting. This test bed helps to inform the design and development of autonomous systems. For the collaborative robotics program, the scenario is a raid on a compound conducted by a six-person dismounted squad aided by zero to many unmanned aerial and ground vehicles. The Laboratory team used the open-source Gazebo simulator to create a virtual reality in which this scenario could be acted out. Gazebo supports the simulation of any desired number of robots in complex 3D indoor and outdoor environments and provides realistic motion, sensor noise, and ground-truth data for object locations that are useful for benchmarking algorithms. Gazebo is integrated with the Robot Operating System (ROS), a framework of software libraries for developing robot applications.
Simulations are conducted in Lincoln Laboratory’s Autonomous Systems Laboratory 3D infrared (IR) tracking theater, which enables real-time human interaction with the virtual world. In this Lincoln Laboratory Interactive Virtual Environment, or L-Live as it is called, the team can play out different vignettes of their scenario. Wall-mounted cameras track markers placed on moving and static people and objects, enabling the team to determine the position and orientation of those people and objects to within a millimeter. The motion-capture area overlaps a region in the virtual world, which ceiling-mounted projectors display on the room’s walls over a 270° displayable space for 1:1 scaling.
Events that are tracked in motion capture are re-represented in the virtual world. In-room hardware such as the Turtlebot, an autonomous platform for developing robot applications, is simulated in the virtual world. (Note: The Turtlebot is a research-only surrogate for future combat robots.) A “player” drives the demonstration of the scenario, communicating with L-Live, a tablet (the current augmented reality display), and Turtlebots.
L-Live is proving to be effective in meeting the research needs for this program. To date, two novel capabilities have been developed utilizing commercial off-the-shelf components:
- IR depth sensor. An IR camera was added to the motion-sensing ASUS Xtion (similar to the Microsoft Xbox Kinect), which features a color (red, green, and blue [RGB]) camera and depth sensor. The IR data stream is fused with depth data from the Xtion.
- RGB- and IR-colored OctoMaps. OctoMap, software that generates 3D models of environments by recursively partitioning 3D space into eight equal pieces, was adapted to extend mapping capabilities beyond occupancy grid mapping, which only models occupied areas and free space. Algorithms for assigning color were developed to create RGB- and IR-colored OctoMaps based on RGB and IR depth data.
The IR depth sensor (left) can be mounted on the Turtlebot or other mobile robot platforms to generate real-time IR depth streaming data of an environment. Based on these data, a 3D thermal point cloud can then be projected onto an augmented reality display in the user’s perspective (right).
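The fusion step behind the thermal point cloud can be illustrated with a standard pinhole-camera back-projection: each depth pixel is lifted to a 3D point and tagged with the co-registered IR intensity. This is a minimal sketch, not the Laboratory's implementation; the intrinsics and the toy 2×2 frames are invented for illustration.

```python
def depth_to_thermal_points(depth, ir, fx, fy, cx, cy):
    """Back-project a depth image into 3D points, attaching the
    co-registered IR intensity to each point (a thermal point cloud).

    depth -- 2D list of depth values in meters (0 means no return)
    ir    -- 2D list of IR intensities, same resolution as depth
    fx, fy, cx, cy -- pinhole camera intrinsics (illustrative values)
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # no depth return at this pixel; skip it
                continue
            # Pinhole model: pixel (u, v) at depth z -> camera-frame (x, y, z)
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, ir[v][u]))
    return points

# Toy 2x2 frames standing in for the fused depth + IR streams
depth = [[1.0, 0.0],
         [2.0, 1.5]]
ir    = [[0.8, 0.1],
         [0.3, 0.9]]

cloud = depth_to_thermal_points(depth, ir, fx=525.0, fy=525.0, cx=0.5, cy=0.5)
for p in cloud:
    print(p)
```

In a real pipeline the resulting points would be rendered in the user's perspective after a viewpoint transformation, rather than printed.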
For the data produced by autonomous systems to be truly useful to the warfighter, they need to be presented in human terms. Semantic modeling requires that robots understand the space in which they operate such that they can identify and classify features of the environment. Within the mission context, these classifications must distinguish friend from foe and asset from threat. Lincoln Laboratory researchers Lianna Hall and Jason Thornton are working with military liaisons, who are supplying ground-truth data, to develop machine-learning algorithms for identifying mission-relevant scene objects and for conveying this information to the warfighter. So far, the focus has been on generating data to create a relevant world model. “We are defining data structures on the basis of what we think is relevant to the mission, given what we know about operations. The machine-learning algorithms can then be tested and redefined if need be,” explains Hall.
“Once the model is built, the next question is how the information should be presented to the soldier,” says Thornton. According to Hall, determining which type of display mode—visual, audio, or tactile—to use involves several factors: “The time of day, weather, and noise levels in the environment, and the operator’s cognitive load, stress levels, health, skills, and preferences must all be considered within the mission context.” If a soldier is running, a tactile warning sent via a vibrating vest or pressure wristband may be ineffective because the soldier may not feel the signal. If the environment is noisy, an auditory message conveyed through earbuds is not the best interface. For situations in which a squad member’s attention is needed elsewhere, perhaps a subtle visual icon rather than a startling audio alarm is better. Working backwards from these kinds of considerations enables the team to incorporate key features in the machine-learning algorithms. In the future, real-time commands and feedback from operators as well as physiological sensor data (e.g., heart rate, perspiration) could be fed back into the information flow system to make the models more robust.
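In its simplest rule-based form, the context-driven modality selection Hall describes could look like the sketch below. The factor names and thresholds are illustrative stand-ins for features a trained model would weigh; this is not the team's actual algorithm.

```python
def choose_display_mode(context):
    """Pick a display modality (visual, audio, or tactile) from mission
    context. Factors and thresholds are illustrative stand-ins for
    features a learned model would weigh."""
    # Tactile cues may go unnoticed while the operator is running
    tactile_ok = not context.get("running", False)
    # Audio is unreliable in a noisy environment
    audio_ok = context.get("noise_level", 0.0) < 0.6
    # A subtle icon beats a startling alarm when attention is elsewhere
    visual_ok = context.get("visual_attention_free", True)

    if visual_ok:
        return "visual"
    if audio_ok:
        return "audio"
    if tactile_ok:
        return "tactile"
    return "queue"  # defer the alert until some channel becomes usable

# A running operator in a noisy environment, eyes free: use the display
print(choose_display_mode({"running": True, "noise_level": 0.8}))
```

Feeding back operator commands and physiological data, as the team envisions, would amount to updating these context fields (and, eventually, the learned weights behind them) in real time.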
The team is also researching improved algorithms to present streaming sensor data from autonomous systems in the human’s frame of reference. Known as viewpoint transformation, this technique requires that the position and orientation (i.e., pose) of objects and people are tracked over time. For data to be properly rendered on a visual display, pose tracking and viewpoint transformation must be computed in real time (10–100 ms for a human interface) and to high precision. While the team plans to explore auditory and tactile interfaces, their current focus is on visual augmented reality devices that superimpose computer-generated graphics and video, based on sensor data from autonomous systems, on the physical world (see figure below). “We [the team] chose to focus on vision because it is the highest-bandwidth, most natural method of human absorption of data, and visualization is a popular method of human-computer interaction,” says Evan Krause, a former Laboratory researcher who had been investigating how to display augmented intelligence in the human’s frame of reference. For now, a tablet interface is being used; eventually, it will be replaced by a device more suited to field use, such as BAE Systems’ Q-Warrior. The team is also looking into Osterhout Design Group’s Smart Glasses, a 3D stereoscopic, see-through, high-definition display.
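Viewpoint transformation reduces to composing rigid-body transforms: a detection expressed in the robot's sensor frame is mapped into the world frame and then into the user's frame. A minimal planar sketch, with illustrative poses standing in for real tracking data:

```python
import math

def make_pose(x, y, yaw):
    """4x4 homogeneous transform for a planar pose (x, y, heading yaw)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c,  -s,  0.0, x],
            [s,   c,  0.0, y],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def invert(T):
    """Invert a rigid-body transform: transpose the rotation, negate-rotate t."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    ti = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]],
            [0.0, 0.0, 0.0, 1.0]]

def apply(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    ph = p + (1.0,)
    return tuple(sum(T[i][j] * ph[j] for j in range(4)) for i in range(3))

# Poses of the robot's sensor and of the soldier, both in the world frame
# (values are invented; in practice they come from pose tracking).
T_world_robot = make_pose(2.0, 0.0, 0.0)
T_world_user = make_pose(0.0, 0.0, math.pi / 2)  # user faces world +y

# A detection 1 m in front of the robot, re-expressed in the user's frame
p_robot = (1.0, 0.0, 0.0)
p_world = apply(T_world_robot, p_robot)        # -> (3.0, 0.0, 0.0)
p_user = apply(invert(T_world_user), p_world)
print(p_world, p_user)
```

The real-time constraint the text cites (10–100 ms) applies to the full loop of tracking these poses, composing the transforms, and rendering, not just to this arithmetic, which is trivial by comparison.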
Enabling visual augmented reality involves two key challenges: streaming data and pose tracking in real time. To address the first challenge, the team is taking a system-level approach: displaying only relevant information and representing the environment with intelligent data structures, such as OctoMaps. The extensions that OctoMap currently supports (visible and IR color) are useful in segmenting maps and determining scene meaning. The software will be further extended with two capabilities that together will help users assess map staleness: a visual indexing feature for assigning semantic labels to voxels (3D pixels) will show not only whether a space is occupied but also what is occupying it, and a time-tagging function will provide better motion detection between scenes.
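The voxel labeling and time tagging described above might be prototyped along these lines. The flat dictionary here is a stand-in for OctoMap's octree, and the voxel size, labels, and staleness threshold are all invented for illustration.

```python
import time

VOXEL = 0.25  # voxel edge length in meters (illustrative)

class LabeledVoxelMap:
    """Flat stand-in for an OctoMap-style grid: each occupied voxel keeps
    a semantic label and the time it was last observed, so consumers can
    judge map staleness. (A real OctoMap stores this in an octree.)"""

    def __init__(self):
        self.voxels = {}

    def key(self, x, y, z):
        """Quantize a metric point to integer voxel coordinates."""
        return (int(x // VOXEL), int(y // VOXEL), int(z // VOXEL))

    def observe(self, x, y, z, label, t=None):
        """Mark the voxel containing (x, y, z) occupied, with a label."""
        self.voxels[self.key(x, y, z)] = {
            "label": label,
            "seen": time.time() if t is None else t,
        }

    def stale(self, max_age, now=None):
        """Voxels not re-observed within the last max_age seconds."""
        now = time.time() if now is None else now
        return [k for k, v in self.voxels.items() if now - v["seen"] > max_age]

m = LabeledVoxelMap()
m.observe(1.0, 2.0, 0.0, "wall", t=0.0)
m.observe(4.0, 1.0, 0.0, "doorway", t=90.0)
print(m.stale(max_age=60.0, now=100.0))  # -> [(4, 8, 0)]
```

A display layer could then fade or flag voxels returned by `stale()`, giving the operator a direct visual cue that part of the map has not been re-observed recently.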
The team is working to solve the pose-tracking challenge by leveraging the motion-capture capabilities of L-Live and researching various approaches to enable tracking in the wild. One option is to use collaborative multiagent simultaneous localization and mapping (SLAM), an algorithm-based technique in which mobile robots and soldiers fitted with sensors build their own maps of an unknown environment from a sequence of landmark measurements while navigating through that environment and localizing within that map. These individual maps can then be combined to produce a global map.
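Once each agent's landmark map and its pose in a common frame are known, combining individual maps into a global map is a matter of re-expressing and reconciling landmarks. The sketch below assumes the relative poses are already estimated (which is the hard part of multiagent SLAM) and simply averages landmarks that multiple agents observe; the landmark names and coordinates are invented for illustration.

```python
import math

def to_global(landmarks, pose):
    """Re-express an agent's local 2D landmark estimates in the global
    frame. pose = (x, y, yaw) of the agent's map origin, global frame."""
    x0, y0, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return {name: (x0 + c * lx - s * ly, y0 + s * lx + c * ly)
            for name, (lx, ly) in landmarks.items()}

def merge(maps_with_poses):
    """Combine per-agent maps; landmarks seen by several agents are
    averaged (a crude stand-in for a proper joint estimate)."""
    sums = {}
    for landmarks, pose in maps_with_poses:
        for name, (gx, gy) in to_global(landmarks, pose).items():
            sx, sy, n = sums.get(name, (0.0, 0.0, 0))
            sums[name] = (sx + gx, sy + gy, n + 1)
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in sums.items()}

# One robot and one soldier, each with a local map (illustrative data)
robot_map = {"gate": (2.0, 0.0)}                      # 2 m ahead of robot
soldier_map = {"gate": (0.0, -1.0), "tower": (3.0, 0.0)}

global_map = merge([(robot_map, (0.0, 0.0, 0.0)),
                    (soldier_map, (2.0, 1.0, 0.0))])
print(global_map)  # gate reconciled at (2.0, 0.0); tower at (5.0, 1.0)
```

In practice the agents' poses would themselves be estimated jointly with the landmarks, and reconciliation would weight each observation by its uncertainty rather than averaging uniformly.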
Another option is mobile motion capture. The team is looking into Project Tango, a smartphone equipped with sensors that enable the device to track its position and orientation in real time and to make more than a quarter-million 3D measurements every second; these measurements are then combined to produce a single 3D map of the surrounding space. For the collaborative robotics program, a robot equipped with Project Tango–like sensors could track, relative to itself, the pose of fiducial markers placed on a soldier.
Because the pose-tracking research is in its initial stages, it is not yet clear which approach is best. “The SLAM approach seems like a better long-term solution, but whether it is feasible is unknown at this point,” explains Krause. A notoriously difficult problem in the robotics community, SLAM has been traditionally researched for single agents in constrained environments (e.g., indoor laboratories). The problem has not been solved for many practical settings (e.g., outdoors under various weather and lighting conditions). SLAM suffers from high computational and memory requirements, especially in unconstrained open-world scenarios in which there are hundreds of landmarks.
Future work will focus on quantitative and qualitative evaluations of human task performance with and without augmented intelligence provided by autonomous systems via augmented reality displays. Quantitative parameters to be evaluated include time to complete tasks, effectiveness in responding to environmental threats (e.g., avoiding an improvised explosive device, detecting a sniper), and biometrics (e.g., heart rate). Participants’ opinions on the usefulness of augmented intelligence and its usability in the field will be recorded. The results from this analysis will help determine if augmented intelligence is an effective, intuitive form of human-machine interaction—one that can enhance warfighter situational awareness and mission execution without imposing significant physical and mental burdens on the warfighter. Once perfected, the collaborative robotics technology could be applied to any circumstance in which autonomous systems data may be useful to humans. The team envisions the technology eventually being used by first responders to assess situations at emergency scenes, by emergency services professionals to find survivors following natural or man-made disasters, and by security personnel to protect borders or critical infrastructures.
Posted January 2016