A new method called Clio enables robots to quickly map a scene and identify the items they need to complete a given set of tasks.
September 30, 2024

Imagine having to straighten up a messy kitchen, starting with a counter littered with sauce packets. If your goal is to wipe the counter clean, you might sweep up the packets as a group. If, however, you wanted to first pick out the mustard packets before throwing the rest away, you would sort more discriminately, by sauce type. And if, among the mustards, you had a hankering for Grey Poupon, finding this specific brand would entail a more careful search.

MIT engineers have developed a method that enables robots to make similarly intuitive, task-relevant decisions.

The team’s new approach, named Clio, enables a robot to identify the parts of a scene that matter, given the tasks at hand. With Clio, a robot takes in a list of tasks described in natural language and, based on those tasks, it then determines the level of granularity required to interpret its surroundings and “remember” only the parts of a scene that are relevant.

In real experiments ranging from a cluttered cubicle to a five-story building on MIT’s campus, the team used Clio to automatically segment a scene at different levels of granularity, based on a set of tasks specified in natural-language prompts such as “move rack of magazines” and “get first aid kit.”

The team also ran Clio in real-time on a quadruped robot. As the robot explored an office building, Clio identified and mapped only those parts of the scene that related to the robot’s tasks (such as retrieving a dog toy while ignoring piles of office supplies), allowing the robot to grasp the objects of interest.

Clio is named after the Greek muse of history, for its ability to identify and remember only the elements that matter for a given task. The researchers envision that Clio would be useful in many situations and environments in which a robot would have to quickly survey and make sense of its surroundings in the context of its given task.

“Search and rescue is the motivating application for this work, but Clio can also power domestic robots and robots working on a factory floor alongside humans,” says Luca Carlone, associate professor in MIT’s Department of Aeronautics and Astronautics (AeroAstro), principal investigator in the Laboratory for Information and Decision Systems (LIDS), and director of the MIT SPARK Laboratory. “It’s really about helping the robot understand the environment and what it has to remember in order to carry out its mission.”

The team details their results in a study appearing today in the journal Robotics and Automation Letters. Carlone’s co-authors include members of the SPARK Lab: Dominic Maggio, Yun Chang, Nathan Hughes, and Lukas Schmid; and members of MIT Lincoln Laboratory: Matthew Trang, Dan Griffith, Carlyn Dougherty, and Eric Cristofalo.