For robots to function in a variety of real-world situations, they need to learn generic policies. To do this, researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) created a Real-to-Sim-to-Real model.
The goal of many developers is to create hardware and software that allows robots to work anywhere, under any circumstances, but a robot that works in one person’s home doesn’t need to know how to work in every home in the neighborhood.
The MIT CSAIL team decided to focus on RialTo, a method that makes it easy to train a robot’s policy for a specific environment: the researchers say the method improved policy performance by 67% over imitation learning trained with the same number of demonstrations.
The system learned to perform everyday tasks such as opening a toaster, putting a book on a shelf, putting a plate on a rack, putting a mug on a shelf, opening a drawer, and opening a cabinet.
“We want the robot to perform extremely well under all circumstances within a single environment: obstructions, distractions, different lighting conditions, changing object poses,” said Marcel Torne Villasevil, a research assistant at MIT CSAIL’s Improbable AI Lab and lead author of a new paper on the work.
“We propose a method to instantly create digital twins using the latest techniques in computer vision,” he explains. “With just a smartphone, anyone can capture a digital replica of the real world, and GPU parallelization allows robots to train in a simulated environment much faster than in the real world. Our approach eliminates the need for extensive reward engineering by leveraging several real-world demonstrations to jump-start the training process.”
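To make that jump-start idea concrete, here is a minimal sketch of how a handful of real-world demonstrations could seed a simulated reinforcement-learning loop with a sparse success signal in place of a hand-engineered reward. The environment and policy interfaces (env.step, policy.update, and so on) are illustrative assumptions, not RialTo’s actual API.

```python
import random

class ReplayBuffer:
    """Stores (observation, action, reward) transitions for off-policy RL."""
    def __init__(self):
        self.transitions = []

    def add(self, obs, action, reward):
        self.transitions.append((obs, action, reward))

    def sample(self, batch_size):
        # Sample at most batch_size transitions without replacement.
        return random.sample(self.transitions, min(batch_size, len(self.transitions)))


def jumpstart_training(demos, policy, env, rollouts=100):
    """Replay real-world demos in the digital twin, then continue with RL."""
    buffer = ReplayBuffer()

    # 1) Replay each demonstrated action sequence in simulation and label
    #    transitions with a sparse success reward -- no reward engineering.
    for demo in demos:
        obs = env.reset(state=demo["initial_state"])
        for action in demo["actions"]:
            next_obs, success = env.step(action)
            buffer.add(obs, action, 1.0 if success else 0.0)
            obs = next_obs

    # 2) Let the policy explore in simulation, starting near demo behavior,
    #    and update it from the growing replay buffer.
    for _ in range(rollouts):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            next_obs, done = env.step(action)
            buffer.add(obs, action, 1.0 if done else 0.0)
            obs = next_obs
        policy.update(buffer.sample(256))

    return policy
```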
RialTo builds a policy from the reconstructed scene
Torne’s vision is inspiring, but RialTo isn’t as simple as shaking your phone and summoning a home robot. First, the user scans the chosen environment with their device, using tools like NeRFStudio, ARCode, and Polycam.
Once the scene has been reconstructed, the user can upload it to the RialTo interface to make further adjustments and add any joints the simulation needs, such as a drawer slide or a cabinet hinge.
The refined scene is then exported and ingested into a simulator, where the goal is to create a policy based on real-world actions and observations. These real-world demonstrations are replicated in simulation, providing valuable data for reinforcement learning (RL).
“This helps us create strong policies that work well both in simulation and in the real world,” Torne says. “The reinforcement learning algorithms guide this process and ensure that the policies are effective when applied outside the simulator.”
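For a concrete picture of the scene-preparation step described above, the sketch below shows one way a phone scan and user-added articulations might be bundled into a digital twin before being handed to a GPU-parallel simulator. The class and field names are illustrative assumptions, not RialTo’s actual data format.

```python
from dataclasses import dataclass, field

@dataclass
class ArticulatedJoint:
    """A joint the user adds in the editing interface, e.g. a drawer slide."""
    parent_mesh: str
    child_mesh: str
    joint_type: str                 # "prismatic" for drawers, "revolute" for doors
    axis: tuple = (1.0, 0.0, 0.0)
    limits: tuple = (0.0, 0.4)      # meters or radians, depending on joint type

@dataclass
class DigitalTwin:
    """Reconstructed scene plus the articulation needed to simulate it."""
    mesh_file: str                  # e.g. a mesh exported from Polycam or NeRFStudio
    joints: list = field(default_factory=list)

    def add_joint(self, joint: ArticulatedJoint):
        self.joints.append(joint)

def build_twin(scan_mesh_path: str) -> DigitalTwin:
    """Turn a phone scan into a twin the simulator can load."""
    twin = DigitalTwin(mesh_file=scan_mesh_path)
    # The user marks which parts of the scan should move, and how.
    twin.add_joint(ArticulatedJoint("cabinet_body", "cabinet_drawer",
                                    joint_type="prismatic"))
    return twin
```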
Researchers test the model’s performance
In tests, MIT CSAIL found that RialTo produced robust policies for a variety of tasks, both in controlled lab environments and in more unpredictable real-world environments. For each task, the researchers tested the system’s performance at three levels of increasing difficulty: randomizing object poses, adding visual distractions, and applying physical distractions during task execution.
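A rough sketch of what such a tiered evaluation could look like is below; the difficulty settings and environment hooks (randomize_object_poses, add_distractor_object, and so on) are hypothetical stand-ins for illustration, not the paper’s actual benchmark code.

```python
# Three illustrative regimes of increasing difficulty: randomized poses,
# added visual distractors, and physical perturbations during execution.
DIFFICULTY_LEVELS = {
    "level_1": {"randomize_poses": True, "visual_distractors": 0, "perturb": False},
    "level_2": {"randomize_poses": True, "visual_distractors": 3, "perturb": False},
    "level_3": {"randomize_poses": True, "visual_distractors": 3, "perturb": True},
}

def evaluate(policy, env, level, trials=10):
    """Return a policy's success rate under one difficulty setting."""
    cfg = DIFFICULTY_LEVELS[level]
    successes = 0
    for _ in range(trials):
        env.reset()
        if cfg["randomize_poses"]:
            env.randomize_object_poses()        # new initial object placement
        for _ in range(cfg["visual_distractors"]):
            env.add_distractor_object()         # task-irrelevant clutter
        # perturb=True applies physical pushes while the policy is executing.
        successes += int(env.rollout(policy, perturb=cfg["perturb"]))
    return successes / trials
```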
“To deploy robots in the real world, researchers have traditionally relied on methods such as imitation learning from expert data, which can be costly, or reinforcement learning, which can be unsafe,” said Zoe Chen, a computer science doctoral student at the University of Washington who was not involved in the paper. “With its novel reality-to-simulation-to-reality pipeline, RialTo directly addresses both the safety constraints of real-world reinforcement learning and the data-efficiency constraints of data-driven learning methods.”
“This new pipeline not only ensures safe and robust training in simulation before real-world deployment, but also significantly improves the efficiency of data collection,” she added. “RialTo has the potential to significantly scale up robot learning, enabling robots to adapt much more effectively to complex real-world scenarios.”
When combined with real-world data, the researchers say their system performed better than traditional imitation learning methods, especially in situations with a lot of visual and physical distractions.
MIT CSAIL continues robotics training efforts
While the results so far are promising, RialTo is not without its limitations: currently, the system takes three days to train fully. To shorten this time, the team hopes to improve the underlying algorithms and make use of foundation models.
Training in simulation also has limitations: transferring from simulation to reality remains difficult, as does simulating deformable objects and liquids. The MIT CSAIL team said it plans to build on this work by improving the adaptability of its models to new environments while maintaining robustness to a range of disturbances.
“Our next step is to use pre-trained models to accelerate the learning process, minimize human input, and achieve broader generalization capabilities,” Torne said.
Torne wrote the paper with senior authors Abhishek Gupta, an assistant professor at the University of Washington, and Pulkit Agrawal, an assistant professor in MIT’s Department of Electrical Engineering and Computer Science (EECS).
Four other CSAIL members also received credit: EECS doctoral student Anthony Simeonov SM ’22, research assistant Zechu Li, undergraduate student April Chan, and Tao Chen Ph.D. ’24. The research was supported by a Sony Research Award, the U.S. government, and Hyundai Motor Co., and was conducted in collaboration with the WEIRD (Washington Embodied Intelligence and Robotics Development) Laboratory.