Don’t let the name of the Open-TeleVision project fool you – it’s a framework for improving telepresence and making remote control of robots much more intuitive than conventional approaches. It does this by leveraging the technology found in modern VR headsets such as the Apple Vision Pro and Meta Quest. There are numerous videos on the project page, many of which demonstrate successful remote control over long distances.
Remote control of robot effectors usually takes some getting used to: camera fields of view are unusual, limbs don’t move the same way arms do, and intuitive human movements like looking around to figure out where things are don’t translate well.
Streaming from a gimbaled stereo camera to a VR headset with head tracking seems like a highly hackable design.
To address these problems, the researchers give the user a real-time stereo video stream from cameras on board the robot, so the user can turn their head and look around as normal, and map the user’s arm and hand movements to the humanoid robot’s own arms and hands. This provides feedback for manipulating objects and performing tasks in a much more intuitive way. After all, performing tasks becomes much easier when your eyes, body, and hands look and function more or less as expected.
The research paper describes the different systems in detail, but essentially, stereo depth and RGB cameras on 3D-printed gimbals are mounted on top of a Unitree H1-like humanoid robot frame equipped with dexterous hands. The VR headset displays the real-time stereoscopic video stream and lets the user look around, while the tracked motion of the user’s hands is mapped to the robot’s dexterous hands and fingers. This allows a person to see, manipulate, and handle things without extensive training. The result may be slower and clumsier than they’d like, but it is still intuitive.
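To make the head-tracking idea concrete, here is a minimal sketch of how a headset's yaw and pitch might be turned into targets for a two-axis gimbal. This is not the project's code: the function name, the axis limits, and the smoothing factor are all hypothetical, chosen only for illustration. The sketch clamps each angle to an assumed mechanical range and optionally eases toward the new target so the camera doesn't jerk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisLimits:
    """Mechanical travel of one gimbal axis, in degrees."""
    min_deg: float
    max_deg: float

    def clamp(self, angle_deg):
        # Keep the commanded angle inside the axis travel.
        return max(self.min_deg, min(self.max_deg, angle_deg))


# Assumed limits for illustration only; a real gimbal will differ.
YAW_LIMITS = AxisLimits(-90.0, 90.0)
PITCH_LIMITS = AxisLimits(-45.0, 45.0)


def head_to_gimbal(yaw_deg, pitch_deg, prev=None, smoothing=0.5):
    """Map head yaw/pitch to (yaw, pitch) gimbal targets.

    prev      -- last commanded (yaw, pitch), or None on the first frame
    smoothing -- fraction of the remaining distance covered per update
                 (hypothetical tuning value; 1.0 disables smoothing)
    """
    target = (YAW_LIMITS.clamp(yaw_deg), PITCH_LIMITS.clamp(pitch_deg))
    if prev is None:
        return target
    # Exponential smoothing: step part of the way from prev toward target.
    return tuple(p + smoothing * (t - p) for p, t in zip(prev, target))


if __name__ == "__main__":
    command = None
    # A few made-up head poses, including one beyond the yaw limit.
    for yaw, pitch in [(10.0, 0.0), (40.0, -5.0), (120.0, -60.0)]:
        command = head_to_gimbal(yaw, pitch, prev=command)
        print(command)
```

In a real build, the returned angles would be sent to whatever servos drive the gimbal, and the head pose would come from the headset's tracking API rather than a hard-coded list.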
Want to take a closer look? The GitHub repository has the code you need. While most people probably won’t be mashing the “add to cart” button on something like the Unitree H1, the reference design for a stereo camera that streams to a VR headset and mirrors head tracking with two motorized gimbals looks like something that could add a thing or two to your telepresence project.