Adventures in VR/Spatial Teleoperation

Jimmy Zhang
3 min read · Feb 7, 2024
augering holes for trees at a construction site

For more than a year, I was the CTO of a startup using VR to teleoperate robots. There was a $15k rover we operated for food delivery, a $100k Bobcat skidsteer we operated for construction (pictured above), a $40k industrial robotic arm, and others. Here are a few things I learned along the way.

when your work is stinky, go remote (pushing trash into a pit at a waste management facility)

VR gives you an infinitely customizable interface, but don't go down the rabbit hole. When you hopped into a virtual robot control room, the entire interface changed with the robot, from the camera streams to the control menus. We had holograms (the fancy term is 'digital twins') synchronized with the robot's movements that shimmered like Princess Leia saying "help me Obi-Wan Kenobi, you're my only hope". And of course there were custom controls for things like hydraulics. You can sink endless time into the interface, so we kept it simple and functional. In fact, SRI has done heavy-machinery teleop before, and they told us they spent too much time making their interface fancy. We mainly used Quests (2/3/Pro) and tried the Magic Leap 2; the Apple Vision Pro should further expand the possibilities for customization.
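To make the digital-twin idea concrete, here's a minimal sketch of how a hologram might track the real robot's joint states. All the names and the transport are hypothetical (our actual stack isn't shown here); the one real trick it illustrates is low-pass filtering incoming poses so the hologram doesn't jitter with network noise.

```python
from dataclasses import dataclass

@dataclass
class JointState:
    """One joint reading streamed from the (hypothetical) robot."""
    name: str
    angle_rad: float

def smooth(current: float, target: float, alpha: float = 0.3) -> float:
    # Exponential low-pass filter: nudge the displayed angle toward the
    # reported one instead of snapping, hiding jittery network updates.
    return current + alpha * (target - current)

class HologramJoint:
    """A joint of the digital twin rendered in the VR control room."""
    def __init__(self, name: str):
        self.name = name
        self.angle_rad = 0.0

    def update(self, state: JointState) -> None:
        self.angle_rad = smooth(self.angle_rad, state.angle_rad)

# Each incoming message moves the twin partway toward the real pose.
boom = HologramJoint("boom")
for msg in [JointState("boom", 1.0)] * 10:
    boom.update(msg)
print(round(boom.angle_rad, 2))  # converges toward 1.0: prints 0.97
```

The smoothing factor is a trade-off: higher alpha makes the twin more responsive but shakier; lower alpha makes it calmer but laggier behind the real machine.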

using Starlink out in the field

Latency is good enough now in many scenarios. With HMDs, motion-to-photon latency is the delay between the moment you move your head or hand and the moment you see the display change. When teleoperating a robot in the real world, the concept is similar, but you also have to include the time it takes the physical robot to react to your input signals and the time it takes the video stream to make it back to you over the internet. Surprisingly, this is fast enough today to drive a delivery robot on sidewalks or operate heavy machinery on a construction site, even if you're thousands of miles away. The latency threshold for operating safely varies by scenario: if you limit a sidewalk robot to 3 mph, you can tolerate more than a second of total latency, but if you want to precisely control a robotic arm, you'll want less than 800 ms. Latency will surely improve in the coming years. 5G cellular is being rolled out, and Starlink is the first low-latency satellite internet network in the world, with new satellites launching every month. Heck, even WiFi has room to improve.
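The total-latency idea above is just a sum of stages in the round trip. Here's a small sketch of that budget check; the component breakdown and the example numbers are illustrative assumptions, not measurements, while the two thresholds come straight from the scenarios above.

```python
# Illustrative teleop latency budget. Thresholds from the text:
# a 3 mph sidewalk robot tolerates >1 s; a precise arm wants <800 ms.
SIDEWALK_LIMIT_MS = 1000
ARM_LIMIT_MS = 800

def total_latency_ms(uplink_ms: float, actuation_ms: float,
                     encode_ms: float, downlink_ms: float,
                     display_ms: float) -> float:
    """Glass-to-glass round trip: operator input travels to the robot,
    the robot physically reacts, and video of the result travels back."""
    return uplink_ms + actuation_ms + encode_ms + downlink_ms + display_ms

# Hypothetical numbers: ~50 ms each way on a satellite/cellular link,
# 100 ms for the machine to respond, 80 ms video encode, 20 ms display.
budget = total_latency_ms(50, 100, 80, 50, 20)
print(budget, budget <= ARM_LIMIT_MS)  # 300 True — inside the arm budget
```

Note that the network hops appear twice (uplink and downlink), which is why halving link latency helps teleop roughly twice as much as it helps one-way streaming.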

teleop with a Quest 2 over a hotspot from a car

VR (and Spatial Computing) expands the definition of 'remote work'. Until now, working remotely has meant video conferencing and knowledge work. With VR teleop, people will be able to do physical work from another state or even another country. As the technology improves, especially with the recent launch of the Apple Vision Pro (bleeding-edge displays and the best passthrough), this will have a profound effect on the geography of jobs. We've already tried remotely pushing trash in a waste management facility, doing landscaping jobs, delivering food on the sidewalks of Spain, and a few other experiments. There's a dentist whose receptionist works remotely with the aid of an iPad. Pretty sure that should be a (humanoid) robot one day.

Teleop isn't enough without VR/Spatial Computing. Without it, you don't feel the connection: the operator doesn't feel embodied in the robot, and on the other end there isn't a 3D representation of the operator with enough expressiveness. This is precisely why Apple is experimenting with EyeSight and FaceTime + Personas on the Vision Pro. They want a better "connection". There's so much to explore here, and I'm glad the industry is finally breaking out of the gaming niche.