
Why Do Humanoid Robots Still Struggle With the Small Stuff?

The last decade has seen vast improvements in humanoid robots, but graduating to widespread use might require going back to the fundamentals.

Companies are developing and now promoting a future full of humanoid robots.

Henry Flores for Quanta Magazine

Introduction

Qualia: Essays that go where curiosity leads

The last time I covered the science of humanoid robots, the state of the art looked downright Orwellian — by which I mean, “four legs good, two legs bad.” It was 2015. Boston Dynamics’ first “Spot” quadruped had taken YouTube by storm, confidently trotting up stairs and recovering from vicious kicks. Also popular at the time: humanoids falling down. Constantly. I felt sorrier for those tottering metal lobsters than I ever did for Spot. Bipedal locomotion is hard.

Cut to now. Humanoids have apparently become so advanced that Tesla is mothballing some electric car models to make way for its Optimus humanoid robot, and start-ups are preselling android butlers with a straight face. Hype aside, I was genuinely curious: Did a paradigm shift happen in the field when I wasn’t looking? Sure, “AI” happened (that is, in the post-ChatGPT sense). I certainly hadn’t overlooked that. But I had no idea what it could possibly have to do with robots not falling down anymore.

For a reality check, I called Scott Kuindersma, who recently left Boston Dynamics after many years there, and Jonathan Hurst of Agility Robotics. Both scientists had been present and involved during the robot-faceplant days. Surely today’s robotic bipedal marvels can ascend a few stairs and open a door without breaking a nonexistent sweat, something they famously struggled with a decade ago. I asked each researcher: Can your flagship robot — Boston Dynamics’ Atlas or Agility’s Digit, two of the most credible and pedigreed humanoids on Earth — handle any set of stairs or doorway?

“Not reliably,” Hurst said.

“I don’t think it’s totally solved,” Kuindersma said.

Don’t get me wrong: I don’t believe that some sock-faced robot zombie is close to taking over my household chores. But stairs and doors? It’s 2026. Why are humanoids still this … hard?

Fast, Cheap, and Mostly Under Control

To be fair, a paradigm shift did happen. Three, actually.

First, deep learning — neural networks running on fast GPU chips — turbocharged computer vision and reinforcement learning, which radically improved the speed and sophistication with which robots could perceive and interact with their environments. Then in 2016, a revolution in actuation (roboticist-speak for “making parts move”) began: Heavy hydraulic mechanisms were replaced by smaller, “proprioceptive” electric motors that gave legged robots animal-like nimbleness. Most recently came the large language models. Adapting chatbot technology for robots, it turns out, lets them autonomously plan and perform multistep tasks, such as coring an apple or emptying a dishwasher (in demos, at least).

These advances created the night-and-day difference between “Running Man,” the hulking, halting version of Atlas that won second place in 2015’s DARPA Robotics Challenge, and the svelte, smooth Atlas recently shown breakdancing and autonomously moving irregular items from one bin to another (while dealing with interference from a hockey stick–wielding human).

The Atlas robot from Boston Dynamics shows off in a video from early 2026.

Boston Dynamics/Anadolu Agency via Getty Images

That fluid gait, for example, comes from deep reinforcement learning. Roboticists once coordinated each movement with various hand-engineered algorithms, using equations to model the (simplified) physics of the robot. Now they train neural networks to act as “whole-body controllers” by running countless digital simulations of the humanoid. This process teaches the network a “policy” for how to translate feedback from its environment into actions.

“We use reinforcement learning to build a policy that’s handling the body coordination, collision avoidance, balance, all that stuff,” Kuindersma said. There’s no longer any need to model a robot’s leg as a linear inverted pendulum, for example. “That’s just gone by the wayside,” he said.
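The loop Kuindersma describes can be caricatured in a few lines. The sketch below is a toy, not Boston Dynamics’ pipeline: a one-number “policy” that steadies a crude simulated lean angle, improved by simple random search, a basic trial-and-error relative of reinforcement learning. Every constant in it is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(w, steps=50):
    """Simulate a crude balance task: the state is a lean angle, the
    action is a corrective push, and reward penalizes leaning."""
    s, total = 0.5, 0.0
    for _ in range(steps):
        a = w * s                                   # linear "policy"
        s = 0.9 * s + 0.1 * a + 0.01 * rng.standard_normal()
        total -= s * s                              # reward: stay near upright
    return total

# Trial-and-error training: nudge the policy toward whichever small
# perturbation earned more reward in simulation, many times over.
w, delta = 0.0, 0.1
for _ in range(200):
    up, down = rollout(w + delta), rollout(w - delta)
    w += 0.05 * np.sign(up - down) * delta
```

After training, `w` has drifted negative, meaning the policy has learned to push against the lean. Real whole-body controllers do the same thing with millions of parameters and far richer physics, but the shape of the loop is similar.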

This strategy was aided by the proprioceptive actuators pioneered by Sangbae Kim of the Massachusetts Institute of Technology in his Cheetah series of robots. “Reinforcement learning has existed for a long time, you know. People tried it before,” Kim said. “But if you use conventional [motors], the robot just breaks” every time it fails to perfectly execute a policy in the real world — or encounters an obstacle or disturbance.

Kim’s actuators got around the problem with controllable “compliance,” or flexible springiness. Over the past decade, they’ve gotten cheaper and more widely accessible. “Reinforcement learning solved a lot of the [bipedal] locomotion problem, but the hardware was the enabler,” Kim said.

If reinforcement learning and compliant actuation were gifts to humanoid robotics, multimodal AI put a bow on it. In 2023, Google DeepMind introduced “vision-language-action” (VLA) models, which can take in video and natural language and produce movement commands as outputs.

“If you say ‘I’m thirsty,’ it knows you probably want to drink, and it can [generate] the steps that [the robot] needs to take: Go find a thing, and then pick it up in this way,” said Carolina Parada, head of robotics at Google DeepMind. “This is something that, before three years ago, you would have to go hard-code.” In a stroke, VLAs united previously disparate approaches to robotic perception, planning, and control into one general-purpose pipeline.

Robust embodiment, check. Generalizable intelligence, check. (A start, anyway.) So why don’t they add up to humanoids being scientifically “solved” — at least in principle?

May the Force Be With You

Pulkit Agrawal, who studies robot learning at the appropriately named Improbable AI Lab at MIT, had an answer when I reached him there last month. “To have robots which work like humans,” he said, “I think we have to master physics.”

He wasn’t referring to cosmic matters like general relativity or quantum gravity, nor to the virtual “world models” that currently excite leading AI researchers such as Yann LeCun. Instead, Agrawal is talking about mastering something a high school science student ought to be familiar with: force and inertia.

Press images of the Neo from 1X (left) and Tesla’s Optimus (right) imagine a future of humanoid helpers.

Courtesy of 1X; Tesla

The whole point of the humanoid form factor, after all, is to deliver what Kim calls “multipurpose mobile manipulation,” or the ability to move almost anywhere (including on stairs and through doors) and handle almost anything (from unloading pallets to screwing in light bulbs), without hurting anyone in the process. In short, what we do every day. “These things are about [controlling] forces, if you want to do them at speeds of a human,” Agrawal said. “Force control has been a thing in classical [robotics]. But in modern machine learning land, it’s not been that widespread.”

Force control is simple in principle. Picture a robot arm drawing on a whiteboard — without smashing the tip of the marker. Roboticists have known how to make this happen for more than 40 years: They program the arm to behave as if it has an imaginary spring and shock absorber attached to it. “One can make the spring really soft in the direction pointing into the whiteboard, and stiffer along the surface of the whiteboard,” Kuindersma said. “That way the robot maintains the right pressure with the marker while precisely writing the lines and curves of the letters.” This feedback can be driven by force sensors built into the robot’s joints, but the catch, he explained, is that the classical approaches require a lot of knowledge about the robot, the environment, and the task in order to work.
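A minimal sketch of that virtual spring-and-damper idea (often called impedance control): the stiffness and damping numbers below are made up, with the spring soft along the axis pointing into the whiteboard and stiff along its surface.

```python
import numpy as np

# Virtual spring-damper law: the arm is commanded to act as if a spring
# and a damper connect its tip to the target pose. Axes 0 and 1 lie
# along the whiteboard surface; axis 2 points into the board.
K = np.diag([800.0, 800.0, 50.0])   # stiff along the surface, soft into it
D = np.diag([60.0, 60.0, 10.0])     # damping to keep the tip from oscillating

def tip_force(x, x_des, v):
    """Force (newtons) the controller commands at the marker tip."""
    return K @ (x_des - x) - D @ v

# The same 5 mm position error produces a firm correction along the
# surface but only a gentle push into the board:
along = tip_force(np.array([0.005, 0.0, 0.0]), np.zeros(3), np.zeros(3))
into  = tip_force(np.array([0.0, 0.0, 0.005]), np.zeros(3), np.zeros(3))
```

Here `along[0]` comes out at −4.0 newtons while `into[2]` is only −0.25 newtons: precise lettering along the surface, light pressure on the marker tip.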

That approach to controlling force works great for industrial robots with specific tasks to perform, and it even helped with humanoid locomotion. But it was impossible to generalize. Kim’s proprioceptive electric actuators, also called quasi-direct drive actuators, simplified things. Not only were they designed to absorb unexpected impacts without damage, they were also very “transparent,” which meant that the motor converted electrical current into a proportional amount of force (and vice versa) with relatively little error. In essence, the motor itself became a force sensor, which meant “you can remove cost and complexity from your robot by eliminating dedicated force sensors,” Kuindersma said.
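That “transparency” reduces to a one-line model: output torque is roughly the motor’s torque constant times its current times the gear ratio, so reading current amounts to reading force. The constants below are illustrative, and the model deliberately ignores the gearbox friction that makes highly geared conventional motors opaque.

```python
# Illustrative quasi-direct-drive parameters (invented for this sketch).
K_T = 0.1    # motor torque constant, N·m per amp
GEAR = 6     # low gear ratio keeps friction and reflected inertia small

def torque_from_current(amps):
    """Estimate output torque from measured motor current.
    With a transparent actuator this doubles as a force sensor."""
    return K_T * GEAR * amps

def current_for_torque(torque):
    """Current to command for a desired output torque."""
    return torque / (K_T * GEAR)
```

In a highly geared conventional motor, unmodeled gearbox friction swamps this relationship, which is why dedicated force sensors were needed before actuators like Kim’s.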

As reinforcement learning eclipsed manual programming as a way of controlling humanoid movement, “classic” force control was not forgotten. It just got abstracted and delegated, in a way, to both hardware and AI.

“From an AI point of view, it’s not like you have to be thinking about force control,” Hurst said. “It’s more like you kind of know that you need a quasi-direct drive motor to get close [to the force regulation necessary], then put [the neural network] in simulation and iterate a million times — and then you can put it on the robot and get cool behaviors.”

Those neural networks are learning generalized policies that control the positions of a robot’s body parts. Force regulation often happens only indirectly in simulation training, or sometimes as a side effect when learned from video or human input.

But those methods don’t explicitly teach the physics of force — at least, not yet. “A lot of the signals that are required for doing intelligent force control are not present in [video and human demonstration] data,” Kuindersma said. DeepMind’s Parada acknowledged that the VLA models basically just learn to move between specifically defined poses — and this approach goes a long way. “We’ve been surprised ourselves at how far you can push it, without any other sensing,” she said.

In 2015, the most advanced humanoid robots in the world competed at the DARPA Robotics Challenge Finals. The tech has since improved.

DARPA

But only so far. As long as robot bodies remain relatively stiff and heavy compared to ours, “they have high inertia, and they’re not [as] compliant,” Agrawal said, which means that without force control, they will struggle with precision tasks in complicated environments. “If you’re going to touch delicate objects and you have small errors, bad things are going to happen.” Picture a regular egg and another made of solid steel: One of them needs to be picked up much more carefully.

One way to get around this problem, used by many impressive position-controlled systems, is just to go slow. Imagine trying to move a chair with your car, Agrawal said: “If I go slowly, I can be precise on how I move [my position], and then I can control where the chair goes, so the [force] problem goes away.” That’s part of why Atlas moves like molasses while grasping auto parts but glides like a gymnast when it’s not touching anything except the floor.

“It would be an overstatement to say that force control is absolutely required in every useful manipulation task — that’s just not true,” Kuindersma said. But he, Hurst, and Parada all readily grant that clever force workarounds won’t deliver the all-purpose mobile dexterity our robot butlers need. Even if today’s VLA-brained bots, refined by reinforcement learning, had “an internet-sized” amount of positional data to train on, “it’s very likely you [would] have to do some additional work,” Parada said. “Humans feel the forces that are working against you when you’re trying to open a bottle.” Humanoids, for the most part, still don’t, which means they have not mastered physics — at least not in the way we have, from a lifetime of interacting with our environments through the extraordinarily complex musculoskeletal and nervous systems gifted to us by evolution.

That’s a big reason why even doors and stairs aren’t fully “solved” for present-day humanoids. These stairs, that door? Probably. But all stairs and doors, plus everything else? “There’s no world in which there are actually useful, autonomous [humanoid] robots that are only doing position-based control,” Kuindersma said. “Force as a first-class citizen is absolutely required.”

Get Smart (or Start Over)?

So how do we get over the wall, scientifically speaking? Most of the experts I asked suspect that it will take a new blend of hardware and software advances. Tactile sensors for better data collection and robot hands that combine high power, compliance, and transparency with low inertia would accomplish a lot, and nobody believes that true material breakthroughs (like replacing motors with artificial muscles) will be necessary.

“The hardware is exceptional, and if you’re blaming [it], you’re making excuses,” said Russ Tedrake, another longtime MIT roboticist I spoke to. “If you put a human brain through the hardware we have today — by teleoperating it, for instance — it’s incredibly capable.” Finding more intelligent ways to control it is key.

The Digit robot from Agility Robotics demonstrates fine motor control in an unstructured environment.

Agility Robotics

When asked how to achieve that, everyone had a different answer. Agrawal is studying how to combine force control with reinforcement learning by having humanoids learn compliant behaviors in simulation, instead of moving between rigidly defined positions. Tedrake, whose work on “large behavior models” (a cousin of VLAs) produced the apple-coring robot demo, recently argued in Science Robotics for a ChatGPT-style regime of “large-scale data collection and large pretrained models.” Frank Park, who wrote the book on modern robotics — literally, the textbook titled Modern Robotics — believes that current AI approaches should be torn down to the studs and replaced with ones that make physics fundamentals (such as force and acceleration) learnable at a foundational level. “The VLA architecture is just all wrong,” he told me. “I believe that approach is doomed to fail.”

In all these conversations, what struck me most wasn’t the debates about which kinds of sensors, data, or AI architecture could “solve” humanoid robotics. Rather, it was the sense that the scientific ethos of the field had changed. Hurst, who had just spun Agility Robotics out of his Oregon State University lab when we first spoke, put a fine point on it.

“I remember Gill Pratt, who was the director of the MIT Leg Lab and then the program manager for the DARPA Robotics Challenge, saying that his big worry was that we’d end up using reinforcement learning and AI to make robots walk and run before we ever actually understood how it works,” he said. “And in a lot of ways, we’re kind of doing that.”

Tedrake agreed but said that it’s hardly the first time we’ve taken scientific and engineering leaps without a firm grip on the fundamentals. “If you look at electricity and magnetism, there was the Volta stage where you’re sticking electrodes in frogs,” he said. “And then we had Faraday, who did exactly the right experiments, and then eventually we had Maxwell tell us the governing equations. I think we’re in the Volta stage.”

So when will humanoids be solved?

“Robots are still bad, and it will take time. But the bones are good. Both are true,” Tedrake said. “And it’s still hard.”
