The walking, talking, dancing Optimus robots at the recent Tesla demonstration generated huge excitement. But this turned to disappointment as it became apparent that much of what was happening was actually being controlled remotely by humans.
As much as this might still be a fascinating glimpse of the future, it’s not the first time that robots have turned out to be a little too good to be true.
Take Sophia, for instance, the robot created by Hong Kong-based Hanson Robotics back in 2016. She was presented by the company as essentially an intelligent being, prompting numerous tech specialists to call this out as well beyond our capabilities at the time.
Similarly, we’ve seen carefully choreographed videos of pre-scripted action sequences like Boston Dynamics’ Atlas gymnastics, the English-made Ameca robot “waking up”, and most recently Tesla’s Optimus in the factory. Obviously these are still impressive in different ways, but they’re nowhere near the complete sentient package. Let Optimus or Atlas loose in a random home and you’d see something very different.
A humanoid robot capable of working in our homes needs to be capable of doing many different tasks, using our tools, navigating our environments and communicating with us like a human. If you thought this was just a year or two away, you’re going to be disappointed.
Building robots able to interact and carry out complex tasks in our homes and streets is still a huge challenge. Designing them even to do one specific task well, such as opening a door, is phenomenally difficult.
There are so many door handles with different shapes, weights and materials, not to mention the complexity of dealing with unforeseen circumstances such as a locked door or objects blocking the way. Developers have actually now created a door-opening robot, but robots that can deal with hundreds of everyday tasks are still some way off.
Behind the curtain
The Tesla demonstration’s “Wizard of Oz” remote operation technique is a commonly used control method in this field, giving researchers a benchmark against which to test their real advances. Known as telemetric control, this has been around for some time, and is becoming more advanced.
One of the authors of this article, Carl Strathearn, was at a conference in Japan earlier this year, where a keynote speaker from one of the top robotics labs demonstrated an advanced telemetrics system. It allowed a single human to simultaneously operate many humanoid robots semi-autonomously, using pre-scripted movements, conversation prompts and computerised speech.
Clearly, this is very useful technology. Telemetric systems are used to control robots working in dangerous environments, disability healthcare and even in outer space. But the reason a human is still at the helm is that even the most advanced humanoid robots, such as Atlas, are not yet reliable enough to operate completely independently in the real world.
Another major problem is what we can call social AI. Leading generative AI programs such as DeepMind’s Gemini and OpenAI’s GPT-4 Vision may be a foundation for creative autonomous AI systems for humanoid robots in the future. But we should not be misled into believing that such models mean that a robot is now capable of functioning well in the real world.
Interpreting information and problem solving like a human requires much more than just recognising words, classifying objects and generating speech. It requires a deeper contextual understanding of people, objects and environments – in other words, common sense.
To explore what is currently possible, we recently completed a research project called Common Sense Enhanced Language and Vision (CiViL). We equipped a robot called Euclid with commonsense knowledge as part of a generative AI vision and language system to assist people in preparing recipes. To do this, we had to create commonsense knowledge databases using real-world problem-solving examples enacted by students.
Euclid could explain complicated steps in recipes, give suggestions when things went wrong, and even point people to locations in the kitchen where utensils and tools might typically be found. Yet there were still issues, such as what to do if someone has a bad allergic reaction while cooking. The problem is that it’s almost impossible to handle every possible scenario, yet that’s what true common sense entails.
This fundamental aspect of AI has got somewhat lost in humanoid robots over the years. Generated speech, realistic facial expressions, telemetric controls, even the ability to play games such as “rock paper scissors” are all impressive. But the novelty soon wears off if the robots are not actually capable of doing anything useful on their own.
This isn’t to say that significant progress isn’t being made toward autonomous humanoid robots. There’s impressive work going on into robotic nervous systems to give robots more senses for learning, for instance. It’s just not usually given the same amount of press attention as the big unveilings.
The data deficit
Another key challenge is the lack of real-world data to train AI systems, since online data doesn’t always accurately represent the conditions robots will actually encounter. We have yet to find an effective way of collecting this real-world data in large enough quantities to get good results. However, this may change soon if we can access it from technologies such as Alexa and Meta Ray-Bans.
Nonetheless, the reality is that we’re still perhaps decades away from developing multimodal humanoid robots with advanced social AI that are capable of helping around the house. Maybe in the meantime we’ll be offered robots controlled remotely from a command centre. Will we want them, though?
In the meantime, it’s also important that we focus our efforts on creating robots for roles that can support people who urgently need help now. Examples would include healthcare, where there are long waiting lists and understaffed hospitals; and education, to offer a way for overanxious or severely ill children to participate in classrooms remotely. We also need better transparency, legislation and publicly available testing, so that everyone can tell fact from fiction and help build public trust for when the robots eventually do arrive.
Dimitra Gkatzia receives funding from EPSRC.
Carl Strathearn does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.
This article was originally published on The Conversation.