A fundamental limitation of modern artificial intelligence and neural networks is that they aren't good at spatial mapping or navigation without an existing map. However, TechXplore reports that a combination of a predictive coding algorithm and Minecraft gameplay successfully "taught" a neural network how to create spatial maps and subsequently use those spatial maps to predict the following frames of video, yielding a mean-squared error of 0.094% between the predicted image and the final image.
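To make the reported figure concrete, here is a minimal sketch of how a mean-squared error between a predicted frame and the actual frame can be computed. It is not the researchers' code: the function name `frame_mse`, the random stand-in frames, and the assumption that pixel values are scaled to the range 0 to 1 (so the error can be read as a percentage) are all illustrative choices, and the paper may normalize its error differently.

```python
import numpy as np


def frame_mse(predicted, actual):
    """Mean-squared error between two frames of the same shape.

    Hypothetical helper for illustration; assumes pixel values in [0, 1].
    """
    predicted = np.asarray(predicted, dtype=np.float64)
    actual = np.asarray(actual, dtype=np.float64)
    if predicted.shape != actual.shape:
        raise ValueError("frames must have the same shape")
    # Average of the squared per-pixel differences.
    return float(np.mean((predicted - actual) ** 2))


# Stand-in data: a random "actual" frame and a slightly noisy "prediction".
# A real experiment would compare a network's output to the next video frame.
rng = np.random.default_rng(0)
actual = rng.random((64, 64, 3))
noise = rng.normal(0.0, 0.03, size=actual.shape)
predicted = np.clip(actual + noise, 0.0, 1.0)

error = frame_mse(predicted, actual)
print(f"MSE: {error:.6f} ({100 * error:.4f}% when pixels are scaled to 0-1)")
```

A lower value means the predicted frame is closer, pixel by pixel, to the frame that actually followed.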
The project demonstrates genuine spatial awareness in an AI model, something still missing from video generators such as OpenAI's Sora, which produce impossible architecture and other strange glitches.
These findings come from a paper published in the journal Nature Machine Intelligence on Nature.com, "Automated construction of cognitive maps with visual predictive coding," by James Gornet and Matt Thomson of the California Institute of Technology (aka Caltech). The paper, released to the public just yesterday, explains in exhaustive detail how this was achieved and even shares the code on GitHub and Zenodo.
One of the two researchers who worked on the project, Matt Thomson, spoke to TechXplore and provided a few noteworthy quotes about the process and what led them to undertake it.
Per Matt Thomson, "There's this sense that even state-of-the-art AI models are still not truly intelligent. They don't problem-solve like we do; they can't prove unproven math results or generate new ideas. We think it's because they can't navigate in conceptual space; solving complex problems is like moving through a space of concepts, like navigating. AIs are doing more like rote memorization: you give it an input, and it gives you a response. But it's not able to synthesize disparate ideas."
James Gornet, the graduate student who led the project and encouraged the use of Minecraft, studied neuroscience, machine learning, math, statistics, and biology in the Department of Computational and Neural Systems (CNS) at Caltech. He did not provide a quote about the process, but Thomson says that CNS is uniquely suited to Gornet's work and that "we're hoping to learn about the brain in turn," not just advance AI.