Fei-Fei Li Says AI’s Next Frontier Is Spatial Intelligence
Mini summary: Fei-Fei Li says spatial intelligence AI could shape the next phase of computing. Speaking at HUMANX in San Francisco, she argued that language alone is not enough. She pointed to 3D world models, World Labs and its Marvel system as key building blocks for robotics, gaming, healthcare and autonomous mobility.
At HUMANX in San Francisco, Fei-Fei Li argued that the next major step in artificial intelligence will not come from language alone. Instead, she focused on spatial intelligence AI: the ability of machines to understand, reason about and generate the 3D and 4D world of geometry, movement, interaction, physics and change over time.
That argument sits at the center of World Labs, the company Li founded to build AI systems that go beyond text and images. In her view, language models such as ChatGPT are a major advance, but they capture only part of human intelligence. Everyday life, work and decision-making also depend on operating in physical space.
“Human intelligence is not just linguistic,” Li said. She described spatial understanding as essential for perception, reasoning and action, especially in fields where machines must navigate environments, predict outcomes and interact with the real world.
What spatial intelligence AI means in practice
Li defines spatial intelligence as the capacity to perceive, understand and generate 3D or 4D space. This includes shape, geometry, interactions, physical constraints and dynamics over time.
In practical terms, this is the difference between an AI system that can describe a room and one that can understand how objects relate to one another inside it, how movement changes the scene and what is likely to happen next. In this framework, a world model gives machines a representation of space that can support planning and action.
Li linked this idea to a broader view of intelligence shaped by perception and embodiment. She referenced the long arc of biological evolution and cited the development of sensory systems “half a billion years ago” as an illustration of why intelligence cannot be reduced to language processing alone.
Why World Labs was founded outside academia
Li said the opening for this work emerged from a convergence in 2022–2023. On one side were advances in generative AI driven by Transformer models. On the other were improvements in computer vision and 3D representation. Together, she said, these advances created the technical conditions for a new class of models centered on spatial understanding.
However, that opportunity also came with industrial-scale requirements. Li said the decision to launch World Labs reflected the need for compute, data and talent at a scale that is difficult to assemble in a purely academic setting.
“This requires enormous resources—compute, data, and talent,” she said. While emphasizing the importance of academia, including institutions such as the Stanford Human-Centered AI Institute, she drew a clear distinction between curiosity-driven research and company-building aimed at real-world deployment.
Her formulation was direct: as a researcher, she is driven by curiosity; as a CEO, she is a builder focused on impact.
How spatial intelligence AI connects to Marvel and 3D worlds
The most concrete example from World Labs is Marvel, a generative model that Li described as capable of creating true 3D worlds. She stressed that Marvel does not simply generate video. Instead, it produces persistent, navigable environments that users or machines can move through.
According to Li, these worlds begin as relatively small environments that can then be expanded into larger spaces and combined into more complex scenes. That distinction matters because a navigable world model has a different technical and commercial value from a passive visual output.
“Marvel is a generative model that creates true 3D worlds—not videos, but persistent, navigable environments,” she said.
The implication is broad. A model that generates spatially coherent worlds could become a foundation layer for interactive applications, from game development and digital design to simulation-heavy sectors such as robotics and autonomous systems.
Why data is the biggest bottleneck
Li organized the technical challenge around three pillars: models, compute and data. Of the three, she identified data as the hardest problem.
“The hardest part is data,” she said.
The issue is not simply volume. Large public datasets for language are much easier to assemble than large public datasets that accurately capture spatial structure, movement, physics and real-world interactions. Building 3D world models requires data that is harder to collect, harder to label and harder to standardize.
This challenge is especially acute in robotics, where the supply of useful training data is even more limited. For systems that must anticipate what happens next in the physical world, prediction quality depends heavily on the richness and realism of spatial data.
Li summarized the core value of world models in operational terms: predicting the next state supports planning and action.
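Li did not describe any implementation, but the role of next-state prediction in planning can be sketched in a few lines. The toy code below is purely illustrative (the names, the one-dimensional state and the dynamics are hypothetical, not World Labs code): a "world model" predicts the next state for each candidate action, and a planner picks the action whose predicted state lands closest to a goal.

```python
# Toy sketch of planning with a world model (hypothetical, not World Labs code).
# The "world model" here is a trivial 1D dynamics function; a real spatial
# world model would predict rich 3D scene states instead.

def world_model(state: float, action: float) -> float:
    """Predict the next state given the current state and an action."""
    return state + action  # stand-in for a learned predictor

def plan(state: float, actions: list[float], goal: float) -> float:
    """Choose the action whose predicted next state is closest to the goal."""
    return min(actions, key=lambda a: abs(world_model(state, a) - goal))

best = plan(state=0.0, actions=[-1.0, 1.0, 2.0], goal=3.0)
print(best)  # the action predicted to move the state closest to the goal
```

However simplified, this is the loop Li gestured at: prediction quality determines planning quality, which is why she ties the value of world models so tightly to the data they are trained on.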
Why synthetic data matters for spatial intelligence AI
To address the data shortage, World Labs uses a mix of real and synthetic data. Li said the way those sources are combined is a central part of the company’s technology.
“We train on a mixture of real and synthetic data, and how we combine them is a key part of our technology,” she said.
This point matters beyond World Labs. In sectors where real-world data is scarce, expensive or difficult to capture at scale, synthetic data can help fill gaps, diversify edge cases and accelerate experimentation. Li also noted that models capable of generating spatially structured environments could themselves become tools for other labs, especially in robotics.
As a result, a potentially important feedback loop emerges. World models trained on mixed data could then generate additional synthetic environments for training, testing and simulation in adjacent systems.
Where early applications may emerge
Li listed a wide range of possible applications for spatial intelligence, including gaming, art, design, robotics, education, healthcare, manufacturing and autonomous driving.
Some of the earliest practical impact may come in industries that already depend on simulation and physical-world prediction. In autonomous mobility, companies such as Tesla and Waymo operate in settings where understanding geometry, motion and interaction is fundamental. In robotics, world models can improve simulation quality, state prediction and action planning.
Healthcare is another notable area. Li pointed to the spatial interpretation of radiological data as one example of how 3D-aware AI could support clinical workflows. Gaming and immersive media may also move quickly, given the immediate value of persistent, navigable environments for content creation and interactive experiences.
Still, the discussion remained directional rather than commercial. Li did not provide a deployment timeline for Marvel, public availability details or quantified performance benchmarks.
How industry and academia support spatial intelligence AI
A recurring theme in Li’s remarks was that the future of AI will require both academic and industrial contributions. Academia remains essential for foundational thinking, long-horizon inquiry and scientific exploration. Industry, by contrast, can gather the compute, engineering capacity and operational focus needed to turn emerging concepts into usable systems.
This division of labor is especially visible in a field such as spatial intelligence, where frontier research and large-scale infrastructure have to advance together. Li’s own position reflects that dual role: she remains closely associated with the Stanford Human-Centered AI Institute while building World Labs around a commercial and technical mission.
The broader AI ecosystem reinforces the point. Transformer models enabled the language revolution behind systems like ChatGPT. Companies such as Anthropic have helped push frontier model development. Li’s argument is that the next stage will require a similar step-change for machines that understand the physical world.
What remains unclear
For all the strategic clarity of Li’s thesis, several important details remain undisclosed. There were no financial figures on resources raised, no specific numbers on compute scale and only limited technical explanation of Marvel’s internal architecture.
There was also little discussion of timelines for commercial rollout or public access. In addition, while the industrial promise was clear, the conversation gave less attention to safety, governance and ethical questions that may arise when AI systems generate navigable synthetic worlds or support high-stakes physical applications.
Even so, Li’s message was unmistakable. If language intelligence defined the last phase of AI, spatial intelligence may define the next one. For developers, investors, researchers and product teams, that means the competitive frontier may increasingly shift toward systems that can model the world, not just describe it.
In summary
Fei-Fei Li argues that AI’s next frontier is spatial intelligence, not language alone. Her thesis is that machines need to understand 3D space, motion, physics and change over time to support real-world planning and action.
World Labs is building toward that goal with world models and the Marvel system. The biggest challenge, according to Li, is data. Early opportunities may emerge in robotics, gaming, healthcare, manufacturing and autonomous mobility.

