Beyond Language Models: How Google DeepMind is Building Intelligence That Sees 👓 the 🗺️ World

When Demis Hassabis, CEO of Google DeepMind, states that “We are trying to build AGI… it has to understand the physical environment around you,” he articulates a vision that challenges the industry’s current trajectory. At Google I/O 2025, Hassabis and Google co-founder Sergey Brin laid out their view of artificial intelligence: not text-based systems confined to screens, but embodied intelligence that perceives and operates in physical reality.

The Thinking Paradigm: A Fundamental Shift

Hassabis describes his reasoning-based approach to AI as transformative: “We’ve always been big believers in what we’re now calling this thinking paradigm. If you go back to our very early work on things like AlphaGo and AlphaZero… they all had this type of attribute of a thinking system on top of a model.”

The performance gain from this approach is substantial, and Hassabis quantifies it: “You can see in games like chess or Go… we had versions of AlphaGo and AlphaZero with the thinking turned off… it’s not bad, it’s maybe like master level, but then if you turn the thinking on, it’s way beyond world champion level. It’s like a 600 Elo plus difference between the two versions.”
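
To put the quoted 600-point gap in perspective, here is a minimal sketch using the standard Elo expected-score formula. The two ratings are hypothetical placeholders chosen only so that their difference is 600; they are not figures from the talk, and the function name is made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B (win = 1, draw = 0.5, loss = 0)
    under the standard Elo logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings: only the 600-point difference matters here.
thinking_off = 2500.0
thinking_on = thinking_off + 600.0

print(f"{expected_score(thinking_on, thinking_off):.3f}")  # about 0.969
```

Under this model, a 600-point gap means the stronger version would be expected to take roughly 97 percent of the available points against the weaker one.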

Deep Think, demonstrated at I/O, applies this philosophy at scale: parallel reasoning processes that check and validate each other, creating what Hassabis calls “reasoning on steroids.”

Multimodal Intelligence: Building from First Principles

Gemini’s architectural foundation was “conceived from inception as a multimodal system rather than a language model with bolted-on capabilities.” This decision to build for multimodal comprehension from the beginning, despite significant additional complexity, reveals strategic foresight that extends beyond quarterly innovation cycles.

As Hassabis reflects on these architectural decisions: “Those were the hardest decisions, but we made them… now you can see the fruits of that with what you’ve seen today.”

In today’s discussion, Hassabis reinforced this perspective: “That’s why Gemini was built from the beginning, even the earliest versions, to be multimodal. And that made it harder at the start because it’s harder to make things multimodal than just text-only. But in the end, I think we’re reaping the benefits of those decisions now.”

The Spatial Revolution: From Screens to Spaces

In discussing Google’s revitalized approach to smart glasses, Hassabis observes: “I’m still a big believer in the form factor… I think the universal assistant is the killer app for smart glasses.”

What’s noteworthy is the inversion of priority: the glasses aren’t the innovation—they’re merely the conduit for intelligence that understands physical context. This represents a fundamental shift in computing paradigms, where the device recedes and the intelligence becomes central.

Sergey Brin, reflecting on lessons from the original Google Glass, acknowledges: “I definitely feel like I made a lot of mistakes with Google Glass. I’ll be honest… I think there was a technology gap. Now in the AI world, the things that these glasses can do to help you out without constantly distracting you, that capability is much higher.”

Hassabis believes the assistant that sees what you see represents the natural application for smart glasses—a perspective that aligns with Google DeepMind’s broader vision of intelligence that understands physical context.

The Race to AGI: Defining the Goal

The conversation inevitably turned to artificial general intelligence (AGI). While many industry figures have distanced themselves from the term, Hassabis embraces it while offering important nuance.

Addressing the question of what AGI truly means, Hassabis provides an insightful distinction: “I think there’s sort of two things that are getting a little bit conflated. One is like what can a typical person do, an individual do… But what I’m interested in, and what I would call AGI, is really a more theoretical construct, which is: what is the human brain as an architecture able to do? The human brain is an important reference point because it’s the only evidence we have, maybe in the universe, that general intelligence is possible.”

When asked whether one company would “win” the race to AGI, Brin acknowledged the competitive landscape while recognizing the collaborative nature of progress. Hassabis emphasized that agreement on AGI’s definition is crucial, as is ensuring these systems are built safely and reliably.

When pressed on timelines, however, the two leaders revealed a subtle difference: Brin predicted AGI would arrive “before 2030,” while Hassabis placed his expectation “just after” that date.

The Future Computing Landscape

Looking ahead to how these technologies will reshape our digital environments, Brin offered a humbling perspective on the pace of change: “I think 10 years because of the rate of progress in AI is so far beyond anything we can see. Not just the web—I mean, I don’t know, I don’t think we really know what the world looks like in 10 years.”

Hassabis added: “I think the return of the web is going to change quite a lot… do agents really need to see vendors the way humans do?” This observation points to a potential future where the web transforms from a visual medium designed for humans to an “agent substrate” optimized for AI systems operating on our behalf.

Philosophical Frontiers

The conversation concluded with a fascinating philosophical exchange about reality itself, spurred by Hassabis’s recent social media post suggesting parallels between AI-generated imagery and the possibility we live in a simulation.

Hassabis clarified: “I do think that ultimately underlying physics is information theory. So I do think we’re in a computational universe but it’s not just a straightforward simulation… the fact that these systems are able to model real structures in nature is quite interesting and telling.”

Brin responded with characteristic insight: “I think that argument applies recursively right? If we’re in a simulation then by the same argument whatever beings are making the simulation are themselves in a simulation for roughly the same reasons and so on and so forth. So I think you’re going to have to either accept that we’re in an infinite stack of simulations, or that there’s got to be some stopping criteria.”

As we consider this technological moment—where intelligence begins to understand physical reality in ways previously reserved for humans—several questions emerge:

How will societies balance the benefits of ambient intelligence against privacy concerns when AI systems can perceive the world through our own eyes?

What cognitive and perceptual capabilities remain uniquely human as these systems evolve from text-based to fully embodied intelligence?

What new forms of governance will emerge as agent systems increasingly mediate our interaction with both digital and physical reality?

As Brin expressed during the discussion: “As a computer scientist, it’s a very unique time in history… there’s just never been a greater sort of problem and opportunity, a greater cusp of technology.”

Brin closed with a call to action, urging technologists not to sit on the sidelines: “I mean I think as a computer scientist, it’s a very unique time in history like, honestly, anybody who’s a computer scientist should not be retired right now, should be working on AI. That’s what I would just say.”

bsky.app/profile/schwentker.bsky.social/post/3lppqo7vrck2i