Google just dropped 11 videos of Gemini Omni and Gemini 3.5 in action, and if you're not paying attention, you're already behind. These demos, fresh from Google I/O 2026, are more than polished marketing fluff. They're a clear signal of where AI is heading: truly multimodal, real-time, and eerily human-like.
What stands out? Gemini Omni blends vision, speech, and text into one seamless stream. In one video, it identifies a song playing in the background, then comments on the lighting in the room. In another, it helps debug code while describing the user's surroundings. It doesn't just answer questions; it's situated in your world. Gemini 3.5, meanwhile, shows near-instantaneous reasoning across massive context windows, crunching a 100-page PDF and generating a narrated slideshow in seconds.
Of course, there are caveats. These are demos: curated and likely cherry-picked, so real-world performance may vary. But the trajectory is undeniable. Google is betting big on unified models that erase the line between perception and reasoning. The bigger question is less whether this tech works than how fast it reaches your phone. Watch the videos, but more importantly, start thinking about how you'd use a model that doesn't just listen but actually sees.
Source: Google AI Blog