Mentat · curated
Artificial Intelligence · high · corporate-pr

Gemini Omni

Google's new video model lets you direct a clip by conversation — change the camera, swap an object, light the scene — but it will not let you edit the speech, and it signs everything it makes.

Gemini Omni folds Google's text reasoning into video so editing becomes a conversation: ask for a different camera angle or a new background and each instruction builds on the last, the way Google's image tool already works for stills. The first release makes ten-second clips with sound and edits footage you already have, rather than only conjuring video from a sentence.

The line worth noticing is the one Google drew around the model's own abilities. The model is good enough to put new words in a real person's mouth, and that is exactly the capability Google held back. You can lend your own voice to a generated clip, but you cannot rewrite the speech in someone else's footage; Google says it is 'still working to test this' before releasing it 'responsibly.' Read plainly, in an election year: the deepfake button exists and was left off the panel.

So the story is restraint, not reach. The frontier of what these models can do now runs ahead of what their makers will ship: the gating happens at release, by choice, not at the limit of the technology. To make that choice legible, every clip Omni produces carries an invisible Google watermark and a tamper-evident origin record by default; both can be verified in Chrome and Search, and neither can be switched off. The capability is being rationed and labelled at the same time, which is roughly the only honest answer anyone has to synthetic video.

Why it's here

Watch where a maker draws its own line — what it can build but chooses not to release tells you more than the demo reel.

The lenses

Novelty 3
Impact · breadth 4
Impact · depth 3
Actionable 3
Substance 3
Hype 4

The facts

Where: In the Gemini app, Google Flow, and YouTube Shorts, with no setup
Makes: Clips up to ten seconds, with sound, edited by chat
Held back: Editing the speech in existing footage, withheld over deepfake risk
Provenance: Invisible watermark + origin record on every output, on by default, can't be turned off
Source: https://deepmind.google/models/gemini-omni/