
Interaction models

Mira Murati's lab put out a model that doesn't wait its turn — it watches you on video, listens, and talks back at the same time, all from one model instead of a stitched-together voice pipeline.

demo Thinking Machines Lab · 1 min read

Today's voice assistants are assembled: one component decides you've stopped talking, another transcribes, the language model answers, a fourth speaks. Thinking Machines collapsed that assembly into a single model that reads the room in 200-millisecond slices — perceiving and responding in the same breath, so it can interject, finish your sentence, or comment on what it just saw you do, without waiting for a turn to end.
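To make the timing difference concrete, here is a toy sketch in Python. It is not the lab's implementation: the function names and the 700 ms silence timeout are hypothetical, processing time is ignored, and only the 200 ms slice length comes from the description above. It compares when a turn-based pipeline could first reply with how many chances a slice-by-slice model would get to speak during the same utterance.

```python
SLICE_MS = 200  # slice length quoted in the article


def cascaded_first_response_ms(speech_ms: int, silence_timeout_ms: int) -> int:
    """Earliest moment a turn-based pipeline can start replying.

    Assumption: it hears the whole utterance, then waits out a silence
    timeout before deciding the turn has ended. Compute time is ignored.
    """
    return speech_ms + silence_timeout_ms


def sliced_response_points_ms(speech_ms: int, slice_ms: int = SLICE_MS) -> list[int]:
    """Times at which a slice-by-slice model gets a chance to speak.

    Assumption: one opportunity at the end of every slice that begins
    before the speaker stops.
    """
    n_slices = -(-speech_ms // slice_ms)  # ceiling division
    return [slice_ms * (i + 1) for i in range(n_slices)]


if __name__ == "__main__":
    speech_ms = 3000  # a three-second utterance
    # Hypothetical 700 ms end-of-turn timeout for the cascaded pipeline.
    print(cascaded_first_response_ms(speech_ms, silence_timeout_ms=700))
    print(len(sliced_response_points_ms(speech_ms)))
```

Under these assumptions the pipeline stays silent until after the speaker has finished, while the sliced model has a response opportunity every 200 ms along the way. That is what makes interjecting or finishing a sentence possible in principle.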

Talking over you is the demo that gets quoted, but it isn't the new part: open systems have done full-duplex audio (interrupting, backchanneling) since Kyutai's Moshi in 2024. The real move is twofold: doing it at frontier scale, and folding live video into the same stream, so the model reacts to a raised eyebrow or a botched rep, not just to sound. The lab frames this as a thesis that interactivity should scale alongside intelligence, treated as its own axis rather than a wrapper bolted on after the model is smart.

Worth noticing what's load-bearing: an independent reviewer flagged that the headline numbers lean on a second, slower reasoning model running in the background — an easy way to post strong scores. The bet here is less about any one benchmark than about a well-funded lab deciding the frontier isn't only a smarter model, but a model you can actually talk with.

Why it's here

It's a research preview behind closed doors, not something you can use yet.

The lenses

Novelty 3

Impact · breadth 3

Impact · depth 3

Actionable 1

Substance 2

Hype 3

The facts

Access: Closed research preview; wider release promised later in 2026

Weights / code: None released

What it does: Real-time audio, video and text in one model that listens, watches and talks at once

Concepts

Vision-language model · Human–AI collaboration

Source: https://thinkingmachines.ai/blog/interaction-models/
