Mentat curated

Interaction models

Mira Murati's lab put out a model that doesn't wait its turn — it watches you on video, listens, and talks back at the same time, all from one model instead of a stitched-together voice pipeline.

Today's voice assistants are assembled: one component decides you've stopped talking, another transcribes, the language model answers, a fourth speaks. Thinking Machines collapsed that assembly into a single model that reads the room in 200-millisecond slices — perceiving and responding in the same breath, so it can interject, finish your sentence, or comment on what it just saw you do, without waiting for a turn to end.

Open systems have done full-duplex audio — interrupting, backchanneling — since Kyutai's Moshi in 2024. The wager that's actually new is the video.

Talking over you is the demo that gets quoted, but it isn't the new part. The real move is twofold: doing it at frontier scale, and folding live video into the same stream, so the model reacts to a raised eyebrow or a botched rep, not just to sound. The lab frames this as a thesis: interactivity should scale alongside intelligence, treated as its own axis rather than a wrapper bolted on after the model is smart.

Worth noticing what's load-bearing: an independent reviewer flagged that the headline numbers lean on a second, slower reasoning model running in the background — an easy way to post strong scores. The bet here is less about any one benchmark than about a well-funded lab deciding the frontier isn't only a smarter model, but a model you can actually talk with.

Why it's here

It's a research preview behind closed doors, not something you can use yet.

The lenses

Novelty 3
Impact · breadth 3
Impact · depth 3
Actionable 1
Substance 2
Hype 3

The facts

Access: Closed research preview; wider release promised later in 2026
Weights / code: None released
What it does: Real-time audio, video and text in one model that listens, watches and talks at once
Source: https://thinkingmachines.ai/blog/interaction-models/