Mentat · curated
Artificial Intelligence high · first-party

MAI-Thinking-1

Microsoft's first frontier reasoning model sold itself on 'clean, commercially licensed data, no distillation from other labs' — then its own technical paper listed Common Crawl, 24.2 billion pages of the open web.

At Build last month Microsoft unveiled seven models built entirely in-house — a reasoning model, a coder, image and voice systems — its clearest move yet to stop renting intelligence from OpenAI. The pitch that set the family apart was about provenance, not scores: these were trained 'from the ground up on clean data, without distillation from third-party models,' aimed straight at enterprise buyers who worry about where a model's training data came from.

Two Microsoft documents, contradicting each other, shipped the same day — and a month on the company has issued no correction reconciling the keynote's pitch with its own paper.

The not-distilling-from-rivals part holds up. The clean-data part did not survive the week. Reading the released model card live at the keynote, Simon Willison first wondered whether these were the first useful code models not trained on an unlicensed scrape of the web — then posted a correction: the paper lists Common Crawl, 24.2 billion pages, the same open-web dump everyone else uses. The keynote said 'enterprise-grade, clean and commercially licensed'; the paper said 'a mixture of publicly available and licensed human-generated data.' Microsoft has said nothing to reconcile them.

The flagship is real and may well be good — Microsoft says human raters, run blind by an outside firm across 1,276 tasks, preferred it to one of Anthropic's models, though it has published no win rate and no outside lab has access to check. But the durable lesson is narrower and cuts across the whole industry's marketing: a company built a differentiator out of data provenance, and couldn't make it through its own footnotes intact.

The lenses

Novelty 3
Impact · breadth 3
Impact · depth 3
Actionable 3
Substance 4
Hype 4

The facts

Access: Closed private preview on Microsoft Foundry; some models on OpenRouter and Fireworks; the coding model rolling out in GitHub Copilot
What launched: Seven models at once (reasoning, coding, two image, transcription, and two voice)
Open weights? None confirmed
