Artificial Intelligence medium · first-party

LM Link

A preview of the LM Studio app pooled four Mac Studios into a 1.5-terabyte machine, ran a trillion-parameter model on it, and answered queries sent from an iPhone, with no datacenter and no cloud.

demo LM Studio · 2 min read · originally announced 16 Jun 2026

At WWDC this month, four Mac Studios sat on a desk and, stitched together, ran Kimi K2.6 — a trillion-parameter model whose weights alone need about a terabyte, more than any single machine holds. A MacBook and then an iPhone queried it live over an encrypted link. The whole thing drew a fraction of the power a comparable rack of datacenter GPUs would.
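As a rough check on the memory claim, the sketch below estimates the weight footprint at a few precisions and compares it with the 1.5-terabyte pool. The bit widths are illustrative assumptions: the demo's actual precision isn't stated, though "about a terabyte" is consistent with roughly 8 bits per weight. The figures cover weights only, not the KV cache or activations.

```python
# Rough weight-memory estimate. Bit widths are illustrative assumptions;
# the article only says the weights need "about a terabyte".

def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone (no KV cache, no activations)."""
    return params * bits_per_weight / 8

PARAMS = 1e12        # "trillion-parameter", per the article
POOL_BYTES = 1.5e12  # "1.5-terabyte" pooled memory, per the article (decimal TB assumed)

def fits(bits_per_weight: float) -> bool:
    """True if the weights alone fit in the pooled memory."""
    return weight_bytes(PARAMS, bits_per_weight) <= POOL_BYTES

for bits in (16, 8, 4):
    tb = weight_bytes(PARAMS, bits) / 1e12
    print(f"{bits:>2}-bit: {tb:.2f} TB, fits in pool: {fits(bits)}")
```

Under these assumptions, 16-bit weights (2 TB) would overflow the pool, while 8-bit weights (1 TB) fit with room left for working memory.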

The fast path needs the machines wired into a low-latency mesh; the simpler cabling mode, Apple noted, "doesn't speed up inference" and "isn't supported by all models." — Apple WWDC26 session 233

The clustering isn't LM Studio's trick, and that's worth being clear about. Apple shipped the hard part in macOS this cycle: a way to fan one model's math across several Macs over the Thunderbolt cables between them, fast enough that they behave like one big pooled memory. Apple's own WWDC session demoed the identical model on the identical four machines without ever mentioning LM Studio. Splitting a 671-billion-parameter model across a Mac cluster was already being done by hobbyists last year.

What LM Studio added is the last mile, the part that turns a research capability into something you use. It wrapped Apple's plumbing in the consumer app people already run their local models in, and bolted on remote access: your cluster stays on your desk, but you can talk to it from your phone across town over an end-to-end encrypted link. The reason to want that isn't the trillion-parameter headline. Kimi K2.6 is the first open-weights model to top a hard coding benchmark that had belonged to the closed labs. Running that one privately, on hardware you own rather than renting it through someone else's API, is the point the demo is making.

It was a preview built with Apple, not a shipped feature, and the four-Mac cluster runs about forty thousand dollars. But the direction is the story: frontier-scale private inference is close enough to a desk-side appliance that the app you'd run it in already exists.

The lenses

Novelty 3

Impact · breadth 2

Impact · depth 3

Actionable 3

Substance 4

Hype 2

The facts

Runs on: Four Mac Studios wired together (~$40K), macOS 26.2; no datacenter GPUs

Availability: A preview demo built with Apple, not a generally shipped feature

Remote access: Query the cluster from a phone or laptop over an end-to-end encrypted link

Reported speed: ~28 tokens/sec (demo-reported, not independently benchmarked)

Concepts

Local inference · Frontier models · Mixture of experts

