Artificial Intelligence medium · first-party

slime

The reinforcement-learning framework Zhipu used to post-train its frontier GLM models is a free clone away — and Alibaba, the vLLM project, and AMD now build on top of it.

repo THUDM · 2 min read · originally announced 31 Aug 2025

The open-source code that finished the reinforcement-learning post-training of GLM-5.2, a Chinese frontier model that trades blows with GPT-5.2, is the same code anyone can clone from GitHub. By Zhipu's own account, that final stage took about two days. Rather than one monolithic run, the lab trained ten-plus specialist 'expert' models and used slime to merge them, via reinforcement learning, into a single agentic model.

The disaggregated design runs the training engine and the generation server as separate services rather than one process — paying network overhead to scale and fail each half on its own. — THUDM

slime's job is plumbing. One engine (Megatron) does the heavy training; a separate service (SGLang) generates the model's practice attempts, known as rollouts; a shared buffer passes data between them. The unusual choice is to keep those two as decoupled services. The incumbent open frameworks fuse them into one process; slime accepts some network overhead in exchange for the ability to scale and fail each half independently. The maintainers pitch it as deliberately small: no grand abstraction layer, just enough to read and extend.
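To make the three-part layout concrete, here is a minimal sketch of the pattern in plain Python. It is not slime's API: `RolloutService`, `Trainer`, `Sample`, and `run_training` are hypothetical names. Two threads in one process stand in for what would be separate networked services, and the "training step" is only a placeholder that averages rewards.

```python
import queue
import random
import threading
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float
    policy_version: int  # which weights version generated this sample


class RolloutService(threading.Thread):
    """Hypothetical stand-in for the generation server (SGLang's role)."""

    def __init__(self, buffer, prompts, stop_event):
        super().__init__(daemon=True)
        self.buffer = buffer
        self.prompts = prompts
        self.stop_event = stop_event
        self.policy_version = 0  # bumped by the trainer after each step

    def run(self):
        rng = random.Random(0)
        while not self.stop_event.is_set():
            prompt = rng.choice(self.prompts)
            sample = Sample(prompt, f"answer to {prompt}", rng.random(),
                            self.policy_version)
            try:
                self.buffer.put(sample, timeout=0.1)
            except queue.Full:
                continue  # trainer is behind; retry and re-check the stop flag


class Trainer:
    """Hypothetical stand-in for the training engine (Megatron's role)."""

    def __init__(self, buffer, batch_size):
        self.buffer = buffer
        self.batch_size = batch_size
        self.version = 0

    def step(self):
        # Pull one batch from the shared buffer; a real engine would run a
        # gradient update here. This placeholder only averages the rewards.
        batch = [self.buffer.get(timeout=5) for _ in range(self.batch_size)]
        mean_reward = sum(s.reward for s in batch) / len(batch)
        self.version += 1
        return mean_reward, batch


def run_training(num_steps=3, batch_size=4):
    buffer = queue.Queue(maxsize=16)  # the shared buffer between the halves
    stop = threading.Event()
    rollout = RolloutService(buffer, ["2+2", "capital of France"], stop)
    trainer = Trainer(buffer, batch_size)
    rollout.start()
    history = []
    try:
        for _ in range(num_steps):
            mean_reward, batch = trainer.step()
            rollout.policy_version = trainer.version  # mimic a weight sync
            history.append((trainer.version, len(batch), mean_reward))
    finally:
        stop.set()
        rollout.join(timeout=1)
    return history


if __name__ == "__main__":
    for version, n, reward in run_training():
        print(f"step {version}: batch={n} mean_reward={reward:.3f}")
```

Because the two halves only meet at the buffer, either one could in principle be restarted or scaled without touching the other. That is the property the decoupled design is paying network overhead for.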

The quietly striking part is the dependency map. slime came out of Tsinghua and Zhipu, and it has since become shared plumbing well beyond them — Alibaba's RL stack, a framework from the vLLM project, and AMD's day-one support for its own GPUs all build on top of it. A piece of Chinese training infrastructure has become substrate that Western labs and chipmakers extend rather than route around. For anyone doing RL post-training who used to reach for the usual two frameworks, there is now a third option whose claim to fame is that it actually shipped frontier models.

Want to try it?

Clone the repo and read the README's two run modes — co-located and disaggregated — to see how the training engine and the rollout server are wired together.

Open the repo at github.com →

The lenses

Novelty 3

Impact · breadth 2

Impact · depth 3

Actionable 4

Substance 5

Hype 2

The facts

Cost: Free and open-source (clone and build)

Maker: THUDM (Tsinghua) / Zhipu AI

Proof: Used to RL-post-train the shipped GLM frontier models

Ecosystem: ~1,000 forks; seven downstream frameworks, incl. ones from Alibaba and the vLLM project

Concepts

AI infrastructure · Frontier models · Mixture of experts

