Mentat · curated
Artificial Intelligence medium · independent

A power law for the trolley problem

Put the self-driving-car trolley problem to 75 language models, compare their answers with the crowd's, and the gap between a model and the human majority shrinks with size along a fitted curve: the same mathematical shape used for training loss, now drawn over whom a car should kill.

The Moral Machine is the viral MIT survey that collected some 40 million decisions from people in 233 countries and territories on a grim sorting question: when a self-driving car must hit someone, should it spare the doctor over the homeless man, the young over the old, humans over pets? Kazuhiro Takemoto, at the Kyushu Institute of Technology, ran the same dilemmas past 75 language models ranging from 270 million to a trillion parameters, and measured how far each model's answer profile sat from the human aggregate.

The relationship between distance reductions and practical improvements in real-world moral judgment quality remains to be empirically validated. — Kazuhiro Takemoto

The distance shrinks as models get bigger, and it shrinks along a power law: D falls as model size raised to the −0.10 power, the same functional form the field already uses to plot training loss against compute. This is the new thing: earlier work (including Takemoto's own) had shown only that bigger models correlate with more human-like moral choices. Here the trend becomes a fitted exponent that survives statistical controls for model family and for reasoning ability.
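
For readers who want to see what "a fitted exponent" means mechanically, here is a minimal sketch of a power-law fit done as a straight-line fit in log-log space. It is not the paper's procedure, which also controls for model family and reasoning ability. The function name `fit_power_law`, the model sizes, and the constant `c = 1.0` are all hypothetical, and the distances are synthetic values generated from the stated exponent, not Takemoto's measurements.

```python
import math

def fit_power_law(sizes, distances):
    """Ordinary least-squares fit of log10(D) = log10(c) + alpha * log10(N).

    Returns (c, alpha) for the model D = c * N**alpha.
    """
    xs = [math.log10(n) for n in sizes]
    ys = [math.log10(d) for d in distances]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                      # slope of the log-log line
    c = 10 ** (mean_y - alpha * mean_x)    # intercept, mapped back from log space
    return c, alpha

# Synthetic, noise-free illustration only: NOT the paper's data.
sizes = [2.7e8, 1e9, 7e9, 7e10, 1e12]            # hypothetical parameter counts
distances = [1.0 * n ** -0.10 for n in sizes]    # generated with alpha = -0.10
c, alpha = fit_power_law(sizes, distances)
print(f"recovered exponent: {alpha:.4f}")
```

On noise-free inputs the slope comes back as the exponent that generated them. Real measurements scatter around the line, which is why a fit like this can hold up statistically and still leave much of the variation unexplained.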

But read the exponent, not the headline. At −0.10, a tenfold jump in parameters cuts the distance to the crowd by only about a fifth, and the fit explains just half the variation; the rest is something other than size. And Takemoto is careful about what 'closer' means: closer to a mostly-Western majority vote is not closer to a moral truth, and whether shrinking that distance improves real-world judgment, he writes, 'remains to be empirically validated.'
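
The arithmetic behind reading the exponent is short enough to check by hand. The sketch below assumes a pure power law D = c * N**alpha with the −0.10 exponent quoted above; the helper name `distance_ratio` is ours, and because the fit leaves half the variance unexplained, any individual model may land well off these figures.

```python
ALPHA = -0.10  # exponent quoted in the article

def distance_ratio(size_factor: float, alpha: float = ALPHA) -> float:
    """Factor by which D changes when the parameter count is multiplied by size_factor."""
    return size_factor ** alpha

# One tenfold step in model size.
tenfold = distance_ratio(10)

# The whole span of models tested: 270 million to 1 trillion parameters.
full_range = distance_ratio(1e12 / 270e6)

print(f"10x parameters: D is multiplied by {tenfold:.3f}")
print(f"270M to 1T:     D is multiplied by {full_range:.3f}")
```

A tenfold step multiplies the distance by roughly 0.79. Across the full range tested, a span of more than three orders of magnitude in size, the fitted curve still leaves a little under half of the original gap.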

The paper reached a general audience through an optimist's newsletter titled 'Why Smarter AI May Mean Safer AI' — the exact reading the paper's own limitations section declines to make. What Takemoto measured is that scale nudges a model toward the average human answer. Whether the average human answer is the one you want a car to give is a separate question, and this curve does not touch it.

The lenses

Novelty 3
Impact · breadth 2
Impact · depth 2
Actionable 4
Substance 5
Hype 1

The facts

Human baseline: MIT's Moral Machine, 40M life-or-death choices from 233 countries
Models tested: 75 configurations, 270M to 1T parameters, seven families
The fit: distance to the human average falls as size^-0.10; half the variance stays unexplained
Open: peer-reviewed paper, open preprint, and the author's public code repo
