Concept. Also called: test set leakage, benchmark contamination, data leakage
Training data contamination
When examples from an evaluation set appear in a model's training data, making benchmark scores higher than genuine capability warrants.
In a nutshell
A benchmark works only if the model has never seen the answers. Contamination happens when crawled training corpora include pages that contain evaluation questions and their solutions, so the model can recall memorised answers rather than reason, and the score inflates. The hard part is detection: training sets are vast and rarely audited against every benchmark released after the fact. A contaminated result looks identical to a real one on the leaderboard, which means the field's main progress signal can quietly stop measuring what it claims to measure.
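One common way to look for this kind of overlap is an exact n-gram match between benchmark items and training documents. The sketch below is a minimal illustration of that idea, not any particular lab's pipeline: the function names (`ngrams`, `build_index`, `flag_contaminated`), the whitespace tokenisation, and the choice of `n` are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Return the set of word n-grams in `text` (lowercased, whitespace-split).

    Texts shorter than n tokens yield an empty set.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus_docs: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram seen anywhere in the (hypothetical) training corpus."""
    index: Set[NGram] = set()
    for doc in corpus_docs:
        index |= ngrams(doc, n)
    return index


def flag_contaminated(benchmark_items: Iterable[str],
                      corpus_index: Set[NGram],
                      n: int) -> List[int]:
    """Return positions of benchmark items sharing at least one n-gram with the corpus."""
    return [
        i for i, item in enumerate(benchmark_items)
        if ngrams(item, n) & corpus_index
    ]


# Toy data, invented for illustration only.
corpus = [
    "Forum post: what is the capital of France? The answer is Paris.",
    "An unrelated page about gardening and soil quality in spring.",
]
benchmark = [
    "What is the capital of France? The answer is Paris.",
    "Which planet has the most moons in the solar system?",
]

index = build_index(corpus, n=5)
print(flag_contaminated(benchmark, index, n=5))
```

An exact-match check like this only catches verbatim copies. Paraphrased or translated test items pass through unflagged, so a clean result is weak evidence rather than proof that a benchmark is uncontaminated.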
Where it came from
Year: 2020
Source: Brown et al., GPT-3 paper (OpenAI)
Why it mattered: A dedicated contamination analysis section flagged test-set overlap in pre-training data as a known confound, one of the first systematic treatments in a major model paper.
