Benchmarks
- Benchmark or Breakthrough: GPT-5.4 and the Ramsey Hypergraph Question
GPT-5.4 Pro improved a constant on a Ramsey bound in Epoch's FrontierMath benchmark. Here is what that actually means and why the answer requires nuance.
- Open-Weight vs Frontier: How Close Is the Accuracy Gap Really?
Benchmark scores for open-weight models have converged with those of frontier cloud models on many tasks. But benchmarks measure what benchmarks measure. This is what the data actually says about where the gap is real and where it has closed.