Results: Translation Quality and Cost
On English→German:
- Transformer (big) reached BLEU ≈ 28.4, beating the previous state-of-the-art (including ensembles) by more than 2 BLEU.
- That made it the new SOTA with a single model, no ensembling required.
On English→French:
- Transformer (big) hit BLEU ≈ 41.8 (reported as ~41.0 in some versions of the results table), also state-of-the-art among single models.
- Training cost (in FLOPs and wall-clock time) was a small fraction of that of older systems such as GNMT or convolutional seq2seq (ConvS2S).
Key story: higher translation quality, reached with drastically less training compute.
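To make the BLEU numbers above concrete, here is a minimal sketch of how the metric works: a geometric mean of modified n-gram precisions (n = 1..4) times a brevity penalty. This is a toy sentence-level version with made-up example sentences; the paper reports corpus-level BLEU with specific tokenization and evaluation settings, which this sketch does not reproduce.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as a multiset (Counter)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of clipped n-gram
    precisions times a brevity penalty. Illustrative only."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clipped counts: a candidate n-gram only matches as many
        # times as it appears in the reference
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        # Tiny floor avoids log(0) when a higher-order n has no matches
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty: punish candidates shorter than the reference
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat sat on a mat".split()
score = bleu(hyp, ref)  # in [0, 1]; multiply by 100 for the usual scale
```

A perfect match scores 1.0 (i.e. BLEU 100 on the usual scale), so "BLEU ≈ 28.4" means roughly 0.284 on this normalized scale under the corpus-level formulation.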


