arXiv · April 2026

ThinkTwice: teaching models to fix their own answers

Models are like students who grind through problem sets but never check their work: they solve, but they don’t fix. ThinkTwice trains “checking” as a skill and reports an 11.5-point gain on AIME pass@4 for Qwen3-4B. The research page carries the full story: background, method, key figures, and links to the paper.

Research page · arXiv · Promo copy
+11.5 pt AIME pass@4
5 math benchmarks
2 model families
1 reward signal
+3% training overhead

People expect an AI to fix its answer when asked to double-check; standard training does not produce that ability. ThinkTwice writes it directly into the training objective.

Launch Highlights

  • Problem: Standard reinforcement learning rewards only the first attempt; the model never practices reviewing and correcting itself.
  • Method: Solve once, feed the answer back, refine once; both steps are rewarded only on final-answer correctness.
  • Finding: Qwen3-4B gains +11.5 pt on AIME pass@4, with refinement adding on top of one-shot gains.
  • Why it matters: No critic, no process reward, no human critique data: self-refinement can be trained directly.
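
The solve-then-refine loop in the Method bullet can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt template, the `Rollout` container, the exact-match reward, and the `toy_policy` stand-in for a language model are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical refinement prompt; the paper's actual wording is not given here.
REFINE_TEMPLATE = (
    "Problem: {problem}\n"
    "Previous answer: {answer}\n"
    "Check the previous answer and give a final answer."
)


@dataclass
class Rollout:
    first_answer: str
    refined_answer: str
    solve_reward: float
    refine_reward: float


def correctness_reward(answer: str, reference: str) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def solve_then_refine(
    policy: Callable[[str], str], problem: str, reference: str
) -> Rollout:
    """Run one solve step and one refine step, scoring both the same way."""
    # Step 1: solve the problem from scratch.
    first = policy(f"Problem: {problem}")
    # Step 2: feed the first answer back and ask for a refined answer.
    refined = policy(REFINE_TEMPLATE.format(problem=problem, answer=first))
    return Rollout(
        first_answer=first,
        refined_answer=refined,
        solve_reward=correctness_reward(first, reference),
        refine_reward=correctness_reward(refined, reference),
    )


def toy_policy(prompt: str) -> str:
    # Stand-in for a model (assumption): wrong on the first attempt,
    # correct once it is shown a previous answer to review.
    return "4" if "Previous answer" in prompt else "5"


rollout = solve_then_refine(toy_policy, "What is 2 + 2?", "4")
print(rollout)
```

In an actual training run the two rewards would feed a reinforcement learning update; that part is omitted here because the page does not describe it.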

Continue reading

The research page covers the background, method, key figures, and paper links; for quick sharing, use the illustrated promo copy.
