ThinkTwice released - News

People expect an AI to fix its answer when asked to double-check; standard training does not produce that ability. ThinkTwice writes it directly into the training objective.

Launch Highlights

Problem: Standard reinforcement learning rewards only the first attempt, so the model never practices reviewing and correcting itself.
Method: Solve once, feed the answer back, refine once. Both steps are rewarded only on final correctness.
Finding: Qwen3-4B gains 11.5 points on AIME pass@4, with refinement adding on top of the one-shot gains.
Why it matters: No critic, no process reward, no human critique data. Self-refinement can be trained directly.
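
The solve-then-refine loop in the Method line can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ThinkTwice implementation: the function names (`correctness_reward`, `two_pass_rollout`), the refinement prompt wording, and the exact-match answer check are all hypothetical, and the policy-update step that would consume the rewards is omitted.

```python
from typing import Callable, List, Tuple

# Hypothetical type: (prompt, response, reward) for one training sample.
Sample = Tuple[str, str, float]


def correctness_reward(answer: str, reference: str) -> float:
    """Binary outcome reward: 1.0 if the final answer matches, else 0.0.

    Exact string match is an assumption made for this sketch.
    """
    return 1.0 if answer.strip() == reference.strip() else 0.0


def two_pass_rollout(
    generate: Callable[[str], str], question: str, reference: str
) -> List[Sample]:
    """Solve once, feed the answer back, refine once.

    Both passes are scored only on final correctness; there is no critic
    and no per-step (process) reward.
    """
    # Pass 1: ordinary single attempt.
    first = generate(question)

    # Pass 2: the model sees its own answer and is asked to review it.
    # The prompt wording below is illustrative only.
    refine_prompt = (
        f"{question}\n\nPrevious answer: {first}\n"
        "Review the answer above and give a corrected final answer."
    )
    second = generate(refine_prompt)

    return [
        (question, first, correctness_reward(first, reference)),
        (refine_prompt, second, correctness_reward(second, reference)),
    ]


def toy_generate(prompt: str) -> str:
    """Stand-in for a model: wrong on the first try, right when refining."""
    return "4" if "Previous answer" in prompt else "5"


samples = two_pass_rollout(toy_generate, "What is 2 + 2?", "4")
rewards = [reward for _, _, reward in samples]
```

In this toy run the first attempt earns no reward and the refinement does, so a trainer consuming `samples` would get a learning signal from the correction step as well as from the initial attempt.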

Continue reading

The research page covers the background, method, key figures, and paper links; for quick sharing, use the illustrated promo copy.

