News

Releases, papers, and lab milestones.

Updates from Coolwei AI Lab: research published, products shipped, and moments worth recording.


9 entries · Latest: June 2026

2026

5 entries
May 2026 · Major Release

Yanlan V3.0 released

Yanlan V3.0 wins all 8 headline metrics and 43 of 47 total comparisons, setting a higher bar for pre-publication Chinese correction.

April 2026 · arXiv

ThinkTwice released

Models are like students who grind through problem sets but never check their work: they solve, but they don’t fix. ThinkTwice trains “checking” into a skill, yielding an 11.5-point gain on AIME pass@4.

March 2026 · arXiv

Grounded Chess Reasoning released

Engines are like master craftsmen who cannot teach: accurate, but unable to explain. Master Distillation gives a 4B model concise puzzle commentary that surpasses its teacher.

March 2026 · arXiv

OasisSimp dataset released

Rewriting official prose into plain language has no yardstick in most languages. Five languages, 9,519 sentences, written by native speakers — the first open evaluation for low-resource simplification.

February 2026 · KDD 2026 (CCF-A)

SWE-Bench Mobile released

50 real iOS feature tasks, 449 human-written tests, ~500K lines of production code, and a best task pass rate of 12%.

2025

3 entries
August 2025 · COLM 2025

SEAM accepted at COLM 2025

A report should read the same in any format; models often disagree with themselves. SEAM quantifies cross-modal inconsistency in 21 vision-language models across chess, molecules, scores, and graphs.

August 2025 · Project Launch

Mobile-Agent-Bench project launched

The start of a long-running collaboration to evaluate coding agents on real mobile production codebases.

June 2025 · Founding

Coolwei AI Lab founded

Focused on safe deployment, evaluation, and real-world applications of large language models.

2024

1 entry
December 2024 · NeurIPS Spotlight

Report Cards receives NeurIPS SoLaR Spotlight

Like a teacher writing comments, Report Cards auto-write behavior reports for models — verified to genuinely help people tell models apart. A NeurIPS SoLaR Spotlight.