Research

Pushing the frontier of reasoning, code, and Chinese-language AI.

We work on small-model reasoning, agentic coding, and grammar-correction systems — research that ships into government, media, and education products.

6 papers · 3 focus areas · Updated April 2026

2026

4 papers

arXiv 2026

Grounded Chess Reasoning

Master Distillation combined with RLVR (reinforcement learning with verifiable rewards) turns a 4B-parameter model into a chess reasoner that beats frontier LLMs.

Distillation · RLVR

arXiv 2026

ThinkTwice

RLVR training that jointly optimises reasoning and self-refinement at 3% extra overhead, gaining 11.5 points on AIME after refinement.

RLVR · Self-Refinement

arXiv 2026 · with Xiaohongshu

SWE-Bench Mobile

Industrial mobile-development agent benchmark on a real production iOS codebase — 50 tasks, 449 human-verified tests.

Coding Agents · Benchmark

arXiv 2026

OasisSimp

Open-source sentence simplification dataset spanning English, Sinhala, Tamil, Thai, and Pashto.

Multilingual NLP · Dataset

2025

1 paper

COLM 2025

SEAM

Benchmark of semantically equivalent cross-modal reasoning tasks that evaluates the consistency of vision-language models.

Multimodal · Benchmark

2024

1 paper

NeurIPS SoLaR 2024 · Spotlight

Report Cards

Fully automated framework for qualitative evaluation that generates human-interpretable reports on model behavior.

Evaluation · Interpretability

Milestones

2026: 4 papers released
- Grounded Chess Reasoning · arXiv 2026
- ThinkTwice · arXiv 2026
- SWE-Bench Mobile · arXiv 2026, with Xiaohongshu
- OasisSimp · arXiv 2026
Aug 2025: SEAM accepted at COLM 2025
- Mobile-Agent-Bench project launched
Jun 2025: Coolwei AI Lab founded
Dec 2024: Report Cards · NeurIPS SoLaR Spotlight
- First fully automated qualitative evaluation framework