COLM 2025 · August 2025

SEAM: change the format, change the answer

The same content should yield the same answer in any format, yet models often disagree with themselves. SEAM quantifies cross-modal inconsistency in 21 vision-language models across chess positions, molecules, musical scores, and graphs. The research page carries the full story: background, method, key figures, and links to the paper.

21 models
16 tasks
4 domains
3,200 base items
9,600 evaluations

The same question, drawn as an image or written as text, carries identical information — yet vision-language models often give two different answers. SEAM turns that inconsistency into a controlled measurement.

Launch Highlights

  • Problem: OCR-style tests that screenshot text into images cannot tell whether a model fails to see or fails to reason.
  • Method: SEAM uses FEN/boards, SMILES/molecules, ABC/sheet music, and graphs/matrices to preserve semantics.
  • Finding: Vision usually trails language, and answer agreement across modalities remains far from ideal.
  • Why it matters: Researchers can separate perception failures from cross-modal reasoning failures.
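
To make the difference between per-modality accuracy and cross-modal agreement concrete, here is a minimal Python sketch. It is not SEAM's evaluation code: the `PairedItem` record, its field names, the `summarize` helper, and the toy answers are all hypothetical, and the agreement metric is assumed to be the simple share of items where the text-input and image-input answers match, whether or not either is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedItem:
    """One base item answered twice: once from text, once from an image.

    All field names are illustrative assumptions, not SEAM's schema.
    """
    item_id: str
    gold: str          # correct multiple-choice option
    text_answer: str   # model answer for the text form (e.g. a FEN string)
    image_answer: str  # model answer for the image form (e.g. a rendered board)


def summarize(items: list[PairedItem]) -> dict[str, float]:
    """Return accuracy per modality and the cross-modal agreement rate."""
    if not items:
        raise ValueError("need at least one paired item")
    n = len(items)
    return {
        "text_accuracy": sum(i.text_answer == i.gold for i in items) / n,
        "image_accuracy": sum(i.image_answer == i.gold for i in items) / n,
        # Agreement ignores correctness: two identical wrong answers still agree.
        "agreement": sum(i.text_answer == i.image_answer for i in items) / n,
    }


# Toy data, invented purely for illustration.
demo = [
    PairedItem("chess-1", gold="A", text_answer="A", image_answer="A"),
    PairedItem("chess-2", gold="B", text_answer="B", image_answer="C"),
    PairedItem("chem-1", gold="C", text_answer="D", image_answer="D"),
    PairedItem("music-1", gold="D", text_answer="D", image_answer="A"),
]

if __name__ == "__main__":
    for name, value in summarize(demo).items():
        print(f"{name}: {value:.2f}")
```

In the toy data, the third item is answered wrongly in both formats but still counts as agreement, which is why agreement has to be reported alongside accuracy rather than inferred from it.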

Continue reading

The research page covers the background, method, key figures, and paper links; for quick sharing, use the illustrated promo copy.
