
OasisSimp

Rewriting “Payment must be remitted prior to the commencement date” as “Pay before the start date” is a task that most languages have no yardstick for. Native speakers of five languages have built the first open benchmark.

arXiv Project / Dataset
5 languages
9,519 source sentences
~4 references per sentence
8 open multilingual LLMs
CC BY 4.0 license

Publishing a government notice or a medicine leaflet does not make it readable: for second-language readers, students, and people with reading difficulties, long, complex sentences are a barrier in themselves. OasisSimp is an open evaluation benchmark for sentence simplification in five languages (English, Sinhala, Tamil, Thai, and Pashto), with all reference simplifications written by native speakers.

Figure (from the paper): OasisSimp examples across five languages. For each language it shows a complex sentence, its simplification, and the operation used.

Public is not the same as readable

The rewrite in the headline is what natural language processing calls sentence simplification: making a sentence easier to read without losing its meaning. It underpins public information, education, and accessible reading: government offices want notices everyone can follow, publishers and educators need graded reading material, and the readability of health information directly affects whether it is understood.

English simplification research has years of accumulated benchmarks to draw on. For low-resource languages (languages with scarce corpora and evaluation data) such as Sinhala, Pashto, Tamil, and Thai, there was almost no public evaluation at all — not even a yardstick to tell a good simplification system from a poor one. OasisSimp supplies that yardstick.

A benchmark written by native speakers

All source texts come from authentic settings: government documents, news, and Wikipedia. For each of the 9,519 complex sentences, native speakers wrote multiple reference simplifications under a shared guideline. This multi-reference design accepts several valid ways to simplify a sentence rather than treating one answer as the only standard.
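To make the multi-reference idea concrete, here is a minimal sketch in Python. It is not the paper's scoring code: the `Entry` field names are hypothetical rather than the released schema, the second reference sentence is made up for illustration, and the unigram-overlap F1 is a toy stand-in for a real metric such as SARI.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One benchmark item; field names are hypothetical, not the released schema."""
    complex_sentence: str
    references: list[str]


def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercased, whitespace-split tokens (toy score, not SARI)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    remaining = list(ref)
    overlap = 0
    for tok in cand:
        if tok in remaining:
            remaining.remove(tok)  # each reference token can be matched only once
            overlap += 1
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_reference_score(candidate: str, entry: Entry) -> float:
    """Score the candidate against every reference and keep the best match."""
    return max(token_f1(candidate, ref) for ref in entry.references)


entry = Entry(
    complex_sentence="Payment must be remitted prior to the commencement date",
    # The second reference is invented for this example.
    references=["Pay before the start date", "You must pay before the start date"],
)
candidate = "You must pay before the start date"
single = token_f1(candidate, entry.references[0])
multi = best_reference_score(candidate, entry)
print(f"single-reference: {single:.2f}, multi-reference: {multi:.2f}")
```

With only the first reference, a perfectly acceptable candidate is penalized for wording the reference happens not to use; with several references, it can match the one it resembles.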

The data is split 80% test / 20% validation and fully released under CC BY 4.0, positioned as an evaluation benchmark rather than a training corpus.

Table 1. Final statistics for the OasisSimp dataset
Lang      # Comp Sentences   Avg. Simp Sentences   Avg. Comp Length   Avg. Simp Length   Source Domain
English   2500               2.86                  24.35              17.23              News
Sinhala   2500               5.00                  30.12              28.78              Govt
Thai      1499               5.06                  48.24              37.77              News
Tamil     520                4.66                  23.22              17.65              Govt
Pashto    2500               3.00                  28.81              20.31              Wiki
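
The headline figures can be checked against Table 1 with a few lines of arithmetic. The sketch below copies the table rows and assumes that "Avg. Simp Sentences" is the average number of reference simplifications per complex sentence; the lengths are compared in whatever unit the table uses.

```python
# Rows copied from Table 1:
# (language, complex sentences, avg. simplifications per sentence,
#  avg. complex length, avg. simplified length)
TABLE_1 = [
    ("English", 2500, 2.86, 24.35, 17.23),
    ("Sinhala", 2500, 5.00, 30.12, 28.78),
    ("Thai", 1499, 5.06, 48.24, 37.77),
    ("Tamil", 520, 4.66, 23.22, 17.65),
    ("Pashto", 2500, 3.00, 28.81, 20.31),
]

total_sentences = sum(row[1] for row in TABLE_1)
# Weight each language's average by its number of complex sentences.
weighted_refs = sum(row[1] * row[2] for row in TABLE_1) / total_sentences
# Ratio of simplified to complex length, per language.
length_ratio = {row[0]: row[4] / row[3] for row in TABLE_1}

print(f"complex sentences: {total_sentences}")
print(f"references per sentence (weighted): {weighted_refs:.2f}")
for lang, ratio in length_ratio.items():
    print(f"{lang}: simplified/complex length = {ratio:.2f}")
```

The per-language counts add up to the 9,519 source sentences, and the weighted average of about 3.9 matches the "~4 references per sentence" figure. The length ratio for Sinhala is about 0.96, while the other four languages fall between roughly 0.70 and 0.78.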

Where open models stand today

The paper evaluates eight open-weight multilingual LLMs using SARI (the standard automatic metric for simplification, which scores three editing operations separately: the ADD, KEEP, and DEL columns in the tables) and BERTScore (a semantic-similarity score).
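
The SARI column in Tables 3 and 7 is consistent with the arithmetic mean of the three component scores. The sketch below checks this on three rows copied from those tables; it does not compute SARI from model outputs, and the function name is only illustrative.

```python
def sari_from_components(add: float, keep: float, delete: float) -> float:
    """Aggregate the three operation scores as their arithmetic mean."""
    return (add + keep + delete) / 3


# (ADD, KEEP, DEL, reported SARI), copied from Tables 3 and 7.
rows = {
    "Aya, English, 0-shot": (9.32, 44.98, 75.23, 43.18),
    "Qwen, English, 5-shot": (10.88, 47.01, 77.08, 44.99),
    "Gemma, Pashto, 5-shot": (5.39, 46.39, 61.95, 37.91),
}

for label, (add, keep, delete, reported) in rows.items():
    computed = sari_from_components(add, keep, delete)
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f}")
```

Because the aggregate is a plain mean, a very low ADD score pulls SARI down even when KEEP and DEL look healthy.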

The results have two layers. Few-shot examples (including a few demonstrations in the prompt) improve performance across nearly all languages, showing that style can be calibrated. But absolute performance on low-resource languages still clearly lags, especially when simplification requires adding suitable plain wording rather than only deleting material.

Table 3. Results on English (OasisSimp-EN)
            0 Shot                               1 Shot                               5 Shot
            SARI Comp.                           SARI Comp.                           SARI Comp.
Model       ADD    KEEP   DEL    SARI   Fref     ADD    KEEP   DEL    SARI   Fref     ADD    KEEP   DEL    SARI   Fref
Aya         9.32   44.98  75.23  43.18  54.44    9.68   44.90  72.51  42.36  56.35    10.18  45.91  71.16  42.42  57.20
Cmd-R       9.69   44.95  72.89  42.51  55.90    10.99  43.71  77.57  44.09  55.03    11.91  45.28  77.09  44.76  56.63
DeepSeek    7.03   41.47  76.30  41.60  51.88    7.80   41.12  76.82  41.91  51.92    9.41   42.03  77.22  42.89  54.15
EuroLLM     9.32   45.60  68.36  41.10  56.98    10.99  46.98  69.35  42.44  57.96    11.63  46.55  70.93  43.04  58.10
Gemma       5.24   44.43  68.54  39.40  51.87    6.55   43.26  74.44  41.41  52.34    9.19   44.67  77.06  43.64  55.27
LLaMA       6.48   43.31  68.34  39.38  54.30    8.11   43.42  72.83  41.45  54.53    9.93   44.75  73.75  42.81  56.00
Mistral     8.56   43.66  77.46  43.23  52.49    10.31  43.82  78.43  44.18  54.55    11.61  44.01  78.59  44.74  55.89
Qwen        8.70   46.07  73.53  42.77  42.36    9.54   46.40  77.25  44.39  53.03    10.88  47.01  77.08  44.99  55.27

Table 7. Results on Pashto (OasisSimp-PS)
            0 Shot                               1 Shot                               5 Shot
            SARI Comp.                           SARI Comp.                           SARI Comp.
Model       ADD    KEEP   DEL    SARI   Fref     ADD    KEEP   DEL    SARI   Fref     ADD    KEEP   DEL    SARI   Fref
Aya         0.62   23.98  67.47  30.69  49.17    1.08   45.60  58.62  35.10  60.83    1.77   53.81  47.17  34.25  68.25
Cmd-R       0.75   50.82  51.73  34.44  61.91    0.93   54.41  44.44  33.26  67.84    0.70   56.53  35.62  30.95  70.52
DeepSeek    0.52   41.19  60.71  34.14  38.65    0.90   48.83  54.59  34.78  63.65    0.91   50.16  52.51  34.53  66.26
EuroLLM     0.50   54.28  44.40  33.06  67.55    0.65   54.87  43.28  32.93  69.72    0.78   55.37  42.09  32.75  70.42
Gemma       3.84   25.08  70.78  33.23  56.95    4.47   34.75  68.57  35.93  61.47    5.39   46.39  61.95  37.91  66.04
LLaMA       0.70   18.34  70.28  29.77  -22.40   3.15   46.28  61.67  37.04  51.15    1.96   46.53  58.15  35.55  33.03
Mistral     0.94   26.36  68.13  31.81  47.73    1.42   41.20  63.04  35.22  61.31    1.51   45.93  58.60  35.35  64.40
Qwen        2.34   47.48  58.92  36.25  58.02    2.81   49.88  55.34  36.01  64.76    2.62   53.79  48.71  35.04  65.57

  • Multiple references matter. A single target underestimates the space of acceptable simplifications.
  • Few-shot helps but does not solve the gap. Examples calibrate style without erasing resource imbalance.
  • ADD is hardest. Models can delete redundant content, but low-resource languages make helpful additions difficult.
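
The ADD gap in the last point can be read directly from the tables. As a rough sketch, the snippet below averages the 0-shot and 5-shot ADD scores over the eight models, using values copied from Table 3 (English) and Table 7 (Pashto); it only summarizes the reported numbers and does not recompute anything from model outputs.

```python
from statistics import mean

# ADD scores copied from Tables 3 and 7, in the tables' model order:
# Aya, Cmd-R, DeepSeek, EuroLLM, Gemma, LLaMA, Mistral, Qwen
ADD = {
    "English": {
        "0-shot": [9.32, 9.69, 7.03, 9.32, 5.24, 6.48, 8.56, 8.70],
        "5-shot": [10.18, 11.91, 9.41, 11.63, 9.19, 9.93, 11.61, 10.88],
    },
    "Pashto": {
        "0-shot": [0.62, 0.75, 0.52, 0.50, 3.84, 0.70, 0.94, 2.34],
        "5-shot": [1.77, 0.70, 0.91, 0.78, 5.39, 1.96, 1.51, 2.62],
    },
}

# Average over models for each language and prompt setting.
mean_add = {
    lang: {setting: mean(scores) for setting, scores in by_setting.items()}
    for lang, by_setting in ADD.items()
}

for lang, by_setting in mean_add.items():
    gain = by_setting["5-shot"] - by_setting["0-shot"]
    print(
        f"{lang}: mean ADD {by_setting['0-shot']:.2f} (0-shot) -> "
        f"{by_setting['5-shot']:.2f} (5-shot), gain {gain:+.2f}"
    )
```

On these numbers the average ADD score for Pashto stays below 2 even with five demonstrations, while English starts at about 8 with none.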

A reminder for multilingual AI

A reproducible benchmark for low-resource simplification. Thai, Pashto, and Tamil had almost no reusable sentence-simplification data.
Built for readers, not only leaderboards. Simplification affects the readability of public notices, education materials, and health information.
A reality check on English-only extrapolation. English performance does not imply multilingual simplification capability; OasisSimp makes the gap measurable.

Download the dataset

The project page hosts the data and evaluation details for multilingual NLP and accessibility research.
