SafeGEO - Coolwei AI Lab

More and more purchase decisions now start with a question to an AI assistant: “Which air purifier is worth buying?” Sellers have noticed — web copy is increasingly written for AI readers, a practice with a name: Generative Engine Optimization (GEO), the SEO of the AI era. SafeGEO measures the question that follows: when a genuinely flawed product dresses its page up as an “independent buyer guide”, can the agent recommending on the user’s behalf hold its judgment?

SafeGEO overview: seller-controlled GEO sources influence recommendation agents and push flawed target products into recommendation sets. — Paper overview. SafeGEO targets the recommendation pipeline: GEO text shifts the evidence balance the agent sees, pushing flawed target products into recommendation sets.

When product pages are written for AI readers

Recommendation agents (AI systems that search, compare, and advise on the user’s behalf) are becoming a new entry point for commerce and content platforms. Their judgment rests on what they can read online — product pages, reviews, FAQs; and seller-controlled sources such as product pages (material written by the sellers themselves) carry a built-in incentive to be optimized, or manipulated.

The story in the headline deserves unpacking. In the era of paid search ranking, the script — platform optimized by sellers, price paid by users — played out once already: patients in urgent need were steered toward the highest bidder, not the best doctor. With the entry point shifting from the search box to AI assistants, the same contest moves to a new arena — what decides the ranking is no longer keyword bids, but the “evidence” an AI reads.

GEO is not inherently harmful: a clearer page helps humans and AI alike. The problem is the boundary — once a rewrite starts hiding flaws, fabricating reputation, or impersonating an “independent review”, the evidence the agent reads is systematically polluted. SafeGEO is the first controlled measurement of this risk: how far attacks can push recommendations away from the user’s interest, and how much existing defenses recover.

A controlled testbed

SafeGEO covers six evidence-grounded product verticals: AI meeting transcription tools, baby monitors, carry-on backpacks, home air purifiers, noise-canceling headphones, and office chairs. Each case keeps candidate products, canonical attributes, non-target evidence, and hidden utility labels (ground-truth quality annotations known to the evaluator but never shown to the model) fixed while rewriting exactly one seller-controlled source — so any change in the recommendation can be cleanly attributed to that single rewrite.

Recommendation cases	600
Avg. candidates per case	19.96
GEO targets per case	3
Attack variants	22
Total samples	40,800
Evaluation metrics	Target@3, HCV@1, GT@3, uNDCG@5

Benchmark statistics. Each base case expands into control and attack conditions so recommendation changes can be attributed to one rewritten seller source.

Evaluation uses four metrics: Target@3 (how often the attacked, flawed product enters the top three recommendations), HCV@1 (how often it takes the top slot), GT@3 (how often genuinely good products remain in the top three), and uNDCG@5 (how well the top five match the user’s true utility). The 40,800 materialized samples (concrete evaluation instances expanded from each case under different attack conditions) keep every condition pairwise comparable.

SafeGEO dataset construction pipeline: recommendation scenarios, candidate set, hidden utility and evidence annotations, GEO attack construction, instance materialization, and evaluation. — Dataset construction. Candidate sets, hidden utility, and non-target evidence stay fixed; GEO rewrites one target product source at a time.

Condition	Average source-text length
No GEO	3,911 [3,901, 3,921]
Truthful-rewrite	3,905 [3,895, 3,915]
Avg. GEO, 22 variants	3,925 [3,924, 3,926]

Source-length control. GEO and control contexts are closely matched in length, so the observed harm is not explained by simply giving the model more text.

How far attacks go

Experiments show that GEO attacks can substantially promote flawed target products. Realistic seller-facing variants are especially strong: they package false fit, evidence padding, and salience manipulation into one plausible-looking seller document, rather than mechanically stacking keywords. On DeepSeek-V4-Flash, the flawed product enters the top three only 6.2% of the time with no attack; under the realistic “selective comparison note” variant, that rises to 82.3%.

Target@3 uplift bar chart across 22 GEO attack variants, with realistic variants producing especially high uplift. — Target@3 uplift across the 22 attack variants. Realistic variants are strongest overall, suggesting that coherent seller-facing source realization matters more than mechanical primitive stacking.

Representative realistic GEO attack results (DeepSeek-V4-Flash)
Setting	Target@3	Δ	HCV@1	Δ	GT@3	Δ	uNDCG@5	Δ
No GEO	6.2	--	24.5	--	66.7	--	77.0	--
Truthful-rewrite control	4.6	--	23.0	--	67.7	--	78.8	--
Caveat-buried FAQ	77.5	+72.9	76.2	+53.2	57.7	-10.0	66.3	-12.5
Popularity-heavy profile	71.2	+66.6	71.4	+48.4	57.6	-10.1	67.3	-11.5
Citation-padded note	78.7	+74.1	78.4	+55.4	58.1	-9.6	66.2	-12.7
Independent buyer guide	77.9	+73.3	77.3	+54.3	56.5	-11.2	66.0	-12.9
False-fit checklist	79.1	+74.6	78.4	+55.4	57.7	-9.9	66.1	-12.7
Selective comparison note	82.3	+77.7	81.8	+58.8	56.9	-10.8	65.4	-13.5
Avg. realistic	72.6	+68.0	73.4	+50.4	57.7	-10.0	66.9	-11.9

Realistic variants raise Target@3 and HCV@1 while degrading utility-quality metrics.

Mechanistically, an attack succeeds almost exactly to the extent that it hijacks the agent’s citations: the more the model’s citations are steered toward misleading lines, the higher the flawed product ranks — a correlation of r=0.91.

Scatter plot showing misleading GEO-line citation rate versus Target@3, with a strong positive relationship. — Mechanism analysis. Variants that redirect citations toward misleading GEO lines also achieve higher target placement; the paper reports r=0.91.

How much defenses recover

Simple mitigations help, but they are not enough. Defensive prompting (explicitly instructing the agent to watch for marketing manipulation) reduces harmful promotion; evidence breakdown (requiring the agent to list supported, missing, and conflicting evidence for each candidate before ranking) is strongest, cutting Target@3 by 39.2 percentage points on Qwen3.6 27B. Even the strongest defense, though, does not restore no-attack behavior.

Heatmap of Target@3 reduction for mitigation layers on Gemma 4 31B IT across realistic attack variants. — Variant-level mitigation effects. L3 Evidence sheet is the broadest and strongest layer; prompt-only defenses help, but unevenly.

Mitigation results on the same attacked instances (excerpt)
Model	Mitigation	Target@3	Δ	HCV@1	Δ	GT@3	Δ	uNDCG@5	Δ
Gemma 4 31B IT	No mitigation	79.6	--	75.6	--	67.9	--	68.6	--
Gemma 4 31B IT	Defensive prompt	64.5	-15.1	60.8	-14.8	69.3	+1.3	72.6	+4.0
Gemma 4 31B IT	Evidence breakdown	49.9	-29.7	46.6	-29.1	69.5	+1.6	74.4	+5.7
Qwen3.6 27B	No mitigation	78.3	--	83.7	--	60.8	--	63.6	--
Qwen3.6 27B	Defensive prompt	67.3	-11.0	66.2	-17.5	68.5	+7.6	73.4	+9.8
Qwen3.6 27B	Evidence breakdown	39.1	-39.2	42.1	-41.6	69.7	+8.8	77.4	+13.9
Devstral Small 2	No mitigation	90.9	--	90.7	--	47.9	--	59.2	--
Devstral Small 2	Evidence breakdown	73.2	-17.7	78.9	-11.8	43.4	-4.5	56.3	-2.8

Evidence breakdown is usually the strongest mitigation, but it still does not restore no-GEO behavior.

A caution for agent safety

Makes GEO risk measurable.The work grounds visibility optimization in concrete recommendation choices.

Focuses on seller-controlled evidence.It studies information sources that can realistically be optimized or manipulated.

Shows simple defenses are incomplete.Prompting and evidence checks help, but GEO remains a serious agent-safety risk.