Your AI Rank Is Noise. Your Appearance Rate Isn’t.

Key points of this article

Across 2,961 AI queries, individual rankings almost never reproduced
What works in practice is the appearance rate from repeated measurement

#AI Visibility
#Brand Measurement
#ChatGPT
#GEO
#Appearance Rate

You open an AI-visibility tool, type the same question twice, and get two different rankings. Third place, then seventh. So the obvious thought is: how am I supposed to make a decision off a number that won’t sit still?

I think that worry is exactly right — and also the start of something useful. AI visibility is a young field, and the dashboards all look equally polished, which is the actual problem. The trustworthy numbers and the dangerous ones render in the same font.

So let’s line up three independent studies and sort the numbers into two piles: the ones you can lean on, and the ones that’ll quietly steer you wrong.

2,961 Queries Later: Rank Is Noise, Rate Is Signal

SparkToro and Gumshoe ran the test nobody wants to run by hand. 600 volunteers asked three different AIs for brand recommendations, 2,961 times in total. Two findings matter here.

The exact recommendation list reproduced less than 1% of the time, and the order matched about 0.1% of the time.
But the appearance rate — how often a brand showed up across all those tries — held steady.

So a single ranking is a die roll. If your brand is third today and seventh tomorrow, you haven’t measured a decline; you’ve recorded the weather. Stack those numbers in a slide deck and you’re presenting noise with a straight face.

The rate is a different kind of thing. When a high-visibility brand turns up in 97% of 71 tries, that’s not luck — that’s a number you can set next to last month’s number and actually mean something by the comparison.
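To make "appearance rate" concrete, here is a minimal Python sketch. It assumes you have logged each AI answer as a list of brand names; the brand names, the function names, and the choice of a Wilson score interval for the uncertainty are all illustrative assumptions, not the method used in the SparkToro and Gumshoe study.

```python
import math

def appearance_rate(runs, brand):
    """Count the runs in which `brand` appears at all, ignoring its position."""
    hits = sum(1 for recommended in runs if brand in recommended)
    return hits, len(runs)

def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion hits/n."""
    if n == 0:
        raise ValueError("need at least one run")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical data: each inner list is one AI answer, in the order given.
runs = [
    ["Acme", "Globex", "Initech"],
    ["Globex", "Acme"],
    ["Initech", "Acme", "Globex"],
    ["Globex", "Initech"],
]

hits, n = appearance_rate(runs, "Acme")
low, high = wilson_interval(hits, n)
print(f"Acme appeared in {hits}/{n} runs ({hits / n:.0%}), "
      f"95% interval {low:.0%} to {high:.0%}")
```

Note that Acme's position changes from run to run and the function never looks at it. If you assume the 97% above corresponds to 69 appearances in 71 tries (the exact count is my assumption), `wilson_interval(69, 71)` comes out at roughly 90% to 99%: a range, but a narrow one you can compare month to month.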

So the first rule writes itself: ignore the rank, track the rate.

Estimated Prompts Are Wrong, But Useful Anyway

Here’s the next number people fret over. Most tools don’t watch real users type; they estimate the prompts on your behalf. So how far off are the estimates?

Otterly AI looked at exactly this. Real prompts averaged 15.1 words; estimated prompts averaged 8.8. Real ones carry more context and more first-person framing; estimated ones run short and skew commercial. Different animals, basically.

But — and this is the part that saves the metric — the brand rankings came out “quite similar” across both. So an estimated prompt isn’t reality. It’s a rough map of reality, good enough to see where you sit against a competitor. Use it to read relative position, not to believe the absolute wording.

“Prompt Search Volume” Looks Precise. That’s The Trap.

Now the dangerous one. A critical review from jaeckert-odaniel.com takes apart “prompt search volume” — the metric that hands you a clean-looking number for how often people supposedly type a given prompt.

The number looks precise.

The ground underneath it mostly isn’t.

The source data leans on desktop Chrome panels and misses a lot of mobile use.
The panel skews tech-heavy, so it doesn’t represent the real market (i.e., the sample isn’t the population).
And small samples get extrapolated into big claims, where the error compounds into false precision.
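The extrapolation problem in that last point can be sketched with a little arithmetic. The snippet below scales a panel count up to a whole population and shows the range that sampling error alone allows. Every number in it is hypothetical, and the Wilson interval is my choice of method, not something the review prescribes.

```python
import math

def extrapolate(panel_hits, panel_size, population, z=1.96):
    """Scale a panel proportion up to a population, with a ~95% Wilson range."""
    p = panel_hits / panel_size
    denom = 1 + z * z / panel_size
    center = (p + z * z / (2 * panel_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / panel_size + z * z / (4 * panel_size ** 2)
    ) / denom
    low = max(0.0, center - half)
    high = min(1.0, center + half)
    return p * population, low * population, high * population

# Hypothetical numbers, not taken from any real tool or panel:
# 3 of 5,000 panelists typed the prompt, scaled to 10 million users.
point, low, high = extrapolate(panel_hits=3, panel_size=5_000,
                               population=10_000_000)
print(f"point estimate: {point:,.0f}")
print(f"range from sampling error alone: {low:,.0f} to {high:,.0f}")
```

The dashboard would show the point estimate as one clean figure, while the range around it spans several multiples. And this only covers sampling error. The desktop and tech-heavy skew described above is a bias that no interval computed from the panel can capture.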

So this is a hypothesis tool wearing a KPI costume. Fine for “huh, maybe people ask about X” — dangerous the moment you let it move a budget.

What To Check Before You Trust The Dashboard

Open any AI-visibility tool and your eyes go straight to the rank and the score. But the two questions that actually decide whether a number is real are quieter ones: how many repetitions is this built on, and how is the appearance rate defined? Leave those fuzzy and the prettiest dashboard in the world is still guessing.

So look for the method. Does the tool publish it, the way SparkToro did? Is it averaging over at least a few dozen runs, or showing you one die roll dressed up as a trend? And whatever you do, don’t promote “prompt search volume” to a headline KPI — read it next to rank trends and first-party data, as a hint, not a verdict.
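Why a few dozen runs and not one? A rough way to see it is to ask whether the gap between two months' appearance rates is bigger than the noise you'd expect from the run count. The sketch below uses a pooled two-proportion z-test, which is my choice for illustration and not something any of the cited studies prescribe. The run counts are hypothetical.

```python
import math

def rate_change_z(hits_a, n_a, hits_b, n_b):
    """z-score for the change between two appearance rates (pooled test)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both rates are exactly 0% or 100%, so there is no change to test.
        return 0.0
    return (p_b - p_a) / se

# Hypothetical: 30 runs last month, 30 runs this month.
z_small = rate_change_z(18, 30, 15, 30)  # 60% down to 50%
z_large = rate_change_z(24, 30, 12, 30)  # 80% down to 40%

for label, z in [("60% to 50%", z_small), ("80% to 40%", z_large)]:
    verdict = "likely a real move" if abs(z) > 1.96 else "within the noise"
    print(f"{label}: z = {z:.2f}, {verdict}")
```

With 30 runs a side, a ten-point drop still sits inside the noise, while a forty-point drop does not. The 1.96 cutoff is the conventional 95% threshold. Treat it as a rule of thumb for reading the dashboard, not a guarantee.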

The honest version of AI visibility isn’t “what’s my rank today.” It’s “across enough tries, how often does the AI bring me up, and is that share moving.” That’s the number worth watching. The rest just renders nicely.

Sources

Rand Fishkin (SparkToro/Gumshoe), “NEW Research: AIs are highly inconsistent when recommending brands or products”, 2026-01-27, sparktoro.com
Thomas Peham (Otterly AI), “Real vs Estimated Prompts: I Analyzed 100s of Real ChatGPT Queries”, 2026-02-03, otterly.ai
jaeckert-odaniel.com, “Prompt search volume: Real data or all guessed?”, 2025-12-16, jaeckert-odaniel.com