Every AI brand visibility audit ever run, regardless of vendor or methodology, is trying to answer some combination of three questions.
- Recognition. Do the models know your brand exists?
- Recall. When buyers ask category-level questions, do the models surface you?
- Reality. When the models describe you, do they get it right?
Each question has a different remedy. Tools that collapse the three into a single composite make it impossible to tell which problem you actually have. A useful audit reports each separately.
This post is a working frame for interpreting audit results — what each question asks, why the three are related but not interchangeable, and what separates a report that helps from a report that only impresses.
Question one: Recognition
What it asks
Given the direct name of your brand, does the model produce a coherent, accurate identification of the company, its category, and its core offering?
Example prompt: "What is [Brand]?"
What "yes" looks like
- The model returns a paragraph that correctly identifies what you do.
- It names your category, audience, and one or two core product attributes.
- It does not confuse you with a differently-named company.
- It does not say "I'm not familiar with that company."
What "no" looks like
- The model returns "I don't have information about that."
- The model returns information about a different company with a similar name.
- The model returns something wildly out of date — a pre-pivot description, a legacy product positioning, or an old founding team.
Why this question matters first
Recognition is foundational. A brand the model cannot identify cannot be described, compared, or cited. If Recognition is broken, every other dimension collapses beneath it. Fix Recognition first.
What moves it
Recognition is driven primarily by parametric memory. The signals that feed it most heavily:
- A Wikipedia entry (or category entry that names your brand).
- Consistent coverage in industry publications.
- Structured presence on review sites and directories (G2, Capterra, Crunchbase, LinkedIn).
- A distinctive brand name that does not collide with unrelated entities.
Movement here is slow. A new Wikipedia entry or a wave of coverage typically takes one to three training cycles before it shows up in a model's parametric memory.
A common misreading
A common mistake is to check Recognition by asking the model "do you know what [Brand] is?", a leading question that sometimes produces false confirmation. Better: ask the model to describe the brand, then evaluate whether the description is real or hallucinated. If the model asserts features that do not exist, that is a Recognition-adjacent problem (the model thinks it knows, but is wrong) that often rolls up under the third question, Reality.
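Here is a minimal sketch of that healthier check, in Python. The `ask_model` function is a stand-in for whatever provider client you use, and the refusal phrases and ground-truth facts are illustrative, not exhaustive:

```python
REFUSALS = (
    "i don't have information",
    "i'm not familiar",
    "i am not aware of",
)

def check_recognition(ask_model, brand: str, ground_truth: list[str]) -> dict:
    """Ask for a description (not a yes/no) and grade it against known facts."""
    answer = ask_model(f"Describe {brand}: what it does, who it serves, its category.")
    lowered = answer.lower()
    if any(phrase in lowered for phrase in REFUSALS):
        return {"recognized": False, "facts_hit": 0, "answer": answer}
    # Count which ground-truth facts the description actually mentions.
    hits = sum(1 for fact in ground_truth if fact.lower() in lowered)
    return {"recognized": True, "facts_hit": hits, "answer": answer}
```

A description that clears the refusal check but mentions none of the known facts is exactly the "thinks it knows, but is wrong" case.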
Question two: Recall
What it asks
When the user asks a category-level question without naming your brand, does the model include you in the answer?
Example prompts: "What are the best tools for X?", "I'm a [persona] looking for [outcome] — what should I consider?", "Which platforms should I evaluate for Y?"
What "yes" looks like
- The model lists your brand among the top few in its answer.
- When persona-specific prompts are used, the model surfaces you in answers targeted at the personas you actually serve.
- Your brand is named across multiple phrasings of the same underlying question — not just the one prompt you happened to test. (A scripted version of this check follows these lists.)
What "no" looks like
- The model names five competitors and omits you.
- The model names you only when the prompt is narrowly tailored to your exact niche.
- The model places you in the answer but in a low-prominence position at the end of a long list.
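Both the presence check and the position check are scriptable. A minimal sketch, reusing the same `ask_model` stand-in; the phrasings are taken from the example prompts above:

```python
def check_recall(ask_model, brand: str, phrasings: list[str]) -> list[dict]:
    """Run several phrasings of the category question; record presence and prominence."""
    results = []
    for prompt in phrasings:
        answer = ask_model(prompt)
        pos = answer.lower().find(brand.lower())
        results.append({
            "prompt": prompt,
            "mentioned": pos != -1,
            # Rough prominence: how far into the answer the brand first appears.
            "position_pct": None if pos == -1 else round(100 * pos / max(len(answer), 1)),
        })
    return results

phrasings = [
    "What are the best tools for X?",
    "I'm a [persona] looking for [outcome] — what should I consider?",
    "Which platforms should I evaluate for Y?",
]
```

A brand that is mentioned in one phrasing but at `position_pct` 90 is the low-prominence failure mode: present, but at the tail of a long list.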
Why this question is the hardest
Recall is the test closest to what a real buyer experiences. A buyer who does not yet know your name asks about the category. If the model does not surface you, you are not in the shortlist; you are not even in the awareness set.
Recall is also where most brands discover the largest gap between how they see themselves and how the models describe the category. A brand with strong Recognition (the model knows the name) can still have weak Recall (the model does not volunteer the name unprompted). These are independent scores.
What moves it
- Presence in the third-party lists and comparison articles that rank highly for category queries.
- Category-level Wikipedia coverage that names your brand as an example.
- Analyst reports that place you in the competitive set.
- Keyword alignment between your positioning and the phrasings buyers actually use when asking the question.
- Sustained community discussion (Reddit, vertical forums) that associates your brand with the category.
Recall responds to a different signal set than Recognition. A brand can have a strong Wikipedia entry (good for Recognition) but be missing from every "top 10" article for its category (bad for Recall). The two problems require two investments.
For more on the relative metric that quantifies Recall at the category level, see Share of Model: What Share of Voice Becomes in the LLM Era.
Question three: Reality
What it asks
When the model describes your brand — whether prompted by name or by category — does it describe you accurately, with the framing you would recognize as fair?
Example prompts: "Describe [Brand]'s product, audience, and pricing.", "What do users say about [Brand]?", "How is [Brand] different from [Competitor]?"
What "yes" looks like
- The factual claims are correct: founding year, product category, audience, pricing structure, key features.
- The positioning matches your actual positioning (not a version you dropped two years ago).
- The tone is fair and specific — notes strengths, handles known weaknesses without exaggerating them.
- Differentiation from competitors is described in terms you would use yourself.
What "no" looks like
- Features described that do not exist (hallucination).
- Outdated positioning, tagline, or offering carried over from a prior version of the brand.
- Incorrect pricing, founding year, or geographic focus.
- Tone is flat or dismissive — "one of many tools in this space."
- Comparisons frame you against the wrong competitors, or describe differences in terms that favor a rival.
Why this is the slipperiest question
Reality has three related sub-dimensions that often get conflated:
- Accuracy — are the facts right?
- Currency — is the description reflecting your current state or a prior one?
- Framing — is the tone favorable, neutral, or negative, and how does it compare to how competitors are framed?
A brand can score well on accuracy and currency but poorly on framing. Or it can be described with current, flattering language but get one critical fact wrong (pricing, for example) that undermines buyer research. The sub-dimensions respond to different fixes.
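One way to keep the sub-dimensions from being conflated is to never average them in the first place. A sketch of that idea in Python; the field names and the 0–100 scale are assumptions, not a vendor's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RealityScore:
    accuracy: int  # are the facts right?
    currency: int  # current state, or a prior one?
    framing: int   # fair tone, relative to how competitors are framed?

    def weakest(self) -> str:
        """Name the sub-dimension that needs attention first."""
        scores = {"accuracy": self.accuracy, "currency": self.currency,
                  "framing": self.framing}
        return min(scores, key=scores.get)

# RealityScore(accuracy=90, currency=85, framing=40).weakest() -> "framing"
```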
What moves it
- Currency of your owned surfaces — website, about page, product pages — across all the places a model might retrieve.
- Review site hygiene — G2, Capterra, Trustpilot with recent, accurate reviews.
- Community presence and sentiment — Reddit, HN, vertical communities shaping qualitative framing.
- Press coverage that uses specific, accurate language. Generic coverage does less than coverage that quotes specific product claims.
- Consistency of narrative — if every source describes you in the same precise terms, the model's synthesis is tight. If each source says something slightly different, the synthesis blurs.
Reality is the dimension where sustained brand work pays off longest. A brand that has invested in coherent, defensible positioning for several years has a Reality score that competitors cannot catch up to quickly.
Why three questions, not one
It is tempting to reduce everything to a single number. "Our AI visibility is 63/100." The reduction feels clean. It also hides the diagnostic information that the number was built from.
Consider three hypothetical brands, each scoring 63/100 in composite:
- Brand A: Recognition 90, Recall 40, Reality 60. The models know the brand, describe it fairly, but do not surface it on category queries. The remedy is Recall work — third-party listicles, category-level content, analyst coverage.
- Brand B: Recognition 50, Recall 70, Reality 70. The models surface the brand on category queries and describe it well, but are slow to recognize it by name directly. This is unusual and typically reflects a fresh rebrand; the remedy is to reinforce the new name in parametric sources (Wikipedia, press, LinkedIn).
- Brand C: Recognition 85, Recall 80, Reality 25. The models recognize and surface the brand reliably but describe it with outdated or wrong information. This is urgent — every AI answer is actively hurting the brand. The remedy is a surgical, current-positioning refresh across owned, earned, and reviewed surfaces.
Three identical composite scores. Three completely different briefs. A tool that reports only the composite cannot tell you which you are.
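The arithmetic makes the point concrete. Scoring the three hypothetical brands with a plain equal-weight average (an assumption; real tools weight dimensions differently) produces the same composite for all three:

```python
brands = {
    "A": {"recognition": 90, "recall": 40, "reality": 60},
    "B": {"recognition": 50, "recall": 70, "reality": 70},
    "C": {"recognition": 85, "recall": 80, "reality": 25},
}

for name, scores in brands.items():
    composite = round(sum(scores.values()) / 3)
    print(name, composite, scores)
# Every brand prints a composite of 63; the underlying problems,
# and the remedies, are completely different.
```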
How this maps to the six dimensions
The three questions map onto the six dimensions of AI brand visibility roughly as follows:
| Question | Primarily measured by |
|---|---|
| Recognition | Recognition (25 pts), AI Discoverability (25 pts) |
| Recall | Contextual Recall (15 pts), Competitive Context (25 pts) |
| Reality | Knowledge Depth (30 pts), Sentiment & Authority (30 pts) |
The six-dimension breakdown is the detailed view; the three questions are the interpretive view. Both are useful. The six dimensions tell you the score. The three questions tell you what the score means.
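Rolling the detailed view up into the interpretive view is mechanical once the mapping is fixed. A sketch using the point weights from the table above; the dictionary keys are assumed names, not a published schema:

```python
QUESTION_MAP = {
    "Recognition": ("recognition", "ai_discoverability"),    # 25 + 25 pts
    "Recall": ("contextual_recall", "competitive_context"),  # 15 + 25 pts
    "Reality": ("knowledge_depth", "sentiment_authority"),   # 30 + 30 pts
}

def rollup(dim_scores: dict[str, float]) -> dict[str, float]:
    """Sum the earned dimension points under each of the three questions."""
    return {q: sum(dim_scores[d] for d in dims)
            for q, dims in QUESTION_MAP.items()}
```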
The audit interpretation checklist
When you receive an audit report, run through this before drawing conclusions:
- Read the Recognition score across all five providers. Any provider scoring significantly lower than the others indicates a parametric gap specific to that model's training data.
- Read the Recall numbers with the category composition in mind. Is your category crowded (in which case moderate Recall is normal) or concentrated (in which case moderate Recall is a problem)?
- Read the Reality numbers with the qualitative samples in hand. Do not trust a score in isolation. Read five actual answers per provider and form your own view. The score summary abstracts away from the text.
- Compare across providers. A score strong on Claude/DeepSeek and weak on ChatGPT/Gemini is a training-vs-retrieval split. A score weak on everything is a foundation problem.
- Compare across time. An audit at one point is a snapshot. The useful signal is the trend line across weeks or months (a minimal sketch of this view follows the checklist).
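Here is that sketch of the per-provider, per-time comparison. The provider names follow this post; the snapshot scores are invented for illustration:

```python
snapshots = {
    "2025-01": {"chatgpt": 55, "claude": 70, "gemini": 52, "grok": 60, "deepseek": 68},
    "2025-03": {"chatgpt": 62, "claude": 71, "gemini": 59, "grok": 63, "deepseek": 69},
}

months = sorted(snapshots)
first, last = snapshots[months[0]], snapshots[months[-1]]
for provider in first:
    delta = last[provider] - first[provider]
    print(f"{provider:>9}: {first[provider]} -> {last[provider]} ({delta:+d})")
# Retrieval-backed providers moving while parametric-only ones stay flat
# suggests the change came from fresh web surfaces, not new training data.
```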
If the audit tool you are using does not allow this kind of breakdown — per-dimension, per-provider, per-time — it is giving you a decorative number.
What to do with the answers
If Recognition is the problem
Invest in the long-signal surfaces: Wikipedia, category entries, press coverage, structured review profiles, consistent brand naming. Expect 2–4 quarters before the fix shows up in parametric memory. In the meantime, expect retrieval-enabled providers (ChatGPT, Gemini) to produce better results than parametric-only ones (Claude, DeepSeek) — because retrieval can paper over weak parametric memory with fresh lookups.
If Recall is the problem
Invest in category-level presence: get into the "top tools for X" articles, earn analyst coverage, publish your own category-framing content, participate in the communities that discuss your category. Expect faster movement than Recognition fixes — retrieval backends pick up new rankings within weeks.
If Reality is the problem
Move urgently. Every day the model keeps describing you wrong is another buyer forming a wrong impression. Audit your owned surfaces for consistency, refresh your review profiles, update stale press resources, and address specific factual errors where you can — including direct correction channels where providers accept them (some AI systems allow brand owners to flag errors through official processes; check each provider's current policy).
The takeaway
Three questions. Recognition: do they know you? Recall: do they surface you? Reality: do they describe you correctly? Every meaningful audit answers all three separately. A tool that collapses them into one number is giving you less information, packaged to look like more.
If you want a structured read on all three questions, across five providers, with a stable prompt set, you can run a free audit in about two minutes — seven-day trial, no credit card.
See how AI describes your brand
BrandGEO runs structured prompts across ChatGPT, Claude, Gemini, Grok, and DeepSeek — and scores your brand across six dimensions. Two minutes, no credit card.