Running a quick manual check across the five major LLM providers is the first thing anyone investigating AI visibility should do. It will not replace a systematic audit — you will not get stable scores, cross-provider comparability, or per-dimension recommendations — but it will tell you whether you have a problem worth measuring. Most founders and marketing leads can run this diagnostic in a single ninety-minute sitting and come out with a clear picture of where to focus.
This post gives you the prompts, the structure for running them, and the interpretation framework.
Why Manual Diagnostics Are Useful Despite Being Noisy
A single prompt to an LLM produces a noisy answer. Rerun the same prompt and you will get a different answer. Run it on a different model and the answers diverge further. This is the core problem BrandGEO solves with structured scoring across 30 checks and 5 providers: noise averaged down to signal.
But manual diagnostics still have value for two reasons. First, they show you what actual users will see when they ask the model about you. Even a single bad answer is a real user experience. Second, they surface the biggest issues fast: missing brand entirely, wrong category, outdated positioning, hallucinated pricing. Those do not require structured scoring to detect. You will see them the first time you ask.
The discipline is to run the prompts systematically, record the answers, and not over-interpret a single run. Patterns that appear across three of five providers are real signals. Single-provider oddities might be noise.
The Eight Diagnostic Prompts
Run each of these in ChatGPT, Claude, Gemini, Grok, and DeepSeek. Record the answers in a shared document. Eight prompts × five providers = forty answers. Budget ninety minutes the first time, fifteen on subsequent runs once you have the template.
Prompt 1: Direct brand knowledge
What do you know about [Your Brand Name]?
This tests Recognition and Knowledge Depth at the most basic level. Watch for:
- Whether the model knows you exist at all.
- Whether the description matches your current positioning.
- Whether the company facts (founding year, location, founders) are correct.
- Whether the product description is current or refers to a past version.
- Whether the model confuses you with another company of a similar name.
Red flags: complete non-recognition on two or more providers, confidently wrong facts on any provider, or outdated positioning across all five. Any of these indicates a Knowledge Depth problem that will take months to correct.
Prompt 2: Category-level query
What are the top [your category] tools / services / brands in 2026?
This tests Contextual Recall. You are asking the model to generate a category list without prompting it with your name. If you are not in the response, the model does not associate you with your category strongly enough to surface you when buyers ask about the category.
Red flags: missing from the list on three or more providers. This is arguably the most expensive failure mode because it means your brand is invisible at the exact moment buyers are researching. Being missing on one or two providers is worth addressing; being missing on four or five is a five-alarm problem.
Prompt 3: Use-case query
I am looking for a [category] tool for [specific use case relevant to your product]. What do you recommend?
This tests whether your positioning aligns with specific buyer intents. If the model recommends competitors and not you for a use case you think you serve well, there is a mismatch between how you describe yourself and how the model has learned to describe your product.
Red flags: the model recommends competitors for use cases you believe are your strongest fit. This signals that your on-site positioning or your external mentions have drifted away from those use cases.
Prompt 4: Comparison query
Compare [Your Brand] to [Primary Competitor].
This tests Competitive Context. Watch for:
- Is the comparison accurate?
- Does the model correctly identify your differentiation?
- Is the tone even-handed, or does it favor the competitor?
- Does the model mention your strongest features at all?
Red flags: the model describes the competitor more favorably, omits your differentiation, or inverts the comparison (describing your strength as the competitor's). Any of these is a Competitive Context problem.
Prompt 5: Sentiment query
What do users think about [Your Brand]? What are common complaints or praise?
This tests Sentiment & Authority. The model pulls from reviews, Reddit, forums, and social — summarizing what the distributed internet says about you. Watch for:
- Is the sentiment summary broadly accurate?
- Are there hallucinated complaints (the model invents issues that do not exist)?
- Are there real complaints you were unaware of?
- Are your strengths correctly captured?
Red flags: confidently hallucinated negative claims. These are harder to fix than real negative feedback because they have no source to address. You have to flood the zone with accurate contrasting information over time.
Prompt 6: Recency query
What has [Your Brand] shipped or announced recently? What are their latest products?
This tests whether the model's knowledge reflects your current state or a stale snapshot. For training-data-only providers, expect some lag. For search-augmented providers, the answer should be reasonably current.
Red flags: the model's "latest" news about you is from eighteen months ago even on search-augmented providers. This suggests your recent announcements are not being indexed effectively.
Prompt 7: Founder and leadership
Who founded [Your Brand]? Who is the current CEO?
This tests Recognition of your specific people. It is often where the most embarrassing errors show up — wrong founder names, outdated leadership, confusion between your company and another.
Red flags: confidently wrong answers. Easily fixable through a cleaner Person schema on your leadership pages and better cross-referencing in your press coverage.
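For reference, Person markup is a small JSON-LD block. Here is a minimal sketch, written in Python to match the scripting sketch later in the post; every value is a placeholder, and the serialized output belongs inside a `<script type="application/ld+json">` tag on your leadership page.

```python
# Minimal sketch of schema.org Person markup for a leadership page.
# Every value below is a placeholder -- substitute your real people
# and profile URLs. Embed the printed JSON in a
# <script type="application/ld+json"> tag.
import json

person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Founder",
    "jobTitle": "CEO",
    "worksFor": {"@type": "Organization", "name": "Your Brand"},
    # sameAs links give models unambiguous cross-references
    "sameAs": [
        "https://www.linkedin.com/in/...",
        "https://en.wikipedia.org/wiki/...",
    ],
}

print(json.dumps(person, indent=2))
```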
Prompt 8: Reverse identification
I am using a tool that [describe two or three of your specific features in plain English]. What tool might this be?
This tests AI Discoverability in a specific way — can the model reverse-engineer your product from its description? If it correctly names your product, your feature positioning is well-indexed. If it names a competitor or says "I cannot identify a specific tool from this description," your feature descriptions are either too generic or not well-associated with your brand.
Red flags: the model names a competitor. This means your positioning is similar enough to the competitor's that the model defaults to their name.
Running the Diagnostic Systematically
The pragmatic process:
- Open five tabs: one for each provider. Use a clean state (incognito mode, or a fresh chat) to avoid prior context bleeding in.
- Prepare a spreadsheet with eight rows (prompts) and five columns (providers), plus a "notes" column.
- Run each prompt identically across all five providers before moving to the next prompt. This is important — do not run all eight prompts in one provider, then move on. Consistent ordering helps you compare.
- Record the key findings, not the full response. For each cell, write a short summary: "recognized, positioning current, founding year wrong" or "not recognized" or "confused with Beta Corp."
- After all forty cells are filled, look for patterns; the big ones will jump out. (If you would rather script the grid run, see the sketch below.)
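A scripted variant, as a minimal sketch: it covers only the providers with OpenAI-compatible APIs (ChatGPT, Grok, and DeepSeek; Claude and Gemini need their own SDKs), and the base URLs and model names here are illustrative, so check each provider's current docs before running it. Keep in mind that API-served models can behave differently from the consumer chat apps (for example, no search augmentation by default), so treat a scripted run as a complement to the tab-based one, not a replacement.

```python
# Minimal sketch: run the eight diagnostic prompts against the
# providers that expose OpenAI-compatible APIs and collect the raw
# answers into a CSV grid. Base URLs and model names are
# illustrative placeholders -- check current provider docs.
import csv
import os
from openai import OpenAI  # pip install openai

BRAND = "Your Brand"            # fill in
CATEGORY = "your category"      # fill in
COMPETITOR = "Primary Competitor"

PROMPTS = {
    "direct": f"What do you know about {BRAND}?",
    "category": f"What are the top {CATEGORY} tools / services / brands in 2026?",
    "use case": f"I am looking for a {CATEGORY} tool for [specific use case]. What do you recommend?",
    "comparison": f"Compare {BRAND} to {COMPETITOR}.",
    "sentiment": f"What do users think about {BRAND}? What are common complaints or praise?",
    "recency": f"What has {BRAND} shipped or announced recently? What are their latest products?",
    "leadership": f"Who founded {BRAND}? Who is the current CEO?",
    "reverse": "I am using a tool that [two or three specific features]. What tool might this be?",
}

# name -> (base_url, API-key env var, model); None means the OpenAI default URL
PROVIDERS = {
    "chatgpt": (None, "OPENAI_API_KEY", "gpt-4o"),
    "grok": ("https://api.x.ai/v1", "XAI_API_KEY", "grok-2-latest"),
    "deepseek": ("https://api.deepseek.com", "DEEPSEEK_API_KEY", "deepseek-chat"),
}

def ask(base_url: str | None, key_env: str, model: str, prompt: str) -> str:
    client = OpenAI(base_url=base_url, api_key=os.environ[key_env])
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],  # fresh, contextless chat
    )
    return resp.choices[0].message.content or ""

with open("diagnostic_grid.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt"] + list(PROVIDERS))
    for label, prompt in PROMPTS.items():
        row = [label]
        for base_url, key_env, model in PROVIDERS.values():
            row.append(ask(base_url, key_env, model, prompt))
        writer.writerow(row)
```

The script only collects raw answers; you still summarize each cell by hand, which is where the diagnostic value actually lives.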
Interpreting the Pattern Grid
The spreadsheet at the end will tell you where to focus. A few typical patterns and what they mean.
Pattern A: Strong Recognition, weak Contextual Recall.
Prompt 1 (direct) returns good answers; prompt 2 (category) omits you. The model knows you when asked but does not think of you when asked about your category. The fix involves content strategy — more category-contextual writing, more trade coverage within the category, stronger entity structure that ties you to the category explicitly. See The Entity-First Content Playbook.
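On the entity-structure piece, one option is Organization markup that names the category explicitly. A minimal sketch, analogous to the Person markup earlier (all values are placeholders; knowsAbout is one schema.org property for stating topical association):

```python
# Minimal sketch of schema.org Organization markup that ties the
# brand to its category explicitly. Every value is a placeholder;
# embed the printed JSON in a <script type="application/ld+json"> tag.
import json

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Your Brand",
    "url": "https://example.com",
    # Lead with the category in the description itself
    "description": "Your Brand is a [your category] platform for [use case].",
    "knowsAbout": ["[your category]"],
    "sameAs": ["https://www.linkedin.com/company/..."],
}

print(json.dumps(org, indent=2))
```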
Pattern B: Accurate facts, stale positioning.
Founding year correct, founders correct, but the product is described with a tagline from two years ago. Training data carries a memory that outlasts your marketing updates. The fix is a combination of pushing fresh content to sources the models cite (press, trade publications, Wikipedia if applicable), updating your own on-site copy to be explicitly current and dated, and waiting for the next training cycle to catch up.
Pattern C: Good recognition, weak sentiment.
The model knows you but describes you neutrally or negatively, and surfaces complaints you did not know about. This is almost always an indicator of a Reddit, G2, or other community-presence issue. See G2, Capterra, Trustpilot and the Reddit ladder.
Pattern D: Invisible across most dimensions.
The model does not know you, does not list you, cannot identify you from feature descriptions. You are genuinely not in the model's map. The fix is a full-stack GEO effort — earn citations on trusted sources, earn a Wikipedia entry if eligible, build entity structure into on-site content, and commit to a twelve-month timeline.
Pattern E: Conflicting answers across providers.
Three providers describe you accurately, two get you wrong. Usually means the majority-correct providers are pulling from better sources (Wikipedia, recent news) while the minority-wrong providers are relying on older training data. As base models retrain, the gap closes. Continuing to strengthen your external sources accelerates that.
What This Diagnostic Will Not Tell You
Several things manual diagnostics are bad at:
- Quantifying the gap. You see "the model does not recognize us" but you do not know whether you are at 20/100 or 40/100 on Recognition. Structured scoring requires aggregation across many prompts and runs.
- Tracking trends. A one-off diagnostic tells you where you are today. It does not tell you whether you are improving or declining. Monitoring requires repeated runs over time.
- Competitive positioning. The diagnostic tells you how the model describes you. It does not tell you how it describes competitors in the aggregate — which is half the picture.
- Per-category performance. The 30 prompts that matter most in your category may show very different patterns from any single prompt.
These limitations are why structured tools exist. The diagnostic is the triage that tells you whether you need the structured tool.
The Output You Want
At the end of ninety minutes, you should be able to answer these four questions:
- Are we recognized at all by the major providers? (Yes across most, yes across some, no across most.)
- Are we surfaced on category-level queries? (Consistently, inconsistently, rarely.)
- Is our current positioning accurately reflected? (Yes, partially, no.)
- What is the biggest single issue? (A specific identifiable gap — wrong founder, missing category, stale positioning, etc.)
Those four answers are enough to decide whether to keep going. If all four look healthy, you can deprioritize structured measurement for a quarter. If two or more look concerning, you have a measurement and improvement project for the next six months.
When you want to turn this manual diagnostic into a per-provider scored baseline with concrete findings, a BrandGEO audit does it across five providers in about two minutes.
See how AI describes your brand
BrandGEO runs structured prompts across ChatGPT, Claude, Gemini, Grok, and DeepSeek — and scores your brand across six dimensions. Two minutes, no credit card.