Oh honey.
You sound like one of my annoying SEO clients who Googles something in their logged-in Chrome and thinks that's representative of the entire world!
You cannot measure or track citations in LLMs. Anyone who tells you otherwise is a scammer or an idiot.
LLM responses are generated, not retrieved. Unlike Google's index, there's no log of "this brand was shown to this user at this rank" because the model is producing tokens probabilistically each time. Two identical prompts can yield different mentions.
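If the "probabilistic" part sounds hand-wavy, here's a toy sketch of it. This is not how any real provider's stack works end to end; the brand names and scores are made up, and `sample_token` is a hypothetical helper that just does temperature-scaled softmax sampling over three candidate brands.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token by softmax sampling at the given temperature (toy version)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Made-up scores for which brand gets named first (assumption, not real data).
logits = {"Acme": 2.0, "Globex": 1.6, "Initech": 0.9}

rng = random.Random()  # unseeded, so every run differs
runs = [sample_token(logits, 1.0, rng) for _ in range(10)]
print(runs)  # same "prompt" ten times, different brands come out
```

Same input, ten draws, and the brand that comes out changes from draw to draw. Nothing was looked up and nothing was logged as a "rank".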
There's no public surface to scrape. You can't crawl ChatGPT or Claude the way you crawl a SERP. The "ranking" exists only inside a private inference call between the user and the provider.
Providers don't expose mention data. OpenAI, Anthropic, Google etc. don't publish per-brand impression counts, and aggregating personal chats would breach privacy commitments.
Outputs are personalised and context-dependent. The same brand question produces different answers depending on prior turns, system prompts, custom instructions, memory, geography, and which model version is serving the request, so a sample made up of your own tests isn't representative.
Sampling is the workaround everyone uses, but it's an estimate. Tools like Profound, AthenaHQ, etc. simulate prompts at scale and parse the answers. It's directionally useful but it's not measurement — it's polling. You're inferring share-of-voice from a synthetic prompt set that may or may not match what real users ask.
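To make the polling point concrete, here's a minimal sketch of what that kind of estimate looks like. I'm assuming you already have a list of response texts from your own test runs; `mention_rate` and `wilson_interval` are hypothetical helpers, the brand names are invented, and I have no idea how the commercial tools actually do it internally.

```python
import math
import re

def mention_rate(responses, brand):
    """Count how many sampled responses mention the brand (case-insensitive, whole word)."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    hits = sum(1 for text in responses if pattern.search(text))
    return hits, len(responses)

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 is roughly 95% confidence."""
    if n == 0:
        raise ValueError("need at least one sampled response")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Invented responses standing in for your own synthetic prompt runs.
responses = [
    "Try Acme or Globex.",
    "Globex is popular.",
    "acme is a solid choice.",
    "I'd suggest Initech.",
]
hits, n = mention_rate(responses, "Acme")
low, high = wilson_interval(hits, n)
print(f"mentioned in {hits}/{n} runs, interval {low:.2f} to {high:.2f}")
```

With 2 mentions in 4 runs the interval is roughly 0.15 to 0.85, which tells you almost nothing. More runs narrow it, but the interval only covers sampling noise on your prompt set. It says nothing about whether your prompts look like what real users type.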
So when someone says "we rank #2 in ChatGPT for X," they mean "we appeared second in our own test runs." That's a useful signal, not a metric.