I keep circling that question and would love some real pushback - from where I'm sitting it looks like the second thing. But I might be missing something obvious.
Quick context. Running three projects simultaneously: a native AI Mac app, an AI web platform, and a small marketing agency that helps promote the first two. They don't share much technically - three Supabase projects, three Stripe accounts, a few single-digit TB of data spread across them. But the questions I have about them every week are basically the same: where did MRR move? Which cohorts converted? Which campaigns drove real usage, not just signups?
My current setup, mostly by accident, is pointing Codex at Supabase and Stripe and asking. It works surprisingly well. The thing I keep noticing is that most of the work isn't the SQL - it's me re-explaining the business every time. Which Stripe product maps to which app? What "active user" means this week? Which subscription states actually count as revenue? The agent is great at SQL. The slow part is teaching it what anything actually means.
The embedded side has the same shape. The agency's product ships reporting to clients, and right now that's Supabase queries with a UI on top. It works, but every new report quietly forks the metric definitions a little. Nothing dramatic - just enough that revenue on the dashboard and revenue in the weekly export don't quite match if you squint.
So the thing I'd love input on, especially from people running internal and embedded analytics on a few TB of OLTP Postgres:
At this scale, is the right move a proper semantic layer (I'm mostly torn between Cube and dbt Semantic Layer) sitting between the raw data and everything downstream, so internal questions, embedded reports, and the LLM all hit the same metric definitions?
Or is that overkill for this shape, and the more honest answer is a typed metrics module in app code, a small analytical replica (DuckDB, ClickHouse, or just a read replica with the right indexes), and letting the LLM rebuild context per session?
Happy to be told I'm overthinking it. That would honestly be the best outcome.
For additional context, someone in the original thread recommended dbt Semantic Layer as lightweight. Another person said yes, a semantic layer is a good next step for accuracy and also for the token bill. That aligns with what I'm leaning towards for anyone who's convinced themselves that asking English questions of the data is better than traditional dashboarding.