oh, I feel this pain so deeply. There's this intoxicating temptation to just let an LLM do everything - like, why not, right? but that's how you end up with a link graph that looks like a spider had a seizure. the expensive part is getting the machine to "think" through every possible anchor, when most of that work should be purely mechanical.
my approach would be the same staged one: crawl or import every URL on each client site and stash the title, H1, headings, body snippets, category, existing inlinks and outlinks, and target keywords. Then build a topic-and-intent map using embeddings or simpler keyword/entity matching, which gives you candidate source-target pairs. Then - and only then - bring in the LLM for the genuinely creative bits: deciding where a link feels natural, what anchor text reads well in context, and whether the link actually helps the reader. That's where the human-ish judgement matters.
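For the candidate step, here's a rough Python sketch that uses plain keyword overlap (cosine similarity over term counts) as the "simpler" option instead of embeddings. The `Page` fields, the title/H1 weighting, the 0.2 threshold, `top_k` and the same-category boost are all made-up assumptions you'd tune per site. The point is that this part is cheap, deterministic code, not LLM calls.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set, Tuple

STOPWORDS = {"the", "and", "for", "with", "how", "what", "why", "your"}

@dataclass
class Page:
    # Fields mirror what the crawl/import step stashes per URL (hypothetical shape).
    url: str
    title: str
    h1: str
    headings: List[str]
    body: str
    category: str
    outlinks: Set[str] = field(default_factory=set)

def term_vector(page: Page) -> Counter:
    # Title and H1 are repeated so their terms weigh more (arbitrary, hypothetical weighting).
    text = " ".join([page.title] * 3 + [page.h1] * 2 + page.headings + [page.body])
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def candidate_pairs(pages: List[Page], min_score: float = 0.2, top_k: int = 5,
                    same_category_boost: float = 1.1) -> List[Tuple[str, str, float]]:
    vectors = {p.url: term_vector(p) for p in pages}
    pairs = []
    for src in pages:
        scored = []
        for tgt in pages:
            # Skip self-links and targets the source page already links to.
            if tgt.url == src.url or tgt.url in src.outlinks:
                continue
            score = cosine(vectors[src.url], vectors[tgt.url])
            if src.category == tgt.category:
                score *= same_category_boost
            if score >= min_score:
                scored.append((score, tgt.url))
        # Keep only the strongest few targets per source page.
        scored.sort(reverse=True)
        pairs.extend((src.url, url, round(s, 3)) for s, url in scored[:top_k])
    return pairs
```

Swapping `term_vector` for an embedding model keeps the rest of the loop the same. The all-pairs loop is O(n²), which is fine for a few thousand URLs per site but would need something smarter for very large sites.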
and yes, guardrails, guardrails, guardrails. Cap links per page, no duplicate anchors, don't spam-link the same URL, skip headings, nav and CTA blocks, steer clear of URLs that are noindexed, canonicalised to another page, or redirected, and for heaven's sake don't overwrite intentional editorial links. That last one ruins so many sites.
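The guardrails are mostly filters, so they're easy to express as code. A minimal sketch, assuming a hypothetical `Candidate` shape where the placement step has already tagged which kind of block the anchor sits in, and assuming the crawl gives you each target's noindex/canonical/redirect status. "No duplicate anchors" and "don't spam-link" are interpreted per source page here, and "don't overwrite editorial links" is approximated by refusing any anchor text or target the page already links.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Block types where new links are never inserted (labels are hypothetical).
SKIP_BLOCKS = {"heading", "nav", "cta"}

@dataclass
class Candidate:
    source_url: str
    target_url: str
    anchor: str
    block_type: str  # where the anchor text sits, e.g. "paragraph", "heading", "nav", "cta"
    score: float

@dataclass
class TargetStatus:
    indexable: bool = True               # False if the target is noindexed
    canonical_url: Optional[str] = None  # None or the target's own URL means self-canonical
    redirects: bool = False

def target_ok(url: str, status: TargetStatus) -> bool:
    # Reject noindexed, redirected, or canonicalised-elsewhere targets.
    if not status.indexable or status.redirects:
        return False
    return status.canonical_url in (None, url)

def apply_guardrails(candidates: List[Candidate],
                     existing_links: Dict[str, Dict[str, str]],
                     statuses: Dict[str, TargetStatus],
                     max_new_links_per_page: int = 3) -> List[Candidate]:
    """existing_links maps source URL -> {anchor text: target URL} for links already on the page."""
    accepted: List[Candidate] = []
    added: Dict[str, int] = {}
    used_anchors: Dict[str, set] = {}
    used_targets: Dict[str, set] = {}
    # Highest-scoring candidates claim anchors and targets first.
    for c in sorted(candidates, key=lambda cand: cand.score, reverse=True):
        if c.block_type in SKIP_BLOCKS:
            continue
        if not target_ok(c.target_url, statuses.get(c.target_url, TargetStatus())):
            continue
        editorial = existing_links.get(c.source_url, {})
        # Seed per-page sets with existing links so they're never touched or duplicated.
        anchors = used_anchors.setdefault(c.source_url, {a.lower() for a in editorial})
        targets = used_targets.setdefault(c.source_url, set(editorial.values()))
        if c.anchor.lower() in anchors or c.target_url in targets:
            continue
        if added.get(c.source_url, 0) >= max_new_links_per_page:
            continue
        accepted.append(c)
        anchors.add(c.anchor.lower())
        targets.add(c.target_url)
        added[c.source_url] = added.get(c.source_url, 0) + 1
    return accepted
```

Sorting by score first means that when two candidates compete for the same target or the page cap, the more relevant one wins.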
full disclosure: I'm part of the team behind Linkbot, which was built precisely because this is a pain. it crawls a site, spots contextual internal link opportunities, and automates the placement without requiring you to build the whole crawl-classify-place system from scratch. If you're running a SaaS where each project is a separate site, it's worth a peek before you burn weeks building and maintaining your own internal linking layer.
If you do decide to build it yourself, the architecture is basically: crawl/index → classify/map → generate candidates → place → QA rules → publish/review. the cardinal sin I'd avoid is "AI writes the article and just randomly shoves three to six internal links in there." That scales fast but creates a messy, soulless link graph if relevance and site structure aren't driving the decisions.
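The same architecture as a toy end-to-end run: each stage is a function over a shared state dict, so you can swap any one out (say, keyword matching for embeddings) without touching the rest. Every stage here is deliberately trivial and hypothetical: pages are passed in rather than crawled, topics come from a hand-written keyword map, and QA only checks that an anchor exists plus a per-page cap. What matters is the shape: relevance decides which links exist, and anything that fails QA lands in a review queue instead of being published.

```python
from typing import Callable, Dict, List

# Each stage takes the shared state dict and returns it.
Stage = Callable[[dict], dict]

def crawl_index(state: dict) -> dict:
    # Stand-in for a crawler/importer: pages arrive as {url: body text}.
    state["index"] = {url: body.lower() for url, body in state["pages"].items()}
    return state

def classify_map(state: dict) -> dict:
    # Hypothetical topic map {keyword: topic}; the first matching keyword wins.
    kw = state["topic_keywords"]
    state["topic_of"] = {
        url: next((topic for word, topic in kw.items() if word in text), None)
        for url, text in state["index"].items()
    }
    return state

def generate_candidates(state: dict) -> dict:
    # Only pages sharing a topic become source -> target candidates.
    t = state["topic_of"]
    state["candidates"] = [
        (src, tgt) for src in t for tgt in t
        if src != tgt and t[src] is not None and t[src] == t[tgt]
    ]
    return state

def place(state: dict) -> dict:
    # Anchor = a keyword for the target's topic that actually occurs in the source text.
    state["placements"] = []
    for src, tgt in state["candidates"]:
        words = [w for w, topic in state["topic_keywords"].items() if topic == state["topic_of"][tgt]]
        anchor = next((w for w in words if w in state["index"][src]), None)
        state["placements"].append({"source": src, "target": tgt, "anchor": anchor})
    return state

def qa_rules(state: dict) -> dict:
    # Example rules: an anchor must exist, and only the first max_per_page placements per source pass.
    seen: Dict[str, int] = {}
    for p in state["placements"]:
        seen[p["source"]] = seen.get(p["source"], 0) + 1
        p["ok"] = p["anchor"] is not None and seen[p["source"]] <= state["max_per_page"]
    return state

def publish_or_review(state: dict) -> dict:
    # Passing placements publish automatically; everything else waits for a human.
    state["published"] = [p for p in state["placements"] if p["ok"]]
    state["review"] = [p for p in state["placements"] if not p["ok"]]
    return state

PIPELINE: List[Stage] = [crawl_index, classify_map, generate_candidates, place, qa_rules, publish_or_review]

def run(state: dict, stages: List[Stage] = PIPELINE) -> dict:
    for stage in stages:
        state = stage(state)
    return state
```

Note that the number of links per page falls out of topic overlap and the QA cap rather than a fixed "three to six" quota, and a page with no relevant siblings simply gets none.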