I've been scratching my head over this for a while. Why isn't there a standard, comprehensive pipeline format that data analysts follow when selecting and executing models? Something like: upload your dataset, answer a few preliminary questions (e.g., "Do you care about explainability?" or "What's your business objective?"), then get routed to the next step. Depending on your answers, you'd be prompted to clean the data a certain way, apply business knowledge, or run a heteroscedasticity check.
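To make the idea concrete, here's a rough sketch of what I mean by answers routing you to steps. Everything in it is made up for illustration: the `Answers` fields, the `"explain"`/`"predict"` objective values, and the step names are my own assumptions, not an existing tool or standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Answers:
    """Hypothetical questionnaire answers collected after the dataset upload."""
    needs_explainability: bool
    objective: str  # assumed values: "explain" or "predict"


def next_steps(answers: Answers) -> list[str]:
    """Return an ordered list of suggested steps for the given answers."""
    # Steps that every path goes through in this sketch.
    steps = ["clean data", "apply business knowledge"]

    # Assumption: if the goal is to explain relationships (e.g. reading
    # regression coefficients), residual diagnostics matter more.
    if answers.objective == "explain":
        steps.append("check heteroscedasticity")

    # Explainability narrows the candidate model families.
    if answers.needs_explainability:
        steps.append("shortlist interpretable models")
    else:
        steps.append("shortlist any model family")

    return steps


if __name__ == "__main__":
    for step in next_steps(Answers(needs_explainability=True, objective="explain")):
        print(step)
```

Obviously a real version would need far more questions and branches, but the shape is just a function from answers to an ordered list of steps.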
I'm finishing my Master's in Analytics, and I keep hearing that this is impossible because every problem is unique and domain knowledge is key. But honestly, across all my projects, I keep hitting the same steps. Surely we can codify that?
I get that in the real world, the data, people, and politics are messy. And I've heard the counterargument: any steps generic enough to apply across all pipelines are too vague to be useful. But I'm not sure I buy that. Maybe the issue is that we haven't tried hard enough to build a decision tree or a framework that adapts.
Someone in the discussion pointed out that the variety in maturity, tech stacks, and BI tools makes it impossible. Maybe that's true. But I think we could still benefit from a structured starting point - a kind of content calendar for analytics. After all, every analysis shares foundational stages: understanding the business problem, data exploration, cleaning, modelling, validation, deployment. Why not formalise that?
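Here's a minimal sketch of what formalising those stages could look like: a fixed ordered backbone where each stage has a placeholder default that a team can swap out for its own domain logic. The names (`STAGES`, `default_handler`, `run_pipeline`, the `context` dict) are hypothetical, and the default handlers do nothing except record that the stage ran.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional

# The foundational stages listed above, in order.
STAGES = [
    "business_understanding",
    "exploration",
    "cleaning",
    "modelling",
    "validation",
    "deployment",
]

# A handler takes the shared context dict and returns it (possibly updated).
Handler = Callable[[dict], dict]


def default_handler(stage: str) -> Handler:
    """Build a placeholder handler that only logs that the stage ran."""
    def handler(context: dict) -> dict:
        context.setdefault("log", []).append(stage)
        return context
    return handler


def run_pipeline(context: dict, overrides: Optional[Dict[str, Handler]] = None) -> dict:
    """Run every stage in order, using a domain-specific override where given."""
    overrides = overrides or {}
    for stage in STAGES:
        handler = overrides.get(stage, default_handler(stage))
        context = handler(context)
    return context


def cleaning_with_domain_rules(context: dict) -> dict:
    """Example override: a team plugs in its own cleaning step."""
    context.setdefault("log", []).append("cleaning (domain rules)")
    return context


if __name__ == "__main__":
    result = run_pipeline({}, overrides={"cleaning": cleaning_with_domain_rules})
    print(result["log"])
```

The backbone stays the same for everyone; only the handlers change per domain, which is the "customise on top of a shared structure" idea.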
I'd love to see a template or workflow that analysts can use as a backbone, then customise based on their domain. Thoughts?