This is absolutely right. And when you think about it, the reason has been staring us in the face: people who want to do machine learning approach everything as a machine learning problem. It's really common to see people handwave away the "easy stuff" because they want to get credit for doing the "hard stuff."
It's not just the data scientists' fault. I once heard our chief data scientist point out that they don't want to hand off a linear regression as a machine learning model -- as if a delivered solution to a problem must meet some minimum complexity. She absolutely had a point.
Clients are paying for a Ph.D. to solve problems in a Ph.D. way. If we delivered the client a simple yet effective solution, there's a risk of blowback from the client for being too rudimentary. I'm certain this attitude extends to in-house data scientists as well. Nobody wants to be the data "scientist" who delivers the work of a data "analyst," even when the best solution is a simple SQL query.
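To make that concrete, here's the sort of thing I mean (a hypothetical sketch with made-up table and column names, Postgres-flavored syntax): a "churn model" that is really just a recency cutoff, which often answers the business question as well as anything trained would.

```sql
-- Hypothetical example: flag customers with no orders in the
-- last 90 days as churn risks. No model, just a GROUP BY.
SELECT customer_id,
       MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id
HAVING MAX(order_date) < CURRENT_DATE - INTERVAL '90 days';
```

But nobody wants to hand that over as "the data science deliverable."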
Our company kind of sidesteps this problem by having a tiered approach, where clients pay for engineering, analysis, visualization, and data science work on every project. So if a client is at the simple-analysis level, we deliver at that level, with the understanding that this is the foundational work for more advanced features. It turns out to be a winning strategy, because while every client wants to land on the moon, most of them figure out that they are perfectly happy with a Cessna once they have one.
> Clients are paying for a Ph.D. to solve problems in a Ph.D. way.
Ideally, "in a Ph.D. way" means careful attention to problem framing, a grasp of prior art, and well-structured research roadmaps.
I worry about Ph.D. graduates who seemingly never spent much time hanging out with postdocs. Advisors teach a lot, but some considerations about how to approach problems can be gleaned more easily from postdocs gunning for academic posts.
How good are data scientists at building reliable, scalable systems? My anecdotal experience has been that many don't bother or care to learn good software development practices, so the systems they build almost always work well for specific use cases but are hard to productionize.