I’ve been with RapidMiner for about six months now, and in that time I’ve become acutely aware of a recent trend in our industry. I’d call it ‘a dark secret’ of enterprise AI, but it’s actually not a secret. Put simply, many organizations are struggling to deliver the promised benefits of data science and machine learning. And it doesn’t need to be this way.
This is a topic that we’re passionate about at RapidMiner because we’re obsessed with model impact. Every fiber of our organization is deeply invested in helping our customers deploy more models into production, where they can have a direct and measurable business impact.
We coined the phrase ‘Model Impact Disaster’ to describe this phenomenon and demonize the issue so we can collectively rally against it. We documented the trend in our recent infographic to better understand how widespread it actually is and to find its root causes.
If you haven’t taken a look at our infographic yet, check it out before reading any further!
The infographic provides a great snapshot of the phenomenon at scale, analyzes the stages of model creation and operations, and isolates challenges that manifest at each stage and contribute to the disaster.
What’s happening is ultimately not the fault of data scientists, the models, or the tools that are being used. But there are unseen obstacles to overcome as you move down the path toward becoming ‘an AI-driven organization.’ If you know where these obstacles are hiding, you can hurdle them with confidence. If you trip over them, your team may lose faith in data science, and that’s an outcome that’s unnecessary and avoidable.
How These 6 Trends are Contributing to the Model Impact Disaster
These micro-obstacles encountered by analytics teams are often caused by macro-trends in the world. As a follow-up to the infographic, I thought it would be interesting to examine the macro-trends that are allowing the disaster to devastate data science.
1. The Need for More Experts—Part 1
This is the grand-daddy of all the troubling macro-trends. The skills gap not only creates the simple problem of not having enough people to do the work at hand, but also means that organizations that do have data science resources are usually overworking them. What’s often overlooked is that data scientists have to pivot between domains (marketing, finance, HR, etc.) so fast it would make anyone’s head spin.
Oftentimes, data scientists are so overwhelmed that proper iteration and communication are sacrificed during scoping and model creation. As a result, many models are built without proper input from stakeholders and domain experts, so they may not make sense or fit the workflow. In the end, the hard work goes to waste way too often.
2. Team Design is Hard
There’s no single right way to architect and deploy a data science team. A lot of great content has already been published on this topic, so I won’t attempt to document the best practices in this blog (if you’re interested, the best overview I’ve seen starts on page 43 of Booz Allen’s Field Guide to Data Science). Data science expertise (trained experts OR citizen data scientists) can be either centralized or decentralized.
As data science initiatives proliferate and expectations grow, especially in a business context, collaboration and involvement with domain experts and business units have never been more critical.
This is usually easier in a decentralized model. However, as you decentralize data scientists, sometimes human nature kicks in and they start to align TOO much with the domain and can become motivated to only build models that influence that function of the business. Or, they can lose touch with best practices and technical resources required to effectively do their job.
There’s no right answer, but at the end of the day the design needs to fit the business so that silos don’t develop and stifle necessary collaboration and communication. Silos make it difficult to develop models that have impact, and they make it especially hard to foster buy-in from business counterparts to roll a model into production.
3. Accountability is the New Accuracy
For years we’ve been getting away with building shiny new models and instruments without consideration for documenting and managing the business impact of these science projects. That’s all changing—businesses want to see proof.
This means that the evaluation stage of the CRISP-DM process becomes more and more complex. Gone are the days when model accuracy and performance were the only things that mattered.
As data scientists evaluate hundreds (sometimes thousands) of models for a particular use case, additional steps, precautions, and calculations are needed to understand and select the model that will have the optimal business impact in terms of driving cost reduction, revenue gain, or risk reduction. This is hard, and most automated ML platforms don’t take it into account (blatant sales pitch: RapidMiner does).
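To make the accuracy-versus-impact distinction concrete, here is a minimal Python sketch, assuming a binary classification use case such as churn prediction. The model names, confusion-matrix counts, and dollar values per outcome are all hypothetical, and this is not how any particular platform scores models; the point is only that the most accurate candidate is not necessarily the one with the greatest business impact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of each outcome for a binary classifier on a holdout set."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Hypothetical value (in dollars) of each outcome for a churn-retention campaign:
# a caught churner saves revenue, a false alarm costs an incentive, and a
# missed churner loses revenue.
OUTCOME_VALUE = {"tp": 200.0, "fp": -20.0, "fn": -200.0, "tn": 0.0}


def business_value(c: ConfusionCounts, value: dict[str, float] = OUTCOME_VALUE) -> float:
    """Total dollar impact implied by a model's confusion counts."""
    return (c.tp * value["tp"] + c.fp * value["fp"]
            + c.fn * value["fn"] + c.tn * value["tn"])


# Two hypothetical candidate models evaluated on the same 1,000 holdout rows.
candidates = {
    "model_a": ConfusionCounts(tp=50, fp=10, fn=50, tn=890),
    "model_b": ConfusionCounts(tp=90, fp=80, fn=10, tn=820),
}

best_by_accuracy = max(candidates, key=lambda name: candidates[name].accuracy)
best_by_value = max(candidates, key=lambda name: business_value(candidates[name]))

for name, counts in candidates.items():
    print(f"{name}: accuracy={counts.accuracy:.2f}, value={business_value(counts):+,.0f} USD")
```

With these made-up numbers, model_a wins on accuracy (0.94 vs. 0.91), but model_b is worth far more (+14,400 USD vs. -200 USD) because missed churners are expensive. In a real project, the outcome values would come from the business stakeholders, which is one more reason the collaboration described above matters.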
4. Unintended Consequences of Democratization
AutoML has swooped in to save us all from the shortage of data scientists. It’s fantastic, and it alleviates and eliminates so many issues that AI-ambitious organizations run up against. However, it’s also very important to acknowledge that these new tools can create and amplify a new set of issues. Like I said earlier in this post, awareness of obstacles = confidently hurdling. Not knowing = tripping and falling on your face.
Overreliance on AutoML can sometimes obscure the ‘explainability’ of models, which makes it hard for stakeholders (and sometimes even the creators) to understand how and why models were created. This makes it difficult to tweak and fine-tune a model for the specific use case.
Furthermore, it can make it incredibly hard to garner cross-functional buy-in to push models into deployment. Buy-in is critical, because deployment requires resources and attention. When we can’t explain the models and obtain buy-in, we end up with way too many half-baked models and use cases unnecessarily dying on the vine. The disaster thrives on this.
The other issue is that the model creation funnel now becomes very top-heavy. More people are creating more models for more use cases. This is great! But it leads us back to the grand-daddy of these trends…
5. The Need for More Experts—Part 2
As more models are created, more stress is put on the ‘Evaluation’ stage. In many cases, this means an expert (the data science team, if one is available) is forced to re-evaluate the quality and performance of models that other people have produced, in the form of data science QA.
This piles more work onto a group that’s already chronically overworked. More importantly, it further underscores the importance of explainability so the QA work can be done quickly and effectively.
However, model creation and evaluation aren’t the only labor-intensive parts of the data science lifecycle. Deploying a model usually requires technical knowledge, an understanding of your organization’s IT infrastructure and policies, and (occasionally) coding experience.
Sometimes, the person who created the model doesn’t possess these qualities. Sometimes they rely on the domain expert or IT to finish the job, but those people may be unavailable or have other priorities. In large, bureaucratic organizations, this can mean securing resource commitment at a high level to ensure the work gets done, which means more convincing (and more emphasis on explainability).
In fact, the production and deployment aspects are often left to disciplines that lie at the intersection of various practices, such as DevOps. Unfortunately, DevOps is another area where we’re facing a tremendous shortage of experts: 31% of respondents to a recent Gartner survey cited lack of DevOps as their main barrier to delivering business value with ML and data science.
6. The Need for More Experts—Part 3
Finally, as any economist knows, a scarce resource that’s in high demand almost always becomes a valuable one. In our case, this means that many organizations that succeed in hiring data scientists still battle constant pressure from turnover. This is especially troubling when you consider that a data scientist’s work doesn’t end when a model is deployed: you must continuously monitor how your models are performing and shifting with new data. If an expert builds a model, deploys it, then leaves the organization, who’s going to do this monitoring and management? This is the one area where automation and democratization MUST catch up, so that ANYONE can easily perform model ops.
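Monitoring how a model is ‘shifting with new data’ can start with something as simple as comparing the distribution of a model input or score in production against the distribution seen at training time. The Python sketch below uses the Population Stability Index (PSI), one common drift measure chosen here purely for illustration. The bin proportions are hypothetical, the 0.1 and 0.25 thresholds are assumed rules of thumb rather than a standard, and none of this describes any specific platform’s model-ops feature.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Both arguments are per-bin proportions over the same bins (each summing to
    about 1). `eps` floors empty bins so the logarithm stays defined.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        score += (a - e) * math.log(a / e)
    return score


def drift_status(score: float, warn: float = 0.1, alert: float = 0.25) -> str:
    """Map a PSI score to a status using assumed rule-of-thumb thresholds."""
    if score >= alert:
        return "alert"
    if score >= warn:
        return "warn"
    return "stable"


# Hypothetical score distribution (4 bins) at training time vs. in production.
training_bins = [0.25, 0.25, 0.25, 0.25]
production_bins = [0.10, 0.20, 0.30, 0.40]

score = psi(training_bins, production_bins)
print(f"PSI={score:.3f}, status={drift_status(score)}")
```

Here the shifted production distribution yields a PSI of roughly 0.23, which lands in the ‘warn’ band. A check like this could run on a schedule for each important input and for the model’s output scores, so drift still gets flagged after the model’s original author has moved on.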
How Do You Protect Yourself from the Disaster?
All of this may seem scary or daunting, but I will always be of the mindset that it’s best to shine a light on where you’re going before you start sprinting, and that’s definitely true as you embark on ‘the path to enterprise AI’.
Not all of these issues can be tackled with technology. Some of them require face-to-face conversations with other human beings and even restructuring of workflows and teams.
When it comes to technology, though, I’d be remiss if I didn’t mention how RapidMiner attempts to provide shelter from the Model Impact Disaster and help address the obstacles organizations are facing. Our platform offers:
- Simplified model deployment and management
- An unpackable, understandable, fully explainable visual workflow designer (as well as a notebook environment and automated data science capabilities)
- Skills-specific Academy courses to upskill your existing workforce and teach them how to do real data science
- A reduced need for DevOps and technical experts
Want to protect yourself from the model impact disaster and start making a real impact with data science? Request a demo today!