I’ve been with RapidMiner for about six months now, and in my time here I’ve become acutely aware of a recent trend in our industry. I’d call it ‘a dark secret’ of enterprise AI… but it’s actually not a secret. Put simply, the trend is that many organizations are struggling to deliver the promised benefits of data science and machine learning. And it doesn’t need to be this way.
This is a topic that we’re passionate about at RapidMiner, because we are obsessed with model impact. Every fiber of our organization is deeply invested in helping our customers deploy more models into production, where they can have a direct and measurable business impact.
We coined the phrase the ‘Model Impact Epidemic’ to explain this phenomenon and demonize the issue so we can collectively rally against it. We documented the trend in our recent infographic so we could better understand how widespread it actually is and find the root causes. If you haven’t taken a look at the infographic yet, check it out before reading any further!
The infographic provides a great snapshot of the epidemic at scale, analyzes the stages of model creation and operations, and isolates the challenges that emerge at each stage and contribute to the epidemic.
This epidemic is ultimately not the fault of data scientists, the models, or the tools that are being used, but there are unseen obstacles to overcome as you move down the path of being ‘an AI-driven organization.’ If you know where these obstacles are hiding, you can hurdle them with confidence. If you trip over them, your team may lose faith in data science – and that’s an outcome that’s unnecessary and avoidable.
These micro-obstacles encountered by analytics teams (which we will investigate further in an upcoming webinar with Ingo) are often caused by macro-trends in the wider world. As a follow-up to the infographic, I thought it would be interesting to examine the macro-trends that are allowing the epidemic to spread freely.
1. The need for experts, part 1
This is the grand-daddy of all the troubling macro-trends. The skills gap doesn’t just mean there aren’t enough people to do the work at hand. It also means that organizations that do have data science resources are usually overworking them, misunderstanding their work, and sometimes asking them to pivot between domains (marketing, finance, HR, etc.) so fast it would make anyone’s head spin. Data scientists are often so overwhelmed that proper iteration and communication get cut from the scoping and model creation process. As a result, many models are built without adequate input from stakeholders and domain experts: they may not make sense to the business, or they may not fit the workflow. In the end, the hard work goes to waste far too often.
2. Team design is hard
There’s no single right way to architect and deploy a data science team. A lot of great content has already been published on this topic, so I won’t attempt to document the best practices in this blog (if you’re interested, the best overview I’ve seen starts on page 43 of Booz Allen’s Field Guide to Data Science). Data science expertise (trained experts OR citizen data scientists) can be either centralized or decentralized.
As data science initiatives proliferate and expectations grow, especially in a business context, collaboration with domain experts and business units has never been more critical. This is usually easier in a decentralized model. However, as you decentralize data scientists, human nature sometimes kicks in: they can start to align TOO closely with their domain and become motivated to build only models that influence that function of the business, or they can lose touch with the best practices and technical resources they need to do their job effectively.
There’s no right answer, but at the end of the day the design needs to fit the business so that silos don’t develop and stifle the collaboration and communication the work depends on. Silos make it especially hard to develop models that have impact, and especially hard to win buy-in from business counterparts to roll a model into production.
3. Accountability is the new accuracy
For years we’ve been getting away with building shiny new models and instruments without consideration for documenting and managing the business impact of these science projects. That’s all changing. Businesses want to see proof.
This means that the evaluation stage of the CRISP-DM process is becoming more and more complex. Gone are the days when model accuracy and performance were the only things that mattered. As data scientists evaluate hundreds (sometimes thousands) of models for a particular use case, they must take additional steps and precautions, and run additional calculations, to identify and select the model projected to have the greatest business impact, whether through cost reduction, revenue gain, or risk reduction. This is hard, and most automated ML platforms don’t take it into account (blatant sales pitch: RapidMiner does).
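To make the difference between accuracy and business impact concrete, here is a minimal Python sketch. It is not RapidMiner’s method; it assumes a binary churn classifier and a hypothetical value matrix in which each outcome (true positive, false positive, and so on) is given a made-up dollar value. Candidate models are then ranked by total value on a validation set instead of by accuracy. The model names, counts, and values (`model_a`, `model_b`, `churn_values`) are all illustrative assumptions, not data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Validation-set outcome counts for one binary classifier."""
    tp: int  # true positives: churners the model flagged
    fp: int  # false positives: loyal customers the model flagged
    tn: int  # true negatives: loyal customers the model ignored
    fn: int  # false negatives: churners the model missed

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total


@dataclass
class OutcomeValues:
    """Hypothetical business value (e.g. in dollars) of each outcome."""
    tp: float
    fp: float
    tn: float
    fn: float


def business_value(counts: ConfusionCounts, values: OutcomeValues) -> float:
    """Total value of acting on a model's predictions over the validation set."""
    return (counts.tp * values.tp + counts.fp * values.fp
            + counts.tn * values.tn + counts.fn * values.fn)


def pick_by_accuracy(models: dict[str, ConfusionCounts]) -> str:
    return max(models, key=lambda name: models[name].accuracy())


def pick_by_value(models: dict[str, ConfusionCounts], values: OutcomeValues) -> str:
    return max(models, key=lambda name: business_value(models[name], values))


# Illustrative numbers only: 1,000 validation customers, 100 of whom churn.
candidates = {
    "model_a": ConfusionCounts(tp=40, fp=10, tn=890, fn=60),
    "model_b": ConfusionCounts(tp=80, fp=100, tn=800, fn=20),
}
# Assumed values: a saved churner is worth +100, a wasted retention offer costs 10,
# and customers who are not contacted add nothing relative to doing nothing.
churn_values = OutcomeValues(tp=100.0, fp=-10.0, tn=0.0, fn=0.0)

print(pick_by_accuracy(candidates))             # model_a (accuracy 0.93 vs 0.88)
print(pick_by_value(candidates, churn_values))  # model_b (value 7000 vs 3900)
```

With these made-up numbers, `model_a` wins on accuracy, but `model_b` generates more value because it catches twice as many churners and its extra false positives are cheap. In a real project the value matrix would come from conversations with the business, which is exactly the kind of additional evaluation step described above.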
4. Unintended consequences of democratization
AutoML has swooped in to save us all from the shortage of data scientists. It’s fantastic, and it alleviates or eliminates many of the issues that AI-ambitious organizations run up against. However, it is also very important to acknowledge that these new tools can create and amplify a new set of issues. As I said earlier in this post, awareness of obstacles = confidently hurdling. Not knowing = tripping and falling on your face.
Overreliance on AutoML can sometimes obscure the ‘explainability’ of models, which makes it hard for stakeholders (and sometimes even the creators) to understand how and why a model was built the way it was. This makes it difficult to tweak and fine-tune a model for the use case.
Furthermore, it can make it incredibly hard to garner cross-functional buy-in to push models into deployment. Buy-in is critical, because deployment requires resources and attention. When we can’t explain the models and obtain buy-in, we end up with way too many half-baked models and use cases unnecessarily dying on the vine. The epidemic thrives on this.
The other issue is that the model creation funnel becomes very top-heavy. More people are creating more models for more use cases. This is great! But it leads us back to the grand-daddy of these trends…
5. The need for experts, part 2
As more models are created, more stress is put on the ‘Evaluation’ stage. In many cases, this means an expert (the data science team, if one is available) is forced to re-evaluate the quality and performance of models that other people have produced, as a form of data science QA. This piles more work onto a group that’s already chronically overworked. More importantly, it underscores even further the importance of explainability, so that the QA work can be done quickly and effectively.
However, model creation and evaluation aren’t the only labor-intensive parts of the data science lifecycle. Deploying a model usually requires technical knowledge, an understanding of your organization’s IT infrastructure and policies, and sometimes coding experience. The person who created the model doesn’t always have these skills. Sometimes they rely on domain experts or IT to finish the job, and those people may be unavailable or have other priorities. In large, bureaucratic organizations, this can mean securing a resource commitment at a high level to ensure the work gets done, which means more convincing (and more emphasis on explainability).
In fact, production and deployment are often left to disciplines that lie at the intersection of several practices, such as DevOps. Unfortunately, DevOps is another area where we’re facing a tremendous shortage of experts. In a recent Gartner survey, 31% of respondents cited lack of DevOps as their main barrier to delivering business value with ML and data science.
6. The need for experts, part 3
Finally, as any economist knows, a scarce resource in high demand almost always becomes a valuable one. In our case, this means that many organizations that succeed in hiring data scientists still battle constant pressure from turnover. This is especially troubling when you consider that a data scientist’s work doesn’t end when a model is deployed: you must continuously monitor how your models perform and how they shift as new data arrives. If an expert builds a model, deploys it, and then leaves the organization, who’s going to do this monitoring and management? This is the one area where automation and democratization MUST catch up, so that ANYONE can easily perform model ops.
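One common way to check whether a model’s inputs or scores are shifting is the Population Stability Index (PSI), which compares how a reference sample (e.g. scores at training time) and a recent production sample are distributed across bins: PSI = Σ (q_i − p_i) · ln(q_i / p_i). The Python sketch below illustrates that idea only; it does not describe how RapidMiner monitors models. The bin edges, the `eps` clamp for empty bins, and the 0.1 / 0.25 alert thresholds are assumptions (the thresholds are a widely used rule of thumb, not a standard), and all sample data is made up.

```python
from __future__ import annotations

import bisect
import math
from typing import Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Fraction of `values` in each bin; sorted `edges` define len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for x in values:
        counts[bisect.bisect_right(edges, x)] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a recent sample.

    PSI = sum_i (q_i - p_i) * ln(q_i / p_i), where p_i and q_i are the shares of
    the reference and recent samples in bin i. Empty bins are clamped to `eps`
    so the logarithm stays finite.
    """
    p = bin_shares(expected, edges)
    q = bin_shares(actual, edges)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = max(p_i, eps), max(q_i, eps)
        total += (q_i - p_i) * math.log(q_i / p_i)
    return total


def drift_status(value: float) -> str:
    """Map a PSI value to an action using rule-of-thumb thresholds (assumed)."""
    if value < 0.1:
        return "stable"
    if value < 0.25:
        return "monitor"
    return "investigate"


# Hypothetical model scores: evenly spread at training time,
# piled into the top bin in production.
edges = [0.2, 0.4, 0.6, 0.8]
training_scores = [i / 100 for i in range(100)]
production_scores = [0.9] * 100

print(drift_status(psi(training_scores, training_scores, edges)))    # stable
print(drift_status(psi(training_scores, production_scores, edges)))  # investigate
```

In practice the bin edges would usually be derived from quantiles of the reference sample; fixed edges just keep the sketch short. A check like this is simple enough to schedule and alert on automatically, which is the kind of model ops that shouldn’t depend on one expert staying with the company.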
HOW DO YOU INOCULATE YOURSELF?
All of this may seem scary or daunting, but I will always be of the mindset that it’s best to shine a light on where you’re going before you start sprinting, and that is especially true as you embark down ‘the path to enterprise AI’. Not all of these issues can be tackled with technology; some of them require face-to-face conversations with other human beings and even restructuring of workflows and teams.
When it comes to technology, though, I’d be remiss if I didn’t mention our attempt to cure the epidemic and help address the macro-trends that organizations are facing: RapidMiner 9.4. This new release offers:
- Democratization of the labor-intensive tasks of data science to close the skills gap
- Streamlined communications and improved explainability of models
- A new way to make model deployment and management easy for any user
- A reduced need for DevOps and technical experts
Learn more about the latest version of our market-leading platform here.