

I spend a ton of time immersed in the work that RapidMiner users are doing, and I like to think that I have my finger on the pulse of AI, from cutting-edge developments to everyday challenges and problems. This often leads to pondering what the future of artificial intelligence looks like.
Although even the best machine learning models can’t predict the future of data science, that doesn’t stop us humans from trying to figure out what’s coming. In this post, I’ll give you five predictions for trends that I think are going to have a significant impact on the future of AI and machine learning.
1. Models Controlling Other Models
Given the number of business processes that are now fueled by machine learning (ML) models, I expect an increase in the training of models whose job is to detect a misfit between a production model and changing world circumstances, and possibly even to correct those issues. What’s more, because a mismatch between a model and the real world is also a cause of model bias, models that can identify these issues would be a huge step toward ethical data science.
Because the world is constantly changing (behavior, technology, economics, costs, and more), models that don’t change with it will accumulate misfits that render them inappropriate to keep in production. This is why I expect to see a shift towards developing models that track the misfits in other models and implement corrections as needed.
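As a concrete illustration, here’s one minimal way such a watchdog could work, using a two-sample Kolmogorov-Smirnov test to flag when a live feature no longer matches the distribution the production model was trained on (the function name, data, and threshold are my own illustrative choices, not a RapidMiner API):

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_feature, live_feature, alpha=0.05):
    """Flag drift when the live distribution of a feature no longer
    matches the distribution the production model was trained on."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha  # True means the misfit is statistically significant

# Hypothetical example: a feature whose real-world behavior has shifted.
rng = np.random.default_rng(42)
train_ages = rng.normal(35, 8, size=5_000)  # distribution at training time
live_ages = rng.normal(42, 8, size=5_000)   # distribution in production today

if detect_drift(train_ages, live_ages):
    print("Misfit detected: consider retraining or correcting the model.")
```

A full watchdog would monitor many features plus the model’s own prediction distribution, but the principle is the same: one model’s inputs and outputs become another model’s training data.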
2. Democratization of Auto Deep Learning
Automated machine learning is not enough. Automated ML was a big trend in 2018, and then came deep learning (DL), which excels at highly unstructured data. With DL, you can turn categorical or numerical inputs into dynamic, complex outputs, for example, reconstructing images of dreams from EEG signals. But the power of deep learning comes with a downside: these models are harder to optimize, and their complex structures require specific forms of user interfaces.
As a response to these restrictions, I anticipate that 2020 will bring a democratization stage of automated deep learning: tooling that lets DL be applied more readily and accurately to data science problems, including those that demand more complex outcome models.
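To make that concrete, here’s a deliberately small sketch of the core loop behind automated deep learning: a random search over network architectures and learning settings, using scikit-learn’s MLPClassifier (real AutoDL systems search far richer spaces; the ranges below are illustrative):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# The search space: depth, width, and learning settings are tuned
# automatically instead of by hand.
search_space = {
    "hidden_layer_sizes": [(64,), (128,), (64, 64), (128, 64)],
    "learning_rate_init": loguniform(1e-4, 1e-1),
    "alpha": loguniform(1e-5, 1e-2),
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    search_space,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print("Best architecture found:", search.best_params_)
```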
3. Training Without Labels
The most successful models use supervised learning, where we know what we want the model to predict from our previous real-world experience, which allows data scientists to easily validate the model’s results. However, finding or generating the data necessary to train these models can be a costly, challenging undertaking.
To get around this issue, data scientists have developed a number of ways to create models without having access to lots of training data. In active learning, for example, models identify (or even synthesize) the data points that will most refine their predictions, and then ask a human for a decision on those impactful cases. We can also transfer models between different industries or application areas, and then tune the model for the new use case, although this obviously introduces its own problems.
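Here’s a minimal sketch of one common active learning flavor, uncertainty sampling, where the model queries the pool examples it is least sure about (the dataset, query size, and loop budget are stand-ins):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=2_000, random_state=0)

# Start with only a handful of labels; the rest form the unlabeled pool.
labeled = np.arange(20)
pool = np.arange(20, len(X))

model = LogisticRegression(max_iter=1_000)
for round_ in range(5):
    model.fit(X[labeled], y_true[labeled])
    print(f"Round {round_}: {len(labeled)} labels, "
          f"pool accuracy {model.score(X[pool], y_true[pool]):.3f}")
    # Uncertainty sampling: query the pool examples closest to 50/50.
    proba = model.predict_proba(X[pool])[:, 1]
    query = pool[np.argsort(np.abs(proba - 0.5))[:10]]
    # A human oracle would label these; the true labels stand in here.
    labeled = np.concatenate([labeled, query])
    pool = np.setdiff1d(pool, query)
```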
In the manufacturing industry especially, however, I expect a rise in the use of digital twins (virtual representations of complex processes, created from both historical and live data) to generate simulated training labels.
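As a toy illustration of that idea, here’s a hypothetical digital twin of a machine: because the simulator knows the failure mechanism, every simulated run comes with a free training label (all names, distributions, and thresholds below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

def digital_twin(n_samples):
    """Hypothetical twin of a machine: simulates sensor readings and
    derives a failure label from the (known) process behavior."""
    temperature = rng.normal(70, 10, n_samples)  # degrees C
    vibration = rng.exponential(0.5, n_samples)  # mm/s RMS
    wear = rng.uniform(0, 1, n_samples)          # fraction of lifetime
    # The simulator encodes the failure mechanism, so labels are free.
    fails = (temperature > 85) | ((vibration > 1.2) & (wear > 0.7))
    X = np.column_stack([temperature, vibration, wear])
    return X, fails.astype(int)

X_sim, y_sim = digital_twin(10_000)  # simulated labels to train on
print(f"Simulated failure rate: {y_sim.mean():.1%}")
```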
4. Accountability is the New Accuracy
A cultural change in the data science community is coming where the focus shifts from creating the most accurate models to ensuring data science teams are held accountable for the impact that their models produce. Every newly deployed model will create business impact, whether it’s good or bad.
For decades, we’ve been over-investing in tweaking models to prioritize accuracy. But when we tweak our models to boost accuracy as high as possible, we’re only making them accurate for that particular moment in time. Because the world is constantly changing, model accuracy will decrease over time. To avoid nasty surprises when models go into production, we need to shift to creating more “resilient” models that can keep high accuracy for a longer time. A resilient model might perform with less accuracy than other models at certain times, but it keeps its accuracy for longer without tweaking.
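One way to make resilience measurable is to score a model on successive time windows instead of a single hold-out set, and watch the trend rather than the peak. A minimal sketch (the window count and the model’s scikit-learn-style score interface are assumptions):

```python
import numpy as np

def resilience_report(model, X, y, timestamps, n_windows=6):
    """Score a fitted model on successive time windows to expose decay.

    A 'resilient' model is one whose window-by-window accuracy stays
    flat, even if its peak accuracy is lower than a heavily tweaked one.
    """
    order = np.argsort(timestamps)  # evaluate in chronological order
    X, y = X[order], y[order]
    scores = []
    for window in np.array_split(np.arange(len(X)), n_windows):
        scores.append(model.score(X[window], y[window]))
    return scores  # alert on the downward trend, not just the first value
```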
Holding data science teams accountable for this long-term resiliency and business impact is going to change the way that both business and data scientists view the models that they build. Optimizing for accuracy is important, but it cannot be the only way that we think about model impact in the future.
5. Ensemble 2.0: Deep Features and Explainable AI
Ensemble models take multiple distinct ML models, get predictions from each of them, and then combine those into a single prediction, in the hope that the ensemble performs better than any one individual model would. You’re essentially taking advantage of the “wisdom of crowds,” except your “crowd” is a bunch of ML models.
For example, if you take a candy jar and ask several individuals to guess the number of candies in the jar, it’s most likely that the average of all the answers collected will be more accurate than most of the individual guesses. The sketch below shows the same idea with models in place of people.
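Here’s a minimal soft-voting sketch in scikit-learn (the dataset and the three member models are arbitrary illustrative choices): each member predicts class probabilities, and the ensemble averages them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Three distinct "crowd members" with different strengths and biases.
members = [
    ("logreg", LogisticRegression(max_iter=5_000)),
    ("forest", RandomForestClassifier(random_state=0)),
    ("bayes", GaussianNB()),
]
ensemble = VotingClassifier(members, voting="soft")  # average probabilities

for name, model in members + [("ensemble", ensemble)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>8}: {score:.3f}")
```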
The current trend in ensembles goes beyond the modeling and takes feature engineering into account as well. You can combine “deep features” that are derived from a DL model with a more understandable model like a tree-based model. This gets you the best of both worlds: the predictive power of complex DL models combined with the understandability of simpler types of models.
Deep feature extraction is one area of DL where this approach is particularly valuable. With deep features, you use algorithms from DL just for identifying features within your dataset. Say you feed images of various animals into a DL network. Instead of having the network declare “this is a lion” and “this is a dog,” the deep features process extracts the component features, like a tail, a long snout, or short ears, and then hands them to a decision-making model, creating a smarter, more easily interpretable model.
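Here’s a sketch of that recipe, assuming PyTorch and torchvision are available (the pretrained network, the tree depth, and the placeholder data are all illustrative choices, not a prescribed pipeline): strip the classification head off a pretrained CNN, use the remainder as a feature extractor, and train an interpretable tree on the resulting deep features.

```python
import torch
from torch import nn
from torchvision import models
from sklearn.tree import DecisionTreeClassifier

# 1. A pretrained CNN with its classification head removed becomes a
#    feature extractor (weights download on first use).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()  # emit 512-d "deep features" instead of classes
backbone.eval()

@torch.no_grad()
def deep_features(images):
    """images: float tensor of shape (n, 3, 224, 224), normalized."""
    return backbone(images).numpy()

# Placeholder batch standing in for real, preprocessed animal photos.
animal_images = torch.rand(32, 3, 224, 224)
animal_labels = torch.randint(0, 2, (32,)).numpy()

# 2. An interpretable model trained on the deep features.
features = deep_features(animal_images)
tree = DecisionTreeClassifier(max_depth=5).fit(features, animal_labels)
```

The tree’s splits are inspectable in a way the raw CNN’s weights are not, which is exactly the trade the “Ensemble 2.0” idea is after.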
We’re (finally) heading into a world where we understand model bias and drift, and we’re developing ways to address and account for these issues. While models can automate business decisions, the teams that build them still need to take greater responsibility and accountability for operating within the limits of human society and law.
Bias and model misfits are not just technical problems but real-world problems, and they will continue to surface in data science models and ultimately impact business and society. But with increasing accountability among data scientists, I’m confident that we are headed into a decade of more human-centric, responsible, and ethical data science practices.
If you’re looking to get started on a new machine learning project, check out our whitepaper: A Human’s Guide to Machine Learning Projects.