10 March 2020

A Manifesto for Data Science

I recently gave a keynote at our annual Wisdom user conference, and one of the main topics I discussed was the concept of model resilience and why it should be treated as more important than accuracy.
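
To make “resilience” a little more tangible: one rough way to look at it is to check not only the accuracy on a clean holdout set, but also how much that accuracy degrades when the inputs get messier. The snippet below is only a minimal sketch of that idea, assuming a scikit-learn setup; the dataset, model, and noise level are placeholders, not something from the keynote.

```python
# Illustrative sketch: compare plain test accuracy with accuracy under a
# simple perturbation, as one rough proxy for model "resilience".
# Dataset, model, and noise level are arbitrary placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Standard accuracy on clean, held-out data
clean_acc = accuracy_score(y_test, model.predict(X_test))

# Accuracy after perturbing the test features with mild Gaussian noise,
# a crude stand-in for the messier data a deployed model actually sees
rng = np.random.default_rng(0)
X_noisy = X_test + rng.normal(scale=0.2 * X_test.std(axis=0), size=X_test.shape)
noisy_acc = accuracy_score(y_test, model.predict(X_noisy))

print(f"clean accuracy: {clean_acc:.3f}")
print(f"noisy accuracy: {noisy_acc:.3f}")
print(f"drop:           {clean_acc - noisy_acc:.3f}")  # smaller drop = more resilient
```

Two models with the same clean accuracy can look very different on that last line, and that difference is usually the one that matters once the model leaves the lab.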

Of course, looking past accuracy is hard to do, because we have a long-standing culture of chasing the wrong incentives in data science. Starting in college, most data scientists are challenged to find ever more accurate models. And after college, data scientists continue to enjoy the puzzle of finding the best possible model.

Even famous competitions like the Netflix Prize prioritized accuracy over impact, asking how to improve the accuracy of Netflix’s recommendation algorithm by 10% rather than how to make better TV shows.

Wrong incentives in data science

Don’t get me wrong: having objective criteria to measure model performance is obviously a good thing. But it becomes problematic when optimization is the only incentive data scientists have. Keep in mind that data scientists are people, and people do what they’re incentivized to do, which is a real problem when the incentives are wrong!

This long tradition of focusing too much on model accuracy has led companies to perpetuate the trend further by posting data science challenges in the form of Kaggle competitions, where people around the globe compete for the highest levels of accuracy.

Kaggle, the smart people working there, and the smart people who participate in the competitions have contributed a tremendous amount of value to both industry and education, and my point is certainly not to criticize Kaggle or anyone else organizing machine learning competitions.

However, the existence of these kinds of platforms is a symptom of the current data science culture: that everything is supposed to be about optimizing model performance. Kaggle even has a ranking system with the highest rank called Grandmaster, very similar to chess. And some of these grandmasters desperately want to be in the number one spot.

Fraud in machine learning competitions

This ambition even led to a case of fraud by the winner of a Kaggle competition. This particular competition was about optimizing the online profiles of animals living in shelters to improve their chances of being adopted. One team found a way to leak the label information from the test set into the model, which led to them winning the $10,000 in prize money.

It is very sad indeed that such brilliant people, including a highly respected Kaggle Grandmaster, have gone to such lengths to defraud a welfare competition aimed at saving precious animal lives, solely for their own financial gain. —Andy Koh, the founder of PetFinder.my

Yes, it is. It is very sad. The financial gain was not even that large, given that the team of three split $10,000, not a million dollars or anything like that. In fact, the Kaggle Grandmaster himself admitted that it was never about the money, but about the “Kaggle points”.

But what’s even more shocking is that your organization might already be suffering from this kind of “fraud”. What do I mean?

We know that fixing data science mistakes often makes our models look worse on paper. For example, if you accidentally leak label information into your features during feature engineering, correcting this mistake will of course make your error rates go up.
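
To make that concrete, here is a minimal sketch of label leakage introduced through feature engineering, using synthetic data and a placeholder model: a category column is target-encoded on the full dataset before the train/test split, so the test labels leak into the feature. Fixing the mistake sends the reported accuracy back toward chance, exactly the “error rates go up” effect described above.

```python
# Illustrative sketch of label leakage through feature engineering.
# The labels are pure noise, so any accuracy well above 0.5 is leakage, not skill.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": rng.integers(0, 1000, size=2000),  # high-cardinality, ID-like column
    "label": rng.integers(0, 2, size=2000),        # random labels: nothing to learn
})

# LEAKY: target-encode the category on the *full* dataset, test rows included
df["cat_target_mean"] = df.groupby("category")["label"].transform("mean")
train, test = train_test_split(df, test_size=0.3, random_state=42)

leaky_model = LogisticRegression().fit(train[["cat_target_mean"]], train["label"])
leaky_acc = accuracy_score(test["label"], leaky_model.predict(test[["cat_target_mean"]]))
print(f"with leaky feature:    {leaky_acc:.3f}")   # looks far better than chance

# FIXED: compute the encoding from training rows only, then map it onto the test set
means = train.groupby("category")["label"].mean()
train_enc = train["category"].map(means).to_frame("cat_target_mean")
test_enc = test["category"].map(means).fillna(train["label"].mean()).to_frame("cat_target_mean")

clean_model = LogisticRegression().fit(train_enc, train["label"])
clean_acc = accuracy_score(test["label"], clean_model.predict(test_enc))
print(f"after fixing the leak: {clean_acc:.3f}")   # falls back toward 0.5
```

(A fully clean version would also use out-of-fold encoding within the training set, but the point here is what fixing the test-set leak does to the reported numbers.)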

We obviously can’t be sure how often this happens and goes uncaught. Nor can we be sure how often data scientists do uncover their own mistakes, only to hide them to look good. Or how often they commit fraud on purpose, maybe to hit a tight deadline or to angle for a promotion.

Fraud in scientific publications

Although we don’t know how often it happens, and we don’t know if it’s happening in your organization, we do know that it’s happened more than zero times. And honestly, we shouldn’t be surprised. This is exactly what results when you combine a system of wrong incentives with a lack of oversight.

This isn’t a new problem, nor is it one that’s unique to data science. Scientific publishing has suffered from reproducibility problems and cherry-picked data for decades. I highly recommend reading Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches and A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research.

This is what my colleague Dr. Martin Schmitz had to say after reading these:

This paper investigated if new, cool deep learning algorithms are outperforming ‘traditional’ methods. The results are SHOCKING to me. In most cases, simple methods are more than competitive. And a lot of the reported results are not even reproducible.

And:

The craziest thing is the last chapter on SpectralCF where it seems that a ‘favorable’ test split was chosen for better results.

Wow. And these are scientific publications!
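
One simple guard against that kind of cherry-picking is to stop reporting a single split at all. The sketch below, assuming a scikit-learn workflow with a placeholder dataset and model, averages performance over many random splits so that no single “favorable” split can carry the headline number.

```python
# Illustrative sketch: evaluate over many random train/test splits and report
# the mean and spread, rather than the one split that happens to look best.
# Dataset and model are placeholders, not taken from the papers cited above.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 30 independent random splits instead of one hand-picked split
cv = ShuffleSplit(n_splits=30, test_size=0.3, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"accuracy over 30 splits: {scores.mean():.3f} +/- {scores.std():.3f}")
print(f"best single split:       {scores.max():.3f}  <- what cherry-picking would report")
```

Publishing the spread alongside the mean makes it obvious when a result only holds on one lucky split.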

The model impact disaster as an excuse

I’ve said before that we waste a lot of time on modeling. And we’re investing this time for the wrong reasons. Sometimes we hide mistakes or even commit fraud to make ourselves look better. That seems like a pretty bleak situation. But do you know what makes it even worse?

The fact that everybody seems to be aware of this problem but doesn’t really care, because the expectation is that the models won’t be operationalized anyway. Here’s an example of what I mean; this is a reply to a social media post about putting models into production:

I understand your point. The proposed solutions are never intended to be applied directly. It’s an innovation arena to learn, share ideas and to rapidly find promising solutions around a problem. Orthogonally, there is a way to confirm ranking robustness.

This was the answer an unnamed Kaggle Grandmaster gave when he was challenged about Kaggle competitions. The response pretty much says: “Well, it doesn’t matter, those models will never go into production anyway!”

It seems that we are using the model impact disaster as an excuse to take shortcuts, or to ignore mistakes in order to look better, or—in hopefully rare cases—even to commit fraud.

To me, all of this means that we need to fundamentally change data science culture. And we need to do it soon.

Just as other fields have a statement of principles to guide their behavior, like the Hippocratic Oath for doctors, data scientists need a set of standards to ensure that we’re using AI to benefit everyone.

Enter RADIUS.

RADIUS – A manifesto for data science

The goal of this manifesto is to help guide the behavior of data scientists like myself, to make sure:

The basic principles embodied by RADIUS are laid out below.

Summary of RADIUS

Here are the policies:

In conclusion

I hope this will help you to become better data scientists and avoid some of the traps that I—and probably others—have fallen into in the past.

I am convinced that if we all follow these basic principles, we will be able to do data science and machine learning with real impact. And we will not only predict our future, but positively shape it!

If you want help making sure you’re doing data science right, feel free to reach out to us for a free, no-obligation AI Assessment to help you explore and define the potential impact of AI and ML on your business.
