New Series: You’re Doing It Wrong! Learn the Right Way to Validate Models
Have you ever looked forward to a new movie, invited all your friends over to watch it, and then found it was awful? Or bought a new vacuum cleaner based on glowing Amazon reviews, only to have it break after a few uses? These are situations where things look promising but quickly collapse in reality.
All data scientists have experienced something similar. You think a machine learning model will do a great job of predicting something, but then the model “collapses.” You’ve spent days, weeks, or sometimes even months building the predictive model, but after you put it into production it doesn’t perform as well as expected. In the best case, this is only an annoying waste of your time. But in the worst case, a model that underperforms in production can cost millions of dollars – or potentially even human lives!
So was the predictive model wrong in those cases? Sometimes, but often it is not the model that’s wrong – it’s how the model was validated. An incorrect validation produces over-optimistic estimates of the model’s performance, and the data scientist who created it is caught by surprise once it’s in production.
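The post doesn’t name a specific mistake here, but one classic way validation goes wrong is data leakage: preparing the data (for example, selecting features) on the full dataset before cross-validating. The sketch below, written with scikit-learn (an illustrative choice, not something the series prescribes), shows how this produces an over-optimistic accuracy estimate even on pure random noise, where the honest accuracy is about 50%:

```python
# A minimal sketch of one common validation mistake: "leaky" feature selection.
# The data is pure noise, so any honest accuracy estimate should be near 0.5.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))   # 1000 noise features, no real signal
y = rng.integers(0, 2, size=100)   # random labels

# WRONG: feature selection sees ALL rows (including future test folds),
# so the selected features are tuned to the labels we later "predict".
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky_score = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# RIGHT: selection runs inside each training fold via a pipeline,
# so the test fold stays truly unseen.
pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
honest_score = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky estimate:  {leaky_score:.2f}")   # typically well above chance
print(f"honest estimate: {honest_score:.2f}")  # near 0.5, as it should be
```

The gap between the two numbers is exactly the “over-optimistic expectation” described above: the model didn’t change, only the validation procedure did.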
Since the consequences are often dire, it’s the duty of every data scientist to do whatever is necessary to prevent situations like this. To help with this important task, we’ve created a multi-part series of blog posts titled “Correct Validation for Machine Learning Models.” In this series, our very own Dr. Ingo Mierswa discusses how to prevent mistakes in model validation and the necessary components of a correct validation.