There is much at stake when you run your own business, and as with any tool used to improve efficiency, data science has both upsides and downsides. In a recent webinar, RapidMiner’s founder and president, Dr. Ingo Mierswa, a data science pioneer and award-winning author on predictive analytics and big data, discussed how to harness data science to make data-driven decisions and avoid the typical modeling mistakes that can lead to huge losses for your business.

Success versus Failure: Amazon & Google AdWords versus Tesco

The machine learning success stories most people know come from companies such as Amazon and Google (with AdWords), which use data science to match their customers’ interests through ads and recommender systems. This ability to read customers’ minds by communicating with them indirectly has fed straight into revenue, making a difference of billions of dollars.

[Image: Tesco company timeline]

Conversely, we can see what happened when Tesco invested heavily in predictive analytics to make better-informed promotions and advertisements within its customer loyalty program. In the beginning, Tesco achieved incredible profit growth: profits grew almost sevenfold in 10 to 15 years. This success ended quite suddenly when customer sentiment turned against the company; customers felt they were sharing too much of their data and getting no value in return. This is a case where the predictive model was not necessarily faulty, just used incorrectly by Tesco.

Analyzing Aliens: Correlation and Causation    

There are a lot of interesting patterns in UFO sightings that can (rather unconventionally) help you understand the danger of confusing correlation and causation when building a model for your business. The National UFO Reporting Center offers a wealth of structured data on the number of UFO sightings per year since 1963. In 1993, after the first episode of the cult TV series The X-Files was viewed by more than 5 million people in the United States, the number of UFO sightings increased dramatically. The chart below shows the number of UFO sightings by hour and day, Monday through Sunday, with yellow representing more sightings. Dr. Ingo emphasizes being careful not to confuse correlation with causation, because doing so will hurt the accuracy of your model and results. That confusion could lead you to explain the trend in this data in several different ways: perhaps aliens are social beings who have adopted our partying habits and only visit Earth on Saturday nights, or perhaps these sightings are reported by drunk frat boys mistaking planes overhead for UFOs.

[Image: UFO sightings by day and time]
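To make the correlation-versus-causation point concrete, here is a minimal sketch on synthetic numbers (not the National UFO Reporting Center data). It assumes a hidden common cause, called `people_outdoors` here, that drives two made-up measurements, `sightings` and `party_noise`. All three names and the noise levels are illustrative assumptions. The two measurements end up strongly correlated even though neither causes the other, and the correlation largely disappears once the common cause is subtracted out.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(42)
n = 5000

# Hypothetical hidden common cause (assumption for illustration only).
people_outdoors = [rng.gauss(0, 1) for _ in range(n)]

# Both measurements depend on the common cause, not on each other.
sightings = [z + rng.gauss(0, 0.5) for z in people_outdoors]
party_noise = [z + rng.gauss(0, 0.5) for z in people_outdoors]

raw_corr = pearson(sightings, party_noise)

# Remove the common cause from both series and correlate what is left.
adjusted_corr = pearson(
    [s - z for s, z in zip(sightings, people_outdoors)],
    [p - z for p, z in zip(party_noise, people_outdoors)],
)

print(f"raw correlation:      {raw_corr:.2f}")
print(f"after removing cause: {adjusted_corr:.2f}")
```

A model that treated `party_noise` as a cause of `sightings` would fit this data well and still be wrong about what happens if you change one of them.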

Modeling mistakes like these can cause huge losses for a business. People see a pattern and model it, or the model leads them to think in the wrong way, and consequently their model won’t perform well in practice. Even if you model your data in the right way, if you validate the model in the wrong way you still won’t know how well it works.
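The validation point can be illustrated with a small sketch. It is not taken from the webinar; it assumes a 1-nearest-neighbor classifier and purely random labels, so there is nothing real to learn. Measured on its own training data the model looks perfect, while 5-fold cross-validation shows it is no better than guessing. The helper names `nearest_label` and `accuracy` are hypothetical.

```python
import random


def nearest_label(point, train_points, train_labels):
    """1-nearest-neighbor: copy the label of the closest training point."""
    best = min(
        range(len(train_points)),
        key=lambda i: (train_points[i][0] - point[0]) ** 2
        + (train_points[i][1] - point[1]) ** 2,
    )
    return train_labels[best]


def accuracy(test_idx, train_idx, points, labels):
    """Share of test points whose predicted label matches the true label."""
    train_points = [points[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    hits = sum(
        nearest_label(points[i], train_points, train_labels) == labels[i]
        for i in test_idx
    )
    return hits / len(test_idx)


rng = random.Random(0)
n, k = 200, 5
points = [(rng.random(), rng.random()) for _ in range(n)]
labels = [rng.randint(0, 1) for _ in range(n)]  # pure noise, nothing to learn

all_idx = list(range(n))

# Wrong way: score the model on the same rows it was trained on.
training_accuracy = accuracy(all_idx, all_idx, points, labels)

# Better way: k-fold cross-validation, scoring only on held-out rows.
fold_scores = []
for fold in range(k):
    test_idx = all_idx[fold::k]
    held_out = set(test_idx)
    train_idx = [i for i in all_idx if i not in held_out]
    fold_scores.append(accuracy(test_idx, train_idx, points, labels))
cv_accuracy = sum(fold_scores) / k

print(f"training accuracy:        {training_accuracy:.2f}")
print(f"cross-validated accuracy: {cv_accuracy:.2f}")
```

The training accuracy is 1.00 because every point is its own nearest neighbor; the cross-validated figure lands near 0.5, which is the honest answer for random labels.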

Advice for Avoiding Deadly Modeling Mistakes

[Image: RapidMiner modeling screenshot]

Dr. Ingo offers the following advice on how to avoid losses and miscalculations when using predictive analytics in business. Start with simple models that are understandable and actionable, because the data still needs to be investigated and understood; a simple model may even outperform a complex one. Spend some time on feature engineering and use common sense when doing so. Focus on data preparation rather than always trying to keep up with the latest and greatest models. Normalize the data before building the model, but do it inside the cross-validation rather than before it. Check for correlations before modeling and remove factors that are too highly correlated with each other. Do not rely on training error; always use cross-validation, and keep every data transformation that works across rows, such as normalization, inside the cross-validation when validating a model. Above, you can see a few of these tips in action as Dr. Ingo validates a model on the Titanic survival data set.
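Two of the tips above, removing highly correlated factors and keeping normalization inside the cross-validation, can be sketched in plain Python. This is an illustration under stated assumptions, not the RapidMiner process from the webinar: the column names are Titanic-like but the values are random, the 0.95 threshold is arbitrary, and the helpers `drop_correlated`, `fit_scaler`, and `apply_scaler` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not too correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


def fit_scaler(rows):
    """Learn per-column mean and standard deviation from the given rows."""
    scaler = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaler.append((mean, std or 1.0))  # avoid dividing by zero
    return scaler


def apply_scaler(rows, scaler):
    """Z-score rows using statistics that were learned elsewhere."""
    return [[(v - mean) / std for v, (mean, std) in zip(row, scaler)] for row in rows]


# Hypothetical Titanic-like columns; random values, not the real data set.
rng = random.Random(1)
n, k = 60, 5
age = [rng.uniform(1, 80) for _ in range(n)]
columns = {
    "age": age,
    "age_in_months": [a * 12 for a in age],  # redundant copy of age
    "fare": [rng.uniform(5, 500) for _ in range(n)],
}

kept = drop_correlated(columns)
rows = [list(r) for r in zip(*kept.values())]

fold_train_means = []
for fold in range(k):
    test_idx = set(range(fold, n, k))
    train_rows = [rows[i] for i in range(n) if i not in test_idx]
    test_rows = [rows[i] for i in sorted(test_idx)]
    scaler = fit_scaler(train_rows)  # statistics from the training fold only
    train_scaled = apply_scaler(train_rows, scaler)
    test_scaled = apply_scaler(test_rows, scaler)  # reused on held-out rows
    # A model would be trained on train_scaled and scored on test_scaled here.
    fold_train_means.append(sum(r[0] for r in train_scaled) / len(train_scaled))

print("kept columns:", list(kept))
```

Fitting the scaler on the whole data set before splitting would let information from the held-out rows leak into training, which tends to make the validation result look better than it should.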


To continue learning about how aliens and customer mind-reading can make or break your business depending on how you harness data science, watch this on-demand webinar and see the full product demonstration. RapidMiner can make your business more productive with real data science that covers the whole data science spectrum from data preparation to modeling. Download RapidMiner Studio today and become a part of our mission to help organizations unleash data-driven decisions and extract new business value.