14 May 2018


Scaling Data Science Without Data Scientists

Data is the new oil, AI is the new electricity, and data science is the sexiest job of the 21st century. Okay, the hype and hyperbole in data science and artificial intelligence are over the top, but the opportunity is clearly generational. I’d argue it’s the most important disruption I’ve seen in my life, more important than the internet and the rise of mobile.

Did you see the Google Duplex demo from I/O 2018? Google Assistant can now hold a two-way conversation with a human. For example, the new Google AI can call a restaurant to book a reservation and the person on the other end won’t know it’s an AI bot (well, Google changed its mind and they will now). It’s jaw-dropping, unbelievable stuff that was unthinkable a few years ago.

Everyone sees the obvious opportunity with AI, but the challenge I hear over and over is that companies can’t hire enough data scientists to take advantage of it. Sure, companies like Google and Facebook can hire vast quantities of data scientists, and hedge funds can hoard quants, but what about everyone else? It’s a real problem.

As Eva Short of SiliconRepublic points out, “Demand is high but, crucially, supply is low. While this works out well for data science professionals, it could be ruinous for the economy if not addressed.” A 2017 report from Burning Glass Technologies, Business-Higher Education Forum and IBM forecast that the number of openings for all data jobs will increase by 364,000 to a total of 2,727,000 by 2020.

So how are we going to fill all of these data science roles?

Is there a shortage of data science or data scientists?

Last week, I ran into Liz, a close family friend of mine, at North Station in Boston. Liz has led analytics teams at multiple billion-dollar companies in the Boston area and is now looking for her next analytics role. I was surprised to hear that most of the companies she’s speaking with have a strong preference for R or Python experience. Liz is not an R or Python programmer, nor does she have a PhD (just a Wharton MBA 😂). She’s what Gartner calls a citizen data scientist. She has a solid background in math and statistics, and has worked on a number of customer analytics projects using data science and machine learning. Liz has delivered hundreds of millions of dollars of value as a senior analytics leader. It’s flat-out insane to me that a company wouldn’t consider someone like Liz for a senior analytics role just because she doesn’t code. It would be like turning down a procedure from a world-class surgeon because of the brand of scalpel they use.

Maybe the shortage of data scientists is self-inflicted? Consider the ideal data science resume: a PhD in math & stats or computer science. Ideally both. An intimate knowledge of R, Python, and ML and deep learning frameworks. Expertise in multiple areas of business and fluency in communication and storytelling. The ability to bench press at least 300 lbs and run a sub-6-minute mile. You know, be a unicorn 🦄. And we wonder why companies can’t hire enough data scientists, at least as they are currently defined?

I think this gets to the core of the problem companies face when scaling their data science efforts: to deliver on the promise of data science, companies need to think beyond the unicorn data scientist profile.

Teaching data science to analysts at Western Kentucky University

Dr. Lily Popova Zhuhadar is an Associate Professor of Information Systems at Western Kentucky University. Dr. Zhuhadar runs an annual data science competition for her WKU students, who use RapidMiner Studio to analyze data for a variety of use cases including customer churn, marketing segmentation, financial risk modeling, and cross-selling.

The students present their findings as a formal case study, including producing beautiful posters like this to showcase their findings. The complete list of student projects is at the bottom of this post.

Did I mention that these projects were done by undergrads at WKU? These students don’t have PhDs. Most aren’t R or Python programmers. They probably can’t even bench press 300 lbs 🙂 But thanks to Dr. Zhuhadar, the students were given the foundational knowledge of data science, and a tool in RapidMiner that allowed them to focus on the problem and not the implementation.

That’s why we think RapidMiner can play a transformative role in bringing data science to more analytics teams. RapidMiner Studio abstracts away the implementation details that require a specialized unicorn persona, without compromising the quality of the data science.

Yes, you can do real data science without writing code. There, I said it.

Of course, there’s still a need for highly specialized data scientists; I’m lucky enough to work with a bunch of them at RapidMiner. But the analytics teams of the future need a diversity of experience and solutions like RapidMiner to fully embrace the generational opportunity in front of us.

Thanks again to Dr. Lily Popova Zhuhadar and all her students at Western Kentucky University who took part in this. We’re humbled that you chose to teach data science using RapidMiner. To all your students: drop me a note @twentworth12, I’d like to send you something 🙂 To my friend Liz: I’ve got a free educational RapidMiner license for you, and I’ll send you the details.

Finally, here’s the complete list of presentations from the last competition at WKU in December:

Predicting Customer Churn, by Everett Taylor and Andrew Newton
Credit Risk Modeling, by Joe Edwards and Andrew Gibbs
Predict Which Customers Will Respond! Modeling Marketing Campaign, by Brenden Lutz and Aaron Dorris
Detect Medical Fraud, by Dolton Holland and Kyle Killebrew
Predict Stock Market Bidding, by Parker McClean and Spencer Embry
Predict Who Will Click, Buy, Lie, or Die! Web Analytics (btw LOVE this title), by Abigail Vazquez and Cierra Synder
Predict iPhoneX Buyers!, by Abdulaziz Aldehaim and Faisal Chowdhur
Predict Credit Card Default, by Eric Spiller and Kate Mukderink
Predict Boston Housing Market (don’t need a model for this one!), by Graham Goins and Ryan Weddle
Telecom Segmentations, by Gus Madsen and Kyle Hart
Churn Propensity Modeling, by Jordan Myers
Predicting the Best Sport for You, by Sarah Smith and Jacob Wood
Phishing or Legit Ad?, by Nicolas Coffell and James Roark
Cross-selling in Retail, by Nihad Hasanovic and Sergio Ortega

Want to learn more ways data science can have a lasting impact on your organization? Check out 50 Ways to Impact Your Business with AI today!
