Data is the new oil, AI is the new electricity, and data science is the sexiest job of the 21st century. Okay, the hype and hyperbole in data science and artificial intelligence are over the top, but the opportunity is clearly generational. I’d argue it’s the most important disruption I’ve seen in my life, more important than the internet and the rise of mobile.

Did you see the Google Duplex demo from I/O 2018? Google Assistant can now carry on a two-way conversation with a human. For example, the new Google AI can call a restaurant to book a reservation, and the person on the other end won’t know it’s an AI bot (well, Google has since changed its mind, and now they will). It’s jaw-dropping, unbelievable stuff that was unthinkable a few years ago.

Everyone sees the obvious opportunity with AI, but the challenge I hear over and over is that companies can’t hire enough data scientists to take advantage of it. Sure, companies like Google and Facebook can hire data scientists in vast numbers, and hedge funds can hoard quants, but what about everyone else? It’s a real problem.

As Eva Short of SiliconRepublic points out, “Demand is high but, crucially, supply is low. While this works out well for data science professionals, it could be ruinous for the economy if not addressed.” A 2017 report from Burning Glass Technologies, the Business-Higher Education Forum and IBM forecast that the number of openings for all data jobs will increase by 364,000, to a total of 2,727,000, by 2020.

So how are we going to fill all of these data science roles?

Is there a shortage of data science or data scientists?

Last week, I ran into Liz, a close family friend, at North Station in Boston. Liz has led analytics teams at multiple billion-dollar companies in the Boston area and is now looking for her next analytics role. I was surprised to hear that most of the companies she’s speaking with have a strong preference for R or Python experience. Liz is not an R or Python programmer, nor does she have a PhD (just a Wharton MBA 😂). She’s what Gartner calls a citizen data scientist. She has a solid background in math and statistics and has worked on a number of customer analytics projects using data science and machine learning. As a senior analytics leader, Liz has delivered hundreds of millions of dollars of value. It’s flat-out insane to me that a company wouldn’t consider someone like Liz for a senior analytics role just because she doesn’t code. It would be like turning down a procedure from a world-class surgeon because of the brand of scalpel they use.

Maybe the shortage of data scientists is self-inflicted? Consider the ideal data science resume: a PhD in math and stats or computer science, ideally both. An intimate knowledge of R, Python, and ML and deep learning frameworks. Expertise in multiple areas of business and fluency in communication and storytelling. The ability to bench-press at least 300 lbs and run a sub-6-minute mile.

You know, be a unicorn 🦄. And we wonder why companies can’t hire enough data scientists, at least as they are currently defined?

I think this gets to the core of the problem companies face when scaling their data science efforts: to deliver on the promise of data science, companies need to think beyond the unicorn data scientist profile.

Teaching data science to analysts at Western Kentucky University

Dr. Lily Popova Zhuhadar is an Associate Professor of Information Systems at Western Kentucky University. Dr. Zhuhadar runs an annual data science competition for her WKU students. They use RapidMiner Studio to analyze data for a variety of use cases, including customer churn, marketing segmentation, financial risk modeling, and cross-selling.

The students present their findings as formal case studies, complete with beautiful posters that showcase their results. The complete list of student projects is at the bottom of this post.

Did I mention that these projects were done by undergrads at WKU? These students don’t have PhDs. Most aren’t R or Python programmers. They probably can’t even bench-press 300 lbs 🙂 But thanks to Dr. Zhuhadar, the students got the foundational knowledge of data science. They also got a tool in RapidMiner that let them focus on the problem rather than the implementation.

That’s why we think RapidMiner can play a transformative role in bringing data science to more analytics teams. RapidMiner Studio abstracts away the implementation details that would otherwise require a specialized unicorn, without compromising the quality of the data science.

Yes, you can do real data science without writing code. There, I said it.

Of course, there’s still a need for highly specialized data scientists; I’m lucky enough to work with a bunch of them at RapidMiner. But the analytics teams of the future need a diversity of experience, and solutions like RapidMiner, to fully embrace the generational opportunity in front of us.

Thanks again to Dr. Lily Popova Zhuhadar and all her students at Western Kentucky University who took part in this. We’re humbled that you chose to teach data science using RapidMiner. To all your students: drop me a note @twentworth12, because I’d like to send you something 🙂 To my friend Liz: I’ve got a free educational RapidMiner license for you, and I’ll send over the details.

Finally, here’s the complete list of presentations from the last competition at WKU in December:

Predicting Customer Churn, by Everett Taylor and Andrew Newton

Credit Risk Modeling, by Joe Edwards and Andrew Gibbs

Predict Which Customers Will Respond! Modeling Marketing Campaign, by Brenden Lutz and Aaron Dorris

Detect Medical Fraud, by Dolton Holland and Kyle Killebrew

Predict Stock Market Bidding, by Parker McClean and Spencer Embry

Predict Who Will Click, Buy, Lie, or Die! Web Analytics (btw LOVE this title), by Abigail Vazquez and Cierra Synder

Predict iPhoneX Buyers!, by Abdulaziz Aldehaim and Faisal Chowdhur

Predict Credit Card Default, by Eric Spiller and Kate Mukderink

Predict Boston Housing Market (don’t need a model for this one!), by Graham Goins and Ryan Weddle

Telecom Segmentations, by Gus Madsen and Kyle Hart

Churn Propensity Modeling, by Jordan Myers

Predicting the Best Sport for You, by Sarah Smith and Jacob Wood

Phishing or Legit Ad?, by Nicolas Coffell and James Roark

Cross-selling in Retail, by Nihad Hasanovic and Sergio Ortega

  • Cyndi

    This is a great post! I’m wondering if it’s possible to get the workflow and datasets so we can practice in RapidMiner Studio?

  • Lily Popova Zhuhadar

    Thanks, Tom, for your post! The way you linked industry with academia made this post even more interesting to read. Not to mention that most of the WKU students mentioned in your post just graduated and are looking for jobs! Your post will be a valuable item on their CVs. Last but not least, thanks to RapidMiner for providing a free version for students and faculty.

  • kypexin

    Hey Tom,

    Regarding this:
    “And we wonder why companies can’t hire enough data scientists, at least as they are currently defined?”

    I think I have a kind of answer to this… it’s legacy. A company that has been building its analytics team out of R and Python coders for a couple of years doesn’t change easily… and I have seen plenty of cases where a company would rather hire an average Pythonista than an excellent data scientist who has RapidMiner skills but doesn’t write production-grade code.

    Unfortunately, this tendency will keep driving the analytics / DS market for a pretty long time.

    And those vacancies that require “5+ years of data science experience with ability to write code in production environment” stay open for months and years, because in the end those companies actually need data engineers and developers, not data scientists. And someone needs to explain the difference.

    • Tom Wentworth

      Exactly. I see some similarities to the evolution of web development. Back in the day, we treated HTML like a programming language and hired computer scientists to build end-to-end websites. Then came visual tools like Dreamweaver that massively expanded the universe of web development. Then came content management systems like WordPress and Drupal that created an even higher level of abstraction for business users to build sites.

      Ten years from now, I think we’ll look at starting a data science project in R or Python the same way we’d look at building a website from scratch using a framework like React or Angular. Yes, it will still make sense in a lot of cases, but the vast majority of data science will be done with high-level tools like RapidMiner.

  • This is a very informative discussion about Data Science
    Thanks for sharing…