(In case you missed it, here’s my recap of RapidMiner Wisdom Day 1.)
Wisdom Day 2 started bright and early on a sunny Friday morning in New Orleans. Which is dangerous because New Orleans can be a LOT OF FUN, especially on a Thursday night. Did I have fun? Yeah, just the right amount. Was I out late in New Orleans Thursday night? Maybe. But I definitely snuck back to my hotel earlier than most because I was due to hit the stage first thing Friday morning.
On to my recap of Day 2:
The Rise and Empowerment of the “Citizen Data Scientist”
We came up with the idea for this panel from a blog post I wrote on Scaling Data Science Without Data Scientists. Hiring data scientists is hard. Consider the ideal resume: a PhD in math & stats or computer science. Ideally both. An intimate knowledge of R, Python, and ML and deep learning frameworks. Expertise in multiple areas of business and fluency in communication and storytelling.
That’s a really hard profile to hire for, and it’s no wonder companies point to a data science skills gap as one of the reasons why data science isn’t making the business impact they had hoped for. So we put together a panel with different experiences and backgrounds to tackle the question once and for all: Does data science require data scientists?
Industry analyst Gartner doesn’t think so. They were responsible for coining the idea of a Citizen Data Scientist which they define as: “a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.”
So do Citizen Data Scientists exist in the real world? And if they do, is that a good thing? We put a panel together to answer these questions and more. The panel included:
- Dr. Matt North, Author of Data Mining for the Masses and Professor at Utah Valley University.
- Joe Rappaport, Head of HR Data at Charles River Labs
- Elise Watson, Consultant at Clarkston Consulting
- Dr. YY Huang, Customer Success at RapidMiner
Matt and YY are the classic data scientists, while Joe and Elise are more Citizen Data Scientists. I was hoping for some conflict, but it turns out everyone agreed Citizen Data Scientists not only exist but are critical to the future of data science in organizations.
Here were three takeaways from the panel:
- Citizen Data Scientists are often the subject matter experts in an organization. To find Citizen Data Science candidates in your organization, look for someone who cares deeply about a problem.
- 80% of data science projects have nothing to do with math and algorithms. Focus on the big picture and the business process itself.
- It’s important to have a diversity of experience and perspective on a modern analytics team. Citizen Data Scientists won’t have the same feel for data science problems like bias and overfitting, so partner them with others on the team with the necessary data science experience. Collaboration is key.
RapidMiner Product Extravaganza Featuring the Latest from the Lab
Next up was a highlight of the day for many: an update on the RapidMiner roadmap delivered by Chief Product Officer Lars Bauerle and Product Manager Tobias Malbrecht.
Lars and Tobias had a lot to cover, as RapidMiner has delivered a ton of new capabilities in the past 12 months including:
- RapidMiner Turbo Prep, data prep that’s fast, fun, and intuitive.
- RapidMiner Auto Model to build predictive models in 4 clicks using automated machine learning and best practices.
- RapidMiner Real-time Scoring for high velocity, low latency prescriptive analytics.
- A new architecture for RapidMiner Server
- A new data core in RapidMiner Studio
- And so much more.
Lars announced an important new initiative called the RapidMiner AI Cloud. AI Cloud is a new cloud-native platform that we’ll continue to build out over time. The first application for the RapidMiner AI Cloud is RapidMiner Auto Model, which Lars announced is currently available in beta.
RapidMiner AI Cloud complements existing RapidMiner products Studio, Server, and Radoop. All RapidMiner products are built on a unified core and share a universal process language, so that processes created in the RapidMiner AI Cloud can be re-used inside the RapidMiner platform.
Using Text Mining to Improve Customer Service
Brian Tvenstrup of Lindon Ventures is another of the fabled RapidMiner unicorns 🦄
Brian’s Wisdom presentation covered text mining, one of the most popular use cases for RapidMiner Studio users. That’s because only approximately 10% of the world’s data is highly structured. That makes text mining a $10b market opportunity.
Brian shared a text mining use case he delivered for one of his customers, a national landscaping and tree care company with over $800m in annual sales. His client was looking to improve customer support by analyzing phone calls from their call center.
The call center received about 1,000 calls a day. Some calls were randomly monitored, but the customer wanted a more automated way to identify support calls with issues in order to better train the call center reps, escalate issues when needed, and most importantly improve the overall level of support.
Brian pointed out that building a predictive model with text data requires human judgement to label training data. He trained his model on over 500 hand-labeled cases, and the result was a predictive model with over 80% accuracy that improved customer service.
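To make the "hand-labeled cases" idea concrete, here's a minimal sketch of training a text classifier from labeled examples. Brian didn't say which algorithm he used, so this swaps in a plain multinomial Naive Bayes written with the Python standard library. The call snippets and the "issue" / "ok" labels are invented for illustration and are not from his project.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real transcripts would need more cleanup."""
    return text.lower().split()


def train(labeled_calls):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_calls:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled examples, invented for illustration only.
LABELED_CALLS = [
    ("the crew never showed up and nobody called me back", "issue"),
    ("i was charged twice and want a refund", "issue"),
    ("the tree was damaged and i am very upset", "issue"),
    ("thank you the crew did a great job", "ok"),
    ("i would like to schedule my spring service", "ok"),
    ("great service please renew my plan", "ok"),
]

model = train(LABELED_CALLS)
print(predict(model, "nobody called me back about the refund"))
print(predict(model, "thank you great job"))
```

Six examples are only enough to show the mechanics. The point of labeling hundreds of real cases is that the word counts start to reflect how customers actually talk.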
Recommender Systems: Complex Solutions made Simple with RapidMiner
Marco Barradas works at Master Loyalty Group, where he creates loyalty programs for customers in industries ranging from consumer goods to pharmaceuticals and financial services. Master Loyalty Group provides solutions to increase customer loyalty, where consumers earn points through achieving goals that can be measured by analyzing customer spend, sales, or whatever metric their customers choose. Points can be redeemed from a catalog of over 4000 products provided through a web-based platform.
Having such an extensive product catalog made it difficult for consumers to easily find relevant products and services, so Marco created a Classification Tree in RapidMiner Studio, a simple model that allowed him to deliver personalized items, a fast response, and easy interpretation.
After tuning the model’s parameters, Marco found that a tree with long branches worked best, delivering deep personalization without overfitting. The confusion matrix outcome for his model was 64%, considered good enough based on the nature of the problem.
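If you haven't worked with confusion matrices, here's a rough sketch of how a single percentage can come out of one. I'm assuming the 64% refers to overall accuracy, which the talk didn't spell out. The three-class matrix below is made up for illustration and has nothing to do with Marco's actual numbers.

```python
def accuracy(confusion):
    """Overall accuracy: predictions on the diagonal divided by all predictions.

    Rows are actual classes, columns are predicted classes.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def recall_per_class(confusion):
    """For each actual class, the share of its items that were predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical counts for three made-up product categories.
TOY_CONFUSION = [
    [30, 5, 5],    # actual class 0
    [10, 20, 10],  # actual class 1
    [4, 6, 10],    # actual class 2
]

print(accuracy(TOY_CONFUSION))
print(recall_per_class(TOY_CONFUSION))
```

With thousands of catalog items to choose from, random guessing would score far lower than a model in the 60s, which is one way to read "good enough based on the nature of the problem."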
But insight without action is fairly useless, so Marco used RapidMiner Server to turn his predictive model into prescriptive recommendations delivered through their web-based platform.
The results? The average shopping cart transaction increased by 4%.
Turn Your Employees into Sentiment-Extracting AI
Joe Rappaport is the Head of HR Data at Charles River Labs. Joe had the very last presentation at Wisdom, but I’m happy to report that his session was (mostly) full.
As I mentioned earlier, Joe’s a Citizen Data Scientist. His background is Criminal Justice, and he’s spent most of his career in Human Resources. While he doesn’t have a classic data science background, that didn’t stop him from delivering a master-class on sentiment analysis.
His use case for RapidMiner was to analyze employee reviews to understand macro trends at large organizations. It turns out this is hard, because sentiment is often difficult to interpret. Human analysts only agree on sentiment 80% of the time. Therefore even a “perfect model” will only have an accuracy of between 80% and 90%.
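That agreement figure is something you can measure on your own data. Here's a minimal sketch that computes the average pairwise agreement between labelers, assuming every annotator labels the same items in the same order. The annotator names and labels are invented for illustration.

```python
from itertools import combinations


def percent_agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def mean_pairwise_agreement(annotations):
    """Average percent agreement over every pair of annotators."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sentiment labels from three volunteers for the same ten reviews.
ANNOTATIONS = {
    "volunteer_a": ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neu", "neu"],
    "volunteer_b": ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neu", "neu", "neu"],
    "volunteer_c": ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neu", "pos"],
}

print(percent_agreement(ANNOTATIONS["volunteer_a"], ANNOTATIONS["volunteer_b"]))
print(mean_pairwise_agreement(ANNOTATIONS))
```

If your own labelers land well below the level Joe quoted, that's a hint the labeling guidelines need work before the model does.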
Like we heard from Brian, the key to building a sentiment analysis model is a human-in-the-loop process. Joe recruited as many volunteers as he could to help label data. He gave specific advice not to use MBAs or interns as both often have more extreme views on labeling the data.
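Once several volunteers have labeled the same review, you need one label to train on. Joe didn't describe how he combined the votes, so treat this as one simple option rather than his method: a majority vote that flags ties for a second look. The function name and the sample votes are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label, or None when there is no clear winner."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    # A tie for first place means the volunteers did not settle on one label.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical votes from volunteers on three reviews.
print(majority_label(["pos", "pos", "neg"]))
print(majority_label(["pos", "neg"]))
print(majority_label(["neu", "neu", "neu", "neg"]))
```

Returning None for ties keeps ambiguous reviews out of the training set until someone reviews them by hand.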
Thanks to all the Day 2 speakers for your fantastic presentations. You can find all of the presentations from RapidMiner Wisdom 2018 at https://rapidminer.com/wisdom/2018-presentations/.