Michael Martin, Information Arts
This session will provide a demonstration of the RapidMiner and Tableau Integration component developed by Bhupendra Patil of RapidMiner for classification and association mining.
00:00 Hello, everyone. Thanks for joining us for today’s webinar, titled Better Together: RapidMiner & Tableau, an end-to-end data science and visual analytics environment. I am your host, Scott Barker. I am the head of product solutions marketing here at RapidMiner. We’re really thrilled to be joined by two guest presenters today: Nathan Mannheimer, Director of Data Science and Machine Learning at Tableau, and Michael Martin, a long-time user and integration partner of both RapidMiner and Tableau. Michael was very excited when we first started talking about the integration between our two platforms and prepared a series of demos, and we wanted to share those with the whole world. Before we dive in, a couple of quick housekeeping items. Today’s presentation will be recorded, and we’ll send a copy of the recording out to all of our registrants within one to two business days of the airing of the webinar. So keep an eye out for that. And, of course, feel free to share this with your colleagues.
01:05 A lot of today’s presentation is about collaboration and how critical collaboration across disciplines and personas is in the world of data science. So if you want to share this with colleagues who might fit into the personas we outline on today’s webinar, please do that. If you have any technical difficulties (not that I need to tell anyone how to operate remote meeting software in today’s world), your first best bet is to log out and log back in; that solves most of the issues most of the time. And if you have any other questions or concerns, we are here to support you. We’ll be available in the chat panel to help you with any additional issues you may have. Speaking of the chat panel, please feel free to enter questions there as we go along. We will have time for questions at the end of today’s webinar, and Michael, Nathan, and I will address them then. With that, I want to dive right into the content.
02:12 Now, before we get into the meat of the presentation: many of you are very likely familiar with Tableau. Many of the folks in the audience are Tableau users, power users. What we want to do first is tell you a bit about RapidMiner, which you may not be as familiar with unless you’re more deeply embedded in the data science community. Our mission at RapidMiner is to reinvent enterprise AI so that anyone has the power to positively shape the future. When we say reinvent enterprise AI (and I’ll talk a little more about this in just a moment), we mean that parts of enterprise AI are broken today. Part of it has to do with the complexity of the practice, but many organizations struggle with implementing AI and machine learning into their analytics processes. We don’t think it needs to be that hard, so we’re reinventing, with our platform and our methodology, the way that organizations can deploy machine learning and AI to have an impact. When we say anyone, we mean anyone of any skill level or domain. You could be a marketer, or you could be a data scientist; we think we can make your life easier. And when we say positively shape the future, we mean report on what’s likely to happen and then drive impact with that reporting. Machine learning is about predicting the future, but then it’s about automating and optimizing based on that prediction. And sometimes the hardest part is working that machine learning model into a practice or a process. We streamline that component as well.
03:46 Now, most of you have probably mastered the first half of this analytics maturity curve, which is mastering the art of reporting on what’s happened: taking raw data, cleaning it, building ad-hoc reports, and then creating standard automated reporting so your whole organization is essentially on the same page, operating under the same parameters and against the same KPIs. Many of you have probably also explored the world of statistical analytics, which asks why something happened and what is likely to happen. And most RapidMiner clients come to us because they’re seeking to creep up this analytics maturity curve and enter the world of predicting and acting in an automated fashion, in many cases through the use of AI and the automation it can afford you: starting to explore predictive modeling (what will happen) and then AI and ML optimization (how to optimize for what we want to happen). There is a variety of use cases across marketing, customer support (routing customer support tickets), finance (optimizing forecasting), and operations (predicting large-scale machine failures), and all of these use cases are tremendously high-value for our clients and for any organization that’s mastered the art of machine learning and data science.
05:15 Now, we find with a lot of organizations that come to us that, as they start to explore the journey into predictive analytics, machine learning, and AI, it starts to feel complicated. There are a lot of talking heads and best practices and templates and rules out there that people are told to follow. As you follow this path, or any other path that’s similar, most paths follow the decades-old CRISP-DM methodology: starting with a business problem, getting the data you need to tackle it, building a model, deploying the model, and then, in a cyclical fashion, analyzing that model, managing and maintaining it over time, and making sure it’s performing to standards. Most rules and best practices and templates tell you that you need a complex ecosystem of multidisciplinary talent working together. But each group in each discipline, a data scientist, a data engineer, an MLOps engineer, is usually tasked with working on only one corner of the project: maybe scoping the project, or building the model, or deploying it, or maintaining it. At the end of the day, most organizations we speak with don’t have the luxury of this vast assembly of superhero-like talent, and when they look at these models and best practices, their heads start to spin.
06:46 And at the end of the day, when I talked about enterprise AI being semi-broken today, what I mean is that the average enterprise is drowning in data. They have way too much data, more data than they know what to do with, but they struggle to leverage it for predictive analytics. There are a lot of stats out there that reinforce the point I’m trying to make, but Gartner recently released a public article about this trend of machine learning models going un-deployed. I’ve seen numbers that are actually a little larger than this 50%, but I’m fine with this estimate, and even 50% is alarming. That, to me, says that a lot of organizations are tinkering with machine learning, trying to make it create an impact in the organization, and struggling. Now, most of what follows comes directly from our VP of Customer Success, who runs our data science practice and our pre-sales practice at RapidMiner.
07:48 But we find that most organizations are really struggling with three common issues that prevent them from being successful with data science and machine learning. This is uniform; the challenges are always the same. Some clients struggle with only two of these, but most organizations, when we approach them and start working with them, have struggled with all three in the past. And they’re all very closely interrelated. Usually they’re struggling with getting started. Oftentimes they’re struggling with bridging the expertise gaps within their organization; I’ll explain that in just a second. And many organizations also struggle to sustain value over time: they create a model, and they don’t know how to properly manage it, maintain it, and measure its performance over time. Now, both getting started (that is, “the ROI’s unclear, our data’s imperfect, we don’t have a data scientist, we don’t have the specialized expertise”) and sustaining value over time link closely back to what I want to spend most of today talking about: this concept of bridging the expertise gap.
09:02 Now, fundamentally, bridging the expertise gap means you either don’t have data scientists, or your business users and business experts and the data scientists don’t understand each other. This leads to failures in a lot of different regards, but to two major things in particular. First, the machine learning projects and models you’re building produce unactionable results and aren’t adopted: users don’t understand how the models work, so they don’t adopt them into their day-to-day business practice. Second, the model may not fit the real world. It’s a great model, very technically sound from a machine learning standpoint, but it doesn’t align with the way the business operates. Both of those problems create major issues, on both the front-end and the back-end of a project.
09:56 And this expertise gap comes down to a couple of things, but in my mind, and in the minds of our clients at RapidMiner, a lot of the time it comes down to this: you’ve got the business expert, who is commonly a Tableau user. They love data, but maybe not data science. They understand the rich context behind the data, the problems, the root causes; they live and breathe the KPIs, but they also live and breathe Excel and BI tools like Tableau. Their interest in machine learning really revolves around making machine learning easy to use. And then, if you have the luxury of having one, there’s the data science expert, who is usually fantastic at dissecting complex problems. They’re builders and creators. They don’t usually have the context for the data, and they don’t want to leave the code or the data science platform they’re operating in, because they savor doing something no one else knows how to do, and they don’t really care about easy.
10:59 Fundamentally, this expertise gap is why we’ve developed the RapidMiner platform and why we’ve evolved it the way we have over the course of the last two years. The RapidMiner platform is now essentially three different user experiences, which span a spectrum of ease of use, complexity, and technical depth. The spectrum goes from a product we call RapidMiner Go, which is fully automated machine learning: you drop your data in, and it spits out a machine learning model. We have RapidMiner Studio, which is what we’re known for: a flexible visual drag-and-drop designer with some automation built in to enhance productivity. And for the code-centric data scientists and data engineers, we have RapidMiner Notebooks, an approach for total customization and for people who prefer to operate in coding languages like R and Python. At the center of it all is the RapidMiner AI Hub. Michael talks about a kind of digital analytics campfire; that’s essentially what RapidMiner AI Hub is.
12:02 It drives collaboration between the users who are operating in these three different modalities. So if you have a data analyst who maybe doesn’t really understand data science or care to understand data science on a deep technical level, they can work together with a data engineer or a data scientist who is doing some complex operations and building some complex models collaboratively on the same project together through the RapidMiner AI Hub. Now, the AI Hub does a lot of other things. It automates execution of processes, it helps with insight delivery through dashboards and apps, and so on and so forth. It integrates with platforms like Tableau. And then it also offers security and user control and management and the high-powered computation that you need for data science. And so there’s all those things combined. The AI Hub kind of sits at the middle of the users and it integrates all their work together and allows them to collectively work on the same project.
13:07 Now, that wasn’t good enough, right? We really needed to bring the data science to where the business data lives, which in most accounts and most enterprises we’re working with is Tableau. And that’s what we’re here to talk about today: the combination of two best-in-class platforms, RapidMiner and Tableau. Essentially, we’ve built a bidirectional integration through the Tableau Analytics Extension and the Tableau Server Web API, and it allows you to do a number of things. You can prepare data sets for Tableau if that tickles your fancy, but maybe more importantly (because Tableau does have Tableau Prep), you can explore and train models using RapidMiner, create new real-time predictions in Tableau, and enrich existing Tableau dashboards with a view of the future through predictive models built in RapidMiner. The benefits are diverse here, but they all go back to closing the expertise gap through collaboration, alignment of stakeholders, and mutual work across multiple disciplines on the same project. So if you have data engineers, data scientists, data analysts, and business experts, they’re all collaboratively working in the same environment on the modeling, and also on the output and outcome displayed through the Tableau BI platform.
14:37 So the benefits are: you can promote stakeholder buy-in through this digital collaboration. You can optimize data prep specifically for predictive analytics and machine learning; RapidMiner has data prep baked in, mainly optimized for preparing and transforming data specifically for model prototyping and training. You can build no-code models, with automated machine learning to predict future outcomes. And you can deliver powerful insights to the user-friendly dashboards that Tableau is really known for. At the end of the day, this closes the expertise gap, right? The RapidMiner platform and the integration with Tableau help close this expertise gap we talked about, which causes so many issues with machine learning adoption in enterprises. We’ve got multiple team members working together on explainable, machine learning-driven Tableau insights, and it helps you create better machine learning apps faster. So business experts who don’t know deep data science can use the automated data exploration and machine learning in RapidMiner Go, data science experts can use RapidMiner Studio and our Python- and R-driven capabilities in RapidMiner Notebooks, and they can work together on the same project and productionize their collaborative AI and machine learning efforts through the trusted corporate BI environment, Tableau.
16:03 And that’s a quick overview. As I mentioned, we brought Michael on today because he’s developed multiple real-world analytic scenarios that showcase the power of the integration of these two best-in-class platforms. Michael has prepared four demos, or real-world analytics scenarios: a finance demo with an investment scenario; a marketing use case, clustering sales prospects or audiences; a sales use case centered on market basket analysis; and an operations use case in a manufacturing environment, which essentially helps predict catastrophic machine failure on the shop floor. So with that, I will turn it over to Michael to show these scenarios to you in real time.
16:57 Thank you, Scott, for the introduction. It’s really great to be with you all, wherever you may be. Thank you for spending a little time with us. And yes, RapidMiner and Tableau really are better together: they provide a real end-to-end collaboration environment for data science and visual analytics. I’m going to step through a few use cases. Scott referred to RapidMiner AI Hub just a couple of minutes ago. You can think of AI Hub as slightly analogous to Tableau Server: it does lots of things, just like Tableau Server does, in terms of scheduling executions, handling communications and notifications, and of course serving as a repository for your content. And part of RapidMiner AI Hub is RapidMiner Go, which Scott mentioned. RapidMiner Go is a lovely automated, web-based machine learning environment where, as an analyst, you can upload your data, get a machine learning model back, and use it. I’m actually going to show you that in Tableau, because as Scott mentioned, one of the big impediments to machine learning projects really paying for themselves is productionizing the output, going that last mile to really get the full benefit of your machine learning models, and of course at the same time building trust and buy-in across the organization.
18:25 So I’m now going to log in to RapidMiner Go, and this is what you see. I’m simply going to run a model based on a financial investment scenario. I’m going to just grab the data from my computer. Essentially, I’m bringing up a series of financial metrics, 11 different ratios, that an investment firm has studied across different econometric factors. The firm is interested in whether applying decision-making based on these ratios led to a positive or a negative outcome. We’ve pre-coded the outcomes, which means this is going to be a classification problem from a data science perspective. We then want to prepare new ratio scenarios, run those scenarios through the model, and get a sense of whether these investment scenarios tend to look favorable or negative.
19:33 So of course, we’re going to tell Go we want to make a new predictive model. Go is going to ask us first, “Well, what do you want to predict?” That’s pretty easy: we want to predict the outcome. What we can now do is define gains and costs, because sure, it’s great to have a lovely model, but what you really want to know is the financial impact of using it. So we’re going to define costs in a matrix. If the model predicts a negative outcome and it’s actually a negative outcome, we leave that alone; there’s no cost, because that’s an area we’re not going to get into. If the model predicts negative but it’s actually positive, we’re losing an opportunity, and on average that’s going to cost us $12,500. If we predict a positive outcome but it’s actually negative, well, even though we’re watching things very closely, that is also going to cost us $12,500, so that’s added as a negative cost. But when the model predicts a positive outcome and it actually is positive, that’s an opportunity we’re going to jump on, and we have a potential there for $40,000, which is considerably more. So we’ve now defined a real cost matrix.
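The cost matrix described above is just a lookup from (predicted, actual) outcomes to dollar values, and the gain of a model is the cost-weighted sum of its confusion counts. A minimal sketch in Python; only the dollar figures come from the scenario, while the confusion counts below are hypothetical:

```python
# Gain of a classifier under the webinar's cost matrix.
# Only the dollar values come from the scenario described above;
# the confusion counts further down are hypothetical.

COSTS = {
    ("negative", "negative"): 0,        # correctly avoided: no cost
    ("negative", "positive"): -12_500,  # missed opportunity
    ("positive", "negative"): -12_500,  # bad investment we pursued
    ("positive", "positive"): 40_000,   # opportunity we jump on
}

def total_gain(confusion_counts):
    """Sum gain over (predicted, actual) -> count pairs."""
    return sum(COSTS[outcome] * n for outcome, n in confusion_counts.items())

# Hypothetical results for 100 scored scenarios:
counts = {
    ("negative", "negative"): 50,
    ("negative", "positive"): 5,
    ("positive", "negative"): 5,
    ("positive", "positive"): 40,
}
print(total_gain(counts))  # 40*40000 - 10*12500 = 1475000
```

This is the number RapidMiner Go sorts by when you rank candidate models by gains rather than raw accuracy.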
20:54 Our next step: RapidMiner Go actually analyzes our data and has warned us that there are a couple of fields we should not include, for various reasons. This is really important in data science projects, because you have to be very, very careful about what data you actually include. Is it robust? Do you have a full distribution of values, and so on? RapidMiner Go has already told us, “Eliminate these values.” We’re going to follow its advice and continue. And we have some real options here for generating models. In the interest of time, we’re going to use models that are classified as very easily interpretable, which means tree-based or Naïve Bayes-type models, as opposed to a support vector machine or deep learning. They’re all available here; I’m just doing this in the interest of time, because some of these other models may take several minutes to run.
21:59 So now we’re using these types of easily interpretable models, and the model analysis is running. We’re using a generalized linear model, a fast large margin model, and a decision tree. And bang, it’s done, just like that. What we’re going to do is sort this list by gains. We’re seeing that the generalized linear model is predicting a good cost gain for us if we use it, so essentially we’re going to use this model, because it’s also rather accurate. There are quite a few other statistics here: the area under the curve is quite good, at 93.8%; recall was quite high; precision a little low; but the accuracy is really great, and the cost matrix is really great.
22:47 So we’re going to select this model. And now we’re seeing some additional information about it: what our accuracy is, what our recall is (very high), and we want to apply this model. From a RapidMiner Go perspective, all I have to do here is click on this, and there’s something I want to show you. We’re going to deploy this model. And this is something really important that you’re going to see in just a minute: RapidMiner assigns this model a unique ID number, which I’m going to highlight (it starts right here) and copy into the clipboard, because this is part of the magic of how you will be able to call this model within Tableau, okay? So I’m going to close this. And now we have a RapidMiner Go model that I can apply to new data. It will walk me through a dialog: where is your source data, et cetera.
23:52 But I’m just going to switch over to Tableau Desktop now, and we’re going to actually apply a RapidMiner Go model. Here are newer observations that have come in after the model was created. Essentially, what we’re going to do now is take a model from RapidMiner Go, created just like the one we spent a couple of minutes stepping through, and apply it to these new observations with all of these different metrics.
24:26 So how does all this work? Well, remember that ID that I copied into the clipboard? It all works via Tableau calculations, and this is where all the great work that RapidMiner and Tableau have done together to coordinate the APIs and the calling and call-back mechanisms comes in. All I have to do is use a special type of calculation you’re all familiar with as Tableau users: a table calculation. I literally create a calculation, which is very similar to using the Analytics Extensions in Tableau to work with Python or R. Basically, I paste in that ID number, I tell the model what data schema to expect, and I feed in the data. And I get back a positive confidence prediction; in other words, a percentage confidence in the prediction.
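For Tableau users who have not worked with analytics extensions before, the contract behind such a table calculation can be sketched as follows. This is an illustrative, TabPy-style mock only, not the actual RapidMiner endpoint: the model ID, the ratio values, and the dummy scoring logic are all assumptions. The key idea is that Tableau sends the script text (where the Go model ID is pasted) plus the column data as `_arg1`, `_arg2`, … lists, and expects exactly one result back per input row:

```python
import json

# A toy stand-in for an analytics-extension /evaluate endpoint, showing
# the contract a Tableau table calculation relies on. Everything here is
# illustrative; it is not the RapidMiner API itself.

def evaluate(request_body: str) -> str:
    payload = json.loads(request_body)
    args = payload["data"]
    # Re-assemble the column-oriented _argN lists into rows.
    rows = list(zip(*(args[k] for k in sorted(args))))
    # A real extension would score each row with the deployed model
    # named in payload["script"]; here we return a dummy confidence.
    confidences = [0.5 for _ in rows]
    return json.dumps(confidences)

request = json.dumps({
    "script": "model-id-pasted-from-rapidminer-go",  # placeholder ID
    "data": {"_arg1": [0.42, 0.17], "_arg2": [1.05, 0.93]},
})
print(json.loads(evaluate(request)))  # one confidence per row: [0.5, 0.5]
```

The one-value-per-row rule is why these calculations behave like table calculations: the whole partition is shipped out, scored, and mapped back row by row.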
25:29 So now we are able to literally use RapidMiner Go’s model sitting up on the cloud. In the interest of saving time, I’m using a model I created recently. Now I’m looking at all this data and saying, “Okay, for the technology sector, the investments we want to make, I want to look at immediate, longer-term, medium-term, and near-term from all different investment divisions. Of the new cases that I want to score, what do they look like?” All I have to do is click the Apply button, the model will swing into action, and bang, there you have the predictions back: likely negative, likely positive. Then I’m filtering just to show me the negatives. But here’s where we leverage all of Tableau’s ability for filtering and sorting. For example, if I also want to look at government securities, all I have to do is click on the filter and I get it. And what’s really great is, if I want to look at just the positives for both of these, I can do that, and I’ll look at positives for everybody as long as I’m at it.
26:42 What I can now do is drill down to another dashboard, which shows me the influencers. This is part of the wonderful information that comes back from RapidMiner Go. I’m looking at all my positive predictions, and I can see that the gain from applying these is a bit more than $420,000. I’m literally able to see that for positive predictions, the attributes really driving the outcome are ratio 10 and ratio 11, and for negative predictions, it’s really ratio 2 and ratio 7. So this is immediately actionable: people can go back, look at the data, and reconstruct it.
27:22 I can now return to this Calling Dashboard (I just need to go into full-screen view), and off I go with another scenario. I can look at basically all the information; I can look, say, at just online and retail; I can look for any combination of attributes that I want. If I just want to see the longer-term, I click here again, and now I see the longer-term. I’m still focusing on positive, but I can come back and say, “Well, I’m doing positive and negative.” So essentially, it’s really that easy. That calculation you saw is really the magic, and it reflects all the hard work done by RapidMiner and Tableau to make it so.
28:11 What I’m next going to do is look at a couple of other use cases that Scott referred to that were built using RapidMiner Studio. The second one is a marketing exercise, clustering customers. In this instance, an insurance company has built a model of its customers and their purchasing habits, with some demographic information. They then got a huge prospect list, and they want to contact prospects whose needs are a good fit for their products and who resemble existing customers. So it’s a very simple marketing exercise. And of course, we have a clustering model from RapidMiner, which has clustered the prospects using the model. But what’s wonderful is we can use all of Tableau’s interactive data features.
29:03 So, for example, we can hover over cluster 8. I’m coloring and sizing by average income, so these are folks with a very high income, averaging $127,009. But what’s really interesting is, if you look at the tooltip, their likelihood to purchase is only 19%. Okay, they’re wealthy, but with a low likelihood to purchase. So let’s color and size by likelihood-to-purchase percentage instead. Bang! The clusters rearrange themselves, and we see these three look really promising, right? We’re just using Tableau’s interactive capabilities. And we see that cluster 1 looks particularly promising: it has basically an 85.9% likelihood to purchase, and we have 1,500 prospects sitting there in a cluster that has almost 1,800 people.
29:56 So, being Tableau, all I have to do is click, and I drill down to that cluster. I can use all of Tableau’s filtering to look at the demographics within the cluster, right? I’m looking at all the folks within that cluster, and I can look at these big fat bars for interesting likely prospects. Here’s a group of men, 40 to 49, who are married and making greater than $50,000 a year. I happen to know that’s a target audience, so let’s just click on that. It says 255 customers, 206 likely to purchase. And here are my 255 customers. I can now filter for likely to purchase, and there are my 206, with their probabilities sorted. So that’s immediately actionable: I can see who all these people are, and I can now contact them. I can take this the last mile and hand it off to the appropriate people who can action this type of browsing. If I want to do another exploration, all I have to do is come back to the dashboard.
31:10 And Tableau just makes it wonderful. A bubble chart is a wonderful way to represent clusters, right? So we have the wonderful information coming from RapidMiner, which has basically done all this classification of who’s in which cluster, and I have Tableau’s wonderful ways of representing the different clusters and the separations between them. This is something that everybody can gather around and discuss. And as Scott mentioned, it’s that interdisciplinary collaboration that makes the Tableau/RapidMiner integration so interesting and attractive.
31:45 I should mention that, for Market Basket Analysis, RapidMiner provides out of the box a whole range of templated processes that you can use, enhance, study, and learn from. What I did is take one of the templates that RapidMiner provides; I must say I added some other things to it that I may get a chance to show you later. But what I really want to highlight is that RapidMiner has developed and delivered some very, very interesting operators, which are discrete units of functionality within RapidMiner. You can see RapidMiner is a very visual environment. And there is one operator called Write Tableau Extract.
32:29 And of course, we Tableau users quite frequently write extracts; extracts greatly increase Tableau performance. So here is an operator that will take a dataset, such as the output of a Market Basket Analysis in terms of frequent item sets or association rules, and write it to an extract on disk. Once it’s written to disk, with another wonderful operator from RapidMiner I can write that extract up to a Tableau Server instance as a data source, making the data instantly available to anyone who wants to connect to it on Tableau Server. So the integration is deep, it’s efficient, and it’s effective.
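To make “frequent item sets and association rules” concrete, here is a toy, standard-library-only sketch of the statistics such a market-basket process computes. The transactions and thresholds are invented for illustration and have nothing to do with the retailer’s actual data; a lift above 1 means two products co-occur more often than chance would predict:

```python
from itertools import combinations

# Toy market-basket statistics: support, confidence, and lift, the
# measures behind the association rules discussed in this demo.
# The transactions below are made up for illustration.

transactions = [
    {"pizza_a", "soda"},
    {"pizza_a", "soda"},
    {"pizza_b", "chips"},
    {"pizza_a", "soda", "chips"},
    {"chips"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_stats(antecedent, consequent):
    """Confidence and lift of the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent))
    conf = both / support(antecedent)
    lift = conf / support(consequent)
    return conf, lift

# Frequent pairs at a 40% minimum support threshold:
items = sorted({i for t in transactions for i in t})
frequent_pairs = [p for p in combinations(items, 2) if support(p) >= 0.4]
print(frequent_pairs)  # [('pizza_a', 'soda')]

conf, lift = rule_stats({"pizza_a"}, {"soda"})
print(round(conf, 2), round(lift, 2))  # 1.0 1.67
```

Rows of exactly this shape (antecedent, consequent, support, confidence, lift) are what the extract operators push out to Tableau for sorting and visualization.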
33:12 So once we use RapidMiner to design a process for market basket analysis, you get to see something like this; one way to display the results is in the form of a scene graph. The business scenario was that a major retailer wanted to know whether promoting certain brands of pizza with certain other food categories, using particular conditions of sale, would really drive lots of incremental sales for all the categories. That was a huge question. And with a scene graph like this, where we’re basically looking at all associations within the same category, we’re seeing that pizza products really interact largely with, guess what, other pizza products. If we look at associations between different food categories, there are many more of them, right? But those main associations between the pizza products are the ones that really count.
34:15 And to make that even more apparent (sorry, wrong click there), because it’s Tableau, we can say, “Okay, let’s take pizza out of the mix as the product that everyone buys first, and let’s take pizza out of the mix as the product that gets recommended by, or associated with, that first product in the market basket.” The thickness of a line denotes how many transactions happen, and it’s really only when you put pizza back in that you notice, “Oh my gosh, this multi-category promotion is really driven by pizza.” This was a very important learning for the retailer. And the good news was that, with everyone able to discuss, view, and query these types of dashboards, they realized they could solve the problem just by adjusting displays: setting up special secondary display areas in proximity to the frozen food aisle where the pizza was. As a result of analyzing this, they were able to run promotions that were much more successful.
35:32 And you may remember just a moment ago I talked about these operators that load data up to Tableau Server. Well, here is what it looks like. If I show you this Tableau sheet and bring in the data pane, what you're seeing is that I have posted these association rules to a Tableau Server data source, and within 30 seconds, you can build a view that looks at these pizza associations and sorts them by the lift of the association, right? So pizza product 264 to 272 has a lift of 1.31, which is a fairly good lift. All I'm trying to say is, through this wonderful integration, anyone who needs access to this data can have it. Because with RapidMiner and Tableau, you get the wonderful outputs of the model, and you get an end-to-end distribution model by using Tableau Desktop and Tableau Server.
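For anyone who wants the arithmetic behind that lift figure, here is a generic sketch of the standard support, confidence, and lift definitions in plain Python. The baskets are invented for illustration, and this is not RapidMiner's implementation.

```python
# Toy market-basket example: compute support, confidence, and lift
# for the rule A -> B from a list of transactions. Generic sketch of
# the standard definitions, not RapidMiner's internal code.

def rule_metrics(transactions, a, b):
    n = len(transactions)
    support_a = sum(1 for t in transactions if a in t) / n
    support_b = sum(1 for t in transactions if b in t) / n
    support_ab = sum(1 for t in transactions if a in t and b in t) / n
    confidence = support_ab / support_a
    lift = confidence / support_b  # >1 means A and B co-occur more than chance
    return support_ab, confidence, lift

baskets = [
    {"pizza_264", "pizza_272"},
    {"pizza_264", "pizza_272", "soda"},
    {"pizza_264", "salad"},
    {"soda", "salad"},
]
support, confidence, lift = rule_metrics(baskets, "pizza_264", "pizza_272")
```

Here the lift works out to about 1.33, in the same neighborhood as the rule Michael highlights: the two products appear together a third more often than independence would predict.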
36:33 Here's another view of the same data, done with a slightly different purpose. What they were very interested in was understanding: are we driving lots of different transactions? And in Tableau, of course, we can do things like size all of these nodes, each of which corresponds to a product. If we size by transaction percentage, we could get the idea that, "Wow, all of these products are really transacting a lot with other products." But if we size by unit sales, you're going to see something very similar to what you saw on the last dashboard. It's really pizza that's driving the action.
37:14 And because it's Tableau, I can come up to this part of the dashboard, zoom in on it, and take a look. I'm now sizing by unit sales. And now I can size by in-degree, which means which product gets the most product referrals from other products. And as I resize this, I see Pizza Brand 2 Product 12. All I have to do is hover over any node in the network and I immediately see the products that are associated with it, right? So I'm just going to go back, and, there we go. So any node that I hover over, I immediately see the products associated with it. And I'm just going to zoom up the screen again.
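In-degree, as Michael uses it here, is just a count over the rule set: how many association rules point to a given product. A dependency-free sketch; the rule pairs below are invented for illustration.

```python
from collections import Counter

# Sketch: "in-degree" as used in the demo, i.e. how many association
# rules point *to* a product. Rules are (antecedent, consequent) pairs;
# the data is made up for illustration.

rules = [
    ("pizza_b1_p8", "pizza_b2_p12"),
    ("pizza_b1_p3", "pizza_b2_p12"),
    ("soda_p1", "pizza_b2_p12"),
    ("pizza_b2_p12", "pizza_b1_p8"),
]

in_degree = Counter(consequent for _, consequent in rules)
top_product, top_count = in_degree.most_common(1)[0]
```

Sizing nodes by this count is what surfaces a product like Pizza Brand 2 Product 12 as the one receiving the most referrals.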
38:04 And this is the beautiful interactivity of Tableau. And Tableau users, you've probably guessed how this is done: with Tableau Actions. So I'm getting all the great metrics from the RapidMiner machine learning model. For example, this is saying that the average product recommendation is worth 12 cents. All the products that this product, Pizza Brand 1 Product 8, recommended brought in an extra $5,500. And the average linked sales for this model was 10%, which means 10% of transactions involving this product involved a second product. So again, we're able to leverage all of Tableau's interactivity in concert with all the metrics that came back from the RapidMiner model.
38:49 Now, of course, network maps are interesting and fun, but they may not be for everyone. For your category manager or sales force, you just want to be able to click on a product and see what products it recommends. So this is another dashboard that was extremely helpful in this use case. You simply click on a product. You see the flavor of that product. You can notice right away, wow, it's recommending other spicy-flavored products. This was another very interesting finding on this particular type of promotion: a flavor tended to recommend a SKU in the same flavor zone. So, again, very interesting to know. And this is a bidirectional association, which means all of these products also tend to recommend Pizza Brand 2 Product 12.
39:39 So, again, very point and click. We're leveraging, once again, the power of Tableau Actions, but we're getting all these great metrics out of the RapidMiner model. And again, this is the type of information that everyone can gather around the campfire, so to speak, and look at and discuss. Because as these things are discussed, and as understanding and consensus are built between domain experts, business people, users, coders, and analysts as to what's really important, that is what helps drive the success of machine learning models.
40:16 So let's look at another use case, a very interesting one: machine failure. Now, this is a RapidMiner process that I built to predict machine failure. And the scenario is a company making coffee machines. They're very successful. It's a new company. The machine is getting broadly accepted. But because it's a new company and the factory is new, they're running into some issues with machines having breakdowns or needing more maintenance than usual. And as the company is getting lots of orders, naturally, they realize they need a second factory, but they want to avoid the problems that happened in the first factory.
40:57 Now, if you look at the RapidMiner process, which is what generates the model, what's really great is that it's a very visual environment. That means the people who build the model and the people who are going to be relying on that model can all gather around and discuss what each of these parts is doing. The developers can explain this and explain why they do it, and that really helps build buy-in. So the model is built here. RapidMiner has a wonderful operator for helping explain why the model behaved the way it did. The model predictions can be written to a database or written to Tableau Server in the way that you saw before. The predictions are then joined to other metadata to write out a more comprehensive view, etc. So RapidMiner is a very visual development environment in the same way that Tableau is a very visual analytic environment. So that's the use case.
41:56 And let's go on to something we haven't quite seen before. We are now ready to look at data that came in from the second factory. What we want to do is feed in that data and get a sense of which machines within a particular class could be vulnerable to failure. So let's pick surface grinders, for example. I'm going to click on that model. The model springs into operation, and bam, there they are. There are quite a few of these machines, and we're seeing that there's some real failure risk.
42:34 What's really cool is that not only do you feed in the data using a calculation similar to the one I showed you a little bit earlier, you can also feed in any number of Tableau parameters that will control how the model operates. Remember, this is Tableau, and RapidMiner can take any number of inputs that control how a model functions, because you can create macros in RapidMiner that capture the value of a Tableau parameter. Let's say I were to raise the ceiling for what is considered a failure from 56% to 66%. Essentially what I'm now doing is running the model again, and now we only have 4 failure risks.
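A minimal sketch of that parameter-to-macro idea: the value a user picks in Tableau travels to the scoring process, where it acts as the cutoff for flagging a failure risk. The `flag_failures` helper, the field names, and the scores below are all illustrative, not the actual web-service contract.

```python
# Sketch: a failure-risk threshold passed in from the caller (as a
# Tableau parameter would arrive via a RapidMiner macro) controls
# which machines get flagged. Data and names are invented.

def flag_failures(predictions, threshold):
    """Return machine ids whose failure confidence meets the threshold."""
    return [p["machine"] for p in predictions if p["confidence"] >= threshold]

scores = [
    {"machine": "grinder-01", "confidence": 0.58},
    {"machine": "grinder-02", "confidence": 0.71},
    {"machine": "grinder-03", "confidence": 0.49},
    {"machine": "grinder-04", "confidence": 0.67},
]

at_56 = flag_failures(scores, 0.56)  # looser cutoff flags more machines
at_66 = flag_failures(scores, 0.66)  # raising the ceiling shrinks the list
```

Raising the cutoff from 56% to 66% is all it takes to go from three flagged machines to two in this toy data, which is exactly the behavior Michael demonstrates with the dashboard parameter.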
43:20 By the way, I should mention – I'm just going to jump out of full-screen mode – part of that RapidMiner web service that I deployed on AI Hub (that's where the predictive model lives) has actually now written the predictions I just generated out to a database in real time, and you're looking at them right here. So the web service not only generates the predictions and sends them back to Tableau, it's also writing them out to a database. I think that's really impressive. It makes the data even more shareable amongst the people who need it.
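That dual-output pattern, one call that both returns predictions to the caller and persists them for other consumers, can be sketched with standard-library pieces. Here sqlite3 stands in for whatever database the AI Hub process actually writes to, and the one-line scoring rule is a toy, not the real model.

```python
import sqlite3

# Sketch of the dual-output pattern: the scoring call returns predictions
# to the caller (Tableau) AND writes the same rows to a database so other
# consumers can read them. sqlite3 stands in for the real database.

def score_and_persist(readings, conn):
    predictions = []
    for machine, value in readings:
        confidence = min(value / 100.0, 1.0)  # toy stand-in for the model
        predictions.append((machine, confidence))
    conn.executemany(
        "INSERT INTO predictions (machine, confidence) VALUES (?, ?)",
        predictions,
    )
    conn.commit()
    return predictions  # this is what goes back to the dashboard

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE predictions (machine TEXT, confidence REAL)")
returned = score_and_persist([("grinder-01", 58), ("grinder-02", 71)], conn)
stored = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
```

The point is the shape of the call, not the storage engine: the same rows flow back to the dashboard and land in a table anyone else can query.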
43:57 So what I can now do is click on any sensor here, and I'm taken to another dashboard where I'm able to look at all of these predictions. And I'm now able to say, "Show me just the predictions where machine failure is really possible." And now I'm seeing what attributes are really driving it. Actually, this is where there's not a risk of machine failure; I misclicked there. But we see that air circulation, rotation, and lower values of internal temperature are really the determining factors as to whether there's a failure risk. If I look at machines where there was a failure risk, it's pretty uniformly vibration, internal pressure, humidity, and resistance, with some power drain issues.
44:48 And now I can go back to this dashboard and say, "Well, let's look at center lathes." I click on the model and now I see all my center lathes. I'm looking at a 66% risk threshold. But of course, I can dial that back using a Tableau parameter. Now I'm going to see any of these machines that have a failure risk of 56% or greater, and now we have more machines flagged. And as I mentioned before, I developed the process in RapidMiner Studio, and then the model is deployed to RapidMiner AI Hub, which, if you'll remember, is the rough equivalent of Tableau Server.
45:38 And essentially, through a web service call using the same type of calculation I showed you a few moments ago, I am getting live predictions right back to the Tableau dashboard. Everybody can gather around what I would call a digital campfire and look at this together. They can discuss the probabilities. They can discuss the attributes that seem to be driving failure. And I should mention you're going to see a quick example of this on Tableau Server as well as Tableau Desktop. That means anywhere in the world, anyone with a Tableau Server account can see these same predictions and join the discussion.
46:16 What's also really cool is we can do something else. We can use Tableau parameters to actually experiment with our data. For example, we saw that for the surface grinders, once I raised the threshold, fewer machines were flagged. There were 15 machines that had a failure risk initially. But what I was able to do is use parameters to throttle some of these readings and do a what-if analysis. If we want to help a particular at-risk machine not fail, what if we were to dial back some of the readings? How would that affect the model?
47:00 So, for example, take internal pressure, which is now set at 50% of the input value. If I put it back to its full value, watch this machine right here: we now get many more failure risks. If I dial it back to 50%, we see the effect of throttling just this 1 sensor. So if we could reduce the internal pressure amongst the components of these machines, we would take several machines out of the risk zone that they initially were in. So, again, this is something everybody can gather around and discuss.
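The what-if experiment boils down to scaling one input before re-scoring. A sketch under stated assumptions: the linear score function below is a stand-in for the deployed RapidMiner model, and the sensor values are invented.

```python
# Sketch of the what-if experiment: scale one sensor input by a
# parameter (as a Tableau parameter would) and re-score each row.
# The linear "model" is a toy stand-in for the deployed model.

def score(row):
    # Toy risk score: pretend internal pressure and vibration drive failure.
    return 0.004 * row["internal_pressure"] + 0.3 * row["vibration"]

def what_if(rows, field, scale):
    adjusted = [{**r, field: r[field] * scale} for r in rows]
    return [score(r) for r in adjusted]

machines = [
    {"internal_pressure": 120, "vibration": 0.9},
    {"internal_pressure": 150, "vibration": 0.5},
]

full = what_if(machines, "internal_pressure", 1.0)
halved = what_if(machines, "internal_pressure", 0.5)  # dial pressure to 50%
```

Because only the input is perturbed, the same deployed model answers both the "as measured" and the "what if we fixed it" questions, which is what makes the dashboard safe to hand to non-modelers.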
47:48 This is the type of collaboration that Scott was referring to earlier, where, essentially, you are able to get the best of these two mature, best-in-class platforms. And it's really more than a coincidence that at more or less the same time Christian Chabot, Jock Mackinlay, and Chris Stolte were creating Tableau at Stanford, Ralf Klinkenberg and Ingo Mierswa were creating RapidMiner at the University of Dortmund. And now these two platforms, which have each disrupted their respective business sectors, are fully capable and fully mature, and the time is really now for both of them to be used together.
48:32 For example, in this machine failure use case, here's another way you can map out the factory floor. These are the predictions that were just generated for the new factory, and I've put them within a dashboard. We're literally mapping out a factory floor. If I wanted to look at the center lathes, for example, I just click on that and I see where the lathes are in the factory. I can hover over any one of them and see what was driving the prediction. All green means the machine is likely not to fail, and I can see it's those high readings of air circulation and lower readings of outside temperature. I can look at a machine nearby and see, "Oh my God. Vibration is very high. Humidity is rather high. Internal pressure is rather high."
49:18 So when you get these types of predictions back from the model, and when you can literally see the sensor readings the way you're able to see them here, you can operationalize and act. So I can look at this dashboard, and the RapidMiner model is telling us this is a high index, 65, a key attribute in determining failure risk for this machine. And for air circulation, look at all these greens for the higher values of air circulation. That kind of corroborates what the model was telling us: these are machines that do not have a failure risk. Whereas for outside temperature, we see all these other machines that have higher readings.
50:01 Another way of looking at it is just to scroll down to this dashboard, where we see all the machines with these higher air circulation readings and higher outside temperature readings. You can see these have a failure risk, and these are judged not to have a failure risk. So, again, the wonderful interactivity of Tableau, combined with the wonderful outputs you get from RapidMiner, is a real winning combination. These two best-in-class platforms are truly better together. And, as Scott said, it's going the extra mile. It's getting the data to where it needs to be.
50:39 So you can leverage Tableau Server's scheduling and alerting capability. If you're a shop foreman with a smartwatch, in this case an Apple Watch, you can push predictions, right as they're run, out to your watch, phone, tablet, or, of course, as we've seen before, your desktop. So by leveraging Tableau Server content subscriptions, your predictive models, the outputs, the analyses can all be messaged to anyone who needs to see them, no matter what type of device they are using. So, again, another wonderful argument for taking these two best-in-class platforms and really yoking them together. It's a really strong argument.
51:23 I'm going to spend just a few minutes before I close to reinforce one other point. Here are the data sources that I published to Tableau Server for the market basket analysis. Everything that you saw in Tableau Desktop also works on Tableau Server. So I'm now going to take the first model that I built in RapidMiner Go. I am now calling that RapidMiner Go model – it's resyncing my content. Okay. So I'm going to take these two, and now it's calling Go, and I'm getting predictions back. What I can also do is leverage Tableau Server's comments pane. As you work with the content, you can literally write comments, and everyone else who has the right to see those comments will be able to leverage them. And we can see that the longer-term investments actually tend to scale better; even with the negatives, we have a $129,000 projected gain. So we're leveraging the server platform, which means, anywhere in the world you are, as long as you have a login, you can see this information and you can have a drill-down dashboard. It's just as possible on Server as it is in Desktop.
52:50 And it's the same thing here with the clustering model that I showed you. This is the same model now working; it's just refreshing my content since I've been away for a while. But it's going to work just the same way on Tableau Server as it does on Tableau Desktop. I can query the models. I can filter. I can do everything that I'm used to doing in Tableau Server, except I also have this great component of being able to apply a predictive model. Okay. So it's taking a little time to refresh. There we go. We're reloading content. There.
53:25 So I can go to any of these use cases that I showed you. Here's the cluster drill-down that you saw. And here's the first scene graph that was built with RapidMiner. This is all working on the web. It looks beautiful, just like it does in Desktop. All the filtering is possible. Everything is possible. I get my tooltips. Same thing on Server as on Desktop. This one, as well: I can do all the same interactivity within Server as well as Desktop. I'm now resizing the nodes. I can hover over any node and see the product associations connected to it. I can do the same thing that I did in Desktop with this dashboard, which is maybe suited for a different type of user who simply wants to click on a product and see the associations with it. So I can click on a garlic-flavored pizza SKU and now Tableau runs an action which brings up just the associated products. And here's a chicken SKU. Remember, it's all the same features. The interactivity is all there.
54:38 Same thing with generating machine learning predictions. We did it in Tableau Desktop, but it can be done just as easily in Tableau Server. So if I wanted to look at a hand roller press and a drill press, same thing. It operates across Tableau Server and Tableau Desktop, and you can leverage all the interactivity of Tableau. It just works really well. Same thing with explaining predictions, where you can look at the predictions we just generated and filter for any type of machine, such as a surface grinder. If I want to see just the failure risk predictions, it's going to show me the factors that are contributing to them. All live on the web. Same thing with the simulator we looked at together a moment ago, where I can adjust the values of various parameters and the model will recalculate. It all works on the web. It all works in Desktop. It all works whatever your device is, as long, of course, as you design the dashboard appropriately.
55:51 So, to close: these two fabulous, best-in-class platforms really are better together. They're mature. They're fully capable. I've loved using both of them for many years, and I have tremendous respect for both organizations. The time is really right to yoke these great platforms together. The number of use cases you can address is large, and the opportunity for creativity is practically boundless. So with that, I'm going to hand it back to Scott. Thank you very much for listening, and best of luck with your Tableau and RapidMiner integration work. Thank you very much.
56:31 Thank you, Michael. That was great. I particularly liked the Apple Watch bit. That was fantastic. So we're just going to wrap up quickly here. I want to say that Michael showed RapidMiner Go, which is a great, easy way to build machine learning models, and also, through Studio, how the visual framework helps with explainability, which is huge for collaboration. Our clients tell us they prefer to work in Python and R, but in collaborative group meetings they tend to show the RapidMiner processes, because if they opened up a notebook and started showing code, people would jump out the window. So you can see how that visual component can really help with collaboration. I'm going to wrap up by tossing it over to Nathan from Tableau to give a little perspective on the integration and the partnership, why it's important to Tableau, and their perspective on data science and machine learning.
57:34 Yeah. No, thank you, Scott. And thanks, Michael. That was a great demonstration. I think it really shows off the entire end-to-end experience, and showing that it can be done so quickly and flexibly is a really big part of the overall value prop, and a critical, all too often overlooked part of the data science workflow. As we talked about at the beginning, we so often see in the field really good models that never make it that last mile into the hands of the business, into a place where they can actually change decisions or really drive an outcome. And that means all of the effort along the way, all of the time, all of the energy and creativity, all of the technology, was wasted, because it didn't achieve its purpose of actually driving a business decision or changing an outcome.
58:29 And so when you can take that data science work and the power and flexibility that people are able to put into RapidMiner, whether they’re doing clicks or whether they’re doing code, and then you can plug that into a highway that will deliver that to where the business is already comfortable, where people are already working, consuming and interacting with information, you create this really nice synergistic effect where you can perform advanced analytics and then get the results quickly to the business. And then that means that there’s also this ability for the people who are on the consumption end to start to have more of a voice in that process as well because they’re able to touch and see and interact with the outputs of a predictive model and even start to understand what drives the model and what is driving those predictions. And that means that they can start to bring more of their voice into the conversation. They’re not just consuming results in a PowerPoint presentation that’s static and a one-time thing. They can sort of do it on their terms. And this really starts to amplify the value of the time and energy spent by data science teams and predictive analytics, and then also just the strength and competence of the business in being able to be a part of that process and really exercising their rights and roles as a stakeholder in the overall data science workflow.
59:50 So it's a really powerful combination, and something we at Tableau are really excited to keep developing: making sure we are a platform that people can use to communicate and collaborate around data, whether that's working with data sources generated by advanced analysis technologies, or working with interactive visualizations powered by those tools. And we see this integration with RapidMiner as a perfect example of getting the best of both worlds: really powerful, flexible data science, and the ability to make it business-relevant, quickly and flexibly. So I'm really excited to see everything we've done here and excited to be a part of this conversation.
01:00:31 Yeah. We're excited to continue working with Tableau on this integration, and a lot of our clients are really excited about it too. I won't go through this in detail; we're a little short on time. But we've got multiple examples of use cases that have been deployed in the real world by joint Tableau and RapidMiner customers. I talked at the beginning of the webinar about how one of the biggest impediments to utilizing machine learning effectively is often just getting started, and one of the hardest parts about getting started is identifying the most appropriate use cases. We've actually pulled together an asset called 50 Ways to Impact Your Business with AI, which has 50 real-world scenarios of machine learning being deployed and delivering large, measurable impacts to the business. And we've isolated three here that revolve around RapidMiner plus Tableau. The first is an electronic trading platform that is automating real-time reporting for high-frequency transactions.
01:01:36 This is my favorite: Shorelight Education. It's a fascinating case study in the use of predictive analytics to identify risk and then better allocate resources in your workflows to help address it. And lastly, in line with the last use case Michael showed, avoiding operational shutdowns with predictive maintenance: preventing catastrophic machine failure in a shop floor environment, which is by and large the most popular use case for machine learning in manufacturing. I won't go into these in detail, obviously, but if you're interested, I encourage you to go to the RapidMiner website and check out 50 Ways to Impact Your Business with AI.
01:02:21 We do have a couple of questions that I want to briefly get to. If you'd like to learn more about this integration, we have established a landing page for you with a collection of resources: rapidminer.com/tableau. Pretty straightforward and easy to remember. I also want to encourage you, if you need assistance with analytics, to engage with Michael. You can see how impressive and thoughtful he is, a lot of RapidMiner clients really love working with him, and he has extensive experience with both platforms. So I will address the first question, which is: when is this integration available? And I'll quickly and easily say: right now. The integration is available right now. You just need both platforms to make it work. The second question, just to confirm (I think Michael addressed this one, but it's worth reiterating): this capability is available on Tableau Desktop and Tableau Server as well, correct?
01:03:22 Right. Yeah. Not a whole lot to say there either. Michael, I think you showcased a lot of that extensively towards the end of your demos; maybe that question came in before that. But it's certainly worth reiterating. This next one is maybe one we can direct to Nathan: are analytics extensions available through Tableau Online?
01:03:46 Yep. That's something that we're really excited about, a capability we rolled out relatively recently, at the end of 2020 and early 2021: analytics extensions in Tableau Online. So this type of integration that you're seeing with RapidMiner here, where the visualization is dynamic and interactive, is available in Tableau Online. That means across all of our products, Desktop, Tableau Server, and Tableau Online, you can use the types of integration that Michael showed off here today.
01:04:12 Great. And this last question is really about the RapidMiner platform, so I can handle this one; it's pretty straightforward as well. We've got some more questions, and if we didn't get to yours, please feel free to send it to firstname.lastname@example.org, and we're happy to answer it directly. So this last question is around Python and R. It says: we prefer to do a lot of our modeling in R and Python; is there a way we can do that through the RapidMiner platform? And I will say that we've got multiple ways to integrate Python and R. We have a full-blown, governed coding environment through RapidMiner Notebooks, which I talked about earlier in the webinar, but we also have a quick and easy way to embed Python into your workflows.
01:04:52 Michael showed a lot of these analytical workflows that you can develop with RapidMiner, and you can essentially plug in what we call a Python Operator, which is custom code that you've created in Python. You can allow other users to access that operator and reuse it, which is really nice when you're working in a multi-persona environment. So there are a lot of different ways you can work Python and R into your RapidMiner modeling and model management processes. I think we're just about out of time, so with that, I will wrap up. Once again, a huge thank you to Michael and Nathan for joining us today and co-presenting on the webinar, and thanks to everyone for joining us for today's session.