Better Together: RapidMiner and Tableau

Michael Martin, Information Arts

This session will provide a demonstration of the RapidMiner and Tableau Integration component developed by Bhupendra Patil of RapidMiner for classification and association mining.

00:03 [music] [applause] Thank you for the wonderful welcome. It’s really great to be here with you. And what I’m going to say is also going to be somewhat influenced and colored by some of the wonderful comments we’ve already heard this morning from Ingo and from Michael and the other Michael, too, about our world of data science, data analytics, and business value. No one can really go it alone anymore. Individual heroics can be wonderful, and we all love to be heroes when we can, but is it really sustainable for our businesses, for the people that depend on us, and of course, even for our families and the ones we love?

00:41 So this is my one PowerPoint slide. This is from a very well-known Gartner study, which basically says, yeah, data science projects are not really delivering the hoped-for value because of deployment issues, because of models not really being vetted, because of models being airdropped on people at the last minute. There’s maybe a project that’s been going for a year or two. There’s rumbles around the boardroom and amongst the line of business workers that something is coming. But then something gets dropped on them, without particularly much preparation or dialogue or vetting, and then big problems happen.

01:22 So basically, the point I’m going to discuss is, okay, well, given the ubiquity and the large uses of the Tableau platform, what opportunities are there to build what I call sort of a campfire that people can gather around and collaborate with on the development of predictive models, on the accompanying reporting and analytics that would go with the usage of predictive models? And so Tableau being so widely deployed and ubiquitous is a great way to consider enhancing the collaborative experience, bringing in domain knowledge from business users, and informing the development process.

02:08 So a couple of examples. I was asked a little while ago to test a hypothesis. One big client of mine in consumer-packaged goods purchased another big company that made pizza. They bought the entire pizza catalog. And this company wanted to understand, “Gee, now since we own this pizza line and since we’re active in so many other product categories, if we were to really kind of play around with our promotion strategy, could we really drive a lot of extra volume by co-promoting our new pizza products together with all the products in our other categories?” So this was a very interesting type of project. They weren’t ipso facto asking for a model right away, but they were wanting to test this very big hypothesis, which really had implications.

03:01 So this is a RapidMiner process, literally, that builds an association model. And here we go. What I want to highlight is that RapidMiner, in its ability to play really well with all sorts of different products and platforms, RapidMiner’s introduced a couple of new wonderful operators. And one of them here on the lower right-hand side is write data to Tableau data source. And what that means is you’re writing data as a Tableau data source on a Tableau Server, which is Tableau in a browser platform. It’s huge. It’s very popular. So now, right out of a RapidMiner process, you can write data to Tableau Server.
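For readers who want to see what an association model computes, here is a minimal Python sketch. It is not the RapidMiner process shown in the talk; the baskets and product names are invented for illustration, and the thresholds are arbitrary. It derives single-item rules and reports support, confidence, and lift for each.

```python
from itertools import permutations

# Hypothetical baskets; product names are invented for illustration.
transactions = [
    {"garlic_pizza", "cheese_pizza"},
    {"garlic_pizza", "cheese_pizza", "ice_cream"},
    {"garlic_pizza", "ice_cream"},
    {"cheese_pizza"},
    {"ice_cream", "soda"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def pair_rules(baskets, min_support=0.2, min_confidence=0.5):
    """Return rules A -> B (single items) that clear both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b}, baskets)
        s_a = support({a}, baskets)
        if s_ab < min_support or s_a == 0:
            continue
        confidence = s_ab / s_a
        if confidence >= min_confidence:
            rules.append({
                "premise": a,
                "conclusion": b,
                "support": s_ab,
                "confidence": confidence,
                "lift": confidence / support({b}, baskets),
            })
    return rules

rules = pair_rules(transactions)
for rule in rules:
    print(f'{rule["premise"]} -> {rule["conclusion"]}: '
          f'support={rule["support"]:.2f}, confidence={rule["confidence"]:.2f}, '
          f'lift={rule["lift"]:.2f}')
```

A table of rules like this, with one row per premise and conclusion pair, is the kind of output that could be written to a Tableau data source and drawn as a network graph.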

03:52 So if we look here– I guess I just have to go over here. If we look here, this is my client-facing Tableau Server website. And right here, we see right out of this association rules process. When that process is run, it writes these data sets to Tableau Server. And essentially within RapidMiner now, we have connections which can exist within our repositories. So we now have the capability to create a connection within RapidMiner Studio to a Tableau Server instance, that’s either within your company or one that you maintain, and you are now able to literally host any data out of a RapidMiner process on Tableau Server that your users, if you’re using Tableau, can then connect to.

04:49 And the reason why this is interesting and important is, is that in every phase of your data science project, right from the beginning, even when you’re not even 100% sure about requirements, right when you’re getting your first team together, you’re able to demonstrate an end-to-end capacity to be able to develop a model. It may not in any way be what you’re going to end up with, but you’re able to show the business that we can develop a model, we can bring in our data, we can post the outputs of what the model does so that we’re immediately able with our own infrastructure or to anywhere in the world connect to these model outputs and discuss them together as business users, as coders and developers, as management. So that right away, we are showing that we can deliver.

05:44 And once we’ve run a process like this, and once we come into Tableau, essentially, I was able to show them something. “Here are all the product associations that that model output as a network graph. And we’re sizing these bubbles by transaction percentage.” And when they saw this, initially, they were very excited. They were thinking, “My gosh. We’re driving all sorts of transactions. This is exactly what we wanted. We have pizza. We have ice cream. We have all these other categories.” I’ve had to anonymize the data a bit. But at first, they were very excited. And then I was able to say, “Well, that’s okay. There’s lots of transactions here.”

06:28 But there’s only two products involved. And if you were to size these nodes by unit sales, you’d see, “Uh-oh. The real action is in pizza.” And because in Tableau, you’re able to come in here and just zoom in, they now immediately saw that, “Well, you were driving transactions, but they were largely within the pizza category.” So they learned right away, and I was able to help them the next time this happened, we realized that products were recommending themselves within the same flavor group. So this was a garlic-accented pizza. It was recommending other garlic pizza SKUs. Here’s one. A different product. Again, garlic representing in the same product group.

07:29 So here’s the full season. Tableau allowed us to aggregate statistics about transactions. And then just by hovering over any node in the network, see which products were recommending it and which products it was recommending. And what was key is the manufacturer brought in elements of the supply chain from the retailer, brought in store managers, brought in category managers, and we workshopped this for about four or five days in different categories. And this was possible because RapidMiner was delivering these wonderful insights and metadata on the RapidMiner side, and then we were able to visualize this in a very visual, immediately understandable way.

08:18 And then we were able to have the discussion. Because after all, we had to have widespread agreement. And there was some arguing. And what came out was the next step of this project had to do with optimizing product display when you promote. Secondary display. And for some people, this type of network analysis is interesting. But then we developed a series of visualizations that were very simple to use. So, for example, pick on any given pizza product. You see these associations on the lower half of the dashboard. This is a very simple thing that a category manager or sales rep could use to understand these associations. So right from the beginning when we were doing this, we were thinking, “What are all the roles that come into play in this type of project? And then what is the appropriate reporting deliverable or analytical deliverable for each role?”

09:20 And so no one can really do it alone. The end result, of course, is you want business value. But what was useful – because the manufacturers and the retailer involved were all using Tableau – we decided, “Well, let’s yoke up a great data science platform like RapidMiner and let’s yoke that up to the de facto standard that was accepted by these companies for reporting and analysis and collaboration.” And what really interests me is building that campfire, you could call it a digital campfire, that everyone can sit around from their own perspective and discuss and share what they’re seeing. And that’s really very much what Ingo and Michael were talking about this morning as well. This is just in a more specific, targeted way. So this is all very well and interesting. Okay. This is one use case.

10:17 There’s another use case I’d like to talk about more from the world of industry, and that has been alluded to, and that’s machine failure. So a use case, obviously from the world of IoT, is that sure, wouldn’t it be great if we could intervene and understand what machines are prone to fail? And wouldn’t it be good as part of the model development process if we could interactively, in real-time, experiment with models, sit around that campfire together, and then actually see live predictions? So, for example, now, you can, thanks to another brand-new operator in RapidMiner, which we will discuss– and is Bhupendra Patil in the room? Bhupendra, are you here? Great. This man in the back deserves a huge round of applause. [applause] Because Bhupendra, within RapidMiner, has been wonderful in terms of super creative ideas and the technical know-how for that first operator we saw a moment ago, which allows you to write out data from any RapidMiner process. Think about it: we know what we can do in RapidMiner, but to put that right out there as a Tableau data source that people can instantaneously connect to.

11:41 So, for example, here is data that we have taken in. This is new data from a factory that has just gone into operation. A model has been built, which modeled machine failure from an earlier factory. And imagine that developers, imagine business people, imagine the different stakeholders of this project are in a room together. And they say, “Well, we want to now see this model you just deployed 10 minutes ago. We want to see now how it operates at various threshold risks.” So I’m going to say, I’m going to start, actually, by saying, “Well, show me these types of lathes.” This is data just loaded into Tableau. And what’s going to happen is, when I click on apply, Tableau is going to send the data that’s in the view out to a RapidMiner server, where that model is going to be called as a web service. It’s going to score the data, and it is going to stream the results right back live into your Tableau dashboard. So I click on apply, and there it is. It has literally now come back live and real-time.

13:06 And what’s really interesting is part of what that web service does is it writes these predictions out to a database. And there they are. Timestamped with today’s date, today’s time. And now I want to go back and say, “Okay. Let’s run that again, but let’s set the failure threshold at this point to say 66%.” So now we’re able to, from Tableau, pass a parameter to a RapidMiner server hosting a web service, and I just hit enter. And it’s going to go out and bam, we now see everyone at 66% or higher is now flagged as a failure risk. The business can talk about this, and they can say, “Well, why? Why did this happen?” By the way, I can go back to my database. I can run this again. And there are the predictions that just came in, live.
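The threshold-and-persist behavior described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RapidMiner web service: the machine IDs, confidence values, table name, and column names are all hypothetical, and an in-memory SQLite database stands in for whatever database the service writes to.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical scored rows: (machine_id, failure confidence from a model).
scored = [("lathe-01", 0.91), ("lathe-02", 0.66), ("lathe-03", 0.40)]

def flag_failures(rows, threshold):
    """Label a machine 'yes' when its failure confidence meets the threshold."""
    return [(machine, conf, "yes" if conf >= threshold else "no")
            for machine, conf in rows]

def persist(conn, flagged, threshold):
    """Store each prediction with the threshold used and a UTC timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "machine_id TEXT, confidence REAL, failure_risk TEXT, "
        "threshold REAL, scored_at TEXT)"
    )
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
        [(machine, conf, risk, threshold, now) for machine, conf, risk in flagged],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the real predictions database
flagged = flag_failures(scored, threshold=0.66)
persist(conn, flagged, threshold=0.66)

at_risk = [row[0] for row in conn.execute(
    "SELECT machine_id FROM predictions "
    "WHERE failure_risk = 'yes' ORDER BY machine_id"
)]
print(at_risk)
```

Storing the threshold alongside each prediction means a later review can tell which setting produced which flag when the same machines are scored more than once.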

14:10 So the RapidMiner web service is also persisting these predictions, which can then be, through another process, written to Tableau. Depending on how much time I have left, I can run that. That will take these predictions out of this table and write them to a Tableau Server data source, so that more people within the organization, should they desire, are able to directly connect to this data to have that campfire conversation. “Well, what does this mean?” “Well, there’s more than one way to do this.” So in Tableau, Michael mentioned a few minutes ago explaining predictions. I have to log into my database.


15:04 So now after this refreshes, I can now pick any machine type that I want and I can say, “Well, here are the model predictions. Show me where there is a failure risk assessment of yes.” And immediately because I’m using output from the RapidMiner processes, which allow me to explain the predictions, which can be customized, I’m essentially seeing that what’s really driving the attribute importance and then driving these risk assessments from the model are vibration, internal pressure, resistance, and humidity. The business can have a discussion about that. In other Tableau workbooks for this particular project, I was able to map the factory floor on a map in Tableau and you could literally see the locations of the machines. And then for cases where there was not a perceived failure risk, well, why? Mostly air circulation, rotation, and low readings of exhaust. So being able to do this around sort of a shared environment, a collaboration environment, a campfire, for lack of a better term, it drives that type of discussion, collaboration that is needed.

16:27 And why should it stop just here? Wherever the field of action is, why not put your predictions right on a smartwatch? Because with Tableau, Tableau Server has a mechanism called subscriptions. So essentially what you’re able to do, what I did in this instance, is I have the predictions. I have the model. I can take a Tableau workbook, which is custom formatted just for a smartwatch, for example, or a tablet, or a phone. Wherever the field of action is for your use case. Or it could be a desktop. But you can bet that a shop steward, if he or she needs to know something right away, they’re not necessarily going to be able to run back to a desktop and boot up. They’re on the factory floor. So seeing it on a watch, or a phone or a tablet is obviously recommended.

17:21 Now, I want to be mindful of time. So let’s look a little bit about, well, how does this happen? How does this work? Well, essentially within RapidMiner – within RapidMiner Server, of course – we have our deployments directory. And within the RapidMiner deployments directory, there’s a special extension that RapidMiner’s going to be releasing very soon. And then with RapidMiner Server, once RapidMiner Server starts up, it deploys that extension. And then literally within RapidMiner Server– I should start with RapidMiner Studio for a moment. If we connect to our repository, we can now see that I have deployed some processes here in RapidMiner Studio. Sorry about that. I just closed it. There we go. And then what I can do is, within RapidMiner Server, here is the process that essentially generated these predictions.

18:37 And if I were to open this for a moment, here’s the process. In RapidMiner Server, I can turn that into a web service in three or four clicks. That’s existing functionality. Then, within RapidMiner Server, I have to do a couple of things. And this is all going to be documented. Here we go. Here’s my RapidMiner Server. I’ll log in. And we can see here, here are the web services. And within RapidMiner Server, what you need to do is create an anonymous user account. This anonymous user account will not be a member of any groups. But you’re going to be signing in from Tableau as an anonymous user. Then, within your system settings, there are two new properties that you need to add: com RapidMiner analytics web anonymous resources set to true and com RapidMiner analytics web anonymous services set to true. And that gives the anonymous user account that you just saw permission to access web services dynamically.

19:49 Last but not least, of course, in RapidMiner Server, what you need to do is set permissions. So you do that within RapidMiner Studio. For this particular model, for example, you can set the access rights. And what you just want to make sure is that, anonymous can never write, but anonymous can read and execute this model. So this will all be documented. I actually helped write the documentation in concert with BP. And essentially you set up your permissions, you deploy the component, and bang, in Tableau, you connect to RapidMiner as an external service. And I suppose I should bounce back to this just for a minute to show that really all that’s what’s happening is here, look at this calculation at the bottom of the screen.

20:48 This is a calculation that you write within Tableau. And what we’re doing basically is we’re leveraging that channel to TabPy for calling Python, and then there’s another channel over port 6311 for R. And so what BP and team have done, with all their wizardry, is that we are using a very similar syntax, where I’m calling a RapidMiner Query, I’m calling a model called Predict New Factory Machine Failure Number 3A, and I’m passing through an argument, argument one, and I’m requesting a field back called failure. And my parameter is called Machine Failure Prediction Threshold. That parameter is simply a Tableau parameter that I am passing into the function.
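To make the round trip concrete, here is a local Python simulation of the contract such a calculation relies on: Tableau's external-service functions hand the fields to the service as positional arguments (`_arg1`, `_arg2`, and so on) and expect one value back per input row. Everything else here is an assumption for illustration only: the request shape, the `SERVICES` registry, the `handle_request` function, and the stand-in model are hypothetical and do not describe the actual wire format of the RapidMiner component.

```python
import json

def predict_failure(confidences, threshold):
    """Stand-in for the deployed model: one 'yes'/'no' per input row."""
    return ["yes" if c >= threshold else "no" for c in confidences]

# Hypothetical registry mapping a service name to a scoring function.
SERVICES = {"Predict New Factory Machine Failure Number 3A": predict_failure}

def handle_request(payload):
    """Simulate the server side: look up the service, score, reply as JSON."""
    service = SERVICES[payload["service"]]
    rows = payload["data"]["_arg1"]
    result = service(rows, payload["parameters"]["threshold"])
    if len(result) != len(rows):
        raise ValueError("expected one result per input row")
    return json.dumps(result)

# Hypothetical request: the column in the view plus the threshold parameter.
payload = {
    "service": "Predict New Factory Machine Failure Number 3A",
    "data": {"_arg1": [0.91, 0.66, 0.40]},
    "parameters": {"threshold": 0.66},
}
failure_column = json.loads(handle_request(payload))
print(failure_column)
```

Changing the `threshold` value and calling `handle_request` again mirrors what happens when the Tableau parameter is edited and the dashboard refreshes.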

21:40 So when you think about it, you could have in one dashboard several different web services being called with several different parameters. Depending on how you design your RapidMiner process, the value of this parameter is basically being captured in a RapidMiner Server macro. So if you design your process to have all of these placeholders for your incoming parameters, you as a developer, in order to explain different scenarios to your business users and line of business users and management, you’re able to say, “Okay. If we do this, we get this result. If we do this, we get that result.” So it’s a wonderful way to drive the conversation. And driving the conversation is what it’s all about. Because we can’t do it all alone. Particularly, when we’re in large enterprises. Particularly, when there’s lots of risk.

22:32 And so I’m going to close here. There’s just a few minutes left. BP, will you come up here, please? This man. [applause] And we’d love to take questions. BP can speak much better than I can in terms of the ongoing future development of this component. But this component is going to be out there very soon. There’s going to be content on the RapidMiner website. I will be involved. I would love to speak with any of you who are interested in this type of synergy and collaboration between these two great platforms.

23:08 And what’s really great is, about 15 years ago, two groups of renegades in two different universities were getting ready to shake up the world. RapidMiner at the University of Dortmund, Tableau at Stanford. And now these two platforms are mature. They have grown mutually strong, somewhat apart. But now is the time, as one answer to addressing the model deployment dilemma and epidemic, to yoke together these two very mature platforms. Which started off developing rather independently, but now, through their maturity, have a great deal of synergistic and creative opportunity for using them. Thank you very much. [applause] [music]