Hello everyone, and thank you for joining us for today’s webinar, Integrating Business Intelligence and Data Science. I’m Hayley Matusow with RapidMiner, and I’ll be your moderator for today’s session. We’re joined today by Vijay Kotu. Mr. Kotu is co-author of Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner. He’s been a member of the advisory board of RapidMiner, Inc. since August of 2016 and has practiced analytics for over a decade with focus on predictive analytics, business intelligence, data mining, web analytics, and developing analytical teams. We’ll get started in just a few minutes, but first, a few housekeeping items for those on the line. Today’s webinar is being recorded, and you’ll receive a link to the on-demand version via email within one to two business days. You’re free to share that link with colleagues who are not able to attend today’s live session. Second, if you have any trouble with audio or video today, your best bet is to try logging out and logging back in, which should resolve the issue in most cases. Finally, we’ll have a question and answer session at the end of today’s presentation. Please feel free to ask questions at any time via the questions panel on the right-hand side of your screen. We’ll leave time at the end to get to everyone’s questions. I’ll now go ahead and pass it over to Vijay.
Thank you, Hayley. One second. Perfect. Hello everyone, thank you for joining me for today’s discussion on integrating business intelligence with data science. I was fortunate to be involved in both of these fields, and I’m very excited to talk about how to bring them together. These two fields are perhaps the biggest manifestation of analytics in any company, and they form the foundation for any data-science-related journey that you want to take. Let me preface my discussion by– hold on. There are some PowerPoint issues. Let’s switch it back here.
Good. We’re back online. Good. Let me preface my discussion by defining and differentiating these two fields, business intelligence and data science. The essential intent of business intelligence is making data accessible to a wider audience. These days, perhaps everyone in the organization is a user of BI tools. In data science, the intent is to find useful patterns in the data. You usually have limited distribution, and the output of data science could be in PowerPoint or in studies. It reaches the executive team and the decision makers, but perhaps not everyone in the organization. The technique used in BI is dimension slicing, whereas in data science it is deploying and exploiting algorithms. The output of data science is insight and prediction, while business intelligence mostly produces ad hoc or historical reporting. The technology we use in the BI space is OLAP tools, better known as BI tools, from vendors like Tableau, Qlik, Cognos, and MicroStrategy, and there are lots of new players here. The technologies used in data science are a mix of statistics, machine learning, and computing, and RapidMiner is one of the leaders in getting data science into the hands of data scientists. So even though both fields are involved in analytics, they have evolved separately. And why are they far apart right now? Move to the next one.
There are many reasons why these two fields evolved independently. Number one, the people who practice BI and the people who practice data science are separate, and so are their skillsets. They usually live in two different organizations: BI could roll up to IT, finance, or product organizations, while data science sits in R&D, in labs, or in product organizations as well. The technologies they use have very limited overlap, too. There are a lot of new vendors in the BI space and a lot of new vendors in the data science space. And more importantly, the specific use cases these tools solve are different. One deals with historical reporting; the other deals with prediction and extracting insight.
Let’s consider one scenario. Say you are in marketing, and the return on investment of a campaign that you launched recently and concluded is 154%. This is a classic historical reporting use case, which is pretty much what BI does very efficiently. You can slice and dice this data, see where your traffic came from, but it all deals with historical data. Let’s go a step further. What if we know the ROI for a future lead generation campaign is going to be 175%? This is definitely more valuable than the information we had before, that the ROI of a past campaign was 154%. While that information is relevant and important, the new information is more actionable, and it is going to help us shape our future campaign and make decisions.
Let’s go a step further in this journey. There are 56 leads that have a high propensity for conversion. This is the most specific. The other two pieces of information were aggregates; now we are getting very specific, down to the actual leads who have a high probability of conversion. That’s very interesting. And what if we take one more step? Send a 15%-off promotion to these 56 leads. We know specifically who those leads are, and we can also predict how big the promotion should be: not 10%, not 20%, but 15%. So this is getting more interesting. If you notice, from our first example, the ROI of a launched campaign being 154%, all the way to this scenario, the value is increasing step by step. The actionability of the information is increasing, and, importantly, the number of people involved in the decision making is increasing as well. In the first example, it might be the CMO or a few campaign managers who are interested in this information, but in the last one, everyone in marketing is interested, and the product manager is interested as well. The better the quality of the information, the more people are involved in acting on it.
The first one we started with is classic BI reporting. The next two are predictive analytics. And the last one is prescriptive analytics, because it prescribes a particular action to achieve value for the overall company. This is a very interesting mind-shift in the scenarios we can address by bringing these two together. Let’s get into more examples. Yesterday’s revenue, versus tomorrow’s revenue, versus "this specific business is bound to decline this week." That’s getting very specific. Another example: the number of customers last month, versus who is likely to churn in the next 10 days, versus "make this particular offer to this customer right now." Very specific as well. One more example: a production downtime report, which is historical information, versus what part of the process will fail in the future, versus something even more specific, like "use this replacement part for these five motors." So this is the transition that is happening: what happened, what will happen, and how to affect the future outcome.
This transition can be enabled by bringing traditional business intelligence together with data science, and there are a few other reasons why we should do this. Number one, it brings historical information, predictive information, and prescriptive information all together. Number two is distribution. Data science embedded in BI gets very wide distribution. BI already has this massive distribution, and now we are piggybacking data science onto it, so the output you have created in data science can be accessed by hundreds or thousands of people across the entire organization. BI does secure, relevant information delivery to the right people at the right time, and data science can leverage that, so its output reaches the right people at the right time with the proper security parameters if you are part of a large company. It also creates a more viable environment for going beyond big data to big insights, and getting that information to the right people.
All right. How do we make this happen? Before we get into the question of how, I’d like to paint a picture of what we are trying to achieve, so let’s take a real example and see what outcome you can expect when you integrate data science with BI. Okay. This is a classic BI model. You are getting data from different sources, bringing it into one target schema, and you have a BI tool on top of it creating dashboards, reports, and ad hoc queries. This model has been working fine, integrating, let’s say, customer data, product data, employee data, and marketing data. So you have one source of truth. Number two, it’s all linked together, so for a customer you can see what that customer is saying about your company on social media alongside what that customer has been purchasing. You can bring all this data together, which is interesting, and it makes it easy to cross-reference one source of information against another.
The dashboards we develop in BI are used by many people, and they are very intuitive as well. Take the customer side, for example. Here’s a sample customer dashboard. Let’s say this dashboard is for one customer, Acme. It has a bunch of relevant information by source and by area, a classic BI report. But what if you could build a data science model on top of it, bringing in all this purchase data, create a computational model, and put a big alert at the top that this customer has a very high churn risk and we have to send a particular offer or one of the campaigns we are planning? Now that information is going to be used by every single account manager, right where they already look, rather than sending around a list of all the churn-risk customers as a separate file.
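The churn alert described above has to come from a scoring model somewhere upstream. As a minimal sketch, here is what that could look like in Python with scikit-learn; the feature names, customer names, and training data are all invented for illustration, and a real model would be trained on your own warehouse data.

```python
# Hypothetical sketch: scoring churn risk so a BI dashboard can flag
# high-risk accounts. Fields (tenure, tickets) and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: [tenure_months, support_tickets_last_90d]
X_train = np.array([[36, 0], [24, 1], [3, 8], [2, 6], [48, 1], [1, 9]])
y_train = np.array([0, 0, 1, 1, 0, 1])  # 1 = churned

model = LogisticRegression().fit(X_train, y_train)

# Score current accounts; keep the churn probability per customer.
accounts = {"Acme": [2, 7], "Globex": [40, 0]}
scores = {name: model.predict_proba([feats])[0, 1]
          for name, feats in accounts.items()}

# A BI tool could render the big alert when the score crosses a threshold.
for name, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    flag = "HIGH CHURN RISK" if p > 0.5 else "ok"
    print(f"{name}: {p:.2f} {flag}")
```

In the integration the talk describes, only the customer ID and score would be handed to the BI layer; the model itself stays on the data science side.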
Number two. Let’s say you’re a product manager, and there’s a bunch of relevant information presented in a BI tool. What if we could consume some of this information, build a recommendation engine on top of it, and then present to the product manager: here are the relevant product opportunities, all based on customer purchase patterns? Let’s take another example, a helpdesk dashboard. Wherever you find text information reported in BI, that’s a great use case for doing text analytics on top of it and doing sentiment analysis. Are they talking about a product? Is the sentiment good, bad, or neutral? Wherever you find text, it’s a good opportunity to unlock that data and mine it using text mining.
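To make the sentiment idea concrete, here is a deliberately tiny lexicon-based sketch. A real deployment would use a trained sentiment model; the word lists and example tickets below are illustrative only.

```python
# Minimal lexicon-based sentiment sketch for helpdesk text.
# The word lists are invented and far too small for production use.
POSITIVE = {"great", "love", "fast", "helpful", "works"}
NEGATIVE = {"broken", "slow", "crash", "refund", "terrible"}

def sentiment(ticket: str) -> str:
    """Label a ticket positive/negative/neutral by lexicon word counts."""
    words = set(ticket.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tickets = [
    "the new dashboard is great and support was helpful",
    "app keeps crash after update terrible experience",
]
for t in tickets:
    print(sentiment(t), "<-", t)
```

The per-ticket labels would then be aggregated and charted in the BI tool like any other dimension.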
Next, on forecasting: we have historical reports, which are great, but we can put a time series forecasting model on top of them and predict future performance. It could take past performance as input, but it could also combine other information, and we can show the result the same way in a BI tool.
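As a sketch of the forecasting idea, the snippet below extends a historical series with a simple linear trend, the kind of predicted line a BI tool could chart alongside the actuals. The revenue numbers are made up, and a real deployment would use a proper time series method rather than a straight-line fit.

```python
# Sketch: extend a historical revenue series with a linear-trend forecast.
import numpy as np

revenue = np.array([100, 104, 110, 113, 119, 124], dtype=float)  # past periods
t = np.arange(len(revenue))

slope, intercept = np.polyfit(t, revenue, 1)  # fit a straight trend line

horizon = 3
future_t = np.arange(len(revenue), len(revenue) + horizon)
forecast = slope * future_t + intercept  # predicted next periods

print("next periods:", np.round(forecast, 1))
```

The BI layer would simply plot `forecast` as a second series after the historical one.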
Finally, imagine an operations dashboard. You have a bunch of metrics out there, but what if we could append more information through data science? For example: here is the number of customers who are going to churn, and here is the list; call them. That is a call to action. Number two: the sales forecast changed, there’s a change in pricing here, so do some scenario modeling. We’ll touch on bidirectional BI as well; there are perhaps some parameters we can change to see what the forecast is going to be. Number three is a product affinity dashboard, where you can provide some recommendations. And we talked about sentiment analysis based on text data received from customers, so marketing data. The interesting thing here is the call to action: rather than just presenting information as-is, we can mine more information on top of it and provide a call to action. That’s getting into the world of prescriptive analytics.
Good. Now let’s discuss some architectural options. If you go about integrating BI with data science, there are a few ways to do it logically. Number one, let’s start with the baseline. You have a BI installation: a bunch of data sources that you bring into a data warehouse, with a BI tool and dashboards on top, similar to the model we saw, very simplified. On the data science side, for modeling you bring in a training set (assume there are test and validation sets as well), you create a model, and you use that model to score unseen input in the case of classification and regression. In some cases the model describes the outcome rather than scoring data, but that’s the general framework.
So how do we bring these two tools together? Number one, we can use data science merely as an input for BI. In this case, data science is quite independent of the BI installation; all we have is one input. For example, a churn model: data science can send customer IDs and churn risk every day, and BI can consume the data and show it in the customer dashboard. This works, and you keep the two tools completely independent, except for that one touchpoint where BI gets its input. But it is a batch, offline integration. If new data is coming in in real time, using Spark and so on, it will be very difficult to integrate the two this way.
Number two: data science is used as a modeling tool, and then we take that model and put it into the data warehouse or into BI. There are a couple of ways to do this if the model is quite simple. If you have a decision tree, you can code it in the data warehouse, like you would code it in a production environment in any programming language, or you can code it in the BI tool as well. Or you can do a PMML export. What are the pros and cons of this approach? The pro is that the data warehouse holds the model and can execute it even for new data, if the model is simple enough. The con is that if the model is a production-quality model, where you have an ensemble of multiple different models, some of which might be, say, kNN models, it is very difficult to port it and reliably execute it in the production environment.
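To illustrate what "coding a simple model" means: a small trained decision tree reduces to a few nested conditions that could be rewritten in SQL or any language, whereas an ensemble of many such trees plus kNN quickly becomes impractical to port by hand. The tree below is entirely hypothetical; its fields and thresholds are invented for illustration.

```python
# Hypothetical example: a tiny trained decision tree, hand-ported to
# plain code. Fields and split thresholds are invented.
def churn_risk(tenure_months: int, logins_last_30d: int) -> str:
    # Root split: short-tenure customers are riskier.
    if tenure_months < 6:
        if logins_last_30d < 3:
            return "high"
        return "medium"
    # Long-tenure customers churn mainly when engagement drops to zero.
    if logins_last_30d < 1:
        return "medium"
    return "low"

print(churn_risk(2, 1))    # short tenure, low engagement
print(churn_risk(24, 10))  # established, active customer
```

Porting this by hand is trivial; doing the same for an ensemble of hundreds of trees with a kNN component is where this architecture breaks down, which is the con described above.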
Let’s see option number three. This is a bit more integrated. In this case, the data warehouse serves as an input for the data science tool, because it brings more information, and the data science tool can have additional inputs outside of the data warehouse as well. We do the modeling in the data science tool, and once the production model is ready, the data warehouse provides the unseen input to the data science tool, and finally the scored data is delivered directly into our BI tool. This works, and it is a unidirectional integration that leverages the strengths of both tools: data science has the robust modeling capability, and the data warehouse has the latest information, which needs to be scored and integrated back into BI. So this is an online integration.
Taking a step further, we can do bidirectional integration as well. The only difference from the previous model is that some of the modeling parameters or inputs can be provided from the OLAP tool, giving you interactive dashboards where you can change some of the modeling parameters. The best example: say you want to do customer segmentation, and it’s segmented three ways, but you want a more granular segmentation, so you specify the clustering parameter and get six segments instead of three. This can happen in real time: the data science tool takes it as an input, models it, and sends the information back almost in real time. In many cases this is done by Web services; we expose the data mining process as a Web service, and the model gets executed almost in real time. So this is the bidirectional integration.
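The re-segmentation example can be sketched as a function that the Web service would expose, with the cluster count as the parameter the dashboard sends back. Here scikit-learn's KMeans stands in for whatever clustering the data science tool actually runs, and the customer features are synthetic.

```python
# Sketch of the bidirectional segmentation example: the BI dashboard
# passes a cluster count k, the service re-segments and returns labels.
# Data, field meanings, and the choice of KMeans are all illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic customers around three spending profiles:
# features are [annual_spend, orders_per_year].
centers = np.array([[100.0, 2.0], [500.0, 10.0], [2000.0, 40.0]])
customers = rng.normal(loc=centers[:, None, :], scale=[20.0, 1.0],
                       size=(3, 50, 2)).reshape(-1, 2)

def segment(data, k):
    """Re-run segmentation with the k chosen in the dashboard."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)

coarse = segment(customers, 3)  # dashboard default: three segments
fine = segment(customers, 6)    # user asks for finer segmentation
print("segments:", len(set(coarse)), "->", len(set(fine)))
```

In the architecture described above, `segment` would sit behind a Web service endpoint; the BI tool sends `k` and receives the labels back almost in real time.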
Before getting into a short demo, let me touch quickly on roles, who is doing what, and how we amplify the output of data science by using BI to reach a lot of people. There is a study on the number of people involved in data science, producers versus consumers, if that makes sense. There is almost a 100x, in some cases 1000x, impact when we get the results of data science into the hands of operational decision makers, so the impact of data science can be amplified manyfold. The other point is that there are complementary skillsets between BI and data science. On the BI side, there is heavy data engineering, bringing the information in; that is a bit of overlap between the two roles. BI primarily deals with OLAP and visualization, and data science with many different algorithms. In data science, most of your time would probably be spent on the actual modeling, but also on optimization and deployment, whereas the BI team usually focuses on creating those dashboards and getting the right information to the right place. There is definitely a path for many people to get more into data science. In fact, that is the first mission of my book as well: BI professionals, and others outside the core data science profession, can understand data science and get a good introduction to it using RapidMiner.
Let’s go into the actual demo and prototype. What I’m going to show you is creating a very quick example process and then using one of the BI tools on the market to visualize it. Let me start with an example. This is the same slide I showed in the mockups: what if the product dashboard could consume some product details and provide recommendations on the product? I’m going to take an example where the products are movies. So there’s a movie example here. Perfect. Good. We are going to build a recommendation engine. The items will be movies, but you can assume the items are whatever products are relevant for your enterprise; it could be widgets instead of movies. The users are your customers. Pretty much everyone has customers, and customers interact with the product by making a purchase decision, or in this case, by giving ratings: they like this product or they don’t. And there are cases where they did not see a movie, so they have no opinion, because they haven’t purchased it. Let’s say User Two has N/A for two of these movies and said “Yes” to The Imitation Game, and Users Three and Four, similarly. So we have a utility matrix where people have responded positively to some movies they have watched, and some movies they haven’t watched. Now let’s say I’m User Five, and I’ve responded positively to two of these movies. The prediction objective is: how would I respond to the last movie, which I haven’t seen? Can we predict that rating? The essence of a recommendation engine is to predict ratings for items you have not purchased or movies you have not seen. If the predicted rating is very high for a movie, then I can recommend that movie to you.
And this is how a recommendation engine works. In this particular case, without going into the details, we are going to use user-based collaborative filtering. It looks at the information and sees how correlated my given ratings, the last row in this case, are with other users’ ratings, so I can be placed in a cohort of users like me, and it looks at how those users rated the last movie, in this case The Imitation Game. Heuristically, it feels like I would like The Imitation Game, because just by looking at it visually, my ratings are in sync with the users who rated it highly.
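The mechanics just described can be sketched in a few lines of pandas: correlate my known ratings with every other user's, keep the positively correlated neighbours who rated the target movie, and average their ratings. The utility matrix below is a toy stand-in for the one on the slide; the titles and numbers are illustrative.

```python
# Sketch of user-based collaborative filtering on a toy utility matrix.
# Titles, users, and ratings are illustrative; NaN means "not seen".
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {"Gravity":            [5.0, 4.0, np.nan, 2.0, 5.0],
     "Interstellar":       [4.0, np.nan, 5.0, 4.0, 4.0],
     "The Imitation Game": [5.0, 5.0, 4.0, 1.0, np.nan]},
    index=["u1", "u2", "u3", "u4", "me"])

target = "The Imitation Game"
others = ratings.drop(index="me")

# Correlate my ratings on the other movies with each user's ratings,
# using only the movies both of us have rated.
sims = ratings.drop(columns=target).T.corr(min_periods=2).loc["me", others.index]

# Keep positively correlated neighbours who actually rated the target,
# then take their similarity-weighted average rating as my prediction.
keep = sims[(sims > 0) & others[target].notna()]
pred = np.average(others.loc[keep.index, target], weights=keep)
print(f"predicted rating for {target}: {pred:.2f}")
```

A user-kNN operator does essentially this at scale, with a fixed neighbourhood size k instead of "all positively correlated users."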
Cool. Let me actually go into the demo. The previous one was a high-level view; this one has more products and more users, and you can see the ratings matrix. You can see it is a very sparse matrix as well. There are obviously blank cells here, and those are what we need to fill in, like what you see here. You can use RapidMiner to create this recommendation engine, and then we’ll use Tableau to visualize how it works. Let me invoke RapidMiner here. Fantastic. I have a sample process, but let me show you a new process here. I don’t want to type anything in the comment now. Let me bring in two datasets. This is the MovieLens dataset; it has 475K ratings. I just want to show you the inputs. Good. It is similar to the example I showed, and you can create this kind of dataset within your own company; all you need is your customer data and your product purchase data, brought together. In this case: user ID, movie ID, rating, and a timestamp, which we don’t need.
The next one is another dataset, with movie ID and title. It’s always good to know the title and not just the ID. So why don’t we join these together.
And I specify– perfect. I specify which columns I need to join on. We are on the data preparation side right now. Movie ID needs to be joined; it’s straightforward. There’s one more column in this dataset that I don’t need, so I will use “Select Attributes” to ignore that one attribute called “Timestamp.” Cool. So far, everything is good. For modeling, we are going to use the recommendation engines available in the extension, and in particular, collaborative filtering. I’m using user kNN. Remember, I need to find users similar to my rating profile, so I can take their ratings, aggregate them, and predict a rating for the movies I haven’t seen. There is one operator we need a step before this: “Set Role.” It’s a very specific operator, because we need to declare “Rating” as the objective of the process, so I set the attribute “Rating” as the label. There are also two specific things I need to do: I need to let the algorithm know which ID to consider the “item” and which ID to consider the “user.” Hopefully my spelling is right. Perfect. Now let’s execute the model. It has about 475K rows, so it may take a few seconds. And perfect, the modeling works, and this is collaborative filtering. Now let’s actually use this for prediction as well. I’m going to split the data, because I need training and test sets: I’ll use 95% of my data to build the model, and the rest for prediction. Last, let’s apply the model. Cool. We have the model here and the 5% of data here, and we apply the model to it. Fantastic. Now the scoring works. I’m using this data to check my predictions, comparing the last column, the prediction, with the rating; you can use it for a visual comparison. Say the actual rating is 5, and I’ve predicted 4.1 or 4.5.
That sounds about right. And some of the ratings of 2 that I have given are predicted to be 2.4. So it is reasonably accurate, and obviously you can use a performance operator to look at an aggregation of all these performance parameters for the model.
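What a performance operator aggregates from these held-out comparisons can be shown with two standard error metrics, RMSE and MAE. The actual/predicted pairs below are made up to resemble the spot-checks in the demo.

```python
# Sketch: aggregate held-out rating errors the way a performance
# operator would. The actual/predicted values are illustrative.
import math

actual    = [5.0, 2.0, 4.0, 3.0, 5.0]
predicted = [4.1, 2.4, 4.5, 2.7, 4.6]

errors = [a - p for a, p in zip(actual, predicted)]
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error
mae = sum(abs(e) for e in errors) / len(errors)             # mean absolute error
print(f"RMSE: {rmse:.3f}  MAE: {mae:.3f}")
```

Lower is better for both; RMSE penalizes the occasional large miss more heavily than MAE does.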
Perfect. Now you have a model. It is used to predict ratings for a particular user, and we can use it as a recommendation engine. We can get this output into Tableau. One thing I’m going to show here is saving this file, storing it on the server, and creating some Web services, which Tableau can invoke to get that information. Here is where you get the full benefit of visualizing the model and its output. Number one, here is a summary of all the ratings and how they are spread, and here are the ratings across all genres, because you can slice and dice this data; that is what BI tools do very efficiently. The dots you see are individual customers and how their ratings are spread. But let’s take one customer as an example, because that is what you might be dealing with: one customer’s input.
When I click on it, you have a couple of pieces of information visualized in the Tableau dashboard. Number one, here’s a list of all the movies I’ve watched. Translate that into your business: this might be a list of all the items this particular customer has purchased. So this is looking at one customer. Number two, and this is the most important section, here we are recommending these movies: based on all the movies you’ve watched and rated, here is the list of recommendations. This is the output we can integrate into our product dashboard. So this is one example of how you can create a model in RapidMiner and use a BI tool to extract and visualize the information and put it in a relevant dashboard. You can put this in the customer dashboard, so whenever anyone looks at that customer, they see a list of recommended products.
Let me go back to my presentation. That’s the crux of integrating data science with BI. There are a couple of other examples as well. In this case, it is Qlik. Great. Cool. Here we have Qlik visualizing the data, and the process is similar: it surfaces upselling opportunities derived with an association algorithm. Again, the actual model was created in RapidMiner, and we are using Qlik to visualize the data.
Going back to our slides: what is in it for a BI tool or team, and why should we integrate data science? For BI, data science offers great value by providing predictive and prescriptive data along with the historical data, number one. Number two, we get into actionability, which is one of the key pieces of feedback anyone gives on BI dashboards: how actionable is this information? With data science on top of, or embedded in, the dashboard, the information becomes much more actionable, providing the right information to the right people. The last one is the call to action. This is a further extension of the idea: rather than just showing predictive information, we can make it prescriptive by providing a concrete action to our operators as well. And that provides even more value than just predicting a particular data point.
For data science, BI offers, first of all, wide distribution. It gets the output that data science professionals create into the hands of actual decision makers across the organization, 1,000-plus people, and it can amplify the work done on the data science side using BI’s secure platform. It also provides familiarity for users, which helps them accept results coming from data science, because they are already comfortable with the information provided in BI, and this is an extension of that information. The last one is the training set and model deployment. The training set can be sourced from the data warehouse, which has usually already gone through multiple cleansing processes, so that is an added benefit, and BI also provides a very efficient path for model deployment. More than anything, as a data science professional, you are creating something of value, and what any creator wants is for their product to be used by a wide variety of people. That gives the greatest satisfaction, and BI provides a very good path to deploy all the work you are doing in data science into the hands of many people who make decisions with it.
So with that, I’m going to conclude my session and open this for questions.
Great. Thanks, Vijay. As a reminder to the audience, we’ll be sending a recording of today’s presentation within the next few business days via email. A couple of questions came in about that, so we’ll go ahead and send the slides as well as the recording. As Vijay said, it’s now time for questions and answers. It looks like we have a couple of questions already, but if you have any more, please feel free to submit them in the questions panel on the right-hand side of your screen. We’ll go ahead and address them now. The first question is, “How can I transfer my RapidMiner models to Tableau?”
Yeah. Currently, there are a couple of ways to do that. The output I showed as an example is actually an export in a Tableau format; there is an extension operator available in RapidMiner to export that data, which you then import into Tableau. That is the currently available option on the RapidMiner side.
But I think it would be fair to add that the model execution still occurs in RapidMiner. What’s being sent back and forth is scored results and/or parameters for the model to be run in RapidMiner. The model itself is not really what’s exported.
Yep. That’s right, it’s a scored result. That refers back to the analytical architectures we talked about, numbers three and four. There’s no model export here in this particular–
Tableau doesn’t run models.
Great, thanks. That was Bill from RapidMiner chiming in. Next question here, sort of a follow-up to that: “Does RapidMiner also work with Qlik, then?”
Yes, extensively. There’s a level of integration happening with Qlik, and one example I can give: you can create a model in RapidMiner and use the Server module to expose it as a Web service, and that Web service can be integrated as a source for Qlik. It provides bidirectional integration as well. For example, if you change the modeling parameters in Qlik, you can get the results back almost in real time.
I just wanted to comment that there’s another question that sort of dovetails with this, and more are coming in now, about integrations with not only Qlik but Pentaho and Power BI. (Pentaho, for people on the other side of the pond; people have different ways of pronouncing that one.) The net of this is the following: think of it in two layers. One layer is just us passing scored results into those products. We’ve built native connectors today that export, for example, a TDE file for Tableau, and for Qlik a QVD file, if I have the right extension. For others, it’s as simple as exporting to a CSV or another importable file format that can be read directly into the front-end tool of choice. The second level is bidirectional integration, meaning: do you have the ability to stay inside your BI or visualization front end and interact with the model in real time, send some parameters, run the scoring, and bring back the results? Today we have built and published those two integrations, for Tableau and Qlik, but depending on the API capabilities of the front-end technologies, it’s doable elsewhere. We have the ability both to provide our API to invoke the models and to share data, as a function of prioritizing which BI tools are out there. But as users of those BI tools, there’s nothing that precludes you from doing it yourself, because our API is open. All right? We can go to the next question.
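The "first layer" Bill describes, passing scored results to any BI front end, often amounts to nothing more than writing a flat file. A minimal sketch, with invented column names and scores:

```python
# Sketch of the simplest integration level: write scored results to CSV
# so any BI front end (Tableau, Qlik, Power BI, Pentaho) can import them.
# Customer IDs, scores, and column names are illustrative.
import csv
import io

scored = [
    {"customer_id": "C001", "churn_score": 0.91},
    {"customer_id": "C002", "churn_score": 0.12},
]

buf = io.StringIO()  # stands in for the file the BI tool would read
writer = csv.DictWriter(buf, fieldnames=["customer_id", "churn_score"])
writer.writeheader()
writer.writerows(scored)
print(buf.getvalue())
```

The native TDE/QVD connectors mentioned above do the same job with richer typing; CSV is simply the lowest common denominator every front end accepts.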
Great. So this question came in earlier. This person is asking about the analytical architecture, number four. And they’re asking, “Can you give one more example about the architecture?”
Okay. I might have more examples here. Let me go to the slides. This is the bidirectional integration of BI with data science. One example is what we just described: changing the modeling parameters. Another example would be changing the input data itself. You might have the latest data available in the BI tool, which can be sent as input for the data science tool to score, and the score result comes back. This applies to any model — say, anomaly detection. Anomaly detection is actually another interesting area to talk about, because you want to watch out for data anomalies as well. It's an interesting application of anomaly algorithms to catch data quality issues. As a BI professional, you care about data governance, data quality, and the information available to users; you don't want anomalies there. That's one more example. Are there more examples, Bill?
Sure. Just to qualify things, I'm not a data scientist — I just pretend to be. But as a user of data science, one of the things we built internally, eating our own dog food, as they say, is a forecasting model. I take my historical sales results and play with parameters such as product configurations and deal ASP (average selling price, or deal sizes), and look at that on a regional basis. I'll run simulations: I'll sit inside the BI tool that we use, send back different changes in ASP or different quantities of Studio being sold with Server, and press a button. It goes back into RapidMiner, runs the scoring, and I'm then able to plot that against our historical sales. So I can see a prediction line against my historical line and see what those changes might mean in terms of net anticipated sales. That's pretty similar to Vijay's example: it's really just marrying up a historical trendline with a prediction line. But the fact that it's bidirectional means that, as the end user, I don't have to understand how the black box of the model works. I just need to understand the parameters I can play with, and I get back the results.
And the conceptual framework here is a black box, right? Any input to that black box can come from users playing around with the data as well, so it becomes scenario modeling, too.
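To make the round trip concrete, here is a minimal Python sketch of the scenario-modeling loop just described. The `predict_sales` function is a hypothetical stand-in for the deployed model (in practice, a call to something like a RapidMiner Server web service); the naive trend extrapolation exists only so the parameters-in, prediction-out pattern is visible.

```python
def predict_sales(historical, asp_change=0.0, quantity_change=0.0):
    """Hypothetical 'black box' scoring call. The end user supplies
    parameters (e.g. a change in deal ASP) and reads back a prediction;
    the internals of the model stay hidden. Here, a naive linear trend
    extrapolation stands in for the real model."""
    trend = historical[-1] - historical[-2]
    base = historical[-1] + trend
    return round(base * (1 + asp_change) * (1 + quantity_change), 2)

historical = [100.0, 110.0, 120.0]                      # past periods' sales
baseline = predict_sales(historical)                    # no parameter changes
scenario = predict_sales(historical, asp_change=0.05)   # simulate +5% deal ASP
```

The BI layer then plots `baseline` and `scenario` as prediction points next to the historical trendline, exactly the "prediction line against my historical line" view Bill mentions.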
Great. Another question here is, “How can I load dynamic data periodically to process in RapidMiner?”
Yes, there's a way to do scheduling in RapidMiner. I'm not the technologist here, so we'll have our presales team get back to you. But in essence, there's the ability, both in the Studio product and through the Server, to do periodic loads or scheduled processes.
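In production you would use RapidMiner Server's built-in scheduling (or an external scheduler such as cron), but the pattern of a periodic load can be sketched as follows. The `load_and_score` job here is hypothetical; a real job would pull fresh data and trigger the scoring process.

```python
import time

def run_periodically(job, interval_seconds, iterations):
    """Run `job` every `interval_seconds`, `iterations` times, and
    collect its results. A minimal stand-in for a real scheduler."""
    results = []
    for _ in range(iterations):
        results.append(job())
        time.sleep(interval_seconds)
    return results

# Hypothetical job: in practice it would fetch the latest data and
# invoke the scoring process; here it just reports that a load ran.
def load_and_score():
    return "loaded"

results = run_periodically(load_and_score, interval_seconds=0, iterations=3)
```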
And the next question is, "Do you recommend PMML for model exchange and integration?" I'll give you my layman's interpretation — Vijay and I haven't discussed this, so I apologize if I'm putting him on the spot. While RapidMiner can both consume and export PMML, what I do know is that with the PMML standard, you begin to lose various aspects of the model as it exists, in our case, within the RapidMiner workflow. So it's not a question of whether I recommend it; it's really a function of where and when you want to export it and what you're trying to consume it for.
Yes. You can export a model to PMML right now, but it limits the functionality as well. If you're prototyping something very simple — a basic classification model, for example — and applying it, yes, it can be done in PMML. You export it, it comes out as a block of code, and you can consume it wherever PMML is supported. Some BI vendors support it, and some databases support it as well. But as Bill said, it fairly limits the functionality. If your model is quite robust — it has lots of parameters, and also meta-parameters that pass values dynamically — then that limits whether it can be exported to PMML. So for simple models, absolutely, yes. For complex models — probably most of the production models you'd be using — the answer might be no.
Great. Thanks. We have another question here; I'll address this one to Bill. "Does it make sense for RapidMiner to also provide data visualization capabilities? Can it be one integrated product from end to end? Do you have any plans for that?"
That's a great question. And as interesting as it is for us to consider, the reality is that the data visualization tools on the market, like Tableau and Qlik, are best in breed, so we don't want to reinvent the wheel. That said, RapidMiner does come with — via the server component — a number of visualizations you can use to express the score results, the parameters of the models, and so forth. That comes with the product. But as extensive as it is, it's not going to match feature-for-feature the quality and depth of the dedicated data visualization products. So we feel a better investment of our time is to provide the API so that we can integrate with those products — because, frankly, most companies have already made that investment and chosen a data visualization tool.
Great. Thanks. Again, for those on the line asking about slides and a recording, we will be sending the slides and a recording after the presentation. So look for that in your email. And I have another question here–
I’ll read it off because of the computer.
"I'm a RapidMiner beginner. Are you aware of any intensive online RapidMiner training, or what's the best way to learn RapidMiner?" If you go to our website, there's a section called "Getting started" that provides a pretty exhaustive library of examples, use cases, sample models, and so forth. Inside the product itself, there are a number of tutorials that take you through some basic examples — churn and other very common use cases. I think the ultimate question to ask yourself is, "Am I proficient in data science, separate and distinct from learning the tool?" — because those really are two different educational requirements. We at RapidMiner also offer Basics 1 and Basics 2 courses to make you proficient in the use of the tool. That said, we don't really teach the basics of data science, although you do get some of that when you take those classes. But now I'd like to let Vijay selfishly promote the book he's written, because he's a much more appropriate educator on data science.
Yeah. The book deals with an introduction to data science, and RapidMiner is one vehicle for practicing it. As a data scientist, you want to learn the concepts behind the algorithms — just the basics. It doesn't involve much math, though a good math foundation is helpful. For anyone involved in data or analysis, I think it's good to know the introductory side of data science so you can appreciate the tool more. The intention was to create some literature introducing data science concepts alongside practice in RapidMiner, because when you practice, the concepts stick better. That was the focus. And there are some good courses available online as well that teach introductory data science.
Another question on integration: "RapidMiner integrates well with Tableau and Qlik. Is it also possible to connect RapidMiner to SharePoint?" The answer is, we don't provide a native connector to SharePoint, but you're able to export the results in a variety of file formats that can be consumed as raw data. With RapidMiner's visualizations — again, nowhere near as extensive as a dedicated product — you can also create JPEGs or PNGs of the various outputs. But I think you have to ask yourself, "What do I really want to send to SharePoint? Is it images? Is it pictures of the workflow?" That's different from the actual score results. The short answer, though, is that we don't provide any sort of native connector to SharePoint.
And this also applies to — there's another question on DotShare as well. If you have your own programming interface to visualize the data, the same applies. And there's also another one: if you have RapidMiner Server, it can create Web services, and those can feed in some of the data as well.
Great. It looks like we're about out of time, so I want to thank you again, Vijay — and thanks, Bill, for joining us for the question and answer. For those on the line, if we weren't able to address your question during the webinar, we'll make sure to follow up with you via email within the next few business days. Thank you all for joining us today, and enjoy the rest of your day.
Thank you, folks.
Thank you, Vijay.
Integrating Business Intelligence and Data Science
Your business intelligence platform can provide an effective way to integrate advanced analytical techniques into business operations. Watch this webinar to learn about the powerful integration between your business intelligence platform and data science platforms like RapidMiner.