Ladies and gentlemen, I’d like to welcome you to today’s event, Putting Your Analytics Into Action, moving from model creation to business application. Before we get started, I’d like to mention that today’s session is being recorded, and you are currently in a listen-only mode. Now, I’d like to acquaint you with some of the ways that you can participate in today’s event. We will have a question and answer session at the end of today’s presentation, and you may type in your questions to the panelists at any time in the Q and A panel on the right-hand side of your screen. Click the send button, and your questions will be placed into the queue. During today’s webinar, we will also ask for your feedback by voting on several polling questions which will appear in the polling panel also on the right side of your screen. During that time, please click on the radio button or the circle that corresponds to your answer in the poll panel, and then click on the Submit button to cast your vote. Please note the polling question will only appear once the poll has been activated by the host. Finally, if you experience any technical difficulties during today’s webinar, please chat your issue to the host using the chat panel on the right side of your screen. So again, the Q and A panel is for any content-related questions, and the chat panel is there if you have any technical issues today.
At this time, let’s begin today’s webinar, Putting Your Analytics Into Action. I would like to introduce your three speakers today. First, we have Bill Doyle who is the vice president of FICO’s Decision Management Solutions. A bit later, we’ll be hearing from Bhupendra Patil, who also goes by B.P. B.P. is the director of Solution Consulting for RapidMiner. And our third presenter today will be Libin Varghese who is the principal sales consultant for FICO’s Decision Management Solutions. Bill, I’ll hand the floor over to you.
Thank you very much. Welcome everyone and thank you for making the time to join us today. This is Bill Doyle. And as you’ve heard, I’ll be joined by both B.P. and Libin, B.P. from RapidMiner and Libin, my colleague with FICO. The title of the presentation is Putting Your Analytics Into Action, and, as important here, moving from model creation to business application. Many of you are beginning or deep into the journey of building predictive and prescriptive models. But how many of you are really fully reaping the rewards? In our presentation today, through a combination of both the RapidMiner and FICO technologies, we’re going to show you how to better realize the true and full value of the investments you’re making in advanced analytics. And before we get started, I do want to take a quick poll that’s going to let us get a sense of who you are, if my screen would work here. Here we go. Sorry. So if you would real quick just let us know who you are. Part of that is to help us understand how to tailor the presentation in terms of the balance of technical and business folks, but also, as we follow up and make sure you get the information you’re looking for, it’s critical that we understand the spectrum of folks that are represented here. All right. So I’m going to move forward.
Great, great. And we’ll go ahead and close that poll now. Just a reminder, if you’re still voting, to go ahead and click the submit button. And those results should appear here momentarily.
Thanks. So let’s define the problem here. We’ve been calling this the last mile problem. Think about all the advances that are being made in artificial intelligence and machine learning, the various new algorithms that are coming out, all the initiatives that people are trying to launch. What’s surprising at the same time is how many companies are failing to take advantage of it and failing to get it into the hands of people that can actually use these types of predictive and prescriptive models. And the answer is pretty simple. The why is because it’s not easy to deploy a solution that starts out as a model creation exercise. So the purpose of today’s presentation is to give you a solution to that. Before we get into the actual application, let’s talk about what we mean by deploying.
This is the common chain of events that goes on whenever you’re going to take advantage of predictive and prescriptive models. Business has a problem, and you can typically segment business problems into three basic buckets: I want to raise profits, I want to reduce costs, I want to mitigate risk. And with that comes a pile of data where you want to believe that there are some answers in there, some correlations to be found, some predictive signals. And so what you need is a data scientist who’s going to figure out the right way to go about building a model, but that itself is a challenge at times. What approach should they take? Which algorithms should they use? Are the correlations really true? And ultimately, the last challenge you have to really deal with here is you want something that can actually be consumed by the business user. And putting all these pieces together today is not an easy undertaking. When you think about the byproducts of model creation exercises, a lot of them are wonderful outputs that can be consumed by the data scientist. But the business needs something that can be consumed by them. They need things in a business context. It can’t just be the stats and the math elements that the typical data scientists live in. They want to use applications that are visual. They want to use applications that can actually take actions. And it’s more than just getting one or two things. It’s actually a whole spectrum of applications that you can leverage on top of these models. You want to do what-ifs. You want to do scenarios. You want to do simulations. So how do you bring all that together in the world of model creation?
So let’s establish a framework for a solution set. When we think about this journey, the entire advanced analytics lifecycle, let’s call it, you need all the steps that the data scientist typically goes through in terms of pulling the data together, running a set of models, and figuring out which models have the most credibility, in essence, candidate models. But what you want to do now is collaborate and work with the business analysts to actually let them help and test, bring their domain expertise and knowledge about the business problems to do some comparisons and test it with you, not just a siloed approach. And at the end of this exercise, you want to result in an application that can be used by not just the business analyst but also the business users, to whom the data science is going to be a black box. We don’t have to let them understand or know that you’re using a particular algorithm. They don’t need to understand a gradient-boosted tree or a linear regression and so forth. They want to work with applications in a way that makes sense to them. So let me pause there and move forward with letting the RapidMiner folks introduce their company and their technology. B.P.?
Excellent. Thank you, Bill. What I want to start with is, first, what RapidMiner brings to the table for your data science initiative, and, towards the end, we’ll combine RapidMiner and Xpress to build a joint solution here. To start off, as a platform, RapidMiner really enables lightning-fast data science with a focus on business impact. We do so by providing a visual, very guided, very automated data science platform for your teams. And we are doing this while making sure it’s enterprise-grade software, answering your questions around scalability, security, maintainability, and all that. The solution is focused on helping all kinds of users. It has the depth for the data scientist. It is also very, very simplified for your business analysts, your citizen data scientists, who want to start leveraging machine learning and advanced analytics today. A core premise that we follow here is that the platform should not have any black boxes. What we mean by that is your models should be trustable. They should be easy to explain and easy to tune. It should not be a black-box solution where you don’t know what’s happening behind the scenes, right? And lastly, one of the key things is we come from an open-source background, and we understand that the machine learning environment is scaling from there – you’re seeing a lot of new innovations there – so our platform is very open-source-oriented. It is very extensible. More than half a million users actually use this platform, and about 30K organizations across the globe use it today. We have about 500 enterprise customers and hundreds of marketplace extensions that really bring to the table a platform that can solve any sort of data science problem we have today. Next slide please.
So these are the kinds of personas that typically use RapidMiner. It could be anyone from a business analyst to a data scientist. Executives obviously receive the benefit of all the hard work. But, obviously, they also want to collaborate and help drive the right models, the right solutions forward. We do so by thinking about the problems in three key phases: data prep, which is typically where you’ll spend a good amount of time, more than likely 60, 70 percent of the time; the breadth and depth of machine learning algorithms that you need to solve your business challenges; and the last mile of it, deploying those for business consumption, right? We do that with our collaborative platform that involves a couple of key products: the Studio, with Auto Model and Turbo Prep as functionalities within that, the Server, the Real-Time Scoring Agent, and the Hadoop integration bringing the enterprise capabilities around scalability, maintainability, and all that. Bill, next slide.
What we are going to focus on today, in the context of this webinar, is what RapidMiner Turbo Prep and Auto Model can bring to the table and how they help your citizen data scientists. It starts making them productive in the sense that it’s easy for them to get into the data science journey. And obviously, on the other end of the spectrum, your trained data scientists will also see the benefit from being able to build solutions at a faster pace than having to code everything. We call ourselves the code-free platform, and we see very, very good adoption with both the citizen data scientists as well as the trained data scientists. So with that, hopefully you get the context of what RapidMiner brings to the table. As we demo later on, you’ll see it in action. But I want to give it back to Bill for now and then do a demo later. Thank you.
Thanks, B.P. So quickly about FICO. Many of you know us for our scores business. But I suspect some of you are not familiar with the fact that we’re a billion-dollar firm, a global organization with 10,000 customers around the planet. When you look at our business, you can really divide it into three groups. At the highest level, the scores business that some of you have come to know and love and that affects us all every day. But our software division actually is the larger piece of the business. It’s a $700-million software company you can divide into two parts. We have applications that are focused on the credit lifecycle, fraud, and risk. But the applications themselves are built on our foundational platform, which we call the Decision Management Suite. Now, the purpose of the Decision Management Suite is to enable companies of any size to build and deploy applications that leverage analytics and optimization to assist and automate intelligent decisioning. So the platform itself provides a complete portfolio of components that give you the necessary functional and technical capabilities to build and deploy these different types of applications.
Now, we can cover a wide range of applications, from basic decision support, where we’re just manually working with a model or generating scores, to what you’re ultimately going to want to graduate to: the ability to inject analytics and optimization into high-performance transactional applications that are running models in real time. That’s really when and where companies are going to realize the highest value of all the various models that are being brought to market. Now, for today, we’re not going to go through all these capabilities. We’re going to focus on a very specific component within the DMS family, specifically, Xpress Insight. Now, this piece is built specifically for developing and building analytic apps. It gives you the capabilities to build your panels and your dashboards and allows you to interact with the models. This is the analyst side of the equation. This is where they’re going to do what-if scenarios and simulations. Now, unlike visual analytic tools, which only let you do a sort of comparison of historical results to predictive results, it’s actually going to give you the ability to run the what-ifs and these scenarios, which you can actually store for comparing to future runs or different simulations. So it’s giving you that bidirectional ability to store and save information, not just view results. Part of our secret sauce is our scripting language called Mosel, and this is the glue that lets us bind to any modeling technology behind the scenes, RapidMiner being a great example. But you can bring other modeling languages to the equation. And the benefit of that is that, when you think about this fast-moving market of new techniques and new types of models being created, you’re future-proofing your investment by being able to preserve your development framework for building applications while being able to bring in new types of modeling capabilities behind the scenes.
And so the obvious benefit here is that the data scientists can focus on doing data science instead of building applications. And because you can actually take advantage of the components here, you’re not having to rebuild for every single application. You can share and reuse parts along the way, reducing time dramatically. So with that, tying all the pieces back together here, we’re going to do a demo that shows how the RapidMiner componentry and the FICO componentry complete this suite of capabilities to give you the solution framework.
So another quick poll here. We wanted to ask folks to tell us what types of modeling languages you are using today. Actually, the wrong poll question is coming up. There we go. There’s the right one. Take a couple of seconds answering that, folks, and I believe that’s multiple choice. All right. So as you’re finishing up that, I’m going to keep moving forward here and set up the demonstration. We’re going to use an example of churn forecasting at a telecom provider. And the intent here is to predict the probability of subscribers leaving over a time period. Now, it’s going to be Libin and B.P. Filling the role of the data scientist will be B.P., and Libin will be presenting the business side of things, and they’ll show how they’re going to collaborate together and ultimately deliver an application. This is a screenshot of where we want to land. I want an application that’s consumable by a business user. And as you can see from this simple example, it’s relevant and it’s in business context, something that the everyday Joe should be able to use. So with that, I’m going to hand over to B.P. to start the process and show us your application. So I’ve got to stop sharing the screen, don’t I? Sorry. Floor is yours, B.P.
Excellent. Thank you. Jessica, you can make me the presenter. There you go. Thank you. I’m going to start sharing my screen here. Excellent. Thank you, Bill, for setting the context. What we’ll do here today is build a predictive model. As Bill said, I’m the data scientist for the day. What I’ve been tasked with here is my business has pulled this historical data for me. I’m going to quickly give you a glance at what that looks like. What we basically have here, from my 2018, ’19 data, is that I know, for each of my telecom subscribers, the phone number being an identifier, who has been a loyal customer, and any customer we lost is marked as a churner. Obviously, from a telecom domain, you can expect standard data points like how many daytime minutes the user is using. Are there any charges? What’s the dollar value of that? Are they making a lot of evening calls, evening charges? Are they subscribed to an international plan? How many calls are they making to international lines, and so on, right? So really, what we have here is a profile that has been built for each of my subscribers. However, as you start looking at the data closely, you’ll start noticing problems already. For example, I see a bunch of question marks here. Now, this is not somebody trying to trick me. But many times, we may not actually know the data set, because maybe we acquired a company and we never got the actual origination data and so on, right? So these are genuine problems that are going to happen even when you have done your work and everything. So what RapidMiner brings to the table is the capability to do the last-mile data prep. So for example, if I look at this, the small red bar at the top shows me that there are missing values. It’s pretty insignificant. Only 1.3% of my data has missing values. But – you know what? – to be on the safer side, I’m going to start cleaning this up.
So for now, I’m going to highlight the column and say transform. And maybe I’ll just quickly filter and say it should not be missing and hit apply. Okay? So far, I’ve taken one corrective action. You’ll notice my red bar is gone. I also see some histograms at the top kind of showing me the distribution of my data. And if you look closely enough, and I’ll spend some more time with the data, you’ll notice some of these kind of follow similar patterns. So maybe there’s some sort of correlation going on here. But we’ll tackle that later, right? The idea is RapidMiner allows you to have a quick insight into the data. One more thing I want to do, since I’m predicting churners versus loyal, is test a hypothesis. Maybe I have a service problem in a particular area code or a ZIP code. What I really want to do is just extract the area code out of the phone number because, typically, that’s tied to a geographic area. And maybe I have a network issue there or maybe I have old equipment there. So maybe there is some reason for me to believe that there could be a problem with the area codes also.
So for now, I just need to extract that. I’m going to quickly split this particular column, and we automatically have three new columns dividing the phone number into its three parts. I’m pretty sure the phone number by itself is not going to be impactful for prediction. But I really want to keep just the area code. So let me remove these other columns, right? And RapidMiner then dropped those columns also, right? And while I’m doing this, as you’ll notice, what’s happening is I’m taking the action, and RapidMiner is quickly showing me the results in a very interactive, intuitive way. I can really do my data prep here. It’s also capturing the history of steps I have done. So I filtered on the account length so it’s not missing, I split out two columns, removed some columns, and so on. Obviously, I could keep doing actions here, or if I think this is a good enough point for a checkpoint, I’m going to commit a transformation. As I scroll through, maybe the state – now that I have the area code, I don’t need to worry about the state, because an area code generally belongs to a particular state. So let me just remove that column also. These are some corrective actions I can take because of a little bit of business context here. But then there are certain things that may actually be helped by using the system and the automation that is built into RapidMiner, to help me with selecting particular attributes and so on.
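The prep steps B.P. just walked through, filtering out rows with missing values, splitting the phone number, and keeping only the area code, can be sketched in a few lines of pandas. This is only an illustration of the idea, not RapidMiner’s implementation, and the column names (`phone`, `account_length`, `churn`) are invented:

```python
import pandas as pd

# Hypothetical subscriber data; column names are illustrative assumptions.
df = pd.DataFrame({
    "phone": ["408-555-0101", "510-555-0199", None],
    "account_length": [128, 107, 84],
    "churn": ["loyal", "churner", "loyal"],
})

# Step 1: filter out rows with missing values (the "should not be missing" filter).
df = df[df["phone"].notna()]

# Step 2: split the phone number and keep only the area code.
df["area_code"] = df["phone"].str.split("-").str[0]

# Step 3: drop the column we no longer need.
df = df.drop(columns=["phone"])

print(sorted(df.columns))   # ['account_length', 'area_code', 'churn']
print(list(df["area_code"]))  # ['408', '510']
```

The point of the Turbo Prep UI is that each of these steps is a click rather than a line of code, with the history of transformations captured automatically.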
So for example, earlier I mentioned how the day charge and the day minutes kind of have a similar histogram. I have a very, very big hunch here that they are correlated. But I could simply go to cleanse and tell RapidMiner, “Hey, look through the data. Look at all the columns.” In this case, I have about 19 columns, as you see at the bottom. In the real world, you might have hundreds of columns. It’s humanly impossible to go through them and find out which are correlated and which are not, whereas here I can simply say, “Hey, more than 90% threshold,” and hit apply. RapidMiner dropped those columns. And now, you’ll notice at the bottom there are only 15 columns. I like what I see. I’m going to commit this transformation, right? Again, we will not go through everything here. But the idea is I am cleaning up my data so that I can start building good quality models out of it. I’ve taken care of some of the known problems like missing values, extracted more information from columns that might be impactful, and removed unnecessary columns to avoid unnecessary computation.
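To make the 90% threshold concrete, here is a rough sketch of correlation-based column dropping, a plain pandas approximation of the cleanse step rather than RapidMiner’s actual logic. The toy data makes `day_charge` an almost exact linear function of `day_minutes`, so one of the pair gets dropped:

```python
import pandas as pd

# Toy data: day_charge is (almost) a linear function of day_minutes,
# so the two columns are nearly perfectly correlated.
df = pd.DataFrame({
    "day_minutes": [120.0, 250.5, 90.2, 310.7],
    "day_charge":  [20.4, 42.6, 15.3, 52.8],
    "night_calls": [91, 103, 88, 71],
})

# Drop one column from every pair whose absolute correlation exceeds
# the threshold, keeping the first column of the pair.
threshold = 0.90
corr = df.corr().abs()
cols = list(corr.columns)
to_drop = set()
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a not in to_drop and b not in to_drop and corr.loc[a, b] > threshold:
            to_drop.add(b)

cleaned = df.drop(columns=sorted(to_drop))
print(sorted(cleaned.columns))  # ['day_minutes', 'night_calls']
```

With hundreds of columns, this kind of automated sweep is exactly what is impractical to do by eye, which is the argument B.P. is making.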
But then there are other feature engineering functions that will allow you to prepare the data for your modeling exercises, right? Also, for extracting new columns from existing columns, we have hundreds of functions that allow you to do that. You can do aggregations, merging, and all that. We’ll skip all of that. But for argument’s sake, I have very, very good confidence that what I have here is maybe a good enough data set to start building predictive models from. But to set the context again, my business guys have set me up for a project today, which is: let’s find out if we can build a predictive model for churners. And then, obviously, if we can, we want to deploy it further. I’ve so far finished the first part of the exercise, which is pure data prep, making sure my data quality is good to go into the modeling stage. And assuming we are good so far, I’m going to switch over to modeling by clicking on this model button at the top. And a very simple selection here: what kind of problem are we trying to solve? We are trying to solve a prediction problem, so I need to tell the system what my target variable is. And this is now the Auto Model UI for RapidMiner. Obviously, all the details of what’s happening are on the right. But we’ll actually do this quickly. I first selected the problem and my target variable. This is now telling me the distribution of the data. And this is a very common problem. For example, your churners or machine failures, those are always on the lower side, right? So RapidMiner just shows the distribution to make sure you understand what’s going on. And if you need to focus on a particular class of the problem, in this case, our churners, you select that.
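The class-imbalance point B.P. raises, churners being the minority class, is easy to check before modeling. A minimal sketch with an assumed label column:

```python
import pandas as pd

# Assumed label distribution: churners are typically the minority class.
labels = pd.Series(["loyal"] * 85 + ["churner"] * 15)

# Show the class distribution, as Auto Model does before training,
# so you can decide whether to rebalance or focus on the churner class.
dist = labels.value_counts(normalize=True)
print(dist.to_dict())  # {'loyal': 0.85, 'churner': 0.15}
```

Seeing this up front tells you that plain accuracy will be a misleading metric, which is why the later shortlisting step looks at precision and recall as well.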
In this particular screen, the numbers can seem overwhelming, but really what we are trying to do is help you make decisions, based on the data patterns, about whether a certain column should be used and would be a good indicator of a predictive pattern or not. For example, in this case my international plan is marked as red, which indicates it may not be a good indicator. And the reason for that is the stability is on the higher side. Stability is typically when the same value gets repeated over and over. So obviously, in many cases, I don’t have customers with international plans. So there are a lot of noes, and the system thinks it may not be useful. But again, talking to my business analysts, talking to my business guys, I know this might be an important factor. So I can override this. So again, I’m not throwing my data at an automated system and hoping for the best. I have the necessary information to make that decision. Similarly, customer phone area, evening calls, night calls: orange is somewhere in between, because they’re stable or sometimes there are missing values and so on. But the real key thing here is I could go through hundreds of columns, and without having to worry about the minor details, I can get a nice summarized actionable list of what to do next. And at this point, RapidMiner shows you a list of models that are available for you. And as a data scientist, there are hundreds and hundreds of algorithms out there. What we have done here is given you the best-in-class families of algorithms. All it takes is to simply select which models you want to run or not and go from there. There are some advanced features which we’ll skip for today, but we can automatically extract text data. We can automatically extract meaningful date information, such as the day of the month, day of the week, and so on. We can work with hundreds of features, find the right ones for you, and also generate new features for you.
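The stability measure B.P. describes can be read as the share of rows taken up by a column’s most common value: a near-constant column carries little predictive signal. A small sketch of that idea (the column names and the 90% cutoff are illustrative, not Auto Model’s exact rule):

```python
import pandas as pd

# "Stability" here: the fraction of rows holding the column's most common value.
df = pd.DataFrame({
    "international_plan": ["no"] * 9 + ["yes"],  # 90% "no" -> highly stable
    "evening_calls": [88, 97, 101, 73, 95, 110, 84, 99, 91, 102],
})

def stability(col: pd.Series) -> float:
    # value_counts(normalize=True) sorts by frequency, so .iloc[0] is the mode's share.
    return float(col.value_counts(normalize=True).iloc[0])

flags = {c: stability(df[c]) for c in df.columns}
print(flags["international_plan"])  # 0.9
```

A column flagged this way is only a candidate for removal; as the talk stresses, business context (an international plan really may drive churn) can override the automated suggestion.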
Again, we’ll skip this for now. This is generally maybe my iteration two or three, when I have the initial model and I try to improve it further. But for now, I simply hit next. I decide where I want to save these. So I’m going to quickly connect to my repository and select a location for all the models and everything. Or maybe I’ll just create a new one here.
So let’s call this iteration six. And again, this is important because, while I’m building the solution, I want to make sure it’s traceable. I have a reason to explain when I have to audit the models and so on, right? So we’re going to save all these models and run them as we go here. And as we speak, RapidMiner has started building those models. In the interest of time, I’m going to quickly switch over to a ready-made solution. So again, it’s the finished one. But this is what it will look like once you’re done with all the models. And this might be the end of my project, right, because I have found a model. I have found certain algorithms which are very, very good, and one seems to be a winner. But that would be me risking my business. If I really just go ahead and say, “Hey, you know what? My deep learning model is the perfect model. Go ahead. I’m using it,” I’m actually doing a disservice. I really want to now take a step back and understand the context. I’m predicting a churner. Is it really good to, you know, mark everybody as positive, or do I lose money if I start giving everybody this attention? Those are the business questions I need to understand. And some of those things are going to work in collaboration with my business analysts. Before I go to my business analysts with the algorithms and the models to use, what I want to do is quickly shortlist them. Looking at the accuracy chart is one way to look at it. But then what RapidMiner has done for us is it has looked at various different measures: errors, your precisions and recalls and sensitivity. Again, these details help the data scientist come up with a shortlist of how to use them or which models to go with, right?
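The shortlisting idea, comparing candidate models on several measures rather than accuracy alone, can be sketched with scikit-learn. This is a stand-in illustration: the synthetic data, the two candidates, and the metric choice are assumptions, not the webinar’s actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the prepped churn data (imbalanced, like real churn).
X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.85], random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosted_trees": GradientBoostingClassifier(random_state=0),
}

# Score each candidate on several measures, mirroring the shortlisting step:
# with imbalanced classes, accuracy alone would hide weak churner recall.
scores = {}
for name, model in candidates.items():
    cv = cross_validate(model, X, y, cv=5,
                        scoring=["accuracy", "precision", "recall"])
    scores[name] = {m: cv[f"test_{m}"].mean()
                    for m in ("accuracy", "precision", "recall")}

for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

A model with high accuracy but poor recall on churners might be exactly the "mark everybody as loyal" disservice B.P. warns about.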
But even taking it a step further, for each of the models, RapidMiner obviously shows you the details about the model performance and the model charts and everything. So for example, if I dive into my decision tree, you’ll notice a nice tree here. I can go into my performance tab to see the performance charts for the model and so on. But one other thing we’ll do here quickly is show you the model simulator. The idea here is the user can quickly change the variables into the model to see how it behaves. What happens if I change my evening charge? Really not so much impact on the churner prediction. What if I change inputs so my prediction flips to yes? What happens if I use wrong values, right? So on each of the models, you’ll notice the simulator allows you to test your hypotheses and come up with a conclusion, right? And also, for each of the models, I have predictions on the test data set that I held out, but also some color-coded indicators of which models are doing what as well as why they are doing it. Now, this is happening in my data science workbench. And like I said earlier, if I decide this on my own, maybe I’m not helping the business. What I really want to do here is, after looking at all these measures, shortlist a few models at the top. I have my gradient-boosted trees, followed by my GLM and the deep learning and maybe even the logistic regression. And these are the models I want to hand over to my business analyst team so that they can actually test this with data. Maybe we can do a joint exercise for the next few weeks or few months to make sure which is the model to use and what we should actually deploy.
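The simulator B.P. demonstrates is, at its core, a what-if loop: hold a base record fixed, vary one input, and watch the predicted probability move. A minimal sketch of that idea, with an invented model and data (feature 0 standing in for something like evening charge):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Train a quick stand-in model; the data and feature roles are invented.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
model = GradientBoostingClassifier(random_state=1).fit(X, y)

# A tiny what-if loop: hold one base row fixed, vary a single input,
# and watch how the predicted probability responds, like moving a slider.
base = X[0].copy()
probs = []
for value in (-2.0, 0.0, 2.0):
    row = base.copy()
    row[0] = value
    p = model.predict_proba(row.reshape(1, -1))[0, 1]
    probs.append(p)
    print(f"feature_0={value:+.1f} -> predicted probability {p:.3f}")
```

If sweeping an input barely moves the probability, as with the evening charge in the demo, that input is evidently not driving the model’s churn prediction.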
Now, to do that, what is happening here is RapidMiner actually builds the processes and the models for us. They’re all saved, as I mentioned, in a central location. So you have traceability, historical data, and all that. And then what I have done here is built a workflow that can now be really deployed for consumption, by experts in this case. All it takes to convert this model into an integratable solution is a couple of clicks. A workflow like this, which is actually taking one of the models – we have parameterized it with a model ID. I’m also allowing my business analysts to come up with a value for the threshold. So those are some of the settings I’m going to expose for them. I simply right-click here, and the workflow that we just saw in RapidMiner is now available. If I click on the export web service here, that wraps the whole workflow, in this case, one that takes the model ID as input. It also asks the user for two more inputs, like minimum and maximum. And what it really does for me is, whenever my business analysts want to test a new set of data, they can call it and get the output of what the model predictions would be and how it behaves and all that. And the real reason for doing that is I want to give my business analysts the ability to try the models against the data sets, maybe for the next few weeks or few months, and then decide which is the right model to use. So I’ll take that web service and then hand it over to my counterpart, Libin, in this case, who’s going to then deploy it for our business analysts. Thank you.
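From the consuming side, the exported web service is just an HTTP endpoint that takes the parameterized inputs (model ID, thresholds, data rows) and returns predictions. The sketch below only shows the shape of such an exchange; the endpoint URL, parameter names, and response fields are all illustrative assumptions, not the actual RapidMiner Server API, and the reply is canned rather than fetched over the network:

```python
import json

# Hypothetical endpoint for the exported scoring workflow (illustrative only).
endpoint = "https://rapidminer-server.example.com/api/rest/process/churn-scorer"

# Hypothetical request payload mirroring the exposed settings.
payload = {
    "model_id": "gradient_boosted_trees_v6",
    "threshold_min": 0.0,
    "threshold_max": 1.0,
    "rows": [{"area_code": "408", "day_minutes": 250.5,
              "customer_service_calls": 4}],
}

# In production you would POST this with an HTTP client, e.g.:
#   response = requests.post(endpoint, json=payload, auth=(user, password))
# Here we simulate the reply's shape to show how it would be consumed.
canned_reply = json.dumps(
    {"predictions": [{"churn_probability": 0.72, "label": "churner"}]})
result = json.loads(canned_reply)

for row in result["predictions"]:
    print(row["label"], row["churn_probability"])
```

The key point is that the business-facing application never needs to know which algorithm sits behind the endpoint, only the request and response contract.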
Thanks, B.P. So let me quickly share my screen over here. All right. Great. So let’s do a quick recap, right? Let’s go back to this model that Bill was presenting and see where we are now. So what B.P., the data scientist, has completed right now is creating the initial models. And he did some validation to pick a subset of models that he wants to share with me, the business user. So as the business user, what we are going to do right now is basically complete the validation by using Xpress Insight, right? So B.P. started off with hundreds of models and moved to four models that he wants to expose to me, the business user: gradient-boosted trees, deep learning, GLM, logistic regression. And my job, as a business user, is to pick the best one, right? And again, I won’t be doing this in a vacuum or in a silo. I will have the ability to interact back with my data scientist so that we come to a complete decision. So the key point here is that I, as a business user, have the ability to run my simulations and what-if analysis before selecting that one model that would go into production.
So let’s pick one of these views over here. What I also have access to is scenarios. What Insight works on, and what is fundamental to Insight, is this concept of scenarios. A scenario is a collection of inputs and outputs, right? So I have a bunch of different scenarios pre-created over here. Let’s just pick one of these. I’m going to click on deep learning. When I pick deep learning, I have access to this control panel. So this is a control panel where I, as a business user, have the ability to change some of these parameters. The most basic one, in this case, is this dropdown, which has the models that B.P. just selected, right? So he selected four models for me, the business user, to validate. And that’s what I have over here that I can pick from, right? I have some threshold values that I can set, or additional parameters that I can set. So anything above this threshold value is classified as churn. I have the ability to change that, and I can pick where the data is coming from. I can upload my own files for running the simulation or running the models, right? So basically, Insight lets the business user play around with the model that B.P. just built in the background, right, or the four models that B.P. just built in the background.
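Two ideas from this passage are easy to make concrete: a scenario as a named bundle of inputs, and the threshold rule that classifies anything above it as churn. A small sketch, with invented names, probabilities, and threshold value:

```python
# A scenario in this sense is just a named bundle of inputs (the outputs get
# attached after a run); names and values here are illustrative.
scenario = {
    "name": "deep_learning_baseline",
    "model": "deep_learning",
    "threshold": 0.5,
}

# Assumed per-subscriber churn probabilities coming back from the model run.
probabilities = {"555-0101": 0.82, "555-0102": 0.35, "555-0103": 0.61}

# Anything above the scenario's threshold is classified as churn.
labels = {phone: ("churner" if p > scenario["threshold"] else "loyal")
          for phone, p in probabilities.items()}
print(labels)
```

Because the threshold lives in the scenario rather than in the model, the business user can rerun the same model under different thresholds and compare the stored scenarios side by side.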
So taking all of these inputs, when I click on this Run Models, what’s happening is it’s going out, calling out to RapidMiner, passing these parameters that we just set, evaluating the models, and coming back to me with some results, right? So it just did that run. And what we have access to now are some reporting views, right? So in this particular view, we’re taking a look at what the state of my business is when I use this particular model that B.P. built for me, the deep learning model, right? So this is basically looking at the output of the model in a business context, right, and taking a look at the number of calls during the day that came in and how they behave differently between my churn population and my loyal population, right? So I’m trying to see if it makes sense at this point. So if we just look at number of calls, the distribution seems similar, right? And I guess that makes sense. But if I look at international plan over here, I can see that people who are on the international plan tend to churn more compared to people who are not. If I look at my customer service calls here, somewhere here on the bottom, I can see that my churners tend to call in slightly more than people who are loyal, right? So at this point, what am I doing as a business user? I am checking that the model predictions make sense in a business context. Are the lines on these charts moving in the right direction, right? So customers making more calls to customer service tend to churn? Sure. That makes business sense. Customers on an international plan probably tend to travel a lot, hence they are picky about international calling rates. Sure. That makes business sense, right? So if there’s anything that I see is off here, right, something’s not behaving correctly based on what I know of the business, I can go back to my data scientist and say, “Hey, you know what? It looks fishy here, right?
Can you take a look at it and see why it would behave that way?” So what we’re doing is we are bringing the business user much earlier into the development process so that they have an active say as these models are being developed, before they’re put into production.
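The threshold control described above – anything with a churn probability at or above the chosen value gets classified as churn – can be sketched in a few lines. This is a hypothetical illustration in plain Python, not Insight’s or RapidMiner’s actual API; the function and field names are invented.

```python
# Hypothetical sketch of the business user's threshold control: label each
# customer as a churner if the model's churn probability meets or exceeds
# the chosen cutoff. Data and names are illustrative only.

def classify_churn(customers, threshold=0.5):
    """Return customers with an added boolean 'churn' flag."""
    return [
        {**c, "churn": c["churn_probability"] >= threshold}
        for c in customers
    ]

customers = [
    {"id": 101, "churn_probability": 0.82},
    {"id": 102, "churn_probability": 0.31},
    {"id": 103, "churn_probability": 0.57},
]

# Lowering the threshold flags more customers for proactive treatment.
print([c["id"] for c in classify_churn(customers, 0.5) if c["churn"]])  # [101, 103]
```

Changing the threshold in the control panel is exactly this kind of re-labeling, re-run against the same model scores.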
Now, I can also take a look at some model-specific charts if I want to. For example, here I have an accuracy curve, right, or accuracy plot. So I can see that the deep learning model is behaving decently well. And here’s where we can use some of the what-if capabilities of Insight to see what these other models look like, right? So I can go into this scenario. I can clone this scenario as a brand-new scenario, right? And what that does is it essentially creates another scenario for me. Now, with these scenarios, what I can do is go and change some of these inputs. So I just picked a scenario that I created earlier. But in this particular scenario, generalized linear model as it’s called, I’m picking another model, right? So I can create as many scenarios as I want by tweaking these parameters that are made available to me. And once that’s done, what I can very easily do is bring them all onto the shelf, right? So these are all sitting side by side. And now, whatever reports we have built here, they become multi-scenario. So now, I’m looking at all of these different models on the same chart. And I can see that, “Hey, gradient-boosted trees seem good, right? The accuracy is pretty high. It’s up there on the top. Logistic regression looks decent, I guess.” So this is giving a window into the models for the business user, right? I can also take a look at some of the other charts. So I can take a look at true positives versus false negatives. So we are looking for churn over here, right? So a false positive means I think someone is going to churn and they don’t churn. That’s not bad. But a false negative, where I think someone is not going to churn and they do churn, means we lose a customer, right? So it’s important to take a look at these charts over here to figure out where that threshold is that I need to use that gives me the best results, right?
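The trade-off just described – false positives are cheap, false negatives lose customers – is what the threshold sweep is really exploring. Here is a minimal illustrative sketch, with made-up scores and outcomes, of how counting both error types at a few candidate thresholds exposes that trade-off:

```python
# Illustrative threshold sweep: count false positives (flagged but loyal)
# versus false negatives (missed churners, the costly case) at several
# candidate thresholds. All data is invented for the example.

def confusion_counts(probs, actuals, threshold):
    """Return TP/FP/FN/TN counts for one churn threshold."""
    tp = fp = fn = tn = 0
    for p, churned in zip(probs, actuals):
        predicted = p >= threshold
        if predicted and churned:
            tp += 1
        elif predicted and not churned:
            fp += 1
        elif not predicted and churned:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

probs   = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]   # model churn scores
actuals = [True, True, False, True, False, False]

for t in (0.3, 0.5, 0.8):
    c = confusion_counts(probs, actuals, t)
    print(f"threshold={t}: fp={c['fp']} fn={c['fn']}")
```

Raising the threshold trims false positives but misses more real churners, which is why the business user, not a default setting, should pick the cutoff.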
So over here, I can take a look at all of this. Again, coming back to the story, gradient-boosted trees seem to be a good model that I may want to put in production, right? So what I can do is then narrow it down to that particular scenario. So I just pick the gradient-boosted trees. And now, I can run all sorts of other simulations in here as well. So I have this whole concept of start dates, end dates. I can upload data sets, etc. So once we have picked that, we can also do time-based simulations, right? So here I have the gradient-boosted tree run across different data sets. So you can see how quickly and easily you can create these scenarios. And at the end of the day, the business user can run all of these scenarios, all of these what-if analyses, and be satisfied with which model he or she thinks should go into production. And at any point, if they want to interact back with the data scientist, you may have noticed that there’s this username that I have over here on the top. So Insight comes built with user management, which means I have a user called data scientist who I can share this scenario with. So when the data scientist logs in, they see the exact same scenario that I’m working on. And we can use this to collaborate and see why something is behaving funny or why something is looking great, for example. So in this case, I want to pick the gradient-boosted tree. I can move it back to the data scientist, give it to the data scientist, and that’s what he can take and put into production.
Okay. So with that, we’ve completed this step over here. So we had hundreds of models, and B.P. picked four models for me to validate. And I picked one model out of that, gradient-boosted trees, right? And this is what I want to hand back to B.P. so that he can build a day-to-day application for me, right? And in terms of our business process, here’s where we are right now. So we are done with the create and validate portions, and we’re moving on to the deploy portion. So in the deploy portion, the data scientist, once he has built the application, does not come into play, right? It’s the business user who is actually using this app on a day-to-day basis. So in this case, they’re using this app on a day-to-day basis to see how customers churn. And they want to take proactive action, right? So if somebody is churning, I want to do something so that they don’t churn and we don’t lose business.
So if the task of my department is to basically reach out to these folks who are going to churn and make them not churn, 70% is my success rate, right? I’m not going into more detail about this dashboard. But essentially, this is a view of the current state of the business in my day-to-day application, wherein it’s utilizing the model but I’m seeing it in a business context. So like I said, this is for an executive or a manager. Let’s take a look at how this would look for an analyst who is working on solving churn, meaning they see that there’s churn and they’re going to take an action. So what we have for them is something like this, which is a control panel. I’m just going to zoom in a bit. So here, let’s say I’m an analyst who’s responsible for a particular branch in Texas, right? So I can pull all the customers for this particular date, get the current data, pick a particular state, pick a particular branch, choose a threshold, right, that denotes churn, and I can run this to fetch the customers and the churn, right? So here I get a list of customers, some attributes of those customers, and, more importantly, a probability over here that they’re going to churn, right? So it has a bunch of customers in here. I think I have a lot of them. But for my day-to-day work for today, I’m going to work on a selected number of these. So let’s say I’m going to pick five of these to work on. So I select them, and I can run this over here, fetch prediction reasons. Zoom back out.
And once I do that, what I have access to is a particular chart over here that basically shows me, at a customer level, what the factors are that are causing these people to churn, right? So over here on the bottom, I have all of these attributes or characteristics of this particular account. And I can see that international plan has a major impact on this particular customer churning, right? If I look at someone else, this keeps on changing, right? So what this lets me do is have individual treatments for these customers based on these prediction reasons, right? So let’s say for this particular customer 358, where international plan seems to be a major impact, I can go in there for that same customer and give him or her an international plan discount, right? So now, what you’re seeing here is, as a business user, I am using the model in the background to come up with a churn prediction. But I’m using it to take business action, right? I’m going to take an action to give this person an international plan discount so that they don’t churn, because the model told me that’s the important factor causing them to churn, right? So I can go in, start giving treatments to all of these people. And then I can assign an action to be taken to solve for the churn problem, right? So I can pick all of these, click Assign Action, and boom. Some downstream process happens where all of these actions are taken. And hopefully, we prevent churn, right?
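The per-customer flow just described – look at the top prediction reason, then pick a matching treatment – could be sketched like this. The factor names, impact scores, and treatment table are all invented for illustration; real reason codes would come from the deployed model, not this lookup:

```python
# Hedged sketch: map each customer's strongest prediction reason to a
# retention treatment, mirroring the demo for customer 358. The TREATMENTS
# table and factor names are hypothetical, not from either product.

TREATMENTS = {
    "international_plan": "offer international plan discount",
    "customer_service_calls": "escalate to retention specialist",
    "day_minutes": "offer unlimited minutes upgrade",
}

def pick_treatment(prediction_reasons):
    """Choose an action based on the factor with the largest impact score."""
    top_factor = max(prediction_reasons, key=prediction_reasons.get)
    return TREATMENTS.get(top_factor, "no action")

# Customer 358: international plan dominates the churn prediction.
customer_358 = {
    "international_plan": 0.41,
    "customer_service_calls": 0.22,
    "day_minutes": 0.08,
}
print(pick_treatment(customer_358))  # offer international plan discount
```

The point is the pairing: the model supplies a per-customer explanation, and the business logic turns that explanation into an action.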
Now, with Insight, as Bill mentioned earlier during his talk, we can do optimization as well. So here, a business user is selecting these particular options. But we could have optimization pick these options for you based on global constraints and global objectives. Say you have a budget constraint, right? You don’t want every analyst to go in and start giving out an expensive option like international plan discounts. You may have a global constraint saying, “This is the only budget you have. Do what you can with that.” And that’s where optimization comes into play, and this platform lets you do that as well, where a mathematical solver is going to give you the optimal solution so that you meet global constraints but also reduce churn. So that’s a whole other topic we can discuss as well. But with this, I want to move it back to Bill for closing.
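To make the budget-constraint idea concrete: instead of each analyst handing out discounts freely, treatments are chosen to save the most expected revenue within a global budget. A real deployment would use a mathematical solver, as described above; this greedy pass is only a minimal sketch of the idea, with invented costs and savings:

```python
# Minimal sketch of budget-constrained treatment assignment. A mathematical
# solver would optimize exactly; this greedy value-per-cost pass just
# illustrates the shape of the problem. All numbers are invented.

def assign_under_budget(candidates, budget):
    """Greedily pick treatments with the best expected-saved-per-cost
    ratio until the global budget is exhausted."""
    ranked = sorted(candidates,
                    key=lambda c: c["expected_saved"] / c["cost"],
                    reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c["cost"] <= budget:
            chosen.append(c["id"])
            spent += c["cost"]
    return chosen, spent

candidates = [
    {"id": 358, "cost": 50.0, "expected_saved": 400.0},  # intl plan discount
    {"id": 412, "cost": 30.0, "expected_saved": 120.0},
    {"id": 509, "cost": 80.0, "expected_saved": 200.0},
]
print(assign_under_budget(candidates, budget=90.0))  # ([358, 412], 80.0)
```

Note that under the budget some customers get no treatment at all, which is exactly the "do nothing for this customer" outcome the speakers describe: optimal for the portfolio even if not for the individual.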
Thank you both, B.P. and Libin. Let me steal back the screen here. Okay. So folks, we saw a lot there. And I appreciate everybody hanging in. It’s a lot to absorb. And as you can imagine, it’s covering a broad spectrum of capabilities. But just to really quickly recap, right, we started with a data scientist taking a pile of data handed to him by the business, doing some initial data prep on it, and then automatically generating a number of models by running a number of algorithms against it. Then B.P. played the role of the data scientist and started to figure out which things were right, wrong, and different, and ultimately narrowed down the right models to even consider as candidate options for being deployed. Now, typically, that’s where things kind of break off, and you just throw it over to the business analyst to figure out if they want to use it or not. But here, we showed an interactive capability where he was able to take those candidates and publish them to the business analyst, who was able to consume them first in more of a what-if simulation scenario mode. Then you saw Libin take some of those models, run through various simulations using some different data sets, and ultimately do comparisons, leading him to the conclusion, “Hey, the GBT was the right model to use. So let’s now use that,” and then ultimately publish that into a final application that he and his team can use on a day-to-day basis. So a lot of powerful things going on there, and I appreciate it was a lot to absorb. But the key takeaway we want you to have from this is, number one, you saw collaboration being facilitated between these technologies, ultimately landing this exercise of model creation all the way through the application being delivered.
In essence, what you’ve done is put data science in the hands of that business user, but in a business context. Again, they don’t need to be data scientists. They need to be able to just use it in the way that they want to apply things to their business. They don’t necessarily have to understand how it works. They just want to be able to push buttons and get the proper results. And that led to the final piece of it, which was, I’m actually now taking actions, and I’m taking actions based on very intelligently refined decisions that I’m able to make because they’re now statistically informed. So I do want to take a quick last poll before we do Q and A. Now that we’ve agreed on what it means to deploy applications – and you’ve seen an example of a very robust approach – we want to get your feedback. How are you doing this today? And my guess is a lot of you have some challenges with that just given the scope and complexity of this. So please let us know how you’re doing this today. Let me give you a couple of seconds here.
Okay. So please finish answering that. And just wrapping things up, I think this slide’s probably pretty obvious to most of you as data scientists. Based on the polling results, this audience is pretty much 50/50 data scientists and business analysts. Not surprising. But think of how your life is being made easier by leveraging the first part of the RapidMiner session and ultimately leveraging Insight yourself. It’s not just for the business analyst. You can actually create, effectively, notebooks that you would use to keep results, do your own what-if scenarios and tests, and so forth. And then obviously, for the business analysts, you’re now able to work with the benefits and the virtues of predictive and prescriptive analytics in a way that makes sense to you. All right. So next steps: we want you to get a hold of these technologies. We offer both these products in a free form. So you can download the Xpress stack, and you can download the RapidMiner stack from our respective websites. This presentation is going to be made available and sent to those of you who have participated today. Also supporting your journey, we provide tutorials and documentation that are accessible to you. Obviously, we’re going to follow up with you. But please don’t hesitate to reach back out to us. We want you to have a positive experience. We want you to ask questions, and we want you to leverage this technology, obviously. But again, there’s a lot to consume there. So with that, I’m going to open up the floor to any incremental questions that weren’t answered along the way. I think we still have some of those coming in.
Great. Thank you so much, Bill. We actually do have quite a few questions that have come in. Our first question today comes from Randy. And that question is a two-parter. Let me grab the first one here. Yeah. Actually, we’re going to go over to another question while that one’s being– it looks like he’s adding to that one. So another question here is can we check how the actions have/will affect the churn?
Good question. Libin, you want to tackle that one? Libin, if you’re talking, you’re on mute.
It looks like Libin dropped off his phone. I don’t know if you could step in?
And B.P., feel free to augment my answer here. But the net of it is, there’s nothing that precludes you from doing your traditional capturing of the data that you have generated from this exercise, keeping track of it, and marrying it back up with the actual results that you get. So that’s the benefit of any closed-loop scenario, where you’re not only capturing what you’re going to mail out. So what Libin showed you there were the different potential delivery modes that in theory are the highest predictors of what they’ll respond to. And as that individual does respond, you can marry that back up with the original record and do your own correlation of whether it affected the change or the response you were looking for. And for what it’s worth–
–people do that and build apps around that using Insight.
Absolutely. So just to add to that, on the RapidMiner side, the models that are being deployed are services. One additional step we could do is, before we send out the scores over web services or write them back to a database, we could also write them to any system where you want to keep track of the predictions. Eventually, two or three months down the line when you have actuals, we can compare and give you a performance report of how the models are behaving, what did work and what did not work. In today’s application, we used a gradient-boosted tree. But maybe Libin could have come to the conclusion of, “I want to do A/B testing.” So that’s one of the scenarios we can support in RapidMiner. And then Xpress Insight can be the business layer on top of it.
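The closed loop described here – log each prediction when it is made, then join actual outcomes back in months later to measure real performance – can be sketched simply. Storage is an in-memory list for illustration; in practice this would be a database table, and the function names are invented:

```python
# Sketch of the closed loop: record predictions at scoring time, then
# compare against observed outcomes once actuals arrive. In-memory storage
# and all names here are hypothetical stand-ins for a real logging system.

prediction_log = []

def log_prediction(customer_id, predicted_churn):
    """Record one prediction at the moment it is served."""
    prediction_log.append({"customer_id": customer_id,
                           "predicted": predicted_churn})

def score_against_actuals(actuals):
    """Fraction of logged predictions that matched the observed outcome."""
    hits = sum(1 for row in prediction_log
               if actuals.get(row["customer_id"]) == row["predicted"])
    return hits / len(prediction_log)

log_prediction(101, True)
log_prediction(102, False)
log_prediction(103, True)

# Months later the actuals arrive: 101 churned, 102 stayed, 103 stayed.
print(score_against_actuals({101: True, 102: False, 103: False}))  # ≈ 0.67
```

The same log also supports the A/B comparison mentioned above: tag each prediction with which model produced it, and score each model's slice separately.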
Thanks, B.P. And we do have Libin back on the phone. Libin, if you’re there–
Yup. I’m back.
Yeah. Yup. Nothing to add to Bill and B.P.’s points.
Yeah. Let’s move on to the next question.
Okay. Great. The next question is again that two-parter question. Where do the actions come from? Are they set up by the user, or do they come with the tool? And the second part is, how do the actions get allocated to any particular customer at least in the context of the demo?
Libin can tackle that one.
Yeah. I can take that one. So that’s a great question, right? It completely depends on how you want that application to be built, right? So the way we showed it in the demo is where the user actually picks the action themselves, right? They take a look at the output from the model, what the churn probability is. And they have this fixed set of actions that they can take, and that’s what they choose to do for a particular customer. But the next level of this is optimization, where the tool actually tells you what the optimal action should be. And at that point, you’re looking at this holistically, where you’re not looking at one customer but at the whole portfolio of customers that you have. And you’re taking a decision that is best for the whole portfolio. So for a particular customer, you may decide to do nothing just because, in the overall picture, it may not be good to give that person an international plan discount, right? So the tool lets you do both of these things depending on where you are in the analytical maturity curve.
Great. Great. Well, thank you so much. I think we have time for one more question. Our next question is, since using Auto Model, we don’t need to adjust parameters like tree size, depth, sample rate, threads?
Yup. So one thing Auto Model does for you, if you noticed, is that there was a subset of families of algorithms. So there was a decision tree, a gradient-boosted tree, and deep learning. Behind the scenes, a lot of intelligent automation is going on. So when we select a decision tree, we’re not just building a decision tree with the default parameters. We’re actually looking at the statistics of the data and coming up with some numbers, assessing what’s the right depth or the right confidence and so on, right? So really, when I go through an Auto Model exercise, the final results are the best of decision trees, the best of gradient-boosted trees. Behind the scenes, we have tuned the parameters. We have actually done feature selection, and new feature generation if you have enabled those particular flags. So that’s why, even though initially I’m just clicking through, in reality we actually subset from hundreds or thousands of models in many cases, down to the best results of that particular algorithm. And then, obviously, we handed it over to Libin based on the numbers we found from the best results, and so on. So all that automation is built into the product. One thing we did not cover: that’s what Auto Model does for you, a lot of automation, but you can also open up the workflows, the processes that are designed, and tweak them as much as you want. And that could be replacing an algorithm with something totally new. You could bring in an R or Python script if you wish to. So the idea is that Auto Model gets you the initial winning sketch, gets you 80 or 90 percent of the way to the goal post, and then the final stretch is again coming back from Libin: “Hey, let’s use this threshold, or maybe I need more depth in the tree,” and so on, right? So that’s the last mile that the RapidMiner platform can provide with the editing capabilities we have there.
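The behind-the-scenes tuning B.P. describes boils down to searching parameter combinations per algorithm family and keeping the best by a validation score. Here is a rough, generic sketch of that idea; the `evaluate` stub stands in for actually training and validating a model, and the parameter names are typical examples, not RapidMiner’s exact ones:

```python
# Generic sketch of an Auto Model-style parameter search: try combinations
# from a grid and keep the best-scoring one. evaluate() is a stub that
# pretends moderate depth plus more trees scores best; a real version
# would train and cross-validate a model for each combination.

import itertools

def evaluate(params):
    """Stub validation score; stand-in for train-and-validate."""
    return -abs(params["max_depth"] - 6) + 0.01 * params["n_trees"]

grid = {
    "max_depth": [2, 4, 6, 8],
    "n_trees":   [50, 100, 200],
}

best_params, best_score = None, float("-inf")
for combo in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), combo))
    score = evaluate(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'max_depth': 6, 'n_trees': 200}
```

Run once per algorithm family, this is how "hundreds or thousands of models" collapse to one best candidate per family before anything reaches the business user.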
Okay. Well, thank you so much for those responses.
Can I answer one last question? I know you’re going to cut us off for time. But I think it’s a very good general question, the one Jeremy asked: would the same type of process you demonstrated for churn work for something like forecasting, parametric management? And the answer is yes. I want folks to understand that with the framework itself, there’s nothing vertical- or horizontal-specific about it, whether you’re using RapidMiner for model creation and/or the Insight sessions for the application development. The sky’s the limit. They’re frameworks. They’re platforms. You can make it as vertical and specific as you so choose. So this is more about techniques and capabilities. Now, there are starters in there that will give you ideas and hints on which models make the most sense for which particular use cases. But you’re not constrained. They’re guidance. So hopefully, that applies to everybody here: there’s nothing specific about this. Apologies for cutting you off. I know you’ve got to wrap us up.
Nope. Thank you so much, Bill, for answering that last question. I think that’s great. Well, with that we’ll go ahead and close our webinar for today. Again, thank you to our speakers. Thanks, everybody, for joining in for your questions and participation. We will go ahead and wrap things up. I do have a last poll up on the screen there. Please go ahead and fill that out if you’d like someone to follow up with you. Just remember to click on that submit button to get your response in there. All right. Thanks again, everybody, and have a great rest of your day.
Thank you folks for making the time.