In a world where the amount of available data is growing at lightning speed, machine learning and data science are the most natural way to improve most analytical tasks and to move toward much more flexible solutions than traditional heuristics and business rules.
When tasked with developing a predictive model, your objective is most likely to help solve a specific business problem. This can require not only cooperating with developers and other data scientists on a pathway for delivering the technical solution itself, but also convincing your boss and stakeholders that applied data science methods are necessary. At this point, you need not only to master applied methods and algorithms but also to understand how to align them with real business needs.
In this webinar, Vladimir Mikhnovich will help you overcome these challenges by discussing:
- Key differences between communicating results to fellow data scientists, to analysts, and to stakeholders
- Why you should favor simple yet understandable models over overly complex ones
- How to use proper business metrics and a cost-sensitive approach to evaluate model performance and fine-tune models
00:05 Hi, everyone, and thank you for joining today’s webinar, Selling Data Science: Unboxing the Black Box. My name’s Ryan Johansen. I’m a customer success manager here at RapidMiner, and I’ll be moderating today’s webinar. I’m joined today by Vladimir Mikhnovich, data scientist at the Friendly Finance Group. Vladimir is a certified RapidMiner analyst, an expert data scientist, and a thought leader in the fraud analytics space. As a customer success manager here at RapidMiner, I’ve worked with over 100 customers on getting their data science projects up and running with RapidMiner. One of the biggest issues our customers face, and one they keep raising with us, is: how do I sell these solutions to my business and help it trust my models? So we reached out to one of our biggest evangelists and brightest data scientists, Vladimir, to cover this very topic and share his wisdom on the subject with us today.
00:59 We’ll get started in a few minutes, but first, we have a few quick housekeeping items. Today’s webinar is being recorded, and you’ll receive a link to the on-demand version via email in about one to two business days. You’re free to share that link with any interested colleagues who were not able to attend today’s session. Second, if you have any trouble with the audio or video, your best bet is to log out and log back in; that resolves the issue in most cases. Finally, we’re going to have a Q&A session at the end of today’s presentation. Please feel free to ask questions at any time via the questions panel on the right-hand side, and we’ll leave time at the end to get to everyone’s questions. With that out of the way, thank you, Vladimir. We’re excited to have you, so take it away.
01:49 Thanks, Ryan, for introducing me. Hello, everyone, and thank you for attending our webinar today. Let’s start. Here are the key aspects we’re going to discuss today: the question of model complexity, some communication issues, and why business metrics matter and how to map performance metrics onto business metrics. Also, during the webinar, I will show you a use case demonstrating a simple technique for aligning performance with business metrics using just the standard RapidMiner toolset. We will go through this use case, which is taken from real-life experience.
02:44 This is CRISP-DM, one of the best-known standards for data mining. On the diagram, it starts from what is called business understanding, but the diagram suggests that it is actually a cyclic process, so it also ends up back at the business understanding point. Other schemes may offer a slightly different order of steps, but the common thing is that there is always some kind of underlying business problem. In other words, you usually tailor your data science solution to a real-life problem, not to an abstract metric. So why are simple models good, and why does simplicity matter?
03:48 It is well known, first of all, that the more complex a model is, the more it tends to adapt to the training data and the less it is able to generalize to test data. But there is also the question of how we can benefit from the model being really simple. From a common-sense point of view, simple models not only tend to show more stable performance and more robust results; they also make it much easier to build trust in the proposed data science method.
04:47 Simple models make it much clearer what data is used, how it is used, and how the model’s predictions relate to the data. Once this is clear, you can move toward more complex models, but simple ones are a good place to start. There is also one good real-life test for simplicity, which I guess you all know: try to explain the model to a child, or just draw it as a diagram. Here are a couple of examples, and they’re pretty straightforward. On the left is a diagram of a decision tree, and on the right is a simple neural network.
05:45 Both algorithms here implement some kind of reasoning, that is, a decision-making process. But the algorithm on the left is much plainer, clearer, and self-explanatory. This is just an illustration of why simple algorithms are more useful, at least at the beginning. Okay. So how do we talk to the different parties in the process? How do we communicate the results? With our fellow data scientists, we can and do speak the same language, and this is where our technical knowledge and our command of applied algorithms come into play.
06:44 Next is communicating with analysts. Here, we explain how to interpret the results produced by the model, because analysts are mostly the users of those results. But the most important thing is communicating results to business owners and stakeholders, to whom we should explain, first of all, the business impact, no matter which algorithm we have chosen. For example, the money flows shown on the slide are something we could show a stakeholder to prove the viability and usefulness of the methods. Later, in the use case, I will explain how we can actually do that.
07:45 For now, here are a few rules of thumb for selling data science solutions. Rule number one: sell results. Sometimes it is really important to give a good introduction to existing machine learning methods and techniques, but a good prototype working on real data will speak for itself and prove the viability of the method. So if you can produce measurable results, that is where you will benefit. The second rule: always speak the same language as the business.
08:44 First of all, you have to learn how to map performance metrics onto business-related figures, and we’ll go through this in our use case. From a technical point of view, you can measure the performance of your models using many different metrics. But always remember that if you tune your model and it shows some gain in, for example, the area under the curve or the recall metric, that does not automatically mean a gain in the revenue it will bring to the business. The third rule: comparison is very important. A/B testing is one of the most widely used methods in marketing, for example, but it is also quite applicable in other domains.
09:48 For data science processes and machine learning models, it is always useful to consider a baseline case as well: compare what would happen if we had no model at all, nothing in place of the proposed solution. Put these cases side by side and see how much we could really gain from implementing the proposed machine learning solution. A few more rules to consider. Of course, one cannot expect every manager to have deep expertise in data science, and, on the other hand, we cannot expect a data scientist to have deep expertise in business. However, we do expect them to collaborate.
10:43 So never be shy: cooperate with those who may understand business needs and processes better, and share knowledge. It’s good practice to ask financial analysts, business analysts, and risk analysts for help, since they can help provide the correct numbers. Another important thing is to look for examples of best practices used in the industry. Sometimes you can even use a kind of sucker punch and showcase examples from competing companies in the same industry. This is also good because the more you find out about how other companies do it, especially in the same domain and industry, the more confident you can become in your own methods. This has been proved many times in real-life situations.
12:00 That’s enough about the rules. Now let’s consider the use case, which is taken from real life. It is about implementing a machine learning solution for fraud detection and understanding its impact on the business. The objective is to build a model that detects fraudulent online transactions and to evaluate its business value based on its performance metrics. This is a snapshot of the real data set.
12:45 We are considering a merchant with some 12,000 transactions. This merchant has a fraud level of around 1%, which is pretty high, by the way, so we need to bring it well under control. As you see in the picture, we have different kinds of data, such as bank data, card data, geographic data about customers, information about the internet service provider, and transactional data such as transaction amounts. The traditional validation method for building predictive models, which is used quite often, works like this: we take data from a certain period.
13:44 For example, take the data from January through June, which makes half a year. The traditional validation method uses the same time period for both training and validation. We just split the data and use, let’s say, 80% of it for model training and validation and the other 20% as a holdout data set, but both sets come from the same time period. There is a more practical approach to testing. Let’s call it: what if we had already deployed the model at some point in time? This method shows you results that are closer to the real-life situation.
14:42 In this method, we take the same half-year period, but we use 100% of the data from the first five months for model training, tuning, and validation. Then we take the remaining month, in our case June, and use 100% of that month’s data for testing the model. This way, we are pretending that we deployed the solution at the start of June, and we can see how the model would have performed on our actual data if we had deployed it then. I will skip the model-building phase; I suppose you are all good at it.
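The two validation schemes differ only in how rows are assigned to the training and test sets. Here is a minimal Python sketch, assuming each transaction carries a date; the `Transaction` record, the helper names, and the 2020 dates are hypothetical illustrations, not RapidMiner operators or the webinar’s data:

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    # Hypothetical record holding only the fields needed for the split.
    day: date
    amount: float
    is_fraud: bool


def random_holdout_split(rows, holdout_share=0.2, seed=42):
    """Traditional scheme: shuffle one period and hold out a share of it."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_share))
    return shuffled[:cut], shuffled[cut:]


def out_of_time_split(rows, test_start):
    """'Already deployed' scheme: train on everything before test_start,
    test on everything from test_start onward."""
    train = [r for r in rows if r.day < test_start]
    test = [r for r in rows if r.day >= test_start]
    return train, test


# One illustrative transaction per month, January through June.
data = [Transaction(date(2020, m, 15), 100.0 * m, m == 3) for m in range(1, 7)]

train, test = out_of_time_split(data, test_start=date(2020, 6, 1))
print(len(train), len(test))  # 5 1: January-May for training, June for testing

holdout_train, holdout_test = random_holdout_split(data)
print(len(holdout_train), len(holdout_test))  # 4 2: both drawn from all six months
```

With real data, the out-of-time test month stays untouched during tuning, which is what makes its results closer to those of an actual deployment.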
15:44 So this is the resulting confusion matrix. It looks pretty good, and we see some good numbers for recall and precision. But the question is: what is behind these numbers, and do they reflect the business metrics well or not so well? As I promised, here is a little trick for mapping performance onto business value. When we talk about a payment transaction, the real cost of the transaction is proportional to its monetary value. So the trick is to take the transaction amount and turn it into the weight of each example.
16:44 So each example is assigned a weight equal to its transaction amount. How does that affect our confusion matrix? Now it looks quite different. It is basically the same model and the same confusion matrix, but it now takes into account the weights, which differ from example to example. Again, the question is how well this reflects the actual business metrics. The numbers at the bottom of the slide are the actual figures that reflect the business situation. Okay. Now I have to explain the business model in a bit more detail. When you are a merchant who sells something online, how much do you actually profit? How much do you actually earn?
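Weighting by transaction amount simply means that each cell of the confusion matrix sums amounts instead of counting examples. A minimal stdlib-only Python sketch of that idea; the function name and the five transactions are made up for illustration:

```python
from collections import defaultdict


def weighted_confusion_matrix(actual, predicted, weights):
    """Sum example weights per (actual, predicted) cell instead of counting rows."""
    cells = defaultdict(float)
    for a, p, w in zip(actual, predicted, weights):
        cells[(a, p)] += w
    return {
        "TP": cells[(True, True)],    # fraud, flagged
        "FN": cells[(True, False)],   # fraud, missed
        "FP": cells[(False, True)],   # legitimate, flagged
        "TN": cells[(False, False)],  # legitimate, approved
    }


# Illustrative data only, not the webinar's data set (True = fraud).
actual = [True, True, False, False, False]
predicted = [True, False, True, False, False]
amounts = [500.0, 40.0, 20.0, 80.0, 60.0]

counts = weighted_confusion_matrix(actual, predicted, [1.0] * len(actual))
money = weighted_confusion_matrix(actual, predicted, amounts)
print(counts)  # {'TP': 1.0, 'FN': 1.0, 'FP': 1.0, 'TN': 2.0}
print(money)   # {'TP': 500.0, 'FN': 40.0, 'FP': 20.0, 'TN': 140.0}
```

Both matrices come from the same predictions: by count, the model catches one fraud in two (50% recall), but by amount it catches 500 of 540 (about 93%), which is why the weighted view can tell a different story.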
17:45 Usually, you put much less into your pocket than you sell for. Let’s say that if you sell for 100, you put only 10 into your pocket, and that is a very good number: most online merchants have a profit margin of just around 3%. So for our situation, let’s say that the profit margin is 10% of revenue, because the rest goes to the cost of the goods you’re selling, your payroll, your advertising budget, etc. On the other hand, how much do you actually lose to fraudulent activity? As a merchant, you’re 100% liable for all the fraudulent transactions you face. So if you face a fraud of 100, you actually lose that 100.
18:44 But in most cases, you have already shipped the goods, which means you have also lost those goods. So if you have a fraud of 100, you lose at least that 100 plus the cost of the goods, say, 140 in total. Let’s finalize these assumptions. If your average cost of goods is 40%, you lose 140% of each fraudulent amount. And if your profit margin is 10%, so that you net only 10% of your total revenue, then to make your numbers more precise you have to make the following adjustments: multiply all the numbers related to fraud by 140% and all the numbers related to your net income by just 10%.
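The adjustment can be written as a mapping from the amount-weighted confusion matrix to money. A minimal sketch, assuming the talk’s example factors (140% for fraud-related cells, 10% for income-related cells) and a hypothetical weighted matrix, with caught fraud counted as loss prevented, missed fraud as loss incurred, blocked good sales as forgone margin, and approved good sales as net profit; the function and cell names are illustrative:

```python
# Example assumptions from the talk, not universal constants:
FRAUD_LOSS_FACTOR = 1.40  # fraud amount plus 40% cost of already-shipped goods
PROFIT_MARGIN = 0.10      # net profit as a share of legitimate revenue


def to_business_matrix(weighted, fraud_factor=FRAUD_LOSS_FACTOR, margin=PROFIT_MARGIN):
    """Convert an amount-weighted confusion matrix (TP/FN/FP/TN) into money terms."""
    return {
        "fraud_loss_prevented": weighted["TP"] * fraud_factor,  # fraud caught
        "fraud_loss_incurred": weighted["FN"] * fraud_factor,   # fraud missed
        "margin_lost_by_blocking": weighted["FP"] * margin,     # good sales blocked
        "net_profit_earned": weighted["TN"] * margin,           # good sales approved
    }


# Hypothetical amount-weighted matrix (fraud is the positive class).
weighted = {"TP": 500.0, "FN": 40.0, "FP": 20.0, "TN": 140.0}

business = to_business_matrix(weighted)
print({cell: round(value, 2) for cell, value in business.items()})
# {'fraud_loss_prevented': 700.0, 'fraud_loss_incurred': 56.0, 'margin_lost_by_blocking': 2.0, 'net_profit_earned': 14.0}
```

Because the two factors are plain parameters, the same function covers a business where the fraud factor is, say, 110% or 190% instead.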
19:54 And now, the final version of the confusion matrix, which looks very different. The numbers you see in the matrix tell us the real business impact of implementing this model. You see that after implementing the model and testing it on the payment data from June, you were able to save around 58,000 because you detected fraudulent activity. You also lost around 26,000 on the fraud you didn’t detect; those are the false negatives. And you lost a small amount because of false positives, that is, transactions you blocked that were not fraudulent.
20:49 And you get a specific figure for your net profit. Now it’s comparison time: what if you had no model at all? This is actually pretty easy to simulate, because in RapidMiner terms, no model means you can just test the same data set on a default model, which predicts no fraud in 100% of cases. As we see from this confusion matrix, it always predicts no fraud, and the total fraud loss is much higher. So those are the final numbers, which we can put into a single table.
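The no-model comparison can be sketched the same way: score the model’s decisions and an always-"not fraud" baseline with one money function, then subtract. A minimal sketch under the talk’s example assumptions (140% fraud loss, 10% margin); the data and function names are hypothetical, and the baseline plays the role of the default model mentioned above:

```python
FRAUD_LOSS_FACTOR = 1.40  # example assumption from the talk
PROFIT_MARGIN = 0.10      # example assumption from the talk


def net_outcome(actual, flagged, amounts):
    """Net money result of block/approve decisions (True in `flagged` = blocked)."""
    total = 0.0
    for is_fraud, blocked, amount in zip(actual, flagged, amounts):
        if blocked:
            continue  # blocked: no fraud loss, but no margin earned either
        if is_fraud:
            total -= amount * FRAUD_LOSS_FACTOR  # approved fraud: full loss plus goods
        else:
            total += amount * PROFIT_MARGIN      # approved legitimate sale: earn margin
    return total


def always_legit(n):
    """'No model' baseline: predict 'not fraud' for every transaction."""
    return [False] * n


# Hypothetical test-month data (True = fraud) and model decisions.
actual = [True, True, False, False, False]
model_flags = [True, False, True, False, False]
amounts = [500.0, 40.0, 20.0, 80.0, 60.0]

model_result = net_outcome(actual, model_flags, amounts)
baseline_result = net_outcome(actual, always_legit(len(actual)), amounts)
print(f"model: {model_result:.2f}, no model: {baseline_result:.2f}, "
      f"saving: {model_result - baseline_result:.2f}")
# model: -42.00, no model: -740.00, saving: 698.00
```

The saving equals the fraud loss prevented minus the margin lost on blocked good sales (here 700 - 2 = 698), so the comparison rewards a model only when the fraud it catches outweighs the legitimate business it turns away.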
21:43 Given the total revenue for the test period, which is one month, applying data science methods saved us around 58,000 per month, which comes to around 690,000 per year on average. So did we sell the model this way? Well, I would say yes. Let’s get back to the money plot that we are going to present to stakeholders to prove the viability and, let’s say, the benefit of deploying the model we developed. Again, using our model for prediction allowed us to save around 58,000 per month. You see the numbers, and these numbers speak to stakeholders much better than any metric you might use yourself, like recall, precision, or area under the curve. So that’s it from my side, and I think now it is time for questions and answers.
23:05 Thank you very much, Vladimir. As a quick reminder, we’ll be sending the recording of today’s presentation via email within the next couple of days. Now it’s time to get to some audience questions. If there’s anything specific that you’re dealing with in your role, we’d love to hear about it while we have an expert on the line to help out. The first question we’ve gotten so far is: Vladimir, how does this value translate to regression?
23:32 Regression is a different kind of algorithm, one that deals with continuous variables. But in practice, the calculations would be the same. For example, with regression you can predict sales amounts or something like that, and as long as the quantity you’re predicting can be mapped to real monetary figures, the calculation is basically the same as I explained. What is important is the monetary value, the cost of the predicted value. The type of the value, whether it’s discrete or continuous, does not matter much in this case.
24:31 Thank you, Vladimir. Next question coming is, how did you get to the 140%?
24:38 Oh, 140% was just an example, and it really depends on the business: whether you, as a merchant, are selling physical goods or digital products. So this 140% might just as well be 110% or 190%. It was just an example; it depends very much on the type of business you do.
25:08 Awesome. Another interesting question: if a simple model is not good enough, how do you sell a complicated model, e.g., deep learning?
25:20 Good question. Once you have succeeded in providing good, viable results with a simple model, you can move on to more complicated models. You own the figures, so as long as you can improve the monetary value, starting from a simple model and gradually moving to more complex ones, your approach is viable and you are still able to sell more complex solutions. As I already mentioned, business-related figures are what matter in this case.
26:19 Excellent. Thank you. Next one’s a very good question. Is it better to have the data science function embedded within the business division or to have it as an independent entity?
26:33 I would say that we do expect collaboration between data science and business, but technically, those are quite different departments. For me, it’s good practice that even if you’re part of a data science team, you still have to understand the business, that is, what is going on in your company in terms of its business models. So it’s a matter of constant communication between them. But at the end of the day, data scientists are doing the technical work and business people are actually making money. That’s how I would put it.
27:22 Okay. Thank you. So next question is– hold on one second. How do you try and explain the uncertainty of prediction intervals to business stakeholders?
27:41 Most classification models and algorithms provide us with some kind of confidence value. So the final result is not just a yes-or-no answer about whether a transaction is fraudulent. You have to tune your model with the business model in mind, and in most cases you work on setting a threshold that gives you the best results in terms of revenue. For example, in my use case there is actually a threshold, although I didn’t show it in the presentation: we count as fraudulent only those transactions with very high confidence; otherwise, we would block too much. This is another part of the data science approach: finding the right threshold and the right balance between how much you block and how much you let through.
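The threshold search described here can be sketched as trying candidate cut-offs on the model’s confidence score and keeping the one with the best money result. A minimal Python sketch, assuming a per-transaction fraud score and the talk’s example factors; the scores, amounts, candidate list, and function names are all hypothetical, and in practice the threshold would be chosen on validation data rather than on the test month:

```python
FRAUD_LOSS_FACTOR = 1.40  # example assumption from the talk
PROFIT_MARGIN = 0.10      # example assumption from the talk


def net_outcome(actual, scores, amounts, threshold):
    """Money result if every transaction with score >= threshold is blocked."""
    total = 0.0
    for is_fraud, score, amount in zip(actual, scores, amounts):
        if score >= threshold:
            continue  # blocked: no fraud loss, no margin
        if is_fraud:
            total -= amount * FRAUD_LOSS_FACTOR  # approved fraud
        else:
            total += amount * PROFIT_MARGIN      # approved legitimate sale
    return total


def best_threshold(actual, scores, amounts, candidates):
    """Return the candidate threshold with the highest net money result."""
    return max(candidates, key=lambda t: net_outcome(actual, scores, amounts, t))


# Hypothetical scored transactions (True = fraud).
actual = [True, False, False, True, False]
scores = [0.95, 0.90, 0.60, 0.40, 0.30]
amounts = [300.0, 2000.0, 500.0, 20.0, 100.0]
candidates = [0.5, 0.8, 0.92]

for t in candidates:
    print(t, round(net_outcome(actual, scores, amounts, t), 2))
# 0.5 -18.0
# 0.8 32.0
# 0.92 232.0
print("best:", best_threshold(actual, scores, amounts, candidates))  # best: 0.92
```

In this toy data, lowering the threshold below 0.92 only blocks legitimate sales and gives up their margin, which mirrors the point that a high threshold avoids blocking too much.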
29:16 Awesome. How do you take into account seasonal drifts in your training data since you only use five months of data?
29:24 Well, again, if you understand that your business model is prone to some seasonality, the correct thing to do is to introduce a seasonality factor into your model. Taking only June as the test month is a simplification. In reality, if you know the seasonality factor is there, you should put it into the model in the form of one parameter or another. So in a real case, it would be a bit more complicated.
30:14 Okay. Thank you, Vladimir. Another great question: once the model is built, the distribution of the data used to train it can change, so the model needs to be updated or maintained. Should this be pointed out from the start?
30:31 I would say yes. Again, it depends on the underlying business model, but good practice is to keep in mind that no model stays in its initial form forever. For example, with payment fraud, fraud patterns change pretty fast, and you may find that a model you built a few months ago is totally outdated a few months later, simply because the old fraud patterns have changed. So, depending on the specifics of your business, models should be either updated or fully rebuilt, but how often, you have to decide together with the business. I would say the life cycle of a fraud detection model is generally one to three months. So yes, the answer is yes, and most models should be updated.
31:52 Okay. Great. Next one that comes in is, do you build a model to optimize the business results or is it a byproduct of the prediction?
32:05 Can you repeat, please?
32:10 Next one is: do you build a model to optimize the business results, or is it a byproduct of the prediction?
32:22 Yeah. I’d say that I initially build the model to get good performance metrics, and that is the very first stage. I have to make sure that the data and predictors I’m working with are actually viable and that the metrics I’m trying to predict are actually predictable. But the next stage is optimizing the model to meet the business needs. So it’s, let’s say, half and half, at least in real-life problems.
33:10 Okay. Great. Very good. And how long did your simple model take to develop from concept to delivery?
33:19 Good question. It all depends, first of all, on the quality and the scale of the data you have. There are cases where you can get a first working prototype of the model built and deployed within, let’s say, a couple of weeks; that is an easy case. Sometimes it can take a few months just to put all the data to work; that is a hard case. So it really depends on what data you’re working with. But I should say that the well-known rule of thumb that a data scientist spends around 80% of their time cleaning and preparing data is very true. It is very important to get through this stage and overcome it, because otherwise you may end up hating data science, which is not the objective here. But, yeah, getting the data clean is a very big part of the work.
34:47 Okay. And one last question. You mentioned at the beginning that you want to make some allies within the business. What’s the best practice for getting other stakeholders involved? Is it early in the process, and what is the best point to do that, in your experience?
35:06 Well, as I mentioned, stakeholders very much like real figures. So if you want to sell the solution, your objective is to get the figures, and preferably to get them right, because I suppose that most business owners know a bit more about the business model than a data scientist does. So you have to cooperate with the people in your organization who work closely with numbers, and, as I mentioned already, those can be risk analysts or financial analysts. I have faced situations where I was a bit stuck evaluating the real business impact of a deployed solution. So I went to, let’s say, a risk manager and asked them to help me evaluate the solution from their point of view, because they are a bit better with the numbers. Once you have those numbers, you present them to stakeholders.
36:36 Okay. That’s great. It looks like we’re just about at time here. Vladimir, thank you so much for your time. This was a great webinar.
36:43 Thanks a lot.
36:44 If we weren’t able to address anyone’s question on the line, we’ll make sure to follow up via email within the next few days. Thanks again, everyone, for joining us for the presentation. Have a great day, and thank you so much.
36:57 Okay. Thank you very much and goodbye.