Looking to determine the business value of your models? Try ROC charts.
In the video below, you’ll learn more about ROC (receiver operating characteristic) curves and lift charts. These popular visual approaches to comparing model quality can also help you determine the business value of your models.
Okay, are you ready? Come over here, let’s explain something. We will talk about ROC curves and lift charts in this little session. They can be a really great tool, but many people find them a little bit confusing. So let’s start with ROC curves. It’s actually fairly easy to create a curve like this for a model you’ve built. You put the true positive rate on this axis and the false positive rate on that one. The true positive rate is the percentage of your positive data points that the model correctly identifies as positive. The false positive rate is kind of like the false alarm rate: out of the negative cases in a binary classification, how many have been falsely identified as positive? And how do you create all those lines here? Well, you take a data set where you actually know the outcomes, apply your models to it to create predictions, and you also get a confidence level for each of those predictions. Then you sort all your predictions by this confidence level. Now you go line by line, starting with the first row, at the bottom left corner of the chart. If a row is actually a positive and the model correctly says it’s a positive, that adds to your true positive rate, so you go one step up. Same for the next row here: one step up. But this row is actually a negative, and the model says it’s a positive, so that’s a false alarm and we go one step to the right. Do this for all your data points, and a curve like this will appear. And now, how can you use those curves to actually compare models? First of all, a perfect model would sit in the top left corner. It looks like this. And you will never see this because, frankly, it would be very suspicious.
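The stepwise construction described above can be sketched in a few lines of Python. The labels and confidence values here are invented for illustration; the function just walks the rows sorted by confidence, stepping up for a true positive and right for a false alarm.

```python
# Minimal sketch of the ROC construction from the video: sort rows by
# model confidence, then trace the curve point by point.
# Labels use 1 = positive, 0 = negative; all example values are made up.

def roc_points(labels, confidences):
    """Return the (false positive rate, true positive rate) points of
    the ROC curve, starting in the bottom left corner."""
    # Sort rows from highest to lowest confidence.
    rows = sorted(zip(confidences, labels), key=lambda r: -r[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # start in the bottom left corner
    for _, label in rows:
        if label == 1:
            tp += 1   # correctly flagged positive -> one step up
        else:
            fp += 1   # false alarm -> one step to the right
        points.append((fp / n_neg, tp / n_pos))
    return points

labels      = [1, 1, 0, 1, 0, 0]
confidences = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]
print(roc_points(labels, confidences))
```

A perfect model would jump straight up to (0.0, 1.0) before taking any step to the right; a random one would zig-zag along the diagonal.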
Models just never are that perfect. But you will get something like this. It should be above this green line, which represents a model that is just randomly guessing. And now we can say, for example, that this red model, this curve here, is less good than the purple one, because it’s below the other curves across the board. But for the orange and the purple one, it’s less clear. Which one is actually better? That actually depends on what we call the classification costs. Sometimes the different types of errors you could make carry different costs. I’m not going into the details, but it comes down to this: depending on those costs, you can define a straight line like this one here, with a slope determined by those costs, and then you find the tangent point by pushing that line as far toward the top left as possible. If the slope looks like this, the purple one is better, and if it looks like that, the orange one is better. But here’s the thing: not even I find these really intuitive, and I’ve been doing this for 15, 20 years now, and most people I know don’t like them either.
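The cost idea above can be made concrete with a small sketch. Given a cost per false alarm and a cost per miss, each point on a ROC curve has an expected total cost, and the cheapest point (equivalently, the tangent point of the cost line) decides which model wins. The two curves and the cost numbers below are invented for illustration.

```python
# Hedged sketch of cost-based comparison: score every point on a ROC
# curve by total misclassification cost and keep the cheapest one.
# Curves, class sizes, and costs are all made-up example values.

def best_operating_point(points, n_pos, n_neg, cost_fp, cost_fn):
    """points: list of (fpr, tpr). Returns (cost, (fpr, tpr)) for the
    point with the lowest expected misclassification cost."""
    def total_cost(p):
        fpr, tpr = p
        # false alarms among negatives + missed positives
        return cost_fp * fpr * n_neg + cost_fn * (1 - tpr) * n_pos
    best = min(points, key=total_cost)
    return total_cost(best), best

purple = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
orange = [(0.0, 0.0), (0.2, 0.7), (0.4, 0.95), (1.0, 1.0)]

# False alarms five times as expensive as misses:
print(best_operating_point(purple, 100, 100, cost_fp=5, cost_fn=1))
print(best_operating_point(orange, 100, 100, cost_fp=5, cost_fn=1))
# Misses five times as expensive as false alarms:
print(best_operating_point(purple, 100, 100, cost_fp=1, cost_fn=5))
print(best_operating_point(orange, 100, 100, cost_fp=1, cost_fn=5))
```

With these numbers, the winner flips depending on the cost ratio, which is exactly why the ROC comparison between the orange and purple curves has no single answer.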
So ROC curves are great for visual comparison, for ruling out models like the red one. You can even calculate the area under the curve (AUC), which gives you one simple number, and it’s independent of the actual classification costs. Those are the positive sides. But it’s kind of hard to connect an ROC curve to business value, and I personally find it a bit harder than necessary to find good candidates for thresholds based on this curve alone. Again, it’s possible, it’s just not very intuitive. So it’s a good comparison tool, but it has some drawbacks. What are the alternatives? One alternative I actually like to use is the so-called lift chart. Here, you start with the same table, remember, with the confidence levels, and now you put those rows into, let’s say, five buckets: the top 20%, then the next 20%, and so on. For each bucket, the last row has some confidence value, and those values will become your decision thresholds at the end. So the first bucket contains the top 20% of the rows, with confidence values of, let’s say, higher than 0.8; that depends on your model and your data. Now you can look at those 20% of your data points and count or calculate how many of your positive cases have already been covered by this bucket. If your model is doing a decent job, this is typically higher than 20%, let’s say 45% in this example. That means I would only need to look at 20% of my data, but I would already cover 45% of my positive class, which is good, more than the 20% that random selection would give. And with a quick look at the top 40%, I could get to 80% of the positive class. And here’s the thing: now I can directly connect this to business value.
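The bucketing described above can be sketched directly: sort by confidence, cut the rows into five equal buckets, and for each bucket report the cumulative share of positives covered along with the threshold (the confidence of the bucket’s last row). The labels and confidence values are invented for illustration.

```python
# Minimal sketch of a lift / cumulative-gains table: five buckets of
# equal size, each with its cut-off confidence and the cumulative share
# of positives covered so far. Example data is made up.

def cumulative_gains(labels, confidences, n_buckets=5):
    """Return a list of (threshold, cumulative share of positives),
    one entry per bucket, highest-confidence bucket first."""
    rows = sorted(zip(confidences, labels), key=lambda r: -r[0])
    n_pos = sum(labels)
    size = len(rows) // n_buckets  # assumes len divides evenly
    result = []
    covered = 0
    for b in range(n_buckets):
        bucket = rows[b * size:(b + 1) * size]
        covered += sum(label for _, label in bucket)
        threshold = bucket[-1][0]  # last row's confidence = cut-off
        result.append((threshold, covered / n_pos))
    return result

labels      = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
confidences = [0.95, 0.90, 0.85, 0.80, 0.75, 0.60, 0.50, 0.45, 0.30, 0.10]
for threshold, share in cumulative_gains(labels, confidences):
    print(f"confidence >= {threshold:.2f}: {share:.0%} of positives covered")
```

Each printed threshold is a ready-made candidate for a decision cut-off, which is exactly the convenience the video attributes to lift charts.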
So let’s say, for example, that this model should help you run targeted campaigns, predictive lead scoring, marketing campaigns, something of that nature. Let’s also say you know that the total campaign, reaching out to all the people, would cost you $1 million. And based on your conversion rates and everything else, you estimate that even if everyone who could theoretically buy actually did buy, they would deliver a total revenue of $1 million. Nobody with more than two brain cells would ever do that: you don’t spend one million to get one million, that doesn’t make any sense. But what if you focused only on the top 40% based on the model confidences? Now you’re only paying 40% of one million, which is $400,000, but you get 80% of the revenue, which is $800,000. All of a sudden, this is a good deal. And this is what I mean: lift charts are, in my opinion, much easier to connect to actual business value; you can connect the model and its predictions directly to that value.
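The campaign arithmetic above is simple enough to spell out. The dollar figures and percentages come straight from the example; only the variable names are ours.

```python
# The campaign math from the example: contacting everyone costs $1M and
# returns at most $1M, but the top 40% by model confidence (per the lift
# chart) still captures 80% of the revenue.

total_cost = 1_000_000     # reaching every prospect
total_revenue = 1_000_000  # if every possible buyer bought

contacted_share = 0.40     # top two lift-chart buckets
revenue_share = 0.80       # positives covered by those buckets

cost = contacted_share * total_cost      # $400,000
revenue = revenue_share * total_revenue  # $800,000
profit = revenue - cost                  # $400,000

print(f"cost ${cost:,.0f}, revenue ${revenue:,.0f}, profit ${profit:,.0f}")
```

The full campaign breaks even at best; the model-targeted campaign clears $400,000, which is the business value the lift chart makes visible.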
Hi, Dan, thanks for stopping by. It’s also kind of easy to find a decent threshold value, because you just look at the last row of each bucket, and you know what the cut-off point is. Now, the one thing ROC charts do a little better: it’s a bit harder to compare different models with lift charts, though I wouldn’t actually recommend doing that with them in the first place. So if you really want to connect a model to business value and find the thresholds, go with lift charts. Otherwise, use ROC curves for model comparison, together with the AUC calculation. Both are great tools. Both should be used. Have fun with them, and thanks for today.
Don’t just make the best data science decision, make the best business decision. Learn how to create a confusion matrix and better understand your model’s results.
Get a complimentary copy of the 2020 Forrester Wave: Multimodal Predictive Analytics And Machine Learning Solutions