Assessing risk using machine learning to hedge effectively
Presented by Brian Meagher, Vice President, Analytics, Shorelight Education
Shorelight brings together universities and international students with the goal to help educate the world. In just five short years, Shorelight has grown to enroll more than 10,000 students from 120 countries, all while maintaining a first year completion rate of 90%.
There are 1.1 million international students studying in America today, but many of them cannot visit these universities before choosing where to go. Because it is such a large purchase decision, Shorelight works on both sides of the problem: it advertises its university partners to students across the globe, and it educates students about the universities so they can make an informed decision (leading to a higher conversion rate).
The Problem? Even with all this help, there is a chance that a student’s visa request gets rejected. Since this would be a significant loss for Shorelight, they needed a model to identify which students are most at risk of rejection and to help strengthen their profiles.
The Solution? Shorelight used RapidMiner to create a workflow, automate the series of questions traditionally asked by the visa guides, and create a risk profile. Through this, they were able to drive an incremental $1.5 million in tuition bookings to these universities. This all leads to greater operational predictability and growth in total profitable customers.
Watch the full video below to learn how Shorelight uses data science to help fuel growth and student success, particularly how to convert the right students to the right universities.
00:03 Hello, everyone. As Scott said, I’m Brian Meagher, vice president of analytics at Shorelight. And this is?
00:10 Muddasir. I’ve been up here on the stage just a few minutes back. Thank you.
00:15 So we’re going to talk about a few things. But first, I do want to ask a couple of questions. Anyone in the audience study at a university outside their home country? How many? One, two, three, four, yeah, five, six. Great. And so everyone else presumably studied at a university inside their home country. So for those that studied in their home country, raise your hand if you got a chance to actually visit the university before you actually enrolled. Got it. Okay, so it’s good to know. I just wanted to see kind of what– I’m seeing and meeting a lot of international folks here. So this would be pretty relevant. So we’re going to talk about a few things. Actually, I’ll first skip this and go straight to objectives. So what we want to give in this session is an understanding of what Shorelight is and how we help universities grow their international population and help students succeed. Then, walk through a specific use case that is kind of fresh, just happened in January, where we were able to, with the help of the analytics team and Muddasir and the Anblicks team, drive an incremental 1.5 million in tuition bookings to Shorelight universities. And then the third is to talk through some of the challenges specific to Shorelight but not necessarily unique to Shorelight. Hopefully, we’ll talk through some of those things. But first, I’ll quickly say what Shorelight is. So at Shorelight, we partner with US universities to recruit, enroll, and teach international students. So we have about 21 university partners today. We are about five years old as a company. And so we’ll walk through a little bit of what I mean by that.
02:08 So these are three of our students, real students. I’ve changed their names to be anonymous but putting a face to our customers, so to speak. So Katie is from Vietnam. She’s studying at University of South Carolina. Yang is from China. He’s studying at Auburn University. And Adella is from Oman, and she’s studying at Florida International University. So these three students are three of 1.1 million international students who are studying in the US today. So what you’re looking at is actually every one of their home cities, where they come from. The size of the circles is relative to the volume of students from that city who are currently studying in the US. So the US currently enrolls the most international students of any country around the world. It is widely considered to have the largest volume of high-quality universities that we have to offer in the world. There’s tons of very good, high-quality universities around the world, but the US is uniquely positioned to have many, many high-quality universities. And for international students, it is one of, if not the, top destination of choice to study outside their home country if they choose to. So these three students have an issue. They actually have to choose between these thousands of options and they have to do that sight unseen. They don’t visit the university. They actually don’t even know what the difference is between– I’m from Kansas City, Missouri. They don’t know the difference between Kansas and Missouri. New York, maybe they know. They certainly know what we talked to– and they certainly know Stanford, MIT, Harvard, and then there’s the rest. And so when we talk to them about their options and what all of these different universities have to offer and specifically our 20-plus university partners, it is a long road.
04:09 Another challenge that they have is imagine trying to buy a house without actually visiting the house. So they’re paying 40, 50, 60 grand a year in tuition and housing and whatnot. And so over the course of their degree, they’re spending $200,000 on something that they would never have visited before they make that purchase. So this is an intimate and an intense purchase decision for them. So what we do is we try to help them with that process. Another good example of a challenge in that process, so about 60,000 Chinese students take the SAT every year. Well, the SAT is not allowed in China. So what do they do? They fly to Hong Kong multiple times a year to take multiple SAT exams. So these are just a few examples of some of the challenges that they have to face. So for Shorelight, we help partner with these universities to solve this two-sided problem. So on one side for the university, they have this globally distributed customer base to try to reach. Certainly, there are hot spots of Beijing, Shanghai, Mumbai, where a lot of students originate but lots of places to go. And on the flip side for students, they don’t know which is the best option for them. Certainly, a lot of them do apply to Harvard University for which they will get denied. I think Harvard enrolled like 30 Chinese students last year. But on the flip side of that, once they do enroll, we find that actually, they’re not being successful. So this stat on the far right, 63% is the average first-year retention rate of an international student at US institutions. So that means you fly all this way, come to the university, and more than 30% of your international students actually don’t stay. So you spent all that money to try to convert them and try to get them to stay. So what we do is we partner with the universities to say, “We’ve got your back. So we are out in the world representing you. We can recruit for you and enroll students at your universities.” And we also help with the first-year experience. So we’re on campus and we designed a first-year curriculum specifically for them.
06:27 So these are our universities. So we’ve got universities all around the US from very, very high ranking universities to unranked universities but a lot of different choices. And we have representatives all around the world who have in-country and local knowledge of what’s going on and how to actually talk to parents and students about making the right choice. One other thing that’s going to come up in this presentation that I actually want to highlight is we’re a young company. And a lot of the things that we’re doing are like the first time ever we’re doing them. And I wanted to point this out because it’s an important point when we talk about analytics use cases where a lot of these people and all the functional leaders in our organization, it will be the first time they’re dealing with how to measure things. We’re literally building KPIs on that third graph in 2016 around what works, what doesn’t. So a lot of the things that we do work one year and the next year or two years down the line, actually, don’t work because we’re growing at a rapid pace. And so that was something I wanted to point out and will come up later in the presentation.
07:44 So around the funnel– so I’ve simplified it here from when a student starts to first consider to study abroad to when they actually arrive on campus. And depending on the student, depending on the profile, depending on the degree category, if they’re an undergrad or postgraduate student, it can vary by length. But for today’s use case, we’re going to concentrate on one part of the funnel for when they have an offer in hand and then they say, “Okay, I want to take up that offer. Here’s my deposit. And so I’m going to come.” So then they then start the process of getting ready to come to the US, getting prepared to get a visa. So this visa thing is also another tricky issue. To enter the country, we all need a visa to enter. So they enter on a student visa. So this is a part of the funnel for which we have zero control, theoretically. A student walks up into a consulate office in Beijing, has about two or three minutes to kind of prove their case, so to speak, that they’re a student and they’re going to be studying in one of these universities in the US. And the consular officer has complete and utter control at his or her discretion to approve or deny that visa. And they do not have to give any reasons why. So these students who, you imagine, a 17 or 18 year old, who speaks English but it’s not their first language, goes up to an office and they present their case. They have all their paperwork, they have all the proof, they have all the documentation from the university that they’re supposed to come, and the consular officer actually makes that decision for them.
09:28 So we actually have a team. The marketing and recruitment team, once the students get to this part of the funnel, hands them off to our services team, who, kind of from a concierge’s perspective, handhold the student through that process and the parents through that process so that they know everything to prepare for the visa interview, to prepare for arrival, know what day they can come, how to be prepared when they get there. “You’re going to fly into Atlanta. Then you’re going to have a shuttle that’s going to pick you up. There’s going to be someone with your name on a board. And you’re going to drive an hour to Auburn, Alabama. And then you’re going to be greeted by these people.” So we do a lot of work to prepare them to arrive. The January intake is kind of an off intake. Most students arrive in August for the fall intake, but we do have a January intake. So that’s what we’re going to talk about today. So these are real numbers. This is really exactly what happened a few weeks ago. We had 1,280-plus students say, “I want to come to one of your universities.” And they gave us 2 or 3 thousand dollars to kind of prove that they want to come. And then they start the process to try to get a visa. Typically, over the years, we’ve had a lot of melt, as we call it, from the deposit stage to the visa stage. So students who have very similar profiles walk up to a window, one gets approved and one gets denied. And the consular officer doesn’t have anything to say about reasons why they did or did not. So our expectation is that once we get this many students to actually agree that they’re coming and give us a deposit, then, typically, what happens, 60 to 64 percent of them actually can get the visa and arrive.
11:19 So that’s where the use case comes in. So for us, actually calculating ROI on that. So every one-point improvement in that, at least for this intake, is 13 students. And each of those 13 pays on average– it varies but pays on average about $22,000 for a year’s worth of tuition. So it’s very valuable, for the student who wants to come and is a totally legitimate student, to actually help them through that process. So I’m going to kind of give away the ending here. We actually did improve that for the January intake by five points. So we got that many more incremental students than we expected to arrive. And all things considered, there’s a lot of variables in this. And I get that it can be a little hard to say this is literally the exact thing that actually drove the increase, but it was one of the things that we did. So we’re going to talk about that today.
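The arithmetic behind those figures can be sketched in a few lines of Python. This is only a back-of-the-envelope check using the numbers quoted in the talk (the 1,282 deposits mentioned later, roughly $22,000 average tuition, a five-point improvement); the variable names are illustrative, and the result is an approximation rather than Shorelight's own calculation.

```python
# Figures quoted in the talk; variable names are illustrative assumptions.
deposits = 1282             # students who paid a deposit for the January intake
avg_tuition_usd = 22_000    # approximate average first-year tuition per student
uplift_points = 5           # reported improvement in the deposit-to-arrival rate

# One percentage point of the deposit pool, expressed in students.
students_per_point = deposits * 0.01

# Extra arrivals and extra tuition implied by the reported uplift.
extra_students = students_per_point * uplift_points
extra_tuition_usd = extra_students * avg_tuition_usd

print(f"students per point: {students_per_point:.1f}")
print(f"extra students:     {extra_students:.0f}")
print(f"extra tuition:      ${extra_tuition_usd:,.0f}")
```

With these inputs the estimate comes out at roughly $1.4 million, in the same range as the incremental 1.5 million in tuition bookings quoted in the talk; the gap is expected, since the per-student tuition is itself an average.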
12:22 Do you want to talk about the problem, or should I take that?
12:26 Oh, yeah, you can take it.
12:27 So as you’ve seen that they have a problem of melt. A student gives out a deposit, and then there’s the visa stage, which is highly uncertain, and then the student does not end up in the US. So that is a huge loss for Shorelight. They are losing a lot of money because of that because they’ve worked on the student for a long time, and then just because of visa, they’re not ending up in the US. So that’s a bummer. So that’s still kind of a problem when you have what you want to do. And it’s kind of just started. I think we just thought about it that maybe we should do this. And we started like, “What if we have some way of prioritizing students who are at risk of getting a visa rejection?” And that is how we kind of did some whiteboarding and came up with, “Yeah, let’s do visa prediction.” And that was really new for both of us, and that’s how we started building it out. So if I were to– I wanted to perfectly define with this slide what we are trying to solve. So the problem of Shorelight is that we want to predict the students who are at risk of getting a visa rejection so that the enrollment services team, the team that provides the services, they kind of have an actionable outcome and do something about the melt. So the first thing that they would want to do is prioritize the high-risk students in their workflow. So that is kind of a problem of efficiency. So they are prioritizing students and they’re saving time. And the second one is work with the candidates to strengthen his or her profile. So that is about saving money. I mean, they want to strengthen this profile and make sure that he or she gets the visa. And they do that with the preparation sessions, something that this services team do.
14:02 So when we started off with this problem, this was one of the most unique problems we’ve ever seen. The first step that we did was sit down with the stakeholder, someone who is the head of enrollment services from Shorelight. And then we started talking about– we asked the basic question, like Engle was mentioning this morning, about having common sense. So we started with, “What do you do?” So we had the most basic questions, “What do you do? How do you do it? What does your workflow look like? What kind of problems are you facing in your funnel? What is your everyday job, or the action items that you do?” And then we were able to visualize the enrollment services workflow. So this is what it looked like. It was a basic first in, first out kind of a workflow wherein there was no prioritizing. And kind of one point, Brian, I think, that we wanted to mention is that the enrollment services team is spread out across the globe, and we have a really low proportion of enrollment services staff versus the number of students they are catering to. It’s like 8 to 1,000. So that’s a lot of number– that’s a lot of students. So if I were to take you guys quickly through the workflow, so first they would get an alert that someone has deposited. Then after that, they would have to get on a discovery call to understand the student profile. And that takes about 20 to 30 minutes on average for a student. And then they do a follow up with them about the documents based on the conversation they had in the call. And then also based on that, they understand that, “Okay, this profile looks kind of risky. We need to have more conversations with them, and ultimately, plan visa prep sessions.” And that is how they come up with classes, classroom sessions for us to understand the student, how to help him answer more questions, make him more confident because he really has 30 seconds of time when he approaches the consular officer. So–
15:53 Yeah, sometimes it’s very short.
15:55 It’s a very short time to prove yourself. Then finally, the student goes up for the visa interview. He gets either accepted or rejected. But instead of diving right into the machine learning aspect of it, we took a more holistic approach to it. We were like, “We want–” we identified opportunities to improve efficiency. So we’re here. For example, right off the alert and before the discovery call, we got to understand that there’s a fixed set of questions that they’re asking everyone. And we backed it up with the data that we already have in the historicals and we also confirmed it with the EST members. And they are like, “Yeah, these are about the same questions that we ask.” So we’re like, “Why not automate it? Why not send out an email blast and make it like a marketing campaign so that they can respond and then you get the information beforehand, before you even get on the discovery call? You’ll have more information about the student.” And so that’s one of the things that we suggested. And then, ultimately, we wanted to have, as we said, the machine learning solution so that we can prioritize the students based on their risk profile. But then the question was where we use the machine-learning solution in the funnel because if you move it towards the end, the impact kind of decreases, but if you move it up in the funnel, the impact kind of increases. So what if you have the information, the entire profile of the student through an automated assessment and then you also have the information about the risk profiles? And then you can decide, “Do I even want to do the discovery call? I might not because he looks like a nice student. I don’t think he’s a risk profile right now.” So that’s the kind of holistic approach we took up with Shorelight for this particular use case. And then we added all of these. So we had a few more suggestions when it came to workflow and efficiency opportunities.
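The shift from a first in, first out queue to a risk-prioritized one can be illustrated with a small Python sketch. It assumes each deposited student already carries one of the high, medium, or low risk labels the speakers describe later in the talk; the `Deposit` class, its field names, and the tie-breaking rule are hypothetical, not Shorelight's actual workflow.

```python
from dataclasses import dataclass

# Lower number = handled earlier. The ordering itself is an assumption.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Deposit:
    student_id: str       # hypothetical identifier
    arrival_position: int  # position in the original first in, first out queue
    risk: str             # "high", "medium", or "low"


def prioritize(queue: list[Deposit]) -> list[Deposit]:
    """Order the worklist by risk first, then by original arrival order."""
    return sorted(queue, key=lambda d: (RISK_ORDER[d.risk], d.arrival_position))


fifo_queue = [
    Deposit("s1", 1, "low"),
    Deposit("s2", 2, "high"),
    Deposit("s3", 3, "medium"),
    Deposit("s4", 4, "high"),
]

for deposit in prioritize(fifo_queue):
    print(deposit.student_id, deposit.risk)
```

Keeping arrival order as the secondary key means students within the same risk band are still served in the order they deposited, so the change only reorders across bands.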
17:41 Yeah, yeah, I would just, sorry, interrupt you–
17:43 Yeah, that’s one of the challenges that we face. And I’m going to come back in the later part of the presentation. But yeah, so that is what we did. And–
17:53 Well, I’d say, Muddasir and I have been working together for three years. And we’ve done a lot of use cases at Shorelight. And one of the things that we always say is add value no matter what. You don’t have to– so my team, we have a list of use cases that we work on for the year. They are bigger projects. And then we break those down into smaller pieces. We have an agile workflow. We consider ourselves analytics product managers. We build analytics tools to add value for the team. And so we take a product approach to that. And so that means with our work with Anblicks, when they embed themselves in our team, it just adds value. So if there’s a solution that’s like, “Duh, just change your process a little bit to get more effective,” that’s a good solution that gives you more currency to work with the stakeholder and be like, “I want to work with them more. I actually want to see what else they have to offer.” And they’re open for those types of solutions, so.
18:52 Yeah, I agree. So finally, coming on to the machine learning part of it and then we decided, “Okay, now this looks like a good funnel, a good way of working through this.” And then we started the actual part of data science, machine learning. And then even there– so we’re going to talk about all of the challenges from these slides later towards the end. So this is something– I put it up here so that you guys can relate to it. This is what their dataset looks like. We had a couple of variable names– kind of data points that we already have in the historical data that, “Hey, maybe we can use this to do that.” So I just want to call out the fourth one, days until visa interview, because that’s a very important attribute. Because as you near your visa interview date, you become riskier because you are at the end date. You might not end up in the school because there are deadlines happening there. And then we also have weekday of visa interview. So in our EDA, exploratory data analysis, we understood that I can’t– okay, we understood that weekday plays a very important role. For example, Mondays and Tuesdays had higher visa approval rates when compared to Thursdays and Fridays because, I don’t know, maybe they have a quota and they finished that up in the early part of the week.
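As a rough illustration, the two attributes called out here, days until visa interview and weekday of visa interview, could be derived from dates along the following lines. This is a minimal Python sketch: the function name, the `as_of` scoring date, and the output keys are assumptions made for the example, and the team actually built its features in RapidMiner.

```python
from datetime import date

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def visa_interview_features(interview_date: date, as_of: date) -> dict:
    """Derive the two date-based attributes mentioned in the talk."""
    return {
        # Negative values mean the interview date has already passed.
        "days_until_visa_interview": (interview_date - as_of).days,
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        "weekday_of_visa_interview": WEEKDAY_NAMES[interview_date.weekday()],
    }


features = visa_interview_features(
    interview_date=date(2020, 1, 6),
    as_of=date(2020, 1, 1),
)
print(features)
```

Because applications are scored daily in the setup described, the day count would be recomputed against the current date on each run, while the weekday stays fixed once the interview is booked.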
20:08 So those are the kinds of things that we became aware of. We did a lot of EDA. And then we finally used RapidMiner. So being very quick on the feet again. And it helps us do more and more trial and error. The one thing I love about RapidMiner is that it integrates very well with a lot of different technology. So I’m going to show that in the next slide. But before we get to that, the final outcome that we created through these models about the risk profiles is in the form of high, medium, and low because Ranjith showed the funnel– we also worked on the earlier part of the funnels and we understood– so earlier, we were too technical and we came out with numbers of the outcomes like, “Hey, 70 means very good, but 70 might mean different for someone else.” And, Brian, you were saying some example about–
20:58 Yeah, we have 60 people located in countries all around the world. And so I’ve lived for several years in Lima, Peru, and I worked in the education system there. And for students, their equivalent GPA is out of 20. So the average GPA of a class is 13 out of 20. That’s good. That gets you into most good universities in Peru. So for me, 13 out of 20, I’m like, “Wow, that’s not very good.” So for our Chinese colleagues in China, you had to explain what 70 out of 100 meant. They’re very likely to do– and so we’ve done a lot of trial and error around how to translate a lot of this stuff to the end user. And one of the things we learn in this case is let’s keep it simple. Our goal is to make sure that we are getting in contact more than once with the high-risk profile students. And the low-risk profile students who are going to get visas anyway, we actually don’t need to spend 20, 30, 45 minutes on the phone with them practicing and giving them advice on how to do it. So that’s where it came into play where the solution can be a translation of a lot of what’s going on, on the kind of analytics and machine learning side.
22:19 Yeah, so the bottom line is numbers mean different things for different people. So that is how we decided that we need to have some kind of differentiation in the form of high, medium, and low. And that made sense for a lot of people who are nontechnical. And this is just a slide to show how we have integrated RapidMiner in the already existing infrastructure of Shorelight. They have their database hosted on Amazon Redshift and the CRM in Salesforce. The load from Salesforce goes every day into the Redshift data warehouse, and then RapidMiner takes that up. We create our own ETL pipelines and model building and then throw out the results every day. And all the applications are scored daily. And then all of them end up back in the CRM, where the end users can have a look at the visa risk scores and visa risk categories. And then they can prioritize the student profiles. The scores also end up in a Tableau dashboard because that is how you create a holistic picture for everyone to see. So the entire journey of reaching here had a lot of challenges. Brian, do you want to continue here?
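Translating a numeric model score into the high, medium, and low categories could look roughly like the Python sketch below. The talk does not give the actual cut-offs or the direction of the score, so the 0 to 100 scale, the assumption that a higher score means approval is more likely, and the thresholds of 40 and 70 are all hypothetical.

```python
def risk_category(
    approval_score: float,
    high_below: float = 40.0,  # hypothetical cut-off, not from the talk
    low_from: float = 70.0,    # hypothetical cut-off, not from the talk
) -> str:
    """Map a 0-100 approval score to a visa risk category.

    Assumes a higher score means the visa is more likely to be approved.
    """
    if not 0 <= approval_score <= 100:
        raise ValueError("approval_score must be between 0 and 100")
    if approval_score < high_below:
        return "high"
    if approval_score < low_from:
        return "medium"
    return "low"


for score in (25, 55, 85):
    print(score, risk_category(score))
```

Publishing only the category to the CRM keeps the end users away from the question of what 70 out of 100 means, which is the translation problem both speakers describe.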
23:23 Sure, so I’ll talk about the first con. [laughter] So we always have challenges when we work on some of these things. Specific to this one were six ones that we chose to present here. So the first one is just process improvement in operations. So we found out through our discovery of the suboptimal process. It wasn’t the fault of the functional owner. They had created something from nothing in the year prior. So they were trying to get their team who was distributed in all these different countries to follow this process. So it was fine, but we had the trust with the relationship to say, “Hey, I think we can improve the process along with actually creating analytic solutions for you.” So that was one. I mentioned at the very beginning the evolution of the organization. So a lot of the– some regions of the world were using one process; other regions of the world were using another process. It was what worked for them. So like in China, instead of sending an automated email, they sent WeChat messages. And so those are the different nuances and differences of it. And they were charged with getting students across the finish line. They couldn’t wait six months to develop a holistic solution that worked for both China and the rest of the world. They just had to go, go, go. So we had to lean into that sort of process and make sure we added value with those sorts of things.
24:54 Global footprint, so we are a global organization. And that is a big challenge. So we have to move quickly and actually distribute and communicate information on a global scale. And that’s very, very challenging for us. So it is very important to repeat the messages until you’re blue in the face. Get with them. So I was in China, thank God before all the things that are happening now, which is another challenge that we’re facing right now. And it’s good to get to know, get in their shoes, understand what’s going on so you can actually gain that trust so that when they receive a random email saying, “Hey, you got to do this differently,” they know that it’s good and that they should adopt it. So that’s a big challenge for us, not just for our analytics team but also, for everyone in the organization.
25:48 Yeah, and I want to add to Brian’s first point about process improvement because that is the most challenging part of the whole solution that we created. Because enrollment services team is someone who does the repetitive job every day. They tend to get mechanical with the way they work every day. And you coming into their lives and saying, “Hey, you should do it differently.” They were like, “I don’t want to. I mean, that’s a lot of effort. Why do I add two more steps in my everyday thing?” We have to show the value that, “Okay, this will make you more efficient in your job.” So that’s one of the things. And on the third point on global footprint, one thing, again, I want to add is that you want to be– since Shorelight is spread across 60 countries, I think, you want to make sure that everyone’s on the same page, which is very difficult like Brian just mentioned. And then, also, from the technical side, when we started off this project, the first thing that kind of struck me was the availability of data points. I mean, we wanted to create a solution, but we did not have the right data points to get to that. So I think that’s one of the common sense thing which I was mentioning that a lot of organizations, they want to do machine learning and stuff like that, but we should make sure in the first place that do they even have those data points in there for the first place? And so that is how we recommended that you need to start capturing more and more data about the student, about the way he speaks, about how commanding he is in English, how confident is he. So stuff like that plays a very important role when you are creating a model.
27:20 And then, of course, the adoption strategy, convincing the stakeholder of the value. I think that is the most difficult part of, not just this, any other data science project. And you need to be – I don’t know – creative in different ways to show the value. But in our case, I think, we were lucky to have Brian and the stakeholders as well who were quite accommodating and open to ideas. And that is how we were able to see this project through. I mean, in all honesty, there were some projects that did not go through.
27:51 Yeah, yeah, we’ve worked on many things together and not everything has worked. And so I think having the ability to learn and move quickly is pretty key.
28:00 Yeah, and the last point, I don’t want to drag this but feedback loops, I think, are the most important aspect of a data science project. This is something I learned on the job that with an organization like this that’s spread out across the globe, you need to know what the end user is thinking about your solution. You can’t just have one person telling you what they think. So that is where we started the feedback loops kind of thing. And it was still, in the beginning, quite challenging because we had to do it twice a year or something because again, extra work for the end user. So that was kind of a problem. But then, I think we are getting there. We are having more frequent feedback loops, which is helping us. Do you want to talk about–?
28:42 Yeah, I’ll take this one now that we have just a few minutes left. So these are the results. So, basically, it’s a version of a confusion matrix, but we put it in our kind of translation of it. So–
28:53 Performance metrics.
28:55 Performance matrix as Muddasir likes to say. So the 1,282 cases and what we evaluated them from high risk, medium risk, and low risk and then what the result was. And so what we found was that definitely the team on the bottom note here, those in the medium and low, we kind of guessed correctly that they were highly likely to get visas and that we actually didn’t need to work–
29:21 Guess is the wrong word. [laughter]
29:23 Yes, that’s true. Guess might be the wrong word, but. And then on the top end on the high-risk category, it was almost a 50/50 thing, but they spent more and more time with those students to try to prep them and get them prepared more often. So we did see, as I said in the very beginning, that 5 point uptick in the total number of students who were able to get visas this time around. So with that, I know we have one more minute left, I’m just going to end it there and leave some time for questions. [applause]
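The results view described here, predicted risk category against the actual visa outcome, is essentially a cross-tabulation. The Python sketch below shows one way to build it; the function name is hypothetical and the records are invented toy data for illustration only, not Shorelight's results.

```python
from collections import Counter

CATEGORIES = ("high", "medium", "low")


def outcomes_by_category(records: list[tuple[str, str]]) -> dict:
    """Tabulate visa outcomes ("approved" / "denied") per risk category."""
    counts = Counter(records)
    table = {}
    for category in CATEGORIES:
        approved = counts[(category, "approved")]
        denied = counts[(category, "denied")]
        total = approved + denied
        table[category] = {
            "approved": approved,
            "denied": denied,
            # None when no students fell into this category.
            "approval_rate": approved / total if total else None,
        }
    return table


# Invented toy records: (predicted risk category, actual outcome).
toy_records = [
    ("high", "approved"),
    ("high", "denied"),
    ("low", "approved"),
    ("low", "approved"),
]

for category, row in outcomes_by_category(toy_records).items():
    print(category, row)
```

Reading the table row by row shows whether the low and medium groups really were approved at high rates and how the high-risk group fared, which is the comparison the speakers walk through on the slide.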
30:08 And done.