Scaling R-based demand forecasting with RapidMiner
Improve the supply chain with highly accurate, highly scalable demand forecasts
Presented by Ryan Frederick, Manager of Data Science, Domino’s
Forecasting demand across the supply chain is crucial for an organization that prides itself on reliable service and speedy product delivery. See how the data science team at Domino’s tackled the challenge and worked through a complex time series forecasting exercise – from prototype to delivery – and uncovered an innovative way to scale R-based time series models to drive reduced errors and faster runtime.
Watch the full video below to learn how the data science team at Domino’s used RapidMiner to improve the supply chain through extensible time series forecasting and scaled R-based models.
00:04 Okay. A little bit about me. I’m a manager of data science at Domino’s. I’m a first-time Wisdom conference attendee. So obviously, I agreed to present without any context whatsoever of what we talk about here. I had a former mentor who used to say to me, “Be afraid of the guy that wants the mic.” And apparently, I’m that guy today. So afterwards, privately, someone tell me how I did, okay? So enough about me. High-level Domino’s, we’re the number one pizza company in the United States by market share. The road to number one is filled with technology innovations that I think probably many of you are familiar with. You will have potentially seen some of the marketing around our mobile app. So there were times when the mobile app can do voice recognition and image recognition. We had a project where we would allow a loyalty customer to take a picture of any pizza and earn 10 loyalty points. So that’s the kind of innovation we’re doing. It could have been a picture of a dog toy, and it would still– a pizza-shaped dog toy, and it would still work. So it’s with that kind of focus on innovation that I come to talk to you today about supply chain demand forecasting. Fun.
01:23 So the goal of my project for my customer, which is supply chain, is to provide highly accurate, highly scalable demand forecasts. The problem is there are shared resources across the ecosystem, and the ecosystem is expanding rapidly. There are many of us doing data science with fixed resources. And so the solution I’ll talk to you about is to make extensible the time series forecasting tool that I use and think creatively to keep the footprint small so as not to disrupt some of my peers. So at its core, we’re talking about the store inventory life cycle, right? It starts with all you hungry customers that order food and deplete inventory in the stores. It goes to store operators who, at the end of the day, take inventory. They order inventory replenishment through online tools. Our supply chain system then shows up later on, and they fulfill the inventory request, restocking the store’s inventory. And it’s this store operator process that throws off tons of data for us to analyze. So that’s where we’ll be mining insights.
02:37 So the goal, again, highly accurate, highly scalable demand forecasts. One quick example. I can’t give anybody information they would use to reverse engineer my stuff. So you get a graph of cheese demand in pounds with no axes and no dates. The blue line obviously represents the history of cheese demand for this collection of stores, and then the red dotted line is our forecast. You can see some important data points, and they’re represented by a gray bar. So I won’t tell you exactly what that means, but that might be an important calendar event. Certain days of the year, people order more pizza. It might also be a national promotion. So why are we doing this? The business value comes from what we get out of the forecast, so we can give our suppliers a heads up for a demand boost that’s coming. Nobody likes to be shocked with large demand and have to figure out where to source the product from, so we give our suppliers a heads up. It gives us the option to reduce food waste. So in the stores and in the supply chain centers, let’s optimize against food waste. And lastly, we can scale labor to meet the demand, right? So if it’s going to be a lower volume week, then maybe we don’t need as many folks producing dough.
03:50 So this is where the business value comes from. And you all know how important that is to kind of sell to your C-suite, to get your project going. So how are we going to solve the problem? We have a lot of resources available to us at Domino’s. I’m on a team of about 50. I don’t manage all 50. I just manage 5. But the team has many with advanced degrees. My point of this slide, by the way, is that there are many ways we could solve this problem, and I’m just going to show you the one that we did. We have folks with advanced degrees in chemistry, computer science, applied statistics, electrical engineering. One guy has three masters, one of which is nuclear science. Some talented people. And then we have a comprehensive tech stack, which kind of touches on the user’s desktop environment, an AI/ML server-side environment where you can run RapidMiner, JupyterHub, RStudio. We’ve got a couple of Nvidia GPU servers, and our databases at the bottom there with SQL and Hadoop. Most importantly, the RapidMiner stack. And this is going to be central to a number of the techniques I talk about. We have three queues, so if you have used RapidMiner Server, you know what a queue is. We have three of them, kind of named after who pays for it. But I have access to use any of them when I need to. And each queue has 2 machines underneath, with 40 cores on the data science queue, 40 on the marketing, and 80 on the memory queue. So these are my tools.
05:16 So the prototype, where we started, it’s not meant for you to read every process here, but the concept is query a SQL Server database, receive the inputs needed for the model. I’ll explain in a minute. Pass the data into the model, run the model fit and forecast, and then write the results where they need to go, which is some downstream production system. Raise your hand if you’re a coder. Maybe you should be sitting towards the front. So the R script, I’ll just talk about it at a high level because not everyone’s a coder. The idea here is receive from the database the three pieces of information that– and by the way, I should say, we’re using Facebook’s open-source time series forecasting tool called Prophet. So Prophet requires a number of inputs. The result of the SQL query that RapidMiner receives is passed as an example set into the forecast function. We filter it down to a single SKU and supply chain center combination. So think Michigan cheese or Georgia pepperoni, filtering down to just one thing. We run fit and forecast, and then we wrap that whole thing up with a parallel process, and R’s doParallel package, so we can do 16 scenarios concurrently.
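The prototype flow described here (filter to one SKU and supply chain center combination, fit and forecast, run 16 at a time) can be sketched roughly as below. This is an illustration in Python rather than the R script from the talk. The data, the function names, and the mean-based placeholder model are all assumptions standing in for the real SQL inputs and for Prophet.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical demand history: (sku, supply_chain_center, demand_series).
# In the talk these rows come from a SQL Server query.
HISTORY = [
    ("cheese", "Michigan", [100.0, 110.0, 120.0]),
    ("pepperoni", "Georgia", [40.0, 50.0, 60.0]),
]


def fit_and_forecast(series, horizon=2):
    """Placeholder model: repeat the historical mean for each future period.

    The talk fits a Prophet model at this step; the mean is only a stand-in
    so the sketch runs without third-party packages.
    """
    level = mean(series)
    return [level] * horizon


def forecast_one(row):
    """Forecast a single SKU / supply chain center combination."""
    sku, center, series = row
    return (sku, center), fit_and_forecast(series)


def forecast_all(rows, workers=16):
    """Run up to `workers` combinations concurrently, mirroring the 16 in the talk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(forecast_one, rows))


results = forecast_all(HISTORY)
print(results[("cheese", "Michigan")])
```

Threads keep the sketch short. A CPU-bound model fit would more likely use separate processes, which is closer to what a parallel backend in R does.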
06:32 So the timeline of enhancements. This thing is in production, so it’s kind of survived the great barrier that a number of data science projects kind of run into. Ingo mentioned this morning how fewer than 1% of projects actually end up in production. Here’s how I got there. RapidMiner isn’t integral to every piece. I’m going to focus on the ones where it is, with just some high-level thoughts about what each milestone means. So first, we launched our prototype on a single VM, remembering back to that RapidMiner architecture. We were asking it to do 200 forecasts, and it took eight hours. Anybody satisfied with 200 forecasts in eight hours? So the first thing we did was we took a look at why it takes so long, and the majority of the time is simply retrieving data from the database. Data engineering solved that problem. We got down to one VM, 200 forecasts in 15 minutes. That’s a lot more interesting. So then the business said, “Great, you’re getting some performance from a runtime perspective. How about model performance?” So we took the original model, and I’ll get into this in a minute. We did some grid search and Bayesian optimization to replace the defaults, the Facebook Prophet defaults. That took our MAPE from 6.5% to 6.23%, so we got a nice little boost from simply tuning hyperparameters. And the business said, “Hey, this is great. Okay. You’ve been doing a pilot set of inventory items. Let’s do them all.” That meant a 20X increase in terms of what they’re asking the workload to do. So my runtime went to a now regrettable one VM, 4,000 forecasts, eight hours again. We’re back to eight hours, and the data footprint was over 150 gigabytes on the disk. So also not good, because my database is limited in size and I need to shrink the footprint.
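For readers less familiar with the metric, MAPE (mean absolute percentage error) averages the absolute forecast error as a share of actual demand. A minimal sketch follows. The demand numbers are made up for illustration, and only the 6.5% and 6.23% figures come from the talk.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero; a zero would raise ZeroDivisionError.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up example: errors of 10%, 5% and 0% average to 5%.
actual = [100.0, 200.0, 400.0]
forecast = [90.0, 210.0, 400.0]
score = mape(actual, forecast)

# Relative size of the improvement quoted in the talk (6.5% down to 6.23%).
relative_improvement = (6.5 - 6.23) / 6.5 * 100.0

print(round(score, 2), round(relative_improvement, 2))
```

On these figures, the drop from 6.5% to 6.23% is a reduction of roughly 4% in the error itself.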
08:25 Back to data engineering. We use clustered columnstore indexes. If you don’t know what that is, don’t worry about it. The outcome was our data footprint shrank back down to five gigs. So we kind of solved our problem with the scale expansion there, but we’re still living with this eight-hour runtime. So what can you do besides make it run faster? You can ask it to start earlier. Most of our jobs are simply scheduled to start at 4:00 in the morning or when we anticipate no one is on there. I built a little RapidMiner process to make it event-based. So it just checks. Are all predecessors done? Are all predecessors done? And the second they are, then it kicks off. So I saved myself 15, 20 minutes. Big win there. And then I’ll end on two things that I haven’t done yet, but we’re soon going to do, which will get us down to where we’re going. And that’s we’re going to use all six of the VMs. We’re going to do 4,000 forecasts in 27 minutes. Remember where we started. 200 forecasts, eight hours, so we’re much faster with a much larger volume of forecasts to do.
09:30 So now I’m going to just zero you in on the things where RapidMiner was integral to the solution. First was we wanted to tune those hyperparameters so the business was comfortable with the accuracy. So I took the function that you saw before, and I parameterized it, right? I just said, “Let’s allow the default variables to move around and shift, and we’ll pass what’s called a random grid search list of scenarios to test.” If I run it on a single VM, it’s going to take 60 hours, and I don’t want to wait 60 hours. I want to see the results tomorrow. So this is where I kind of hacked RapidMiner to do what I want, which is parallel parallel processing. That’s what I’m calling it. So the idea here was we’ve got a loop, a sub-process, and then six of these schedule processes, and they simply point to each of the RapidMiner queues that we have, twice. And the way the assignment works is when the listener gets the first job, it sends it to the first machine, and milliseconds later, the second job hits the listener and it sends it to the other one. So I’m taking my workload from running on one machine to now splitting it across six. So this is a little trick there with schedule process.
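The idea of drawing a random grid of scenarios and splitting it into six pieces, one per machine, can be sketched as follows. This is not the RapidMiner process from the talk. The talk does not say which Prophet parameters were tuned, so the search space below is an assumed example, and the helper names are hypothetical.

```python
import random

# Assumed search space for illustration only; the values are not from the talk.
SEARCH_SPACE = {
    "changepoint_prior_scale": [0.001, 0.01, 0.05, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}


def random_grid(space, n_scenarios, seed=0):
    """Draw n_scenarios random parameter combinations (duplicates are possible)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_scenarios)
    ]


def split_round_robin(items, n_workers):
    """Deal items out in turn so each worker gets a mutually exclusive batch."""
    return [items[i::n_workers] for i in range(n_workers)]


scenarios = random_grid(SEARCH_SPACE, 60)
batches = split_round_robin(scenarios, 6)  # six machines: three queues, two each

for machine, batch in enumerate(batches, start=1):
    print(f"machine {machine}: {len(batch)} scenarios")
```

Each batch would then be submitted as its own scheduled job, so the six machines work through their share of the scenarios at the same time.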
10:46 Once the grid search was done, we used the grid search results as a seed for what’s called Bayesian optimization. And again, all we’re doing is taking the existing function and just parameterizing certain pieces of it and calling an R package called rBayesianOptimization, which kind of balances the need to find a hot spot in terms of your parameters with searching an unsearched area. I’ll stop here for a second. Did anyone go to the hackathon yesterday? So one of the things I took away from that is low code is probably better than this much code. And so one of the things I have as a homework assignment is to figure out how to do this sort of thing in the native RapidMiner functions. So how did we do with grid search plus Bayesian optimization? I kind of gave the answer already. We got a MAPE improvement from 6.5 to 6.23 percent. We used our hack to schedule the sub-process across machines, and instead of taking 60 hours, it took 10. So the next day, I had the results ready for analysis. And for any of you who are sitting in the front row, you might be able to read the grid on the right, which is nothing more than a list of all the scenarios we tested, iterating over those default parameters. And you’ll see that rBayesianOptimization did a pretty good job at finding the hot spots out there.
12:11 So back to my small win, event-based processing, right? So I don’t want to kick this thing off at 4:00 to find out that not all predecessors were complete by 4:00, so I’m running on incomplete data. I also don’t want to kick my process off at 4:00 when all of the predecessors would have been done by 3:00 AM. So instead, I’ve kind of hacked RapidMiner to search for a token that says, “Everything is done. You can start now.” So now I run 15 minutes earlier, again, 15 minutes sort of thing. Just a quick snapshot of the event-based trigger, right, at the top. It’s just a loop saying how many times am I going to do this test? I used 60 based on empirical evidence only. Then I have a sub-process here that will throw an error and send me an email if it runs more than 60 times. So I’ve got notice that things didn’t perform the way they should. At the bottom, I have a SQL query that looks for the token I’m looking for that says everything is done, with a time stamp on it, and I wrap that up with Extract Performance, one of the native operators, and I say, “Is this binary condition met or not? No. Exit.” Trip everything else down the line. So what’s next, right? One VM, 200 forecasts, eight hours. This part’s in grey. I haven’t done it yet. I’m going to do it in the next week or two. The idea is to take this process the same way that I handled the hyperparameters, the grid searching, and to split it into six mutually exclusive pieces and hammer each of the VMs. So not everyone’s going to think throwing more cores at it is the sexiest solution, but that’s how I’m going to do it right now.
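The event-based trigger described here is, at heart, a bounded polling loop: check for the completion token, start as soon as it appears, and raise an alert after 60 unsuccessful checks. A minimal sketch follows, with the SQL token lookup replaced by a hypothetical `check_token` callable and the email alert replaced by an exception. The wait interval is an assumption, since the talk does not state one.

```python
import time


class PredecessorsNotReady(RuntimeError):
    """Raised when the completion token never appears (stand-in for the email alert)."""


def wait_for_token(check_token, max_attempts=60, wait_seconds=60.0, sleep=time.sleep):
    """Poll check_token() until it returns True or max_attempts is exhausted.

    check_token is a hypothetical callable; in the talk this is a SQL query
    looking for a time-stamped "everything is done" token.
    Returns the number of checks it took.
    """
    for attempt in range(1, max_attempts + 1):
        if check_token():
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # assumed interval between checks
    raise PredecessorsNotReady(f"token not found after {max_attempts} checks")


# Demo: the token appears on the third check; sleeping is stubbed out.
answers = iter([False, False, True])
attempts_used = wait_for_token(lambda: next(answers), sleep=lambda _: None)
print(f"started after {attempts_used} checks")
```

Passing `sleep` in as a parameter keeps the loop easy to test without actually waiting.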
13:54 And then the last thing. So we’re using Facebook’s Prophet model for the time series forecast. And if you kind of read up on the Git entries on where they’re at with that, there’s likely going to be a release in the short term where you can pass the function a true/false statement for, “I want you to do Monte Carlo uncertainty sampling.” It’s an expensive calculation to draw the uncertainty intervals on the graphs at the end. I don’t need the uncertainty intervals at this point. I’d like to turn it off. I could do it by downloading the source code and just commenting out that section, but then I don’t want to have to maintain code. And I remember someone earlier in one of the keynote sessions talking about maintaining code not being the most interesting thing. So I’m going to wait for Facebook to roll out the new one where you can simply pass it a false for doing Monte Carlo simulation. And the important takeaway there is you go from 1.3 hours of runtime to 27 minutes. So that’s where the gains in runtime come from. Lastly, Michael from Forrester said something about using optimization as a skill set on top of prediction, right? It’s a complementary skill set. That’s where we’re going next. Optimization problems really is the call-out there. And what was it I wanted to say that– he had a funny comment this morning. Use math to spend cash, something like that. That’s what we’re going to do there.
15:25 So I just want to kind of close, and give some time for some questions, on how RapidMiner helped me in all of this. It’s a low-code interface, right? So if you do it right, you don’t necessarily have any code. If you do it the way I did it here, you still get speedy development and speedy testing. Obviously, we’re integrated with the scripting languages. It’s great for orchestrating across systems. We’ve got the server, so we do all the server side. I don’t have to have my laptop tied up for several hours. It’s all done on the server side in parallel execution, right? With the last thing being the event-based hack that I put out there to get my job to start earlier in the day. So that’s how we achieve the goal. Highly accurate, highly scalable demand forecasts, with the problem that shared resources are limited. And my peer and partner data scientists are spinning up their own projects and gobbling up all those resources as we speak. So the solution is creative thinking to keep the footprint small. I went faster than I planned, so.