Christian König, Data Science Coach, Old World Computing
In the real world, many machine learning projects run into problems when the solution has to be deployed. In many cases the understanding of machine learning is too limited to specify the target solution at all. Hence a data scientist needs to approach this in an agile way, which requires the ability to swiftly create end-user interfaces that showcase results and make them “feelable”. After showcasing, the results need to be reusable for real deployment in order not to waste money, effort, and time.
In this presentation, Christian demonstrates a new extension that adds these abilities to the RapidMiner platform in a flexible and seamless way. RapidMiner processes are used to build the app and specify the data logic behind it.
00:02 [music] Thank you very much and thank you all for still being here. This will be, I think, the last talk for today on this stage. I have the best thing right now. I physically crashed my computer and I had everything set up, and three minutes before I have to talk, reboot, reboot. So I’m a little bit in panic mode right now but no problem. So Web apps. Do you know what a Web app is? Have you ever used some kind of Web app? We have some nods there. That’s good. We built, like Scott said, a new extension, the Web app builder extension. It’s an extension that builds Web apps. Why would we do that? Okay, let’s have a look at what we all are doing. We are following the very old standard of CRISP-DM, the cross-industry standard process for data mining. And as you can get from the term data mining, it was invented in the 1990s, 1996, I guess, and it has these different phases that the project usually follows. And in the first phases, the business understanding and data understanding, that’s the part where you usually talk to a customer, where you get to understand the business, where you get to understand the problem. What are they trying to solve? And then you go to the data understanding part. What kind of data do they have? Do they reflect the problem? That’s why there’s an arrow back that you get in your problems with the data that you need new insights. These phases are the collaborative phases of the CRISP-DM standard, and that’s where usually the customer is most involved. So he’s explaining you what the data means and you try to get to understand the problem. After that, that’s where we relax. No, that’s where we do the work.
01:57 That’s usually the data scientists crunching the data, preparing data, feature generation, modelling. So that’s usually when you’re alone. So you have a basic concept. You made some choice on what you want to achieve, some measure. And now it’s the part where you do the work. So you go back and forth, modelling, data preparation, new features, whatever you do. And at a certain point in time, you manage to achieve what was set before as the goal, what you want to achieve, a certain accuracy, whatever. So then after you evaluated it, you go to the customer and present your solution. Ha, ha. I have the perfect model here. And the customer, in our past experience, sometimes says, “What? What is that?” So as an engineer, from our perspective, we did everything right. We built very good models that could create great predictions, but the customer simply couldn’t use it. He didn’t understand it. He didn’t trust it, or he had something completely different in mind. The solution we proposed was not what he expected. So in the first phases, when we talked to them, there was a slight misunderstanding. So the customer didn’t know enough about data mining or data science and we didn’t understand the problem enough. So when we thought we understood what we needed to do, we started doing, doing, doing, and we reached a point very late on where it became apparent that this is not what the customer wanted. It is not useful. We have now invested time in a model that’s simply not going to be used. The customer’s angry. And that’s the reason why CRISP-DM has this outer circle. We have to start over from the beginning.
03:57 To avoid this, to avoid the problem that you are working alone, that you cannot give the customer any feedback, it would really be cool if we had a method of involving the customer early on in how a solution might look, because most people won’t ever have RapidMiner Studio open or use the visualizations of RapidMiner Studio. And for these customers, for the end users, it is much more convenient to have some kind of Web app. And that is the reason why we chose to create an extension to easily make Web apps, to have an early demonstrator in the project. So even when you don’t have a model yet, you can start by asking the customer, “If we did it, if we met your criteria, is this what you want from us? Do you expect the solution to look maybe like this?” When we have a first model, however bad it is, we can use it. We can show the customer this is the way the predictions will look. Is this what you can use to improve your business? The deployment that is usually at the very end of the circle has to be actionable and it usually has to support feedback. So when a user gets a prediction, he usually has to act on it. And it can be that he gives feedback so that the model can improve. Web apps can help break these long cycles in which you arrive at a model that you are quite confident in, which is good, but there is no real solution for the customer.
05:45 And you break the long cycle in that you have a demonstrator very early on that you can show to the customer, and the customer can very early on say, “Oh, no, that won’t work for us. We have special tools. We need another solution to deploy.” And the “What do I get out of the project?” question, which people who are not data scientists usually ask, can be answered this way really, really early on. And Web apps as an answer are visible, touchable and interactive. So I will show you a demo afterwards where the user can really fiddle with the Web app and see that there is something behind it that we really did. It helps non-data scientists to understand the benefits and possibilities because it’s much more visually appealing than having some Python code or having a RapidMiner web service exposed. And for us, it avoids frustration at the end of the project. And for the customer also, it avoids overspending if he wants to have other results, because he can communicate it very early on. So why with RapidMiner? The answer to this is that we are in a data-driven context. So when we do a data science project, we already have lots of data and we usually build models on the data. So we want to show the results also based on this data. And since the data is already available in RapidMiner and processed, why not stick to the tools we know? And most of the required functionality is already implemented in RapidMiner. So the basic concepts like loops and stuff, it’s already there. I don’t have to program it. I don’t need a specific web programmer to create a solution for me.
09:43 So it enables adaptive apps that can be used in dynamic contexts. As an example, if you have input fields that depend on some kind of data that a customer has to enter, in a more dynamic context, you don’t have to program that in your Web app. You can simply loop over all the attributes that you require of the customer and the form is automatically built upon this data. The data can be changed and can be entered effectively by users because there’s a direct back channel, so the data that is entered by the customers can be used directly in the RapidMiner processes that are running in the background. So let’s take a look. Please fasten your seatbelts because we are going to the third dimension. And I’ll give you some time. This is, after some tests, some internal tests, the best representation where you don’t get crazy. I tried it with alt-tabbing between a browser window and RapidMiner Studio, up to 15 switches, and it was messed up in my head and it was messed up in the viewer’s head. So please bear with me. In our experience, in 30 seconds, you’re getting used to it. So what do we see here? On the left side, we have a partial view of RapidMiner Studio. And on the right side, we have a blank web browser. The core of our extension is always the Create Web App operator. That operator creates HTML or JSON code whenever it is called and returns it. And when this is exposed as a web service, it acts as a real Web app.
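The idea of looping over the required attributes so that the form builds itself can be illustrated outside RapidMiner. The following is only a minimal Python sketch of that idea, not the extension's implementation: the function name `build_form`, the `/submit` action URL and the attribute names in the usage line are all hypothetical.

```python
from html import escape


def build_form(attributes, action="/submit"):
    """Return an HTML form with one text input per attribute name.

    `attributes` is any iterable of attribute names; the form grows or
    shrinks with that list, so nothing is hard-coded per field.
    """
    rows = []
    for name in attributes:
        safe = escape(name, quote=True)  # keep odd names from breaking the markup
        rows.append(
            f'<label for="{safe}">{safe}</label>'
            f'<input type="text" id="{safe}" name="{safe}">'
        )
    body = "\n".join(rows)
    return (
        f'<form method="post" action="{escape(action, quote=True)}">\n'
        f"{body}\n"
        '<button type="submit">Send</button>\n'
        "</form>"
    )


# Hypothetical attribute names, used only to show the call.
print(build_form(["glue_temperature", "line_speed"]))
```

Adding an attribute to the list adds a field to the form without touching the layout code, which is the point made in the talk.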
11:31 Inside of the Create Web App operator, you can now modularly, you know what I mean, with operators, enter some things that you want to show. Of course, the hello world label as the first test will show the string hello world. But since hello world is nothing that a customer would pay you for, let’s start doing something which will finally lead to the demo I will show at the end. Inside the Web app, we see it on the left side, we have the preview of the process. We want to have a new window that is divided into a nice top bar and something down below where we want to show our results. So we just take two layouts and name them top bar and main panel. And if we refresh our browser, it will look like this. Of course, there’s nothing inside yet. So let’s put something inside. On the left-hand side, you’ll see that inside of the top bar, we now created three columns and we will populate them with a nice logo, with a company name and with three graphics. And the interesting thing is that these three graphics are based on a variable state. So it’s a traffic light that is either red, yellow or green. And based on the data that is there, the correct label will be shown. If we refresh our browser, we see everything but the traffic lights, but they will be there in the demo. So let’s continue. We want to have something in the bottom, and for that we simply enter the lower layout and create two columns there. On the left-hand side, we create a sidebar to be able to control what is shown, and on the right side a multi view.
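The traffic light is a graphic chosen from a state in the data. How that state is computed is not spelled out in the talk, so the following Python sketch is only one plausible rule, assuming a measurement with an allowed range and a warning margin near its limits. The function name and all numbers are hypothetical.

```python
def traffic_light(value, low, high, margin):
    """Map a measurement to 'green', 'yellow' or 'red'.

    red:    value lies outside the allowed range [low, high]
    yellow: value is inside the range but closer than `margin` to a limit
    green:  value is inside the range and at least `margin` from both limits
    """
    if value < low or value > high:
        return "red"
    if value - low < margin or high - value < margin:
        return "yellow"
    return "green"


# Placeholder numbers, not taken from the talk.
for reading in (50, 41, 61):
    print(reading, traffic_light(reading, low=40, high=60, margin=2))
```

The returned string could then select which of the three graphics is shown.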
13:28 A multi view is a dynamic view where you can switch between different views, and it is controlled by the sidebar. If we refresh our browser, we already see the sidebar, which we configured with monitor, plan and orders. And on the right-hand side, we want to have something to show. So we go there. We created a monitor, a production plan and orders. You will see that in action later on. So these are three different views that you can interchange. And on the first one, on the monitor, we simply create big Highcharts graphics to have something visible to the user. You can see that there’s an operator before it, because Highcharts has very, very many options, and if we wanted to encode them all in one operator, it would be a real mess. So we decided it should be possible to put JSON configuration code directly into this operator for additional special parameters. Inside of the Highcharts operator, we create a line plot, and if we refresh now, we already have something that is quite useful to the customer. This may be something that the customer, the end user that is going to use our predictions, already has in his dashboard. So we are very early on and very quickly trying to create something that the customer can use and can understand. If we continue and add some more line plots and a range plot, it will lead to more lines being drawn. And for our demo, we also put a timer in there.
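Highcharts charts are configured through a nested options object, which is why a free-form JSON block for special parameters is convenient. The Python sketch below illustrates the general pattern of a small default configuration merged with extra JSON-style options. It is an illustration under assumptions, not the operator's actual behaviour: the helper names `deep_merge` and `line_chart_config` are hypothetical, and the data values are placeholders. The option keys used (`chart.type`, `chart.animation`, `title.text`, `yAxis.title.text`, `series[].name`, `series[].data`) are standard Highcharts options.

```python
import json
from copy import deepcopy


def deep_merge(base, extra):
    """Return a copy of `base` with `extra` merged in; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def line_chart_config(title, series, extra_options=None):
    """Build a Highcharts-style options dict for a line chart.

    `series` maps a series name to its list of y values; `extra_options`
    plays the role of the additional JSON configuration block.
    """
    base = {
        "chart": {"type": "line"},
        "title": {"text": title},
        "series": [{"name": name, "data": list(data)} for name, data in series.items()],
    }
    return deep_merge(base, extra_options or {})


config = line_chart_config(
    "Live monitor",
    {"Paper humidity": [7.1, 7.3, 7.2]},  # placeholder values
    extra_options={"chart": {"animation": False}, "yAxis": {"title": {"text": "Humidity"}}},
)
print(json.dumps(config, indent=2))
```

Merging key by key means the extra block can override or extend a single nested option without restating the whole configuration.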
15:19 This timer is used to get new data periodically, like every second, from a data source so that this chart is automatically updated live. You can see that on the right-hand side, so the chart will automatically be updated. On the bottom, the dashed line should indicate our predictions. And just seeing something is boring, so we decided we want to be able to control something. So on the right-hand side, we add some controls with the tab view so we can configure our live monitor. Inside, we put two elements, the live monitor and the control panel. And inside the live monitor, we add a label so you can have a description. We can put in a history slider, which is the horizontal slider component, the component that lets you change some values. To submit these values, we also have a button component, which you simply connect, and you now have a Web app that can be used to change the settings of your underlying control program. I will show you now how this looks in the browser.
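The timer and history slider can be pictured as a rolling window over incoming readings. The Python sketch below makes two assumptions that the talk does not confirm: that each timer tick fetches exactly one new value, and that the history slider sets how many recent points the chart keeps. The class name `LiveSeries` and the fake data source are hypothetical.

```python
from collections import deque


class LiveSeries:
    """Keep only the most recent `history` readings for a live chart."""

    def __init__(self, history):
        self.points = deque(maxlen=history)

    def tick(self, read_value):
        """Called once per timer tick: fetch one reading and return the current window."""
        self.points.append(read_value())
        return list(self.points)

    def set_history(self, history):
        """Called when the slider changes: resize the window, keeping the newest points."""
        self.points = deque(self.points, maxlen=history)


# Fake data source standing in for the production system.
readings = iter([7.0, 7.2, 7.1, 7.4, 7.3])
series = LiveSeries(history=3)
for _ in range(5):
    window = series.tick(lambda: next(readings))
print(window)
```

In a real app the tick would be driven by the timer component rather than by a loop.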
16:56 So here’s the Web app, which we now filled with all the content. We built this for a company that creates cardboard boxes, and these cardboard boxes have some components that are going into the production, and we want to optimize the speed at which we can create cardboard. So we have some values that are shown live to us, something like the glue temperature. We have the production line speed, which should be as high as possible so we make much money. We have steam temperature because when we apply glue to the paper, it gets wet, and wet paper is not good. So we have some hot steam that tries to dry the paper. The thing that we are controlling is the paper humidity, which should be in a certain range to have a good quality of paper. What you see here is the prediction of how the paper humidity will look. To be able to see it better, I will turn off the other parameters. And you will see, okay, now in the range of our values, we see that we have some live data for our paper humidity and we have a prediction from our model of where, with the current settings, this will go. Now, we added a control panel, which is right now in automatic mode. This is a, yeah, very bad simulation that, yeah, is rule based more or less. So if the humidity of the paper is too high, then the temperature gets higher, and you will see that the line is very much going up and down.
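The automatic mode is described as a simple rule: if the paper humidity is too high, the temperature is raised. A minimal Python sketch of such a rule is given below. The talk only states the "too high" branch, so lowering the temperature when the humidity is too low is an assumption added here for symmetry, and the function name, thresholds and step size are all hypothetical placeholders.

```python
def adjust_steam_temperature(humidity, temperature, humidity_min, humidity_max, step):
    """Rule-based controller sketch: one adjustment per control cycle.

    - humidity above `humidity_max`: raise the temperature by `step` (as in the talk)
    - humidity below `humidity_min`: lower it by `step` (assumption, not from the talk)
    - otherwise: leave the temperature unchanged
    """
    if humidity > humidity_max:
        return temperature + step
    if humidity < humidity_min:
        return temperature - step
    return temperature


# Placeholder numbers, used only to show the three branches.
print(adjust_steam_temperature(9.0, 150, humidity_min=6.0, humidity_max=8.0, step=5))
print(adjust_steam_temperature(5.0, 150, humidity_min=6.0, humidity_max=8.0, step=5))
print(adjust_steam_temperature(7.0, 150, humidity_min=6.0, humidity_max=8.0, step=5))
```

A fixed step with no damping tends to overshoot, which fits the remark that the line goes very much up and down.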
18:50 Now, if you are the end user and you want to control something, we added the opportunity, the possibility to change some of the values. So the production line speed is something that we want to have as a very high value. And if I do that and send it to production, we should see that there is some rise. So with this easy Web app, it is quite possible to directly have a working prototype that can send data back to production. There are other panels, which I can quickly show. So we also added a production plan where I can see the current data from our production system with details of some jobs. And we have the orders for some customers. We have the possibility to upload new orders directly into the Web app and, yeah, okay, it should work. With that, we can confirm the contents, which are now showing at the bottom of the current orders tab. So very much interactive, lots of possibilities for the customer to directly interact with the model or the thing that we built, and all that is done in RapidMiner. So no one has to know how to code HTML, JSON, or stuff like that. Thank you. [music]