New Tools from RapidMiner Labs

Gisa Meier, Software Engineer, RapidMiner

Gisa has built a brand-new extension for RapidMiner that can take a process and, with a few clicks, wrap it into an extension, complete with an icon. In this presentation, Gisa demonstrates this code-free method of building extensions.

00:03 Okay. So this is about this new extension that was put on the marketplace earlier today. So what can we do? If you have something like a building block– we had these before and we shared some of those, but we couldn’t really share them as operators. That would be much nicer. So now I will show you how you can do it. I just take this process here inside this nice building block created by VP some time ago, and create a new process out of it. Now it’s down there. Let’s put it here. What it’s doing is using different weight operators and then creating an average over them. I can show you with an example how it works. So this uses the Deals data set. If I run this process now, it gives me some importances for my attributes.

01:17 So how can I create an operator out of this quite big process? I added this new Create Custom Operator option. So now I give it a name – Average Weight – and now I need an icon. Let’s search for something with weight. This one looks nice. So it’s just a weight. It’s pretty easy. I am copying this here. Now I should write a longer synopsis here. And a description, the even longer thing down there. But since you don’t want to wait for me to write so much, it’s not such a nice description.

02:10 So the interesting part comes now because now you can use some of the parameters in the process to create new parameters of the operator you want to create. And I’m choosing the sorting direction of the last operator in the process. And I don’t like Sort Direction. Let’s call it Sort Order. And I also adjust the description, now Sort Order again. So do you see you can change the name of the parameter and give it a completely new description if you want to? And now I save this operator here in a new empty folder I created. Okay. Now I have saved this description.
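
The idea behind the wrapped process, averaging attribute weights from several weighting schemes and exposing the sort direction as a parameter, can be sketched in plain Python. This is only an illustration and not the process from the demo: the function name `average_weights`, the accepted `sort_order` values, and the sample numbers are all assumptions made for the example.

```python
from statistics import mean


def average_weights(weight_sets, sort_order="descending"):
    """Average per-attribute weights coming from several weighting schemes.

    weight_sets: list of dicts mapping attribute name -> weight.
    sort_order:  hypothetical stand-in for the exposed "Sort Order" parameter.
    """
    if sort_order not in ("ascending", "descending"):
        raise ValueError("sort_order must be 'ascending' or 'descending'")

    # Only attributes that every scheme has weighted can be averaged.
    common = set(weight_sets[0])
    for weights in weight_sets[1:]:
        common &= set(weights)

    averaged = {name: mean(w[name] for w in weight_sets) for name in common}

    # Return (attribute, averaged weight) pairs ordered by weight.
    return sorted(
        averaged.items(),
        key=lambda item: item[1],
        reverse=(sort_order == "descending"),
    )


# Made-up weights from two imaginary weighting schemes.
by_scheme_one = {"age": 0.5, "income": 1.0, "zip": 0.0}
by_scheme_two = {"age": 0.25, "income": 0.5, "zip": 0.25}

for name, weight in average_weights([by_scheme_one, by_scheme_two]):
    print(name, weight)
```

Renaming the parameter for the user, as in the demo, would correspond to nothing more than choosing a different name and description for `sort_order`; the logic underneath stays the same.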

03:05 What can I do now? I go to Extensions; I create a custom extension. So what’s the name of my extension? My First Extension. I don’t fill in Vendor and Homepage; by default, it’s just RapidMiner. But if you create your own extension you should definitely do this. And it’s not a 1.0 version, it’s more like 0.1. Now I have to supply the folder where I stored the operator. So it was this one. And now I can choose a color. What about light green? No. More light blue. It’s more of a blue day today [laughter].

03:58 Okay. So we ignore this for now. I press Okay, and it’s creating my extension in the background. It takes a bit of time because it creates an extension jar. And now it has been created. And I want to load it. Where is it?

04:30 It should be there [laughter]. Can you look in the Operator section? It’s not there. That’s fine.

04:38 Yeah. Of course if you–

04:41 Live demos.

04:42 Yeah, live demos [laughter].

04:43 Live demo.

04:45 Now I am doing it by hand. There it is. Now I’m loading it by hand. Of course, when we checked it before it worked perfectly. But that’s live demos. Here’s this extension. It has the Average Weight operator. And now if I look in my installed extensions, there’s My First Extension. Version 0.1. Okay. So I can now use this new operator. Just bring it in. Yeah. Let’s try again with the Deals and not– run my process, and I get these importances again. And I have here my Sort Order. I can change it to descending and also see the description. And below here, my great synopsis and description [laughter]. And here, the description of the parameters. So I change it now to descending, and if I run it again then it has changed.

06:05 Okay. So this was something out of a process. There are other things that we can use. For example, something with scripting. And that’s mostly Python or the scripting operators. So let me show you Python first. So that gives you a Python operator, and we look at the first tutorial process. So what does it do? It uses the data, applies k-means from sklearn.cluster, and returns the clustered data. Here we have this label thing. It would be nice to use the utilities we talked about before, but for now I am just deleting this row. And I create a process out of it again, where I connect the input port.
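
For readers who have not seen such a script, here is a rough sketch of what a clustering script of this kind might look like. It is not the tutorial process from the demo. It assumes the `rm_main` entry-point convention of RapidMiner’s Python scripting operator, which receives the data as a pandas DataFrame; the constant `K`, the column name `cluster`, and the sample data are assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Number of clusters. In the demo this value is supplied from outside the
# script; here it is a plain constant so the sketch runs on its own.
K = 2


def rm_main(data):
    """Cluster the incoming rows and return them with a cluster column."""
    model = KMeans(n_clusters=K, n_init=10, random_state=0)
    clustered = data.copy()  # leave the input untouched
    clustered["cluster"] = model.fit_predict(data)
    return clustered


if __name__ == "__main__":
    # Two obvious groups of points, made up for illustration.
    sample = pd.DataFrame({"a": [0.0, 0.1, 10.0, 10.1],
                           "b": [0.0, 0.1, 10.0, 10.1]})
    print(rm_main(sample))
```

Note that the script hands every column to k-means, so it only works when all columns are numeric.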

07:21 Here’s the macro K, which is used inside the process here as the number of clusters. And what I want to do now is to use this macro as the parameter in my new operator. So Python is doing– believe me, there is no snake. Let’s take a run. And again, I should write some synopsis and description. So the interesting part now is I want to take the value of the K thing here. Now, I’m calling it K, and say it’s the number of clusters. Okay. Saving this again in my folder from before. And now I can create an extension out of– again, I’m putting it in the same extension as before. My First Extension. And no, it’s not enough to call it 0.2. Let’s stay with 0.1. I’m giving it the same folder as before. Here it is. Yeah. I tested the blue. I didn’t like that one, so let’s take this blue. And we’re doing it again.
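
RapidMiner macros are written as `%{name}` and are replaced by their value wherever they appear in a parameter. Turning a macro into an operator parameter means the user’s input ends up in that placeholder. The snippet below only imitates this textual substitution in plain Python to make the idea concrete; the helper `expand_macros` and the one-line script template are hypothetical and say nothing about how RapidMiner implements it internally.

```python
import re

# Matches placeholders of the form %{name}.
MACRO_PATTERN = re.compile(r"%\{(\w+)\}")


def expand_macros(text, macros):
    """Replace each %{name} placeholder in text with its value from macros."""
    def replace(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"undefined macro: {name}")
        return str(macros[name])

    return MACRO_PATTERN.sub(replace, text)


# Hypothetical script line that refers to the macro K.
script_template = "model = KMeans(n_clusters=%{K})"

# The value the user typed into the new operator's K parameter.
print(expand_macros(script_template, {"K": 3}))
# prints: model = KMeans(n_clusters=3)
```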

08:57 Probably, the reloading won’t work again [laughter]. But we can try. So now you can– yeah? No, of course not. So I’m doing it the longer way. About this hack that I’m doing now, Jan will talk in a second. So now here is my second operator, Python clustering. Dragging it in again. And when I run this process now, you can see it takes some time because it calls Python in the background, and it fails. So now I can go to Open Process and look at what the cause of this failure is. No, I don’t want to save it. And now it says the Python script failed. Oh, it could not convert string to float. Okay. It doesn’t like this one. So I want to show you an example that works.
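
This kind of failure is what Python reports when a text value, such as a label column, reaches code that expects numbers. The following sketch reproduces the message in plain Python and shows one possible guard, keeping only the columns that parse as numbers. The helper `to_numeric_rows` and the sample rows are made up for illustration and are not part of the demo.

```python
def to_numeric_rows(rows):
    """Keep only the columns whose values all parse as floats."""
    numeric_columns = []
    for column in rows[0]:
        try:
            for row in rows:
                float(row[column])
        except (TypeError, ValueError):
            continue  # at least one value is not a number: skip this column
        numeric_columns.append(column)
    return [{c: float(row[c]) for c in numeric_columns} for row in rows]


# Made-up rows with two numeric attributes and a text label.
rows = [
    {"attribute_1": 0.02, "attribute_2": 0.04, "class": "Rock"},
    {"attribute_1": 0.05, "attribute_2": 0.01, "class": "Mine"},
]

# Converting the label directly raises the error seen in the demo.
try:
    float(rows[0]["class"])
except ValueError as error:
    print(error)

# After filtering, only the numeric attributes remain.
print(to_numeric_rows(rows))
```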

10:31 I think this one worked before. So that’s when we–

10:42 Try Sonar.

10:43 Try Sonar. Yeah, yeah. Sonar. It doesn’t like me today [laughter]. I tried it just before and it worked.

11:00 Too late now.

11:02 Yeah. You still have the label in there because you removed that from the scripting operator, I guess.

11:08 Yes. Then let’s– include the selection. Should I do this?

11:20 In Select–

11:21 Single.

11:21 Yeah.

11:34 Ah, finally. So now it has two clusters as you can see here. And if I change this number to a higher number then it will of course get more clusters. And if, without an error, you want to look into the process, you can do it like this and open the process inside. So I jumped a bit ahead when I used this workaround to load the extension because– this is something that Jan built. So I hand over to Jan.

12:13 So – I don’t know – who’s been around from, let’s say, RapidMiner 7 or so? Show of hands. Cool. Who’s actually developing extensions themselves? A few. Okay. So who’s still using Eclipse to do this? [laughter] So that’s good. So a lot of people switched over to IntelliJ, I guess. So way back when, I created a little extension called Rapid Development that tried to help extension developers get an easier start in Eclipse with developing an extension and redeploying it into RapidMiner. And now we just this morning released an updated version that is signed, so it’s working now with newer RapidMiner versions as well. So what you saw Gisa do is just reload extensions on the fly. So we all know RapidMiner sometimes takes some time to start. Right? Sometimes it’s a little bit sad. So this is just– what this allows you– it can do much more than that. But it basically allows you– if you already have an extension downloaded, you don’t necessarily have to restart RapidMiner. This might not work with all extensions. Be warned. It’s still a little bit in a beta phase there. But it’s basically just running it through all the things you have to run through anyway when you start up RapidMiner.

13:36 So let’s actually have a look, because what Gisa did is quite amazing, but we do have another colleague who just built on that. So let’s see. Who of you knows Balash Barani? Yeah. So he built two new extensions. Oh, actually three, sorry. Changing a lot of stuff here. So as you can see, I’m searching for Envy because I know his extension is called Database Envy. It’s not there yet. So as you might have known, the search will just show you that it’s here. That’s perfect. We will add this. And while we added– oh, no. The other one, we already have. Right? So we can just download this now; go through the normal steps here. No, we don’t want to restart now, thank you, because this beauty is coming in. So we’re going to manage the extensions. We have the Database Envy extension here. Just open it. Come on, run. There you go. And now you can already see the operator search goes through there. So we now have these two new operators in here, which are from this extension. So how long did this take? Half a second or so? Right? So just find the JAR from your extension. Just press Okay, check file now, load it up, and you’re done; you can use it.

15:08 So now let’s actually look at what Balash built here. Come on. Recent processes. It is this one. Who of you always wanted to join things not just based on equality but also on inequalities and stuff like this? So databases can do it. Right? So RapidMiner should be able to do it. So what Balash did here was just expose the first and second attribute that you want to merge on, and then just a simple expression that you might have seen already if you are using some of our other expression-based operators. Right? So you just say like, “Okay. A times B should be between 10 and 20,” so we join on that. And the other one is just, take the absolute difference between these two attributes and make sure that the distance is smaller than 1. As you can see here, A and B are a little bit ambiguous. So A and B just stand here for the first and second attributes.
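
The two conditions mentioned here can be pictured with a small plain-Python sketch of a join on an arbitrary expression. This is not the extension’s code: the function `expression_join`, the column names `A` and `B`, and the sample rows are assumptions made for the example, and the two inputs are assumed not to share column names.

```python
def expression_join(left, right, condition):
    """Merge every (left, right) row pair for which condition holds."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if condition(left_row, right_row)
    ]


# Made-up inputs: A is the first attribute, B the second.
first = [{"A": 2}, {"A": 5}]
second = [{"B": 3}, {"B": 6}, {"B": 1.5}]

# "A times B should be between 10 and 20"
product_join = expression_join(
    first, second, lambda l, r: 10 <= l["A"] * r["B"] <= 20
)
print(product_join)  # [{'A': 2, 'B': 6}, {'A': 5, 'B': 3}]

# "the absolute difference should be smaller than 1"
distance_join = expression_join(
    first, second, lambda l, r: abs(l["A"] - r["B"]) < 1
)
print(distance_join)  # [{'A': 2, 'B': 1.5}]
```

This sketch checks every pair of rows, so its cost grows with the product of the two input sizes.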

16:13 So if you just run this– this takes a little bit. For this one here– which ones did you actually take? It was input 1, attribute 1, and input 2. So let’s just put those two next to each other. And believe me, we checked this again. So if you multiply those two, they are always between 10 and 20, so that’s nice. And yeah, that’s basically– you can now do what databases could do a long time ago. But it’s in there. So you can use it and also we can just have a look at how we did it. Yeah.

16:57 What’s the name of it?

16:58 What’s the name of that?

16:59 Oh, sorry. Yeah. Let me show you. It’s just called actually Database Envy because we are always–

17:04 [crosstalk].

17:05 Oh it’s called Database Envy since we are envious of the database doing this. Right? And as you can see, it’s made by Balash and you can find his website here. It’s already on the marketplace, as you just saw me download it. So if you want to check it out feel free. And as you can also see, if I just click down here, there’s a nice synopsis and description and whatever else. So what Gisa’s extension also lets you do, it lets you customize a lot more of your extensions. So if you want to go big on documentation, you can do that. I think there’s a Readme file of that. That–

17:40 We will provide documentation how to use it, though.

17:42 Yeah, okay. But we will provide you with some documentation on how to use that. So the basic idea is this is actually a script operator that he was using. So if you have data similar to what Martin just showed with Python, we can of course do the same with just a normal scripting operator, Groovy. And so you can bundle all these things up and just share it with your colleagues, or even put it on the marketplace as Balash did.

18:07 And the other one, I will just briefly show. It’s here. Right? Yeah. So I don’t know if you could read this. So again we are loaded. So have a look in Start Extension. And what he did now is processing. So again, something that’s a little bit lacking in RapidMiner. But what Balash did here now is he built up something that’s using not just core functionality, but he’s even using more libraries in that. And so this is all just bundled up as one. You don’t have to code. You just know like, “Okay, I need this, this, and this.” Put it all together and just deploy it. And that’s basically it. [applause]

