The Power of Collaboration for Data Science Teams

RapidMiner AI Hub (formerly RapidMiner Server) makes it easy to share, reuse and operationalize the predictive models & results created in RapidMiner Studio. RapidMiner AI Hub’s central repository & management, dedicated computation power and flexible deployment options support analytic teamwork and rapidly put results into action.

Watch this RapidMiner AI Hub product showcase to learn how to:

Build processes and libraries for use across your team
Create working groups to share data sources & models
Create a team dashboard to visualize your results
Accelerate predictive analytic process development & deployment

Hello, everyone, and thank you for joining us for today’s webinar – The Power of Collaboration for Data Science Teams: A RapidMiner Server Showcase. I’m Hayley Matusow with RapidMiner, and I’ll be your moderator for today’s session. I’m joined today by Tom Ott, our Marketing Data Scientist here at RapidMiner. Tom will get started in just one minute, but first, a few housekeeping items for those on the line. Today’s webinar is being recorded and you’ll receive a link to the on-demand version via email within one to two business days. You’re free to share that link with colleagues who are not able to attend today’s live session. Second, if you have any trouble with audio or video today, your best bet is to try logging out and logging back in, which should resolve the issue in most cases. Finally, we’ll have a question and answer session at the end of today’s presentation. Please feel free to ask questions at any time via the questions panel on the right-hand side of your screen. We’ll have some time at the end to get to everyone’s questions. I’ll now pass it over to Tom.

Thank you, Hayley, and good morning, everybody. I’m excited to be here to talk to you today about the power of collaboration using RapidMiner Server. Now, RapidMiner Server is something that was created, pretty much, to start collaboration. Many years ago, when it was first incubated in Dortmund, it was really to help the internal team organize libraries of processes and models and so forth, and quickly share them instead of passing around a USB stick or e-mailing the processes to one another. So really, RapidMiner Server, at its base, was started in collaboration. So let’s go down here. So for today’s agenda, we’re going to do an introduction and then we’re going to do an overview of RapidMiner Server and then we’ll go with the demo. Of course, as Hayley mentioned, we’ll leave some time for Q&A, and I’m pretty excited to hear your questions. So let’s do the introduction.

Where does RapidMiner Server fit into our ecosystem? Well, if you see here, at the top under design, we have RapidMiner Studio. You can’t really offload processes or do your modeling or productionalize things on RapidMiner Server without RapidMiner Studio. And really, RapidMiner Studio is the design tool where you build your processes, put them on the collaboration server, and then allow your colleagues or your AWS or whoever it is to review your processes, test your models, and so forth. Of course, we also have, below RapidMiner Studio, the RapidMiner Radoop Environment, which also works really nicely with RapidMiner Server too.

So what is RapidMiner Server? Well, I touched upon the first thing. It’s a collaboration server, a place where you exchange models, test them, write processes that your team can reuse over and over again, and so forth. It’s also a compute engine. It allows you to crunch larger data sets. And by that, what I mean is that RapidMiner Server could sit on its own independent server, you could load it with as much memory as you want – 64 gigs, 128 gigs, 32 gigs, if you want to go small – and use that to pull in larger data sets from databases or from other places and crunch them in the memory of the RapidMiner Server environment. And finally, it’s a deployment engine. It’s where you expose REST APIs and web services, build dashboards, and so forth. We’re not going to focus so much on the compute and deployment part in today’s webinar, though it’ll be part of another webinar coming up in the future. But we will talk a little bit about dashboarding, which is part of the deployment engine and which you can use to quickly visualize your results with your team.

So why collaboration? Well, I’ve already touched on a few of those things at the beginning. It’s really the ability to share data sets and data connections, databases, and so forth. It’s also the ability to create and share processes and models. I could create a process in my RapidMiner Studio, save it to the server, give my colleague, Jeff, in Texas access rights to view it, to test it, and my colleague in Germany, Martin, may have developed a model which I could use to score the data on my side of the pond. Okay. And also, it allows you to create and work in groups and build libraries. Now, we typically always work in a team. We typically work on projects. So what happens a lot of times is people will use a server to create a group – a working group or a project group – and everybody on the team is provided access and they can build their processes, share their data, do all that type of work, and then, also, interestingly enough, create their own set of libraries. They can create global libraries, meaning that everybody in the office or on the team or at the company can use a predefined set of processes. Normally, this is done for best practices. For instance, the example I will give is, if you work in one company, they may have a specific naming convention for their data or their data tables. And so you can build bits of processes that you could always reuse for every type of thing that you do, like maybe ETL, or maybe you use a standard type of modeling technique using, say, logistic regression, and so forth. So you can build these libraries together and then drag and drop them in as you do your work or your team’s work. Okay.

So a quick overview of RapidMiner Server. We’re going to do some general and some technical review of what it is. It can be installed on Linux, on Windows, or on any type of Unix-based system. It needs a database as a backend, and this database is really used for RapidMiner Server to do its backend stuff: storing data, storing your processes, running cron jobs, and so forth. It does support LDAP and Active Directory. So I will show an example of how we create a user in a manual way, but if you have LDAP or Active Directory, you can merge that together with the server and do it very effortlessly. And of course, the best part about what RapidMiner Server can do is you can offload a process that you built on your laptop in RapidMiner Studio, or even a Radoop big-data process, save it to the server, and then click “Run” on the server, thereby freeing up your machine to do other tasks. And like I said, it also has some native dashboarding capability – mostly HTML5 based, and it supports some older versions as well. So technically, it runs on Java 8, and it runs under the JBoss system. As for the database backend, it could be open-source, like PostgreSQL or MySQL. Now, it could also be proprietary, like Oracle or Microsoft SQL Server. As long as you have a JDBC driver, you could pretty much use any database at the backend, provided you make a couple of tweaks to some XML files. And like I said, it can be installed anywhere in your corporate network as long as you have an IP address which you can then log into.
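As a rough illustration of the JDBC point above, the sketch below builds connection URLs for a few common databases. The URL patterns and default ports are the standard ones for each driver; the helper name `jdbc_url`, the host, and the database name are placeholders for illustration and are not taken from RapidMiner Server's own configuration.

```python
# Hypothetical helper: builds a JDBC connection URL for a few common databases.
# Host and database names used below are placeholders, not real servers.

JDBC_TEMPLATES = {
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "oracle": "jdbc:oracle:thin:@//{host}:{port}/{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
}

# Conventional default ports for each database.
DEFAULT_PORTS = {"postgresql": 5432, "mysql": 3306, "oracle": 1521, "sqlserver": 1433}


def jdbc_url(kind, host, database, port=None):
    """Return a JDBC URL string for the given database kind."""
    if kind not in JDBC_TEMPLATES:
        raise ValueError(f"no template for database kind: {kind}")
    if port is None:
        port = DEFAULT_PORTS[kind]
    return JDBC_TEMPLATES[kind].format(host=host, port=port, database=database)


print(jdbc_url("postgresql", "db.example.com", "rapidminer_server"))
```

Whatever database is chosen, the matching JDBC driver still has to be available to the server; the URL alone is not enough.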

So for instance, in today’s example, I will be showing you a local installation of RapidMiner Server that’s on my laptop and I have full admin rights to it. And that happens to be at my localhost, port 8080. However, when we’re doing the demo, I will be showing you a connection to a server that sits in Dortmund, Germany. Actually, we’ll have a URL. In this case, it’s sales.rapidminer.com and there’ll be some credentials that I can log in with. So really, that server can sit anywhere in your company. You could have your team of data scientists, say, in Moscow and you can have the server sitting in San Francisco, and you may have a team in Texas that logs in to consume the models that your team in Moscow has created. It’s that flexible. As long as you’ve got an internet connection, or an internal network connection, you can put it anywhere. And best of all, it’s a very simple installation process. As long as you have your database and you have Java 8, it really is an eight-step process to install. Okay.

With that, I’m going to now take a moment to pause this and switch screens into RapidMiner Studio, and also into the web interface of the server. So if you’ll just bear with me for a second. Well, I’m switching this. Okay. You should all see RapidMiner Studio. My assumption for today’s call is that a lot of you have already been using RapidMiner Studio or are aware of RapidMiner Studio’s interface. The question is why am I showing you RapidMiner Studio? We go back to one of the first slides I showed where the server sits in the ecosystem and I made the comment that you need Studio to interact with RapidMiner Server. That’s mostly true. You can access RapidMiner Server through the RapidMiner Studio interface, or you could actually access it through the web interface. So, we’ll do both. We’ll do the overview since I have admin rights to my local server. And then we will do the connection to the German server. So let’s just do it real quick.

If you’re using RapidMiner Server, you know that over here on the left, typically, is where you have your repositories. A repository is a place where you see local data, where you have your processes, and maybe other different things. And typically, if it’s stored locally, you should see like a little HD screen over here like a little TV screen. And the way we typically always add a new repository is we come over here and click on the pull-down menu and we create repositories. If you’ve done this several times and you click here, you’ll see two choices. Normally, if you don’t have RapidMiner Server, you should still have this option to select here, but it won’t go anywhere. Most of the time, I would say 90 percent of the time, if you don’t have a server– actually, 100 percent of time if you don’t have a server, you will use a new local repository. But if you do now have a server running, you would click on here, go through next, you would give it a name, “server” or something like that, you would get the URL – which is something you would get from your server admin – a username, a password, and so forth. So for instance, since I’m using localhost 8080 and I have admin rights, If I were to, now, correctly enter my password and I click on check connection status, if I can make a connection with it I should get a success, which I have here. And if I don’t, it’ll give me a red X. So that’s really a great way first to test to see if you can connect your Studio into RapidMiner Server. I’m going to cancel out on that.
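A connection check like the one described above can be approximated outside Studio. The sketch below is only a reachability test under stated assumptions: it opens a TCP connection to the host and port in the URL and does not verify the username or password the way Studio's check does. The function name `can_connect` is made up for illustration.

```python
import socket
from urllib.parse import urlparse


def can_connect(server_url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only checks that something is listening; it does not log in.
    """
    parsed = urlparse(server_url)
    if not parsed.hostname:
        return False
    # Fall back to the scheme's usual port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The local server from the demo; adjust to whatever your admin gives you.
    url = "http://localhost:8080"
    print("success" if can_connect(url) else "red X")
```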

So how does it look when I actually make a connection? Well, I’ll scroll down here to the bottom. If I make a connection to server, I’ll get a different icon. You see right down over here, I have three server connections. And one is my local server, which you’ll see I just clicked on it and I expanded it. It says local server. It says it’s connected. And I have a bunch of folders. And I have a group folder, a home folder. I look over here, I can see that I have admin, that happens to be me, where I store some of my processes and so forth in here. And I also have a couple of other users, which I created for this particular demo webinar today. A webinar folder webinar backup folder and so forth. I also have two other connections. One is to my internal data warehouse and another one is to my German server, which we’re going to talk about a little later. And you could see here if I just click on that and expand it here, it will go ahead and refresh. It’s now traveling through the internet, through a VPN, I guess, and over to Germany and now refreshing all these different folders. Okay. So this is a very simple way of actually interacting with a server through the Studio. You can access it through the Web. Now, let’s go pull that.

So I’m going to come over here to my Web browser. And what I’m gonna do is I’m gonna go to my localhost, type in 8080, and report, and I’m presented with the login screen here. I click on log in. You should be presented with this. Now, every user on RapidMiner Studio– I’m sorry, every user for RapidMiner Server will have access to this type of screen. However, depending on your access rights, you may or may not see a bunch of folders over here once you’ve toggled places you can toggle on to. When I log into the German server, you’ll see that the administration panel pretty much mostly be blank. But as admin here, you have full control over everything. So let’s just explore these little tabs over here. Click on repository. Well, just like with repositories, it’s a place for you to look at what you and your account have loaded up to the RapidMiner Server. I can click on Browse Repository. Click on that. Now, we’ll see all these different folders and so forth over here that I’ve been working on. I can take a look at, maybe, what’s in this folder here. HP stocks, you can see that I have more subfolders, some script on processes. And I can come over here and I can do a couple things like I can maybe click on this. And if I do, it should show you the representation of an XML file. The XML file will look like over here, it loads in a second. I can refresh. I can rename them over here on the right. I can even download it or I can set certain permissions over here. We’ll talk about permissions in a few minutes. And I can do different things as well too. I can run the process on the server. I can even upload a new version. And of course, I can always check the history. Part of that versioning control that RapidMiner Server also has. So this is a very manual way to interact with it, but it allows you to do some finer tuning if you need to do that as an admin or somebody who has the rights to do that. 
And of course, I can always click here and search for processes if I happen to use one.

So the next one here is processes. Here’s where we start doing some more advanced things. These are the things that we will talk about in a deployment webinar, but for the most part, we’ll just touch on these real quick. Here’s my process scheduler. I can take a look and I can see which processes are currently running and which ones are cronned. In this case, I have nothing cronned here. You could see which processes I’ve broken – ironically, quite a bit – and which ones have actually successfully processed with a green checkmark. And of course, I can always look at the log file over here if I wanted to see what went wrong or what went right and so forth. Next, we have services. This is also a deployment feature. This is where I create REST APIs and web services. And I also have something called Triggers, where RapidMiner Server can monitor folders or wait for an email attachment in order to trigger some process down the line. And last, let’s take a look at the administration. This is where you now start to do user management, groups, and so forth. So let’s just take a look at this.

Now, if you actually have LDAP or Active Directory, you’ll actually have a little more functionality over here. I don’t have that incorporated for this webinar, but we can create a very simple user. So what we’ll do is we’ll create a user named webinar1234. We’ll give them a password. Check the password again. Display name Webinar224 and then hit Submit. And now, it should be created. And you’ll see here, right now, that Webinar, the new user, has been created and you’ll see that this person is actually assigned to several different groups automatically. He’s assigned to the Analyst group, Execute group, Report Editor, and so forth and whatnot. Now, if I click on the groups tab over here, RapidMiner Server comes pre-populated with a few groups automatically. These are automatic groups; you can choose to use them or not, or you can create your own group. Like if I were to add a group over here, I would be able to come over here – I’m going to say yes – and create an LLMwebinar. Like that. And it automatically creates a group. And then, if I wanted to, I can come back to my user list here, I can toggle on webinar1234 and then I can assign them to the group, like our webinar group. Copy them over. If I don’t want them to be part of the Report Editor and Report Manager and Report Viewer, I can remove them as well and so forth. So you have that granularity, as the admin, to create groups, assign users, and either restrict them or give them access to a bunch of other groups and so forth. Very simple.

What else do we have under Administration? We have Database Connections. Very important. Here’s where you actually connect your RapidMiner Server to various data sources. You could do that by creating a new connection entry, let’s say, most of the time, it’ll be a database. You could see here there’s a few of them that are populated with the drivers. And like I said before, if you have some database that has a JDBC driver that’s not listed in this system, you can, of course, add it by appending an XML file on the RapidMiner Server installation directory. You go through the whole thing. You would then test it and see if it works. Now, sometimes, and the reality is as we have different databases. Some have financial data, some have other sensitive material, and there may be certain people that are allowed to see certain data sources. You can actually restrict the access of databases and other connections through the use of access rights. Of course, if I were to come down over here, you could see that– let’s just say webinar1234 and anybody working in the webinar group could actually have access to that particular database. Very handy and also some very fine-tuning that you could do on the back end.

What’s this Connections over here? Connections is another thing that you could do, but these are more for things like server– I’m sorry, things like Twitter or Salesforce or Dropbox or some other type of connection that you can create to some other external third-party type of program or application. And of course, then we have things like System Settings and so forth where you can then add different properties and so forth. There was a great blog– that blog post came out today on how you can actually add Python scripting and R scripting, set the paths and values, and so forth. So go check out RapidMiner.com/blog for that entry. And another neat thing, and these are more admin-related tasks, is that you can check on things like the operators and extensions that are installed on your machine. You can see here that, locally, I have a lot of stuff installed. And last but not least, System Information. You can check the system load. You can look at the server logs and so forth. Manage your preferences, manage your licenses and so forth. And if you do get stuck on anything, please go visit the documentation, docs.RapidMiner.com. Or if you have a support license, click on Support to access one of our great support people.

So that’s just the web overview diving into the server. Now, let’s actually get busy and get productive. For today’s example, what I’ve done is I started with a local server. I started with a local repository. And what I did was I went to the Kaggle website and I downloaded an old Kaggle dataset, which was the Walmart Sales Forecasting Kaggle event. And in there, there were a bunch of data sets. One of them was called Features. One of them was called Stores. And what I’m doing here now is I’m just purely working locally in RapidMiner Studio. We’re going to now go from local to server and then finally end up with building dashboards. So let’s just take a look at this data. This is completely local right now. And I’ll wire it out. If you’re familiar with RapidMiner, that’s how you look at the results. Over here. And hit Play. I’m going to pause here for a second and I’m going to make a distinction that this Play button means that you execute the process locally on your laptop or wherever your local machine is. If you have a server running, you should see – if I pull this down – two additional options: Run Process on Server or Schedule Process on Server. You’ll see that they’re greyed out right now. That’s because this data and the process that I’m going to build happen to be local to my machine. It’s not sitting on RapidMiner Server. It will be shortly. But right now, it’s not. So let’s just look at this data.

Okay, in one of them, we have a bunch of store numbers, we have a type – whatever that means – and we have a size; that’s the Stores dataset. The Features one has a date, some temperature, fuel prices, and so forth. And what we want to do is just merge the two data sets together. So I can do that locally here. Just put in a Join operator. Drag it in. Make a connection. Toggle off “use ID attribute as key”. And then, come over here, click on Key Attributes, and select Store. Very simple. Now we’re doing an inner join, and hit Run. So now, both those data sets are actually joined. Okay. That’s local. Now, could I go ahead and do this by running it on the server? Well, if I did that– I can’t right now. I need to save this process to the server.
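For readers who want to see the join logic spelled out, here is a minimal sketch of an inner join on the Store key in plain Python. It is not how the Join operator is implemented; the rows are invented stand-ins for the Stores and Features data sets, and the column names are assumptions based on the attributes mentioned above.

```python
# Toy rows standing in for the Stores and Features data sets (values invented).
stores = [
    {"Store": 1, "Type": "A", "Size": 151315},
    {"Store": 2, "Type": "A", "Size": 202307},
]
features = [
    {"Store": 1, "Date": "2010-02-05", "Temperature": 42.31, "Fuel_Price": 2.572},
    {"Store": 2, "Date": "2010-02-05", "Temperature": 40.19, "Fuel_Price": 2.572},
    {"Store": 3, "Date": "2010-02-05", "Temperature": 45.71, "Fuel_Price": 2.572},
]


def inner_join(left, right, key):
    """Inner join two lists of dicts on `key`; rows without a match are dropped."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = inner_join(features, stores, "Store")
for row in joined:
    print(row)
```

Store 3 has no entry in the toy Stores list, so an inner join drops it; an outer join would keep it with missing values for Type and Size.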

So how do we do that? Let’s scroll down. Let’s go to Germany because I’ll be working in Germany on this. Let’s go to groups. I created a group for this webinar. It’s called Webinar Group. Under Webinar Group, I have already created three folders: a Data folder, where I’m going to store the data. You can see here that I have nothing in here right now. A Library, and I’ll show you how we can use a library. I’m going to use a neural net to fit a trend line. And I also have some pre-populated processes here. Okay. But we’re going to create a brand new process here. Let’s do a few things. So I could come over here, I can right-click on this and store this process here. We’ll call this one Join Store and Features. Once I do that, you’ll see a couple of things change. You’ll see in the upper left part of the window, all of a sudden, my path changed. Now, it says Germany, groups, Webinar Group, Processes, Join Store and Features. So now, this is actually sitting on the server. And if I scroll down here, you’ll see that it dropped it at the bottom. If I right-click on this folder and I do a refresh, you’ll see that it automatically, then, will alphabetize it or put it in order in this way. You’ll see that it pops up at the top here. Now, if I go ahead and do this, it’ll give me two options. Okay. To run the process on the server or schedule it. So if I happen to just run it, nothing happens. It doesn’t look like anything happened. I just see this little note down here that says successfully submitted to the process server. But how do I know if it ran correctly? Where did the data go? What’s going on here? I don’t see anything populating back into my window, right? Normally, right, we should expect to see something like the results. These are the old results. We should see 8,190 examples, 14 regular attributes, and so forth. But we see nothing. We see nothing here.

So what happened and how can I figure out what happened? Well, I could go back to logging into my sales server. Let’s come over here. And I’ll log out of this one. And I’m going to switch to the one in Germany now. So in Germany, I have a different URL and I have a different login name. So let’s log in. And I can go check out my process scheduler, right? Remember, we’ve seen this before, how things run. And you could see right here. Here’s Ott – that’s me – and I just ran this process and it crashed. Okay, something is not right. It says reason: requested repository does not exist. Okay, something’s not right here. But what happens if I can’t get to the Internet? Can I actually see if a process crashed on the server? Yes, you can. What you would do is you come to your RapidMiner Studio. Go to View, Show Panel, and toggle on Server Monitor. I happen to have that toggled on down over here. So let’s expand the one that says Germany. It’ll take a second. There’s my process. Says right here. This one that we just created failed. I expand it. It says the data is not available. So what’s going on over here? Well, I saved the process to the server. I executed it on the server. And if I click on one of these data sources, you’ll see that it’s referencing Server Collab, which is on my local machine. That’s what’s causing it to fail, because if I turn off my computer and I shut down RapidMiner Studio and I cron job this, the RapidMiner Server will try to– or well, it can’t get back from the server into my local machine to pull that data, so it will crash. So that’s what’s giving you this warning now. It’s saying, “All right, guys, I can’t access your local data. This is why it’s not working.” So we need to put the data on the server for it to use.

So let’s go do this. We can do this very, very simply. Here’s another great way of going from local to your server environment. Scroll back down to Server Collab. I’m going to take my Stores. Control-click. Grab them both. Copy them. Scroll down. Put them into my Data folder. Paste them in. It should copy them over. And then, what we’re gonna do is we’re going to just delete these guys out. Drag the new ones in. And you’ll see that they are completely relative paths now, which is really what you want in a production environment. And now, we can save it. And now, we can run it again. And now, we’ll see, hopefully, if we did this correctly, a green checkmark. And there you go, it did successfully execute. But one thing that we don’t have is the result. That’s another thing that you need to know about RapidMiner Server: you need to tell it how you want the result to be displayed and stored. If this becomes a cron job and I close things down, it will execute, it will do it perfectly, but it’s not storing the data anywhere else. It’s not putting the results anywhere that you can interpret them. So what we need to do here is add a Store operator. Store. Drag it in. Come over here to my little folder and scroll down to my RapidMiner Server in Germany. Click on Data, and we’ll call it Joined Data. Okay. Hit Save again. And now, let’s run the process on the server. Successfully submitted. In a second, we should see a second green checkmark. Success. And now, if I come back over here to my Data folder, right-click and refresh. And there we have it. There’s my new data source, Joined Data. Perfect. Excellent. Now let’s go do something cool with this.
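The difference between a reference that points back at a laptop repository and a relative one can be sketched as follows. This is an illustration only: it assumes locations of the form `//<repository>/<path>` for absolute references and `../`-style paths for relative ones, and the function name `resolve_entry` and the example locations are made up rather than taken from RapidMiner's code.

```python
import posixpath


def resolve_entry(process_location, entry):
    """Resolve a data reference against the folder of the process that uses it.

    Locations starting with '//' are treated as absolute ('//<repository>/<path>')
    and returned unchanged; anything else is resolved relative to the
    folder that contains the process.
    """
    if entry.startswith("//"):
        return entry
    repository, _, path = process_location[2:].partition("/")
    folder = posixpath.dirname("/" + path)
    resolved = posixpath.normpath(posixpath.join(folder, entry))
    return "//" + repository + resolved


process = "//Germany/groups/Webinar Group/Processes/Join Store and Features"

# An absolute reference keeps pointing at the laptop's repository,
# which the server cannot reach.
print(resolve_entry(process, "//Server Collab/Stores"))

# A relative reference follows the process to wherever it is stored.
print(resolve_entry(process, "../Data/Stores"))
```

With the relative form, the same process keeps working if the whole group folder is moved or copied to another server, which is why it suits production use.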

So the Walmart dataset has fuel data, temperature data, all these different things, and what we’re not going to do today is really get into the heavy data science of seasonality and de-seasonalizing things. We’re just going to focus on how you can build libraries next and how you can drag and drop and use different processes within the server and make things easier for you and your team. So let’s open this next one here. 02 Aggregate Store by Average Temperature. Open. Okay. You’ll see here that it’s all red. There is no data source because it was waiting for this data source. I’m going to do two things. I’m going to drag this data source in here, connect it all – so all the metadata now propagates and it’s fine. You could see I’m still in the German server. The directory still says Germany. And what I can do now is one of two things: I can either run this locally, if I so choose, or I can run it on the server. We just showed you how to run things on the server, but what happens if you want to run it locally? Will it actually work? So if I click Run now, it will work. What it’s doing is it’s loading the data from the server, loading the process from the server, and then pulling the results into my local machine. Okay. So I can do that. Now, a word of warning to you all: if you’re using very, very large datasets and you’re bringing this across your network, this may take a while to do. I just happen to be using a small dataset and it happens pretty quickly. The whole idea for the compute part of RapidMiner Server is to do all the data lifting, do all the data crunching outside or somewhere else and then bring the results that you can find and inspect locally back into RapidMiner. Trying to run the entire process and do all the large data crunching and bring it into RapidMiner Studio will probably be a little bit nuts, but you can do it if you use the power of the server to do all the heavy lifting in the background and then just consume the results. Okay, so this ran locally and you could see here we have a bunch of stores. We calculated the minimum temperature at that store, the maximum temperature at that store, and the average temperature at that store. Very simple type of ETL work that we’re doing.
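The aggregation described above (minimum, maximum, and average temperature per store) can be sketched in a few lines of plain Python. The rows are invented, the function name `aggregate_temperature` is hypothetical, and this is an illustration of the grouping logic rather than the RapidMiner process itself.

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows; the real data comes from the joined Walmart data set.
rows = [
    {"Store": 1, "Temperature": 42.0},
    {"Store": 1, "Temperature": 38.0},
    {"Store": 2, "Temperature": 40.0},
    {"Store": 2, "Temperature": 50.0},
    {"Store": 2, "Temperature": 45.0},
]


def aggregate_temperature(rows):
    """Group rows by Store and compute min, max, and average temperature."""
    by_store = defaultdict(list)
    for row in rows:
        by_store[row["Store"]].append(row["Temperature"])
    return {
        store: {"min": min(temps), "max": max(temps), "avg": mean(temps)}
        for store, temps in sorted(by_store.items())
    }


summary = aggregate_temperature(rows)
for store, stats in summary.items():
    print(store, stats)
```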

So if I ran this on the server again, I can do that too. Hit Okay. We’ll see a green checkmark. Right, we forgot to do one thing. We forgot to store it. Okay, now we’re going to bring a Store operator in. Just get this out of the way. We’re going to use it shortly. Go back down. Store the data again. And we’ll call this Store Temperature. Okay. Excellent. Okay. We’re running it locally. And guess what? We just ran it locally. We ran it locally. Did it store back to the server? Well, actually, yes, it did. There it is. So you do have some flexibility when you’re working in the server environment, and sometimes, it kind of gets blurry. It’s really meant to just be in the background to really help you do a bunch of things. So you just have to be aware of where the path goes and where you save data and do those different things. But it’s really easy. I typically work mostly on the server. I load everything up. I save all my processes. I can run them locally. And when I’m done, I swap out smaller datasets for bigger datasets and just push it into the background and let the server run on. So that’s really cool stuff. You can effortlessly go between both environments, in some cases not even realizing that you’re doing it.

So now, let’s go look at the next one and talk about libraries. Let’s open up Fuel Prices. Same thing. I have another process here that I need some data for. I can drag the Joined Data into here. Come over here. And if we were to run this locally, let’s wire this out. We want to build a small trend line for fuel prices at store number 2. You could see here that the temperature at store number 2, the average was this, the fuel price average was this over a time period, and so forth. Now, I can do some work. I can come over here and I can grab a Fit Trend operator and I can build all this, but somebody on my team in Germany has already created that for me. Let me wire this here. Somebody came over here and created a library folder and put a process in there that I could use. So I can take this and I can drag it out like this and drop it on here. And I go ahead and hit Run. Let’s hit Run. And if I look at the data again now, not only do I have the store, the date, fuel price, and temperature, now, I also have a trend value. If I look at my charts. Come over here. Go to my series multiple chart. And then I can select trend. And fuel price. And look at it by the date; that’s my index dimension. All right, very good.
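To make the idea of a trend column concrete, here is a small sketch that adds a fitted trend to a price series. Note the substitution: the library process in the demo fits the trend with a neural net, whereas this sketch uses an ordinary least-squares straight line, which is a simpler stand-in. The prices are invented and `fit_linear_trend` is a hypothetical name.

```python
def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ...

    Returns the fitted trend value for every position in `values`.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n)]


# Invented weekly fuel prices for one store.
fuel_prices = [2.50, 2.60, 2.55, 2.70, 2.80]
trend = fit_linear_trend(fuel_prices)

for price, fitted in zip(fuel_prices, trend):
    print(f"price={price:.2f}  trend={fitted:.3f}")
```

Plotting the original series and the trend against the date, as in the series chart above, shows the fitted line running through the noisier price values.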

So how can you create libraries? Libraries are fantastic to create, and they really are. I come over here. Let’s open this one up. It’s really just in this case this one operator. I just wire the input and I wire out the output, I put this Fit Trend operator in here with a neural net– and because it’s red, it means there’s no data in here. So I can drag and drop this anywhere I want and connect it up to my other process. And I’ll just digress for a minute. I want to show you some of the libraries that we, as RapidMiner, use quite a bit for our internal work for whatever we need to do. We’ve created a very large library. We have things like process libraries that relate to missing attributes, Google Maps, things like connecting to desk.com, speech recognition, cross-validation– all of these are really just– if I look at them. I open this one up here. These are really just bits of code, bits of processes that could be used over and over again. Recording components like buttons and file upload. Server administration stuff. Anything that a user or somebody could test to be used over and over again. Instead of rebuilding it every time, you can drag and drop in to RapidMiner Server. Right, that’s the value of all this. That’s the value of all this. Speed to market. Save yourself time. Why reinvent the wheel?

So let’s just go back to our little demo here. Okay, so we have fuel prices here and we have a trend that we just created, but we haven’t done anything to it yet. We haven’t stored it. We can run it on the server, which we do here. We should get a green checkmark. But once again, we didn’t tell it to save the results. Let’s go, wait a second here. And there we go. Success. Over here, I’m just going to add a Store operator. Fantastic. And let’s go save it again. And we’ll just say Fuel. Great. As a data scientist, you’ve done this work now. You’re excited to show your findings. You’re going to say, “Hey, my colleague, Jeff Kiwanis, in Texas, I want you to check out this data source.” Or, “I want you to check out this data here.” Come over here and let’s go refresh this. And I say, “Hey, Jeff,” if I right-click on this, say, let’s edit the access rights here. Let’s say, “Hey, I want you to check this out.” He happens to be JKiwanis. I want you to just be able to read this data. But Martin Schmitz, who is my boss in Germany, I’m going to want him to do a couple of things. So actually, let’s go to Scott. I’m going to want him to have full access to everything in this folder. So I’m going to come here and I’m going to edit his access rights. I’m going to say MSchmitz– where are you? Here. Here you are. I want to grant you all read, write, execute, and so forth. So Martin can come into every subfolder, look at every process, look at every model, but Jeff can only access this data source right here. So now Martin can do different things with this. Jeff can only look at the data, but Martin can definitely work on anything. If he has something specific in this group, you could create another folder. You could say something like Models. You can actually create a model folder and so forth and work from there.
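The access-rights idea here (Jeff can read one data source, Martin gets read, write, and execute on the whole folder and everything under it) can be sketched as a small permission table. This is only a toy model, not how RapidMiner Server stores rights; the repository paths are hypothetical, and the rule that a folder grant covers its subfolders is an assumption based on the description above.

```python
from dataclasses import dataclass, field

RIGHTS = {"read", "write", "execute"}


@dataclass
class Repository:
    # Maps a repository path to {user name: set of granted rights}.
    acl: dict = field(default_factory=dict)

    def grant(self, path, user, *rights):
        unknown = set(rights) - RIGHTS
        if unknown:
            raise ValueError(f"unknown rights: {sorted(unknown)}")
        entry = self.acl.setdefault(path.rstrip("/"), {})
        entry.setdefault(user, set()).update(rights)

    def allowed(self, path, user, right):
        # Check the entry itself, then each parent folder in turn, so a grant
        # on a folder also applies to everything below it (assumed behavior).
        current = path.rstrip("/")
        while current:
            if right in self.acl.get(current, {}).get(user, set()):
                return True
            current = current.rpartition("/")[0]
        return False


repo = Repository()
# Hypothetical paths; the user names are the ones mentioned in the demo.
repo.grant("/groups/webinar/data/Fuel", "JKiwanis", "read")
repo.grant("/groups/webinar", "MSchmitz", "read", "write", "execute")

jeff_reads_fuel = repo.allowed("/groups/webinar/data/Fuel", "JKiwanis", "read")
jeff_reads_process = repo.allowed("/groups/webinar/processes/Fuel Prices", "JKiwanis", "read")
martin_runs_process = repo.allowed("/groups/webinar/processes/Fuel Prices", "MSchmitz", "execute")
```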

But the reality is that, sometimes, looking at this data– and it’s kind of hard to do by just looking at it in a tabular form. Say if I wanted to look at this fuel trend data again, it’s nice to look at it this way, right, but ultimately, we want to see it maybe in this methodology and so forth. This is local to my machine. These are the visualization tools of RapidMiner Studio. Maybe somebody on my team doesn’t have time to load up RapidMiner Studio or just wants to be able to see what we’re working on or monitor what we’re working on. So how can we actually collaborate with them doing that? That’s by way of the internal dashboarding capability we have. So let’s actually build a dashboard from this particular process here. And the way we do that is we do a couple things. We’re going to now enable these. And with dashboarding, it’s very, very simple. What you need to do is you need to do a few things. Let’s come over here. The simplest thing is you prepare the data in a format you want to visualize and then you use a Publish to App operator. In this case, I have two Publish to App operators. One is going to publish my fuel price and another one is going to publish the temperature of that store. Okay. So what we’ve done here is we’ve executed and created a– let’s go do a temporary block– I’m sorry, a breakpoint here. And when we stop this process mid-flow here, you’ll see that we have store number, the date, fuel price, temperature and a trend. So we’re going to create two visualizations. One is for the fuel price. And if I use my Select Attributes operator, I can see what I’m selecting to output there. It’s going to be the date, the fuel price and the trend. And on the bottom branch, I’m only just going to output the date and the temperature to my temperature visualization. It’s that simple. Come over here. Click– stop, let’s go run this on the server. Say Yes. And now let’s go visualize it. Now for that, we have to log into the web interface. Over here. Go to our App Designer. At the bottom over here, click on App Designer. And let’s click on the app.

Okay, we’ll go over this in more detail on the deployment webinar. But for the most part, the simplest thing is this is very similar to building widgets. Come over here, the first thing you need to do is create an initialization process, which, in this case, happens to be that last process number three for fuel price. So we come over here, open up Groups, open up Webinar, open up Processes, select Fuel Price, select Location, run the initialization process and then we come to Layout. Come to Layout. And here’s where we build– this is a blank canvas. Similar to RapidMiner, you start with a blank canvas and you add components. So we’ve got to do visualization one. Just grab it by coming down here and clicking Visualization. And then, you click on the little Edit button here to configure it. The neat thing about using the Web Apps, which is really cool, is that they already pre-populate when you run that initialization process. So if you remember, I had one called Fuel Price and I had another one called Temperature. Let’s go back over here. If we did this correctly, we come down to Subscribe to Object. And you should see them right there. Fuel Price. Let’s grab that one. It’s going to be an HTML5 chart. It’s going to be a series chart. In this case, the default is column. So if you click on Refresh, let’s take a look at this. Kind of a neat little visualization, but not in the format I want. So let’s come back over here and change it. Change it to line. Let’s hit Refresh again. Much better. Now, this is what I want to see. So what we could do is come down here to click Submit. Let’s actually maximize this. Then I have one ready and we’re going to drag him out of the way.

Let’s go put in a new visualization for temperature. Drag these guys together like this. And same thing, come to data and format and we will, this time, click on Temperature. Click on Refresh. And we can just, in this case, get a bar chart of the temperature over time. If we look at index attribute, we can do the date. Hit Refresh, and it’ll give me the dates and so forth. And when we’re done, we can click on Submit. We have our boxes. And finally, we can click on Preview, which will, then, be a dashboard that we could share with our colleagues. The way you would do that is we do a couple of things over here. You would be able to give people some access by giving them a link like so over here. This is a protected link, they need to log in, provided that they have access or site access rights to this. And it’s really, really that simple. So we save it. Done. We come over here. Come over here and see this. I’m sorry, we go to Groups. My bad. We go to Groups, Webinar, and I’ll just call that– actually, no, we’ll put it under App Dashboard. Save it. And now, if we go back to RapidMiner Server– I’m sorry, RapidMiner Studio, we come back to the App Directory. You should see that this is already saved in here and so forth. And then, once again, I can do this. I can come here to edit access rights and then I can assign different people to different groups to actually access it and so forth – read, write, and so forth. Very, very simple.

With that, I’m actually going to, now, open it up to questions and answers and then turn it back over to you, Hayley.

Great. Thanks, Tom. So as a reminder to our audience, we’re going to be sending a recording of today’s presentation within the next few business days via email. And like Tom said, go ahead and submit your questions via the questions panel on the right-hand side of your screen. I see we already have a couple of questions coming in. So Tom, I’ll go ahead and ask you the first question here. So this person is asking if we can install this anywhere as long as we have an internet connection? Is it safe to assume that this is cloud-based?

You can put this on Amazon, like AWS for sure. Now, most of the time, I make a database connection to my DreamHost server for data and then I could view my server running it that way on my local machine. So yeah, I mean, all you need is Java and you need a place to install it to and a backend database. So yeah, it could sit in the cloud if you wanted to.

Great. Thanks. Another question here- are the datasets stored on the server accessible from outside without using an export process? What kind of formats are the data that’s stored in on RapidMiner Server?

So the data, the local repositories, these little guys right here, are stored in the RapidMiner data format. What you can do is you can export them. Let me come back over here. I’m trying to remember. It’s here. You can export them, I think, as CSV or as XML, if you wanted to, by coming to Browser, Repository. Let’s see, browse them– let’s go see if we can actually do one right here. Come back over here. Let’s see. Here. Yeah, there are multiple ways if you click on the actual dataset. Let’s go to this dataset here. Data. It’s the Iris one. You could then download it, I believe– yeah, you can download it here in different formats. You could download it as HTML, XML, ARFF, even as Excel and so forth. Yeah.

Great. Thanks. One other question. Does this integrate with front-end tools such as Qlik or Tableau?

Yes, RapidMiner Server sits in the background. Qlik – I know Qlik off the top of my head – allows you to connect with just a simple API call. What you would need to do with RapidMiner Server is create a REST API, give that URL to Qlik, and then Qlik can interact with Studio– I’m sorry, Qlik can interact with Server. Yeah. Tableau, I believe, is a little bit trickier because you have to use their OData format. You have to do a little bit more installation work, some binaries you have to install and so forth. But Qlik is the easiest by far.
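To make "give that URL to Qlik" a little more concrete, here is a hedged sketch of what a front-end tool does with such a URL: it builds an HTTP GET request with query parameters and credentials. The host, URL path, service name, parameter, and the use of basic authentication are all assumptions for illustration, not details confirmed in the webinar, and the request is only constructed here, never sent.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_service_request(base_url, service, params, user, password):
    """Build (but do not send) a GET request for a process exposed over REST."""
    url = f"{base_url.rstrip('/')}/{quote(service)}"
    query = urlencode(params)
    if query:
        url = f"{url}?{query}"
    # Basic authentication is assumed here purely for the sake of the example.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


request = build_service_request(
    "https://rapidminer.example.com/api/rest/process",  # hypothetical endpoint
    "fuel_trend",                                       # hypothetical service name
    {"store": 2},
    "dashboard_user",
    "secret",
)
```

Passing the resulting request to `urllib.request.urlopen` would perform the call against a real server; a BI tool does the equivalent internally once it has the URL.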

Great. Another question here- for the dashboard, do you need RapidMiner desktop or is the dashboard accessible via the web?

The dashboard is only available if you have RapidMiner Server. So if you have Studio and no Server, you don’t have the dashboard.

Okay. A bunch more questions are coming in, so I’ll go ahead and try and ask them. Can you deploy the Analytics app as an Edge application?

No, not to my knowledge. I don’t think that’s been tried. They can download the free version of Server and give it a try. Talk about it in the community, I would love to hear that.

Yeah, that’s a great idea. So another question here- will the graphs in the dashboards update automatically if the input data sources are continuously updated?

So what they would need to know is that they’re not like – what’s the word? – a push notification. No, that’s not possible at this time. However, if they logged in or if they clicked the Refresh button, then it would automatically pull in all the latest data.

Great. Another question here. Can I share the dashboard with someone who does not have RapidMiner?

Yes. Yes, you would do that through the link below. You could either create as a– I’m sorry, let me go pull this up here real quick. You could do that through– now, let’s come back to App Designer here. There’s a way to do it either through a protected link – that means that they need a login – like right down here. This would be the link. Or you can give them– you would click on the access rights and assign it to anonymous, and that means anybody can access it if they have the link. Yeah.

Great. Thanks. I have a few questions on this topic. Can I use Studio and use Power BI for the dashboard?

Well, you could, technically, use Studio and save it to some format that Power BI could use. I don’t know enough about Power BI to really speak at length about it, but if Power BI were to be able to pull in a REST API, then you would probably do it on Server’s end.

Okay. Another question here – how are reoccurring jobs handled?

That’s very, very simple. That’s done through the cron job. So one thing I didn’t show, and this is more about the deployment aspect, is we showed over here to run the process on the server here, but if I click on this one, to schedule the process on the server, I would then come up with another dialog box where I could pick which server I want to run it on and I can create a cron job either by manually entering the cron– make a star, question mark, Monday-to-Friday whatever it is or actually just toggle these on and then have it written right in automatically, hit Okay and then Run.
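The "star, question mark, Monday-to-Friday" phrasing suggests a Quartz-style cron expression, where the fields are seconds, minutes, hours, day of month, month, and day of week, and `?` means "no specific value". Assuming that syntax, a weekday schedule can be sketched as a small helper; the function name and defaults are hypothetical.

```python
def weekday_schedule(hour, minute=0, days="MON-FRI"):
    """Return a Quartz-style cron expression that fires once a day on `days`.

    Field order assumed: seconds minutes hours day-of-month month day-of-week.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # "?" in day-of-month leaves the day to the day-of-week field,
    # and "*" in the month field means every month.
    return f"0 {minute} {hour} ? * {days}"


# Every weekday at 06:30.
expression = weekday_schedule(6, 30)
```

The toggles in the scheduling dialog write the same kind of string for you, so typing it by hand is only needed for schedules the dialog does not cover.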

Thanks, Tom. Another question here- how does this integrate with Hadoop or Radoop?

Actually, a very good question. So RapidMiner Server, we’ve had clients that actually use Server in this capacity. They have a team of data scientists, and like any data scientist, they would love to get their hands on all the data in the Hadoop cluster and run jobs and all kinds of things. From a purely internal management style, they use RapidMiner Server as a gateway. So they use processes built in RapidMiner, deployed on the Hadoop cluster to do, maybe, ETL work or something like that, and then deliver the datasets to a folder, like a group folder, where the data scientists can then log in and pull the data and do whatever they want with it. And that could be sitting on a separate machine. We’ve also seen clients install RapidMiner Server on an edge node of the Hadoop cluster where they can then get more power from the memory and so forth to be able to compute things that way. And also act as a governance tool between the users and the Hadoop cluster and so forth.

Great. Another question- so what about triggers for such schedules? For example, a script fires up a job when a source updates.

So, yep, that’s another great question. Touched on it a little bit here. If we go back to the Sales server, there’s a trigger folder over here. And what you can do is you could do two things as far as I know right now is that you can have RapidMiner Server monitor a directory. So if something gets dropped into that directory, like a file or something like that, it will then trigger a process to do something with that file or do something. Or you can actually set it up to monitor email attachments so that if somebody sent email attachment, like maybe the latest financial spreadsheet or whatever it is, you could load the attachment in and do whatever processing, downstream it and deliver it how you want to deliver it.
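The directory-monitoring trigger described here boils down to "notice a new file, then run something on it". A minimal polling sketch of that idea follows; it is not RapidMiner Server's implementation, and the function names, the file name, and the `handle` placeholder are all hypothetical.

```python
import os
import tempfile


def new_files(directory, seen):
    """Return file names that appeared in `directory` since the last call.

    `seen` is a set of already-known names and is updated in place.
    """
    current = {
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


def handle(path):
    # Placeholder for "trigger a process to do something with that file".
    return f"processing {os.path.basename(path)}"


with tempfile.TemporaryDirectory() as inbox:
    seen = set()
    first = new_files(inbox, seen)       # nothing has been dropped yet
    with open(os.path.join(inbox, "financials.csv"), "w") as fh:
        fh.write("quarter,revenue\n")
    second = new_files(inbox, seen)      # the dropped file is reported once
    third = new_files(inbox, seen)       # and not reported again
    messages = [handle(os.path.join(inbox, name)) for name in second]
```

In a real setup the check would run on a timer or use the operating system's file notifications, and `handle` would start the downstream process instead of returning a string.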

Thanks, Tom. A few more questions, we’ll take now. So how big of a dataset can I store in RapidMiner Server?

So because of the database that runs the backend of RapidMiner Server, the data stored on RapidMiner Server is really stored in that backend database. So it really depends on how much size and space you have allocated for that database or what you have on your server. Of course, right, it does all its crunching in memory. So if you have room for several terabytes and you want to use RapidMiner Server to crunch all that terabyte data, you’ve got to make sure that you have a strong enough server with good memory to do all that in. One note of caution is that RapidMiner Server is not necessarily a replacement for a Hadoop cluster. It’s really meant to sit between what you could do on your local laptop and a Hadoop cluster, to be able to crunch heavier data. I mean, in reality, the majority of users think they have big data. They may have other issues like wide data and so forth. They think they need a Hadoop cluster, but in reality, a RapidMiner Server really fits the bill.

Great. One more question here- how do you handle versioning control?

Oh, great question. So I touched a little bit on it, there’s a built-in versioning control with RapidMiner Server. If you’re working on the server, you can actually– we would open up a new window for– I forgot what it was called. I think it is called Version or it’s called something else, and you can create a new version of the process and it’ll record every time you made a change to it and who made the change and what date and you can always open the old– if you made a mistake, you can always open the old process and so forth and load them back in.

Great. Thanks, Tom. So looks like we’re just about at the top of the hour so there are some questions that we weren’t able to address, but we’ll make sure to follow up with you guys via email as soon as possible with your question. So thanks, Tom, and thanks, again, everyone, for joining us for today’s presentation and I hope everyone has a great day.

Great. Thank you, everybody. Bye.