Installing RapidMiner Server on AWS
With its latest product release, RapidMiner has made a free version of RapidMiner Server available to everyone (thanks, RapidMiner!). The free version of the server is small. It is primarily intended to let users develop proof-of-concept use cases and test how a server provides benefits not available in RapidMiner Studio, such as web apps or reporting deployed via a dashboard. It is nevertheless fully functional, which makes it a powerful tool, and you will probably want to take advantage of its many features.
Of course, you can simply install RapidMiner Server on your local machine (in fact, the same one you use for RapidMiner Studio), but this doesn’t generally reflect the conditions under which you would want RapidMiner Server to operate in a production environment. So instead of running it locally, we can use the Amazon Web Services (AWS) cloud infrastructure to put RapidMiner Server somewhere it can run 24/7, always connected to the internet, free from the constraints of your local machine, and easy to scale. This article provides guidance on the steps needed to get a fully functional cloud-based instance of RapidMiner Server for free (or at very low cost).
Getting Started on AWS
First, you will need to create an account with Amazon Web Services. That’s easy and free, and you can do it from the signup link on the AWS website. To host our first version of RapidMiner Server, we are going to use AWS Elastic Compute Cloud (EC2) resources, because AWS offers 750 free hours of cloud computing per month on certain configurations, and EC2 instances are easy to set up and modify later. If you aren’t already familiar with the basic concepts behind scalable, cloud-based virtual server infrastructure, Amazon’s “Getting Started” guide is an excellent introduction.
Once you have an AWS account, log in to the AWS console and select EC2 from the “Services” dropdown menu in the upper left. That will bring up a screen with a blue “Launch Instance” button near the top of the page.
That button starts the setup wizard, which walks you through the steps needed to launch your virtual machine. The first step is to select an AMI (Amazon Machine Image), which is essentially a preconfigured operating system image to start your server from.
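To see why a single instance can run around the clock for free, here is the arithmetic as a small Python sketch. It assumes the 750 hours are counted across all of your free-tier instances for the month (check AWS’s current free-tier terms), so one machine running 24/7 fits but two do not.

```python
# Rough check of the EC2 free-tier allowance quoted above (750 hours/month).
# ASSUMPTION: the allowance is shared by all free-tier instances in the account,
# so two machines running around the clock draw from the same 750 hours.
import calendar

FREE_TIER_HOURS_PER_MONTH = 750


def hours_in_month(year: int, month: int) -> int:
    """Wall-clock hours in a calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24


def within_free_tier(instances: int, year: int, month: int) -> bool:
    """True if `instances` machines running 24/7 fit in the monthly allowance."""
    return instances * hours_in_month(year, month) <= FREE_TIER_HOURS_PER_MONTH


if __name__ == "__main__":
    print(hours_in_month(2017, 1))       # 744
    print(within_free_tier(1, 2017, 1))  # True
    print(within_free_tier(2, 2017, 1))  # False
```

Even the longest month (31 days, 744 hours) stays under 750 hours for one instance.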
You’ll want to pay attention to the “free tier eligible” icon under each listing, since you will need to use one of those if you intend to take advantage of the free AWS server option. In this case, since I’m more comfortable managing Windows Server than Linux or its variants, we are going to scroll down the page until we find the Windows Server options. I selected the basic 64-bit Windows Server 2012 R2 version (just click the “Select” button on the right).
The next step is to choose your instance type. In our case, we are going to start with the t2.micro instance: although this is a small server (only 1 GB of RAM and 1 vCPU), it runs the free version of RapidMiner Server without any problems, and it is also eligible for the AWS free tier. After the first year, this server currently costs about $9.50/month for a Linux version and about $13/month for a Windows Server version in the US Northeast Region.
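If you would rather script this step, the following is a minimal sketch using boto3, the AWS SDK for Python, to find the newest Amazon-owned Windows Server 2012 R2 AMI. The name pattern is an assumption about how Amazon names these images; confirm it against the AMI listing in the console before relying on it. The function takes the EC2 client as an argument, so you can pass in `boto3.client("ec2", region_name="us-east-1")` or a stand-in for testing.

```python
# Hypothetical helper: find the newest Amazon-owned Windows Server 2012 R2 base AMI.
# ASSUMPTION: the name pattern below matches Amazon's AMI naming; verify it first.
from typing import Any, Dict, List, Optional

WINDOWS_2012_R2_PATTERN = "Windows_Server-2012-R2_RTM-English-64Bit-Base-*"


def newest_image(images: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Pick the image with the latest CreationDate (ISO-8601 strings sort correctly)."""
    return max(images, key=lambda img: img["CreationDate"], default=None)


def find_windows_2012_r2_ami(ec2_client) -> Optional[str]:
    """Query EC2 for matching 64-bit AMIs and return the newest ImageId, or None."""
    response = ec2_client.describe_images(
        Owners=["amazon"],
        Filters=[
            {"Name": "name", "Values": [WINDOWS_2012_R2_PATTERN]},
            {"Name": "architecture", "Values": ["x86_64"]},
            {"Name": "state", "Values": ["available"]},
        ],
    )
    image = newest_image(response["Images"])
    return image["ImageId"] if image else None
```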
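The monthly prices above follow from an hourly on-demand rate multiplied by the hours in a month. The sketch below assumes illustrative hourly rates of $0.013 (Linux) and $0.018 (Windows) that are consistent with those figures; they are not quoted from AWS, so check current pricing for your region.

```python
# Back-of-the-envelope monthly cost for a t2.micro that runs around the clock.
# ASSUMPTION: the hourly rates are illustrative values consistent with the
# monthly prices in the text, not official AWS prices.
HOURS_PER_MONTH = 730  # average month: 8760 hours per year / 12

HOURLY_RATE_USD = {
    "t2.micro/linux": 0.013,
    "t2.micro/windows": 0.018,
}


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost in USD of running one instance for `hours` hours, rounded to cents."""
    return round(hourly_rate * hours, 2)


if __name__ == "__main__":
    for key, rate in HOURLY_RATE_USD.items():
        print(f"{key}: ${monthly_cost(rate):.2f}/month")
```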
One brief aside: a very nice feature of the AWS EC2 architecture is that you can upgrade your instance at any time without having to reinstall and reconfigure everything. Amazon’s EC2 documentation on changing the instance type describes a simple way to do this.
However, this requires you to stop your instance, make the changes, and then start it again. Unfortunately, stopping the instance releases its public IP address, so it gets a new one when it restarts (unless you have associated an Elastic IP address, AWS’s term for a static public IP). That means you will need to create a new Remote Desktop connection to the instance, and you will also need to use the new IP address whenever you connect to RapidMiner Server from Studio or via the web interface. If you are going to pay for a larger server because you intend to use it in an actual production environment, you may also want to consider an Elastic IP address at that point, to avoid further reconfiguration if you need to make other changes at AWS later.
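The stop, modify, start sequence can also be scripted. This is a sketch with boto3 (an assumption; the console works just as well), and the instance ID and target type are placeholders. The second function allocates an Elastic IP and associates it with the instance so the public address stays the same across future stops and starts; Elastic IPs may carry their own charges, so check current pricing.

```python
# Sketch: resize an EC2 instance and pin its public address with an Elastic IP.
# Pass a boto3 EC2 client, e.g. boto3.client("ec2"). IDs and types are placeholders.


def resize_instance(ec2_client, instance_id: str, new_type: str) -> None:
    """Stop the instance, change its type, and start it again."""
    ec2_client.stop_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    ec2_client.start_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])


def attach_elastic_ip(ec2_client, instance_id: str) -> str:
    """Allocate an Elastic IP for use in a VPC, associate it, and return the address."""
    allocation = ec2_client.allocate_address(Domain="vpc")
    ec2_client.associate_address(
        InstanceId=instance_id, AllocationId=allocation["AllocationId"]
    )
    return allocation["PublicIp"]
```

For example, `resize_instance(ec2, "i-0123456789abcdef0", "t2.small")` with a placeholder ID.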
The next step is to configure the instance. At this point you can simply leave all of the default values (although we’ll need to make some modifications later).
Click the “Next” button at the bottom to bring up the storage options. Again, the default setting here is fine.
Although 30 GB is not much space, the free tier only covers 30 GB of EBS storage, so adding more now would put you into paid usage. You can always add more storage later if you upgrade your virtual machine.
Click “Next” again and you will get a screen for tag management. You can tag your instance however you like; tags don’t affect server functionality and are purely there to help you keep track of your machines if you have multiple instances for different projects.
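For reference, the wizard choices so far (t2.micro, the default 30 GB root volume, a Name tag) correspond roughly to the following `run_instances` parameters in boto3. This is a hypothetical helper; `/dev/sda1` as the Windows root device name and `gp2` as the volume type are assumptions to verify against your AMI’s details.

```python
# Hypothetical helper mirroring the launch-wizard choices as run_instances kwargs.
# ASSUMPTIONS: "/dev/sda1" is the AMI's root device and "gp2" is the volume type.
from typing import Any, Dict


def build_launch_params(ami_id: str, name: str, volume_gb: int = 30) -> Dict[str, Any]:
    """Return keyword arguments for ec2_client.run_instances(**params)."""
    return {
        "ImageId": ami_id,
        "InstanceType": "t2.micro",  # free-tier eligible instance type
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/sda1",
                "Ebs": {
                    "VolumeSize": volume_gb,  # GB; the free tier covers 30 GB
                    "VolumeType": "gp2",
                    "DeleteOnTermination": True,
                },
            }
        ],
        # Tags are only for bookkeeping; they do not change server behavior.
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    }
```

Before calling `ec2.run_instances(**params)`, you would also add `KeyName` and `SecurityGroupIds` for the key pair and security group set up in the wizard’s remaining steps.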
The next screen is for security management, which is very important. You’ll need to make several changes on this screen for your instance to work properly.
If you already have a security group at AWS then you can simply apply an existing security group (and you probably don’t need this guide either!). If not, you will create a new security group and it is strongly recommended that you set it up so that your server can be accessed in certain ways (defined by the protocols and associated ports) only from specific IP addresses. Otherwise your machine will be open to the world and there is a very high likelihood it will eventually get hacked.
So you are going to add rules to the security group to limit access. You’ll need to modify the existing rule and add another one by clicking the “Add Rule” button in the bottom left of the screen, which creates a new line in the table. You configure each line using the dropdowns. First, select the “Type,” which is the protocol you want to control. Then select the “Source.” AWS provides a handy “My IP” option that automatically fills in your machine’s public IP address in the correct format (a CIDR block ending in /32). You can choose a “Custom” source if you need to enter an IP address or range manually.
If you know what you are doing here, you can make your machine as secure as you’d like. The one essential change is to restrict the RDP rule to your IP address, because RDP (Remote Desktop Protocol, port 3389) controls Remote Desktop access to Windows Server, which we will use shortly to connect to the machine. That rule is prepopulated in the table with the source “Anywhere,” so simply change the dropdown to “My IP.” You should also add a rule for HTTP with the source “Anywhere,” or else no one (including you!) will be able to access the RapidMiner Server web interface or any web applications you deploy. You can add other rules as needed if you want your machine to be accessible from other specific locations or via other protocols. For instance, I added a rule to allow SSH traffic because I set up an SFTP service on my server to make transferring files easier.
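To make the “Source” setting concrete, this short Python sketch uses the standard `ipaddress` module to show what “My IP” (a /32 block containing exactly one address) and “Anywhere” (`0.0.0.0/0`) admit. The addresses are reserved documentation examples, not real ones.

```python
# Illustration of how a security-group "Source" CIDR block admits client addresses.
import ipaddress


def my_ip_cidr(ip: str) -> str:
    """Single-address CIDR block, the format the console's "My IP" option uses."""
    return str(ipaddress.ip_network(f"{ip}/32"))


def allowed(source_cidr: str, client_ip: str) -> bool:
    """True if client_ip falls inside the rule's source block."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(source_cidr)


print(my_ip_cidr("198.51.100.7"))                  # 198.51.100.7/32
print(allowed("198.51.100.7/32", "198.51.100.7"))  # True
print(allowed("198.51.100.7/32", "203.0.113.5"))   # False
print(allowed("0.0.0.0/0", "203.0.113.5"))         # True: open to the world
```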
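Here is a sketch of the same rules created with boto3. The group name is hypothetical, RDP and HTTP use their standard ports (3389 and 80), and the optional SSH rule is restricted to your IP as a cautious default (an assumption; adjust it to your needs).

```python
# Sketch of the security-group rules described above, expressed for boto3.
# The group name is hypothetical; my_ip is your public IPv4 address.
from typing import Any, Dict, List


def build_ingress_rules(my_ip: str, allow_ssh: bool = False) -> List[Dict[str, Any]]:
    """RDP from your IP only, HTTP from anywhere, optional SSH from your IP."""
    my_cidr = f"{my_ip}/32"
    rules = [
        {"IpProtocol": "tcp", "FromPort": 3389, "ToPort": 3389,
         "IpRanges": [{"CidrIp": my_cidr, "Description": "RDP from my IP only"}]},
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTP from anywhere"}]},
    ]
    if allow_ssh:
        rules.append({"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                      "IpRanges": [{"CidrIp": my_cidr, "Description": "SSH/SFTP from my IP"}]})
    return rules


def create_security_group(ec2_client, vpc_id: str, my_ip: str) -> str:
    """Create the group in the given VPC, attach the ingress rules, return its ID."""
    group = ec2_client.create_security_group(
        GroupName="rapidminer-server-sg",  # hypothetical name
        Description="RapidMiner Server access rules",
        VpcId=vpc_id,
    )
    ec2_client.authorize_security_group_ingress(
        GroupId=group["GroupId"], IpPermissions=build_ingress_rules(my_ip)
    )
    return group["GroupId"]
```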
Once you’ve made these security changes, click the “Review and Launch” button, which brings you to a summary of your selections so far. At the bottom of that page is a blue “Launch” button; when you select it, a popup window prompts you for a key pair.
From the first dropdown, you can either use an existing key pair (if you have one) or create a new one. You should not select “proceed without a key pair” (even though it is an option). Creating a new key pair is as simple as giving it a name and then clicking the gray “Download Key Pair” button in the middle of the screen. Your browser will ask where to save the file (a small .pem text file containing the private key); then you can click the “Launch Instances” button. This brings up a confirmation screen while AWS launches your new server!
The complete launch process takes several minutes. Once it is complete, you can find your instance by navigating back to your EC2 dashboard (under “Services” in the top menu) and then selecting the “Instances” link on the far left, which brings up the instances dashboard.
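The console’s “Download Key Pair” step amounts to creating the key pair and saving its private key locally. Below is a minimal boto3 sketch, with a placeholder key name and directory, that writes the `.pem` file readable only by you on POSIX systems. Keep this file safe: for Windows instances, the private key is also what you use to decrypt the Administrator password.

```python
# Sketch: create an EC2 key pair and save the private key as <key_name>.pem.
# The key name and directory are placeholders.
import os
from pathlib import Path


def create_and_save_key_pair(ec2_client, key_name: str, directory: str) -> Path:
    """Create the key pair in AWS and write its private key to a new local file."""
    response = ec2_client.create_key_pair(KeyName=key_name)
    path = Path(directory) / f"{key_name}.pem"
    # O_EXCL refuses to overwrite an existing file; mode 0o600 = owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(response["KeyMaterial"])
    return path
```

The same `key_name` is what you would pass as `KeyName` when launching the instance.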
You’ll need to select the correct instance by using the blue selection field in the upper table. That will bring up the information about that instance in the lower table.
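Once the instance is up, you can also check its state and public IP from a script. This sketch summarizes the response of boto3’s `describe_instances`; the `PublicIpAddress` key is typically absent while an instance without an Elastic IP is stopped, so the helper returns `None` in that case.

```python
# Sketch: flatten a describe_instances response into one summary per instance.
from typing import Any, Dict, List, Optional


def summarize_instances(response: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Return ID, Name tag, state, and public IP for each instance in the response."""
    summaries = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            summaries.append({
                "InstanceId": inst["InstanceId"],
                "Name": tags.get("Name"),
                "State": inst["State"]["Name"],
                "PublicIpAddress": inst.get("PublicIpAddress"),
            })
    return summaries
```

For example, call `ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])` and then `summarize_instances(ec2.describe_instances(InstanceIds=[instance_id]))`.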
In my next post I’ll go over connecting to the instance via Remote Desktop and finalizing the RapidMiner Server installation on AWS.