Taming the Complexity of Hadoop Analytics

Taming the complexity of Hadoop Analytics: RapidMiner Radoop makes it easy to do hard things

Predictive Analytics in Hadoop: big AND complex

Anyone who has worked with Hadoop knows how hard it is to handle everything beyond the data and the analytics themselves. Big Data environments entail a huge number of configuration options that each client has to be aware of (High Availability, HDFS encryption, Kerberos configuration, etc.). There are also enterprise firewalls to deal with, and DataNodes that are hard to reach because they may sit in different data centers whose locations you may not even know. Just thinking about it can give you a headache!

RapidMiner is here to help! Let’s examine this latter issue first.

Getting into the cluster

In general, a Hadoop cluster is a closed environment: administrators only reluctantly let users connect, and then only through a very limited set of ports and access points.

Figure 1: Security considerations constrain access to the nodes

However, to get value from that data, a comprehensive Big Data analytics tool like RapidMiner Radoop needs access to most of the services on every Hadoop node, potentially from any user laptop in the company. How can we achieve this without compromising the environment’s security?
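To make the scale of the problem concrete, here is a minimal Python sketch that probes which cluster endpoints a laptop can actually reach over TCP. It is only an illustration, not part of RapidMiner Radoop: the function names are our own, and the port numbers are common Hadoop 2.x defaults that your cluster may well override.

```python
import socket

# Common Hadoop 2.x default ports (an assumption; real clusters often change them).
DEFAULT_PORTS = {
    "NameNode RPC": 8020,
    "DataNode data transfer": 50010,
    "ResourceManager": 8032,
    "HiveServer2": 10000,
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def probe(endpoints, timeout=2.0):
    """Map each (label, host, port) endpoint to its reachability."""
    return {
        (label, host, port): is_reachable(host, port, timeout)
        for label, host, port in endpoints
    }


# Example with hypothetical host names:
# endpoints = [(label, "node1.example.com", port) for label, port in DEFAULT_PORTS.items()]
# for endpoint, ok in probe(endpoints).items():
#     print(endpoint, "reachable" if ok else "blocked")
```

Every endpoint reported as blocked is a firewall rule somebody would have to request, and the list grows with each node added to the cluster.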

RapidMiner Radoop’s new proxy connection

In RapidMiner 7.3, the RapidMiner Radoop Proxy addresses the problem of navigating complex Hadoop infrastructures: you install RapidMiner Server as another component within the Hadoop cluster and configure it as a proxy. All communication between RapidMiner Studio or RapidMiner Radoop and any Hadoop component then goes through a single machine (RapidMiner Server) and a single port. Even the JDBC connection to Hive can use this proxy.

Figure 2: The RapidMiner Radoop Proxy as a single entry point
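The single-entry-point idea can be sketched as a tiny TCP relay in Python. To be clear, this is not how the RapidMiner Radoop Proxy is implemented. It is a generic illustration in which the client opens one connection to one port and names the internal service it wants on the first line. That "host:port" framing, and the names `pipe`, `handle_client` and `start_relay`, are our own assumptions.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(4096)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def handle_client(client_reader, client_writer):
    """Relay one client connection to the internal target it asks for."""
    # Hypothetical framing: the first line names the target as "host:port".
    header = await client_reader.readline()
    host, _, port = header.decode().strip().rpartition(":")
    try:
        backend_reader, backend_writer = await asyncio.open_connection(host, int(port))
    except (OSError, ValueError):
        # Target unreachable or header malformed: drop the client.
        client_writer.close()
        return
    # Shuttle bytes in both directions until either side closes.
    await asyncio.gather(
        pipe(client_reader, backend_writer),
        pipe(backend_reader, client_writer),
    )


async def start_relay(listen_host="127.0.0.1", listen_port=0):
    """Start the relay; this one listening port is all the firewall must expose."""
    return await asyncio.start_server(handle_client, listen_host, listen_port)
```

A production proxy would also authenticate clients and restrict which internal targets may be requested. The sketch leaves both out to keep the single-port idea visible.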

To adhere to industry-accepted security standards, the RapidMiner Radoop Proxy can, of course, be configured to use SSL.
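On the client side, talking to an SSL-enabled endpoint safely means verifying the server certificate and host name. The following Python sketch shows what that looks like with the standard `ssl` module. It is a generic example, not RapidMiner configuration, and `make_client_tls_context` and its `ca_file` parameter are names we made up for illustration.

```python
import ssl


def make_client_tls_context(ca_file=None):
    """Build a client-side TLS context that verifies the server.

    ca_file is an optional path to a CA bundle, for example a company-internal
    CA that signed the proxy certificate. When omitted, the system trust store
    is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy protocol versions; TLS 1.2 is a common minimum.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Usage with a hypothetical host name, given an already connected socket `sock`:
# tls_sock = make_client_tls_context().wrap_socket(sock, server_hostname="proxy.example.com")
```

The default context already enables certificate and host name checks, so the function only tightens the minimum protocol version.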

By selecting your newly configured RapidMiner Radoop Proxy, you can work from RapidMiner Studio’s visual design interface exactly as if your laptop were in the middle of the Hadoop cluster, which makes accessing and leveraging your data fast and easy.

Figure 3: Easy proxy configuration

Let’s now turn our attention to the second issue: what if you have dozens of variables (many of them difficult to interpret) that need to be set in your client?

Let’s ask the experts

The most widespread Hadoop distributions include industry-leading tools such as Cloudera Manager or Apache Ambari that make it easy to configure and monitor clusters. So why not leverage their preconfigured connections to make your life easier?

Figure 4: Cloudera Manager and Apache Ambari, the main Hadoop admin consoles

In RapidMiner Radoop 7.3, we have made it even easier to create connections to Hadoop clusters. You can now quickly import connections from Cloudera Manager or Ambari by simply providing the Hadoop manager’s URL, your user name, and your password.

Figure 5: Retrieving the configuration is as simple as this

If your environment consists of several clusters, you will be able to select which one you want to connect to.
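For the curious, both managers expose a REST API that can list the clusters they manage, which is the kind of lookup an import like this relies on. The Python sketch below builds such a request and extracts the cluster names. It is our own illustration and makes no claim about how RapidMiner Radoop does it internally. The function names are hypothetical, the Cloudera Manager API version (`v12`) is an assumption you should adjust to your installation, and the response shapes follow the managers' documented JSON as we understand it.

```python
import base64
import json
import urllib.request


def cluster_list_request(manager_url, user, password, kind, cm_api_version="v12"):
    """Build a GET request that lists the clusters known to a Hadoop manager.

    kind is "ambari" or "cloudera". cm_api_version is an assumed default;
    check which API version your Cloudera Manager supports.
    """
    base = manager_url.rstrip("/")
    if kind == "ambari":
        url = f"{base}/api/v1/clusters"
    elif kind == "cloudera":
        url = f"{base}/api/{cm_api_version}/clusters"
    else:
        raise ValueError(f"unknown manager kind: {kind!r}")
    # Both managers accept HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_names(payload, kind):
    """Extract cluster names from the JSON text returned by the request."""
    items = json.loads(payload).get("items", [])
    if kind == "ambari":
        return [item["Clusters"]["cluster_name"] for item in items]
    return [item["name"] for item in items]


# Usage with a hypothetical manager URL:
# req = cluster_list_request("http://ambari.example.com:8080", "alice", "secret", "ambari")
# with urllib.request.urlopen(req) as resp:
#     print(cluster_names(resp.read(), "ambari"))
```

When the returned list has more than one entry, you are in exactly the multi-cluster situation described above and have to pick one.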

RapidMiner Radoop will then retrieve all the configuration variables it needs. If the environment is configured with High Availability, HDFS encryption, Kerberos, or any other advanced option, RapidMiner Radoop will detect it and automatically fill in all the details for you. In most cases, you will only need to update your user information or the Hive database you want to use as the default.

That is MUCH easier than configuring all this by hand!

Figure 6: Some of the variables in our demo environment
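To give a feel for what such variables look like, here is a small Python sketch that reads a Hadoop `*-site.xml` file and checks for a few well-known properties that signal High Availability, Kerberos and HDFS encryption. It is a simplified illustration under our own assumptions, not RapidMiner Radoop's detection logic. The function names are hypothetical, and real clusters spread these settings across several files.

```python
import xml.etree.ElementTree as ET


def parse_site_xml(xml_text):
    """Turn the <property><name>/<value> pairs of a Hadoop site file into a dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value", default="")
        for prop in root.iter("property")
    }


def detect_features(props):
    """Guess which advanced options a cluster uses from its properties."""
    return {
        # HA clusters define a nameservice plus the NameNodes that belong to it.
        "high_availability": bool(props.get("dfs.nameservices"))
        and any(key.startswith("dfs.ha.namenodes.") for key in props),
        # Hadoop authentication is "simple" unless Kerberos is switched on.
        "kerberos": props.get("hadoop.security.authentication", "simple").lower()
        == "kerberos",
        # Encryption zones need a key provider; the property name varies by version.
        "hdfs_encryption": bool(
            props.get("dfs.encryption.key.provider.uri")
            or props.get("hadoop.security.key.provider.path")
        ),
    }
```

Even this toy version has to know several property names and their version quirks, which is the kind of detail you would rather not track by hand.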

Short & Sweet

You already knew that RapidMiner Radoop was a powerful tool for simplifying Big Data Hadoop Analytics – NOW you know that configuring it just got even EASIER!
