23 January 2012

New Plotters for RapidMiner

After a long stretch of hard development, the RapidMiner team is proud to announce the birth of its latest baby: a brand new plot component that presents your data and process results in a shiny, powerful and flexible visualization.

The new plotters support bar charts, area charts, scatter plots and series plots from a single configuration. Instead of preselecting a diagram type from a list of templates, you can freely choose the visualization type of each attribute. You can plot more than one attribute at a time, create additional y-axes, combine aggregated bar charts with scatter plots and add error indicators if you need them. Enough talk; here is what the new plotters can do for you (with your all-time favourite data set, of course):


What do we see in this plot? As you might recognize, the points form a scatter plot of two attributes of the Iris dataset, namely sepal length versus sepal width. Sepal length is placed on the domain axis (x-axis) and sepal width on the left range axis (y-axis). Both the color and the shape of each point are chosen according to the label of the data point, as the legend on the right also shows.

Speaking of the legend, it deserves a closer look. The upper part lists the plots in this diagram. The first entry, labelled sepal length (cm) and preceded by a circle, tells us that this plot consists of single data points, i.e. it is the scatter plot we just discussed. Its missing color and generic shape tell us to look at the bottom part of the legend for the meaning of colors and shapes: there we discover that each unique color and shape represents one of the label values Iris setosa, Iris virginica and Iris versicolor.

All that is left to explain is the bar chart, which is also easy to spot in the legend: it is a histogram of the Iris data over sepal length, grouped by label. Note that the heights of the bars refer to a second range axis on the right.

The attentive reader will have noticed that the bars are slightly transparent. This shows off another feature of our new plotters: everything is formattable and customizable, from presets and gradients for the plot colors, different shapes for each data series, and the plot and legend backgrounds all the way to the fonts of the title and the axes. What else could you wish for? Horizontal bars instead of vertical ones? No problem, two clicks and you are done. Aggregate your data to calculate averages and plot the standard deviation of each data point? No problem, everything is possible 🙂

True plotter experts will even be able to beam good old Iris to New York and celebrate the arrival of the new plot engine with fireworks never before seen in RapidMiner:

Oh yes, this truly is the Iris dataset. Can you guess from the legend what you are seeing?

We hope we have awakened your interest in this new feature. It will be part of RapidMiner 5.2 beta, which is expected to ship at the end of this week. As usual, RapidMiner's auto update will notify you when it becomes available, or you can simply download it from our website.

There have been some major advancements to the RapidMiner platform since this article was originally published. We're on a mission to make machine learning accessible to everyone. For more details, check out our latest release.
