Over the summer the Data Science teams at RapidMiner were hard at work updating several of our extensions. We are proud to announce 6 new operators have been added across the Operator Toolbox, Smile, and Converters extensions. Here’s a quick overview of these extensions and what’s new.
We can observe that the model found a pattern. The likelihood of surviving generally rises with the passenger fare, but the model also ‘saw’ an elevated likelihood of surviving for fares in the range around 100.
One of the tutorial processes in the ‘Generate PDP Plot Data’ operator shows how to generate this chart.
We now fit a linear model which predicts the demand from the investments into our marketing channels. The ‘GLM Contribution’ operator can show us the individual contributions to the overall prediction.
This can easily be visualized to show the share per month.
Check out the tutorial process in this new operator to see how to do this analysis yourself!
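To make the idea of per-channel contributions concrete, here is a minimal Python sketch of how a plain linear model (identity link) can be split into additive parts. It is not the implementation of the ‘GLM Contribution’ operator. The function name, the channel names, and all numbers are made up for illustration.

```python
def linear_contributions(intercept, coefficients, row):
    """Split a linear prediction into one additive part per input.

    Assumes prediction = intercept + sum(coefficient * value).
    """
    # Each channel contributes its coefficient times its value.
    contributions = {name: coefficients[name] * row[name] for name in coefficients}
    # The intercept is the part not attributed to any channel.
    contributions["intercept"] = intercept
    prediction = sum(contributions.values())
    return contributions, prediction


# Hypothetical coefficients and one hypothetical month of marketing spend.
coefficients = {"tv": 2.0, "online": 3.0}
row = {"tv": 10.0, "online": 5.0}

contributions, prediction = linear_contributions(5.0, coefficients, row)

# Share of the predicted demand attributed to each part.
shares = {name: value / prediction for name, value in contributions.items()}
```

Computing the shares once per month gives the kind of table that can be plotted as a share-per-month chart.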
Previously, there was no way to access the individual tokens in this Document with operators. This is now solved by the new ‘Extract Tokens’ operator. This operator returns the tokens either as an ExampleSet, where each example corresponds to one token, or as a collection of documents, where each item is one token.
We’ve added two new machine learning algorithms to the ‘Smile’ extension: Gradient Boosted Trees and Random Forests. Both are implementations from the Smile library, which is very fast and competes with our existing implementations. You now have the choice between two different implementations of these popular algorithms. Note that both algorithms currently only support regression problems. We will add classification versions of them in a future release.
Take a look at our extension library on RapidMiner Marketplace. Please keep in mind that these extensions are not officially supported and we, as a team, may sometimes make changes which are not backwards compatible!
Operator Toolbox
This extension adds a range of operators to RapidMiner: utility operators that improve the flexibility and usability of process design, additional outlier detection algorithms, additional performance criteria, and advanced analysis methods like Local Interpretation or the SMOTE algorithm.
What’s new with Operator Toolbox
Understand your Models with Partial Dependency Plots (PDP)
When building predictive models, you often want to better understand the dependencies the model is exploiting. RapidMiner already provides a lot of features for this, most importantly ‘Explain Prediction’ and various feature weight methods. In version 2.2 of the ‘Operator Toolbox’ extension we provide you with yet another way to understand your models: Partial Dependency Plots! A partial dependency plot is a univariate method for understanding the dependency between your model prediction and a numerical input attribute. The fundamental idea is to score the examples many times, each time with a different value for this attribute, and then take the average prediction value or confidence to see the impact. The algorithm to generate PDP data is the following:
- Take a value x between the minimum and maximum of attribute k.
- Set the attribute value of all examples in the ExampleSet to this value.
- Score the ExampleSet and calculate the average response. The response is either the predicted regression value or the average confidence of the positive class.
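The steps above, repeated over a grid of values for x, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code behind the ‘Generate PDP Plot Data’ operator. The function name, the evenly spaced grid, the toy model, and the data are assumptions made for the example.

```python
from statistics import mean


def partial_dependence(predict, rows, k, n_points=20):
    """Return (grid, averages) for attribute index k.

    predict: function that scores a single row (a list of numbers).
    rows:    the data set as a list of rows.
    Assumes n_points >= 2 and an evenly spaced grid (an assumption of this sketch).
    """
    values = [row[k] for row in rows]
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]

    averages = []
    for x in grid:
        # Set attribute k of every row to the same value x.
        modified = [row[:k] + [x] + row[k + 1:] for row in rows]
        # Score all rows and average the response.
        averages.append(mean(predict(row) for row in modified))
    return grid, averages


# Hypothetical model standing in for a trained regression model.
def toy_model(row):
    return 2.0 * row[0] + row[1] ** 2


rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid, averages = partial_dependence(toy_model, rows, k=0, n_points=3)
```

Plotting `averages` against `grid` gives the partial dependency curve for the chosen attribute. For a classification model, `predict` would return the confidence of the positive class instead of a regression value.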