RapidMiner Studio - GUI Intro

  • Panels – Repository, Operator, Process, Views (Design & Results), Parameters & Help
  • Importance of Ports within the Process panel
  • Re-sizing the panels through expanding and contracting the boundaries
  • Re-arranging panels through drag & drop
  • Restoring default view
  • Best practice – create a new Repository for every new project

Super-charge RapidMiner Studio with Extensions

  • Extensions menu -> Manage Extensions option (to add & remove)
  • Popular Extensions are listed under Top Downloads
  • Recommended – Text processing, Web Mining, Python/R integration, Anomaly Detection, Series Extension & Radoop
  • With Radoop Extension we get another view (next to Design & Results)

Visualizing Data in RapidMiner Studio

  • Sections under Results: Data, Statistics, Charts, Advanced Charts & Annotations
  • Understanding terminology in Data tab: Example set – Entire data, Examples – Rows, Attributes – Columns
  • Rearrange – Change size/width, Sequence & Sort order of columns
  • Understanding Statistics section to review data
  • Types of Visualizations/Charts
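
The terminology above maps onto a small table in code. A minimal sketch in plain Python (not RapidMiner's own API; the data and the helper name `attribute_statistics` are invented for illustration) treats an Example Set as a list of Examples and computes the kind of per-attribute summary a statistics view typically shows:

```python
# Illustrative only: a tiny "example set" as a list of examples (rows).
# Attribute names and values are invented for this sketch.
example_set = [
    {"age": 25, "income": 40000},
    {"age": 31, "income": None},   # missing value
    {"age": 47, "income": 82000},
]

def attribute_statistics(examples, attribute):
    """Summarize one attribute (column): min, max, average, missing count."""
    values = [ex[attribute] for ex in examples]
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "average": sum(present) / len(present),
        "missing": len(values) - len(present),
    }

stats = attribute_statistics(example_set, "income")
print(stats)
```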

Data Preparation & ETL

Loading Data via a Process

  • Process – Create, Run, Export, Import, Save & Delete
  • Read operators & Read Excel operator
  • Import Configuration Wizard
  • Name, Comment & Unit rows – Name row corresponds to column names

Importing Data in RapidMiner Studio

  • Importing data and storing it within RapidMiner
  • Operators, Parameters and Connectors
  • Retrieve operator
  • ‘Attributes’ are the columns of the data, ‘Examples’ are the rows, and the ‘Example Set’ is the entire data set
  • Adjusting the layout
  • Creating new Repositories with sub-folders ‘Data’ & ‘Processes’
  • Accessing Help & utilizing Tutorial Process

Data Preparation

  • Open and launch RapidMiner ‘Turbo Prep’
  • Loading data to prepare it with ‘Turbo Prep’
  • Transform – how to rename a column
  • Feature Generation – creating additional columns/attributes
  • Data quality indicators
  • Using the prepared data as input for ‘Auto Model’

Data Preparation

  • Remove duplicates or replace missing entries
  • Change data types with ‘Dummy Encoding’ or ‘Binning’
  • Reduce the number of columns with a ‘Principal Component Analysis’
  • Removing correlated or low quality attributes
  • Apply ‘Auto Cleansing’
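
The cleaning steps listed above can be illustrated outside the GUI. The following is a rough sketch in plain Python, not what RapidMiner does internally; the rows and the helper names (`remove_duplicates`, `replace_missing`, `dummy_encode`, `bin_value`) are assumptions made for this example:

```python
# Illustrative rows; attribute names and values are invented.
rows = [
    {"color": "red", "size": 4.0},
    {"color": "blue", "size": None},   # missing entry
    {"color": "red", "size": 4.0},     # exact duplicate of the first row
    {"color": "green", "size": 10.0},
]

def remove_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def replace_missing(rows, attribute):
    """Replace missing values of a numeric attribute with its average."""
    present = [r[attribute] for r in rows if r[attribute] is not None]
    average = sum(present) / len(present)
    return [
        {**r, attribute: average if r[attribute] is None else r[attribute]}
        for r in rows
    ]

def dummy_encode(rows, attribute):
    """Turn one nominal attribute into 0/1 columns, one per category."""
    categories = sorted({r[attribute] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != attribute}
        for c in categories:
            new[f"{attribute}={c}"] = 1 if r[attribute] == c else 0
        encoded.append(new)
    return encoded

def bin_value(value, edges):
    """Return the index of the bin a value falls into (edges ascending)."""
    return sum(value >= e for e in edges)

clean = replace_missing(remove_duplicates(rows), "size")
prepared = dummy_encode(clean, "color")
print(prepared)
```

Replacing missing values with the average is only one possible strategy; others (minimum, maximum, a fixed value) follow the same pattern.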

Importing Data in RapidMiner Studio

  • How to merge data sets in RapidMiner ‘Turbo Prep’
  • Adding rows via ‘append’
  • Using the help in ‘Turbo Prep’
  • Generate – how to add an empty text column
  • Scoring prepared data with ‘Auto Model’
  • How to do a (left) join
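
For readers who want to see what a left join and an append do to the rows, here is a small sketch in plain Python. The tables and the function names `left_join` and `append` are hypothetical, and the join assumes the key is unique in the right-hand table:

```python
# Two invented tables that share the key column "id".
customers = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
orders = [
    {"id": 1, "total": 30},
]

def left_join(left, right, key):
    """Keep every left row; add matching right columns, or None if no match."""
    right_by_key = {r[key]: r for r in right}
    right_columns = {c for r in right for c in r if c != key}
    joined = []
    for row in left:
        match = right_by_key.get(row[key], {})
        extra = {c: match.get(c) for c in right_columns}
        joined.append({**row, **extra})
    return joined

def append(first, second):
    """Stack the rows of two tables that share the same columns."""
    return first + second

joined = left_join(customers, orders, "id")
more_customers = append(customers, [{"id": 3, "name": "Cy"}])
print(joined)
```

Bob has no order, so the left join keeps his row and fills `total` with a missing value instead of dropping him.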

Loading Data via a Process

  • How to do pivoting in RapidMiner ‘Turbo Prep’
  • Transform – exclude/filter rows with missing data
  • Using drag and drop to create a pivot table
  • Assessing numbers in a pivot table
  • How to find out more on column details
  • Saving and exporting prepared data
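
The pivoting idea can be sketched in a few lines of plain Python. The sales data and the helper name `pivot_sum` are invented for this example; rows with a missing value are filtered out first, mirroring the exclude/filter step above:

```python
from collections import defaultdict

# Invented sales records; one row has a missing amount.
sales = [
    {"region": "North", "product": "A", "amount": 10},
    {"region": "North", "product": "B", "amount": 5},
    {"region": "South", "product": "A", "amount": 7},
    {"region": "North", "product": "A", "amount": 3},
    {"region": "South", "product": "B", "amount": None},
]

def pivot_sum(rows, group_by, column_by, value):
    """Sum `value` for each (group_by, column_by) pair, skipping missing data."""
    complete = [r for r in rows if r[value] is not None]
    table = defaultdict(lambda: defaultdict(int))
    for r in complete:
        table[r[group_by]][r[column_by]] += r[value]
    return {group: dict(columns) for group, columns in table.items()}

pivot = pivot_sum(sales, "region", "product", "amount")
print(pivot)
```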

Connecting to Databases

  • Deploying and setting up a new driver
  • Connecting to a (MySQL) database
  • Connecting to Dropbox, Twitter, Salesforce, S3 or NoSQL databases
  • How to set up a custom database URL for connection to a specific instance
  • How to execute and customize SQL queries
  • Importing data from a database
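
Executing a customized SQL query and importing its result as rows looks roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL server mentioned above, so no driver setup is needed; the table and column names are invented:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a real MySQL instance.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts

conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, churned INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", 0), (2, "Bob", 1), (3, "Cy", 1)],
)

# A customized query with a bound parameter instead of string concatenation.
cursor = conn.execute(
    "SELECT id, name FROM customers WHERE churned = ? ORDER BY id", (1,)
)
imported = [dict(row) for row in cursor]
conn.close()

print(imported)
```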

Model & Validate

Creating a 'Decision Tree' Model

  • What needs to be predicted forms the basis of your model
  • Preparing the data for the model
  • Understanding the ‘Filter Examples’ operator to remove all Examples (rows) with missing values
  • Using the ‘Decision Tree’ modeler and understanding the Result (Graph)
  • Traversing through the decision tree graph

Applying the Model

  • Understanding the process of scoring with a model
  • Predicting the Label for Examples (rows) where the Label value is missing
  • Using the Multiply operator
  • Using ‘Apply Model’ operator
  • Understanding Confidence values
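
To make scoring and confidence values concrete, the sketch below applies a hand-written decision tree to unlabeled rows. It is an illustration only: the tree, its values and the function name `apply_model` are invented, and confidences are computed as class fractions in the reached leaf, which is one common convention rather than a statement about RapidMiner's internals:

```python
# A hand-written tree standing in for a trained model (values invented).
tree = {
    "attribute": "age", "threshold": 30,
    "low":  {"leaf": {"yes": 1, "no": 3}},   # class counts seen in training
    "high": {"leaf": {"yes": 4, "no": 1}},
}

def apply_model(node, example):
    """Walk from the root to a leaf, then return (prediction, confidences)."""
    while "leaf" not in node:
        above = example[node["attribute"]] > node["threshold"]
        node = node["high"] if above else node["low"]
    counts = node["leaf"]
    total = sum(counts.values())
    confidences = {label: n / total for label, n in counts.items()}
    prediction = max(confidences, key=confidences.get)
    return prediction, confidences

# Rows whose label is missing and should be predicted.
unlabeled = [{"age": 25}, {"age": 52}]
scored = [apply_model(tree, ex) for ex in unlabeled]
print(scored)
```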

Testing a Model

  • Understanding the process of testing a model
  • Predicting Labels for Examples (rows) where the Label value is NOT missing
  • Compare the existing Label with the predicted label
  • Using the Performance operator
  • Measuring the accuracy of the model
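
Accuracy is simply the share of rows where the predicted label matches the existing one. A minimal sketch with invented labels:

```python
# Invented labels: what is known vs. what the model predicted.
labels      = ["yes", "no", "yes", "yes", "no"]
predictions = ["yes", "no", "no",  "yes", "yes"]

def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

acc = accuracy(labels, predictions)
print(acc)
```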

Validating a Model

  • Understanding Validation
  • Split Validation vs. Cross Validation
  • Using the Validation operator
  • Understanding a sub-process
  • Training & Testing Sections
  • Understanding the ‘Confusion Matrix’
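
The difference between split and cross validation, and the structure of a confusion matrix, can be sketched as follows. The function names are hypothetical, and the partitions are deterministic for readability, whereas real tools usually shuffle or stratify the rows first:

```python
from collections import Counter

def split_indices(n, train_ratio=0.7):
    """Split validation: a single train/test partition of n row indices."""
    cut = round(n * train_ratio)
    return list(range(cut)), list(range(cut, n))

def cross_validation_folds(n, k):
    """Cross validation: k partitions; every row is tested exactly once."""
    folds = []
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        folds.append((train, test))
    return folds

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

train, test = split_indices(10)
folds = cross_validation_folds(10, 5)
matrix = confusion_matrix(["yes", "no", "yes"], ["yes", "yes", "yes"])
print(train, test)
print(matrix)
```

Each fold plays the role of the Training and Testing sections once: a model would be trained on `train` and evaluated on `test`, and the k results averaged.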

Finding the right Model

  • How to find the best model amongst a number of ML algorithms
  • Understanding the Receiver Operating Characteristic (ROC) Curve
  • Comparing ROC Curves using ‘Compare ROCs’ operator
  • True Positive Rate = Hit Rate
  • False Positive Rate = False Alarm rate
  • Implications of the classification threshold
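
Each point on an ROC curve is a (False Positive Rate, True Positive Rate) pair obtained at one classification threshold. The sketch below computes both rates for a few thresholds; the confidences, labels and the function name `rates_at` are invented for illustration:

```python
# Invented confidences for the positive class and true labels (1 = positive).
confidences = [0.9, 0.8, 0.6, 0.4, 0.2]
actual      = [1,   1,   0,   1,   0]

def rates_at(threshold, confidences, actual):
    """Return (true positive rate, false positive rate) at one threshold."""
    predicted = [1 if c >= threshold else 0 for c in confidences]
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    positives = sum(actual)
    negatives = len(actual) - positives
    return tp / positives, fp / negatives

for t in (0.7, 0.5, 0.3):
    tpr, fpr = rates_at(t, confidences, actual)
    print(f"threshold={t}: hit rate={tpr:.2f}, false alarm rate={fpr:.2f}")
```

Lowering the threshold never decreases the hit rate, but it can also raise the false alarm rate, which is the trade-off the curve visualizes.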

Optimization of the Model Parameters

  • Grid – Parameter Optimization
  • Cross vs. split validation in the grid optimization
  • Optimizing a Decision Tree classifier
  • Logging – use the log operator
  • Correctly logging validation outputs
  • Creating an iteration counter
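
A grid optimization tries every combination of parameter values, validates each one and logs the outcome. The sketch below shows the loop with an iteration counter and a log. The `validate` function is a hypothetical stand-in that returns a made-up score instead of training a Decision Tree, and the parameter names are assumptions:

```python
from itertools import product

def validate(max_depth, min_leaf_size):
    """Hypothetical stand-in for a validation run; returns a made-up score."""
    return 1.0 - abs(max_depth - 5) * 0.05 - abs(min_leaf_size - 2) * 0.02

# The grid: every combination of these values is evaluated.
grid = {"max_depth": [3, 5, 7], "min_leaf_size": [1, 2, 4]}

log = []
combinations = product(grid["max_depth"], grid["min_leaf_size"])
for iteration, (depth, leaf) in enumerate(combinations, start=1):
    score = validate(depth, leaf)
    # Log the parameters together with the validation result of this run.
    log.append({
        "iteration": iteration,
        "max_depth": depth,
        "min_leaf_size": leaf,
        "score": score,
    })

best = max(log, key=lambda entry: entry["score"])
print(best)
```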

Automated Model Selection and Optimization

  • Multi-level optimization
  • Optimizing model parameters
  • Optimizing multiple nested optimizations
  • Benefits of parallel execution
  • Remember/Recall and Set Parameters
  • Optimized ROC comparison

Auto Model - Classification

  • Introduction to the ‘Auto Model’ feature
  • Getting started with model creation for beginners
  • Rapid prototyping for advanced users
  • Guided classification model creation
  • Automatic parameter optimization

Auto Model - Clustering & Outliers

  • Step-by-step guide to automatic Clustering and Outlier Detection
  • Introduction to the ‘Auto Model’ feature
  • x-means and k-means clustering
  • Understanding the ‘Sonar’ data set
  • Edit your ‘auto-model’ process


Collaboration of RapidMiner Studio and Server

  • RapidMiner Server architecture
  • Creating a Server Repository to establish a connection
  • Editing server components requires permission/privilege
  • Copying/Moving objects (Data, Processes…) from the Local Repository to the Server Repository
  • Deploying and Testing on the Server through the ‘Run Process on Server’ option
  • Scheduling a Process to run on the Server/Running the process remotely

Introduction to RapidMiner Server

  • Connecting to the Server through a browser
  • Admin account and Special users
  • Creating a ‘cron trigger’
  • Browsing the Process-Scheduler to view scheduled and past processes
  • Pausing scheduled jobs and viewing the Queue
  • Defining/Creating Web Services

RapidMiner Server Installation - Preparations

  • Setting up RapidMiner Server prerequisites
  • Installing Java 8 on a Windows Server
  • Adding a Tablespace on Oracle 11g Express
  • Creating a user for RapidMiner Server to connect to a database
  • SQL Developer demo and MySQL setup tips

RapidMiner Server Installation - Walk-through

  • RapidMiner Server architecture
  • Job Agents and Job Containers
  • RapidMiner Server installation step by step
  • Starting the Service
  • First login to the web interface / administration GUI
  • Using JDK vs. JRE for RapidMiner Server

Introducing RapidMiner Radoop

  • Radoop provides 72 operators
  • Connections Menu -> Manage Extensions
  • Connecting to the Hortonworks Sandbox
  • Radoop Nest operator – performs all operations on the cluster & not in RapidMiner memory
  • Monitor the submitted jobs in the Hadoop cluster via a web browser
