Introductions

 

  • Panels – Repository, Operator, Process, Views (Design & Results), Parameters & Help
  • Importance of Ports within the Process panel
  • Re-sizing the panels through expanding and contracting the boundaries
  • Re-arranging panels through drag & drop
  • Restoring default view
  • Best practice: create a new Repository for every new project

 

  • Extensions Menu -> Manage Extensions option (to add & remove)
  • Popular Extensions are under Top Download
  • Recommended – Text processing, Web Mining, Python/R integration, Anomaly Detection, Series Extension & Radoop
  • With Radoop Extension we get another view (next to Design & Results)

 

  • Sections under Results: Data, Statistics, Charts, Advanced Charts & Annotations
  • Understanding terminology in Data tab: Example set – Entire data, Examples – Rows, Attributes – Columns
  • Rearrange – Change size/width, Sequence & Sort order of columns
  • Understanding Statistics section to review data
  • Types of Visualizations/Charts

Data Preparation & ETL

 

  • Importing data and storing it within RapidMiner
  • Operators, Parameters and Connectors
  • Retrieve operator
  • ‘Attributes’ are the columns of the data, ‘Examples’ are the rows, and the ‘Example Set’ is the entire data set
  • Adjusting the layout
  • Creating new Repositories with sub-folders ‘Data’ & ‘Processes’
  • Accessing Help & utilizing Tutorial Process

 

  • Process – Create, Run, Export, Import, Save & Delete
  • Read operators & Read Excel operator
  • Import Configuration Wizard
  • Name, Comment & Unit rows – Name row corresponds to column names

 


 

  • Deploying and setting up a new driver
  • Connecting to a (MySQL) database
  • Connecting to Dropbox, Twitter, Salesforce, S3 or NoSQL databases
  • How to set up a custom database URL for connection to a specific instance
  • How to execute and customize SQL queries
  • Importing data from a database

Model & Validate

 

  • What needs to be predicted forms the basis of your model
  • Preparing the data for the model
  • Understanding the ‘Filter Examples’ operator to remove all Examples (rows) with missing values
  • Using the ‘Decision Tree’ modeler and understanding the Result (Graph)
  • Traversing through the decision tree graph
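
The effect of removing Examples with missing values can be illustrated outside RapidMiner. The following is a minimal plain-Python sketch, not the ‘Filter Examples’ operator itself; the attribute names and values are hypothetical and exist only for illustration.

```python
# Hypothetical example set: each dict is one Example (row),
# and each key is an Attribute (column). None marks a missing value.
example_set = [
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": None, "income": 48000, "label": "no"},
    {"age": 51, "income": None, "label": "yes"},
    {"age": 29, "income": 39000, "label": "no"},
]


def drop_rows_with_missing(rows):
    """Keep only the rows in which no attribute value is missing."""
    return [row for row in rows if all(value is not None for value in row.values())]


complete_rows = drop_rows_with_missing(example_set)
print(len(complete_rows), "of", len(example_set), "rows kept")
```

Only the rows that are complete in every column remain available for training the model.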

 

  • Understanding the process of scoring a model
  • Predicting the Label for the Examples (rows) where the Label value is missing
  • Using the Multiply operator
  • Using ‘Apply Model’ operator
  • Understanding Confidence values

 

  • Understanding the process of testing a model
  • Predicting Labels for the Examples (rows) where the Label value is NOT missing
  • Compare the existing Label with the predicted label
  • Using the Performance operator
  • Measuring the accuracy of the model
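
Comparing the existing Label with the predicted Label reduces to counting matches. The sketch below shows that calculation in plain Python, assuming accuracy is defined as the share of correct predictions; the label values are made up and the function name `accuracy` is a hypothetical helper, not part of RapidMiner.

```python
def accuracy(actual, predicted):
    """Return the fraction of rows whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one row is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for five Examples.
actual_labels = ["yes", "no", "yes", "yes", "no"]
predicted_labels = ["yes", "no", "no", "yes", "yes"]

print(accuracy(actual_labels, predicted_labels))  # 3 of 5 rows match
```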

 

  • Understanding Validation
  • Split Validation vs. Cross Validation
  • Using the Validation operator
  • Understanding a sub-process
  • Training & Testing Sections
  • Understanding the ‘Confusion Matrix’
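
The difference between Split Validation and Cross Validation is easiest to see in how the row indices are divided. The following is a simplified plain-Python sketch under stated assumptions: rows are not shuffled, the 70/30 ratio and the fold count of 5 are arbitrary choices, and both function names are hypothetical. It is not how the Validation operator is implemented.

```python
def split_validation(n_rows, train_ratio=0.7):
    """Divide row indices once into a training part and a testing part."""
    cut = round(n_rows * train_ratio)
    indices = list(range(n_rows))
    return indices[:cut], indices[cut:]


def cross_validation(n_rows, k=5):
    """Yield k (train, test) pairs; every row is used for testing exactly once."""
    indices = list(range(n_rows))
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train, test = split_validation(10)
print("split:", train, test)

for train, test in cross_validation(10, k=5):
    print("fold:", train, test)
```

Split validation gives one performance estimate from a single division, while cross validation trains and tests k times and the k results can be averaged.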

 

  • How to find the best model amongst a number of ML algorithms
  • Understanding the Receiver Operating Characteristic (ROC) Curve
  • Comparing ROC Curves using ‘Compare ROCs’ operator
  • True Positive Rate = Hit Rate
  • False Positive Rate = False Alarm rate
  • Implications of the classification threshold
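
The two rates above, and the way the classification threshold moves them, can be worked through by hand. This is a minimal plain-Python sketch: the labels and confidence scores are invented, and `roc_point` is a hypothetical helper rather than anything provided by RapidMiner.

```python
def roc_point(labels, scores, threshold):
    """Return (true positive rate, false positive rate) at one threshold.

    A row is predicted positive when its score is >= threshold.
    """
    tp = fp = fn = tn = 0
    for is_positive, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # hit rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false alarm rate
    return tpr, fpr


# Hypothetical true labels and model confidences for the positive class.
labels = [True, True, True, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

for threshold in (0.8, 0.5, 0.2):
    print(threshold, roc_point(labels, scores, threshold))
```

Lowering the threshold raises the hit rate but also the false alarm rate; each threshold yields one point on the ROC curve.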

 

  • Grid – Parameter Optimization
  • Cross vs. split validation in the grid optimization
  • Optimizing a Decision Tree classifier
  • Logging – using the Log operator
  • Correctly logging validation outputs
  • Creating an iteration counter
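
The idea behind grid optimization with logging and an iteration counter can be sketched in a few lines of plain Python. Everything named here is an assumption for illustration: the parameter names `max_depth` and `min_leaf_size`, their candidate values, and the placeholder `evaluate` function, which stands in for a real validation run and does not train any model.

```python
from itertools import product


def evaluate(max_depth, min_leaf_size):
    """Placeholder score standing in for a validation result (not a real model)."""
    return 1.0 / (1 + abs(max_depth - 5) + abs(min_leaf_size - 2))


# Hypothetical parameter grid: every combination is tried once.
grid = {"max_depth": [3, 5, 7], "min_leaf_size": [1, 2, 4]}

log = []
combinations = product(grid["max_depth"], grid["min_leaf_size"])
for iteration, (depth, leaf) in enumerate(combinations, start=1):
    score = evaluate(depth, leaf)
    # One log row per iteration: the counter, the parameters, and the result.
    log.append({"iteration": iteration, "max_depth": depth,
                "min_leaf_size": leaf, "score": score})

best = max(log, key=lambda entry: entry["score"])
print(best)
```

Logging the counter together with the parameters and the validation result makes it possible to trace which combination produced each score.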

 

  • Multi-level optimization
  • Optimizing model parameters
  • Optimizing multiple nested optimizations
  • Benefits of parallel execution
  • Remember/Recall and Set Parameters
  • Optimized ROC comparison

Operationalize

 

  • RapidMiner Server architecture
  • Creating a Server Repository to establish a connection
  • Editing server components requires permission/privilege
  • Copying/Moving objects (Data, Processes…) from the Local Repository to the Server Repository
  • Deploying and Testing on the Server through the ‘Run Process on Server’ option
  • Scheduling a Process to run on the Server/Running the process remotely

 

  • Connecting to the Server through a browser
  • Admin account and Special users
  • Creating a ‘cron trigger’
  • Browsing the Process Scheduler to view scheduled and past processes
  • Pausing scheduled jobs and viewing the Queue
  • Defining/Creating Web Services

 

  • Radoop provides 72 operators
  • Connections Menu -> Manage Extensions
  • Connecting Hortonworks Sandbox
  • Radoop Nest operator – performs all operations on the cluster & not in RapidMiner memory
  • Monitoring the submitted jobs in the Hadoop cluster via a web browser
