This is a guest post written by Michael Martin, Managing Partner at Business Information Arts, Inc., which delivers business analysis and reporting to the desktop, the web, and mobile devices.
RapidMiner and Tableau are recognized as leading platforms for data science and data visualization, respectively. RapidMiner removes obstacles to developing useful machine learning outputs, and Tableau presents those insights clearly and visually.
Imagine that a company has started manufacturing a new line of Espresso machines in a “pilot factory” with 136 machines. In the first full month of production, there were many lost product units due to machines failing, and limited understanding of why this happened. Each machine in the “pilot factory” is equipped with 25 different sensors that monitor mechanical and environmental conditions, and these sensor readings are in a database.
Due to market demand, another factory modeled after the “pilot factory” is about to go online, with others to follow shortly. The company needs to quickly identify and rank the factors that caused machines in the “pilot factory” to fail because, as the old proverb says, “a stitch in time saves nine.”
The major milestones in building a predictive model and reporting solution to minimize machine failure and meet production targets would include:
- Profiling and preparing “pilot factory” sensor data to use as an input to the RapidMiner predictive model
- Building, validating, and interpreting the outputs of the predictive model, and generating failure predictions for all machines in the new factory
- Developing and distributing role-based reporting deliverables for business stakeholders
- Taking action to mitigate issues identified in the predictive model and reporting deliverables
Step 1: Categorize the data to use as inputs to the predictive model and build an input dataset. This dataset will include:
- the individual factory locations
- the machines of various types in each of these factories
- the different types of sensors in each machine
- what each sensor in each machine in each factory measures
Factory Locations can be described as shown below:
Individual machines in each factory can be described as shown below:
Each of the 25 sensors in each machine can be described as shown below:
The sensor measurements for each machine in the pilot factory could be represented as shown below. The data has been coded to indicate whether or not a given machine failed during the first full month of operation (see the third column in the data below). This indicates that the predictive model will use various supervised learning techniques (more on that soon).
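As a rough sketch of what such an input dataset could look like in code, the snippet below pivots per-sensor readings into one row per machine and attaches the known outcome. It is not how RapidMiner stores the data: the machine IDs, machine types, sensor names, and values are all invented for illustration. Only the label field name, Failure, and its yes/no values come from this post.

```python
from dataclasses import dataclass

# Hypothetical reference data: every ID, type, and value below is made up.
@dataclass(frozen=True)
class Machine:
    machine_id: str
    factory: str
    machine_type: str

machines = [
    Machine("M001", "Pilot Factory", "Type A"),
    Machine("M002", "Pilot Factory", "Type B"),
]

# Long-format readings: (machine_id, sensor_name, value)
readings = [
    ("M001", "humidity", 41.0),
    ("M001", "vibration", 0.8),
    ("M002", "humidity", 63.5),
    ("M002", "vibration", 2.1),
]

# Known outcome for the first full month of operation
failed = {"M001": "no", "M002": "yes"}

def build_training_rows(machines, readings, failed):
    """Return one row per machine: descriptive fields, Failure label, sensor columns."""
    rows = {
        m.machine_id: {
            "machine_id": m.machine_id,
            "machine_type": m.machine_type,
            "Failure": failed[m.machine_id],  # the label used for supervised learning
            "factory": m.factory,
        }
        for m in machines
    }
    for machine_id, sensor, value in readings:
        rows[machine_id][sensor] = value
    return list(rows.values())

training_rows = build_training_rows(machines, readings, failed)
for row in training_rows:
    print(row)
```

Because every row carries both the sensor readings and the outcome, the same structure can feed any supervised learning technique.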
To ensure that everyone is on the same page, it could be helpful to distribute the report below. Its table summarizes machine pass and failure occurrences in the “pilot factory” by machine type and machine bank (the letters A – W). The report also shows the location of each machine on the factory floor, an example of how a background image can serve as a custom map in Tableau Desktop on which to overlay data (here, pass / fail indicators).
Step 2: Build, validate, and interpret the RapidMiner predictive model
The graphic below is a high-level view of a RapidMiner workflow that builds a predictive model based on sensor readings from the “pilot factory”. With RapidMiner, you drag and drop Operators onto the design canvas and connect them to design an end-to-end workflow. Each Operator performs a specific task in the workflow. In RapidMiner, a complete workflow is called a process.
In the process below, we perform the following steps using a variety of Operators:
Let’s step through this process:
- RapidMiner reads the sensor readings from the “pilot factory”. This data will be used to train and validate the predictive model.
- We tell RapidMiner that what we want to predict is machine failure – with a binary output of either yes or no. This type of predictive modeling is called supervised learning because the input data includes both sensor readings and the “failure outcome” (either yes or no) for each machine in a data field named Failure.
- RapidMiner balances the data (using weights and other techniques) so that equal emphasis will be given to machines that failed and machines that didn’t fail when building the predictive model.
- RapidMiner determines to what degree (with a score from 0 to 1) each of the 25 sensors was correlated with machine failure (either yes or no) in the input data and saves this information in a database.
- RapidMiner optimizes the predictive model by varying a range of algorithm parameters when building the model. The best combination of parameters is used to generate a yes or no failure prediction for each individual machine. This model is written to disk, allowing it to be re-used to generate predictions based on other data.
- After the predictive model has been built, RapidMiner outputs an evaluation of the performance of the model, allowing the developer to further fine-tune the model if desired. In the example process above, a lift chart is generated that compares the accuracy of the model to random guesses. Additionally, RapidMiner generates a listing that compares the model prediction (yes or no) and value of the “Failure” data field in the input data for each machine.
- The machine failure predictions are written to a database.
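Two of the steps above, balancing the data and scoring each sensor from 0 to 1, can be illustrated outside RapidMiner with a few lines of plain Python. This is only a sketch under stated assumptions: it uses inverse-frequency class weights and the absolute Pearson correlation as stand-ins, and RapidMiner's own Operators may compute their weights differently. The sensor values are invented.

```python
from collections import Counter
from math import sqrt

def balancing_weights(labels):
    """Weight each example so that both classes carry the same total weight."""
    counts = Counter(labels)
    total = len(labels)
    return [total / (len(counts) * counts[label]) for label in labels]

def abs_correlation(values, labels):
    """Absolute Pearson correlation between a sensor and a yes/no outcome (0 to 1)."""
    y = [1.0 if label == "yes" else 0.0 for label in labels]
    n = len(values)
    mean_x = sum(values) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(values, y))
    var_x = sum((a - mean_x) ** 2 for a in values)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the outcome
    return abs(cov) / sqrt(var_x * var_y)

# Made-up example: three machines passed, one failed
failure = ["no", "no", "no", "yes"]
humidity = [40.0, 42.0, 41.0, 65.0]
dust = [5.0, 5.0, 5.0, 5.0]

weights = balancing_weights(failure)
humidity_score = abs_correlation(humidity, failure)
dust_score = abs_correlation(dust, failure)
print(weights, humidity_score, dust_score)
```

In this toy data the single failed machine receives a larger weight than each passing machine, so the two outcomes contribute equally to training, and a sensor that never varies scores 0.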
The best predictive models learn the most important and influential factors driving outcomes: they learn the signal in the input data, but ignore most or all of the noise. Better models will rarely output 100% accurate predictions against the data used to train and test them, but they “generalize well” and customarily provide more accurate predictions when fed wider varieties of input data never seen before. This is preferable to models that learn every pattern (the signal and the noise) in the original input data but make highly inaccurate predictions when presented with data they haven’t seen before. See the article “Understanding the Bias-Variance Tradeoff.”
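A toy comparison can make this concrete. In the sketch below, a “memorizer” (a 1-nearest-neighbour rule that copies the label of the closest training point) is compared with a single-threshold model on held-out data. The data is invented and deliberately constructed so the effect is visible: the signal is “high vibration fails”, and two training labels are noise. Neither model is what RapidMiner builds.

```python
# Toy data, invented for illustration: (vibration reading, failed?) with 1 = failed.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (4.5, 0),
         (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
holdout = [(1.5, 0), (2.8, 0), (3.2, 0), (5.5, 1), (7.8, 1), (8.3, 1)]

def memorizer(x):
    """1-nearest-neighbour: learns signal and noise by copying the closest training label."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def fit_threshold(data):
    """Pick the single cut-off that misclassifies the fewest training points."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: sum((x > t) != bool(y) for x, y in data))

def accuracy(predict, data):
    """Share of examples where the prediction matches the known label."""
    return sum(predict(x) == y for x, y in data) / len(data)

threshold = fit_threshold(train)

def simple_model(x):
    """Predict failure (1) when the reading is above the learned cut-off."""
    return int(x > threshold)

for name, model in [("memorizer", memorizer), ("threshold", simple_model)]:
    print(name, "train:", accuracy(model, train), "holdout:", accuracy(model, holdout))
```

The memorizer reproduces the training data perfectly, noise included, while the threshold model accepts a few training errors and does better on the held-out readings in this constructed example.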
A plan can now be made to mitigate issues in the “pilot factory”, and machine failure predictions can be made for the “New Factory” based on early batches of sensor readings using the RapidMiner predictive model.
The graphic below shows how easy it is to generate machine failure predictions for the “New Factory” in RapidMiner:
The RapidMiner process shown above:
- Reads the predictive model (that was earlier saved to disk)
- Reads in sensor readings from the “New Factory”
- Generates predictions using the Apply Model Operator
- Saves the predictions in a database, and shows the predictions on-screen.
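The same save, load, and apply pattern can be sketched in plain Python. Everything here is an assumption made for illustration: the “model” is a single humidity threshold stored as JSON rather than a real RapidMiner model, the file name and machine IDs are hypothetical, and the predictions are kept in a list instead of being written to a database.

```python
import json
import tempfile
from pathlib import Path

def save_model(model, path):
    """Persist the model so a separate scoring process can reuse it."""
    Path(path).write_text(json.dumps(model))

def load_model(path):
    """Read a previously saved model back from disk."""
    return json.loads(Path(path).read_text())

def apply_model(model, rows):
    """Return a yes/no failure prediction for each machine."""
    predictions = []
    for row in rows:
        at_risk = row[model["sensor"]] > model["threshold"]
        predictions.append({
            "machine_id": row["machine_id"],
            "prediction": "yes" if at_risk else "no",
        })
    return predictions

# Training side: write the (stand-in) model to disk once
model_path = Path(tempfile.gettempdir()) / "failure_model.json"  # hypothetical file name
save_model({"sensor": "humidity", "threshold": 55.0}, model_path)

# Scoring side: read the model, read new sensor readings, generate predictions
new_factory_rows = [
    {"machine_id": "N001", "humidity": 48.0},
    {"machine_id": "N002", "humidity": 61.0},
]
predictions = apply_model(load_model(model_path), new_factory_rows)
for p in predictions:
    print(p)
```

Keeping training and scoring separate means the model only has to be built once, then applied to each new batch of readings.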
Step 3: Distribute reports to analysts, factory foremen, and management. One example of this reporting (for shop foremen and management) could look like what appears below:
This report built with Tableau is filtered to list the machines in the “New Factory” that have an elevated failure risk. The report can be output to Excel, PDF, a graphic file, a text file, or a Microsoft Access database.
The Tableau visualization below (targeted at an analyst) highlights variances in the values of the 9 environment-related sensors (outside temperature, dust, etc.) on each machine when average values for these sensors were exceeded between 4 and 6 times (between 44% and 66% of the time) on each machine. Sensor measurements for machines that are predicted to fail are shown on the red bars (12 of the 15 machines shown). We see higher sensor readings for humidity, outside temperature, and dust in machines that are predicted to fail, and higher sensor readings for air circulation in machines that are predicted not to fail.
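The filter behind this view can be approximated as follows, assuming the percentages are counts out of the 9 environmental sensors (4/9 is about 44% and 6/9 is about 67%). That reading of the filter is an assumption, and the sensor names and values below are placeholders rather than the actual data.

```python
ENV_SENSORS = [f"env_{i}" for i in range(1, 10)]  # 9 placeholder sensor names

def exceedance_count(machine_readings, sensor_averages):
    """Number of sensors whose reading is above that sensor's average."""
    return sum(machine_readings[s] > avg for s, avg in sensor_averages.items())

def in_band(count, low=4, high=6):
    """True when the number of exceedances falls within the reported range."""
    return low <= count <= high

# Made-up averages and one made-up machine
averages = {s: 50.0 for s in ENV_SENSORS}
offsets = [5, 3, 8, 1, -2, -4, 6, -1, -3]
machine = {s: 50.0 + d for s, d in zip(ENV_SENSORS, offsets)}

count = exceedance_count(machine, averages)
share = count / len(ENV_SENSORS)
print(count, f"{share:.0%}", in_band(count))
```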
The next visualization (for management and factory foremen) shows the differences in aggregated sensor readings for each of the seven machine types for machines predicted to fail or not fail. The sensors are classified by type (environmental or mechanical). We see that environmental sensor readings for Outside Temperature and Humidity are also clearly higher in the machines that are predicted to fail for all machine types. Mechanical sensor readings for Tension, Vibration, and Internal Pressure are clearly higher in machines predicted to fail than in machines that are predicted not to fail.
See this demonstration of RapidMiner and Tableau in action. Watch the on-demand webinar, where I include tips and tricks on how to get the most insights out of both platforms.