Predictive Maintenance in RapidMiner with OSI PI

While manufacturers can use AI systems in many ways, one use case rises above the rest when it comes to feasibility and impact – predictive maintenance.

Did you know that RapidMiner users can leverage valuable machine data at the millisecond level? The RapidMiner platform connects directly with data sources like OSI PI and uses that data for predictive maintenance. We’ll be covering all of these details in this Lightning Demo.

Join Jeff Chowaniec, Presales Engineer at RapidMiner, for this 45-minute Lightning Demo on connecting OSI PI with RapidMiner for predictive maintenance, followed by a live Q&A.

0:00 Hello everyone, and thank you for joining us for today’s webinar, AI for Manufacturing. I’m Hayley Matusow with RapidMiner and I’ll be your moderator for today’s session. I’m joined today by Scott Barker, our director of product marketing, and Jeff Chowaniec, one of our data science consultants. We’ll get started in just a few minutes, but first a few quick housekeeping items for those on the line. Today’s webinar is being recorded, and you’ll receive a link to the on-demand version via email within one to two business days. You’re free to share that link with colleagues who were not able to attend today’s live session. Second, if you have any trouble with audio or video, your best bet is to try logging out and logging back in, which should resolve the issue in most cases. Finally, we’ll have the Q&A session at the end of the presentation. Please feel free to ask questions at any time via the questions box on your screen. I’ll now go ahead and pass it over to Scott.

0:51 Thanks Hayley. Thanks everyone for joining us today. I am here to talk to you about AI for Manufacturing. So, depending on who you believe or who you're reading, AI is obviously an incredibly hot topic in the world right now, and it's becoming extremely widely adopted in the business world. Now, if you're staying on top of any of the publications that discuss and follow concepts related to machine learning, artificial intelligence, and predictive analytics, you've probably noticed that among all the industries adopting these technologies to drive incredible impacts within their business, manufacturing is consistently cited as the industry most ripe with opportunity to improve with these technologies. So today we're going to talk about a wealth of value creation opportunities for these cutting-edge technologies, but also really hone in on one that we've found to be, in our customer base, an incredibly impactful use case: predictive maintenance.

2:01 So here's a quick agenda of what we're going to cover today. I'm personally going to talk through some of the changing dynamics in manufacturing that have gotten us to the place we're at now. I'm going to cover the wider breadth of opportunities for value creation with machine learning, artificial intelligence, and predictive analytics. Then I'll dive a little bit deeper into predictive maintenance, and ultimately talk about what you can look for to help your organization evolve and build out a predictive maintenance solution of your own. Jeff's going to showcase in the RapidMiner platform how easy it is to actually build out a real, data-science-backed predictive maintenance model in a fast and simple way. Then we'll wrap up with some additional use cases for ML and AI drawn from our customer base, and we should have time for some question and answer.

3:06 So just like any industry, the whole world is changing at an incredible clip right now, and there are a ton of changing dynamics in manufacturing. We're in the middle of an Industry 4.0 revolution, and manufacturers are essentially being forced, in some cases kicking and screaming, but in most cases with open arms, to embrace cyber-physical systems and use all available IoT data to drive predictive analytics and innovate every aspect of their business. The reason why is that if you're not finding every little razor edge you can to enhance and improve your business, you'll essentially be outperformed by aggressive and nimble global competitors, and today's manufacturing world is not just about volume and efficiency anymore. There are so many more angles, aspects, and complexities that modern manufacturers are dealing with. Machine learning isn't the only way forward, obviously. RapidMiner being a company that helps manufacturers construct and operate artificial intelligence and machine learning models, we believe strongly in that approach. But one major way for manufacturers to navigate this perfect storm of evolving technology is to utilize machine learning and artificial intelligence for things like smart product design. What's the optimal way to build a product so that it goes down the shop floor quicker and more efficiently with fewer defects? Running a smarter factory: how do I optimize my shop for reducing cost and, once again, reducing defects? Forecasting demand more efficiently so you only produce what you need to produce. Better quality production. Reducing production downtime, which is a huge part of what we're going to talk about today with the predictive maintenance use case. Helping to manage supply chain risk, which goes hand-in-hand with forecasting more effectively and accurately.

5:26 And it's not always about rolling out a model in production, right? Sometimes you build a model just so you can understand the complex multistage processes you deal with as a manufacturer. Within the manufacturing environment and within the supply chain, there are really complex multistage processes, and you may not understand every nook and cranny of them. But if you utilize machine learning models and algorithms, you can, in many cases, learn things about your business that you wouldn't have otherwise learned. Even if you're not rolling out or operationalizing a model so that you're automating something based on it, the models, and I think Jeff's going to touch on some of this, can really effectively teach you about the business: where the most inefficient components of your shop are, or which parts of your supply chain are most likely to break.

6:30 So as I've mentioned before, there are hundreds and hundreds of value creation opportunities for machine learning within the world of manufacturing. These five are really the most common ones we see with our customers. Predictive maintenance, which we're going to talk about a lot today: avoiding downtime of critical equipment with proactive monitoring of mission-critical devices, using it to get ahead of issues that may cause downtime. Precise demand forecasting, which I've already talked about a little bit: accurately predicting product demand so that you can reduce cost and increase profitability. You're not building up a huge, completely unnecessary inventory of your goods and then incurring a whole bunch of holding cost; you're more accurately able to predict what the world is going to need, and then you can produce a number that's more in line with that amount. Product quality assurance: a lot of steel manufacturers, and a lot of process manufacturers, will leverage artificial intelligence and machine learning to help identify quality issues early in the process. Identifying defects in steel manufacturing is one example I always like to use, because the cost of reforging that steel is incredibly high, so if you can identify defects early, you can eliminate all those completely unnecessary costs. Operations optimization: this could actually be a hundred different use cases baked within this one use case, but it's really any way to increase throughput and uncover hidden efficiencies in the manufacturing process; it could be assembly, or it could be something in process manufacturing again. Then the last one is also super common, right? It's trying to find health and safety issues and identify the things that cause them. Is it understaffing part of your business? Is it having people work too many hours? What are the biggest causes of health and safety issues?
What are the things that correlate with, and can ultimately cause, health and safety issues? Because these can lead to accidents and downtime, and they also lead to pretty costly legal ramifications. So, an incredibly common one. But again, for the purpose of today's discussion, we're really going to hone in on that first one, which is predictive maintenance.

9:08 Predictive maintenance is incredibly hot and important because almost every manufacturer is seeking to ensure maximum availability of their critical operational systems while minimizing the cost of maintenance and repairs. This is maintenance engineering 101; it's the thing everyone's seeking to do as it relates to maintenance engineering. As the internet of things continues to permeate every sector of manufacturing, from transportation and logistics to automotive and utilities, the maintenance engineering goals of reducing downtime and maximizing efficiency become even more razor thin: you're trying to find every little efficiency you can to minimize cost and maximize uptime.

9:55 And so maintenance engineering has been evolving over the years; it's really come a long way in the past decade or two, with many businesses adopting smarter strategies to improve efficiency. It all began with reactive maintenance, which is kind of like how I run my household, right? If something breaks, I fix it. If the dishwasher breaks, you fix it, and if you have to order a part and wait a month to get that part, you can hand-wash the dishes for a little while, right? But that's where it breaks down for manufacturers: you can't go a week without a component of your critical systems. If a piece of your critical systems goes down, reactive maintenance becomes a nightmare strategy. And so, and this is not a new thing by any means, many manufacturers have adopted a combination of reactive and preventative maintenance. Now, preventative is really just a fancy way of saying scheduled maintenance: fixed calendar intervals, conditions, and times for servicing the critical components of the manufacturing facility. This is fine. It's all well and good, and it certainly works better than reactive maintenance, but there are issues with this as well, depending on the facility you're operating in. It could be that it creates scheduling issues for your employees: they're required to work a certain number of hours, and when you have scheduled maintenance you end up having to pay them for hours they're not working. It could be that you're excessively replacing components that don't really need to be replaced just because it's part of the scheduled maintenance. It's really not highly flexible, and it also doesn't prevent everything. Even though it's called preventative, it doesn't prevent every issue.

11:56 Now the next step, the next evolution, is obviously what we're here to talk about today, and that's predictive maintenance. I've often heard the saying that the best way to predict the future is to create it, right? So if you can create a model that ultimately shows you when something is likely to break down, and we'll go into more detail on this in just a minute, you don't have unnecessary scheduled downtime and you don't have unnecessary replacement of parts. One benefit of predictive maintenance we hear about really commonly is that a lot of times there will be error messages popping up on your critical systems, and you don't really know if an error message is something urgent that you need to shut down production to address, or something you can let slide and then, when you have a break or some kind of scheduled downtime, go ahead and address. That is really one of the huge benefits of predictive maintenance that a lot of our customers tell us about. So if you're a brewer, right, which is a fun form of manufacturing, and you get an error message about your filling device, you need to understand whether you actually need to shut down the filling device and reduce your total overall production, or if it's just a case of, "Hey, you need some more clean bottles, and when you get some downtime, you can fill the machine with some more clean bottles."

13:30 So predictive maintenance in our world is really preemptive problem detection. It's applying machine learning to operational maintenance and inspection data to predict equipment failure and fix problems before they occur. As we talked about, with every efficiency and element of productivity counting, the application of data science and machine learning to the vast sets of data that most people are collecting nowadays, with the proliferation of IoT, can help find the critical signal hidden in all the noise of your data. In many cases, it automatically pinpoints deviations that indicate the possibility of damage or wear and tear, whether partial or complete machine failure. You can predict where, when, and why asset failures are likely to occur. There's no more guesswork, no more over-reliance on the scheduled stuff. So it seems like a no-brainer, right? Why wouldn't companies and manufacturers embark down this path more frequently? It's not so simple. There are three major challenges that I see causing fewer manufacturers to adopt predictive maintenance than should be the case. One is that we're flat out wasting good data. If you believe Gartner, if you believe McKinsey, and there are other stats out there I could have picked for this slide, the vast majority of manufacturers are producing a ton of data but really not utilizing it properly. Gartner's stat was that 72% of the manufacturing industry's data as a whole goes unused, which again is a huge waste of data. McKinsey, for their part, focused in on an oil rig in talking about IoT data not being used: on an oil rig with 30,000 sensors, only 1% of the data is actually even actively examined. So again, it's about wasting this good data. You should be doing something with it. There's not always a signal in all that noise, but if you're not actually using and analyzing the data, then you're never going to find the signals that are there.
One of the reasons for this is that it's becoming increasingly difficult for humans to brute-force answers out of this huge volume of data. And this is where the machines are better than humans, right? This is where machine learning can help pave the way a little bit. Data silos are problem number two. In addition to the huge amount of data to sift through, it's also coming from a ton of distributed systems and operations: all different types of data from across the supply chain, living in different databases and systems, or data silos as we like to call them. That makes it very, very difficult for someone to actually join the data together, find a common key on which it can be joined, and then produce meaningful analysis based on that joined data that yields useful findings and information you can actually exploit to find inefficiencies. So it's incredibly difficult for a maintenance engineering team to find correlations across multiple datasets when it's tricky, or actually impossible, to join the data silos together, because they don't have the means to do that with connectors, so they're constantly requesting information from their IT departments. So, again, this is where machine learning can help. Machine learning can be used to automate data prep, and it can be used to join multiple things together in an intelligent way. Many systems, whether they're data prep systems or machine learning systems like ours (we've got both), can be set up with connectors to critical systems so that you can gather data from multiple systems at once and then intelligently and automatically join things together.
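To make the data-silo point concrete, here's a minimal sketch of joining two hypothetical silos, sensor readings and maintenance logs, on a shared key. The table names, columns, and values are invented for illustration, not taken from any real system:

```python
import pandas as pd

# Hypothetical silo 1: sensor readings from the shop floor.
sensors = pd.DataFrame({
    "machine_id": ["M1", "M1", "M2"],
    "temp_c": [71.2, 74.8, 69.5],
})

# Hypothetical silo 2: maintenance history from a separate system.
logs = pd.DataFrame({
    "machine_id": ["M1", "M2"],
    "last_service_days": [12, 48],
})

# A left join on the common key keeps every sensor reading and
# enriches it with maintenance history from the second silo.
combined = sensors.merge(logs, on="machine_id", how="left")
print(combined.shape)  # (3, 3): 3 readings, now with service history
```

The hard part in practice is usually finding that common key across systems; once it exists, the join itself is one line.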

17:29 Finally, it's complicated, right? This is a human problem and a data problem. The data itself can be incredibly complex: you're getting millions of data points from different sensors at different times, with all kinds of variation in the sensor output. It's hard for one person who really knows data in general to know what's normal and abnormal and pinpoint anomalies in these datasets. They typically rely on a subject matter expert, a domain expert, or an engineer; in this case, when we're talking about predictive maintenance, a maintenance engineer. But that maintenance engineer is often not a data person, right? They're not the person tasked with managing the data for the organization. So you've got multiple people looking at the datasets in different ways, and a lot of times there are disconnects and there are challenges. We ultimately need a way for the subject matter expert to be able to do more with their data. Working with data is hard, and there's a reason why data scientist is an incredibly hot job title. The same goes for data engineers; data engineer is another hot job right now. And the reason it's a hot job is that it's an incredibly valuable skill set, and many maintenance engineers may not have the skill set to do the job of a data scientist or data engineer. So bridging the gap between these two folks to help them produce meaningful analysis of these complex multivariate datasets can be pretty challenging. And really, the first companies to do this, or rather, I think we're probably past the point of the first companies to figure this out, but the early adopters of technology that allows people to tackle and overcome these challenges, are going to be able to gain huge competitive advantages in driving revenue, optimizing costs, and reducing risk.
If you think about how Google gained advantages over traditional advertising platforms by applying big data and machine learning techniques to turn consumer mouse clicks into high-value information, that's the type of huge shift we're talking about in competition, because now Google is one of the largest companies in the world and has virtually no competitors to speak of. So as people embark down the path of seeking a predictive maintenance solution, or a machine learning and artificial intelligence solution that can help them produce predictive maintenance models in addition to other types of models, the thing we always stress is to look for something that addresses the whole machine learning lifecycle, the whole data science lifecycle. You don't want just a data prep solution, and you don't want just a predictive analytics solution that's built on code; you want something that addresses the entire lifecycle, from data ingestion to data exploration to data prep to model construction and model operation and management. It's so much simpler and so much more elegant when you've got the entire process captured in a single platform. You also need something that can handle all types of data. We talked about the wealth of different types of data and where the data is coming from, especially in the world of manufacturing and IoT, so that's a must-have checkbox that needs to be checked. And, interrelated with the previous point, look for something that can open up data silos for everyone. If you're able to connect all those different datasets and bring everything together, merging, transforming, and joining those datasets, it really opens up data silos for everyone. Accelerated model creation and retraining: you can obviously do machine learning and artificial intelligence in code. You can use Python, right?
You can certainly code, but it's not always the best, most elegant solution for your problems. Speed is one of the most critical reasons: if you've got a platform that makes model creation very quick and easy in a graphical, visual, user-driven interface, it's going to be a lot faster than coding flat out. Also, not everyone knows how to code. So look for something that accelerates model creation and also model retraining, continually monitoring the model and retraining it with new data.

22:12 We firmly believe, just because there's a huge shortage of data scientists out there and they're in high demand, that a platform or solution that can upskill everyone on the team is incredibly valuable. Something with a short learning curve that enables anyone to pick up the tool or solution and really comprehend the models that have been built is going to deliver much more value. Again, using code as the alternative example, having a maintenance engineer sift through Python code is really going to create a disconnect, and it's going to make it harder for people to collaborate and communicate the impact of the models being built and delivered. Then there's the ability to drive impact, and this comes in a variety of different ways. One is productivity: doing things faster in a visual, guided, automated way. It's collaborative and open, where multiple people can be on the same platform, talking about the model and figuring out how to make it work; that's going to deliver an actual business impact versus just a fancy model that's incredibly accurate. Also performance: something that fits into your enterprise and scales if you're a larger enterprise, as many manufacturers are. Something that scales and fits into your whole IT deployment, with high availability and high security, is incredibly important. And lastly, enablement: a solution that has the ability to jumpstart people and get them going on the platform without a steep learning curve, something that can upskill everyone, as I've already talked about. So, a tiny bit about the RapidMiner platform before I hand it over to Jeff. The RapidMiner platform is fully transparent, with data prep, machine learning, and ModelOps all contained within a single software environment. It connects to all your data no matter where it lives, and it integrates with all of your applications, databases, and BI tools.
So eventually everyone within the organization, from data engineers, business analysts, and domain experts to data scientists and even executives, can live and operate in the same platform: producing models, consuming models, understanding models, all happening within one environment. And our mission is to empower anyone to shape the future positively with enterprise AI.

24:46 And we're trying to shift things from an older way, a way built around older, code-driven, code-based technologies that are really slow. They're slow to predict opportunities and risk. They require highly specialized expertise. They oftentimes have opaque modeling, so only one person in the whole organization really understands how it works; if that person gets hit by a bus, then no one will really be able to decipher what's going on, how the model's working, why it's working, and maybe where it's breaking down. And they use legacy, fragmented tools that create vendor lock-in. What we're trying to do is utilize artificial intelligence in a way that delivers lightning business impact: not just building a fancy, hyper-accurate model, but something that can be rolled into production and actually impact the manufacturing environment immediately. Something that has depth for data scientists, so we've got 1,500 machine learning algorithms baked into the platform, but they're simplified for everyone: you can roll them out through a graphical UI so that anyone can really pick it up and figure it out.

26:04 We also have augmented capabilities baked in that recommend certain actions, which makes the platform very approachable and easy for someone who's less skilled or maybe doesn't have a PhD in statistics like a data scientist would. The platform recommends steps or actions, which makes it very simple and easy for everyone else. We believe in full transparency and governance: the platform surfaces all the details about what was done to create a model, what was produced within the model, and what data was used. Data governance is incredibly important nowadays, and we believe in it wholeheartedly. As I mentioned, it's an end-to-end platform where multiple different types of users can collaborate, so you've got a nice, clean, elegant solution and everyone's on the same page. And lastly, our technology was built on an open-source kernel; it started out as an open-source platform. Certain parts of it have been indemnified, but it's open source at its heart, which means it's extendable and can fit into your environment no matter what you've got going on, whatever other data technology you've invested in, whatever your data strategy looks like. RapidMiner can fit in.

27:28 And then the last thing I wanted to quickly talk about is something we've recently rolled out. We talked about upskilling users and making people more effective with their data: enabling someone who's really smart, but isn't a data scientist, to learn machine learning, to learn data engineering, and to learn and master different applications and use cases of machine learning. We have eight different learning paths within our academy that allow people to master different components of the platform and different components of machine learning and artificial intelligence as well. We even have an executive learning path that teaches people how to consume and understand how models are built and what needs to be done within an organization to optimize the use of machine learning within your org. That helps everyone within the organization get up to speed and know what they need to know to tackle something like predictive maintenance. So with that, I will pass the ball over to Jeff, and Jeff can show you just how quick and easy it can be.

28:32 Thank you, Scott, for passing it over. We'll take a look at RapidMiner now, so let me switch the screen over to RapidMiner Studio. I'm going to walk through a demonstration of how RapidMiner works with a predictive maintenance use case. The general idea is that we've got a dataset of real-time, time-series data from machine sensors, and we're going to attempt to diagnose whether or not there's likely to be an issue based off of that data. RapidMiner gives me three ways to solve this problem. Essentially, I can build a process from scratch, which is going to be this blank option here; I can use more guided solutions like Turbo Prep and Auto Model; or I can start from a template. There happens to be a predictive maintenance template for AI and machine learning in the manufacturing world. For our case, we're actually going to use a little bit of all three. I'm going to use a component of the predictive maintenance template, we're going to build a little bit in the blank view, and we're going to leverage Turbo Prep and Auto Model to actually complete the project. Where we're going to start today is right in the Turbo Prep feature. RapidMiner asks me to load some data in. I've got this data stored locally, but if I needed to bring data in from other data sources, for example a flat file off of my computer, I could quickly grab that CSV or Excel file and pull it into RapidMiner. Otherwise, I can grab a table right off of my database.
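As a rough analogue of this data-loading step outside RapidMiner, here's a sketch of pulling a flat file into a table with pandas. The file contents and column names here are mocked up for illustration:

```python
import pandas as pd
from io import StringIO

# Stand-in for a CSV flat file on disk; in practice this would be
# pd.read_csv("sensor_readings.csv").
csv_data = StringIO(
    "event_id,sensor_7,downtime\n"
    "1,0.82,no\n"
    "2,1.41,yes\n"
)
df = pd.read_csv(csv_data)
print(len(df))  # 2 readings loaded
```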

30:12 Now, in this case, I've got data stored locally, so what I want to do is quickly grab that data. Once I've got it loaded into RapidMiner, what you'll see is that I've got quite a few fields here. The nice thing about Turbo Prep is that it allows me to visually explore my dataset, so I have access to a lot of information on my screen. For example, if I hover over a column, I'll see its metadata statistics, as well as a few tests RapidMiner runs for me: a missing-value test, an ID test, and a stability test. It also tells me how many rows I have, how many columns I'm working with, and what my data types are. As I scroll through, you'll see I've got a plethora of sensor inputs; I think they're sensors 1 through 25. You can think of these as different readings coming off a machine on my plant floor. And then I've got some bookkeeping columns over here: my event ID; my downtime, which is whether or not the machine went down for any significant amount of time; and the date/time of the reading, so it looks like I'm on a minute interval here. And then I also have my machine ID, so this is machine 102C.
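The three column checks Jeff mentions (missing value, ID, and stability) can be approximated in a few lines of pandas. This is an illustrative stand-in, not RapidMiner's implementation; the column names and data are invented:

```python
import pandas as pd

# Tiny mock of the demo dataset: one sensor, an event ID, and a
# machine ID that never changes.
df = pd.DataFrame({
    "sensor_7": [0.8, 0.9, None, 1.1],
    "event_id": [1, 2, 3, 4],
    "machine_id": ["102C"] * 4,
})

missing_share = df.isna().mean()      # missing-value test: share of NaNs
is_id_like = df.nunique() == len(df)  # ID test: every value unique
stability = df.nunique() == 1         # stability test: constant column
print(bool(stability["machine_id"]))  # True: a constant column carries no signal
```

A column that fails the stability test (like `machine_id` here) is exactly the kind the demo later removes, since a constant value can't help a model discriminate.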

31:29 On the front half, I actually have an error code and a failure description, so I can read the actual error that came out, the reason for the machine going down. And you can see there's some other information we can glean from this: it gives us an error status of minor, and as I scroll through, I can see I've got an error status of severe. So there's a lot of information I can glean from this. What we're going to do is take this dataset and actually build a model to flag whether or not a machine is likely to go down, and there are some other use cases for this as well. Now, here in Turbo Prep, we actually don't need to make many changes; the dataset looks pretty good to go. The only changes I would make are, for example, if I'm not using the failure description and error code for anything, I can easily select them and remove them from the dataset. So let's just hit remove here and commit to that transformation. And then I probably don't need my machine ID since it's all the same value, so I can also remove that column and simply commit to that transformation now. So now I can flag downtime, I can use my event ID as an identifier, and then I've got my date/time stamp here. In this case, I want to treat this as simply a classification problem. I'm not going to use the time-series data either, though I could extract some useful information out of it, so we'll consider that later. Now, if I needed to enrich this dataset with some more fields, I could easily merge it with another dataset, or if I needed to pivot any of the values in my table, I could quickly come into the Pivot view and start to visually build a pivot of my data table. And then there are a few other options, like generating new attributes or cleansing my data. The good thing about this dataset is that it's good to go, so now I can do a couple of things here.
I can either create this as a process, which will open up the changes I've made as a RapidMiner process so we can build on them on our own, or we can pull the dataset directly into the Auto Model feature: if I hit the model button, it pulls the data into Auto Model and we can prototype out our model. One thing I want to do first is take a look at my inputs. Before we jump into the modeling part, I want to try to identify which sensors are going to be most important to our dataset. What we'll need to do is tell RapidMiner that this downtime column is what we want to predict. So I go into my design view and grab the Set Role operator, and I tell RapidMiner, "Take our downtime column and turn it into our label." This is simply going to be our target variable. I'm also going to tell it that our event ID is an ID, so that it gets ignored as well. Then I'll just go ahead and run this process. When I run it, you'll see that my event ID is now being identified as a special attribute, as an ID, and downtime is being identified as our label. Now, what we want to do is run some tests to see which sensors in our dataset are actually going to be driving this yes/no prediction.
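Outside RapidMiner, the Set Role step roughly corresponds to separating the label (target) and the identifier from the feature columns, so the model trains only on genuine inputs. A minimal sketch with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "event_id": [1, 2, 3],
    "sensor_7": [0.8, 1.4, 0.9],
    "downtime": ["no", "yes", "no"],
})

# Set Role equivalent: downtime becomes the target vector y, the ID
# is set aside so it's never used as a predictor.
y = df["downtime"]
ids = df["event_id"]
X = df.drop(columns=["downtime", "event_id"])
print(list(X.columns))  # only genuine feature columns remain
```

Keeping the ID out of the features matters: an identifier correlates perfectly with each row and would let a model memorize rather than generalize.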

34:50 We can do this a couple of ways. You can do it in the Auto Model feature, because there’s automated feature engineering, so it’ll do feature selection for us, but I’m actually going to leverage a component of the Predictive Maintenance template. What I’m going to do is insert a building block here. I’ve saved the group of operators from that template, which is this Determine Influence Factors block. What it does for me is use a few operators, so I’m going to come into this subprocess here. It uses a few operators to basically build a weight table of how important each sensor is to our prediction. In this case, we’re going to build a weight by correlation, a weight by Gini index, a weight by information gain, and a weight by gain ratio. Then we’re simply going to normalize those weights and combine them together into one calculation. So based off of these four tests, it’ll output the most influential parts of our dataset. When I run this process now, I’ll get a list of all of the columns. One column I can ignore is time, but it looks like it has a little bit of importance to the data. And if I take a look here, this is already sorted for me. So according to those four tests, sensor seven is the most important, and following close behind are sensors six, eight, five, two, and nine. Now we’ve isolated the most influential sensors, and we’ll look for these to have a big impact on our model moving forward. Next, I want to jump back to the Auto Model feature. We pulled this data into Auto Model already. In the Auto Model feature, RapidMiner is simply going to ask me what I want to do with this data: whether I want to predict something, cluster the data, or detect outliers. In our case, we already know we’re trying to predict, since we’re doing predictive maintenance.
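The normalize-and-combine step described above can be sketched in a few lines of Python. This is an illustrative approximation, not the template’s actual operators: it uses two of the four weighting schemes (correlation, plus mutual information standing in for information gain), scales each weight vector to [0, 1], and averages them:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Toy sensor data with a yes/no downtime label (values are made up;
# s2 is constructed to drive the label so the ranking has a clear answer).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["s1", "s2", "s3"])
y = (X["s2"] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Two weighting schemes: absolute correlation with the label, and
# mutual information as a stand-in for information gain.
corr_w = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
mi_w = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

def norm(w):
    # Scale a weight vector to [0, 1] so different schemes are comparable.
    return (w - w.min()) / (w.max() - w.min())

# Combine the normalized weights into one influence score per sensor.
combined = (norm(corr_w) + norm(mi_w)) / 2
print(combined.sort_values(ascending=False))
```

Sorting the combined score descending gives the same kind of influence ranking the template outputs.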
So I’m going to select the Predict option, and I’m going to select the downtime column. All I’m going to do here is hit next. So I’m telling RapidMiner I want to predict downtime. It takes a look at my target classes for me, so I can select the class of interest, and I could map these classes to new values if I wanted to. I don’t need to make any changes, so I’ll just go ahead and hit next. Here I get to select my inputs. RapidMiner is going to run a few tests that we’ve seen before. For example, in Turbo Prep, we had the ID test, stability test, and missing value test. It also runs a correlation to our target variable and a text-ness test. What it does with this is basically let us know if there’s anything we need to change or fix in our dataset. RapidMiner will flag columns as red, yellow, or green, where red is a critical issue, yellow is something RapidMiner has flagged a warning about, and green is good to go. In our case, we don’t need event ID since it’s our actual ID, and I’m going to ignore the time column as well, because I just want to treat this as a classification problem. Then we can go back and show some of the other features as well.
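Those column checks amount to a few simple statistics per column. Here is a rough sketch of the idea, with the caveat that the thresholds are illustrative assumptions, not RapidMiner’s actual values:

```python
import pandas as pd

def column_quality(df, id_thresh=0.95, stability_thresh=0.9, missing_thresh=0.3):
    """Rough analogue of the red/yellow/green column checks; thresholds
    here are illustrative assumptions, not RapidMiner's actual ones."""
    n = len(df)
    report = {}
    for col in df.columns:
        s = df[col]
        missing = s.isna().mean()                      # missing value test
        distinct = s.nunique(dropna=True) / max(n, 1)  # ID test: near-unique?
        top_share = (s.value_counts(normalize=True).iloc[0]
                     if s.notna().any() else 1.0)      # stability test: near-constant?
        if missing > missing_thresh or distinct > id_thresh or top_share > stability_thresh:
            report[col] = "red"
        elif missing > 0:
            report[col] = "yellow"
        else:
            report[col] = "green"
    return report

df = pd.DataFrame({
    "event_id": range(100),               # near-unique: flagged as an ID
    "machine_id": ["M1"] * 100,           # constant: fails the stability test
    "sensor_7": [0.5] * 50 + [1.5] * 50,  # varied, complete: good to go
})
report = column_quality(df)
print(report)
```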

37:55 So what I can do here is go ahead and hit next. Now I’ve got model types here. RapidMiner basically allows me to select which machine learning algorithms I want to run, and I can toggle these on and off. For example, if I don’t want to run the Fast Large Margin, I can disable it. And if I don’t want to optimize my Support Vector Machine, I can disable that automatically-optimize option. Basically, we give you access to some of the most widely used families of algorithms here, and you’ll leverage these to figure out which machine learning algorithm works best for our dataset. Then on the right-hand side, there are options for extracting more information. If we did keep our date/time field, we could extract items like the hour of the day, the day of the week, or the month of the year. Items like that we can extract out of the date to enrich our dataset, because there might be some seasonality in the machines depending on their maintenance times. There’s also an option to extract text information. So if we had kept those text fields in, or maybe we want to do diagnostics on the problem, or we’re working with quality assurance reports when we have actual problems, we can use this to extract those text fields and build a more robust dataset for us in the future.
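The date-extraction idea translates directly to pandas: pull hour of day, day of week, and month out of a timestamp so a model can pick up seasonality. A small sketch with made-up timestamps:

```python
import pandas as pd

# Hypothetical timestamped readings (values are invented for the sketch).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-03-04 08:15:00",
        "2019-03-04 22:40:00",
        "2019-07-19 03:05:00",
    ]),
})

# Extract the same kinds of fields Auto Model offers, to expose
# possible seasonality in machine behavior to the model.
df["hour_of_day"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.day_name()
df["month"] = df["timestamp"].dt.month

print(df[["hour_of_day", "day_of_week", "month"]])
```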

39:20 There’s even an option for feature selection. This might actually be beneficial to us, since we’ve identified that there are sensors that are more important and more influential to the dataset. We could use this to select down to the minimum number of features that are beneficial to the model. And then there are options for a column analysis as well. So what I’ll do is just let this run, and RapidMiner is going to chug through, building each one of these models individually for us. As these models run and finish, they’ll populate this overview table, so we’ll be able to see which is the fastest model, which has the best accuracy, and what the total time and computational time is for each. RapidMiner, I believe, is doing a split validation here. That means I can actually track a plethora of performance metrics. If I wanted to look at classification error, area under the curve, or my class precision, I can do that here. For example, if I switch this to classification error, I can see that value, or if I look at the area under the curve, I can see what that looks like. I can also compare these visually with my ROC curve. If I hit the ROC curve comparison, for example (obviously, I don’t have a lot of data here, so it’s going to look very stepwise), the steeper this initial curve, the better the model, essentially. If I go back to my overview, I can use this to select which model I want to work with. For example, the GLM does really well in overall accuracy. But if I switch to something like area under the curve, I can see that my random forest actually does pretty well, with my deep learning neural network a little behind. And then I’m waiting on the Gradient Boosted Trees and the Support Vector Machine; I think they’re getting heavily optimized around this dataset, so we’ll see what those look like as we click through here. But basically, I can use this as a comparison.
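Outside of Auto Model, the same comparison can be sketched with scikit-learn: train a couple of candidate models on a train/test split and tabulate accuracy and AUC for each. The data here is synthetic, and the two models merely stand in for Auto Model’s larger family of algorithms:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sensor dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "GLM (logistic regression)": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Split validation: fit on the training partition, score on the held-out one.
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_te, model.predict(X_te)),
        "auc": roc_auc_score(y_te, proba),
    }

for name, m in results.items():
    print(f"{name}: accuracy={m['accuracy']:.3f}  AUC={m['auc']:.3f}")
```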
Okay, so our random forest does really well compared to everything else, so I can easily go to that model. It’s nice because I get a bunch of individual decision trees, but I know the final prediction is a group effort, so it’s not as easy to read as one might think. I can understand how each tree is making its decision, but it’s obviously the combination of the trees that matters. Whereas if I take something simple like an individual tree, I can see exactly what drives that prediction. Now, there are a couple of other tools that are really useful for visualizing these results. The first is that I can take a look at the weights. According to our random forest, the sensors with the biggest impact are three, six, seven, eight, and four. These are slightly different from what we saw before, but sensor seven is still important to this model, as it was in our earlier results. There we had sensors seven, six, eight, five, two, and nine, so there is some overlap; seven, six, and eight are still within the top five. Now there are some other features we can take a look at as well. There’s a predictions tab, which attempts to explain how RapidMiner is coming up with each prediction. What’s nice here is we can see what the label was and what the prediction was, and we get the confidence of those classes. Then we can use this to read the rest of the table: anything shaded red is supporting the lower confidence value, and anything shaded green is supporting the higher confidence value. Sometimes this is the correct prediction, sometimes it’s the incorrect prediction, since we have both classes here. So we can use this to see what’s driving each individual prediction as we go. There’s also a model simulator, so we can take a look and see what drives a machine failure in this case. What we’re simulating is not actually the model; we’re actually simulating the machine.
So, in this case, we’re simulating the values that the machine would have if we were to read the data off of it at this very second. What we can do is play around with this. We can see that at these default values, we’re 75% confident the machine will have some scheduled downtime. And if I scroll down, I can actually see what’s influencing that downtime: sensor seven, sensor six, and sensor eight are really driving this machine’s propensity for some sort of downtime. Now, what I can do with this is explore those values. If I find sensor seven and sensor six, I can see whether increasing or decreasing sensor seven has some sort of effect. It looks like we want to minimize sensors six and seven, depending on these readings, to lower our propensity for downtime. And I can see that sensor six is still driving that. As I lower sensors six and seven, I start to have a lower propensity for downtime. So maybe I want to explore which components sensors six and seven are reading on my machine, and actually identify what could be causing these issues. I can use this to explore what’s driving these downtime predictions. What’s nice, too, is I can also optimize this: I can say, okay, I want to see what machine readings generate the highest propensity for downtime, and I can hold certain attributes constant. If, for example, I wanted to keep sensor six at a specified value, we’ll do 1.125, I’ll go ahead and hit next, and I can figure out, “Okay, what are the sensor parameters that will give me the highest propensity for downtime?” So I can see what’s driving that prediction if I keep sensor six at 1.125. This simply allows me to explore the model and see how it’s performing without having to put it into production.
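The simulator’s core trick, holding other inputs at a baseline while sweeping one sensor and watching the predicted downtime confidence move, can be sketched like this. The data and the sensor-to-downtime relationship here are invented purely so the sweep has something to show:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the trained model: downtime is driven by "sensor seven"
# (column index 1); the data and relationship are assumptions for the sketch.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))        # columns: sensor_6, sensor_7, sensor_8
y = (X[:, 1] > 0.2).astype(int)      # downtime flag driven by sensor_7
model = RandomForestClassifier(random_state=0).fit(X, y)

# Simulate the machine: hold the other sensors at their baseline readings,
# sweep sensor_7, and watch the predicted downtime confidence respond.
baseline = X.mean(axis=0)
probs = {}
for s7 in (-1.0, 0.0, 1.0):
    reading = baseline.copy()
    reading[1] = s7
    probs[s7] = model.predict_proba(reading.reshape(1, -1))[0, 1]
    print(f"sensor_7={s7:+.1f} -> P(downtime)={probs[s7]:.2f}")
```

Lowering the simulated sensor value lowers the predicted downtime confidence, which is exactly the behavior being explored in the demo.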
So I can look up downtimes from current machines, actually apply them to my model simulator, and see what I’m getting. I basically use this to do a deeper dive into my model. Now, what I can do is just hit Open Process here, and RapidMiner will build this as a process for me. Ultimately, what I’ve done is take the model that gets generated here (if I scroll over, here’s our random forest) and store that model object to use later. The nice thing about this process is that it’s a great starting point for a predictive maintenance use case, given that RapidMiner has annotated everything for me, so I know what’s going on. This part is calculating my performance, that part is explaining my predictions, this part is creating my simulator, and it even leaves a sticky note for my outputs. So I know that my first output is my simulator, my second is my performance, and so on. The basic idea here is that, as a RapidMiner expert, I can augment the outputs I’m getting from Turbo Prep and Auto Model, like I did at the Turbo Prep phase, and add in whatever operators I need. So if, for example, I do need to optimize this random forest, I can easily drop an Optimize Parameters operator in here, configure it in the workflow, and actually optimize the random forest model-building process.
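The Optimize Parameters idea maps to a grid search in scikit-learn terms: wrap the random forest in a search over a small parameter grid and keep the best-scoring configuration. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the sensor dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Analogue of dropping an Optimize Parameters operator around the random
# forest: search a couple of its parameters with cross-validated AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```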

46:48 Now, the last piece of this is that I actually want to operationalize this model. So what I have is this predictive maintenance use case stood up as a web service on RapidMiner Server. I’ve got a very simple process at the end here, where I’ve stored the model itself; in this case, I’m using the support vector machine. So I’ve stored that model, I’m grabbing my testing data, and I’m applying the model to that data. But, in this case, I’ve also created a user input, so I have my user specify an event ID that we want to run through the model. On the server side, I can actually access this web service here. If I go to my predictive maintenance web service, I can simply give RapidMiner a value; in this case, I’ll select 220 for my event ID. This selects, basically, the row of data that I want to test against the model, so I hit Test here. RapidMiner will then grab that event from my dataset; in this case, I could be reading in from a database table, a flat file, what have you, as long as the server has access to that data source. In this case, it does, so it’s grabbing that value and testing that sensor data against the model. So yes, we do get a flag for downtime here, and I can customize this preview output. For example, if I go to Edit Web Service, I can see I had it in the default XML; XML or JSON is really great for talking to other programs, if you want to embed this web service somewhere, but here I can quickly switch to a different format. And if I hit test... no, I don’t want chart HTML, we want table. There we go. So I can hit Test, and I should get a table output here. But, oh, here we go, see if we can get the chart out. No. There’s an option for a chart in here. Give me one second. Well, anyway, I digress. You can customize this output, but reading this, here’s my prediction.
These are the confidences of my classes, this is my date/time stamp, my machine ID, for example, and these are all my sensor inputs. Basically, I can embed this into other applications. For example, with my embeddable HTML code here, I can put this in other visualization dashboards, or I could embed it on my website and have a webpage for my users to ping this web service. Or I can have my machine, which is reading out its sensor data every minute, ping this web service, so we get on-demand usage of this model directly integrated with the machine. Users can access this web service via a direct link or just by logging into the server and pinging the web service from there. So the idea is I can make this easily accessible, and if I wanted to write back these results, all I need to do is change the process here. If I wanted to append a database table with these results, for example, I can just add a Write Database operator here and set it to append its input. So every time this web service gets pinged, it can store those results right in a table. And I can bring that table right into RapidMiner and see how this model is doing: is it actually scheduling and identifying downtime properly, and so on? So with that, I’m going to pass this back over to Scott to finish off our presentation.
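From a client’s point of view, pinging such a web service is just an HTTP GET with the event ID as a query parameter, returning JSON. The endpoint URL and the field names below are hypothetical; the real ones depend on how the web service was configured on RapidMiner Server:

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; real URL, auth, and parameter names depend on
# how the web service was set up on the server.
base = "https://rapidminer-server.example.com/api/rest/process/predictive_maintenance"
url = f"{base}?{urlencode({'event_id': 220})}"
print(url)

# A sample JSON payload of the shape such a service might return
# (field names here are illustrative assumptions).
payload = json.loads(
    '{"event_id": 220, "prediction": "yes", '
    '"confidence(yes)": 0.78, "confidence(no)": 0.22}'
)
if payload["prediction"] == "yes":
    print(f"Flag downtime with confidence {payload['confidence(yes)']:.2f}")
```

A machine emitting sensor readings every minute could hit the same URL on each reading, giving the on-demand scoring described above.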

51:15 Thanks, Jeff. So as I mentioned at the beginning of the webinar, predictive maintenance is an incredibly popular use case for predictive analytics and machine learning. Many of our manufacturing clients come to us seeking to implement this particular use case, and then expand into a variety of others. We have shipyards, automobile manufacturers, steel manufacturers, as I mentioned before. Any branch of manufacturing can really benefit from it; you’re only limited by your data. However, just like in any industry, there are a variety of other ways that you can benefit from machine learning and artificial intelligence, and really leverage them to drive revenue, cut costs, and avoid risks in your organization. Forecasting demand is an incredibly popular one, as I mentioned before. Deriving customer insight is another: from social media, from customer interactions, from support emails, things like that, you can derive a lot of great insight, whether you’re using a machine learning model or just generating some type of predictive analytics to gather ad hoc insight. Those are very popular use cases. Predictive maintenance we obviously covered before. Optimizing production, finding the weak spots in your manufacturing chain to fix, can be a one-time project for predictive analytics, or it can be something you have ongoing, running in the background, to constantly identify new opportunities for optimization. Supply chain efficiency, which I talked a little bit about before, is another incredibly popular use case for machine learning. And then, minimizing EH&S risk, by predicting the likelihood of harm and identifying the factors that are correlated with or can potentially cause harm in the workplace, is a very popular way of avoiding risk. Other ways you can leverage machine learning to help reduce risk within the organization are listed on the slide. So, like I said, predictive maintenance is incredibly popular and there’s a lot of buzz around it in manufacturing, but there are a tremendous number of other use cases for machine learning, artificial intelligence, and predictive analytics within the world of manufacturing.

53:46 So, just to wrap up. I mentioned that we have quite a few customers utilizing this particular use case. If you’re interested in seeing what our customers think of us, there are a variety of ways you can do that. There’s also a lot of great insight in the Forrester and Gartner reports: the Gartner Magic Quadrant and the Forrester Wave on multimodal data science platforms. RapidMiner is listed as a leader in both of those publications. And sometimes it’s just best to hear it directly from customers, so if you want to learn more about the RapidMiner platform, you can see what our customers think of us on Gartner Peer Insights; G2 Crowd is another great resource as well. So, with that, I think we can wrap up with some Q&A, and I’ll turn it back over to Hayley to jump into some audience questions.

53:42 Great. So, thanks again, Scott and Jeff, for the great presentation today. As a reminder to those on the line, we will be sending a recording of today’s presentation within the next few business days, so please look out for that. And now it’s time to go ahead and get to the audience questions. Feel free to enter your questions in the questions panel that you see on your screen right now, and I’ll go ahead and look at the ones we have at the moment. It looks like our first audience question, and I’ll direct this one to you, Jeff, is: our machine data comes in in real time; how does RapidMiner handle that use case?

55:18 Yeah. So, real-time data can be handled a few ways. The first is via that web service, right? For example, I think I was loading in minute-level data, so for something like that, you’re really just exposing the model to the data whenever it comes in. If you need a response every second, half second, or minute, the web service is definitely the way to go. Now, if you have actual real-time use cases where you’ve got throughput coming in on the millisecond scale, and you need that model to be scoring data as soon as it comes in, in large volumes, there’s actually a real-time scoring agent available on RapidMiner Server that is essentially built for that sole purpose. So, if you’re interested in learning more about that, feel free to reach out to us via email, and we can set up a demo call or something like that. But those are basically the two ways we can handle real-time data, depending on your throughput needs.

56:24 Great. Thanks, Jeff. I see another question here from the audience. Can I utilize R or Python within these workflows?

56:32 Yes. So, there are a few ways to integrate R or Python into our workflows. The two main ones are the Execute R and Execute Python operators. These call your installation of R or Python from RapidMiner, so we’re not actually running that code in the RapidMiner engine. That gives you a little more flexibility, because you’re free to use whatever packages you like. And if you’ve got existing R or Python code you want to leverage in RapidMiner, you can. It’s as simple as adding that script right into the workflow.
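To give a sense of the shape of such a script: the Execute Python operator expects an entry-point function that receives connected inputs as pandas DataFrames and returns DataFrames to the output ports (conventionally named rm_main; the derived feature and column names below are illustrative assumptions):

```python
import pandas as pd

# Shape of a script for RapidMiner's Execute Python operator: the operator
# calls rm_main, passing each connected input port as a pandas DataFrame
# and routing each returned DataFrame to an output port.
def rm_main(data):
    # Any pandas logic can go here; e.g. derive a hypothetical ratio feature.
    data = data.copy()
    data["sensor_ratio"] = data["sensor_7"] / data["sensor_6"]
    return data

# Outside RapidMiner, you can exercise the same function directly:
df = pd.DataFrame({"sensor_6": [2.0, 4.0], "sensor_7": [1.0, 2.0]})
print(rm_main(df))
```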

57:07 Great. Thanks, Jeff. So, it looks like we’re just about out of time here. If you had questions that we weren’t able to address on the line, we will make sure to follow up with you via email within the next few business days. So, thanks again to everyone for joining us for today’s presentation, and I hope you all have a great day.
