

We recently released version 9.10 of RapidMiner Studio, which came with a host of improvements and additions. These included in-platform warnings when you have data that could lead to model bias, the ability to mix and match RapidMiner and Python in low-latency use cases, and a new extension for natural language processing that can identify people, cities, organizations, and other entities in text.
Also included in the 9.10 release was our new Open Platform Communication Unified Architecture (OPC-UA) Connector Extension. This extension supports IIoT use cases by letting analysts and data scientists tap into their organizations’ vast pools of industrial plant data. Users can create and manage connections to an OPC-UA server, and new operators aid engineers in discovering and integrating helpful data sources into RapidMiner processes.
Let’s take a look at what the extension does at a high level, as well as how you can use it in RapidMiner.
High-Level Summary
The Open Platform Communication Unified Architecture (OPC-UA) is an extremely popular protocol for machine-to-machine communication. It connects industrial equipment and IoT (Internet of Things) devices to data-collecting servers. The new RapidMiner extension provides a way for users to query both historical and live data that’s generated by IoT equipment and bring it into our platform for analysis and model training.
The connection provides deep insights about shop floor-level data, as IoT devices can be connected to a variety of equipment types, collecting data such as temperature and pressure in a chemical plant tank or conveyor belt speeds in a packaging plant. These devices can then be queried to get a sense of how the equipment is currently performing as well as how it has performed in the past.
Once data from IoT devices has been pulled into RapidMiner with this extension, it can be used as the basis for both training models and generating model predictions.
Technical Overview
The OPC-UA extension uses the open-source stack of the Eclipse Foundation project Milo to establish a connection to an OPC-UA server. The server endpoint URL is stored in a new connection object, and the connection can be tested to ensure that the server can establish connectivity with the local Studio client.
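To make the endpoint URL concrete, here is a minimal Python sketch that splits an opc.tcp:// URL into host and port and checks whether the port accepts TCP connections. It is not how the extension or Milo tests a connection: the function names are hypothetical, and a plain TCP check only shows that the port is reachable, not that an OPC-UA session can be opened.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_OPC_UA_PORT = 4840  # IANA-registered port for OPC UA over TCP


def parse_endpoint(url):
    """Split an opc.tcp:// endpoint URL into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "opc.tcp":
        raise ValueError(f"expected an opc.tcp:// endpoint, got {url!r}")
    if parts.hostname is None:
        raise ValueError(f"no host found in {url!r}")
    return parts.hostname, parts.port or DEFAULT_OPC_UA_PORT


def is_reachable(url, timeout_s=3.0):
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    host, port = parse_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


host, port = parse_endpoint("opc.tcp://opcuademo.sterfive.com:26543")
```

Calling is_reachable with the same URL would attempt the actual network connection.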
The Browse Nodes operator crawls through all connected nodes on the server and returns a list of them, including their data types and, if selected, a sample value. There are a few things to consider:
- The node structure is quite complex; it has a hierarchical layout with multiple references to the same node. That is why on large servers crawling can take a long time and return duplicate entries.
- Not all nodes are human-interpretable; some, especially in namespace 0, are meant for diagnostics and internal settings only.
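The duplicate-entry effect is easy to see in a toy model. The Python sketch below is only an illustration, not the operator's implementation: it crawls a small made-up node graph in which one node is referenced from two parents, and it tracks visited nodes so each is reported once. All node names are hypothetical.

```python
from collections import deque


def browse(references, root):
    """Breadth-first crawl that returns every reachable node exactly once."""
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in references.get(node, []):
            if child not in seen:  # skip nodes already reached via another reference
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Toy address space: "Pressure" is referenced from both "Tank" and "Sensors".
references = {
    "Objects": ["Tank", "Sensors"],
    "Tank": ["Pressure", "Temperature"],
    "Sensors": ["Pressure"],
}
nodes = browse(references, "Objects")
```

Without the seen set, "Pressure" would appear twice in the result, once per reference.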
The Read OPC-UA operator connects to a specific node and collects new incoming data. It will request values at a specified time interval and duration. Note that the operator will wait until the duration is completed before returning any results. Hence, we recommend running this operator frequently with short durations (for example with an AI Hub scheduler) rather than waiting a long time for one large result.
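The blocking behavior can be pictured with a short Python sketch. This illustrates the polling pattern only, not the operator's code: collect, read_value, and the millisecond parameters are hypothetical names, and the reading function here is a simple counter standing in for a sensor.

```python
import itertools
import time


def collect(read_value, interval_ms, duration_ms, sleep=time.sleep):
    """Poll read_value() once per interval and return all samples at the end.

    Nothing is returned until the full duration has elapsed, which is why
    short, frequently repeated runs give faster feedback than one long run.
    """
    n_reads = duration_ms // interval_ms
    samples = []
    for _ in range(n_reads):
        samples.append(read_value())
        sleep(interval_ms / 1000)
    return samples


counter = itertools.count(1)  # stand-in for a sensor reading
samples = collect(
    lambda: next(counter),
    interval_ms=500,
    duration_ms=10_000,
    sleep=lambda _seconds: None,  # skip the real waiting in this demo
)
```

With the default sleep, the same call would take about 10 seconds before returning its 20 samples.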
The Read OPC-UA History operator allows users to retrieve stored historical events from a node. Given a specific time window, the operator will collect as many data points as specified. The operator reads data in reverse chronological order, so it goes from the Start Date ‘backwards’ until the End Date. If users are retrieving high-frequency or slowly changing data, we recommend skipping some values, for example by reading only every other data point. To ensure you get all values for a specific period, simply supply a large enough number of data points, since the operator will stop when no more data is available.
Not all nodes support storing historical data (the “HistoryRead” feature). If you try to read stored data from a node without this property, you will see an error message like this:
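As a rough model of this behavior, the Python sketch below reads a list of timestamped values from the newer bound backwards, keeps every n-th value, and stops at either the requested number of points or the end of the data. The function and parameter names are hypothetical, and the 200 ms sampling interval is an assumption chosen for illustration.

```python
def read_history(events, start_ms, end_ms, max_points, step=1):
    """Return up to max_points (timestamp_ms, value) pairs, newest first.

    start_ms is the newer bound and end_ms the older one, mirroring the
    'backwards' direction of the read. step=2 keeps every other point.
    """
    window = [e for e in events if end_ms <= e[0] <= start_ms]
    window.sort(key=lambda e: e[0], reverse=True)  # newest first
    return window[::step][:max_points]


# 100 made-up readings, one every 200 ms.
events = [(i * 200, 100.0 + i) for i in range(100)]

# A generous max_points returns everything in the window: every 10th reading.
thinned = read_history(events, start_ms=19_800, end_ms=0, max_points=1_000, step=10)

# A small max_points stops the read early.
latest = read_history(events, start_ms=19_800, end_ms=0, max_points=3)
```

In this toy data, keeping every 10th reading of a 200 ms series leaves points 2 seconds apart.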

Practical Demonstration
With the technical background out of the way, let’s look at a practical example of how to work with the OPC-UA Connector. First, we create a connection to a publicly available demo server: opc.tcp://opcuademo.sterfive.com:26543 and then check if the connection can be established:

We could choose to start scanning for all available nodes, but you can also take a shortcut and focus only on a particular node that you’re interested in. In this case, we’ll look only at a production node in namespace 1 with nodeID 1001. This is a demo asset of the server: a pressure tank with some sensors.
The output of the Browse Nodes operator for the node (ns=1;i=1184) looks like this:

For us, the most interesting attribute is Pressure (ns=1;i=1185), which provides a numerical value of the current pressure in the tank.
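Node identifiers such as ns=1;i=1185 combine a namespace index (ns) and a numeric identifier (i). The small Python sketch below takes such a string apart. The helper name is hypothetical, and it handles only the numeric form, not string, GUID, or opaque identifiers.

```python
import re

_NUMERIC_NODE_ID = re.compile(r"ns=(\d+);i=(\d+)")


def parse_numeric_node_id(text):
    """Parse 'ns=<namespace>;i=<number>' into (namespace_index, identifier)."""
    match = _NUMERIC_NODE_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numeric node id: {text!r}")
    return int(match.group(1)), int(match.group(2))


namespace_index, identifier = parse_numeric_node_id("ns=1;i=1185")
```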
We now use the Read OPC-UA operator to start collecting new incoming sensor readings and use them right away in RapidMiner. We configure the operator to collect 20 measurements for a duration of 10 seconds:

Now, with some additional understanding of the data, we decide to analyze more data points, but without waiting for new data. Using the Read OPC-UA History operator, we can collect stored data from past events. Again, not all servers and nodes support this feature, but in this example, the pressure node has historical data for the past several minutes (remember this is a public server storing events only for a short time; in live systems data can reach back months or years).
We select the current time (at the time of writing) as the start time and a point a few hours earlier as the end time. We also choose to retrieve only every 10th measurement, which still gives us a 2-second time resolution:

With historical data we can now, for example, build an anomaly detection model to compare new events to these past data points and see if they fall within an expected distribution. We train a univariate outlier detection model (from the Operator Toolbox extension) and store the model in a RapidMiner repository for deployment. This model will let us see if new sensor values behave differently compared to previously observed values.
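To give a feel for what a univariate outlier score can look like, here is a minimal Python sketch based on z-scores. This is an assumption made for illustration: the Operator Toolbox operator may use a different method, and the function names and pressure values below are made up.

```python
from statistics import mean, pstdev


def fit(values):
    """Summarize historical values by their mean and standard deviation."""
    return {"mean": mean(values), "std": pstdev(values)}


def outlier_score(model, value):
    """Distance of a value from the historical mean, in standard deviations."""
    if model["std"] == 0:
        return 0.0 if value == model["mean"] else float("inf")
    return abs(value - model["mean"]) / model["std"]


# Made-up historical pressure readings (mean 5.0, standard deviation 2.0).
history = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
model = fit(history)

typical = outlier_score(model, 5.0)  # a reading at the historical mean
unusual = outlier_score(model, 9.0)  # a reading two standard deviations away
```

The higher the score, the further a new reading sits from what was observed in the past.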
So now we collect new data within a short duration, say 10 seconds, and calculate their respective outlier scores:

We see the pressure values during this period are considered normal, but we could calculate a maximum outlier score and then monitor these scores in real time, raising an alarm if a score goes above this threshold. To accomplish this, we could place the operators inside a loop and let the process run locally or use AI Hub for a more scalable and reliable solution. We could schedule the scoring process to run regularly or place it in a Real-Time Scoring Agent (RTSA) for low-latency, on-demand deployment.
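The thresholding idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a RapidMiner process: the scores are made up, the threshold is taken as the highest score seen on historical data, and find_alarms is a hypothetical helper that scans successive batches of new scores.

```python
def find_alarms(batches, threshold):
    """Yield (batch_index, score) for every score above the threshold."""
    for batch_index, scores in enumerate(batches):
        for score in scores:
            if score > threshold:
                yield batch_index, score


# Made-up outlier scores from historical data; the maximum becomes the threshold.
historical_scores = [0.3, 1.1, 0.8, 2.4, 1.7]
threshold = max(historical_scores)

# Each inner list stands for the scores of one short collection run.
batches = [[0.5, 1.2], [0.9, 3.1], [2.4, 0.2]]
alarms = list(find_alarms(batches, threshold))
```

Only scores strictly above the threshold count, so a new score equal to the historical maximum does not raise an alarm in this sketch.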
Conclusion
By using the new OPC-UA extension and RapidMiner, those who work with IoT devices have a powerful new tool to get data from their machines and processes in order to build models, make predictions, and improve their bottom line.
If you’d like to give the new OPC-UA extension a try and aren’t already a RapidMiner user, request a demo to get started today!