
A First Look at RapidMiner’s New Streaming Extension


As you may have already seen, we launched RapidMiner 9.10.3 this week with the primary focus on enhancing our deep learning and streaming extensions. The updates we’ve made with this release further advance users’ ability to develop cutting-edge solutions in areas like anomaly detection, image classification, and computer vision—among many others.

In this post, I’m excited to give you a first look at the new streaming extension. It combines RapidMiner’s intuitive process design with the lightning-fast, scalable power of data streaming engines like Apache Flink and Spark.

I’ll walk through the reasons that we felt this update was important, give you a full breakdown of the new extension, and briefly show you how it can be used to build a streaming process.

Let’s get started.

Why focus on streaming?

Data stream processing clusters are the backbone of many enterprise-scale businesses, largely because they’re designed to handle extremely high volumes of incoming data.

As the amount of data that businesses collect and manage grows, so does the need for a strategy to make sense of it all. When you think about this in the context of a large organization that collects new data every time a credit card is swiped or a user takes action on an e-commerce store, the challenge becomes pretty clear.

Both Apache Flink and Spark provide excellent solutions for such problems by reducing the complexity of real-time big data processing. They thrive on large user communities and have been used in countless enterprise deployments, which means that large organizations can trust that they’ll deliver results.

However, they are also code-centric platforms, where writing and deploying new applications takes highly specialized skills and considerable time.

Using RapidMiner’s streaming extension, you can now create and submit streaming application tasks, such as filtering incoming data or joining two incoming streams, even if you don’t have coding expertise.
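If you’re curious what those two tasks amount to under the hood, here is a minimal sketch in plain Python. It is not the extension’s API or the code it generates; the function names, the event fields (card_id, amount, country), and the sample data are all made up for illustration, and the join assumes a single finite window of events rather than an unbounded stream.

```python
from typing import Any, Dict, Iterable, Iterator, List

Event = Dict[str, Any]


def filter_stream(events: Iterable[Event], min_amount: float) -> Iterator[Event]:
    """Pass through only events whose 'amount' is at least min_amount."""
    for event in events:
        if event["amount"] >= min_amount:
            yield event


def join_streams(left: Iterable[Event], right: Iterable[Event], key: str) -> List[Event]:
    """Join two finite batches (one window of each stream) on a shared key."""
    # Index the right-hand side by key so each left event is matched in O(1).
    index: Dict[Any, List[Event]] = {}
    for right_event in right:
        index.setdefault(right_event[key], []).append(right_event)

    joined: List[Event] = []
    for left_event in left:
        for right_event in index.get(left_event[key], []):
            joined.append({**left_event, **right_event})
    return joined


# Hypothetical sample data: payments on one stream, card details on another.
payments = [
    {"card_id": "A", "amount": 12.5},
    {"card_id": "B", "amount": 950.0},
    {"card_id": "C", "amount": 40.0},
]
cards = [
    {"card_id": "A", "country": "DE"},
    {"card_id": "B", "country": "US"},
]

large_payments = list(filter_stream(payments, min_amount=100.0))
enriched = join_streams(large_payments, cards, key="card_id")
print(enriched)
```

On a real cluster, the engine also has to handle windowing, late events, and distribution across machines, which is where most of the specialized effort goes.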

RapidMiner’s new streaming extension

RapidMiner’s streaming extension brings the intuitive experience our users have come to expect to building real-time analytics processes that run at large scale on streaming clusters.

It’s even possible to deploy any pre-trained RapidMiner model directly on the streamed data, which provides an easy way to develop solutions for classifying incoming events or detecting anomalies. Put simply, you can now combine the analytical strength of RapidMiner with the raw processing power of a full-scale streaming cluster—all without writing a single line of code.
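Conceptually, deploying a model on streamed data means applying a fixed, already-trained model to each event as it arrives. The sketch below shows that idea in plain Python under stated assumptions: ThresholdModel is a hypothetical stand-in for a pre-trained model (its mean and standard deviation are assumed to have been learned offline), and the sensor readings are invented. It does not reflect how RapidMiner models are stored or executed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator


@dataclass(frozen=True)
class ThresholdModel:
    """Hypothetical stand-in for a model trained offline on normal readings."""

    mean: float
    std: float
    max_z: float = 3.0

    def predict(self, value: float) -> str:
        # Label a reading as an anomaly if it is more than max_z
        # standard deviations away from the learned mean.
        z_score = abs(value - self.mean) / self.std
        return "anomaly" if z_score > self.max_z else "normal"


def score_stream(model: ThresholdModel, events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Attach the model's prediction to each incoming event."""
    for event in events:
        yield {**event, "prediction": model.predict(event["value"])}


# The parameters below are assumed, as if learned in an earlier training run.
model = ThresholdModel(mean=20.0, std=2.0)
readings = [
    {"sensor": "s1", "value": 21.0},
    {"sensor": "s1", "value": 35.0},
    {"sensor": "s2", "value": 18.5},
]

scored = list(score_stream(model, readings))
for event in scored:
    print(event)
```

The model itself never changes while the stream is processed, so training and scoring can be handled separately.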

When it comes to evaluating results, our new streaming dashboard allows you to monitor and interact with your deployed streaming jobs. The dashboard shows all deployed jobs along with their status, and links directly to the cluster control instance.

The RapidMiner streaming dashboard

As a member of RapidMiner’s Research Team, I am also excited that this extension is based on RapidMiner’s collaboration on the European Horizon 2020 research project INFORE, the goal of which is to develop a large-scale interactive analytics platform. As part of the project, we’ve worked on use cases from a number of diverse domains.

Applications include life sciences (simulating the effectiveness of treatments on cancer cells), maritime event monitoring (detecting dangerous or illegal activities on the high seas), and the monitoring of volatile financial markets. This hands-on experience, combined with the demands of expert users, has driven the development of the streaming extension, which became a core part of the project.

Building streaming processes in RapidMiner

Building streaming processes relies on a common operator called the Streaming Nest, which encapsulates the functionality of the operators used inside. Once a process is configured and executed, the underlying code is translated into a streaming graph representation and sent to the selected cluster. From there, it’s deployed for execution.
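To make the idea of a streaming graph representation more concrete, here is a rough sketch of how a small process could be described as a graph and serialized before being sent to a cluster. The operator names and the JSON layout are assumptions made for illustration; this post does not describe the actual format the extension uses.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operators; each key maps to the operators that feed into it.
predecessors = {
    "filter_events": {"stream_source"},
    "apply_model": {"filter_events"},
    "stream_sink": {"apply_model"},
}

# A topological sort gives an order in which every operator
# appears after all of the operators it depends on.
execution_order = list(TopologicalSorter(predecessors).static_order())

graph = {
    "nodes": execution_order,
    "edges": [
        {"from": source, "to": target}
        for target, sources in predecessors.items()
        for source in sorted(sources)
    ],
}

# The serialized graph is what a client would submit to the cluster.
payload = json.dumps(graph, indent=2)
print(payload)
```

Describing the process as data rather than engine-specific code is one way the same design could be handed to different engines.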

Once deployed, the streaming engine takes over and ensures a smooth execution of the process.


High-level architecture for deployment of streaming processes

Again, the status of the deployed process can be monitored with the integrated dashboard that’s available as part of the Streaming Extension. It gives an overview of the status of submitted jobs and links to the control page of the related cluster.

Use cases for streaming data

Now that you’ve seen how RapidMiner’s updated streaming extension works, let’s walk through a few ways that streaming can be used in an enterprise setting.

Fraud Detection

By creating a model to monitor a constant stream of transactional data, financial institutions such as banks can automatically identify the complex behavioral patterns that are commonly associated with fraud. This allows them to stop fraud before it occurs and avoid the associated losses.
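As a simplified illustration of one such behavioral pattern, the sketch below flags a card that is used unusually often within a short time window. The rule, the thresholds, and the field names are assumptions chosen for the example, and it presumes events arrive in timestamp order. A real fraud model would be far richer.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Iterable, Iterator


def flag_bursts(
    transactions: Iterable[Dict[str, Any]],
    window_seconds: int = 60,
    max_count: int = 3,
) -> Iterator[Dict[str, Any]]:
    """Mark a transaction as suspicious if its card has more than
    max_count transactions within the last window_seconds."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    for tx in transactions:
        timestamps = recent[tx["card_id"]]
        timestamps.append(tx["ts"])
        # Drop timestamps that have fallen out of the sliding window.
        while tx["ts"] - timestamps[0] > window_seconds:
            timestamps.popleft()
        yield {**tx, "suspicious": len(timestamps) > max_count}


# Hypothetical transactions; 'ts' is seconds since the start of the stream.
transactions = [
    {"card_id": "A", "ts": 0},
    {"card_id": "A", "ts": 10},
    {"card_id": "B", "ts": 15},
    {"card_id": "A", "ts": 20},
    {"card_id": "A", "ts": 30},
    {"card_id": "A", "ts": 200},
]

flags = [tx["suspicious"] for tx in flag_bursts(transactions)]
print(flags)
```

Only the fourth use of card A within one minute is flagged. By the time the card is used again at second 200, the earlier transactions have left the window.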

Patient Health Monitoring

Recent advancements in wearable technology allow healthcare providers to monitor patients’ vitals and body functions. This ensures that providers can stay on top of any noteworthy developments, and that patients can keep a close eye on their own health.

Product Recommendations

For eCommerce companies, making tailored product recommendations to site visitors is a great way to increase the likelihood that they’ll make a purchase. Streaming data allows these companies to understand their visitors’ needs and interests so they can create more personalized experiences.

As mentioned above, streaming data has already become critical to many enterprises today, and its importance will only grow as companies seek to automate or enhance their decision-making.

Wrapping up

As you’ve seen, the new RapidMiner streaming extension can help you improve the speed and efficiency of your work by combining the platform’s ease of use with the power of some of the world’s most popular data streaming engines. Enterprises can use this functionality to build impactful solutions that help them harness the power of all their data, regardless of volume.

If you’re already a RapidMiner user, download the extension from the Marketplace. You’re also welcome to start a discussion on our Community to share your experience or ask any questions about how to use it.

If you’re not a RapidMiner user yet, start a free 30-day trial now. This extension is just one example of the ways RapidMiner can streamline your work and integrate with your current enterprise analytics landscape.


David Arnu


David Arnu is Lead Data Scientist at RapidMiner. He studied Computer Science at the University of Dortmund (TU Dortmund) with a focus on statistics and machine learning. In the research department, he works on projects about predictive analytics and big data and the application of machine learning in the industry.