How to manage your data connections, speed up deployment and improve collaboration

If you’re reading this, you’ve probably experienced some of the pain associated with managing data source connections. If you’ve spent a good bit of time replacing connections while moving a process to production, struggled with collaboration within your team, or have simply found the current feature set too rigid, we have good news for you.

In the 9.3 release we introduced a new way of managing data connectivity. It allows you to easily and securely share connections via Server, move processes from one Server to another, and manage your organization’s connections at scale. This post will show why and how you should use this new feature.

Benefits of the new architecture

Share connections securely

Collaboration with your colleagues can be quite tricky, especially when you’re kicking off a project or when a new member joins your team. You want to make sure that they have access to all the necessary data sources and become productive as soon as possible. Until now, this required distributing connection configurations manually and/or using global credentials, which made user management tedious.

As of the 9.3 release, our recommendation is to use Server for distributing and managing data connections. Setting up the necessary access rights and the new Vault service gives you all the tools you need to share connections in a secure and scalable manner.

Deploy processes with ease

Many users of RapidMiner Server prefer to keep data sources and their access rights separate for their development and production environments. This practice has great advantages in making production more stable when committing changes. Unfortunately, doing this with the old connection architecture could be an error-prone task, due to the required manual effort to find and replace every connection in the deployed process.

The new architecture significantly speeds up this process and makes it more robust. If you reference a connection with a semi-absolute path (e.g. /Connections/data warehouse), you can copy and paste a process from one Server to another and it will work automatically, without human intervention. There is no need to manually check every operator: Studio follows the path to open the appropriate connection for accessing data.
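To make the idea concrete, here is a minimal conceptual sketch in Python. It is not RapidMiner's API: the repository dictionaries, the host names and the resolve_connection helper are all hypothetical, and only the path /Connections/data warehouse comes from the example above. It shows why a process that stores just the path can run unchanged on two Servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str
    host: str


PATH = "/Connections/data warehouse"

# Hypothetical repositories of two Servers. Both store a connection at the
# same path, but each one points to a different database host.
DEV_REPOSITORY = {PATH: Connection("data warehouse", "dev-db.example.com")}
PROD_REPOSITORY = {PATH: Connection("data warehouse", "prod-db.example.com")}


def resolve_connection(repository, path):
    """Look up a connection by its path in the repository of the current Server."""
    try:
        return repository[path]
    except KeyError:
        raise LookupError(f"No connection stored at {path!r}") from None


# The process only stores the path, never the host or the credentials.
process = {"operator": "Read Database", "connection": PATH}

dev_connection = resolve_connection(DEV_REPOSITORY, process["connection"])
prod_connection = resolve_connection(PROD_REPOSITORY, process["connection"])
```

Because the lookup happens on whichever Server runs the process, the same process definition ends up on dev-db.example.com in development and prod-db.example.com in production.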

How to use the new architecture

Create a connection on Server

What makes the new solution versatile is that connections are now part of RapidMiner’s repository system. Permissions to view, edit and execute can be granted or revoked just as with other repository items, and connections can be dragged onto the process canvas. To create one, click “Create Connection” in the top menu bar or in the right-click menu of your local or Server repository.

Enter the necessary information such as connection type (Database, S3, Azure etc.), name and location of the connection, and add a description or tags for better management.

Selecting the database type will pre-fill the generic properties of the connection, so the only things you need to enter are the user credentials, host and database. At this point you are ready to use the database connection in a process on Server or in Studio. Note, however, that the newly created connection will be available to every Server user with all the parameters you’ve just set.

Securely store user-level values

In cases where access rights to data are controlled with personal credentials, storing values in the connection configuration can be a security issue. Instead of entering your credentials, set them as injected parameters and mark Server as the intended source of these values. This setting determines which sources Studio will contact to retrieve the information needed to initialize the connection. After the setup is saved, the Vault service in RapidMiner Server needs the user values. In this example we used Username and Password as the parameters that need secure storage, but any other parameter could be marked and set from the Vault.

RapidMiner Vault: A recently introduced RapidMiner Server service that stores values which can only be accessed by the user who set them. Each Server has its own Vault service; inheritance or copying across Servers is not yet possible.

As a last step, visit the Server page and find the connection by name in Repository / Connections. You can easily spot configurations with missing values, as they are marked with a warning sign. Each item in the list uniquely represents a connection in the related Remote Repository in Studio. “Show details” will open the configuration. Add the necessary details with “Set injected values”.

Saving these values completes the creation process. With this configuration, anyone else who wants to use the connection will have to visit Server and add their own personal values.
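The interplay between injected parameters and per-user Vault values can be sketched as follows. This is a conceptual model only, not the actual Vault implementation: the INJECTED marker, the dictionary layout, the user names and the initialize helper are assumptions made for illustration.

```python
PATH = "/Connections/data warehouse"

# Marker for parameters that are not stored in the connection itself
# (hypothetical; stands in for "injected parameter, source: Server").
INJECTED = object()

# Shared connection configuration, visible to every user with access rights.
connection_config = {
    "host": "db.example.com",
    "database": "sales",
    "username": INJECTED,
    "password": INJECTED,
}

# Hypothetical Vault of one Server: values are kept per user and per connection.
vault = {
    "alice": {PATH: {"username": "alice_db", "password": "s3cret"}},
}


def missing_values(config, user_values):
    """Names of injected parameters the user has not set yet (the 'warning sign' case)."""
    return sorted(k for k, v in config.items() if v is INJECTED and k not in user_values)


def initialize(config, vault, user, path):
    """Merge the shared configuration with the user's own values from the Vault."""
    user_values = vault.get(user, {}).get(path, {})
    missing = missing_values(config, user_values)
    if missing:
        raise PermissionError(f"{user} must set injected values on Server: {missing}")
    return {k: (user_values[k] if v is INJECTED else v) for k, v in config.items()}


alice_connection = initialize(connection_config, vault, "alice", PATH)
```

In this model, alice gets a fully initialized connection, while a second user such as bob would hit the PermissionError until he sets his own values on Server. The shared configuration never contains anyone's credentials.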

Move processes freely between environments (Servers)

An additional benefit of using the Vault service is that it improves process deployment. As the Vault itself is unique per Server, connections can receive different values depending on the environment. For example, you can use the credentials of a user with write access when working on the process on the development Server, but read-only credentials in production. This is easy to achieve with two Servers that use the same relative paths and the same connection name in the process. Copying and pasting the process is the only action needed to move it into a different environment and run it there.
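The development versus production scenario can be illustrated with a short sketch for a single user. Again, this is a simplified model rather than RapidMiner code: the server names, the database user names and the run_process helper are hypothetical.

```python
PATH = "/Connections/data warehouse"

# Each Server has its own Vault, so the same connection path receives
# different injected values depending on the environment (hypothetical data).
servers = {
    "development": {"vault": {PATH: {"username": "etl_writer", "access": "write"}}},
    "production": {"vault": {PATH: {"username": "report_reader", "access": "read-only"}}},
}


def run_process(server_name, connection_path):
    """Run the same process on a given Server, using that Server's Vault values."""
    values = servers[server_name]["vault"][connection_path]
    return f"{server_name}: connected as {values['username']} ({values['access']})"


dev_run = run_process("development", PATH)
prod_run = run_process("production", PATH)
```

The process itself is identical in both calls. Only the Vault of the Server it runs on decides which credentials, and therefore which access rights, are used.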

Summary

To sum up, the new way of managing data connectivity will allow you to create a connection securely via Server, and to speed up RapidMiner process deployment. We encourage all of our users to start managing their connections in this newly introduced way. It will make you and your team more productive and create a more organized repository with clear dependencies between processes and connections. For more information, take a look at our ‘Connect to your Data’ documentation page.

Upcoming releases will include further improvements to the feature, including a semi-automatic migration tool that helps replace old connections with new ones, even inside processes. Until then, you can still use, create and edit legacy connections.