You Need an Embedded Data Science Factory, Not a Research Institute


When talking with people in almost any industry these days, you’ll often hear the concept of a data (science) lab. The idea is to have a kind of research institute on site, a special environment—whether physical or a dedicated IT domain—where data scientists can do their stuff, testing out new things and creating fancy toys.

In this environment, the usual rules don’t apply. Employees aren’t stuck in pesky cubicles all day, but instead get a cool open-floorplan warehouse with a free coffee shop and food trucks that come every day at lunch. Preferably in the Bay Area, Berlin Kreuzberg, or some other hip location. And you’re not forced to use a crummy Windows laptop that runs like a potato, but instead get a powerhouse of a machine running Linux.

Regardless of what the details might look like, you’re free to explore new ideas and develop models that use the latest deep-learning, neural-net tech. This might sound great at first—and having Taco Truck Tuesdays is certainly not a negative. But the freedom that makes this seem like a good idea will come back to haunt you down the road.

Let’s take a look at why what seems so cool is really just a fad that you shouldn’t spend time or money investing in.

The Problem with the Research Institute Model in Data Science

What image comes to mind when you think about research scientists? Eggheads in white coats? The crazy scientist from The Simpsons? Personally, I think of Phil and Lem from the canceled-too-soon Better Off Ted:

Whatever your image is, it probably involves some combination of:

  • A lab, apart from everyone else, full of scientists who don’t work well with others
  • Scientists conducting research on topics without regard for real-world impacts

Don’t get me wrong; there’s a place for scientists toiling away on pure research—in an academic setting. But in an enterprise environment, isolation and research for research’s sake are both big problems.

AI and ML are going to be standard business practices in 10 to 15 years, as standard as using a computer is today. And that means that, in order to be competitive, you need to be using this technology well as soon as possible, not siloing it away from the rest of the business.

To make sure you’re getting the right bang for your buck, it’s critical that you have cooperative teams that industrialize insight.

Two Critical Aspects of Successful Data Science Programs

Let’s look at these two aspects of a successful data science team—cooperation and producing impactful business insights—to explain in more depth why silos can be such a problem. And please note that by “data science” I mean any kind of data analysis, whether it’s using ML tools like RapidMiner or traditional analytics tools like SQL and Excel.

1. Collaboration

If you’ve spent any time on social media, you’ve probably read that domain expertise on the one hand, and the ability to work with domain experts on the other, are the keys for data science success. And this is absolutely correct.

It’s equally true that the notion of an on-site research institute doesn’t support the idea of collaboration. In fact, it’s quite the opposite. By setting aside data scientists as some sort of special wizards who need isolation and fancy amenities to get their work done, you’re only amplifying the already-extant problem of silos.

We recently wrote a whole breakdown of why collaboration is so critical for success in an enterprise environment, if you’d like to dive deeper into this topic.

2. Business impact

There’s no reason to be doing data science in an enterprise if you aren’t producing value for said enterprise. And it’s basically a guarantee that you have the data needed to make a significant impact, whether that means clustering customers, predicting churn, analyzing net-promoter scores, or something else. The final metric for any kind of new data science project, regardless of whether it’s called a lab or an institute or an initiative, should be getting models into production and producing value.

This might seem a bit harsh, but it’s simply not worth an enterprise spending money on hypotheticals. Data science, machine learning, and artificial intelligence are capable of having a strong impact on your bottom line without needing years of internal research and development. How? See Number 1 above.

Again, there’s a place for doing research into data science and artificial intelligence, but that place isn’t inside a business that’s competing in a difficult market and looking to use ML to improve their processes.

So how do you make sure you’re getting both the collaboration and business impact parts of the equation right? Enter the embedded data science factory.

The Solution: An Embedded Data Science Factory

If it’s critical to make sure that both of these aspects are present as part of your data science work, how do you go about doing so? You need to build an embedded data science factory rather than a research institute.

Each of these parts is key to the success of a project like this:

  • Embedded because it needs to be part of the overall organization and not separated and siloed from everything else that’s happening.
  • Data might seem obvious here, but it’s not—your data factory needs to be working with the concrete data that your business is generating and relying on every day, rather than abstract “data” science that might be experimenting with random datasets that they found online.
  • Factory because what you’re building needs to produce something of value for your organization, and not just do science for science’s sake.

So how do you go about setting up a successful data science factory? Let’s take a look.

How to build a data science factory

There are two key pillars for a successful embedded data factory. The first is getting the structure and culture of the company right. This might include things like supporting upskilling for all of your employees (not just the data scientists), setting up a center of excellence to make sure that you’re continually iterating and improving your processes, and getting in the habit of integrating your data factory into your daily business operations.

The second key to success is getting the right tools in place so that everyone can take part, from subject-matter experts to the “cool” data scientists. There are a number of tools that will do this, and we’ve put a lot of thought into how RapidMiner can help anyone of any skill level or domain deliver impact with machine learning. (Check out Building the Perfect AI Team for more on how different people can work together to build a successful AI project.)

The result of getting these two things right is a data factory that will have a significant positive impact on your organization’s bottom line.

Just like a factory might take in iron ore and process it into something more valuable, a data factory can take raw data, which is often dirty and polluted with noise, and, by putting it through the crucible of solid data-science know-how, turn it into something more valuable: namely, actionable insights that let you improve your business.
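To make the metaphor concrete, here’s a toy sketch of that raw-data-to-insight flow in plain Python. Everything in it (the field names, the messy records, the churn metric) is hypothetical and exists only to illustrate the factory idea: noisy records go in one end, and a decision-ready number comes out the other.

```python
# A toy "data factory" pipeline: raw, noisy records in, actionable insight out.
# All field names, records, and metrics here are hypothetical illustrations.

RAW_RECORDS = [
    {"customer": "A-100", "segment": "retail", "monthly_spend": "120.50", "churned": "no"},
    {"customer": "A-101", "segment": "retail", "monthly_spend": "", "churned": "yes"},  # missing value
    {"customer": "B-200", "segment": "wholesale", "monthly_spend": "980.00", "churned": "no"},
    {"customer": "B-201", "segment": "wholesale", "monthly_spend": "1,050.75", "churned": "yes"},  # messy format
    {"customer": "B-202", "segment": "wholesale", "monthly_spend": "875.25", "churned": "yes"},
]

def clean(record):
    """Normalize one raw record; return None if it's unusable."""
    spend = record["monthly_spend"].replace(",", "").strip()
    if not spend:
        return None  # drop records with missing spend rather than guess
    return {
        "segment": record["segment"],
        "monthly_spend": float(spend),
        "churned": record["churned"] == "yes",
    }

def churn_by_segment(records):
    """Aggregate cleaned records into a churn rate per segment."""
    totals, churns = {}, {}
    for r in records:
        totals[r["segment"]] = totals.get(r["segment"], 0) + 1
        churns[r["segment"]] = churns.get(r["segment"], 0) + int(r["churned"])
    return {seg: churns[seg] / totals[seg] for seg in totals}

cleaned = [c for c in (clean(r) for r in RAW_RECORDS) if c is not None]
rates = churn_by_segment(cleaned)
# The factory's "product": a concrete, decision-ready number per segment.
for segment, rate in sorted(rates.items()):
    print(f"{segment}: {rate:.0%} churn")
```

The point isn’t the (trivial) code; it’s that every stage consumes the business’s own data and ends in an output someone can act on, such as aiming a retention campaign at the segment with the highest churn.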

Bringing it All Together

I hope this post has given you a sense of why it’s so important that you don’t buy into the hype around setting up some sort of “research lab” at your organization. Instead, build a factory that’s an integral part of your operations and creates value by using AI and ML to support your critical business decisions on a daily basis.

If you’re curious about how best to evaluate the value that an embedded data factory is providing, take a look at my recent whitepaper, Talking Value: Optimizing Enterprise AI with Profit-Sensitive Scoring, which will walk you through a few basic steps to help you understand how to make good financial decisions based on the output of your machine learning models.

Prove the worth of your machine learning projects in four easy steps

Getting buy-in on machine learning projects is hard, as is ensuring you’re making the right decision based on your model’s predictions. The best way by far to solve these common problems is to understand what your model is saying in terms of cold, hard cash. But how?

This whitepaper will show you how.


Martin Schmitz

Martin Schmitz, PhD is RapidMiner's Head of Data Science Services. Martin studied physics at TU Dortmund University and joined RapidMiner in 2014. During his career as a researcher, Martin was part of the IceCube Neutrino Observatory, located at the geographic South Pole. Using RapidMiner on IceCube data, he studied the most violent phenomena in the universe, such as supermassive black holes and gamma-ray bursts. As part of several interdisciplinary research centers, Martin delved into computer science, mathematics, and statistics, and taught data science and the use of RapidMiner.