Data science teams do great work, but it is all for naught if the models they create cannot be operationalized in a quick and friction-free manner. Models are often built using one technology, but must be translated to another technology to embed in applications and business intelligence platforms. It not only takes too long, but accuracy is often lost in translation. Enterprises have an insatiable appetite for predictive analytics; data science teams need to deliver faster results.
Watch RapidMiner and Forrester Research discuss how organizations can operationalize predictive analytics.
Hello, everyone, and thank you for joining us for today’s webinar, Operationalized Predictive Analytics: How data science teams can close the insight-to-action gap. I’m Hayley Matusow with RapidMiner and I’ll be your moderator for today’s session. I’m joined today by Mike Gualtieri, principal analyst at Forrester, serving application development and delivery professionals. Welcome, Mike.
Mike’s research focuses on software technology, platforms, and practices that enable technology professionals to deliver prescient digital experiences and breakthrough operational efficiency. His key technology and platform coverage areas are big data and IoT strategy, Hadoop, Spark, predictive analytics, streaming analytics, prescriptive analytics, machine learning, data science, and emerging technologies that make software faster and smarter. Mike is also a leading expert on the intersection of business strategy, architecture, design, and creative collaboration. Mike has more than 25 years of experience in the industry helping firms design and develop mission-critical applications in e-commerce, insurance, banking, manufacturing, health care, and scientific research for organizations including NASA, eBay, Bank of America, Liberty Mutual, Nielsen, EMC, and others. He has written thousands of lines of code, managed development teams, and consulted with dozens of technology firms on product, marketing, and R&D strategy. Mike earned a BS in computer science and management from Worcester Polytechnic Institute. While a student, Mike was awarded three US patents for inventing an expert system used to train air traffic controllers around the world. We’re also joined today by one of our own product experts, Lars Bauerle. Welcome, Lars.
Lars is the Chief Product Officer here at RapidMiner. He is a strategic and innovative product leader with 20 years of product and operational experience in the enterprise software, analytics, and business intelligence markets. Mike and Lars will get started in just one minute, but first a few notes for our audience. First, today’s webinar is being recorded and you will receive a link to the on-demand version via email within the next one to two business days. You’re free to share that link with colleagues who were not able to join today. Second, if you have any trouble with audio or video, please send a note in the form of a question in the Q&A box and someone on our technical team will respond to you. Finally, we’ll have a Q&A session at the end of today’s presentation. Please feel free to ask questions at any time via the questions panel on your screen. We’ll leave time at the end to get to everyone’s questions. Okay, that’s enough from me. Now I’ll turn it over to Mike.
Thanks, Hayley, and welcome everyone. My name is Mike Gualtieri, principal analyst at Forrester, and I want to talk to you today about advanced analytics and operationalizing advanced analytics. So the first thing I want to talk about is priority, and based upon our data, we found that 85% of enterprises are planning, implementing, or expanding the use of advanced analytics. And there’s a simple reason for that. Advanced analytics results in more knowledge, and that knowledge comes in the form of insights and historical analytics, or it comes in the form of predictive models. And to get that knowledge you need analytics. And our perspective at Forrester is that there are four key broad types of analytics that all companies need. The first one is descriptive analytics, and that’s really your traditional BI, your reports, your dashboards on your historical information. And most firms have invested collectively billions of dollars in descriptive analytics. If you look to the right, there are predictive, streaming, which is real-time analytics, and prescriptive analytics. And these constitute the advanced analytics. And what’s interesting is the momentum that the use of advanced analytics has in enterprises. You can see from this comparison between 2014 and 2015, for example, look at predictive, look at the change in companies that report using predictive, location, and streaming analytics, the advanced analytics. So there’s an enormous amount of momentum because there’s a lot of value in those analytics. The good news about this chart is that it’s still less than 50% reporting using it, right. So for companies that aren’t using these advanced analytics, there’s still plenty of opportunity to do it before their competitors use it.
Now, advanced analytics, I mean, analytics is a bit of a misnomer, right. Because analytics is about information and insight that human decision-makers can use, but it’s also about applications. And that’s really what we’re talking about here. We’re talking about how can we take those insights and those models and inject them into applications so they can make those applications smarter. Enterprises have access to plenty of data. We’ve done a recent survey showing that organizations have plenty of data but that they only analyze less than 20% of that data. They only use 20% of that data. But in order to make their applications smarter, more personalized, they need to inject those analytics in those applications. So most application-development organizations, design organizations, they don’t think of, “Oh, how can I use analytics for models in my application?” They really should, right. And so, what data scientists need to do is they need to find a way of injecting and making those applications much smarter. And there’s dozens and dozens of examples of how companies are injecting analytics. One is trading for financial services firms. What sort of models can they do to look at market data, to look at social media data, to find out what sort of equities that they could trade? Trading on good news, trading on bad news. The whole world of IoT, all of that sensor data, how can we analyze that data, create models in advanced analytics to determine if something’s wrong, something’s right, what’s happening based upon those sensors in those devices? There’s also location analytics that can be used in conjunction with predictive analytics to make recommendations. What if in real time these shoppers enter a mall. Do you make them an offer? Maybe you don’t make them an offer because their behavioral analytic shows that they are going to buy anyway, so save that 10% to your bottom line.
So there’s plenty of ways to use analytics and embed them in applications, but you can’t do it unless you can efficiently operationalize those analytics. So the good news is that data scientists know how to create these advanced analytics and know how to create models. And they use a combination of statistical and machine-learning algorithms in conjunction with predictive analytics tools to find those patterns, to find those models. But if they’re finding those models and patterns at their desk, again, that’s not doing any good in the operational business. And data scientists have an amazing set of skills to be able to find these models that have great business value. But again, they have to be able to operationalize them because they’re being judged not on the model but on the business value that accrues based upon that model. And the reality is that data scientists and organizations in general struggle to make those insights actionable. They struggle to operationalize those insights. And some of the problems with that have to do with creating a model and deploying it and then monitoring that. A model that data scientists create often has to be translated to code that will run within the target application.
So if there’s a beautiful model that needs to run in an ERP system, can that ERP system accommodate the code of the model to do the scoring in real time? If it’s a web application, how is that model going to be run? Many models can be called via a service outside the application, but that will introduce latency. PMML, which is the Predictive Model Markup Language, can limit the methods and algorithms. So PMML is a standard that some people use to deploy models, but that can often limit the methods and the algorithms used to find the most accurate model, right. So there’s a problem with that as well, and the models usually don’t include or deploy monitoring code. There’s no data scientist that’s just going to hand over a model and say, “This model is good forever.” No, I mean, that model, depending upon the application, that model may be good for a day, it may be good for six months. I mean, the point is that once you deploy a model, it has to be monitored, it has to be retrained on a frequent basis, but often when deploying that model, there’s no code with which they can easily monitor that model. So the solution, really, is to streamline this entire data science process, the process of discovering a model, and the process of deploying that model, and monitoring that model.
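To make the monitoring point concrete, here is a minimal sketch of what monitoring code shipped alongside a model might look like. It is an illustration only and is not from the webinar: the `ModelMonitor` class, its window size, and its accuracy threshold are all hypothetical, and it assumes the actual outcome for each prediction becomes known at some point after scoring.

```python
from collections import deque


class ModelMonitor:
    """Track a deployed model's recent accuracy and flag when to retrain."""

    def __init__(self, window_size=100, min_accuracy=0.8):
        # Hypothetical defaults; real values depend on the application.
        self.outcomes = deque(maxlen=window_size)  # keeps only the newest results
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        # Store True when the prediction matched what actually happened.
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        # Accuracy over the current window, or None before any outcome arrives.
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        acc = self.accuracy()
        return acc is not None and acc < self.min_accuracy


monitor = ModelMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_retraining())
```

A sliding window is used here so that a model that was accurate six months ago but has since degraded is judged on its recent behavior rather than its lifetime average.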
And so there’s a few requirements, general requirements, that data scientists and the organization at large have to meet. Those models, number one, have to be able to scale to handle high-volume applications and streaming data, okay. The key there is that this model isn’t being used to create a report. This is an operational model, which means it’s going in an application, and that application, for example, if it’s an e-commerce application, has to scale to tens, hundreds, millions of users, right. So however that model is deployed, it has a requirement to be able to handle and scale and score at high volume. It also has to be able to access all of the data that’s needed at time of scoring. So you think of the model as taking inputs and then doing a score or doing some sort of an output. But what about those inputs, right? It can be very, very difficult to extract those inputs in a flash from the underlying systems and the underlying applications where they originate. So the models can’t just be, “Here’s the pure model,” but the model also has to include methods and code for being able to access the variables that it needs at the time of scoring.
Operationalizing models means getting those models into the applications where they’re most valuable. So it has to work seamlessly within complex heterogeneous enterprise environments, okay? This isn’t sort of an Internet startup with hundreds of programmers that can just sort of make anything happen in any period of time. This is an enterprise. This is an enterprise that has built up a portfolio of hundreds, and sometimes thousands of applications, for example, in a bank. And how are you going to get that model into all of those heterogeneous applications? So you can’t just have one deployment model and force it in. It has to work seamlessly with all of the applications.
And then finally, you have to be able to embed those models in IoT applications, mobile applications, web applications, and as I’ve been talking about, enterprise applications. And when you think about the different technical challenges, these aren’t technical challenges a data scientist should be tasked with or even an application development group, right, because that creates friction from the time that the model’s created until the time that it gets deployed in the application, all right. So there’s just the pure level of effort that has to go into doing this, but there’s also the time to market. Models may have a very short lifespan before they have to be retrained. So the time from model to deployment has to be streamlined. It has to be as short as possible.
So companies are making very, very big investments in data science. You can see from that data slide that there’s a lot of adoption and momentum in advanced analytics. But as companies make that investment they should also think about, “Okay, I’ve got a brilliant model. How am I actually going to use it to make a difference in my business?” So operationalizing advanced analytics is a critical piece of this. And once companies can do that, they’ll be able to have a complete set of analytics that can inform applications and make a measurable impact on the business across descriptive, predictive, streaming, and prescriptive analytics. So again, I’m Mike Gualtieri. Thank you for your time. I am now going to hand this over to Lars, who’s going to go a little bit deeper on the mechanics of how you operationalize advanced analytics. Lars.
Thank you, Mike. And yeah, hi, everybody. What I then will talk about here is how we at RapidMiner provide a predictive analytics platform that has the features out here that Michael described. We make it quite easy for people to build predictive models from these type of projects, and then also operationalize those models into the business to maximize your outcome or the value. And we do this through a platform that can access lots of different data sources, provides a rich layer of data preparation tools and capabilities, the ability to model them, and even more so validate and really figure out what your models can do, and then take it to the operationalization or the deployment of these models into business systems in a variety of ways which I will describe a bit more.
And our product here on the platform provides some very unique capabilities. On one hand, it’s very easy to use, and I’ll give you a little demonstration here to get you introduced to the product. It’s very effortless to go ahead and build up some of these models. It’s also very powerful and has a lot of functionality, making it very fast to develop predictive models and manage that lifecycle. And we’ll dive pretty deep into the operationalization here. It’s quite straightforward, has a lot of nice features to do those things, everything from deploying the models, scheduling things, managing the models over time as Michael described. And lastly, the product is founded on an open source core and it comes with a great community of users. There’s this marketplace as well where people build upon and contribute and add additional features and functionality that’s available. It provides a very rich platform with new innovative capabilities around big data and new machine learning, etcetera. Next, we’ll do a demonstration here of RapidMiner. Let me switch over and jump right into the RapidMiner studio product.
Here what we’ll do now is actually analyze some data related to the Titanic accident, which hopefully many of you are familiar with. What I will do is just go out and add that data to RapidMiner first, and as I’m doing that, the product immediately looks at the information, starts to assign data types that I can also as a user make any changes to that to see and make sure I load the data the way I want it to be loaded. I can make any of these changes afterwards as well, but in this first step it’s a pretty easy part and a good point to do some of that. So let’s now add that data to RapidMiner here, and we immediately can see that it contains a number of records of passengers. They were in different classes. We got their names, their age, we know whether they entered a lifeboat or not. So a lot of good information that can help us determine here what made or what characteristics of these passengers made them survive. So that’s what we will build a model around.
What I’m doing here is taking a look at some of the statistical information, and it’s a view that shows me per attribute if there are any missing values in the data, which there are here around age, for example. Also, we can see further down that the cabin number is missing for a lot of the passengers. And the lifeboat attribute here also indicates that there are 800 records that are missing a value. And in fact, those are the 800 people that did not make it onto a lifeboat. And we can see here in total too, there are 1,309 rows of data, or passengers. So what we will start to do now is use RapidMiner to clean up this data some and then apply a predictive model. In this case, we’ll use a decision tree to see what characteristics identify people that survived and those that didn’t. So we’ll jump into actually building a process here for doing that. So we’ll take our dataset, and what we wanted to do was to first clean up the data.
What RapidMiner does is it provides a lot of help for users in learning the product. For one example, it provides a lot of good getting started information where you can learn the basic mechanics of the product, how to import data, build processes, and so on. And then, it has a great set of tutorials here that will allow you to learn more about the product and how to do data cleansing and preparation, and then applying models, etcetera. Just good to know. What we’ll do now, though, is start to build up this process. And as we saw, the data here again had a number of problems, so we’ll go and address that. First, we’ll actually go and exclude a few of these attributes. We don’t really need them because they’re missing a lot of values. We’ll go in and add or replace some missing values for age here, and then we’ll have a pretty good dataset to work with.
As I’m starting to do those types of operations, at the bottom of the screen here you can see something called recommended operators. This is a part of the system that can recommend to you as a user what might be a suitable operation to take based on what you’ve done so far. And these recommendations are based on the common usage of lots of users. We call it the wisdom of the crowds. We do in fact, if you opt in, collect some data on your usage, it’s anonymized, and then we apply machine learning to that to recommend what are good next steps. Well, in this case, let’s start out here with Select Attributes. So basically, what we’re going to do is define which of the attributes we want to use in our model development. In this case then, we’ll exclude the ticket number. It’s kind of an ID which is similar to the name. We also saw before that cabin was missing a lot of values and that lifeboat was as well. And in fact, the lifeboat is a pretty strong indication of whether you survived or not. So we’ll exclude that column. Okay. Now the next thing we wanted to do was to replace some of the missing age values there, or all of them in fact. And what you can do in RapidMiner here is search for a number of types of operators, and we have lots of them that can perform many good operations on the data. And in this case, we’ll take a Replace Missing Values one. And here, what we will do is for age, we will decide to replace all the missing values by the average age in the dataset. You could choose other things, and in fact, you could also create a model that might predict even better what the right age of the particular passenger might be. But in this case, we’ll use just the average.
Okay, well, let’s run this process now. We have excluded a few attributes. We’ve reduced those. You can now see too that for age here we’ve replaced all the missing values. And what we have left then are a couple of attributes which have one and two missing values respectively. And what we will do with those is actually simply filter out those rows. And again, we can look at our recommenders here, our recommended operations, and the filter shows up here so we’ll use it. And here, we’ll in fact filter out any row that’s missing an attribute value. Very, very simple. Last couple of things here, I will also do something called Set Role, which is an operation that identifies which column, in this case, we want to predict. And in this case it will be whether you survived or not. And we’ll call that a label. That means we are now using that as the predicted column. In fact, we’re going to build the model around the attributes of the other pieces of data and then see who survived or not.
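The preparation steps just described (select attributes, replace missing ages with the average, filter out rows with missing values, set the label) can be sketched in plain Python. This is an illustrative analogue only, not RapidMiner's API: the function names are hypothetical, and the four sample rows are made up to stand in for the Titanic data, with `None` marking a missing value.

```python
from statistics import mean

# Made-up sample rows standing in for the Titanic data; None marks a missing value.
passengers = [
    {"sex": "female", "age": 29.0, "fare": 211.3, "embarked": "S",
     "ticket": "T1", "cabin": "B5", "lifeboat": "2", "survived": "yes"},
    {"sex": "male", "age": None, "fare": 7.9, "embarked": "S",
     "ticket": "T2", "cabin": None, "lifeboat": None, "survived": "no"},
    {"sex": "male", "age": 41.0, "fare": None, "embarked": "Q",
     "ticket": "T3", "cabin": None, "lifeboat": None, "survived": "no"},
    {"sex": "female", "age": 35.0, "fare": 26.0, "embarked": None,
     "ticket": "T4", "cabin": None, "lifeboat": "7", "survived": "yes"},
]

EXCLUDED = {"ticket", "cabin", "lifeboat"}  # the attributes dropped in the demo
LABEL = "survived"                          # the column the model should predict


def select_attributes(rows, excluded):
    """Drop the excluded attributes from every row."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


def replace_missing_with_average(rows, attribute):
    """Fill missing values of one numeric attribute with the average of the known ones."""
    average = mean(row[attribute] for row in rows if row[attribute] is not None)
    return [
        {**row, attribute: average if row[attribute] is None else row[attribute]}
        for row in rows
    ]


def filter_missing(rows):
    """Keep only rows in which every remaining attribute has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def split_label(rows, label):
    """Separate the label column from the attributes used for prediction."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


selected = select_attributes(passengers, EXCLUDED)
imputed = replace_missing_with_average(selected, "age")
prepared = filter_missing(imputed)
features, labels = split_label(prepared, LABEL)
print(len(prepared), labels)
```

The order matters: the lifeboat and cabin columns are removed before filtering, otherwise the filter step would discard most rows because of attributes that were never going to be used.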
So we have done that, and then, yeah, let’s find and apply a decision tree here. And we’ll use that as our algorithm to develop the model. Okay. Well, let’s run this. And what we get then is a decision tree that depicts the factors here of how or why people survived or did not survive. We can see that gender is the key factor. If you were a male, a lot of the males did perish and only a few survived, and here we can see the passenger fare: if they’d paid over $387 they would have survived. Now, for the females the story is a bit different. Here in fact, the number of siblings or spouses that you had with you on board determines whether you survived or not. And if you had a large family, basically, a lot of those women did not survive. Now, we can speculate as to why that happened, but it could have been because they were trying to get the whole family together before they approached some of the lifeboats, for example. And in any case you can see here how it’s pretty easy with RapidMiner to get your data into the system, apply a number of data cleansing and preparation operations, and then a predictive model. You saw too that we have some nice recommendations at the bottom that make it easier to find the right operator to use, as well as some of this early training material. So a very quick way, an easy way to get started with this.
What we want to do now, though, is to talk a little bit about how we take a model like the one we just created and how we can deploy it to the rest of your business. We’re in fact using more of the platform to operationalize this model. What we will do then is get into this particular area of the product. We talked earlier about some of the strengths of it, and operationalizing your models is one. Here we are going to dive in a bit more into the strength of it around the scheduling and event-driven model execution, and we’ll talk about how easy it is to embed these models into other systems, whether they’re data visualizations or BI tools, enterprise applications, or even business processes. And then, we’ll also talk about how the system can do some self-learning or model management to dynamically and continuously update the model based on how the business is changing.
So when you operationalize things, you can do it at different levels of integration or automation, which is sort of on the X axis here, and also time plays in, like how frequently or how fast do you need to use this model. So take, for example, a churn scoring model. So you want to see if customers are potential churners. And now you could run that once a day and update your customer list based on the activities that they have engaged in during that day or that week. That, of course, will give you some ability to take actions. In some industries that might be just fine. But the next step would be to do it based on a trigger or an event. So let’s say instead of doing it at the end of the day or the end of the week, you actually re-score customers when they have done something with you. Maybe they went to your website and downloaded a white paper, and now you could look at that and re-score that customer. Maybe they are then becoming more of an interesting prospect to upsell on a particular new capability or a new offering that you have related to the information they were interested in. And then lastly, you would have systems which really react immediately on event data as it’s happening. And Mike gave a good example of customers entering a mall, or right when you do the purchase at the point of sale you could provide a recommendation for the customer to maybe buy something extra. Now, that requires even higher speed and a quicker response. And the highest speed of all, I would say, today in the industry is, let’s say, algorithmic trading. And in this case, you also probably need a lot more IT infrastructure and even some of these complex event processing systems, etcetera, in order to make sure that this works really, really fast. So it’s a fairly broad spectrum here on the timescale, but all of this classifies as various ways of deploying and operationalizing your system.
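The difference between scheduled and trigger-based scoring can be sketched as follows. Everything here is a hypothetical illustration: `churn_score` is a made-up stand-in for a trained model (integer points from 0 to 100), and the customer fields and weights are invented for the example rather than taken from any real system.

```python
def churn_score(customer):
    """Hypothetical stand-in for a trained churn model: 0 (safe) to 100 (likely to churn)."""
    score = 50
    score += 10 * customer["support_tickets"]       # more complaints, more risk
    score -= 20 * customer["whitepaper_downloads"]  # more engagement, less risk
    return max(0, min(100, score))


customers = {
    "c1": {"support_tickets": 3, "whitepaper_downloads": 0},
    "c2": {"support_tickets": 0, "whitepaper_downloads": 1},
}
scores = {}


def run_scheduled_batch():
    """Scheduled mode: re-score every customer in one pass, e.g. once a day."""
    for customer_id, customer in customers.items():
        scores[customer_id] = churn_score(customer)


def on_event(customer_id, event):
    """Triggered mode: record one event and re-score only the affected customer."""
    customer = customers[customer_id]
    customer[event] = customer[event] + 1
    scores[customer_id] = churn_score(customer)
    return scores[customer_id]


run_scheduled_batch()
batch_scores = dict(scores)                           # snapshot after the nightly run
updated = on_event("c1", "whitepaper_downloads")      # customer downloads a white paper
print(batch_scores, updated)
```

In the scheduled mode the new white paper download would only be reflected after the next batch run; in the triggered mode the score changes as soon as the event arrives, at the cost of the scoring code having to be reachable from wherever events are produced.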
So let’s walk through then how we would do this in RapidMiner. So I briefly gave you a demonstration of the first phase here, which, if you keep clicking, is that we can access data, prepare it, we develop the model, and now we come to the deployment option. Once we have a model in one of these processes that I had created there, we can push that out to the RapidMiner Server. On the server, we have functionality then to integrate it with other systems; there are web services there. There are ways to integrate directly with applications, which would allow us to tie that model into those systems. As we get new data, the model can score it, and then through these interfaces or integrations, we can then communicate with other applications. So please push the button again. Here are a variety of those. For example, RapidMiner has its own web application environment where you could build applications that users can use in order to use the models. You can integrate with data [inaudible] like QlikView or Tableau. You can at the API level also integrate with the server job applications and mobile devices. And we also have out-of-the-box connectivity to different systems like Salesforce.com, for example.
But let’s take a further look into that. So please click to the next slide. Here I am then looking at some of the broader functionality of the server. In particular, we should look at some of the mechanisms that allow you to monitor the system, manage users and who has access rights, etcetera. We also have, and we’ll dive even deeper into, the model management capability, but also collaboration, for example, where different users could build models. We can share them centrally. Other people in the organization can pick up from there and do some of the integration work, let’s say, with some of the systems on the right. What’s also critical here is the ability to do bi-directional integration with other systems. For example, take a data visualization tool like QlikView. Here you might have a dashboard or an application that shows sales data, but you want to provide some forecasting as part of that. Well, that application could then pass some of its data over to RapidMiner where we’ve built a forecasting model, it can then run through and deliver back to the QlikView environment a set of data that it can then display, showing what some of the forecasts look like. And you could do that dynamically because maybe people are wanting to sub-segment their data and run it on that particular dataset. So it allows for bi-directional communication here. And again, it can be done with a number of different systems and be very tightly integrated. The key mechanism here is that RapidMiner can very easily turn any of the processes built here and any of the models into an exposed web service, which makes it quite straightforward to integrate with pretty much any system.
Okay. Let’s take a look at the model management then next. So here, basically, we’re showing how once we’ve integrated a couple of systems or a system with RapidMiner, as it runs then you will obviously create new data, new things are happening. We can feed that back into the system and start to use that data from the modeling processes again, evaluate the new models versus the old ones, and decide then if we want to deploy maybe an updated version back into the system and use that instead. So RapidMiner facilitates that kind of a workflow, and it can be fully automated or you might want to put in some triggers here to indicate if a new model might be a little bit better than the old one, and then decide whether you want to deploy that or not. But you can do a wide range of automation here.
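The evaluate-and-decide loop described here is often called a champion/challenger comparison. A minimal sketch follows, under stated assumptions: the two rule-based "models", the `min_gain` threshold, and the sample data are all hypothetical, and accuracy on newly collected data is used as the only criterion, whereas a real setup would likely track several performance measures.

```python
def accuracy(model, rows):
    """Share of rows for which the model's prediction matches the known label."""
    correct = sum(1 for features, label in rows if model(features) == label)
    return correct / len(rows)


def choose_model(champion, challenger, new_data, min_gain=0.05):
    """Keep the deployed model unless the challenger is better by at least min_gain."""
    champion_acc = accuracy(champion, new_data)
    challenger_acc = accuracy(challenger, new_data)
    if challenger_acc - champion_acc >= min_gain:
        return challenger, challenger_acc
    return champion, champion_acc


# Two toy rule-based "models" predicting churn from monthly usage (hypothetical).
def old_model(features):
    return features["usage"] < 10


def new_model(features):
    return features["usage"] < 20


# Newly collected data fed back from the running system: (features, churned?).
new_data = [
    ({"usage": 5}, True),
    ({"usage": 15}, True),
    ({"usage": 25}, False),
    ({"usage": 12}, True),
]

selected, selected_acc = choose_model(old_model, new_model, new_data)
print(selected.__name__, selected_acc)
```

The `min_gain` margin plays the role of the trigger mentioned above: a challenger that is only marginally better is not deployed automatically, which avoids swapping models on noise.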
Well, let me now step through a few other examples here and what this could look like. In this particular case, we’ll look at some data that comes from the government. It’s information about the auto accidents and how they have resulted in various– again, whether the accident was severe or not, what cars used, lots of attributes around the conditions of that situation. And with RapidMiner then you can develop a number of pieces here to take such information and develop an operational application. Here are a few screenshots then of getting the data in, preparing it, and in fact, doing some pretty advanced data preparation where in many datasets sometimes you find the data coded. You got to look up other tables of information to decode it or put it into some language that humans can understand. And this is an example of that, right. It can be some pretty advanced data preparation within the product here. And once you’ve done that then, we also apply and do some modeling, also some validation here, making sure the performance of the models are as good as we hope they are. And then, once we have this, we go into this phase of operationalizing things.
In this case, we’re building a prescriptive application, basically, an application where I can enter a number of constraints. And we will then apply the predictive model as well as optimize around those constraints to find a car that is optimal for my type of driving and has the lowest accident rate. So we built all this up and we can push this out into a web-based application through our web services. So all those processes that I built there, I can pick any of them and then start to expose them here as a web service. And everything done here is through a graphical interface, right. It’s now ready to be used and consumed by some other application.
In our case here then, we built an application with the RapidMiner web application and tool kit, and this application now allows me to enter a number of parameters here that will be then sent back to the RapidMiner model and optimization routines and find an optimal vehicle for my particular driving needs. So if I go here and, for example, set this to be for a young person who likes to drive very fast, might live out in the Seattle area where it’s rainy and wet, well, what kind of vehicle would reduce my risk of having a fatal accident? Turns out it’s a Datsun, a compact utility. So that would be the best choice in that case. Maybe instead then I’m a retired ranger out in Arizona driving around in dry or dusty, muddy conditions and things like that. Well, what kind of truck or car would be applicable in that case? Again, the whole RapidMiner models and optimizations then that we showed in the earlier screens will go through and find the best car in that case. So a pretty simple example, but hopefully, illustrative on what a prescriptive application can look like and how you could operationalize that. You could easily replace this with, let’s say, a loan application or application for a mobile phone and things of that nature. All the same, based on a number of models then and optimization and a set of constraints, we can find the best choice. And of course, you can fully take the step here and maybe integrate it with the purchase action in this case or the approval of a loan or the issuing of a particular service.
And then, let me quickly touch a little bit on the model management that’s like– as soon as we have put this stuff in place and we’re running it, we want to monitor the performance of these models. We also want to be able to run them on the new data, and in fact, run through that old process we built before, and maybe comparing different models and see which ones are best. And you can set all that up. You can also define certain criteria in which you want to be alerted, maybe of certain performance attributes of the models that you are particularly keen about “If these are improving or changing, please notify me, and then I can make a decision on whether to deploy that new model or not.”
So hopefully, that gave you a quick overview of the capabilities of the RapidMiner platform, so everything from accessing data, preparing it, building models, and then deploying and operationalizing those models, including the different modes here of scheduled, triggered, or event-driven, and then also, being able to really tie everything back together again, allowing you to do some model management or continuously monitor the performance of the models and test them on new data, see if you can find even better models and deploy those. And this can, of course, be applied to many different types of examples. I mentioned loan applications, but you can even optimize around loan rates, insurance premiums, project bidding, and support call routing, for example. We have customers that use this precise thing to take in a lot of email-based support cases, do some text analytics on those, and make sure they get routed to the right teams to put the right type of expert in touch with the customer as soon as possible. Preventive maintenance is another use case where again you can really go quite far and not only figure out when something is in need of maintenance, but more interestingly, really optimize around the scheduling of the maintenance, because it’s not just one machine that might need maintenance, right? You will have a whole park of them. When should you replace which parts or which pieces of equipment, and how do you schedule that in an optimal way? So all of these things are examples and all possible with the RapidMiner platform. So thanks. With that then I’ll hand back to Hayley and she’ll take the Q&A from here.
Great. Thanks, Lars. Thank you so much Mike and Lars for a great presentation today. You’ve covered a lot of really great information and we appreciate your time. So now it’s time to get your audience questions, so please, feel free to enter your questions in the questions panel. And it looks like we already have a few questions that have come in, so we’ll address those now. So it looks like I have a question here for Lars. How can I automate it and use it in my big data environment?
Okay. Good question. So with big data, that could be a variety of things. If it’s related to, let’s say, Hadoop, for example, which is a common platform now associated with big data, we work very well with those systems. In fact, everything I showed you here can also be done on top of a Hadoop cluster, so all possible. Also, what I showed here was a lot of processing within the RapidMiner system, but in the case of Hadoop, even leveraging Spark and so on, you can also push these types of processes or models into those systems and let them execute down in there instead of inside the RapidMiner server.
Great, thanks, Lars. I have a question here for Mike. How do you recommend that organizations address the data scientist shortage?
So in one word, productivity, right. If you look at the tasks of the data scientist, which are to understand the data, acquire that data, prepare an analytical dataset, and then iterate through a modeling process, testing algorithms, creating features, and then operationalizing that model, that’s the same process that’s been used for quite some time, right. So productivity is the key. And we’ve seen a trend where lots of data scientists are coding as well, like using open source R and maybe Python. And these are programming languages, right. So to some extent, you can do certain things in a custom fashion, but it’s also a rather inefficient way to simply apply a data operation to an analytical dataset, right. So I think it’s all about productivity. It’s looking at each one of those steps and trying to make it more productive. So the data prep stage, the iterative stage of building that model, and of course, what we’ve been talking about in this webinar, which is reducing the friction to operationalize models. You’ve got data scientists getting involved in lots of activities that they shouldn’t be involved in. They should be focused on that modeling. So to the extent that tooling and platforms can help them stay focused on that, you’ll make the existing data scientists you have more productive. So what’s interesting here is, okay, you’ve got a team of six data scientists. Well, what if you could make them 30% more productive? Well, now you don’t have a shortage anymore.
Great. Thanks, Mike. Looks like I have a question here for Lars. How do you work with Qlikview?
Okay. Yeah, I mentioned that as an integration option. We do it in two ways, actually. The simplest way is that you can output data into Qlikview-readable formats. The other is a more bi-directional integration, which is what I explained more around this operationalization. And there, the two products can communicate through their respective APIs, allowing Qlikview to make a call over to RapidMiner and get results back, which are then appended. They have a particular feature there where you can append a column of data, let’s say, to an existing dataset. So that’s the integration that’s used in making that work.
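The call-and-append pattern described here might look roughly like the sketch below: a client sends rows to a scoring service, gets one score per row back, and appends those scores as a new column. Nothing in it is the actual Qlikview or RapidMiner API. The `scoring_service` function is a hypothetical local stand-in for a model exposed as a web service, and the field names, the scoring rule, and the JSON payload shape are assumptions made for illustration.

```python
import json


def scoring_service(request_body: str) -> str:
    """Hypothetical stand-in for a deployed model behind a web service."""
    rows = json.loads(request_body)["rows"]
    # Made-up scoring rule; a real service would apply a trained model here.
    scores = [10 + 2 * row["open_tickets"] for row in rows]
    return json.dumps({"scores": scores})


def append_scores(rows, column="churn_score"):
    """Send rows to the service and return copies with the score column appended."""
    request_body = json.dumps({"rows": rows})
    response = json.loads(scoring_service(request_body))
    scores = response["scores"]
    if len(scores) != len(rows):
        raise ValueError("service must return exactly one score per row")
    # Build new rows so the caller's original table is left unchanged.
    return [{**row, column: score} for row, score in zip(rows, scores)]


# Hypothetical table held by the calling tool.
table = [
    {"customer": "c1", "open_tickets": 0},
    {"customer": "c2", "open_tickets": 5},
]

scored = append_scores(table)
```

Keeping the exchange to "rows in, one score per row out" is what lets the calling tool append the result as a plain column on its existing dataset.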
Great. Thanks, Lars. I have a question here for Mike. What is the future of programming/coding versus more modern drag and drop technology like we saw here with RapidMiner?
I think both have a future, and I think there are parallels to application development, right. So back when computers started with writing programs, it was all about programming languages, right. But if you look in any enterprise environment now, you would never, for example, write a report in Java. Like if you needed a profit and loss statement, you just wouldn’t do that. You’d use a more modern tool that basically abstracts a lot of that functionality. So the way I think about coding versus drag and drop is productivity. Is there a certain set of functions and operators, in fact, are there hundreds that I can literally abstract? Because if I can abstract that, then I can construct analytical and advanced analytical workflows much more quickly. Is there a need for programming? Yes, there is. Sometimes you’re going to need to drop down into coding to do some very custom stuff, for example, algorithm development, or say you’re going to do a particularly gnarly ensemble that requires some coding, or you want to leverage some code that’s out there. So I think there’s a balance between both of these, just as in application development. And the bottom line is, what is the fastest way to build a model? Right. Because this is not a question about coding. It’s not a question about the tooling. It’s ultimately a question about that productivity. How can I build a model? How can I monitor that model? How can I streamline that entire process?
Great. Thanks, Mike. I have a question here for you, Lars. Can you talk about the differences between rapid prototyping, substantiation, and operationalization?
Sure, sure. So the product supports all those phases. The rapid prototyping is sort of what I showed with the client in that demo. You can easily grab data and run it locally on your desktop and do things pretty quickly. Then next, with the substantiation, you might want to run bigger datasets. You might, in fact, not want to load all the data into RapidMiner because of limitations of RAM, or you want to leave it in its source. You can do that in that phase. That’s where you really are testing your models on full datasets, making sure that they are actually scoring or returning at the performance that you expect. And then finally, the deployment phase, or, again, operationalizing the models, is where the server functionality kicks in, the APIs are used, and some of these out-of-the-box integrations. I think there’ve been a few questions here too around what other things we integrate with besides Salesforce. And, yeah, you can do things with SFP, you can write back to all kinds of databases. We leverage something called Zapier, which integrates with lots of different web-based tools. So if you have things in the cloud, for example, it’s very easy to integrate with those. So a pretty rich set there. And I think there was another question here around loading the data into your local repository and so on: yeah, you sort of disconnect from the proprietary data store or from the original data source. And again, you can do it both ways. It’s easier sometimes to load the data and work on it locally like this, but you can also have dynamic connections back to your databases and such through the product. I just didn’t show it in that quick little demo.
Great. Thank you, Lars. So this one looks like it’s for either Mike or Lars. Do you have any best practices of operationalizing using a platform like RapidMiner? What works in guidance for enterprises?
Yes, maybe I can start out a little bit since it’s related to the product platform itself. So in many of these cases, it’s important to have some support from the other application owners and IT and so on, to make sure that you can put these things in place in an effective way. Also, I think what we’ve seen is the importance of making sure the business side is bought into the improvements that are going to be provided here with these models. So a lot of it is not necessarily technical but organizational and project management-oriented, because you’re starting to now move from this experimental or prototyping phase to really deploying things into your business and your production systems. So there are lots of elements around that, which I think are important to consider and really take care of as you do this.
Great. Mike, do you have anything to add to that?
No, that makes sense.
Awesome. Great. Looks like I have another question coming in here. This one looks like it’s for Lars. Can the server be deployed in the technology environment of my company or does it have to run in RapidMiner cloud?
Great question then. So what I showed today and went through today is all deployable on premises. So the system itself is really mainly built for desktop use and deployment on local or enterprise-installed servers. We do have some cloud functionality as well, where you can upload your data and your models and run them in the cloud. This is more of a solution for a smaller setup maybe, or where you are a desktop user and you want to run some jobs in the background or run them without loading your desktop environment, but for what we went through today, the best way to do that is to buy out the whole system here and deploy it on the premises.
Great. Thanks, Lars. So it looks like we still have a couple of questions coming in but we’re at the top of the hour right now, so if we didn’t address any of your questions, we’ll make sure to have someone follow up with you and address your question. So I apologize if we weren’t able to answer that on the line here today. So thanks again to everyone for joining us for today’s presentation and have a great day.