Deep learning is certainly all the rage but is it worth all the hype? Will it send your organization rocketing past competitors in an AI-fueled space jet? Will it take over all our jobs?
Probably not. But while it may not be the solution to world hunger, it’s still pretty cool and definitely a powerful tool to have in your data science toolkit.
Watch this 45-minute webinar to learn more about deep learning including:
- What is deep learning and the important advancements that have transformed what is possible
- What are the most promising applications of deep learning – where can it help, where won’t it help
- A demonstration of how you can use RapidMiner Studio to do deep learning
Dr. Ingo Mierswa. Ingo will get started in just a few minutes. But first, a few quick housekeeping items for those of you in attendance today. Firstly, today’s webinar is being recorded, and you’ll receive a link to the on-demand version via email in a couple of business days. You’re free to share the link with colleagues who weren’t able to attend the live session. Second, if you’re having trouble with audio or video, your best bet is to try logging out and logging back in, which usually resolves the issue. And lastly, we’ll have a Q&A session at the end of today’s presentation. So as we go through the material, if you’ve got questions, take notes, write them down, and we will answer them at the end. So we’ll leave some time, and I think you’ll find Ingo will have lots of great answers for you. So with that, Ingo, I’d love to hand it over to you. Let’s get started.
Hey, perfect. Thanks, Tom. And welcome everybody to today’s webinar on deep learning. Of course, it’s a super important topic right now. So we will really cover all three big and important questions about deep learning. The first question is, what is deep learning, actually, in the first place? The second: what can deep learning do for you? Where can you apply it? What kind of use cases are really great for deep learning? Or the other way around: for what kind of use cases is deep learning really great? And then last but not least, and maybe most important, how can you really use deep learning? So the focus is on the how. I will show this to you in a quick demonstration as well. So first of all, what actually is deep learning? Probably the moment where most people got exposed to deep learning was a couple of months ago, when it was all over the news that an artificial intelligence based on deep learning had won against the world champion in Go. It was very exciting. No question about that. Because many people thought this game is way too complex, so it would take years until we finally build a machine able to beat the world champion in this game. Some people even claimed it would never be possible. I think it’s in general not a great idea to bet against an artificial intelligence in a task which mainly involves making rational decisions. And I would be more than happy to have a great discussion with you guys on the pros and cons of AI and all the ethical implications. Probably not today. But if you like, write me on Twitter, and I will make sure that I answer you. But for now, the important thing really is that we made tremendous progress thanks to deep learning, also in the broader AI space. Sometimes I’m a little bit sad. So we have this great success, and all people are saying is, oh my God, now robot armies are taking over the world and will kill us all. So I don’t think that’s the point in general.
I really think deep learning can help us, machine learning can help us, AI can help us with a lot of different and exciting areas. And we should really know what those areas are and then how to use those technologies in the best possible way.
So that brings us to the first important question. So what exactly is deep learning? Is this the same as AI? I’m sometimes surprised that actually many people ask me, hey, Ingo, are you also doing AI? Well, what exactly do you mean? Well, I heard about this deep learning stuff. Okay, that’s really not exactly the same thing. So just as a primer here. Those are three topics: AI, machine learning, and deep learning. And AI is really the big bucket. So many of you probably know, but just as a reminder for all of us, AI is the big bucket containing all the techniques which enable computers in general to mimic human behavior. And there are a lot of different research trends going into AI. There’s natural language understanding. There’s computer vision. There’s, of course, machine learning, the most important of all the research areas that are in the artificial intelligence bucket. So machine learning now is a subset of those techniques. And the subset really focuses on enabling machines to improve with experience. And most of the time that means experience, well, captured in the form of data. So we look into data describing the past and learn from those data. That’s not our own behavior, but the behavior of machines getting better over time. So that’s the learning aspect of it. And that also allows us, for a new situation, if you have data describing the current situation we are in right now, to learn from the past and predict what’s going to happen. That’s how predictive analytics and machine learning are connected. And then as part of machine learning, there’s a subset of methods within all the machine learning methods, which basically make computation of multi-layer neural networks feasible. And as a result of this, the exciting thing about those multiple layers, and we will see this a little later, is that we will get something for free that we typically would need to put a good amount of work into to be equally good with other machine learning methods.
So those are the three topics today. We will focus on deep learning.
So let’s start on the basics. And I will try and do my best to explain to you what deep learning is. I can’t avoid using a formula or two. So I’m sorry for that in case there are people who don’t want to see this. But I’ll also use some cute images of some cats to make up for that. So let’s start on the images first. And then we have a formula or two. But I think both are important to understand the concept. So we start with a binary classification problem. You probably are familiar with this if you’re familiar with machine learning and data science in general. But the idea here really is data points in a multi-dimensional space. And you want to classify those data points into two or more different classes. So in our example here on the right, we have four images of animals. We see one red data point in the top-right corner and a blue data point on the bottom left– towards the bottom of this first graph. And what is this graph really? We have two dimensions: the size of the animal and the domestication. I can now take different animals I encounter and plot them somewhere on this graph. And while I’m doing this, I will try to find a separating line between those two classes. In this case, it’s a very simple approach. And as you can see in the picture, that probably wouldn’t be the line I would take if I only had those two data points. But it’s a correct answer. So there’s an infinite number of correct solutions. This is one of them. And typically, most machine learning methods, while you’re adding more and more data points to the mix, will adapt the position of the separating hyperplane, as we call this line, and try to figure out, okay, well, can I describe this hyperplane in a good way so it separates the two classes of animals, or just the colors red and blue here for us? So that’s what most learning methods, or many linear learning methods, are doing. And what you often end up with is the following.
So the function which describes which class an animal belongs to, cat or dog in this example, can be calculated.
So for example here, this animal is a dog if 2 times the size of the animal plus 3 times the domestication minus some arbitrary value– not arbitrary, but some value like 250– is greater than 0. Well, then it’s a dog. And otherwise, it’s a cat. Of course, now the goal is to actually find those numbers: 2, 3, and 250. The 2 and 3 are important because they tell us which is the more important, well, dimension here in our space. So right now, since 3 is higher than 2, you could assume that domestication might be more important. But of course, that also depends on the scale of the dimension. So size, you might measure in inches or we might measure it in meters, or in centimeters, or whatever it is. Domestication might be on a scale between 1 and 10. Who knows? So of course, those exact values also depend on the scales. But they often give us some good insight into what is more important. And in that case, it might be domestication. So that’s very well known. I assume that most people here in the audience today know, of course, the premise. It’s a linear classifier. This is exactly the outcome of a linear regression model, or a linear support vector machine, and many, many others. Okay. So that’s the basic problem. So back to the deep learning then. Can we maybe learn from our own bodies? And that was exactly what people did in the past. Can we actually try to mimic the behavior of the human brain? So let’s look into our own brain. And somebody did it apparently and found out, well, there’s a lot of those neurons in this part of the brain structure. And each neuron consists of multiple parts. And on the left here, we see those dendrites. And those dendrites, they actually listen to electrical impulses coming from neighboring neurons. So they basically are connected to other neurons. And they then react and basically sum up all the electrical signals those dendrites are collecting.
And then, if this is high enough, they send the signals themselves– or the neuron sends the signal through this axon into those synaptic terminals connecting to other neurons.
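As a rough illustration of the linear rule described a moment ago, here is a minimal Python sketch. The weights 2 and 3 and the threshold 250 are the numbers from the talk; the function names, the units (size in centimeters, domestication on a 1 to 10 scale), and the sample animals are assumptions made up for this example.

```python
def score(size, domestication, w_size=2.0, w_dom=3.0, threshold=250.0):
    """Weighted sum from the talk: 2 * size + 3 * domestication - 250."""
    return w_size * size + w_dom * domestication - threshold


def classify(size, domestication):
    """Return "dog" if the weighted sum is greater than 0, otherwise "cat"."""
    return "dog" if score(size, domestication) > 0 else "cat"


# Hypothetical animals: (size in cm, domestication on a 1-10 scale).
print(classify(120, 8))  # 2*120 + 3*8 - 250 = 14, greater than 0
print(classify(40, 5))   # 2*40 + 3*5 - 250 = -155, not greater than 0
```

Changing the unit of `size` (say, to meters) would change what the weights 2 and 3 mean, which is the point made above about scales.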
Okay. So what has this to do with our function? Well, remember this picture here, because we can actually try to mimic the behavior of the brain cells to derive very similar functions to the one we just saw before. So here on the left, where the dendrites have been before, we have the inputs of our space, x1 to xn. x1, for example, could be the size of our animals. x2 is the domestication level. And then there’s this x0, which is just the constant, typically 1. And by assigning a certain weight– like w0 here of 250– you will end up with the 250 we saw in the formula before. So if you just sum up the weighted input factors here– that is what happens in the left blue bubble– and deliver this to this activation function here in the right blue bubble, which basically introduces this if function, well, then that’s exactly what we have seen before. So all the inputs from the electrical signals from the neurons are there, and weighted, and summed up, and then they either fire a new signal into the output or not. A 1 or a 0. Well, it’s kind of the same thing. So this structure derived from the neuron in the human brain– or at least it somewhat resembles the structure– this actually leads to the same formula we have seen before in linear regression. So that’s exciting. So we can actually see that our brain is doing something similar here. But is this good enough? Well, turns out this perceptron alone– that’s what this thing is called– is really not that powerful as a machine learning method. But what if we combine multiple of these perceptrons next to each other? This is this blue layer in the middle here. And if you do this, we really get a true network, like the brain, of multiple cells. All right. Let’s do that. So we introduce this concept of a hidden layer here. So on the left, we still have the inputs. On the right, we have an output. And we have this hidden layer of multiple perceptrons taking the weighted inputs and having some activation function.
But in this case, the activation function is no longer just this neural step function we have seen before.
But this can also be a non-linear function, originally a so-called sigmoid function, which resembles an S-curve. It’s a little bit smoother. It’s important. Well, the third element is, well, how do I actually now figure out what the right weights are? Before, it was complex enough already just with this linear regression-like case or for the perceptron. We only had one weight per input. But now everything is connected to everything else. So we have a lot of connections. And each connection gets its own weight. And we need to figure out what the right weights are. And that’s exactly what this backpropagation algorithm is doing. So the basic idea is it computes what, in a forward way, this network would calculate based on the inputs, and it compares this result to what the result should actually be. And if there’s an error, it propagates this error back. And in order to be able to do that, those activation functions need to be differentiable. Without going into mathematical details here, that’s exactly the reason why we no longer have this step function, but the sigmoid S-shaped curve. Anyway. So those are the three elements, which turn a perceptron into multiple perceptrons, and then into a network. And this is the way we train it. All right. So if you are there already, why not just say, well, we can have multiple levels? So instead of just having one level of perceptrons here, we can have another hidden layer or even more. The numbers can differ and everything. But the basic idea is, yeah, we add another level, maybe two levels. And that leads to something– so I typically would call this multi-layer feed-forward artificial neural networks, which are trained with stochastic gradient descent using backpropagation. And I have to admit that’s not a fitting name. And that doesn’t really fly well from a marketing perspective. And that’s probably what many other people also thought. And that’s why they came up with a new name for this thing.
And they called it deep learning. So here you have it. This really is what deep learning is.
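The forward pass and the backward propagation of the error described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not how RapidMiner or any particular library implements it: a single hidden layer, sigmoid activations everywhere, a squared-error loss on one training example, and hypothetical names (`init_params`, `forward`, `backprop`, `sgd_step`) invented for this sketch.

```python
import math
import random


def sigmoid(z):
    """Smooth, differentiable S-curve that replaces the step function."""
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Random starting weights in [-1, 1]; biases start at zero."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return W1, [0.0] * n_hidden, W2, 0.0


def forward(x, params):
    """Forward pass: inputs -> hidden layer -> single output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def loss(x, t, params):
    """Squared error between the network output and the target t."""
    _, y = forward(x, params)
    return 0.5 * (y - t) ** 2


def backprop(x, t, params):
    """Propagate the output error back to get a gradient for every weight."""
    W1, b1, W2, b2 = params
    h, y = forward(x, params)
    delta_out = (y - t) * y * (1.0 - y)            # error at the output unit
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                 for j in range(len(h))]           # error at each hidden unit
    gW2 = [delta_out * hj for hj in h]
    gW1 = [[d * xi for xi in x] for d in delta_hid]
    return gW1, delta_hid, gW2, delta_out


def sgd_step(x, t, params, lr=0.5):
    """One stochastic gradient descent update on a single example."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = backprop(x, t, params)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - lr * gb2


# One update on a made-up example: the error should shrink a little.
params = init_params(n_in=2, n_hidden=3)
x, t = [1.0, 0.0], 1.0
before = loss(x, t, params)
params = sgd_step(x, t, params)
after = loss(x, t, params)
print(before, after)
```

The `y * (1 - y)` and `h[j] * (1 - h[j])` factors are the derivative of the sigmoid, which is why a differentiable activation is needed. Stacking further hidden layers repeats the same pattern, one layer of deltas at a time.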
It is this basic idea of this perceptron, which resembles a linear regression function, adding multiple of them into hidden layers, having multiple hidden layers, so I end up with a multi-layer feed-forward network. And you train all those connections and weights. And that’s exactly what the deep learning network actually is. Well, of course things didn’t stop there. And there are certain flavors or variants of deep learning, which have been developed from that basic idea. For example, on the left, we have something we call convolutional neural networks. And here the idea is that each input– those orange bubbles on the left– each input basically is not just one data point or one value. But it’s actually a rectangular field of a 2D input matrix. So we have not just whatever set of values, but we really have a two-dimensional input. For example, an image. And then you move those small rectangular fields over this image, and you can, for example, do some calculations, like an average of what you see there. Feed that now into the input of the neural network. And if you do this, well, then you can actually learn what’s going on in those images. So that’s exactly the reason why those convolutional neural networks have been very, very successful in computer vision, for example, for recognizing handwriting or recognizing what kinds of objects are inside of an image. And if you think about the U.S. Postal Service, if they read handwritten zip codes on letters– well, nobody is actually reading this. So those are machines, which are automatically reading those zip codes, for example, to route where the letters are going. So just as one of the use cases. So it’s important to be really correct in those situations because otherwise those letters would go everywhere. So another example would be so-called recurrent neural networks. Here, we take the same structure, but now we can also build directed cycles with those connections.
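The idea of moving a small rectangular field over an image and averaging what it sees, as described above for convolutional networks, can be sketched as follows. This is a simplified illustration: the function name and the tiny 3 by 3 "image" are made up, and a real convolutional layer learns the weights inside the window during training rather than using a fixed average.

```python
def sliding_window_mean(image, k):
    """Slide a k-by-k window over a 2D list and average each patch."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            # Collect the k*k pixel values currently under the window.
            patch = [image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
            out_row.append(sum(patch) / (k * k))
        out.append(out_row)
    return out


# Hypothetical 3x3 grayscale image.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(sliding_window_mean(image, 2))  # [[3.0, 4.0], [6.0, 7.0]]
```

Each value in the result summarizes one small region of the image, and those summaries are what gets fed into the next layer of the network.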
And that actually leads to something which gives those neural networks a kind of state.
So they get a short memory really. Interesting, because while you’re training and while you’re going over input data, you actually can remember what happened in the data set before or the data point before. And if you improve on that– one of the variants is the so-called long short-term memory– you actually can remember, yeah, quite a lot. And if you do this, well, that’s a kind of algorithm which is really successful, for example, in speech recognition, where you know, well, just a second before, I had a sound that sounds like an A maybe. And then this next sound is whatever. L. So together, it’s Al. So this is the kind of application where you need some memory because you need to know what happened before to also say what’s most likely the outcome for the current situation. And that’s why it’s so successful in speech recognition. If you use your iPhone and ask Siri something, LSTM is the technology that is used. If you use Alexa, same story. Any kind of good speech recognition as of today is using this kind of technology. All right. So those are variants of the basic idea of this multi-layer neural network or deep learning approach. And those variants work really well on those use cases. So are there others? So we now move to the next section here about, where can we use deep learning? Where is it in particular strong? Well, to answer that question, where is deep learning really strong, you can also reformulate this question to, well, what is the advantage of having multiple hidden layers in the first place? And you can have a look at the picture on the right side here. I think it explains it very well. The idea of these multiple layers: at the top here, you basically get all kinds of images in this image recognition use case. You see all kinds of images there. And each image, this is the raw pixel data. Okay. Here is a pixel that’s red. And here is another pixel, which is green. It’s very raw. It’s kind of noisy.
But in the first layer here, we can now start actually to, well, generate information based on the information we get delivered from the previous layer. So the first layer gets the raw data. But the neurons we are creating here, they might react or activate their electrical signal to the next layer based on finding some specific small structure. So for example, they could react to small circles, or lines, or rectangular shapes, or whatever it is. So some basic structures. And then those basic structures– and if they fire, they can now be used in the next layer, to create features or higher-level features. So for example, multiple of those structures might form a nose. And other structures might form a paw. And the paw of a cat looks a little different than the paw of a–
So you can have a neuron for each kind of animal using the kind of higher-level features we have seen before. And this is exactly what those multiple hidden layers are doing for you. And this is exactly where deep learning is really strong: in cases where you would need to extract those describing features. And that’s nothing new. So extraction of features, generating new features, feature selection, that’s all part of the topic which is called feature engineering. And I had to write good chunks of my PhD on exactly this topic. Feature engineering is extremely important. It most often makes the difference between a mediocre classifier and a really good one. It’s not so much the learning method you use, it’s how you transform the input space for this learning method. So that’s often so much more important really. Well, and that explains why deep learning is working so well right now, because it is doing this automatically for you. So instead of you sitting down and describing all the features and how to extract them, deep learning is doing this in an implicit manner automatically for you. That’s great. That really can save you a lot of time. And in situations, in use cases where you need to do feature engineering, well, this certainly should be a part of the toolbox. And try out deep learning because you can set a really good benchmark. You still might improve with additional feature engineering. But you get a very good result pretty quickly, by the way. So that sounds awesome. So what’s the catch? If it’s so great, well, are there disadvantages? And unfortunately, the best thing about the field of deep learning is the implicit feature engineering. And the biggest disadvantage of deep learning is also the feature engineering. And so often, what makes it so good, unfortunately, also creates the drawbacks. So the first thing really, which is a little bit annoying: you can’t really see those hidden features. You can’t really learn anything from this.
You don’t really know if it’s important to learn the concept of a nose to distinguish between cats and dogs, for example.
Well, it’s not exactly true. There are techniques to get to this point. But it’s not easy. Most people just can’t. And it’s often so complex, and it’s so hidden, and there are so many things playing together, that it’s very hard to make a clear statement and to learn something to, for example, change your course of action in a business use case. So those hidden features can be powerful. But there’s not much, since they’re hidden, to learn from. And then also it’s in general true, it’s really hard to understand a neural network. And the same is true for the deep learning network. It’s extremely difficult to understand. You will understand a linear regression function for sure. You see what the important features are and how they contribute to the prediction. You understand a decision tree for sure. There are enough people who claim that they’re able to understand a support vector model. Neural networks? Not so much. There’s really not much you can do. You can use the insight of some simulation, but you can’t just look at the model and understand what’s going on. So you can’t understand what the hidden processes are that you want to detect. So yeah, it’s a black box. And that’s also sometimes difficult because people can’t build trust in this model. We saw all the connections, and you will need to learn all those weights. And if you need to do this, and then iterate over all the data over, and over, and over again, well, that explains why, unfortunately, deep learning is kind of slow. It’s really not one of the fastest algorithms. It’s very complex. The more hidden layers you have, the more nodes you have, the longer the run times. It’s really not one of the fastest. It’s not like a simple perceptron, a linear regression. It’s way, way, way slower. And unfortunately, still until today, it’s also one of the techniques which are prone to overfitting. There is some kind of– we call it regularization– also for neural networks. It’s just not as good in many cases.
Or it’s easy to work around it, let’s put it that way, by setting up different network architectures or setting up different parameters. And then it still runs into overfitting quite easily.
What does overfitting mean? Well, you just memorize the data. You’re not really learning something. You’re not generalizing from the data. And that means that if you see a situation which is kind of similar to what you saw before, but well, just a little bit different, the prediction can be all over the place. And unfortunately– well, you don’t really feel it learned something. So that’s not really great because you, of course, want to have a model which generalizes very well, which is a more robust model, which is also going to work on slight changes of the data in the future, and create good predictions for that case. And you can get there with some deep learning effort. It’s just much more difficult to get there. And most people still end up with a somewhat overfitted model. So everything here together now explains that deep learning really is particularly successful in cases where you need feature engineering. So like in image or speech recognition. That’s just a must in those cases anyway. Or another use case scenario of the data. But in other situations, it might actually be better to use your more scalable, more robust, less overfitting methods together maybe with a little bit of feature engineering, and you might end up with a better model overall. So don’t think it’s always the best solution in all cases. Okay. So what are those use cases? So we talked about image recognition and speech recognition. But that’s not the use case every single company might have. So here is a small selection of different use cases which either drive revenue, reduce cost, or avoid risk. And, well, in all of those categories, you will find use cases where, if you think about it, this feature engineering can be very helpful. So for example, in the top box here, you might have some applications in web analytics or for sure in pricing optimization. They really look into time series data about the past where you can or would usually extract some features anyway.
And here deep learning might be a really good technology.
There are certain situations also in cross- and upselling, customer acquisition, or customer analytics in general. There are a lot of customer transactions, again, where you would typically do some feature extraction, where deep learning can be very successful as well. Again, deep learning can be very strong in doing this implicit feature engineering for you. So those are just some examples. But in general, of course, just use it for your use case and see how well it works. That brings me to the last segment of today’s webinar: how can you now use deep learning? Okay, with all those network structures, isn’t that very complex? And the good news is, no, not at all. I’m going to show this to you in a minute here, in a couple of minutes of a demonstration. So let’s move over to RapidMiner for a small demonstration. I hope you can see it on your screen. I know you guys can’t answer, but I’m just assuming it works. So let’s see. So let’s quickly build a first process. But in the interest of time, I will load from R. So I’m not explaining the whole of RapidMiner to you now here. But the basic idea is– in case you don’t know it yet– you build those analytical workflows here in the center of the screen, basically composed of data sets and basic building blocks we call operators here. So let’s build a very simple one on one of my favorite data sets, the sonar data set. So you just drag in the data and you build a workflow by connecting those operator blocks. With this one you just retrieve the data set. And you’re connecting them by what? Clicking and dragging. Okay. So this is the data set. Just quickly. Lots of numbers here in this data. Lots of different columns here. This is actually sensor data coming from a sonar. So this is describing frequency bands.
It’s not a particularly large data set. It’s only a couple of hundred rows or examples here. But the goal is to distinguish between the sonar signals of rocks versus mines. Further down here, I have some mines. Well, it’s roughly 50% rocks, 50% mines. Okay. So it’s frequency bands, as I told you before. If I look into the data here in a scatterplot, you can already see, okay, using the first two bands here, there is not really any pattern I see. Maybe more here towards the upper right, there are more mines than rocks. But I don’t know. If I look into a parallel plot here, basically now every line is one row in my data set. I need to distinguish between the red and the blue ones. Yeah, there’s not really a clear pattern I can easily see at least. So well, it’s a tough question. But it’s frequency bands. It’s kind of sensor data. So deep learning, again, might be very helpful. So let’s use it. And how do you use it? Well, you type in deep learning here. That operator, you drag it in and that’s it. So here you have now your deep learning operator inside of this data flow. And well, that’s all you need to do. So we can change the parameters here, the most important of those being defining the hidden layers and everything else. But, hey, let’s just go with the basics. And I can run this process now. And well, it’s done now. I get a very, very good training error. Although don’t pay too much attention to the training error. Again, if you only pay attention to the training error, you will definitely run into overfitting. So make sure that you also validate this process here properly. So this is what I’m showing you next. The next thing would actually be to validate this learning. I am not building all those processes, but a quick explanation. So I have a cross-validation here. This, by the way, is now parallelized completely with the new version 7.3, which we just released a couple of days ago. So give it a try. We introduced a completely new parallelization framework.
So things are much faster now.
So anyway. So we take the deep learning and we train basically 10 different models, always on 90% of the data, to apply this model on the remaining 10%, for all of those 10 runs. And if I do this now here, well, it takes a couple of seconds. But now I’m getting actually quite good accuracy of 79%. And all the other values. I’m not going through all the details here. So well, it looks quite good for this data set. So I skip this one here. But maybe I show you this ROC comparison, where I take the same data set and actually now compare the ROC curves for two different learning models, deep learning and gradient boosted trees. So what happens here is, initially I run a 10-fold cross-validation, create the ROC curves, and average those curves for both deep learning as well as gradient boosted trees. And as you know, often gradient boosted trees are among the best methods. And even here you see there are certain situations where gradient boosted trees are better. I can tell you already that will be the next one. But it takes too long, so I’m skipping this for today. If I optimize the model just for deep learning here in this process, I actually end up roughly about here. And I can’t get much higher with gradient boosted trees. So in this particular data set, I could probably change at least one thing here. Let’s see if I can. I don’t know. Let’s actually see. We might end up with a better result already just with this one thing that changed here. Let’s check. And anyway, my point is on this kind of data set, deep learning often outperforms– no, it’s pretty much the same. Okay. Anyway. This outperforms other modeling methods. So if I, on the other hand, want to tune those parameters, that’s a process I talked about before. There are all kinds of operators in RapidMiner for doing that. So in this case, I actually say, okay, I would like to tune the model and the learning rate. You can also tune the network structure.
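For readers who want to see the 10-fold cross-validation idea from the demo outside of RapidMiner, here is a minimal plain-Python sketch. It is not what the RapidMiner operator does internally. The function names are hypothetical, the majority-class "model" is only a stand-in so the example runs without any libraries, and the sketch assumes there are at least as many rows as folds.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each row is in the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, average over k runs."""
    accuracies = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Stand-in learner (an assumption for this sketch): always predict the
# most frequent training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Made-up data: 100 rows, 70 labelled "rock" and 30 labelled "mine".
X = [[i] for i in range(100)]
y = ["rock"] * 70 + ["mine"] * 30
print(cross_validate(X, y, fit_majority, predict_majority))
```

Because every model is scored only on rows it never saw during training, the averaged accuracy is a more honest estimate than the training error mentioned earlier.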
I added the log inside of this process here as well so that, while it’s running, you can go here to the results and into the log window. And actually, you see, for all the different combinations– they’re added now while this process is running– what the accuracy is. So in this case, with 120 epochs and a learning rate of 0.003, we get an 81% accuracy, 82 almost here, 150. Okay. That looks good. So I can sort it now and basically find out what the best result is. Oh, look at that, here it is even better. Anyway. I am stopping this process here. I just want to give you the idea: you can combine here in RapidMiner the whole enchilada of different validation methods, optimization, but also feature engineering methods together with deep learning. And it’s just like dragging in the deep learning method. And that’s all you need to do. Yeah. I’m skipping the next one as well in the interest of time here. But here is, for example, one example. Let me show you a different data set where I use the Titanic training data to learn. And here, for example, gradient boosted trees clearly outperform deep learning. And so I think that’s just an important thing to remember here, that there are certain situations where deep learning is just not as good as many people believe. Since we are almost at the end, this is debunking some myths. That’s what I always love to do. See what people actually believe is going on and see what reality looks like. So the first myth is, well, deep learning seems to be this brand new class of evidence and– yeah, something which has only been developed in recent years. And that’s actually not true at all. I mentioned the perceptron before. It was actually developed in 1957 already. The first multi-layer neural network was published in about 1965. So this is a 50-year-old technology. Well, that’s still younger than linear regression, which is roughly 200 years old. But still.
This is by no means something created in the last couple of years.
Deep learning has been around for a long time. So why is it hip now? Why do people care all of a sudden? The reason really is that it was overshadowed for multiple decades by other learning methods, for example support vector machines, which were just more computationally feasible. And in recent years, we have more parallel compute power. We have new GPUs. And now, well, we can speed up the calculation. I think those two things really go together, and that’s why deep learning all of a sudden got the success that it probably should have had earlier, when it was just computationally not feasible. Okay. The second myth is that deep learning needs a lot of hidden layers. That’s not true. It really only needs two, otherwise it’s hard to justify the name deep learning. It would just be a regular neural network, probably. But yeah, it’s not always a great idea to add hundreds of layers. Well, sure, the learning method gets more powerful, but the risk of overfitting increases as well. Because the more complex the model can be, well, if you’re not stopping it, it will get more complex, and with the increase in complexity, the risk of overfitting increases too. And the third one is really like, oh, deep learning, that is the strongest machine learning method that we have right now. Well, it is definitely strong. And don’t get me wrong, I’m not saying it’s not. It is among the strongest methods we currently have. There are certainly a couple of others which are very strong as well. But it is not always the right tool in all cases. I repeat what I said before: it’s often a great idea in cases where you would need to do some feature engineering anyway. But these networks are difficult to tune, the risk of overfitting is high, and training is very slow compared to others. And for certain data sets, other machine learning methods can just be much stronger. So I think this is important to understand, and it’s still true today, even if some people say that the whole idea of the no free lunch theorem might not be applicable any longer.
My point really is, in my experience, just in practical work, there is no silver bullet in machine learning. There just isn’t. Yes, there are certain situations where deep learning is best. There are other situations where decision trees or random forests are best. And sometimes it’s just a linear regression or a decision tree. It depends on so many things. Deep learning is really not always the right thing to do. So I would like you to see deep learning as just another tool in your machine learning toolbox. It solves the same kind of problems other machine learning methods solve. It’s a very powerful tool. But sometimes it’s not the right tool. For some problems, even the most powerful tool is just not the right one. So have it in your tool belt for your machine learning work. But also try out others. There are often situations where others are better. So I’d like to conclude now, before we do a bit of Q&A, with a little bit of an idea of what might happen next. Well, I don’t really know. Although we’re in the business of predicting the future, it’s sometimes hard to tell. But I personally think this whole concept of adding some form of memory to learning methods in general, and especially to neural networks, might be a very important one. Because that all of a sudden allows you to learn much more complex structures. So instead of just learning a function, which technically says that for a certain input there is a certain output value, the output can be maybe a string, or maybe even something as complex as a complete algorithm, which depends on the input values. And this kind of more complex structure requires some form of memory. Otherwise, you will never get there. There’s some early research going in this direction, and there are some promising results. And this will indeed be the next thing, whatever the name is going to be, which replaces deep learning: this form of memory-based reasoning powered by machine learning.
We’ll see exactly how the name will play out. Okay. So at this point, I would like to thank you for attending this webinar.
I believe we’ll have some questions and answers, of course. You can go online to rapidminer.com, download RapidMiner, and try it yourself. It’s definitely fun to work with deep learning. It’s often very powerful, as I said. So give it a try. And I hope you enjoy it. Thank you.
Thank you, Ingo. As a reminder, we’ll be sending a recorded version of today’s presentation in the next few business days via email. And we’ve got lots of really interesting questions that have come in. So, Ingo, get ready. First question is, all of the examples that you demonstrated today, those are all on RapidMiner Studio, right?
Right. Absolutely. So yeah. You can just download it. It’s all in there. We can actually even make the processes available. Tom, correct me if I’m wrong, but I think we should maybe write something up and put the processes and the data there as well, so you can use them as a starting point for your own experiments.
Excellent. Will do. Okay. So the first few questions are around GPUs. So with new GPUs, cheaper computing power, etc., can we not think about deep SVM or something in that direction? And the second question is centered around RapidMiner support for GPUs.
Sure. I’ll start with the first one. For sure, if you think about GPUs and parallel computing, there are techniques that can make use of them. Because there are two things here: parallel computing in general, and also much faster matrix operations, like factorizing or multiplying matrices, etc. SVMs are really, really difficult to parallelize. In fact, whenever you start really doing this, in most cases you’re unfortunately not ending up with an SVM in the long run. There are some approximations to support vector machines, most frequently based on simplified versions, or linear kernel functions, or forms of iterative or stream-enabled SVMs, which are easier to parallelize. But those are no longer real, true SVMs. So I might need to check my research papers again. Maybe there’s been something in the last 6 or 12 months. But before that, I didn’t see anything where I would say, yes, this is a truly parallelized SVM. But you can parallelize linear regression and also run it on GPUs. There’s also k-means clustering. This is pretty trivial. There are a lot of algorithms where it’s much, much simpler. And I think we are still kind of in the infancy in terms of parallelizing our machine learning algorithms. There’s still a lot of work in front of researchers and also companies like RapidMiner to implement stable and robust versions of this. So that’s on the algorithms themselves. For RapidMiner and GPUs, we are actively looking into this right now. Some people in the audience today might even have gotten an email from us. So yeah, we are looking into this. We are actively talking to Nvidia as well, based on the CUDA graphics cards. It’s not in the product yet. But stay tuned. It’s definitely a topic we are really interested in, and there might be something down the road.
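To make the remark about k-means being simple to parallelize a bit more concrete, here is a minimal sketch of the assignment step, in which every point finds its nearest centroid independently of all other points. The function names and the toy points are assumptions for illustration, and this is not how RapidMiner or any GPU library implements it. The thread pool only shows the structure of the split: in CPython, threads will not speed up this kind of pure-Python arithmetic, whereas on a GPU each point could be handled by its own thread.

```python
from concurrent.futures import ThreadPoolExecutor

def nearest_centroid(point, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def assign_chunk(chunk, centroids):
    """Sequential assignment for one chunk of points."""
    return [nearest_centroid(point, centroids) for point in chunk]

def assign_parallel(points, centroids, workers=4):
    """Split the points into chunks and assign each chunk independently."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: assign_chunk(chunk, centroids), chunks))
    # Chunks come back in order, so concatenating restores the original order.
    return [label for part in parts for label in part]

points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0), (0.1, 0.3), (5.2, 4.8)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_parallel(points, centroids, workers=3)
print(labels)
```

Because no chunk needs anything from another chunk, the chunked result is identical to the sequential one. That independence is what makes this step easy to distribute.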
Excellent. Thank you, Ingo. The next question is you had mentioned that deep learning is slow. Is it slow with learning or during deployment to produce predictions, or both?
It’s one of the slower ones, actually. It’s not as slow as k-means clustering, but it’s one of the slower ones even in terms of prediction, or scoring. And the reason is that you still need to go through the whole network and apply all those calculations for every single score. That’s more complicated than, let’s say, what you need to do in the SVM case or for a linear regression, where it all depends on the number of inputs, but that’s it. There are not multiple steps you need to do there. So it’s relatively slow even in terms of prediction. But the really slow part is the model training. So yes, it’s slower than predicting with a regression model, but it’s still very fast on the prediction side. The modeling part itself is where it is much, much slower than others. And it’s hard to attach numbers to this, of course, because it depends on the data set and on so many other things, like the parameters. If you have 10 layers with 100 nodes each, it will definitely be slower than, let’s say, a very small network architecture. So it’s hard to say. I can’t give you numbers. But overall, in my experience, yeah, it takes more time. The good news is that in RapidMiner’s case, we implemented a set of algorithms with a partner of ours, H2O, which are really doing an excellent job of taking the computational power of the hardware you have and maxing out its use. So using multiple cores, and also measuring very well when it’s no longer a great idea to continue the computation. Basically, stopping the computation at the point when it’s no longer worth doing it. So we integrated probably one of the most interesting and certainly one of the most powerful implementations of deep learning available in today’s market. But still, in comparison, on a lot of data sets you will find it’s not one of the fastest methods.
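The difference in scoring cost can be made concrete with a rough count of multiply-add operations per prediction. This is a back-of-the-envelope sketch under simplifying assumptions: fully connected layers, no bias terms or activation functions counted, and 20 input columns chosen arbitrarily. The function names are made up for the example.

```python
def dense_network_macs(n_inputs, hidden_layers, n_outputs=1):
    """Multiply-adds for one forward pass through a fully connected network."""
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    # Each pair of adjacent layers contributes (nodes in) * (nodes out) multiply-adds.
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

def linear_model_macs(n_inputs):
    """A linear model needs one multiply-add per input."""
    return n_inputs

linear = linear_model_macs(20)               # 20
small = dense_network_macs(20, [10])         # 20*10 + 10*1 = 210
large = dense_network_macs(20, [100] * 10)   # 20*100 + 9*100*100 + 100*1 = 92100

print(linear, small, large)
```

Under these assumptions, the network with 10 layers of 100 nodes each needs several thousand times more arithmetic per score than the linear model, even though each individual prediction is still fast in absolute terms.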
Great. Thanks, Ingo. The next question is, how do you compare a RapidMiner to Google’s TensorFlow?
Well, good question. If you’re really interested in building exactly all the internals of the machine learning algorithm itself, and only that, TensorFlow might be the better option. Period. Why? Because that’s exactly the problem they solve. You can build the internal calculations of the machine learning method itself in a very detailed way. But if you are interested in solving the complete end-to-end problem you actually have in data science, that starts with the data ingestion and all the data preparation work. And RapidMiner is not just the UI, there are hundreds of operators to help you: combining data sets, filtering them, cleaning them, solving all kinds of data quality problems. The whole data ingestion and data prep phase is so important because that’s where you spend all the time. You can inject R scripts or Python scripts for doing all of that in a more customized fashion, if you need or want to. Then you have hundreds of different predefined machine learning methods, which are just one drag away. So that’s much easier. You have all the validation and everything there. You have some visualization aspects and integration into third-party applications like data visualization products, Salesforce, HubSpot, you name it, to actually take those predictions and operationalize them. So it’s really this end-to-end approach from RapidMiner, while TensorFlow, for example, only focuses on this one particular element in the middle and optimizes this one step. I think there are some places where this is a good idea, because really every single fraction of a percent counts. In reality, I think it’s more important to solve the overall data quality, data ingestion, data preparation, and validation problem, because that’s where you as an analyst spend so much more time. So Ingo the scientist loves the idea of TensorFlow. And Ingo the business-oriented data scientist, who actually wants to solve the problem, thinks, “Oh yeah, that’s nice. But it’s not actually solving my problem. I need to model it as well.” And that’s where RapidMiner really shines.
Excellent, Ingo. We have just time for one more question. Any recommendation for sequence-to-sequence translation tasks? Assume you have variable-length input sequences that need to be classified and mapped to variable-length output sequences. Would deep learning or some other feature of RapidMiner be able to do this?
Tough one. That’s really a tough one. I’m almost afraid. Yes, the answer is you can do it in RapidMiner. But it’s really a tough one. The reason is that this is exactly the gray area between those simple function learning situations, where you take some inputs, which could be the sequence in this case, and map them to some output, which is typically just one value, and situations with a more complex structure as an output. We experimented some time ago with a thing which was called SVM Strikes. But it wasn’t really used by a lot of people. So deep learning in general can be used for that. Deep learning in RapidMiner, not so much at this current stage. You could probably replicate this with multiple models. But to be honest, that’s a very, very specific problem. And this would be, by the way, one of those situations where you might want to combine this with Theano or TensorFlow and then integrate it into the overall RapidMiner workflow. This is a very, very specialized problem, which is certainly applicable for certain use cases, but not for so many. So that’s a little bit complex. You will probably need to combine multiple products here. But that’s easily possible. So that’s the good news.
Great. Well, thank you so much, Ingo, for your time today. For those of you whose questions we didn’t get to, we will be reaching out to you. Again, we will be following up with a link to this recording. Feel free also to send your questions on Twitter. We are @RapidMiner. And with that, thanks, everyone. Have a great rest of your day.
You can check out the processes and data from this presentation here: