Analyzing Customer Reviews with MonkeyLearn and RapidMiner

Studies show that 80% of customer data is unstructured, and it’s predicted to grow above 90% by 2020. Getting actionable insights from unstructured content isn’t easy, but with RapidMiner and MonkeyLearn, you can aggregate and analyze all of your unstructured content. Diego Ventura from MonkeyLearn will show how to analyze customer reviews to help inform product decisions or make changes in your customer communications. In this webinar, you’ll learn how to:
  • Analyze customer reviews of Slack scraped from the web using machine learning and natural language processing
  • Build custom predictive models and classify the results
  • Gain real-time insights from your unstructured content

Hi. Thank you everyone for joining us for our RapidMiner and MonkeyLearn webinar about analyzing Slack customer reviews. I’m Diego. I’ll be your host and presenter today. I’m a business developer at MonkeyLearn. What we’re going to be seeing today in this webinar is basically: what MonkeyLearn is, how to analyze customer reviews using MonkeyLearn and RapidMiner, how to use pre-trained MonkeyLearn models, how to create custom models with your own data, and how to set up this process with RapidMiner. So what is MonkeyLearn? Basically, today we have a lot of information. Around 80% of business information is unstructured data, and it’s expected to be around 93% by 2020. Making sense of all this data is going to be very, very tough. Today, the average tech stack is around 20 tools per user. So we have social media, email, NPS surveys, task management, chat, and so on. And that’s why we created MonkeyLearn, a system of intelligence for all your text data. The idea is that you have this horizontal product that processes all the text data that comes into your business and, using machine learning and natural language processing, structures it so that it can actually provide some use to your business. So what can you do with MonkeyLearn? You can analyze text at scale, you can automate manual processes, and you can create products and features based on NLP, using either pre-trained models – we have around 50 models that are pre-trained for sentiment analysis, topic categorization, news analysis, and so on. We’re going to be seeing some of them today.

You can also create and train your own models with your own data, either classification models or extraction models. The idea is that MonkeyLearn is a product that is very easy to integrate. Obviously, we are mainly an API business, but you can also use CSV data or Excel files. And you can connect with libraries for the most common languages, like cURL, Python, Ruby, and PHP. But mainly, what our customers end up doing, at least initially, is using one of our direct integrations for Zapier, Google Sheets, Zendesk, and obviously RapidMiner, which is a tool I really love; it makes my work very, very easy, at least when doing demos. What we have with RapidMiner is a custom operator for classification models as well as a custom operator for extraction models, which allows you to build quite complex processes through RapidMiner Studio. So how can MonkeyLearn actually help you? Well, in this case, what we’re going to be seeing is how to use customer feedback. And customer feedback, we can all agree, is very, very important. But keeping track of it all is not easy at all. It keeps growing: we get more reviews, we get more feedback from customer support, we get more feedback from chats, and it’s just too much to actually process manually. There are also problems like a lack of consistent criteria, and it’s very hard to get deeper insights without having a data science team or actually drilling into each piece of data manually.
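Since the API is the core integration path, here is a rough sketch of what assembling a classify request might look like. The endpoint path and payload shape follow MonkeyLearn’s v3 REST API as I understand it; verify against the current docs before relying on them. The token and model ID are placeholders, and the request is only built here, not sent.

```python
import json

API_TOKEN = "YOUR_API_TOKEN"  # placeholder: your account's API token
MODEL_ID = "cl_XXXXXXXX"      # placeholder: a model ID from the dashboard URL

def build_classify_request(model_id, texts):
    """Assemble (but don't send) a classify request for a list of texts."""
    return {
        "url": "https://api.monkeylearn.com/v3/classifiers/%s/classify/" % model_id,
        "headers": {
            "Authorization": "Token " + API_TOKEN,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"data": texts}),
    }

req = build_classify_request(MODEL_ID, ["Fantastic collaboration tool"])
print(req["url"])
```

Sending this with any HTTP client (or one of the official libraries) would return one classification result per input text.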

So what you can do with MonkeyLearn is basically analyze the feedback at scale; you can sort through all of it. You can inform your product decisions by analyzing the aspects, the sentiment, the emotion, the opinion units. And you can also have centralized criteria, which is not easy to have, especially if you have a lot of people, or if you are in customer support, which has a high [inaudible] turnover. It’s very hard to have centralized criteria about how to tag incoming data, or how to track or search the feedback. So today, what we’re going to be working on specifically is trying to understand Slack reviews that were left on Capterra, things like, “It’s really easy to integrate with” or, “Fantastic collaboration tool” or, “Terrible notification sounds, you will hear it even in a noisy room.” The idea is that you can grab these and say, “Okay, ‘It’s really easy to integrate with’ seems positive, and it’s talking about ease of integration.” And if you can do this at scale, you can then inform product decisions. For example, if you were a competitor to Slack, you could say, “Okay, we have to improve our integration capabilities.” Or if you were Slack, you could say, “Hey, well done.” So what the process will look like is we’re going to be using some models that we have already trained with Slack data taken from Capterra. Things like, “It helps improve coordination among team members,” which, classified with a sentiment analysis model, would come back as negative, neutral, or positive; in this case, obviously, it’s positive. We can also use these to classify into topics, for example ease of use, integrations, performance quality, UI, UX. For example, “I work across several workspaces and switching is very easy” is definitely talking about ease of use.

And last but not least, we’re going to be using RapidMiner to basically connect all these individual pieces that we’re going to be creating with MonkeyLearn. So let’s dive into the demo. What we’ll now review is how to use pre-trained models on MonkeyLearn, how to train custom models with your own data, and how to set up the process in RapidMiner. So without further ado, let’s jump into the MonkeyLearn dashboard. This is what you will be able to see if you sign up to MonkeyLearn. We are a freemium company, so basically you can sign up for free. Just test it out; we encourage you to do so. What you’ll see is that there are a lot of models that are pre-trained, for example NPS feedback classification, sentiment analysis, keyword extraction, urgency detection. Let’s grab the sentiment analysis model, for example. Let’s just test it out and see: “Okay, here’s the best sentiment analysis tool ever.” Let’s test it out and see what MonkeyLearn says about it. They say, “Don’t do live demos” exactly for this reason. And it’s saying, “Okay, this is 100% positive.” Let’s say something like, “This demo is taking too long, it’s not that good,” and see what MonkeyLearn has to say about that. This is calling the API directly, which is hosted in the cloud, and it’s saying, “Okay, this seems to be very negative,” with a confidence of 99.8%. This, obviously, is the nice way to see it; you can actually turn off the nice view and go directly to the raw output MonkeyLearn provides, which is a pretty standard JSON format that you can reuse in other ways.
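That raw JSON can be consumed programmatically. The response below is an illustrative mock of the shape a v3 classify call returns (one result object per input text, each with a list of tags and confidences); the exact field names may differ from your API version.

```python
# Illustrative mock of a MonkeyLearn-style classify response.
sample_response = [
    {
        "text": "This demo is taking too long, it's not that good.",
        "classifications": [
            {"tag_name": "Negative", "confidence": 0.998},
            {"tag_name": "Neutral", "confidence": 0.002},
        ],
    }
]

def top_tag(result):
    """Return (tag_name, confidence) of the highest-confidence tag."""
    best = max(result["classifications"], key=lambda c: c["confidence"])
    return best["tag_name"], best["confidence"]

print(top_tag(sample_response[0]))  # ('Negative', 0.998)
```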

But I’ll leave you to explore what MonkeyLearn has to offer. There are a lot of models that you can try that work right out of the box, so you do not need to train your own. But let’s go into what we actually came to see here. Let’s go to the Slack classifier that we created for sentiment. Basically, what we did was grab all of these reviews and manually tag them, saying, “Okay, this review seems positive, this review seems negative, this review seems neutral.” And with enough data, you get this model – created with machine learning and NLP – that basically says, for example, that “Great capabilities” will be tagged as positive with a 66% confidence. Or let’s say, “This is a terrible product, the notifications are too loud.” Hopefully, we get a negative result from MonkeyLearn, and it’s saying the review you’re seeing here is negative with a 44% confidence. Other than that, what we also did was create a much more complex topic analysis model that classifies things into ease of use, integrations, performance quality, UI, UX, desktop, mobile, web, calls, and more. And basically what it’s doing here is, for example, if we provide a sample that says, “It’s great on desktop, but it sucks in mobile,” even though it’s incorrectly written, we get to see that, “Okay, this part is talking about desktop and this part is talking about mobile.”

And much more interesting is getting a look at how to build this. So let’s go and create a custom model live. Basically, when you go into the dashboard in MonkeyLearn, you’ll be able to create custom models. Let’s click Create Model, and you’ll see that we can create classifiers or extractors. The main difference is that classifiers basically sort things into buckets, into tags. You put in a piece of text and say, “This is tag A.” You put in another text and say, “This is tag B.” And with enough data, it will be able to predict, given a new sample, whether it belongs to A or B. An extractor works differently: what it does is take a piece of text and tell you, “Okay, this is an important keyword, this is the name of a person, this is the name of an organization, this is the name of a product, this is the name of a brand.” So it’s basically tagging entities that already exist in the text. Given a new text, it will tell you, “Okay, this is a certain tag, this is a certain other tag.” In this case, since we’re doing topic and sentiment analysis, those belong to the classification side of things. So when we click classification, we see: “Okay, we can do topic classification, we can do sentiment analysis, we can do intent classification.” When you provide this information about what you’re working on, it’s used to tune the training so the model trains more efficiently. So in this case, what we’re going to be doing is creating a topic classification model. If you already have data that is tagged, you can upload it directly – given that you have it with, say, the reviews in one column and the topics in the other column, you would upload that directly and the model would train right away.
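The classifier-versus-extractor distinction can be made concrete with two toy stand-ins. The keyword rules below are purely illustrative; real MonkeyLearn models are trained from tagged data, not rule-based. The point is the difference in output shape: a classifier returns a tag for the whole text, an extractor returns entities found inside it.

```python
def toy_classifier(text):
    """A classifier sorts the WHOLE text into a bucket (tag)."""
    return "Integrations" if "integrate" in text.lower() else "Other"

def toy_extractor(text):
    """An extractor tags entities that already exist INSIDE the text.

    Here we naively treat capitalized words as entities.
    """
    return [(word, "ENTITY") for word in text.split() if word.istitle()]

text = "Slack is really easy to integrate with"
print(toy_classifier(text))  # Integrations
print(toy_extractor(text))   # [('Slack', 'ENTITY')]
```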

In this case, I’m going to show you how to do it manually, so we get a better sense of how MonkeyLearn works. So let’s select our data source, a CSV file. I have something here prepared already to classify. We get to select what we want to do: okay, let’s use this column, which is the text. I’m not going to use the column with the category, but you could do so, and it would train the model directly. I want to go through the manual process, though. After a few seconds of loading the data, I think I have around 20-something examples uploaded, which is a very small amount of data. And let’s create some categories: let’s create ease of use. So we are creating the tags that we are going to use to classify these reviews. Pricing, calls. And let’s add one more, search. So we’re going to be looking for reviews that represent one of these categories. We’re then taken to a screen where we can tag these very efficiently. The model is going to be learning live; each time we tag something, a process called active learning is going on. So, “The search features are very helpful to look back on the history of a conversation.” This is definitely search. And this might take us a while, so please bear with me. “Easy to use,” this is ease of use. “The videochat feature needs to make calls at a place where communication is still going to another platform,” this is definitely talking about calls.

So now we see how the model is already trying to learn from the data, although it got this one wrong. “Files can be sent without issue, voice and video calls without adding any additional plan, being able to see the history of conversations with team members.” The model is thinking this is search, because it has seen similar samples previously, and it’s showing you its guess of the category the sample belongs to. Let’s correct that and say, “No, this is not search. This is about calls.” So we are correcting the model live, and we get to see how it’s sort of thinking. Don’t quote me on that, because I’m going to get killed by the data scientist community. Again, “Using the free version you can get the value of the product”: this is not search, this is definitely about pricing. “It’s super easy to use and inexpensive.” This is ease of use, and probably about pricing too. We could make a multi-label or multi-tag classifier; in this case, I’m going to just leave ease of use, for the sake of the model. Multi-label classifiers are much harder to do because you need to input much more data, and many more of the samples need to be tagged with several categories. Again, “Been searching group threads, because you just know what you want it to do.” Okay, this is definitely about search, not calls; let’s correct the model. “Conversations, the search function of each file.” So it’s thinking this is calls, and we’re saying, “Mm, no, I think it’s search.” It might fall into both categories, but it was more relevant for me to put it in search. This one is not search, it’s pricing. We need to tag a few more samples, please hang in there. This one is definitely ease of use. We hadn’t tried this category before, so it’s learning as we speak, as we tag. “Free” is assigned as pricing; it’s definitely looking into that, and that was correct.
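The tag-and-correct loop above can be sketched in a few lines. Each confirmed or corrected sample updates simple word-to-tag counts, so later guesses improve; this is NOT MonkeyLearn's actual algorithm, just the intuition behind learning live from corrections.

```python
from collections import Counter, defaultdict

class TinyTagger:
    """Toy model that accumulates word-to-tag evidence as samples are tagged."""

    def __init__(self):
        self.evidence = defaultdict(Counter)  # word -> Counter of tags

    def learn(self, text, tag):
        for word in text.lower().split():
            self.evidence[word][tag] += 1

    def predict(self, text):
        votes = Counter()
        for word in text.lower().split():
            votes.update(self.evidence[word])
        return votes.most_common(1)[0][0] if votes else None

model = TinyTagger()
model.learn("the search features are very helpful", "search")
model.learn("easy to use and inexpensive", "ease of use")
model.learn("videochat and calls keep dropping", "calls")

# The model guesses; the human confirms or corrects, and it learns again:
print(model.predict("the search history is helpful"))  # search
model.learn("the search history is helpful", "search")
```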

Go, algorithm, it’s getting everything correct now. So this is search, definitely. “Even just want to call”: this one is not correct. Correct it again, pricing. “Video call,” this one was very good. Just three more samples. “Our friend thinks about the fact that you can categorize your conversation and that you can search.” That’s not pricing, that’s search. “Easy to use.” That one it got correctly. One more. And that’s it. We’ve got the model, so let’s give it a name, Slack topic classifier, and just finish it up. We could keep training it manually as we have been doing, or we can go on and just test it out directly and say, “I like this search future.” Woah, you can’t say “future,” that’s incorrectly written. I’m sorry, English is not my native language, so you might understand my mistake, but let’s leave it incorrectly written: “Oh, I like the search future.” Let’s see if it says, “Okay, this is about search,” which it should. It’s not so confident, but it got it right. And again, this is how we went about creating this much more complex model that identifies all these different topics, and also how we went about creating the sentiment analysis model. Usually sentiment is much harder and needs much more data than topic classification. So we now have all the pieces of the model; we now need to tie it all together. So let’s jump into the RapidMiner platform. Before I move forward, take into account that your RapidMiner setup might look different. This is how I use it: I have the repository, the operators, the process, and the parameters. You can move it around if you want to, but this layout is very useful for me.

So first of all, I grab a CSV file that I have here already, with some Slack opinions to process through RapidMiner, and I just put it there. Then I look for the MonkeyLearn operator, so I put a classifier operator there. I say, “Let’s connect this,” which is basically just text in one column. Let’s add our API token. You will need to go through the process of activating your MonkeyLearn account and putting in your API token. Once you have that done, you need to select the model ID. To find the model ID, basically what you need to do is go to the model itself, grab that piece of data, which is hidden in the URL, paste it there, and select the input attribute, so text. And now we can just test it out and run it, just to see what that looks like. We have around, I think, 400 rows to classify at the moment, so it might not be output instantly, but it shouldn’t take very long. What we’re going to be getting, ideally – we selected the topic classifier – is which categories these reviews fall into. So we just wait for a little bit longer. And boom, we’ve got it there. So we’ve got here the text, which was the original file we uploaded. Here we have the classification that we’re getting: app, software, features, characteristics, purpose. We also got some level-three classifications, like communications, ease of use, general. And for each of those categories we also get the confidence of the review belonging to that category.
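The operator handles sending those 400 rows for you. If you were scripting this yourself, you would send the rows in chunks, since classify endpoints typically cap how many texts one request may carry (check the current API limits). Here is a minimal batching sketch where `classify_batch` is a stub standing in for the real HTTP call.

```python
def classify_batch(texts):
    """Stub for the real API request; returns one result per input text."""
    return [{"text": t, "classifications": []} for t in texts]

def classify_all(texts, batch_size=200):
    """Classify a long list of texts by sending it in fixed-size chunks."""
    results = []
    for start in range(0, len(texts), batch_size):
        results.extend(classify_batch(texts[start:start + batch_size]))
    return results

rows = ["review %d" % n for n in range(400)]
results = classify_all(rows)
print(len(results))  # 400
```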

Now we need to add another component. So let’s call the MonkeyLearn classifier again: let’s add the operator and put the token in. We want to do sentiment now, so we go to the Slack sentiment model that we built before, select the ID, and paste it there. We select the input attribute, and we’re going to say, “Okay, this should be text,” which is not appearing there. So let’s do something: let’s click Rename. We do this because we want to rename the output of MonkeyLearn. Basically, the old name was the classification output, and we’re going to rename it Topic. Then we connect it to MonkeyLearn again and say, “Okay, we just grab text.” And we add a new Rename, connect it again, and say we want this new output to be Sentiment. This one was missing: Topic. And if we run the process – again, this is going to redo the classification we previously did – it’s going to say, “Okay, we’ve got this piece of text, show me what categories it belongs to: feature, software, pricing, ease of use, etc.” Then it renames that attribute. Then it sends that to MonkeyLearn again, classifies the sentiment as neutral, negative, or positive, and renames that attribute to Sentiment. And we get the results coming out.
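The chained process (classify for topic, rename, classify for sentiment, rename) can be sketched as a small pipeline over rows. The two models below are hypothetical keyword stand-ins for the trained MonkeyLearn models; only the shape of the flow mirrors the RapidMiner setup.

```python
def topic_model(text):
    # Hypothetical stand-in for the Slack topic classifier.
    return "integrations" if "integrate" in text else "general"

def sentiment_model(text):
    # Hypothetical stand-in for the Slack sentiment classifier.
    return "Negative" if "terrible" in text else "Positive"

def enrich(rows):
    """Add a topic column, then a sentiment column, to each row."""
    enriched = []
    for row in rows:
        out = dict(row)
        out["topic"] = topic_model(row["text"])          # first operator, renamed Topic
        out["sentiment"] = sentiment_model(row["text"])  # second operator, renamed Sentiment
        enriched.append(out)
    return enriched

rows = [{"text": "really easy to integrate with"},
        {"text": "terrible notification sounds"}]
for row in enrich(rows):
    print(row)
```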

We can, for example, very quickly use the chart function in RapidMiner to say, “Okay, let’s see how this data that I uploaded splits into different sentiments.” For example, we see that out of the 400-something reviews I loaded, most of them were positive, some of them were neutral – actually, very few were neutral – and very few of them were negative. And you can do this also for each of the classifications that you got. So for example, if you go to the categories: we see that most of them don’t have a category level three, but we can go, for example, to category level two. Oh, sorry. And we see, “Okay, most of them were not tagged, which means no category level two was found, but a lot of them were about features, a lot of them were about purposes, and a lot of them were about characteristics.” We can also do category level one, and we find that most of them were either nothing or software. That’s probably because Slack has a very bad [laughter] category level. Service, for some of them. You could use this to very quickly go over the reviews; obviously we could tweak this a bit more, but that’s essentially what the process looks like. We create a model for sentiment, create a model for classification of topics, and then just run everything, tying it all together with RapidMiner. We would then be able to actually put this into production, to do this live and get things like notifications about negative reviews, or notifications about certain specific aspects, or basically just get insights over time. So, a lot of things going on. I think we’re running out of time at the moment, so if you have any questions, please leave them through the Q&A; we’ll make sure to answer every one of them, probably through email. So thank you very much for joining me today. It was a pleasure. Hope to jump into a new webinar again very soon. Bye-bye.
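The chart view described above is essentially a frequency count over the sentiment column. The same summary in code, with illustrative counts (not the actual webinar data):

```python
from collections import Counter

# Illustrative sentiment labels, one per classified review.
sentiments = ["Positive"] * 320 + ["Neutral"] * 50 + ["Negative"] * 30

summary = Counter(sentiments)
print(summary.most_common())
# [('Positive', 320), ('Neutral', 50), ('Negative', 30)]
```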