Language is everywhere. Humans are steeped in it from birth, and learn it without any particular effort. We plaster written words on buildings and clothing, and we can read stories written by people who died before our parents were born. This is all so obvious and easy for us that we barely notice how miraculous it all is. But for computers, language still presents a serious challenge. Just try asking Alexa to turn up your music and order toilet paper in the same command.
However, this is beginning to change. One of the biggest developments in machine learning in recent years has been the democratization of natural language processing (NLP) technology. As more and more self-service solutions become available, it’s no longer necessary to hire an entire team of developers and data scientists to unlock the value of NLP. So what is NLP exactly and how can you use it in your business? Let’s take a look.
What is natural language processing?
Broadly speaking, natural language processing is any computer-assisted analysis of language data. Whether you’re trying to understand what someone has typed into a search engine, sniff out spam in the comments section of your blog, or understand what someone says to your automated phone system (when they aren’t annoyedly screaming “agent!!”), you’re working with NLP.
Understanding speech is easy for us humans, thanks to hundreds of thousands of years of evolution, but it’s not so easy for a computer. Since the focus of this blog post is on understanding the meaning behind our words, we’re going to skip some of the initial steps that might be required for a given use case; for example, you might need to convert oral speech into written text in order to make sense of it. For our purposes here, we’ll assume you’re already working with text-based data.
So how can you use NLP in your business? Here are five techniques that are at the forefront of machine learning and artificial intelligence.
1. Text Classification
Text classification is a fundamental part of NLP. Beyond simply identifying words, sentences, and meaning, text classification comes down to categorizing and tagging a text to put it into the correct, predetermined category.
By way of example, let’s say that your airline receives a large quantity of customer emails to your generic support address. You need to get these emails to the right departments to make sure that they are acted on. You could have a human or two in the loop, monitoring the email and routing requests. But what if you had an NLP model that could tell the difference between a request to cancel an account, a request to change a booking, and a special meal request?
The airline LIAT built just such a system, using RapidMiner to automatically categorize and route emails to the right departments. This simple change led to a significant improvement in social media sentiment for the company.
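To make the idea of routing emails into predetermined categories more concrete, here is a minimal sketch in plain Python. It is not LIAT’s or RapidMiner’s implementation: the training emails, the department labels, and the NaiveBayesRouter class are all hypothetical, and the model is a bare-bones Naive Bayes classifier chosen only because it fits in a few lines.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training emails and department labels (assumptions for
# illustration); a real system would need far more examples per category.
TRAINING_EMAILS = [
    ("please cancel my account and delete my profile", "cancel_account"),
    ("i want to close my frequent flyer account", "cancel_account"),
    ("i need to change my booking to a later flight", "change_booking"),
    ("can i move my flight to friday instead", "change_booking"),
    ("i would like a vegetarian meal on my flight", "special_meal"),
    ("requesting a gluten free meal for my trip", "special_meal"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesRouter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of emails
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


router = NaiveBayesRouter().fit(TRAINING_EMAILS)
print(router.predict("Could you cancel my account?"))
```

The key point is that the categories are fixed up front: the model can only ever answer with one of the labels it saw during training.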
2. Text Clustering
Organizations today know that data is valuable, but often, the only thing worse than not enough data is too much of it. When you’re drowning in large quantities of unstructured textual data, text clustering is one of the most effective data science tools for turning it into useful information.
Text clustering is similar to text classification, in that both seek to place texts into categories. With classification, you’ve already decided what categories exist—in the email example above, there are only a finite number of departments that the emails might go to. With text clustering, however, you’re taking a set of texts and determining not only what categories they belong to, but discovering what categories exist in the data in the first place. (If you’re familiar with supervised and unsupervised learning, you can think about that being the distinction here.)
Text clustering can be especially useful if you’re worried about biasing your results by predefining your categories. Consider the scenario where you have 100,000 customer reviews of your software product, and you want to know what kind of information is contained in them. You could decide ahead of time what you want your categories to look like (say, positive and negative), but you might then miss other important categories that are implicit in the data: for example, reviews that might be positive or negative, but contain suggestions for future features.
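As a rough illustration of discovering categories rather than predefining them, the sketch below groups a handful of made-up reviews using a simple single-pass "leader" clustering on word counts. The reviews, the stopword list, and the 0.3 similarity threshold are assumptions for the example; production systems typically use richer text representations and algorithms such as k-means or hierarchical clustering.

```python
import math
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "an", "and", "is", "it", "i", "in", "to", "be",
             "of", "every", "would"}


def vectorize(text):
    """Turn a text into a bag-of-words count vector without stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(texts, threshold=0.3):
    """Assign each text to the most similar existing cluster, or start a
    new cluster when nothing is similar enough. No labels are given."""
    centroids = []  # one summed word-count vector per discovered cluster
    labels = []
    for text in texts:
        vec = vectorize(text)
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            centroids[best].update(vec)  # Counter.update adds the counts
        else:
            centroids.append(Counter(vec))
            best = len(centroids) - 1
        labels.append(best)
    return labels


# Hypothetical customer reviews.
reviews = [
    "the app crashes every time i open the export menu",
    "love the new dashboard it is fast and beautiful",
    "export menu crashes the app again",
    "the dashboard is beautiful and fast love it",
    "please add dark mode in a future release",
    "would be great to add dark mode please",
]
labels = leader_cluster(reviews)
print(labels)  # → [0, 1, 0, 1, 2, 2]
```

Notice that the third group, feature requests, was never declared anywhere in the code. It emerges from the data, which is exactly the kind of category a fixed positive/negative scheme would miss.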
3. Topic Mining
Topic mining also seeks to understand the content of a group of texts, and you can think of it as a special case of text clustering because, with topic mining, you also want to sort texts into groups. But here, the goal is to identify the range of topics represented in a text. After all, a document is often about more than one topic: for example, both foreign politics and the UK.
A helpful—if slightly loose—visual analogy for topic mining is the (in)famous word cloud, where more frequent words are larger, and less frequent words are smaller. If you were to create a word cloud for a bunch of different documents, and line up those clouds side by side, you could get a sense of which documents cover which topics in more or less depth.
A word cloud of RapidMiner’s homepage in the shape of a hedgehog 🦔.
A great example of how you can better understand your customers using topic mining comes from a tutorial by Dr. Martin Schmitz on how to do topic mining of Amazon reviews. You can see in the tutorial how topic mining lets you quickly organize, understand, and summarize large collections of textual information. Applying topic mining to your customer or internal audit data can help you visually identify correlations and possible causal relationships in the data, and help you solve challenging business problems like customer churn and fraud.
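For readers curious about what happens under the hood, here is a minimal sketch of one common topic mining technique, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling. This is not necessarily the method used in the tutorial mentioned above; the toy documents, the lda_gibbs function, and the parameter values (two topics, alpha, beta, 200 iterations) are all assumptions for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (theta, topic_word), where
    theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per doc/topic
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts...
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # ...then resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word


# Toy, already-tokenized documents; the third one mixes both themes.
docs = [
    "election parliament minister vote policy".split(),
    "london scotland wales england britain".split(),
    "minister vote london england policy britain".split(),
]
theta, topic_word = lda_gibbs(docs, num_topics=2)
for d, mixture in enumerate(theta):
    print(d, [round(p, 2) for p in mixture])
```

Unlike the clustering approach, each document gets a mixture of topics rather than a single label, which is what lets one article count as being about several topics at once. On a corpus this small the discovered topics will be noisy, so treat the printed numbers as a demonstration of the output format rather than a meaningful result.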
4. Named Entity Recognition
While named entity recognition (NER) isn’t a full use case in and of itself, it’s an important enough part of other classification and categorization systems that it’s still worth discussing on its own. NER refers to how NLP systems identify important nouns (like people, places, and events) in a text. There are a number of ways that NER systems work, from dictionaries with tons of possible nouns to relying on sentence context to help properly identify nouns.
NER can be trickier than it might seem at first glance. For example, you want your system to identify “Bennie and the Jets” as a song title in “Elton John and Bernie Taupin composed Bennie and the Jets”, but in “I went with Bennie and the Jets played horribly”, “Bennie” is a personal name and “the Jets” is an NFL team.
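The dictionary approach, and the way it stumbles on the example above, can be sketched in a few lines of Python. The GAZETTEER entries, the entity type names, and the find_entities helper are hypothetical and exist only for this illustration; real NER systems usually combine dictionaries with statistical models that look at sentence context.

```python
import re

# Hypothetical dictionary ("gazetteer") of known names and their types.
GAZETTEER = {
    "Elton John": "PERSON",
    "Bernie Taupin": "PERSON",
    "Bennie and the Jets": "SONG",
    "the Jets": "SPORTS_TEAM",
    "Bennie": "PERSON",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (name, type) pairs for every dictionary entry found in text."""
    # Try longer names first so "Bennie and the Jets" wins over "Bennie".
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(name) + r"\b" for name in names)
    )
    return [(m.group(0), gazetteer[m.group(0)]) for m in pattern.finditer(text)]


print(find_entities("Elton John and Bernie Taupin composed Bennie and the Jets"))
# → [('Elton John', 'PERSON'), ('Bernie Taupin', 'PERSON'), ('Bennie and the Jets', 'SONG')]

print(find_entities("I went with Bennie and the Jets played horribly"))
# → [('Bennie and the Jets', 'SONG')]
```

The second result is wrong: the lookup has no notion of context, so it cannot tell that this sentence is about a person and a football team. That is the gap context-aware NER models are meant to close.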
As mentioned above, entity recognition is especially important for classifying text. For example, if you’re analyzing news articles to sort them according to the people, places, and things discussed in them, quality named entity recognition is essential. Anytime you use NLP and you know that your data is going to have a lot of proper nouns, be sure that the algorithm you are using is good at named entity recognition.
5. Sentiment Analysis
Sentiment analysis is about understanding the emotion inherent to a piece of text. You can think of it as a very specific kind of text classification task—is this text positive or negative? As straightforward as this sounds, it can be quite complex when you consider all of the double entendres, puns, sarcasm, and the general level of ambiguity that exists in human languages. Algorithms can have trouble precisely identifying the sentiment behind a piece of text.
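To see why this is harder than it sounds, consider the simplest possible approach: counting positive and negative words. The sketch below is a toy lexicon-based scorer; the word lists and the sentiment_score and sentiment_label functions are assumptions made up for this example, and real systems generally rely on trained models rather than hand-written lists.

```python
import re

# Tiny hand-written lexicons (assumptions for this example).
POSITIVE = {"love", "great", "good", "excellent", "helpful", "fast"}
NEGATIVE = {"hate", "terrible", "bad", "awful", "slow", "broken"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum +1 for each positive word and -1 for each negative word.
    A negation flips only the word immediately after it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False
    return score


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("I love the new app, it is fast"))        # → positive
print(sentiment_label("The support was not helpful"))           # → negative
print(sentiment_label("Oh great, my flight is delayed again"))  # → positive
```

The last example is sarcastic, and the word counter happily labels it positive. This is the kind of ambiguity that makes sentiment analysis difficult in practice.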
Why would you want to know the sentiment behind text? One of the most common applications is the analysis of how customers are talking about a company on social media. For example, by finding tweets that mention your company and then subjecting them to sentiment analysis, you can get a sense of how customers feel about your brand.
By applying text mining to online chats, companies can determine how employees feel when they use an internal chat system. They can also investigate how various user groups feel when interacting with each other on an online platform. Armed with this kind of information, a company can try to target or assist customers with offers or helpful information.
NLP and RapidMiner
NLP offers a wide variety of applications for companies that need to analyze textual data quickly and reliably. With hundreds of digital and social media platforms, there’s no shortage of valuable text data being produced about your organization, but with the volume of text being produced, it can be incredibly difficult to know what’s actually being said. However, if you can succeed at unlocking analytic insights from this mass of unstructured data, you’ll have an advantage over competitors who stick with conventional business insights.
As with any data science initiative, when engaging in your first NLP project you’ll encounter huge roadblocks if you’re not able to demonstrate cross-functional value that can be clearly understood by all of your team members. RapidMiner Studio helps you uncover insights rapidly and explain them to all of your stakeholders with an analytics platform built around visual design. RapidMiner Studio also includes Auto Model, which provides automated text analytics without requiring a degree in data science. It includes automatic feature extraction with built-in sentiment analysis, context categorization, and language detection.
If you’d like more tips to help you make sure you’re demonstrating value in the early stages of a new machine learning project, you can also check out A Human’s Guide to Machine Learning Projects.
If you’d like to talk about how AI can help your business with unstructured textual data, you can register for a free AI assessment where we’ll help you analyze the feasibility and impact that AI can have on your bottom line.