Why text mining is a big deal
At least 80% of enterprise data is unstructured, contained in the myriad text-based social conversations that are happening every day. Unlocking the hidden value of that text through predictive analytics is essential to understanding customers’ opinions and needs, and to making better, more informed business decisions.
A whopping 90% of this data is underutilized when it comes to data strategies and data analytics techniques. Why is that? Humans find it easy to consume and make sense of unstructured data, but machines don’t. Unlike other data sources, it doesn’t sit in a table or a database, and it’s not easily referenceable, which makes it extremely difficult to mine. It’s also being created at such a rate that it’s almost impossible for humans to keep up with it.
Text mining, in a sense, is like an art form rooted in science
How do you connect to it, and how do you then ETL, or transform, all that data into a form that machine-learning algorithms can use to extract value from it? Once you’ve structured your unstructured data, how do you mash it up with your other structured data? How do you put your unstructured and structured data together to get a more complete, 360-degree view of your customer or of the business problem you’re looking at?
RapidMiner has a native text mining suite where you can tokenize your words, transform cases, apply stemming, filter stop words, add your own dictionaries, and generate bigrams and trigrams. Together, these steps prepare the data in a statistical form that a clustering algorithm can use, or that can feed a classifier such as a linear SVM. Our partner AYLIEN complements RapidMiner by adding natural language processing: sentiment analysis, entity extraction, and concept extraction. All of these can now be combined in a unified platform.
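To make those preparation steps concrete, here is a minimal sketch in plain Python. It is not RapidMiner’s implementation: the stop-word list is a tiny illustrative sample, `crude_stem` is a hypothetical stand-in for a real stemmer, and all function names are our own assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); real projects use fuller dictionaries.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "of", "to", "in", "it"}


def tokenize(text):
    """Transform case to lowercase and split the text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes. A stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Join each run of n consecutive tokens into a single feature."""
    return ["_".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text):
    """Tokenize, filter stop words, stem, then append bigrams and trigrams."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    stems = [crude_stem(t) for t in tokens]
    return stems + ngrams(stems, 2) + ngrams(stems, 3)


def term_counts(text):
    """Count how often each feature occurs: a simple statistical representation."""
    return Counter(preprocess(text))
```

Calling `term_counts` on each document gives one bag of features per document, which is the kind of numeric input a clustering algorithm or a linear classifier would expect after vectorization.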
Using text mining to analyze tweets from Super Bowl 50
In a recent webinar with AYLIEN, we explored the power of social content by analyzing data captured from thousands of tweets referencing Super Bowl 50 ads to determine viewer sentiment and predict potential trends in brand adoption. We focused on the 15 top brands and clustered the results to see what the really hot topics were and what was going on in the Twittersphere during Super Bowl 50. To start, we focused mainly on volume. We wanted to get a handle on what exactly people were talking about, which brands they mentioned most, and how the brand chatter developed in the buildup to the game, over the course of the game, and in its aftermath.
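A volume analysis like this one can be sketched in a few lines of Python. This is an illustration only, not the workflow used in the webinar: the brand names, tweets, and timestamps below are invented placeholders, and the matching is naive substring search.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical placeholder brands; the real analysis tracked the 15 top brands.
BRANDS = ["brand_a", "brand_b"]


def phase(timestamp, kickoff, final_whistle):
    """Assign a tweet to the buildup, the game itself, or the aftermath."""
    if timestamp < kickoff:
        return "buildup"
    if timestamp <= final_whistle:
        return "game"
    return "aftermath"


def brand_volume(tweets, brands, kickoff, final_whistle):
    """Count brand mentions per phase using naive, case-insensitive substring matching."""
    volume = defaultdict(Counter)
    for timestamp, text in tweets:
        lowered = text.lower()
        for brand in brands:
            if brand in lowered:
                volume[phase(timestamp, kickoff, final_whistle)][brand] += 1
    return volume


# Invented sample data with arbitrary timestamps, for illustration only.
kickoff = datetime(2016, 1, 1, 18, 0)
final_whistle = datetime(2016, 1, 1, 22, 0)
sample_tweets = [
    (datetime(2016, 1, 1, 17, 0), "Can't wait for the brand_a ad"),
    (datetime(2016, 1, 1, 19, 0), "That Brand_B spot was hilarious"),
    (datetime(2016, 1, 1, 19, 30), "brand_a and brand_b both nailed it"),
    (datetime(2016, 1, 1, 23, 0), "Still thinking about brand_b"),
]

volume = brand_volume(sample_tweets, BRANDS, kickoff, final_whistle)
```

Here `volume["game"].most_common()` would rank the brands by mentions during the game. Finer time buckets, such as per quarter or per minute, follow the same pattern with a different `phase` function.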
Through the text mining process, we see that it’s not just what you have in your databases and reports that matters, but how you are able to combine unstructured text with your own structured sources to get a true picture of overall performance. All of this is made a lot easier with industry-leading predictive analytics tools like RapidMiner and state-of-the-art tools like the AYLIEN text analysis extension.
Want to learn about more ways data science can have a real impact? Check out 50 Ways for some of our favorite use cases!