Go, Watson, Go: Win at Jeopardy with Basic Statistics

Well done, IBM. The new supercomputer named Watson was created and trained over the last four years by 25 IBM engineers in order to play (and win!) at Jeopardy. I have just watched a short video about the event, and the result looks really impressive.

Watson played quite well against two of the best Jeopardy players in the world. I especially liked seeing the confidence values at the bottom of the screen, since they allowed me to check the quality of the model. And IBM did a good job: in most of the cases where Watson was highly confident, he was also right.

Another nice thing was the reactions of the other contestants: several times they seemed to know the answer (the question) as well, but they were simply too slow.

And this was only day 1; on the second day of this three-day contest Watson performed even better. But after having dug a bit deeper, I found out that the techniques used were pretty simple. At first, I thought that Watson understood the questions by hearing them, but he gets them directly as text instead. This is of course a big advantage, since you don’t lose any time with “understanding” what has been said or written. Talking about time, another big advantage of Watson is that he does not lose any time pressing the buzzer.

The basic techniques are pretty simple as well: Watson stores about 200 million pages in a large search index – among them the complete Wikipedia – and searches those pages for the given answer, i.e. the clue (ok, we probably all know how this works). From the top k results Watson extracts the most important person / concept / object etc. and creates an appropriate question. Few details have leaked about that step, but from the little I know I got the impression that it is merely topic detection or named entity recognition (NER), and that the confidence is based more or less on the average of the topic / NER confidences. Mix those simple ideas with the power of 2800 traditional computers and you get an impressive result…
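To make the idea a bit more concrete, here is a minimal toy sketch of such a pipeline in Python. It is certainly not what IBM actually built: the three-page corpus, the word-overlap ranking, the capitalized-word "NER" and the way the confidence is averaged are all my own assumptions, chosen only to illustrate the steps search, extract, and score.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus standing in for the ~200 million indexed pages.
PAGES = {
    "Isaac Newton": "Isaac Newton formulated the laws of motion and universal gravitation.",
    "Albert Einstein": "Albert Einstein developed the theory of relativity.",
    "Gravity": "Gravity was described by Isaac Newton and later by Albert Einstein in general relativity.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def search(clue, pages, k=2):
    """Rank pages by word overlap with the clue; return the top k as (title, score)."""
    clue_words = set(tokenize(clue))
    scored = []
    for title, body in pages.items():
        overlap = len(clue_words & set(tokenize(body)))
        if overlap:
            # Score in (0, 1]: fraction of clue words found on the page.
            scored.append((title, overlap / len(clue_words)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def extract_entities(text):
    """Crude stand-in for NER (an assumption): runs of capitalized words."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)


def answer(clue, pages, k=2):
    """Return (question, confidence) for a clue, or (None, 0.0) if nothing matches."""
    hits = search(clue, pages, k)
    if not hits:
        return None, 0.0
    support = defaultdict(float)
    for title, score in hits:
        for entity in set(extract_entities(pages[title])):
            support[entity] += score
    # Confidence (an assumption): retrieval scores averaged over all top-k hits,
    # so an entity found on only one page is penalized.
    confidence, entity = max((total / len(hits), name) for name, total in support.items())
    return f"Who is {entity}?", confidence


if __name__ == "__main__":
    clue = "He formulated the laws of motion and universal gravitation"
    print(answer(clue, PAGES))
```

A real system would of course use a proper inverted index, stop word handling and a trained entity recognizer, but the overall flow of retrieving the top k pages, collecting candidate entities and averaging their scores stays the same.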

The simplest ideas are most often the most robust ones, and the scientific and engineering efforts are impressive. Thanks, IBM, for those efforts and also for the positive effect this show will probably have on the public acceptance of data mining and business analytics.

Ingo Mierswa

Ingo Mierswa is the founder and president of RapidMiner and has been an industry-veteran data scientist since he started developing RapidMiner at the Artificial Intelligence Division of the TU Dortmund University in Germany. Mierswa, the scientist, has authored numerous award-winning publications about predictive analytics and big data. Mierswa, the entrepreneur, is the founder of RapidMiner. Under his leadership RapidMiner has grown by up to 300% per year over the first seven years. In 2012, he spearheaded the go-international strategy with the opening of offices in the US as well as the UK and Hungary. After two rounds of fundraising, the acquisition of Radoop, and supporting the positioning of RapidMiner with leading analyst firms like Gartner and Forrester, Ingo takes a lot of pride in bringing the world’s best team to RapidMiner.