01 October 2012

Books on Data Mining

The RapidMiner team keeps on mining, and we have excavated two great books for our users. The first, Data Mining for the Masses by Matthew North, is a very practical book for beginning and intermediate data miners and is available for free. The second, The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman, provides deep insight into the mathematical models at the heart of every data analysis. It is not exactly hot off the press, but it has not lost its glamour since the first edition was released in 2001.

The Elements of Statistical Learning is targeted at readers with a background in statistics, mathematics or computer science who want to understand not only how to use an operator in RapidMiner, but also why it works. Readers should not be afraid of mathematical formulas, but they will be rewarded with a solid understanding of many methods implemented in RapidMiner and of the connections between different learning algorithms: What do decision trees and rule learning algorithms have in common? Should you try an SVM if k-NN fails on your data?

The Elements of Statistical Learning can be considered a standard text, used in many data mining lectures around the world. This may be attributed to the fact that it not only contains all the detailed information, but also presents it with relatively simple explanations (keeping in mind, of course, that understanding complex topics will always require a good deal of effort). The book can be downloaded from the authors’ website.

Data Mining for the Masses, on the other hand, takes a practical approach and, as the name implies, aims at a broader range of readers. Those of you who attended this year’s RCOMM already had the opportunity to see the book presented by the author himself. Matt’s comprehensive book gives a detailed and thorough introduction to data mining. All major concepts of data mining are covered in a well-structured manner using real-life examples, most of which are solved entirely with RapidMiner.

In fact, the book begins one step before the data analysis by explaining what data mining itself means, and it does not leave out the ethical concerns a responsible data miner should keep in mind. Thanks to its accessible writing style and good examples, the book is suited not only to IT professionals and college students who want to take a deeper look into data mining, but to anyone who wants to learn how to get the most out of their data.
