Clustering as Part of the Data Science Methodology

Lionel Der Krikorian, LDK 360

In this two-part presentation, Lionel demonstrates two different ways to use clustering, working with the Titanic dataset.

Using clustering for preprocessing: Clustering can be an efficient approach to dimensionality reduction, in particular as a preprocessing step before a supervised learning algorithm.
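The presentation builds its workflow as RapidMiner processes, and the Python below is not taken from them. It is a minimal sketch, assuming plain k-means with Euclidean distance, of one common way clustering reduces dimensionality: each instance is re-expressed as its distances to the k cluster centroids, so d original features become k new ones that a supervised learner can then consume. The toy data, the farthest-first initialization, and the helper names `kmeans` and `cluster_distance_features` are illustrative assumptions. (In scikit-learn, `KMeans.transform` returns these centroid distances directly.)

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means; returns the final centroids.

    Uses deterministic farthest-first initialization, an assumption made
    here for reproducibility, not necessarily what any tool does.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        assign = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def cluster_distance_features(points, k):
    """Replace each instance's features with its k distances to the centroids."""
    centroids = kmeans(points, k)
    return [[math.dist(p, c) for c in centroids] for p in points]

# Made-up 4-feature instances (NOT Titanic data) forming two obvious groups.
points = [
    (0.0, 0.1, 0.0, 0.2),
    (0.1, 0.0, 0.2, 0.0),
    (0.2, 0.1, 0.1, 0.1),
    (5.0, 5.1, 4.9, 5.0),
    (5.2, 4.9, 5.0, 5.1),
    (4.9, 5.0, 5.1, 4.8),
]
features = cluster_distance_features(points, k=2)  # 4 columns -> 2 columns
for row in features:
    print([round(d, 2) for d in row])
```

Here four input columns shrink to two. On real data, k would typically be chosen by checking how well the downstream supervised model performs on the transformed features.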

Using clustering for semi-supervised learning: Another use case for clustering is semi-supervised learning, where we have plenty of unlabeled instances but very few labeled ones.
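A minimal Python sketch of one way clustering supports semi-supervised learning, again not drawn from the presenter's RapidMiner processes: cluster all instances, request a label only for the instance nearest each centroid, then copy that label to every member of its cluster. The toy points, the hypothetical `annotate` function (a stand-in for a human labeler that here just reads made-up labels), and the helper names are assumptions for illustration.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def propagate_labels(points, k, label_fn):
    """Label one representative per cluster, then copy it to the whole cluster."""
    centroids = kmeans(points, k)
    assign = [nearest(p, centroids) for p in points]
    reps = {}
    for c in sorted(set(assign)):
        members = [i for i, a in enumerate(assign) if a == c]
        # Representative: the member closest to its cluster's centroid.
        reps[c] = min(members, key=lambda i: math.dist(points[i], centroids[c]))
    rep_labels = {c: label_fn(i) for c, i in reps.items()}  # one request per cluster
    return [rep_labels[a] for a in assign], sorted(reps.values())

# Hypothetical ground truth, hidden from the algorithm; annotate() plays the
# role of a human labeler and records which instances it was asked about.
_hidden = ["A"] * 4 + ["B"] * 4
asked = []

def annotate(i):
    asked.append(i)
    return _hidden[i]

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
          (6.0, 6.0), (6.2, 5.9), (5.8, 6.1), (6.1, 6.2)]
labels, reps = propagate_labels(points, 2, annotate)
print("asked for labels of instances", asked)
print("propagated labels:", labels)
```

Only two labels are requested for eight instances. Propagating to every cluster member is the simplest variant; because instances near cluster boundaries are the most likely to receive a wrong label, a common refinement propagates only to the members closest to each centroid.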

You can check out the processes from this presentation here. These links only work if RapidMiner Studio is running.