After a great weekend with our study group, we’ve finished K-Means Clustering and Hierarchical Clustering.
In Classification, your model tries to predict which of two or more labels, all known in advance, applies to a new example (e.g. it learns from past customer data whether a future customer is going to buy your product or not).
Clustering is exploratory. You don’t know the output in advance. The model groups data points that show similar patterns into clusters. The important thing is figuring out an appropriate number of clusters. For K-Means Clustering we used the Elbow method: plot the within-cluster sum of squares (WCSS) against the number of clusters and pick the point where the curve stops dropping sharply. For Hierarchical Clustering we built so-called dendrograms.
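To make the Elbow method concrete, here is a minimal sketch in plain Python. It is not the course's code: the toy data, the helper names (`sq_dist`, `mean_point`, `kmeans_wcss`) and the settings (`n_init`, `max_iter`, the seeds) are all our own assumptions for illustration. In practice a library such as scikit-learn would do the K-Means part (its `KMeans` exposes the WCSS as `inertia_`).

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def kmeans_wcss(points, k, n_init=5, max_iter=100, seed=0):
    """Run basic K-Means several times and return the lowest WCSS found."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_init):
        # Start from k data points picked at random.
        centroids = rng.sample(points, k)
        for _ in range(max_iter):
            # Assignment step: attach each point to its nearest centroid.
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
                clusters[nearest].append(p)
            # Update step: move each centroid to the mean of its cluster
            # (an empty cluster keeps its old centroid).
            new_centroids = [
                mean_point(c) if c else centroids[i]
                for i, c in enumerate(clusters)
            ]
            if new_centroids == centroids:
                break
            centroids = new_centroids
        # WCSS: squared distance of every point to its own centroid.
        wcss = sum(
            sq_dist(p, centroids[i])
            for i, c in enumerate(clusters)
            for p in c
        )
        best = min(best, wcss)
    return best

# Hypothetical toy data: three blobs of 30 points each.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [
    (cx + data_rng.gauss(0, 1), cy + data_rng.gauss(0, 1))
    for cx, cy in centers
    for _ in range(30)
]

# One WCSS value per candidate number of clusters.
wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, wcss in wcss_by_k.items():
    print(f"k={k}: WCSS={wcss:.1f}")
```

Plotting these values against k gives the curve you read the "elbow" from. On data like this you would expect the steep drop to end around k=3, but the choice is still a judgment call rather than something the code decides for you.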
It was quite interesting to apply these methods and we were all pretty excited about it. K-Means was simple to understand and easily adaptable. It works well on both small and large datasets. Hierarchical Clustering is not appropriate for large datasets, but the number of clusters does not have to be chosen up front: it can be read off the dendrogram that the model produces.
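Here is a rough sketch of what sits behind a dendrogram, again in plain Python and again not the course's code. It assumes single linkage (the distance between two clusters is the distance between their closest points) and a simple "largest gap" rule for picking the number of clusters. The function names and the toy points are made up for illustration. A library such as SciPy (`scipy.cluster.hierarchy.linkage` and `dendrogram`) would normally do this and draw the tree.

```python
import math

def agglomerative_merges(points):
    """Single-linkage agglomerative clustering.

    Starts with every point as its own cluster, repeatedly merges the two
    closest clusters, and returns the distance at which each merge happened.
    These distances are the heights of the joins in a dendrogram.
    """
    clusters = [[p] for p in points]
    merge_distances = []
    while len(clusters) > 1:
        best = None
        # Compare every pair of clusters to find the closest two.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        merge_distances.append(d)
    return merge_distances

def suggest_n_clusters(merge_distances):
    """Cut the tree inside the largest gap between consecutive merge heights."""
    n_points = len(merge_distances) + 1
    gaps = [
        merge_distances[m] - merge_distances[m - 1]
        for m in range(1, len(merge_distances))
    ]
    # Number of merges that happen below the cut.
    merges_below_cut = 1 + max(range(len(gaps)), key=gaps.__getitem__)
    return n_points - merges_below_cut

# Hypothetical toy data: three tight groups of three points each.
points = [
    (1, 1), (1, 2), (2, 1),
    (8, 8), (8, 9), (9, 8),
    (1, 9), (2, 9), (1, 8),
]

merges = agglomerative_merges(points)
suggested = suggest_n_clusters(merges)
print("merge distances:", [round(d, 2) for d in merges])
print("suggested number of clusters:", suggested)
```

The nested loops also hint at why this approach struggles with large datasets: every merge step compares every pair of remaining clusters, so the work grows very quickly with the number of points.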
We’re ready to move on.
A quick note: We’re a group of people with diverse backgrounds, ranging from software engineering to computational linguistics and business. We’re following the Udemy course Machine Learning A-Z™: Hands-On Python & R in Data Science. It is giving us a good overview without going too deep. After completing it, we will be able to dive deeper into specific fields of interest, such as robotics, reinforcement learning and NLP.