Week VI: Clustering

After a great weekend with our study group, we’ve finished K-Means Clustering and Hierarchical Clustering.

In classification, your model tries to predict one of two or more labels that you already know (e.g. it learns from past customer data whether a future customer is going to buy your product or not).

Clustering, in contrast, is exploratory: you don’t know the output in advance. The model groups data points that share certain patterns into clusters. The tricky part is figuring out the appropriate number of clusters. For K-Means Clustering we used the Elbow Method to find the optimal number; for Hierarchical Clustering we built so-called dendrograms.
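Here’s a minimal sketch of the Elbow Method in Python with scikit-learn. The synthetic make_blobs data and the range of candidate k values are our own illustrative choices, not the course’s exact setup. The idea is to plot the within-cluster sum of squares (WCSS) against k and pick the point where the curve bends.

```python
# Sketch of the Elbow Method; the synthetic data and k range are
# illustrative assumptions, not the course's exact dataset.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Within-cluster sum of squares (scikit-learn calls it inertia_)
# for each candidate number of clusters k
wcss = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# The "elbow" is the k where adding more clusters stops
# reducing the WCSS sharply
plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()
```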


It was quite interesting to apply these methods, and we were all pretty excited about them. K-Means was simple to understand and easily adaptable, and it works well on both small and large datasets. Hierarchical Clustering doesn’t scale well to large datasets, but the optimal number of clusters can be read off the dendrogram itself.
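And here’s a similar sketch for Hierarchical Clustering with SciPy, again on synthetic data and with Ward linkage as an illustrative choice. The longest vertical stretch in the dendrogram that no horizontal merge line crosses suggests where to cut the tree, and that cut gives you the number of clusters.

```python
# Sketch of a dendrogram for Hierarchical Clustering; the synthetic
# data and Ward linkage are illustrative assumptions.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

# Ward linkage: at each step, merge the pair of clusters that
# minimizes the increase in total within-cluster variance
Z = linkage(X, method="ward")

# Cutting the tree at the largest uncrossed vertical distance
# suggests the optimal number of clusters
dendrogram(Z)
plt.xlabel("Samples")
plt.ylabel("Euclidean distance")
plt.title("Dendrogram (Ward linkage)")
plt.show()
```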

We’re ready to move on.

A quick note: We’re a group of people with diverse backgrounds, ranging from software engineering to computational linguistics and business. We’re following the Udemy course Machine Learning A-Z™: Hands-On Python & R in Data Science. The course gives us a good overview without going too deep. After completing it, we’ll be able to dive deeper into specific fields of interest, such as robotics, reinforcement learning, and NLP.

