Rule #1: Know your data

We’ve been talking a lot about what it takes to build a truly sophisticated model and get the best results. We looked at the Titanic challenge on Kaggle, but whatever we tried, the accuracy would not go beyond a certain percentage. So what can we do?

Whatever problem you’re dealing with, domain expertise gives you a head start. Take one step back, to the pre-pre-processing phase: one way to get a better, more intuitive understanding of your data is to visualize it. There are many ways to visualize data (see previous posts), and Google recently published a new one: Facets.

 

Google published

Facets: An Open Source Visualization Tool for Machine Learning Training Data

Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. It can be used with Jupyter Notebooks or embedded into webpages.
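To give a feel for what a "holistic picture" at the coarsest granularity might contain, here is a minimal sketch in plain Python. It does not use Facets or its API; it only computes the kind of per-feature summary (counts, missing values, ranges) that is worth checking before any modelling. The rows are made up for illustration in the spirit of the Titanic data, and the function name summarize is my own.

```python
from statistics import mean

# Hypothetical, made-up rows in the spirit of the Titanic data (not real records).
rows = [
    {"age": 22.0, "fare": 7.25, "sex": "male"},
    {"age": 38.0, "fare": 71.28, "sex": "female"},
    {"age": None, "fare": 8.05, "sex": "male"},
    {"age": 35.0, "fare": 53.10, "sex": "female"},
]


def summarize(rows):
    """Return per-feature summary statistics for a list of dict records."""
    summary = {}
    # Collect every feature name that appears in at least one row.
    features = sorted({key for row in rows for key in row})
    for name in features:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        stats = {"count": len(present), "missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric feature: report range and mean.
            stats.update(min=min(present), max=max(present), mean=mean(present))
        else:
            # Categorical (or empty) feature: report number of distinct values.
            stats["unique"] = len(set(present))
        summary[name] = stats
    return summary


for feature, stats in summarize(rows).items():
    print(feature, stats)
```

Even on four toy rows this surfaces the missing age value, which is exactly the sort of thing that is easy to overlook when you jump straight to model building.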

 

Tech and Art

Visualization is powerful. It’s only indirectly related, but I recently stumbled upon a beautiful website, R2D3, which combines statistics with interactive design and gives a nice visual intro to machine learning. Another helpful place to look for additional insights is Andy Kirk’s Visualising Data website, a collection of useful resources.
