We’re done with Regression and Classification and have finished about 40% of the Udemy course. To be honest, I was a bit skeptical, because up until now everything had gone perfectly and smoothly: we had templates, clear guidelines and rather neat, small datasets.
But the real world isn’t smooth and perfect. It’s complicated, confusing and flawed. So we couldn’t wait to try out everything we’ve learned on other data, and there’s no easier way to do that than to enter our first Kaggle competition, “Titanic: Machine Learning from Disaster”. The goal is to predict survival on the Titanic and get familiar with ML basics.
We had two hours. We decided to use Python as our weapon of choice, because we feel more comfortable writing code in Python than in R. We looked at the data, and of course we already had some prior knowledge about who survived the Titanic: mostly women, children and upper-class passengers.
We uploaded the training data and got quite busy with the data preprocessing part. We didn’t use all the features; for example, we threw out the names and IDs. We also didn’t think the cabin might be a useful feature, so we focused on just a few essential features. That was a mistake.
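For anyone who wants to try this at home, here is a minimal pandas sketch of that kind of preprocessing: dropping Name, PassengerId and Cabin, replacing missing values, converting categorical data and scaling. It assumes the column names of Kaggle’s Titanic train.csv; the toy rows are made up for illustration, and filling Age with the median and Embarked with the most frequent port are our assumptions, not the only reasonable choices.

```python
import pandas as pd

# Hypothetical toy rows using the column names of Kaggle's Titanic train.csv.
# The values are made up purely for illustration.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Name": ["A", "B", "C", "D", "E"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    "Cabin": [None, "C85", None, "C123", None],
    "Embarked": ["S", "C", "S", "S", None],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Drop the columns we decided not to use (names, IDs, cabin).
    out = df.drop(columns=["PassengerId", "Name", "Cabin"])
    # Replace missing data: median for Age, most frequent port for Embarked.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Convert categorical data: Sex to 0/1, Embarked to one-hot dummy columns.
    out["Sex"] = (out["Sex"] == "male").astype(int)
    out = pd.get_dummies(out, columns=["Embarked"], drop_first=True, dtype=int)
    # Feature scaling: standardize the continuous columns to mean 0, std 1.
    for col in ["Age", "Fare"]:
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out


processed = preprocess(train)
print(processed)
```

In a real notebook you would load the data with pd.read_csv and compute the median, mode, mean and standard deviation on the training set only, then reuse those values for the test set; scikit-learn’s SimpleImputer and StandardScaler take care of that fit/transform bookkeeping.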
Converting categorical data, feature scaling and replacing missing data took some time. Some things didn’t work out, but we managed to figure out a way around them. As soon as the data was all set, we tried various classifiers, but the accuracy was disappointing in every case. Clearly we were missing something. We decided to look at the p-values and performed Backward Elimination. We threw out the insignificant features and got our best result, 87% accuracy, with SVM. Still unsatisfying.
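Backward Elimination on p-values boils down to a loop: fit a model on all features, drop the feature with the highest p-value if it is above a significance level, refit, and stop once every remaining feature is significant. This sketch assumes a logistic-regression model (since survival is a yes/no target) and a 0.05 threshold; it fits the model with plain NumPy using Newton’s method and computes Wald z-test p-values. The synthetic “signal” and “noise” columns are hypothetical stand-ins for real Titanic features.

```python
import math

import numpy as np


def logit_pvalues(X, y, n_iter=50):
    """Fit a logistic regression by Newton's method; return (coefficients, Wald p-values).

    X must already contain an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information X^T W X
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided p-value for a standard normal z statistic: erfc(|z| / sqrt(2)).
    pvalues = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return beta, pvalues


def backward_elimination(X, y, names, threshold=0.05):
    """Drop the least significant feature until all remaining p-values are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while names:
        design = np.column_stack([np.ones(len(y)), X])  # add the intercept
        _, pvalues = logit_pvalues(design, y)
        feature_p = pvalues[1:]  # ignore the intercept's p-value
        worst = int(np.argmax(feature_p))
        if feature_p[worst] <= threshold:
            break  # every remaining feature is significant
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names


# Hypothetical synthetic data: "signal" drives survival, "noise" is unrelated.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
noise = rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * signal))).astype(float)
X = np.column_stack([signal, noise])
kept = backward_elimination(X, y, ["signal", "noise"])
print(kept)
```

If statsmodels is installed, statsmodels.api.Logit(y, X).fit().pvalues (after adding an intercept with statsmodels.api.add_constant) gives the same kind of p-values with less code.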
Time was up. We knew we had to work on the features, and maybe even do some feature engineering. We have gender and age, but we know, for example, that children survived, so what really matters for a male passenger is whether he was older than 14. Combining those two features might lead to better results.
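A combined feature like that takes only a line of pandas. A minimal sketch, assuming a new 0/1 column with the hypothetical name AdultMale and the 14-year cutoff from the paragraph above; the toy passengers are made up for illustration.

```python
import pandas as pd

# Hypothetical toy passengers (made-up values) using Kaggle's column names.
passengers = pd.DataFrame({
    "Sex": ["male", "male", "female", "female"],
    "Age": [8.0, 30.0, 10.0, 45.0],
})


def add_adult_male(df: pd.DataFrame, age_cutoff: float = 14) -> pd.DataFrame:
    """Return a copy with a 0/1 AdultMale column: 1 only for males older than age_cutoff."""
    out = df.copy()
    out["AdultMale"] = ((out["Sex"] == "male") & (out["Age"] > age_cutoff)).astype(int)
    return out


print(add_adult_male(passengers))
```

Run it after filling in missing ages: comparisons with NaN evaluate to False, so a passenger with an unknown age would be marked as 0.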
We’re definitely not giving up. Our goal is to reach 94% accuracy. Until next time. Sayonara.
We’re a group of tech enthusiasts based in Tokyo and would love to share ideas, exchange knowledge, collaborate and work on projects with people who are interested in AI and Machine Learning. Currently we are hosting a weekly study group for beginners, following a Udemy online tutorial on Machine Learning in Python and R. If you’d like to join us, go to Machine Learning Tokyo (FB group) for more details. All professional backgrounds and students from all fields are welcome.