We’re done with week IV, which covered Linear Regression, Multiple Linear Regression (MLR) and Polynomial Regression. The most important lesson learned this time: for MLR, choose the right features, meaning the ones that actually have an impact on the outcome.
Backward Elimination, Forward Selection and Bidirectional Elimination were quite abstract when explained in theory, and we couldn’t wrap our heads around them until we actually started to write code. When we used Backward Elimination to find the most relevant features in a Multiple Linear Regression model, it all made sense.
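Step 1: Set the Significance Level (SL), here 0.05, and fit the model with all independent variables.
Step 2: Remove the independent variable with the highest p-value (if it’s above the SL of 0.05).
Step 3: Refit the model without that variable and repeat the process until the p-values of all remaining independent variables are below the SL.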
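The steps above can be sketched in a few lines of Python. This is only a minimal sketch, not the code from the tutorial: the names backward_elimination, fit_pvalues and make_ols_pvalues are our own, and the statsmodels helper assumes the data is a pandas DataFrame whose columns are all numeric (a categorical column like City would have to be encoded as dummy variables first).

```python
from typing import Callable, Dict, List, Sequence


def backward_elimination(
    features: Sequence[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    significance_level: float = 0.05,
) -> List[str]:
    """Drop the feature with the highest p-value until none is above the SL."""
    remaining = list(features)
    while remaining:
        # Refit the model on the features that are still included.
        pvalues = fit_pvalues(remaining)
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= significance_level:
            break  # no remaining feature is above the SL
        remaining.remove(worst)
    return remaining


def make_ols_pvalues(data, target: str):
    """Build a fit_pvalues function backed by statsmodels OLS.

    Assumption: `data` is a pandas DataFrame with numeric columns only.
    """
    import statsmodels.api as sm  # imported lazily, only needed for real fits

    def fit_pvalues(columns: List[str]) -> Dict[str, float]:
        X = sm.add_constant(data[columns])  # add the intercept term
        model = sm.OLS(data[target], X).fit()
        return model.pvalues[columns].to_dict()

    return fit_pvalues
```

With a DataFrame df holding the columns, a call could look like backward_elimination(["Admin", "R&D", "Marketing"], make_ols_pvalues(df, "Profit")). Keeping the fitting step separate from the loop means the significance level or the model can be swapped without touching the elimination logic.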
This was pretty straightforward, until we got to the Marketing feature column. As mentioned in the previous blog post, we are looking at a business data matrix with the following columns: Admin, R&D, Marketing, City and finally Profit as the dependent variable.
Turns out, in order to predict a startup’s profit, admin expenses and the city are pretty irrelevant factors. The p-value of marketing was 0.06 though. Following the procedure strictly, we had to remove the feature, because it was slightly above the SL of 0.05. But if you look at this problem from a more business-oriented perspective, it makes a lot of sense to keep marketing expenses as an important feature in predicting profit.
I assume that domain expertise makes a huge difference and distinguishes a good data scientist from a remarkable one, who is able to see the big picture and forge machine learning models based not only on mathematics but also on domain knowledge. And this is where it gets interesting. Can’t wait to see where this is going next. If you have any comments or advice, please share your thoughts.
We’re a group of tech enthusiasts based in Tokyo and would love to share ideas, exchange knowledge, collaborate and work on projects with people who are interested in AI and Machine Learning. Currently we are hosting a weekly study group for beginners, following a Udemy online tutorial on Machine Learning in Python and R. If you’d like to join us, go to Machine Learning Tokyo (FB group) for more details. All professional backgrounds and students from all fields are welcome.