In the previous video (part 1), we covered loading the libraries and data and setting up training and testing sets for two particular outcomes (Death and Survived). We also used both random forest and Boruta methods to measure the importance of predictors. If you haven't seen part 1, please go back and watch it first, as you will need what is in that video to understand this one.
In this video we will actually create the logistic regression model and then thoroughly test it. You will see the outcome distribution, and then we will score the model back to the original dataset so that we can visually compare the predictions with the actual outcomes and see how our logistic regression model and machine learning did against the original dataset. We will also look at several ways of measuring the accuracy of our predicted outcomes.
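If you want a feel for the workflow before watching, here is a minimal sketch in plain Python. It is not the code from the video: the data is synthetic, the column names (`feature_a`, `feature_b`, `died`, `p_died`, `predicted`) are made up for illustration, and the model is fit with simple gradient descent rather than a statistics library. It shows the same steps in miniature: split into training and testing sets, fit a logistic regression, score every row of the original dataset, and measure accuracy a few different ways.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Predicted probability of the positive outcome for one row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and intercept by batch gradient descent on log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def confusion(y_true, y_pred):
    """Count true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[key] += 1
    return counts

def features(row):
    return [row["feature_a"], row["feature_b"]]

# Synthetic stand-in for the hospital data (an assumption, not real records).
random.seed(42)
rows = []
for _ in range(500):
    a, c = random.uniform(-1.75, 1.75), random.uniform(-1.75, 1.75)
    died = 1 if random.random() < sigmoid(1.5 * a - 1.5 * c - 1.0) else 0
    rows.append({"feature_a": a, "feature_b": c, "died": died})

# 70/30 split into training and testing sets.
random.shuffle(rows)
train, test = rows[:350], rows[350:]
w, b = fit_logistic([features(r) for r in train], [r["died"] for r in train])

# Score the model back onto every row of the original dataset.
for r in rows:
    r["p_died"] = predict_proba(features(r), w, b)
    r["predicted"] = 1 if r["p_died"] >= 0.5 else 0

# Measure accuracy on the held-out testing set only.
cm = confusion([r["died"] for r in test], [r["predicted"] for r in test])
accuracy = (cm["tp"] + cm["tn"]) / len(test)
positives, negatives = cm["tp"] + cm["fn"], cm["tn"] + cm["fp"]
sensitivity = cm["tp"] / positives if positives else float("nan")
specificity = cm["tn"] / negatives if negatives else float("nan")

print("confusion matrix:", cm)
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Because each row keeps both the actual outcome and the predicted probability, you can plot or tabulate them side by side, which is the idea behind scoring back to the original dataset. The 0.5 cutoff is just a common default; moving it trades sensitivity against specificity.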
This is a really neat business use case for learning machine learning and logistic regression on a real dataset. It is a complete walkthrough that shows all of the code. There are lots of extras and great tips that will help carry you through larger datasets, data formatting issues you might encounter, and so much more. This is an actual process that many data scientists now use to quickly predict possible outcomes and measure relative accuracy. The process is loaded with tests so you can easily document your work and its accuracy, or lack thereof, since you will find that some data is not good or complete enough for accurate predictions.
This coronavirus data from a NYC hospital (identifying information has been scrubbed from it) provides a very interesting look into how machine learning and logistic regression can be used to look at datasets and predict future outcomes - very cool stuff and a great application of data science!
Please take a moment to subscribe, like, and share. Also, don't forget to click that bell so that you get notified of all the other great videos I have coming out!
Thanks again and God bless!