class: center, middle, inverse, title-slide

# Midterm Project
### Kaitlyn Fales
### 10/26/20

---
class: inverse, middle, center

# Visualizations

---

# Plot 1

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---

# Plot 2

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

# Plot 3

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

# Plot 4

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

# Plot 5

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

# Plot 6

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

# Plot 7

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

# Plot 8

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

# Plot 9

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

# Plot 10

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

---

# Animation

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-12-1.gif)<!-- -->

---
class: inverse, middle, center

# Predictive Models

---

# The Models I Chose

- Random Forest using method = 'rf'
- Random Forest using method = 'ranger'
- Boosted Logistic Regression, method = 'LogitBoost'

---

# Modelling Total Cost

**Random Forest using method = 'rf'**

```r
plot(forest_cv)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---

# Modelling Total Cost

**Random Forest using method = 'ranger'**

```r
plot(forest_cv1)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

# Modelling Total Cost

**Boosted Logistic Regression, method = 'LogitBoost'**

```r
plot(boost_model)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

---

# Model Comparison for Total Cost

- The best
model is the Boosted Logistic Regression

```r
bwplot(results)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-20-1.png)<!-- -->

---

# Final Model Accuracy

```r
# Predict on the held-out test set and compare against the true labels
pred <- predict(boost_model, df_test)
cm <- confusionMatrix(data = factor(pred), reference = df_test$target,
                      positive = "high cost")
cm$overall[1]
```

```
## Accuracy
## 0.8763689
```

---

# Modelling Length of Stay

- Target was a patient's length of stay ('los')
- Similar to the first set of models, a value less than the median (3 days) was coded as 'short' and a value of 3 days or more was coded as 'long'
- Input variables are similar to the other models, except that total cost was taken out and the number of days in ICU was put in

```r
median(df_cleaned$los)

# Binary target: below the median of 3 days is 'short', everything else is 'long'
df_cleaned$target1 <- case_when(
  df_cleaned$los < 3 ~ 'short',
  TRUE ~ 'long'
)

# Keep the model inputs and drop rows with blank admtype/raceethn or sex coded 9
df_model1 <- df_cleaned %>%
  select(target1, age, sex, raceethn, provider, moa, mod, admtype, campus, icu) %>%
  filter(admtype != '' & raceethn != '' & sex != 9)
```

---

# Modelling Length of Stay

**Random Forest using method = 'rf'**

```r
plot(forest_cv2)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

---

# Modelling Length of Stay

**Random Forest using method = 'ranger'**

```r
plot(forest_cv3)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-26-1.png)<!-- -->

---

# Modelling Length of Stay

**Boosted Logistic Regression, method = 'LogitBoost'**

```r
plot(boost_model1)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-28-1.png)<!-- -->

---

# Model Comparison for Length of Stay

- The best model is the Boosted Logistic Regression

```r
bwplot(results1)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-30-1.png)<!-- -->

---

# Final Model Accuracy

```r
# Predict on the held-out test set and compare against the true labels
pred1 <- predict(boost_model1, df_test1)
cm1 <- confusionMatrix(data = pred1, reference = df_test1$target1,
                       positive = "long")
cm1$overall[1]
```

```
## Accuracy
## 0.8792879
```

---

# Modelling Issues

- Issues with missing values for some
variables (blank values)
- Issues with variables not being of the correct class (e.g. 'icu' was stored as a character)
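
---

# Modelling Issues

A minimal sketch of one way both issues could be handled, not necessarily how it was done for this project. The data frame `toy` and its values are made up for illustration, and `toy_fixed` is a hypothetical name; only the column names come from the slides.

```r
library(dplyr)

# Hypothetical toy data standing in for df_cleaned (values are invented)
toy <- data.frame(
  los      = c(2, 5, 3, 1),
  admtype  = c("1", "", "2", "1"),
  raceethn = c("1", "2", "", "1"),
  icu      = c("0", "4", "1", "0"),  # stored as character, as noted above
  stringsAsFactors = FALSE
)

toy_fixed <- toy %>%
  # Turn blank strings into explicit missing values
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # Convert 'icu' from character to numeric
  mutate(icu = as.numeric(icu)) %>%
  # Keep only rows with no missing values
  filter(complete.cases(.))

str(toy_fixed)
```

Converting blanks to `NA` first makes the missing values visible to functions such as `is.na()` and `complete.cases()`, instead of relying on a separate `!= ''` check for each variable.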