class: center, middle, inverse, title-slide

# Midterm Project
### Kaitlyn Fales
### 10/26/20

---
class: inverse, middle, center

# Visualizations

---

# Plot 1

---

# Plot 2

---

# Plot 3

---

# Plot 4

---

# Plot 5

---

# Plot 6

---

# Plot 7

---

# Plot 8

---

# Plot 9

---

# Plot 10

---

# Animation

---
class: inverse, middle, center

# Predictive Models

---

# The Models I Chose

- Random Forest using `method = 'rf'`
- Random Forest using `method = 'ranger'`
- Boosted Logistic Regression using `method = 'LogitBoost'`

---

# Modelling Total Cost

**Random Forest using `method = 'rf'`**

```r
plot(forest_cv)
```

---

# Modelling Total Cost

**Random Forest using `method = 'ranger'`**

```r
plot(forest_cv1)
```

---

# Modelling Total Cost

**Boosted Logistic Regression using `method = 'LogitBoost'`**

```r
plot(boost_model)
```

---

# Model Comparison for Total Cost

- The Boosted Logistic Regression was the best of the three models

```r
bwplot(results)
```

---

# Final Model Accuracy: Total Cost

```r
library(caret)

# Predict the cost class for the held-out test set
pred <- predict(boost_model, df_test)

# Confusion matrix with "high cost" as the positive class
cm <- confusionMatrix(data = factor(pred), reference = df_test$target,
                      positive = "high cost")
cm$overall["Accuracy"]
```

```
## Accuracy
## 0.8763689
```

---

# Modelling Length of Stay

- The target was a patient's length of stay (`los`)
- As with the first set of models, the target was split at the median (3 days): a stay of less than 3 days was coded as 'short' and a stay of 3 days or more was coded as 'long'
- The input variables are the same as in the total cost models, except that total cost was removed and the number of days in the ICU (`icu`) was added

```r
library(dplyr)

median(df_cleaned$los)  # 3 days

# Binary target: below the median is 'short', the median or above is 'long'
df_cleaned$target1 <- case_when(
  df_cleaned$los < 3 ~ 'short',
  TRUE ~ 'long'
)

# Keep the target and predictors; drop rows with a blank admission type
# or race/ethnicity, and rows where sex is coded 9
df_model1 <- df_cleaned %>%
  select(target1, age, sex, raceethn, provider, moa, mod, admtype, campus, icu) %>%
  filter(admtype != '' & raceethn != '' & sex != 9)
```

---

# Modelling Length of Stay

**Random Forest using `method = 'rf'`**

```r
plot(forest_cv2)
```

---

# Modelling Length of Stay
**Random Forest using `method = 'ranger'`**

```r
plot(forest_cv3)
```

---

# Modelling Length of Stay

**Boosted Logistic Regression using `method = 'LogitBoost'`**

```r
plot(boost_model1)
```

---

# Model Comparison for Length of Stay

- The Boosted Logistic Regression was again the best of the three models

```r
bwplot(results1)
```

---

# Final Model Accuracy: Length of Stay

```r
# Predict the length-of-stay class for the held-out test set
pred1 <- predict(boost_model1, df_test1)

# Confusion matrix with "long" as the positive class
cm1 <- confusionMatrix(data = pred1, reference = df_test1$target1,
                       positive = "long")
cm1$overall["Accuracy"]
```

```
## Accuracy
## 0.8792879
```

---

# Modelling Issues

- Some variables had missing values recorded as blank strings (`''`) rather than `NA`
- Some variables were stored with the wrong class (e.g. `icu`, the number of days in the ICU, was a character rather than a number)
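---

# Modelling Issues: A Possible Fix

A minimal base-R sketch of one way to handle both issues, using a small hypothetical data frame (`toy`) instead of the project data; the column names mirror those used earlier, but every value is made up. Rather than filtering out blank strings with `filter()`, this version recodes them to `NA`, coerces `icu` to numeric, drops incomplete rows, and stores the target as a factor using the same 3-day cut-off as the length-of-stay models.

```r
# Hypothetical toy data mimicking the issues above:
# blanks ('') instead of NA, and 'icu' stored as character
toy <- data.frame(
  los      = c(2, 5, 3, 1, 7),
  admtype  = c("1", "", "2", "3", "1"),
  raceethn = c("1", "2", "", "4", "1"),
  icu      = c("0", "2", "0", "", "4"),
  stringsAsFactors = FALSE
)

# 1. Recode blank (or whitespace-only) strings to NA in every character column
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], function(x) ifelse(trimws(x) == "", NA, x))

# 2. Coerce 'icu' (days in the ICU) from character to numeric
toy$icu <- as.numeric(toy$icu)

# 3. Drop any row that still has a missing value
toy <- toy[complete.cases(toy), ]

# 4. Store the target as a factor: under 3 days is 'short', otherwise 'long'
toy$target1 <- factor(ifelse(toy$los < 3, "short", "long"),
                      levels = c("short", "long"))

str(toy)
```

`complete.cases()` drops a row with an `NA` in any column, which is stricter than filtering on `admtype` and `raceethn` alone; whether that is acceptable depends on how many rows it removes.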