class: center, middle, inverse, title-slide

# Midterm Project
### Kaitlyn Fales
### 10/26/20

---

class: inverse, middle, center
# Visualizations

---
# Plot 1
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-2-1.png)
---
# Plot 2
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-3-1.png)
---
# Plot 3
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-4-1.png)
---
# Plot 4
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-5-1.png)
---
# Plot 5
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-6-1.png)
---
# Plot 6
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-7-1.png)
---
# Plot 7
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-8-1.png)
---
# Plot 8
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-9-1.png)
---
# Plot 9
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-10-1.png)
---
# Plot 10
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-11-1.png)
---
# Animation
![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-12-1.gif)
---
class: inverse, middle, center
# Predictive Models
---
# The Models I Chose

- Random Forest using method = 'rf'
- Random Forest using method = 'ranger'
- Boosted Logistic Regression, method = 'LogitBoost'
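
The three methods listed above are `caret` method names, so one way they could be fit is with `caret::train()`. The sketch below is only an illustration on a two-class subset of the built-in `iris` data, not the project's hospital data. The object names (`toy`, `ctrl`, `fit_rf`, `fit_ranger`, `fit_boost`) and the 5-fold cross-validation are assumptions, and the `randomForest`, `ranger`, and `caTools` packages need to be installed.

```r
library(caret)

set.seed(2020)

# Toy two-class data (assumption: stands in for the project's data)
toy <- droplevels(subset(iris, Species != "setosa"))

# 5-fold cross-validation, reused for every model (assumed setting)
ctrl <- trainControl(method = "cv", number = 5)

# The same formula and resampling scheme, changing only the method
fit_rf     <- train(Species ~ ., data = toy, method = "rf",         trControl = ctrl)
fit_ranger <- train(Species ~ ., data = toy, method = "ranger",     trControl = ctrl)
fit_boost  <- train(Species ~ ., data = toy, method = "LogitBoost", trControl = ctrl)

# plot() on a train object shows resampled performance across tuning values
plot(fit_rf)
```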

---
# Modelling Total Cost
**Random Forest using method = 'rf'**

```r
plot(forest_cv)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-14-1.png)

---
# Modelling Total Cost
**Random Forest using method = 'ranger'**

```r
plot(forest_cv1)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-16-1.png)

---
# Modelling Total Cost
**Boosted Logistic Regression, method = 'LogitBoost'**

```r
plot(boost_model)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-18-1.png)

---
# Model Comparison for Total Cost

- The best model is the Boosted Logistic Regression

```r
bwplot(results)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-20-1.png)
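
A `results` object like the one plotted above can be built with `caret::resamples()`, which collects the per-fold metrics of several fitted models. The sketch below assumes toy data from `iris` and hypothetical names (`folds`, `fit_rf`, `fit_boost`, `results_toy`); it is not the code behind the slide. Fixing the folds up front is one way to make sure each model is scored on the same splits.

```r
library(caret)

set.seed(2020)

# Toy two-class data (assumption: stands in for the project's data)
toy <- droplevels(subset(iris, Species != "setosa"))

# Create the training folds once so both models share identical splits
folds <- createFolds(toy$Species, k = 5, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds)

fit_rf    <- train(Species ~ ., data = toy, method = "rf",         trControl = ctrl)
fit_boost <- train(Species ~ ., data = toy, method = "LogitBoost", trControl = ctrl)

# Gather the resampled metrics of both models into one object
results_toy <- resamples(list(rf = fit_rf, logitboost = fit_boost))

summary(results_toy)  # numeric summary of the per-fold metrics
bwplot(results_toy)   # box-and-whisker plot of the per-fold metrics
```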

---
# Final Model Accuracy

```r
pred <- predict(boost_model,df_test)
cm <- confusionMatrix(data = factor(pred), reference = df_test$target, positive = "high cost")
cm$overall[1]
```

```
##  Accuracy 
## 0.8763689
```
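
As a reminder of what the accuracy figure measures, it is the share of test cases whose predicted class matches the reference class. The sketch below recomputes it by hand in base R on made-up labels; the vectors `truth_toy` and `pred_toy` and the 'low cost' level name are assumptions, not project data.

```r
# Hypothetical reference labels and predictions (not the project's data)
lvls      <- c("high cost", "low cost")
truth_toy <- factor(c("high cost", "low cost", "high cost", "low cost", "high cost"),
                    levels = lvls)
pred_toy  <- factor(c("high cost", "low cost", "low cost", "low cost", "high cost"),
                    levels = lvls)

# Cross-tabulate predictions against the reference labels
tab <- table(Prediction = pred_toy, Reference = truth_toy)

# Accuracy = correctly classified cases (the diagonal) / all cases
accuracy <- sum(diag(tab)) / sum(tab)
accuracy  # 4 of 5 predictions match, so 0.8
```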

---
# Modelling Length of Stay

- Target was a patient's length of stay ('los')
- As with the first set of models, the target was split at the median (3 days): a stay shorter than 3 days was coded as 'short' and a stay of 3 days or more was coded as 'long'
- Input variables match the total cost models, except that total cost was removed and the number of days in the ICU ('icu') was added

```r
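# Median length of stay; the slides report 3 days
median(df_cleaned$los)

# Binary target: stays under 3 days are 'short', everything else is 'long'
df_cleaned$target1 <- case_when(
  df_cleaned$los < 3 ~ 'short',
  TRUE ~ 'long'
)

# Keep the target and predictors; drop rows with blank admtype or raceethn, or sex coded 9
df_model1 <- df_cleaned %>% 
  select(target1, age, sex, raceethn, provider, moa, mod, admtype, campus, icu) %>%
  filter(admtype != '' & raceethn != '' & sex != 9)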
```
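
To see how the cutoff treats stays that fall exactly on the median, here is a small check on made-up values. It uses base R `ifelse()` instead of `case_when()` to stay dependency-free, and `los_toy`, `cutoff`, and `target_toy` are hypothetical names.

```r
# Hypothetical lengths of stay in days (not the project's data)
los_toy <- c(1, 2, 3, 3, 4, 7)

# The median of these values is 3, matching the cutoff used on the slide
cutoff <- median(los_toy)

# Same rule as the slide: strictly below the cutoff is 'short', the rest 'long'
target_toy <- ifelse(los_toy < cutoff, "short", "long")

# Stays of exactly 3 days land in 'long'
data.frame(los = los_toy, target = target_toy)
```

Because stays equal to the median go to 'long', the two classes are not guaranteed to be the same size.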

---
# Modelling Length of Stay
**Random Forest using method = 'rf'**

```r
plot(forest_cv2)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-24-1.png)

---
# Modelling Length of Stay
**Random Forest using method = 'ranger'**

```r
plot(forest_cv3)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-26-1.png)

---
# Modelling Length of Stay
**Boosted Logistic Regression, method = 'LogitBoost'**

```r
plot(boost_model1)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-28-1.png)

---
# Model Comparison for Length of Stay

- The best model is the Boosted Logistic Regression

```r
bwplot(results1)
```

![](Midterm-Project-Presentation_files/figure-html/unnamed-chunk-30-1.png)

---
# Final Model Accuracy

```r
pred1 <- predict(boost_model1,df_test1)
cm1 <- confusionMatrix(data = pred1, reference = df_test1$target1, positive = "long")
cm1$overall[1]
```

```
##  Accuracy 
## 0.8792879
```
---
# Modelling Issues

- Some variables had missing values stored as blanks
- Some variables were not of the correct class (e.g., 'icu' was stored as a character)
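
Both issues can be handled with a few lines of base R. This is a minimal sketch on a made-up data frame; `raw` and `clean` are hypothetical names, and only the column names `icu` and `admtype` come from the slides.

```r
# Hypothetical raw data with blank strings and a character 'icu' column
raw <- data.frame(
  icu     = c("0", "2", "", "5"),
  admtype = c("1", "", "2", "3"),
  stringsAsFactors = FALSE
)

# Recode blank strings as NA so they are recognised as missing
raw[raw == ""] <- NA

# Convert 'icu' from character to numeric
raw$icu <- as.numeric(raw$icu)

# Count the missing values per column
colSums(is.na(raw))

# Keep only the rows with no missing values
clean <- raw[complete.cases(raw), ]
str(clean)
```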