penguins dataset
A logistic regression model was fit in an attempt to predict the sex of a penguin ("male" or "female") based on its body mass (grams). Assuming that no change to the penguins dataset was made, will the model be predicting the probability of the penguin being male or the probability of the penguin being female?
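One way to check which outcome the model treats as a "success" is to look at the order of the factor levels. For a factor response, glm() with a binomial family treats the first level as failure and the other level as success, and R sorts factor levels alphabetically by default. A minimal sketch, using a small stand-in vector (hypothetical values) rather than the actual penguins data:

```r
# Stand-in for penguins$sex (hypothetical values, default factor construction)
sex <- factor(c("male", "female", "female", "male"))

# Levels are sorted alphabetically unless specified otherwise
levels(sex)

# The first level is the "failure" class; the model estimates the
# probability of the other level
modeled_level <- levels(sex)[2]
modeled_level
```

Under that assumption the fitted probabilities refer to "male"; releveling the factor before fitting would flip this.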
(Intercept) body_mass_g
-5.162541644 0.001239819
Which of the expressions given in the poll (math or code) will correctly calculate the predicted probability that a penguin that weighs 4000 g is female? Select all that apply.
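The calculation can be sketched in R from the printed coefficients. This is one possible way to write it, not necessarily one of the poll options, and it assumes the fitted probability refers to "male" (the default level ordering), so the probability of female is its complement:

```r
# Coefficients as printed above
b0 <- -5.162541644
b1 <- 0.001239819

# Linear predictor (log-odds) for a 4000 g penguin
eta <- b0 + b1 * 4000

# Inverse logit, written out and via the built-in plogis()
p_male <- exp(eta) / (1 + exp(eta))
all.equal(p_male, plogis(eta))

# Assumption: the model's "success" level is "male"
p_female <- 1 - p_male
round(p_female, 2)
```

With these coefficients the result is roughly 0.55.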
What is the misclassification rate of this model?
# A tibble: 4 × 3
# Groups: sex [2]
sex y_hat n
<fct> <chr> <int>
1 female female 109
2 female male 56
3 male female 74
4 male male 94
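The misclassification rate is the share of penguins whose predicted sex differs from the observed sex. A short sketch of that arithmetic, with the counts retyped from the table above into a hypothetical data frame named counts:

```r
# Counts copied from the table above
counts <- data.frame(
  sex   = c("female", "female", "male", "male"),
  y_hat = c("female", "male", "female", "male"),
  n     = c(109, 56, 74, 94)
)

# A row is misclassified when the prediction differs from the truth
wrong <- counts$sex != counts$y_hat

# (56 + 74) / 333
mcr <- sum(counts$n[wrong]) / sum(counts$n)
round(mcr, 3)
```

That works out to 130 of 333 penguins, or about 0.39.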
(Intercept) body_mass_g bill_length_mm
-6.91208086 0.00101530 0.06112808
Open up RStudio and fit the model here in the slides. What are the predicted sexes of these two penguins?
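If the fitted model object is not at hand, the same prediction can be sketched directly from the printed coefficients. The helper predict_sex() and the measurements passed to it are hypothetical placeholders (not the two penguins from the slide), and the sketch assumes a 0.5 cutoff with the modeled probability referring to "male":

```r
# Coefficients as printed above
b <- c(intercept      = -6.91208086,
       body_mass_g    = 0.00101530,
       bill_length_mm = 0.06112808)

# Hypothetical helper: classify at the 0.5 cutoff,
# assuming the modeled probability is P(male)
predict_sex <- function(body_mass_g, bill_length_mm) {
  eta <- b[["intercept"]] +
    b[["body_mass_g"]] * body_mass_g +
    b[["bill_length_mm"]] * bill_length_mm
  p_hat <- plogis(eta)
  ifelse(p_hat > 0.5, "male", "female")
}

# Placeholder measurements, not the two penguins from the slide
predict_sex(body_mass_g = c(3500, 5000), bill_length_mm = c(40, 50))
```

Swapping in the actual measurements for the two penguins gives their predicted sexes under the same assumptions.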
glm()
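# Fit a logistic regression of sex on body mass and bill length
m2 <- glm(sex ~ body_mass_g + bill_length_mm,
          data = train, family = "binomial")
# Predicted probabilities for the test set
p_hat <- predict(m2, test, type = "response")
test |>
  select(sex) |>
  mutate(p_hat = p_hat)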
# A tibble: 70 × 2
sex p_hat
<fct> <dbl>
1 female 0.345
2 male 0.566
3 female 0.259
4 male 0.280
5 male 0.365
6 female 0.196
7 male 0.428
8 female 0.220
9 male 0.559
10 male 0.279
# ℹ 60 more rows
m2 <- glm(sex ~ body_mass_g + bill_length_mm,
          data = train, family = "binomial")
test |>
  select(sex) |>
  mutate(p_hat = predict(m2, test, type = "response"),
         # Classify as "male" when the predicted probability exceeds 0.5
         y_hat = ifelse(p_hat > .5, "male", "female"))
# A tibble: 70 × 3
sex p_hat y_hat
<fct> <dbl> <chr>
1 female 0.345 female
2 male 0.566 male
3 female 0.259 female
4 male 0.280 female
5 male 0.365 female
6 female 0.196 female
7 male 0.428 female
8 female 0.220 female
9 male 0.559 male
10 male 0.279 female
# ℹ 60 more rows
test |>
  select(sex) |>
  mutate(p_hat = p_hat,
         y_hat = ifelse(p_hat > .5, "male", "female"),
         # False positive: a female predicted to be male
         FP = sex == "female" & y_hat == "male",
         # False negative: a male predicted to be female
         FN = sex == "male" & y_hat == "female")
# A tibble: 70 × 5
sex p_hat y_hat FP FN
<fct> <dbl> <chr> <lgl> <lgl>
1 female 0.345 female FALSE FALSE
2 male 0.566 male FALSE FALSE
3 female 0.259 female FALSE FALSE
4 male 0.280 female FALSE TRUE
5 male 0.365 female FALSE TRUE
6 female 0.196 female FALSE FALSE
7 male 0.428 female FALSE TRUE
8 female 0.220 female FALSE FALSE
9 male 0.559 male FALSE FALSE
10 male 0.279 female FALSE TRUE
# ℹ 60 more rows
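Because FP and FN are logical columns, summing them counts the errors of each type. A minimal base R sketch on only the ten rows displayed above, retyped into a hypothetical data frame named shown (the full test set has 70 rows, so these counts are not the test-set totals):

```r
# The ten rows displayed above (the full test set has 70 rows)
shown <- data.frame(
  sex   = c("female", "male", "female", "male", "male",
            "female", "male", "female", "male", "male"),
  p_hat = c(0.345, 0.566, 0.259, 0.280, 0.365,
            0.196, 0.428, 0.220, 0.559, 0.279)
)
shown$y_hat <- ifelse(shown$p_hat > 0.5, "male", "female")

# Treating "male" as the positive class, as in the code above
n_fp <- sum(shown$sex == "female" & shown$y_hat == "male")
n_fn <- sum(shown$sex == "male" & shown$y_hat == "female")
c(FP = n_fp, FN = n_fn)
```

Among these ten rows there are no false positives and four false negatives (rows 4, 5, 7, and 10).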
test |>
  select(sex) |>
  mutate(p_hat = p_hat,
         y_hat = ifelse(p_hat > .5, "male", "female")) |>
  # The mean of a logical vector is the proportion of TRUE values
  summarize(MCR = mean(sex != y_hat))
# A tibble: 1 × 1
MCR
<dbl>
1 0.371
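Since the mean of a logical vector is the proportion of TRUE values, the printed rate can be translated back into a count as a quick sanity check. A small sketch, assuming the 70 test rows shown in the tibble headers above and the rounded value 0.371:

```r
n_test <- 70          # rows in the test tibble above
mcr_printed <- 0.371  # value printed above (rounded to three digits)

# Implied number of misclassified penguins
n_wrong <- round(mcr_printed * n_test)
n_wrong

# Recomputing the rate from that count reproduces the printed value
round(n_wrong / n_test, 3)
```

This suggests 26 of the 70 test penguins were misclassified, which is consistent with the printed 0.371.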