Evaluating and Improving Predictions

STAT 20: Introduction to Probability and Statistics

Agenda

Announcements
Reading Questions
Break
Worksheet

Announcements

Portfolio (four worksheets) due Friday at 5pm.

Quiz 4 is next Thursday. A two-sided, handwritten cheat sheet is allowed.

Reading Questions

Please put your laptops under your desk and your phones away.
Write your name, ID, and bubble in Version “A” on your answer sheet.
You may work only with those at your table!

Compare \(r\) and \(R^2\), selecting all of the statements which are true.

A: \(r\) is a summary statistic between two numerical variables that is often computed before a linear model is fit; \(R^2\) is computed to assess the performance of the linear model once it is fit.
B: Both \(r\) and \(R^2\) can take values between 0 and 1.
C: When fitting a linear regression model which involves multiple predictor variables, the goal is to fit a model which maximizes \(r\).

00:40
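A minimal sketch of how the two quantities are typically obtained in R. The data are simulated purely for illustration (they are not from the course materials), and the names `x`, `y`, `m`, `r`, and `r_squared` are hypothetical.

```r
# Simulated data (illustrative only, not from the course materials)
set.seed(20)
x <- rnorm(50)
y <- -2 * x + rnorm(50)

# r: the correlation coefficient, computed directly from two numerical variables
r <- cor(x, y)

# R^2: extracted from the summary of a linear model after it has been fit
m <- lm(y ~ x)
r_squared <- summary(m)$r.squared

r            # negative here, because y decreases as x increases
r_squared    # a proportion of variance explained
all.equal(r^2, r_squared)  # TRUE when the model has a single predictor
```

With one predictor, squaring `r` recovers `R^2`, which is one way to compare the ranges of values each can take.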

When adding a new variable to a linear regression model, what will happen to the \(R^2\) value as compared to what it was before?

A: The \(R^2\) value will either be equal to what it was before, or less.
B: We cannot determine this information.
C: The \(R^2\) value will either be equal to what it was before, or greater.

00:30
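One way to explore this question is to fit a model, add a predictor, and compare the two \(R^2\) values. The sketch below uses simulated data; the data frame `df` and the column `noise` (generated independently of the response) are assumptions made for illustration.

```r
# Simulated data (illustrative only)
set.seed(20)
n <- 40
df <- data.frame(x1 = rnorm(n))
df$y <- 3 + 2 * df$x1 + rnorm(n)
df$noise <- rnorm(n)  # generated independently of y

# Fit the model without, then with, the extra predictor
m_small <- lm(y ~ x1, data = df)
m_big   <- lm(y ~ x1 + noise, data = df)

r2_small <- summary(m_small)$r.squared
r2_big   <- summary(m_big)$r.squared

# Compare the two values
c(small = r2_small, big = r2_big)
r2_big >= r2_small
```

Least squares can always set the new coefficient to zero and reproduce the smaller model's fit, so the larger model's residual sum of squares cannot be worse.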

When applying a transformation to a predictor variable before fitting a linear model (for example, squaring the predictor variable), the resulting function that describes the model is a _ function of the transformed variable and a _ function of the original variable.

A: non-linear; non-linear
B: linear; non-linear
C: linear; linear
D: non-linear; linear

00:40
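A small sketch of fitting a model on a squared predictor, using simulated data. The names `x`, `y`, and `m_sq` are hypothetical; `I()` is base R's way of marking arithmetic inside a formula.

```r
# Simulated data with a curved relationship (illustrative only)
set.seed(20)
x <- seq(1, 10, length.out = 30)
y <- 1 + 0.5 * x^2 + rnorm(30, sd = 0.5)

# I() tells the formula to treat x^2 as arithmetic on x
m_sq <- lm(y ~ I(x^2))
b <- coef(m_sq)

# The fitted values have the form b0 + b1 * (x^2)
manual <- b[1] + b[2] * x^2
all.equal(unname(manual), unname(fitted(m_sq)))  # TRUE
```

Plotting `fitted(m_sq)` against `x^2` and then against `x` shows how the same fitted function looks on each scale.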

You would like to generate predictions for `new_x` with a linear model that you built called `m1`. What data type/data structure does the argument supplied to `newdata`, `new_x`, need to be?

predict(object = m1, newdata = new_x)

A: vector
B: logical
C: integer
D: data frame
E: factor

00:30
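A minimal sketch of calling `predict()` on new observations. Here `m1` is a stand-in model fit on `mtcars` (a dataset that ships with R), not the model from the lecture, and the values in `new_x` are made up.

```r
# Stand-in model: mtcars ships with R; this is not the lecture's m1
m1 <- lm(mpg ~ wt, data = mtcars)

# newdata needs a column for each predictor, named exactly as in the formula
new_x <- data.frame(wt = c(2.5, 3.0))

preds <- predict(object = m1, newdata = new_x)
preds  # one prediction per row of new_x
```

The column name `wt` has to match the predictor name in the formula; that is how `predict()` pairs new values with coefficients.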

Break

05:00

Worksheet: Evaluating and Improving Predictions

30:00

Appendix - More practice!

Concept Questions

Which two models will exhibit the highest \(R^2\)?

01:00

# A tibble: 4 × 5
  name    hours cuteness food_eaten is_indoor_cat
  <chr>   <dbl>    <dbl>      <dbl> <lgl>        
1 castiel    12      9          175 TRUE         
2 frank      18     10          200 TRUE         
3 luna       19      9.5        215 FALSE        
4 luca       10      8          218 FALSE

m1 <- lm(formula = hours ~ cuteness + food_eaten + is_indoor_cat, 
         data = cats)

      (Intercept)          cuteness        food_eaten is_indoor_catTRUE 
    -3.800000e+01      6.000000e+00      2.815002e-16     -4.000000e+00

How many hours does the model predict Frank will sleep each day? Write out the linear equation of the model from the model output to help you.

03:00
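The calculation can be checked in R by rebuilding the table shown above as a data frame and refitting the model. This sketch uses base R's `data.frame()` rather than a tibble to avoid extra packages; `b`, `frank`, `by_hand`, and `by_predict` are hypothetical names. Reading the coefficients off the output (and treating the `food_eaten` coefficient of about 2.8e-16 as zero), the equation is hours = -38 + 6 × cuteness + 0 × food_eaten - 4 × is_indoor_catTRUE.

```r
# Rebuild the cats table from the printed output
cats <- data.frame(
  name          = c("castiel", "frank", "luna", "luca"),
  hours         = c(12, 18, 19, 10),
  cuteness      = c(9, 10, 9.5, 8),
  food_eaten    = c(175, 200, 215, 218),
  is_indoor_cat = c(TRUE, TRUE, FALSE, FALSE)
)

m1 <- lm(formula = hours ~ cuteness + food_eaten + is_indoor_cat,
         data = cats)
b <- coef(m1)

# Plug Frank's row into the linear equation by hand
frank <- cats[cats$name == "frank", ]
by_hand <- b["(Intercept)"] +
  b["cuteness"] * frank$cuteness +
  b["food_eaten"] * frank$food_eaten +
  b["is_indoor_catTRUE"] * 1  # Frank is an indoor cat, so the indicator is 1

# The same prediction via predict(), passing a data frame as newdata
by_predict <- predict(m1, newdata = frank)

unname(by_hand)
unname(by_predict)
```

With the rounded coefficients, the arithmetic is -38 + 6(10) + 0(200) - 4(1) = 18 hours. The model has four coefficients and only four cats, so it reproduces every observed `hours` value exactly.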

Which is the most appropriate non-linear transformation to apply to `time_being_pet`?