Evaluating and Improving Predictions

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Reading Questions
  • Break
  • Worksheet

Announcements

  • Portfolio (four worksheets) due Friday at 5pm.
  • Quiz 4 is next Thursday. A two-sided, handwritten cheat sheet is allowed.

Reading Questions

  • Please put your laptops under your desk and your phones away.
  • Write your name, ID, and bubble in Version “A” on your answer sheet.
  • You may work only with those at your table!

Compare \(r\) and \(R^2\). Select all of the statements that are true.

  • A: \(r\) is a summary statistic describing the relationship between two numerical variables and is often computed before a linear model is fit; \(R^2\) is computed to assess the performance of the linear model once it is fit.

  • B: Both \(r\) and \(R^2\) can take values between 0 and 1.

  • C: When fitting a linear regression model which involves multiple predictor variables, the goal is to fit a model which maximizes \(r\).
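
To make the distinction concrete, here is a minimal sketch in R using simulated data. The variable names x and y and the simulated relationship are illustrative assumptions, not part of the course materials.

```r
# Simulated data (an assumption for illustration): y decreases as x increases.
set.seed(20)
x <- rnorm(50)
y <- -2 * x + rnorm(50)

# r is computed directly from two numerical variables; no model is needed.
r <- cor(x, y)

# R^2 is read off a linear model after it has been fit.
m <- lm(y ~ x)
r_squared <- summary(m)$r.squared

r          # a correlation can be negative
r_squared  # R^2 lies between 0 and 1

# With a single predictor, R^2 equals the square of r.
all.equal(r^2, r_squared)
```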

When adding a new variable to a linear regression model, what will happen to the \(R^2\) value as compared to what it was before?

  • A: The \(R^2\) value will either be equal to what it was before, or less.
  • B: We cannot determine this information.
  • C: The \(R^2\) value will either be equal to what it was before, or greater.
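
One way to explore this question empirically is to fit two nested models and compare their \(R^2\) values. The sketch below uses R's built-in mtcars data set; the choice of data set and predictors is an assumption made for illustration only.

```r
# Two nested models on the built-in mtcars data (illustrative choice).
m_small <- lm(mpg ~ wt, data = mtcars)
m_big   <- lm(mpg ~ wt + qsec, data = mtcars)  # same model plus one new predictor

# Extract R^2 from each fitted model.
r2_small <- summary(m_small)$r.squared
r2_big   <- summary(m_big)$r.squared

# Compare the two values side by side.
c(small = r2_small, big = r2_big)
```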

When applying a transformation to a predictor variable before fitting a linear model (for example, squaring the predictor variable), the resulting function that describes the model is a _____ function of the transformed variable and a _____ function of the original variable.

  • A: non-linear; non-linear
  • B: linear; non-linear
  • C: linear; linear
  • D: non-linear; linear
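
A small sketch of fitting a model on a squared predictor may help in reasoning about this question. The data are simulated and the names x, x_sq, and m_sq are hypothetical.

```r
# Simulated data (an assumption): y depends on the square of x.
set.seed(20)
x <- runif(100, min = -3, max = 3)
y <- 1 + 2 * x^2 + rnorm(100, sd = 0.5)

# Store the transformed predictor as its own column.
d <- data.frame(x = x, x_sq = x^2, y = y)

# The model has one intercept and one slope, applied to x_sq.
m_sq <- lm(y ~ x_sq, data = d)
coef(m_sq)

# Plotting fitted values against x_sq and against x shows two different shapes.
plot(d$x_sq, fitted(m_sq), xlab = "x_sq", ylab = "fitted y")
plot(d$x, fitted(m_sq), xlab = "x", ylab = "fitted y")
```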

You would like to generate predictions for new_x with a linear model that you built called m1. What data type/data structure does the argument supplied to newdata, new_x, need to be?

predict(object = m1, newdata = new_x)
  • A: vector
  • B: logical
  • C: integer
  • D: data frame
  • E: factor
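
As a sketch of how predict() is typically called, the example below fits a model on the built-in mtcars data and generates predictions for two new cars. The names m_cars and new_cars are hypothetical stand-ins for m1 and new_x.

```r
# Fit a simple model on built-in data (illustrative choice).
m_cars <- lm(mpg ~ wt, data = mtcars)

# newdata must have a column for each predictor, named as in the model formula.
new_cars <- data.frame(wt = c(2.5, 3.0))
class(new_cars)

# One prediction is returned per row of newdata.
preds <- predict(object = m_cars, newdata = new_cars)
preds
```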

Break

05:00

Worksheet: Evaluating and Improving Predictions

30:00

Appendix - More practice!

Concept Questions

Which two models will exhibit the highest \(R^2\)?

# A tibble: 4 × 5
  name    hours cuteness food_eaten is_indoor_cat
  <chr>   <dbl>    <dbl>      <dbl> <lgl>        
1 castiel    12      9          175 TRUE         
2 frank      18     10          200 TRUE         
3 luna       19      9.5        215 FALSE        
4 luca       10      8          218 FALSE        
m1 <- lm(formula = hours ~ cuteness + food_eaten + is_indoor_cat, 
         data = cats)
      (Intercept)          cuteness        food_eaten is_indoor_catTRUE 
    -3.800000e+01      6.000000e+00      2.815002e-16     -4.000000e+00 

How many hours does the model predict Frank will sleep each day? Write out the linear equation of the model from the model output to help you.
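
The calculation can be checked with a short sketch in R. It rebuilds the cats data from the table above as a base data frame (an assumption; the slides print a tibble), refits m1, and compares predict() with the equation written out by hand. The coefficient on food_eaten is treated as 0 because the printed value, 2.815002e-16, is numerically negligible.

```r
# Rebuild the data shown in the printed table.
cats <- data.frame(
  name          = c("castiel", "frank", "luna", "luca"),
  hours         = c(12, 18, 19, 10),
  cuteness      = c(9, 10, 9.5, 8),
  food_eaten    = c(175, 200, 215, 218),
  is_indoor_cat = c(TRUE, TRUE, FALSE, FALSE)
)

# Same model formula as in the slides.
m1 <- lm(formula = hours ~ cuteness + food_eaten + is_indoor_cat,
         data = cats)

# Prediction for Frank; newdata is a one-row data frame.
frank <- cats[cats$name == "frank", ]
frank_pred <- unname(predict(object = m1, newdata = frank))

# The linear equation by hand, using the printed coefficients:
# hours_hat = -38 + 6 * cuteness + 0 * food_eaten - 4 * is_indoor_catTRUE
by_hand <- -38 + 6 * 10 + 0 * 200 - 4 * 1

c(predict = frank_pred, by_hand = by_hand)
```

Note that this model has four coefficients and only four observations, so it reproduces each cat's observed hours exactly.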

Which is the most appropriate non-linear transformation to apply to time_being_pet?

Worksheet

25:00

Lab

45:00