Multiple Linear Regression

STAT 20: Introduction to Probability and Statistics

Agenda

  • Announcements
  • Reading Questions (open-note)
  • Break
  • Worksheet: Multiple Linear Regression
  • Appendix: more practice (quiz-level questions).

Announcements

  • Lab 2: Flights due tomorrow at 12pm.
    • Part 1: Submit like your portfolios - scan and upload
    • Part 2: Render to PDF, download, and upload - make sure to assign pages
  • No reading questions for the next six lectures leading up to the quiz (traditional-style lecture).

Reading Questions

  • Join at https://pollev.com/jeremysanchez. You may register with your name or as an anonymous guest. You will not be graded today, but you can gauge how you would do by keeping track of the points you score on your answers.

What are \(b_0, b_1, \ldots, b_p\) in the multiple linear regression formula called?

  • A: Terms
  • B: Coefficients
  • C: Variables
  • D: Predictors

What values can the variable \(\text{geowest}\) in the formula below take?

\[ \widehat{\text{price}} = -15.97 + 2.87 \times \text{food} - 1.45 \times \text{geowest} \]

  • A: any value on the real number line
  • B: either 0 or 1
  • C: "east" or "west"
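To see how R represents a two-level categorical variable inside a linear model, here is a minimal sketch. The data frame `toy` and its values are hypothetical (they are not the data behind the formula above); `model.matrix()` is the base R function that builds the columns `lm()` fits.

```r
# Hypothetical toy data: each row is a restaurant with a food rating
# and a region recorded as "east" or "west".
toy <- data.frame(
  food = c(20, 22, 18, 25),
  geo  = factor(c("east", "west", "west", "east"))
)

# model.matrix() shows the numeric columns that lm() would actually use.
# The factor `geo` becomes a single indicator column named "geowest".
X <- model.matrix(~ food + geo, data = toy)
print(X)

# The indicator column only ever contains 0s and 1s.
unique(X[, "geowest"])
```

The text labels "east" and "west" live in the data; the model itself only sees the indicator column.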

What name is given to a level of a categorical variable which is not given an indicator in a linear model?

  • A: Indicator level
  • B: Coefficient
  • C: Reference level
  • D: Primary level

If you include a numerical variable as an input variable to a linear model, how many terms for it will appear in the model?

  • A: 1
  • B: 2
  • C: 3
  • D: 4

If you include a categorical variable with 3 levels as an input variable in a linear model, how many terms for it will appear in the model?

  • A: 1
  • B: 2
  • C: 3
  • D: 4
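One way to check how a categorical variable enters a model is to look at the column names of its model matrix. The sketch below uses a small hypothetical factor (the species names match the penguins output later in these slides, but the vector itself is made up) and shows how `relevel()` changes which level is left without an indicator.

```r
# Hypothetical factor with three levels.
species <- factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie", "Gentoo"))

# By default the first level (alphabetically) gets no indicator column.
levels(species)[1]
cols_default <- colnames(model.matrix(~ species))
print(cols_default)

# relevel() picks a different level to go without an indicator.
species2 <- relevel(species, ref = "Gentoo")
cols_releveled <- colnames(model.matrix(~ species2))
print(cols_releveled)
```

Counting the non-intercept columns in each printout gives the number of terms the variable contributes.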

Break

Worksheet: Multiple Linear Regression

Appendix - More practice!

Question 1

m1 <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
m2 <- lm(bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
         data = penguins)

How many more coefficients does the second model have than the first?
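A way to check an answer like this yourself is to count the entries returned by `coef()`. The sketch below does not use the real penguins data: `peng_demo` is a simulated, hypothetical data frame with the same column names, so the estimates are meaningless and only the number and names of the coefficients matter.

```r
# Simulated stand-in for the penguins data (hypothetical values).
set.seed(1)
n <- 30
peng_demo <- data.frame(
  bill_length_mm = rnorm(n, mean = 44, sd = 5),
  body_mass_g    = rnorm(n, mean = 4200, sd = 800),
  species        = factor(rep(c("Adelie", "Chinstrap", "Gentoo"), each = 10))
)
peng_demo$bill_depth_mm <- rnorm(n, mean = 17, sd = 2)

# Same two formulas as m1 and m2, fit to the simulated data.
d1 <- lm(bill_depth_mm ~ bill_length_mm, data = peng_demo)
d2 <- lm(bill_depth_mm ~ bill_length_mm + body_mass_g + species,
         data = peng_demo)

# Count and name the coefficients in each model.
length(coef(d1))
length(coef(d2))
names(coef(d2))
```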

Questions 2-4

Consider the following multiple linear regression model, which will be the subject of the next three review questions.

Question 2

m2

Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117  
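
Written as an equation, the output above corresponds to the fitted model below. Here \(\text{speciesChinstrap}\) and \(\text{speciesGentoo}\) are read as 0/1 indicator variables, which is how R encodes a categorical predictor by default; the remaining species has no indicator of its own.

\[ \widehat{\text{bill\_depth\_mm}} = 10.33083 + 0.09484 \times \text{bill\_length\_mm} + 0.00117 \times \text{body\_mass\_g} - 0.90748 \times \text{speciesChinstrap} - 5.80117 \times \text{speciesGentoo} \]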

Which is the correct interpretation of the coefficient in front of bill length? Select all that apply.

Question 3

m2

Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117  

Which is the correct interpretation of the coefficient in front of Gentoo?
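
One way to reason about coefficient interpretations is to compare predictions for two penguins that differ in only one input. The sketch below copies the coefficients from the printed output; the helper `predict_depth()` and the example measurements (45 mm, 4000 g) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the printed m2 output.
b <- c(intercept        = 10.33083,
       bill_length_mm   = 0.09484,
       body_mass_g      = 0.00117,
       speciesChinstrap = -0.90748,
       speciesGentoo    = -5.80117)

# Hypothetical helper: predicted bill depth (mm) from the fitted equation.
# The species comparisons act as 0/1 indicators.
predict_depth <- function(bill_length_mm, body_mass_g, species) {
  unname(b["intercept"] +
           b["bill_length_mm"] * bill_length_mm +
           b["body_mass_g"] * body_mass_g +
           b["speciesChinstrap"] * (species == "Chinstrap") +
           b["speciesGentoo"] * (species == "Gentoo"))
}

# Two penguins with identical measurements, differing only in species.
adelie <- predict_depth(45, 4000, "Adelie")
gentoo <- predict_depth(45, 4000, "Gentoo")
gentoo - adelie   # matches the speciesGentoo coefficient

# Two penguins of the same species and mass, bill length 1 mm apart.
longer <- predict_depth(46, 4000, "Adelie")
longer - adelie   # matches the bill_length_mm coefficient
```

In both comparisons every other input is held fixed, which is the condition an interpretation of a single coefficient relies on.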

Question 4

m2

Call:
lm(formula = bill_depth_mm ~ bill_length_mm + body_mass_g + species, 
    data = penguins)

Coefficients:
     (Intercept)    bill_length_mm       body_mass_g  speciesChinstrap  
        10.33083           0.09484           0.00117          -0.90748  
   speciesGentoo  
        -5.80117  

How would this linear model best be visualized?

Question 5

Consider the following linear regression output where the variable school is categorical and the variable hours_studied is numerical.

Coefficients      Estimate
(Intercept)        2.5
hours_studied      0.2
schoolCal          1
schoolStanford    -1

Question 5 (cont.)

  • Suppose I want to build a data frame from the original edu data frame that contains the minimum, median, and IQR of hours_studied for each school. To do this, I use group_by() followed by summarize(), and I save the result in an object called GPA_summary.

What are the dimensions of GPA_summary?
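
To practice this pattern, here is a minimal sketch using dplyr. The data frame `edu_demo`, its values, the school label "Other", and the summary column names are all hypothetical; the real edu data are not shown in these slides.

```r
library(dplyr)

# Hypothetical stand-in for edu: two students at each of three schools.
edu_demo <- data.frame(
  school        = c("Cal", "Cal", "Stanford", "Stanford", "Other", "Other"),
  hours_studied = c(10, 14, 8, 12, 6, 9)
)

# group_by() + summarize() returns one row per group, with one column
# for the grouping variable plus one column per summary statistic.
summary_demo <- edu_demo %>%
  group_by(school) %>%
  summarize(
    min_hours    = min(hours_studied),
    median_hours = median(hours_studied),
    iqr_hours    = IQR(hours_studied)
  )

summary_demo
dim(summary_demo)
```

The number of rows tracks the number of groups and the number of columns tracks the number of summaries requested, plus the grouping column.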
