Summarizing Numerical Data

STAT 20: Introduction to Probability and Statistics

Agenda

Announcements
Reading Questions
Break
Worksheet: Summarizing Numerical Data
R Workshop: Summarizing Numerical Data
Appendix: More practice!

Announcements

Group tutoring is today in Evans 340 from 5-7pm.
Lab 1 and Portfolio 1 are due tomorrow at 8pm.

Reading Questions

Please put your laptops under your desk and your phones away.
Write your name and ID, and bubble in Version “A” on your answer sheet.
You may work only with those at your table!

Which of the following plot types for numerical variables maintain all of the information found in the original data set?

A. dot plot
B. histogram
C. violin plot
D. box plot

00:30

If you wish to see less detail in your histogram and perform more aggregation, which of the following is the best course of action?

A. switch to a dot plot
B. switch to a bar chart
C. instead of presenting the histogram, display the original data frame with the raw data
D. increase the bin width of the histogram
E. decrease the bin width of the histogram

00:30

Which word best describes a distribution with a long tail stretching out to the left?

A. bimodal
B. unimodal
C. left skewed
D. right skewed

00:30

How many more columns will the output from the second summarise() call have than the first?

summarise(penguins, 
          body_mass_median = median(body_mass_g),
          body_mass_IQR = IQR(body_mass_g))

grouped_penguins <- group_by(penguins, species)

summarise(grouped_penguins, 
          body_mass_median = median(body_mass_g),
          body_mass_IQR = IQR(body_mass_g))

A. None
B. 1
C. 2
D. 3

01:00

Before making a violin plot using `ggplot2`, how can we determine the order of the violins?

A. By using select().
B. By using mutate() with factor().
C. By using group_by() and summarize().
D. By using data.frame().

00:30

Break

05:00

Worksheet: Summarizing Numerical Data

Mean, median, mode: which is best?

It depends on the nature of your data and what you seek to capture in your summary.

Get out your worksheet. You’ll be watching a 3-minute video that discusses the characteristics of a typical human. Note which numerical summaries are used and what each one is used for.

Worksheet: Summarizing Numerical Data

25:00

R Workshop: Summarizing Numerical Data

25:00

End of Lecture

Appendix - More practice!

Describing Shape

Which of these variables do you expect to be uniformly distributed?

bill length of Gentoo penguins
salaries of a random sample of people from California
house sale prices in San Francisco
birthdays of classmates (day of the month)

Please vote at pollev.com.

01:00

General Advice - Measures of Center

Means are often a good default for symmetric data.

Means are sensitive to very large and very small values, so they can be deceptive on skewed data. In that case, use a median.

Modes are often the only option for categorical data.

But there are other notions of typical… what about a maximum?
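
A minimal sketch of the advice above, in base R. The salary and species values are made up for illustration (they are not course data), and the variable names are hypothetical.

```r
# Hypothetical, right-skewed salaries in thousands of dollars (made-up values).
salaries <- c(42, 48, 51, 55, 58, 61, 65, 400)

mean(salaries)    # pulled upward by the single very large value
median(salaries)  # stays close to the bulk of the data
max(salaries)     # another notion of "typical" that answers a different question

# For a categorical variable, the mode is the most frequent level.
# Hypothetical species labels (made-up values).
species <- c("Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie")
species_mode <- names(which.max(table(species)))
species_mode
```

Comparing the mean and the median on the same skewed vector is a quick way to see how much one extreme value moves each summary.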

Concept Question 3 - Measures of Spread

Why are measures of spread so important? Consider the following question.

Two new food delivery services have opened in Berkeley: Oski Eats and Cal Cravings. A friend of yours who took Stat 20 collected data on each and noted that Oski Eats has a mean delivery time of 29 minutes and Cal Cravings has a mean delivery time of 27 minutes. Which would you rather order from?

One possible reality

Which would you rather order from?

01:00
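
One way to explore this question is to compare spread as well as center. The sketch below uses made-up delivery times, chosen only so that the means match the 29 and 27 minutes in the question. The individual values and the vector names are assumptions, not the friend's data.

```r
# Hypothetical delivery times in minutes (made-up values for illustration).
oski_eats    <- c(27, 28, 29, 29, 30, 31)
cal_cravings <- c(10, 12, 15, 35, 40, 50)

# The centers match the means stated in the question.
mean(oski_eats)     # 29
mean(cal_cravings)  # 27

# Measures of spread tell a different story about reliability.
sd(oski_eats)
sd(cal_cravings)
IQR(oski_eats)
IQR(cal_cravings)
range(cal_cravings)
```

In this hypothetical reality, the service with the lower mean is also far less predictable, which is the kind of trade-off a measure of spread makes visible.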
