Numerical and Visual Summaries

STAT 20: Introduction to Probability and Statistics



  • Announcements

  • Conceptual Review: bringing three reading notes together (mini-lecture/chart)

  • Coding Review

  • Break

  • Concept Questions

  • Work time on assignments



  • Quiz 1 is on Monday. It covers the material from Understanding the World with Data through Summarizing Numerical Data.
  • Problem Set 1 can now be turned in through Friday night for full credit.
  • Lab: Getting Started is due Monday, June 24 at 12pm. Make sure you read the Lab Submission Guidelines posted to Ed.

Conceptual Review



Concept Questions

Concept Question 1 - Taxonomy of Data

Images as data

  • Images are composed of pixels (this image is 1020 by 1520)

  • The color of each pixel is stored in RGB: three bands (red, green, and blue)

  • Each band takes a value from 0 to 255

  • This image is data with 1020 x 1520 x 3 values.
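A minimal sketch of the idea that an image is just a grid of numbers. The 2 x 3 image below is hypothetical (its pixel values are made up); it shows that the number of values is height x width x bands, and that every band value falls between 0 and 255.

```python
# A hypothetical 2 x 3 RGB image stored as nested lists:
# image[row][col] is one pixel, and each pixel is a (red, green, blue) triple.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]

height = len(image)        # number of rows of pixels
width = len(image[0])      # number of columns of pixels
bands = len(image[0][0])   # 3 bands for an RGB image

# Every band of every pixel is one number, so the total is height x width x bands.
n_values = height * width * bands

# Check that each band value lies in the range 0-255.
all_in_range = all(0 <= v <= 255 for row in image for pixel in row for v in pixel)

print(height, width, bands)  # 2 3 3
print(n_values)              # 18
print(all_in_range)          # True
```

The same multiplication applies to a full-size photo; only the height and width change.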

A shoebill with a duck in its mouth.


  • Grayscale images have only one band
  • 0 is black, 255 is white
  • This image is data with 1020 x 1520 x 1 values.

To simplify, assume our photos are 8 x 8 grayscale images.
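A short sketch of what one 8 x 8 grayscale image looks like as data: a grid with a single brightness value (0 to 255) per pixel, which can be read off row by row into one list of numbers. The pixel values below are hypothetical; they simply increase across the grid.

```python
# A hypothetical 8 x 8 grayscale image: one band per pixel, 0 = black, 255 = white.
# The brightness values are invented and just increase from top-left to bottom-right.
size = 8
image = [[(r * size + c) * 4 for c in range(size)] for r in range(size)]

# Read the grid row by row into a single flat list of pixel values.
flat = [value for row in image for value in row]

print(len(flat))             # 64
print(min(flat), max(flat))  # 0 252
```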

A shoebill with a duck in its mouth in grayscale.

Images in a Data Frame

If you were to put the data from these (8 x 8 grayscale) images into a data frame, what would the dimensions of that data frame be in rows x columns? Answer at


Concept Questions 2 and 3 - Summarizing Categorical Data

Concept Question 2a

The table below displays data from a survey of a class of students.

What proportion of the entire class was in the marching band?


Concept Question 2b

What were the dimensions of the raw data from which this table was constructed? (rows x cols)
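Since the table itself is not reproduced here, here is a minimal sketch on hypothetical raw data (the variable names `year` and `band` and all values are invented) of how a summary table is tallied from raw data in which each row is one student and each column is one variable, and how a proportion of the whole class is computed.

```python
from collections import Counter

# Hypothetical raw survey data: one row per student, one column per variable.
# The variable names ("year", "band") and all values are invented.
raw = [
    {"year": "first", "band": "yes"},
    {"year": "first", "band": "no"},
    {"year": "second", "band": "no"},
    {"year": "second", "band": "yes"},
    {"year": "second", "band": "no"},
]

n_rows = len(raw)      # number of students (observations)
n_cols = len(raw[0])   # number of variables

# Tallying every (year, band) combination gives the cells of a two-way table.
table = Counter((row["year"], row["band"]) for row in raw)

# Proportion of the whole class in the band: band members / all students.
band_prop = sum(1 for row in raw if row["band"] == "yes") / n_rows

print(n_rows, n_cols)           # 5 2
print(table[("second", "no")])  # 2
print(band_prop)                # 0.4
```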


Concept Question 3

Below is a two-variable bar chart describing the party affiliation and college degree status of 500 survey participants.

Concept Question 3 (cont.)

Based on the graphic on the previous slide, which group is the largest?

  • Democrats without a college degree
  • Democrats with a college degree
  • Republicans with a college degree
  • Republicans without a college degree

Concept Question 3 (cont.)

  • The standard stacked bar chart of counts preserves the original counts, so it is good for comparing joint proportions.
  • The normalized stacked bar chart shows conditional proportions, so it is good for showing associations between variables.
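A minimal sketch of the difference between joint and conditional proportions. The counts below are hypothetical (they sum to 500 but are not the data in the chart), and the sketch assumes the normalized chart has one bar per party, so each cell is divided by its party's total.

```python
# Hypothetical counts for 500 respondents (made up; not the data in the chart).
counts = {
    ("Democrat", "degree"): 120,
    ("Democrat", "no degree"): 130,
    ("Republican", "degree"): 100,
    ("Republican", "no degree"): 150,
}
total = sum(counts.values())  # 500

# Joint proportions: each cell divided by the grand total.
joint = {cell: n / total for cell, n in counts.items()}

# Conditional proportions: each cell divided by its party's total,
# which is what a normalized (100%) stacked bar for each party displays.
party_totals = {}
for (party, _), n in counts.items():
    party_totals[party] = party_totals.get(party, 0) + n
conditional = {
    (party, deg): n / party_totals[party] for (party, deg), n in counts.items()
}

print(joint[("Republican", "no degree")])     # 0.3
print(conditional[("Democrat", "degree")])    # 0.48
print(conditional[("Republican", "degree")])  # 0.4
```

If the degree rate differs between parties in the conditional proportions (0.48 vs 0.4 here), the two variables are associated in this made-up data.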

Concept Activity 4 - Summarizing Numerical Data (Measures of Center)

Mean, median, mode: which is best?

It depends on your desiderata: the nature of your data and what you seek to capture in your summary.

Get out a piece of paper. You’ll be watching a 3-minute video that discusses characteristics of a typical human. Note which numerical summaries are used and what each one is used to describe.

General Advice

  1. Means are often a good default for symmetric data.
  2. Means are sensitive to very large and very small values, so they can be deceptive on skewed data. For skewed data, use a median.
  3. Modes are often the only option for categorical data.
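A quick illustration of this advice using Python's standard `statistics` module. The data are hypothetical: a small right-skewed set of values (think incomes in $1000s) with one very large value, plus a made-up categorical variable.

```python
import statistics

# Hypothetical right-skewed data (e.g., incomes in $1000s) with one very large value.
incomes = [30, 35, 40, 45, 50, 1000]

print(statistics.mean(incomes))    # 200  (pulled far up by the 1000)
print(statistics.median(incomes))  # 42.5 (average of the two middle values)

# Without the extreme value, the mean and median agree.
trimmed = incomes[:-1]
print(statistics.mean(trimmed), statistics.median(trimmed))  # 40 40

# For categorical data, the mode (most common category) is often the only option.
colors = ["red", "blue", "red", "green", "red"]
print(statistics.mode(colors))     # red
```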

But there are other notions of typical, depending on the context.

Wrap-up - Summarizing Distributions of Data

  • You can construct a statistical graphic to show the shape of a distribution, which you can describe in terms of modality and skew.
  • You can calculate a measure of center to convey a sense of a typical observation.
  • You can calculate a measure of spread to capture how much variability there is in the data.
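A minimal sketch of why a measure of center alone is not enough. The two data sets below are hypothetical and share the same mean, but their standard deviations and ranges show very different amounts of variability. Note that `statistics.stdev` computes the sample standard deviation.

```python
import statistics

# Two hypothetical data sets with the same center but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in [("narrow", narrow), ("wide", wide)]:
    center = statistics.mean(data)         # measure of center
    sd = statistics.stdev(data)            # sample standard deviation (spread)
    data_range = max(data) - min(data)     # largest minus smallest value (spread)
    print(name, center, round(sd, 2), data_range)
# narrow 50 1.58 4
# wide 50 15.81 40
```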

Free time
