Lab 1: Class Survey

Slides

Part I: Understanding the Context of the Data

Please record your answers on the worksheet below and upload it to Gradescope.

To answer this worksheet, consult the printout of the original survey found here.

Part II: Computing on the Data

The real data from your surveys is available in the stat20data package as class_survey. To complete the following questions, you will need to find the columns in the class_survey data frame associated with each variable mentioned.

Question 1

How much coding experience do students have?

Use the “scale” version for this question.

part a

Construct an appropriate plot for this variable using ggplot2.

part b

Then calculate one appropriate measure of a spread and one appropriate measure of center for this variable.

part c

Use these summaries to answer the survey question in a sentence or two.

Question 2

For this question only, you may use the following, modified dataset, which can be obtained by running the following code in R.

filtered_survey <- filter(class_survey, 
                          new_COVID_variant >= 0,
                          new_COVID_variant < 1)

What are students’ perceptions of the chance that there is a new COVID variant that disrupts instruction in Fall 2022?

part a

Construct an appropriate plot for this variable using ggplot2.

part b

Then calculate one appropriate measure of a spread and one appropriate measure of center for this variable.

part c

Use these summaries to answer the survey question in a sentence or two.

Question 3

Is there an association between students’ favorite season and terrain preference (beach or mountains)?

part a

Construct the plot that you proposed in Part I using ggplot2. Ensure that the seasons are plotted in the order: spring, summer, fall, winter.

part b

Then, state a measure of a typical observation of the terrain preference variable for each season group (what terrain does a typical student whose favorite season is fall prefer, and so on).

part c

Use these summaries to answer the survey question in a sentence or two.

Question 4

Is there an association between students most identifying as an entrepreneur and their optimism for cryptocurrency?

You may treat the entrepreneur variable categorically.

part a

Construct an appropriate plot including both of these variables using ggplot2.

part b

Then calculate one appropriate measure of a spread and one appropriate measure of center for the cryptocurrency variable, grouping by the levels of the entrepreneur variable.

part c

Use these summaries to answer the survey question in a sentence or two.

Question 5

Six variables appear in the survey data frame that were derived from the original title question: artist, humanist, natural_scientist, social_scientist, comp_scientist and entrepreneur. The artist variable is TRUE for those students who most identified as an artist and FALSE otherwise. The other five variables are defined similarly. You may treat these variables categorically.

part a

Propose your own question involving any two variables in the dataset–one of which comes from the title variable group mentioned above.

part b

Construct a plot which would aid in answering this question using ggplot2.

part c

Grouping by the levels of the title group variable, calculate or state one appropriate measure of center for your second variable.

part d

Use these summaries to answer your question in part a in a sentence or two.