Part II: Computing on the Data
You can access the data from the 2009 Iran election in the iran data frame inside the stat20data package.
Question 1
What is the empirical distribution of the vote counts for Ahmadinejad? Answer with:
- a plot (label your axes and provide a title),
- numerical summaries of center and spread,
- and a written interpretation.
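As a minimal sketch of the numerical summaries, the snippet below computes two measures of center and two of spread in base R. The `votes` vector holds made-up placeholder values, not the real data; in the lab you would use the Ahmadinejad column of the iran data frame instead.

```r
# Placeholder vote counts, made up for illustration only; in the lab,
# replace this with the Ahmadinejad column of the iran data frame.
votes <- c(1200, 3400, 560, 89000, 15000, 7200, 430, 26000)

# Measures of center
center <- c(mean = mean(votes), median = median(votes))

# Measures of spread
spread <- c(sd = sd(votes), IQR = IQR(votes))

center
spread
```

When a distribution is strongly right-skewed, as vote counts often are, the median and IQR are usually less affected by a few very large values than the mean and standard deviation.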
Question 2
Create two vectors:
one with the range of values that the Benford’s Law probability distribution can take
and the second with the corresponding probabilities for each value.
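One way to set up the two vectors, sketched here using the standard statement of Benford's Law, \(P(X = d) = \log_{10}(1 + 1/d)\) for \(d = 1, \dots, 9\). The names `digits` and `benford_probs` are illustrative choices, not names required by the lab.

```r
# Possible first digits: a leading digit can never be 0
digits <- 1:9

# Benford's Law: P(X = d) = log10(1 + 1/d)
benford_probs <- log10(1 + 1 / digits)

round(benford_probs, 3)

# Sanity check: the probabilities should add up to 1
sum(benford_probs)
```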
Question 3
What might 366 draws (the number of rows in the iran data frame) from \(X \sim \text{Benford}()\) look like? Find out by sampling from the Benford probability distribution. Create a plot of the resulting empirical distribution. Label your axes and title your plot “Benford’s Law Simulation”.
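A rough sketch of the simulation in base R, assuming the Benford probabilities \(\log_{10}(1 + 1/d)\). The seed value and variable names are arbitrary choices, and the plot uses base R's `barplot()`; a ggplot2 bar chart would work equally well.

```r
set.seed(20)  # arbitrary seed so the draws are reproducible

digits <- 1:9
benford_probs <- log10(1 + 1 / digits)

# 366 independent draws, each digit chosen with its Benford probability
draws <- sample(digits, size = 366, replace = TRUE, prob = benford_probs)

# Proportion of draws landing on each digit (digits never drawn count as 0)
sim_props <- table(factor(draws, levels = digits)) / length(draws)

barplot(sim_props,
        xlab = "First digit",
        ylab = "Proportion of draws",
        main = "Benford's Law Simulation")
```

Because the draws are random, the bars will not match the theoretical probabilities exactly, and a different seed gives a slightly different picture.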
Question 4
What do the first digit empirical distributions look like for the four candidates in the Iranian presidential election?
- Make one plot for each distribution and title them by candidate name.
- Combine the four plots into a single visualization using the
Inside the stat20data package there is a function called get_first() that pulls off the first digit of every element in a vector. This will be helpful when creating your plots.
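To make the idea of "pulling off the first digit" concrete, here is a hypothetical base-R stand-in. It is not the package's implementation of get_first(), and the name `first_digit_sketch` is invented for this illustration; it assumes non-negative whole-number counts.

```r
# Hypothetical helper, NOT the stat20data implementation of get_first()
first_digit_sketch <- function(x) {
  # Convert each number to text, keep its first character,
  # then turn that character back into an integer
  as.integer(substr(as.character(x), 1, 1))
}

first_digit_sketch(c(7, 42, 366, 1058))
# returns 7 4 3 1
```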
Question 5
How do the observed first digit distributions of Question 4 compare to the one you created in Question 3 by sampling from Benford’s Law? Which candidate has a first-digit distribution that is:
- most similar to
- most different
from the sampled one?
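Comparing plots by eye is the main task here. If you want a number to back up your impression, one possible (and by no means the only) measure is the sum of absolute differences between two first-digit distributions. The function names and the short digit vectors below are made up for illustration.

```r
# Proportion of each digit 1-9 in a vector of first digits
digit_props <- function(d) {
  as.numeric(table(factor(d, levels = 1:9))) / length(d)
}

# Sum of absolute differences between two first-digit distributions:
# 0 means identical proportions, 2 is the largest possible value
digit_distance <- function(a, b) {
  sum(abs(digit_props(a) - digit_props(b)))
}

# Made-up first digits, for illustration only
sampled   <- c(1, 1, 2, 1, 3, 2, 1, 4, 1, 5)
candidate <- c(1, 2, 2, 7, 3, 7, 1, 4, 7, 5)

digit_distance(sampled, candidate)
```

A smaller value suggests the candidate's first-digit distribution is closer to the sampled one. With only 366 observations, some difference is expected from chance alone.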
US Elections
The OpenElections project obtains and standardizes precinct-level results from US elections, including the 2020 and 2024 (for many states) US Presidential Elections. To access the data, visit OpenElections’ GitHub page (https://github.com/openelections) and click on the tab labeled “Repositories”. From there, scroll down until you find the repository for Georgia. It will be called openelections-data-ga. Select the 2020 folder and find the file 20201103__ga__general.csv.
Some notes:
- To read the csv file into R, you will need to point R to the raw version of the data set. To view the raw csv you will either click the button that says “Raw” at the top right of the data frame on GitHub or click the link that says “View Raw Data”. When you are looking at the raw csv file, the url in your browser is the one you can use to access the file from within R.
- There may be strange extra rows in your data, such as a row tallying total overall votes. Visually inspect the data to see if anything jumps out and be sure to take this into consideration when doing your analysis.
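The two notes above can be sketched as follows. `read.csv()` accepts a URL as well as a file path, so in the lab you would pass it the raw-file URL from your browser. To keep this example self-contained, it reads a tiny made-up CSV instead; the column names and rows are assumptions for illustration and may not match the real file.

```r
# In the lab, pass the raw-file URL copied from your browser instead, e.g.
#   ga <- read.csv(raw_url)   # raw_url is a placeholder, not a real address
# The rows and column names below are made up for illustration only.
toy_csv <- "county,precinct,candidate,votes
A,Precinct 1,Candidate X,120
A,Precinct 2,Candidate X,87
A,Total,Candidate X,207
B,Precinct 1,Candidate X,45"

ga <- read.csv(text = toy_csv)

# Flag rows whose precinct label looks like a tally rather than a precinct
is_total <- grepl("total", ga$precinct, ignore.case = TRUE)
ga[is_total, ]

# Keep only the genuine precinct rows for analysis
ga_clean <- ga[!is_total, ]
dim(ga_clean)
```

A text match like this only catches rows that are labeled as totals. Extra rows in the real file may look different, so it is still worth scanning the data by eye.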
Question 6
What is the unit of observation in your state’s data frame? What are the dimensions?
Question 7
Use this data to create a plot of the state’s first digit distribution by precinct. Use the number of votes cast for Joseph Biden in each precinct.
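A hedged sketch of the data preparation for this plot, using a small made-up data frame. The column names (`precinct`, `candidate`, `votes`) and the exact candidate labels are assumptions; check how they appear in the real file before adapting this.

```r
# Made-up rows; the column names and candidate labels are assumptions
ga_toy <- data.frame(
  precinct  = c("P1", "P2", "P3", "P4", "P5"),
  candidate = c("Joseph R. Biden", "Joseph R. Biden", "Donald J. Trump",
                "Joseph R. Biden", "Joseph R. Biden"),
  votes     = c(312, 0, 95, 1480, 27)
)

# Keep one candidate's rows; match on surname in case the full label differs
biden <- ga_toy[grepl("Biden", ga_toy$candidate), ]

# A count of 0 has no leading digit between 1 and 9, so set it aside
biden <- biden[biden$votes > 0, ]

# First digit of each precinct's vote count
first_digits <- as.integer(substr(as.character(biden$votes), 1, 1))

# Proportion of precincts with each first digit, ready to plot
digit_props <- table(factor(first_digits, levels = 1:9)) / length(first_digits)
digit_props
```

Whether to drop zero-vote precincts is an analysis choice; if you do, say so in your write-up.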
Question 8
Does the 2020 election in Georgia appear to fit Benford’s distribution better or worse than the Iran election?
Question 9
Take this opportunity to explore the US elections data provided by Open Elections and construct a data visualization of your choosing. This plot could deal with vote totals for different candidates or different parties, the types of offices, how candidates appear on the ballots, how one state compares to another state, how a state or precinct has changed over time … you have lots of options here!
Once you have a plot, make two claims: a summary claim and a generalization. Do you think your generalization claim is well-supported here?
Question 10
Now go back to the Georgia repo (openelections-data-ga) and open the 2024 folder. Find the file 20241105__ga__general__precinct-level.csv and make the same plot that you made for 2020 in Question 7: a plot of the state’s first digit distribution by precinct, using the number of votes cast for Kamala Harris in each precinct. Compare the 2020 and 2024 plots, and comment on what you observe.
Last Question
Will you ensure that your submission to Gradescope…
- is a pdf generated from a qmd file,
- has all of your code visible to readers,
- and assigns each of the questions to all pages that show your work for that question?
(This one is easy! Just answer “yes” or “no”)