Definitions, axioms, and examples
In an enormously entertaining paper written about a decade ago, the economist Peter Backus estimated his chance of finding a girlfriend on any given night in London at about 1 in 285,000, or roughly 0.00035%. As he writes, this is either depressing or cheering news, depending on what you had estimated your chance to be before reading the paper and doing a similar computation for yourself.1 The interesting point in the paper was its use of a probabilistic argument (originally developed by the astronomer and astrophysicist Frank Drake to estimate the number of extra-terrestrial civilizations) to think about his dating problems. Anyone can follow the arguments put forward by Backus, including his statements that use probability.
We all have some notion of chance or probability, and can ask questions like:

- What is the chance you will get an A in Stat 20? (About 32%, based on last fall.)2
- What is the chance the 49ers will win the Super Bowl this year? (They are the favorites, with an implied probability of about 54.5%.)3
- What is the chance you will roll a double on your next turn to get out of jail while playing Monopoly? (One in six.)
- What is the chance that Donald Trump will win the Presidential election? (About 47%.)4
The second of the four types of claims that we will investigate is generalization. To make generalizations, we first need to quantify uncertainty and randomness. This is the purpose of the Probability unit.
So far, we have examined data sets and summarized them, both numerically and visually. We have looked at data distributions and at associations between variables. Can we extend the conclusions that we make about the data sets to larger populations? If we notice that bill length and flipper length have a strong linear relationship for the penguins in our data, can we say this is true about all penguins? How do we draw valid conclusions about the population our data was drawn from? These are the kinds of questions we will study using tools from probability theory.
In order to be taken seriously, we need to be careful about how we collect data, and then how we generalize our findings. For example, you may have observed that some polling companies are more successful than others in their estimates and predictions, and consequently people pay more attention to them. Below is a snapshot of rankings of polling organizations from the well-known website FiveThirtyEight5, and one can imagine that not many take heed of the polling done by the firms with C or worse grades. According to the website, the rankings are based on the polling organization’s “historical accuracy and methodology”.
In order to make estimates as these polling organizations are doing, or understand the results of a clinical trial, or other such questions in which we generalize from our data sample to a larger group, we have to understand the variations in data introduced by randomness in our sampling methods. Each time we poll a different group of voters, for example, we will get a different estimate of the proportion of voters that will vote for Joe Biden in the next election. To understand variation, we first have to understand how probability was used to collect the data.
Since classical probability came out of gambling games played with dice and coins, we can begin our study by thinking about those.
In 17th-century France, gamblers would bet on anything. In particular, they would bet on a fair six-sided die landing 6 at least once in four rolls. Antoine Gombaud, aka the Chevalier de Méré, was a gambler who also considered himself something of a mathematician. He computed the chance of getting at least one six in four rolls as 2/3 ($4 \times 1/6 = 4/6$).
The next popular dice game was betting on at least one double six in twenty-four rolls of a pair of dice. De Méré knew that there were 36 possible outcomes when rolling a pair of dice, and therefore the chance of a double six was 1/36. Using this he concluded that the chance of at least one double six in 24 rolls was the same as that of at least one six in four rolls, that is, 2/3 ($24 \times 1/36 = 24/36$).
We will see later how to compute this probability, but for now we can estimate the value by simulating the game many times (1000 times each) and looking at the proportion of times we see at least one six in 4 rolls of a fair die, and do the same with at least one double six in 24 rolls.
Number of simulations = 1000
prop_wins_game_1
1 0.514
prop_wins_game_2
1 0.487
You can see here that the poor Chevalier wasn’t as good a mathematician as he imagined himself to be, and didn’t compute the chances correctly. The simulated probabilities are nowhere close to 4/6 and 2/3, the probabilities that he computed for the first and second game, respectively. 🧐
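A simulation along these lines might look like the following sketch. The helper names `play_game_1()` and `play_game_2()` and the seed value are assumptions made for illustration, not the code that produced the numbers above, so the proportions it prints will differ slightly from those shown.

```r
# A minimal sketch of the two dice games (helper names and seed are illustrative).
set.seed(20)

n_sims <- 1000
die <- 1:6

# Game 1: win if at least one six shows up in 4 rolls of one die.
play_game_1 <- function() {
  rolls <- sample(die, size = 4, replace = TRUE)
  any(rolls == 6)
}

# Game 2: win if at least one double six shows up in 24 rolls of a pair of dice.
play_game_2 <- function() {
  first  <- sample(die, size = 24, replace = TRUE)
  second <- sample(die, size = 24, replace = TRUE)
  any(first == 6 & second == 6)
}

# Repeat each game many times and record the proportion of wins.
prop_wins_game_1 <- mean(replicate(n_sims, play_game_1()))
prop_wins_game_2 <- mean(replicate(n_sims, play_game_2()))

prop_wins_game_1
prop_wins_game_2
```

Each call to `replicate()` plays one game `n_sims` times and returns a vector of `TRUE`/`FALSE` values; taking the mean of that vector gives the proportion of wins.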
By the end of this unit, you’ll be able to conduct simulations like these yourself in R! For today, we are going to begin by introducing the conceptual building blocks behind probability.
First, let’s establish some terminology:
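Let’s say that there are $n$ possible outcomes in the outcome space $\Omega$, and that an event $A$ contains $k$ of them. If all the outcomes are equally likely, then the probability of $A$ is the proportion of outcomes that are in $A$: $P(A) = k/n$.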
Suppose we toss a fair coin, and I ask you what is the chance of the coin landing heads. Like most people, you reply 50%. Why? Well… (you reply) there are two possible things that can happen, and if the coin is fair, then they are both equally likely, so the probability of heads is 1/2 or 50%.
Here, we have thought about an event (the coin landing heads), seen that there is one outcome in that event and two outcomes in the outcome space, and so we say that the probability of the event, $P(\text{heads})$, is $1/2$.
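Consider rolling a fair six-sided die: six outcomes are possible, so the outcome space is $\{1, 2, 3, 4, 5, 6\}$. Since the die is fair, each outcome is equally likely, with probability $1/6$.

Outcome | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Probability | $1/6$ | $1/6$ | $1/6$ | $1/6$ | $1/6$ | $1/6$ |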
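The idea that, for equally likely outcomes, a probability is just a proportion of outcomes can be checked with a few lines of R. This is only a sketch: the event "rolling an even number" is an illustrative choice, not one taken from the notes.

```r
# Outcome space for one roll of a fair six-sided die.
die <- c(1, 2, 3, 4, 5, 6)

# Each outcome is equally likely, so each has probability 1 / (number of outcomes).
prob_each <- rep(1 / length(die), times = length(die))
names(prob_each) <- die
prob_each

# Illustrative event (an assumption for this sketch): rolling an even number.
is_even <- die %% 2 == 0

# Probability = (outcomes in the event) / (outcomes in the outcome space).
prob_even <- sum(is_even) / length(die)
prob_even
```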
Let
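In order to compute the probabilities of events, we need to set some basic mathematical rules called axioms (which are intuitively clear if you think of the probability of an event as the proportion of the outcomes that are in it). There are three basic rules that will help us compute probabilities:

- The chance of any event is at least 0: $P(A) \ge 0$ for every event $A$.
- The chance of the whole outcome space is 1: $P(\Omega) = 1$, where $\Omega$ denotes the outcome space.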
Before we write the third rule, we need some more definitions and notation:
Now we consider events that don’t intersect or overlap at all, that is, they are disjoint from each other, or mutually exclusive:
If
For example, if we are playing De Méré’s second game, the event
However, if we roll a die, the event
Here’s another example that might interest soccer fans: The event that Manchester City wins the English Premier League (EPL) in 2024, and the event that Liverpool wins the EPL in 2024 are mutually exclusive, but the events that Manchester City are EPL champions in 2024 and Manchester City are EPL champions in 2023 are not mutually exclusive.
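Now for the third axiom: if $A$ and $B$ are mutually exclusive events, then the chance that at least one of them occurs is the sum of their chances, $P(A \cup B) = P(A) + P(B)$.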
For example, consider rolling a fair six-sided die, and the two events
The only outcome in
The complement rule
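Here is an important consequence of axiom 3. Let $A$ be an event, and let $A^C$ (the complement of $A$) be the event consisting of all the outcomes in the outcome space $\Omega$ that are not in $A$. Then $P(A^C) = 1 - P(A)$.
This is because $A$ and $A^C$ are mutually exclusive and together make up all of $\Omega$, so by axiom 3, $P(A) + P(A^C) = P(\Omega)$, and the probability of the whole outcome space is 1.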
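Consider the penguins dataset, which has 344 observations, of which 152 are Adelie penguins and 68 are Chinstrap penguins. If we pick a penguin at random, what is the probability that we would pick an Adelie penguin? What about a Gentoo penguin?
Let $A$ be the event of picking an Adelie penguin, $C$ the event of picking a Chinstrap penguin, and $G$ the event of picking a Gentoo penguin.
Assuming that all the penguins are equally likely to be picked, we see that then $P(A) = 152/344 \approx 0.44$ and $P(C) = 68/344 \approx 0.20$.
Since only one penguin is picked, we see that $A$ and $C$ are mutually exclusive, so $P(A \cup C) = P(A) + P(C) = 220/344$.
The dataset contains only these three species. Therefore the complement of $A \cup C$ is $G$.
Finally, the complement rule tells us that $P(G) = 1 - P(A \cup C) = 1 - 220/344 = 124/344 \approx 0.36$.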
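The penguin computation can be reproduced with simple arithmetic in R. The counts come from the text above; the variable names are illustrative, and the sketch assumes that every penguin that is neither Adelie nor Chinstrap is a Gentoo.

```r
# Counts taken from the text.
n_total     <- 344
n_adelie    <- 152
n_chinstrap <- 68

# Equally likely picks: probability = count / total.
p_adelie    <- n_adelie / n_total
p_chinstrap <- n_chinstrap / n_total

# Adelie and Chinstrap are mutually exclusive, so their probabilities add.
p_adelie_or_chinstrap <- p_adelie + p_chinstrap

# Complement rule (assuming Gentoo is the only remaining species).
p_gentoo <- 1 - p_adelie_or_chinstrap

round(c(adelie = p_adelie, chinstrap = p_chinstrap, gentoo = p_gentoo), 3)
```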
We use
We often represent events using Venn diagrams. The outcome space
Here is a Venn diagram showing two mutually exclusive events (no overlap):
Suppose we toss a coin twice and record the equally likely outcomes. What is
Solution:
Now, let
Alternatively, we can consider
In this case,
Now you try: Let
Let
If
Consider the box above which has five almost identical tickets. The only difference is the value written on them. Imagine that we shake the box to mix the tickets up, and then draw one ticket without looking so that all the tickets are equally likely to be drawn7.
What is the chance of drawing an even number?
Solution:
Let
Suppose I have a coin that is twice as likely to land heads as it is to land tails. This means that I cannot represent
Solution:
In this case, we want to represent equally likely outcomes, and want
Suppose we toss the coin twice. How would we list the outcomes so that they are equally likely? Now we have to be careful, and think about all the things that can happen on the second toss if we have
This is much easier to imagine if we imagine drawing twice from a box of tickets, but putting the first ticket back before drawing the second (to represent the fact that the probabilities of landing
Now, imagine the box of tickets that represents
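From this picture, where we use color to distinguish the two different outcomes of heads and one outcome of tails, we can see that there are 9 possible outcomes that are equally likely, and we get the following probabilities (where $HT$ means heads on the first toss and tails on the second, and so on): $P(HH) = 4/9$, $P(HT) = 2/9$, $P(TH) = 2/9$, and $P(TT) = 1/9$.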
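One way to check the count of 9 equally likely outcomes is to list them in R. The sketch below assumes the box holds three tickets, two marked `"H"` and one marked `"T"`, and uses `expand.grid()` to pair every possible first draw with every possible second draw (drawing with replacement); the variable names are illustrative.

```r
# Box for a coin that is twice as likely to land heads as tails.
box <- c("H", "H", "T")

# Every (first draw, second draw) pair when the first ticket is put back.
outcomes <- expand.grid(first = box, second = box, stringsAsFactors = FALSE)
nrow(outcomes)  # number of equally likely outcomes

# Label each pair, e.g. "HT" = heads first, then tails.
outcomes$result <- paste0(outcomes$first, outcomes$second)

# Probability of each result = (pairs with that label) / (all pairs).
probs <- table(outcomes$result) / nrow(outcomes)
probs
```

Because the two `"H"` tickets are treated as distinct rows, the proportions fall out of simple counting, which is the point of representing the coin as a box of equally likely tickets.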
What box would we use if the coin is not a fair coin, but lands heads
An American roulette wheel has 38 pockets8, of which 18 are red, 18 black, and 2 are green. The wheel is spun, and a small ball is thrown on the wheel so that it is equally likely to land in any of the 38 pockets. Players bet on which colored or numbered pocket the ball will come to rest in. If you bet one dollar that the ball will land on red, and it does, you get your dollar back, and you win one more dollar, so your net gain is $1. If it doesn’t, and lands on a black or green number, you lose your dollar, and your net “gain” is -$1.
What is the chance that we will win one dollar on a single spin of the wheel?
Hint: Write out the chance of the ball landing in a red pocket, and not landing in a red pocket.
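Following the hint, the wheel can be treated as a box of 38 equally likely tickets. The sketch below computes the two chances directly and then, as a rough check, simulates many spins; the seed, the number of spins, and the variable names are assumptions made for illustration.

```r
# The wheel as a box of 38 equally likely pockets.
wheel <- c(rep("red", 18), rep("black", 18), rep("green", 2))

# Chance of red, and (by the complement rule) chance of not red.
p_red     <- mean(wheel == "red")  # 18 / 38
p_not_red <- 1 - p_red             # 20 / 38
c(red = p_red, not_red = p_not_red)

# Rough check by simulation: spin many times and bet $1 on red each time.
set.seed(38)
spins    <- sample(wheel, size = 10000, replace = TRUE)
net_gain <- ifelse(spins == "red", 1, -1)

mean(spins == "red")  # should be close to p_red
mean(net_gain)        # average net gain per $1 bet
```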
Our first step toward simulating experiments is introducing randomness in R. The following three functions are a good start.
`sample()`: randomly picks out elements (items) from a vector

Drawing from a box of tickets is easily simulated in R, since there is a convenient function `sample()` that does exactly what we need: draw tickets from a “box” (which needs to be a vector).

- `x`: the vector to be sampled from; this must be specified
- `size`: the number of items to be sampled; the default value is the length of `x`
- `replace`: whether we replace a drawn item before we draw again; the default value is `FALSE`, indicating that we would draw without replacement

Example: one sample of size 2 from a box with tickets from 1 to 6
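A minimal sketch of this example is shown below. The vector name `die` matches the one used later in these notes; `draw` is an illustrative name. No output is shown because the result changes from run to run.

```r
# The "box": tickets numbered 1 to 6.
die <- c(1, 2, 3, 4, 5, 6)

# Draw 2 tickets without replacement (replace = FALSE is the default).
draw <- sample(x = die, size = 2)
draw
```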
What would happen if we don’t specify values for `size` and `replace`?
What would we do differently if we wanted to simulate two rolls of a die?
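One way to explore both questions is sketched below (variable names are illustrative). With the defaults, `size` is the length of `x` and `replace` is `FALSE`, so `sample()` returns all the tickets in a random order. Two rolls of a die can show the same face twice, so simulating them calls for drawing with replacement.

```r
die <- c(1, 2, 3, 4, 5, 6)

# Defaults: size = length(die), replace = FALSE, i.e. a random shuffle of the box.
shuffled <- sample(die)
shuffled

# Two rolls of a die: the same face can come up twice, so draw with replacement.
two_rolls <- sample(die, size = 2, replace = TRUE)
two_rolls
```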
`set.seed()`: returns the random number generator to the point given by the seed number

The random number generator in R is called a “Pseudo Random Number Generator”, because the process can be controlled by a “seed number”. These are algorithmic random number generators, which means that if you provide the same seed (a starting number), R will generate the same sequence of random numbers. This makes it easier to debug your code, and reproduce your results if needed.

- `seed`: the seed number to use. You can use any whole number you like, for example 1 or 31415.

You might have noticed that each time you run `sample` in the code chunk above, it gives you a different sample. Sometimes we want it to give the same sample so that we can check how the code is working without the sample changing each time. We will use the `set.seed` function for this, which ensures that we will get the same random sample each time we run the code.

Example: one sample of size 2 from a box with tickets from 1 to 6
Example: another sample of size 2 from a box with tickets from 1 to 6
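The two examples can be sketched as follows. The seed value 1 and the names `first_sample` and `second_sample` are arbitrary choices for illustration; any seed works, as long as the same one is used both times.

```r
die <- c(1, 2, 3, 4, 5, 6)

# First sample: set the seed, then draw.
set.seed(1)
first_sample <- sample(die, size = 2)
first_sample

# Second sample: reset the generator to the same seed, then draw again.
set.seed(1)
second_sample <- sample(die, size = 2)
second_sample

# The two draws match because the generator started from the same point.
identical(first_sample, second_sample)
```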
Notice that we get the same sample. You can try to run `sample(die)` without using `set.seed()` and see what happens.

Though we used `set.seed()` twice here to demonstrate its purpose, generally, you will only need to run `set.seed()` once per document. This is a line of code that fits perfectly at the beginning of your work, when you are also loading libraries and packages.
`seq()`: creates a sequence of numbers

Above, we created the vector `die` using `die <- c(1, 2, 3, 4, 5, 6)`, which is fine, but this method would be tedious if we wanted to simulate a 20-sided die, for instance. The function `seq()` allows us to create any sequence we like by specifying the starting number, how we want to increment the numbers, and either the ending number or the length of the sequence we want.

- `from`: where to start
- `by`: size of jump from number to number (the increment)

You can end a sequence in one of two ways:

- `to`: at what number should the sequence end
- `length`: how long should the sequence be
Example: sequence with the to argument
odds_1 <- seq(from = 1, by = 2, to = 9)
odds_1
[1] 1 3 5 7 9
Example: sequence with the length argument
odds_2 <- seq(from = 1, by = 2, length = 5)
odds_2
[1] 1 3 5 7 9
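As a further illustration, the 20-sided die mentioned earlier could be built with `seq()` and then rolled with `sample()`. This is a sketch; the name `d20` is an assumption made for illustration.

```r
# A 20-sided die: the numbers 1 through 20 in steps of 1.
d20 <- seq(from = 1, to = 20, by = 1)
d20

# One roll of the 20-sided die.
roll <- sample(d20, size = 1)
roll
```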
`sample()` and `replicate()`, and learned another useful function `seq()`
Paper is at https://www.astro.sunysb.edu/fwalter/AST248/why_i_dont_have_a_girlfriend.pdf and a talk by Backus at https://www.youtube.com/watch?v=ClPPSry8bBw
https://berkeleytime.com/grades/0-7077-all-all&1-7077-fall-2022-all
https://www.freep.com/betting/sports/nfl-49ers-vs-chiefs-odds-moneylines-spreads-totals-best-nfl-odds-this-week
This website began as a poll aggregation site, started by the statistician Nate Silver.
The singular is die and the plural is dice. If we use the word “die” without any qualifiers, we will mean a fair, six-sided die.
We call the tickets equally likely when each ticket has the same chance of being drawn. That is, if there are $n$ tickets in the box, each ticket has a chance of $1/n$ of being drawn.
Photo via unsplash.com