Continuous Distributions and Normal Approximations
Connections to boxes, continuous distributions, and a fundamental result
Imagine that a regular patron of a bar has hit the bottle rather hard one evening. When the bar closes for the night, they come out to weave their way home. Home is very near, in fact, just straight down the road. If our inebriate walks straight in the direction of their home, they can be there very soon. The only problem is that they can’t walk straight. Every minute, they move in a random direction: backwards or forwards with equal probability. Where will they be after 30 minutes?
Each time step (say, each minute), they go backwards or forwards with equal probability, so it is as if they are walking on the real line: each minute they take one step forward ($+1$) or one step backward ($-1$), each with probability $1/2$.
Here is a plot of a simulation of our itinerant inebriate’s path for 30 steps.
Note that the graph is spread out to show the number of steps taken, but the walker is just moving up and down the same straight line.
What we have shown here is one possible path our random walker might take and where they might land up after 30 steps. To simulate the path, we used the function sample() with an argument that we haven’t used before: prob. This defines the weights used for sampling. In this case the weights are the same, but they might be different. We will discuss this further in the “Ideas in Code” section. Now, what is the expected value of our Bernoulli-type random variable, the single step $X$ that takes the values $-1$ and $+1$ with equal probability?
Check your answer

$$E(X) = (-1) \cdot \frac{1}{2} + (+1) \cdot \frac{1}{2} = 0.$$

In the code below we are defining a vector x with the values taken by $X$ and a vector px with the corresponding probabilities, and then simulating the walker’s position after 30 steps:
```r
x <- c(-1, 1)     # defining the values taken by X
px <- c(1/2, 1/2) # defining the probabilities of these values
sum(sample(x, size = 30, prob = px, replace = TRUE))
```
For example, the path shown in the figure above traces the walker’s position after each step of one such simulation.
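To see where the walker tends to end up, we can repeat the whole 30-step walk many times and plot a histogram of the final positions. Here is a sketch of one way to do this; the number of repetitions and the seed are arbitrary choices:

```r
set.seed(123)      # arbitrary seed, for reproducibility
x <- c(-1, 1)      # possible steps
px <- c(1/2, 1/2)  # equal probabilities
# final position after 30 steps, repeated 10,000 times
positions <- replicate(10000,
  sum(sample(x, size = 30, prob = px, replace = TRUE)))
hist(positions, breaks = 30) # a bell shape emerges
```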
Well, that’s a nice shape! This is an example of the fundamental result mentioned in the subtitle of this chapter. Now, before we can compute the expected value of the walker’s position after 30 steps, we need to set up a few more ideas.
Independent and identically distributed random variables
- IID random variables
-
If we have $n$ independent random variables $X_1, X_2, \ldots, X_n$ such that they all have the same pmf and the same cdf, we call them independent and identically distributed random variables. We usually use the abbreviation i.i.d. or iid for “independent and identically distributed”.
This is a very important concept that we have already used to compute the expected value and variance of a binomial random variable by writing it as a sum of iid Bernoulli random variables.
A common example is when we toss a coin $n$ times: the results of the $n$ tosses are iid random variables.
Example: Drawing tickets with replacement
Consider a box containing ten tickets marked 0, 0, 1, 1, 1, 2, 3, 3, 4, 4.

Say I draw a ticket at random, note its value, return it to the box, and repeat until I have drawn $n$ tickets. Each draw has the distribution of a single ticket drawn at random from the box, and because the draws are made with replacement, they are independent: the $n$ draws are iid random variables.
The Box Model²
We have seen that we can always represent a discrete random variable using tickets in a box. We define the tickets so that a ticket drawn at random from the box will have the same probability distribution as the random variable. Then it becomes very easy to simulate repeated draws using sample(), setting the argument replace = TRUE. In this way, we can represent a chance process in which some action that defines a random variable is repeated over and over again, as a box of tickets from which we draw tickets with replacement over and over again.
We could do this for any of the examples that we have seen - die rolls, coin tosses, spins of a roulette wheel etc. Once we represent the random variable using a box of tickets, then it is easy to compute the expected value of the random variable - it is just the average of the tickets in the box, and the variance of the random variable is the variance of the tickets in the box. Note that we compute the population variance, so we divide by the number of tickets in the box, not the number minus one.
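For example, here is a quick check for a box representing a fair six-sided die (a sketch; the object name die_box is our own):

```r
die_box <- 1:6                      # one ticket per face
mu <- mean(die_box)                 # expected value of one draw: 3.5
sigma_sq <- mean((die_box - mu)^2)  # population variance: divide by n, not n - 1
c(mu, sigma_sq)
```

Note that R’s built-in var() divides by n - 1, which is why we compute the population variance by hand here.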
Rolling a pair of dice and summing the spots
Suppose we want to represent the rolling of a pair of dice and summing the spots using a box with appropriately marked tickets. Our box could contain 36 tickets, one for each equally likely pair of faces, with each ticket marked with the sum of the spots for that pair: one ticket marked 2, two tickets marked 3, three marked 4, and so on up to one ticket marked 12.

Let $X$ be the value of one ticket drawn at random from this box. Then $X$ has the same distribution as the sum of the spots when we roll a pair of dice.
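One way to build this box in R is to enumerate all 36 equally likely pairs (a sketch; dice_box is our own name):

```r
# 36 tickets, one per pair of faces, each marked with the sum of the spots
dice_box <- as.vector(outer(1:6, 1:6, FUN = "+"))
table(dice_box) # one ticket marked 2, two marked 3, ..., one marked 12
mean(dice_box)  # expected value of the sum: 7
```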
Spinning a Vegas roulette wheel and betting on red
We know that an American roulette wheel has 38 pockets: 18 red, 18 black, and 2 green. Suppose we bet one dollar on red: if the ball lands in a red pocket we win a dollar, and otherwise we lose our dollar. What box of tickets represents our net gain from a single spin?
Check your answer
The box would have 38 tickets: 18 tickets marked $+1$ (one for each red pocket) and 20 tickets marked $-1$ (one for each black or green pocket).
How would you simulate this process (of spinning the wheel some number of times and computing your net gain)? We show one way in the “Ideas in Code” section at the end of this chapter.
We can generalize the ideas in the roulette example to any setup in which we have iid discrete random variables. We need to think about three things:
What are the values of the tickets in the box?
How many tickets of each value?
How many times do we need to draw?
We are usually interested in the sum or the average of the draws from the box, which is what we study next.
Sums and averages of random variables
Sums
Suppose we make $n$ draws at random with replacement from a box of tickets, and let $X_1, X_2, \ldots, X_n$ be the values drawn.

Suppose we sum the drawn tickets. We denote the sum of the $n$ draws by

$$S_n = X_1 + X_2 + \cdots + X_n.$$
Let’s simulate this by letting $n = 25$ and drawing from a box with tickets marked 0, 0, 1, 1, 1, 2, 3, 3, 4, 4:
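One way to do this in R is sketched below (the seed matches the one used for the repeated draws further down):

```r
set.seed(12345)
box <- c(0,0,1,1,1,2,3,3,4,4) # the tickets in the box
paste("The sum of 25 draws is",
      sum(sample(x = box, size = 25, replace = TRUE)))
```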
[1] "The sum of 25 draws is 56"
Now we will repeat this process (of drawing 25 tickets with replacement and summing them) 10 times (this is where replicate() is used):
```r
set.seed(12345)
box <- c(0,0,1,1,1,2,3,3,4,4)
# or you could use box <- c(rep(0, 2), rep(1, 3), 2, rep(3, 2), rep(4, 2))
replicate(n = 10, expr = sum(sample(x = box, size = 25, replace = TRUE)))
```
[1] 56 50 64 51 49 41 64 44 45 51
You can see that the sum varies from repetition to repetition: the sample sum is itself a random variable.
Since we know the distribution of the draws $X_1, X_2, \ldots, X_n$ (each has the distribution of a single ticket drawn at random from the box), we can work out the expected value and standard deviation of the sum.

What are $E(S_n)$ and $SD(S_n)$? Let $\mu = E(X_1)$ and $\sigma = SD(X_1)$. By the linearity of expectation,

$$E(S_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = n\mu.$$

(Note that you could also have just computed the average of the tickets in the box, since that is exactly $\mu$; we are just using the fact that the expectation of a sum is the sum of the expectations.)

Since the $X_i$ are independent, the variance of the sum is the sum of the variances:

$$Var(S_n) = n\sigma^2, \qquad SD(S_n) = \sqrt{n}\,\sigma.$$

We can see that the expectation and variance of the sum scale with $n$, so the standard deviation of the sum scales with $\sqrt{n}$.
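We can check these formulas against a simulation (a sketch; the number of repetitions is an arbitrary choice):

```r
set.seed(12345)
box <- c(0,0,1,1,1,2,3,3,4,4)
mu <- mean(box)                   # average of the tickets
sigma <- sqrt(mean((box - mu)^2)) # population SD of the tickets
sums <- replicate(10000, sum(sample(x = box, size = 25, replace = TRUE)))
c(theory = 25 * mu, simulation = mean(sums))        # E(S_25) = n * mu
c(theory = sqrt(25) * sigma, simulation = sd(sums)) # SD(S_25) = sqrt(n) * sigma
```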
- Square root law for sums of iid random variables
-
The standard deviation of the sum of $n$ iid random variables $X_1, X_2, \ldots, X_n$ is given by:

$$SD(S_n) = \sqrt{n} \cdot SD(X_1)$$
Since all the $X_i$ have the same distribution, they all have the same standard deviation, so we can state the law in terms of $SD(X_1)$.
Example: The drunkard’s walk
Back to our inebriated random walker who is trying to walk home. Recall that at each time step, the walker moves forward or backward with equal probability, so each step $X_i$ takes the values $+1$ and $-1$ with probability $1/2$ each. A single step has expected value $0$ and standard deviation $1$, so after $30$ steps the walker’s position $S_{30}$ satisfies

$$E(S_{30}) = 0, \qquad SD(S_{30}) = \sqrt{30} \times 1 \approx 5.5.$$
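A quick simulation confirms this (a sketch; the seed and the number of repetitions are arbitrary):

```r
set.seed(30)
final_positions <- replicate(10000,
  sum(sample(x = c(-1, 1), size = 30, replace = TRUE)))
# the simulated mean is near 0 and the simulated SD is near sqrt(30) = 5.48
c(mean(final_positions), sd(final_positions), sqrt(30))
```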
Averages
We denote the average of the random variables $X_1, X_2, \ldots, X_n$ by

$$\bar{X}_n = \frac{S_n}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

By the linearity of expectation, $E(\bar{X}_n) = E(S_n)/n = n\mu/n = \mu$.

This means that the expected value of an average does not scale as $n$ grows: it stays at $\mu$, the average of the tickets in the box.

Therefore $Var(\bar{X}_n) = Var(S_n)/n^2 = \sigma^2/n$, so that

$$SD(\bar{X}_n) = \frac{\sigma}{\sqrt{n}}.$$

Note that, just like the sample sum $S_n$, the sample average $\bar{X}_n$ is a random variable, with its own expected value and standard deviation.
- Square root law for averages of iid random variables
-
The standard deviation of the average of $n$ iid random variables is given by:

$$SD(\bar{X}_n) = \frac{SD(X_1)}{\sqrt{n}}$$
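We can watch this law at work in a simulation (a sketch; the sample sizes and the repetition count are arbitrary choices):

```r
set.seed(2)
box <- c(0,0,1,1,1,2,3,3,4,4)
sigma <- sqrt(mean((box - mean(box))^2)) # population SD of the box
for (n in c(25, 100, 400)) {
  avgs <- replicate(10000, mean(sample(x = box, size = n, replace = TRUE)))
  cat("n =", n, " simulated SD:", round(sd(avgs), 3),
      " sigma/sqrt(n):", round(sigma / sqrt(n), 3), "\n")
}
```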
- Standard error
-
Since $S_n$ and $\bar{X}_n$ are numbers computed from the sample $X_1, X_2, \ldots, X_n$, they are called statistics. We use the term standard error to denote the standard deviation of a statistic, to distinguish it from the standard deviation of a random variable that does not arise as a statistic computed from a sample. We will see more about these statistics later in the course.
Example: Probability distributions for sums and averages
Let’s go back to the box of colored tickets, draw from this box $n$ times with replacement, and look at the simulated distributions of the sample sum and the sample mean for $n = 25$ and $n = 100$.
What do we notice in these figures? First, we see the bell shape again! Note that the black line is the expected value. We see that the center of the distribution of the sample sum grows as the sample size increases (look at the x-axis), but this does not happen for the distribution of the sample mean. You can also see that the spread of the distribution of the sample sum is much greater when $n = 100$, but again this does not happen for the distribution of the sample mean. Now, note that these are density histograms:
- Density histograms
-
These are histograms for which the total area of all the bars adds up to 1. The vertical axis is called density, and the units are such that the height of each bar times its width gives the proportion of the values that fall in the corresponding interval of the histogram.
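In R, hist() draws a density histogram when we set freq = FALSE. For example, a sketch using the box from earlier:

```r
set.seed(3)
box <- c(0,0,1,1,1,2,3,3,4,4)
sums <- replicate(10000, sum(sample(x = box, size = 100, replace = TRUE)))
hist(sums, freq = FALSE, # freq = FALSE makes the total bar area equal 1
     main = "Density histogram of the sample sum, n = 100")
```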
We see that we have moved from bar graphs to histograms, which is what we need when we consider random variables that are not restricted to taking particular values. These are called continuous random variables.
Continuous distributions
So far, we have talked about discrete distributions, and the probability mass functions for such distributions. Consider instead a random variable that can take any value in a given interval. Recall that we call such random variables continuous. In this situation, we cannot think about discrete bits of probability mass sitting on particular numbers; rather, we imagine that our total probability mass of 1 is spread smoothly along the interval, and we describe how it is spread using a density function.
Probability density function of a distribution
- Probability density function
-
This is a function $f$ that satisfies two conditions:

- it is non-negative ($f(x) \ge 0$ for all $x$), and
- the total area under the curve of $f$ is 1. That is,

$$\int_{-\infty}^{\infty} f(x)\, dx = 1.$$
If $X$ is a continuous random variable with density $f$, then probabilities for $X$ are given by areas under the curve of $f$.
- The probability that a continuous random variable lies in an interval
-
The probability that $X$ is in the interval $(a, b)$ is given by

$$P(a < X < b) = \int_a^b f(x)\, dx.$$
Note that because a single point will not add any area, we have that $P(X = a) = 0$ for every $a$, and therefore $P(a \le X \le b) = P(a < X < b)$.
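We can evaluate such areas numerically with R’s integrate() function. Here is a sketch with a made-up density, $f(x) = 2x$ on $[0, 1]$:

```r
f <- function(x) 2 * x                 # a density that puts more mass near 1
integrate(f, lower = 0, upper = 1)     # total area is 1, so f is a density
integrate(f, lower = 0.2, upper = 0.5) # P(0.2 < X < 0.5) = 0.25 - 0.04 = 0.21
```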
Just as we did for discrete random variables, we define the cumulative distribution function for a continuous distribution.
Cumulative distribution function
- Cumulative distribution function (cdf)
-
The cdf is defined the same way as for discrete random variables, $F(x) = P(X \le x)$, except that we now have an integral instead of a sum.

The difference is in how we compute $F(x)$: instead of summing the pmf, we integrate the density,

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt.$$
Special distributions
Just as in the case of discrete distributions, we have many special named continuous distributions. We are going to mention two of them here. The first is a very easy distribution to think about, with a simple rectangular geometry: the uniform distribution. The second has a shape that has already shown up in examples in these notes: its probability density function is bell-shaped. This is the Normal distribution, the most ubiquitous distribution in statistics, whose bell-shaped density curve is used in many disciplines.
Let’s consider the uniform distribution first.
The Uniform(0,1) distribution
Let $X$ be a random variable with the Uniform(0,1) distribution. Its density function is flat over the unit interval:

$$f(x) = 1 \text{ for } 0 \le x \le 1, \qquad f(x) = 0 \text{ otherwise}.$$

Because the density is constant at height 1, the area under the curve over any interval inside $(0, 1)$ is a rectangle whose area is simply the length of that interval.

If you know that $X$ is Uniform(0,1), computing a probability for $X$ therefore reduces to computing the length of an interval.
Example: cdf for the Uniform(0,1) distribution
Let $X$ have the Uniform(0,1) distribution. What is its cdf $F(x) = P(X \le x)$?

Check your answer

For $0 \le x \le 1$, $F(x)$ is the area under the density between 0 and $x$, which is $x$. So

$$F(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1. \end{cases}$$
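You can check this with punif(), one of the functions described in the “Ideas in Code” section below:

```r
punif(q = c(0.25, 0.5, 0.9)) # F(x) = x on [0, 1]: returns 0.25, 0.50, 0.90
```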
The Normal distribution
The Normal distribution, which is also called the Gaussian distribution (after the great 19th-century German mathematician Carl Friedrich Gauss³), describes a continuous random variable that has a density function with a familiar bell shape⁴.
If a random variable $X$ follows the normal distribution, its density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}},$$

where $\mu$ is the mean (expected value) of $X$ and $\sigma$ is its standard deviation.
We can calculate the probability of any event related to $X$ by finding the corresponding area under its density curve. For a standard normal random variable (one with $\mu = 0$ and $\sigma = 1$), the following areas are worth remembering:
| Interval | Area under the normal curve |
|---|---|
| Between -1 and 1 | 0.68 |
| Between -2 and 2 | 0.95 |
| Between -3 and 3 | 0.997 |
So if we know a particular distribution is similar in shape to the normal distribution, we can calculate the probability that the random variable falls within a particular interval.
The Central Limit Theorem
The normal curve is enormously useful because many data distributions are similar to the normal curve, so we can use the areas under the normal curve to approximate the areas of the data distributions to figure out proportions. The reason so many data distributions approximately follow a normal distribution is one of the most fundamental results in statistics, called the Central Limit Theorem. This astounding result says that sums (and averages) of independent and identically distributed random variables will follow an approximately normal distribution (after transforming them to standard units) as the sample size grows large. If we restate this in terms of our box model, this theorem says that, for a large enough number of draws $n$ made at random with replacement from a box, the probability distribution of the sum (or average) of the draws will be approximately normal, no matter what tickets are in the box.
Since many useful statistics can be written as the sum or mean of iid random variables, this is a very important and useful theorem, and we will use it extensively for inference in the next unit.
Recall our hammered homeseeker from the beginning of this chapter, and the bell-shaped distribution of their position after half an hour. Since their position can be written as a sum of all their steps (which were iid random variables taking the values $-1$ and $+1$ with equal probability), the Central Limit Theorem tells us that the distribution of their position will be approximately normal.
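Here is a sketch of this claim in R: we simulate the walker’s position after 30 steps many times, convert to standard units using the expected value ($0$) and SD ($\sqrt{30}$) computed earlier, and overlay the standard normal curve:

```r
set.seed(4)
pos <- replicate(10000, sum(sample(x = c(-1, 1), size = 30, replace = TRUE)))
z <- pos / sqrt(30)                # standard units: (pos - 0) / sqrt(30)
hist(z, freq = FALSE, breaks = 30) # density histogram of standardized positions
curve(dnorm(x), add = TRUE)        # the standard normal density follows the bars
```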
Ideas in code
Useful functions
Uniform
dunif

computes the density $f(x)$ of the Uniform(min, max) distribution, where $f(x) = \frac{1}{max - min}$ for $min \le x \le max$, and $0$ otherwise.

- Arguments:
  - x: the value of $x$ at which the density is evaluated
  - min: the lower end point of the interval for $X$. The default value is min = 0
  - max: the upper end point of the interval for $X$. The default value is max = 1

punif

computes the cdf $F(q) = P(X \le q)$ of the Uniform(min, max) distribution.

- Arguments:
  - q: the value of $q$ in $F(q) = P(X \le q)$
  - min: the lower end point of the interval for $X$. The default value is min = 0
  - max: the upper end point of the interval for $X$. The default value is max = 1

runif

generates random numbers from the Uniform(min, max) distribution.

- Arguments:
  - n: the size of the sample we want
  - min: the lower end point of the interval for $X$. The default value is min = 0
  - max: the upper end point of the interval for $X$. The default value is max = 1
Normal
dnorm

computes the density $f(x)$ of the Normal(mean, sd) distribution.

- Arguments:
  - x: the value of $x$ at which the density is evaluated
  - mean: the parameter $\mu$, the mean of the distribution. The default value is mean = 0
  - sd: the parameter $\sigma$, the sd of the distribution. The default value is sd = 1

pnorm

computes the cdf $F(q) = P(X \le q)$ of the Normal(mean, sd) distribution.

- Arguments:
  - q: the value of $q$ in $F(q) = P(X \le q)$
  - mean: the parameter $\mu$, the mean of the distribution. The default value is mean = 0
  - sd: the parameter $\sigma$, the sd of the distribution. The default value is sd = 1

rnorm

generates random numbers from the Normal(mean, sd) distribution.

- Arguments:
  - n: the size of the sample we want
  - mean: the parameter $\mu$, the mean of the distribution. The default value is mean = 0
  - sd: the parameter $\sigma$, the sd of the distribution. The default value is sd = 1
Example
Let’s verify the empirical rule for the standard normal random variable. Note that (for example) pnorm(q = 1) - pnorm(q = -1) computes $P(-1 \le Z \le 1)$ for a standard normal random variable $Z$:
```r
pnorm(q = 1) - pnorm(q = -1)
```

[1] 0.6826895

```r
pnorm(q = 2) - pnorm(q = -2)
```

[1] 0.9544997

```r
pnorm(q = 3) - pnorm(q = -3)
```

[1] 0.9973002
The argument prob in the function sample()

We have seen the function sample() before, but so far we have only used it when we were sampling uniformly at random, that is, when all the values are equally likely. We can sample according to a weighted probability, though, by passing a vector of probabilities to the argument prob. Let’s look at the example of the net gain while betting on red on a roulette spin. Recall that if we bet a dollar on red, then our net gain is $+1$ with probability $18/38$ and $-1$ with probability $20/38$:
```r
gain <- c(1, -1)                  # define the gain for a single spin
prob_gain <- c(18/38, 20/38)      # define the corresponding probabilities
exp_gain <- sum(gain * prob_gain) # expected gain from a single spin
exp_gain
```
[1] -0.05263158
```r
set.seed(123)
# simulate gain from 10 spins of the wheel
sample(x = gain, size = 10, prob = prob_gain, replace = TRUE)
```
[1] -1 1 -1 1 1 -1 1 1 1 -1
```r
# simulate net gain from 10 spins of the wheel, which sums these draws
sum(sample(x = gain, size = 10, prob = prob_gain, replace = TRUE))
```
[1] 0
Here is a simulation showing the Central Limit Theorem at work, with the empirical distribution of the net gain becoming gradually more bell-shaped as the number of spins grows. The net gain is the sum of draws from gain defined above, sampled using the prob_gain vector.
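A sketch of such a simulation (the spin counts and the repetition number are arbitrary choices): for each $n$, we simulate the net gain from $n$ spins many times, convert to standard units, and draw a density histogram.

```r
set.seed(5)
mu <- sum(gain * prob_gain)                   # expected gain per spin
sigma <- sqrt(sum((gain - mu)^2 * prob_gain)) # SD of the gain per spin
for (n in c(10, 100, 1000)) {
  net <- replicate(10000,
    sum(sample(x = gain, size = n, prob = prob_gain, replace = TRUE)))
  hist((net - n * mu) / (sqrt(n) * sigma), freq = FALSE, breaks = 30,
       main = paste("Net gain in standard units, n =", n))
}
```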
Footnotes
The box model was introduced by Freedman, Pisani, and Purves in their textbook Statistics↩︎
Another instance where Abraham De Moivre was the first person to discover a distribution, but it was named after someone else. You can read about De Moivre at https://mathshistory.st-andrews.ac.uk/Biographies/De_Moivre/.↩︎
For more about normally distributed random variables and their properties, read the Wikipedia article: https://en.wikipedia.org/wiki/Normal_distribution↩︎