In this report we are looking specifically at the sampling distribution of a proportion. Sex is either male or female (by X and Y chromosomes). It is spread about evenly through the population, although we observe slightly more women. So I generated a random sample of 10,000 individuals. Let's look at the distribution now:
As you can see, with a proportion, you either have a trait or you don't. For instance, "are you male?" This leads to two modes in the population, yes = 1 and no = 0. However, when we take samples and average the yeses and noes, we see a familiar pattern:
With a sample size of 10, we see the normal distribution starting to form as the CLT manifests in the sampling distribution. Now we see that the standard deviation has gone from .5 in the population to about .15 in the sampling distribution. Why?
Well, as we saw before, the variation decreases as the sample size increases. The formula is a bit different, though. We calculate it with:
s.d.(p̂) = √(p(1 − p)/n), which in this case is s.d.(p̂) = √(.50(1 − .50)/10) = √(.25/10) = √(.025) = .158
which is very close to the observed value in the above sampling distribution.
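The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `sd_of_proportion` is made up for illustration and is not part of the original analysis.

```python
import math


def sd_of_proportion(p: float, n: int) -> float:
    """Theoretical s.d. of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# p = .50 and n = 10, as in the worked example above
print(round(sd_of_proportion(0.50, 10), 3))  # 0.158
```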
If I increase the sample size to 50, we see the following:
where the expected s.d. = √(.50(1 − .50)/50) = √(.25/50) = √(.005) = .071
Lastly, we increase the sample size to 100, with the following result:
where the expected s.d. = √(.50(1 − .50)/100) = √(.25/100) = √(.0025) = .05
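For readers who want to reproduce the comparison between observed and expected standard deviations, here is a rough simulation sketch in Python. It is not the tool used for the figures in this report. The seed, the 5,000 repeated samples per sample size, sampling with replacement, and the helper name `simulate_sample_proportions` are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_sample_proportions(population, n, num_samples, rng):
    """Draw num_samples samples of size n (with replacement) and
    return the proportion of 1s in each sample."""
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(num_samples)]


rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable

# Hypothetical population of 10,000 individuals: 1 = yes, 0 = no, p near .50
population = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]
p = statistics.fmean(population)

results = {}
for n in (10, 50, 100):
    props = simulate_sample_proportions(population, n, 5_000, rng)
    observed = statistics.pstdev(props)      # s.d. of the simulated proportions
    expected = math.sqrt(p * (1 - p) / n)    # s.d. predicted by the formula
    results[n] = (observed, expected)
    print(f"n={n:>3}  observed s.d.={observed:.3f}  expected s.d.={expected:.3f}")
```

The exact observed values depend on the random draws, but they should land close to the expected .158, .071, and .05.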
So the Central Limit Theorem is working, and the deviations are changing according to s.d.(p̂) = √(p(1 − p)/n), and this makes us happy!
NOTE: p(1 − p) = p − p^2, so we have a square root of a squared term, just like with means.
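To spell out the connection with means, here is a short derivation sketch. It assumes each individual is coded as X = 1 (yes) or X = 0 (no) with P(X = 1) = p, and that the draws are independent. The symbols X and σ are introduced here for illustration.

```latex
\begin{align*}
E[X] &= p, \qquad E[X^2] = p \quad (\text{since } 0^2 = 0 \text{ and } 1^2 = 1) \\
\operatorname{Var}(X) &= E[X^2] - (E[X])^2 = p - p^2 = p(1 - p) \\
\sigma &= \sqrt{p(1 - p)} \\
\text{s.d.}(\hat{p}) &= \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{p(1 - p)}{n}}
\end{align*}
```

With p = .50 this gives σ = √(.25) = .5, the population standard deviation seen earlier, and dividing by √n is the same step used for the standard deviation of a sample mean.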