Performing a randomization test for two proportions

This tutorial covers the basics of using a StatCrunch applet to conduct a randomization test for two proportions.
To begin, load the Top US Problems data set, which will be used throughout this tutorial. This data set comes from a Gallup survey conducted in July and August of 2014 that asked 945 Republicans and 854 Democrats to name the biggest problem for the United States.
The Party column contains the respondent's party affiliation, either Republican or Democrat. The Response column contains the top problem identified by the respondent.
Only the top four responses are tabulated here: Immigration, Dysfunctional Government, Economy and Unemployment. The remaining responses are listed as Other. In the United States as a whole, is there a significant difference between the proportion of Republicans and the proportion of Democrats who think the economy is the biggest problem?

Comparing the two proportions

To compute the sample proportion of each party who felt the economy was the biggest issue, choose Stat > Tables > Frequency. Select the Response column and then specify Party as the Group by column. Click Compute! to view separate frequency tables of the responses for each party as shown below. Of the 945 Republicans surveyed, 161 named the economy as the biggest issue, so the sample proportion is the relative frequency 161/945 = 0.1704. Of the 854 Democrats surveyed, 111 named the economy as the biggest issue, so the sample proportion is the relative frequency 111/854 = 0.1300. The observed difference between the sample proportions (Republican - Democrat) is then approximately 0.1704 - 0.1300 = 0.0404. A large difference between the sample proportions, either positive or negative, would provide evidence that the proportions holding this view differ between the two party populations. For the observed difference of 0.0404 to be statistically significant, it must be unusually large compared to what one would expect if the proportion of each party who think the economy is the biggest issue were actually the same. The goal of the randomization approach is to quantify how likely a difference this large between the sample proportions would be if the population proportions were equal.
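
The arithmetic behind these proportions can be checked with a few lines of Python. This is only a sketch run outside StatCrunch; the counts are the ones read from the frequency tables described above.

```python
# Counts read from the Stat > Tables > Frequency output described above.
rep_total, rep_economy = 945, 161
dem_total, dem_economy = 854, 111

p_rep = rep_economy / rep_total   # sample proportion, Republicans
p_dem = dem_economy / dem_total   # sample proportion, Democrats
observed_diff = p_rep - p_dem     # Republican - Democrat

print(f"Republican proportion: {p_rep:.4f}")
print(f"Democrat proportion:   {p_dem:.4f}")
print(f"Observed difference:   {observed_diff:.4f}")
```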

Constructing the randomization applet

The scenario in which party affiliation has no impact on the proportion can be simulated in StatCrunch using the Applets > Resampling > Randomization test for two proportions menu option. Under Sample 1 in, select the Response column. In the corresponding Where input field, enter Party=Republican (or use the Build button) to limit the respondents in the first sample to those who affiliate with the Republican party. Type this expression exactly as shown, because such expressions are case sensitive. Under Sample 2 in, select the Response column. In the corresponding Where input field, enter Party=Democrat to limit the respondents in the second sample to those who affiliate with the Democratic party. The Success input defines the outcome of interest. In this case, set it to Economy to focus on the proportion of each party who thought the economy was the biggest problem facing the United States. Click Compute! to construct the applet as shown below.
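
For readers who want to see what the Where and Success inputs accomplish, here is a minimal Python sketch of the same filtering. It is not how StatCrunch is implemented. Because the tutorial only reports the Economy counts, the rows below are a hypothetical reconstruction in which every non-Economy answer is stored as "Not Economy"; the helper names sample_where and success_proportion are illustrative.

```python
# Hypothetical reconstruction of the data: only "Economy" vs. everything else
# matters for this test, so all other answers are stored as "Not Economy".
rows = (
    [("Republican", "Economy")] * 161
    + [("Republican", "Not Economy")] * (945 - 161)
    + [("Democrat", "Economy")] * 111
    + [("Democrat", "Not Economy")] * (854 - 111)
)

def sample_where(rows, party):
    """Return Response values where Party == party (exact, case-sensitive match)."""
    return [response for p, response in rows if p == party]

def success_proportion(sample, success):
    """Proportion of responses in the sample equal to the Success value."""
    return sum(r == success for r in sample) / len(sample)

sample1 = sample_where(rows, "Republican")   # like Where: Party=Republican
sample2 = sample_where(rows, "Democrat")     # like Where: Party=Democrat
success = "Economy"

print(len(sample1), len(sample2))
print(success_proportion(sample1, success) - success_proportion(sample2, success))
print(len(sample_where(rows, "republican")))  # wrong case matches no rows
```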

Understanding the randomization process

With the randomization approach, the goal is to understand how large a difference between the two sample proportions you are likely to see if political party has no impact on views about the economy. To see how this works, begin by clicking the 1 time button at the top of the applet. A new window will appear listing all of the responses along with a Party column that has been randomly shuffled. The shuffling of the Party column is animated only if the total sample size is less than 200; this data set has 945 + 854 = 1,799 responses, so no animation is shown here. The idea is to randomly reassign the party labels to the responses, simulating a case where political party has no impact on the response. The new window also shows the difference between the proportions of "Economy" responses for the two parties in the shuffled data. After the random reassignment is complete, this difference is "dropped" into the graph in the original applet. Note that your results won't match the following screenshots, because a different random number seed is used every time.
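
A single press of the 1 time button can be sketched in Python as follows, assuming the same hypothetical reconstruction of the data (Economy versus "Not Economy"). The function one_randomization is an illustrative name, not a StatCrunch feature. Only the party labels are shuffled, so each group keeps its original size (945 and 854) and the total number of Economy responses (272) stays fixed; what changes is which party each response is attached to.

```python
import random

# Hypothetical reconstruction: non-Economy answers are collapsed to "Not Economy".
rows = (
    [("Republican", "Economy")] * 161
    + [("Republican", "Not Economy")] * (945 - 161)
    + [("Democrat", "Economy")] * 111
    + [("Democrat", "Not Economy")] * (854 - 111)
)

def one_randomization(rows, success="Economy", rng=random):
    """Shuffle the Party column once; return the shuffled difference (Republican - Democrat)."""
    parties = [party for party, _ in rows]
    responses = [response for _, response in rows]
    rng.shuffle(parties)  # responses stay in place; party labels are reassigned at random
    shuffled = list(zip(parties, responses))
    rep = [r for p, r in shuffled if p == "Republican"]
    dem = [r for p, r in shuffled if p == "Democrat"]
    p_rep = sum(r == success for r in rep) / len(rep)
    p_dem = sum(r == success for r in dem) / len(dem)
    return p_rep - p_dem

print(one_randomization(rows, rng=random.Random(1)))  # seeded only for reproducibility
```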

Graphing the randomization results

If a randomized difference in sample proportions is at least as large in magnitude (absolute value) as the observed difference, it is displayed in red; otherwise, it is displayed in gray. In this case, any difference of 0.040393 or larger in magnitude is colored red. The difference of 0.002498 from the first randomization shown above is smaller in magnitude, so it is shown in gray. Clicking the 5 times button in the applet repeats the process of shuffling and recomputing the difference of sample proportions five more times in an animated fashion. The screenshot below shows the results after five additional randomizations have been added to the applet. One of the six total randomizations was more extreme in magnitude than the observed value (below -0.040393). The number and proportion of the randomizations falling into each of these regions are also tabulated above the graph. The Runs table to the left lists the individual randomizations, color-coded in the same fashion. The results of an individual randomization can be inspected by clicking its number in the Runs table. Clicking a bar in the graph displays a listing of all associated randomizations, from which an individual randomization may also be selected for inspection.
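
The red/gray coloring rule amounts to a comparison of absolute values. The sketch below assumes that a tie with the observed value counts as red, consistent with "0.040393 or larger in magnitude"; the small tolerance tol is an added assumption that guards against floating-point round-off. The names classify and tally are illustrative, and apart from 0.002498, the differences in the example are made-up values.

```python
def classify(diff, observed, tol=1e-12):
    """Return "red" if diff is at least as extreme in magnitude as observed, else "gray"."""
    # tol is an assumption: it keeps exact ties from being lost to round-off
    return "red" if abs(diff) >= abs(observed) - tol else "gray"

def tally(diffs, observed):
    """Count and proportion of randomized differences in each color region."""
    counts = {"red": 0, "gray": 0}
    for d in diffs:
        counts[classify(d, observed)] += 1
    return {color: (n, n / len(diffs)) for color, n in counts.items()}

observed = 161 / 945 - 111 / 854

print(classify(0.002498, observed))  # gray: the first randomization from the text
# -0.045 and 0.012 are hypothetical values, not results from the tutorial
print(tally([0.002498, -0.045, 0.012], observed))
```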

Ramping up the number of randomizations

After one understands the randomization process, pressing the 1000 times button will repeat the shuffling/recomputing process one thousand times very quickly. This builds a clearer picture of the distribution of the difference between sample proportions if party has no impact on response. Clicking this button repeatedly produces an increasingly detailed picture of this distribution under the no-impact scenario. The screenshot below shows the distribution after the 1000 times button has been clicked ten consecutive times, bringing the total number of randomizations in the applet to 10,006 (10 × 1,000 plus the 6 from before). As one might expect, the randomized differences between the proportion of Republicans and the proportion of Democrats who think the economy is the primary issue are centered around zero, since the randomization approach simulates the scenario where political party has no impact on choice of primary issue.
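
Repeating the shuffle thousands of times can be sketched as below. To keep it fast, this version pools the 272 Economy responses and 1,527 other responses as 1s and 0s, shuffles them, and treats the first 945 positions as Republicans; this is equivalent to shuffling the Party column. The function name and seed are illustrative assumptions, and the run size is kept at 2,000 so it finishes quickly; raising it toward 10,000 mirrors the applet more closely. Exact results will differ from run to run, just as they do in the applet.

```python
import random
import statistics

N_REP, N_DEM = 945, 854      # group sizes stay fixed under shuffling
SUCCESSES = 161 + 111        # total "Economy" responses across both parties

def randomization_distribution(n_reps, seed=None):
    """Return n_reps shuffled differences (Republican - Democrat) in Economy proportions."""
    rng = random.Random(seed)
    # 1 = "Economy", 0 = any other response, pooled across both parties
    outcomes = [1] * SUCCESSES + [0] * (N_REP + N_DEM - SUCCESSES)
    diffs = []
    for _ in range(n_reps):
        rng.shuffle(outcomes)              # random reassignment of party labels
        rep_hits = sum(outcomes[:N_REP])   # first 945 positions act as "Republican"
        dem_hits = SUCCESSES - rep_hits    # remaining Economy responses go to "Democrat"
        diffs.append(rep_hits / N_REP - dem_hits / N_DEM)
    return diffs

diffs = randomization_distribution(2_000, seed=2014)  # raise toward 10_000 to mirror the applet
print(f"mean of randomized differences: {statistics.mean(diffs):.4f}")
print(f"standard deviation: {statistics.stdev(diffs):.4f}")
```

The mean should land very close to zero, matching the centering seen in the applet's graph.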

Interpreting the randomization results

The randomization approach applied above shows the types of values one would expect to see for the difference between proportions if party affiliation really has no impact on a respondent's likelihood of naming the economy as the biggest issue. The proportion of randomized differences that are at least as extreme in magnitude as 0.040393 quantifies how unusual the observed difference is in the proper context; this proportion is an estimate of the p-value of the test. If this proportion is very small, then the observed difference is unusually large under the scenario where political party has no impact, which would provide strong evidence that political affiliation is related to opinions about the economy. If, on the other hand, this proportion is not very small, then the observed difference is not that unusual under the no-impact scenario, and there is little evidence of a difference between the proportion of all Republicans and the proportion of all Democrats rating the economy as most important. The results in this case show that only 191 of the 10,006 total randomizations were as extreme as or more extreme than the observed difference in proportions. This works out to a proportion of 191/10,006 = 0.0191, or about 1.91%. In other words, there is about a 1.91% chance of a difference in sample proportions at least as extreme as the observed difference if political party really has no impact overall. As chances go, 1.91% is relatively low, implying that political party affiliation probably does have an impact in this case.
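
The final proportion is simple arithmetic, and the same calculation can be applied to any list of randomized differences. The helper randomization_p_value is a hypothetical name; like the applet, it counts differences that are at least as extreme in magnitude as the observed one, which makes the comparison two-sided. The tolerance tol is an added assumption for floating-point ties.

```python
def randomization_p_value(diffs, observed, tol=1e-12):
    """Return (count, proportion) of diffs at least as extreme in magnitude as observed."""
    extreme = sum(abs(d) >= abs(observed) - tol for d in diffs)
    return extreme, extreme / len(diffs)

# Arithmetic from the tutorial's run: 191 extreme results out of 10,006
print(f"{191 / 10_006:.4f}")  # 0.0191
```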

Adding results to data table

In certain situations, it is convenient to use other StatCrunch routines to analyze the randomization results. To do so, click the Analyze button at the top of the applet. All of the randomized differences between proportions will be added to a new column in the data table, which can be used for subsequent analysis.
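
As one example of subsequent analysis, the sketch below summarizes a column of randomized differences using Python's statistics module rather than StatCrunch. It assumes the new column has been exported or copied into a plain Python list; the function summarize and the sample values are hypothetical.

```python
import statistics

def summarize(diffs):
    """Basic summary statistics for a column of randomized differences."""
    q1, median, q3 = statistics.quantiles(diffs, n=4)  # quartiles (Python 3.8+)
    return {
        "n": len(diffs),
        "mean": statistics.mean(diffs),
        "std dev": statistics.stdev(diffs),
        "Q1": q1,
        "median": median,
        "Q3": q3,
    }

# Hypothetical values standing in for the exported column
print(summarize([-0.02, -0.01, 0.0, 0.01, 0.02]))
```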

Pearson