Creating a contingency table from raw data

This tutorial covers the steps for creating a contingency table, also called a two-way frequency table, in StatCrunch. To begin, load the Two Categorical Variables data set, which will be used throughout this tutorial. This toy data set contains only two columns of data. The var1 column contains 10 total values with the value b in the first four rows and the value a in the last six rows. The var2 column contains six c values and four d values. Split and stacked bar plots can be used to summarize the association between the data in these two columns. In this case, the six a values are paired with four c values and two d values. The four b values are paired with two c values and two d values. This data set is in raw form in that it is not already summarized in the form of a two-way table showing the frequencies of each pairing. See ... for working with summary data from a two-way table.

Creating a basic contingency table

To create a contingency table of the data in the var1 column cross-classified with the data in the var2 column, choose the Stat > Tables > Contingency > With Data menu option. Select var1 as the Row variable, choose var2 as the Column variable, and click Compute!. The resulting contingency table below shows the individual unique values for each column in the first row and first column of the table. The remaining cells in the table show the frequency (count) for each variable pairing as well as the row-wise totals, column-wise totals and the total number of pairs. The output also shows the results of a standard Chi-Square test for independence. Since the cell counts are quite low in this case, a warning message is displayed below the test results. A better alternative for this data will be discussed below.
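For readers who want to check the counts outside StatCrunch, the following is a minimal Python sketch of the same cross-classification. It is not how StatCrunch computes its output. The row-by-row order of var2 is an assumption (the tutorial gives only the pair counts), and the function names contingency_table and chi_square_statistic are hypothetical helpers introduced for illustration. The statistic computed is Pearson's Chi-Square without any continuity correction.

```python
from collections import Counter

# var1: four b values followed by six a values, as described in the tutorial.
# The exact ordering of var2 is assumed; only the pair counts come from the text.
var1 = ["b"] * 4 + ["a"] * 6
var2 = ["c", "c", "d", "d"] + ["c", "c", "c", "c", "d", "d"]


def contingency_table(rows, cols):
    """Return (row_labels, col_labels, counts), where counts[r][c] is a frequency."""
    pair_counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    counts = {r: {c: pair_counts[(r, c)] for c in col_labels} for r in row_labels}
    return row_labels, col_labels, counts


def chi_square_statistic(row_labels, col_labels, counts):
    """Pearson Chi-Square statistic and the smallest expected cell count."""
    row_totals = {r: sum(counts[r].values()) for r in row_labels}
    col_totals = {c: sum(counts[r][c] for r in row_labels) for c in col_labels}
    n = sum(row_totals.values())
    stat = 0.0
    min_expected = float("inf")
    for r in row_labels:
        for c in col_labels:
            # Expected count under independence: row total * column total / n
            expected = row_totals[r] * col_totals[c] / n
            min_expected = min(min_expected, expected)
            stat += (counts[r][c] - expected) ** 2 / expected
    return stat, min_expected


row_labels, col_labels, counts = contingency_table(var1, var2)
chi_square, min_expected = chi_square_statistic(row_labels, col_labels, counts)

for r in row_labels:
    print(r, [counts[r][c] for c in col_labels], "total:", sum(counts[r].values()))
print("Chi-Square statistic:", round(chi_square, 4))
print("Smallest expected count:", round(min_expected, 2))
```

The smallest expected count here is 1.6, well under the common rule of thumb of 5, which is consistent with the low-count warning described above. The exact criterion StatCrunch uses for its warning is not stated in this tutorial.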

Displaying more information in the table cells

StatCrunch allows additional information to be added to the table cells that contain the frequencies of the variable pairings. The statistics that can be added include the Row percent, Column percent and Percent of total. For this example, in the window containing the resulting contingency table above, choose Options > Edit to reopen the contingency table dialog window. In the Display options, select the Row percent option and click Compute!. The resulting table below now shows that two-thirds (66.67%) of the a values are paired with c values while one-third (33.33%) are paired with d values. The table also shows a 50-50 split between c and d pairings for b values.
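The row percent is simply each cell count divided by its row total. As a rough check of the figures quoted above, here is a short Python sketch. The ordering of var2 is again an assumption, and row_percent is a hypothetical helper name, not a StatCrunch function.

```python
from collections import Counter

# Same toy data as in the tutorial; the ordering of var2 is assumed.
var1 = ["b"] * 4 + ["a"] * 6
var2 = ["c", "c", "d", "d"] + ["c", "c", "c", "c", "d", "d"]

pair_counts = Counter(zip(var1, var2))  # frequency of each (var1, var2) pairing
row_totals = Counter(var1)              # total count for each var1 value


def row_percent(row_value, col_value):
    """Percent of the rows with var1 == row_value that have var2 == col_value."""
    return 100 * pair_counts[(row_value, col_value)] / row_totals[row_value]


for r in sorted(row_totals):
    for c in sorted(set(var2)):
        print(f"{r}, {c}: {row_percent(r, c):.2f}%")
```

Column percent and Percent of total follow the same pattern, dividing by the column total or by the overall number of pairs instead of the row total.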

Computing different tests and confidence intervals

As mentioned above, the default Chi-Square test is not appropriate with this data due to the small cell counts. StatCrunch offers a number of tests which can be computed from the contingency table output including Fisher's exact test for independence, McNemar's test for marginal homogeneity, Cramer's V test for association, and the Mantel-Haenszel test. Note that some of these calculations are restricted to two-by-two tables. In the window containing the resulting contingency table above, choose Options > Edit to reopen the contingency table dialog window. Under Hypothesis Tests, deselect the default Chi-Square test for independence and select Fisher's exact test for independence (2x2 only). Click Compute!. The resulting output below shows the new results for the test selected. Note that StatCrunch will also compute confidence intervals for selected statistics. The list of standard statistics includes Lambda, Uncertainty coefficient, Kappa, Gamma, Somers' d, Kendall's tau-b, Kendall's tau-c, Relative risk, and Odds ratio.
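To illustrate what Fisher's exact test does with a two-by-two table, the sketch below computes a two-sided p-value from the hypergeometric distribution with the row and column totals held fixed. It uses one common two-sided convention (summing the probabilities of all tables no more likely than the observed one); whether StatCrunch uses exactly this convention is an assumption. The function name fisher_exact_two_sided is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_observed = prob(a)
    # Sum over all tables that are no more likely than the observed table.
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Counts from the tutorial: row a = (4 c, 2 d), row b = (2 c, 2 d)
p_value = fisher_exact_two_sided(4, 2, 2, 2)
print("Fisher exact two-sided p-value:", p_value)
```

For this data the observed table is the most likely one given its totals, so every feasible table is counted and the p-value is essentially 1, giving no evidence against independence.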
