Bubble plots

This example covers the basics of creating a bubble plot in StatCrunch. The resulting bubble plot visualizes the impact of the 2009 swine flu outbreak on flu testing in the United States. To begin, load the Weekly influenza tests reported to the CDC from 1997 through 2014 data set into StatCrunch. Each row of the data set is a weekly CDC report giving the year, the week, the total number of specimens reported for that week, the number of specimens that tested positive for the flu, and the percentage of specimens that tested positive for the flu. For this example, a bubble plot will be created to compare the weekly reports for 2008 (the year before the outbreak) and 2009 (the year of the outbreak). The plot will show the percent of positive specimens versus week for both years, with the points sized by the total number of specimens reported for that week.

To construct the plot, choose the Graph > Bubble plot menu option. Select WEEK for the X variable, PERCENT_POSITIVE for the Y variable and TOTAL_SPECIMENS for the Size variable. To restrict the plot to only the data for 2008 and 2009, specify a Where expression of YEAR = 2008 OR YEAR = 2009. To color-code the data for each year, choose YEAR for the Group by variable. Click Compute! to generate the bubble plot shown below.

The plot has a number of interesting details that show the impact of the outbreak. The 2008 data (shown in blue) represent a typical year of flu testing in the United States. In 2008, the percent of positive tests peaks over the winter months and declines over the summer months. The data for 2009 (shown in red) follow a similar pattern over the winter months, but there is a rapid increase in the percent of positive tests beginning in the spring of 2009 (week 17), corresponding to the original swine flu outbreak in the U.S. There is another extreme rise in the percent of positive tests in the fall of 2009 (peaking in week 42), corresponding to the second wave of the outbreak.

Comparing the size of the points in 2008 and 2009 also shows that the number of tests being conducted increased greatly over the period of the outbreak. In particular, the plot shows a dramatic increase in the number of tests as the first wave of the epidemic began between weeks 16 and 17 of 2009. The data table in the screenshot below shows 4,219 tests in week 16 of 2009, whereas the number of tests in week 17 of the same year increased almost ninefold to 36,203. This is reflected in the plot by a point for week 17 of 2009 that is almost nine times larger than the point for week 16 of 2009. Check out the painting/annotating graphs example to see how to highlight this extreme rise in the number of specimens tested.
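
For readers who want to check the arithmetic behind the jump between weeks 16 and 17 of 2009, the following Python sketch may help. It is not part of StatCrunch. The function names keep_row and fold_increase are hypothetical, chosen only for illustration, and the two totals are the ones quoted in the text. The diameter calculation assumes that bubble area is proportional to the Size variable, which this example does not confirm.

```python
import math

# Totals quoted in the text for weeks 16 and 17 of 2009.
TOTAL_WEEK_16 = 4219
TOTAL_WEEK_17 = 36203


def keep_row(year: int) -> bool:
    """Python analogue of the Where expression YEAR = 2008 OR YEAR = 2009."""
    return year == 2008 or year == 2009


def fold_increase(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


fold = fold_increase(TOTAL_WEEK_16, TOTAL_WEEK_17)

# Assumption (not confirmed by the source): bubble AREA is proportional to
# the Size variable, so the diameter grows with the square root of the fold.
diameter_ratio = math.sqrt(fold)

print(f"fold increase in specimens: {fold:.2f}")
print(f"diameter ratio under the area assumption: {diameter_ratio:.2f}")
```

The fold increase comes out to roughly 8.58, consistent with the "almost ninefold" description. If the area assumption holds, the week 17 point would be a little under three times as wide as the week 16 point.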
