Introduction
Group #1 designed a survey to learn about the consumption of energy drinks and caffeinated beverages. The population we sampled was adults 21 to 50 years of age. We attempted to obtain a random sample. However, in our discussion group, members stated that they chose people they worked with, neighbors, or people they met at a store or on the street, for example. These methods do not meet the definition of a random sample, in which every member of the population has an equal chance of being selected, so I will call it a convenience sample.
We asked the following four questions:
1) Do you drink some kind of energy or caffeinated beverage? Y/N
2) Which one of the below beverages do you consume the most?
a. Coffee
b. Tea
c. Energy drink
d. Pop/soda
e. Other
3) On an average day, how many of these types of beverages do you consume?
4) On average, how many days a week do you consume these types of beverages? (1-7)
Looking at a Categorical Variable
The responses to the question “Which one of the below beverages do you consume the most? a. Coffee, b. Tea, c. Energy drink, d. Pop/soda, e. Other” are shown in the pie chart below.
The pie chart shows that a little more than half of those surveyed (53.75%) consumed coffee the most. The share who primarily consumed pop/soda (16.25%) was close to the share who drank tea (14.37%). The “other” group made up 9.38% of those surveyed, and only 6.25% of this age group drank energy drinks the most. I found it interesting that, as fashionable and heavily marketed as energy drinks are, they were the least common choice in this survey.
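Each slice of the pie chart is a category's count divided by the 160 respondents. As a quick check, the sketch below uses counts back-calculated from the reported percentages (86, 26, 23, 15, and 10 respondents). These counts are an inference from the percentages and N = 160, not numbers taken from the raw survey data.

```python
# Counts back-calculated from the reported percentages and N = 160
# (assumption: the raw survey counts are not listed in this report).
counts = {
    "Coffee": 86,
    "Pop/soda": 26,
    "Tea": 23,
    "Other": 15,
    "Energy drink": 10,
}

n = sum(counts.values())  # total number of respondents

# Each category's share of the sample, as a percentage
percentages = {drink: 100 * count / n for drink, count in counts.items()}

for drink, pct in percentages.items():
    print(f"{drink:<13} {pct:6.2f}%")
```

If these inferred counts are right, tea works out to exactly 14.375%, which the chart reports as 14.37%.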
We can break down the type of energy/caffeinated beverage by whether the respondent said they consume these types of drinks (question 1), as shown in the bar plot below.
The bar plot clearly illustrates how much more often coffee was chosen than the other options on the survey. Over 50% drink coffee the most, compared with a combined total of approximately 30% for the next two choices, tea and pop/soda. Interestingly, there are two “other” bars in the graph: one for those who said “yes” they consume these beverages and one for those who said “no.” The bar plot also shows that three respondents who answered “no” chose tea, which suggests they do not consider tea an energy/caffeinated beverage. One interpretation is that “other” meant something different to the “no” responders than to the “yes” responders. The “yes” responders are drinking some energy or caffeinated beverage that was not listed, while the “no” responders are drinking something other than energy or caffeinated beverages, such as water or juice. The exception is the three respondents who chose tea while answering “no”: either they did not think of tea as a caffeinated/energy beverage, or they were drinking decaffeinated tea.
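The grouped bar plot is essentially a two-way table of counts: for each answer to question 1 (yes/no), how many respondents chose each beverage in question 2. A minimal sketch with Python's `collections.Counter`, using a few hypothetical responses (not our survey data), shows how those grouped counts are tallied.

```python
from collections import Counter

# Hypothetical (question 1 answer, question 2 choice) pairs, for illustration only
responses = [
    ("Yes", "Coffee"), ("Yes", "Coffee"), ("Yes", "Tea"),
    ("Yes", "Other"), ("No", "Tea"), ("No", "Other"), ("No", "Other"),
]

# Count each (answer, beverage) combination; each count is the height of one bar
table = Counter(responses)

for answer in ("Yes", "No"):
    row = {drink: count for (ans, drink), count in table.items() if ans == answer}
    print(answer, row)
```

A combination that never occurs simply has a count of zero, which is why a beverage can have a bar in the “yes” group but none in the “no” group.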
Looking at a Numerical Variable
The responses to the question “On an average day, how many of these types of beverages do you consume?” are shown in the histogram, boxplot, and summary statistics below.
Summary statistics:

Summary statistics for Average consumed daily, grouped by: Do you drink energy or caffeinated beverages?

The histogram is skewed right, with three adjacent bars of similar height. The 1-2 bar, with a frequency of 46, is the modal class, but the 1-2 (46), 2-3 (42), and 3-4 (41) bars all have high frequencies among our 160 respondents; only 5 respondents separate the 1-2 bar from the 3-4 bar. At first glance this looks close to multimodal, but because the three tall bars sit next to each other, they form one broad peak rather than separate modes. There are gaps between 7-8 and 9-10 beverages per day, and four outliers beyond seven beverages per day. Gaps can suggest that the data come from two different populations. Because our first question was a yes/no question about drinking these beverages, and our bar plot clearly showed that even the “no” responders reported a beverage intake, I made another histogram of only the “yes” responders. The two histograms were almost identical and had the same gaps. That tells me the gaps, and the possible outliers, come from the “yes” responders, not the “no” responders.
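The “yes only” check described above amounts to filtering the responses, tabulating them, and looking for empty bins between the smallest and largest values. The sketch below assumes a bin width of one beverage per day and uses hypothetical records, not our survey data.

```python
from collections import Counter

# Hypothetical records: (drinks caffeinated beverages?, beverages per day)
records = [("Yes", 1), ("Yes", 2), ("Yes", 2), ("Yes", 3), ("Yes", 4),
           ("Yes", 7), ("Yes", 10), ("No", 0), ("No", 1)]

# Keep only the "yes" responders, as in the second histogram
yes_values = [x for answer, x in records if answer == "Yes"]

# Frequency of each whole-number value (bin width of 1 beverage per day)
freq = Counter(yes_values)

# Gaps: values between the minimum and maximum that nobody reported
gaps = [v for v in range(min(yes_values), max(yes_values) + 1) if freq[v] == 0]
print(gaps)
```

If the gaps found in the filtered data match those in the full data, the gaps come from the “yes” group, which is the reasoning used above.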
The mean was 2.4 beverages per day, and the median was 2 beverages per day. The mean is not a resistant measure and can be pulled toward outliers, but the median is resistant, and it is reasonably close to the mean. This suggests the outliers are not distorting the center much, and that a typical person in our study consumes about 2 of these beverages a day.
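The difference between a resistant and a non-resistant measure is easy to see with a small example. The daily counts below are hypothetical, chosen only to show how a single extreme value moves the mean but not the median; the example uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical daily beverage counts with no extreme values
typical = [1, 1, 2, 2, 2, 3, 3]

# The same data after adding one extreme value
with_outlier = typical + [10]

print(mean(typical), median(typical))            # center without the outlier
print(mean(with_outlier), median(with_outlier))  # the mean shifts, the median does not
```

Adding one value of 10 raises the mean by a full beverage while the median stays at 2, which is why a median close to the mean is reassuring.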
The range is 10 beverages per day, with a minimum of 0 and a maximum of 10. The range is not a reliable measure of variation because it depends only on the two most extreme values, and our histogram and boxplot show outliers that inflate it. I also computed summary statistics separately for the “yes” and “no” responses, and the “yes” group's results were similar to those of the whole group. For the N = 160 adults 21 to 50 years of age, the variance is 2.5 (beverages per day)² and the standard deviation is 1.5 beverages per day. Because the variance squares each deviation from the mean, outliers carry disproportionate weight; we have outliers, so the variance is not a reliable value here. The IQR, which is the spread of the middle half of the data, is 2 beverages per day; it happens to equal the median. The range rule of thumb does not work well for this data set: it estimates the standard deviation as range/4 = 10/4 = 2.5, while the actual standard deviation is 1.5. The outliers and skewed data make the range rule of thumb unreliable. The right skew is also reflected in the mean (2.4) being greater than the median (2).
With a standard deviation of 1.5 beverages per day, the data values are not very spread out from the mean.
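The range rule of thumb comparison can be checked directly with the summary values reported above. The snippet only restates the report's own numbers; it shows that the rule overestimates the standard deviation because the range is set by the maximum of 10, one of the outliers.

```python
# Summary values reported above for all N = 160 respondents
minimum, maximum = 0, 10
reported_sd = 1.5

# Range rule of thumb: standard deviation is roughly range / 4
data_range = maximum - minimum
estimated_sd = data_range / 4

print(f"Range rule estimate: {estimated_sd}, actual: {reported_sd}")
print(f"Overestimate factor: {estimated_sd / reported_sd:.2f}")
```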
The boxplot shows the outliers. I made a comparison boxplot of the “yes” and “no” responders and confirmed that the outliers were in the “yes” category. Boxplots do not show the full shape of a distribution (for example, the gaps visible in the histogram), but they do show the quartiles, which divide the data into quarters. The IQR was 2, with Q1 = 1 and Q3 = 3, so the middle half of the responses falls between 1 and 3 beverages per day. Because quartiles are influenced less by outliers and skewness than the mean and standard deviation, this is a dependable summary of the spread. I think it is reasonable to conclude that a typical 21- to 50-year-old adult in our sample consumes about two energy/caffeinated beverages a day.
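Many statistics packages flag boxplot outliers with the 1.5 × IQR rule. Assuming our software used that common convention (the report does not say), the quartiles above give the fences below. The helper `is_outlier` is a hypothetical name used only for this illustration.

```python
# Quartiles reported for the daily-consumption data
q1, q3 = 1, 3
iqr = q3 - q1

# Common 1.5 x IQR convention for drawing boxplot outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(value):
    """Return True if a daily count falls outside the fences (hypothetical helper)."""
    return value < lower_fence or value > upper_fence

print(iqr, lower_fence, upper_fence)
print([v for v in range(0, 11) if is_outlier(v)])
```

With Q1 = 1 and Q3 = 3, the lower fence is below zero, so only large daily counts can be flagged, which fits outliers appearing only at the high end of the boxplot.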
Looking for a Relationship between Two Numerical Variables
If we plot the responses to the two questions “On an average day, how many of these types of beverages do you consume?” and “On average, how many days a week do you consume these types of beverages? (1-7)” on a scatterplot, we can see whether they are correlated.
The scatterplot shows a gradual positive trend: as the x variable increases, the y variable tends to increase along with it. The points aren’t widely scattered, and there is a heavy concentration along the bottom half of the graph. Two outliers stand out: (7 days, 8 per day) and (7 days, 10 per day).
The correlation coefficient for the paired data is r = 0.397, as shown in the output below.
Correlation between Number of drinks per day and Number of days a week is: 0.3972807 
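The correlation coefficient above comes from our statistics software. For reference, a minimal sketch of the Pearson correlation formula is shown below; the function name `pearson_r` and the data pairs are hypothetical and are not our survey responses.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical pairs: (drinks per day, days per week), not our survey data
per_day = [1, 2, 2, 3, 1, 4]
days_per_week = [3, 7, 5, 7, 7, 6]
print(round(pearson_r(per_day, days_per_week), 3))
```

The result always lies between -1 and 1, and its sign matches the direction of the trend in the scatterplot.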
Using the extended table of critical values of the correlation coefficient (the extended table from Module 3: Descriptive Report Outline), we see that the absolute value of r, 0.397, is greater than the critical value of 0.155 for n = 160. That allows us to conclude that there is a statistically significant positive correlation between the average number of energy/caffeinated beverages consumed per day and the average number of days per week people consume them in this sample. However, statistical significance does not mean the association is strong: r = 0.397 indicates a moderate positive association, and the heavy concentration of points on our scatterplot does not by itself make the relationship strong.
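Critical values of r like the 0.155 in the table can be derived from the t distribution with n - 2 degrees of freedom, using r_crit = t / sqrt(t² + (n - 2)). The sketch below hardcodes an approximate two-tailed t critical value of 1.975 for α = 0.05 and df = 158; that value is an assumption read from a t table, not something stated in this report.

```python
from math import sqrt

n = 160          # sample size
df = n - 2       # degrees of freedom for testing a correlation
t_crit = 1.975   # assumed: approx. two-tailed t critical value, alpha = 0.05, df = 158
r = 0.3972807    # correlation reported by the software above

# Convert the t critical value into a critical value for r
r_crit = t_crit / sqrt(t_crit ** 2 + df)

print(f"critical r = {r_crit:.3f}")
print(f"significant: {abs(r) > r_crit}")
print(f"r squared = {r ** 2:.3f}")
```

The squared correlation, about 0.158, means only about 16% of the variation in one variable is explained by a linear relationship with the other, which is consistent with describing the association as moderate rather than strong.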