Group 1 Data
Generated Feb 24, 2019 by mcquadec11
Introduction
Group 1 and I designed a survey to examine the exercise and health lifestyles of our friends and colleagues. Our population was American adults. We did not collect a random sample for our survey data. Instead, we used convenience sampling, surveying our friends via Facebook posts, emails, and text messages. The survey results are also subject to voluntary response bias, as participants were able to choose whether or not to respond to our survey.
In our survey, we asked the following questions:
1. How many days in a week do you do at least 30 minutes of moderate physical activity (an activity that increases heart rate, increases respiration/breathing rate, increases sweating and/or causes muscle fatigue)?
2. How many hours in a day do you spend sedentary/sitting (ex: eating, commuting, working, watching tv, etc)?
3. Are you male or female?
4. How would you rate your overall health: Poor, Good, Very Good, Excellent
Looking at a Categorical Variable
The responses to the question "How many hours in a day do you spend sedentary/sitting (ex: eating, commuting, working, watching tv, etc)?" are shown in the pie chart below.
Result 1: Pie Chart of Hours per Day Spent Sedentary
Based on the pie chart of hours per day spent sedentary, a majority of the adults surveyed (82.5% of responses) reported spending approximately 3 to 10 hours per day sedentary. The number of respondents reporting more than 10 sedentary hours per day decreased steadily, with a maximum of 20 hours per day. Because of the wording of question 2, some respondents may have been unsure whether to count sleeping hours as sedentary. Our group did not intend to include sleeping hours, so respondents who counted them would have reported a higher number of sedentary hours per day.
Next, we will examine a bar plot to determine if hours per day spent sedentary differs among subjects based on how many days per week subjects spend physically active (for at least 30 minutes).
Result 2: Bar Plot of Daily Sedentary Hours Based on Subjects' Number of Weekly Active Days
Subjects who reported being physically active five or more days per week were less likely to report a high number of sedentary hours per day. Variability was lowest among subjects who reported being physically active all seven days of the week: they most often reported six sedentary hours per day, with a range of three to eight hours. In contrast, subjects who reported being physically active one to four days per week showed the greatest variability in daily sedentary hours, ranging from three to twenty hours.
Looking at a Numerical Variable
The responses to the question "How many days in a week do you do at least 30 minutes of moderate physical activity (an activity that increases heart rate, increases respiration/breathing rate, increases sweating and/or causes muscle fatigue)?" are shown in the histogram, boxplot, and summary statistics below.
Result 3: Histogram of Days per Week Spent Engaging in Moderate Activity
Result 4: Boxplot of Days per Week Spent Engaging in Moderate Activity
Result 5: Summary stats Days per Week Spent Engaging in Moderate Activity
Column  n    Mean       Variance   Std. dev.  Median  Range  Min  Max  Q1  Q3  IQR  Mode
var1    165  4.1575758  4.1091648  2.0271075  4       7      0    7    3   6   3    5
Based on the histogram, the data appear slightly skewed to the left, with the tail of the distribution longer on the left-hand side than on the right-hand side. However, the data do not exactly fit a left-skewed pattern. In left-skewed data the mean is typically less than the median, which is not true here: the mean is 4.16 and the median is 4. Also, in left-skewed data the median is typically closer to the third quartile than to the first quartile, which does not hold either, as the median (4) is closer to Q1 (3) than to Q3 (6). The center of a skewed data set is best described by the median; however, since these data are not truly skewed, either the median (4) or the mean (4.16) can be used to describe the center.
The data are spread from 0 to 7 days, with respondents covering all eight possible outcomes. The IQR (Q3 - Q1, or 6 - 3) is 3. The IQR describes the range of the middle half of the data, which tells us that at least half of respondents are physically active between 3 and 6 days per week. The range rule of thumb is reasonable for these data: range/4 (7/4) is 1.75 and the actual standard deviation is 2.03, so the estimate is a good approximation, as both are approximately 2. Based on the boxplot, there are no outliers in our data, which helps make the range rule of thumb more accurate, since the range is strongly affected by outliers.
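The arithmetic behind these statements can be re-checked from the values reported in Result 5. The sketch below is only an illustration: it uses the summary statistics rather than the raw responses, and the 1.5 x IQR fence rule for flagging outliers is an assumption on our part (it is a common boxplot convention, but the report does not say which rule the software applied).

```python
import math

# Summary statistics as reported in Result 5 (var1 = active days per week).
mean, median = 4.1575758, 4
variance = 4.1091648
minimum, maximum = 0, 7
q1, q3 = 3, 6

std_dev = math.sqrt(variance)           # standard deviation from the variance
data_range = maximum - minimum          # 7 - 0
range_rule_estimate = data_range / 4    # range rule of thumb: s is roughly range / 4
iqr = q3 - q1                           # spread of the middle half of the data

# Assumed outlier rule: values beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
has_outliers = minimum < lower_fence or maximum > upper_fence

# Informal skew checks used in the discussion above.
mean_below_median = mean < median                   # expected for left skew
median_nearer_q3 = (q3 - median) < (median - q1)    # expected for left skew

print(f"std dev = {std_dev:.2f}, range/4 = {range_rule_estimate:.2f}")
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"outliers by fence rule: {has_outliers}")
print(f"mean < median: {mean_below_median}, median nearer Q3: {median_nearer_q3}")
```

Under that assumed rule, every possible response from 0 to 7 falls inside the fences, which is consistent with the boxplot showing no outliers.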
Looking for a Relationship Between Two Numerical Variables
To determine whether or not there is a relationship between the responses to the questions "How many days in a week do you do at least 30 minutes of moderate physical activity (an activity that increases heart rate, increases respiration/breathing rate, increases sweating and/or causes muscle fatigue)?" and "How many hours in a day do you spend sedentary/sitting (ex: eating, commuting, working, watching tv, etc)?" we look at the scatter plot of the paired data.
Result 6: Scatter Plot of Active Days vs. Sedentary Hours Daily
The scatter plot shows a great deal of scatter. A few outliers are obvious in the scatter plot: (1 day, 20 hours), (1 day, 22 hours), and (4 days, 20 hours). Outliers in a data set can have a substantial impact on the correlation coefficient.
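To illustrate how much a single outlier can move the correlation coefficient, here is a minimal sketch. The numbers in it are made up for illustration and are NOT our survey responses, and the helper name pearson_r is our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (not the survey data).
active_days = [1, 2, 3, 4, 5, 6, 7]
sedentary_hours = [9, 8, 8, 7, 6, 6, 5]

r_without = pearson_r(active_days, sedentary_hours)

# Add one hypothetical outlier: a very active subject reporting 20 sedentary hours.
r_with = pearson_r(active_days + [6], sedentary_hours + [20])

print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

In this toy example a single unusual point is enough to turn a strong negative correlation into one close to zero, which is why outliers deserve attention before r is interpreted.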
The correlation coefficient for the paired data is -0.3076, as shown below.
Result 7: Correlation of hours/days of
Correlation between var1 and var2 is:
-0.30760913
The absolute value of the correlation coefficient, r, is 0.3076. This value is greater than the critical value of 0.196 from Table A5 in the textbook. Therefore, we can conclude that there is a statistically significant correlation between the number of active days per week and the number of hours per day spent sedentary. However, the scatter plot shows that the association between these two variables is not strong.
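The comparison against the critical value, and the sense in which the association is weak, can be sketched as follows. The sample size of 165 pairs is an assumption (Result 5 reports n = 165 for var1 only), and the t statistic is a standard alternative check rather than the table-based method used above.

```python
import math

r = -0.30760913      # correlation from Result 7, negative as stated in the text
critical_r = 0.196   # critical value quoted from the textbook table
n = 165              # ASSUMPTION: every respondent answered both questions

# Table-based decision used in the report: compare |r| to the critical value.
significant = abs(r) > critical_r

# Share of the variation in one variable explained by a linear fit to the other.
r_squared = r ** 2

# Equivalent test statistic for H0: no linear correlation, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"|r| = {abs(r):.4f} > {critical_r}: {significant}")
print(f"r^2 = {r_squared:.3f}")
print(f"t = {t_stat:.2f} with {n - 2} degrees of freedom")
```

An r squared below 0.10 means the linear relationship accounts for less than a tenth of the variation, which fits the observation that the correlation is statistically significant but not strong.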