I. Introduction
A. The purpose of this survey was to discover whether there is a correlation between having health insurance and the frequency of doctor visits. We also wanted to see how often families sit together for meals and how often they eat fast food. The data below were collected via a StatCrunch survey distributed on Facebook to friends and family. This was a convenience sample and a voluntary response sample, since everyone could choose whether to answer the survey questions.
B. Survey Questions
1. Do you have health insurance? (yes/no)
2. How many times a year do you go to the doctor?
3. In a typical week, how many times do you get fast food?
4. In a typical week, how often do you sit together as a family?
II. Looking at a Categorical Variable
A. In a typical week, how often do you sit together as a family?
The pie chart below shows that more than half of the people surveyed sat together as a family often. Only 8.59% of people surveyed answered "never," which was a much smaller percentage than I had expected.
B. Below is a bar plot of whether or not someone has health insurance against the number of times they sit together as a family. The majority of people responded "yes" to health insurance, with a relative frequency of 6.7% for "yes" and 1.2% for "no." Those who responded "no" to health insurance mostly answered that they eat together "often," and those who responded "yes" mostly answered "sometimes."
III. Looking at a Numerical Variable
A. For this next set of data I am using the survey question "In a typical week, how many times do you get fast food?"
B. Below is the histogram, box plot, and summary statistics of the above variable.
Summary statistics:

C. The distribution shown in the histogram is skewed to the right. The mean of this variable is 2.17 and the median is 1. The midrange of the data is 6. The mode is 0 and the graph is unimodal. The range is 12 and the IQR is 3. The variance is 7.92 and the standard deviation is 2.81.
D. The mean is 2.17 but the median is 1. The median is the middle value when the responses are listed from smallest to largest, with 50% of the data below it and 50% above, so it is not affected by how large the biggest responses are. The mean is affected, so the few high responses pull it above the median, which is what we expect when a histogram is skewed to the right. Responses of 0 and 1 make up the majority of the data, so those bars have much higher frequencies than the bars for larger responses.
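As a small illustration of why a right tail pulls the mean above the median, here is a sketch in Python. The list of responses is made up for this example and is not the survey data; it only imitates the same shape (mostly 0s and 1s with a few large values).

```python
from statistics import mean, median

# Hypothetical weekly fast food counts (NOT the survey data):
# mostly 0s and 1s, with a few large values forming a right tail.
sample = [0, 0, 0, 0, 1, 1, 1, 2, 3, 5, 12]

sample_mean = mean(sample)      # pulled upward by the large values
sample_median = median(sample)  # middle value of the sorted list

print(f"mean = {sample_mean:.2f}, median = {sample_median}")
```

In this made-up list the single response of 12 raises the mean noticeably, while the median stays at the middle value.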
E. To calculate the range, we subtract the lowest value from the highest value. In this case, 12 - 0 = 12, so the range for this data is 12. Using the range rule of thumb (standard deviation is roughly range / 4), we can estimate the standard deviation as 12 / 4 = 3. This is a good approximation, as it is almost exactly what was calculated in StatCrunch (2.8).
F. The boxplot shows that there are some outliers in the data we polled. The majority of people surveyed responded with a number between 0 and 5. There were, however, some responses greater than 8. I don't believe that these were in error, though, as they aren't completely impossible answers. Had they been much higher, I would have thought they were in error.
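The range rule arithmetic can be checked with a short Python sketch. The function name `range_rule_estimate` is just a label chosen for this example. The values 0 and 12 are the lowest and highest responses, and 7.92 is the variance from the summary statistics.

```python
import math

def range_rule_estimate(low, high):
    """Estimate the standard deviation as (high - low) / 4."""
    return (high - low) / 4

lowest, highest = 0, 12     # extremes of the fast food responses
reported_variance = 7.92    # variance from the summary statistics

estimate = range_rule_estimate(lowest, highest)  # range rule estimate
from_variance = math.sqrt(reported_variance)     # square root of the variance

print(f"range rule: {estimate:.2f}, from variance: {from_variance:.2f}")
```

The two values differ by less than 0.2, which supports calling the range rule a good approximation here.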
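One common convention flags a value as an outlier when it falls more than 1.5 times the IQR below the first quartile or above the third quartile. The sketch below shows that calculation in Python. The quartiles are not listed in this report, so Q1 = 0 and Q3 = 3 are assumed values, chosen only because they are consistent with the reported IQR of 3. The function name `outlier_fences` is also just a label for this example.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) fences using the 1.5 * IQR convention."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# ASSUMED quartiles (not reported above); they only match the IQR of 3.
assumed_q1, assumed_q3 = 0, 3

lower_fence, upper_fence = outlier_fences(assumed_q1, assumed_q3)
print(f"lower fence = {lower_fence}, upper fence = {upper_fence}")
```

Under these assumed quartiles, any response above the upper fence would be drawn as an outlier, which would be in line with the high responses seen in the boxplot.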
IV. Looking for a Relationship between Two Numerical Variables
A.
Above is the scatter plot for the amount of fast food eaten per week and the number of times people went to the doctor in a year. There is no real pattern in the graph: the data are bunched together with no definitive line.
B. Below is the correlation coefficient. With a sample size of 163, the correlation coefficient would need to be greater than 0.154 in absolute value to be statistically significant. Since it is only 0.088, this correlation is not statistically significant.
Correlation between Doctor and fast food is: 0.088426164 
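The comparison can also be sketched in Python. The code below takes the sample size, the correlation, and the cutoff of 0.154 as given in this report. It also converts both to t statistics with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2), which is an equivalent way to make the same comparison. The function name `t_from_r` is just a label for this example.

```python
import math

def t_from_r(r, n):
    """Convert a correlation r from a sample of size n to a t statistic."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 163              # sample size
r = 0.088426164      # correlation between Doctor and fast food
r_critical = 0.154   # cutoff stated in this report

t_observed = t_from_r(r, n)           # t statistic for the observed r
t_critical = t_from_r(r_critical, n)  # t statistic matching the cutoff
is_significant = abs(r) > r_critical

print(f"t observed = {t_observed:.3f}, t at cutoff = {t_critical:.3f}")
print("significant" if is_significant else "not significant")
```

Both versions of the comparison agree: the observed value falls short of the cutoff.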