Introduction
My group and I designed a survey to learn about the experiences of people who went on diets in 2013, particularly those who chose a specific type of diet. To narrow that down further, the population we chose consisted of dieters who used online websites and diet groups to assist with their diet program. We did not obtain a random sample for this survey. Instead, some of us polled friends and acquaintances, and some of us posted our survey on several web-based diet forums. Therefore, we have a convenience sample with a voluntary response aspect, and it is unknown what proportion of the people who saw the survey chose to respond. After our polling ended, we combined our results for a total of 180 responses to our survey.
We asked the following questions:
1. What is your gender?
2. What is your age in years?
3. How much weight did you lose in 2013 (in pounds, whole number)?
4. What weight loss plan did you follow (Weight Watchers, Atkins, Medifast, Jenny Craig, Nutrisystem, South Beach Diet, or other)?
Looking at a Categorical Variable
The responses to the question, "What weight loss plan did you follow?" are shown in the pie chart below.
The largest group of dieters (38.33%) did not follow any of the programs on the list. The next largest groups followed Weight Watchers (22.78%), Atkins (14.44%), Medifast (12.22%), and South Beach Diet (5%), with only 3.89% following Jenny Craig and 3.33% following Nutrisystem.
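As a quick consistency check, the reported percentages can be converted back into respondent counts. The sketch below assumes that all 180 respondents answered this question; the counts it prints are inferred from the rounded percentages, not taken from the raw survey data.

```python
# Pie-chart percentages as reported in the text.
reported_pct = {
    "Other": 38.33,
    "Weight Watchers": 22.78,
    "Atkins": 14.44,
    "Medifast": 12.22,
    "South Beach Diet": 5.00,
    "Jenny Craig": 3.89,
    "Nutrisystem": 3.33,
}

# Assumption: every one of the 180 respondents answered this question.
TOTAL_RESPONSES = 180

# Convert each percentage to the nearest whole number of respondents.
implied_counts = {
    plan: round(pct / 100 * TOTAL_RESPONSES)
    for plan, pct in reported_pct.items()
}

for plan, count in implied_counts.items():
    print(f"{plan}: about {count} respondents")
print("Total:", sum(implied_counts.values()))
```

If the implied counts add up to the full sample size, the percentages are at least internally consistent with 180 responses.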
In order to see if the total amount of weight lost by the dieters was obviously different depending on which weight loss program they followed, we can examine the bar plot below:
In this Bar Plot of Pounds Lost by Program, it becomes clear that the most common amount of weight lost was 0 to 50 pounds, regardless of which program was followed. There were some people in every program who lost a substantial amount of weight, 50 to 100 pounds, and a very large amount of weight (150 to 200 pounds) was lost by a very small number of respondents in the Medifast program. Of note, a few respondents actually gained weight, as denoted by the blue bar, but it was beyond the scope of this survey to identify the reasons why (e.g., dropping out of the program).
Looking at a Numerical Variable
The responses to the question, "How much weight did you lose in 2013?" are shown in the histogram, boxplots, and summary statistics below:
This histogram shows a right-skewed distribution, with a modal class of 15 to 20 pounds. The mean is 26.921788, the median is 20, and the standard deviation is 28.212965. There is an extreme weight loss outlier of 190 pounds, which corresponds to a z score of 5.78, a very unusual value. This extreme outlier is partially responsible for the right skew; however, there are several other high values in the right tail that are pulling the mean farther to the right as well. The mean is greater than the median, which is also caused by the extreme outlier(s) and is expected in a right-skewed distribution.
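The z score quoted above can be reproduced from the reported mean and standard deviation. This is a minimal sketch using only the summary values given in the text; the cutoff of 2 for an "unusual" value is a common rule of thumb and is an assumption here, not something stated in the survey write-up.

```python
# Summary values as reported in the text.
mean = 26.921788
std_dev = 28.212965
outlier = 190  # the extreme weight loss value, in pounds

# z score: how many standard deviations the value lies from the mean.
z_score = (outlier - mean) / std_dev

# Assumption: values with |z| > 2 are treated as unusual (rule of thumb).
UNUSUAL_CUTOFF = 2
is_unusual = abs(z_score) > UNUSUAL_CUTOFF

print(f"z = {z_score:.2f}, unusual: {is_unusual}")
```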
The range of this particular distribution is 226 pounds. The Range Rule estimates the standard deviation as approximately one fourth of the range of the data, so the standard deviation would be expected to be 56.5 if this were a normal distribution. The Range Rule does not work for this distribution: due to the presence of outlier(s), which stretch the range, the actual standard deviation (28.212965) is only about half the value estimated by the Range Rule.
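The Range Rule comparison is simple arithmetic and can be sketched as follows. The values come from the text; the variable names are only illustrative.

```python
# Values as reported in the text.
data_range = 226            # maximum minus minimum, in pounds
actual_std_dev = 28.212965  # standard deviation computed from the sample

# Range Rule: estimate the standard deviation as one fourth of the range.
estimated_std_dev = data_range / 4

# Compare the actual standard deviation with the Range Rule estimate.
ratio = actual_std_dev / estimated_std_dev

print(f"Range Rule estimate: {estimated_std_dev}")
print(f"Actual / estimated: {ratio:.2f}")
```

A ratio near 0.5 reflects how strongly a few extreme values stretch the range relative to the typical spread of the data.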
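The IQR, which gives the range of the middle half of the data, is 27 pounds. The midrange, the average of the minimum and maximum values, is also pulled to the right by the outlier(s): if the 190-pound loss is the maximum, the range of 226 implies a minimum of -36 and a midrange of (190 + (-36))/2 = 77 pounds, well above the median of 20. (Half the range, 113, is not the midrange.)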
Summary statistics:

The following boxplot gives a visual representation of the outliers of the whole sample, and the interquartile range (IQR) and the median of each "box" are now easier to see:
It is interesting to note which programs have the highest number of unusual values, or outliers:
The outliers in this data set are likely to be actual values and not errors. The extremely high weight loss values are from Medifast, a weight loss program known to be used by many dieters after bariatric surgery, which could explain the large amount of weight lost. By the same token, the outliers representing weight gains are likely to be actual values, because the instructions given with the online surveys indicated that dieters should use a negative number to identify any weight gain.
Looking for a Relationship Between Two Numerical Variables
To determine whether or not there is a relationship between the responses to the questions, "What is your age in years?" and "How much weight did you lose in 2013?", we look at the scatter plot of the paired data:
The scatter plot shows no visible correlation between the two variables. There is a great deal of scatter that appears mostly random, and the only "pattern" is that most values are positive, meaning that most respondents lost rather than gained weight. While one could point to a concentration of data points between 0 and 50 pounds, which is expected given the bar plot above, there is no apparent linear relationship, positive or negative, between the age of the respondents and their weight loss. The respondents who gained weight are also visible in the scatter plot as the values below 0 pounds lost.
Correlation between AGE and POUNDS LOST is: 0.095997294 
The correlation coefficient for the paired data is r = 0.096, as shown above. Since the absolute value of r is less than the critical value of 0.196 (Table A5 in the textbook), there would appear to be no statistically significant linear correlation between the age of the respondent and the amount of weight lost in this nonrandom sample.
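The table lookup can be cross-checked with the usual t statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below assumes that all 180 responses contributed an (age, pounds lost) pair, which the text does not confirm. For samples of this size, the two-tailed critical t value at the 0.05 level is roughly 1.97, so a much smaller t statistic points to the same conclusion as the table.

```python
import math

r = 0.095997294     # correlation reported in the text
critical_r = 0.196  # critical value quoted from the textbook table

# Assumption: all 180 responses contributed an (age, pounds lost) pair.
n = 180

# t statistic for testing whether the population correlation is zero.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Share of the variation in weight lost that a linear fit on age explains.
r_squared = r ** 2

print(f"|r| below table critical value: {abs(r) < critical_r}")
print(f"t statistic: {t_stat:.2f}")
print(f"r squared: {r_squared:.4f}")
```

The r squared value also shows how weak the relationship is: under this sample, age accounts for less than 1% of the variation in pounds lost.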