My project is on the number of fruit snacks per pouch. It was an interesting project. My son was not happy that I opened all the pouches in the boxes and put the fruit snacks in a gallon-sized Ziploc bag after I counted them. He likes having his own pouch of fruit snacks better than grabbing a handful of them. After I opened the 82 pouches, I found there was not as much variation as I originally thought there would be. Each brand seems to have a standardized number of fruit snacks per pouch, or they may package each pouch based on weight. I couldn't find any information on how each brand packages their fruit snacks. Each box only says how many pouches it contains.
After collecting my raw data, I put them into a spreadsheet with information on my primary variable and four auxiliary variables. My variables are as follows:
How many fruit snacks come in each pouch? This primary variable is quantitative because its values are numbers that represent counts. It is discrete because the number of fruit snacks is countable.
What character is on the box? This variable is nominal because it consists of names that cannot be arranged in an ordering scheme. Since this is a qualitative variable, it cannot be discrete or continuous because it holds no numeric quantitative values in and of itself.
How many pouches come in each box? This variable is at the ratio level because the number of pouches can be arranged in order, differences between data values can be found and are meaningful, and there is a natural zero (a box with no pouches). Since this is a quantitative variable that comes from counting, it is discrete.
What brand is each box? This variable is nominal because it consists of names that cannot be arranged in an ordering scheme. Since this is a qualitative variable, it cannot be discrete or continuous because it holds no numeric quantitative values in and of itself.
What store did the boxes come from? This variable is nominal because it consists of names that cannot be arranged in an ordering scheme. Since this is a qualitative variable, it cannot be discrete or continuous because it holds no numeric quantitative values in and of itself.
The spreadsheet of my variables is here:
My sample size is 82. Each box of fruit snacks contains either 6 or 10 pouches. I have an equal chance of grabbing any number. This is an observational study because I'm only counting and measuring, not modifying the subject. I chose that method because I have no reason to modify the fruit snacks as I'm only counting them. I'm using a cluster method by sampling from a location that is already stocked (a store). It is also stratified sampling because I will choose one box from each of a few brands (strata). I will go to three stores: WalMart, WinCo and Fred Meyer. At each store I will buy one box of Betty Crocker brand fruit snacks, one box of Kellogg's brand fruit snacks and one box of generic brand fruit snacks. Then I will open each pouch of fruit snacks and count how many fruit snacks came in each pouch. My method of sampling will help me get a simple random sample because each pouch has the same chance of being chosen and each box of fruit snacks has the same chance of being chosen.
After compiling my raw data, I grouped them into frequencies and made graphs to see how they compare. My grouped distributions are as follows:
My histogram is roughly bell shaped; it starts out low, increases to a maximum then decreases. Both sides look almost symmetric. This is considered a normal distribution pattern.
My frequency polygon is also roughly bell shaped, like my histogram. It's almost exactly symmetric as it increases to a maximum and then decreases back down. Both sides look symmetric with an even slope. This is also considered a normal distribution pattern.
My ogive gets higher as it runs from left to right. It reflects a normal distribution because it has line segments that run on either side of the class midpoint of 9.5 and are fairly equal in slope and length.
My pie chart doesn't reveal much of anything other than the frequency of the bins with the number and percentage per bin. Pie charts are not reliable as a source of information and can often be misleading. Pie charts should be used as rarely as possible.
My frequency distributions were approximately normally distributed, so I took a look at my measures of central tendency. All of them are very close and vary only slightly. The range of the number of fruit snacks in pouches is only 5 and I counted 82 pouches. That’s not a large range for so many pouches. The mean takes every pouch into account since it’s the sum of all fruit snacks divided by the number of all pouches; however, you can’t have 0.43 of a fruit snack. The midrange is closest to the mean. Since the mean isn’t a possible count in my case, the median best describes my data. The mode is a whole number that is closest to the mean. The median is the worst description of my data. The median isn’t always reliable. The median is the number that just happens to be in the middle of all the values and it can be far away from the mean. However, since all my measures of central tendency are very close (8, 8, 8.43 and 8.5), any of them can be used to describe my data. My interval of 8-9 had a frequency of 59, which was over 4 times higher than my intervals of 6-7 and 10-11. Since all four of my measures of central tendency were within the 8-9 interval, I think it's safe to use any of them to describe my data.
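As a rough cross-check of the four measures and the class frequencies quoted above, here is a minimal Python sketch. It assumes the per-count tallies that can be read off the stem-and-leaf plot later in this report (7 pouches with 6 fruit snacks, 4 with 7, 31 with 8, 28 with 9, 11 with 10 and 1 with 11); the variable names are my own and are only for illustration.

```python
from statistics import mean, median, mode

# Tallies read off the stem-and-leaf plot: {fruit snacks per pouch: number of pouches}
counts = {6: 7, 7: 4, 8: 31, 9: 28, 10: 11, 11: 1}

# Expand the tallies back into the individual pouch counts
data = [value for value, freq in counts.items() for _ in range(freq)]

n = len(data)                             # number of pouches
x_bar = mean(data)                        # sum of all fruit snacks / number of pouches
med = median(data)                        # middle value of the sorted data
mo = mode(data)                           # most frequent value
midrange = (min(data) + max(data)) / 2    # halfway between the lowest and highest value
data_range = max(data) - min(data)

# Frequencies for the 6-7, 8-9 and 10-11 classes
class_freq = {
    (low, high): sum(f for v, f in counts.items() if low <= v <= high)
    for low, high in [(6, 7), (8, 9), (10, 11)]
}

print(n, round(x_bar, 2), med, mo, midrange, data_range)
print(class_freq)
```

Working from tallies instead of retyping all 82 values keeps the sketch short and makes it easy to compare against the plot.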
Summary statistics:

My stem and leaf plot looks different in StatCrunch. It uses 0’s as markers for each instance of the stem number. For example, I had four pouches that had seven fruit snacks in them, so in StatCrunch the stem and leaf plot looked like this: 7 : 0000. To me that’s a little confusing as I interpret it as there being four instances of 70. I also liked the way it looked when I created it in Excel.
Variable: # of Fruit Snacks
Decimal point is at the colon.
6 : 0000000
6 :
7 : 0000
7 :
8 : 0000000000000000000000000000000
8 :
9 : 0000000000000000000000000000
9 :
10 : 00000000000
10 :
11 : 0
The boxplot in StatCrunch isn’t as informative as the one I created in Excel. The one in StatCrunch only shows the Minimum, Quartile 1, Median, Quartile 3 and Maximum but it doesn’t allow us to customize the boxplot. In Excel we were able to label the Min, Q1, Median, Q3, and Max and I found that helpful. StatCrunch also doesn’t start the boxplot at 0 so it skews the whole boxplot and makes it unusable.
While my measures of central tendencies were all pretty close, my sample statistics were a little confusing. I only had a range of 5 which was smaller than I was expecting for a sample of 82 pouches. I calculated the sample statistics for the number of pouches that had 9 fruit snacks. (I took the middle number of my range (3) and added it to the lowest number of fruit snacks per pouch (6) and came up with the number 9 so I tested that.)
1. Sample Statistics
For the phat, I counted the number of fruit snack pouches that had 9 fruit snacks in them. There were 28 pouches that had 9 fruit snacks so I used 28/82=0.34.
x̄ (xbar) – 8.43 (8.426829268)
s = 1.100286021
Value Range – 9 fruit snacks in the pouch
Phat – 28/82 = 0.34 (0.3414634146)
Qhat – 54/82 = 0.66 (0.6585365854)
2. 68% Confidence Intervals
a. for the population mean – 8.3052501 < µ < 8.5484085
b. proportion that meet the value range you selected – 0.28938695 < p < 0.39353988
c. standard deviation – 1.023562602 < σ < 1.197453506, s=1.100286008
3. 90% Confidence Intervals
d. for the population mean – 8.2246569 < µ < 8.6290016
e. proportion that meet the value range you selected – 0.25532788 < p < 0.42759895
f. standard deviation – 0.9756845904 < σ < 1.265185915, s=1.100286008
Margins of Error
a. 0.1215792
b. 0.052076465
c. 0.086945452
d. 0.20217235
e. 0.086135535
f. 0.1447506623
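The proportion intervals (b and e) and their margins of error can be reproduced with a short Python sketch. This is only a sketch under one assumption: that the intervals use the usual normal-approximation formula, E = z·sqrt(p̂q̂/n). The function name `proportion_ci` is hypothetical and not something from StatCrunch.

```python
from math import sqrt
from statistics import NormalDist

n = 82   # pouches counted
x = 28   # pouches with exactly 9 fruit snacks

p_hat = x / n
q_hat = 1 - p_hat

def proportion_ci(p_hat, n, confidence):
    """Normal-approximation interval for a proportion (assumed formula)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)          # two-tailed critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)       # margin of error E
    return p_hat - margin, p_hat + margin, margin

for level in (0.68, 0.90):
    low, high, margin = proportion_ci(p_hat, n, level)
    print(f"{level:.0%}: {low:.5f} < p < {high:.5f} (E = {margin:.5f})")
```

The intervals for the mean and the standard deviation need t and chi-square critical values, which the Python standard library does not provide, so they are left out here.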
4. How confident are you that the 90% confidence interval estimate of the mean contains the true mean?
I am 90% confident that the confidence interval of 8.2246569 < µ < 8.6290016 actually contains the true population mean.
5. What is the probability that your 90% confidence interval estimate of the mean contains the population average of your variable?
I’m having a hard time with this one. I understand the confidence level, which means that we’re 90% confident that the interval of 8.2246569 < µ < 8.6290016 contains the true mean. What I’m not understanding is the probability part. According to the book, there is no probability in confidence intervals. The confidence interval contains the population mean or it doesn’t, no probability about it. (Triola, 326-327).
6. Can you interpret the answer to #3 to say that 90% of the population (data values) will fit within the 90% confidence interval?
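No. 90% of the population may not fit into the 90% confidence interval. The interval estimates where the population mean is, not where the individual data values fall.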
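To make that answer concrete, here is a small Python sketch that counts how many of the sample's data values actually fall inside the 90% confidence interval for the mean. It assumes the tallies from the stem-and-leaf plot in this report; the variable names are illustrative only.

```python
# Tallies read off the stem-and-leaf plot: {fruit snacks per pouch: number of pouches}
counts = {6: 7, 7: 4, 8: 31, 9: 28, 10: 11, 11: 1}
data = [value for value, freq in counts.items() for _ in range(freq)]

# 90% confidence interval for the population mean, as reported above
low, high = 8.2246569, 8.6290016

# Data values that land strictly inside the interval
inside = [v for v in data if low < v < high]
share_inside = len(inside) / len(data)

print(f"{len(inside)} of {len(data)} pouches fall inside the interval ({share_inside:.0%})")
```

Because every pouch holds a whole number of fruit snacks and the interval lies between 8 and 9, no individual data value can fall inside it, even though the interval is a reasonable estimate of the mean.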
7. Do you believe that your confidence interval estimates contain:
The actual population mean: Yes. The confidence interval contains my sample mean so it is likely that the confidence interval contains the population mean.
The actual proportion: Yes. The value range I chose is within my sample but it isn’t the median or lowest range or highest range in my sample so it is likely that the actual proportion is in the confidence interval.
The actual standard deviation? Yes. The confidence interval contains my sample standard deviation so it is likely that the confidence interval contains the population standard deviation.
It’s nearly impossible for me to make a guess about the standard deviation for the population of fruit snacks. There are many different brands (I tested only three), with each brand having different numbers of fruit snacks per pouch. There were even some pouches that had different numbers of fruit snacks than other pouches in the same box. So I’m making a “best guess” on a standard deviation. I did a 95% confidence interval for my data, which is 0.9538325639 < σ < 1.300289429. I then found the midrange of those two limits (1.127060996) and will be using that to test my standard deviation.
I’m going to be brave and test for the standard deviation. It was one of the few that I got correct the first try.
2) My sample standard deviation is s = 1.100286021
3) Since I am going to be testing at a 95% confidence level, α = 0.05
I’m going to be testing the claim that the standard deviation for the population of fruit snacks is higher than 1.127060996.
a. H_{0}: σ = 1.127060996, H_{1}: σ > 1.127060996
2) Because we are testing at 95%, 1 − 0.95 = 0.05, which means α = 0.05. Because H_{1} uses >, this is a right-tailed test. Since I am testing standard deviation, this will use a chi-square distribution. I tested 82 pouches of fruit snacks so my degrees of freedom will be 81.
3) Since I’m testing the standard deviation, I had to square my hypothesized standard deviation to get σ^{2} = 1.270266489
Hypothesis test results:
σ^{2} : Variance of variable
H_{0} : σ^{2} = 1.2702665
H_{A} : σ^{2} > 1.2702665
Variable    Sample Var.    DF    Chi-Square Stat    P-value
var1        1.2106293      81    77.197168          0.5991
My P-value of 0.5991 is more than α = 0.05. Since P > α, I fail to reject the null hypothesis.
5) There is not sufficient evidence to support the claim that the standard deviation is greater than 1.127060996.
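The chi-square test statistic in the StatCrunch output can be checked by hand with the formula χ² = (n − 1)s² / σ₀². The Python sketch below does that, assuming the tallies from the stem-and-leaf plot in this report as the raw data; the names `sigma_0` and `chi_sq` are my own. It stops at the test statistic, because the P-value needs a chi-square distribution function that the Python standard library does not include.

```python
from statistics import variance

# Tallies read off the stem-and-leaf plot: {fruit snacks per pouch: number of pouches}
counts = {6: 7, 7: 4, 8: 31, 9: 28, 10: 11, 11: 1}
data = [value for value, freq in counts.items() for _ in range(freq)]

n = len(data)
df = n - 1                      # degrees of freedom
s_squared = variance(data)      # sample variance (divides by n - 1)

sigma_0 = 1.127060996           # hypothesized standard deviation from H0
chi_sq = df * s_squared / sigma_0 ** 2

print(f"df = {df}, s^2 = {s_squared:.7f}, chi-square = {chi_sq:.4f}")
```

A test statistic smaller than the degrees of freedom, as here, already hints that the sample variance is below the hypothesized variance, which is consistent with failing to reject H0 in a right-tailed test.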
6) My confidence intervals are below:
a. 68%: 1.023562602 < σ < 1.197453506, s=1.100286008
b. 90%: 0.9756845904 < σ < 1.265185915, s=1.100286008
c. 95%: 0.9538325639 < σ < 1.300289429, s=1.100286008
Both of my original confidence intervals (68% and 90%) contain the hypothesized value of 1.127060996, which supports my decision to fail to reject the null hypothesis. I added the 95% confidence interval above just in case anyone was curious.
My raw data sample size is 82 pouches. I separated the sample into two columns of 41, then took off the bottom 16 values of each column so each sample now has 25 values.
xbar_{1}=8.6
xbar_{2}=8.32
Step 1: State the null and alternate hypotheses (H0 & H1 ).
H_{0}: μ_{1} = μ_{2}
H_{1}: μ_{1} ≠ μ_{2}
My claim is that the means of my two samples are equal.
Step 2: Does the alternate hypothesis cause this to be a one-tailed or two-tailed test?
I’m going to use a 95% confidence level, making this α=0.05. This is a two-tailed test (α/2=0.025) due to H_{1}. My sample size will affect the test because the sampling distribution becomes narrower the higher the number (n) of values we have.
Step 3: Run your test through StatCrunch.
Hypothesis test results:
μ_{1} : Mean of Sample 1
μ_{2} : Mean of Sample 2
μ_{1} - μ_{2} : Difference between two means
H_{0} : μ_{1} - μ_{2} = 0
H_{A} : μ_{1} - μ_{2} ≠ 0
(without pooled variances)
Difference         Sample Diff.    Std. Err.     DF           T-Stat       P-value
μ_{1} - μ_{2}      0.28            0.19765289    42.351867    1.4166248    0.1639
Step 4: Explicitly compare your P-value with α, and state whether this comparison leads you to reject the null hypothesis or not.
My P-value of 0.1639 is greater than the α value of 0.05 so I fail to reject the null hypothesis.
Step 5: State whether the outcome of Step 4 provides enough evidence to support the claim or not. Before you take this step, make sure you reread the way in which you stated the claim so that you don’t contradict yourself in these last two steps!
There is insufficient evidence to reject the claim that the means of my two samples are equal.
Construct a confidence interval for the difference between your two sample statistics at the same confidence level (level of significance) with which you hypothesis tested for a difference above. Clearly state this confidence interval, and explain how it confirms the results of your hypothesis testing.
95% confidence interval results:
μ_{1} : Mean of Sample 1
μ_{2} : Mean of Sample 2
μ_{1} - μ_{2} : Difference between two means
(without pooled variances)
Difference         Sample Diff.    Std. Err.     DF           L. Limit       U. Limit
μ_{1} - μ_{2}      0.28            0.19765289    42.351867    -0.11878153    0.67878153
-0.11878153 < μ_{1} - μ_{2} < 0.67878153
I’m 95% confident that the limits of -0.11878153 and 0.67878153 contain the difference between the two means. The interval does contain 0, so the confidence interval suggests that there is not a significant difference between the two means.
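The test statistic and the confidence interval can be checked against each other using only the summary numbers reported above. The Python sketch below is a consistency check, not a rerun of the test: it takes the two sample means, the standard error and the interval limits as given, and it assumes the lower limit is negative, since the interval is stated to contain 0. The variable names are illustrative.

```python
# Summary values taken from the StatCrunch output in this section
x_bar_1 = 8.6
x_bar_2 = 8.32
std_err = 0.19765289

# Test statistic for H0: mu1 - mu2 = 0
diff = x_bar_1 - x_bar_2
t_stat = diff / std_err

# Confidence limits (lower limit assumed negative, see the note above)
lower, upper = -0.11878153, 0.67878153
center = (lower + upper) / 2      # should equal the sample difference
margin = (upper - lower) / 2      # margin of error E
t_crit = margin / std_err         # critical t value implied by the interval

contains_zero = lower < 0 < upper

print(f"t = {t_stat:.4f}, implied critical t = {t_crit:.4f}")
print(f"interval centered at {center:.2f}, contains 0: {contains_zero}")
```

The test statistic is smaller than the implied critical value, which is the same conclusion as the interval containing 0: the two ways of looking at the data agree.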
I only had one quantitative variable in my raw data, the number of fruit snacks per pouch. So I used the number of pouches per box as a second quantitative variable. I’m going to use the number of pouches per box as the independent variable (x) and the number of fruit snacks per pouch as the dependent variable (y).
2. I have 82 data values for each variable. They make the two columns very long so I have attached them at the end.
3. Overlay polynomial order of 1 seemed to fit the best.
4. Correlation between X and Y is: 0.16160493 (P-value 0.1469). I interpret r to be a positive linear correlation between x and y.
5. H_{0}: ρ=0
H_{1}: ρ≠0
A 95% confidence level would mean α = 0.05. My P-value (0.1469) > α so I fail to reject H_{0} and conclude that there is not sufficient evidence to support the claim of a linear correlation. (Triola, 507).
6. R^{2}=0.026116155 so about 2.61% of the variation in number of fruit snacks per pouch can be explained by the linear relationship between the number of pouches per box and number of fruit snacks per pouch. This means that about 97.39% of the variation in number of fruit snacks per pouch cannot be explained by the number of pouches per box. (Triola, 505).
7. I read the common errors involving correlation and I do not believe I made any of the three.
Simple linear regression results:
Dependent Variable: Y
Independent Variable: X
Y = 7.25 + 0.125 X
Sample size: 82
R (correlation coefficient) = 0.16160493
R-sq = 0.026116155
Estimate of error standard deviation: 1.0925887
Parameter estimates:
Parameter    Estimate    Std. Err.      Alternative    DF    T-Stat       P-Value
Intercept    7.25        0.81247482     ≠ 0            80    8.9233535    <0.0001
Slope        0.125       0.085342229    ≠ 0            80    1.4646911    0.1469
Analysis of variance table for regression model:
Source    DF    SS           MS           F-stat       P-value
Model     1     2.5609756    2.5609756    2.1453199    0.1469
Error     80    95.5         1.19375
Total     81    98.060976
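The regression output above can be cross-checked with a few lines of Python. This sketch only uses numbers printed in the output (the sums of squares, the correlation coefficient and the fitted equation); the function name `predict` is hypothetical. It relies on the standard identities R-sq = SS(Model) / SS(Total) and F = MS(Model) / MS(Error) for simple linear regression.

```python
from math import sqrt

# Values copied from the StatCrunch regression output
ss_model = 2.5609756
ss_total = 98.060976
ms_error = 1.19375
r_reported = 0.16160493
b0, b1 = 7.25, 0.125            # intercept and slope of Y = 7.25 + 0.125 X

# Share of the variation in Y explained by the line
r_squared = ss_model / ss_total

# r has the same sign as the slope, which is positive here
r_from_ss = sqrt(r_squared)

# F statistic from the ANOVA table (model has 1 degree of freedom)
f_stat = (ss_model / 1) / ms_error

def predict(x):
    """Predicted fruit snacks per pouch for a box with x pouches."""
    return b0 + b1 * x

print(f"R-sq = {r_squared:.6f} ({r_squared:.2%} explained)")
print(f"r = {r_from_ss:.6f}, F = {f_stat:.4f}")
print(f"predicted Y at X = 6: {predict(6)}, at X = 10: {predict(10)}")
```

Since X only takes the values 6 and 10, the two predictions are the only points on the line that correspond to boxes in the sample.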
9. I answered two yes and one no to the questions on which way to predict a y-value so I’m going with the y=b_{0}+b_{1}x. I repeated the linear regression and these are the results:
Predicted values:
X value    Pred. Y    s.e.(Pred. y)    95% C.I. for mean       95% P.I. for new
10         8.5        0.13058932       (8.240119, 8.759881)    (6.3102035, 10.689797)
10. I can’t do any regressions for other polynomials with a predicted x of 10.
Polynomial Regression Results:
Dependent Variable: Y
Independent Variable: X
At least three unique x-values are required for a 2nd order computation.
Currently there are only 2 unique x-values.
Polynomial Regression Results:
Dependent Variable: Y
Independent Variable: X
At least four unique x-values are required for a 3rd order computation.
Currently there are only 2 unique x-values.
11. Since I only have one fit, it has to be the best fit. However, I don’t think it fits at all. My P-value shows that there is not sufficient evidence of a linear correlation between x and y, so there is no good model for a relationship between my variables.
12. I don’t have enough unique x values to complete the regressions.
After completing all discussion boards and dealing with my project all semester, I was a little surprised by the findings. I had a good-sized sample of 82 but a small range of only 5. With a large sample size and a small range, it wasn't surprising that I had high frequencies, but I was surprised that my frequency polygon was almost symmetrical on each side. I was surprised that my measures of central tendency were very similar. They were all between 8 and 9, meaning they were all within 1 number. This project helped me a lot with statistics. This class was really hard but applying the lessons to a sample that I could see and touch helped me connect what I learned to real world matters.