Report Properties
Owner: marybooth2
Created: Dec 10, 2013
Share: yes
Views: 1996
Final Project

My project is on the number of fruit snacks per pouch. It was an interesting project. My son was not happy that I opened all the pouches in the boxes and put them in a gallon-sized Ziploc bag after I counted them. He likes having his own pouch of fruit snacks better than grabbing a handful of them. After I opened the 82 pouches, I found there was not as much variation as I originally thought there would be. Each brand seems to have a standardized number of fruit snacks per pouch, or they may package each pouch based on weight. I couldn't find any information on how each brand packages their fruit snacks. Each box only says how many pouches it contains.

After collecting my raw data, I put them into a spreadsheet with information on my primary variable and four auxiliary variables. My variables are as follows:

How many fruit snacks come in each pouch? This primary variable is quantitative because its values are numbers that represent counts. It is discrete because the number of fruit snacks is countable.

What character is on the box? This variable is nominal because it consists of names that cannot be arranged in an ordering scheme. Since this is a qualitative variable, it cannot be discrete or continuous because it holds no numeric values in and of itself.

How many pouches come in each box? This variable is quantitative and discrete because it is a count. Its level of measurement is ratio: the numbers of pouches can be arranged in order, differences between data values can be found and are meaningful, and zero pouches is a natural zero.

What brand is each box? This variable is nominal because it consists of names that cannot be arranged in an ordering scheme. Since this is a qualitative variable, it cannot be discrete or continuous because it holds no numeric values in and of itself.

What store did the boxes come from? This variable is nominal because it consists of names that cannot be arranged in an ordering scheme. Since this is a qualitative variable, it cannot be discrete or continuous because it holds no numeric values in and of itself.

The spreadsheet of my variables is here:

Data set 1. Raw Data   [Info]

My sample size is 82. The boxes of fruit snacks contain either 6 or 10 pouches. I have an equal chance of grabbing any number. This is an observational study because I'm only counting and measuring, not modifying the subject. I chose that method because I have no reason to modify the fruit snacks as I'm only counting them. I'm using a cluster method by sampling from a location that is already stocked (a store). It is also stratified sampling because I will choose one box from each of a few brands (strata). I will go to three stores: WalMart, WinCo and Fred Meyer. At each store I will buy one box of Betty Crocker brand fruit snacks, one box of Kellogg's brand fruit snacks and one box of generic brand fruit snacks. Then I will open each pouch and count how many fruit snacks came in it. My method of sampling will help me get a simple random sample because each pouch has the same chance of being chosen and each box of fruit snacks has the same chance of being chosen.

After compiling my raw data, I grouped them into frequencies and made graphs to see how they compare. My grouped distributions are as follows:

Data set 2. Organizing & Displaying Data   [Info]

My histogram is roughly bell shaped; it starts out low, increases to a maximum then decreases. Both sides look almost symmetric. This is considered a normal distribution pattern. 

Histogram

 

My frequency polygon is also roughly bell shaped, like my histogram. It's almost exactly symmetric as it increases to a maximum and then decreases back down. Both sides look symmetric with an even slope. This is also considered a normal distribution pattern.

Frequency Polygon
My ogive gets higher as it runs from left to right. It reflects a normal distribution because it has line segments that run on either side of the class midpoint of 9.5 and are fairly equal in slope and length. 

Ogive

My pie chart doesn't reveal much of anything other than the frequency of the bins with the number and percentage per bin. Pie charts are not reliable as a source of information and can often be misleading. Pie charts should be used as rarely as possible.
Pie Chart


My frequency distributions were approximately normally distributed, so I took a look at my measures of central tendency. All of them are very close and vary only slightly. The range of the number of fruit snacks per pouch is only 5 and I counted 82 pouches. That’s not a large range for so many pouches. The mean uses every value, since it’s the sum of all fruit snacks divided by the number of pouches; however, you can’t have .43 of a fruit snack. The midrange is closest to the mean. Since the mean isn’t a possible count in my case, the median best describes my data. The mode is also a whole number close to the mean. In general, though, the median isn’t always reliable: it is just the value that happens to be in the middle of all the values, and it can be far away from the mean. However, since all my measures of central tendency are very close (median 8, mode 8, mean 8.43, midrange 8.5), any of them can be used to describe my data. My interval of 8-9 had a frequency of 59, which was over 4 times higher than my intervals of 6-7 and 10-11. Since all four of my measures fall within the 8-9 interval, I think it's safe to use any of them to describe my data.

Result 1: Summary Stats for Measuring and Summarizing Data   [Info]
Summary statistics:

Column             n   Mean       Variance   Std. dev.  Std. err.   Median  Range  Min  Max  Q1  Q3
# of Fruit Snacks  82  8.4268293  1.2106293  1.100286   0.12150626  8       5      6    11   8   9

Data set 3. More Data Statistics   [Info]

My stem and leaf plot looks different in StatCrunch. It uses 0’s as markers for each instance of the stem number. For example, I had four pouches that had seven fruit snacks in them, so in StatCrunch the stem and leaf plot looked like this: 7 : 0000. To me that’s a little confusing, as I interpret it as there being four instances of 70. I also liked the way it looked when I created it in Excel.

Result 2: Stem and Leaf Plot   [Info]
Variable: # of Fruit Snacks

Decimal point is at the colon.
 6 : 0000000
 6 : 
 7 : 0000
 7 : 
 8 : 0000000000000000000000000000000
 8 : 
 9 : 0000000000000000000000000000
 9 : 
10 : 00000000000
10 : 
11 : 0
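
As a cross-check, the counts in the stem and leaf plot above are enough to rebuild all 82 values and recompute the summary statistics and measures of central tendency. The following is only a minimal sketch in Python, not part of the StatCrunch output; the variable names are illustrative, and the frequencies are read directly off the plot.

```python
import statistics

# Frequencies read off the stem and leaf plot:
# {fruit snacks per pouch: number of pouches}
counts = {6: 7, 7: 4, 8: 31, 9: 28, 10: 11, 11: 1}

# Rebuild the raw values from the frequency table
data = [value for value, freq in counts.items() for _ in range(freq)]

n = len(data)
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation
std_err = std_dev / n ** 0.5           # standard error of the mean

# Grouped frequencies for the classes 6-7, 8-9 and 10-11
classes = {
    "6-7": counts[6] + counts[7],
    "8-9": counts[8] + counts[9],
    "10-11": counts[10] + counts[11],
}

print(f"n={n} mean={mean:.7f} median={median} mode={mode} midrange={midrange}")
print(f"variance={variance:.7f} std dev={std_dev:.6f} std err={std_err:.8f}")
print(classes)
```

If the counts were read correctly, the printed values should agree with the Summary Stats table (Result 1) to the digits shown there.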

The boxplot in StatCrunch  isn’t as informative as the one I created in Excel. The one in StatCrunch only shows the Minimum, Quartile 1, Median, Quartile 3 and Maximum but it doesn’t allow us to customize the boxplot. In Excel we were able to label the Min, Q1, Median, Q3, and Max and I found that helpful. StatCrunch also doesn’t start the boxplot at 0 so it skews the whole boxplot and makes it unusable.  

Result 3: Boxplot   [Info]

While my measures of central tendencies were all pretty close, my sample statistics were a little confusing. I only had a range of 5 which was smaller than I was expecting for a sample of 82 pouches. I calculated the sample statistics for the number of pouches that had 9 fruit snacks. (I took the middle number of my range (3) and added it to the lowest number of fruit snacks per pouch (6) and came up with the number 9 so I tested that.)

1. Sample Statistics

For the p-hat, I counted the number of fruit snack pouches that had 9 fruit snacks in them. There were 28 pouches that had 9 fruit snacks so I used 28/82=0.34.

x-bar – 8.43 (8.426829268)

s = 1.100286021

Value Range – 9 fruit snacks in the pouch

P-hat – 28/82 = 0.34 (0.3414634146)

Q-hat – 54/82 = 0.66 (0.6585365854)

 

2. 68% Confidence Intervals

a. for the population mean – 8.3052501 < µ < 8.5484085

b. proportion that meet the value range you selected – 0.28938695 < p < 0.39353988

c. standard deviation – 1.023562602 < σ < 1.197453506, s=1.100286008

3. 90% Confidence Intervals

d. for the population mean – 8.2246569 < µ < 8.6290016

e. proportion that meet the value range you selected – 0.25532788 < p < 0.42759895

f. standard deviation – 0.9756845904 < σ < 1.265185915, s=1.100286008

Margins of Error

a. 0.1215792

b. 0.052076465

c. 0.086945452

d. 0.20217235

e. 0.086135535

f. 0.1447506623
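
The proportion intervals (b and e) and their margins of error can be reproduced with the usual normal-approximation formula, E = z * sqrt(p-hat * q-hat / n). This is a minimal sketch that assumes StatCrunch used that formula; the function name `proportion_interval` is made up for illustration. The intervals for the mean and standard deviation need t and chi-square critical values, which Python's standard library does not provide, so they are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

n = 82          # pouches counted
successes = 28  # pouches with exactly 9 fruit snacks

p_hat = successes / n
q_hat = 1 - p_hat
std_err = sqrt(p_hat * q_hat / n)


def proportion_interval(confidence):
    """Return (margin, lower, upper) for a normal-approximation interval."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_err
    return margin, p_hat - margin, p_hat + margin


for level in (0.68, 0.90):
    margin, lower, upper = proportion_interval(level)
    print(f"{level:.0%}: E = {margin:.6f}, {lower:.6f} < p < {upper:.6f}")
```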

4. How confident are you that the 90% confidence interval estimate of the mean contains the true mean?

I am 90% confident that the confidence interval of 8.2246569 - 8.6290016 actually contains the true population mean.

5. What is the probability that your 90% confidence interval estimate of the mean contains the population average of your variable?

I’m having a hard time with this one. I understand the confidence level, which means that we’re 90% confident that the interval of 8.2246569 < µ < 8.6290016 contains the true mean. What I’m not understanding is the probability part. According to the book, there is no probability in confidence intervals. The confidence interval contains the population mean or it doesn’t, no probability about it. (Triola, 326-327). 

6. Can you interpret the answer to #3 to say that 90% of the population (data values) will fit within the 90% confidence interval?

No. The confidence interval is an estimate of the population mean, so 90% of the population (data values) may not fit into the 90% confidence interval.

7. Do you believe that your confidence interval estimates contain:

The actual population mean: Yes. The confidence interval contains my sample mean so it is likely that the confidence interval contains the population mean.

The actual proportion: Yes. The value range I chose is within my sample but it isn’t the median or lowest range or highest range in my sample so it is likely that actual proportion is in the confidence interval.

The actual standard deviation? Yes. The confidence interval contains my sample standard deviation so it is likely that the confidence interval contains the population standard deviation.

It’s nearly impossible for me to make a guess about the standard deviation for the population of fruit snacks. There are many different brands (I tested only three), with each brand having different numbers of fruit snacks per pouch. There were even some pouches that had different numbers of fruit snacks between pouches in the same box. So I’m making a “best guess” on a standard deviation. I did a 95% confidence interval for my data, which is 0.9538325639 < σ < 1.300289429. I then found the midrange of that interval (1.127060996) and will be using that to test my standard deviation.

 I’m going to be brave and test for the standard deviation. It was one of the few that I got correct the first try.

2)      My sample standard deviation is s = 1.100286021

3)      Since I am going to be testing at a 95% confidence level, α=0.05

I’m going to be testing the claim that the standard deviation for the population of fruit snacks is higher than 1.127060996.

a.      H0: σ = 1.127060996, H1: σ > 1.127060996

2)     Because we are testing at 95%, 1-0.95=0.05 which means α=0.05. The alternative hypothesis (σ > 1.127060996) makes it a right-tailed test. Since I am testing a standard deviation, this will use a chi-square distribution. I tested 82 pouches of fruit snacks so my degrees of freedom will be 81.

3)     Since StatCrunch runs the test on the variance, I had to square my hypothesized standard deviation to get σ² = 1.270266489

Hypothesis test results:
σ² : Variance of variable
H0 : σ² = 1.2702665
HA : σ² > 1.2702665

Variable  Sample Var.  DF  Chi-Square Stat  P-value
var1      1.2106293    81  77.197168        0.5991

My P-value of 0.5991 is more than α=0.05. Since P > α, I fail to reject the null hypothesis.

5)    There is not sufficient evidence to support the claim that the standard deviation is greater than 1.127060996.
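
The chi-square statistic in the StatCrunch output can be checked by hand with chi-square = (n - 1) * s² / σ0², where σ0 is the hypothesized standard deviation (the midpoint of the 95% confidence interval). This is a minimal sketch using only numbers reported in this project; the variable names are illustrative. The P-value itself needs the chi-square distribution, which is not in Python's standard library, so only the test statistic is reproduced.

```python
n = 82
sample_variance = 1.2106293   # s^2 from the summary statistics

# 95% confidence interval for sigma reported above
ci95_lower, ci95_upper = 0.9538325639, 1.300289429

# Hypothesized standard deviation: midpoint of the 95% interval
sigma0 = (ci95_lower + ci95_upper) / 2
sigma0_squared = sigma0 ** 2

df = n - 1
chi_square = df * sample_variance / sigma0_squared

print(f"sigma0 = {sigma0:.9f}, sigma0^2 = {sigma0_squared:.7f}")
print(f"df = {df}, chi-square = {chi_square:.4f}")
```

Because the sample variance (1.2106293) is smaller than the hypothesized variance, the statistic falls below its degrees of freedom, which is consistent with a large right-tail P-value.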

6)      My confidence intervals are below:

a.      68%: 1.023562602 < σ < 1.197453506, s=1.100286008

b.      90%:  0.9756845904 < σ < 1.265185915, s=1.100286008

c.       95%: 0.9538325639 < σ < 1.300289429, s=1.100286008

All of my confidence intervals contain the null hypothesized value of 1.127060996, which supports my testing result to fail to reject the null hypothesis. I added the 95% confidence interval above just in case anyone was curious.

My raw data sample has 82 values. I separated the sample into two columns of 41, then took off the bottom 16 of each column so each sample now has 25 values.

x-bar1=8.6

x-bar2=8.32

Step 1: State the null and alternate hypotheses (H0 & H1 ).

H0: μ1 = μ2

H1: μ1 ≠ μ2

My claim is that the means of my two samples are equal.

Step 2: Does the alternate hypothesis cause this to be a one-tailed or two-tailed test?

I’m going to use a 95% confidence level, making α=0.05. This is a two-tailed test (α/2=0.025 in each tail) because H1 uses ≠. My sample size will affect the test because the confidence interval becomes narrower as the number of samples (n) gets higher.

Step 3: Run your test through StatCrunch.

Hypothesis test results:

μ1 : Mean of Sample 1
μ2 : Mean of Sample 2
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
(without pooled variances)

Difference  Sample Diff.  Std. Err.   DF         T-Stat     P-value
μ1 - μ2     0.28          0.19765289  42.351867  1.4166248  0.1639

Step 4: Explicitly compare your P-value with α, and state whether this comparison leads you to reject the null hypothesis or not.

My P-value of 0.1639 is greater than the α value of 0.05 so I fail to reject the null hypothesis.

Step 5: State whether the outcome of Step 4 provides enough evidence to support the claim or not. Before you take this step, make sure you re-read the way in which you stated the claim so that you don’t contradict yourself in these last two steps!

There is insufficient evidence to reject the claim that the means of my two samples are equal.

Construct a confidence interval for the difference between your two sample statistics at the same confidence level (level of significance) with which you hypothesis tested for a difference above. Clearly state this confidence interval, and explain how it confirms the results of your hypothesis testing.

95% confidence interval results:

μ1 : Mean of Sample 1
μ2 : Mean of Sample 2
μ1 - μ2 : Difference between two means
(without pooled variances)

Difference  Sample Diff.  Std. Err.   DF         L. Limit     U. Limit
μ1 - μ2     0.28          0.19765289  42.351867  -0.11878153  0.67878153

-0.11878153 < μ1 - μ2 < 0.67878153

I’m 95% confident that the limits of -0.11878153 and 0.67878153 contain the difference between the two population means. The limits do contain 0, so the confidence interval suggests that there is not a significant difference between the two means.
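
The agreement between the two-tailed test and the confidence interval can be checked from the reported numbers alone. This is a minimal sketch with illustrative variable names: it recomputes the t statistic as the sample difference divided by its standard error, then backs out the critical t value implied by the reported limits rather than looking it up in a t table.

```python
sample_diff = 0.28                       # x-bar1 - x-bar2
std_err = 0.19765289                     # standard error from StatCrunch
lower, upper = -0.11878153, 0.67878153   # reported 95% limits

t_stat = sample_diff / std_err

# Half-width of the interval divided by the standard error gives the
# critical t value that StatCrunch must have used
margin = (upper - lower) / 2
t_critical = margin / std_err

reject_by_t = abs(t_stat) > t_critical
reject_by_interval = not (lower < 0 < upper)

print(f"t = {t_stat:.4f}, implied critical t = {t_critical:.4f}")
print(f"reject H0 by t: {reject_by_t}, by interval: {reject_by_interval}")
```

Both checks give the same decision because they are two views of the same comparison: the interval contains 0 exactly when the absolute t statistic is smaller than the critical value.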

I only had one quantitative variable in my raw data, the number of fruit snacks per pouch. So I used the number of pouches per box as a second quantitative variable. I’m going to use the number of pouches per box as the independent variable (x) and the number of fruit snacks per pouch as the dependent variable (y).

2.      I have 82 data pairs. They make the two columns very long so I have attached them at the end.

3.     Overlay polynomial order of 1 seemed to fit the best

4.    The correlation between X and Y is r = 0.16160493 (P-value 0.1469). I interpret r as a weak positive linear relationship between x and y.

5.      H0: ρ=0
H1: ρ≠0
A 95% confidence level would mean α = 0.05. My P-value (0.1469) > α so I fail to reject H0 and conclude that there is not sufficient evidence to support the claim of a linear correlation. (Triola, 507).

6.      R²=0.026116155 so about 2.61% of the variation in number of fruit snacks per pouch can be explained by the linear relationship between the number of pouches per box and number of fruit snacks per pouch. This means that about 97.39% of the variation in number of fruit snacks per pouch cannot be explained by the number of pouches per box. (Triola, 505).

7.      I read the common errors involving correlation and I do not believe I made any of the three.

Simple linear regression results:

Dependent Variable: Y
Independent Variable: X
Y = 7.25 + 0.125 X
Sample size: 82
R (correlation coefficient) = 0.16160493
R-sq = 0.026116155

Estimate of error standard deviation: 1.0925887

Parameter estimates:

Parameter  Estimate  Std. Err.    Alternative  DF  T-Stat     P-Value
Intercept  7.25      0.81247482   ≠ 0          80  8.9233535  <0.0001
Slope      0.125     0.085342229  ≠ 0          80  1.4646911  0.1469

Analysis of variance table for regression model:

Source  DF  SS         MS         F-stat     P-value
Model   1   2.5609756  2.5609756  2.1453199  0.1469
Error   80  95.5       1.19375
Total   81  98.060976
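
The pieces of the regression output are tied together by a few identities, which make a useful consistency check: R-sq equals r squared and also SS(Model) / SS(Total), the F statistic is MS(Model) / MS(Error), and in simple linear regression F equals the square of the slope's t statistic. This is a minimal sketch using only the numbers reported in the tables above; the variable names are illustrative.

```python
r = 0.16160493            # correlation coefficient
ss_model = 2.5609756      # model (regression) sum of squares
ss_error = 95.5           # error sum of squares
df_model, df_error = 1, 80
slope, slope_se = 0.125, 0.085342229

ss_total = ss_model + ss_error
r_squared_from_r = r ** 2
r_squared_from_ss = ss_model / ss_total

f_stat = (ss_model / df_model) / (ss_error / df_error)
t_stat = slope / slope_se

print(f"R-sq from r: {r_squared_from_r:.6f}, from SS: {r_squared_from_ss:.6f}")
print(f"F = {f_stat:.5f}, slope t = {t_stat:.5f}, t squared = {t_stat ** 2:.5f}")
```

Both routes give an R-sq of about 0.0261, so roughly 2.6% of the variation in Y is explained by X.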

     

9. I answered two yes and one no to the questions on which way to predict a y-value so I’m going with the y=b0+b1x. I repeated the linear regression and these are the results:

Predicted values:

X value  Pred. Y  s.e.(Pred. y)  95% C.I. for mean     95% P.I. for new
10       8.5      0.13058932     (8.240119, 8.759881)  (6.3102035, 10.689797)
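
The predicted value and the wider prediction interval can be reproduced from the regression output. This is a minimal sketch with illustrative variable names. It assumes the standard formula in which a new observation adds the error variance to the variance of the fitted mean, and it backs out the critical t value from the reported confidence interval instead of using a t table.

```python
from math import sqrt

intercept, slope = 7.25, 0.125            # from Y = 7.25 + 0.125 X
s_error = 1.0925887                       # estimate of error standard deviation
se_mean = 0.13058932                      # s.e.(Pred. y) at x = 10
ci_lower, ci_upper = 8.240119, 8.759881   # reported 95% C.I. for mean

x = 10
pred_y = intercept + slope * x

# Critical t implied by the reported confidence interval
t_critical = (ci_upper - ci_lower) / 2 / se_mean

# A new pouch varies around the fitted mean, so its standard error
# combines the error variance with the variance of the fitted mean
se_new = sqrt(s_error ** 2 + se_mean ** 2)
pi_lower = pred_y - t_critical * se_new
pi_upper = pred_y + t_critical * se_new

print(f"predicted y at x = {x}: {pred_y}")
print(f"95% P.I. for new: ({pi_lower:.5f}, {pi_upper:.5f})")
```

The prediction interval is much wider than the confidence interval for the mean because almost all of its width comes from pouch-to-pouch variation, not from uncertainty in the fitted line.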

 


10. I can’t do any regressions for other polynomials with a predicted x of 10.


Polynomial Regression Results:
Dependent Variable: Y
Independent Variable: X 
At least three unique x-values are required for a 2nd order computation.
Currently there are only 2 unique x-values.

Polynomial Regression Results:
Dependent Variable: Y
Independent Variable: X 
At least four unique x-values are required for a 3rd order computation.
Currently there are only 2 unique x-values.
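
The reason behind these messages is that a polynomial of order d has d + 1 coefficients, so it needs at least d + 1 distinct x-values to be determined. This is a minimal sketch of that check; the function name and the short sample list are hypothetical and are not taken from StatCrunch or from the full data set.

```python
def max_polynomial_order(x_values):
    """Highest polynomial order that the distinct x-values can support."""
    # An order-d polynomial has d + 1 coefficients, so it needs
    # at least d + 1 distinct x-values
    return len(set(x_values)) - 1


# Illustrative list: the boxes in this project hold either 6 or 10 pouches
pouches_per_box = [6, 10, 6, 10, 10]
print(max_polynomial_order(pouches_per_box))  # 1
```

With only the two values 6 and 10, the highest order available is 1, which is the straight-line fit already shown.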

11. Since I only have one fit, it has to be the best fit. However, I don’t think it fits at all. My P-value shows that there is not sufficient evidence of a linear correlation between x and y, so there is no useful model for a relationship between my variables.

12. I don’t have enough unique x values to complete the regressions.

After completing all discussion boards and dealing with my project all semester, I was a little surprised by the findings. I had a good-sized sample of 82 but a small range of only 5. With a large sample size and a small range, it wasn't surprising that I had high frequencies, but I was surprised that my frequency polygon was almost symmetrical on each side. I was surprised that my measures of central tendency were very similar. They were all between 8 and 9, meaning they were all within 1 of each other. This project helped me a lot with statistics. This class was really hard, but applying the lessons to a sample that I could see and touch helped me connect what I learned to real world matters.

HTML link:
<A href="https://www.statcrunch.com/5.0/viewreport.php?reportid=37095">Final Project</A>
