In my data, there aren't any outliers, as all of the data follows the same linear pattern. The data shows a strong positive correlation. Since the data is strongly correlated, an appropriate significance level for the data would be .01, as the correlation is very strong and the P-values for both the intercept and the slope (calories) are less than .0001. Since the slope is .0684 and the y-intercept is 7.8836, the line of best fit is y = .0684x + 7.8836.
Simple linear regression results:
Dependent Variable: Total Fat (g)
Independent Variable: Calories
Total Fat (g) = 7.8836043 + 0.068410501 Calories
Sample size: 126
R (correlation coefficient) = 0.94076085
R-sq = 0.88503097
Estimate of error standard deviation: 6.2098677
Parameter estimates:
Analysis of variance table for regression model:

The R value, also known as the correlation coefficient, is .94. Since this value is very close to 1, it supports the conclusion that the data shows a strong positive correlation. The P-values for both the intercept and slope are less than .0001, indicating strong evidence against the null hypothesis; in other words, if the null hypothesis were true, there would be less than a .01% chance of obtaining estimates at least this extreme. These extremely small P-values also indicate that the terms are statistically significant and that the results are unlikely to be simply a product of random sampling variation. The line of best fit extends from the bottom left-hand corner of the graph to the upper right-hand corner, demonstrating a positive correlation. The line represents the data and its strong correlation well, as it runs through the area in which the data points are concentrated. R^2 is .88, meaning that about 88% of the variation in total fat is explained by calories; since it is close to 1, this indicates that the line of regression, also known as the line of best fit, fits the data well.
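As a rough check on the numbers above, the fitted equation can be evaluated directly. The following is a minimal Python sketch, not part of the regression output; the function name predict_total_fat and the 500-calorie input are illustrative assumptions, while the coefficients and R are copied from the simple linear regression results.

```python
# Coefficients copied from the simple linear regression results.
INTERCEPT = 7.8836043   # predicted grams of total fat at 0 calories
SLOPE = 0.068410501     # grams of total fat per additional calorie
R = 0.94076085          # correlation coefficient


def predict_total_fat(calories: float) -> float:
    """Predicted total fat (g) from the fitted line (hypothetical helper)."""
    return INTERCEPT + SLOPE * calories


# 500 calories is an arbitrary example input, not a value from the data set.
predicted = predict_total_fat(500)
print(f"Predicted total fat at 500 calories: {predicted:.2f} g")

# In simple linear regression, R-squared is the square of R.
r_squared = R ** 2
print(f"R squared: {r_squared:.4f}")
```

Squaring R reproduces the reported R-sq value, which is why a correlation of .94 corresponds to an R^2 of about .88.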
The data did not contain outliers. The data is correlated, as evidenced by its strong positive linear trend. Despite this strong correlation, causation cannot be assumed. For example, although grams of total fat increase as calories increase, other nutritional factors that accompany a high calorie count may be an underlying cause of the linear increase in total fat.
The data appears to follow a normal distribution: the points fall almost directly on the reference line, which is what would be expected if the observed and expected values came from the same distribution.
The residual plot implies that the linear model is a good fit because the residuals are symmetrically distributed, most of the points are clustered in the middle of the graph, and the points do not exhibit an obvious pattern.
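The residual check can be sketched in Python as follows. The (calories, total fat) pairs here are made-up illustrative values, not the data from this report, and fit_line is a hypothetical helper that implements an ordinary least squares fit.

```python
# Made-up (calories, total fat in g) pairs for illustration only;
# these are NOT the values from the report's data set.
points = [(250, 9), (300, 12), (420, 20), (540, 28), (670, 39), (750, 43)]


def fit_line(data):
    """Ordinary least squares fit; returns (intercept, slope)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(points)

# Residual = observed value minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in points]

for (x, _), r in zip(points, residuals):
    print(f"calories={x:4d}  residual={r:+.2f}")
```

For a least-squares line with an intercept, the residuals always sum to zero, so what matters in the plot is whether they scatter evenly around zero without a curve or funnel shape.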
After changing the confidence level to .99 (a significance level of .01), I did not get different results.
The linear model for the cluster sample, which contains a sample size of 50 per variable (calories and total fat), is much different from the model for the whole data set. While the entire data set maintains a strong positive correlation, the sample shows a random scatter with no apparent correlation.
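One common way to draw a cluster sample is to pick whole groups at random and keep every item in the chosen groups. The sketch below assumes the menu items can be grouped by restaurant; the restaurant names, item labels, and the cluster_sample helper are hypothetical and are not taken from the report's data set.

```python
import random

# Hypothetical menu items grouped by restaurant (illustrative only).
menu = {
    "Restaurant A": ["A1", "A2", "A3"],
    "Restaurant B": ["B1", "B2"],
    "Restaurant C": ["C1", "C2", "C3", "C4"],
    "Restaurant D": ["D1", "D2"],
}


def cluster_sample(groups, n_clusters, seed=None):
    """Randomly choose whole clusters and return every item in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_clusters)
    items = [item for name in chosen for item in groups[name]]
    return chosen, items


chosen, items = cluster_sample(menu, n_clusters=2, seed=1)
print("Chosen clusters:", chosen)
print("Sampled items:", items)
```

Unlike a simple random sample of individual items, this keeps each chosen group intact, so the sample size depends on how large the selected clusters happen to be.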
Multiple linear regression results:
Dependent Variable: Total Fat (g)
Independent Variable(s): Calories, Serving Size (g), Saturated Fat (g), Trans Fat (g), Sodium (mg), Carbs (g), Sugars (g), Protein (g)
Total Fat (g) = 0.2483677 + 0.091703224 Calories + 0.0039963241 Serving Size (g) + 0.47823524 Saturated Fat (g) + 2.0581851 Trans Fat (g) + 0.00074598736 Sodium (mg) + 0.32706204 Carbs (g) + 0.08810608 Sugars (g) + 0.30616988 Protein (g)
Parameter estimates:
Analysis of variance table for multiple regression model:
Summary of fit:
Root MSE: 2.5309629
R-squared: 0.9809
R-squared (adjusted): 0.9795
The P-values for calories, saturated fat, carbs, and protein are less than .0001, indicating strong evidence against the null hypothesis and that these results are unlikely to be a product of random sampling variation. Trans fat and sugars also have very low P-values, indicating the same. In contrast, the P-values for the intercept, serving size, and sodium are all larger than .05, demonstrating weak evidence against the null hypothesis. R^2 is very close to one, indicating that the regression model fits the data well.
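The adjusted R^2 in the summary of fit can be roughly reproduced from R^2 with the standard adjustment formula. This Python sketch assumes the multiple regression used the same 126 observations as the simple regression (the multiple regression output does not state its sample size) and counts the 8 independent variables listed in the model.

```python
def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


R_SQUARED = 0.9809   # from the summary of fit
N = 126              # assumption: same sample size as the simple regression
P = 8                # independent variables listed in the model

adj = adjusted_r_squared(R_SQUARED, N, P)
print(f"Adjusted R squared: {adj:.4f}")
```

Under these assumptions the result lands within about .0001 of the reported .9795, a gap that could come from R^2 being rounded in the output. Because the adjustment penalizes extra predictors, the small drop from .9809 suggests the eight variables are not inflating the fit much.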
Nov 15, 2017
Excellent report. One point about the cluster sample regression: the cluster should be selected in a clustered way, like calories and fat in McDonald's or Burger King, etc.