
Owner: erj16b
Created: Nov 12, 2017
State Population and Area Codes Part 2

Data set 1. State Population and Area Codes

The two quantitative variables in this data set are the state populations and the number of area codes in each state.

Result 1: Scatter Plot

Looking at the two variables, State Population (x) and Number of Area Codes (y), there is a strong positive correlation between population size and the number of area codes, and the relationship appears to be linear. Because the number of area codes increases with population size, California might look like an outlier with its 16 area codes. However, it falls within the general trend, so it is not an outlier. What makes California stand out from the other states is its very large population, an extreme x value that gives it high leverage in the regression. Like the other states, it follows the pattern that the greater the population, the greater the number of area codes. An appropriate significance level is 0.01, meaning we accept a 1% risk of concluding that a relationship exists when it does not (a Type I error). A strict level seems reasonable here because a few states have larger populations but the same number of area codes as less populous states.

Result 2: Simple Linear Regression
Simple linear regression results:
Dependent Variable: Number of Area Codes
Independent Variable: Population (2000)
Number of Area Codes = 0.81333482 + 4.8088711e-7 Population (2000)
Sample size: 50
R (correlation coefficient) = 0.96382456
R-sq = 0.92895778
Estimate of error standard deviation: 0.83268969
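As a cross-check, the t statistic for testing whether the population correlation is zero can be computed from R and the sample size alone. The formula is t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. In simple linear regression this equals the slope's t statistic, so it should match the T-Stat reported for the slope. The following is a minimal Python sketch that uses only the values printed above. The function name `correlation_t_stat` is an illustrative choice, not part of StatCrunch.

```python
import math

# Values copied from the StatCrunch output in Result 2.
r = 0.96382456   # correlation coefficient R
n = 50           # sample size (50 states)

def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = correlation_t_stat(r, n)
print(f"t = {t:.3f} on {n - 2} df")
```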

Parameter estimates:
Parameter   Estimate       Std. Err.      Alternative  DF  T-Stat     P-value
Intercept   0.81333482     0.15981091     ≠ 0          48  5.0893574  <0.0001
Slope       4.8088711e-7   1.9194765e-8   ≠ 0          48  25.053034  <0.0001
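Each T-Stat in the parameter table is the estimate divided by its standard error. This statistic tests the null hypothesis that the parameter is zero against the two-sided alternative shown in the Alternative column. The short sketch below recomputes both values from the reported numbers. The helper name `t_stat` is illustrative only.

```python
# Estimates and standard errors copied from the parameter table above.
params = {
    "Intercept": (0.81333482, 0.15981091),
    "Slope": (4.8088711e-7, 1.9194765e-8),
}
df = 50 - 2  # n - 2 degrees of freedom for simple linear regression

def t_stat(estimate: float, std_err: float) -> float:
    """t statistic for H0: parameter = 0 (alternative: parameter != 0)."""
    return estimate / std_err

for name, (est, se) in params.items():
    print(f"{name}: t = {t_stat(est, se):.4f} on {df} df")
```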

Analysis of variance table for regression model:
Source  DF  SS          MS           F-stat     P-value
Model    1  435.19814   435.19814    627.65452  <0.0001
Error   48  33.281861   0.69337211
Total   49  468.48
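A few identities connect the entries of the ANOVA table:

- Each mean square is SS/DF.
- F is MS(model)/MS(error).
- R-sq is SS(model)/SS(total).
- The estimate of the error standard deviation is √MSE.

In simple linear regression, F also equals the square of the slope's t statistic. The sketch below recomputes these quantities as a consistency check, using only numbers copied from the output above.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_model, df_model = 435.19814, 1
ss_error, df_error = 33.281861, 48
ss_total = 468.48

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean squared error (MSE)
f_stat = ms_model / ms_error     # F = MS(model) / MS(error)
r_sq = ss_model / ss_total       # proportion of variation explained
s = math.sqrt(ms_error)          # estimate of error standard deviation

# In simple regression, F equals the slope t statistic squared.
t_slope = 25.053034
print(f"MSE = {ms_error:.6f}, F = {f_stat:.2f}, R-sq = {r_sq:.4f}, s = {s:.4f}")
print(f"t^2 = {t_slope ** 2:.2f}")
```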

The correlation coefficient is 0.9638 (rounded to four decimal places). A value this close to 1 indicates a very strong positive linear correlation. The relationship is highly significant at the 0.01 level. The slope's t-statistic is 25.05, and its P-value is below 0.0001, far smaller than 0.01. The line of best fit, in the form y = mx + b, is y = 4.8089 × 10^-7 x + 0.8133. This means the predicted number of area codes, y, equals 0.8133 plus 4.8089 × 10^-7 times the population, x. Equivalently, the model predicts about 0.48 additional area codes for every additional million residents. R-sq measures how much of the variation in the number of area codes is explained by the line of best fit. An R-sq of 0.929 means about 92.9% of that variation is explained by population. Knowing X therefore helps predict Y well, consistent with the linear relationship between the two variables.
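Because the slope is expressed per person, it is easier to interpret after rescaling. The sketch below plugs a population into the fitted equation and converts the slope into area codes per million residents. The 5 million input is a hypothetical population chosen for illustration, not a state from the data set. The function name `predicted_area_codes` is also illustrative.

```python
# Coefficients copied from the StatCrunch regression output (Result 2).
INTERCEPT = 0.81333482
SLOPE = 4.8088711e-7   # predicted area codes per additional resident

def predicted_area_codes(population: float) -> float:
    """Predicted number of area codes for a given state population."""
    return INTERCEPT + SLOPE * population

# Slope rescaled to a more readable unit: area codes per million residents.
per_million = SLOPE * 1_000_000
# Population increase associated with one additional predicted area code.
people_per_area_code = 1 / SLOPE

print(f"{per_million:.4f} area codes per million residents")
print(f"about {people_per_area_code:,.0f} residents per additional area code")
# Hypothetical input for illustration only.
print(f"{predicted_area_codes(5_000_000):.2f} area codes predicted at 5 million")
```

Extrapolating far beyond the range of state populations in the data is not supported by this fit.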

Result 3: Simple Linear Regression with Fitted Line

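Comparing the line of best fit with the scatter plot, the line fits the data well. The two variables are strongly correlated: overall, as the population increases, so does the number of area codes. Correlation alone does not establish causation, however. The data show a strong association rather than proof that population growth causes new area codes.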

Result 4: Simple Linear Regression QQ Plot

Looking at the QQ plot, the residuals appear to follow a normal distribution, because the points lie close to a straight line across the quantiles.

Result 5: Simple Linear Regression (Residual Plot)

Looking at the graph, the residual plot suggests that the linear model may not be a good fit. The points are not evenly scattered around zero and show a clear pattern (clustered toward the bottom right). The residuals therefore do not look randomly distributed with respect to the predicted values.
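When reading a residual plot, keep in mind that least-squares residuals always have zero linear correlation with the fitted values. A residual plot is therefore checked for non-random patterns such as curvature, fanning, or clusters, rather than for correlation. The small sketch below uses hypothetical, deliberately curved data (not the report's data) to show both points: the covariance between residuals and fitted values is zero, yet the residual signs follow a clear pattern. The helper `least_squares` is an illustrative name for the textbook slope and intercept formulas.

```python
from statistics import fmean

def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the ordinary least-squares line."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data with a curved trend (illustration only, not the report's data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]

b0, b1 = least_squares(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Covariance of residuals with fitted values is zero up to rounding:
# OLS residuals are always linearly uncorrelated with the fitted values...
cov = sum(r * (f - fmean(fitted)) for r, f in zip(residuals, fitted))
# ...yet the signs still show a pattern (+, -, -, -, -, +) revealing curvature.
signs = ["+" if r > 0 else "-" for r in residuals]
print(round(cov, 9), signs)
```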

