Report Properties
Created: May 6, 2017
Week 15 Multiple Regression Analysis vs BiVariate -Scott_Espinoza - 05-06-2017

Here we compare a bivariate correlation analysis with a multiple regression analysis of the percentage of votes cast for Hillary Clinton in the 2016 Presidential Election.

Result 1: Week 13 Correlation White - Poverty - Graduate Degree - Service Industry - GINI Coefficient - Manage
Correlation matrix (p-values in parentheses):

                  d2pty         gtba            whitepct      poverty       gini_coefficient  mngmnt_Prof
gtba              0.53072216
                  (<0.0001)
whitepct          -0.56937067   -0.087339394
                  (<0.0001)     (0.1225)
poverty           0.19782195    -0.18287061     -0.54372512
                  (0.0005)      (0.0011)        (<0.0001)
gini_coefficient  0.29087381    0.1436191       -0.3977513    0.65929109
                  (<0.0001)     (0.0108)        (<0.0001)     (<0.0001)
mngmnt_Prof       0.20610116    0.58188636      0.024801222   -0.20249147   -0.0032339934
                  (0.0003)      (<0.0001)       (0.6615)      (0.0003)      (0.9545)
service           0.33729307    -0.0076227903   -0.35987171   0.35118347    0.2915259         -0.23116951
                  (<0.0001)     (0.893)         (<0.0001)     (<0.0001)     (<0.0001)         (<0.0001)

   When we compare the bivariate results with the regression analysis, we still see a negative relationship between the percentage of the population that is white and the percentage of votes for Clinton. The positive relationship for graduate degrees holds as well. Poverty and percent of management professionals, which had weak positive correlations in the bivariate table, are not statistically significant in the regression (p = 0.1797 and p = 0.4958). The two strongest variables work in opposite directions: a relatively strong negative relationship between % white and votes for Clinton, and a strong positive relationship between % with graduate degrees and votes for Clinton. Just as in the bivariate analysis we did in week 13, we see in the multiple regression analysis, and especially in the T-Stats (10.45 for graduate degrees and -10.59 for % white), that their effects are about equal in size and opposite in direction. Because these two variables remain statistically significant in both types of analysis, they are strong predictors of the dependent variable (Clinton's share of the two-party vote).
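The comparison between the two analyses can be checked with a short Python sketch. The numbers are copied from the correlation matrix (Result 1) and the five-predictor regression (Result 2); the 0.05 cutoff and the choice to store "<0.0001" as 0.0001 are my assumptions, not something StatCrunch reports.

```python
# Values copied from Result 1 (correlation with d2pty) and Result 2
# (five-predictor regression). "<0.0001" is stored as 0.0001, which is
# an upper bound on the p-value rather than its exact value.
bivariate = {  # variable: (r with d2pty, p-value)
    "gtba": (0.53072216, 0.0001),
    "whitepct": (-0.56937067, 0.0001),
    "poverty": (0.19782195, 0.0005),
    "mngmnt_Prof": (0.20610116, 0.0003),
    "service": (0.33729307, 0.0001),
}
regression = {  # variable: (coefficient, p-value)
    "gtba": (2.0364789, 0.0001),
    "whitepct": (-0.41771422, 0.0001),
    "poverty": (-0.15391639, 0.1797),
    "mngmnt_Prof": (-0.082936095, 0.4958),
    "service": (0.76929017, 0.0001),
}
ALPHA = 0.05  # assumed significance level


def compare(bivariate, regression, alpha=ALPHA):
    """For each variable, record sign agreement and significance in both analyses."""
    rows = {}
    for name, (r, p_r) in bivariate.items():
        coef, p_b = regression[name]
        rows[name] = {
            "same_sign": (r > 0) == (coef > 0),
            "sig_bivariate": p_r < alpha,
            "sig_regression": p_b < alpha,
        }
    return rows


summary = compare(bivariate, regression)
# Variables that keep their sign and stay significant in both analyses.
robust = sorted(
    name for name, row in summary.items()
    if row["same_sign"] and row["sig_bivariate"] and row["sig_regression"]
)
print(robust)
```

Variables that drop out of the `robust` list are the ones whose bivariate relationship does not survive once the other predictors are held constant.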

 

Result 2: Week 15 - Multiple Linear Regression vs BiVariate Analysis
Multiple linear regression results:
Dependent Variable: d2pty
Independent Variable(s): gtba, whitepct, poverty, mngmnt_Prof, service
d2pty = 45.246999 + 2.0364789 gtba + -0.41771422 whitepct + -0.15391639 poverty + -0.082936095 mngmnt_Prof + 0.76929017 service
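
To see how the fitted equation is used, here is a minimal Python sketch that plugs values into it. The intercept and coefficients are copied from the equation above. The county values are hypothetical and chosen only for illustration, and I am assuming each predictor is measured in percentage points.

```python
# Intercept and coefficients copied from the fitted equation in Result 2.
INTERCEPT = 45.246999
COEFFICIENTS = {
    "gtba": 2.0364789,
    "whitepct": -0.41771422,
    "poverty": -0.15391639,
    "mngmnt_Prof": -0.082936095,
    "service": 0.76929017,
}


def predict_d2pty(county):
    """Return the predicted two-party vote share (d2pty) for one county."""
    return INTERCEPT + sum(
        coef * county[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical county: made-up values, not a row from counties_sample.xls.
hypothetical_county = {
    "gtba": 10.0,
    "whitepct": 80.0,
    "poverty": 15.0,
    "mngmnt_Prof": 30.0,
    "service": 18.0,
}

print(round(predict_d2pty(hypothetical_county), 2))
```

Each coefficient is the predicted change in d2pty for a one-unit increase in that variable while the other four are held constant.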

Parameter estimates:
Parameter    Estimate      Std. Err.   Alternative  DF   T-Stat      P-value
Intercept    45.246999     7.0825806   ≠ 0          301  6.3884906   <0.0001
gtba         2.0364789     0.19485991  ≠ 0          301  10.450989   <0.0001
whitepct     -0.41771422   0.03944779  ≠ 0          301  -10.58904   <0.0001
poverty      -0.15391639   0.11444042  ≠ 0          301  -1.3449478  0.1797
mngmnt_Prof  -0.082936095  0.12161989  ≠ 0          301  -0.68192872 0.4958
service      0.76929017    0.19182905  ≠ 0          301  4.0102903   <0.0001
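
Each T-Stat in the table is the estimate divided by its standard error. The following Python sketch recomputes them from the reported columns as a sanity check; the variable names are mine.

```python
# (estimate, standard error, reported T-Stat) copied from Result 2.
params = {
    "Intercept": (45.246999, 7.0825806, 6.3884906),
    "gtba": (2.0364789, 0.19485991, 10.450989),
    "whitepct": (-0.41771422, 0.03944779, -10.58904),
    "poverty": (-0.15391639, 0.11444042, -1.3449478),
    "mngmnt_Prof": (-0.082936095, 0.12161989, -0.68192872),
    "service": (0.76929017, 0.19182905, 4.0102903),
}

# t = estimate / standard error, for the test of "coefficient = 0".
t_stats = {name: est / se for name, (est, se, _) in params.items()}

for name, (_, _, reported) in params.items():
    print(f"{name:12s} computed t = {t_stats[name]:9.4f}   reported = {reported:9.4f}")
```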

Analysis of variance table for multiple regression model:
Source  DF   SS          MS          F-stat     P-value
Model   5    47743.008   9548.6015   84.647842  <0.0001
Error   301  33953.955   112.80384
Total   306  81696.963

Summary of fit:
Root MSE: 10.620915
R-squared: 0.5844
R-squared (adjusted): 0.5775
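
The three fit statistics follow directly from the ANOVA table. As a rough check, this Python sketch recomputes them from the sums of squares and degrees of freedom reported above; the variable names are mine.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table
# of the five-predictor model (Result 2).
ss_model, ss_error, ss_total = 47743.008, 33953.955, 81696.963
df_error, df_total = 301, 306

# R-squared: share of the total variation in d2pty explained by the model.
r_squared = ss_model / ss_total

# Adjusted R-squared: penalizes the model for the number of predictors.
adj_r_squared = 1 - (ss_error / df_error) / (ss_total / df_total)

# Root MSE: square root of the error mean square.
root_mse = math.sqrt(ss_error / df_error)

print(f"R-squared:            {r_squared:.4f}")
print(f"R-squared (adjusted): {adj_r_squared:.4f}")
print(f"Root MSE:             {root_mse:.6f}")
```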

 So, if we compare the bivariate table to the multiple regression analysis, we see the same pattern: two variables stand out. These are again the percentage of the population that is white and the percentage with graduate degrees. According to the R-squared and adjusted R-squared of the two multiple regression models, the second of which includes the Gini coefficient, we are able to explain about 58% of the variance in the dependent variable (Clinton's share of the two-party vote).

In general, an F-test in regression compares the fits of different linear models. Unlike t-tests that can assess only one regression coefficient at a time, the F-test can assess multiple coefficients simultaneously.

 

The F-test of the overall significance is a specific form of the F-test. It compares a model with no predictors to the model that you specify. A regression model that contains no predictors is also known as an intercept-only model.

 

The hypotheses for the F-test of the overall significance are as follows:

 

  • Null hypothesis: The fit of the intercept-only model and your model are equal.
  • Alternative hypothesis: The fit of the intercept-only model is significantly reduced compared to your model.
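
For the overall test, the F-stat is the model mean square divided by the error mean square. This Python sketch recomputes it for both regression models in this report from their ANOVA tables. The dictionary labels are mine, and turning F into a p-value would need an F-distribution function, which is left out here.

```python
# Degrees of freedom and sums of squares copied from the two ANOVA tables
# in this report (Result 2 and Result 3).
anova = {
    "without gini_coefficient": {
        "df_model": 5, "ss_model": 47743.008,
        "df_error": 301, "ss_error": 33953.955,
    },
    "with gini_coefficient": {
        "df_model": 6, "ss_model": 47783.73,
        "df_error": 300, "ss_error": 33913.233,
    },
}


def overall_f(table):
    """Overall F = (SS_model / df_model) / (SS_error / df_error)."""
    ms_model = table["ss_model"] / table["df_model"]
    ms_error = table["ss_error"] / table["df_error"]
    return ms_model / ms_error


f_stats = {name: overall_f(table) for name, table in anova.items()}
for name, f in f_stats.items():
    print(f"{name}: F = {f:.3f}")
```

The F-stat is smaller for the six-predictor model mostly because the explained sum of squares is spread over one more degree of freedom.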

Based on this definition, adding the Gini coefficient as an additional independent variable changes the F-stat (from 84.65 to 70.45), but the p-value remains below 0.0001, so the model as a whole is still statistically significant.
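
A more direct way to ask whether the Gini coefficient adds anything is a partial F-test, which compares the error sum of squares of the model without it to the model with it. This is not part of the StatCrunch output; it is a sketch I am adding from the two ANOVA tables, and the names are mine. With a single added predictor, the partial F should equal the square of that predictor's T-Stat.

```python
# Error sums of squares and error degrees of freedom from the two ANOVA tables.
sse_reduced, df_reduced = 33953.955, 301  # model without gini_coefficient
sse_full, df_full = 33913.233, 300        # model with gini_coefficient

# Reported T-Stat for gini_coefficient in the six-predictor model.
t_gini = -0.60019258


def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """F for the predictors added when going from the reduced to the full model."""
    extra_ss = sse_reduced - sse_full
    extra_df = df_reduced - df_full
    return (extra_ss / extra_df) / (sse_full / df_full)


f_gini = partial_f(sse_reduced, df_reduced, sse_full, df_full)
print(f"partial F for gini_coefficient: {f_gini:.4f}")
print(f"squared T-Stat:                 {t_gini ** 2:.4f}")
```

A partial F well below 1 means the added variable reduces the unexplained variation by less than one would expect from noise alone, which agrees with its p-value of 0.5488.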

The only variable whose T-stat changes noticeably is poverty (from -1.34 to -0.70), and it is not statistically significant in either model.

My observations from the previous analysis still hold: the two most significant independent variables for predicting the dependent variable remain the percentage of the population that is white and the percentage with graduate degrees in a given county. These most strongly predict the percentage of votes for Hillary Clinton in the 2016 Presidential Election.

I am still surprised that the county poverty rate was not a stronger predictor of votes for Hillary. The service industry share, on the other hand, has a positive and statistically significant coefficient in both models.

As far as fit is concerned, the two versions of the multiple regression model seem to fit reasonably well and about equally well, but further analysis may be necessary to determine which of these variables explains the lion's share of the two-party vote which Hillary Clinton received in the 2016 election.

 

Result 3: Week 15 - Multiple Linear Regression - Gini Coefficient
Multiple linear regression results:
Dependent Variable: d2pty
Independent Variable(s): gtba, whitepct, poverty, gini_coefficient, mngmnt_Prof, service
d2pty = 50.184491 + 2.0734791 gtba + -0.41806189 whitepct + -0.10141004 poverty + -14.078804 gini_coefficient + -0.083285106 mngmnt_Prof + 0.77626859 service

Parameter estimates:
Parameter         Estimate      Std. Err.    Alternative  DF   T-Stat      P-value
Intercept         50.184491     10.860262    ≠ 0          300  4.6209283   <0.0001
gtba              2.0734791     0.20457677   ≠ 0          300  10.135457   <0.0001
whitepct          -0.41806189   0.039494028  ≠ 0          300  -10.585446  <0.0001
poverty           -0.10141004   0.1441447    ≠ 0          300  -0.70352943 0.4823
gini_coefficient  -14.078804    23.457145    ≠ 0          300  -0.60019258 0.5488
mngmnt_Prof       -0.083285106  0.12175073   ≠ 0          300  -0.68406245 0.4945
service           0.77626859    0.1923849    ≠ 0          300  4.0349767   <0.0001
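
Another way to read this table is through approximate 95% confidence intervals, computed as estimate ± critical value × standard error. StatCrunch does not print these here, so the Python sketch below is my own addition. The critical value 1.968 is an approximation of the two-sided 95% t value for 300 degrees of freedom, hardcoded to avoid needing a statistics library.

```python
# (estimate, standard error) copied from the Result 3 parameter table.
params = {
    "Intercept": (50.184491, 10.860262),
    "gtba": (2.0734791, 0.20457677),
    "whitepct": (-0.41806189, 0.039494028),
    "poverty": (-0.10141004, 0.1441447),
    "gini_coefficient": (-14.078804, 23.457145),
    "mngmnt_Prof": (-0.083285106, 0.12175073),
    "service": (0.77626859, 0.1923849),
}

# Approximate two-sided 95% critical value of t with 300 degrees of freedom
# (an assumption hardcoded for this sketch, not taken from the report).
T_CRIT = 1.968


def confidence_interval(estimate, std_err, t_crit=T_CRIT):
    """Return (lower, upper) bounds of estimate +/- t_crit * std_err."""
    margin = t_crit * std_err
    return estimate - margin, estimate + margin


intervals = {name: confidence_interval(*vals) for name, vals in params.items()}
# Predictors whose interval does not include zero.
excludes_zero = sorted(
    name for name, (low, high) in intervals.items()
    if name != "Intercept" and (low > 0 or high < 0)
)

for name, (low, high) in intervals.items():
    print(f"{name:17s} [{low:9.3f}, {high:9.3f}]")
print(excludes_zero)
```

An interval that includes zero corresponds to a coefficient that is not significant at the 5% level, so this is the same information as the P-value column presented as a range.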

Analysis of variance table for multiple regression model:
Source  DF   SS          MS          F-stat     P-value
Model   6    47783.73    7963.955    70.449976  <0.0001
Error   300  33913.233   113.04411
Total   306  81696.963

Summary of fit:
Root MSE: 10.63222
R-squared: 0.5849
R-squared (adjusted): 0.5766

 

Data set 1. counties_sample.xls

