
Report Properties
Created: Nov 30, 2006
Tax-Deferred Retirement Plan

Mallory Parsons
STP 420
Project
Retire Plan
December 5, 2006

Preliminary Analysis

First, I carefully examined each of the variables. I found their means, standard deviations, and minimum and maximum values.  I  then looked at the graphs of each of the variables to determine their shape and distribution.
Column Statistics
Summary statistics:
Column	n	Mean	Variance	Std. Dev.	Std. Err.	Median	Range	Min	Max	Q1	Q3
Children	194	1.984536	0.9064954	0.9521005	0.068356834	2	4	0	4	1	3
Salary	194	70911.234	5.5087622E8	23470.752	1685.1018	72355	121290	10770	132060	53110	86990
Mortgage	194	79117.32	9.0878208E8	30146.012	2164.3577	78415	180390	5070	185460	55640	99330
Debt	194	9857.525	2.400828E7	4899.8247	351.7869	9890	23670	0	23670	6470	13180
Invested	193	7.2067356	20.56209	4.534544	0.32640362	7.3	15	0	15	3.6	10.4

Looking at the results, there do not seem to be any extreme values in the minimum or maximum that we would expect to have an excessive influence on the results. The mean number of children is about 2. The mean salary (about 70,911) is lower than the mean mortgage (about 79,117) but well above the mean debt (about 9,858). Invested has n = 193 rather than 194, so one value is missing for that variable.
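As a quick check on the table, the standard error column should equal the standard deviation divided by the square root of n. The following is a minimal sketch in Python (not part of the StatCrunch output); the dictionary name `columns` and the helper `standard_error` are my own labels, and the numbers are copied from the summary statistics above.

```python
import math

# (n, standard deviation, reported standard error) copied from the summary table
columns = {
    "Children": (194, 0.9521005, 0.068356834),
    "Salary": (194, 23470.752, 1685.1018),
    "Mortgage": (194, 30146.012, 2164.3577),
    "Debt": (194, 4899.8247, 351.7869),
    "Invested": (193, 4.534544, 0.32640362),
}


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Recompute the standard error for every column
computed = {name: standard_error(sd, n) for name, (n, sd, _) in columns.items()}

for name, (n, sd, reported) in columns.items():
    print(f"{name:9s} computed={computed[name]:.6g} reported={reported:.6g}")
```

The computed values should agree with the reported column up to rounding.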

 
[Histograms of the variables]
 
I decided to use histograms to determine the shapes of the distributions of the variables. The variable Children is roughly normal with a high mode at 2. Salary is also fairly normally distributed in a bell shape, and it seems to have no outliers. Mortgage has a roughly normal shape, but is not as close to normal as Children and Salary. There is also a gap in the graph and there seems to be a high value of 185460. Debt seems to be right skewed with a high value of 23670. The invested percents seem to be roughly right skewed, which means that more people invested at a lower percentage and fewer people invested at a higher percentage.

[Normal quantile (QQ) plots of the variables]
 
Looking at the normal quantile plots, we can see that the variables seem to be roughly normally distributed. They all seem to follow the straight line. However, the multiple regression model does not require any of these distributions to be normal.

The response variable is the percentage of combined income invested in tax-deferred retirement plans. The explanatory variables are the number of dependent children, the combined annual salary of the husband and wife, the current mortgage on the home, and the average amount of other debt. I will use the software StatCrunch to illustrate the outputs of this case study.

Relationships between Pairs of Variables

The second step in my analysis is to examine the relationships between all pairs of variables. I will use scatterplots and correlations to determine their relationships. For each pair of variables, I will test the null hypothesis that the population correlation is zero, H0: ρ = 0, versus the two-sided alternative Ha: ρ ≠ 0.

Correlation
Correlation matrix: 
	Children	Salary	Mortgage	Debt
Salary	-0.2471933			
Mortgage	0.31415617	0.46826112		
Debt	0.4368945	0.1269236	0.42571983	
Invested	-0.4274252	0.40030798	-0.21422993	-0.40889457





Scatter Plot

 
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Salary 
Independent Variable: Children 
Salary = 83026.66 - 6104.9116 Children 
Sample size: 194 
R (correlation coefficient) = -0.2476 
R-sq = 0.061329626 
Estimate of error standard deviation: 22798.78
Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	6.520511E9	6.520511E9	12.544646	0.0005
Error	192	9.9798598E10	5.19784352E8		
Total	193	1.0631911E11			







Scatter Plot
 
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Mortgage 
Independent Variable: Children 
Mortgage = 59100.156 + 10086.571 Children 
Sample size: 194 
R (correlation coefficient) = 0.3186 
R-sq = 0.10148291 
Estimate of error standard deviation: 28649.766
Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	1.77995899E10	1.77995899E10	21.68542	<0.0001
Error	192	1.57595353E11	8.2080909E8		
Total	193	1.75394947E11			



Scatter Plot
 

Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Debt 
Independent Variable: Children 
Debt = 5342.444 + 2275.132 Children 
Sample size: 194 
R (correlation coefficient) = 0.4421 
R-sq = 0.19544195 
Estimate of error standard deviation: 4406.434
Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	9.0559949E8	9.0559949E8	46.64033	<0.0001
Error	192	3.72799872E9	1.941666E7		
Total	193	4.6335985E9			

Scatter Plot
 

Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Mortgage 
Independent Variable: Salary 
Mortgage = 36743.96 + 0.5975549 Salary 
Sample size: 194 
R (correlation coefficient) = 0.4652 
R-sq = 0.21644618 
Estimate of error standard deviation: 26754.223 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	36743.96	6127.244	192	5.996817	<0.0001
Slope	0.5975549	0.08205153	192	7.2826786	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	3.7963563E10	3.7963563E10	53.037407	<0.0001
Error	192	1.37431376E11	7.1578842E8		
Total	193	1.75394947E11			

Scatter Plot
 

Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Debt 
Independent Variable: Salary 
Debt = 8030.2856 + 0.025767995 Salary 
Sample size: 194 
R (correlation coefficient) = 0.1234 
R-sq = 0.015235411 
Estimate of error standard deviation: 4875.002 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	8030.2856	1116.4716	192	7.1925573	<0.0001
Slope	0.025767995	0.014950961	192	1.7235008	0.0864

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	7.0594776E7	7.0594776E7	2.9704552	0.0864
Error	192	4.5630034E9	2.3765642E7		
Total	193	4.6335985E9			



Scatter Plot
 
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Debt 
Independent Variable: Mortgage 
Debt = 4305.2764 + 0.07017742 Mortgage 
Sample size: 194 
R (correlation coefficient) = 0.4318 
R-sq = 0.18642043 
Estimate of error standard deviation: 4431.07 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	4305.2764	895.5033	192	4.807661	<0.0001
Slope	0.07017742	0.010580351	192	6.632807	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	8.6379731E8	8.6379731E8	43.994125	<0.0001
Error	192	3.76980096E9	1.963438E7		
Total	193	4.6335985E9			

Scatter Plot
 
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Children 
Invested = 11.237425 - 2.0364475 Children 
Sample size: 193 
R (correlation coefficient) = -0.4274 
R-sq = 0.18269232 
Estimate of error standard deviation: 4.110175 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	11.237425	0.6841511	191	16.425356	<0.0001
Slope	-2.0364475	0.31166583	191	-6.534074	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	721.2549	721.2549	42.69412	<0.0001
Error	191	3226.6663	16.893541		
Total	192	3947.9211			






Scatter Plot
 
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Salary 
Invested = 1.7339833 + 7.714847E-5 Salary 
Sample size: 193 
R (correlation coefficient) = 0.4003 
R-sq = 0.16024648 
Estimate of error standard deviation: 4.166232 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	1.7339833	0.9548239	191	1.8160242	0.0709
Slope	7.714847E-5	1.2778865E-5	191	6.0371923	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	632.6405	632.6405	36.447693	<0.0001
Error	191	3315.2808	17.35749		
Total	192	3947.9211			








Scatter Plot
 
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Mortgage 
Invested = 9.752726 - 3.225372E-5 Mortgage 
Sample size: 193 
R (correlation coefficient) = -0.2142 
R-sq = 0.045894463 
Estimate of error standard deviation: 4.4408464 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	9.752726	0.89872855	191	10.851692	<0.0001
Slope	-3.225372E-5	1.0640969E-5	191	-3.0310886	0.0028

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	181.18773	181.18773	9.187498	0.0028
Error	191	3766.7334	19.721117		
Total	192	3947.9211			







Scatter Plot
 


Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Debt 
Invested = 10.945596 - 3.8119787E-4 Debt 
Sample size: 193 
R (correlation coefficient) = -0.4089 
R-sq = 0.16719478 
Estimate of error standard deviation: 4.14896 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	10.945596	0.6736084	191	16.249197	<0.0001
Slope	-3.8119787E-4	6.155937E-5	191	-6.192362	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	660.07184	660.07184	38.345345	<0.0001
Error	191	3287.8494	17.213871		
Total	192	3947.9211			







From the output, we see that the correlation between Children and Salary is -0.247 with a p-value of 0.0005. The correlation between Children and Mortgage is 0.3142 with a p-value < 0.0001. The correlation between Children and Debt is 0.4369 with a p-value < 0.0001, and the correlation between Children and Invested Percents is -0.4274 with a p-value < 0.0001. Therefore these are all statistically significant and we conclude that the population correlation is not zero. Children correlates most strongly with Debt, although its correlation with Invested is very similar in magnitude. The correlations between Children and Salary and between Children and Invested are negative. These pairs of variables are therefore negatively associated: the higher the number of children, the lower the salary and invested percent tend to be, and the lower the number of children, the higher the salary and invested percent tend to be.

The correlation between Salary and Mortgage is 0.4683 with a p-value < 0.0001 and the correlation between Salary and Invested is 0.4003 with a p-value < 0.0001. Therefore both are statistically significant and highly correlated. The correlation between Salary and Debt is 0.1269 with a p-value of 0.0864. Therefore, we do not reject the null hypothesis that the population correlation is zero.
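The test of H0: ρ = 0 can be connected to the regression output through the statistic t = r sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom. The sketch below is my own illustration in Python, not StatCrunch output; the function name `correlation_t` is hypothetical, and r is taken as the square root of the R-sq value reported for the regression of Debt on Salary (n = 194).

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Salary and Debt: R-sq = 0.015235411 and n = 194 from the regression output
r_salary_debt = math.sqrt(0.015235411)
t_salary_debt = correlation_t(r_salary_debt, 194)

print(f"r = {r_salary_debt:.4f}, t = {t_salary_debt:.4f}")
```

The result should match the slope T-Stat of 1.7235 in that output, which is why the correlation test and the slope test share the p-value of 0.0864.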

The correlation between Mortgage and Debt is 0.4257 with a p-value < 0.0001 and the correlation between Mortgage and Invested is -0.2142 with a p-value of 0.0028. Therefore both are statistically significant, and Mortgage and Invested are negatively associated.

The correlation between Debt and Invested is -0.4089 with a p-value < 0.0001. Therefore it is statistically significant, and the two variables are negatively associated and highly correlated.

Children, Salary, and Debt all have a stronger correlation with Invested Percents than does Mortgage. However, Children, Mortgage, and Debt are all negatively associated with Invested while Salary is positively associated. The number of children has the strongest correlation with invested percent (r = -0.43). Salary and Mortgage also have a high correlation with each other (r = 0.47), as do Mortgage and Debt (r = 0.43) and Children and Debt (r = 0.44). Salary and Debt have the lowest correlation (r = 0.13).

Regression

To explore the relationship between the explanatory variables and our response variable Invested Percent, I will run several regressions. I will begin my analysis by using each explanatory variable on its own to predict Invested Percent.

Regression on Children
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Children 
Invested = 11.237425 - 2.0364475 Children 
Sample size: 193 
R (correlation coefficient) = -0.4274 
R-sq = 0.18269232 
Estimate of error standard deviation: 4.110175 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	11.237425	0.6841511	191	16.425356	<0.0001
Slope	-2.0364475	0.31166583	191	-6.534074	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	721.2549	721.2549	42.69412	<0.0001
Error	191	3226.6663	16.893541		
Total	192	3947.9211			


The null hypothesis is H0: β1 = 0 and the alternative hypothesis is Ha: β1 ≠ 0. We are testing to see if the slope is different from zero. Since I am only using one variable, I will consider the t-test. For the slope, t = -6.534 with a p-value < 0.0001. Therefore, I can conclude that the slope is significantly different from zero. The regression equation is Invested = 11.237 - 2.036 Children with S = 4.11 and R^2 = 0.183. So 18.3% of the observed variation in the Invested Percent is explained by the linear regression on the number of children. Although the p-value is very small, the model does not explain very much of the variation in Invested Percents.
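The quantities quoted above can be recovered from the ANOVA table: R^2 = SS(Model) / SS(Total), S is the square root of MS(Error), and for a single explanatory variable the F statistic equals the square of the slope t statistic. A minimal sketch in Python, with variable names of my own choosing and values copied from the output for Invested on Children:

```python
import math

# Values copied from the ANOVA table for Invested on Children
ss_model, ss_error, ss_total = 721.2549, 3226.6663, 3947.9211
df_model, df_error = 1, 191
t_slope = -6.534074  # slope T-Stat from the parameter estimates

r_squared = ss_model / ss_total           # share of variation explained
ms_error = ss_error / df_error            # mean square error
f_stat = (ss_model / df_model) / ms_error # ANOVA F statistic
root_mse = math.sqrt(ms_error)            # estimate of error standard deviation

print(f"R-sq = {r_squared:.4f}, F = {f_stat:.3f}, S = {root_mse:.4f}")
print(f"t^2 = {t_slope ** 2:.3f}")
```

Because t^2 and F agree, the t-test and the ANOVA F test give the same p-value in each of the one-variable regressions.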
Residuals
 

I will now examine the residuals to help determine whether the regression model was appropriate for the data. I will examine the residuals themselves, and then the residuals versus all the explanatory variables.
Simple Linear Regression
 
Stem and Leaf
Variable: Residuals 
-0 : 988777776666555555555555555555 
-0 : 4444444444444333333333333333322222222222222222222211111111111111 
0 : 00000000000011111111111111122222222222222333333333333333333344444444444 
0 : 5555566666666777778888888899 
Looking at the quantile plot and the stemplot, we can see that the residuals seem to have a roughly normal distribution. 
Simple Linear Regression
 

Scatter Plot
 
Scatter Plot
 
Scatter Plot
 
The plot of residuals versus Salary reveals a positive association with scatter around zero. The plots of residuals versus Mortgage and Debt show a negative association.
Regression on Salary
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Salary 
Invested = 1.7339833 + 7.714847E-5 Salary 
Sample size: 193 
R (correlation coefficient) = 0.4003 
R-sq = 0.16024648 
Estimate of error standard deviation: 4.166232 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	1.7339833	0.9548239	191	1.8160242	0.0709
Slope	7.714847E-5	1.2778865E-5	191	6.0371923	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	632.6405	632.6405	36.447693	<0.0001
Error	191	3315.2808	17.35749		
Total	192	3947.9211			


The null hypothesis is H0: β1 = 0 and the alternative hypothesis is Ha: β1 ≠ 0. We are testing to see if the slope is different from zero. Since I am only using one variable, I will consider the t-test. For the slope, t = 6.037 with a p-value < 0.0001. Therefore, I can conclude that the slope is significantly different from zero. The regression equation is Invested = 1.734 + 0.000077 Salary with S = 4.166 and R^2 = 0.16. So 16% of the observed variation in the Invested Percent is explained by the linear regression on Salary. Although the p-value is very small, the model does not explain very much of the variation in Invested Percents.

Simple Linear Regression
 
Residuals
I will now examine the residuals to help determine whether the regression model was appropriate for the data. I will examine the residuals themselves, and then the residuals versus all the explanatory variables.
 
Stem and Leaf
Variable: Residuals 
-0 : 998888777777766666666666655555555 
-0 : 4444444444333333333333322222222222211111111111111 
0 : 00000000000000000000011111111111111111122222222222222222233333333333333444444444 
0 : 555555555556666666677777778899 
1 : 0
The residuals appear to be closer to normal for the regression of Invested Percent on Salary.
Simple Linear Regression
 
Scatter Plot
 
Scatter Plot
 
Scatter Plot
 

The residuals against Salary seem to be scattered around zero. The residuals against Debt, Children, and Mortgage all seem to be negatively associated.
Regression on Mortgage
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Mortgage 
Invested = 9.752726 - 3.225372E-5 Mortgage 
Sample size: 193 
R (correlation coefficient) = -0.2142 
R-sq = 0.045894463 
Estimate of error standard deviation: 4.4408464 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	9.752726	0.89872855	191	10.851692	<0.0001
Slope	-3.225372E-5	1.0640969E-5	191	-3.0310886	0.0028

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	181.18773	181.18773	9.187498	0.0028
Error	191	3766.7334	19.721117		
Total	192	3947.9211			


The null hypothesis is H0: β1 = 0 and the alternative hypothesis is Ha: β1 ≠ 0. We are testing to see if the slope is different from zero. Since I am only using one variable, I will consider the t-test. For the slope, t = -3.031 with a p-value = 0.0028. Therefore, I can conclude that the slope is significantly different from zero. However, the evidence is not as strong as in the tests on Salary and Children. The regression equation is Invested = 9.753 - 0.0000323 Mortgage with S = 4.44 and R^2 = 0.0459. So only 4.6% of the observed variation in the Invested Percent is explained by the linear regression on Mortgage. Although the p-value is small, the model does not explain very much of the variation in Invested Percents.
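The slope is reported in scientific notation (-3.225372E-5), so it is easy to misread its size. The sketch below, written in Python as my own illustration, evaluates the fitted line at the mean Mortgage from the summary statistics and shows the predicted change for an increase of 10,000 in Mortgage. The function name `predict_invested` is hypothetical.

```python
# Coefficients copied from the output for Invested on Mortgage
intercept = 9.752726
slope = -3.225372e-5  # change in Invested per one unit of Mortgage


def predict_invested(mortgage):
    """Predicted invested percent from the fitted simple regression line."""
    return intercept + slope * mortgage


# Prediction at the mean Mortgage (79117.32) from the summary statistics
at_mean = predict_invested(79117.32)

# Predicted change in Invested when Mortgage rises by 10,000
change_per_10000 = slope * 10_000

print(f"prediction at mean Mortgage: {at_mean:.3f}")
print(f"change per 10,000 of Mortgage: {change_per_10000:.4f}")
```

So the slope is tiny per single unit of Mortgage, but a difference of 10,000 in Mortgage corresponds to roughly a third of a percentage point less invested.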

Simple Linear Regression
 
Residuals
 
Stem and Leaf
Variable: Residuals 
-0 : 9888877777776666666666666665555555555 
-0 : 444444444443333333333322222222222222222111111111111 
0 : 00000000000000011111111111111111111122222222222222333333333334444444444 
0 : 5555555566666667777778888888888899 

Once again the residuals appear to be roughly normally distributed.
Simple Linear Regression
 
Scatter Plot
 
Scatter Plot
 
Scatter Plot
 
The residuals are all scattered around zero with the residuals on salary appearing to have a positive association.
Regression on Debt
Simple Linear Regression
Simple linear regression results: 
Dependent Variable: Invested 
Independent Variable: Debt 
Invested = 10.945596 - 3.8119787E-4 Debt 
Sample size: 193 
R (correlation coefficient) = -0.4089 
R-sq = 0.16719478 
Estimate of error standard deviation: 4.14896 
Parameter estimates: 
Parameter	Estimate	Std. Err.	DF	T-Stat	P-Value
Intercept	10.945596	0.6736084	191	16.249197	<0.0001
Slope	-3.8119787E-4	6.155937E-5	191	-6.192362	<0.0001

Analysis of variance table for regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	1	660.07184	660.07184	38.345345	<0.0001
Error	191	3287.8494	17.213871		
Total	192	3947.9211			


The null hypothesis is H0: β1 = 0 and the alternative hypothesis is Ha: β1 ≠ 0. We are testing to see if the slope is different from zero. Since I am only using one variable, I will consider the t-test. For the slope, t = -6.19 with a p-value < 0.0001. Therefore, I can conclude that the slope is significantly different from zero. The regression equation is Invested = 10.946 - 0.000381 Debt with S = 4.15 and R^2 = 0.167. So 16.7% of the observed variation in the Invested Percent is explained by the linear regression on Debt. Although the p-value is very small, the model does not explain very much of the variation in Invested Percents.

Simple Linear Regression
 
Simple Linear Regression
 
Residuals
Stem and Leaf
Variable: Residuals 
-0 : 98777777777666666655555555555555 
-0 : 4444444444443333333333333322222222222222222222111111111111 
0 : 000000000000000011111111111111111112222222222222233333333344444444444 
0 : 55555555555556666666666677778889 
1 : 01 
Simple Linear Regression
 
Scatter Plot
 
Scatter Plot
 
Scatter Plot
 

The residuals against salary, mortgage, and debt all appear to have a positive association.
After fitting the simple linear regression models for all the explanatory variables, I can conclude that the variable Mortgage contributes the least to our explanation of Investment Percent because it has the largest p-value (0.0028) and the smallest R^2 (0.046). The regression on Children had the highest R^2 value (0.183) and explained more of the variation in Investment Percent than any other single variable, followed by Debt (0.167) and Salary (0.160).
Multiple Linear Regression
Multiple Regression on Children and Salary
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Salary 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	6.215421	1.20739	5.1478157	<0.0001
Children	-1.66684	0.30370235	-5.4884	<0.0001
Salary	6.0481732E-5	1.22848005E-5	4.923298	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	2	1086.3181	543.15906	36.063778	<0.0001
Error	190	2861.6033	15.0610695		
Total	192	3947.9211			

Root MSE: 3.8808594 
R-squared: 0.2752 
 
I tested each variable separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I also used the ANOVA table and the F test to test H0: β1 = β2 = 0 versus Ha: at least one βj does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 36.06 with a p-value < 0.0001 under an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children and Salary is different from zero. The fitted regression equation is: Invested = 6.215 - 1.667 Children + 0.0000605 Salary. The t value for Children = -5.49 with a p-value < 0.0001 and the t value for Salary = 4.92 with a p-value < 0.0001. Therefore, we reject the null hypothesis for each and conclude that both of these explanatory variables achieve statistical significance. I found that the root MSE = S = 3.88 and R^2 = 0.275. Therefore 27.5% of the observed variation in the investment percent is explained by linear regression on Children and Salary.
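Each t statistic in the parameter table is the estimate divided by its standard error. A minimal sketch in Python, using names of my own (`estimates`, `t_stats`) and values copied from the output for Invested on Children and Salary:

```python
# (estimate, standard error) copied from the parameter estimates table
estimates = {
    "Intercept": (6.215421, 1.20739),
    "Children": (-1.66684, 0.30370235),
    "Salary": (6.0481732e-5, 1.22848005e-5),
}

# t statistic for H0: beta_j = 0 is estimate / standard error
t_stats = {name: est / se for name, (est, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name:9s} t = {t:.4f}")
```

The Salary coefficient looks small only because Salary is measured in large units; its t statistic is of similar size to the one for Children.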
QQ Plot
 
The residuals appear to be normal.
Multiple Regression on Mortgage and Debt
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Mortgage, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	11.33754	0.8922756	12.706321	<0.0001
Mortgage	-7.383896E-6	1.1002756E-5	-0.67109513	0.503
Debt	-3.6173314E-4	6.81304E-5	-5.3094234	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	2	667.8468	333.9234	19.342684	<0.0001
Error	190	3280.0745	17.26355		
Total	192	3947.9211			

Root MSE: 4.154943 
R-squared: 0.1692 

I tested each variable separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I also used the ANOVA table and the F test to test H0: β1 = β2 = 0 versus Ha: at least one βj does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 19.34 with a p-value < 0.0001 under an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Mortgage and Debt is different from zero. The fitted regression equation is: Invested = 11.338 - 0.00000738 Mortgage - 0.000362 Debt. The t value for Mortgage = -0.671 with a p-value = 0.503 and the t value for Debt = -5.309 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Debt and conclude that Debt achieves statistical significance. However, we cannot reject the null hypothesis for Mortgage, so it is not statistically significant in this model. I found that the root MSE = S = 4.15 and R^2 = 0.169. Therefore 16.9% of the observed variation in the investment percent is explained by linear regression on Mortgage and Debt.

QQ Plot
 
The Residuals appear to be normally distributed.
Multiple Regression on Children and Mortgage
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Mortgage 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	12.02885	0.9181752	13.1008215	<0.0001
Children	-1.9036733	0.3277181	-5.808874	<0.0001
Mortgage	-1.33553385E-5	1.03558805E-5	-1.2896382	0.1987

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	2	749.25446	374.62723	22.252762	<0.0001
Error	190	3198.6667	16.835089		
Total	192	3947.9211			

Root MSE: 4.1030583 
R-squared: 0.1898 

I tested each variable separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I also used the ANOVA table and the F test to test H0: β1 = β2 = 0 versus Ha: at least one βj does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 22.25 with a p-value < 0.0001 under an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children and Mortgage is different from zero. The fitted regression equation is: Invested = 12.029 - 1.904 Children - 0.0000134 Mortgage. The t value for Children = -5.809 with a p-value < 0.0001 and the t value for Mortgage = -1.29 with a p-value = 0.1987. Therefore, we can reject the null hypothesis for Children and conclude that Children achieves statistical significance. However, we cannot reject the null hypothesis for Mortgage, so it is not statistically significant in this model. I found that the root MSE = S = 4.10 and R^2 = 0.190. Therefore 19.0% of the observed variation in the investment percent is explained by linear regression on Children and Mortgage.

QQ Plot
 
The residuals appear to be normally distributed.
Multiple Regression on Children and Debt
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	12.616782	0.7480491	16.86625	<0.0001
Children	-1.4649284	0.3341793	-4.38366	<0.0001
Debt	-2.559648E-4	6.538913E-5	-3.9144857	0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	2	962.0599	481.02994	30.609488	<0.0001
Error	190	2985.8613	15.71506		
Total	192	3947.9211			

Root MSE: 3.9642224 
R-squared: 0.2437 

I tested each variable separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I also used the ANOVA table and the F test to test H0: β1 = β2 = 0 versus Ha: at least one βj does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 30.61 with a p-value < 0.0001 under an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children and Debt is different from zero. The fitted regression equation is: Invested = 12.62 - 1.46 Children - 0.000256 Debt. The t value for Children = -4.38 with a p-value < 0.0001 and the t value for Debt = -3.91 with a p-value = 0.0001. Therefore, we can reject the null hypothesis for Children and Debt and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = S = 3.96 and R^2 = 0.24. Therefore 24% of the observed variation in the investment percent is explained by linear regression on Children and Debt.

QQ Plot
 

The residuals appear to be normally distributed.
Multiple Regression on Salary and Mortgage
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Salary, Mortgage 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	4.5547347	0.9051146	5.032219	<0.0001
Salary	1.235784E-4	1.2590398E-5	9.81529	<0.0001
Mortgage	-7.7459845E-5	9.835718E-6	-7.875363	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	2	1448.5165	724.25824	55.056732	<0.0001
Error	190	2499.4048	13.154762		
Total	192	3947.9211			

Root MSE: 3.6269495 
R-squared: 0.3669 

I tested each variable separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I also used the ANOVA table and the F test to test H0: β1 = β2 = 0 versus Ha: at least one βj does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 55.06 with a p-value < 0.0001 under an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Salary and Mortgage is different from zero. The fitted regression equation is: Invested = 4.555 + 0.000124 Salary - 0.0000775 Mortgage. The t value for Salary = 9.82 with a p-value < 0.0001 and the t value for Mortgage = -7.88 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary and Mortgage and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = S = 3.63 and R^2 = 0.367. Therefore 36.7% of the observed variation in the investment percent is explained by linear regression on Salary and Mortgage.

QQ Plot
 
The residuals appear to be normally distributed. 
Multiple Regression on Salary and Debt
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Salary, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	5.195506	0.9303605	5.584401	<0.0001
Salary	8.857741E-5	1.1143234E-5	7.9489865	<0.0001
Debt	-4.3558193E-4	5.3903543E-5	-8.080766	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	2	1480.603	740.3015	57.008163	<0.0001
Error	190	2467.3184	12.985886		
Total	192	3947.9211			

Root MSE: 3.6035933 
R-squared: 0.375 

I tested each variable separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I also used the ANOVA table and the F test to test H0: β1 = β2 = 0 versus Ha: at least one βj does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 57.01 with a p-value < 0.0001 under an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Salary and Debt is different from zero. The fitted regression equation is: Invested = 5.196 + 0.0000886 Salary - 0.000436 Debt. The t value for Salary = 7.95 with a p-value < 0.0001 and the t value for Debt = -8.08 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary and Debt and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = S = 3.60 and R^2 = 0.375. Therefore 37.5% of the observed variation in the investment percent is explained by linear regression on Salary and Debt.

QQ Plot
 
The residuals appear to be normally distributed. 
Therefore, after regressing on each pair of explanatory variables, I found that the regression on Salary and Debt is the best two-variable model. It explains 37.5% of the variation in Investment Percent. The regression on Mortgage and Debt was the worst, explaining only 16.9% of the variation. Although Debt alone was the best single predictor of Investment Percent, adding Salary to the model increased R^2. Therefore, the Salary and Debt model is the best so far.
Multiple Regression on Children, Salary, and Mortgage
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Salary, Mortgage 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	6.0507097	1.1169988	5.416935	<0.0001
Children	-0.72799116	0.32478705	-2.2414417	0.0262
Salary	1.0870633E-4	1.4115748E-5	7.701068	<0.0001
Mortgage	-6.47925E-5	1.1254935E-5	-5.7568073	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	3	1513.2361	504.41202	39.156548	<0.0001
Error	189	2434.6853	12.881932		
Total	192	3947.9211			

Root MSE: 3.589141 
R-squared: 0.3833 
I tested each coefficient separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βj is not zero. The ANOVA F statistic is 39.16 on an F(3, 189) distribution, with a p-value < 0.0001. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Salary, and Mortgage is different from zero. The fitted regression equation is: Invested = 6.05 - 0.728 Children + 1.087E-4 Salary - 6.479E-5 Mortgage. The t value for Children = -2.24 with a p-value = 0.0262 and the t value for Salary = 7.70 with a p-value < 0.0001. The t value for Mortgage = -5.76 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Children, Salary, and Mortgage and conclude that these explanatory variables are statistically significant. However, the evidence for Children (p = 0.0262) is weaker than for the other two. The root MSE is S = 3.589 and R^2 = 0.383, so 38.3% of the observed variation in the investment percent is explained by linear regression on Children, Salary, and Mortgage.

QQ Plot
The residuals appear to be normally distributed.
Multiple Regression on Children, Salary, and Debt
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Salary, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	6.617582	1.1108874	5.957023	<0.0001
Children	-0.72982574	0.31939462	-2.2850282	0.0234
Salary	7.949656E-5	1.1716058E-5	6.785265	<0.0001
Debt	-3.6761555E-4	6.105045E-5	-6.0215044	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	3	1546.9332	515.6444	40.590286	<0.0001
Error	189	2400.988	12.703641		
Total	192	3947.9211			

Root MSE: 3.5642166 
R-squared: 0.3918 
I tested each coefficient separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βj is not zero. The ANOVA F statistic is 40.59 on an F(3, 189) distribution, with a p-value < 0.0001. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Salary, and Debt is different from zero. The fitted regression equation is: Invested = 6.62 - 0.730 Children + 7.95E-5 Salary - 3.68E-4 Debt. The t value for Children = -2.29 with a p-value = 0.0234 and the t value for Salary = 6.79 with a p-value < 0.0001. The t value for Debt = -6.02 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Children, Salary, and Debt and conclude that these explanatory variables are statistically significant. However, the evidence for Children (p = 0.0234) is weaker than for the other two. The root MSE is S = 3.564 and R^2 = 0.392, so 39.2% of the observed variation in the investment percent is explained by linear regression on Children, Salary, and Debt.
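Each t value quoted above is simply the coefficient estimate divided by its standard error. The Python sketch below (illustrative names, not StatCrunch output) recomputes the three t values from the parameter table for the Children, Salary, and Debt model.

```python
# (estimate, standard error) pairs copied from the parameter table
# for the Children + Salary + Debt model.
coefficients = {
    "Children": (-0.72982574, 0.31939462),
    "Salary": (7.949656e-5, 1.1716058e-5),
    "Debt": (-3.6761555e-4, 6.105045e-5),
}


def t_statistic(estimate, std_err):
    """t statistic for H0: beta_j = 0."""
    return estimate / std_err


t_stats = {name: t_statistic(est, se) for name, (est, se) in coefficients.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

Each statistic is compared with a t distribution on the error degrees of freedom (189 here) to obtain the two-sided p-value reported in the table.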
QQ Plot
 
Multiple Regression on Salary, Mortgage, and Debt
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Salary, Mortgage, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	6.1702666	0.8879056	6.9492373	<0.0001
Salary	1.17606534E-4	1.173369E-5	10.02298	<0.0001
Mortgage	-5.386524E-5	1.00483685E-5	-5.360595	<0.0001
Debt	-3.1141035E-4	5.5425873E-5	-5.618501	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	3	1806.2305	602.07684	53.132095	<0.0001
Error	189	2141.691	11.331697		
Total	192	3947.9211			

Root MSE: 3.3662586 
R-squared: 0.4575 
I tested each coefficient separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βj is not zero. The ANOVA F statistic is 53.13 on an F(3, 189) distribution, with a p-value < 0.0001. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Salary, Mortgage, and Debt is different from zero. The fitted regression equation is: Invested = 6.17 + 1.176E-4 Salary - 5.39E-5 Mortgage - 3.11E-4 Debt. The t value for Salary = 10.02 with a p-value < 0.0001 and the t value for Mortgage = -5.36 with a p-value < 0.0001. The t value for Debt = -5.62 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary, Mortgage, and Debt and conclude that these explanatory variables are statistically significant. The root MSE is S = 3.366 and R^2 = 0.4575, so 45.8% of the observed variation in the investment percent is explained by linear regression on Salary, Mortgage, and Debt.
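To make the scale of the coefficients concrete, the fitted equation can be evaluated for a single household. The Python sketch below uses the estimates from the parameter table; the household values (salary 70,000, mortgage 80,000, debt 10,000) are hypothetical and chosen only to sit near the sample means, and the function name is illustrative.

```python
# Estimates copied from the parameter table for Salary + Mortgage + Debt.
COEF = {
    "Intercept": 6.1702666,
    "Salary": 1.17606534e-4,
    "Mortgage": -5.386524e-5,
    "Debt": -3.1141035e-4,
}


def predict_invested(salary, mortgage, debt):
    """Predicted percent of income invested under the fitted equation."""
    return (
        COEF["Intercept"]
        + COEF["Salary"] * salary
        + COEF["Mortgage"] * mortgage
        + COEF["Debt"] * debt
    )


# Hypothetical household, not a row from RetirePlan.xls.
base = predict_invested(70_000, 80_000, 10_000)
raised = predict_invested(80_000, 80_000, 10_000)

print(f"Predicted Invested: {base:.2f}%")
print(f"Change for 10,000 more salary: {raised - base:+.3f} percentage points")
```

Because the predictors are measured in dollars, the coefficients look tiny, but a 10,000 increase in salary still shifts the prediction by more than one percentage point with mortgage and debt held fixed.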

QQ Plot
 
Multiple Regression on Children, Mortgage, and Debt
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Mortgage, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	12.623476	0.9040852	13.962706	<0.0001
Children	-1.4642199	0.33929464	-4.315482	<0.0001
Mortgage	-1.4132792E-7	1.06583975E-5	-0.013259772	0.9894
Debt	-2.5565282E-4	6.965626E-5	-3.670206	0.0003

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	3	962.0626	320.68753	20.299	<0.0001
Error	189	2985.8586	15.798194		
Total	192	3947.9211			

Root MSE: 3.9746943 
R-squared: 0.2437 
I tested each coefficient separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βj is not zero. The ANOVA F statistic is 20.30 on an F(3, 189) distribution, with a p-value < 0.0001. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Mortgage, and Debt is different from zero. The fitted regression equation is: Invested = 12.62 - 1.46 Children - 1.41E-7 Mortgage - 2.56E-4 Debt. The t value for Children = -4.32 with a p-value < 0.0001 and the t value for Mortgage = -0.013 with a p-value = 0.9894. The t value for Debt = -3.67 with a p-value = 0.0003. Therefore, we can reject the null hypothesis for Children and Debt and conclude that these explanatory variables are statistically significant. However, we cannot reject the null hypothesis for Mortgage. The root MSE is S = 3.97 and R^2 = 0.2437, so 24.4% of the observed variation in the investment percent is explained by linear regression on Children, Mortgage, and Debt.

QQ Plot
 

Therefore, after running multiple regression on three variables at a time, I found the regression on Salary, Mortgage, and Debt to be the best model. It explained 45.8% of the variation in Investment Percent. The worst model was the one on Children, Mortgage, and Debt, which explained only 24.4% of the variation.
Multiple Regression on All Variables
Multiple Linear Regression
Multiple linear regression results 
Dependent Variable: Invested 
Independent Variable(s): Children, Salary, Mortgage, Debt 
Parameter estimates: 
Variable	Estimate	Std. Err.	Tstat	P-value
Intercept	6.413731	1.0523098	6.094908	<0.0001
Children	-0.14140931	0.3262831	-0.43339452	0.6652
Salary	1.1489188E-4	1.3323196E-5	8.623448	<0.0001
Mortgage	-5.2092873E-5	1.0868739E-5	-4.7929087	<0.0001
Debt	-3.0232704E-4	5.9367867E-5	-5.092436	<0.0001

Analysis of variance table for multiple regression model: 
Source	DF	SS	MS	F-stat	P-value
Model	4	1808.368	452.092	39.72479	<0.0001
Error	188	2139.5532	11.380602		
Total	192	3947.9211			

Root MSE: 3.373515 
R-squared: 0.4581 
I tested each coefficient separately with H0: βj = 0 versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test H0: β1 = β2 = β3 = β4 = 0 versus Ha: at least one βj is not zero. The ANOVA F statistic is 39.72 on an F(4, 188) distribution, with a p-value < 0.0001. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Salary, Mortgage, and Debt is different from zero. The fitted regression equation is: Invested = 6.41 - 0.141 Children + 1.149E-4 Salary - 5.21E-5 Mortgage - 3.023E-4 Debt. The t value for Children = -0.433 with a p-value = 0.6652 and the t value for Salary = 8.623 with a p-value < 0.0001. The t value for Mortgage = -4.79 with a p-value < 0.0001 and the t value for Debt = -5.09 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary, Mortgage, and Debt and conclude that these explanatory variables are statistically significant. However, we cannot reject the null hypothesis for Children. The root MSE is S = 3.37 and R^2 = 0.458, so 45.8% of the observed variation in the investment percent is explained by linear regression on Children, Salary, Mortgage, and Debt. This is essentially the same percentage explained by the model on Salary, Mortgage, and Debt (R^2 = 0.4581 versus 0.4575). Therefore, we do not need the model with all four variables.
 The residuals appear to be normally distributed.
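One way to formalize the comparison between the full model and the Salary, Mortgage, and Debt model is a partial (extra sum of squares) F test. This test was not run in the analysis above; the Python sketch below is an illustration built only from the error sums of squares in the two ANOVA tables, with illustrative variable names.

```python
# Error sums of squares and error degrees of freedom from the ANOVA tables.
sse_reduced, df_reduced = 2141.691, 189   # Salary, Mortgage, Debt
sse_full, df_full = 2139.5532, 188        # adds Children

extra_terms = df_reduced - df_full        # one added coefficient (Children)
mse_full = sse_full / df_full

# Partial F: drop in SSE per added term, relative to the full-model MSE.
partial_f = ((sse_reduced - sse_full) / extra_terms) / mse_full

# With a single added term, the partial F equals the square of that term's t.
t_children = -0.43339452
print(f"partial F = {partial_f:.4f}, t^2 = {t_children ** 2:.4f}")
```

A partial F of about 0.19 on (1, 188) degrees of freedom is far below the usual 5% critical value of roughly 3.9, which agrees with the p-value of 0.6652 reported for Children in the full model.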

Conclusion
I have regressed Investment Percent on each variable alone, on pairs of variables, on groups of three, and finally on all four variables. When the variables are used alone, Debt is the best predictor of Investment Percent, explaining 16.7% of its variation. Among the two-variable models, Salary and Debt were the best predictors, explaining 37.5% of the variation in Investment Percent. The regression on Mortgage and Debt was the worst two-variable model, explaining only 16.9% of the variation. Among the three-variable models, the model on Salary, Mortgage, and Debt was the best, explaining 45.8% of the variation in Investment Percent. The worst was the one on Children, Mortgage, and Debt, which explained only 24.4% of the variation. When I regressed on all variables, I found that Children did not make a significant contribution once the other three explanatory variables were in the model. Therefore, the best model is the one on Salary, Mortgage, and Debt.
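Because R^2 can never decrease when a variable is added, a size-adjusted measure gives a fairer comparison across models with different numbers of predictors. Adjusted R^2 was not part of the analysis above; the Python sketch below computes it from the ANOVA tables as an illustrative cross-check, with function and variable names of my choosing.

```python
def adjusted_r_squared(sse, df_error, sst, df_total):
    """Adjusted R^2 = 1 - MSE / (SST / df_total)."""
    return 1 - (sse / df_error) / (sst / df_total)


# Total sum of squares and its degrees of freedom (same for every model).
sst, df_total = 3947.9211, 192

# (SSE, error df) copied from the ANOVA tables of three of the models.
models = {
    "Salary, Debt": (2467.3184, 190),
    "Salary, Mortgage, Debt": (2141.691, 189),
    "Children, Salary, Mortgage, Debt": (2139.5532, 188),
}

adjusted = {
    name: adjusted_r_squared(sse, df_error, sst, df_total)
    for name, (sse, df_error) in models.items()
}

for name, value in adjusted.items():
    print(f"{name}: adjusted R^2 = {value:.4f}")
```

Under this measure the three-variable model scores slightly higher than the four-variable model, which is consistent with dropping Children.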
Therefore, I can conclude that the combined annual salary of husbands and wives, the current mortgage on a home, and the average amount of other debt are all good predictors of the percentage of combined income invested in tax-deferred retirement plans. Salary and Investment Percent are moderately correlated (r = 0.40) and have a positive association. Therefore, those with a lower salary do not invest as much in a tax-deferred retirement plan as those with a higher salary. People with high salaries are more likely to take advantage of this investment opportunity. Mortgage and Investment Percent are more weakly correlated (r = -0.21) and have a negative association. Therefore, people with a higher mortgage do not invest as much in the tax-deferred retirement plan as people with a lower mortgage. Debt and Investment Percent are also moderately correlated (r = -0.41) and have a negative association. Therefore, people with higher debt do not invest as much in the tax-deferred retirement plan as those with lower debt. In summary, the people who take advantage of this investment opportunity tend to be those with high salaries, low mortgages, and low debt.

Data set 1. RetirePlan.xls