**Tax-Deferred Retirement Plan**

Mallory Parsons

STP 420 Project: Retire Plan

December 5, 2006

**Preliminary Analysis**

First, I carefully examined each of the variables. I found their means, standard deviations, and minimum and maximum values. I then looked at the graphs of each of the variables to determine their shape and distribution.

Summary statistics:

| Column | n | Mean | Variance | Std. Dev. | Std. Err. | Median | Range | Min | Max | Q1 | Q3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Children | 194 | 1.984536 | 0.9064954 | 0.9521005 | 0.068356834 | 2 | 4 | 0 | 4 | 1 | 3 |
| Salary | 194 | 70911.234 | 5.5087622E8 | 23470.752 | 1685.1018 | 72355 | 121290 | 10770 | 132060 | 53110 | 86990 |
| Mortgage | 194 | 79117.32 | 9.0878208E8 | 30146.012 | 2164.3577 | 78415 | 180390 | 5070 | 185460 | 55640 | 99330 |
| Debt | 194 | 9857.525 | 2.400828E7 | 4899.8247 | 351.7869 | 9890 | 23670 | 0 | 23670 | 6470 | 13180 |
| Invested | 193 | 7.2067356 | 20.56209 | 4.534544 | 0.32640362 | 7.3 | 15 | 0 | 15 | 3.6 | 10.4 |

Looking at the results, there do not seem to be any extreme minimum or maximum values that we would expect to have an excessive influence on the results. Invested has one missing value (n = 193). The mean number of children is about 2. The mean salary (70,911) is much higher than the mean debt (9,858), as we might expect, although it is somewhat lower than the mean mortgage (79,117).

[Histograms of the variables]

I used histograms to determine the shapes of the distributions of the variables. Children is roughly normal with a high mode at 2. Salary is also fairly normally distributed, with a bell shape and no apparent outliers. Mortgage has a roughly normal shape, but it is not nearly as close to normal as Children and Salary. There is also a gap in the graph, and there is a high value of 185460. Debt seems to be right skewed with a high value of 23670. The invested percents seem to be roughly right skewed, which means that more people invested at a lower percentage and fewer people invested at a higher percentage.

[Normal quantile (QQ) plots of the five variables]

Looking at the normal quantile plots, we can see that the variables seem to be roughly normally distributed.
The variables all seem to follow the straight line. However, the multiple regression model does not require any of these distributions to be normal.

The response variable is the percentage of combined income invested in tax-deferred retirement plans. The explanatory variables are the number of dependent children, the combined annual salary of the husband and wife, the current mortgage on the home, and the average amount of other debt. I will use the software StatCrunch to produce the output for this case study.

**Relationships between Pairs of Variables**

The second step in my analysis is to examine the relationships between all pairs of variables. I will use scatterplots and correlations to describe these relationships. For each pair of variables I will also test the null hypothesis that the population correlation is zero, $H_0: \rho = 0$, versus the two-sided alternative $H_a: \rho \neq 0$.

Correlation matrix:

| | Children | Salary | Mortgage | Debt |
|---|---|---|---|---|
| Salary | -0.2471933 | | | |
| Mortgage | 0.31415617 | 0.46826112 | | |
| Debt | 0.4368945 | 0.1269236 | 0.42571983 | |
| Invested | -0.4274252 | 0.40030798 | -0.21422993 | -0.40889457 |

[Scatter plot: Salary versus Children]

Simple linear regression results (dependent variable: Salary; independent variable: Children):

Salary = 83026.66 - 6104.9116 Children

Sample size: 194; R (correlation coefficient) = -0.2476; R-sq = 0.061329626; estimate of error standard deviation: 22798.78

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 6.520511E9 | 6.520511E9 | 12.544646 | 0.0005 |
| Error | 192 | 9.9798598E10 | 5.19784352E8 | | |
| Total | 193 | 1.0631911E11 | | | |

[Scatter plot: Mortgage versus Children]

Simple linear regression results (dependent variable: Mortgage; independent variable: Children):

Mortgage = 59100.156 + 10086.571 Children

Sample size: 194; R (correlation coefficient) = 0.3186; R-sq = 0.10148291; estimate of error standard deviation: 28649.766

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 1.77995899E10 | 1.77995899E10 | 21.68542 | <0.0001 |
| Error | 192 | 1.57595353E11 | 8.2080909E8 | | |
| Total | 193 | 1.75394947E11 | | | |

[Scatter plot: Debt versus Children]

Simple linear regression results (dependent variable: Debt; independent variable: Children):

Debt = 5342.444 + 2275.132 Children

Sample size: 194; R (correlation coefficient) = 0.4421; R-sq = 0.19544195; estimate of error standard deviation: 4406.434

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 9.0559949E8 | 9.0559949E8 | 46.64033 | <0.0001 |
| Error | 192 | 3.72799872E9 | 1.941666E7 | | |
| Total | 193 | 4.6335985E9 | | | |

[Scatter plot: Mortgage versus Salary]

Simple linear regression results (dependent variable: Mortgage; independent variable: Salary):

Mortgage = 36743.96 + 0.5975549 Salary

Sample size: 194; R (correlation coefficient) = 0.4652; R-sq = 0.21644618; estimate of error standard deviation: 26754.223

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 36743.96 | 6127.244 | 192 | 5.996817 | <0.0001 |
| Slope | 0.5975549 | 0.08205153 | 192 | 7.2826786 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 3.7963563E10 | 3.7963563E10 | 53.037407 | <0.0001 |
| Error | 192 | 1.37431376E11 | 7.1578842E8 | | |
| Total | 193 | 1.75394947E11 | | | |

[Scatter plot: Debt versus Salary]

Simple linear regression results (dependent variable: Debt; independent variable: Salary):

Debt = 8030.2856 + 0.025767995 Salary

Sample size: 194; R (correlation coefficient) = 0.1234; R-sq = 0.015235411; estimate of error standard deviation: 4875.002

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 8030.2856 | 1116.4716 | 192 | 7.1925573 | <0.0001 |
| Slope | 0.025767995 | 0.014950961 | 192 | 1.7235008 | 0.0864 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 7.0594776E7 | 7.0594776E7 | 2.9704552 | 0.0864 |
| Error | 192 | 4.5630034E9 | 2.3765642E7 | | |
| Total | 193 | 4.6335985E9 | | | |

[Scatter plot: Debt versus Mortgage]

Simple linear regression results (dependent variable: Debt; independent variable: Mortgage):

Debt = 4305.2764 + 0.07017742 Mortgage

Sample size: 194; R (correlation coefficient) = 0.4318; R-sq = 0.18642043; estimate of error standard deviation: 4431.07

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 4305.2764 | 895.5033 | 192 | 4.807661 | <0.0001 |
| Slope | 0.07017742 | 0.010580351 | 192 | 6.632807 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 8.6379731E8 | 8.6379731E8 | 43.994125 | <0.0001 |
| Error | 192 | 3.76980096E9 | 1.963438E7 | | |
| Total | 193 | 4.6335985E9 | | | |

[Scatter plot: Invested versus Children]

Simple linear regression results (dependent variable: Invested; independent variable: Children):

Invested = 11.237425 - 2.0364475 Children

Sample size: 193; R (correlation coefficient) = -0.4274; R-sq = 0.18269232; estimate of error standard deviation: 4.110175

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 11.237425 | 0.6841511 | 191 | 16.425356 | <0.0001 |
| Slope | -2.0364475 | 0.31166583 | 191 | -6.534074 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 721.2549 | 721.2549 | 42.69412 | <0.0001 |
| Error | 191 | 3226.6663 | 16.893541 | | |
| Total | 192 | 3947.9211 | | | |

[Scatter plot: Invested versus Salary]

Simple linear regression results (dependent variable: Invested; independent variable: Salary):

Invested = 1.7339833 + 7.714847E-5 Salary

Sample size: 193; R (correlation coefficient) = 0.4003; R-sq = 0.16024648; estimate of error standard deviation: 4.166232

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 1.7339833 | 0.9548239 | 191 | 1.8160242 | 0.0709 |
| Slope | 7.714847E-5 | 1.2778865E-5 | 191 | 6.0371923 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 632.6405 | 632.6405 | 36.447693 | <0.0001 |
| Error | 191 | 3315.2808 | 17.35749 | | |
| Total | 192 | 3947.9211 | | | |

[Scatter plot: Invested versus Mortgage]

Simple linear regression results (dependent variable: Invested; independent variable: Mortgage):

Invested = 9.752726 - 3.225372E-5 Mortgage

Sample size: 193; R (correlation coefficient) = -0.2142; R-sq = 0.045894463; estimate of error standard deviation: 4.4408464

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 9.752726 | 0.89872855 | 191 | 10.851692 | <0.0001 |
| Slope | -3.225372E-5 | 1.0640969E-5 | 191 | -3.0310886 | 0.0028 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 181.18773 | 181.18773 | 9.187498 | 0.0028 |
| Error | 191 | 3766.7334 | 19.721117 | | |
| Total | 192 | 3947.9211 | | | |

[Scatter plot: Invested versus Debt]

Simple linear regression results (dependent variable: Invested; independent variable: Debt):

Invested = 10.945596 - 3.8119787E-4 Debt

Sample size: 193; R (correlation coefficient) = -0.4089; R-sq = 0.16719478; estimate of error standard deviation: 4.14896

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 10.945596 | 0.6736084 | 191 | 16.249197 | <0.0001 |
| Slope | -3.8119787E-4 | 6.155937E-5 | 191 | -6.192362 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 660.07184 | 660.07184 | 38.345345 | <0.0001 |
| Error | 191 | 3287.8494 | 17.213871 | | |
| Total | 192 | 3947.9211 | | | |

From the output, we see that the correlation between Children and Salary is -0.247 with a p-value of 0.0005. The correlation between Children and Mortgage is 0.3142 with a p-value < 0.0001. The correlation between Children and Debt is 0.4369 with a p-value < 0.0001, and the correlation between Children and Invested is -0.4274 with a p-value < 0.0001.
Therefore these are all statistically significant and we conclude that the population correlation is not zero. Children correlates most strongly with Debt, although its correlation with Invested is very similar in size. The correlations of Children with Salary and with Invested are negative: the higher the number of children, the lower the salary and the invested percent tend to be, and the lower the number of children, the higher the salary and the invested percent tend to be.

The correlation between Salary and Mortgage is 0.4683 with a p-value < 0.0001, and the correlation between Salary and Invested is 0.4003 with a p-value < 0.0001. Both are statistically significant and the variables are moderately correlated. The correlation between Salary and Debt is 0.1269 with a p-value of 0.0864. Therefore, we do not reject the null hypothesis that the population correlation is zero.

The correlation between Mortgage and Debt is 0.4257 with a p-value < 0.0001, and the correlation between Mortgage and Invested is -0.2142 with a p-value of 0.0028. Both are statistically significant, and Mortgage and Invested are negatively associated. The correlation between Debt and Invested is -0.4089 with a p-value < 0.0001, so it is statistically significant and the two variables are negatively associated and moderately correlated.

Children, Salary, and Debt all have a stronger correlation with Invested than Mortgage does. However, Children, Mortgage, and Debt are all negatively associated with Invested, while Salary is positively associated. The number of children has the strongest correlation with the invested percent (r = -0.43). Salary and Mortgage also have a relatively high correlation with each other (r = 0.47), as do Mortgage and Debt (r = 0.43) and Children and Debt (r = 0.44). Salary and Debt have the lowest correlation (r = 0.13).
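The p-values quoted for each pair come from a t test of $H_0: \rho = 0$ with n - 2 degrees of freedom. The sketch below is not StatCrunch's own code; it is a minimal standard-library illustration, assuming the usual statistic t = r sqrt(n - 2) / sqrt(1 - r^2). It uses the Debt on Salary output (R-sq = 0.015235411, n = 194), and the helper names `t_pdf`, `two_sided_p`, and `correlation_t` are my own. The p-value is found by numerically integrating the t density with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t|.

    The integral is approximated with Simpson's rule (steps must be even).
    """
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def correlation_t(r, n):
    """t statistic for testing that the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Values copied from the Debt on Salary regression output.
n = 194
r = math.sqrt(0.015235411)   # positive root, since the fitted slope is positive

t_stat = correlation_t(r, n)
p_value = two_sided_p(t_stat, n - 2)

print(f"t = {t_stat:.4f} on {n - 2} df, two-sided p = {p_value:.4f}")
print(f"t squared = {t_stat ** 2:.4f} (compare with the ANOVA F statistic)")
```

The results should agree closely with the reported slope test for this pair (t = 1.7235, p = 0.0864), and the square of t should match the ANOVA F statistic of 2.97. In simple linear regression, the test of zero correlation and the test of zero slope are the same test.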
**Regression**

To explore the relationship between the explanatory variables and our response variable, Invested, I will run several regressions. I will begin my analysis by using each explanatory variable on its own to predict Invested.

**Regression on Children**

Simple linear regression results (dependent variable: Invested; independent variable: Children):

Invested = 11.237425 - 2.0364475 Children

Sample size: 193; R (correlation coefficient) = -0.4274; R-sq = 0.18269232; estimate of error standard deviation: 4.110175

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 11.237425 | 0.6841511 | 191 | 16.425356 | <0.0001 |
| Slope | -2.0364475 | 0.31166583 | 191 | -6.534074 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 721.2549 | 721.2549 | 42.69412 | <0.0001 |
| Error | 191 | 3226.6663 | 16.893541 | | |
| Total | 192 | 3947.9211 | | | |

The null hypothesis is $H_0: \beta_1 = 0$ and the alternative hypothesis is $H_a: \beta_1 \neq 0$. We are testing whether the slope is different from zero. Since I am only using one variable, I will consider the t test. For the slope, t = -6.534 with a p-value < 0.0001. Therefore, I can conclude that the slope is significantly different from zero. The regression equation is Invested = 11.237 - 2.036 Children with s = 4.11 and $R^2$ = 0.183. So 18.3% of the observed variation in Invested is explained by the linear regression on the number of children. Although the p-value is very small, the model does not explain very much of the variation in Invested.

**Residuals**

I will now examine the residuals to help determine whether the regression model was appropriate for the data. I will examine the residuals themselves, and then the residuals versus the other explanatory variables.
[Normal quantile plot of the residuals]

Stem and leaf plot of the residuals:

| Stem | Leaves |
|---|---|
| -0 | 988777776666555555555555555555 |
| -0 | 4444444444444333333333333333322222222222222222222211111111111111 |
| 0 | 00000000000011111111111111122222222222222333333333333333333344444444444 |
| 0 | 5555566666666777778888888899 |

Looking at the quantile plot and the stemplot, we can see that the residuals have a roughly normal distribution.

[Scatter plots of the residuals against Salary, Mortgage, and Debt]

The plot of residuals versus Salary reveals a positive association with scatter around zero. The plots of residuals versus Mortgage and Debt show a negative association.

**Regression on Salary**

Simple linear regression results (dependent variable: Invested; independent variable: Salary):

Invested = 1.7339833 + 7.714847E-5 Salary

Sample size: 193; R (correlation coefficient) = 0.4003; R-sq = 0.16024648; estimate of error standard deviation: 4.166232

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 1.7339833 | 0.9548239 | 191 | 1.8160242 | 0.0709 |
| Slope | 7.714847E-5 | 1.2778865E-5 | 191 | 6.0371923 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 632.6405 | 632.6405 | 36.447693 | <0.0001 |
| Error | 191 | 3315.2808 | 17.35749 | | |
| Total | 192 | 3947.9211 | | | |

The null hypothesis is $H_0: \beta_1 = 0$ and the alternative hypothesis is $H_a: \beta_1 \neq 0$. We are testing whether the slope is different from zero. Since I am only using one variable, I will consider the t test. For the slope, t = 6.037 with a p-value < 0.0001. Therefore, I can conclude that the slope is significantly different from zero. The regression equation is Invested = 1.734 + 0.000077 Salary with s = 4.166 and $R^2$ = 0.16. So 16% of the observed variation in Invested is explained by the linear regression on Salary. Although the p-value is very small, the model does not explain very much of the variation in Invested.
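The summary figures quoted for the Salary regression (R-sq, the F statistic, and s) can all be recovered from the ANOVA table. The sketch below is only an arithmetic check using the standard definitions, with values copied from the StatCrunch output for Invested on Salary. The variable names are my own.

```python
import math

# Values copied from the ANOVA table for Invested on Salary.
ss_model, ss_error, ss_total = 632.6405, 3315.2808, 3947.9211
df_model, df_error = 1, 191
t_slope = 6.0371923          # t statistic reported for the slope

ms_model = ss_model / df_model      # mean square for the model
ms_error = ss_error / df_error      # mean square error (MSE)

r_squared = ss_model / ss_total     # share of variation explained
f_stat = ms_model / ms_error        # ANOVA F statistic
s = math.sqrt(ms_error)             # estimate of the error standard deviation

print(f"R-sq = {r_squared:.4f}, F = {f_stat:.3f}, s = {s:.4f}")
print(f"slope t squared = {t_slope ** 2:.3f}")
```

With a single explanatory variable, the square of the slope's t statistic equals the ANOVA F statistic. This is why the t test and the F test give the same p-value in each simple regression.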
**Residuals**

[Normal quantile plot of the residuals]

I will now examine the residuals to help determine whether the regression model was appropriate for the data. I will examine the residuals themselves, and then the residuals versus the other explanatory variables.

Stem and leaf plot of the residuals:

| Stem | Leaves |
|---|---|
| -0 | 998888777777766666666666655555555 |
| -0 | 4444444444333333333333322222222222211111111111111 |
| 0 | 00000000000000000000011111111111111111122222222222222222233333333333333444444444 |
| 0 | 555555555556666666677777778899 |
| 1 | 0 |

The residuals from the regression of Invested on Salary appear to be closer to normal.

[Scatter plots of the residuals]

The residuals against Salary seem to be scattered around zero. The residuals against Debt, Children, and Mortgage all seem to be negatively associated.

**Regression on Mortgage**

Simple linear regression results (dependent variable: Invested; independent variable: Mortgage):

Invested = 9.752726 - 3.225372E-5 Mortgage

Sample size: 193; R (correlation coefficient) = -0.2142; R-sq = 0.045894463; estimate of error standard deviation: 4.4408464

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 9.752726 | 0.89872855 | 191 | 10.851692 | <0.0001 |
| Slope | -3.225372E-5 | 1.0640969E-5 | 191 | -3.0310886 | 0.0028 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 181.18773 | 181.18773 | 9.187498 | 0.0028 |
| Error | 191 | 3766.7334 | 19.721117 | | |
| Total | 192 | 3947.9211 | | | |

The null hypothesis is $H_0: \beta_1 = 0$ and the alternative hypothesis is $H_a: \beta_1 \neq 0$. We are testing whether the slope is different from zero. Since I am only using one variable, I will consider the t test. For the slope, t = -3.031 with a p-value = 0.0028. Therefore, I can conclude that the slope is significantly different from zero. However, the evidence is not as strong as in the tests on Salary and Children. The regression equation is Invested = 9.75 - 0.0000323 Mortgage with s = 4.44 and $R^2$ = 0.0459.
So only about 4.6% of the observed variation in Invested is explained by the linear regression on Mortgage. Although the p-value is small, the model does not explain very much of the variation in Invested.

**Residuals**

[Normal quantile plot of the residuals]

Stem and leaf plot of the residuals:

| Stem | Leaves |
|---|---|
| -0 | 9888877777776666666666666665555555555 |
| -0 | 444444444443333333333322222222222222222111111111111 |
| 0 | 00000000000000011111111111111111111122222222222222333333333334444444444 |
| 0 | 5555555566666667777778888888888899 |

Once again the residuals appear to be roughly normally distributed.

[Scatter plots of the residuals]

The residuals are all scattered around zero, with the residuals against Salary appearing to have a positive association.

**Regression on Debt**

Simple linear regression results (dependent variable: Invested; independent variable: Debt):

Invested = 10.945596 - 3.8119787E-4 Debt

Sample size: 193; R (correlation coefficient) = -0.4089; R-sq = 0.16719478; estimate of error standard deviation: 4.14896

Parameter estimates:

| Parameter | Estimate | Std. Err. | DF | T-Stat | P-Value |
|---|---|---|---|---|---|
| Intercept | 10.945596 | 0.6736084 | 191 | 16.249197 | <0.0001 |
| Slope | -3.8119787E-4 | 6.155937E-5 | 191 | -6.192362 | <0.0001 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 1 | 660.07184 | 660.07184 | 38.345345 | <0.0001 |
| Error | 191 | 3287.8494 | 17.213871 | | |
| Total | 192 | 3947.9211 | | | |

The null hypothesis is $H_0: \beta_1 = 0$ and the alternative hypothesis is $H_a: \beta_1 \neq 0$. We are testing whether the slope is different from zero. Since I am only using one variable, I will consider the t test. For the slope, t = -6.19 with a p-value < 0.0001. Therefore, I can conclude that the slope is significantly different from zero. The regression equation is Invested = 10.946 - 0.000381 Debt with s = 4.15 and $R^2$ = 0.167. So 16.7% of the observed variation in Invested is explained by the linear regression on Debt.
Although the p-value is very small, the model does not explain very much of the variation in Invested.

**Residuals**

[Normal quantile plot of the residuals]

Stem and leaf plot of the residuals:

| Stem | Leaves |
|---|---|
| -0 | 98777777777666666655555555555555 |
| -0 | 4444444444443333333333333322222222222222222222111111111111 |
| 0 | 000000000000000011111111111111111112222222222222233333333344444444444 |
| 0 | 55555555555556666666666677778889 |
| 1 | 01 |

[Scatter plots of the residuals]

The residuals against salary, mortgage, and debt all appear to have a positive association.

After fitting the simple linear regression models for all the explanatory variables, I can conclude that Mortgage contributes the least to our explanation of Invested, because it has the largest p-value (0.0028) and the smallest $R^2$ (0.046). The regression on Children had the highest $R^2$ (0.183), slightly ahead of Debt (0.167) and Salary (0.160), so Children explained more of the variation in Invested than any other single variable.

**Multiple Linear Regression**

**Multiple Regression on Children and Salary**

Multiple linear regression results (dependent variable: Invested; independent variables: Children, Salary):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 6.215421 | 1.20739 | 5.1478157 | <0.0001 |
| Children | -1.66684 | 0.30370235 | -5.4884 | <0.0001 |
| Salary | 6.0481732E-5 | 1.22848005E-5 | 4.923298 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 2 | 1086.3181 | 543.15906 | 36.063778 | <0.0001 |
| Error | 190 | 2861.6033 | 15.0610695 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.8808594; R-squared: 0.2752

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_j$ does not equal zero.
This test helps determine whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 36.06 with a p-value < 0.0001 on an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children and Salary is different from zero. The fitted regression equation is Invested = 6.215 - 1.667 Children + 0.0000605 Salary. The t value for Children is -5.49 with a p-value < 0.0001, and the t value for Salary is 4.92 with a p-value < 0.0001. Therefore, we reject the null hypothesis for each coefficient and conclude that both of these explanatory variables achieve statistical significance. I found that the root MSE = s = 3.88 and $R^2$ = 0.28. Therefore 28% of the observed variation in the investment percent is explained by linear regression on Children and Salary.

[Normal quantile plot of the residuals]

The residuals appear to be normal.

**Multiple Regression on Mortgage and Debt**

Multiple linear regression results (dependent variable: Invested; independent variables: Mortgage, Debt):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 11.33754 | 0.8922756 | 12.706321 | <0.0001 |
| Mortgage | -7.383896E-6 | 1.1002756E-5 | -0.67109513 | 0.503 |
| Debt | -3.6173314E-4 | 6.81304E-5 | -5.3094234 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 2 | 667.8468 | 333.9234 | 19.342684 | <0.0001 |
| Error | 190 | 3280.0745 | 17.26355 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 4.154943; R-squared: 0.1692

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_j$ does not equal zero. This test helps determine whether at least one of the regression coefficients is different from zero.
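As a check on the arithmetic, the overall F statistic for the Mortgage and Debt model can be recomputed from its ANOVA table. The symbols are my own shorthand: SSM and SSE are the model and error sums of squares, p = 2 is the number of explanatory variables, and n = 193 is the sample size, so the error degrees of freedom are n - p - 1 = 190.

$$
F = \frac{\mathrm{SSM}/p}{\mathrm{SSE}/(n-p-1)} = \frac{667.8468/2}{3280.0745/190} = \frac{333.9234}{17.26355} \approx 19.34
$$

The same table gives $R^2 = \mathrm{SSM}/\mathrm{SST} = 667.8468/3947.9211 \approx 0.169$ and root MSE $= \sqrt{17.26355} \approx 4.15$.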
The ANOVA F statistic is 19.34 with a p-value < 0.0001 on an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Mortgage and Debt is different from zero. The fitted regression equation is Invested = 11.338 - 0.00000738 Mortgage - 0.000362 Debt. The t value for Mortgage is -0.671 with a p-value = 0.503, and the t value for Debt is -5.309 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Debt and conclude that Debt achieves statistical significance. However, we cannot reject the null hypothesis for Mortgage, so it is not statistically significant once Debt is in the model. I found that the root MSE = s = 4.15 and $R^2$ = 0.169. Therefore 16.9% of the observed variation in the investment percent is explained by linear regression on Mortgage and Debt.

[Normal quantile plot of the residuals]

The residuals appear to be normally distributed.

**Multiple Regression on Children and Mortgage**

Multiple linear regression results (dependent variable: Invested; independent variables: Children, Mortgage):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 12.02885 | 0.9181752 | 13.1008215 | <0.0001 |
| Children | -1.9036733 | 0.3277181 | -5.808874 | <0.0001 |
| Mortgage | -1.33553385E-5 | 1.03558805E-5 | -1.2896382 | 0.1987 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 2 | 749.25446 | 374.62723 | 22.252762 | <0.0001 |
| Error | 190 | 3198.6667 | 16.835089 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 4.1030583; R-squared: 0.1898

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_j$ does not equal zero. This test helps determine whether at least one of the regression coefficients is different from zero.
The ANOVA F statistic is 22.25 with a p-value < 0.0001 on an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children and Mortgage is different from zero. The fitted regression equation is Invested = 12.029 - 1.904 Children - 0.0000134 Mortgage. The t value for Children is -5.809 with a p-value < 0.0001, and the t value for Mortgage is -1.29 with a p-value = 0.1987. Therefore, we can reject the null hypothesis for Children and conclude that Children achieves statistical significance. However, we cannot reject the null hypothesis for Mortgage, so it is not statistically significant once Children is in the model. I found that the root MSE = s = 4.10 and $R^2$ = 0.190. Therefore 19.0% of the observed variation in the investment percent is explained by linear regression on Children and Mortgage.

[Normal quantile plot of the residuals]

The residuals appear to be normally distributed.

**Multiple Regression on Children and Debt**

Multiple linear regression results (dependent variable: Invested; independent variables: Children, Debt):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 12.616782 | 0.7480491 | 16.86625 | <0.0001 |
| Children | -1.4649284 | 0.3341793 | -4.38366 | <0.0001 |
| Debt | -2.559648E-4 | 6.538913E-5 | -3.9144857 | 0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 2 | 962.0599 | 481.02994 | 30.609488 | <0.0001 |
| Error | 190 | 2985.8613 | 15.71506 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.9642224; R-squared: 0.2437

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_j$ does not equal zero. This test helps determine whether at least one of the regression coefficients is different from zero.
The ANOVA F statistic is 30.61 with a p-value < 0.0001 on an F(2, 190) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children and Debt is different from zero. The fitted regression equation is Invested = 12.62 - 1.46 Children - 0.000256 Debt. The t value for Children is -4.38 with a p-value < 0.0001, and the t value for Debt is -3.91 with a p-value = 0.0001. Therefore, we can reject the null hypothesis for Children and Debt and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = s = 3.96 and $R^2$ = 0.24. Therefore 24% of the observed variation in the investment percent is explained by linear regression on Children and Debt.

[Normal quantile plot of the residuals]

The residuals appear to be normally distributed.

**Multiple Regression on Salary and Mortgage**

Multiple linear regression results (dependent variable: Invested; independent variables: Salary, Mortgage):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 4.5547347 | 0.9051146 | 5.032219 | <0.0001 |
| Salary | 1.235784E-4 | 1.2590398E-5 | 9.81529 | <0.0001 |
| Mortgage | -7.7459845E-5 | 9.835718E-6 | -7.875363 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 2 | 1448.5165 | 724.25824 | 55.056732 | <0.0001 |
| Error | 190 | 2499.4048 | 13.154762 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.6269495; R-squared: 0.3669

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_j$ does not equal zero. This test helps determine whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 55.06 with a p-value < 0.0001 on an F(2, 190) distribution.
Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Salary and Mortgage is different from zero. The fitted regression equation is Invested = 4.55 + 0.000124 Salary - 0.0000775 Mortgage. The t value for Salary is 9.82 with a p-value < 0.0001, and the t value for Mortgage is -7.88 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary and Mortgage and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = s = 3.63 and $R^2$ = 0.367. Therefore 36.7% of the observed variation in the investment percent is explained by linear regression on Salary and Mortgage.

[Normal quantile plot of the residuals]

The residuals appear to be normally distributed.

**Multiple Regression on Salary and Debt**

Multiple linear regression results (dependent variable: Invested; independent variables: Salary, Debt):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 5.195506 | 0.9303605 | 5.584401 | <0.0001 |
| Salary | 8.857741E-5 | 1.1143234E-5 | 7.9489865 | <0.0001 |
| Debt | -4.3558193E-4 | 5.3903543E-5 | -8.080766 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 2 | 1480.603 | 740.3015 | 57.008163 | <0.0001 |
| Error | 190 | 2467.3184 | 12.985886 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.6035933; R-squared: 0.375

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative, using the t test and its p-value to determine whether each regression coefficient is significantly different from zero. I then used the ANOVA table and the F test to test $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_j$ does not equal zero. This test helps determine whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 57.01 with a p-value < 0.0001 on an F(2, 190) distribution.
Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Salary and Debt is different from zero. The fitted regression equation is Invested = 5.196 + 0.0000886 Salary - 0.000436 Debt. The t value for Salary is 7.95 with a p-value < 0.0001, and the t value for Debt is -8.08 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary and Debt and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = s = 3.60 and $R^2$ = 0.375. Therefore 37.5% of the observed variation in the investment percent is explained by linear regression on Salary and Debt.

[Normal quantile plot of the residuals]

The residuals appear to be normally distributed.

Therefore, by running a regression on each pair of explanatory variables, I have shown that the regression on Salary and Debt appears to be the best two-variable model. It explains 37.5% of the variation in Invested. The regression on Mortgage and Debt was the worst model, explaining only 16.9% of the variation in Invested. Debt alone explained only 16.7% of the variation in Invested, but adding Salary to the model raised $R^2$ to 37.5%. Therefore, the Salary and Debt model is the best so far in this case.

**Multiple Regression on Children, Salary, and Mortgage**

Multiple linear regression results (dependent variable: Invested; independent variables: Children, Salary, Mortgage):

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 6.0507097 | 1.1169988 | 5.416935 | <0.0001 |
| Children | -0.72799116 | 0.32478705 | -2.2414417 | 0.0262 |
| Salary | 1.0870633E-4 | 1.4115748E-5 | 7.701068 | <0.0001 |
| Mortgage | -6.47925E-5 | 1.1254935E-5 | -5.7568073 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 3 | 1513.2361 | 504.41202 | 39.156548 | <0.0001 |
| Error | 189 | 2434.6853 | 12.881932 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.589141; R-squared: 0.3833

I tested each variable separately with $H_0: \beta_j = 0$ versus the two-sided alternative.
Then I used the t test for each of those and found the p-value to determine whether each regression coefficient is significantly different from zero. Then I used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βi does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 39.16 with a p-value < 0.0001 under an F(3, 189) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Salary, and Mortgage is different from zero. The fitted regression equation is: Invested = 6.05 - 0.728 Children + 1.087×10⁻⁴ Salary - 6.479×10⁻⁵ Mortgage. The t value for Children = -2.24 with a p-value = 0.0262 and the t value for Salary = 7.70 with a p-value < 0.0001. The t value for Mortgage = -5.76 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Children, Salary, and Mortgage and conclude that these explanatory variables achieve statistical significance. However, the evidence for Children (p = 0.0262) is weaker than for the other two. I found that the root MSE = S = 3.589 and R^2 = 0.383. Therefore 38.3% of the observed variation in the investment percent is explained by linear regression on Children, Salary, and Mortgage.

[QQ Plot]

The residuals appear to be normally distributed.

**Multiple Regression on Children, Salary, and Debt**

Multiple linear regression results. Dependent Variable: Invested. Independent Variable(s): Children, Salary, Debt.

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 6.617582 | 1.1108874 | 5.957023 | <0.0001 |
| Children | -0.72982574 | 0.31939462 | -2.2850282 | 0.0234 |
| Salary | 7.949656E-5 | 1.1716058E-5 | 6.785265 | <0.0001 |
| Debt | -3.6761555E-4 | 6.105045E-5 | -6.0215044 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 3 | 1546.9332 | 515.6444 | 40.590286 | <0.0001 |
| Error | 189 | 2400.988 | 12.703641 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.5642166. R-squared: 0.3918

I tested each variable separately with H0: βj = 0 versus the two-sided alternative. Then I used the t test for each of those and found the p-value to determine whether each regression coefficient is significantly different from zero. Then I used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βi does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 40.59 with a p-value < 0.0001 under an F(3, 189) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Salary, and Debt is different from zero. The fitted regression equation is: Invested = 6.62 - 0.730 Children + 7.95×10⁻⁵ Salary - 3.68×10⁻⁴ Debt. The t value for Children = -2.29 with a p-value = 0.0234 and the t value for Salary = 6.79 with a p-value < 0.0001. The t value for Debt = -6.02 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Children, Salary, and Debt and conclude that these explanatory variables achieve statistical significance. However, the evidence for Children (p = 0.0234) is weaker than for the other two. I found that the root MSE = S = 3.564 and R^2 = 0.392.
Therefore 39.2% of the observed variation in the investment percent is explained by linear regression on Children, Salary, and Debt.

[QQ Plot]

**Multiple Regression on Salary, Mortgage, and Debt**

Multiple linear regression results. Dependent Variable: Invested. Independent Variable(s): Salary, Mortgage, Debt.

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 6.1702666 | 0.8879056 | 6.9492373 | <0.0001 |
| Salary | 1.17606534E-4 | 1.173369E-5 | 10.02298 | <0.0001 |
| Mortgage | -5.386524E-5 | 1.00483685E-5 | -5.360595 | <0.0001 |
| Debt | -3.1141035E-4 | 5.5425873E-5 | -5.618501 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 3 | 1806.2305 | 602.07684 | 53.132095 | <0.0001 |
| Error | 189 | 2141.691 | 11.331697 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.3662586. R-squared: 0.4575

I tested each variable separately with H0: βj = 0 versus the two-sided alternative. Then I used the t test for each of those and found the p-value to determine whether each regression coefficient is significantly different from zero. Then I used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βi does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 53.13 with a p-value < 0.0001 under an F(3, 189) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Salary, Mortgage, and Debt is different from zero. The fitted regression equation is: Invested = 6.17 + 1.176×10⁻⁴ Salary - 5.39×10⁻⁵ Mortgage - 3.11×10⁻⁴ Debt. The t value for Salary = 10.02 with a p-value < 0.0001 and the t value for Mortgage = -5.36 with a p-value < 0.0001. The t value for Debt = -5.62 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary, Mortgage, and Debt and conclude that these explanatory variables achieve statistical significance. I found that the root MSE = S = 3.366 and R^2 = 0.4575.
Therefore 45.8% of the observed variation in the investment percent is explained by linear regression on Salary, Mortgage, and Debt.

[QQ Plot]

**Multiple Regression on Children, Mortgage, and Debt**

Multiple linear regression results. Dependent Variable: Invested. Independent Variable(s): Children, Mortgage, Debt.

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 12.623476 | 0.9040852 | 13.962706 | <0.0001 |
| Children | -1.4642199 | 0.33929464 | -4.315482 | <0.0001 |
| Mortgage | -1.4132792E-7 | 1.06583975E-5 | -0.013259772 | 0.9894 |
| Debt | -2.5565282E-4 | 6.965626E-5 | -3.670206 | 0.0003 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 3 | 962.0626 | 320.68753 | 20.299 | <0.0001 |
| Error | 189 | 2985.8586 | 15.798194 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.9746943. R-squared: 0.2437

I tested each variable separately with H0: βj = 0 versus the two-sided alternative. Then I used the t test for each of those and found the p-value to determine whether each regression coefficient is significantly different from zero. Then I used the ANOVA table and the F test to test H0: β1 = β2 = β3 = 0 versus Ha: at least one βi does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 20.30 with a p-value < 0.0001 under an F(3, 189) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Mortgage, and Debt is different from zero. The fitted regression equation is: Invested = 12.62 - 1.46 Children - 1.41×10⁻⁷ Mortgage - 2.56×10⁻⁴ Debt. The t value for Children = -4.32 with a p-value < 0.0001 and the t value for Mortgage = -0.013 with a p-value = 0.9894. The t value for Debt = -3.67 with a p-value = 0.0003. Therefore, we can reject the null hypothesis for Children and Debt and conclude that these explanatory variables achieve statistical significance. However, we cannot reject the null hypothesis for Mortgage.
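Each t statistic in the Children, Mortgage, and Debt table is the coefficient estimate divided by its standard error. The following is a minimal sketch in plain Python, not part of the original StatCrunch output; the names `estimates` and `t_statistic` are illustrative, and the numbers are copied from the reported parameter estimates.

```python
# Estimates and standard errors copied from the parameter table for the
# model on Children, Mortgage, and Debt. Names here are illustrative only.
estimates = {
    "Children": (-1.4642199, 0.33929464),
    "Mortgage": (-1.4132792e-7, 1.06583975e-5),
    "Debt": (-2.5565282e-4, 6.965626e-5),
}


def t_statistic(estimate, std_err):
    """t statistic for H0: beta_j = 0 (estimate in standard-error units)."""
    return estimate / std_err


t_values = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```

The Mortgage estimate is only about one hundredth of its standard error, which is why its t statistic is close to zero and its p-value is close to 1, while Children and Debt are several standard errors away from zero.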
I found that the root MSE = S = 3.97 and R^2 = 0.2437. Therefore 24.4% of the observed variation in the investment percent is explained by linear regression on Children, Mortgage, and Debt.

[QQ Plot]

Therefore, after doing multiple regression on three variables at a time, I found the regression on Salary, Mortgage, and Debt to be the best model. It explained 45.8% of the variation in Investment Percent. The worst model was the one on Children, Mortgage, and Debt, which explained only 24.4% of the variation.

**Multiple Regression on All Variables**

Multiple linear regression results. Dependent Variable: Invested. Independent Variable(s): Children, Salary, Mortgage, Debt.

Parameter estimates:

| Variable | Estimate | Std. Err. | Tstat | P-value |
|---|---|---|---|---|
| Intercept | 6.413731 | 1.0523098 | 6.094908 | <0.0001 |
| Children | -0.14140931 | 0.3262831 | -0.43339452 | 0.6652 |
| Salary | 1.1489188E-4 | 1.3323196E-5 | 8.623448 | <0.0001 |
| Mortgage | -5.2092873E-5 | 1.0868739E-5 | -4.7929087 | <0.0001 |
| Debt | -3.0232704E-4 | 5.9367867E-5 | -5.092436 | <0.0001 |

Analysis of variance table for multiple regression model:

| Source | DF | SS | MS | F-stat | P-value |
|---|---|---|---|---|---|
| Model | 4 | 1808.368 | 452.092 | 39.72479 | <0.0001 |
| Error | 188 | 2139.5532 | 11.380602 | | |
| Total | 192 | 3947.9211 | | | |

Root MSE: 3.373515. R-squared: 0.4581

I tested each variable separately with H0: βj = 0 versus the two-sided alternative. Then I used the t test for each of those and found the p-value to determine whether each regression coefficient is significantly different from zero. Then I used the ANOVA table and the F test to test H0: β1 = β2 = β3 = β4 = 0 versus Ha: at least one βi does not equal zero. This test determines whether at least one of the regression coefficients is different from zero. The ANOVA F statistic is 39.72 with a p-value < 0.0001 under an F(4, 188) distribution. Therefore, I can reject the null hypothesis and conclude that at least one of the coefficients of Children, Salary, Mortgage, and Debt is different from zero.
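The overall F test quantities for the four-variable model can be recomputed directly from the sums of squares in its ANOVA table. This is a hedged sketch rather than the original StatCrunch procedure: the function name `anova_summary` and the constants are illustrative, and the sample size of 193 is taken from the Total DF of 192 plus one.

```python
import math

# Values copied from the ANOVA table for the model on
# Children, Salary, Mortgage, and Debt.
N_OBS = 193          # Total DF (192) + 1
N_PREDICTORS = 4
SS_MODEL = 1808.368
SS_ERROR = 2139.5532
SS_TOTAL = 3947.9211


def anova_summary(ss_model, ss_error, ss_total, n_obs, n_predictors):
    """Recompute overall F test quantities from sums of squares."""
    df_model = n_predictors
    df_error = n_obs - n_predictors - 1
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "df_model": df_model,
        "df_error": df_error,
        "f_stat": ms_model / ms_error,       # MS(Model) / MS(Error)
        "r_squared": ss_model / ss_total,    # share of variation explained
        "root_mse": math.sqrt(ms_error),     # S, the residual standard error
    }


summary = anova_summary(SS_MODEL, SS_ERROR, SS_TOTAL, N_OBS, N_PREDICTORS)
for key, value in summary.items():
    print(key, round(value, 4))
```

With 193 observations and four predictors, the error degrees of freedom are 193 - 4 - 1 = 188, so the reference distribution for this F statistic is F(4, 188). The recomputed F, R^2, and root MSE should agree with the table up to rounding.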
The fitted regression equation is: Invested = 6.41 - 0.14 Children + 1.15×10⁻⁴ Salary - 5.21×10⁻⁵ Mortgage - 3.02×10⁻⁴ Debt. The t value for Children = -0.433 with a p-value = 0.6652 and the t value for Salary = 8.623 with a p-value < 0.0001. The t value for Mortgage = -4.79 with a p-value < 0.0001 and the t value for Debt = -5.09 with a p-value < 0.0001. Therefore, we can reject the null hypothesis for Salary, Mortgage, and Debt and conclude that these explanatory variables achieve statistical significance. However, we cannot reject the null hypothesis for Children. I found that the root MSE = S = 3.37 and R^2 = 0.458. Therefore 45.8% of the observed variation in the investment percent is explained by linear regression on Children, Salary, Mortgage, and Debt. This is essentially the same amount explained by the regression on Salary, Mortgage, and Debt (R^2 = 0.4581 versus 0.4575). Therefore, we do not need to use the model that includes all variables. The residuals appear to be normally distributed.

**Conclusion**

I have regressed Investment Percent on each variable alone, on pairs of variables, on groups of three, and finally on all variables. Looking through these models, I found that when the variables are used alone, the explanatory variable Debt is the best predictor of Investment Percent. It explains 16.7% of the variation in Investment Percent. Then I added another variable to regress on two. I found that Salary and Debt were the best predictors of Investment Percent. The model on Salary and Debt explained 37.5% of the variation in Investment Percent. The regression on Mortgage and Debt was the worst model, explaining only 16.9% of the variation in Investment Percent. When I did the regression on groups of three variables, I found the model on Salary, Mortgage, and Debt to be the best. It explained 45.8% of the variation in Investment Percent.
The worst model was the one on Children, Mortgage, and Debt, which explained only 24.4% of the variation. When I did the regression on all variables, I found that adding Children to the model did not make a significant contribution beyond the other three explanatory variables. Therefore, the best model is the one on Salary, Mortgage, and Debt.

I can conclude that the combined annual salary of husbands and wives, the current mortgage on a home, and the average amount of other debt are all useful predictors of the percentage of combined income invested in tax-deferred retirement plans. Salary and Investment Percent are moderately correlated (r = 0.40) and have a positive association. Therefore, those with a lower salary do not invest as much in a tax-deferred retirement plan as those with a higher salary. People with high salaries are more likely to take advantage of this investment opportunity. Mortgage and Investment Percent are weakly correlated (r = -0.21) and have a negative association. Therefore, people with a higher mortgage do not invest as much in the tax-deferred retirement plan as people with a lower mortgage. Debt and Investment Percent are also moderately correlated (r = -0.41) and have a negative association. Therefore, people with higher debt do not invest as much in the tax-deferred retirement plan as those with lower debt. In summary, the people who take advantage of this investment opportunity are those with high salaries, low mortgages, and low debt.
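The claim that Children adds nothing once Salary, Mortgage, and Debt are in the model can be checked with two quantities the report itself does not compute: adjusted R^2 and a partial F statistic for the added variable. The sketch below uses only the error sums of squares and degrees of freedom reported in the ANOVA tables; the function names and the `models` dictionary are illustrative assumptions, not part of the original analysis.

```python
# Total sum of squares and its degrees of freedom (same in every ANOVA table).
SS_TOTAL = 3947.9211
DF_TOTAL = 192

# (error sum of squares, error degrees of freedom) from the ANOVA tables.
models = {
    "Salary + Debt": (2467.3184, 190),
    "Salary + Mortgage + Debt": (2141.691, 189),
    "Children + Salary + Mortgage + Debt": (2139.5532, 188),
}


def adjusted_r2(ss_error, df_error):
    """Adjusted R^2 = 1 - MS(Error) / MS(Total); penalizes extra predictors."""
    return 1.0 - (ss_error / df_error) / (SS_TOTAL / DF_TOTAL)


def partial_f(ss_error_reduced, ss_error_full, df_error_full, n_added=1):
    """F statistic for the predictors added to a nested (reduced) model."""
    drop_in_sse = ss_error_reduced - ss_error_full
    return (drop_in_sse / n_added) / (ss_error_full / df_error_full)


adj = {name: adjusted_r2(sse, df) for name, (sse, df) in models.items()}
for name, value in adj.items():
    print(f"{name}: adjusted R^2 = {value:.4f}")

# Does Children help once Salary, Mortgage, and Debt are already included?
f_children = partial_f(2141.691, 2139.5532, 188)
print(f"partial F for Children = {f_children:.4f}")
```

Under these numbers, adjusted R^2 is slightly lower for the four-variable model than for Salary, Mortgage, and Debt, and the partial F for Children is about 0.19, which matches the square of its reported t statistic (-0.433). Both point the same way as the conclusion above: the three-variable model is preferred.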

**Data set 1. RetirePlan.xls**
