Zestimate vs Final Selling Prices of Single Family Homes in Bolingbrook, IL/ Meagan Workman

This report compares the final selling prices of 4 BR single family homes in the Bolingbrook, IL area with their “Zestimate” values, for homes sold between December 12, 2012 and January 15, 2013. The data shown below were extracted from www.zillow.com; according to Zillow, the most recent sale of a single family home in the Bolingbrook area took place on January 15th. The data include two outliers, which drastically affected the linear regression results.

Data set 1. Bolingbrook Single-Family Homes Sold Price Vs. Zes

Simple linear regression results of the price sold vs the Zestimate show a linear correlation coefficient of -0.0636, with an absolute value of 0.0636. This is less than the critical value of 0.423 for a sample size of 22, so according to these numbers, no linear relation exists between the two variables. The coefficient of determination (R2) is 0.004045041, so only about 0.4% of the variability in selling price can be explained by the linear relation between the Zestimate and the final selling price, which is expected since it has already been concluded that no linear relation exists. As stated above, this is due to the presence of outliers. The y-intercept is not interpreted, since a Zestimate of x=0 does not make sense.
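The critical-value comparison above can be reproduced with a short sketch. The report does not say which significance level the critical value of 0.423 comes from; the code below assumes a two-tailed test at alpha = 0.05, and takes the t critical value for 20 degrees of freedom (2.086) from a standard t table. The function names `critical_r` and `has_linear_relation` are illustrative only.

```python
import math


def critical_r(t_crit: float, df: int) -> float:
    """Critical |r| for a two-tailed test, derived from the t critical value
    with df = n - 2 degrees of freedom."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Assumption: alpha = 0.05, two-tailed; value read from a standard t table.
T_CRIT_DF20 = 2.086


def has_linear_relation(r: float, n: int, t_crit: float) -> bool:
    """True when |r| exceeds the critical value for a sample of size n."""
    return abs(r) > critical_r(t_crit, n - 2)


print(round(critical_r(T_CRIT_DF20, 20), 3))            # 0.423
print(has_linear_relation(-0.0636, 22, T_CRIT_DF20))    # False
```

Under these assumptions the computed critical value matches the 0.423 quoted above, and |r| = 0.0636 falls well short of it.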

Result 1: Simple Linear Regression Selling Price Vs. Zestimate
Simple linear regression results:
Dependent Variable: Price Sold
Independent Variable: Zestimate
Price Sold = 293511.7 - 0.18534933 Zestimate
Sample size: 22
R (correlation coefficient) = -0.0636
R-sq = 0.004045041
Estimate of error standard deviation: 144607.61

Parameter estimates:
| Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-Value |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 293511.7 | 155140.75 | ≠ 0 | 20 | 1.8919059 | 0.0731 |
| Slope | -0.18534933 | 0.65033096 | ≠ 0 | 20 | -0.28500772 | 0.7786 |

Analysis of variance table for regression model:
| Source | DF | SS | MS | F-stat | P-value |
| --- | --- | --- | --- | --- | --- |
| Model | 1 | 1.69861709E9 | 1.69861709E9 | 0.081229396 | 0.7786 |
| Error | 20 | 4.18227192E11 | 2.091136E10 | | |
| Total | 21 | 4.19925819E11 | | | |

Residuals stored in new column, Residuals.
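The quantities in Result 1 are tied together by standard identities, which makes the output easy to sanity-check. The sketch below recomputes the slope's t statistic, the F statistic, R-sq, and the error standard deviation from the sums of squares and standard error reported above. The variable names are illustrative, and only values printed in Result 1 are used.

```python
import math

# Values copied from Result 1.
slope, slope_se = -0.18534933, 0.65033096
ss_model, ss_error, ss_total = 1.69861709e9, 4.18227192e11, 4.19925819e11
df_model, df_error = 1, 20

t_stat = slope / slope_se              # t = estimate / standard error
ms_model = ss_model / df_model         # mean square = SS / DF
ms_error = ss_error / df_error
f_stat = ms_model / ms_error           # F = MS(model) / MS(error)
r_squared = ss_model / ss_total        # R-sq = SS(model) / SS(total)
error_sd = math.sqrt(ms_error)         # estimate of error standard deviation

print(f"t = {t_stat:.6f}, F = {f_stat:.6f}")
print(f"R-sq = {r_squared:.6f}, error SD = {error_sd:.2f}")
```

With a single predictor, F equals the square of the slope's t statistic, which is why both tests report the same P-value of 0.7786.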

The scatter plot of Zestimate vs selling price, however, shows a positive linear pattern with apparent outliers, which suggests that the outliers are the reason the regression results show no linear correlation. The boxplot of Zestimate and selling price confirms the presence of outliers, the most noticeable of which is a home that sold for \$840,000 with a Zestimate of only \$148,904! Data entry error? One would think so...

Result 2: Scatter Plot Bolingbrook Single Family Homes Zestimate vs. Final Selling Price

Result 3: Boxplot Bolingbrook Zillow Data Zestimate and Final Selling Price

The following data set is the original data set minus the one major outlier, where the selling price was \$840,000 and the Zestimate was \$148,904.

Data set 2. Bolingbrook Single-Family Homes Sold Price Vs. Zes

Simple linear regression results of the price sold vs the Zestimate show a linear correlation coefficient of 0.8981, which is greater than the critical value of 0.433 for a sample size of 21, so according to these numbers, a positive linear relation exists between the two variables. The coefficient of determination (R2) is 0.80655575, so 80.7% of the variability in selling price can be explained by the linear relation between the Zestimate and the final selling price; a BIG difference compared to the first set of results with the outlier! The scatter plot and least squares regression line show a discernible pattern of positive correlation between the Zestimate and Price Sold, as would be expected from the results mentioned above.

Result 4: Simple Linear Regression Zestimate vs Price Sold #2
Simple linear regression results:
Dependent Variable: Price Sold
Independent Variable: Zestimate
Price Sold = -23693.863 + 1.0333966 Zestimate
Sample size: 21
R (correlation coefficient) = 0.8981
R-sq = 0.80655575
Estimate of error standard deviation: 23764.312

Parameter estimates:
| Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-Value |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -23693.863 | 28097.266 | ≠ 0 | 19 | -0.84328 | 0.4096 |
| Slope | 1.0333966 | 0.11610501 | ≠ 0 | 19 | 8.900535 | <0.0001 |

Analysis of variance table for regression model:
| Source | DF | SS | MS | F-stat | P-value |
| --- | --- | --- | --- | --- | --- |
| Model | 1 | 4.473863E10 | 4.473863E10 | 79.21951 | <0.0001 |
| Error | 19 | 1.07301079E10 | 5.6474253E8 | | |
| Total | 20 | 5.5468737E10 | | | |

Residuals stored in new column, Residuals.
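To see how far the removed home sits from this fitted line, one can plug its Zestimate into the Result 4 equation. The sketch below does that arithmetic with the reported coefficients. The function name `predict_price` is illustrative, and because the line was fitted without this home, the comparison is only a rough gauge of how unusual the sale was.

```python
# Coefficients and error SD copied from Result 4.
INTERCEPT = -23693.863
SLOPE = 1.0333966
ERROR_SD = 23764.312


def predict_price(zestimate: float) -> float:
    """Predicted selling price from the Result 4 least squares line."""
    return INTERCEPT + SLOPE * zestimate


# The home removed from Data set 2: Zestimate and actual selling price.
outlier_zestimate = 148_904
outlier_price = 840_000

predicted = predict_price(outlier_zestimate)
residual = outlier_price - predicted      # observed minus predicted
sd_units = residual / ERROR_SD            # residual in error-SD units

print(f"predicted = {predicted:,.0f}, residual = {residual:,.0f}")
print(f"residual is about {sd_units:.0f} error standard deviations")
```

The line predicts a price of roughly \$130,000 for that Zestimate, so the \$840,000 sale lies about \$710,000 above it, far beyond the typical scatter of the other 21 homes.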

Result 5: Scatter Plot Zestimate vs Price Sold 2

Result 6: Simple Linear Regression Fitted Line Zestimate vs Price sold 2

The scatter plot of the residuals against the Zestimate had no discernible pattern once the major outlier was removed, further supporting the presence of a linear relation.

Result 7: Scatter Plot Zestimate vs Residuals

The data included in this report and the statistics derived from them show just how greatly one or two outliers can affect the entire linear model. While the “Zestimate” seems to work fairly well in a linear model when no outliers are present, in today's market, with the presence of short sale, foreclosure, and bank-owned properties, the Zestimate should only be used as a tool for evaluating the current market, and should not be relied upon too heavily.