This report compares the final selling prices with the “Zestimate” prices of 4 BR single-family homes in the Bolingbrook, IL area that were sold between December 12, 2012 and January 15, 2013. The data shown below were extracted from www.zillow.com; according to Zillow, the January 15th sale was the most recent sale of a single-family home in the Bolingbrook area. The data include two outliers, which affected the linear regression results drastically.
Simple linear regression of the price sold vs the Zestimate gives a linear correlation coefficient of -0.0636, with the absolute value being 0.0636. This is less than the critical value of 0.423 for a sample size of 22, so according to these numbers, no linear relation exists between the two variables. The coefficient of determination (R^2) for the Zestimate price is 0.004045041, so only about 0.4% of the variability of selling price can be explained by the linear relation between the Zestimate price and the final selling price, which is expected since it has already been concluded that no linear relation exists. As stated above, this was due to the presence of outliers. It did not make sense to interpret the y-intercept, since a Zestimate of x = 0 is not meaningful.
Simple linear regression results:
Dependent Variable: Price Sold
Independent Variable: Zestimate
Price Sold = 293511.7 - 0.18534933 Zestimate
Sample size: 22
R (correlation coefficient) = -0.0636
R-sq = 0.004045041
Estimate of error standard deviation: 144607.61
Parameter estimates:
Analysis of variance table for regression model:
Residuals stored in new column, Residuals. 
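The decision rule used above (compare the absolute value of r with the critical value for the sample size, then square r to get the share of variability explained) can be written out as a minimal sketch. The function names are made up for illustration; the numbers are the ones quoted in this report.

```python
def has_linear_relation(r, critical_value):
    """True when |r| exceeds the critical value for the sample size."""
    return abs(r) > critical_value


def percent_explained(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    return 100 * r ** 2


# Magnitude of r and the critical value for n = 22, as quoted in the report.
r_all, crit_22 = 0.0636, 0.423

print(has_linear_relation(r_all, crit_22))   # False: |r| is below 0.423
print(round(percent_explained(r_all), 2))    # 0.4 (percent)
```

Because only the absolute value of r is compared, the sign of the coefficient does not change the conclusion.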
The scatter plot of Zestimate vs selling price, however, shows a positive linear correlation with apparent outliers, which suggests that the outliers are the reason the results show no linear correlation. The boxplot of Zestimate and selling price confirms the presence of outliers, the most noticeable of which is a home that sold for $840,000 while its Zestimate was only $148,904! Data entry error? One would think so...
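A boxplot usually flags outliers with the 1.5 x IQR fence rule. The sketch below applies that rule to a list of selling prices. The prices are invented for illustration and are not the report's data set; only the $840,000 figure comes from the report. The quartile method is also an assumption, since statistical packages compute quartiles in slightly different ways.

```python
from statistics import quantiles


def find_outliers(values):
    """Return values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical selling prices; 840000 is the sale discussed in the report.
prices = [150000, 165000, 172000, 180000, 185000, 190000,
          198000, 205000, 215000, 230000, 840000]

print(find_outliers(prices))
```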


The following data set is the original data set minus the one major outlier, where the selling price was $840,000 and the Zestimate was $148,904.
Simple linear regression of the price sold vs the Zestimate gives a linear correlation coefficient of 0.8981, which is greater than the critical value of 0.433 for a sample size of 21, so according to these numbers, a positive linear relation exists between the two variables. The coefficient of determination (R^2) for the Zestimate price is 0.80655575, so 80.7% of the variability of selling price can be explained by the linear relation between the Zestimate price and the final selling price; a BIG difference compared to the first set of results with the outlier! The scatter plot and least-squares regression line show a discernible pattern of positive correlation between the Zestimate and Price Sold, as would be expected from the results mentioned above.
Simple linear regression results:
Dependent Variable: Price Sold
Independent Variable: Zestimate
Price Sold = 23693.863 + 1.0333966 Zestimate
Sample size: 21
R (correlation coefficient) = 0.8981
R-sq = 0.80655575
Estimate of error standard deviation: 23764.312
Parameter estimates:
Analysis of variance table for regression model:
Residuals stored in new column, Residuals. 
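To see why a single point can swing the correlation this much, here is a small sketch that computes Pearson's r with and without one extreme point. The eight "clean" homes are invented for illustration and are not the report's data; the extra point uses the Zestimate and selling price of the outlier described in the report.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical homes whose selling price tracks the Zestimate closely.
zestimates = [150000, 170000, 190000, 210000, 230000, 250000, 270000, 290000]
prices = [155000, 168000, 195000, 207000, 236000, 248000, 275000, 288000]

r_clean = pearson_r(zestimates, prices)

# Add the one extreme point from the report: Zestimate $148,904, sold $840,000.
r_with_outlier = pearson_r(zestimates + [148904], prices + [840000])

print(round(r_clean, 3), round(r_with_outlier, 3))
```

In this made-up sample the one added point is enough to pull a strong positive correlation down below zero, which is the same kind of effect seen between the two sets of results in this report.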


The scatter plot of the residuals against the Zestimate had no discernible pattern once the major outlier was removed, further confirming the presence of a linear relation.
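A residual is the observed selling price minus the price predicted by the fitted line. As a rough sketch, the function below uses the regression equation reported for the 21-home data set; the three homes are hypothetical examples, not rows from the data.

```python
# Coefficients from the regression output for the data without the major outlier.
INTERCEPT = 23693.863
SLOPE = 1.0333966


def predicted_price(zestimate):
    """Selling price predicted by the fitted least-squares line."""
    return INTERCEPT + SLOPE * zestimate


def residual(zestimate, price_sold):
    """Observed selling price minus the predicted selling price."""
    return price_sold - predicted_price(zestimate)


# Hypothetical (Zestimate, price sold) pairs for illustration only.
homes = [(180000, 205000), (200000, 230000), (240000, 275000)]

for zestimate, price_sold in homes:
    print(zestimate, round(residual(zestimate, price_sold)))
```

Plotting these residuals against the Zestimate and seeing no curve or funnel shape is what supports the linear model.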

The data included in this report and the statistics derived from it show just how greatly one or two outliers can affect the entire linear model. While the "Zestimate" seems to work fairly well in a linear model when no outliers are present, in today's market, with the presence of short sales, foreclosures, and bank-owned properties, the Zestimate should only be used as a tool for evaluating the current market, and should not be relied upon too heavily.
Mar 11, 2013
Nicely done. Way to pick up on the likely data entry error.
Mar 3, 2013
Thanks, Miranda and Allison! Miranda, I have no idea what that house was supposed to say, because there ARE no million dollar houses in Bolingbrook, and that one CLEARLY isn't! Allison, the Zestimate was for $140s, and it says it SOLD for $840k... not very likely around here! Terrible about those people w/the foreclosure, but apparently it happens quite often. As you said, it's sad! Thanks for the comments, ladies!
Meagan
Mar 3, 2013
What a great report, Meagen! While reading your report I wondered what the correlation would be without the outlier and there you have it! I cannot believe the Zestimate for the outlier home was so off. I, too, assume it must be an error in data entry. Maybe it's supposed to be $1.48 million, instead? Your report is a great illustration of how significantly outliers (and more specifically influential data points) can greatly affect the correlation. Way to go! Miranda Sorensen
Mar 3, 2013
HELLO MEAGEN.
What a fantastic report! I only wish that I was in the market when the home for $840k sold for $145K! Wow! I am so interested to know if this was an error or a lucky purchase! I know home values are depreciating very quickly... but that is a steal! My friend had a home sell in her neighborhood for half of its value. The previous homeowners lost the home due to foreclosure. In anger, they turned and left the water on the last day they lived there. Consequently, a month later, a meter reader discovered the "watering hole" left from the previous homeowner! Sad! Thanks for sharing. Allison Lawson