Generating simple linear regression results

This tutorial covers the steps for creating simple linear regression results in StatCrunch. To begin, load the Home prices in Albuquerque data set, which will be used throughout this tutorial. This data set contains eight columns of data taken from 117 home sales in Albuquerque, New Mexico in 1993. Only the PRICE and SQFT columns are used in this tutorial. PRICE represents the sales price of each home in hundreds of dollars, so the first PRICE value of 2,050 represents a sales price of $205,000. SQFT represents the square footage of the living space in each home.

Creating a simple linear regression model

To create a simple linear regression model for sales price using square footage, choose the Stat > Regression > Simple Linear menu option. Select SQFT for the X variable and PRICE for the Y variable. Under the Perform option, the Hypothesis tests option is selected by default with a null value of 0 for both the y-intercept and the slope. Click Compute! to view the regression results as shown below. The least squares regression equation is listed at the top along with the observed correlation coefficient and other information that describes the model fit. A table lists the y-intercept and slope estimates along with their hypothesis test results. This output shows the results of tests to determine if the slope and y-intercept of the regression model are significantly different from zero. Press the > button in the bottom right of the result window to display the default fitted line plot shown in the second image below.
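For readers who want to see what the least squares fit involves, the following is a minimal Python sketch of the textbook calculation. It is not StatCrunch's own code. The data values are invented for illustration and are not taken from the Albuquerque data set, and the function name least_squares is hypothetical.

```python
import math

# Invented example data, NOT the Albuquerque values.
sqft = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
price = [800.0, 1100.0, 1500.0, 1700.0, 2200.0]  # hundreds of dollars


def least_squares(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient
    return intercept, slope, r


intercept, slope, r = least_squares(sqft, price)
print(f"PRICE = {intercept:.2f} + {slope:.4f} * SQFT, r = {r:.4f}")
```

The slope is the estimated change in PRICE for each additional square foot, which is the quantity the default hypothesis test compares against 0.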

Excluding outliers from calculations

In the fitted line plot shown above, one home with a square footage over 3,500 stands out as being much farther from the red regression line than the other points. This house could be considered an outlier. To eliminate this house from the regression results, choose Options > Edit to reopen the dialog window. In the Where input field, enter SQFT < 3500 to limit the homes included in the model to those that are under 3,500 square feet. Type this statement exactly, because such expressions are case-sensitive and spaces matter. The statement can also be created by clicking the adjoining Build button, which opens a custom expression builder. Click Compute! to view the regression results without the outlier. In the results window, press the > button in the bottom right to view the fitted line plot for the new regression model as shown in the output below.
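The effect of a Where condition can be pictured as a row filter applied before the model is fit. The short Python sketch below illustrates the idea. The rows are invented and the variable names are hypothetical; it does not show how StatCrunch evaluates expressions internally.

```python
# Invented rows for illustration only.
homes = [
    {"SQFT": 1200, "PRICE": 900},
    {"SQFT": 2100, "PRICE": 1500},
    {"SQFT": 3750, "PRICE": 1300},  # the kind of point the condition removes
    {"SQFT": 1800, "PRICE": 1250},
]

# Keep only rows satisfying the condition SQFT < 3500.
included = [row for row in homes if row["SQFT"] < 3500]

# These lists are what a subsequent regression would be fit to.
x = [row["SQFT"] for row in included]
y = [row["PRICE"] for row in included]
print(len(homes), "rows before,", len(included), "rows after")
```

Because the inequality is strict, a home of exactly 3,500 square feet would also be excluded.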

Calculating confidence intervals for the regression coefficients

In addition to hypothesis tests, confidence intervals for the regression coefficients can also be computed. For this example, in the window containing the regression results above, choose Options > Edit to reopen the dialog window. Under Perform, select Confidence intervals. By default, the Level input contains 0.95, representing a 95% confidence level. If this value were changed to 0.99, a 99% confidence interval for each parameter would be produced. Leave the Level at the default 0.95 and click Compute!. Instead of P-values, the results now contain 95% confidence intervals for the y-intercept and the slope. The results for the slope show 95% confidence that the true slope is between 0.62866641 and 0.76142027.
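For reference, the usual textbook form of the confidence interval for the slope is given below. The notation is an assumption introduced here: b_1 is the slope estimate, s is the residual standard error, and t* is the critical value from a t distribution with n - 2 degrees of freedom at the chosen level. The tutorial does not state the formula StatCrunch uses, so treat this as the standard result rather than a description of the software.

```latex
b_1 \pm t^{*}_{n-2}\,\mathrm{SE}(b_1),
\qquad
\mathrm{SE}(b_1) = \frac{s}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2}}
```

An interval of this form is symmetric about the estimate, so the slope interval reported above would correspond to an estimated slope at its midpoint, (0.62866641 + 0.76142027) / 2 = 0.69504334.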

Calculating predictions for Y

Predictions for the value of the y-variable can be obtained based on user-specified values of the x-variable. In the window containing the regression results, choose Options > Edit to reopen the dialog window. Under Prediction of Y, enter 2000, 3000 for X value(s) to compute predictions of sales price for homes that are 2,000 and 3,000 square feet. This input accepts a list of values separated by commas. The corresponding Level input determines the level used in the calculation of confidence and prediction intervals. The default value of 0.95 will produce 95% confidence intervals and 95% prediction intervals. Click Compute! to view the results shown below. A table has been appended to the output with the predicted PRICE of homes with 2,000 and 3,000 square feet. The table also shows a 95% confidence interval for the mean price of homes with each square footage and a 95% prediction interval for the price of an individual home with each square footage.
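The difference between the two intervals can be seen in a small Python sketch using the standard textbook formulas. The data are invented, the function name predict is hypothetical, and the t critical value is passed in by hand (3.182 is the tabulated 97.5th percentile of a t distribution with 3 degrees of freedom, matching the five invented points). This is an illustration under those assumptions, not StatCrunch's implementation.

```python
import math

# Invented example data, NOT the Albuquerque values.
sqft = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
price = [800.0, 1100.0, 1500.0, 1700.0, 2200.0]  # hundreds of dollars


def predict(x0, x, y, t_crit):
    """Return (fit, mean_interval, prediction_interval) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    fit = intercept + slope * x0
    # Uncertainty grows as x0 moves away from the mean of x.
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(h0)        # for the mean response
    se_pred = s * math.sqrt(1.0 + h0)  # for one new observation
    mean_interval = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pred_interval = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, mean_interval, pred_interval


fit, ci, pi = predict(2000.0, sqft, price, t_crit=3.182)
print("fit:", fit)
print("interval for the mean:", ci)
print("prediction interval:", pi)
```

The prediction interval is always the wider of the two, because it accounts for the variation of an individual home around the line in addition to the uncertainty in the line itself.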

Adding graphs to results

Under Graphs, a list of optional graphs is provided for better understanding the fit of the regression model. The Fitted line plot option is selected by default. In the window containing the previous regression results, choose Options > Edit to reopen the dialog window. Select Fitted line plot, --- with mean interval, and --- with prediction interval under Graphs and press Compute!. In the results window, press the > button at the bottom right to display the modified fitted line plot. The plot now shows a 95% confidence interval for the mean value of sales price across the entire range of values for square footage. A 95% prediction interval for the sales price of an individual home over the entire range of values for square footage is also provided. The value of Level under Prediction of Y can be used to modify the level used in the calculation and display of these intervals.

Saving results to the data table

The simple linear regression procedure also offers the capability to save regression results to the data table. Choose Options > Edit to reopen the dialog window. Select Residuals and Studentized residuals under Save and click Compute!. At the bottom of the results window shown below, a message has been added indicating new columns have been added to the data containing the residuals and studentized residuals. These columns can be used for further analysis.
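As a rough guide to what the saved columns contain, the Python sketch below computes residuals and one common version of studentized residuals. It assumes the internally studentized definition, in which each residual is divided by s * sqrt(1 - h_i) with h_i the leverage of that observation. The tutorial does not say which definition StatCrunch uses, so its saved values may follow a different convention. The data are invented and the function name residual_columns is hypothetical.

```python
import math

# Invented example data, NOT the Albuquerque values.
sqft = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
price = [800.0, 1100.0, 1500.0, 1700.0, 2200.0]  # hundreds of dollars


def residual_columns(x, y):
    """Return (residuals, studentized) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed value minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    # Leverage of each observation in simple linear regression.
    leverages = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Assumed definition: internally studentized residuals.
    studentized = [
        e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverages)
    ]
    return residuals, studentized


residuals, studentized = residual_columns(sqft, price)
for xi, e, t in zip(sqft, residuals, studentized):
    print(f"SQFT={xi:.0f}  residual={e:.2f}  studentized={t:.3f}")
```

Observations with large studentized residuals in absolute value are candidates for the kind of outlier check described earlier.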
