Tuesday, October 8, 2013

Residuals and Residual Plots

A least squares regression line acts like an "average" of all the data points - it runs through the middle of the data, so you can think of it as a measure of center. There are points above and below the regression line - not every point falls directly on it (if they all did, we would have a perfect correlation of 1 or -1). When we take an actual data value (the point) and subtract what the line predicts for it (the predicted value), we are left with the residual. The residual is the vertical distance, in the "y variable", from the data point to the regression line: positive when the point is above the line and negative when it is below.

Residual = Actual - Predicted (RAP!)
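To see the formula in action, here is a minimal Python sketch. The line y = 2x + 1 and the two points are made up for illustration; they don't come from any real data set.

```python
# Hypothetical regression line (made-up numbers): predicted y = 2x + 1
def predict(x):
    return 2 * x + 1

def residual(x, actual_y):
    # Residual = Actual - Predicted
    return actual_y - predict(x)

# Point (3, 9): the line predicts 7, so the residual is 9 - 7 = 2
print(residual(3, 9))   # 2
# Point (4, 8): the line predicts 9, so the residual is 8 - 9 = -1
print(residual(4, 8))   # -1
```

The first point sits above the line (positive residual) and the second sits below it (negative residual).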

We can find the residuals on the TI-84 by calculating a regression line first. After you enter your data, hit STAT - CALC - LinReg, which will bring up your regression equation. Then set up a scatterplot, but instead of L2 for your y-variable, go to 2nd - LIST and put in #7 - RESID. Then hit ZOOM - 9 (ZoomStat) to make the plot.

Note - if you don't calculate the regression line first, the calculator won't have the residuals stored in RESID, and the plot won't work.
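If you want to check the calculator's work by hand, the same two steps (fit the line, then subtract) can be sketched in plain Python. This is only a rough equivalent, not what the TI-84 runs internally; the function names and the data lists are made up, with xs and ys standing in for L1 and L2.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys):
    """Residual = Actual - Predicted for every data point."""
    slope, intercept = least_squares(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Made-up data, standing in for L1 (xs) and L2 (ys)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares(xs, ys)
resid = residuals(xs, ys)
print(slope, intercept)   # roughly 0.6 and 2.2
print(resid)              # one residual per data point
```

Just like on the calculator, the line has to be fit before the residuals exist. The residuals from a least squares line add up to zero (apart from rounding), which matches the idea that the line runs through the middle of the data.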

We look for 3 features in the residual plot to tell if the regression line is a good fit for the data: 1) random scatter, 2) approximately equal numbers of points above and below the x-axis, and 3) no outliers (if we have one, we should consider removing it to improve the model). If we see a pattern, such as a curve, or an unequal distribution of points, this is a sign that a linear regression model might not be the best fit for our data.
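Two of these three checks can be roughed out in a few lines of Python. This is only a sketch: the residual values are made up, and the outlier rule (flag anything more than 2 "typical residual sizes" from zero) is an assumption chosen for illustration, not a standard cutoff. Random scatter still has to be judged by looking at the plot.

```python
def check_residuals(resid, cutoff=2.0):
    """Count points above/below the x-axis and flag possible outliers.

    cutoff is a hypothetical threshold, measured in multiples of the
    root-mean-square residual size.
    """
    above = sum(1 for r in resid if r > 0)
    below = sum(1 for r in resid if r < 0)
    # Typical size of a residual (root mean square)
    spread = (sum(r * r for r in resid) / len(resid)) ** 0.5
    # Positions of residuals that are unusually far from zero
    flagged = [i for i, r in enumerate(resid) if abs(r) > cutoff * spread]
    return above, below, flagged

# Made-up residuals for illustration
resid_example = [-0.8, 0.6, 1.0, -0.6, -0.2]
above, below, flagged = check_residuals(resid_example)
print(above, below, flagged)   # 2 3 []
```

Here 2 points are above the axis and 3 are below, which is about even, and nothing is flagged as a possible outlier.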
