All data has variability. If the points on a scatterplot are close together, the variability is low; if they are more spread out, the variability is high. When variability is high, it's hard to use a model to make predictions, since we're less sure about where our data will be. When variability is low, it's easier to use a model to make predictions, since the points are more likely to fall close to the model, which makes our predictions more accurate.
R-Squared (awkward, I can't make exponents on the blog) is called the coefficient of determination. While r measures the strength and direction of the linear correlation, r-squared measures how much of the variability the model accounts for. We interpret r-squared as "the proportion of the variation in (y-variable) that can be explained by a linear relationship with (x-variable)." If r-squared is high (close to 1), this is evidence that a least squares regression line is a good fit for the data. If r-squared is low (close to 0), then a least squares regression line is not a good fit for the data, and we might have to transform it to a different type of model (more on that next week).
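To see this with numbers, here is a small Python sketch. It is only an illustration: the data sets are made up (the names tight and spread, the line y = 2x + 1, and the noise levels are all my own assumptions), and the helper r_squared is a hypothetical function written for this example, not something from a statistics library. It computes r from the usual sums of squares and then squares it.

```python
import random

def r_squared(xs, ys):
    """Return (r, r squared) for the linear relationship between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / (sxx * syy) ** 0.5
    return r, r * r

# Made-up data: the same underlying line, with small vs. large scatter
random.seed(0)
xs = list(range(1, 21))
tight = [2 * x + 1 + random.gauss(0, 1) for x in xs]    # low variability
spread = [2 * x + 1 + random.gauss(0, 15) for x in xs]  # high variability

r_tight, r2_tight = r_squared(xs, tight)
r_spread, r2_spread = r_squared(xs, spread)

print(f"tight:  r = {r_tight:.3f}, r-squared = {r2_tight:.3f}")
print(f"spread: r = {r_spread:.3f}, r-squared = {r2_spread:.3f}")
```

The tightly clustered data should give an r-squared close to 1, while the spread-out data gives a noticeably lower one, even though both sets were built around the same line.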