Data will rarely show perfect correlation. We should expect some variation within the data. The more the points scatter around the regression line, the lower our coefficient of determination (r-squared) will be.
Influential observations make our coefficient of determination much lower. They change the slope and the correlation because they sit so far above or below the pattern set by most of the data. We usually want to remove these from the data because they weaken the model's predictive power. Bivariate outliers are different: they are far from the bulk of the data, but they still lie close to the regression line. They hardly affect the slope of the regression line or the correlation. We can usually leave those in the model because they don't weaken the correlation.
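To see the difference in numbers, here is a minimal sketch in plain Python. The data set is made up for illustration (ten points near the line y = 2x), and the helper name `fit` and the two extra points are my own assumptions, not taken from any real data. It fits a least-squares line three times: on the base data, with a far-off point that sits on the line, and with a far-off point that sits well below it.

```python
def fit(points):
    """Return (slope, r_squared) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical base data: y = 2x plus a little made-up noise.
noise = [0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2, 0.1, -0.1]
base = [(x, 2 * x + e) for x, e in zip(range(1, 11), noise)]

# A bivariate outlier: far from the other points, but on the line y = 2x.
with_outlier_on_line = base + [(20, 40)]

# An influential observation: just as far out in x, but far below the line.
with_influential = base + [(20, 5)]

slope_base, r2_base = fit(base)
slope_online, r2_online = fit(with_outlier_on_line)
slope_infl, r2_infl = fit(with_influential)

for label, slope, r2 in [
    ("base data", slope_base, r2_base),
    ("outlier on the line", slope_online, r2_online),
    ("influential observation", slope_infl, r2_infl),
]:
    print(f"{label:25s} slope = {slope:.3f}   r-squared = {r2:.3f}")
```

With this toy data, the point on the line leaves the slope and r-squared almost unchanged, while the point below the line drags both down sharply. Real data will not be this clean, so treat it as an illustration rather than a rule for when to drop a point.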
 