We like straight lines. Straight lines are predictable and easy to model. Ideally, we want our data to be linear, i.e. to resemble a straight line, but that doesn't always happen in practice. When it doesn't, we can transform the data in a few ways to make it look linear, so that we can interpret it and make predictions with it more easily.
The four major AP Stat transformations (there are many, many more in college stat and beyond) are y = x-squared, y = root x, y = log(x), and y = 1/x. In each case we replace x with a function of x and leave y alone. Depending on the shape of the data, we pick one of these transformations and change our x-variable (you can do this on the calculator by using L3) so that the scatterplot of y against the transformed x appears linear.
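For anyone who wants to try this off the calculator, here is a minimal Python sketch of the same idea. The data values are made up for illustration, the names `transforms` and `transformed` are my own, and I'm assuming log means base 10 (natural log works just as well for straightening a plot).

```python
import math

# Made-up data for illustration: y grows roughly like the square root of x.
x = [1, 4, 9, 16, 25, 36]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# The four transformations of x discussed above.
# Assumption: "log" is base 10 here.
transforms = {
    "x squared": lambda v: v ** 2,
    "root x": math.sqrt,
    "log x": math.log10,
    "1/x": lambda v: 1 / v,
}

# Build one transformed copy of x per transformation,
# much like storing a new list on the calculator.
transformed = {name: [f(v) for v in x] for name, f in transforms.items()}

for name, values in transformed.items():
    print(name, [round(v, 3) for v in values])
```

Each transformed list would then be plotted against the original y to see which scatterplot looks straightest. Note that root x and log x need positive x-values (root x also allows zero), and 1/x breaks if any x is zero.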
The evidence that our transformation is a good one comes in three forms. Firstly, the scatterplot of y against the transformed x appears more linear than the scatterplot of y against the original x. Secondly, the r-squared value improves. This is a good thing, because r-squared is the share of the variation in y that the line explains, so a higher value means less variation is left unexplained. Less unexplained variation = more predictive power for our model = more accuracy. Finally, our residual plot should appear more randomly scattered, with no obvious curve or pattern and with points spread fairly evenly above and below the x-axis (although today in class this definitely was not the case!).
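The r-squared and residual checks can be sketched in Python too. This is only an illustration under my own assumptions: the data are made up, and `fit_line`, `r_squared`, and `residuals` are helper names I wrote for this post, not anything built in.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Observed y minus predicted y for each point."""
    slope, intercept = fit_line(xs, ys)
    return [b - (intercept + slope * a) for a, b in zip(xs, ys)]

def r_squared(xs, ys):
    """Share of the variation in y explained by the least-squares line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys))
    ss_tot = sum((b - mean_y) ** 2 for b in ys)
    return 1 - ss_res / ss_tot

# Made-up data for illustration: y grows roughly like the square root of x.
x = [1, 4, 9, 16, 25, 36]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
root_x = [math.sqrt(v) for v in x]

r2_raw = r_squared(x, y)
r2_root = r_squared(root_x, y)
print("r-squared, original x:", round(r2_raw, 4))
print("r-squared, root x:    ", round(r2_root, 4))

# Signs of the residuals, in order of increasing x. A long run of the same
# sign in the middle or at the ends hints at a leftover curve.
print([("+" if e > 0 else "-") for e in residuals(x, y)])
print([("+" if e > 0 else "-") for e in residuals(root_x, y)])
```

With this particular made-up data, the root x version gives the higher r-squared. The sign lists are only a rough stand-in for actually looking at the residual plot, which is still the real check.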
Remember: if you can improve the model by getting a higher r-squared, then transforming the data is probably a good idea.