About small values with huge influence - Sum Of Squares - part 2
In the first part of this blog article we became familiar with the residual sum of squares (RSS) and started to gain insight into the influence of individual data values. This second part picks up from there.
Hat Values and Cook’s Distance – what is really influencing the regression line?
So far we have been thinking about the influence of data points without clarifying what influence actually means. One intuitive way to think about it is to consider what would happen to the regression line if a single data point were removed from the data set. If a data point has a big influence on the regression line, then removing it should change the line a lot, and that change can be measured as a difference in the slope and/or the y-intercept. Figure 1 illustrates this.
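The same leave-one-out idea can also be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the figure: the data are made up for the example (five points on the line y = 2x + 1 plus one point pulled far off it), and the helper names `fit_line` and `leave_one_out_changes` are hypothetical. It refits the line once per removed point and records how much the slope and y-intercept move.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def leave_one_out_changes(x, y):
    """For each point i, refit without it and return the change in
    (slope, intercept) relative to the fit on all points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full_slope, full_intercept = fit_line(x, y)

    changes = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # boolean mask that drops point i
        slope_i, intercept_i = fit_line(x[keep], y[keep])
        changes.append((slope_i - full_slope, intercept_i - full_intercept))
    return np.array(changes)


# Made-up example data (an assumption for illustration only):
# the first five points lie on y = 2x + 1, the last one does not.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 10.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 5.0])

changes = leave_one_out_changes(x, y)

# Index of the point whose removal moves the slope the most.
most_influential = int(np.argmax(np.abs(changes[:, 0])))

for i, (d_slope, d_intercept) in enumerate(changes):
    print(f"point {i}: slope change {d_slope:+.3f}, "
          f"intercept change {d_intercept:+.3f}")
print("largest slope change when removing point", most_influential)
```

In this toy data set, removing the last point changes the slope far more than removing any other point, which is exactly the sense of "influence" described above. Refitting once per point is fine for small data sets. It is the brute-force version of the idea.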