Statistics – report p-values in correlation or not?


My statistics teacher told us the following: “Don’t report the p-values from the correlations; stay with the correlation coefficient only.” At first I didn’t understand what he meant, but now, after 30 years of experience with statistics, I know why and will explain it.

I think this is similar to the problem with normality tests that I wrote about on 9/11: the test was developed many years ago, when you normally had small samples. With a bigger sample it’s very, very easy to get significant p-values for correlations, even when the coefficients are very low, like 0.14.

So that means that you have a very low correlation that still differs significantly from r=0 (no relationship), but how interesting is that?

Here is the correlation between 2 variables, age and salary, when you have 1000 cases.

So the number of cases is 1000. The correlation is low, r=0.117, so there is only a weak relationship between age and salary (0 means no linear relationship, and 1 or -1 means a perfect one). But although the correlation is low, it has 2 small stars above it (**), which means that the correlation is significant at the 0.01 level. We can also see the significance value, p=0.00196, which indicates the same thing.

What happens if I weight the same data down so that we have only 10 cases or 100 cases? See below:

The correlation is still the same (r=0.117), but now the significance value is higher. When we have 10 cases the significance value is 0.746, so we cannot say that there is a significant correlation that differs from 0 in the population. When we have 100 cases the correlation is again the same (r=0.117), and the significance value is lower compared to 10 cases: it is 0.244 (still not a significant correlation that differs from 0).
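To see how the significance value depends on the number of cases alone, here is a small Python sketch. It assumes the significance values come from the usual two-sided t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which is my assumption about what the statistics program does. The function names `t_pdf` and `correlation_p_value` are made up for this illustration, and the tail area is found by simple numerical integration rather than a library routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def correlation_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: no correlation, given r and n (needs |r| < 1, n > 2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and |t|
    # (steps must be even).
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Whatever lies outside [-|t|, |t|] is the two-sided p-value.
    return max(0.0, 1 - 2 * area)

# Same coefficient, three sample sizes (assumed rounded value r = 0.117).
for n in (10, 100, 1000):
    print(n, correlation_p_value(0.117, n))
```

Because r is rounded to three decimals here, the printed values will not match the program output digit for digit, but the pattern is the same: clearly not significant with 10 or 100 cases, and well below 0.01 with 1000 cases.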

So what my teacher meant was that when you work with big samples, it is not so impressive to report significance values, because most of them will be significant. Then it’s better to report the correlation coefficients (r) instead.

Just a note: to be correct, it’s better to use Spearman’s correlation when you analyse salary, because salary is not normally distributed. But I will come back to that next week.

