Statistics – normal distribution
Hi, when you choose a statistical method, the right choice sometimes depends on the distribution of the variable: is it normal or skewed? A user asked me whether there are any tests to check if a distribution is normal or skewed. The answer is yes, but the tests are made for small samples; with a large sample they flag even tiny deviations from normality as significant. What is a small sample, then? In this situation I would say fewer than 100 cases.
So for larger samples you have to use other methods and graphs to decide whether the distribution is normal or not. This is something that we typically discuss together during training/consulting.
In this example we will study salary.
Check normal distribution
1. Choose command: Analyze – Descriptive statistics – Explore
2. Choose your variable or variables. If you want to split the results by groups, put the group variable in the Factor List box.
3. You will get a lot of results, but in this case I will compare the mean and the median (if the distribution is normal, they are almost equal). I can also check whether the skewness value (4.107) is more than twice as large as its standard error (0.077), because that indicates a skewed distribution (not normal). Here 4.107 is about 53 times 0.077, far beyond that limit. (You can read this definition in the help.)
4. You also get a test of normality, but this test is meant for small samples under 100 cases. (A significance value < 0.05 indicates that the variable is not normally distributed.) In the example below, you can see what happens if you put in a normally distributed variable like age when you have a big sample (1000 cases): the test incorrectly shows that the distribution is skewed (sig < 0.05), although I know that age is normally distributed.
5. If we go further, you will find the histogram. If the graph is bell-shaped, the variable is normally distributed, but in this example you can clearly see that the distribution is skewed:
Typically you have a few people with a very high salary, and that affects the distribution.
6. Here is another graph you get, the Q-Q plot:
If the dots deviate from the diagonal line, then the distribution is not normal.
7. You also get a box plot that is clearly not symmetric. The small circles and stars, labeled with case numbers, show the outliers and the extreme values, respectively.
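If you want to see how the circles and stars in the box plot are decided, here is a minimal Python sketch. It assumes the usual rule that outliers lie between 1.5 and 3 box lengths from the edge of the box, and that extreme values lie more than 3 box lengths away. The function name `classify_boxplot_points` and the salary figures are made up for illustration, and NumPy percentiles only approximate the box edges that a statistics program computes, so borderline cases may be classified differently.

```python
import numpy as np


def classify_boxplot_points(values):
    """Return (outliers, extremes) using the 1.5 / 3 box-length rule."""
    x = np.asarray(values, dtype=float)
    # Box edges; percentiles are an approximation of the hinges (assumption).
    q1, q3 = np.percentile(x, [25, 75])
    box_length = q3 - q1
    # Distance beyond the nearest box edge; 0 for values inside the box.
    dist = np.maximum(np.maximum(q1 - x, x - q3), 0.0)
    outliers = x[(dist > 1.5 * box_length) & (dist <= 3 * box_length)]
    extremes = x[dist > 3 * box_length]
    return outliers, extremes


# Hypothetical salaries (in thousands): most are close together, two are high.
salaries = [21, 22, 23, 24, 25, 26, 27, 28, 29, 40, 90]
outliers, extremes = classify_boxplot_points(salaries)
print("outliers (circles):", outliers)
print("extremes (stars):  ", extremes)
```

In this made-up data the value 40 lands among the outliers and 90 among the extremes, which is the same pattern as a few very high salaries pulling the box plot to one side.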
So in this example I am sure that salary is not normally distributed.
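The rule of thumb from step 3 (skewness more than twice its standard error) can also be checked outside the menus. The sketch below is only an illustration under some assumptions: it uses the common sample-skewness formula and the standard error sqrt(6n(n-1) / ((n-2)(n+1)(n+3))), which I assume match what the program reports. The function names `skewness_with_se` and `looks_skewed` are my own, and the "salary" data is simulated, not the data from the example above.

```python
import math

import numpy as np


def skewness_with_se(values):
    """Return (sample skewness, standard error of skewness)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 cases")
    s = x.std(ddof=1)  # sample standard deviation
    g1 = n / ((n - 1) * (n - 2)) * np.sum(((x - x.mean()) / s) ** 3)
    # The standard error depends only on the number of cases.
    se = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return float(g1), se


def looks_skewed(values):
    """Rule of thumb: |skewness| greater than twice its standard error."""
    g1, se = skewness_with_se(values)
    return abs(g1) > 2 * se


# Simulated right-skewed "salary" variable (assumption: lognormal shape).
rng = np.random.default_rng(0)
salary = rng.lognormal(mean=10, sigma=0.5, size=1000)
# A perfectly symmetric variable for comparison.
symmetric = np.arange(1, 1001)

for name, data in [("salary", salary), ("symmetric", symmetric)]:
    g1, se = skewness_with_se(data)
    print(f"{name}: mean={np.mean(data):.1f} median={np.median(data):.1f} "
          f"skewness={g1:.3f} se={se:.3f} skewed={looks_skewed(data)}")
```

With 1000 cases the standard error comes out at about 0.077, the same value as in the output discussed in step 3, so that output is consistent with roughly 1000 valid cases (my assumption, not something the output states). The mean is also clearly above the median for the simulated salary, which is the other sign of skewness mentioned in step 3.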
Do you want to learn more about statistics and methods? Then we can offer statistical training.