## Statistics – normal distribution

Hi, when you choose a statistical method, the right choice sometimes depends on the distribution of the variable: is it normal or skewed? A user asked me whether there are any tests to check if a distribution is normal or skewed. The answer is yes, but the tests are suited to small samples. So what is a small sample? In this situation I would say fewer than 100 cases.

So for larger samples you have to use other methods, such as descriptive statistics and graphs, to decide whether the distribution is normal or not. This is something we typically discuss together during training and consulting.
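To illustrate why a normality test behaves differently in a large sample, here is a small Python sketch. It does not use the tests SPSS prints (Kolmogorov-Smirnov and Shapiro-Wilk); I have swapped in the Jarque-Bera test only because its p-value has a simple closed form. The skewness value 0.2 is a made-up example of a mild, practically unimportant deviation from normality.

```python
import math


def jarque_bera(n: int, skewness: float, excess_kurtosis: float) -> tuple[float, float]:
    """Jarque-Bera statistic and its large-sample p-value.

    JB = n / 6 * (S**2 + K**2 / 4), with S = skewness and K = excess kurtosis.
    Under normality JB is approximately chi-square with 2 degrees of freedom,
    whose upper tail probability is exactly exp(-x / 2).
    """
    jb = n / 6 * (skewness ** 2 + excess_kurtosis ** 2 / 4)
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Hypothetical: the same mild skewness (0.2) observed at different sample sizes.
for n in (50, 100, 1000, 5000):
    jb, p = jarque_bera(n, skewness=0.2, excess_kurtosis=0.0)
    verdict = "not normal" if p < 0.05 else "no evidence against normality"
    print(f"n={n:5d}  JB={jb:6.2f}  p={p:.4f}  {verdict}")
```

The shape of the data is identical in every row, yet only the two largest samples get p < 0.05, because the test statistic grows in proportion to n. The same general effect, more power to detect tiny deviations as n grows, applies to the tests SPSS reports.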

In this example we will study the variable salary.

#### Check normal distribution

1. Choose the command: Analyze – Descriptive Statistics – Explore

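2. Choose your variable(s) and move them to the Dependent List box. *If you want to split the results by groups, put the group variable in the Factor List box. To get the normality tests, the histogram and the Q-Q plot, click Plots and tick Histogram and Normality plots with tests.*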

3. You will get a lot of output, but here I compare whether the mean and the median differ (in a normal distribution they are approximately equal). I also check whether the skewness value (4.107) is more than twice as large as its standard error (0.077), because that indicates a skewed (not normal) distribution. Here it is more than 50 times as large. (You can read this definition in the Help.)

4. You also get tests of normality, but they are suitable only for **small samples**, under 100 cases. (A significance value < 0.05 indicates that the distribution is **not** normal.) *In the example below you can see what happens if you test an approximately normally distributed variable such as age in a large sample (1000 cases): the test reports that the distribution is not normal (sig < 0.05), although I know that age is approximately normally distributed. With many cases, the tests flag even small, unimportant deviations from normality as significant.*

5. Further down in the output you will find the histogram. If the graph has a symmetric bell shape, the variable is approximately normally distributed, but in this example you can clearly see that the distribution is skewed:

Typically a few people have a very high salary, which gives the distribution a long tail to the right (positive skew).
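The two rules of thumb from step 3 (compare the mean with the median, and check whether the skewness is more than twice its standard error) can be sketched in Python as below. The salary numbers are invented for illustration, and the formulas are the common ones for adjusted sample skewness and its standard error; I assume they match SPSS, but check the Help before relying on exact agreement.

```python
import math
import statistics


def skewness_and_se(values: list[float]) -> tuple[float, float]:
    """Adjusted sample skewness and its standard error (needs at least 3 values).

    These are the common textbook formulas; I assume, but have not verified
    here, that they match the definitions in the SPSS Help.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se


# Hypothetical salaries: most are similar, a few are very high.
salaries = [28, 30, 31, 32, 33, 34, 35, 36, 38, 40, 42, 45, 70, 120, 150]

mean = statistics.fmean(salaries)
median = statistics.median(salaries)
skew, se = skewness_and_se(salaries)

print(f"mean={mean:.1f}  median={median}")
print(f"skewness={skew:.3f}  SE={se:.3f}  ratio={skew / se:.1f}")
# Rule of thumb from step 3: |skewness| > 2 * SE suggests a skewed distribution.
print("skewed (not normal)" if abs(skew) > 2 * se else "no clear skew")
```

With these invented numbers the few high salaries pull the mean (about 50.9) well above the median (36), and the skewness is several times its standard error, the same pattern as in the real salary output.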

6. Another graph you get is the normal Q-Q plot:

If the dots deviate systematically from the diagonal line, the distribution is **not** normal.

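7. You also get a box plot that is clearly not symmetric. The small circles and stars, labelled with case numbers, show the outliers and the extreme values respectively (SPSS marks values more than 1.5 box lengths from the edge of the box as outliers and values more than 3 box lengths away as extreme values).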
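The outlier and extreme-value rule for the box plot can be sketched in Python as follows. This is a minimal sketch, assuming the quartiles from `statistics.quantiles` with its default method; SPSS may compute the box edges slightly differently, so borderline cases can differ. The salary list is hypothetical.

```python
import statistics


def classify_boxplot_points(values: list[float]) -> dict[str, list[float]]:
    """Split values into box plot outliers (circles) and extreme values (stars).

    Rule: more than 1.5 box lengths (IQR) outside the box is an outlier,
    more than 3 box lengths outside is an extreme value. Quartiles use
    statistics.quantiles' default 'exclusive' method, which may differ
    slightly from the quartiles SPSS draws.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # the "box length"
    result: dict[str, list[float]] = {"outliers": [], "extremes": []}
    for x in values:
        distance = max(q1 - x, x - q3)  # distance outside the box (<= 0 if inside)
        if distance > 3 * iqr:
            result["extremes"].append(x)
        elif distance > 1.5 * iqr:
            result["outliers"].append(x)
    return result


salaries = [28, 30, 31, 32, 33, 34, 35, 36, 38, 40, 42, 45, 70, 120, 150]
print(classify_boxplot_points(salaries))
```

Only values above the box are flagged here, which is why the box plot of a right-skewed variable looks lopsided.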

So in this example I am sure that salary is not normally distributed.

Do you want to learn more about statistics and methods? Then we can offer a statistics training course.

Gunilla