Statistics – Boxplot: how to understand it
Hi, many times during seminars, workshops and trainings I have met users who have not seen or used a boxplot before and don’t know how to use or read one. So let me take you into the boxplot world today!
This plot has been very important for me in understanding my variables better: where is the middle person, how large is the spread and, finally, is the distribution skewed?
Here is an example of a boxplot of salary, split by gender. Choose the command “Graphs – Chart Builder”:
This is what the command produced:
Let me explain the boxplot (definitions are from Tukey):
IQR stands for interquartile range: the distance between quartile 1 and quartile 3, which is the length of the box.
So if a person lies more than 1.5 IQR from the edge of the box in the chart, that person is marked with a circle (an outlier), for example id nr 157. And if a person lies more than 3 IQR from the edge of the box, we call it an extreme value, like id 516, which lies 5.5 IQR from the edge of the box. See below:
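If you want to check the 1.5 IQR and 3 IQR rules by hand, here is a minimal Python sketch. The salary numbers are made up for illustration (they are not the data in the chart), and the function name `classify_tukey` is my own. Note also that software packages compute quartiles in slightly different ways, so the limits may differ a little from what your statistics program draws.

```python
from statistics import quantiles

def classify_tukey(values):
    """Label each value 'normal', 'outlier' or 'extreme' by its distance from the box."""
    # Quartiles with linear interpolation; other quartile conventions
    # (for example Tukey's hinges) can give slightly different limits.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance from the nearest box edge; 0 if the value is inside the box.
        distance = max(q1 - v, v - q3, 0)
        if distance > 3 * iqr:
            labels.append("extreme")
        elif distance > 1.5 * iqr:
            labels.append("outlier")
        else:
            labels.append("normal")
    return q1, q3, iqr, labels

# Hypothetical salaries (in thousands), only for illustration.
salaries = [20, 22, 24, 25, 26, 27, 28, 30, 32, 45, 80]
q1, q3, iqr, labels = classify_tukey(salaries)

for salary, label in zip(salaries, labels):
    print(salary, label)
```

With these made-up numbers the box goes from 24.5 to 31, so the IQR is 6.5. The value 45 lies 14 above the box edge (more than 1.5 IQR, but not more than 3 IQR) and counts as an outlier, while 80 lies 49 above the box edge and counts as an extreme value.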
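You also get the information that the salary distribution for females is very skewed, so it is not normally distributed. If the boxplot is symmetric, with the median line in the middle of the box and whiskers of about equal length, then the variable is consistent with a normal distribution. Symmetry alone does not prove normality, though: a uniform distribution, for example, also gives a symmetric boxplot. A variable can be approximately normally distributed even if there are outliers and extreme values, but then they should appear on both sides of the box, I mean below AND above.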
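The "median line in the middle of the box" check can also be put into a single number. One common choice is the quartile skewness (often called Bowley's coefficient), which compares the distance from the median to each box edge. The sketch below uses two small made-up data sets, and the function name `quartile_skewness` is my own. It only describes the box itself, so it says nothing about whiskers or outliers.

```python
from statistics import quantiles

def quartile_skewness(values):
    """Quartile (Bowley) skewness: 0 when the median sits in the middle of the box."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    # Positive: the upper half of the box is longer (skewed to the right).
    # Negative: the lower half of the box is longer (skewed to the left).
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Hypothetical data, only for illustration.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 4, 6, 10, 20]

skew_symmetric = quartile_skewness(symmetric)
skew_right = quartile_skewness(right_skewed)
print(skew_symmetric, skew_right)
```

A value near 0 means the median is roughly centred in the box. The further the value moves towards +1 or -1, the more lopsided the box is.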
I hope you got inspired to use more boxplots!