## Statistics – What is tree analysis?

Tree analysis is a way to find the variables that best explain a target variable (Y). The target can be categorical, for example finding what best separates sick from healthy people, or numerical (continuous), for example finding what best explains high salaries.

This is an example of a tree based on a numerical target, salary. You see the target at the top of the tree:

Each box in the tree then shows the groups that differ most when comparing mean salary. These splits are based on statistical tests (here an F-test based on regression algorithms).
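To make the idea of an F-test based split more concrete, here is a minimal sketch in plain Python. It is not how the Decision Tree module is implemented. The data, the variable names and the helper functions `f_statistic` and `split_by` are all invented for illustration. The sketch only compares the F statistic of two candidate variables that both have two categories; a real tool would convert each statistic to a p-value before comparing variables with different numbers of categories.

```python
from statistics import mean

# Hypothetical toy data, invented for illustration only:
# (monthly salary, overtime, gender)
employees = [
    (16500, "yes", "female"),
    (17000, "yes", "male"),
    (16800, "yes", "female"),
    (17400, "yes", "male"),
    (21000, "no", "female"),
    (21500, "no", "male"),
    (20800, "no", "female"),
    (21900, "no", "male"),
]


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group variance / within-group variance."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def split_by(column):
    """Group the salaries by the categories found in the given column."""
    groups = {}
    for row in employees:
        groups.setdefault(row[column], []).append(row[0])
    return list(groups.values())


# Score each candidate variable; the largest F means the most different group means.
scores = {
    "overtime": f_statistic(split_by(1)),
    "gender": f_statistic(split_by(2)),
}
best_split = max(scores, key=scores.get)
print(best_split, scores)
```

In this toy data the mean salaries differ much more between the overtime groups than between the gender groups, so overtime would be chosen as the split.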

For example, focusing on the first split, I can see that job category is the most important variable for explaining salary. That means this is where we find the groups that differ MOST when comparing salaries. As an example, the right part of the split has these 2 boxes, with the highest salaries:

Above you can see that the seller and Econom groups have statistically similar salaries, with a mean value of 24 660 (money/month), while the manager and specialist groups have 40 350 (money/month).

(money = Swedish kronor, SEK)

To the left in the tree you can see lower salaries among employees in the service category, where the splitting has continued with the variables “overtime” and then “gender”:

So the lowest salaries are found among employees in the service group with overtime=yes. In this group there is a small (but statistically significant) difference between men and women: men have 17 467 and women 16 773.

If you would rather have a categorical variable as target (Y), you see percentages in the boxes, and all splits are based on chi-square tests (Bonferroni adjusted). Instead of searching for differences by running a LOT of crosstabs, you find the most interesting differences in the tree.

Above you can see that people who live in the countryside in 2-person households have the highest interest in a “speed warning system” in their car (87% are interested).
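The chi-square based split selection for a categorical target can be sketched in a similar way. This is only an illustration under stated assumptions, not the module's actual algorithm. The counts, the variable names and the helper `chi2_2x2` are invented, every candidate is limited to a 2x2 table (so the p-value for 1 degree of freedom can be computed with the standard library), and the Bonferroni adjustment is the simplest form: multiply each p-value by the number of tests. The software may calculate its adjustment differently.

```python
from math import erfc, sqrt

# Hypothetical counts, invented for illustration only:
# {variable: {category: (interested, not_interested)}}
candidates = {
    "area": {"countryside": (80, 20), "city": (55, 45)},
    "household_size": {"two_people": (70, 30), "other": (65, 35)},
}


def chi2_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


n_tests = len(candidates)
adjusted = {}
for name, table in candidates.items():
    stat, p_value = chi2_2x2(table)
    # Simple Bonferroni adjustment: multiply by the number of tests, cap at 1.
    adjusted[name] = min(1.0, p_value * n_tests)

best = min(adjusted, key=adjusted.get)
significant = adjusted[best] < 0.05
print(best, significant, adjusted)
```

Here the variable with the smallest adjusted p-value is used for the split, and the split is only made if that value is below the chosen significance level (0.05 in this sketch).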

Tree analysis is a type of data mining method for finding differences as fast as possible without searching manually. Tree analysis is found in the add-on module called Decision Tree; just ask your contact person if you want to know more or try the analysis.

If you don’t have a contact person, just email me: gunilla.rudander@crayon.com

Greetings, Gunilla Rudander