Tips & Tricks #20 – Restructure for cases with several rows
Hi, I got a question about how to restructure a file that looked like this. For example, case 1 is repeated on 4 rows and case 4 is repeated on 7 rows:
In statistics it is almost always important (GEE analysis is an exception) that every case is on only one row, not several rows; otherwise the calculations will be wrong. So in the end I want to have the repeated lab_data values in columns instead.
First: save a copy of the file.
To fix the data file, there is a restructure command that you find here: Data > Restructure
And then choose this restructuring option, "Restructure selected cases into variables":
Here you put your identification variable in the upper box (obs in this example).
The file will then look like this:
As you can see, case 1 is now on only 1 row, with 4 values (as it was repeated on 4 rows).
You can see that we also got 7 columns, that is 7 variables, because case 4 had 7 repeated rows, and that was the maximum number of rows for any case in the whole data file. Cases with fewer repeated rows have missing values in the remaining columns.
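If you like to see the logic spelled out, here is a minimal sketch in plain Python (not SPSS syntax) of what this kind of restructure does. The function name `cases_to_variables`, the column naming `lab_data.1`, `lab_data.2`, ... and the example values are assumptions made up for illustration; only the variable names obs and lab_data come from the example file.

```python
from collections import defaultdict


def cases_to_variables(rows, id_key="obs", value_key="lab_data"):
    """Turn repeated rows per case into one row per case with numbered columns."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_key]].append(row[value_key])

    # The case with the most repeated rows decides how many columns are needed
    width = max((len(values) for values in grouped.values()), default=0)

    wide = []
    for case_id, values in grouped.items():
        new_row = {id_key: case_id}
        for i in range(width):
            # Cases with fewer rows get None (missing) in the remaining columns
            new_row[f"{value_key}.{i + 1}"] = values[i] if i < len(values) else None
        wide.append(new_row)
    return wide


# Hypothetical example data, invented for illustration
long_rows = [
    {"obs": 1, "lab_data": 5.1},
    {"obs": 1, "lab_data": 4.8},
    {"obs": 2, "lab_data": 6.0},
    {"obs": 3, "lab_data": 7.2},
    {"obs": 3, "lab_data": 7.0},
    {"obs": 3, "lab_data": 6.9},
]

wide_rows = cases_to_variables(long_rows)
for row in wide_rows:
    print(row)
```

Each case ends up on exactly one row, and the number of lab_data columns equals the largest number of repeated rows found for any single case.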
I recommend working with only a few variables when you do this restructure, so that you keep control of what happens. You can merge several data files afterwards if they share a key variable you can match on, like “obs” in this example data file.
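The idea of matching files on a key variable can be sketched in plain Python as below. This is only an illustration of the principle, not how SPSS does it internally; the function name `merge_on_key`, the second file with an `age` variable and all values are hypothetical.

```python
def merge_on_key(left_rows, right_rows, key="obs"):
    """Match rows from two one-row-per-case tables on a shared key variable."""
    # Index the second table by the key so each case can be looked up directly
    right_by_key = {row[key]: row for row in right_rows}

    merged = []
    for row in left_rows:
        combined = dict(row)
        # Add the variables from the second table when the key matches;
        # cases without a match keep only their own variables
        combined.update(right_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Hypothetical restructured lab file (one row per case)
lab_wide = [
    {"obs": 1, "lab_data.1": 5.1, "lab_data.2": 4.8},
    {"obs": 2, "lab_data.1": 6.0, "lab_data.2": None},
]

# Hypothetical second file with other variables for the same cases
background = [
    {"obs": 1, "age": 34},
    {"obs": 2, "age": 51},
]

merged_rows = merge_on_key(lab_wide, background)
for row in merged_rows:
    print(row)
```

This only works cleanly when both files have exactly one row per case, which is one more reason to restructure first.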
In some situations it’s better to use the aggregate command to get one row for each case; then you will have a summary with only ONE value per case, for example the mean value of lab_data or the max value of lab_data. But in this situation I wanted every measured lab_data value, so that’s why the restructure command is better.
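To make the difference concrete, here is a minimal plain-Python sketch of the aggregate idea: one row per case, with a single summary value instead of every measurement. The function name `aggregate_cases`, the output names `lab_mean` and `lab_max`, and the example values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def aggregate_cases(rows, id_key="obs", value_key="lab_data"):
    """Collapse repeated rows into one row per case with summary values."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_key]].append(row[value_key])

    # One output row per case, holding the mean and the max of its values
    return [
        {id_key: case_id, "lab_mean": mean(values), "lab_max": max(values)}
        for case_id, values in grouped.items()
    ]


# Hypothetical example data, invented for illustration
repeated_rows = [
    {"obs": 1, "lab_data": 4.0},
    {"obs": 1, "lab_data": 6.0},
    {"obs": 2, "lab_data": 7.0},
]

summary_rows = aggregate_cases(repeated_rows)
for row in summary_rows:
    print(row)
```

Note that the individual measurements are gone after aggregating, which is exactly why restructuring was the better choice here.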