Aggregating with the median
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | *SPSS AnswerNet solution ID:100006782 * How to aggregate the median. DATA LIST FREE /id score. BEGIN DATA 1 1 1 2 1 3 1 4 1 5 2 1 2 3 2 5 2 7 2 9 3 2 3 4 3 6 3 8 3 10 END DATA. ** Next we aggregate the file, breaking on the id variable. AGGREGATE OUTFILE 'aggmedian.sav'/ PRESORTED/ BREAK=id/ nscore=N(score). ** We then add this aggregated outfile back to the original ** data file, interlacing the agreggated cases (1 per id) ** among their associated id subgroups, and creating a new ** variable (aggfile) to flag the cases coming from this ** aggregated file. ADD FILES FILE 'aggmedian.sav'/IN=aggfile/FILE */BY id. exe. ** Next we split the file by id, so that our next procedure ** is repeated for each id group. SPLIT FILE BY id. ** At this point, we know that "score" is missing for (at ** least) one case within each split file group; we've put this case ** there with the aggregation and re-merging of the file. We can now ** use the RMV procedure to substitute for that missing value ** the median across the remaining values for "score" (within ** each id subgroup, since SPLIT FILE is in effect). RMV medscore=MEDIAN(score, ALL). ** Only the flagged, "aggfile" cases hold the median values at ** this point; we still need to substitute the subgroup medians ** in "medscore" for each case within each id subgroup. IF id=LAG(id) medscore=LAG(medscore). EXE. ** There's no need to retain the cases that were generated ** by the AGGREGATE and ADD FILES and used solely to hold the ** id subgroup medians. We can select everything but them for ** the active file. SELECT IF NOT(aggfile). EXE. ** At this stage, each case in the file has a value for the ** "medscore" variable appropriate to its id subgroup. If this ** is sufficient, you can drop the "nscore" and "aggfile" ** variables the next time the file is saved. If your original ** aim was indeed an aggregated file with one case per id and ** the median within id subgroups on "score," you now have ** everything you need for this. Several of the AGGREGATE ** functions will capture the "medscore" values, which are ** effectively constants within each break group. AGGREGATE OUTFILE */BREAK=id/aggmed=MAX(medscore). |
Related pages
...