Flag n random cases within each subgroup
* I am trying to sample a fixed number of units from certain subgroups within one data file.
* Units consist of just an ID for each case and its size (population).
* The file is sorted by size.
* What I would like to do is sample from three groups defined by size (pop).
* Say I want 10 units from the biggest 50, 10 units from those having pop between a and b, and 10 units from the smallest ones (less than b).
* As a result, I would like to have in the original file a "filter" variable for each subsample (1=selected, else=0).

*SOLUTION by rlevesque@videotron.ca posted to SPSSX-L on 2001/05/14.
* www.spsstools.net *.

* Define some dummy data for illustration purposes.
INPUT PROGRAM.
LOOP id=1 TO 200.
COMPUTE pop=5+TRUNC(UNIFORM(95)).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
LIST.

SORT CASES BY pop(D).

* The solution starts here.
* Define a macro to do the job.

*//////////////////////.
DEFINE !sample (size=!TOKENS(1)
               /larger=!TOKENS(1)
               /b=!TOKENS(1)).

* Rank the pop to know which cases are in largest 50.
RANK
  VARIABLES=pop (D)
  /RANK INTO rpop
  /PRINT=YES
  /TIES=MEAN .

* Assign a categ to each case.
COMPUTE categ=2.
DO IF rpop LE !larger.
COMPUTE categ=1.
ELSE IF pop LT !b.
COMPUTE categ=3.
END IF.

* Get the random samples.
COMPUTE draw=UNIFORM(1).
RANK VARIABLES=draw (A) BY categ /RANK INTO rdraw.
COMPUTE filter1=(rdraw LE !size).
VALUE LABEL filter1 1 'Selected' 0 'Not selected'.
EXECUTE.
!ENDDEFINE.
*//////////////////////.

* Example of use of the macro when we want 10 cases (size=10) from each category where.
* cat1 = cases whose rank of pop is among the largest 50 (larger=50).
* cat3 = cases where pop < 20 (b=20).
* cat2 = the remaining cases.

* Call the macro.
!sample size=10 larger=50 b=20.

* This crosstab shows that 10 cases from each categ were selected.
CROSSTABS
  /TABLES=filter1 BY categ
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT .

* Change the parameters of the macro if you want other types of samples.
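
* Usage sketch, not part of the original solution.
* It assumes the macro above has already been run once, so that filter1 and categ exist.
* Restrict an analysis to the selected cases, then restore the full file.
FILTER BY filter1.
MEANS TABLES=pop BY categ.
FILTER OFF.

* To draw a different sample, first drop the variables the macro created.
* This assumes RANK refuses to write INTO a variable name that already exists.
* The parameter values below are arbitrary illustrations.
DELETE VARIABLES rpop categ draw rdraw filter1.
!sample size=5 larger=30 b=15.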