I am trying to sample a fixed number of units from certain subgroups within
one datafile. Units consist of just an ID for each case and its size
(population). The file is sorted by size.

What I would like to do is to sample from three groups defined by size (pop).

Let's say I want 10 units from the biggest 50, 10 units from those having
pop between a and b, and 10 units from the smallest ones (less than b).

As a result, I would like to have in the original file a "filter" variable
for each subsample (1=selected, else=0).

*SOLUTION by rlevesque@videotron.ca posted to SPSSX-L on 2001/05/14.
* www.spsstools.net
*.
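
* Optional addition, not part of the original solution.
* Setting a seed should make the dummy data and the random draw reproducible.
* This assumes the default random number generator, and the value is arbitrary.
SET SEED=20010514.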

* Define some dummy data for illustration purposes.
INPUT PROGRAM.
LOOP id=1 TO 200.
COMPUTE pop=5+TRUNC(UNIFORM(95)).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
LIST.
SORT CASES BY pop(D).

* The solution starts here.

* Define a macro to do the job.
*//////////////////////.
DEFINE !sample (size=!TOKENS(1) /larger=!TOKENS(1) /b=!TOKENS(1)).

* Rank the pop to know which cases are in largest 50.
RANK
  VARIABLES=pop  (D) /RANK INTO rpop /PRINT=YES
  /TIES=MEAN .

* Assign a categ to each case.
COMPUTE categ=2.
DO IF rpop LE !larger.
COMPUTE categ=1.
ELSE IF pop LT !b.
COMPUTE categ=3.
END IF.

* Get the random samples.
COMPUTE draw=UNIFORM(1).
RANK VARIABLES=draw  (A) BY categ  /RANK INTO rdraw.
COMPUTE filter1=(rdraw LE !size).
VALUE LABEL filter1 1 'Selected' 0 'Not selected'.
EXECUTE.
!ENDDEFINE.
*//////////////////////.

* Example of use: draw 10 cases (size=10) from each of these categories.
* cat1 = cases whose rank of pop is among the 50 largest (larger=50).
* cat3 = cases where pop < 20 (b=20).
* cat2 = the remaining cases.

* Call the macro.
!sample size=10 larger=50 b=20.

* This crosstab shows that 10 cases from each categ were selected.
CROSSTABS
  /TABLES=filter1  BY categ
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT .
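
* Optional sketch, not part of the original solution.
* The question asks for one filter variable per subsample.
* These can be derived from filter1 and categ.
* The names sel1 to sel3 are hypothetical.
COMPUTE sel1=(filter1 EQ 1 AND categ EQ 1).
COMPUTE sel2=(filter1 EQ 1 AND categ EQ 2).
COMPUTE sel3=(filter1 EQ 1 AND categ EQ 3).
EXECUTE.

* Any of these flags can then restrict a procedure to the selected cases.
FILTER BY sel1.
DESCRIPTIVES VARIABLES=pop.
FILTER OFF.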

* Change the parameters of the macro if you want other types of samples.
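
* Sketch of a second call, not part of the original solution.
* The macro creates rpop, categ, draw, rdraw and filter1.
* RANK is assumed to reject INTO names that already exist, so they are deleted first.
* The parameter values below are arbitrary.
DELETE VARIABLES rpop categ draw rdraw filter1.
!sample size=5 larger=30 b=40.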