Systematic fixed sampling | Raynald's SPSS Tools

SPSS AnswerNet: Result 

Solution ID: 100000957
Title:
Systematic Sampling with Fixed Sample Size 
Description:
Q. 
I want to sample cases from a file by systematic sampling with a 
fixed sample size. Suppose I have N=10,000 cases in the file and 
want a sample of n=500 cases, choosing 1 case from every 20 cases. 
The first case sampled is the Kth case, where K is a random 
number from 1 to 20. The next cases sampled are the (K+20)th case, 
the (K+40)th case, and so forth. How can I do this in SPSS or 
SPSS/PC+? 
A. 
The general approach is to assign each case a sequence number 
based on its serial position in the file and an interval 
number such that the length of each interval equals L=N/n. K is 
generated as a single random number from 0 to L, and the Kth 
case in each interval is chosen. Note that the number of 
cases in the file, i.e. the population size, must be known. 
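For the evenly divisible case in the question (N=10,000, n=500, 
L=20), the positions to be sampled can be listed directly. The 
following is a minimal sketch in Python rather than SPSS, meant 
only as a check on the arithmetic. The function name 
systematic_positions and its arguments are illustrative, and the 
seed is arbitrary. 

```python
import random

def systematic_positions(pop_size, sample_size, rng):
    """List the 1-based case positions K, K+L, K+2L, ... for an integer L."""
    if pop_size % sample_size != 0:
        raise ValueError("this sketch assumes N is evenly divisible by n")
    interval = pop_size // sample_size      # L = N/n
    k = rng.randint(1, interval)            # random start K, 1 <= K <= L
    return [k + i * interval for i in range(sample_size)]

rng = random.Random(12345)                  # arbitrary seed, for repeatability only
positions = systematic_positions(10000, 500, rng)
print(len(positions), positions[:3], positions[-1])
```

Each run with a different seed gives a different K, but always 
500 positions spaced 20 apart and none beyond case 10,000. 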
When the sample size is fixed, the length of the intervals from 
which single cases are sampled will not be an integer if the 
population size is not evenly divisible by the sample size. In 
such cases the position to be sampled is likely to be a 
noninteger, perhaps indicating case 244.34, for example. It is 
customary to round the case number to be sampled up to the 
next integer (case 245 in this example). The detailed steps are: 
1. Determine the population size, i.e. the number of cases in the 
file to be sampled. This can be determined by running 
DESCRIPTIVES on a variable, or the AGGREGATE and MATCH FILES 
commands can be combined to save the population size as a 
variable, NSIZE. Calculate the length of each sampling 
interval as population size/sample size and save it as L. 
2. Create a variable named CASENO which indicates the serial 
position of each case in the file. In SPSS, this can be a 
copy of the SPSS system variable $CASENUM. 
3. Create a variable named START and generate a random number 
between 0 and L for the first case in the file. Copy this 
number to all following cases. This constant will serve as K, 
the starting position. 
4. Compute a variable named INTERV which indicates to which 
interval of length L each case belongs. INTERV starts at 
0 rather than 1. 
5. Compute a variable named SAMCASE which is (L*INTERV + START). 
If SAMCASE is not an integer, it is rounded upwards to the 
next integer. 
6. Select the case if SAMCASE = CASENO. 
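Steps 1 to 6 can be traced outside SPSS with a short Python 
sketch, offered only as a way to check the logic on a small file. 
The function name and arguments are illustrative. One assumption 
differs from a literal reading of step 4: the sketch counts 
intervals from START rather than from the first case, because 
with a noninteger L a rounded-up SAMCASE can otherwise fall into 
the following interval, match no case, and leave the sample short. 

```python
import math
import random

def systematic_sample(cases, sample_size, start=None, seed=None):
    """Return the cases kept by steps 1-6 (intervals counted from START)."""
    nsize = len(cases)                        # step 1: population size
    length = nsize / sample_size              # step 1: interval length L
    if start is None:                         # step 3: random START between 0 and L
        start = random.Random(seed).uniform(0, length)
    kept = []
    for caseno, case in enumerate(cases, start=1):      # step 2: CASENO
        # step 4 (assumption of this sketch: interval offset by START)
        interv = math.trunc((caseno - start) / length)
        samcase = math.ceil(interv * length + start)    # step 5: round up
        if samcase == caseno:                           # step 6: keep the match
            kept.append(case)
    return kept

# Hypothetical file of 10 cases, sample of 4, so L = 2.5; START fixed at 2.2.
print(systematic_sample(list(range(1, 11)), 4, start=2.2))  # [3, 5, 8, 10]
```

With START = 2.2 the unrounded positions are 2.2, 4.7, 7.2 and 
9.7, which round up to cases 3, 5, 8 and 10. 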
IMPLEMENTING IN SPSS 
The following commands will perform these steps in versions 4 
and above of SPSS, including SPSS for Windows and SPSS for the 
Macintosh. Some modifications are required for SPSS/PC+. The 
first command sets the seed for the random number generator, 
here to a number based on the date and time. (The default value 
of SEED may vary or be fixed, depending on the version of SPSS.) 
The first EXECUTE 
command forces a pass of the data, with assignment of CASENO 
to all cases before any selection takes place. (Otherwise, 
the first case to be deleted would pass its value for 
$CASENUM and CASENO to the next case, which would then 
necessarily be deleted by the same comparison to SAMCASE and 
pass the same value of $CASENUM to the next case, etc.) 
SET SEED = 950203123 . 
* Save population size as variable. 
COMPUTE DUM = 1. 
AGGREGATE OUTFILE = tsize 
/BREAK = DUM /NSIZE = N. 
MATCH FILES /FILE = * /TABLE = tsize /BY DUM. 
* Calculate interval length L (for sample size of 500 in this 
* example). 
COMPUTE L = NSIZE/500. 
COMPUTE CASENO = $CASENUM. 
* Generate starting point as random number from 0 to L . 
IF (CASENO = 1) START = UNIFORM(L). 
IF (MISSING(START)) START = LAG(START). 
* Calculate which interval a case falls into, counting intervals 
* from START so that a rounded-up SAMCASE stays inside its own 
* interval when L is not an integer. 
COMPUTE INTERV = TRUNC((CASENO - START)/L). 
COMPUTE SAMCASE = INTERV * L + START. 
IF (SAMCASE > TRUNC(SAMCASE)) SAMCASE = TRUNC(SAMCASE) + 1. 
EXECUTE. 
SELECT IF (SAMCASE = CASENO). 
EXECUTE.
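The reason for the EXECUTE between the COMPUTE of CASENO and the 
SELECT IF can be illustrated with a toy model in Python. It is 
not SPSS and only imitates the behaviour described above, under 
the assumption that the running case counter advances only when 
a case is retained. Both function names are illustrative. 

```python
def select_with_live_counter(n_cases, wanted):
    """Toy model: the counter advances only when a case is kept."""
    kept = []
    counter = 1
    for original_position in range(1, n_cases + 1):
        if counter in wanted:          # comparison uses the live counter
            kept.append(original_position)
            counter += 1               # only retained cases advance the counter
        # a dropped case hands the same counter value to the next case
    return kept

def select_with_stored_number(n_cases, wanted):
    """Toy model: each case's number is stored before any selection."""
    return [pos for pos in range(1, n_cases + 1) if pos in wanted]

wanted = {3, 5, 8, 10}                 # hypothetical positions to sample
print(select_with_live_counter(10, wanted))    # []
print(select_with_stored_number(10, wanted))   # [3, 5, 8, 10]
```

In the first function the counter never leaves 1, so no case is 
ever retained. Storing the number in a pass of its own, as the 
first EXECUTE does for CASENO, avoids this. 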
...