Solution ID: 100000322
Question Subtype: Statistical Distributions
Title: Generating multivariate hypergeometric random variables in SPSS

Description:

Q. How can I use SPSS to generate variables with a multivariate hypergeometric distribution for a specified number of cases?

A. If you draw n observations without replacement from a population with k classes of objects, where k > 2, the counts of objects sampled from the k classes jointly follow a multivariate hypergeometric distribution. The following macro generates cases and variables with such a distribution. You supply the macro with the number of cases to be generated (ncases), the number of classes of objects (classes), the number of objects to sample (or 'draw') for each case (samsize), and the population sizes for each of these classes (popc).

The algorithm is similar to the direct, or ball-in-urn, method for generating a multinomial distribution. [See Johnson, N. L., Kotz, S., & Balakrishnan, N. (1997). "Discrete Multivariate Distributions", Wiley.]

1. The population sizes for each class (pop1 to pop(k)) are initialized from the respective values of popc, and the sample counts for all classes (sam1 to sam(k)) are initialized to 0. The total population size is calculated as the sum of the population sizes for the classes, i.e., the sum of pop1 to pop(k), and stored as poptot.

2. For each of the samsize sample units to be drawn:

   (i). A discrete uniform random number from 1 to poptot is drawn and stored as y.

   (ii). For each of the k classes, the variable psum is calculated as the sum of the class populations considered up to that point. If y is less than or equal to psum but greater than the value of psum for the previously considered classes, the observation is counted as a draw from the current class. The sample count for that class is incremented by 1 and its population size is decreased by 1, as is poptot. [Note that psum is not decremented within the pass over the classes, so there is no danger of a single y matching the ranges of two adjacent classes.]

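As a cross-check on the logic, steps 1 and 2 above can be sketched outside SPSS. The Python below is only an illustration and not part of the original solution: the function names mv_hypergeometric_draw and generate_cases, the seed argument, and the input check are assumptions introduced here. Unlike the macro, the sketch stops scanning classes once a match is found. The macro instead keeps looping and relies on psum not being decremented.

```python
import random


def mv_hypergeometric_draw(pop_sizes, sample_size, rng=random):
    """Return per-class counts for one case, sampling without replacement."""
    if sample_size > sum(pop_sizes):
        raise ValueError("sample_size cannot exceed the total population")
    pop = list(pop_sizes)            # step 1: remaining population per class
    sam = [0] * len(pop)             # step 1: sample counts start at 0
    poptot = sum(pop)                # step 1: total remaining population
    for _ in range(sample_size):     # step 2: one pass per unit drawn
        y = rng.randint(1, poptot)   # (i) discrete uniform on 1..poptot
        psum = 0
        for k in range(len(pop)):    # (ii) find the class whose range holds y
            prev = psum
            psum += pop[k]
            if prev < y <= psum:
                sam[k] += 1          # count the draw for class k
                pop[k] -= 1          # remove the drawn object from class k
                poptot -= 1
                break                # hypothetical shortcut; the macro keeps looping
    return sam


def generate_cases(ncases, pop_sizes, sample_size, seed=None):
    """Generate ncases independent cases (seed is an illustrative extra)."""
    rng = random.Random(seed)
    return [mv_hypergeometric_draw(pop_sizes, sample_size, rng)
            for _ in range(ncases)]
```

For example, generate_cases(200, [50, 30, 20], 25) mirrors the parameters of the first macro call shown below: every returned case is a list of three counts that sum to 25.
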
* macro to generate a multivariate hypergeometric distribution.
* First example call has 3 classes with pop sizes of 50, 30, & 20.
* 25 items are sampled without replacement and
* sam1 to sam3 hold the respective counts.
* 200 cases are generated.
* Second example call has 4 classes with pop sizes of 20, 10, 30, & 20.
* 30 items are sampled without replacement and
* sam1 to sam4 hold the respective counts.
* 300 cases are generated .
* .
*************************************************************.
define mvhypgen (ncases = !tokens(1)
                /classes = !tokens(1)
                /samsize = !tokens(1)
                /popc = !enclose('[',']') ).
new file.
input program .
loop id = 1 to !ncases .
vector pop sam (!classes , F8).
+ do repeat popn = pop1 to !concat('pop',!classes)
           /samn = sam1 to !concat('sam',!classes)
           /pc = !popc .
+   compute popn = pc.
+   compute samn = 0.
+ end repeat.
+ compute poptot = sum(pop1 to !concat('pop',!classes)).
+ loop #j = 1 to !samsize .
+   compute y = trunc(uniform(poptot)) + 1.
+   compute psum = 0.
+   loop #k = 1 to !classes .
+     compute psum = psum + pop(#k).
+     do if (y le psum and y gt (psum - pop(#k))).
+       compute sam(#k) = sam(#k) + 1.
+       compute pop(#k) = pop(#k) - 1.
+       compute poptot = poptot - 1.
+     end if.
+   end loop.
+ end loop.
+ end case.
end loop.
end file.
end input program.
execute.
!enddefine .

mvhypgen ncases = 200 classes = 3 samsize = 25 popc = [ 50 30 20 ] .

mvhypgen ncases = 300 classes = 4 samsize = 30 popc = [ 20 10 30 20 ] .
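
One way to sanity-check the generated cases is to compare the observed means of sam1 to sam(k) with their theoretical values, using the standard mean of the multivariate hypergeometric distribution. The notation is introduced here for illustration and does not appear in the macro: N_i is the population size of class i (the i-th value of popc), N is their sum, and n is samsize.

```latex
\begin{align*}
E[\mathrm{sam}_i] &= n \, \frac{N_i}{N}, \qquad N = \sum_{j=1}^{k} N_j \\[4pt]
\text{First call } (n = 25,\ N = 100):\quad
  & \left( 25 \cdot \tfrac{50}{100},\ 25 \cdot \tfrac{30}{100},\ 25 \cdot \tfrac{20}{100} \right)
    = (12.5,\ 7.5,\ 5) \\[4pt]
\text{Second call } (n = 30,\ N = 80):\quad
  & \left( 30 \cdot \tfrac{20}{80},\ 30 \cdot \tfrac{10}{80},\ 30 \cdot \tfrac{30}{80},\ 30 \cdot \tfrac{20}{80} \right)
    = (7.5,\ 3.75,\ 11.25,\ 7.5)
\end{align*}
```

With 200 or 300 generated cases, the sample means should fall near these values but will not match them exactly. In addition, the counts sam1 to sam(k) should sum to samsize on every case. Because the macro begins with NEW FILE, each call replaces the active data, so any such check has to be run after each call rather than after both.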