Generating multinominal random variables
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | Solution ID: 100000320 Product: SPSS Base Version: O/S: Question Type: Statistics Question Subtype: Statistical Distributions Title: Generating multinomial random variables in SPSS Description: Q. How can I use SPSS to generate variables with a multinomial distribution for a specified number of cases? A. If you draw n observations with replacement from a population with k classes of objects, where k>2, the k numbers of objects sampled from the respective classes have a multinomial distribution. The following macro generates the cases and variables with such a distribution. You supply the macro with the number of cases to be generated (ncases), the number of classes of objects (classes), the number of objects to sample (or 'draw') for each case (samsize), and the population proportions or sizes for each of these classes (prop). If you enter population sizes for each class, rather than proportions, you should indicate this by specifying "norp = 1" on the macro call. (If you enter sizes but neglect to specify that norp = 1, the macro will "infer" that sizes were entered if the sum(prop1 to prop(k)) > 1.001 and rescale prop1 to prop(k) to proportions. The extra .001 in the above threshold is to allow for any numerical imprecision in the summing of prop values. The algorithm employed is the directed, or ball-in-urn method, as described in Johnson, Kotz, & Balakrishnan [(1997). "Discrete Multivariate Distributions", Wiley. pp. 68-69]. As noted by those authors, there are faster algorithms available to generate multinomial variables. However, those methods do not appear to lend themselves to efficient use within the structure of SPSS data files. The algorithm is summarized as follows. 1. The population proportions for each class (pop1 to pop(k)) are initialized from the respective values of prop and the sample sizes for all classes (sam1 to sam(k)) are initiialized as 0. The total sample size is calculated as the sum of the prop values for the classes, i.e, the sum of pop1 to pop(k), and stored as poptot. If proportions were correctly entered for the prop parameter, poptot will equal 1. If counts were entered for prop, poptot > 1.001, and the values of pop1 to pop(k) are rescaled by division over poptot. 2. For each of the samsize observations to be drawn for each case: (i). A uniform(0,1) random number is drawn and stored as Y. (ii). For each of the k classes, the variable psum is calculated as the sum of class population proportions considered to that point. If Y is less than or equal to psum but greater than psum for the previously- considered classes, the observation is considered a draw from the current class. The sample size for that class is therefore incremented by 1 * macro to generate a multinomial distribution. * First example call has 3 classes with pop proportions of .5, .3, & .2. * 125 items are sampled with replacement and * sam1 to sam3 hold the respective counts. * 200 cases are generated. * Second example call has 4 classes with pop sizes of 20, 10, 30, & 20. * 30 items are sampled with replacement and * sam1 to sam4 hold the respective counts. * norp = 1 in the macro call indicates that population sizes, * rather than proportions, are input to the prop parameter. * 300 cases are generated . * Third example call is like the second, except that norp has not been set. * The macro infers that pop sizes were entered because the sum of * of the prop elements is greater than 1.001 . * . *************************************************************. define multnomg (ncases = !tokens(1) /classes = !tokens(1) /samsize = !tokens(1) /norp= !default (0) !tokens(1) /prop = !enclose('[',']') ). new file. input program . loop id = 1 to !ncases . + compute np = !norp . vector pop (!classes , F8.5) sam (!classes , F8) . + do repeat popn = pop1 to !concat('pop',!classes) /samn = sam1 to !concat('sam',!classes) /pc = !prop . + compute popn = pc. + compute samn = 0. + end repeat. + compute poptot = sum(pop1 to !concat('pop',!classes)). + do if (np = 1 or poptot > 1.001). + do repeat propn = pop1 to !concat('pop',!classes). compute propn = propn/poptot. + end repeat. + end if. + loop #j = 1 to !samsize . + compute y = uniform(1) . + compute psum = 0. + loop #k = 1 to !classes . + compute psum = psum + pop(#k). + if (y le psum and y gt (psum - pop(#k))) sam(#k) = sam(#k) + 1. + end loop. + end loop. + end case. end loop. end file. end input program. execute. !enddefine . multnomg ncases = 200 classes = 3 samsize = 125 prop = [ 0.5 0.3 0.2 ] . multnomg ncases = 300 classes = 4 samsize = 30 norp = 1 prop = [ 20 10 30 20 ] . multnomg ncases = 300 classes = 4 samsize = 30 prop = [ 20 10 30 20 ] . |
Related pages
...