Goodness of Fit Test for Poisson Distribution
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 | **** GOODNESS OF FIT TEST FOR POISSON DISTRIBUTIONS **** * Example dataset *. DATA LIST FREE/ NUMBER(f8.0). BEGIN DATA 0 2 0 0 2 2 0 0 1 1 3 0 0 1 0 0 1 0 1 4 0 0 1 4 2 0 0 1 0 0 2 2 1 1 0 6 0 5 1 3 0 1 0 1 8 END DATA. VAR LABEL number 'Nr of Ixodes trianguliceps/mouse'. * One sample K-S (low sensitivity if Lilliefors correction not applied manually) *. NPAR TESTS /K-S(POISSON)= number /MISSING ANALYSIS. * Notice the gap between 6 & 8; it must be filled to compute expected values correctly (this part is only for didactic purposes, can be removed from final code) *. FREQUENCIES VARIABLES=number /ORDER= ANALYSIS . *** Filling gaps in sequence *** * First: aggregate the dataset *. AGGREGATE /OUTFILE=* /BREAK=number /obs=N. EXEC. * Create auxiliary file with every possible value from 0 to max *. PRESERVE. SET ERRORS=NONE RESULTS=NONE. MATRIX. GET number /VAR=number. COMPUTE max=CMAX(number). RELEASE number. COMPUTE number=MAKE(max+1,1,1). COMPUTE wgt=MAKE(max+1,1,0.0001). LOOP i=1 TO max+1. - COMPUTE number(i)=i-1. END LOOP. COMPUTE namevec={"number","wgt"}. SAVE {number,wgt} /OUTFILE='c:\\temp\\temp.sav' /NAMES=namevec. END MATRIX. * Match it with working file to fill the gaps *. MATCH FILES /FILE=* /FILE='C:\\Temp\\temp.sav' /BY NUMBER. EXECUTE. IF MISSING(obs) obs=wgt. MATCH FILES /FILE=* /DROP=wgt. EXECUTE. WEIGHT BY obs. RESTORE. * Now the gaps are filled *. FREQUENCIES VARIABLES=number /ORDER= ANALYSIS . *** Goodness of fit Chi-square test (more sensitive than K-S) *** * Calculate mean & N *. COMPUTE const = 1 . EXECUTE . AGGREGATE /OUTFILE='C:\\temp\\aggr.sav' /BREAK=const /mean = MEAN(number) /N=N. WEIGHT OFF. PRESERVE. SET ERRORS=NONE. MATCH FILES /FILE=* /FILE='C:\\temp\\aggr.sav' /BY const. EXECUTE. RESTORE. DO IF MISSING(mean). - COMPUTE mean=LAG(mean). - COMPUTE n=LAG(n). END IF. EXECUTE. * Compute expected frequencies *. DO IF $casenum=1. - COMPUTE expect=n*CDF.POISSON(number,mean) . END IF. DO IF $casenum GT 1. - COMPUTE expect=n*CDF.POISSON(number,mean)-n*CDF.POISSON(LAG(number),mean) . END IF. SORT CASES BY number(D). DO IF $casenum=1. - COMPUTE expect=n*(1-CDF.POISSON(number-1,mean)). END IF. EXEC. SORT CASES BY number(A). * Graphical comparison before collapsing categories, although not part of the test, it's useful for visual cheking of departures from Poisson fit; must be done before collapsing categories *. GRAPH /BAR(GROUPED)=VALUE( obs expect ) BY number . * Check for Expected LT 1 & collapse cells if necessary (can be modified to collapse any Exp frequency below 5) *. COMPUTE id=$casenum. DO IF expect LT 1. - COMPUTE id=LAG(id). END IF. EXECUTE. AGGREGATE /OUTFILE=* /BREAK=id /observed = SUM(obs) /expected = SUM(expect). * Test statistic & significance *. STRING groups (A2). COMPUTE groups = STRING(id,F2.0) . EXECUTE . MATRIX. GET groups/VAR=groups. GET obs/VAR=observed. GET expect/VAR=expected. PRINT {obs,expect;MSUM(obs),MSUM(expect)} /FORMAT='F8.2' /TITLE='Observed & expected frequencies' /CLABELS='Obs','Exp' /RNAMES=groups. COMPUTE k=NROW(obs). PRINT {k-2} /FORMAT='F8.0' /TITLE='Degrees of freedom (k-2)'. COMPUTE chi2=MSUM((obs-expect)&**2/expect). COMPUTE chisig=1-CHICDF(chi2,k-2). PRINT {chi2,chisig} /FORMAT='F8.4' /TITLE='GOODNESS OF FIT TEST' /CLABELS='Chi^2','Sig.'. COMPUTE minexp=CMIN(expect). COMPUTE flag=0. LOOP i=1 TO k. - DO IF expect(i) LT 5. - COMPUTE flag=flag+1. - END IF. END LOOP. COMPUTE pflag=100*flag/k. DO IF flag GT 0. - PRINT pflag /FORMAT='F8.1' /TITLE='WARNING: EXPECTED<5 IN AT LEAST ONE CELL' /RLABEL='%cells='. - PRINT minexp /FORMAT='F8.1' /TITLE='Minimum expected count is:' /RLABEL='Exp='. END IF. END MATRIX. |
Related pages
...