Fitting Models with Overdispersion
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 | ****************************************** *** FITTING MODELS WITH OVERDISPERSION *** *** ("extra-Poisson" variation) *** ****************************************** * These models are useful for parasites * * distributions * ****************************************** ***************************************************************************** * Adapted from: MJ Moroney 'Facts From Figures' * * (London: Penguin Books, 1951; 2nd ed 1953) * * * * In Poisson models E(x)=V(x)=lambda (only one parameter) * * Probability function: * * P(x)=e^(-lambda)*(lambda^x)/x! * * * * In over-dispersed distributions V(X)>E(x) * * Negative binomial law is a prototype of overdispersed distributions * * * * In Negative Binomial models one extra parameter (k) is needed: * * Parameter Estimated by: * * E(x)=lambda ----------------> sample mean * * V(x)=lambda*(1+lambda/k) ---> sample variance * * k = EІ(x)/(V(x)-E(x)) -> meanІ/(variance-mean) ('moments method') * * Comment: Maximum Likelihood method is better, but far more complicated! * * (see ParaDis Web page & program) * * Probability function: * * P(x)=[(k/(k+lambda))^k]*[(lambda/(k+lambda))^x]*GAMMA(x+k)/(x!*GAMMA(k)) * ***************************************************************************** * Example dataset (from ParaDis program manual): * (http://www.bondy.ird.fr/~pichon/paradis/parad2.html). DATA LIST FREE/ number freq (2 F8.0). BEGIN DATA 0 70 1 38 2 17 3 10 4 9 5 3 6 2 7 1 END DATA. VARIABLE LABEL number 'Acariens/feuille'. * Add mean, variance & N to the dataset & compute Fisher Dispersion Test *. CACHE. EXECUTE. MATRIX. PRINT /TITLE='DATA & STATISTICS'. GET x/VAR=number. GET freq/VAR=freq. COMPUTE ndata=NROW(x). COMPUTE nt=MSUM(freq). COMPUTE mean=MSUM(freq&*x)/nt. COMPUTE variance=(MSUM(freq&*(x&**2))-nt*mean&**2)/(nt-1). COMPUTE k=(mean&**2)/(variance-mean). PRINT {x,freq} /FORMAT='F8.0' /TITLE='Sample data' /CLABELS='X','Freq'. PRINT nt /FORMAT='F8.0' /TITLE='Total sample size' /RLABEL='Nt'. PRINT {mean,variance,(variance/mean),k} /FORMAT='F8.2' /TITLE='Statistics' /CLABELS='Mean','Variance','DP','K(*)'. PRINT /TITLE='(*) Computed by moments method.'. PRINT /TITLE='FISHER DISPERSION TEST'. COMPUTE fd=(MSUM(freq&*(x-mean)&**2))/mean. COMPUTE fdsig=1-CHICDF(fd,MSUM(freq)-1). PRINT {fd,fdsig} /FORMAT='F8.4' /TITLE='Test statistics (df=Nt-1)' /CLABELS='Chi^2','Sig.'. DO IF fdsig LE 0.05. - PRINT /TITLE='Overdispersion exists! Extra-Poisson variation is present.'. END IF. COMPUTE means=MAKE(ndata,1,mean). COMPUTE n=MAKE(ndata,1,nt). COMPUTE kp=MAKE(ndata,1,k). COMPUTE namevec={'number','n','mean','k'}. SAVE {x,n,means,kp} /OUTFILE='c:\\temp\\temp.sav' /NAMES=namevec. END MATRIX. MATCH FILES /FILE=* /FILE='C:\\Temp\\temp.sav' /BY number. EXECUTE. * Calculate expected frequencies under NB assumption * (For Chi-square, 3 df are lost: N, mean & variance are used) *. COMPUTE expect=n*((mean/(k+mean))**number)*((k/(k+mean))**k)*EXP(LNGAMMA(number+k))/(EXP(LNGAMMA(number+1))*EXP(LNGAMMA(k))). COMPUTE sumexp=expect. * Trick to make the sum of expected frequencies equal to N replacing p(last) by p(x GE last) *. DO IF $casenum GT 1. - COMPUTE sumexp=sumexp+LAG(sumexp). END IF. SORT CASES BY number(D). DO IF $casenum EQ 1. - COMPUTE expect=expect+(n-sumexp). END IF. SORT CASES BY number(A). * List&Graph Observed vs Expected *. VAR LABEL freq 'Observed' /expect 'Expected'. REPORT /FORMAT=LIST /TITLE='Observed & expected frequencies (assuming over-dispersion)' /VAR=number freq expect. GRAPH /BAR(GROUPED)=VALUE(freq expect) BY number /TITLE='Observed & expected(*) frequencies' /Footnote='(*) Assuming over-dispersion'. * Check for very low Expected frequencies & collapse cells to avoid them *. COMPUTE id=$casenum. DO IF (expect LT 2). - COMPUTE id=LAG(id). END IF. AGGREGATE /OUTFILE=* /BREAK=id /observed = SUM(freq) /expected = SUM(expect). STRING groups (A2). COMPUTE groups = STRING(id,F2.0) . * Goodness of fit Chi-square test *. MATRIX. PRINT /TITLE='CHI-SQUARE GOODNESS-OF-FIT TEST'. GET groups/VAR=groups. GET obs/VAR=observed. GET expect/VAR=expected. PRINT {obs,expect,(obs-expect);MSUM(obs),MSUM(expect),0} /FORMAT='F8.2' /TITLE='Frequencies (after collapsing categories)' /CLABELS='Observed','Expected','Residual' /RNAMES=groups. COMPUTE k=NROW(obs). PRINT {k-3} /FORMAT='F8.0' /TITLE='Degrees of freedom (k-3)'. COMPUTE chi2=MSUM((obs-expect)&**2/expect). COMPUTE chisig=1-CHICDF(chi2,k-3). PRINT {chi2,chisig} /FORMAT='F8.4' /TITLE='Test statistics' /CLABELS='Chi^2','Sig.'. COMPUTE minexp=CMIN(expect). COMPUTE flag=0. LOOP i=1 TO k. - DO IF expect(i) LT 5. - COMPUTE flag=flag+1. - END IF. END LOOP. COMPUTE pflag=100*flag/k. DO IF flag GT 0. - PRINT pflag /FORMAT='F8.1' /TITLE='WARNING: Some cells with expected frequencies less than 5.' /RLABEL='cells(%)='. - PRINT minexp /FORMAT='F8.1' /TITLE='The minimum expected cell frequency is:' /RLABEL='Exp='. END IF. END MATRIX. ********************** * END OF SYNTAX. |
Related pages
...