```
* SPSS PROCEDURE FOR CALCULATING White's Standard Errors for Large, Intermediate and Small Samples.

*(i) HC0: This is the original White (1980) procedure applicable when sample sizes are large (n > 500).

* 1st step: Open up your data file and save it under a new name since the following procedure will alter it.

* 2nd step: Run your OLS regression and save UNSTANDARDISED residuals as RES_1:.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT mp_pc
  /METHOD=ENTER xp_pc gdp_pc
  /SAVE RESID(RES_1) .

* 3rd step: create a variable called ESQ = square of those residuals:.
COMPUTE ESQ = RES_1 * RES_1.
EXECUTE.

* 4th step: create a variable called CONSTANT = constant of value 1 for all observations in the sample.
FILTER OFF.
USE ALL.
EXECUTE .
COMPUTE CONSTANT = 1.
EXECUTE.

* 5th step: Filter out missing values and enter Matrix syntax mode.
FILTER OFF.
USE ALL.
SELECT IF(MISSING(ESQ) = 0).
EXECUTE .

* 6th step: Tell the matrix routine to get your variables, then use matrix syntax to calculate White's standard errors for large samples.
* You need to enter the names of the Y and X variables from your regression here.
* Note that the only thing you need to do here is alter the variable names in lines 2 and 3 below so that they match those of your regression.
MATRIX.
GET Y / VARIABLES = mp_pc.
GET X / VARIABLES = CONSTANT, xp_pc, gdp_pc / NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
COMPUTE WHITEV = (INV(TRANSPOS(X) * X)) *TRANSPOS(X)* O * X*INV(TRANSPOS(X) * X).
COMPUTE WDIAG = DIAG(WHITEV).
COMPUTE WHITE_SE = SQRT(WDIAG).
PRINT WHITE_SE / FORMAT = "E13" / TITLE = "White's (Large Sample) Corrected Standard Errors" / RNAMES = XRTITLES.
COMPUTE B = (INV(TRANSPOS(X) * X)) * (TRANSPOS(X) * Y).
PRINT B / FORMAT = "E13" /TITLE = "OLS Coefficients" / RNAMES = XRTITLES.
COMPUTE WT_VAL = B / WHITE_SE.
PRINT WT_VAL / FORMAT = "E13" / TITLE = "t-values based on Whites (large sample) corrected SEs" / RNAMES = XRTITLES.
COMPUTE SIG_WT = 2*(1- TCDF(ABS(WT_VAL), N)) .
PRINT SIG_WT / FORMAT = "E13" / TITLE = "Prob(t < tc) based on Whites (large n) SEs" / RNAMES = XRTITLES.
COMPUTE SIGMASQ = (TRANSPOS(RESIDUAL)*RESIDUAL)/(N-K).
COMPUTE SE_SQ = SIGMASQ*INV(TRANSPOS(X)*X).
COMPUTE SESQ_ABS = ABS(SE_SQ).
COMPUTE SE = SQRT(DIAG(SESQ_ABS)).
PRINT SE / FORMAT = "E13" / TITLE = "OLS Standard Errors" / RNAMES = XRTITLES.
COMPUTE OLST_VAL = B / SE.
PRINT OLST_VAL / FORMAT = "E13" / TITLE = "OLS t-values" / RNAMES = XRTITLES.
COMPUTE SIG_OLST = 2*(1- TCDF(ABS(OLST_VAL), N)) .
PRINT SIG_OLST / FORMAT = "E13" / TITLE = "Prob(t < tc) based on OLS SEs" / RNAMES = XRTITLES.
COMPUTE WESTIM = {B, SE, WHITE_SE, WT_VAL, SIG_WT}.
PRINT WESTIM / FORMAT = "E13" / RNAMES = XRTITLES / CLABELS = B, SE, WHITE_SE, WT_VAL, SIG_WT.
END MATRIX.

*Notes:.
* - Don't save your data file under the same name, since the above procedure has removed from the data all observations with missing values.
* - If you already have a variable called res_1, you will need to delete or rename it before you run the syntax. This means that if you run the procedure on several regressions, you will need to delete the newly created res_1 and ESQ variables after each run.
* - Note that the output will use scientific notation, so 20.7 will be written as 2.07E+01, and 0.00043 will be written as 4.3E-04.
* - Note that the last table just collects together the results of five of the other tables.
* - "WT_VAL" is an abbreviation for "White's t-values" and "SIG_WT" is the significance level of these t-values.

*Example of White's Standard Errors:.
*If we run the matrix syntax on our earlier regression of floor area on age of dwelling, bedrooms and bathrooms, we get:.
*Run MATRIX procedure:.
*White's (Large Sample) Corrected Standard Errors.
*CONSTANT 4.043030E-02.
*AGE_DWEL 1.715285E-04.
*BATHROOM 2.735781E-02.
*BEDROOMS 1.284207E-02.
*OLS Coefficients.
*CONSTANT 3.536550E+00.
*AGE_DWEL 1.584464E-03.
*BATHROOM 2.258710E-01.
*BEDROOMS 2.721069E-01.
*t-values based on Whites (large sample) corrected SEs.
*CONSTANT 8.747276E+01.
*AGE_DWEL 9.237322E+00.
*BATHROOM 8.256180E+00.
*BEDROOMS 2.118870E+01.
*Prob(t < tc) based on Whites (large n) SEs.
*CONSTANT 0.000000E+00.
*AGE_DWEL 0.000000E+00.
*BATHROOM 2.220446E-16.
*BEDROOMS 0.000000E+00.
*OLS Standard Errors.
*CONSTANT 3.514394E-02.
*AGE_DWEL 1.640008E-04.
*BATHROOM 2.500197E-02.
*BEDROOMS 1.155493E-02.
*OLS t-values.
*CONSTANT 1.006304E+02.
*AGE_DWEL 9.661319E+00.
*BATHROOM 9.034130E+00.
*BEDROOMS 2.354899E+01.
*Prob(t < tc) based on OLS SEs.
*CONSTANT 0.000000E+00.
*AGE_DWEL 0.000000E+00.
*BATHROOM 0.000000E+00.
*BEDROOMS 0.000000E+00.
*WESTIM.
*         B             SE            WHITE_SE      WT_VAL        SIG_WT.
*CONSTANT 3.536550E+00  3.514394E-02  4.043030E-02  8.747276E+01  0.000000E+00.
*AGE_DWEL 1.584464E-03  1.640008E-04  1.715285E-04  9.237322E+00  0.000000E+00.
*BATHROOM 2.258710E-01  2.500197E-02  2.735781E-02  8.256180E+00  2.220446E-16.
*BEDROOMS 2.721069E-01  1.155493E-02  1.284207E-02  2.118870E+01  0.000000E+00.

*(ii) HC2 and HC3: Matrix Procedure for Corrected Standard Errors when the sample is < 500:.
*When the sample size is small, it has been found that White's standard errors are not reliable.
*MacKinnon and White (1985) proposed three alternative estimators to be used when the sample size is small.
*Long and Ervin (1999) found that the third of these estimators, which they call HC3, is the most reliable.
*But unless your computer has a great deal of RAM, you may run into difficulties if your sample size is greater than 250.
*As a result, I would recommend the following:.
*n < 250 use HC3 irrespective of whether your tests for heteroscedasticity prove positive (Long and Ervin found that the tests are not very powerful in small samples).
*250 < n < 500 use HC2 since this is more reliable than HC0 (HC0 = White's original SE as computed above).
*n > 500 use either HC2 or HC0.
*Syntax for computing HC2 is presented below. Follow the first 5 steps as before, and then run the following:.

*HC2.
MATRIX.
GET Y / VARIABLES = flarea_l.
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms / NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
/*Computing HC2*/.
COMPUTE XX = TRANSPOS(X) * X.
COMPUTE XX_1 = INV(XX).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE H = X*XX_1*X_1.
COMPUTE H_MONE = h * -1.
COMPUTE ONE_H = H_MONE + 1.
COMPUTE O_HC2 = O &/ ONE_H.
COMPUTE HC2_a = XX_1 * X_1 *O_HC2.
COMPUTE HC2 = HC2_a * X*XX_1.
COMPUTE HC2DIAG = DIAG(HC2).
COMPUTE HC2_SE = SQRT(HC2DIAG).
PRINT HC2_SE / FORMAT = "E13" / TITLE = "HC2 Small Sample Corrected Standard Errors" / RNAMES = XRTITLES.
COMPUTE B = XX_1 * X_1 * Y.
PRINT B / FORMAT = "E13" /TITLE = "OLS Coefficients" / RNAMES = XRTITLES.
COMPUTE HC2_TVAL = B / HC2_SE.
PRINT HC2_TVAL / FORMAT = "E13" / TITLE = "t-values based on HC2 corrected SEs" / RNAMES = XRTITLES.
COMPUTE SIG_HC2T = 2*(1- TCDF(ABS(HC2_TVAL), N)) .
PRINT SIG_HC2T / FORMAT = "E13" / TITLE = "Prob(t < tc) based on HC2 SEs" / RNAMES = XRTITLES.
END MATRIX.

*Sample output from this syntax is as follows:.
*HC2 Small Sample Corrected Standard Errors.
*CONSTANT 4.077517E-02.
*AGE_DWEL 1.726199E-04.
*BATHROOM 2.761153E-02.
*BEDROOMS 1.293651E-02.
*OLS Coefficients.
*CONSTANT 3.536550E+00.
*AGE_DWEL 1.584464E-03.
*BATHROOM 2.258710E-01.
*BEDROOMS 2.721069E-01.
*t-values based on HC2 corrected SEs.
*CONSTANT 8.673291E+01.
*AGE_DWEL 9.178915E+00.
*BATHROOM 8.180314E+00.
*BEDROOMS 2.103402E+01.
*Prob(t < tc) based on HC2 SEs.
*CONSTANT 0.000000E+00.
*AGE_DWEL 0.000000E+00.
*BATHROOM 1.998401E-15.
*BEDROOMS 0.000000E+00.

*For HC3, you need to make sure that your sample is not too large, otherwise the computer may crash.
*You can temporarily draw a random sub-sample by using TEMPORARY followed by SAMPLE p, where p is the proportion of the sample to keep (e.g. if p = 0.4, you have selected 40% of your sample for the following operations).

*HC3.
/*when Computing HC3 make sure n is < 250 (e.g. use TEMPORARY. SAMPLE 0.4.) */.
TEMPORARY.
SAMPLE 0.4.
MATRIX.
GET Y / VARIABLES = flarea_l.
GET X / VARIABLES = CONSTANT, age_dwel, bathroom, bedrooms / NAMES = XTITLES.
GET RESIDUAL / VARIABLES = RES_1.
GET ESQ / VARIABLES = ESQ.
COMPUTE XRTITLES = TRANSPOS(XTITLES).
COMPUTE N = NROW(ESQ).
COMPUTE K = NCOL(X).
COMPUTE O = MDIAG(ESQ).
COMPUTE XX = TRANSPOS(X) * X.
COMPUTE XX_1 = INV(XX).
COMPUTE X_1 = TRANSPOS(X).
COMPUTE H = X*XX_1*X_1.
COMPUTE H_MONE = h * -1.
COMPUTE ONE_H = H_MONE + 1.
/*Computing HC3*/.
COMPUTE ONE_H_SQ = ONE_H &** 2.
COMPUTE O_HC3 = O &/ ONE_H_SQ.
COMPUTE HC3_a = XX_1 * X_1 *O_HC3.
COMPUTE HC3 = HC3_a * X*XX_1.
COMPUTE HC3DIAG = DIAG(HC3).
COMPUTE HC3_SE = SQRT(HC3DIAG).
COMPUTE B = XX_1 * X_1 * Y.
PRINT B / FORMAT = "E13" /TITLE = "OLS Coefficients".
PRINT HC3_SE / FORMAT = "E13" / TITLE = "HC3 Small Sample Corrected Standard Errors" / RNAMES = XRTITLES.
COMPUTE HC3_TVAL = B / HC3_SE.
PRINT HC3_TVAL / FORMAT = "E13" / TITLE = "t-values based on HC3 corrected SEs" / RNAMES = XRTITLES.
COMPUTE SIG_HC3T = 2*(1- TCDF(ABS(HC3_TVAL), N)) .
PRINT SIG_HC3T / FORMAT = "E13" / TITLE = "Prob(t < tc) based on HC3 SEs" / RNAMES = XRTITLES.
END MATRIX.

*Sample output from the above syntax is as follows:.
*OLS Coefficients.
* 3.530325E+00.
* 1.546620E-03.
* 2.213146E-01.
* 2.745376E-01.
*HC3 Small Sample Corrected Standard Errors.
*CONSTANT 4.518059E-02.
*AGE_DWEL 1.884062E-04.
*BATHROOM 3.106637E-02.
*BEDROOMS 1.489705E-02.
*t-values based on HC3 corrected SEs.
*CONSTANT 7.813809E+01.
*AGE_DWEL 8.208966E+00.
*BATHROOM 7.123928E+00.
*BEDROOMS 1.842899E+01.
*Prob(t < tc) based on HC3 SEs.
*CONSTANT 0.000000E+00.
*AGE_DWEL 2.220446E-15.
*BATHROOM 4.005019E-12.
*BEDROOMS 0.000000E+00.

*References:.
*White, H. (1980) "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica, 48, 817-838.
*MacKinnon, J. G. and White, H. (1985) "Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties." Journal of Econometrics, 29, 305-325.
*Long, J. S. and Ervin, L. H. (1999) "Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model." Mimeo, Indiana University. http://www.indiana.edu/~jsl650/files/hccm/99TAS.pdf.
```
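
For reference, the three estimators computed by the syntax above can be written compactly. This is a sketch in standard textbook notation, not something taken from the syntax file itself: `e_i` stands for the OLS residual saved as `RES_1`, `h_ii` for the i-th diagonal element of the hat matrix `H`, and the labels `V_HCj` and `Omega_j` are introduced here purely for illustration.

```latex
\begin{aligned}
H &= X (X'X)^{-1} X', \qquad h_{ii} = x_i (X'X)^{-1} x_i' \\
\hat{V}_{\mathrm{HC}j} &= (X'X)^{-1} X' \,\Omega_j\, X (X'X)^{-1}, \qquad j \in \{0, 2, 3\} \\
\Omega_0 &= \operatorname{diag}\!\left(e_i^2\right) \\
\Omega_2 &= \operatorname{diag}\!\left(\frac{e_i^2}{1 - h_{ii}}\right) \\
\Omega_3 &= \operatorname{diag}\!\left(\frac{e_i^2}{(1 - h_{ii})^2}\right)
\end{aligned}
```

In the syntax, `Omega_0` corresponds to `O`, `Omega_2` to `O_HC2` and `Omega_3` to `O_HC3`, and each reported standard error is the square root of a diagonal element of the matching `V_HCj`. Because each `h_ii` lies between 0 and 1, the HC2 and HC3 weights scale up the squared residuals of high-leverage observations. This is consistent with the sample output, where the HC2 standard errors are slightly larger than the HC0 ones (for BEDROOMS, 1.293651E-02 against 1.284207E-02). The HC3 sample output was produced on a 40% sub-sample, so it is not directly comparable with the other two.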