Many tests of inequality v5
The Lorenz curve is produced. The various indexes are plotted on the same graph when there is data for more than one year.
There are 9 examples of how to use the syntax at the end.
This syntax can be distributed freely.
Raynald Levesque*/.

* Formulas used in this SPSS syntax file come from Goetz Kluge's web page (as of Feb 17, 2001).
* The URL is .
* Refer to these Web pages for information and comments on the various inequality measures such as
* usage of each measure, ranges of input and output values, significance of the measures, references.

* About Lorenz curve.
* Lorenz curve plots the cumulative share of income against the cumulative population share.
* The diagonal represents perfect income equality. The further the curve lies from the diagonal, the more unequal the society.
* Powerful theorems associated with Lorenz dominance have made this curve something of a symbol of inequality measurement and analysis.
* See

* Recommended usage.
* 1. Save the 2 macros in a separate sps file (say inequality.sps).
* 2. Once per session, define the macro by using the INCLUDE command.
* 3. Save the chart template inequality.sct in a folder of your choice and change the path in inequality.sps accordingly.
* 4. The syntax assumes that the folder "c:\temp\" exists.
* 5. See examples of usage at the end of this syntax.

* Notes.
* 1. Cases with missing salary data are excluded.
* 2. This syntax file works with data files for different years: the syntax calculates the
* coefficients for each file (each year) then produces a graph of all the coefficients. This makes it possible to see how
* the coefficients change over time.

SET MPRINT=yes /PRINTBACK=listing /RESULTS=listing.

*////////////////////////////////////////////.
DEFINE !inequal (sal =!TOKENS(1)
  /data =!DEFAULT('i') !TOKENS(1)
  /ntiles =!DEFAULT('0') !TOKENS(1)
  /yr =!TOKENS(1))
* data=i means individual data are supplied.
* data=g means grouped data are supplied (number of persons and total earnings are given for each group).
* data=w means weighted data are supplied (the weight variable is assumed to be called "a").
* ntiles=0 means group data using sal.
* ntiles>0 means group data using ntiles of sal.

COMPUTE dummy=1.
!IF (!data !NE 'g') !THEN
* Weight data if needed.
!IF (!data !EQ 'w') !THEN
WEIGHT BY a.
!IFEND
* delete cases with missing !sal.
SELECT IF ~MISSING(!sal).
!IF (!ntiles !NE '0') !THEN
* Need to group data by ntiles.
!LET !brkvar=nsal
RANK VARIABLES=!sal (A) /NTILES (!ntiles) INTO !brkvar /PRINT=YES /TIES=MEAN .
!ELSE
* need to group by sal.
!LET !brkvar=!sal
!IFEND
* Find the totals by group.
AGGREGATE
  /OUTFILE='C:\temp\AGGR as needed.SAV'
  /BREAK=!brkvar
  /ai = N(!sal) /ei = SUM(!sal) /dummy=FIRST(dummy).
* Find the grand totals.
AGGREGATE
  /OUTFILE=*
  /BREAK=dummy
  /atot = N(!sal) /etot = SUM(!sal).
* add the grand totals to the file containing group totals.
MATCH FILES /TABLE=*
  /FILE='C:\Temp\AGGR as needed.SAV'
  /BY dummy.
SORT CASES BY !brkvar(A).
!IFEND

!IF (!data !EQ 'g') !THEN
* data is already grouped, find grand totals and add this info to the file.
AGGREGATE
  /OUTFILE='C:\Temp\AGGR as needed.SAV'
  /BREAK=dummy
  /atot = SUM(ai) /etot = SUM(!sal).
MATCH FILES /FILE=*
  /TABLE='C:\Temp\AGGR as needed.SAV'
  /BY dummy.
!IFEND

***********************.
*
* Compute the coefficients or redundancy.
*
***********************.

* Compute the ATKINSON index = DEMAND coefficient.
COMPUTE demand1=ei*LN(ai/ei)/etot.
CREATE demand2=CSUM(demand1).
COMPUTE zdemand=1-EXP(demand2)*etot/atot.

* Compute the THEIL redundancy.
COMPUTE rtheil=-LN(1-zdemand).

* Compute the RESERVE coefficient.
COMPUTE reserv1=ai*LN(ei/ai)/atot.
CREATE reserv2=CSUM(reserv1).
COMPUTE zreserve=1-EXP(reserv2)*atot/etot.

* Compute the D&R coefficient.
COMPUTE zd_and_r=1-SQRT((1-zdemand)*(1-zreserve)).

* Compute the KULLBACK-LEIBLER redundancy.
COMPUTE rkull_li=-LN(1-zd_and_r).

* Compute the HOOVER coefficient.
COMPUTE hoover1=ABS(ei/etot-ai/atot).
CREATE hoover2=CSUM(hoover1).
COMPUTE zhoover=hoover2/2.

* Compute the COULTER coefficient.
COMPUTE coulte1=(ei/etot - ai/atot)**2.
CREATE coulte2=CSUM(coulte1).
COMPUTE zcoulter=SQRT(coulte2/2).

* Compute the GINI coefficient.
CREATE csai csei=CSUM(ai ei).
COMPUTE gini1=(2*csei-ei)*ai/(etot*atot).
CREATE gini2=CSUM(gini1).
COMPUTE zgini=1-gini2.

* Print values of the inequality measures.
MATCH FILES FILE=* /BY dummy /LAST=last.
TEMPORARY.
SELECT IF last.
!LET !title=!QUOTE(!CONCAT('Summary of Coefficients and redundancies',!UNQUOTE(!yr)))
SUMMARIZE
  /TABLES=zdemand rtheil zreserve zd_and_r rkull_li zhoover zcoulter zgini
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE=!title
  /MISSING=VARIABLE
  /CELLS=NONE.

* Print information on the groups.
COMPUTE pccsei=csei/etot*100.
COMPUTE pccsai=csai/atot*100.
COMPUTE refline=pccsai.
COMPUTE avg=ei/ai.
FORMATS avg ei(COMMA12.0) pccsai pccsei (PCT6.1).
!LET !title=!QUOTE(!CONCAT('Information on the groups', !UNQUOTE(!yr)))
SUMMARIZE
  /TABLES=ai ei avg pccsai pccsei
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE=!title
  /MISSING=VARIABLE
  /CELLS=NONE.
SAVE OUTFILE='c:\temp\temp2.sav'.

* Create a dummy case with zero values for the following 3 variables.
NEW FILE.
INPUT PROGRAM.
COMPUTE pccsai=0.
COMPUTE pccsei=0.
COMPUTE refline=0.
END CASE.
END FILE.
END INPUT PROGRAM.
LIST.

* Add the dummy case to the data file to force Lorenz curve to start at (0,0).
ADD FILES /FILE=*
  /FILE='C:\temp\temp2.sav'.
FORMATS pccsai pccsei refline (PCT6.0).
!LET !title=!QUOTE(!CONCAT('Lorenz curve ',!UNQUOTE(!yr)))
GRAPH
  /TITLE=!title
  /TEMPLATE='c:\Program Files\SPSS\syntax\inequality\inequality.sct'
  /SCATTERPLOT(OVERLAY)= pccsai pccsai WITH pccsei refline (PAIR)
  /MISSING=LISTWISE .
EXECUTE.
!ENDDEFINE.
*////////////////////////////////////////////.

* ##### EXAMPLE 1.
* Allocate data to 20 groups.
GET FILE='C:\Program Files\SPSS\University of Florida graduate salaries.sav'.
!inequal sal=salary ntiles=20.

* ##### EXAMPLE 2.
* Each salary level corresponds to a group.
GET FILE='C:\Program Files\SPSS\University of Florida graduate salaries.sav'.
!inequal sal=salary.

* ##### EXAMPLE 3.
* Using weighted data.
* in this example there are 20 persons earning 20 each, 10 earning 30 etc.
DATA LIST LIST /a e.
BEGIN DATA
20 20
10 30
5 40
5 50
END DATA.
!inequal sal=e data=w.

* ##### EXAMPLE 4.
* Using data already grouped (this is the same data as above but presented differently).
* The 20 poorest persons receive 400; the 5 richest persons receive 250.
DATA LIST LIST /ai ei.
* The number of cases and sum of earnings must be named ai and ei.
BEGIN DATA
20 400
10 300
5 200
5 250
END DATA.
!inequal sal=ei data=g.
* The above data come from and the coefficients produced
* here in example #4 equal those shown on that site.

*######################################.
* Following portion of the syntax handles data files containing data for different years.
* The program first calculates the coefficients for each file using the macro defined above.
* It then graphs the evolution of each coefficient (eg the GINI coefficients) over the years.
*######################################.

*///////////////////////////////////.
DEFINE !manyyrs (sal=!TOKENS(1)
  /data= !DEFAULT('i') !TOKENS(1)
  /ntiles= !DEFAULT('0') !TOKENS(1)
  /fname=!TOKENS(1)
  /nbyrs=!TOKENS(1))
* the first 3 parameters are those needed to call the inequal macro defined above.
* the fname parameter is the path and alphabetical portion of the file names (eg "c:\mydata").
* the nbyrs parameter is the number of different data files, for instance if nbyrs=3
* and fname=c:\mydata then the 3 file names are mydata1.sav mydata2.sav and mydata3.sav.
!DO !cnt=1 !TO !nbyrs
GET FILE=!QUOTE(!CONCAT(!UNQUOTE(!fname),!cnt,'.sav')).
!LET !thisyr=!QUOTE(!CONCAT(', Year=',!cnt))
!inequal sal=!sal data=!data ntiles=!ntiles yr=!thisyr.
SELECT IF last.
COMPUTE yr=!cnt.
FORMATS yr(F8.0).
!IF (!cnt=1) !THEN
SAVE OUTFILE=!QUOTE(!CONCAT(!UNQUOTE(!fname)," summary.sav")).
!ELSE
ADD FILES FILE=*
  /FILE=!QUOTE(!CONCAT(!UNQUOTE(!fname)," summary.sav")).
SAVE OUTFILE=!QUOTE(!CONCAT(!UNQUOTE(!fname)," summary.sav")).
!IFEND
!DOEND
EXECUTE.

GET FILE=!QUOTE(!CONCAT(!UNQUOTE(!fname)," summary.sav")).
SORT CASES BY yr.
SUMMARIZE
  /TABLES=yr zdemand rtheil zreserve zd_and_r rkull_li zhoover zcoulter zgini
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Summary of Coefficients and redundancies (by year)'
  /MISSING=VARIABLE
  /CELLS=NONE.
GRAPH /TITLE="First 4 coefficients"
  /LINE(MULTIPLE)= VALUE( zdemand rtheil zreserve zd_and_r ) BY yr.
GRAPH /TITLE="Last 4 coefficients"
  /LINE(MULTIPLE)= VALUE( rkull_li zhoover zcoulter zgini ) BY yr.
GRAPH /TITLE="All 8 coefficients"
  /LINE(MULTIPLE)= VALUE(zdemand rtheil zreserve zd_and_r rkull_li zhoover zcoulter zgini ) BY yr.
!ENDDEFINE.
*///////////////////////////////////.

****************************.
* Now define a macro to create 12 dummy weighted data files to test the manyyrs macro.
****************************.

*///////////////////////////.
DEFINE !dummy().
!DO !yr=1 !TO 12.
INPUT PROGRAM.
LOOP linenb=1 TO 10.
COMPUTE a=15+TRUNC(UNIFORM(15)+1).
* the dummy files are constructed in such a way that distribution of earnings becomes less equal as time passes.
COMPUTE e=20+!yr*linenb*2+UNIFORM(3).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
LIST.
SAVE OUTFILE=!QUOTE(!CONCAT("c:\temp\testfile",!yr,".sav")).
!DOEND.
!ENDDEFINE.
*///////////////////////////.

* ##### EXAMPLE 5.
* next line creates the 12 dummy data files.
!dummy.
* Run macro on all 12 files and graph the 8 coefficients over the 12 year period.
!manyyrs sal=e data=w fname="c:\temp\testfile" nbyrs=12.

* ##### EXAMPLE 6.
* this uses the file SAMPLH.SAV from LIS.
* weight and income come directly from the data file.
GET FILE='D:\data\aa\Luxemb\samplh.sav'
  /KEEP=d5 hweight casenum dpi.
SELECT IF d5 NE 2.
COMPUTE a=hweight.
!inequal sal=dpi data=w ntiles=100.

* ##### EXAMPLE 7.
* this uses the file SAMPLH.SAV from LIS.
* weight and income are derived from the data file.
GET FILE='D:\data\aa\Luxemb\samplh.sav'
  /KEEP=d4 d5 d27 hweight casenum dpi.
SELECT IF d5 NE 2.
COMPUTE a=hweight*d4.
COMPUTE oecdeq=(1+(.5*d27)+(.7*(d4-d27-1)))/2.2.
COMPUTE y=dpi/oecdeq.
!inequal sal=y data=w ntiles=100.

* ##### EXAMPLE 8.
* Preparatory work for this example (Run analysis on FI91H and FI95H).
* Create 2 dummy files and pretend that they are Finland data at two different years.
GET FILE='D:\data\aa\Luxemb\samplh.sav'
  /KEEP=d4 d5 d27 hweight casenum dpi.
SAVE OUTFILE='c:\temp\FI91h.sav'.
* Taking square root of dpi reduces inequality.
COMPUTE dpi=SQRT(dpi).
SAVE OUTFILE='c:\temp\FI95h.sav'.

* Start the example (this simulates a job sent to LIS).
* (The standard first 5 lines of syntax are not included here).
* next command must be replaced by an ADD FILES command when run by LIS.
GET FILE='c:\temp\FI91h.sav'
  /KEEP=d4 d5 d27 hweight casenum dpi.
SELECT IF d5 NE 2.
COMPUTE income=dpi.
COMPUTE a=hweight.
SAVE OUTFILE='c:\temp\FI1.sav'.
SELECT IF $casenum=0.
*The path must be changed when run at LIS.
ADD FILES FILE='c:\temp\FI95h.sav'
  /KEEP=d4 d5 d27 hweight casenum dpi.
SELECT IF d5 NE 2.
COMPUTE income=dpi.
COMPUTE a=hweight.
SAVE OUTFILE='c:\temp\FI2.sav'.

* Analyse the 2 years of Finland data.
!manyyrs sal=income ntiles=100 data=w fname="c:\temp\FI" nbyrs=2.

* ##### EXAMPLE 9.
* the same approach as Example 8 can be used to compare many countries.
* For example the file CO1.sav could be FI95, CO2.sav could be US94, CO3.sav could be GE94 and CO4.sav could be SW95.
* The following line would analyse and compare the 4 countries.
!manyyrs sal=income ntiles=100 data=w fname="c:\temp\CO" nbyrs=4.
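
As a cross-check outside SPSS, the coefficient formulas used in the !inequal macro can be reproduced by hand on the grouped data of Example 4. The Python sketch below is an illustration only and is not part of the original syntax file. The function name `inequality_measures` is hypothetical; the dictionary keys simply reuse the SPSS variable names (zdemand, rtheil, and so on). One assumption differs from the macro: the sketch always sorts the groups by average earnings per person, whereas the macro sorts only when data is not 'g' (the Example 4 data are already in that order).

```python
import math


def inequality_measures(groups):
    """Return the 8 measures for grouped data.

    groups: iterable of (ai, ei) pairs, where ai is the number of persons
    in a group and ei the total earnings of that group (both must be > 0).
    """
    # Assumption of this sketch: order groups by average earnings per person,
    # which the running sum used for the Gini coefficient relies on.
    groups = sorted(groups, key=lambda g: g[1] / g[0])
    atot = sum(a for a, _ in groups)  # grand total of persons
    etot = sum(e for _, e in groups)  # grand total of earnings

    # ATKINSON index = DEMAND coefficient, then THEIL redundancy.
    demand2 = sum(e * math.log(a / e) for a, e in groups) / etot
    zdemand = 1 - math.exp(demand2) * etot / atot
    rtheil = -math.log(1 - zdemand)

    # RESERVE coefficient.
    reserv2 = sum(a * math.log(e / a) for a, e in groups) / atot
    zreserve = 1 - math.exp(reserv2) * atot / etot

    # D&R coefficient and KULLBACK-LEIBLER redundancy.
    zd_and_r = 1 - math.sqrt((1 - zdemand) * (1 - zreserve))
    rkull_li = -math.log(1 - zd_and_r)

    # HOOVER and COULTER coefficients share the same per-group differences.
    diffs = [e / etot - a / atot for a, e in groups]
    zhoover = sum(abs(d) for d in diffs) / 2
    zcoulter = math.sqrt(sum(d * d for d in diffs) / 2)

    # GINI coefficient, using the cumulative sum of earnings (csei).
    csei = 0.0
    gini2 = 0.0
    for a, e in groups:
        csei += e
        gini2 += (2 * csei - e) * a / (etot * atot)
    zgini = 1 - gini2

    return {
        "zdemand": zdemand, "rtheil": rtheil, "zreserve": zreserve,
        "zd_and_r": zd_and_r, "rkull_li": rkull_li,
        "zhoover": zhoover, "zcoulter": zcoulter, "zgini": zgini,
    }


# Grouped data of Example 4: (number of persons, total earnings).
example4 = [(20, 400), (10, 300), (5, 200), (5, 250)]
for name, value in inequality_measures(example4).items():
    print(f"{name:9s} {value:.6f}")
```

Running the sketch prints one line per measure, which can be compared with the SUMMARIZE table that Example 4 produces in SPSS. A perfectly equal distribution, for instance `[(10, 100), (5, 50)]`, should give values of zero (up to rounding) for every measure.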
