Deduplicate cases while keeping all the information
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | *(Q) Hi All, I have a file that looks like this: ID LastName FirstName Gender 123 Google A F 123 Google F 123 F 124 Robin Justin M 124 Robin Justin So I have duplicate records while some of the records have missing values. I wondering how I can deduplicate and keep the record with the most information, the ending results would be like this: ID LastName FirstName Gender 123 Google A F 124 Robin Justin M *(A) Posted to SPSSX-L on 2002/01/16 by rlevesque@videotron.ca. * Visit my SPSS pages http://pages.infinit.net/rlevesqu/index.htm *** NB Always work with a COPY of your original data file. DATA LIST FIXED /id 1-3(F) lastname 4-9(A) firstnam 10-15(A) gender 16-16 (A). BEGIN DATA 123GoogleA F 123Google F 123 F 124Robin JustinM 124Robin Justin 333ExciteJohn 333 John M END DATA. LIST. SORT CASES BY id. DO IF $CASENUM>1. PRESERVE. SET ERRORS=no. DO REPEAT varname=ALL. * One of the next 2 lines will produce an error BUT the other line will give us what we need!. IF (id=LAG(id) and RTRIM(varname)="") varname=LAG(varname). IF (id=LAG(id) and MISSING(varname)) varname=LAG(varname). END REPEAT PRINT. END IF. RESTORE. * After the above syntax, the last case of each id contains all the information available from the other cases. MATCH FILES FILE=* /BY id /LAST=last. SELECT IF last. LIST. |
Related pages
...