1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
*(Q) Hi All, I have a file that looks like this:

ID               LastName      FirstName      Gender
123              Google              A                        F
123              Google                                         F
123                                                                      F
124              Robin               Justin                 M
124              Robin               Justin

So I have duplicate records while some of the records have missing values.
I wondering how I can deduplicate and keep the record with the most
information, the ending results would be like this:

ID               LastName               FirstName          Gender
123                 Google                     A
F
124                 Robin                        Justin
M

*(A) Posted to SPSSX-L on 2002/01/16 by rlevesque@videotron.ca.
* Visit my SPSS pages http://pages.infinit.net/rlevesqu/index.htm

*** NB Always work with a COPY of your original data file.
DATA LIST FIXED /id 1-3(F) lastname 4-9(A) firstnam 10-15(A) gender 16-16 (A).
BEGIN DATA
123GoogleA	    F
123Google      F
123            F
124Robin JustinM
124Robin Justin
333ExciteJohn
333 	    John  M 
END DATA.
LIST.

SORT CASES BY id.
DO IF $CASENUM>1.
PRESERVE.
SET ERRORS=no.
DO REPEAT varname=ALL.
* One of the next 2 lines will produce an error BUT the other
	line will give us what we need!.
IF (id=LAG(id) and RTRIM(varname)="") varname=LAG(varname).
IF (id=LAG(id) and MISSING(varname)) varname=LAG(varname).
END REPEAT PRINT.
END IF.
RESTORE.

* After the above syntax, the last case of each id contains all the 
	information available from the other cases.

MATCH FILES FILE=* /BY id /LAST=last.
SELECT IF last.
LIST.