* --------------------------------------------------------------------------- .
* File:    double entry check.sps .
* Author:  Bruce Weaver .
* Date:    19-Dec-2001 .
* Notes:   Use MATCH FILES to check for double entry .
* --------------------------------------------------------------------------- .

* First create a file with data for ID numbers 6-12 and save it.

DATA LIST LIST /ID (f2.0) a (f2.0) b (f2.0) c (f2.0) Y (f2.0) .
BEGIN DATA.
6 2 1 1 13
6 2 1 2 12
6 2 2 1 8
6 2 2 2 11
6 2 3 1 9
6 2 3 2 8
7 3 1 1 16
7 3 1 2 17
7 3 2 1 12
7 3 2 2 14
7 3 3 1 11
7 3 3 2 12
8 3 1 1 16
8 3 1 2 13
8 3 2 1 10
8 3 2 2 15
8 3 3 1 12
8 3 3 2 16
9 3 1 1 18
9 3 1 2 21
9 3 2 1 17
9 3 2 2 18
9 3 3 1 22
9 3 3 2 23
10 4 1 1 19
10 4 1 2 17
10 4 2 1 20
10 4 2 2 17
10 4 3 1 14
10 4 3 2 16
11 4 1 1 10
11 4 1 2 11
11 4 2 1 14
11 4 2 2 12
11 4 3 1 13
11 4 3 2 16
12 4 1 1 21
12 4 1 2 18
12 4 2 1 22
12 4 2 2 12
12 4 3 1 15
12 4 3 2 17
END DATA.

compute fnum=2.
exe.
formats fnum (f2.0).
var lab fnum 'File number'.
save outfile = 'c:\file2.sav' /compressed.

* Now create another file with data for ID numbers 1-7 .
* Note that both files will have data for IDs 6 and 7 .

DATA LIST LIST /ID (f2.0) a (f2.0) b (f2.0) c (f2.0) Y (f2.0) .
BEGIN DATA.
1 1 1 1 14
1 1 1 2 12
1 1 2 1 17
1 1 2 2 20
1 1 3 1 12
1 1 3 2 10
2 1 1 1 11
2 1 1 2 13
2 1 2 1 12
2 1 2 2 16
2 1 3 1 11
2 1 3 2 13
3 1 1 1 12
3 1 1 2 13
3 1 2 1 15
3 1 2 2 18
3 1 3 1 15
3 1 3 2 13
4 2 1 1 11
4 2 1 2 16
4 2 2 1 13
4 2 2 2 14
4 2 3 1 15
4 2 3 2 17
5 2 1 1 9
5 2 1 2 11
5 2 2 1 13
5 2 2 2 14
5 2 3 1 12
5 2 3 2 8
6 2 1 1 13
6 2 1 2 12
6 2 2 1 8
6 2 2 2 11
6 2 3 1 9
6 2 3 2 8
7 3 1 1 16
7 3 1 2 17
7 3 2 1 12
7 3 2 2 14
7 3 3 1 11
7 3 3 2 12
END DATA.

compute fnum=1.
exe.
formats fnum (f2.0).
var lab fnum 'File number'.

* Stack the two files .

ADD FILES /FILE=* /FILE='c:\file2.sav'.
EXECUTE.

* Check for duplicate records by using MATCH FILES .
* First need to sort by ID etc.

sort cases by id a b c y .
MATCH FILES file=* /BY id a b c y /FIRST = unique .
EXEC.
freq unique.

* There are 12 records that are not unique.
* Data for IDs 6 and 7 were in both files, so list these.

use all.
compute f = any(id,6,7).
filter by f.
exe.
list id to unique.

* Delete the duplicates.

use all.
filter off.
select if unique.
exe.
freq unique.

* Now all records are unique.
* --------------------------------------------------------------------------- .
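
* --------------------------------------------------------------------------- .
* Optional cross-check (a sketch, not part of the original syntax above):     .
* after the duplicates have been deleted, the same test can be repeated       .
* with LAG() instead of MATCH FILES.  A case is flagged when it matches the   .
* previous case on all BY variables.  The variable name "dup" is made up      .
* for this example; every remaining case should come out with dup = 0.        .
sort cases by id a b c y .
compute dup = 0.
if (id = lag(id) and a = lag(a) and b = lag(b)
    and c = lag(c) and y = lag(y)) dup = 1.
exe.
freq dup.
* --------------------------------------------------------------------------- .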