1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
*Match 2 files using "between-dates" criteria.SPS

* QUESTION.
I have to match two files on a particular key. 

This key (patient_id) in file 1 has one or more occurrences. 
I have to match this key with patient_ids in file 2. 

In file 2 these patient_ids are also not unique. But another key named admission_id, is unique. 
So 1 patient can be admitted more than once to the hospital (more unique admission_ids), but the patient_id remains the same. 
This is a problem when you want to match these two files. However, there is extra information available, namely date-information.

In file 1: date-information about when a test has taken place is available.
In file 2: a patient is admitted at date A and dismissed at date B. 
We know that the date-value of file 1 must be between date A and date B.

My question now is: in what way/ how can I define these matching conditions?

* ANSWER posted to SPSS newsgroup by Raynald Levesque on 2004/07/02.
In the following data files, the last record of file1 and the 3 record of file2 do not have a match.
SET MPRINT=no.

DATA LIST LIST /id dtTest otherVar.
BEGIN DATA
1 20030115 2
1 20040215 5
2 20030705 7
2 20031022 5
2 20030101 9
END DATA.
* Sort and number records of each id.
SORT CASES BY id dtTest.
COMPUTE idx=SUM(LAG(idx)*(LAG(id)=id),1).
SAVE OUTFILE='c:\\temp\\file1.sav'.

DATA LIST LIST /id admin_id dtIn dtOut.
BEGIN DATA
1 10 20030101 20030201
1 15 20040115 20040315
1 17 20031010 20031210
2 22 20030701 20030710 
2 32 20031001 20031030
END DATA.
SAVE OUTFILE='c:\\temp\\file2.sav'.

* Get all test dates for each id.
GET FILE='c:\\temp\\file1.sav'.
VECTOR dt(3).
COMPUTE dt(idx)=dtTest.
AGGREGATE OUTFILE=*
	/PRESORTED
	/BREAK=id
	/dt1 TO dt3=FIRST(dt1 TO dt3).
MATCH FILES FILE='c:\\temp\\file2.sav'
	/TABLE=*
	/BY=id.
VECTOR dt=dt1 TO dt3.
LOOP #idx=1 TO 3.
DO IF RANGE(dt(#idx),dtIn,dtOut).
- COMPUTE idx=#idx.
- BREAK.
END IF. 
END LOOP.

* Drop unneeded variable.
ADD FILES FILE=* /DROP=dt1 TO dt3.
SORT CASES BY id idx.
MATCH FILES FILE=*
	/FILE='c:\\temp\\file1.sav'
	/BY=id idx.
LIST.

    id admin_id     dtIn    dtOut      idx   dtTest otherVar

    1.00    17.00 20031010 20031210      .        .        .
    1.00    10.00 20030101 20030201     1.00 20030115     2.00
    1.00    15.00 20040115 20040315     2.00 20040215     5.00
    2.00      .        .        .       1.00 20030101     9.00
    2.00    22.00 20030701 20030710     2.00 20030705     7.00
    2.00    32.00 20031001 20031030     3.00 20031022     5.00