Match 2 files using between-dates criteria
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 | *Match 2 files using "between-dates" criteria.SPS * QUESTION. I have to match two files on a particular key. This key (patient_id) in file 1 has one or more occurrences. I have to match this key with patient_ids in file 2. In file 2 these patient_ids are also not unique. But another key named admission_id, is unique. So 1 patient can be admitted more than once to the hospital (more unique admission_ids), but the patient_id remains the same. This is a problem when you want to match these two files. However, there is extra information available, namely date-information. In file 1: date-information about when a test has taken place is available. In file 2: a patient is admitted at date A and dismissed at date B. We know that the date-value of file 1 must be between date A and date B. My question now is: in what way/ how can I define these matching conditions? * ANSWER posted to SPSS newsgroup by Raynald Levesque on 2004/07/02. In the following data files, the last record of file1 and the 3 record of file2 do not have a match. SET MPRINT=no. DATA LIST LIST /id dtTest otherVar. BEGIN DATA 1 20030115 2 1 20040215 5 2 20030705 7 2 20031022 5 2 20030101 9 END DATA. * Sort and number records of each id. SORT CASES BY id dtTest. COMPUTE idx=SUM(LAG(idx)*(LAG(id)=id),1). SAVE OUTFILE='c:\\temp\\file1.sav'. DATA LIST LIST /id admin_id dtIn dtOut. BEGIN DATA 1 10 20030101 20030201 1 15 20040115 20040315 1 17 20031010 20031210 2 22 20030701 20030710 2 32 20031001 20031030 END DATA. SAVE OUTFILE='c:\\temp\\file2.sav'. * Get all test dates for each id. GET FILE='c:\\temp\\file1.sav'. VECTOR dt(3). COMPUTE dt(idx)=dtTest. AGGREGATE OUTFILE=* /PRESORTED /BREAK=id /dt1 TO dt3=FIRST(dt1 TO dt3). MATCH FILES FILE='c:\\temp\\file2.sav' /TABLE=* /BY=id. VECTOR dt=dt1 TO dt3. LOOP #idx=1 TO 3. DO IF RANGE(dt(#idx),dtIn,dtOut). - COMPUTE idx=#idx. - BREAK. END IF. END LOOP. * Drop unneeded variable. ADD FILES FILE=* /DROP=dt1 TO dt3. SORT CASES BY id idx. MATCH FILES FILE=* /FILE='c:\\temp\\file1.sav' /BY=id idx. LIST. id admin_id dtIn dtOut idx dtTest otherVar 1.00 17.00 20031010 20031210 . . . 1.00 10.00 20030101 20030201 1.00 20030115 2.00 1.00 15.00 20040115 20040315 2.00 20040215 5.00 2.00 . . . 1.00 20030101 9.00 2.00 22.00 20030701 20030710 2.00 20030705 7.00 2.00 32.00 20031001 20031030 3.00 20031022 5.00 |
Related pages
...