Find beginning and end of continuous periods
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 | **QUESTION . I have the following data that I want to put into vectors: id time1 time2 1 100 200 1 200 300 1 300 400 1 500 600 1 600 700 1 800 830 2 800 900 2 900 930 I want two vectors to include only the continuous observations for that id (ie the first 3 time1 variables in the vector, the first 3 time2 variables in another vector). I don't know the number of continuous observations, which would be the subscript for the vector. How can I get this number? I'm going to use the 2 vectors to create one record with id, time1 of the first record of the original data and time 2 of the last record of the original data so I don't need to create new vectors for the next set of continuous records. Desired output: id time1 time2 1 100 400 1 500 700 1 800 830 2 800 930 *ANSWER by rlevesque@videotron.ca. DATA LIST LIST /id time1 time2. BEGIN DATA 1 100 200 1 200 300 1 300 400 1 500 600 1 600 700 1 800 830 2 800 900 2 900 930 END DATA. LIST. * this assumes the file is already sorted. MATCH FILES FILE=* /BY id /FIRST=first /LAST=last. CREATE time1a= LEAD(time1,1). CREATE time2a= LAG(time2,1). * identify the beginning of a period. DO IF first. COMPUTE beg1=1. ELSE. COMPUTE beg1=time1<>time2a. END IF. * identify the end of a period. DO IF last. COMPUTE end1=1. ELSE. COMPUTE end1=time2<>time1a. END IF. * Get the end values. DO IF end1. COMPUTE val2=time2. END IF. * Copy the end values to the line of the beginning values. COMPUTE casenb=$CASENUM. SORT CASES BY casenb(D). IF MISSING(val2) val2=LAG(val2). SORT CASES BY casenb. * Keep only the lines we want. EXECUTE. SELECT IF beg1=1. MATCH FILES FILE=* /KEEP=id time1 val2. RENAME VARIABLES (val2=time2). FORMATS id time1 time2 (F8.0). EXECUTE. |
Related pages
...