Replace outliers by the average of cases with the same characteristics
*(Q) Received by email on 2002/05/09.

* I have reaction time data for manipulated visual comparisons. Of course the data are
highly positively skewed, since participants spend 2 hours responding to items that take
2 to 3 seconds on average. There seem to be many cases where they got bored or spaced out
for 20 seconds or so, or moved around in their chair. There are also several cases where
the reaction time was less than 100 milliseconds (college kids are fast, but not that fast).

* Anyhow, let me get to the point. I want to replace outliers with the mean reaction time,
BUT based on the mean reaction time for that participant as well as on two independent
variables. Here is a sketch of an example data file:
ID sides change RT
1 4 7 750
1 4 10 1000
1 5 7 850
1 4 10 22000
2 4 7 14750
2 4 10 1000
2 5 7 50
2 4 10 900
...

* ID = participant ID
  sides = number of sides of the shape
  change = percent difference in shape
  RT = latency (ms)
  outliers = RT < 600 | RT > 10000.

* Basically, find the outliers and identify which participant, number of sides and percent
change each one belongs to. Replace each outlier with the mean reaction time of that
participant's other responses to cases with the same number of sides and percent change.

*(A) By Ray on 2002/05/09.

DATA LIST LIST /ID sides change RT.
BEGIN DATA
1 4 7 750
1 4 10 1000
1 5 7 850
1 4 10 22000
2 4 7 14750
2 4 10 1000
2 5 7 50
2 4 10 900
END DATA.
LIST.

* Sort by the matching keys and keep a copy of the complete data.
SORT CASES BY id sides change.
SAVE OUTFILE='c:\temp\original data.sav'.

* Keep only valid RTs (600 to 10000) and average them within each id/sides/change cell.
SELECT IF RANGE(RT,600,10000).
AGGREGATE
 /OUTFILE='C:\temp\aggr.sav'
 /BREAK=id sides change
 /mean_rt = MEAN(rt).

* Attach the cell means to every original case, then replace the outliers only.
MATCH FILES /FILE='c:\temp\original data.sav'
 /TABLE='C:\temp\aggr.sav'
 /BY=id sides change.
COMPUTE newrt=rt.
IF ~RANGE(RT,600,10000) newrt=mean_rt.
EXECUTE.
* Note that 2 cases end up with missing values because there are no valid cases to
compare them with (participant 2 has no valid RT with sides=4 and change=7, nor with
sides=5 and change=7). You could leave these cases missing, keep their original RT
values, or apply the overall mean to them; many options exist.

* It would also be possible (but more complicated) to use a case that is 'similar'. For
an example, see syntax number 12 in
http://pages.infinit.net/rlevesqu/SampleSyntax.htm#RandomSampling.
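For readers who want to check the logic outside SPSS, here is a minimal Python sketch (standard library only) of the same cell-mean replacement, using the question's sample data and the inclusive 600 to 10000 ms valid range. `None` plays the role of SPSS system-missing. The `fallback` parameter and its `"participant"` and `"overall"` options are hypothetical illustrations of the alternatives mentioned in the note above, not part of the original syntax; which "overall mean" is appropriate is a judgment call.

```python
from statistics import mean

# Sample data from the question: (ID, sides, change, RT in milliseconds).
rows = [
    (1, 4, 7, 750), (1, 4, 10, 1000), (1, 5, 7, 850), (1, 4, 10, 22000),
    (2, 4, 7, 14750), (2, 4, 10, 1000), (2, 5, 7, 50), (2, 4, 10, 900),
]

LOW, HIGH = 600, 10000  # inclusive valid range, like RANGE(RT,600,10000)


def is_valid(rt):
    return LOW <= rt <= HIGH


def replace_outliers(rows, fallback=None):
    """Keep valid RTs; replace each outlier with the mean valid RT of its
    (ID, sides, change) cell.

    Hypothetical fallback for outliers whose cell has no valid RT:
      None          -> leave missing (None), as the SPSS syntax does
      "participant" -> that participant's mean valid RT
      "overall"     -> mean of all valid RTs in the file
    """
    cells, people = {}, {}
    all_valid = []
    for pid, sides, change, rt in rows:
        if is_valid(rt):
            cells.setdefault((pid, sides, change), []).append(rt)
            people.setdefault(pid, []).append(rt)
            all_valid.append(rt)

    new_rts = []
    for pid, sides, change, rt in rows:
        key = (pid, sides, change)
        if is_valid(rt):
            new_rts.append(rt)                    # valid RT: keep as is
        elif key in cells:
            new_rts.append(mean(cells[key]))      # same participant, sides, change
        elif fallback == "participant" and pid in people:
            new_rts.append(mean(people[pid]))
        elif fallback == "overall" and all_valid:
            new_rts.append(mean(all_valid))
        else:
            new_rts.append(None)                  # no comparison cases available
    return new_rts


print(replace_outliers(rows))
print(replace_outliers(rows, fallback="participant"))
```

Without a fallback, the two outliers of participant 2 stay missing, matching the 2 missing cases described in the note; with `fallback="participant"` they receive participant 2's mean valid RT instead.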