When to use EXECUTE
Understanding the following question and answer might save you a lot of problems. This was posted to the SPSSX-L list on 1999/02/04 by David Matheson, SPSS Technical Support.
Question
I run a series of transformations from syntax in SPSS and am puzzled to find that obtaining the correct results may require inserting an EXECUTE command among the transformations. The following commands are one example:
```
DATA LIST FILE '/tmp/ret.dat' FIXED /da 1-15 w 19-20 .
COMPUTE RETURN = da-LAG(da).
COMPUTE sv =(w<=1 or w>=5).
SELECT IF (sv=0).
LIST.
```
The results for the variable RETURN were incorrect for some cases. Correct results were obtained if an EXECUTE command was placed somewhere between lines 2 (COMPUTE RETURN ...) and 4 (SELECT IF ...). What are the rules, in this case and more generally, that dictate when an EXECUTE command should be placed between transformation commands?
Answer
The key here was to run EXECUTE before the SELECT IF. (In this particular example, placing the EXECUTE between the two COMPUTE commands would also work.) Otherwise, when you compute RETURN as DA - LAG(DA) for a given case, the case that originally preceded the current case may already have been dropped from the active file, and LAG(DA) may capture the value of DA for an unintended case.
To further illustrate the use of EXECUTE among transformations, consider any three sequential cases with ID values of 1, 2, and 3. Suppose you want to keep or drop case 2 depending on the result of a comparison with case 1. Likewise, you wish to compare case 3 to case 2 and keep or drop case 3 as a result. Suppose also that case 2 fails the comparison test but case 3 would pass it, i.e., its relation to case 2 is such that you would want to keep case 3. Without an EXECUTE (or other command that forces a data pass) before the SELECT IF, case 2 is evaluated and dropped from the active file before case 3 is evaluated. Therefore, case 3 is compared to case 1 rather than case 2, and may be kept or dropped in error. Placing the EXECUTE before the SELECT IF ensures that all cases are still present when the LAG function is evaluated. One can envision data selection tasks where each case should be compared to the last case that passed a similar comparison; there you might deliberately leave out the EXECUTE to get that behavior.
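To make the LAG problem concrete, here is a toy Python model of the original example. It is a sketch under stated assumptions, not SPSS's implementation: it assumes pending transformations run case by case in one data pass, that LAG(da) returns da from the previous case that was kept in the active file, and it uses None to stand in for system-missing. The data values are made up for illustration.

```python
# Toy model (an assumption for illustration, not SPSS's real engine):
# pending transformations run case by case in a single data pass, and
# LAG(da) returns da from the previous case that was KEPT in the active file.

def compute_row(case, lag_da):
    """COMPUTE RETURN = da - LAG(da).  COMPUTE sv = (w <= 1 or w >= 5)."""
    row = dict(case)
    # None stands in for system-missing (the first case has no lag value).
    row["RETURN"] = None if lag_da is None else row["da"] - lag_da
    row["sv"] = row["w"] <= 1 or row["w"] >= 5
    return row


def single_pass(cases):
    """No EXECUTE: SELECT IF drops cases during the same pass as LAG."""
    kept, lag_da = [], None
    for case in cases:
        row = compute_row(case, lag_da)
        if not row["sv"]:              # SELECT IF (sv = 0).
            kept.append(row)
            lag_da = row["da"]         # a dropped case never feeds LAG
    return kept


def with_execute(cases):
    """EXECUTE before SELECT IF: RETURN is computed for all cases first."""
    first_pass, lag_da = [], None
    for case in cases:
        first_pass.append(compute_row(case, lag_da))
        lag_da = case["da"]            # every case feeds LAG
    return [row for row in first_pass if not row["sv"]]   # second pass


cases = [
    {"id": 1, "da": 100, "w": 3},
    {"id": 2, "da": 110, "w": 7},      # w >= 5, so SELECT IF drops it
    {"id": 3, "da": 125, "w": 2},
]
print([r["RETURN"] for r in single_pass(cases)])    # [None, 25]
print([r["RETURN"] for r in with_execute(cases)])   # [None, 15]
```

In the single-pass run, case 3 is differenced against case 1 (125 - 100) because case 2 was dropped before case 3 was processed; with the EXECUTE it is differenced against case 2 (125 - 110), as intended.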
A similar situation arises when cases are being selected by their original case number in the data set. Suppose you wanted to select every fifth case and used the following syntax:
```
compute seq = $casenum.
select if (mod(seq,5) = 0).
frequencies x.
* the mod function returns the remainder when the first argument is divided by the second.
```
You would have no cases remaining in your frequency report. The first case would be given a value of 1 for seq, since its $casenum would be 1. (mod(seq,5)=0) would therefore be false, and the case would be deleted. The case that was originally second would now become the first, so its $casenum = 1, seq = 1, and that case would be deleted too. Because every case reaches the SELECT IF with $casenum = 1, this would eventually happen to the case that was originally the 5th case as well, and to the 10th, and so on. The following syntax would work:
```
compute seq = $casenum.
execute.
select if (mod(seq,5) = 0).
frequencies x.
```
Adding the EXECUTE before the SELECT IF allows seq to be calculated correctly for every case before any cases are deleted.
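The renumbering effect can be sketched the same way. This toy Python model is an assumption for illustration, not SPSS internals: it treats $casenum as the 1-based position a case would take in the active file being built, so it counts only cases kept so far. The function name `select_every_fifth` is hypothetical.

```python
# Toy model (assumed for illustration): $casenum is the position a case
# would take in the active file being built, so it counts kept cases only.

def select_every_fifth(n_cases, execute_first):
    """COMPUTE seq = $casenum. [EXECUTE.] SELECT IF (MOD(seq,5) = 0).

    Returns the original positions (1-based) of the cases that survive.
    """
    if execute_first:
        # Pass 1 (ends at EXECUTE): no case has been dropped yet, so seq
        # equals each case's original position.
        seqs = list(range(1, n_cases + 1))
        # Pass 2: SELECT IF keeps multiples of 5.
        return [seq for seq in seqs if seq % 5 == 0]

    kept = []
    for original_position in range(1, n_cases + 1):
        seq = len(kept) + 1            # $casenum counts only kept cases
        if seq % 5 == 0:               # SELECT IF (MOD(seq,5) = 0).
            kept.append(original_position)
    return kept


print(select_every_fifth(12, execute_first=False))  # []
print(select_every_fifth(12, execute_first=True))   # [5, 10]
```

Without the EXECUTE, `kept` never grows, so seq is 1 for every case and nothing survives; with it, seq is fixed before any deletion.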
If you have a series of transformation commands (COMPUTE, IF, etc.) followed by a MISSING VALUES command that involves the same variables, you will often want to place an EXECUTE statement before the MISSING VALUES command. This is because MISSING VALUES is not a transformation: it changes the dictionary immediately, before the pending transformations are carried out at the next data pass. For example, consider:
```
IF (x = 0) y = z*2.
MISSING VALUES x (0).
```
The cases where x = 0 would be considered user-missing on x, so the IF condition would not evaluate as true and the transformation of y would not occur. Placing an EXECUTE before the MISSING VALUES command allows the transformation to occur before 0 is assigned missing status.
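The ordering problem can be modeled in a few lines of Python. This is a hypothetical toy model, not how SPSS is built: the `ActiveFile` class, its methods, and the sample values are invented for illustration, and it assumes only what the answer states, namely that transformations are queued until a data pass while MISSING VALUES updates the dictionary at once.

```python
# Hypothetical toy model, not SPSS internals: transformations are queued and
# run only on a data pass, while MISSING VALUES edits the dictionary at once.

class ActiveFile:
    def __init__(self, cases):
        self.cases = [dict(c) for c in cases]
        self.pending = []          # queued transformations
        self.user_missing = {}     # variable name -> set of missing codes

    def transform(self, step):
        self.pending.append(step)  # COMPUTE, IF, ... wait for a data pass

    def missing_values(self, var, *codes):
        self.user_missing[var] = set(codes)   # takes effect immediately

    def is_missing(self, case, var):
        value = case.get(var)
        return value is None or value in self.user_missing.get(var, set())

    def execute(self):
        for case in self.cases:
            for step in self.pending:
                step(self, case)
        self.pending = []


def if_x_is_zero(af, case):
    """IF (x = 0) y = z*2.  A missing x makes the test missing, not true."""
    if not af.is_missing(case, "x") and case["x"] == 0:
        case["y"] = case["z"] * 2


def run(execute_before_missing):
    af = ActiveFile([{"x": 0, "z": 4}, {"x": 1, "z": 5}])
    af.transform(if_x_is_zero)
    if execute_before_missing:
        af.execute()               # EXECUTE.
    af.missing_values("x", 0)      # MISSING VALUES x (0).
    af.execute()                   # stands in for a later procedure's pass
    return [case.get("y") for case in af.cases]


print(run(execute_before_missing=False))  # [None, None]
print(run(execute_before_missing=True))   # [8, None]
```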
An EXECUTE command is often necessary after you run the WRITE command to save the data to an ASCII file, or after you use XSAVE, rather than SAVE, to save data to a .sav file. WRITE and XSAVE are treated like transformations, so they are not carried out until the next data pass. If your program ends with a WRITE or XSAVE, with no procedure to force a data pass, the file to which you had tried to write would be empty. If the WRITE or XSAVE was part of a LOOP or DO IF structure, the EXECUTE command should be placed after the end of that structure, not within it.
Conversely, if a statistical procedure followed the WRITE or XSAVE commands, the procedure's data pass would write the new file, and no EXECUTE would be needed.
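A minimal Python sketch of why a trailing XSAVE or WRITE produces an empty file, assuming (as the answer says) that these commands are queued like transformations. The `Session` class, its methods, and the file name `out.sav` are hypothetical and only model the timing, not real file output.

```python
# Hypothetical toy model, not SPSS internals: XSAVE (like WRITE) is queued
# as a transformation; cases reach the output only during a data pass.

class Session:
    def __init__(self, cases):
        self.cases = cases
        self.pending = []          # queued transformations, incl. XSAVE
        self.written = {}          # output file name -> cases written

    def xsave(self, outfile):
        def write_case(case):
            self.written.setdefault(outfile, []).append(dict(case))
        self.pending.append(write_case)  # nothing is written yet

    def data_pass(self):
        """What EXECUTE or any procedure (e.g. FREQUENCIES) triggers."""
        for case in self.cases:
            for step in self.pending:
                step(case)
        self.pending = []


s = Session([{"id": 1}, {"id": 2}])
s.xsave("out.sav")                   # hypothetical file name
print(s.written.get("out.sav", []))  # []  (program ended here: empty file)
s.data_pass()                        # EXECUTE.
print(len(s.written["out.sav"]))     # 2
```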
Finally, if you have such an extensive sequence of transformations that you get an insufficient memory message when SPSS tries to process them, you could intersperse EXECUTE commands among the transformations to occasionally force a pass of the data and free up memory for the next set of transformations. Don't place any of these EXECUTE commands within transformation structures such as LOOP..END LOOP, DO IF..END IF, or DO REPEAT..END REPEAT. Also, don't place EXECUTE commands between commands that define scratch variables and subsequent commands that reference those scratch variables, because scratch variables are discarded when the data pass ends. If you followed an extensive set of transformation commands with a memory-intensive command such as CLUSTER or MANOVA, you might place an EXECUTE command before that statistical procedure. Although the procedure alone would force the data pass that executed the transformations, placing the EXECUTE command before the procedure would release the memory held by the transformations before the procedure needs it.
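The scratch-variable warning can be sketched in Python too. This toy model is an assumption for illustration: it treats names starting with "#" as scratch variables and deletes them at the end of each pass. The helper names and values are hypothetical, and where real SPSS would reject the later command as referring to an undefined variable, the model simply raises a `KeyError`.

```python
# Toy model (an assumption, not SPSS internals): names starting with "#" are
# scratch variables and are discarded when a data pass ends.

def run_pass(cases, steps):
    """Run the queued transformation steps over every case, then drop scratch
    variables, as EXECUTE or a procedure would end the pass."""
    for case in cases:
        for step in steps:
            step(case)
    for case in cases:
        for name in [n for n in case if n.startswith("#")]:
            del case[name]


def define_scratch(case):
    case["#double"] = case["x"] * 2    # COMPUTE #double = x * 2.


def use_scratch(case):
    case["y"] = case["#double"] + 1    # COMPUTE y = #double + 1.


ok = [{"x": 1}, {"x": 3}]
run_pass(ok, [define_scratch, use_scratch])   # no EXECUTE in between
print(ok)  # [{'x': 1, 'y': 3}, {'x': 3, 'y': 7}]

broken = [{"x": 1}]
run_pass(broken, [define_scratch])            # EXECUTE ends the pass here
try:
    run_pass(broken, [use_scratch])           # #double is already gone
except KeyError as missing:
    print("undefined scratch variable:", missing)
```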
This is just a sample of cases where EXECUTE commands should be placed among or after transformations. Placing an EXECUTE after every COMPUTE would almost always be inefficient at best, since each EXECUTE forces an extra pass through the data, and unworkable at worst (e.g., in a series of transformations within a LOOP or DO IF structure).
...