Generating multivariate normal variables with a specific covariance matrix

Solution ID:	 	100000316	
Question Subtype:	 	Statistical Distributions	
Title:
Generating MVN data with a specific covariance matrix 
Description:
Q. 
How can I generate data which are multivariate normal and 
have a covariance or correlation matrix that I specify? 
A. 
In SPSS 4.0 or above on non-DOS platforms, this job can be performed 
with the matrix language of SPSS. Users of SPSS/PC+ or SPSSX can do 
part of the job but will have to perform some matrix algebra outside 
of SPSS. The SPSS 4.0 approach is described first, followed by a 
description of the SPSSX approach. The general algorithm is described 
in Rubinstein, R. (1981), "Simulation and the Monte Carlo Method", 
Wiley. 
I SPSS 4.0 Method 

The basic strategy is: 
1. Before entering matrix, generate the required number of normally- 
distributed variables by using compute commands. These variables should 
be multivariate normal (MVN) with correlations approaching 0, 
i.e. approaching independence. 

2. To compute a set of variables which are multivariate normal and 
strictly independent, or orthogonal, run the FACTOR procedure 
with the variables created in Step 1. Use the default extraction 
method of principal components, extracting as many components 
as you included in the analysis. Save these components to a 
new set of variables with the FACTOR /SAVE subcommand. These 
new variables will be MVN and orthogonal with means of 0 and 
variances of 1. 

3. Enter the matrix language and read the set of standard normal variables 
from Step 2 as a matrix, Z for example. Then define and enter the 
target covariance matrix, S. 

4. Calculate the Cholesky factor for the target covariance matrix. 
The Cholesky factor is an upper triangular matrix which is the 
"square root" of the covariance matrix. If V is the Cholesky 
factor of matrix S, and VT is the transpose of V, then 
VT*V=S , where * is the matrix multiplication operator. 

5. Post-multiply the independent standard normals, Z, by the Cholesky 
factor, V, to give a new data matrix, X. If the target matrix was 
a correlation matrix and you want the new variables to have a 
standard deviation other than 1, then multiply X by the desired SD. 
If you want the new variables to have a nonzero mean, add the desired 
mean to X (but not before multiplication by SD). 
If the target covariance matrix is not a correlation matrix, the 
new variables will have variances equal to their respective 
diagonal elements in the target matrix. Their means will be 0. 
The desired mean can be added to the variables with a compute 
statement, either before or after leaving matrix. 

6. Save the X matrix to variables in the SPSS active file and leave 
matrix. 
An example of the implementation of this algorithm is shown below. The 
target matrix is a correlation matrix for the five variables to be 
generated. I added some statements to print the determinant and the 
condition number of the input covariance matrix to insure that it was 
not singular or ill-conditioned. I also printed the product of VT*V to 
insure that the target matrix was reproduced. All the new variables 
were scaled to have means of 100 and standard deviations of 15 before 
leaving matrix. 
input program. 
+ loop #i = 1 to 10000. 
+ do repeat response=r1 to r5 . 
+ compute response = normal(1) . 
+ end repeat. 
+ end case. 
+ end loop. 
+ end file. 
end input program. 
correlations r1 to r5 / statistics=descriptives. 
* Factor procedure computes pr1 to pr5, which are standard MVN . 
factor variables = r1 to r5 / print = default det 
/criteria = factors(5) /save=reg (all,pr). 
* use matrix to set corr matrix. 
* x is a 10,000 by 5 matrix of independent standard normals . 
* cor is the target covariance matrix. 
* cho is the Cholesky factor of cor . 
* newx is the 10,000 by 5 data matrix which has target covariance matrix . 
matrix. 
get x / variables=pr1 to pr5. 
compute cor={1, 0.4, 0.3, 0.2, 0.1 ; 
0.4, 1, 0.4, 0.3, 0.2 ; 
0.3, 0.4, 1, 0.4, 0.3 ; 
0.2, 0.3, 0.4, 1, 0.4 ; 
0.1, 0.2, 0.3, 0.4, 1 }. 
compute deter=det(cor). 
print deter / title "determinant of corr matrix" / format=f10.7 . 
print sval(cor) / title "singular value decomposition of corr". 
print eval(cor) / title "eigenvalues of input corr". 
* In a symmetric matrix sval and eigenvalues are identical - choose 1 . 
compute condnum=mmax(sval(cor))/mmin(sval(cor)). 
print condnum / title "condition number of corr matrix" / format=f10.2 . 
compute cho=chol(cor). 
print cho / title "cholesky factor of corr matrix" . 
compute chochek=t(cho)*cho. 
print chochek / title "chol factor premult by its transpose " /format=f10.2 . 
compute newx=x*cho. 
compute newx=newx*15 + 100. 
save newx /outfile=* /variables= nr1 to nr5. 
end matrix. 
correlations nr1 to nr5 / statistics=descriptives. 

II SPSSX and SPSS/PC+ Method 
If the matrix language is not available, users can apply the Cholesky 
factor matrix to the standard normal variables with a series of 
compute commands. However, the Cholesky factorization must be 
calculated outside of SPSS. The formulae for the Cholesky are 
provided in Rubinstein's book and in Bock (1975), "Multivariate 
Statistical Methods in Behavioral Research", McGraw-Hill. 
The example below uses the Cholesky factor for the target matrix in 
the previous example. The Cholesky factor matrix is printed below. 
One compute statement is required for each new variable. Note that the 
values in the columns of the Cholesky matrix act as weights for the 
original standard normal variables and these weighted variables are 
combined to form the new variables. 
Cholesky Matrix for the Target Correlation Matrix. 
1.0 0.4 0.3 0.2 0.1 
0 0.91652 0.30551 0.24004 0.17457 
0 0 0.9037 0.29508 0.23976 
0 0 0 0.90294 0.29608 
0 0 0 0 0.90243 
* SPSS-X, SPSS or SPSS/PC+ program - assume an active file exists with 
five independent random vars created by compute r1=normal(1). 
correlations r1 to r5 / statistics=descriptives. 
* This correlation matrix should be nearly an identity matrix, 1's in 
the diagonal and 0's elsewhere. 
factor variables = r1 to r5 / print = default det 
/criteria = factors(5) /save=reg (all,pr). 
correlations pr1 to pr5 / statistics=descriptives. 
* This correlation matrix should be an identity matrix. 
* use series of computes to post-mult data matrix by 
a Cholesky matrix calculated outside spss . 
compute nr1 = pr1. 
compute nr2 = 0.4*pr1 + 0.91652*pr2 . 
compute nr3 = 0.3*pr1 + 0.30551*pr2 + 0.9037*pr3. 
compute nr4 = 0.2*pr1 + 0.24004*pr2 + 0.29508*pr3 + 0.90294*pr4. 
compute nr5 = 0.1*pr1 + 0.17457*pr2 + 0.23976*pr3 + 0.29608*pr4 + 0.90243*pr5. 
correlations nr1 to nr5 / statistics=descriptives. 
* This correlation matrix should be like the target correlation matrix.
...
Navigate from here