  1. Bootstrap and random numbers
  2. Block Designs
  3. Batch files
  4. Area under the curve (AUC)
  5. Charts and Tables
  6. Labels, Variable Names and Format
  7. Cluster Analysis
  8. Combinations, Permutations, Interactions
  9. Compute
  10. Unclassified
  11. Factor Analysis
  12. IGRAPH
  13. Concatenate/modify string variables
  14. Parse or Flag data
  15. Data Editor
  16. Data validation
  17. Dates and time
  18. Distributions
  19. Export & Import
  20. Flag or Select Cases
  21. Item Analysis
  22. Matching data files
  23. Matrix
  24. Multiple responses
  25. OMS (Output Management System)
  26. Outliers
  27. Random Sampling
  28. Read, Write or Create Data
  29. Regression, Repeated Measures
  30. Test if file or variable exists
  31. Self Adjusting Code
  32. Ranking, largest values, sorting, grouping
  33. Tests of Inequality
  34. Working with Many Files
  35. Meta Analysis
  36. Transform variable
  37. RFM Analysis
  38. Remove Characters, Duplicates or Variables
  39. Strings
  40. Working with missing values
  41. ROC curves
  42. Restructure File
  43. Sample Size and Power
  44. Survival Analysis
  45. Time Series
  46. Conjoint Analysis
  47. T-Test or Means or ANOVA
  48. Standard Data Files
  49. Tutorials


Bootstrap and random numbers

  1. Bootstrap confidence interval for Cronbach alpha
  2. Bootstrap crosstab
  3. Bootstrap ordinary least squares (OLS) estimators
  4. Bootstrap the mean and median
  5. Generate random triad numbers
  6. Generating multivariate hypergeometric random variables
  7. Generating multivariate normal variables with a specific covariance matrix (AnswerNet)
  8. Get random sample of various size then calculate statistics (Compare means of n samples of size s1 s2 ... sn ...)
  9. Get various random samples of same size calculate statistics (Compare means of n1 n2 ... nn samples of size s)
  10. OMS bootstrapping
  11. Sampling distribution of the correlation between 2 variables
  12. Generating multinomial random variables
  13. Bootstrap confidence interval for the variance of a variable

Block Designs

  1. Completely Randomized Designs (equal or unequal n per treatment)
  2. Random assignment of units to experimental treatments (for Randomized Block Designs, Simple & Generalized, and Completely Randomized Designs with equal n per treatment)
  3. Bootstrap confidence interval for the variance of a variable

Batch files

  1. Example of bat file running an sps file
  2. Run syntax from batch file or command line

Area under the curve (AUC)

  1. Incremental Area under the curve
  2. Area under the curve using Trapezoidal Integration

Charts and Tables

  1. Bar charts for school types by sex where percentages of each sex add up to 100 percent
  2. Blank bar for unselected category (from AnswerNet)
  3. Blank bar for unselected category (generalized). Note, however, that showing empty categories in CTABLES is trivial: all that is required is to specify EMPTY=INCLUDE in the /CATEGORIES subcommand.
  4. Compare (superimpose) two histograms; could also use a population pyramid (see IGRAPH section below for an example)
  5. Count outliers (show number of outliers in a boxplot)
  6. Do bar charts excluding categories with small number of cases
  7. Do many histograms with the same axis boundaries (this demonstrates the use of the macro Dograph)
  8. Graph cumulative percentage retired at attained age by categorical variable
  9. Graph cumulative percent on X axis
  10. Graph survey question
  11. Histogram with percent on y axis instead of numbers
  12. Identify your own data in the chart
  13. Identify your own data in the chart, version 2 (a generalization of the above syntax)
  14. Print current date and time in chart title (Same technique can be used with Tables)
  15. Print current date in chart title
  16. Print histogram or bar chart depending on data (A good macro example)
  17. Print school names as part of graph titles
  18. Show mean values in line graph
  19. Show 2 categories on same histogram
  20. ZIPF law and graph
  21. Construct a table "manually" in the data editor (a good example of data restructuring)
  22. Find population frequency when multiple response with long strings
  23. Construct a table "manually" example no 2
  24. List variables in frequency table by order of medians
  25. Put 4 variables in the same frequency table
  26. Print mean plus minus standard deviation in Table
  27. Print actual namegroup and id in heading of each listing
  28. Show empty category in tables (from AnswerNet; note: this is trivial with CTABLES)
  29. Sort categories by decreasing count but with Others as last one
  30. Show number of valid cases in table footnote
  31. Get statistics for grouping of variables
  32. Table where list of variables is generated by macro (Illustrates the !IF ...!ELSE... !IFEND macro command)
  33. Using Macros and CTABLE
  34. Do not display value labels in pivot tables
  35. Show empty categories in tables (second method)
  36. Hide Cell With Less Than N Persons

Labels, Variable Names and Format

  1. Add (or replace) a character at the beginning of each variable name
  2. Add '_99' at the end of every variable name
  3. Apply lab1 as value label to var1 by syntax
  4. Assign same label to many variables
  5. Assign value labels to a vector
  6. Assign variable and value labels of a given variable to other variables
  7. Convert variable format
  8. Automatically rename variables
  9. Autovariable renaming or copying
  10. Create dummy variables (also called indicator or binary variables)
  11. Create dummy variables (AnswerNet)
  12. Create new variable equal to number of occurrences of var1
  13. Define a global variable (this is a useful programming technique)
  14. Define variable label by Macro
  15. Delete all variable labels of a given sav file
  16. Delete List Of Variable Names But Some May Not Exist
  17. Delete variables with all values equal to zero
  18. Delete or reorder variable names (data fields)
  19. Delete many variable labels
  20. Match label file with data file
  21. Print variable labels and value labels in FREQ Tables
  22. Read ASCII data variable name, value and value labels
  23. Recode variables var1 becomes varx etc
  24. Remove underscores from all variable names (can be adapted to remove any other character)
  25. Rename variables
  26. Rename all variables t2abc becomes t1abc etc
  27. Rename var in file1 to names in file2
  28. Reverse scale and value labels
  29. Round and change format of all numeric variables
  30. Show 0.45 instead of .45
  31. Sort variable names by alphabetical order (AnswerNet)
  32. Sort variables by name in data file (sent by A. Paul Beaulne)
  33. Sort Variables By Alphabetical Order
  34. Write value labels to ASCII file (AnswerNet)
  35. Group data and define corresponding value labels

Cluster Analysis

  1. Cluster analysis using similarity proximity (count) data as input
  2. Save centers of Hierarchical cluster analysis as initial value of K-means

Combinations, Permutations, Interactions

  1. All combinations of 3 numbers out of n (see "Find all Combinations .." below for a generalization)
  2. Find all combinations of 1 up to n items out of m items (high power stuff!)
  3. Find all combinations of n items out of m items (high power stuff!)
  4. All combinations of 3 letters out of n (with replacement)
  5. Calculate interaction terms between 2 categorical variables (within a regression context)
  6. Create a new variable for each combination of 2 variables
  7. Find all permutations of integers 1 to n. Maximum value of n is 7. Combined with recode, this can find permutations of any strings or numbers.
  8. Generate orders for block of trials
  9. Get all possible crossproducts of pairs of variables (contains a fair amount of comments)

Compute

  1. Find the cube root
  2. Reverse the digits of an integer
  3. Create a new variable equal to mean of another variable
  4. Compute average of m variables where m is a variable in the data file
  5. Compute distances between 2 points on earth (with thanks to Simon Freidin)
  6. Compute percentage of patients having each fracture category
  7. Automatically compute sample weights to approximate population
  8. Box-Cox Transformation (transforms var1 using each of the 31 values of lambda between -2 and 1, in increments of 0.1)
  9. Compute z = x / max( y) where max( y) is over all cases
  10. Count number of distinct values across 400 variables
  11. Weight data based on 2 or more vars
  12. Find LAG(var1, var2) (variable lag)

Unclassified

  1. Calculate average percent score
  2. Calculations on dynamic columns
  3. Fill in the gaps (information in file has been left blank when it equals the information in the preceding case, this syntax fills the gap)
  4. Fill in the gaps (within ID)
  5. Interaction in factorial designs when dependent variable is not normal (thanks to Marta Garcia-Granero for this code)
  6. Stop or resume generating outputs in the output window
  7. P-value adjustments for Multiple Comparisons

Factor Analysis

  1. Factor analysis with Spearman correlation through a matrix

IGRAPH

  1. Clustered bars with percent based on total in cluster
  2. Example of surface plot
  3. Graphing an arbitrary function
  4. Graph showing interaction in multiple regression
  5. How to speed up IGRAPH (A similar approach could be used for other type of graphs)
  6. Produce long IGRAPHs
  7. Population pyramids
  8. Separate box plot graph for each category value (syntax can be adapted to any other type of graph)

Concatenate/modify string variables

  1. Apparent problem with concat (newbies should take a look at this example)
  2. Combine a string variable and a numeric variable
  3. Concatenate content of cases with same id
  4. Concatenate numbers
  5. Concatenate 22 variables
  6. Convert first letter of each word to uppercase (Thanks to A. Paul Beaulne for sending me this code)
  7. Create an id using name and dob
  8. Normalize string (delete spaces at beginning, remove period at end, capitalize all letters)
  9. Normalise alpha (Capitalise the first letter of each word, use lower case for the other letters)
  10. Remove initial from name
  11. Remove period from string (can be modified to remove any other characters)
  12. Reorganize names (place family name at the beginning of the string)
  13. Transform ascii codes into characters
  14. Concatenate All Values Into Constant

Parse or Flag data

  1. Extract bits from an integer
  2. Extract portion of string (string contains first and last name, want first 3 letters of last name)
  3. Extract portion of string starting with a digit
  4. Extract Zip code from address field
  5. Extract two numbers from a string (e.g. string "120/90" becomes numbers 120 and 90)
  6. Flag if last characters of string are 'Esq'
  7. Parse a string into one letter per variable
  8. Parse comma separated numbers
  9. Parse data separated by slashes
  10. Parse domain name from email addresses
  11. Parse comma separated strings then autorecode results
  12. Parsing a variable which has embedded line feeds (thanks to Bjarte Aagnes)
  13. Remove letter at end of string and convert remaining string to a number
  14. Split a string variable into plaintiff and defendant portions
  15. String variable contains items separated by a slash (there is a variable number of items from one case to the next)
  16. Weed out letters in a string and create a number with remaining digits

Data Editor

  1. Reduce size of columns in data editor
  2. Right align strings in data editor

Data validation

  1. Perform tests on ssn
  2. Validate likert and continuous values

Dates and time

  1. Add 60 days to a date then find end of that month
  2. Add leading zeros to a string date
  3. Ages are in nnH nnD nnM and nnA
  4. Break down number of days in hospital by calendar month
  5. Calculate age
  6. Calculate time differences to milliseconds
  7. Calculate mean date and standard deviation in days
  8. Calculate number of days within the eligibility period
  9. Calculate number of minutes between 2 timestamps (crossover midnight)
  10. Calculate number of months between 2 dates
  11. Calculate waiting time when time is coded in hh min
  12. Compute number of weekdays between 2 dates
  13. Compute number of weekdays excluding public holidays
  14. Compute sleep time
  15. Convert string formatted as hhmmss into numeric time variable (thanks to Jim Marks)
  16. Convert string 1997-08-22 into a date variable
  17. Convert basis
  18. Convert string "04Apri03" to a date variable
  19. Convert string 01jan1992 to a date variable
  20. Convert string into date and time variables
  21. Convert strings into numbers (variable contains age in one of the following formats: "7Y" for 7 years, "3m" for 3 months, "28D" for 28 days. Need to convert these to years.)
  22. Date plus 3 months
  23. Extract time portion from string variable containing date and time
  24. From AM PM to military time
  25. Importing from Excel (convert days into dates)
  26. Keep time portion of date when creating Tab delimited file
  27. Make variable equal to current date
  28. Print date and time before a procedure
  29. Number of consecutive 30 minutes of hypoxia
  30. Print current date as part of graph title
  31. Print day name along with date
  32. Read time stamp
  33. Save data file with current date as part of name
  34. Select a range of dates
  35. Time an SPSS procedure
  36. Convert string to date and select cases which fall during the weekend
  37. Dates appear as asterisk on chart (solution)
  38. Convert Datetime String to Datetime Format Variable

Distributions

  1. Add variables containing lower and upper CI for mean
  2. Bayes estimates for proportions and their CI with thanks to Evgeny Ivashkevich (this also calculates Confidence Intervals for a category not present in the sample)
  3. Calculate Chi-square significance given q and df
  4. Calculate 95 percent confidence interval for the median (thanks to Marta Garcia-Granero)
  5. Calculate McNemar Chi-Square test (thanks to Marta)
  6. Hodges-Lehmann Confidence Interval for Median difference (thanks to Marta Garcia-Granero)
  7. Exact Confidence Limits for a Binomial Parameter
  8. Goodness of Fit Test for Poisson Distribution
  9. Inferences and Confidence Intervals for Proportions
  10. Fitting Models with Overdispersion
  11. Tests of General Linear Hypotheses
  12. Normalization of raw scores

Export & Import

  1. Export all tables to Word
  2. Export data and value labels to Excel
  3. Export content of data editor to a specified sheet of an existing Excel workbook
  4. Export from SPSS to ACCESS
  5. Export from SPSS to ACCESS (method 2)
  6. Export more than 256 vars to Excel
  7. Export some SPSS vars to many sheets of Excel workbook
  8. Import from ACCESS or Lotus Notes (no DSN needed: this is very handy. Thanks to Tom Dierickx)
  9. Writing back an SPSS 10 file to an ODBC database (from AnswerNet)

Flag or Select Cases

  1. Exclude "outliers" from analysis (where outliers are defined as cases outside Mean +/- 2 SD)
  2. Flag cases where a given string variable contains a given word
  3. Flag cases where any of a list of variables have same value
  4. Flag cases meeting a certain condition as well as preceding and following case for the same person
  5. Flag cases where salary is in top 95 percentile
  6. Flag first and last dates (within each ID)
  7. Keep only duplicate cases
  8. Print frequency table of the n most (least) frequent items
  9. Select cases where same letter appears twice in string
  10. Select patients where drug1 was given before drug2
  11. Sophisticated search in string variable (data were scanned, portion of strings include letters (eg B) instead of numbers (eg 8); this syntax flags the errors)

Item Analysis

  1. Syntax for item analysis
  2. Syntax For Item Analysis V6. This is a much improved version of the above. It is fully automated and has been developed and tested using SPSS 15.

Matching data files

  1. Create data file if double entries are equal (where entries are done by 2 different persons in 2 different files)
  2. Double entry check
  3. Find errors in 2 files (data entered twice)
  4. Match one to many where key has 4 variables
  5. Match 2 files using between-dates criteria
  6. Merge 2 data files based on many to many relationship
  7. Compare 2 Data Files (with thanks to Simon Freidin)

Matrix

  1. Example that reads, writes, creates and transforms matrices
  2. Export variance-covariance matrix to ASCII file
  3. Export variance-covariance matrix to sav file
  4. Find inverse of a matrix
  5. Macro autogenerate initial data file (with thanks to Fernando Cartwright)
  6. Matrix out in
  7. Read matrix data
  8. Reliability analysis when input is a correlation matrix
  9. Transform a matrix into a vector
  10. Cohen's Kappa

Multiple responses

  1. Count unique occurrences of a multiple response
  2. Create dichotomous variables from multiple responses which are not in order
  3. Multiple responses are encoded as comma separated letters

OMS (Output Management System)

  1. Crosstab Chi-square and Phi in same table
  2. OMS and macros

Outliers

  1. Exclude cases over mean plus 2 times sd
  2. Replace outliers by average of cases with same characteristics
  3. Replace outliers by mean plus/minus n times sd
  4. Winsorize a mean

Random Sampling

  1. Complex sampling without replacement
  2. Draw without replacement (random permutation of numbers)
  3. Generate random phone numbers
  4. Find random pairs of cases for T-test
  5. Find random pairs of cases with same characteristics
  6. Flag n random cases within each subgroup
  7. Get 2 independent samples meeting given criteria
  8. Get 2 random samples same sex age education
  9. Get n independent random samples of size m from same file
  10. Get random sample of x% of each stratum
  11. Get random sample of N cases from each stratum
  12. List of random cases id 10 per line
  13. Match cases on basis of propensity scores (this involves matching cases which do not match but are close to each other)
  14. Proportional sampling without replace
  15. Proportional random sampling
  16. Proportional sampling without replacement
  17. Random sample n males and n females
  18. Random samples with same age sex education
  19. Random split a file in two files
  20. Randomize a variable n times and keep each randomization
  21. Scramble social insurance numbers
  22. Select 2 cases from each group
  23. Select random samples of each group
  24. Split files in 2 random portions
  25. Split a file into 10 random groups of equal size
  26. Systematic fixed sampling
  27. Getting repeated sampling from same file

Read, Write or Create Data

  1. Adding new cases using syntax
  2. Add variable equal to function of an existing Var
  3. Copy some variables from each record type 1 to add a new record of type 0
  4. A few simple examples of INPUT PROGRAM (a short tutorial)
  5. Create consecutive records at the end of the file
  6. Create constants for each non missing date
  7. Define new variables in empty data set
  8. Define varx to vary
  9. Duplicate cases n times where n is variable (see also Expand Crosstab Data below)
  10. Expand crosstab data into original data file (disaggregate data)
  11. Expand data x and y times (e.g. from a case where age=20, males=5 and females=6, create 5 cases with age=20 and sex=1 and 6 cases with age=20 and sex=0)
  12. Fill the gaps when Aggregate has empty categories (syntax creates cases to fill the gaps)
  13. Generate random dates
  14. INPUT program (to generate a random data file)
  15. Insert missing cases (within id)
  16. Insert missing dates (within id)
  17. Printing date time in output
  18. Read ASCII (logical case is made up of 5 rows of 10 cases)
  19. Read ASCII file using FILE TYPE
  20. Read ASCII file using INPUT PROGRAM
  21. Read ASCII file with a forward slash delimiter
  22. Read a variable number of records per case
  23. Example of data list
  24. Example of INPUT program
  25. Read data inline File Type MIXED Records
  26. Read ASCII file with comma or dash delimited data
  27. Read ASCII file with comma separated data (within quotes)
  28. Read ASCII file with fixed and free data
  29. Read ASCII file with FIXED Data
  30. Read ASCII with comma and dot separated decimals
  31. Read text file where n columns are to be ignored (n is a variable which varies by file)
  32. Skip first 6 Records
  33. Skip one line of data
  34. Read ASCII file with REPEATING data
  35. Read complex file
  36. Read data file that has no carriage returns (from AnswerNet) (data is just one long stream, with no separation between records or fields, and no carriage returns)
  37. Read data produced by CGI script
  38. Read data where each case has 4 numeric records and a variable number of string records (this illustrates the use of the REREAD command)
  39. Write comma or tab delimited file
  40. Write frequency percentages to data file
  41. Write missing values as a dot
  42. Write special ASCII file
  43. Writing value labels instead of values
  44. Read comma delimited fields with commas inside quoted strings
  45. Read comments between the lines of data
  46. Read data list free with consecutive commas

Regression, Repeated Measures

  1. Add casewise regression coefficients to data file
  2. Calculate predicted values (unianova)
  3. Compare coefficients generated by various groups
  4. Compare regression coefficients (thanks to A. Paul Beaulne for sending me this code)
  5. Conditional logistic regression
  6. Do All-Subsets regressions
  7. Do all univariate linear and logistic regressions (thanks to Marta Garcia-Granero)
  8. Logistic regression by macro
  9. Regression calculates table of predicted values
  10. Regression in a loop
  11. Regression when holding out k cases
  12. Regression with correlation matrix as input
  13. Regression with normed weight
  14. Repeated-measures macro
  15. Chow test
  16. White's test: calculate the statistics and its significance (thanks to Marta Garcia-Granero)
  17. White's standard errors full OLS and White's SE output (thanks to Gwilym Pryce)
  18. Testing individual regressors in logistic regression
  19. Non-linear regression (NLR) with variance of residuals as the loss function (this is not trivial)
  20. Piecewise regression (also known as "spline regression" and "piecewise polynomials")
  21. Breusch-Pagan & Koenker test (thanks to Marta Garcia-Granero)

Test if file or variable exists

  1. Check for existence of file
  2. Choice of include file depends on existence of a given variable

Self Adjusting Code

  1. Automated data transform from tall to wide
  2. Choice of include file depends on data
  3. End of macro DOLOOP comes from data
  4. Execute selective portions of syntax
  5. From 2 files to 1 case per id
  6. Syntax varies based on name of data file

Ranking, largest values, sorting, grouping

  1. Aggregating with the median
  2. Calculate cumulative sum of Var1
  3. Calculate mode
  4. Calculate number of distinct values within Case
  5. Calculate z-scores across variables
  6. Code using percentiles of a subset
  7. Compute percentiles for one variable and by one or more grouping variables (with thanks to Tom Dierickx). Note that the percentiles end up in the data editor, not in the Output window.
  8. Compute percentages based on values of first case
  9. For each case, find the earliest case in the preceding 7 days (relatively complex stuff)
  10. Find 5 largest values within case
  11. Find last 2 scores on repeated measure
  12. Identify variables having minimum value (with thanks to Maciek Lobinski)
  13. New variable equals cumulative totals by id
  14. Number consecutively cases with the same id
  15. Random order
  16. Rank equal intervals between minimum and maximum
  17. Rank on basis of percentage of good
  18. Rank variable names in alpha order
  19. Rank within cases
  20. Replace missing by median values within the case
  21. Round up to the higher point 5
  22. Saving confidence interval for mean (within groups)
  23. Score a test with an answer key (thanks to A. Paul Beaulne)
  24. Sorting values within cases (using the bubble sort algorithm)
  25. Syntax group data in bands
  26. Create n tiles based on percent ranges rather than on count
  27. Identify 3 Highest Values within Each Case
  28. Sorting Within Cases
  29. Finding All Modal Values for Subgroups of Cases

Tests of Inequality

  1. Many tests of inequality v5 (this chart template is used by the syntax)
  2. Dissimilarity Index

Working with Many Files

  1. Combine 2 data files many to many
  2. Combine any number of consecutively named sav files 50 at a time
  3. Combine many data files with same variables
  4. Combine many xls files into a single sav file
  5. Data list is outside the main syntax (illustrates how a syntax file can be modified by syntax)
  6. Delete cases contained in file2 from the main data file
  7. Erase files
  8. Example 1 using UPDATE command
  9. Example 2 using UPDATE command
  10. Get mean from 3 different files
  11. Keep only cases from Master file whose id are in second file
  12. Macro to delete a list of files
  13. Many folders and many files
  14. Run a macro on every file whose name is in a sav file
  15. Run syntax on files whose names are derived from a data file
  16. Show number of differences, if any, between 2 files (to check double entry of data).
  17. Split big files into separate categories (create a different sav file for each value of a numeric categorical variable)
  18. Split big files into separate categories, string var (create a different sav file for each value of a string categorical variable)
  19. Split file with kn cases into k files of n cases each
  20. Unusual file merge
  21. Include 200 syntax files by macro
  22. Process All .xls Files in a Given Folder (script)

Meta Analysis

  1. Meta Analysis: fixed and random effects models (with thanks to Valentim R. Alferes). This SPSS syntax does a meta-analysis on a set of studies comparing two independent means. It produces results for both fixed and random effects models, using Cohen's d statistics. The user has a total of 10 modes for entering summary data.
  2. Meta-SPSS: an exhaustive set of syntax files written by Marta Garcia-Granero as well as sample data files and supporting documents.

Transform variable

  1. Constrain a variable to a given interval (syntax is first given, then it is generalized using 2 macros)
  2. Convert numbers to string with leading zeros
  3. Create variable equal to z-scores of an existing variable
  4. Extract first or first 2 digits of a large integer
  5. Global autorecode. A nice problem: autorecode many string variables where the recode formula (e.g. a=1, b=2, etc.) is the same for all variables even though none of the variables has all possible values
  6. Replace confidential information (e.g. an SSN) by a new (known) id
  7. Replace values higher than n by the mean of the other values
  8. Automatically rescale variable to be between 0 and 1
  9. Examples of Converting Strings To Numbers
  10. Replace a Letter with 9999 and Convert To Number
  11. Transform Alphanumeric Codes to Numeric

RFM Analysis

  1. RFM-analysis on aggregated data (comments are in Russian)

Remove Characters, Duplicates or Variables

  1. Delete cases with offset cases
  2. Delete double entries (thanks to Maciek Lobinski). For instance, if var1 equals var2 for a given case, the syntax replaces var2 by sysmis.
  3. Find duplicates
  4. Remove double quotes
  5. Remove duplicate records
  6. Remove unused variables from many files
  7. Replace character In string
  8. Replace consecutive spaces in string by a single space
  9. Save duplicates in a separate file

Strings

  1. Soundex Phonetic Comparison
  2. Convert string to numeric variable
  3. Convert numbers to strings
  4. Convert string '250 million' into a number (or '16 billion' etc)
  5. Change all strings in data file to lower case
  6. Are All Words Present? This tests whether all words passed to the macro are present within a given string variable.
  7. Are All Words Present? - to Dichotomy Vars. Similar to the above but creates one dichotomy variable for each target word.

Working with missing values

  1. Replace missing by random value taken from cases with valid value (see Hot Deck below for a more general solution)
  2. Conditionally replacing missing by mean
  3. Missing values and DO IF
  4. Replace missing with mean
  5. Replace missing by median values within each case
  6. Replace missing by mean of category
  7. Replace "Blanks" by value from preceding case
  8. Recode certain dates as missing
  9. Mean substitution in additive scale
  10. List variable names with missing values and identify main elements of cases
  11. Hot Deck substitution of missing values of X within STRATUM (thanks to Theo van der Weegen)
  12. Identifying the 3 types of missing values
  13. Conditionally replacing missing by mean example 2
  14. Delete variables that have only missing values

ROC curves

  1. ROC curves and Youden's Index

Restructure File

  1. Allocate dummy variables to 24 hours
  2. Automated data transform from tall to wide
  3. Automated restructure from long to wide with thanks to Hillel Vardi.
  4. Collapse empty variables within a case
  5. Deduplicate cases while keeping all the information (a cute little problem)
  6. Each variable occupies 5 rows of 10 columns (another nice little problem)
  7. Find beginning and end of continuous periods
  8. From many to one example 1
  9. From many to one example 2
  10. From many to one with alpha data
  11. From many to one with specific order of new variables
  12. From one to Many simple
  13. From one to many with indicator variable
  14. Restructure data file example 1
  15. Restructure data file example 2
  16. Restructure data file example 3
  17. Restructure data file example 4
  18. Restructure from tall to wide (general solution) (non-trivial macro code...)
  19. Restructure time periods to a time matrix
  20. Restructure to calculate Kappa
  21. Transpose (FLIP) string variables
  22. VarsToCases and CasesToVars
  23. Automated Data Restructure (thanks to Kevin Hynes). This example maintains a grouping factor while restructuring data from tall to wide.
  24. Use Former Variable Names As Value Labels

Sample Size and Power

  1. Power analysis examples (with thanks to Bruce Weaver)
  2. Sample size for means (with thanks to Marta Garcia-Granero). This is a collection of several short macros that perform sample size calculations for confidence interval estimation and one-sample / two-sample tests for means (the latter with equal or unequal sample sizes).
  3. Sample size for proportions (with thanks to Marta Garcia-Granero). A collection of macros that perform sample size calculation for the estimation of one proportion and one- or two-sample hypothesis testing, as well as the calculation of the power of a test.
  4. Sample size for correlation hypothesis testing (thanks to Marta!)

Survival Analysis

  1. Show 95% CI for failure points on a survival plot
  2. Survival Analysis Example

Time Series

  1. Gaussian Filter

Conjoint Analysis

  1. Textbook Example Analysis of Plan 2 by 2 (comments are in Russian)

T-Test or Means or ANOVA

  1. ANOVA A*B (thanks to Valentim Alferes). This does an A*B Factorial ANOVA and calculates variance components, measures of association, measures of effect size and observed power. Works with raw data or published summary statistics.
  2. T-Tests and Likert scales
  3. Compare mean of each hospital with mean of all other hospitals (nice little macro)
  4. ANOVA Tables using 4 methods (thanks to Valentim Alferes): method 1 for Ns, Means and SDs; method 2 for Ns, Means and Variances; method 3 for Ns, Means and MS Error; method 4 for Means, Df num, Df den and MS Error.
  5. Standardized effect sizes (Cohen, Glass and Hedges's d) (with thanks to Marta). The effect sizes and their standard errors are added to the data file.
  6. ONEWAY with summarydataI2. Performs several ONEWAY ANOVAs plus several Homogeneity of Variances tests on summary data. Any number of variables can be analysed. Thanks to Marta Garcia-Granero.
  7. ONEWAY with summarydata1. Performs a ONEWAY ANOVA plus several Homogeneity of Variances tests on summary data. Thanks to Marta Garcia-Granero.
  8. Multiple Mann-Whitney tests (using a macro to have a procedure inside a LOOP)
  9. Hotelling's T**2 & Profile Analysis (thanks to Richard MacLennan)
  10. Do a T-Test with only the Means, SD and Ns (uses ANOVA)
  11. Do T-Test with only means, SD and Ns (thanks to Marta Garcia-Granero). This includes Hartley's F test, the standard T-test and the Welch test; asymptotic and non-asymptotic 95% CI are calculated.
  12. Cochran Hartley Critical Values. This gives the tabulated critical values at 5% and 1% for both HOV tests. Thanks to Marta Garcia-Granero.
  13. T Test: Measures of Effect Size and Nonoverlap, and Observed Power (thanks to Valentim Alferes). User can either analyse raw data or reproduce the SPSS T-Test standard output using summary statistics in published articles.

Standard Data Files

  1. Create labels with common names for US County FIPS codes
  2. US County FIPS Code to Common Name Recode

Tutorials

  1. String Manipulation Tutorial