Validate Content Format in a String Variable

This is a more general (more flexible and powerful) solution to the string validation problem considered in the Test SSN example in the Syntax section. It uses regular expressions from Python's re module, which make it possible to test more complex format patterns, including variable length, varying separator characters, etc.

As an example, we consider the original SSN data (variable ssn) and the more complex case of validating chemical element symbols from the periodic table (variable elem). The regexp idea for chemical element validation is taken from here.
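
The SSN check can be tried outside SPSS first. The following is a minimal sketch in plain Python: it reuses the SSN pattern from the program below, while the helper name ssn_is_valid and the hard-coded sample list are assumptions made for illustration only. The last sample keeps its trailing space on the assumption that a fixed-width string value arrives padded to the declared width of 11 characters.

```python
import re

# Same SSN pattern as in the SPSS example: 3 digits, dash, 2 digits, dash, 4 digits.
SSN_PATTERN = re.compile(r'^\d{3}-\d{2}-\d{4}$')

def ssn_is_valid(value):
    """Return True if value has the form ddd-dd-dddd (hypothetical helper)."""
    return SSN_PATTERN.match(value) is not None

# Sample values from the data example, written as 11-character strings.
samples = ['123-45-6789', '1234-5-6789', '12e-45-6789',
           '123&45-2222', '111-44-2389', '122-33-678 ']
results = [ssn_is_valid(s) for s in samples]
print(results)
```

Only the first and the fifth value pass; the others fail because of a misplaced dash, a letter, a wrong separator, or a missing digit.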

* Encoding: UTF-8.
* Data example.
DATA LIST FIXED /ssn 1-11 (A) elem 13-15 (A).
BEGIN DATA
123-45-6789 Ba
1234-5-6789 H
12e-45-6789 Te
123&45-2222 Ht
111-44-2389 Uus
122-33-678  ult
END DATA.
LIST.

BEGIN PROGRAM Python.

import spss, re
cur = spss.Cursor(accessType='w')
# Create variables to store the errors found; type 0 means a numeric variable
cur.SetVarNameAndType(['error_ssn', 'error_elem'], [0,0])
cur.CommitDictionary()
for i in range(spss.GetCaseCount()):
    row = cur.fetchone()
    # Index 0 is the first variable (ssn), index 1 is the second one (elem)
    ssn = row[0]
    elem = row[1]
    # Flag the case with 1 if the SSN is not in the form ddd-dd-dddd
    if not re.match(r'^\d{3}-\d{2}-\d{4}$', ssn):
        cur.SetValueNumeric('error_ssn', 1)
    # The VERBOSE flag allows pretty regexp formatting (line breaks and leading spaces are ignored)
    if not re.match(r"""^A[cglmrstu]\s$|^B[aehikr]?\s*$|^C[adeflmnorsu]?\s*$|
                         ^D[bsy]\s$|^E[rsu]\s$|^F[elmr]?\s*$|^G[ade]\s$|
                         ^H[efgos]?\s*$|^I[nr]?\s*$|^Kr?\s*$|^L[airuv]\s$|
                         ^M[dgnot]\s$|^N[abdeiop]?\s*$|^Os?\s*$|^P[abdmortu]?\s*$|
                         ^R[abefghnu]\s$|^S[bcegimnr]?\s*$|^T[abcehilm]\s$|
                         ^U(u[opst])?\s*$|^V\s*$|^W\s*$|^Xe\s$|^Yb?\s*$|^Z[nr]\s$""",
                    elem, re.VERBOSE):
        cur.SetValueNumeric('error_elem', 1)
    cur.CommitCase()
cur.close()

END PROGRAM.
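
The effect of the VERBOSE flag used for the element pattern can be checked in isolation. The sketch below is an illustration, not part of the original program: it takes only the B, H and U alternatives from the pattern above, writes them once on a single line and once spread over several lines with comments, and compares the results. The names COMPACT, VERBOSE and matches are hypothetical.

```python
import re

# Reduced pattern: only the B, H and U alternatives from the full element regexp.
COMPACT = r'^B[aehikr]?\s*$|^H[efgos]?\s*$|^U(u[opst])?\s*$'

# The same pattern laid out for readability; with re.VERBOSE, literal spaces,
# line breaks and "#" comments inside the pattern are ignored, while \s still
# matches whitespace in the tested value.
VERBOSE = r"""^B[aehikr]?\s*$ |   # B, Ba, Be, Bh, Bi, Bk, Br
              ^H[efgos]?\s*$  |   # H, He, Hf, Hg, Ho, Hs
              ^U(u[opst])?\s*$    # U, Uuo, Uup, Uus, Uut
           """

def matches(pattern, value, flags=0):
    """Return True if value matches pattern from its start (hypothetical helper)."""
    return re.match(pattern, value, flags) is not None

# Three-character values, padded with spaces as in the elem variable.
values = ['Ba ', 'H  ', 'Ht ', 'Uus', 'ult']
compact_hits = [matches(COMPACT, v) for v in values]
verbose_hits = [matches(VERBOSE, v, re.VERBOSE) for v in values]
print(compact_hits == verbose_hits)
```

Both forms accept 'Ba ', 'H  ' and 'Uus' and reject 'Ht ' and 'ult', so the multi-line layout changes only how the pattern reads, not what it matches.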

...
