Validate Content Format in a String Variable
This is a more general, more flexible and more powerful solution to the string-validation problem considered in the Test SSN example in the Syntax section. It uses regular expressions from Python's re module, which can test more complex format patterns, including variable-length parts, alternative separator characters, etc.
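The core of the approach is a single anchored pattern test per value. The sketch below runs in plain Python outside SPSS. It compares the strict SSN pattern used in this example with a relaxed pattern, `FLEXIBLE_SSN`, which is a hypothetical illustration and not part of the original page. The relaxed pattern accepts a hyphen, a space or no separator, as long as the same separator is used in both places. `is_valid` is likewise an illustrative helper name.

```python
import re

# Strict pattern from this example: exactly 3-2-4 digits joined by hyphens.
STRICT_SSN = re.compile(r'^\d{3}-\d{2}-\d{4}$')

# Hypothetical relaxed pattern (illustration only): the separator may be a
# hyphen, a space or absent, but the backreference \1 forces the second
# separator to be identical to the first one.
FLEXIBLE_SSN = re.compile(r'^\d{3}([- ]?)\d{2}\1\d{4}$')

def is_valid(pattern, value):
    """Return True if the whole string matches the anchored pattern."""
    return pattern.match(value) is not None

samples = ['123-45-6789', '123 45 6789', '123456789', '123-45 6789', '12e-45-6789']
for s in samples:
    print(repr(s), is_valid(STRICT_SSN, s), is_valid(FLEXIBLE_SSN, s))
```

The backreference is what enforces a consistent separator. Writing `[- ]?` independently in both places would also accept mixed forms such as `123-45 6789`.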
As an example, we consider the original SSN data (variable ssn) and a more complex case: validating chemical element symbols from the periodic table (variable elem). The regular-expression idea for chemical element validation is taken from here.
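The element pattern ends some alternatives in `\s$` and others in `\s*$`. The reason, assuming values from the 3-column elem field arrive right-padded with blanks, is as follows:

- A one-letter symbol such as H becomes `'H  '`. Branches that allow an optional second letter must therefore accept any number of trailing blanks.
- Branches with a mandatory second letter expect exactly one blank.

The reduced sketch below keeps only the H and B branches. `H_OR_B` and `pad` are hypothetical names used for illustration.

```python
import re

# Reduced version of the element pattern (H and B branches only).
# re.VERBOSE ignores unescaped whitespace and newlines in the pattern, so the
# alternatives can sit on separate lines; the escaped \s still matches blanks.
H_OR_B = re.compile(r"""
    ^H[efgos]?\s*$ |
    ^B[aehikr]?\s*$
""", re.VERBOSE)

def pad(symbol, width=3):
    """Simulate a fixed-width string value by right-padding with blanks (assumption)."""
    return symbol.ljust(width)

for sym in ['H', 'He', 'Ba', 'Ht', 'Bx', 'Hee']:
    print(repr(pad(sym)), bool(H_OR_B.match(pad(sym))))
```

Here `'H  '`, `'He '` and `'Ba '` match, while `'Ht '`, `'Bx '` and `'Hee'` do not, because no branch allows those second letters.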
```
* Encoding: UTF-8.
* Data example.
DATA LIST FIXED /ssn 1-11 (A) elem 13-15 (A).
BEGIN DATA
123-45-6789 Ba
1234-5-6789 H
12e-45-6789 Te
123&45-2222 Ht
111-44-2389 Uus
122-33-678  ult
END DATA.
LIST.

BEGIN PROGRAM Python.
import spss, re

cur = spss.Cursor(accessType='w')
# Create two new variables to flag errors; type code 0 means numeric
cur.SetVarNameAndType(['error_ssn', 'error_elem'], [0, 0])
cur.CommitDictionary()
for i in range(spss.GetCaseCount()):
    row = cur.fetchone()
    # Index 0 is the first variable (ssn), index 1 the second one (elem)
    ssn = row[0]
    elem = row[1]
    # Flag an SSN that is not exactly ddd-dd-dddd
    if not re.match(r'^\d{3}-\d{2}-\d{4}$', ssn):
        cur.SetValueNumeric('error_ssn', 1)
    # re.VERBOSE ignores unescaped whitespace in the pattern,
    # so the alternatives can be laid out on indented lines
    if not re.match(r"""^A[cglmrstu]\s$|^B[aehikr]?\s*$|^C[adeflmnorsu]?\s*$|
                    ^D[bsy]\s$|^E[rsu]\s$|^F[elmr]?\s*$|^G[ade]\s$|
                    ^H[efgos]?\s*$|^I[nr]?\s*$|^Kr?\s*$|^L[airuv]\s$|
                    ^M[dgnot]\s$|^N[abdeiop]?\s*$|^Os?\s*$|^P[abdmortu]?\s*$|
                    ^R[abefghnu]\s$|^S[bcegimnr]?\s*$|^T[abcehilm]\s$|
                    ^U(u[opst])?\s*$|^V\s*$|^W\s*$|^Xe\s$|^Yb?\s*$|^Z[nr]\s$""",
                    elem, re.VERBOSE):
        cur.SetValueNumeric('error_elem', 1)
    # Write the case; flags not set above stay system-missing
    cur.CommitCase()
cur.close()
END PROGRAM.
```
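To try the patterns without SPSS, the validation loop can be reproduced in plain Python. This sketch slices the same sample records by the DATA LIST columns and uses `None` in place of system-missing. `RAW` and `check` are hypothetical names. The sketch assumes that the SPSS cursor delivers blank-padded strings just as the slicing does here.

```python
import re

# Sample records copied from the DATA LIST above (ssn in columns 1-11, elem in 13-15).
RAW = [
    "123-45-6789 Ba",
    "1234-5-6789 H",
    "12e-45-6789 Te",
    "123&45-2222 Ht",
    "111-44-2389 Uus",
    "122-33-678  ult",
]

SSN = re.compile(r'^\d{3}-\d{2}-\d{4}$')
ELEM = re.compile(r"""^A[cglmrstu]\s$|^B[aehikr]?\s*$|^C[adeflmnorsu]?\s*$|
    ^D[bsy]\s$|^E[rsu]\s$|^F[elmr]?\s*$|^G[ade]\s$|
    ^H[efgos]?\s*$|^I[nr]?\s*$|^Kr?\s*$|^L[airuv]\s$|
    ^M[dgnot]\s$|^N[abdeiop]?\s*$|^Os?\s*$|^P[abdmortu]?\s*$|
    ^R[abefghnu]\s$|^S[bcegimnr]?\s*$|^T[abcehilm]\s$|
    ^U(u[opst])?\s*$|^V\s*$|^W\s*$|^Xe\s$|^Yb?\s*$|^Z[nr]\s$""", re.VERBOSE)

def check(line):
    """Return (ssn, elem, error_ssn, error_elem); None stands in for system-missing."""
    padded = line.ljust(15)              # treat each record as 15 fixed columns
    ssn, elem = padded[0:11], padded[12:15]
    error_ssn = None if SSN.match(ssn) else 1
    error_elem = None if ELEM.match(elem) else 1
    return ssn, elem, error_ssn, error_elem

for line in RAW:
    print(check(line))
```

On these six records, error_ssn is set for the 2nd, 3rd, 4th and 6th cases. The last SSN is read as `'122-33-678 '`, so it has only three final digits. error_elem is set only for Ht and ult.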