This article explains how to simplify your life by using the overly complicated Regular Expressions with AMPscript in Salesforce Marketing Cloud.
Regular what?
Regular Expressions, or RegEx, are easily recognizable as a gibberish chain of characters in an otherwise clean and readable piece of code.
SET @Pattern = "[.*+?^${}()|[\]\\]"
They describe a pattern, such as that of an email address or a bank account number, which has a specific format and can be easily singled out.
Why use them?
Regular Expressions are commonly used to verify if a chain of characters matches a specific pattern.
In other words, we can check if a piece of text is an email address, a bank account or something completely different.
But did you know that you can also use RegEx to add some native HTML5 validation rules to your web form?
Or that RegEx can also be used to isolate the different parts of a chain of characters, like the domain name of a URL or the extension of a file?
Let’s have a look at what these weird patterns can do.
Divide and conquer
In RegEx, isolating the different parts of a chain of characters is a matter of adding parentheses. Each pair of parentheses creates a numbered group, and group 0 always holds the whole match.
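To make the numbering concrete, here is a minimal sketch that pulls the extension out of a file name. The file name and the pattern are made up for illustration and are not part of the original examples.

```ampscript
%%[
/* Hypothetical file name, used only for illustration */
SET @FileName = "report.final.pdf"

/* Group 1: everything before the last dot. Group 2: the extension. */
SET @FileRegEx = "^(.+)\.([A-Za-z0-9]+)$"

SET @WholeMatch = RegExMatch(@FileName, @FileRegEx, 0) /* report.final.pdf */
SET @BaseName   = RegExMatch(@FileName, @FileRegEx, 1) /* report.final */
SET @Extension  = RegExMatch(@FileName, @FileRegEx, 2) /* pdf */
]%%
```

Groups are numbered from left to right by their opening parenthesis, which is why the extension ends up in group 2.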
Consider a URL with some parameters and an anchor value.
https://example.net/categories?cat=25&context=email#flag
Imagine that our goal is to remove the anchor and the parameters from the URL. This usually requires some string manipulations.
SET @URL = "https://example.net/categories?cat=25&context=email#flag"
IF IndexOf(@URL,'?') > 0 THEN
SET @CleanURL = Substring(@URL,1,Subtract(IndexOf(@URL,'?'),1))
ELSE
SET @CleanURL = @URL
ENDIF
That seems overly complicated for something so simple, and it still ignores a URL that has an anchor but no parameters. Now let’s use RegEx and feel the difference.
SET @URL = "https://example.net/categories?cat=25&context=email#flag"
SET @URLRegEx = "^(http[s]?:\/\/?[^:\/\s]+[^\/]*?\/\w+\.)*([^#?\s]+)(\?([^#]*))?(#(.*))?$"
SET @CleanURL = RegExMatch(@URL, @URLRegEx, 2)
In this code, the RegExMatch function returns group #2, which corresponds to everything that comes before the ? or # character. If the chain of characters does not match the pattern, RegExMatch returns an empty string, which gives us a simple validity check.
SET @URL = "https://example.net/categories?cat=25&context=email#flag"
SET @URLRegEx = "^(http[s]?:\/\/?[^:\/\s]+[^\/]*?\/\w+\.)*([^#?\s]+)(\?([^#]*))?(#(.*))?$"
IF NOT EMPTY(RegExMatch(@URL, @URLRegEx, 0)) THEN
/* THE URL IS VALID */
ENDIF
We can also use other groups from the pattern to isolate the different parts of the URL.
SET @URL = "https://example.net/categories?cat=25&context=email#flag"
SET @URLRegEx = "^(http[s]?:\/\/?[^:\/\s]+[^\/]*?\/\w+\.)*([^#?\s]+)(\?([^#]*))?(#(.*))?$"
SET @CleanURL = RegExMatch(@URL, @URLRegEx, 2) /* https://example.net/categories */
SET @Params = RegExMatch(@URL, @URLRegEx, 4) /* cat=25&context=email */
SET @Anchor = RegExMatch(@URL, @URLRegEx, 6) /* flag */
Needless to say, this opens up a vast range of possibilities for later.
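One such possibility, sketched here under assumptions: once the parameters are isolated in group #4, they can be split into key/value pairs. The @PairRegEx pattern and the loop are my own illustration, not part of the original pattern set.

```ampscript
%%[
SET @URL = "https://example.net/categories?cat=25&context=email#flag"
SET @URLRegEx = "^(http[s]?:\/\/?[^:\/\s]+[^\/]*?\/\w+\.)*([^#?\s]+)(\?([^#]*))?(#(.*))?$"
SET @Params = RegExMatch(@URL, @URLRegEx, 4) /* cat=25&context=email */

/* Hypothetical pattern for a single key=value pair */
SET @PairRegEx = "^([^=]+)=(.*)$"

IF NOT EMPTY(@Params) THEN
  /* Split on & so that each row holds one key=value pair */
  SET @Pairs = BuildRowsetFromString(@Params, "&")
  FOR @i = 1 TO RowCount(@Pairs) DO
    SET @Pair  = Field(Row(@Pairs, @i), 1)
    SET @Key   = RegExMatch(@Pair, @PairRegEx, 1)
    SET @Value = RegExMatch(@Pair, @PairRegEx, 2)
    Output(Concat(@Key, ": ", @Value, "<br>"))
  NEXT @i
ENDIF
]%%
```

For the sample URL this should write one line for cat and one for context. A parameter without an = sign would not match @PairRegEx and would produce an empty key and value, so a real implementation may want to handle that case.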
RegEx use cases
Here are the most common use cases of Regular Expressions. Please keep in mind that some of these patterns are only valid for European data formats.
Email address
SET @Email = "john.doe@freemail.com"
SET @EmailRegEx = "^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,})$"
SET @EmailAddress = RegExMatch(@Email, @EmailRegEx, 0) /* john.doe@freemail.com */
SET @EmailName = RegExMatch(@Email, @EmailRegEx, 1) /* john.doe */
SET @EmailDomainPt1 = RegExMatch(@Email, @EmailRegEx, 2) /* freemail */
SET @EmailDomainPt2 = RegExMatch(@Email, @EmailRegEx, 3) /* com */
URL
SET @URL = "https://example.net/categories?cat=25&context=email#flag"
SET @URLRegEx = "^(http[s]?:\/\/?[^:\/\s]+[^\/]*?\/\w+\.)*([^#?\s]+)(\?([^#]*))?(#(.*))?$"
SET @isURL = RegExMatch(@URL, @URLRegEx, 2) /* https://example.net/categories */
SET @Params = RegExMatch(@URL, @URLRegEx, 4) /* cat=25&context=email */
SET @Anchor = RegExMatch(@URL, @URLRegEx, 6) /* flag */
Phone number
SET @Phone = "0032498492823"
SET @PhoneRegEx = "^(\+[0-9]{2}[.\-\s]?|00[.\-\s]?[0-9]{2}|0)([0-9]{1,3}[.\-\s]?(?:[0-9]{2}[.\-\s]?){4})$"
SET @isPhone = RegExMatch(@Phone, @PhoneRegEx, 0) /* 0032498492823 */
SET @PhonePrefix = RegExMatch(@Phone, @PhoneRegEx, 1) /* 0032 */
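The two groups of the phone pattern can also be recombined into a single normalized format. The sketch below is only an illustration: the sample number, the variable names and the choice of +32 (Belgium) as the default country code for numbers starting with a single 0 are all assumptions.

```ampscript
%%[
SET @Phone = "+32 498 49 28 23"
SET @PhoneRegEx = "^(\+[0-9]{2}[.\-\s]?|00[.\-\s]?[0-9]{2}|0)([0-9]{1,3}[.\-\s]?(?:[0-9]{2}[.\-\s]?){4})$"

SET @Prefix = RegExMatch(@Phone, @PhoneRegEx, 1)
SET @Local  = RegExMatch(@Phone, @PhoneRegEx, 2)

IF NOT EMPTY(@Local) THEN
  /* Strip the separators the pattern tolerates: space, dot and dash */
  SET @Digits      = Replace(Replace(Replace(@Local, " ", ""), ".", ""), "-", "")
  SET @CleanPrefix = Replace(Replace(Replace(@Prefix, " ", ""), ".", ""), "-", "")

  IF @CleanPrefix == "0" THEN
    /* Assumption: a national number belongs to Belgium */
    SET @CleanPrefix = "+32"
  ELSEIF Substring(@CleanPrefix, 1, 2) == "00" THEN
    /* 0032 becomes +32 */
    SET @CleanPrefix = Concat("+", Substring(@CleanPrefix, 3, 2))
  ENDIF

  SET @NormalizedPhone = Concat(@CleanPrefix, @Digits) /* +32498492823 */
ENDIF
]%%
```

Note that the pattern only accepts two-digit country codes, so this sketch inherits that limitation.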
VAT Number
SET @VATNumber = "BE1234567890"
SET @VATRegEx = "^([A-Z]{2})([0-9A-Z]{8,12})$"
SET @isVATNumber = RegExMatch(@VATNumber, @VATRegEx, 0) /* BE1234567890 */
SET @CountryCode = RegExMatch(@VATNumber, @VATRegEx, 1) /* BE */
IBAN
SET @IBANNumber = "BE68539007547034"
SET @IBANRegEx = "^([A-Z]{2})(\d{2})([A-Z\d]+)$"
SET @isIBAN = RegExMatch(@IBANNumber, @IBANRegEx, 0) /* BE68539007547034 */
SET @CountryCode = RegExMatch(@IBANNumber, @IBANRegEx, 1) /* BE */
SET @Key = RegExMatch(@IBANNumber, @IBANRegEx, 2) /* 68 */
Form validation
When it comes to form validation rules, there are two ways to go: JavaScript or Native. Each method has its pros and cons, but when it comes to Regular Expressions, the Native way is easier and cleaner.
Copy/paste the RegEx in the pattern attribute and watch the magic work.
Just remember to add the required attribute to the tag as well: the pattern is only checked when the field contains a value, so without required an empty field would pass validation.
<form action="" method="post">
<label>Email</label>
<input type="text" pattern="^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,})$" name="EmailAddress" placeholder="ex: john.doe@mail.com" required>
<button>Send</button>
</form>
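Native validation only runs in the browser, so it can be worth repeating the same check in AMPscript when the form is submitted. Here is a minimal sketch, assuming the form posts back to the same page (as the empty action attribute suggests); the @Submitted and @Status variables are hypothetical names.

```ampscript
%%[
/* EmailAddress is the name attribute of the input field in the form */
SET @Submitted = RequestParameter("EmailAddress")
SET @EmailRegEx = "^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,})$"

IF EMPTY(@Submitted) THEN
  /* Nothing was posted, or the field was left empty */
  SET @Status = "missing"
ELSEIF EMPTY(RegExMatch(@Submitted, @EmailRegEx, 0)) THEN
  /* A value was posted but it does not match the pattern */
  SET @Status = "invalid"
ELSE
  SET @Status = "valid"
ENDIF
]%%
<p>Email status: %%=v(@Status)=%%</p>
```

Reusing the exact same pattern on both sides keeps the browser and the server in agreement about what counts as a valid email address.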
Have I missed anything?
Please poke me with a sharp comment below or use the contact form.
If I understand it correctly, a reg ex is a more fundamental and universal concept than is indicated here. I believe every input or output message or data flow that is readable or writable by a computer can be described as a reg ex: a logical structure in which the atomic elements are organised in a hierarchy composed of sequence, selection and iteration components. Please do correct me if I am wrong.
You are not wrong and I agree with you, but I believe putting it this way would scare people away 🙂 Sometimes, less is more.
Exactly the info I was after. Thanks for writing!