Regular Expressions in Tableau

Today we’ll be learning how to use Regular Expressions or RegEx in Tableau. RegEx is very useful when we are working with messy data and want to extract some information from it. This tutorial will we be very helpful for basic data cleaning in Tableau.

If you check Tableau Calculated field we have four different use cases for Regular Expressions as shown below.

1-min


 

REGEXP_EXTRACT

Using this function we can extract the desired value from a string by giving patter as input. In this tutorial we’ll be extracting Mail id and Phone number from the string we have.

The Syntax of the function is REGEXP_EXTRACT(string,pattern) which return the values which matches to the pattern. We can use the function in the following examples.

 

Mail id Extract ( e.g. ab_898cd@efghij.klmo)

REGEXP_EXTRACT
(
[Messy Data],
'([A-Za-Z0-9 ._% + -] + @ [A-Za-Z0-9 .-] + \. [Az] {2,4})'
)

 

Following pattern is easy to read and understand:

[A-Za-z0-9._%+-]+ – Find all the string which contain values between a to z, A to Z, 0 to 9 or contains the symbols(.,_,%,+,-) and plus indicates more than 1.

@[A-Za-z0-9.-]+ – Here the syntax will find values which contain previous values with @ and other values.
\. – Find the ‘.’
[a-z]{2,4} – Find a to z values whose length is between 2 to 4. (e.g. co, com, abcd).

 

Phone. No. extract- ( e.g. 111-2222-3333)

REGEXP_EXTRACT([Messy Data],'(\d{3}\-\d{4}\-\d{4})')

\d{3} – Find digit of length 3
\-\d{4} – Find the hyphen(-) first then digit of length 4
\-\d{4} – Find the hyphen(-) first then digit of length 4


REGEXP_MATCH

Using REGEXP_MATCH we can find if the string the matches with entered pattern or not. If the string matches with the pattern it return true else false. The syntax is REGEXP_MATCH(string,pattern)

It will return true of the string contains email id
REGEXP_MATCH
(
[Messy Data],
'([A-Za-Z0-9 ._% + -] + @ [A-Za-Z0-9 .-] + \. [Az] {2,4})'
)

REGEXP_REPLACE

Using REGEXP_REPLACE we can replace a pattern to a string. The syntax is REGEXP_REPLACE(string,pattern,replacement).

Here email id will be replaced with a space.
REGEXP_REPLACE
(
[Messy Data],
'([A-Za-Z0-9 ._% + -] + @ [A-Za-Z0-9 .-] + \. [Az] {2,4})',
''
)

Nested REGEXP_REPLACE

We can use REGEXP_REPLACE inside a REGEXP_REPLACE as shown below. Here mail id will be replaced by space first then phone number by space. So the final string wont be having mail id or phone number.

REGEXP_REPLACE(
REGEXP_REPLACE(
[Messy Data],
"([A-Za-Z0-9 ._% + -] @ + [A-Za-z0-9 .-] + \. [Az] {2,4}) ',' '),
'(\ D {3} \ - \ d {4} \ - \ d {4})', '')

 

2-min

 

 

Further reading

Regular Expressions Cheat Sheet

Tableau Regular Expressions

 

Try Live Live RegEx testers to check your regular expression before using in Tableau

RegExr

RegEx101

Regex Tester

 

Thanks for reading! Comment your doubts and suggestions. Keep visiting Analytics Tuts for analytics tutorials.

 

 

3 comments

  • I need to replace only the “,” in the i/p String : [“xyc”,”abc”,”cvx”],”fhdj”,”ffh” in this string by a “;” only for the commas in the [ ] not the out side ones.

    i/p String : [“xyc”,”abc”,”cvx”],”fhdj”,”ffh”

    o/p String : [“xyc”;”abc”;”cvx”],”fhdj”,”ffh”

    • Hi Sandy,

      Using this Regex you can get the commas inside the square brackets: [.,](?=[^\[]*\])
      and replace formula will be: [.;](?=[^\[]*\])

      Let me know if this helps.
      Thanks

  • Hola tengo un campo con valores como el siguiente:
    “ECI – Antes de las 06:00 100%
    LCO – Después de las 23:59 100%”

    y quiero extraer los valores (uno en cada campo de calculado):
    06:00
    100
    23:59
    100

    pero no logro identificar como el enter que va o bien los siguientes valores mencionados

    Gracias espero puedan ayudarme

Leave a Reply

Your email address will not be published. Required fields are marked *