Python Regular Expressions: Examples & Reference


WIP Alert: This is a work in progress. Current information is correct but more content may be added in the future.

Usage examples for regular expressions in Python.

Unless otherwise stated, examples use Python 3.

String matches a regex

import re

# re.match(regex,string) only matches at the start of the string
if re.match(r'\w+','foobar'):
    print('match')
else:
    print('no match')

Extract group matches

re.match(regex,string)

import re

# this pattern matches things like "foo-bar" or "bar-baz"

# note that we have 2 capture groups, delimited by parens
pattern = r"^(\w{3})-(\w{3})$"

string1 = "abc-def"

matches = re.match(pattern,string1)

# re.match() returns None if there is no match
# so you can use a simple 'if'
if matches:
    # group indices start at 1; group(0) is the whole match
    first_group_match = matches.group(1) 

    second_group_match = matches.group(2)

    print(first_group_match+" AND "+second_group_match) 

    # prints: "abc AND def"
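
As a small sketch building on the example above, the Match object can also return the whole matched text with group(0) and every captured group at once with groups(). The variable names here are only illustrative.

```python
import re

pattern = r"^(\w{3})-(\w{3})$"
matches = re.match(pattern, "abc-def")

whole = None
parts = None

if matches:
    whole = matches.group(0)   # the entire matched text
    parts = matches.groups()   # all capture groups, as a tuple

print(whole)   # abc-def
print(parts)   # ('abc', 'def')
```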

Replace occurrences of a regex

If you just need to replace a static string, str.replace() is much faster.

Use re.sub(regex,replacement,string). This will replace all occurrences of regex with the replacement string.

import re

# re.sub(regex,replacement,string)
re.sub(r'\d','#','123foo456')
# returns '###foo###'

# a simple pattern for html tags
re.sub('<[^>]+>','','foo <p> bar')
# returns 'foo  bar'

Replace with capture groups

Groups are referenced like this: '\1' for first group, '\2' for second group, etc.

Note the r'' (raw string) prefix on the pattern and replacement strings, so that backslashes reach the regex engine unchanged.

Groups are 1-indexed (start at 1, not 0)

import re

# match strings like "foo.txt" and "bar.csv"
extension_pattern = r'(^.+)\.(.+)$'

# replace the extension with "MYEXT"
re.sub(extension_pattern,r'\1.MYEXT',"foo.txt")
# returns 'foo.MYEXT'

# replace the file name with "MYFILE"
re.sub(extension_pattern,r'MYFILE.\2',"foo.txt")
# returns 'MYFILE.txt'

Match anywhere in the string

This finds the first occurrence of a pattern anywhere in the string.

Use re.search(pattern,string) rather than re.match, which only matches at the start of the string.

import re

# letter 'b' followed by two word characters (letters, digits or underscore)
pattern = r'b\w{2}'

re.search(pattern,"foo bar baz quux")
# >>> <re.Match object; span=(4, 7), match='bar'>

re.match(pattern,"foo bar baz quux")
# >>> None
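
To get at the matched text rather than the Match object's repr, here is a minimal sketch using the same pattern and input; group() and span() are standard Match methods, while the variable names are made up for illustration.

```python
import re

pattern = r'b\w{2}'
m = re.search(pattern, "foo bar baz quux")

if m is not None:
    matched_text = m.group()   # text of the first match
    start, end = m.span()      # where it was found in the string
    print(matched_text, start, end)   # bar 4 7
```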

List all occurrences, anywhere in the string

Link to docs: re.findall

Use re.findall(pattern,string). Note that this returns a list of strings, rather than a list of Match objects. If the pattern contains capture groups, the groups are returned instead of the full matches.

import re

# letter 'b' followed by two word characters (letters, digits or underscore)
pattern = r'b\w{2}'

re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']

Note: re.finditer(pattern,string) works the same way, but it returns an iterator of Match objects instead of a list of matched strings.
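
A short sketch of re.finditer with the same pattern, collecting each match's text together with its start position (something re.findall alone doesn't give you). The names text and found are only illustrative.

```python
import re

pattern = r'b\w{2}'
text = "foo bar baz quux"

# each item yielded by finditer is a Match object
found = [(m.group(), m.start()) for m in re.finditer(pattern, text)]

print(found)   # [('bar', 4), ('baz', 8)]
```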

Split by regex

Just use re.split(pattern, string). Returns a list of the pieces of the string found between matches of the pattern.
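
A minimal sketch of re.split; the separator pattern and input strings are invented for illustration. The second call shows that wrapping the separator in a capture group keeps the separators in the result.

```python
import re

# split on a comma or semicolon, plus any surrounding whitespace
parts = re.split(r'\s*[,;]\s*', "foo, bar;baz ,  quux")
print(parts)   # ['foo', 'bar', 'baz', 'quux']

# with a capture group, the separators are included in the list
kept = re.split(r'(-)', "a-b-c")
print(kept)    # ['a', '-', 'b', '-', 'c']
```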

Non-capturing groups

TEMPLATE: (?:PATTERN)

Use non-capturing groups when you need to group part of a pattern, for alternation or to apply a quantifier (something a set of characters in square brackets can't express), and you don't need to capture what it matched.

Non-capturing groups are also slightly faster to compute than capturing groups.

import re

# Replace "foo" when it's got non-word characters or line boundaries to the left and to the right
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "

string = 'foo bar foo foofoo barfoobar foo'

re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
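
One place where the difference between capturing and non-capturing groups is easy to see is re.findall, which returns group contents when the pattern has a capture group. The sketch below uses a made-up input string to compare the two forms.

```python
import re

text = "ababab cd ab"

# capturing group: findall returns what the group captured
# (for a repeated group, that is its last repetition)
captured = re.findall(r'(ab)+', text)
print(captured)   # ['ab', 'ab']

# non-capturing group: findall returns the full matches
full = re.findall(r'(?:ab)+', text)
print(full)       # ['ababab', 'ab']
```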

Lookbehind

TEMPLATE: (?<=PATTERN)

Match something ONLY if it is IMMEDIATELY PRECEDED by something else.

import re

# replace "bar" with "BAR" if it's preceded by "foo"
pattern = "(?<=foo)bar"
replacement = "BAR"

string = "foo bar foobar"

re.sub(pattern,replacement,string)
# >>> 'foo bar fooBAR'

Lookahead

TEMPLATE: (?=PATTERN)

Match something ONLY if it is IMMEDIATELY FOLLOWED by something else.

import re

# match "foo" only if it is followed by "bar"
pattern = "foo(?=bar)"
replacement = "FOO"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# >>> 'foo bar FOObar'
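
Lookaheads and lookbehinds only check the surrounding text; they are not part of the match itself. A small sketch with the same input string, inspecting what each pattern actually matched:

```python
import re

string = "foo bar foobar"

# lookahead: "bar" must follow, but only "foo" is matched
ahead = re.search(r'foo(?=bar)', string)
print(ahead.group(), ahead.span())     # foo (8, 11)

# lookbehind: "foo" must precede, but only "bar" is matched
behind = re.search(r'(?<=foo)bar', string)
print(behind.group(), behind.span())   # bar (11, 14)
```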

Replacing a static string

If you just need to replace a static string, calling str.replace() instead of re.sub is much faster.

"foo bar baz".replace("foo","FOO")
# >>> 'FOO bar baz'
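
If you want to check the speed difference on your own machine, here is a rough sketch using the standard timeit module. The input size and repetition count are arbitrary assumptions, and the timings will vary between systems, so no expected numbers are shown.

```python
import re
import timeit

# arbitrary test input: a short phrase repeated many times
text = "foo bar baz " * 1000

# for a static string, both approaches produce the same result
same_result = text.replace("foo", "FOO") == re.sub("foo", "FOO", text)

# time 1000 runs of each approach
replace_time = timeit.timeit(lambda: text.replace("foo", "FOO"), number=1000)
sub_time = timeit.timeit(lambda: re.sub("foo", "FOO", text), number=1000)

print(f"same result: {same_result}")
print(f"str.replace: {replace_time:.4f}s")
print(f"re.sub:      {sub_time:.4f}s")
```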
