Python Regular Expressions: Examples & Reference


Usage examples for regular expressions in Python.

Unless otherwise stated, examples use Python 3

String matches regex

re.match only checks for a match at the beginning of the string. To require that the whole string matches, use re.fullmatch or end the pattern with $.

Example pattern: one or more alphanumeric characters

import re

# match(pattern, string)
if re.match(r'\w+', 'foobar'):
    print('match')
else:
    print('no match')
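Note that re.match only anchors at the start of the string. When the entire string has to match (for input validation, say), re.fullmatch is one option. A minimal sketch; the helper name is_alphanumeric is made up for illustration:

```python
import re

def is_alphanumeric(text):
    """Return True only if every character in text is a word character."""
    return re.fullmatch(r'\w+', text) is not None

# re.match only anchors at the start, so trailing junk is still accepted
prefix_only = re.match(r'\w+', 'foobar!!') is not None

print(is_alphanumeric('foobar'))    # True
print(is_alphanumeric('foobar!!'))  # False
print(prefix_only)                  # True
```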

String contains regex

Use re.search(pattern,string)

Example pattern: a sequence of 3 digits:

import re

# returns a Match object
re.search(r'\d{3}', 'foo 123 bar')
# >>> <re.Match object; span=(4, 7), match='123'>

# no match, returns None
re.search(r'\d{3}', 'foo 1 23 bar')
# >>> None
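The Match object returned by re.search carries the matched text and its position. A short sketch of how it might be unpacked (the variable names are illustrative):

```python
import re

match = re.search(r'\d{3}', 'foo 123 bar')

# re.search returns None when nothing matches, so check before using it
if match is not None:
    matched_text = match.group()   # the substring that matched
    start, end = match.span()      # indices into the original string
    print(matched_text, start, end)
    # prints: 123 4 7
```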

Extract group

Use re.match(pattern,string). re.match only anchors at the start of the string; the ^ and $ in the pattern below are what force it to match the whole string.

Example pattern: 3 characters followed by a dash then another 3 characters

import re

# this pattern matches things like "foo-bar" or "bar-baz"
pattern = r"^(\w{3})-(\w{3})$"

string1 = "abc-def"

# returns None if no match
matches = re.match(pattern, string1)

if matches:
    # group numbers start at 1; group(0) is the whole match
    first_group_match = matches.group(1)
    # abc

    second_group_match = matches.group(2)
    # def

    print(first_group_match+" AND "+second_group_match) 

    # prints: "abc AND def"
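Groups can also be fetched all at once, or given names so that the code does not depend on their position. A hedged sketch of the same pattern with named groups; the names left and right are invented for this example:

```python
import re

# same pattern as above, with each group given a (hypothetical) name
pattern = r'^(?P<left>\w{3})-(?P<right>\w{3})$'
matches = re.match(pattern, 'abc-def')

if matches:
    all_groups = matches.groups()     # tuple of every group, in order
    by_name = matches.group('left')   # look a group up by its name
    as_dict = matches.groupdict()     # mapping of name -> matched text
    print(all_groups, by_name, as_dict)
```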

Extract the first occurrence of regex in string

This matches a pattern anywhere in the string, but only once.

Use re.search(pattern,string) rather than re.match

Example pattern: letter 'b' followed by two alphanumeric characters

import re

pattern = r'b\w{2}'

re.search(pattern, "foo bar baz quux")
# >>> <re.Match object; span=(4, 7), match='bar'>

Extract all occurrences of regex in string

Link to docs: re.findall

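Use re.findall(pattern,string). Note that this returns a list of strings rather than a list of Match objects. If the pattern contains capturing groups, the list holds the group contents instead of the whole match.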

Example pattern: letter 'b' followed by two alphanumeric characters

import re

# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'

re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']

Note: re.finditer(pattern,string) works the same way, but returns an iterator of Match objects instead of a list of the matched strings.
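As an illustration of re.finditer, the sketch below collects both the matched text and its position for each occurrence, which re.findall alone would not provide:

```python
import re

pattern = r'b\w{2}'
text = "foo bar baz quux"

# each item yielded by finditer is a Match object
found = [(m.group(), m.span()) for m in re.finditer(pattern, text)]
print(found)
# [('bar', (4, 7)), ('baz', (8, 11))]
```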

Replace all occurrences of regex in string

If you just need to replace a static string, str.replace() is simpler and typically faster.

Use re.sub(pattern,replacement,string). This will replace all occurrences of regex with the replacement string.

import re

# Example pattern: a digit
re.sub(r'\d', '#', '123foo456')
# returns '###foo###'

# Example pattern: anything that looks like an HTML tag (text between angle brackets)
re.sub(r'<[^>]+>', '', 'foo <p> bar')
# returns 'foo  bar'
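If not every occurrence should be replaced, re.sub accepts a count argument, and re.subn additionally reports how many replacements were made. A small sketch reusing the digit example:

```python
import re

# count limits the number of replacements (passed by keyword here)
first_two = re.sub(r'\d', '#', '123foo456', count=2)
print(first_two)  # ##3foo456

# re.subn returns the new string together with the number of replacements
result, n = re.subn(r'\d', '#', '123foo456')
print(result, n)  # ###foo### 6
```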

Replace using captured groups

Groups are referenced like this: '\1' for first group, '\2' for second group, etc.

Note the use of the r'' prefix (raw string), so that backslashes reach the regex engine unchanged.

Groups are 1-indexed (start at 1, not 0)

Example pattern: one or more characters, a period (".") then one or more further characters. (In other words, a file name and its extension)

import re

# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'

# replace the extension with "MYEXT"
re.sub(pattern,r'\1.MYEXT',"foo.txt")
# returns foo.MYEXT

# replace the file name with "MYFILE"
re.sub(pattern,r'MYFILE.\2',"foo.txt")
# returns MYFILE.txt
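One pitfall with numbered references: when a group reference is followed directly by a digit, '\1' becomes ambiguous ('\12' is read as group 12). The \g<1> form avoids that. A sketch, using the same pattern with the caret moved outside the group:

```python
import re

pattern = r'^(.+)\.(.+)$'

# \g<1> refers to group 1 explicitly, so the literal "2" after it is safe
renamed = re.sub(pattern, r'\g<1>2.\2', 'foo.txt')
print(renamed)  # foo2.txt
```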

Split string by regex

Use re.split(pattern, string). Returns a list of the substrings found between the matches of the pattern.

Example pattern: split string by spaces or commas

import re

re.split(r'[\s,]+', 'foo,bar bar  quux')
# ['foo', 'bar', 'bar', 'quux']
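Two details of re.split that are easy to miss: a delimiter at the very start or end of the string yields empty strings in the result, and maxsplit caps the number of splits. A brief sketch:

```python
import re

# leading/trailing delimiters produce empty strings at the edges
parts = re.split(r'[\s,]+', ' foo,bar ')
print(parts)  # ['', 'foo', 'bar', '']

# one way to drop the empty strings
cleaned = [p for p in parts if p]
print(cleaned)  # ['foo', 'bar']

# maxsplit=1 splits only at the first delimiter
head_tail = re.split(r'[\s,]+', 'foo,bar bar  quux', maxsplit=1)
print(head_tail)  # ['foo', 'bar bar  quux']
```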

Non-capturing groups

TEMPLATE: (?:PATTERN)

Use a non-capturing group when you need to group part of a pattern (for alternation, or to apply a quantifier to something longer than a single character, where a character set in square brackets would not do) but you don't need to capture what it matched.

Non-capturing groups can also be slightly cheaper than capturing groups, since the engine does not have to record what they matched.

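Example pattern: the string "foo" with a non-word character or a line boundary on each side. Note that the neighbouring characters are part of the match, which is why the replacement includes spaces.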

import re

# Replace "foo" when it's got non-words or line boundaries to the left and to the right
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "

string = 'foo bar foo foofoo barfoobar foo'

re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
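The difference between the two kinds of group is easiest to see with re.findall, which returns group contents whenever the pattern has capturing groups. A small sketch with an illustrative pattern:

```python
import re

text = 'ababc abc'

# non-capturing group: findall returns the whole match
whole = re.findall(r'(?:ab)+c', text)
print(whole)  # ['ababc', 'abc']

# capturing group: findall returns only what the group captured
# (for a repeated group, that is its last repetition)
captured = re.findall(r'(ab)+c', text)
print(captured)  # ['ab', 'ab']
```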

Lookbehind

TEMPLATE: (?<=PATTERN)

Match pattern ONLY if it IS IMMEDIATELY PRECEDED by something.

Example pattern: "bar" only if it's preceded by "foo"

import re

# replace "bar" with "BAR" if it IS preceded by "foo"
pattern = "(?<=foo)bar"
replacement = "BAR"

string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'foo bar fooBAR'
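Lookbehind is also handy for extraction, since the preceding text is checked but not included in the match. One restriction in Python's re module is that the lookbehind pattern must have a fixed width. A sketch with made-up sample text:

```python
import re

# digits only when immediately preceded by a dollar sign
prices = re.findall(r'(?<=\$)\d+', 'cost $30 or 40')
print(prices)  # ['30']

# a variable-width lookbehind such as (?<=fo+) is rejected by the re module
try:
    re.compile(r'(?<=fo+)bar')
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```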

Negative lookbehind

TEMPLATE: (?<!PATTERN)

Match pattern if it's NOT IMMEDIATELY PRECEDED by something.

Example pattern: "bar" if it's NOT preceded by "foo"

import re

# replace "bar" with BAR if it is NOT preceded by "foo"
pattern = "(?<!foo)bar"
replacement = "BAR"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'foo BAR foobar'

Lookahead

TEMPLATE: (?=PATTERN)

Match pattern if it IS IMMEDIATELY FOLLOWED by something else.

Example pattern: "foo" only if it's followed by "bar"

import re

# replace "foo" only if it IS followed by "bar"
pattern = "foo(?=bar)"
replacement = "FOO"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'foo bar FOObar'
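A lookahead checks the text that follows without consuming it, so that text is not part of the match. A sketch using an invented list of file names:

```python
import re

# the lookahead is checked, but only "foo" is part of the match
matched = re.search(r'foo(?=bar)', 'foobar').group()
print(matched)  # foo

# base names of files that end in ".txt"
names = re.findall(r'\w+(?=\.txt)', 'a.txt b.csv c.txt')
print(names)  # ['a', 'c']
```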

Negative Lookahead

TEMPLATE: (?!PATTERN)

Match pattern if it IS NOT IMMEDIATELY FOLLOWED by something else.

Example pattern: "foo" only if it's NOT followed by "bar"

import re

# replace "foo" only if it is NOT followed by "bar"
pattern = "foo(?!bar)"
replacement = "FOO"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'FOO bar foobar'

re.match only matches at the beginning of the string; re.search finds the first match of the pattern anywhere in the string

import re

# returns None
re.match('abc', 'xx abc xx')

# returns a Match object
re.search('abc', 'xx abc xx')

Case-insensitive regular expressions

Just add (?i) at the very start of the pattern string.

Example pattern: match strings beginning with any variation of "foo", such as "FOO","Foo", etc.

import re

# matches
re.match("(?i)^foo.*","Foo")

# matches too
re.match("(?i)^foo.*","FOO")

# also this
re.match("(?i)^foo.*","foo")
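An alternative to the inline (?i) marker is the re.IGNORECASE flag (also spelled re.I), which can be passed to re.compile or to the matching functions directly. A minimal sketch:

```python
import re

# compile once with the flag, then reuse the compiled pattern
pattern = re.compile(r'^foo.*', re.IGNORECASE)

results = [bool(pattern.match(s)) for s in ("Foo", "FOO", "foo", "bar")]
print(results)  # [True, True, True, False]
```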

Replacing a static string:

If you just need to replace a static string, calling str.replace() instead of re.sub is simpler and typically faster.

"foo bar baz".replace("foo", "FOO")
# returns 'FOO bar baz'
