Python Regular Expressions: Examples & Reference


Usage examples for regular expressions in Python.

Unless otherwise stated, examples use Python 3.

String matches regex

Use re.match(pattern,string). This function returns a Match object if there is a match, or None if there is no match.

In re.match, the pattern must match the beginning of the string.

Example pattern: one or more alphanumeric characters

import re

# match(pattern,string)
# \w matches a letter, a digit or an underscore; note the raw string
if re.match(r'\w+','foobar'):
    print('match')
else:
    print('no match')

Whole string matches regex

In this case, there is only a match if the string fully matches the given pattern.

Example pattern: "foo" followed by one or more digits

  • If on Python 3.4 or newer, use re.fullmatch(pattern,string):

    import re
    # Python version >=3.4
    
    # match because the string FULLY matches the pattern
    re.fullmatch(r'foo\d+','foo123')
    # >>> <re.Match object; span=(0, 6), match='foo123'>
    
    # no match because although the pattern matches the beginning of the
    # string, there are extra characters at the end ("bar")
    re.fullmatch(r'foo\d+','foo123bar')
    # >>> None
    
  • On older Python versions, wrap the pattern with ^ and $ and use re.match(pattern,string)

    import re
    # Python version < 3.4
    
    # match because the string FULLY matches the pattern
    re.match(r'^foo\d+$','foo123')
    # >>> <_sre.SRE_Match at 0x7f690806b648>
    
    # no match because although the pattern matches the beginning of the
    # string, there are extra characters at the end ("bar")
    re.match(r'^foo\d+$','foo123bar')
    # >>> None
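The two approaches above are close but not strictly equivalent: $ also matches just before a trailing newline, whereas re.fullmatch requires the pattern to consume the entire string. A minimal sketch of the difference (the variable names are only illustrative):

```python
import re

# '$' matches at the end of the string AND just before a trailing newline,
# so the anchored pattern still matches here
anchored = re.match(r'^foo\d+$', 'foo123\n')

# fullmatch must consume the whole string, including the newline
full = re.fullmatch(r'foo\d+', 'foo123\n')

# '\Z' only matches at the very end of the string
strict = re.match(r'foo\d+\Z', 'foo123\n')

print(anchored is not None)  # True
print(full)                  # None
print(strict)                # None
```

If trailing newlines matter for your input, re.fullmatch or \Z is probably the safer choice.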
    

String contains regex

Use re.search(pattern,string)

Example pattern: a sequence of 3 digits:

import re

# returns a match
re.search(r'\d{3}','foo 123 bar')
# >>> <re.Match object; span=(4, 7), match='123'>

# no match, returns None
re.search(r'\d{3}','foo 1 23 bar')
# >>> None
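The Match object returned by re.search carries the matched text and its position. A short sketch of how it might be inspected (the variable name m is just for illustration):

```python
import re

m = re.search(r'\d{3}', 'foo 123 bar')

# re.search returns None when nothing matches, so check before using it
if m:
    print(m.group())           # 123
    print(m.span())            # (4, 7)
    print(m.start(), m.end())  # 4 7
```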

Extract group

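Use re.match(pattern,string) and read the captured groups from the returned Match object. Note that re.match only anchors the pattern at the beginning of the string; the example pattern ends with $ so that it has to match the whole string.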

Example pattern: 3 characters followed by a dash then another 3 characters

import re

# this pattern matches things like "foo-bar" or "bar-baz"
pattern = r"^(\w{3})-(\w{3})$"

string1 = "abc-def"

# returns None if no match
matches = re.match(pattern,string1)

if matches:
    # group indices start at 1 (group 0 is the whole match)
    first_group_match = matches.group(1) 
    # abc

    second_group_match = matches.group(2)
    # def

    print(first_group_match+" AND "+second_group_match) 

    # prints: "abc AND def"
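Groups can also be given names with (?P<name>...), which tends to be easier to read than numeric indices. A minimal sketch using the same pattern; the group names left and right are made up for this example:

```python
import re

# same pattern as above, with (hypothetical) names for the two groups
pattern = r'^(?P<left>\w{3})-(?P<right>\w{3})$'

m = re.match(pattern, 'abc-def')

if m:
    print(m.group('left'))   # abc
    print(m.group('right'))  # def
    print(m.groups())        # ('abc', 'def')
    print(m.groupdict())     # {'left': 'abc', 'right': 'def'}
    print(m.group(0))        # abc-def
```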

Extract the first occurrence of regex in string

This matches a pattern anywhere in the string, but only once.

Use re.search(pattern,string) rather than re.match

Example pattern: letter 'b' followed by two alphanumeric characters

import re

pattern = r'b\w{2}'

re.search(pattern,"foo bar baz quux")
# >>> <re.Match object; span=(4, 7), match='bar'>

Extract all occurrences of regex in string

Link to docs: re.findall

Use re.findall(pattern,string). Note that this returns a list of strings rather than a list of Match objects.

Example pattern: letter 'b' followed by two alphanumeric characters

import re

# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'

re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']

Note: re.finditer(pattern,string) works the same way, but it returns an iterator of Match objects instead of a list of the matched strings.
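One thing to keep in mind is that re.findall changes what it returns when the pattern contains capturing groups. The sketch below shows this alongside re.finditer (the variable names are illustrative):

```python
import re

text = "foo bar baz quux"

# with one capturing group, findall returns the group contents,
# not the full match
print(re.findall(r'b(\w{2})', text))
# ['ar', 'az']

# with several groups, findall returns a list of tuples
print(re.findall(r'(b)(\w{2})', text))
# [('b', 'ar'), ('b', 'az')]

# finditer yields Match objects, so the positions are available too
spans = [(m.group(), m.span()) for m in re.finditer(r'b\w{2}', text)]
print(spans)
# [('bar', (4, 7)), ('baz', (8, 11))]
```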

Replace all occurrences of regex in string

If you just need to replace a static string, str.replace() is much faster.

Use re.sub(pattern,replacement,string). This will replace all occurrences of regex with the replacement string.

import re

# Example pattern: a digit
re.sub(r'\d','#','123foo456')
# returns '###foo###'

# Example pattern: anything that looks like an HTML tag
re.sub(r'<[^>]+>','','foo <p> bar')
# returns 'foo  bar'
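If only some occurrences should be replaced, re.sub accepts a count argument, and re.subn additionally reports how many replacements were made. A small sketch, reusing the digit example:

```python
import re

# replace only the first two digits
limited = re.sub(r'\d', '#', '123foo456', count=2)
print(limited)  # ##3foo456

# subn returns a tuple: (new string, number of replacements)
replaced, n = re.subn(r'\d', '#', '123foo456')
print(replaced, n)  # ###foo### 6
```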

Replace using captured groups

Groups are referenced like this: '\1' for the first group, '\2' for the second group, etc.

Note the use of the r'' (raw string) prefix for the strings, so that backslashes are passed to the regex engine unchanged.

Groups are 1-indexed (they start at 1, not 0).

Example pattern: one or more characters, a period ("."), then one or more characters. (In other words, it matches a file name and its extension.)

import re

# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'

# replace the extension with "MYEXT"
re.sub(pattern,r'\1.MYEXT',"foo.txt")
# returns foo.MYEXT

# replace the file name with "MYFILE"
re.sub(pattern,r'MYFILE.\2',"foo.txt")
# returns MYFILE.txt
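Besides '\1' and '\2', the replacement can refer to groups with \g<...>, or be a function that receives each Match object. The sketch below illustrates both; the group names name and ext and the helper double are hypothetical, chosen only for this example:

```python
import re

# same idea as above, with named groups
pattern = r'^(?P<name>.+)\.(?P<ext>.+)$'

print(re.sub(pattern, r'\g<name>.MYEXT', 'foo.txt'))
# foo.MYEXT

# \g<1> is an unambiguous way to write \1 when a digit follows it
print(re.sub(r'(foo)', r'\g<1>0', 'foo'))
# foo0

# the replacement may also be a function taking the Match object
def double(match):
    return str(int(match.group()) * 2)

print(re.sub(r'\d+', double, 'a1 b22'))
# a2 b44
```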

Split string by regex

Use re.split(pattern, string). Returns a list of the substrings found between the matches of the pattern (the matched separators themselves are dropped).

Example pattern: split string by spaces or commas

import re

re.split(r'[\s,]+','foo,bar bar  quux')
# ['foo', 'bar', 'bar', 'quux']
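A few variations on re.split that may be useful: limiting the number of splits, keeping the separators, and what happens when the string starts with a separator. This is only a sketch; the sample strings are made up:

```python
import re

text = 'foo,bar bar  quux'

# maxsplit limits how many splits are performed
print(re.split(r'[\s,]+', text, maxsplit=1))
# ['foo', 'bar bar  quux']

# wrapping the pattern in a capturing group keeps the separators
print(re.split(r'([\s,]+)', 'foo, bar'))
# ['foo', ', ', 'bar']

# a separator at the start of the string produces an empty first item
print(re.split(r'[\s,]+', ',foo,bar'))
# ['', 'foo', 'bar']
```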

Non-capturing groups

TEMPLATE: (?:PATTERN)

Use non-capturing groups when you need to group part of a pattern (for example, to apply alternation or a quantifier to something a character set in square brackets cannot express) but you don't need to extract what it matched.

Non-capturing groups may also be slightly faster to compute than capturing groups, since nothing has to be stored for them.

Example pattern: the string "foo" with a non-word character or a string boundary on each side

import re

# Replace "foo" when it has a non-word character or a string boundary to the left and to the right.
# The neighbouring characters are part of the match, so they get replaced as well.
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "

string = 'foo bar foo foofoo barfoobar foo'

re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
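Because the groups above consume the neighbouring characters, two adjacent occurrences cannot both match, and the replacement has to put the spaces back. The zero-width word boundary \b is a possible alternative when that matters. A sketch comparing the two (swapping in \b is a suggestion, not part of the example above):

```python
import re

string = 'foo bar foo foofoo barfoobar foo'

# \b matches a position, not a character, so nothing around "foo" is consumed
result = re.sub(r'\bfoo\b', 'FOO', string)
print(result)
# FOO bar FOO foofoo barfoobar FOO

# with the consuming pattern, the space between two adjacent "foo"s is used up
# by the first match, so the second one is missed
consuming = re.sub(r'(?:\W|^)foo(?:\W|$)', ' FOO ', 'foo foo')
print(repr(consuming))
# ' FOO foo'

print(re.sub(r'\bfoo\b', 'FOO', 'foo foo'))
# FOO FOO
```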

Lookbehind

TEMPLATE: (?<=PATTERN)

Match pattern ONLY if it IS IMMEDIATELY PRECEDED by something.

Example pattern: "bar" only if it's preceded by "foo"

import re

# replace "bar" with "BAR" if it IS preceded by "foo"
pattern = "(?<=foo)bar"
replacement = "BAR"

string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'foo bar fooBAR'

Negative lookbehind

TEMPLATE: (?<!PATTERN)

Match pattern if it's NOT IMMEDIATELY PRECEDED by something.

Example pattern: "bar" if it's NOT preceded by "foo"

import re

# replace "bar" with BAR if it is NOT preceded by "foo"
pattern = "(?<!foo)bar"
replacement = "BAR"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'foo BAR foobar'
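One restriction applies to both lookbehind forms: Python's re module requires the lookbehind pattern to have a fixed width, so quantifiers such as + are rejected inside it. A minimal sketch (the variable names are illustrative):

```python
import re

# fixed width ("foo" is always 3 characters): compiles fine
fixed = re.compile(r'(?<=foo)bar')
print(fixed.search('foobar').group())  # bar

# variable width ("fo+" can be 2 or more characters): rejected at compile time
try:
    re.compile(r'(?<=fo+)bar')
    variable_ok = True
except re.error as exc:
    variable_ok = False
    print('not allowed:', exc)

print(variable_ok)  # False
```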

Lookahead

TEMPLATE: (?=PATTERN)

Match pattern if it IS IMMEDIATELY FOLLOWED by something else.

Example pattern: "foo" only if it's followed by "bar"

import re

# replace "foo" only if it IS followed by "bar"
pattern = "foo(?=bar)"
replacement = "FOO"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'foo bar FOObar'

Negative Lookahead

TEMPLATE: (?!PATTERN)

Match pattern if it IS NOT IMMEDIATELY FOLLOWED by something else.

Example pattern: "foo" only if it's NOT followed by "bar"

import re

# replace "foo" only if it is NOT followed by "bar"
pattern = "foo(?!bar)"
replacement = "FOO"
string = "foo bar foobar"

re.sub(pattern,replacement,string)
# 'FOO bar foobar'
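Lookaheads are zero-width: they check what follows without making it part of the match. A short sketch that makes this visible through the match span (the sample strings are made up):

```python
import re

m = re.search(r'foo(?=bar)', 'foobar')

# only "foo" is part of the match; "bar" was checked but not consumed
print(m.group())  # foo
print(m.span())   # (0, 3)

# words that are immediately followed by a comma; the commas are not returned
print(re.findall(r'\w+(?=,)', 'a, b, c'))
# ['a', 'b']
```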

re.match vs re.search

  • re.match(pattern,string): the pattern must match at the beginning of the string, or nothing at all.

  • re.search(pattern,string): the pattern may be anywhere in the string, but only the first match is returned.

In short, re.match only matches at the beginning of the string, while re.search matches the pattern a single time anywhere in the string.

import re

# returns None, no match
re.match('abc','xx abc xx')
# None

# returns a single match
re.match('abc','abc')
# <re.Match object; span=(0, 3), match='abc'>

# returns a single match
re.search('abc','xx abc xx')
# <re.Match object; span=(3, 6), match='abc'>

# still returns a single match, even though the pattern occurs more than once
re.search('abc','xx abc xx abc')
# <re.Match object; span=(3, 6), match='abc'>
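The difference also shows up with the ^ anchor and the re.MULTILINE flag: with that flag, ^ in re.search can match after a newline, but re.match still only tries the very start of the string. A sketch, with made-up sample strings:

```python
import re

text = 'xx abc xx'

print(re.match(r'abc', text))          # None
print(re.search(r'^abc', text))        # None
print(re.search(r'abc', text).span())  # (3, 6)

lines = 'xx\nabc'

# with MULTILINE, '^' also matches right after each newline...
print(re.search(r'^abc', lines, re.MULTILINE).span())  # (3, 6)

# ...but re.match still only looks at position 0
print(re.match(r'^abc', lines, re.MULTILINE))  # None
```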

Case-insensitive regular expressions

Add the inline flag (?i) at the very start of the pattern string.

Example pattern: match strings beginning with any variation of "foo", such as "FOO","Foo", etc.

import re

# matches
re.match("(?i)^foo.*","Foo")

# matches too
re.match("(?i)^foo.*","FOO")

# also this
re.match("(?i)^foo.*","foo")
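The same effect can be obtained by passing the re.IGNORECASE flag (short form re.I) instead of embedding (?i) in the pattern, which may read better when the pattern is reused. A minimal sketch (the name compiled is illustrative):

```python
import re

# flag passed as the third argument
print(bool(re.match(r'^foo.*', 'Foo', re.IGNORECASE)))  # True

# re.I is a short alias for re.IGNORECASE
print(bool(re.match(r'^foo.*', 'FOO', flags=re.I)))     # True

# the flag can also be set once on a compiled pattern
compiled = re.compile(r'^foo.*', re.IGNORECASE)
print(bool(compiled.match('fOo bar')))                  # True
print(bool(compiled.match('bar foo')))                  # False
```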

Replacing a static string

If you just need to replace a static string, calling str.replace() instead of re.sub is much faster.

"foo bar baz".replace("foo","FOO")
# >>> 'FOO bar baz'
