Python Regular Expressions: Examples & Reference
Last updated:
- String matches regex
- Whole string matches regex
- String contains regex
- Extract capture group
- Extract capture group, anywhere in string
- Extract capture group, multiple times
- Extract first occurrence of regex
- Extract all occurrences of regex
- Extract all regex matches
- Replace regex in string
- Replace only the first occurrence of regex
- Replace captured group
- Split string by regex
- Split string by word boundary
- Non-capturing groups
- re.match VS re.search
- re.findall VS re.finditer
- Case-insensitive regex
Usage examples for regular expressions in Python.
Unless otherwise stated, examples use Python 3. See all examples in this Jupyter notebook.
String matches regex
The pattern must match at the beginning of the string. To match the full string, see below.
Use re.match(pattern, string). This function returns a match object if there was a match, or None if there was no match.
Example pattern: one or more digits
import re
if re.match(r'\d+', '123foo'):
    # match because the string starts with '123'
    print('match')
else:
    # no match
    print('no match')
Whole string matches regex
In this case, there is only a match if the string fully matches the given pattern.
Example pattern: "foo" followed by one or more digits
If on Python 3.4 or newer, use re.fullmatch(pattern, string):
import re
# Python version >= 3.4
# match because the string FULLY matches the pattern
re.fullmatch(r'foo\d+', 'foo123')
# >>> <_sre.SRE_Match object; span=(0, 6), match='foo123'>
# no match because although the pattern matches the beginning of the
# string, there are extra characters at the end ("bar")
re.fullmatch(r'foo\d+', 'foo123bar')
# >>> None
On older Python versions, wrap the pattern with ^ and $ and use re.match(pattern, string) instead:
import re
# Python version < 3.4
# match because the string FULLY matches the pattern
re.match(r'^foo\d+$', 'foo123')
# >>> <_sre.SRE_Match at 0x7f690806b648>
# no match because although the pattern matches the beginning of the
# string, there are extra characters at the end ("bar")
re.match(r'^foo\d+$', 'foo123bar')
# >>> None
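One subtlety when choosing between the two approaches: $ also matches just before a trailing newline, so the anchored re.match form is slightly more permissive than re.fullmatch. The sketch below illustrates the difference; the helper name is_full_match is hypothetical and only there for illustration.

```python
import re

def is_full_match(pattern, string):
    """Return True only if the whole string matches (hypothetical helper)."""
    return re.fullmatch(pattern, string) is not None

# fullmatch rejects the string because of the trailing newline
strict = is_full_match(r'foo\d+', 'foo123\n')

# '$' matches just before a trailing newline, so this still matches
lenient = re.match(r'^foo\d+$', 'foo123\n') is not None

# '\Z' only matches at the very end of the string
anchored = re.match(r'^foo\d+\Z', 'foo123\n') is not None

print(strict, lenient, anchored)
# False True False
```

If trailing newlines matter for your input, re.fullmatch or \Z is probably the safer choice.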
String contains regex
To know if a regex is present in a string, use re.search(pattern, string).
Example pattern: a sequence of 3 digits:
import re
# returns a match
re.search(r'\d{3}','foo 123 bar')
# >>> <_sre.SRE_Match object; span=(4, 7), match='123'>
# no match, returns None
re.search(r'\d{3}','foo 1 23 bar')
# >>> None
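Since re.search returns either a match object or None, a plain yes/no answer can be obtained by comparing the result with None. A minimal sketch, assuming a hypothetical helper named contains:

```python
import re

def contains(pattern, string):
    """Return True if pattern occurs anywhere in string (hypothetical helper)."""
    return re.search(pattern, string) is not None

has_three_digits = contains(r'\d{3}', 'foo 123 bar')
# -> True

no_three_digits = contains(r'\d{3}', 'foo 1 23 bar')
# -> False
```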
Extract capture group
To extract anywhere in the string, use re.search instead.
Use re.match(pattern, string). Note that re.match only anchors the pattern at the beginning of the string; the ^ and $ in the pattern below are what force it to match the whole string.
Example pattern: 3 word characters followed by a dash, then another 3 word characters
import re
# this pattern matches things like "foo-bar" or "bar-baz"
pattern = r"^(\w{3})-(\w{3})$"
string1 = "abc-def"
# returns None if no match
matches = re.match(pattern, string1)
if matches:
    # group indices start at 1
    first_group_match = matches.group(1)
    # abc
    second_group_match = matches.group(2)
    # def
    print(first_group_match + " AND " + second_group_match)
    # prints: abc AND def
Extract capture group, anywhere in string
Only the first occurrence of the capture can be extracted.
Use re.search(pattern, string) and .group().
Example: capture digits followed by "x"
import re
# one or more digits followed by 'x'
pat = r"(\d+)x"
# the second occurrence ('456') is not captured.
re.search(pat, "123x 456x").group(1)
# >>> "123"
Extract capture group, multiple times
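Use re.findall(pattern, string): when the pattern contains a single capture group, it returns the text captured by that group for every match.
finditer also works, but you get the full Match object for each capture.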
import re
pattern = r'a(\d+)'
re.findall(pattern, 'a1 b2 a32')
# >>> ['1', '32']
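For comparison, here is a sketch of the finditer variant mentioned above. It reuses the same pattern and string; the variable names text and captures are only illustrative.

```python
import re

pattern = r'a(\d+)'
text = 'a1 b2 a32'

# each item is a Match object: group(1) is the captured digits,
# start(1) and end(1) give the position of the capture in the text
captures = [(m.group(1), m.start(1), m.end(1))
            for m in re.finditer(pattern, text)]
print(captures)
# [('1', 1, 2), ('32', 7, 9)]
```

This form is mostly useful when you also need to know where each capture was found.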
Extract first occurrence of regex
This matches a pattern anywhere in the string, but only once. Will return None if there are no matches.
Use re.search(pattern, string) rather than re.match.
Example pattern: letter 'b' followed by two alphanumeric characters
import re
pattern = r'b\w{2}'
match = re.search(pattern,"foo bar baz quux")
# >>> <_sre.SRE_Match object; span=(4, 7), match='bar'>
# will be None if there are no matches
if match:
    match.group(0)
    # >>> 'bar'
Extract all occurrences of regex
findall is like finditer, but returns strings instead of Match objects.
Use re.findall(pattern, string) to extract multiple occurrences of pattern in string.
import re
# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'
re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']
Extract all regex matches
finditer is like findall, but returns Match objects instead of strings.
To extract all matches of a regex in a string, returning Match objects, use re.finditer(pattern, string):
import re
# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'
matches = re.finditer(pattern,'foo bar baz quux')
# wrap the result in a list because finditer returns an iterator
list(matches)
# >>> [<_sre.SRE_Match object; span=(4, 7), match='bar'>,
# <_sre.SRE_Match object; span=(8, 11), match='baz'>]
Replace regex in string
This will replace all occurrences of regex in a string.
Use re.sub(pattern, replacement, string):
import re
# Example pattern: a digit
re.sub(r'\d', '#', '123foo456')
# returns '###foo###'
# Example pattern: anything that looks like an HTML tag
re.sub('<[^>]+>', '', 'foo <p> bar')
# returns 'foo  bar' (the spaces on both sides of the tag are kept)
Replace only the first occurrence of regex
count=N means replace only the first N occurrences.
Use re.sub(pattern, replacement, string, count=1).
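A short sketch of the count argument, reusing the digit pattern from the previous section (the variable names are only illustrative):

```python
import re

# replace only the first digit
first_only = re.sub(r'\d', '#', '123foo456', count=1)
# -> '#23foo456'

# replace only the first two digits
first_two = re.sub(r'\d', '#', '123foo456', count=2)
# -> '##3foo456'
```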
Replace captured group
Groups are referenced like this: '\1' for the first group, '\2' for the second group, etc.
Note the use of the r'' (raw string) prefix for the strings, so that the backslashes reach the regex engine unchanged.
Groups are 1-indexed (they start at 1, not 0).
Example pattern: match file names (e.g. "file.txt")
import re
# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'
# replace the extension with "MYEXT"
re.sub(pattern, r'\1.MYEXT',"foo.txt")
# returns foo.MYEXT
# replace the file name with "MYFILE"
re.sub(pattern, r'MYFILE.\2',"foo.txt")
# returns MYFILE.txt
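One case where the '\1' form gets awkward is when the replacement puts a digit right after the group reference: r'\10' is read as a reference to group 10. The \g<1> syntax avoids the ambiguity. A small sketch using the same file name pattern:

```python
import re

# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'

# \g<1> makes it explicit that we mean group 1 followed by a literal "0";
# r'\10' would be read as a reference to group 10, which does not exist
result = re.sub(pattern, r'\g<1>0.\2', 'foo.txt')
print(result)
# foo0.txt
```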
Split string by regex
Use re.split(pattern, string). Returns a list of the substrings found between the matches of the pattern.
Example pattern: split string by spaces or commas
import re
re.split(r'[\s,]+', 'foo,bar bar quux')
# ['foo', 'bar', 'bar', 'quux']
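Two variations that often come up with re.split are limiting the number of splits and keeping the separators. The sketch below shows both on the same input; the variable names are only illustrative.

```python
import re

text = 'foo,bar bar quux'

# maxsplit=1 stops after the first split
head_and_rest = re.split(r'[\s,]+', text, maxsplit=1)
# -> ['foo', 'bar bar quux']

# wrapping the pattern in a capturing group keeps the separators in the result
with_separators = re.split(r'([\s,]+)', text)
# -> ['foo', ',', 'bar', ' ', 'bar', ' ', 'quux']
```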
Split string by word boundary
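On Python 3.5 and 3.6 you can't use re.split(r'\b', string) because Python will complain: split() requires a non-empty pattern match. Since Python 3.7 the call is accepted, but the result also contains the separators and empty strings.
Use findall(r'\w+', string) instead:
import re
# Python 3.5 and 3.6: ValueError: split() requires a non-empty pattern match.
# Python 3.7+: ['', 'foo', ',', 'bar', ' ', 'bar', '-', 'quux', '']
re.split(r'\b','foo,bar bar-quux')
# this is what you want:
re.findall(r'\w+','foo,bar bar-quux')
# >>> ['foo', 'bar', 'bar', 'quux']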
Non-capturing groups
TEMPLATE: (?:PATTERN)
Use non-capturing groups when you need to group part of a pattern (for example to apply alternation or a quantifier to something a set of characters in square brackets can't express) and you don't need to capture what it matched.
Non-capturing groups are also slightly faster to compute than capturing groups.
Example pattern: a non-word character or line boundary, the string "foo", then another non-word character or line boundary
import re
# Replace "foo" when it's got non-words or line boundaries to the left and to the right
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "
string = 'foo bar foo foofoo barfoobar foo'
re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
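The difference between the two kinds of group is easiest to see with re.findall, which returns only what the capturing groups matched. A minimal sketch; the sample text and variable names are made up for illustration:

```python
import re

text = 'Mr. Smith and Ms. Jones'

# with a capturing group for the title, findall returns one tuple per match
capturing = re.findall(r'(Mr|Ms)\. (\w+)', text)
# -> [('Mr', 'Smith'), ('Ms', 'Jones')]

# with a non-capturing group, only the name is captured
non_capturing = re.findall(r'(?:Mr|Ms)\. (\w+)', text)
# -> ['Smith', 'Jones']
```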
re.match VS re.search
re.match(pattern,string) | re.search(pattern,string) |
---|---|
pattern must match the beginning of the string or nothing at all | pattern may be anywhere in the string, but only the first match is returned. |
re.match matches the beginning of the string, re.search matches the pattern a single time anywhere in the string.
import re
# returns None, no match
re.match('abc','xx abc xx')
# None
# returns a single MATCH
re.match('abc','abc')
# <_sre.SRE_Match object; span=(0, 3), match='abc'>
# returns a single MATCH
re.search('abc','xx abc xx')
# <_sre.SRE_Match object; span=(3, 6), match='abc'>
# still returns a single MATCH, even though pattern occurs more than once
re.search('abc','xx abc xx abc')
# <_sre.SRE_Match object; span=(3, 6), match='abc'>
re.findall VS re.finditer
Both findall(pattern, string) and finditer(pattern, string) can be used to return multiple occurrences of a pattern in a string.
re.findall(pattern, text) | re.finditer(pattern, text) |
---|---|
Returns a (possibly empty) list of strings where the regex found a match in the string | Returns a (possibly empty) iterator of Match objects, containing the matching text but also the start and end positions where there were matches |
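The comparison can be sketched side by side on the string used in the earlier sections (variable names are only illustrative):

```python
import re

pattern = r'b\w{2}'
text = 'foo bar baz quux'

# findall: just the matching strings
strings = re.findall(pattern, text)
# -> ['bar', 'baz']

# finditer: Match objects, which also know where each match was found
spans = [(m.group(0), m.span()) for m in re.finditer(pattern, text)]
# -> [('bar', (4, 7)), ('baz', (8, 11))]
```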
Case-insensitive regex
To make your regular expressions case-insensitive, just add (?i) at the start of the pattern string.
Example pattern: match strings beginning with "foo", "FOO", "Foo", etc.
import re
re.match("(?i)^foo.*","Foo")
# >>> <_sre.SRE_Match object; span=(0, 3), match='Foo'>
re.match("(?i)^foo.*","FOO")
# >>> <_sre.SRE_Match object; span=(0, 3), match='FOO'>
re.match("(?i)^foo.*","foo")
# >>> <_sre.SRE_Match object; span=(0, 3), match='foo'>
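An alternative to the inline (?i) prefix is the flags argument with re.IGNORECASE, which keeps the pattern string itself unchanged. A brief sketch showing that both forms behave the same here:

```python
import re

# the flags argument is an alternative to the inline (?i) prefix
with_flag = re.match(r'^foo.*', 'FOO', flags=re.IGNORECASE)
inline = re.match(r'(?i)^foo.*', 'FOO')

print(with_flag.group(0), inline.group(0))
# FOO FOO
```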