Python Regular Expressions: Examples & Reference
- Summary
- String matches regex
- Whole string matches regex
- String contains regex
- Extract capture group
- Extract capture group, anywhere in string
- Extract capture group, multiple times
- Extract first occurrence of regex
- Extract all occurrences of regex
- Extract all regex matches
- Replace regex in string
- Replace only the first occurrence of regex
- Replace captured group
- Split string by regex
- Split string by word boundary
- Non-capturing groups
- re.match VS re.search
- re.findall VS re.finditer
- Case-insensitive regex
Usage examples for regular expressions in Python 3
Unless otherwise stated, examples use Python 3. See all examples in this Jupyter notebook.
Summary
function | where | number of matches returned |
---|---|---|
re.match | Match the beginning of the string | First match only |
re.search | Match anywhere in a string | First match only |
re.findall | Match anywhere in a string | All matches |
String matches regex
The pattern must match at the beginning of the string. To match the full string, see "Whole string matches regex" below.
Use re.match(pattern, string). This method returns a match object if there was a match, or None if there was no match.
Example pattern: one or more digits
import re
if re.match(r'\d+', '123foo'):
    # match because the string starts with '123'
    print('match')
else:
    # no match
    print('no match')
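As a small sketch of what re.match hands back in each case (the variable names are illustrative only), the following compares a string that starts with digits against one where the digits come later:

```python
import re

# Digits at the start of the string: re.match succeeds
m = re.match(r'\d+', '123foo')
matched_text = m.group(0) if m else None

# Digits are present, but not at the start: re.match returns None
no_match = re.match(r'\d+', 'foo123')

print(matched_text)  # 123
print(no_match)      # None
```

Because the result is either a match object or None, it can be used directly in an if statement.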
Whole string matches regex
In this case, there is only a match if the string fully matches the given pattern.
Example pattern: "foo" followed by one or more digits
If on Python 3.4 or newer, use re.fullmatch(pattern, string):
import re
# Python version >= 3.4
# match because the string FULLY matches the pattern
re.fullmatch(r'foo\d+', 'foo123')
# >>> <re.Match object; span=(0, 6), match='foo123'>
# no match because although the pattern matches the beginning of the
# string, there are extra characters at the end ("bar")
re.fullmatch(r'foo\d+', 'foo123bar')
# >>> None
On older Python versions, wrap the pattern with ^ and $ and use re.match(pattern, string) instead:
import re
# Python version < 3.4
# match because the string FULLY matches the pattern
re.match(r'^foo\d+$', 'foo123')
# >>> <_sre.SRE_Match at 0x7f690806b648>
# no match because although the pattern matches the beginning of the
# string, there are extra characters at the end ("bar")
re.match(r'^foo\d+$', 'foo123bar')
# >>> None
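The two approaches are not quite identical: $ also matches just before a trailing newline. A minimal sketch of the difference, using an example string with a trailing newline (the variable names are illustrative):

```python
import re

text = 'foo123\n'  # note the trailing newline

# '$' also matches just before a trailing newline, so this succeeds
dollar_result = re.match(r'^foo\d+$', text)

# fullmatch requires the pattern to consume the entire string
fullmatch_result = re.fullmatch(r'foo\d+', text)

# '\Z' matches only at the very end of the string
strict_result = re.match(r'^foo\d+\Z', text)

print(dollar_result is not None)  # True
print(fullmatch_result is None)   # True
print(strict_result is None)      # True
```

If a trailing newline should count as a mismatch on older versions, \Z may be a closer stand-in for fullmatch than $.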
String contains regex
To know if a regex is present in a string, use re.search(pattern, string).
Example pattern: a sequence of 3 digits:
import re
# returns a match
re.search(r'\d{3}', 'foo 123 bar')
# >>> <re.Match object; span=(4, 7), match='123'>
# no match, returns None
re.search(r'\d{3}', 'foo 1 23 bar')
# >>> None
Extract capture group
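Use re.match(pattern, string) and read the groups with .group(). To extract anywhere in the string, use re.search instead.
Note that re.match only anchors the pattern at the start of the string; the example pattern below uses ^ and $, so it must match the whole string.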
Example pattern: 3 characters followed by a dash then another 3 characters
import re
# this pattern matches things like "foo-bar" or "bar-baz"
pattern = r"^(\w{3})-(\w{3})$"
string1 = "abc-def"
# returns None if no match
matches = re.match(pattern, string1)
if matches:
    # group indices start at 1 (group 0 is the whole match)
    first_group_match = matches.group(1)
    # abc
    second_group_match = matches.group(2)
    # def
    print(first_group_match + " AND " + second_group_match)
    # prints: abc AND def
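When all groups are needed at once, the match object's groups() method returns them as a tuple, and group(0) returns the whole matched text. A short sketch with the same pattern (variable names are illustrative):

```python
import re

m = re.match(r'^(\w{3})-(\w{3})$', 'abc-def')
if m:
    whole = m.group(0)  # the entire matched text
    parts = m.groups()  # tuple with every captured group
    print(whole)  # abc-def
    print(parts)  # ('abc', 'def')
```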
Extract capture group, anywhere in string
Only the first occurrence of the capture can be extracted.
Use re.search(pattern, string) and .group().
Example: capture digits followed by "x"
import re
# one or more digits followed by 'x'
pat = r"(\d+)x"
# the second occurrence ('456') is not captured.
re.search(pat, "123x 456x").group(1)
# >>> "123"
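Calling .group() directly on the result fails with AttributeError when there is no match, because re.search returns None. One possible guard is sketched below; first_capture is a hypothetical helper name used only for this example:

```python
import re

pat = r"(\d+)x"

def first_capture(text):
    """Return the first captured group, or None when nothing matches.

    first_capture is a hypothetical helper, not part of the re module.
    """
    m = re.search(pat, text)
    return m.group(1) if m else None

print(first_capture("123x 456x"))  # 123
print(first_capture("no digits"))  # None
```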
Extract capture group, multiple times
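Use re.findall(pattern, string) with a capture group in the pattern; it returns the captured text for every match.
finditer also works, but you get the full Match object for each capture.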
import re
pattern = r'a(\d+)'
re.findall(pattern, 'a1 b2 a32')
# >>> ['1', '32']
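For comparison, here is a sketch of the finditer variant. It assumes you also want to know where each capture starts, which findall cannot tell you (variable names are illustrative):

```python
import re

pattern = r'a(\d+)'
text = 'a1 b2 a32'

# Each item is a Match object; group(1) is the captured digits and
# start(1) is the index where that capture begins
captures = [(m.group(1), m.start(1)) for m in re.finditer(pattern, text)]
print(captures)  # [('1', 1), ('32', 7)]
```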
Extract first occurrence of regex
This matches a pattern anywhere in the string, but only once. It returns None if there are no matches.
Use re.search(pattern, string) rather than re.match.
Example pattern: letter 'b' followed by two alphanumeric characters
import re
pattern = r'b\w{2}'
match = re.search(pattern, "foo bar baz quux")
# >>> <re.Match object; span=(4, 7), match='bar'>
# will be None if there are no matches
if match:
    match.group(0)
    # >>> 'bar'
Extract all occurrences of regex
findall is like finditer, but returns strings instead of Match objects.
Use re.findall(pattern, string) to extract multiple occurrences of pattern in string.
import re
# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'
re.findall(pattern,"foo bar baz quux")
# >>> ['bar', 'baz']
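What findall returns depends on how many capture groups the pattern has. The sketch below uses a made-up key=value string to show the three cases (variable names are illustrative):

```python
import re

text = 'foo=1, bar=22'

# No groups: a list of the full matches
no_group = re.findall(r'\w+=\d+', text)
# One group: a list of that group's text
one_group = re.findall(r'\w+=(\d+)', text)
# Several groups: a list of tuples, one tuple per match
two_groups = re.findall(r'(\w+)=(\d+)', text)

print(no_group)    # ['foo=1', 'bar=22']
print(one_group)   # ['1', '22']
print(two_groups)  # [('foo', '1'), ('bar', '22')]
```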
Extract all regex matches
finditer is like findall, but returns Match objects instead of strings.
To extract all matches of a regex in a string, returning Match objects, use re.finditer(pattern, string):
import re
# letter 'b' followed by two alphanumeric characters
pattern = r'b\w{2}'
matches = re.finditer(pattern, 'foo bar baz quux')
# wrap the result in a list because finditer returns an iterator
list(matches)
# >>> [<re.Match object; span=(4, 7), match='bar'>,
#      <re.Match object; span=(8, 11), match='baz'>]
Replace regex in string
This will replace all occurrences of regex in a string.
Use re.sub(pattern, replacement, string):
import re
# Example pattern: a digit
re.sub(r'\d', '#', '123foo456')
# returns '###foo###'
# Example pattern: anything that looks like an HTML tag
re.sub(r'<[^>]+>', '', 'foo <p> bar')
# returns 'foo  bar' (the spaces on both sides of the tag remain)
Replace only the first occurrence of regex
Use re.sub(pattern, replacement, string, count=1).
count=N means replace only the first N occurrences.
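A brief sketch of how count changes the result, reusing the digit pattern from the previous section (variable names are illustrative):

```python
import re

text = '123foo456'

first_only = re.sub(r'\d', '#', text, count=1)
first_two = re.sub(r'\d', '#', text, count=2)
everything = re.sub(r'\d', '#', text)  # no count: replace all

print(first_only)  # #23foo456
print(first_two)   # ##3foo456
print(everything)  # ###foo###
```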
Replace captured group
Groups are referenced like this: '\1' for the first group, '\2' for the second group, etc.
Note the use of the r'' (raw string) prefix for the strings, so that the backslashes reach the regex engine unchanged.
Groups are 1-indexed (they start at 1, not 0).
Example pattern: match file names (e.g. "file.txt")
import re
# match strings like "foo.txt" and "bar.csv"
pattern = r'(^.+)\.(.+)$'
# replace the extension with "MYEXT"
re.sub(pattern, r'\1.MYEXT',"foo.txt")
# returns foo.MYEXT
# replace the file name with "MYFILE"
re.sub(pattern, r'MYFILE.\2',"foo.txt")
# returns MYFILE.txt
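One case where '\1' is not enough is a group reference followed directly by a digit. The sketch below assumes you want to append a literal "2" to the file name, and uses the \g<1> form of the reference for that (variable names are illustrative):

```python
import re

pattern = r'(^.+)\.(.+)$'

# r'\g<1>2' means "group 1 followed by a literal 2"
versioned = re.sub(pattern, r'\g<1>2.\2', 'foo.txt')
print(versioned)  # foo2.txt

# r'\12' would be read as a reference to group 12, which does not exist
try:
    re.sub(pattern, r'\12.\2', 'foo.txt')
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True
print(ambiguous_failed)  # True
```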
Split string by regex
Use re.split(pattern, string). Returns a list with the pieces of the string found between the matches of the pattern.
Example pattern: split string by spaces or commas
import re
re.split(r'[\s,]+', 'foo,bar bar quux')
# ['foo', 'bar', 'bar', 'quux']
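A few details of re.split are easy to trip over. The following sketch shows maxsplit, a separator at the start of the string, and a capturing group in the pattern (input strings and variable names are made up for illustration):

```python
import re

# maxsplit limits the number of splits; the rest of the string stays intact
limited = re.split(r'[\s,]+', 'foo,bar bar quux', maxsplit=1)
print(limited)  # ['foo', 'bar bar quux']

# A separator at the very start produces an empty first element
leading = re.split(r'[\s,]+', ',foo bar')
print(leading)  # ['', 'foo', 'bar']

# A capturing group in the pattern keeps the separators in the result
with_separators = re.split(r'([\s,]+)', 'foo,bar baz')
print(with_separators)  # ['foo', ',', 'bar', ' ', 'baz']
```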
Split string by word boundary
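On Python versions before 3.7 you can't use re.split(r'\b', string), because Python will complain: split() requires a non-empty pattern match. From Python 3.7 on the call works, but the result also contains the separators and empty strings.
To get just the words, use re.findall(r'\w+', string) instead:
import re
# before Python 3.7: ValueError: split() requires a non-empty pattern match.
# Python 3.7+: returns ['', 'foo', ',', 'bar', ' ', 'bar', '-', 'quux', '']
re.split(r'\b','foo,bar bar-quux')
# this is what you want: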
re.findall(r'\w+','foo,bar bar-quux')
# >>> ['foo', 'bar', 'bar', 'quux']
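The sketch below assumes Python 3.7 or newer, where re.split accepts patterns that can match an empty string. It shows what splitting on \b returns there and one way to reduce that to the words (variable names are illustrative):

```python
import re

text = 'foo,bar bar-quux'

# Python 3.7+: splitting on an empty match is allowed; the result
# contains words, separators, and empty strings at the edges
pieces = re.split(r'\b', text)
print(pieces)
# ['', 'foo', ',', 'bar', ' ', 'bar', '-', 'quux', '']

# Keep only the pieces made entirely of word characters
words = [p for p in pieces if re.fullmatch(r'\w+', p)]
print(words)  # ['foo', 'bar', 'bar', 'quux']
```

For plain word extraction, findall remains the shorter route.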
Non-capturing groups
TEMPLATE: (?:PATTERN)
Use non-capturing groups to match something you don't need to capture.
Non-capturing groups can also be slightly cheaper than capturing groups, because the engine does not have to record what they matched.
Example pattern: a non-word character or the start of the string, the string "foo", then a non-word character or the end of the string
import re
# Replace "foo" when it's got non-words or line boundaries to the left and to the right
pattern = r'(?:\W|^)foo(?:\W|$)'
replacement = " FOO "
string = 'foo bar foo foofoo barfoobar foo'
re.sub(pattern,replacement,string)
# >>> ' FOO bar FOO foofoo barfoobar FOO '
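The difference is easiest to see with findall, whose return value depends on the capturing groups in the pattern. A sketch with a made-up input string (variable names are illustrative):

```python
import re

text = 'foo-1 bar-22 baz-3'

# Capturing group for the prefix: findall returns (prefix, number) tuples
capturing = re.findall(r'(foo|bar)-(\d+)', text)
print(capturing)      # [('foo', '1'), ('bar', '22')]

# Non-capturing group: the prefix must still match,
# but only the number is returned
non_capturing = re.findall(r'(?:foo|bar)-(\d+)', text)
print(non_capturing)  # ['1', '22']
```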
re.match VS re.search
re.match(pattern,string) | re.search(pattern,string) |
---|---|
pattern must match the beginning of the string or nothing at all | pattern may be anywhere in the string, but only the first match is returned. |
re.match matches the beginning of the string, re.search matches the pattern a single time anywhere in the string.
import re
# returns None, no match
re.match('abc','xx abc xx')
# None
# returns a single match
re.match('abc','abc')
# <re.Match object; span=(0, 3), match='abc'>
# returns a single match
re.search('abc','xx abc xx')
# <re.Match object; span=(3, 6), match='abc'>
# still returns a single match, even though the pattern occurs more than once
re.search('abc','xx abc xx abc')
# <re.Match object; span=(3, 6), match='abc'>
re.findall VS re.finditer
Both re.findall(pattern, string) and re.finditer(pattern, string) can be used to return multiple occurrences of a pattern in a string.
re.findall(pattern, text) | re.finditer(pattern, text) |
---|---|
Returns a (possibly empty) list of strings where the regex found a match in the string | Returns a (possibly empty) iterator of Match objects, containing the matching text but also the start and end positions of each match |
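To make the comparison concrete, here is a sketch that runs both functions on the same input and pulls the text and position out of each Match object (variable names are illustrative):

```python
import re

pattern = r'b\w{2}'
text = 'foo bar baz quux'

# findall: just the matched strings
strings = re.findall(pattern, text)

# finditer: Match objects, from which text and positions can be read
spans = [(m.group(0), m.span()) for m in re.finditer(pattern, text)]

print(strings)  # ['bar', 'baz']
print(spans)    # [('bar', (4, 7)), ('baz', (8, 11))]
```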
Case-insensitive regex
To make your regular expressions case-insensitive, just add (?i) at the start of the pattern string.
Example pattern: match strings beginning with "foo", "FOO", "Foo", etc.
import re
re.match("(?i)^foo.*","Foo")
# >>> <re.Match object; span=(0, 3), match='Foo'>
re.match("(?i)^foo.*","FOO")
# >>> <re.Match object; span=(0, 3), match='FOO'>
re.match("(?i)^foo.*","foo")
# >>> <re.Match object; span=(0, 3), match='foo'>
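An alternative to the inline (?i) marker is the re.IGNORECASE flag, passed either to the matching function or to re.compile. A minimal sketch (variable names are illustrative):

```python
import re

# Flag passed directly to re.match
with_flag = re.match(r'^foo.*', 'FOO', re.IGNORECASE)
print(with_flag.group(0))  # FOO

# Flag baked into a compiled pattern that can be reused
compiled = re.compile(r'^foo.*', re.IGNORECASE)
print(compiled.match('Foo').group(0))  # Foo
print(compiled.match('bar') is None)   # True
```

The flag keeps the pattern string itself unchanged, which can be convenient when the same pattern is used both case-sensitively and case-insensitively.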