Characters
Character Legend Example Sample Match
\d Most engines: one digit from 0 to 9 file_\d\d file_25
\d .NET, Python 3: one Unicode digit in any script file_\d\d file_9੩
\w Most engines: "word character": ASCII letter, digit or underscore \w-\w\w\w A-b_1
\w Python 3: "word character": Unicode letter, ideogram, digit, or underscore \w-\w\w\w 字-ま_۳
\w .NET: "word character": Unicode letter, ideogram, digit, or connector \w-\w\w\w 字-ま‿۳
\s Most engines: "whitespace character": space, tab, newline, carriage return, vertical tab a\sb\sc "a b", line break, "c"
\s .NET, Python 3, JavaScript: "whitespace character": any Unicode separator a\sb\sc "a b", line break, "c"
\D One character that is not a digit as defined by your engine's \d \D\D\D ABC
\W One character that is not a word character as defined by your engine's \w \W\W\W\W\W *-+=)
\S One character that is not a whitespace character as defined by your engine's \s \S\S\S\S Yoyo
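As a small illustration of the engine differences listed above, the following sketch uses Python 3's built-in re module. The sample strings are taken from the table; the variable names are only illustrative.

```python
import re

# In Python 3, \d in a str pattern matches any Unicode decimal digit,
# so the Gurmukhi digit three (੩) is accepted.
unicode_match = re.fullmatch(r"file_\d\d", "file_9੩")

# With the re.ASCII flag, \d is restricted to 0-9 and the same input fails.
ascii_match = re.fullmatch(r"file_\d\d", "file_9੩", flags=re.ASCII)

# \D, \W and \S are the complements of \d, \w and \s.
non_digits = re.fullmatch(r"\D\D\D", "ABC")
non_word = re.fullmatch(r"\W\W\W\W\W", "*-+=)")
non_space = re.fullmatch(r"\S\S\S\S", "Yoyo")

print(unicode_match is not None)  # True
print(ascii_match is None)        # True
print(non_digits.group(), non_word.group(), non_space.group())
```

Passing re.ASCII is one way to get the "most engines" behaviour from Python 3 when only ASCII digits should be accepted.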
Quantifiers
Quantifier Legend Example Sample Match
+ One or more Version \w-\w+ Version A-b1_1
{3} Exactly three times \D{3} ABC
{2,4} Two to four times \d{2,4} 156
{3,} Three or more times \w{3,} regex_tutorial
* Zero or more times A*B*C* AAACC
? Once or none plurals? plural
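The quantifier rows above can be checked with a short Python sketch. Each pair holds an example pattern and its sample match from the table; the list name `samples` is just a placeholder.

```python
import re

# (pattern, sample match) pairs copied from the quantifier table.
samples = [
    (r"Version \w-\w+", "Version A-b1_1"),  # + : one or more
    (r"\D{3}", "ABC"),                      # {3} : exactly three times
    (r"\d{2,4}", "156"),                    # {2,4} : two to four times
    (r"\w{3,}", "regex_tutorial"),          # {3,} : three or more times
    (r"A*B*C*", "AAACC"),                   # * : zero or more times
    (r"plurals?", "plural"),                # ? : once or none
]

# fullmatch succeeds only if the whole sample is consumed by the pattern.
results = [re.fullmatch(pattern, text) is not None for pattern, text in samples]
print(results)
```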
More Characters
Character Legend Example Sample Match
. Any character except line break a.c abc
. Any character except line break .* whatever, man.
\. A period (special character: needs to be escaped by a \) a\.c a.c
\ Escapes a special character \.\*\+\? \$\^\/\\ .*+? $^/\
\ Escapes a special character \[\{\(\)\}\] [{()}]
Logic
Logic Legend Example Sample Match
| Alternation / OR operator 22|33 33
( … ) Capturing group A(nt|pple) Apple (captures "pple")
\1 Contents of Group 1 r(\w)g\1x regex
\2 Contents of Group 2 (\d\d)\+(\d\d)=\2\+\1 12+65=65+12
(?: … ) Non-capturing group A(?:nt|pple) Apple
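A brief Python sketch of the grouping and back-reference rows above, using the table's own examples. The variable names are illustrative only.

```python
import re

# A capturing group stores the text matched by the part in parentheses.
capturing = re.fullmatch(r"A(nt|pple)", "Apple")
print(capturing.group(1))  # pple

# \1 must repeat exactly what Group 1 captured (here the letter "e").
backref = re.fullmatch(r"r(\w)g\1x", "regex")

# \2 and \1 can appear in any order after their groups are defined.
swapped = re.fullmatch(r"(\d\d)\+(\d\d)=\2\+\1", "12+65=65+12")

# A non-capturing group still groups the alternation but stores nothing.
non_capturing = re.fullmatch(r"A(?:nt|pple)", "Apple")
print(non_capturing.groups())  # ()
```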
More White-Space
Character Legend Example Sample Match
\t Tab T\t\w{2} T ab
\r Carriage return character see below
\n Line feed character see below
\r\n Line separator on Windows AB\r\nCD "AB", line break, "CD"
\N Perl, PCRE (C, PHP, R…): one character that is not a line break \N+ ABC
\h Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator
\H One character that is not a horizontal whitespace
\v .NET, JavaScript, Python, Ruby: vertical tab
\v Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator
\V Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace
\R Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)
More Quantifiers
Quantifier Legend Example Sample Match
+ The + (one or more) is "greedy" \d+ 12345
? Makes quantifiers "lazy" \d+? 1 in 12345
* The * (zero or more) is "greedy" A* AAA
? Makes quantifiers "lazy" A*? empty in AAA
{2,4} Two to four times, "greedy" \w{2,4} abcd
? Makes quantifiers "lazy" \w{2,4}? ab in abcd
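The difference between greedy and lazy quantifiers can be seen by running the table's examples side by side. This is a minimal Python sketch; the variable names are placeholders.

```python
import re

# Greedy quantifiers take as much as they can.
greedy_plus = re.search(r"\d+", "12345").group()       # 12345
greedy_star = re.search(r"A*", "AAA").group()          # AAA
greedy_range = re.search(r"\w{2,4}", "abcd").group()   # abcd

# Appending ? makes the quantifier lazy: it takes as little as it can.
lazy_plus = re.search(r"\d+?", "12345").group()        # 1
lazy_star = re.search(r"A*?", "AAA").group()           # empty string
lazy_range = re.search(r"\w{2,4}?", "abcd").group()    # ab

print(repr(greedy_plus), repr(lazy_plus))
print(repr(greedy_star), repr(lazy_star))
print(repr(greedy_range), repr(lazy_range))
```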
Character Classes
Character Legend Example Sample Match
[ … ] One of the characters in the brackets [AEIOU] One uppercase vowel
[ … ] One of the characters in the brackets T[ao]p Tap or Top
- Range indicator [a-z] One lowercase letter
[x-y] One of the characters in the range from x to y [A-Z]+ GREAT
[ … ] One of the characters in the brackets [AB1-5w-z] One of either: A,B,1,2,3,4,5,w,x,y,z
[x-y] One of the characters in the range from x to y [ -~]+ Characters in the printable section of the ASCII table.
[^x] One character that is not x [^a-z]{3} A1!
[^x-y] One of the characters not in the range from x to y [^ -~]+ Characters that are not in the printable section of the ASCII table.
[\d\D] One character that is a digit or a non-digit [\d\D]+ Any characters, including new lines, which the regular dot doesn't match
[\x41] Matches the character at hexadecimal position 41 in the ASCII table, i.e. A [\x41-\x45]{3} ABE
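The character-class rows above can be exercised with a short Python sketch. The input strings other than the table's sample matches (for example "Hello, world!" and "line1\nline2") are made up for illustration.

```python
import re

# Mixed literals and ranges inside one class.
mixed = re.fullmatch(r"[AB1-5w-z]+", "AB12345wxyz")

# [ -~] spans the printable ASCII range, from space to tilde.
printable = re.fullmatch(r"[ -~]+", "Hello, world!")

# A leading ^ negates the class.
negated = re.fullmatch(r"[^a-z]{3}", "A1!")

# [\d\D] matches any character, including line breaks, unlike the plain dot.
any_char = re.fullmatch(r"[\d\D]+", "line1\nline2")
dot_only = re.fullmatch(r".+", "line1\nline2")

# Hexadecimal escapes can be used as range endpoints (A to E here).
hex_range = re.fullmatch(r"[\x41-\x45]{3}", "ABE")

print(mixed is not None, printable is not None, negated is not None)
print(any_char is not None, dot_only is None, hex_range is not None)
```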
POSIX Classes
Character Legend Example Sample Match
[:alpha:] PCRE (C, PHP, R…): ASCII letters A-Z and a-z [8[:alpha:]]+ WellDone88
[:alpha:] Ruby 2: Unicode letter or ideogram [[:alpha:]\d]+ кошка99
[:alnum:] PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z [[:alnum:]]{10} ABCDE12345
[:alnum:] Ruby 2: Unicode digit, letter or ideogram [[:alnum:]]{10} кошка90210
[:punct:] PCRE (C, PHP, R…): ASCII punctuation mark [[:punct:]]+ ?!.,:;
[:punct:] Ruby: Unicode punctuation mark [[:punct:]]+ ‽,:〽⁆
Inline Modifiers
None of these are supported in JavaScript. In Ruby, beware of (?s) and (?m).
Modifier Legend Example Sample Match
(?i) Case-insensitive mode (except JavaScript) (?i)Monday monDAY
(?s) DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n). Also known as "single-line mode" because the dot treats the entire input as a single line (?s)From A.*to Z "From A", line break, "to Z"
(?m) Multiline mode (except Ruby and JS). ^ and $ match at the beginning and end of every line (?m)1\r\n^2$\r\n^3$ "1", "2", "3" on three lines
(?m) In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks (?m)From A.*to Z "From A", line break, "to Z"
(?x) Free-spacing mode (except JavaScript). Also known as comment mode or whitespace mode. Sample match: abc d. Example:
(?x) # this is a
     # comment
abc  # write on multiple
     # lines
[ ]d # spaces must be
     # in brackets
(?n) .NET, PCRE 10.30+: named capture only. Turns all (parentheses) into non-capture groups. To capture, use named groups.
(?d) Java: Unix linebreaks only. The dot and the ^ and $ anchors are only affected by \n
(?^) PCRE 10.32+: unset modifiers. Unsets the i, s, m, n and x modifiers
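The four common inline modifiers can be tried out in Python, which accepts (?i), (?s), (?m) and (?x) at the start of a pattern. This is a sketch only: the inputs use \n line breaks rather than the \r\n of the table, and the variable names are illustrative.

```python
import re

# (?i): case-insensitive matching.
case_insensitive = re.fullmatch(r"(?i)Monday", "monDAY")

# (?s): the dot also matches line breaks.
dotall = re.fullmatch(r"(?s)From A.*to Z", "From A\nto Z")
no_dotall = re.fullmatch(r"From A.*to Z", "From A\nto Z")

# (?m): ^ and $ match at the start and end of every line.
multiline = re.findall(r"(?m)^\d$", "1\n2\n3")
single_line = re.findall(r"^\d$", "1\n2\n3")

# (?x): whitespace and # comments in the pattern are ignored,
# so a literal space has to be written inside brackets.
verbose_pattern = r"""(?x)   # this is a
                             # comment
    abc                      # write on multiple
                             # lines
    [ ]d                     # spaces must be
                             # in brackets
"""
free_spacing = re.fullmatch(verbose_pattern, "abc d")

print(case_insensitive is not None, dotall is not None, no_dotall is None)
print(multiline, single_line, free_spacing is not None)
```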
Lookarounds
Lookaround Legend Example Sample Match
(?=…) Positive lookahead (?=\d{10})\d{5} 01234 in 0123456789
(?<=…) Positive lookbehind (?<=\d)cat cat in 1cat
(?!…) Negative lookahead (?!theatre)the\w+ theme
(?<!…) Negative lookbehind \w{3}(?<!mon)ster Munster
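The four lookarounds above can be verified in Python. In this sketch, the input "theatre theme" and the extra "monster" check are made-up additions; the other strings come from the table.

```python
import re

# Positive lookahead: ten digits must follow, but only five are consumed.
lookahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()

# Positive lookbehind: "cat" must be preceded by a digit.
lookbehind = re.search(r"(?<=\d)cat", "1cat").group()

# Negative lookahead: skip "theatre", accept other words starting with "the".
neg_lookahead = re.search(r"(?!theatre)the\w+", "theatre theme").group()

# Negative lookbehind: the three characters before "ster" must not be "mon".
neg_lookbehind = re.search(r"\w{3}(?<!mon)ster", "Munster").group()
rejected = re.search(r"\w{3}(?<!mon)ster", "monster")

print(lookahead, lookbehind, neg_lookahead, neg_lookbehind, rejected)
```

Lookarounds assert a condition at the current position without consuming characters, which is why the first example returns only five digits.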
Other Syntax
Syntax Legend Example Sample Match
\K Keep Out. Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned prefix\K\d+ 12
\Q…\E Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters. \Q(C++ ?)\E (C++ ?)
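Python's built-in re module supports neither \K nor \Q…\E, so the sketch below shows stand-ins rather than the syntax itself: a lookbehind or a capturing group in place of \K, and re.escape in place of \Q…\E. The input "prefix12" is an assumed sample.

```python
import re

# Stand-in for prefix\K\d+ : a lookbehind keeps "prefix" out of the match.
via_lookbehind = re.search(r"(?<=prefix)\d+", "prefix12").group()

# Alternative stand-in: match everything, then return only the captured group.
via_group = re.search(r"prefix(\d+)", "prefix12").group(1)

# Stand-in for \Q(C++ ?)\E : re.escape escapes the metacharacters for us.
literal = "(C++ ?)"
escaped_pattern = re.escape(literal)
literal_match = re.fullmatch(escaped_pattern, literal)

print(via_lookbehind, via_group, literal_match is not None)
```

The lookbehind version only works when the prefix has a fixed length, which is a restriction of Python's re module that \K does not have.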