Characters: Character Legend Example Sample Match

This document provides a summary of regular expression syntax elements including:
1. Common character classes like \d, \w, \s that match digits, word characters, and whitespace respectively.
2. Quantifiers like +, *, ? that specify how many times a character or group can occur.
3. Anchors like ^ and $ that match the start or end of a string or line.
4. Character sets defined with [] that match one character in the set.
5. Groups defined with () that capture part of the match or are used for backreferences.
6. Boundaries like \b that match word boundaries.
7. POSIX character classes like [
Characters

Character    Legend    Example    Sample Match
\d    Most engines: one digit from 0 to 9    file_\d\d    file_25
\d    .NET, Python 3: one Unicode digit in any script    file_\d\d    file_9੩
\w    Most engines: "word character": ASCII letter, digit or underscore    \w-\w\w\w    A-b_1
\w    Python 3: "word character": Unicode letter, ideogram, digit, or underscore    \w-\w\w\w    字-ま_۳
\w    .NET: "word character": Unicode letter, ideogram, digit, or connector    \w-\w\w\w    字-ま‿۳
\s    Most engines: "whitespace character": space, tab, newline, carriage return, vertical tab    a\sb\sc    a b(line break)c
\s    .NET, Python 3, JavaScript: "whitespace character": any Unicode separator    a\sb\sc    a b(line break)c
\D    One character that is not a digit as defined by your engine's \d    \D\D\D    ABC
\W    One character that is not a word character as defined by your engine's \w    \W\W\W\W\W    *-+=)
\S    One character that is not a whitespace character as defined by your engine's \s    \S\S\S\S    Yoyo
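
The difference between the ASCII and Unicode definitions of \d, \w and \s can be checked directly. The following is a minimal sketch using Python 3's built-in re module; the variable names are illustrative only, and other engines may behave differently.

```python
import re

# In Python 3, str patterns are Unicode-aware by default.
# "\u0a69" is the Gurmukhi digit three used in the sample match above.
unicode_digit = re.fullmatch(r"file_\d\d", "file_9\u0a69")

# The re.ASCII flag restricts \d, \w and \s to their ASCII definitions.
ascii_digit = re.fullmatch(r"file_\d\d", "file_9\u0a69", flags=re.ASCII)

# \w accepts letters and digits from any script, plus the underscore.
unicode_word = re.fullmatch(r"\w-\w\w\w", "字-ま_۳")

# \s matches a space and a line feed alike.
whitespace = re.fullmatch(r"a\sb\sc", "a b\nc")

# \D, \W and \S are the complements of \d, \w and \s.
non_digits = re.fullmatch(r"\D\D\D", "ABC")
non_words = re.fullmatch(r"\W\W\W\W\W", "*-+=)")
non_spaces = re.fullmatch(r"\S\S\S\S", "Yoyo")

print(bool(unicode_digit), bool(ascii_digit))  # True False
```

Passing re.ASCII is one way to get the "most engines" behavior in Python 3 when only ASCII input is expected.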

Quantifiers
Quantifier    Legend    Example    Sample Match
+    One or more    Version \w-\w+    Version A-b1_1
{3}    Exactly three times    \D{3}    ABC
{2,4}    Two to four times    \d{2,4}    156
{3,}    Three or more times    \w{3,}    regex_tutorial
*    Zero or more times    A*B*C*    AAACC
?    Once or none    plurals?    plural
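
As a quick check, each example in the quantifier table can be run against its sample match. This is a small sketch using Python's re module; the names `samples` and `results` are illustrative.

```python
import re

# (pattern, sample match) pairs taken from the quantifier table.
samples = [
    (r"Version \w-\w+", "Version A-b1_1"),
    (r"\D{3}", "ABC"),
    (r"\d{2,4}", "156"),
    (r"\w{3,}", "regex_tutorial"),
    (r"A*B*C*", "AAACC"),
    (r"plurals?", "plural"),
]

# fullmatch succeeds only if the whole sample is consumed by the pattern.
results = {pattern: bool(re.fullmatch(pattern, text)) for pattern, text in samples}

# A single digit is too short for {2,4}.
too_short = re.fullmatch(r"\d{2,4}", "1")

print(all(results.values()), too_short)  # True None
```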

More Characters
Character    Legend    Example    Sample Match
.    Any character except line break    a.c    abc
.    Any character except line break    .*    whatever, man.
\.    A period (special character: needs to be escaped by a \)    a\.c    a.c
\    Escapes a special character    \.\*\+\? \$\^\/\\    .*+? $^/\
\    Escapes a special character    \[\{\(\)\}\]    [{()}]
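
The sketch below, written for Python's re module, contrasts the unescaped dot with the escaped one. It also uses re.escape, a standard-library helper that adds the backslashes automatically; the variable names are illustrative.

```python
import re

# An unescaped dot matches any character except a line break.
dot_letter = re.fullmatch(r"a.c", "abc")
dot_newline = re.fullmatch(r"a.c", "a\nc")

# An escaped dot matches only a literal period.
literal_hit = re.fullmatch(r"a\.c", "a.c")
literal_miss = re.fullmatch(r"a\.c", "abc")

# re.escape backslash-escapes metacharacters so the text is matched literally.
special = ".*+? $^/\\"
escaped_hit = re.fullmatch(re.escape(special), special)
brackets_hit = re.fullmatch(r"\[\{\(\)\}\]", "[{()}]")

print(bool(dot_letter), bool(dot_newline))  # True False
```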

Logic
Logic    Legend    Example    Sample Match
|    Alternation / OR operator    22|33    33
( … )    Capturing group    A(nt|pple)    Apple (captures "pple")
\1    Contents of Group 1    r(\w)g\1x    regex
\2    Contents of Group 2    (\d\d)\+(\d\d)=\2\+\1    12+65=65+12
(?: … )    Non-capturing group    A(?:nt|pple)    Apple
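
Groups and backreferences are easier to follow when the captured text is printed. A short sketch with Python's re module, reusing the examples from the table (variable names are illustrative):

```python
import re

# Alternation: either branch may match.
alternatives = re.findall(r"22|33", "11 22 33")

# A capturing group stores the text it matched.
capturing = re.fullmatch(r"A(nt|pple)", "Apple")
captured_text = capturing.group(1)

# A non-capturing group groups without storing anything.
non_capturing = re.fullmatch(r"A(?:nt|pple)", "Apple")
stored_groups = non_capturing.groups()

# \1 and \2 must repeat exactly what groups 1 and 2 captured.
repeat = re.fullmatch(r"r(\w)g\1x", "regex")
swapped = re.fullmatch(r"(\d\d)\+(\d\d)=\2\+\1", "12+65=65+12")
not_swapped = re.fullmatch(r"(\d\d)\+(\d\d)=\2\+\1", "12+65=12+65")

print(alternatives, captured_text, stored_groups)  # ['22', '33'] pple ()
```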

More White-Space
Character    Legend    Example    Sample Match
\t    Tab    T\t\w{2}    T(tab)ab
\r    Carriage return character    see below
\n    Line feed character    see below
\r\n    Line separator on Windows    AB\r\nCD    AB(line break)CD
\N    Perl, PCRE (C, PHP, R…): one character that is not a line break    \N+    ABC
\h    Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator
\H    One character that is not a horizontal whitespace
\v    .NET, JavaScript, Python, Ruby: vertical tab
\v    Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator
\V    Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace
\R    Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)
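
Python's re module supports \t, \r, \n and the vertical-tab meaning of \v, but not \h or \R. The sketch below shows the supported escapes and a hand-written approximation of \R; the name `LINE_BREAK` and its character list are assumptions made for illustration, not part of any library.

```python
import re

tab = re.fullmatch(r"T\t\w{2}", "T\tab")
windows_break = re.fullmatch(r"AB\r\nCD", "AB\r\nCD")

# In Python, \v is the single vertical tab character (U+000B).
vertical_tab = re.fullmatch(r"a\vb", "a\x0bb")

# Hypothetical stand-in for \R: try the CR+LF pair first, then single
# line-break characters (LF, VT, FF, CR, NEL, line and paragraph separators).
LINE_BREAK = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"

lines = re.split(LINE_BREAK, "a\r\nb\nc\u2028d")

print(lines)  # ['a', 'b', 'c', 'd']
```

Listing \r\n before the single characters keeps a Windows line break from being split into two empty-line boundaries.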

More Quantifiers
Quantifier    Legend    Example    Sample Match
+    The + (one or more) is "greedy"    \d+    12345
?    Makes quantifiers "lazy"    \d+?    1 in 12345
*    The * (zero or more) is "greedy"    A*    AAA
?    Makes quantifiers "lazy"    A*?    empty in AAA
{2,4}    Two to four times, "greedy"    \w{2,4}    abcd
?    Makes quantifiers "lazy"    \w{2,4}?    ab in abcd
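
The greedy and lazy rows above can be compared side by side. A minimal sketch with Python's re module; the HTML-like string at the end is an invented illustration, not taken from the table.

```python
import re

# Greedy quantifiers take as much as they can; lazy ones as little as they can.
greedy_plus = re.match(r"\d+", "12345").group()
lazy_plus = re.match(r"\d+?", "12345").group()

greedy_star = re.match(r"A*", "AAA").group()
lazy_star = re.match(r"A*?", "AAA").group()

greedy_range = re.match(r"\w{2,4}", "abcd").group()
lazy_range = re.match(r"\w{2,4}?", "abcd").group()

# A common practical case: stopping at the first closing bracket.
greedy_tag = re.match(r"<.+>", "<b>bold</b>").group()
lazy_tag = re.match(r"<.+?>", "<b>bold</b>").group()

print(greedy_plus, lazy_plus, repr(lazy_star))  # 12345 1 ''
```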

Character Classes
Character    Legend    Example    Sample Match
[ … ]    One of the characters in the brackets    [AEIOU]    One uppercase vowel
[ … ]    One of the characters in the brackets    T[ao]p    Tap or Top
-    Range indicator    [a-z]    One lowercase letter
[x-y]    One of the characters in the range from x to y    [A-Z]+    GREAT
[ … ]    One of the characters in the brackets    [AB1-5w-z]    One of either: A,B,1,2,3,4,5,w,x,y,z
[x-y]    One of the characters in the range from x to y    [ -~]+    Characters in the printable section of the ASCII table.
[^x]    One character that is not x    [^a-z]{3}    A1!
[^x-y]    One of the characters not in the range from x to y    [^ -~]+    Characters that are not in the printable section of the ASCII table.
[\d\D]    One character that is a digit or a non-digit    [\d\D]+    Any characters, including new lines, which the regular dot doesn't match
[\x41]    Matches the character at hexadecimal position 41 in the ASCII table, i.e. A    [\x41-\x45]{3}    ABE
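
The character class rows can be exercised as follows. This is a sketch in Python's re module; the input strings other than the table's own samples are invented for illustration.

```python
import re

vowel_choice = [bool(re.fullmatch(r"T[ao]p", word)) for word in ("Tap", "Top", "Tip")]
upper_run = re.fullmatch(r"[A-Z]+", "GREAT")
mixed_set = re.fullmatch(r"[AB1-5w-z]+", "AB12345wxyz")

# [ -~] spans the printable ASCII range, from the space to the tilde.
printable = re.fullmatch(r"[ -~]+", "Hello, world!")
non_printable = re.findall(r"[^ -~]+", "tab\there")

negated = re.fullmatch(r"[^a-z]{3}", "A1!")

# [\d\D] matches any character, line breaks included, unlike the plain dot.
across_lines = re.fullmatch(r"[\d\D]+", "line 1\nline 2")
dot_across_lines = re.fullmatch(r".+", "line 1\nline 2")

# \x41 to \x45 is the range A to E.
hex_range = re.fullmatch(r"[\x41-\x45]{3}", "ABE")

print(vowel_choice, non_printable)  # [True, True, False] ['\t']
```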

Anchors and Boundaries


Anchor    Legend    Example    Sample Match
^    Start of string or start of line depending on multiline mode. (But when [^inside brackets], it means "not")    ^abc .*    abc (line start)
$    End of string or end of line depending on multiline mode. Many engine-dependent subtleties.    .*? the end$    this is the end
\A    Beginning of string (all major engines except JS)    \Aabc[\d\D]*    abc (string... ...start)
\z    Very end of the string. Not available in Python and JS    the end\z    this is...\n...the end
\Z    End of string or (except Python) before final line break. Not available in JS    the end\Z    this is...\n...the end\n
\G    Beginning of String or End of Previous Match. .NET, Java, PCRE (C, PHP, R…), Perl, Ruby
\b    Word boundary. Most engines: position where one side only is an ASCII letter, digit or underscore    Bob.*\bcat\b    Bob ate the cat
\b    Word boundary. .NET, Java, Python 3, Ruby: position where one side only is a Unicode letter, digit or underscore    Bob.*\bкошка\b    Bob ate the кошка
\B    Not a word boundary    c.*\Bcat\B.*    copycats
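
Anchor behavior is one of the more engine-dependent areas, so it is worth testing in the engine at hand. The sketch below uses Python's re module, where \Z means the very end of the string; the sample text in `text` is invented for illustration.

```python
import re

text = "abc\ndef"

# Without multiline mode, ^ and $ refer to the whole string.
first_only = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches at the start of every line.
every_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A always means the start of the string, even in multiline mode.
not_at_start = re.search(r"\Adef", text, flags=re.MULTILINE)

# In Python, $ tolerates one final line break but \Z does not.
dollar = re.search(r"the end$", "this is...\n...the end\n")
strict_end = re.search(r"the end\Z", "this is...\n...the end\n")

# \b needs a word character on exactly one side; \B is the opposite.
whole_word = re.search(r"Bob.*\bcat\b", "Bob ate the cat")
inside_word = re.search(r"\bcat\b", "copycats")
not_boundary = re.search(r"c.*\Bcat\B.*", "copycats").group()
unicode_word = re.search(r"\bкошка\b", "Bob ate the кошка")

print(first_only, every_line)  # ['abc'] ['abc', 'def']
```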

POSIX Classes
Character    Legend    Example    Sample Match
[:alpha:]    PCRE (C, PHP, R…): ASCII letters A-Z and a-z    [8[:alpha:]]+    WellDone88
[:alpha:]    Ruby 2: Unicode letter or ideogram    [[:alpha:]\d]+    кошка99
[:alnum:]    PCRE (C, PHP, R…): ASCII digits and letters A-Z and a-z    [[:alnum:]]{10}    ABCDE12345
[:alnum:]    Ruby 2: Unicode digit, letter or ideogram    [[:alnum:]]{10}    кошка90210
[:punct:]    PCRE (C, PHP, R…): ASCII punctuation mark    [[:punct:]]+    ?!.,:;
[:punct:]    Ruby: Unicode punctuation mark    [[:punct:]]+    ‽,:〽⁆
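
Python's built-in re module does not implement POSIX bracket classes, so the sketch below substitutes explicit ASCII sets that approximate the PCRE rows above. The names `ALPHA`, `ALNUM` and `PUNCT` are hypothetical helpers, and building `PUNCT` from string.punctuation is an assumption about what counts as ASCII punctuation.

```python
import re
import string

# Hand-written ASCII approximations of the POSIX classes.
ALPHA = "A-Za-z"
ALNUM = "A-Za-z0-9"
# string.punctuation holds the ASCII punctuation marks; re.escape makes
# characters such as ] and \ safe to place inside a character class.
PUNCT = re.escape(string.punctuation)

letters_and_eight = re.fullmatch(f"[8{ALPHA}]+", "WellDone88")
ten_alnum = re.fullmatch(f"[{ALNUM}]{{10}}", "ABCDE12345")
punctuation = re.fullmatch(f"[{PUNCT}]+", "?!.,:;")
not_punctuation = re.fullmatch(f"[{PUNCT}]+", "abc")

print(bool(letters_and_eight), bool(ten_alnum), bool(punctuation))  # True True True
```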

Inline Modifiers
None of these are supported in JavaScript. In Ruby, beware of (?s) and (?m).
Modifier    Legend    Example    Sample Match
(?i)    Case-insensitive mode (except JavaScript)    (?i)Monday    monDAY
(?s)    DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n). Also known as "single-line mode" because the dot treats the entire input as a single line    (?s)From A.*to Z    From A(line break)to Z
(?m)    Multiline mode (except Ruby and JS). ^ and $ match at the beginning and end of every line    (?m)1\r\n^2$\r\n^3$    1(line break)2(line break)3
(?m)    In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks    (?m)From A.*to Z    From A(line break)to Z
(?x)    Free-spacing mode (except JavaScript). Also known as comment mode or whitespace mode    see below    abc d
        (?x) # this is a
             # comment
        abc  # write on multiple
             # lines
        [ ]d # spaces must be
             # in brackets
(?n)    .NET, PCRE 10.30+: named capture only    Turns all (parentheses) into non-capture groups. To capture, use named groups.
(?d)    Java: Unix linebreaks only    The dot and the ^ and $ anchors are only affected by \n
(?^)    PCRE 10.32+: unset modifiers    Unsets ismnx modifiers
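
Four of these modifiers, (?i), (?s), (?m) and (?x), are available in Python's re module when placed at the start of the pattern. The sketch below tries each one; the input strings use \n line breaks and are partly invented for illustration.

```python
import re

case_insensitive = re.fullmatch(r"(?i)Monday", "monDAY")

# Without (?s) the dot stops at the line break; with it, the dot crosses it.
dot_default = re.search(r"From A.*to Z", "From A\nto Z")
dot_all = re.search(r"(?s)From A.*to Z", "From A\nto Z")

# (?m) lets ^ and $ match at the start and end of every line.
per_line = re.findall(r"(?m)^\d$", "1\n2\n3")

# (?x) ignores unescaped whitespace and allows comments in the pattern.
free_spacing = re.compile(r"""(?x)  # free-spacing mode
    abc     # write on multiple lines
    [ ]d    # spaces must be in brackets
""")
spaced_match = free_spacing.fullmatch("abc d")

print(bool(dot_default), bool(dot_all), per_line)  # False True ['1', '2', '3']
```

The same modes can also be requested with flags such as re.IGNORECASE, re.DOTALL, re.MULTILINE and re.VERBOSE instead of inline modifiers.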

Lookarounds
Lookaround    Legend    Example    Sample Match
(?=…)    Positive lookahead    (?=\d{10})\d{5}    01234 in 0123456789
(?<=…)    Positive lookbehind    (?<=\d)cat    cat in 1cat
(?!…)    Negative lookahead    (?!theatre)the\w+    theme
(?<!…)    Negative lookbehind    \w{3}(?<!mon)ster    Munster
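
Lookarounds check the surrounding text without consuming it, which the following sketch in Python's re module makes visible. The counterexamples "theatre" and "monster" are added for illustration.

```python
import re

# The lookahead requires ten digits ahead, but only five are consumed.
lookahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()

# The lookbehind requires a digit before "cat" without including it.
lookbehind = re.search(r"(?<=\d)cat", "1cat").group()

# Negative lookahead: words starting with "the", except "theatre".
allowed = re.search(r"(?!theatre)the\w+", "theme").group()
rejected = re.search(r"(?!theatre)the\w+", "theatre")

# Negative lookbehind: three word characters that are not "mon", then "ster".
munster = re.search(r"\w{3}(?<!mon)ster", "Munster").group()
monster = re.search(r"\w{3}(?<!mon)ster", "monster")

print(lookahead, lookbehind, allowed, munster)  # 01234 cat theme Munster
```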

Character Class Operations


Class Operation    Legend    Example    Sample Match
[…-[…]]    .NET: character class subtraction. One character that is in those on the left, but not in the subtracted class.    [a-z-[aeiou]]    Any lowercase consonant
[…-[…]]    .NET: character class subtraction.    [\p{IsArabic}-[\D]]    An Arabic character that is not a non-digit, i.e., an Arabic digit
[…&&[…]]    Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class.    [\S&&[\D]]    A non-whitespace character that is a non-digit.
[…&&[…]]    Java, Ruby 2+: character class intersection.    [\S&&[\D]&&[^a-zA-Z]]    A non-whitespace character that is a non-digit and not a letter.
[…&&[^…]]    Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class    [a-z&&[^aeiou]]    An English lowercase letter that is not a vowel.
[…&&[^…]]    Java, Ruby 2+: character class subtraction    [\p{InArabic}&&[^\p{L}\p{N}]]    An Arabic character that is not a letter or a number
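
Python's built-in re module has neither the .NET subtraction syntax nor the Java/Ruby && syntax. A common workaround, shown here as a sketch and not as an equivalent feature, is to put a lookahead in front of a character class; the input strings are invented for illustration.

```python
import re

# Subtraction, like [a-z-[aeiou]]: a lowercase letter that is not a vowel.
consonants = re.findall(r"(?![aeiou])[a-z]", "regex")

# Intersection, like [\S&&[\D]]: a non-whitespace character that is also a non-digit.
non_space_non_digit = re.findall(r"(?=\S)\D", "a 1b")

# Chained conditions, like [\S&&[\D]&&[^a-zA-Z]].
symbols_only = re.findall(r"(?=\S)(?=\D)[^a-zA-Z]", "a 1b!?")

print(consonants, non_space_non_digit, symbols_only)
# ['r', 'g', 'x'] ['a', 'b'] ['!', '?']
```

Each lookahead tests the next character against one class, and the final class consumes it, so the character must satisfy every condition.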

Other Syntax
Syntax    Legend    Example    Sample Match
\K    Keep Out. Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned    prefix\K\d+    12
\Q…\E    Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters.    \Q(C++ ?)\E    (C++ ?)
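
Python's built-in re module supports neither \K nor \Q…\E. The sketch below shows two substitutes that give similar results in that module: a lookbehind or capturing group in place of \K, and re.escape in place of \Q…\E. The input strings are invented for illustration.

```python
import re

# In place of prefix\K\d+: a fixed-width lookbehind keeps "prefix" out of the match.
after_prefix = re.search(r"(?<=prefix)\d+", "prefix12").group()

# A capturing group is an alternative when the prefix has variable width.
after_variable_prefix = re.search(r"pre\w*?(\d+)", "prefix12").group(1)

# In place of \Q(C++ ?)\E: re.escape turns the text into a literal pattern.
literal = re.search(re.escape("(C++ ?)"), "see (C++ ?) here").group()

print(after_prefix, after_variable_prefix, literal)  # 12 12 (C++ ?)
```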
