Reg Ex Cheat Sheet

This cheat sheet covers regular expressions (regex) in R. It lists common patterns for character classes, anchors, grouping, quantifiers, and lookarounds, together with the base R and stringr functions, such as grep(), regexpr(), sub(), and stringr::str_extract(), that detect, locate, extract, replace, and split matches in strings.

Uploaded by

Ian Flores

Example data

> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

Character classes

[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:graph:]] plus the space character
[[:cntrl:]]           Control characters; \n, \r etc.

Detect patterns

grep(pattern, string)
    [1] 1 3
grep(pattern, string, value = TRUE)
    [1] "Hiphopopotamus"
    [2] "time for bottomless lyrics"
grepl(pattern, string)
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

Locate patterns

regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

Extract patterns

regmatches(string, regexpr(pattern, string))
    extract first match
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extract all matches, outputs a list
    [[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

Replace patterns

sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

Split a string using a pattern

strsplit(string, pattern) or stringr::str_split(string, pattern)
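The detect, locate and extract functions above can be tried directly on the sheet's example data. The following is a minimal base R sketch; the variable names other than string and pattern are illustrative and not part of the original sheet.

```r
# Example data taken from the sheet.
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"

# Detect: indices of matching elements, the elements themselves,
# and a logical vector with one entry per element.
hits_idx   <- grep(pattern, string)
hits_value <- grep(pattern, string, value = TRUE)
hits_lgl   <- grepl(pattern, string)

# Locate: start of the first match per element (-1 means no match).
# The match lengths are stored in the "match.length" attribute.
first_pos   <- regexpr(pattern, string)
first_start <- as.integer(first_pos)
first_len   <- attr(first_pos, "match.length")

# Extract: the first match of each matching element,
# then all matches as a list with one entry per element.
first_match <- regmatches(string, regexpr(pattern, string))
all_matches <- regmatches(string, gregexpr(pattern, string))

print(first_match)
print(all_matches)
```

Note that regmatches() with regexpr() silently drops elements without a match, whereas stringr::str_extract() returns NA for them, as the outputs listed above show.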

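To illustrate replacing and splitting with character classes, here is a small base R sketch. The input string x and all variable names are made up for this example.

```r
# Hypothetical input string, chosen only for illustration.
x <- "Order 66:  shipped on 2016-09-01"

# sub() replaces only the first run of digits; gsub() replaces every run.
first_only <- sub("[[:digit:]]+", "#", x)
all_runs   <- gsub("\\d+", "#", x)

# Split on one or more whitespace characters.
words <- strsplit(x, "[[:space:]]+")[[1]]

# Remove all punctuation, and keep only upper-case letters.
no_punct <- gsub("[[:punct:]]", "", x)
caps     <- gsub("[^[:upper:]]", "", x)

print(first_only)
print(all_runs)
print(words)
```

strsplit() always returns a list, one element per input string, which is why the sketch indexes the result with [[1]].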
Special metacharacters

\n    New line
\r    Carriage return
\t    Tab
\v    Vertical tab
\f    Form feed

Anchors

^      Start of the string
$      End of the string
\\b    Empty string at either edge of a word
\\B    NOT the edge of a word
\\<    Beginning of a word
\\>    End of a word

Character classes and groups

.        Any character except \n
|        Or, e.g. (a|b)
[...]    List permitted characters, e.g. [abc]
[a-z]    Specify character ranges
[^...]   List excluded characters
(...)    Grouping, enables back referencing using \\N where N is an integer

Quantifiers

*        Matches at least 0 times
+        Matches at least 1 time
?        Matches at most 1 time; optional string
{n}      Matches exactly n times
{n,}     Matches at least n times
{,n}     Matches at most n times
{n,m}    Matches between n and m times
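The anchors, quantifiers and backreferences listed above combine as in the following base R sketch. The test vectors and variable names are invented for illustration.

```r
# Anchors: match at the start, at the end, or as a whole word.
words      <- c("cat", "concat", "catalog", "cart")
starts_cat <- grepl("^cat", words)
ends_cat   <- grepl("cat$", words)
whole_cat  <- grepl("\\bcat\\b", c("a cat sat", "concatenate"))

# Quantifiers: between two and three digits, and an optional character.
codes        <- c("a1", "a12", "a123", "a1234")
two_or_three <- grepl("^a[0-9]{2,3}$", codes)
optional_u   <- grepl("^colou?r$", c("color", "colour", "colouur"))

# Grouping with backreferences in the replacement: swap two words.
swapped <- sub("^(\\w+) (\\w+)$", "\\2 \\1", "hello world")

# Backreference inside the pattern: detect a doubled letter.
doubled <- grepl("([a-z])\\1", c("letter", "later"))

print(starts_cat)
print(two_or_three)
print(swapped)
```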

Lookarounds and conditionals*

(?=)     Lookahead (requires perl = TRUE), e.g. (?=xy): position followed by 'xy'
(?!)     Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)    Lookbehind (perl = TRUE), e.g. (?<=xy): position following 'xy'
(?<!)    Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)         If-then-condition (perl = TRUE); use lookaheads, optional char. etc. in if-clause
(?(if)then|else)    If-then-else-condition (perl = TRUE)

*see, e.g. http://www.regular-expressions.info/lookaround.html
http://www.regular-expressions.info/conditional.html

General modes

By default R uses POSIX extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base or by wrapping patterns with perl() for stringr (perl() is deprecated in stringr 1.0 and later, which use ICU regular expressions through regex()).

All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.

The base matching functions (grep(), regexpr(), sub() and relatives) can be made case insensitive by specifying ignore.case = TRUE.

Escaping characters

Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

Case conversions

Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Greedy matching

By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.

Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

Note

Regular expressions can conveniently be created using rex::rex().
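As a rough illustration of the lookaround constructs, the base R sketch below uses perl = TRUE throughout. The sample strings and variable names are hypothetical.

```r
# Hypothetical sample data.
prices <- c("USD 100", "EUR 250", "100 USD")

# Lookbehind: digits that directly follow "USD ".
after_usd <- regmatches(prices,
                        regexpr("(?<=USD )\\d+", prices, perl = TRUE))

# Lookahead: digits that are directly followed by " USD".
before_usd <- regmatches(prices,
                         regexpr("\\d+(?= USD)", prices, perl = TRUE))

# Negative lookahead: a "q" that is NOT followed by "u".
q_no_u <- grepl("q(?!u)", c("queue", "Iraq", "qatar"), perl = TRUE)

# Negative lookbehind: "happy" that is NOT preceded by "un".
not_un <- grepl("(?<!un)happy", c("happy", "unhappy"), perl = TRUE)

print(after_usd)
print(before_usd)
print(q_no_u)
print(not_un)
```

Lookarounds match a position, not characters, so the "USD" text itself is never part of the extracted match.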

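The notes on greedy matching, escaping, literal searches and case handling can be checked with a short base R sketch. The inputs are invented for illustration, and perl = TRUE is set wherever the sketch relies on PCRE features ((?U), \\Q...\\E, (?i), \\U).

```r
# Greedy vs. lazy matching on a hypothetical HTML fragment.
html   <- "<b>bold</b> and <i>italic</i>"
greedy <- regmatches(html, regexpr("<.*>", html))    # longest possible match
lazy   <- regmatches(html, regexpr("<.*?>", html))   # shortest possible match
# (?U) inverts greediness, so a plain * becomes lazy.
ungreedy <- regmatches(html, regexpr("(?U)<.*>", html, perl = TRUE))

# Escaping a metacharacter, quoting with \\Q...\\E, and literal search.
x       <- c("a.c", "abc")
any_chr <- grepl("a.c", x)                       # "." matches any character
escaped <- grepl("a\\.c", x)                     # literal dot
quoted  <- grepl("\\Qa.c\\E", x, perl = TRUE)    # literal text between \\Q and \\E
literal <- grepl("a.c", x, fixed = TRUE)         # whole pattern taken literally

# Case-insensitive matching, by argument and by inline flag.
ci_arg  <- grepl("hello", "HELLO", ignore.case = TRUE)
ci_flag <- grepl("(?i)hello", "HELLO", perl = TRUE)

# Case conversion of a backreference: capitalise each word.
title <- gsub("\\b(\\w)", "\\U\\1", "hello world", perl = TRUE)

print(lazy)
print(escaped)
print(title)
```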
CC BY Ian Kopacka [email protected] Updated: 09/16
