0% found this document useful (0 votes)

86 views

Fast String Matching With Wildcards, Globs, and Gitignore-Style Globs - How Not To Blow It Up - CodeProject

Fast String Matching With Wildcards, Globs, And Gitignore-Style Globs - How Not to Blow It Up - CodeProject

Uploaded by

gfgomes

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

86 views

Fast String Matching With Wildcards, Globs, and Gitignore-Style Globs - How Not To Blow It Up - CodeProject

Fast String Matching With Wildcards, Globs, And Gitignore-Style Globs - How Not to Blow It Up - CodeProject

Uploaded by

gfgomes

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Fast String Matching with Wildcards, Globs, and

Gitignore-Style Globs - How Not to Blow it Up

Robert van Engelen

19 Sep 2019 CPOL

Classic globbing and modern gitignore-style globbing algorithms can be fast, whereas recursive implementations are known to
blow up exponentially; why some freely available source code should not be used.

Download Python sources - 1.8 KB

Download JavaScript sources - 1.9 KB
Download Java sources - 7.8 KB
Download C and C++ sources - 4.1 KB

Introduction
Jack Handy’s wildcard string compare efficiently compares a string to a pattern containing * and ? wildcards. The algorithm is fast
and safe, but does not support gitignore-style globbing rules. In this article, I will illustrate that classic globbing and modern
gitignore-style globbing algorithms can be fast too. I will also explain what’s wrong with recursive implementations that blow up
exponentially and why some freely available source code should not be used.

Background
Wildcard string matching and globbing isn’t as trivial as it may seem at first. In fact, mistakes in the past resulted in serious
vulnerabilities such as denial of service. Simple patches, such as limiting the number of matches to limit the CPU time, have been
applied to fix implementations that suffer exponential blow-up in execution time. More disconcerting is that buggy globbing
implementations can be easily located on the Web: just search for wildmat.c to find copies of implementations that may crash when
a character class range is incomplete, such as [a-.

The Trouble With Recursion

To understand how globbing implementations may cause denial of service, we will take a quick look at some examples. Recursion is
often used in simple implementations of wildcard matching with * and ?. The idea is to scan the pattern and the string from left to
right while pairwise matching the characters. When a star is encountered in the pattern, the do-match function is recursively called
to compare the rest of the pattern to the rest of the string. If the recursive do-match call succeeds, then we’re done. If the recursive
call fails however, we go back where we left off before the recursive call, advance one character further in the string, and repeat the
recursive call. This loop is the “star-loop” that consumes none, one, or more characters in place of a star in the pattern as shown in
the example below:

// returns TRUE if text string matches wild pattern with * and ?

int naive_recursive_match(const char *text, const char *wild)
{
while (*text != '\0')
{
if (*wild == '*')
{
// any number of stars act as one star
while (*++wild == '*')
continue;
// a star at the end of the pattern matches any text
if (*wild == '\0')
return TRUE;
// star-loop: match the rest of the pattern and text
while (naive_recursive_match(text, wild) == FALSE && *text != '\0')
text++;
return *text != '\0' ? TRUE : FALSE;
}
// ? matches any character or we match the current non-NUL character
if (*wild != '?' && *wild != *text)
return FALSE;
text++;
wild++;
}
// ignore trailing stars
while (*wild == '*')
wild++;
// at end of text means success if nothing else is left to match
return *wild == '\0' ? TRUE : FALSE;
}

This algorithm is fairly simple and needs little explanation: recursive calls are made for stars until matching or the end of the string
of text is reached. A ? matches any character. Otherwise, the current pattern character is matched with the current character in the
text. We move up in the text and in the pattern by one character, and repeat until the end of the text is reached. Note that any
trailing stars can be ignored in the pattern when we get to the end of the text.

This approach works well for short strings to match and patterns with few stars. However, it is easy to find examples that result in an
explosive execution time blow-up due to excessive backtracking on stars. Cases such as a*a*a*a*a*a*a*a*b and a string
aaa…aaa of 100 a’s take minutes to terminate with a failure to match. Just try the following two shell commands:

$ touch `python3 -c 'print(100*"a")'`

$ ls a*a*a*a*a*a*a*a*b

It takes bash-3.2 about 8 minutes to terminate on my 2.9GHz i7 machine with decent performance. It takes tcsh-6.18 over
an hour to terminate. The explosive 2n states for n stars is to blame for this behavior.

Using ABORT
One of the first globbing algorithms in history written by Rich Salz wildmat.c (original) had this undesirable behavior, but was
updated by Lars Mathiesen in the early 2000s with an ABORT state. Their comments in the updated wildmat.c source code still
remain there to this day and are quite insightful:

Quote:

Once the control of one instance of DoMatch enters the star-loop, that instance will return either TRUE or ABORT, and any
calling instance will therefore return immediately after (without calling recursively again). In effect, only one star-loop is ever
active. It would be possible to modify the code to maintain this context explicitly, eliminating all recursive calls at the cost of
some complication and loss of clarity (and the ABORT stuff seems to be unclear enough by itself).

Apparently, the exponential blow-up problem was already well known at that time, as we find the following comment in wildmat.c
demonstrating the issue:

pattern: -*-*-*-*-*-*-12-*-*-*-m-*-*-*
text 1: -adobe-courier-bold-o-normal--12-120-75-75-m-70-iso8859-1
text 2: -adobe-courier-bold-o-normal--12-120-75-75-X-70-iso8859-1
Text 1 matches with 51 calls, while text 2 fails with 54 calls.
Without the ABORT, then it takes 22310 calls to fail. Ugh.
The powerful insight here is that only the last star-loop should be active, because advancing any of the previous star-loops is not
productive when the last star-loop fails. This clever change made globbing execute in linear time for the typical case. Only in the
worst case, the algorithm takes quadratic time in the length of the pattern and string.

Let’s apply this improvement to our example recursive matcher:

enum { FALSE = 0, TRUE = 1, ABORT = 2 };

// returns TRUE if text string matches wild pattern with * and ?
int recursive_match(const char *text, const char *wild)
{
while (*text != '\0')
{
if (*wild == '*')
{
int ret;
// any number of stars act as one star
while (*++wild == '*')
continue;
// a star at the end of the pattern matches any text
if (*wild == '\0')
return TRUE;
// star-loop: match the rest of the pattern and text
while ((ret = recursive_match(text, wild)) == FALSE && *text != '\0')
text++;
// ABORT recursion when failing the last star-loop
if (ret != TRUE)
return ABORT;
return *text != '\0' ? TRUE : FALSE;
}
// ? matches any character or we match the current non-NUL character
if (*wild != '?' && *wild != *text)
return FALSE;
text++;
wild++;
}
// ignore trailing stars
while (*wild == '*')
wild++;
// at end of text means success if nothing else is left to match
return *wild == '\0' ? TRUE : FALSE;
}

This improvement avoids exponential blow-up, but still makes a recursive call for each star in the pattern. This is not necessary. We
can save that state of the matcher so that we can restore it from our backup to execute another iteration of the last star-loop until
we are done.

Non-Recursive Wildcard String Matching

To replace recursion with iteration, we need two backup positions: the current position in the pattern right after the star and the
current position in the text string. To iterate the star-loop, we restore the backed-up positions, advance one character in the text
string, and try again. We keep iterating until the match succeeds or we run out of text. If a star occurs after our last backup position,
we effectively enter a new star-loop and remove the one we were iterating on previously:

// returns TRUE if text string matches wild pattern with * and ?

int match(const char *text, const char *wild)
{
const char *text_backup = NULL;
const char *wild_backup = NULL;
while (*text != '\0')
{
if (*wild == '*')
{
// new star-loop: backup positions in pattern and text
text_backup = text;
wild_backup = ++wild;
}
else if (*wild == '?' || *wild == *text)
{
// ? matched any character or we matched the current non-NUL character
text++;
wild++;
}
else
{
// if no stars we fail to match
if (wild_backup == NULL)
return FALSE;
// star-loop: backtrack to the last * by restoring the backup positions
// in the pattern and text
text = ++text_backup;
wild = wild_backup;
}
}
// ignore trailing stars
while (*wild == '*')
wild++;
// at end of text means success if nothing else is left to match
return *wild == '\0' ? TRUE : FALSE;
}

The execution time of this algorithm is linear in the length of the pattern and string for typical cases. In the worst case, this
algorithm takes quadratic time (to see why, consider pattern *aaa…aaab with ½n a’s and string aaa…aaa with n a’s, which
requires ¼n2 comparisons before failing.)

In case you're asking if there is a linear worst-case algorithm, the answer is yes: by constructing a deterministic finite automaton
(DFA) from the pattern for matching. This is beyond the scope of this article and requires taking into account the (high) cost of DFA
construction.

Non-Recursive Glob Matching

Now that we reviewed the background and issues with string matching and globbing, let's move on to the main part of this article.
Globbing differs from string matching with respect to a subtle meaning of the * and ? wildcards in globs: these wildcards do not
match pathname separators, i.e., do not match / (and do not match \ in Windows). For example, foo*.h does not match foo/bar.h.

To adjust our previous match algorithm to perform glob matching, we add some checks for the / pathname separator such that
* and ? cannot match it:

// returns TRUE if text string matches glob-like pattern with * and ?

int globly_match(const char *text, const char *glob)
{
const char *text_backup = NULL;
const char *glob_backup = NULL;
while (*text != '\0')
{
if (*glob == '*')
{
// new star-loop: backup positions in pattern and text
text_backup = text;
glob_backup = ++glob;
}
else if ((*glob == '?' && *text != '/') || *glob == *text)
{
// ? matched any character except /, or we matched the current non-NUL character
text++;
glob++;
}
else
{
if (glob_backup == NULL || *text_backup == '/')
return FALSE;
// star-loop: backtrack to the last * but do not jump over /
text = ++text_backup;
glob = glob_backup;
}
}
// ignore trailing stars
while (*glob == '*')
glob++;
// at end of text means success if nothing else is left to match
return *glob == '\0' ? TRUE : FALSE;
}

For practical glob matching applications however, this glob-like ("globly") algorithm is not yet complete. It lacks support for
character classes, such as [a-zA-Z] to match letters and [^0-9] to match any character except a digit. It also lacks a means to
escape the special meaning of the *, ?, and [ meta characters with a backslash. Adding these new features requires a redesign of
the glob meta character tests in the main loop, using a switch to select glob meta characters and by using continue to
commence the main loop. Breaking from the switch moves control to the star-loop:

#define PATHSEP '/' // pathname separator, we should define '\\' for Windows instead
// returns TRUE if text string matches glob pattern with * and ?
int glob_match(const char *text, const char *glob)
{
const char *text_backup = NULL;
const char *glob_backup = NULL;
while (*text != '\0')
{
switch (*glob)
{
case '*':
// new star-loop: backup positions in pattern and text
text_backup = text;
glob_backup = ++glob;
continue;
case '?':
// match any character except /
if (*text == PATHSEP)
break;
text++;
glob++;
continue;
case '[':
{
int lastchr;
int matched = FALSE;
int reverse = glob[1] == '^' || glob[1] == '!' ? TRUE : FALSE;
// match any character in [...] except /
if (*text == PATHSEP)
break;
// inverted character class
if (reverse)
glob++;
// match character class
for (lastchr = 256; *++glob != '\0' && *glob != ']'; lastchr = *glob)
if (lastchr < 256 && *glob == '-' &&
glob[1] != '\0' ? *text <= *++glob && *text >= lastchr : *text == *glob)
matched = TRUE;
if (matched == reverse)
break;
text++;
if (*glob != '\0')
glob++;
continue;
}
case '\\':
// literal match \-escaped character
glob++;
// FALLTHROUGH
default:
// match the current non-NUL character
if (*glob != *text && !(*glob == '/' && *text == PATHSEP))
break;
text++;
glob++;
continue;
}
if (glob_backup == NULL || *text_backup == PATHSEP)
return FALSE;
// star-loop: backtrack to the last * but do not jump over /
text = ++text_backup;
glob = glob_backup;
}
// ignore trailing stars
while (*glob == '*')
glob++;
// at end of text means success if nothing else is left to match
return *glob == '\0' ? TRUE : FALSE;
}

The PATHSEP character is either the conventional / or the Windows \ used in the string text to separate pathnames. Note that
traditional Unix uses ! for character class inversion, for example [!0-9]. Here, we offer a choice of ! and the more conventional
^ to invert a character class, for example [^0-9].
The execution time of this algorithm is linear in the length of the pattern and string for typical cases and takes quadratic time in the
worst case.

Non-recursive gitignore-style glob Matching

As we arrive at the second main part of this article, let's take a look at modern gitignore-style globbing rules and design an efficient
non-recursive algorithm.

Gitignore-style globbing applies the following rules to determine file and directory pathname matches:

* matches anything except a /

? matches any one character except a /

[a-z] matches one character in the selected range of characters

[^a-z] matches one character not in the selected range of characters

[!a-z] matches one character not in the selected range of characters

/ when used at the begin of a glob, matches if pathname has no /

**/ matches zero or more directories

/** when at the end of a glob, matches everything after the /

\? matches a ? (or any character specified after the backslash)

We also make the following assumption: when a glob contains a path separator /, the full pathname is matched. Otherwise, the
basename of a file or directory is matched. For example, *.h matches foo.h and bar/foo.h, bar/*.h matches bar/foo.h but not foo.h
and not bar/bar/foo.h. A leading / may be used to force /*.h to match foo.h but not bar/foo.h.

Examples:

* matches a, b, x/a, x/y/b

a matches a, x/a, x/y/a but not b, x/b, a/a/b

/* matches a, b but not x/a, x/b, x/y/a

/a matches a but not x/a, x/y/a

a?b matches axb, ayb but not a, b, ab, a/b

a[xy]b matches axb, ayb but not a, b, azb

a[a-z]b matches aab, abb, acb, azb but not a, b, a3b, aAb, aZb

a[^xy]b matches aab, abb, acb, azb but not a, b, axb, ayb

a[^a-z]b matches a3b, aAb, aZb but not a, b, aab, abb, acb, azb

a/*/b matches a/x/b, a/y/b but not a/b, a/x/y/b

**/a matches a, x/a, x/y/a but not b, x/b

a/**/b matches a/b, a/x/b, a/x/y/b but not x/a/b, a/b/x

a/** matches a/x, a/y, a/x/y but not a, b/x

a\?b matches a?b but not a, b, ab, axb, a/b

To implement gitignore-style globbing, we need two star-loops: one for single star, the “*-loop”, and another for double star, the
“**-loop”. The **-loop overrules the *-loop because there is no point in backtracking on a single star when we encounter a double
star after it in the glob pattern. The converse is not true: we should backtrack on a double star when a single star that comes after it
in the glob does not match:

// returns TRUE if text string matches gitignore-style glob pattern

int gitignore_glob_match(const char *text, const char *glob)
{
const char *text1_backup = NULL;
const char *glob1_backup = NULL;
const char *text2_backup = NULL;
const char *glob2_backup = NULL;
// match pathname if glob contains a / otherwise match the basename
if (*glob == '/')
{
// if pathname starts with ./ then ignore these pairs
while (*text == '.' && text[1] == PATHSEP)
text += 2;
glob++;
}
else if (strchr(glob, '/') == NULL)
{
const char *sep = strrchr(text, PATHSEP);
if (sep)
text = sep + 1;
}
while (*text != '\0')
{
switch (*glob)
{
case '*':
if (*++glob == '*')
{
// trailing ** match everything after /
if (*++glob == '\0')
return TRUE;
// ** followed by a / match zero or more directories
if (*glob != '/')
return FALSE;
// new **-loop, discard *-loop
text1_backup = NULL;
glob1_backup = NULL;
text2_backup = text;
glob2_backup = ++glob;
continue;
}
// trailing * matches everything except /
text1_backup = text;
glob1_backup = glob;
continue;
case '?':
// match any character except /
if (*text == PATHSEP)
break;
text++;
glob++;
continue;
case '[':
{
int lastchr;
int matched = FALSE;
int reverse = glob[1] == '^' || glob[1] == '!' ? TRUE : FALSE;
// match any character in [...] except /
if (*text == PATHSEP)
break;
// inverted character class
if (reverse)
glob++;
// match character class
for (lastchr = 256; *++glob != '\0' && *glob != ']'; lastchr = *glob)
if (lastchr < 256 && *glob == '-' &&
glob[1] != '\0' ? *text <= *++glob && *text >= lastchr : *text == *glob)
matched = TRUE;
if (matched == reverse)
break;
text++;
if (*glob != '\0')
glob++;
continue;
}
case '\\':
// literal match \-escaped character
glob++;
// FALLTHROUGH
default:
// match the current non-NUL character
if (*glob != *text && !(*glob == '/' && *text == PATHSEP))
break;
text++;
glob++;
continue;
}
if (glob1_backup != NULL && *text1_backup != PATHSEP)
{
// *-loop: backtrack to the last * but do not jump over /
text = ++text1_backup;
glob = glob1_backup;
continue;
}
if (glob2_backup != NULL)
{
// **-loop: backtrack to the last **
text = ++text2_backup;
glob = glob2_backup;
continue;
}
return FALSE;
}
// ignore trailing stars
while (*glob == '*')
glob++;
// at end of text means success if nothing else is left to match
return *glob == '\0' ? TRUE : FALSE;
}

The execution time of this algorithm is linear in the length of the pattern and string for typical cases and cubic in the worst case,
when both a **-loop and a *-loop are active.
Refinements
Files with names starting with a dot (.) are treated differently by shell globs, meaning that a dot beginning a name must be
matched explicitly and cannot be matched by a wildcard. To replicate this behavior, we add the following:

int glob_match(const char text, const char glob)

{
int nodot = 1; // new flag to indicate that we shouldn't match a dot, e.g. after a /
...
while (*text != '\0')
{
switch (*glob)
{
case '*':
// match anything except . after /
if (nodot && *text == '.')
break;
...
case '?':
// match anything except . after /
if (nodot && *text == '.')
break;
...
case '[':
{
...
// match any character in [...] except . after /
if (nodot && *text == '.')
break;
...
}
...
default:
// match the current non-NUL character
if (*glob != *text && !(*glob == '/' && *text == PATHSEP))
break;
nodot = *glob == '/';
...

Case-insensitive matching in ASCII can be achieved with tolower() as follows:

// match the current non-NUL character

if (tolower(*glob) != tolower(*text) && !(*glob == '/' && *text == PATHSEP))
break;

// match character class

for (lastchr = 256; *++glob != '\0' && *glob != ']'; lastchr = tolower(*glob))
if (lastchr < 256 && *glob == '-' &&
glob[1] != '\0' ? tolower(*text) <= tolower(*++glob) &&
tolower(*text) >= lastchr : tolower(*text) == tolower(*glob))
matched = TRUE;

Unicode matching can be supported in two ways: by using wide strings, i.e., wchar_t or std::wstring or by using UTF-8
encoded strings. The wide string option requires no changes to the algorithms. The UTF-8 version requires a few changes for ? and
character classes, with everything else staying the same:

case '?':
// match anything except /
if (*text == PATHSEP)
break;
utf8(&text);
glob++;
continue;
case '[':
{
int chr;
…
// match character class
lastchr = 0x10ffff;
glob++;
while (*glob != '\0' && *glob != ']')
if (lastchr < 0x10ffff && *glob == '-' ? (chr = utf8(&text)) <= utf8(&glob) &&
chr >= lastchr : utf8(&text) == (lastchr = utf8(&glob)))
matched = TRUE;
if (matched == reverse)
break;
if (*glob)
glob++;
continue;
}

where utf8() returns the wide character and advances by one UTF-8 character in the glob:

int utf8(const char **s)

{
int c1, c2, c3, c = (unsigned char)**s;
if (c != '\0')
(*s)++;
if (c < 0x80)
return c;
c1 = (unsigned char)**s;
if (c < 0xC0 || (c == 0xC0 && c1 != 0x80) || c == 0xC1 || (c1 & 0xC0) != 0x80)
return 0xFFFD;
if (c1 != '\0')
(*s)++;
c1 &= 0x3F;
if (c < 0xE0)
return (((c & 0x1F) << 6) | c1);
c2 = (unsigned char)**s;
if ((c == 0xE0 && c1 < 0x20) || (c2 & 0xC0) != 0x80)
return 0xFFFD;
if (c2 != '\0')
(*s)++;
c2 &= 0x3F;
if (c < 0xF0)
return (((c & 0x0F) << 12) | (c1 << 6) | c2);
c3 = (unsigned char)**s;
if (c3 != '\0')
(*s)++;
if ((c == 0xF0 && c1 < 0x10) || (c == 0xF4 && c1 >= 0x10) || c >= 0xF5 || (c3 & 0xC0) !=
0x80)
return 0xFFFD;
return (((c & 0x07) << 18) | (c1 << 12) | (c2 << 6) | (c3 & 0x3F));
}

Conclusions
Wildcard matching, classic globbing, and modern gitignore-style globbing can be fast when implemented with star-loop iterations
instead of recursion. These algorithms typically run in linear time in the length of the string and pattern and in the worst case, may
run in quadratic or cubic time in the length of the pattern and string, without blowing up CPU usage exponentially.

Extended globbing features such as shell brace expansion also use recursion with backtracking, which may make the exponential
blow-up problem worse. Consider for example a*{b,a*{b,a*{b,a*{b,a*{b,a*{b,a*{b,a*b}}}}}}}. There is no
easy way to limit the execution time for brace expansion except perhaps by limiting the number of braces or by converting an
extended glob to a regex for matching.

I wrote this article after implementing gitignore-style globbing for ugrep, due to the lack of available and usable open source code.
The versions of wildmat.c that I found in official repositories all had a nasty bug that is actually documented in the source code:
"Might not be robust in face of malformed patterns; e.g., "foo[a-" could cause a segmentation violation." Ugh.

History
3rd August, 2019: First draft
5th August, 2019: First publication
8th August, 2019: Updated
12th September, 2019: Added Java download sources
19th September, 2019: Added shell execution example; [] does not match /; dotglob refinement; new and updated
download sources

License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Robert van Engelen
CEO
United States

Founder of Genivia inc, Professor of Computer Science

Comments and Discussions

4 messages have been posted for this article Visit https://www.codeproject.com/Articles/5163931/Fast-String-Matching-
with-Wildcards-Globs-and-Giti to post and view comments on this article, or click here to get a print view with messages.

CS32 Homework 1, Spring 2019 PDF
No ratings yet
CS32 Homework 1, Spring 2019 PDF
10 pages
JavaScript Related Questions
No ratings yet
JavaScript Related Questions
34 pages
Modifiers: M// S/// QR// ??
No ratings yet
Modifiers: M// S/// QR// ??
29 pages
Perlre Perl Regular Expressions
No ratings yet
Perlre Perl Regular Expressions
16 pages
4 Pattern Matching
No ratings yet
4 Pattern Matching
14 pages
HASKELL
No ratings yet
HASKELL
26 pages
Transcribe 2
No ratings yet
Transcribe 2
43 pages
IT Notes - in
No ratings yet
IT Notes - in
29 pages
CODE's
No ratings yet
CODE's
15 pages
Javascript Cheat Sheet
No ratings yet
Javascript Cheat Sheet
6 pages
Programming in C - 161-180
No ratings yet
Programming in C - 161-180
20 pages
55 TriesNOTES
No ratings yet
55 TriesNOTES
18 pages
Answer Key Set A CT2
No ratings yet
Answer Key Set A CT2
16 pages
Metacharacters: Any Character, Except A Newline. For Example, The Regexp
No ratings yet
Metacharacters: Any Character, Except A Newline. For Example, The Regexp
2 pages
12 JavaScript Concepts That Will Level Up Your Development Skills - Hacker Noon
No ratings yet
12 JavaScript Concepts That Will Level Up Your Development Skills - Hacker Noon
33 pages
Chapter 11
No ratings yet
Chapter 11
10 pages
HO38 String
No ratings yet
HO38 String
10 pages
builder
No ratings yet
builder
6 pages
Arrays: in and Out and All About: Marge Scerbo, CHPDM/UMBC
No ratings yet
Arrays: in and Out and All About: Marge Scerbo, CHPDM/UMBC
7 pages
m1 (Autosaved)
No ratings yet
m1 (Autosaved)
24 pages
Preg Match
No ratings yet
Preg Match
21 pages
String 1 Notes
No ratings yet
String 1 Notes
22 pages
Regular Expressions in Java
No ratings yet
Regular Expressions in Java
30 pages
Old Course Notes Strings
No ratings yet
Old Course Notes Strings
7 pages
Java Made So Easyxcxc
No ratings yet
Java Made So Easyxcxc
40 pages
Characters and Strings
No ratings yet
Characters and Strings
14 pages
Imp Functions in Alteryx
No ratings yet
Imp Functions in Alteryx
11 pages
String Class in Java
No ratings yet
String Class in Java
5 pages
Regular Expression Python
No ratings yet
Regular Expression Python
23 pages
Implode
No ratings yet
Implode
2 pages
Strings
No ratings yet
Strings
34 pages
A Practical Guide to RNN and LSTM in Keras | by Mohit Mayank | Towards Data Science
No ratings yet
A Practical Guide to RNN and LSTM in Keras | by Mohit Mayank | Towards Data Science
13 pages
Prepared by Zanamwe N at 2014
No ratings yet
Prepared by Zanamwe N at 2014
33 pages
Java Material
No ratings yet
Java Material
248 pages
Escape Sequences in Strings
No ratings yet
Escape Sequences in Strings
13 pages
Introduction To Algorithms
No ratings yet
Introduction To Algorithms
18 pages
COMP322: Assignment 2 - Winter 2010: Due at 11:59pm EST, 3 Mar 2010
No ratings yet
COMP322: Assignment 2 - Winter 2010: Due at 11:59pm EST, 3 Mar 2010
3 pages
Theory PDF
No ratings yet
Theory PDF
24 pages
JEDI Course Notes DataSt Chapter02 Stacks
No ratings yet
JEDI Course Notes DataSt Chapter02 Stacks
19 pages
Java Regex Tutorial: Lars Vogel
No ratings yet
Java Regex Tutorial: Lars Vogel
20 pages
45 Useful JavaScript
No ratings yet
45 Useful JavaScript
17 pages
How To Read Code PDF
No ratings yet
How To Read Code PDF
42 pages
REGEX Extended
No ratings yet
REGEX Extended
39 pages
Descriptiondd
No ratings yet
Descriptiondd
3 pages
What Is A String?: Tutorial 4
No ratings yet
What Is A String?: Tutorial 4
5 pages
45 Useful JavaScript Tips Tricks and Best Practices
No ratings yet
45 Useful JavaScript Tips Tricks and Best Practices
11 pages
Chapter 6 - Stack
No ratings yet
Chapter 6 - Stack
8 pages
Manish_JS_10_Quiz
No ratings yet
Manish_JS_10_Quiz
4 pages
09d Javascript
No ratings yet
09d Javascript
145 pages
Perl Tutorial
No ratings yet
Perl Tutorial
6 pages
ES6 For Humans
No ratings yet
ES6 For Humans
14 pages
Ex 0002
No ratings yet
Ex 0002
4 pages
The Trie of Knowledge
No ratings yet
The Trie of Knowledge
7 pages
Reg Ex Program
No ratings yet
Reg Ex Program
16 pages
Java - Lang: 1. The Basic Java Application
No ratings yet
Java - Lang: 1. The Basic Java Application
9 pages
29 Advanced Java
No ratings yet
29 Advanced Java
3 pages
Wednesday 9_04_24
No ratings yet
Wednesday 9_04_24
30 pages
Introduction to PHP, Part 2, Second Edition
From Everand
Introduction to PHP, Part 2, Second Edition
Adam Majczak
No ratings yet
Perl One-Liners: 130 Programs That Get Things Done
From Everand
Perl One-Liners: 130 Programs That Get Things Done
Peteris Krumins
4/5 (3)
JavaScript.
From Everand
JavaScript.
Tom Henricksen
No ratings yet
Migrating Your SQL Server Workloads To PostgreSQL - Part 3 - CodeProject
No ratings yet
Migrating Your SQL Server Workloads To PostgreSQL - Part 3 - CodeProject
6 pages
JavaScript Export To Excel HTML Table With Input Tags - CodeProject
No ratings yet
JavaScript Export To Excel HTML Table With Input Tags - CodeProject
5 pages
SignalR Chat App With ASP - Net WebForm and BootStrap - Part Three - CodeProject
No ratings yet
SignalR Chat App With ASP - Net WebForm and BootStrap - Part Three - CodeProject
8 pages
Wexflow - Open Source Workflow Engine - CodeProject
No ratings yet
Wexflow - Open Source Workflow Engine - CodeProject
84 pages
REST API or SOAP Testing Automation With ZeroCode JSON-Based Open Source Test Framework - CodeProject
No ratings yet
REST API or SOAP Testing Automation With ZeroCode JSON-Based Open Source Test Framework - CodeProject
11 pages
Wexflow - Open Source Workflow Engine - CodeProject
No ratings yet
Wexflow - Open Source Workflow Engine - CodeProject
84 pages
Cucumber Utils - CodeProject
No ratings yet
Cucumber Utils - CodeProject
6 pages
Face Detection On Android With Google ML Kit - CodeProject
No ratings yet
Face Detection On Android With Google ML Kit - CodeProject
12 pages
ITDSD - 4. Quantitative Analysis of Distributed Software - CodeProject
No ratings yet
ITDSD - 4. Quantitative Analysis of Distributed Software - CodeProject
12 pages
RoomExistingSQLiteDBConverter - CodeProject
No ratings yet
RoomExistingSQLiteDBConverter - CodeProject
14 pages
Real-Time Barcode Scanning From Camera Stream - CodeProject
No ratings yet
Real-Time Barcode Scanning From Camera Stream - CodeProject
12 pages
Introduction Into Logging With Loguru - CodeProject
No ratings yet
Introduction Into Logging With Loguru - CodeProject
3 pages
Amazon's AWS S3 Java API 2.0 (Using Spring Boot As Client) - CodeProject
No ratings yet
Amazon's AWS S3 Java API 2.0 (Using Spring Boot As Client) - CodeProject
60 pages
Introduction Into C++ Builds With Gradle - CodeProject
No ratings yet
Introduction Into C++ Builds With Gradle - CodeProject
6 pages
Simple JSON REST Consumption With GSON API - CodeProject
No ratings yet
Simple JSON REST Consumption With GSON API - CodeProject
4 pages
Sam
No ratings yet
Sam
21 pages
RPT Manual 1.12
No ratings yet
RPT Manual 1.12
45 pages
Arduino
No ratings yet
Arduino
5 pages
Lab Manual
No ratings yet
Lab Manual
59 pages
C++ Imp Questions
No ratings yet
C++ Imp Questions
48 pages
Inter Process Communication
No ratings yet
Inter Process Communication
52 pages
Assignment of Programming Fundamentals
No ratings yet
Assignment of Programming Fundamentals
9 pages
Top 10 Tricks For Delphi
0% (1)
Top 10 Tricks For Delphi
23 pages
Microlink Information Technology College
No ratings yet
Microlink Information Technology College
43 pages
PC Unit II Notes
No ratings yet
PC Unit II Notes
56 pages
Prakhar Comp
No ratings yet
Prakhar Comp
13 pages
NXC Guide
No ratings yet
NXC Guide
123 pages
DartLangSpec-v2 2
No ratings yet
DartLangSpec-v2 2
206 pages
'C' Programming by NIKHIL
No ratings yet
'C' Programming by NIKHIL
50 pages
Javascript Variables
No ratings yet
Javascript Variables
10 pages
Ceres
No ratings yet
Ceres
54 pages
Object Oriented Programming With PHP
No ratings yet
Object Oriented Programming With PHP
61 pages
C++ Question Bank
100% (1)
C++ Question Bank
32 pages
mcqs2
No ratings yet
mcqs2
13 pages
Memory Allocation: Subject: COMP6047 Algorithm and Programming Year: 2019
No ratings yet
Memory Allocation: Subject: COMP6047 Algorithm and Programming Year: 2019
44 pages
Classes
No ratings yet
Classes
112 pages
ES03 Lab Manual
100% (2)
ES03 Lab Manual
59 pages
Oops
No ratings yet
Oops
227 pages
What Is The Benefit of Using An Enum Rather Than A #Define Constant?
No ratings yet
What Is The Benefit of Using An Enum Rather Than A #Define Constant?
7 pages
DCPUM
No ratings yet
DCPUM
370 pages
Rust Language Cheat Sheet
No ratings yet
Rust Language Cheat Sheet
9 pages
Box2D v2.3.0 User Manual
No ratings yet
Box2D v2.3.0 User Manual
69 pages
Programming in C++ - Module 01
No ratings yet
Programming in C++ - Module 01
39 pages
Modern GPU
100% (1)
Modern GPU
221 pages
Comprog Reviewer
No ratings yet
Comprog Reviewer
14 pages

Fast String Matching With Wildcards, Globs, and Gitignore-Style Globs - How Not To Blow It Up - CodeProject

Uploaded by

Fast String Matching With Wildcards, Globs, and Gitignore-Style Globs - How Not To Blow It Up - CodeProject

Uploaded by

Fast String Matching with Wildcards, Globs, and

Gitignore-Style Globs - How Not to Blow it Up

19 Sep 2019 CPOL

Download Python sources - 1.8 KB

The Trouble With Recursion

// returns TRUE if text string matches wild pattern with * and ?

$ touch `python3 -c 'print(100*"a")'`

Let’s apply this improvement to our example recursive matcher:

enum { FALSE = 0, TRUE = 1, ABORT = 2 };

Non-Recursive Wildcard String Matching

// returns TRUE if text string matches wild pattern with * and ?

Non-Recursive Glob Matching

// returns TRUE if text string matches glob-like pattern with * and ?

Non-recursive gitignore-style glob Matching

* matches anything except a /

? matches any one character except a /

[a-z] matches one character in the selected range of characters

[^a-z] matches one character not in the selected range of characters

[!a-z] matches one character not in the selected range of characters

/ when used at the begin of a glob, matches if pathname has no /

**/ matches zero or more directories

/** when at the end of a glob, matches everything after the /

\? matches a ? (or any character specified after the backslash)

* matches a, b, x/a, x/y/b

a matches a, x/a, x/y/a but not b, x/b, a/a/b

/* matches a, b but not x/a, x/b, x/y/a

/a matches a but not x/a, x/y/a

a?b matches axb, ayb but not a, b, ab, a/b

a/*/b matches a/x/b, a/y/b but not a/b, a/x/y/b

**/a matches a, x/a, x/y/a but not b, x/b

a/**/b matches a/b, a/x/b, a/x/y/b but not x/a/b, a/b/x

a/** matches a/x, a/y, a/x/y but not a, b/x

a\?b matches a?b but not a, b, ab, axb, a/b

// returns TRUE if text string matches gitignore-style glob pattern

int glob_match(const char *text, const char *glob)

Case-insensitive matching in ASCII can be achieved with tolower() as follows:

// match the current non-NUL character

// match character class

int utf8(const char **s)

About the Author

Founder of Genivia inc, Professor of Computer Science

Comments and Discussions

Permalink Article Copyright 2019 by Robert van Engelen

You might also like

int glob_match(const char text, const char glob)