syntax:pattern

Differences

This shows you the differences between two versions of the page.

syntax:pattern [2014/04/26 12:21]
ormaaj [ksh93 extras]
syntax:pattern [2021/10/21 00:52]
fgrose links to POSIX character classes & collation symbols
Line 3: Line 3:
 {{keywords>​bash shell scripting glob globbing wildcards filename pattern matching}} {{keywords>​bash shell scripting glob globbing wildcards filename pattern matching}}
  
-A pattern is a **string description**. Bash can use it in various ​situations:+A pattern is a **string description**. Bash uses them in various ​ways:
   * [[syntax:​expansion:​globs | Pathname expansion]] (Globbing - matching filenames)   * [[syntax:​expansion:​globs | Pathname expansion]] (Globbing - matching filenames)
   * Pattern matching in [[syntax:​ccmd:​conditional_expression | conditional expressions]]   * Pattern matching in [[syntax:​ccmd:​conditional_expression | conditional expressions]]
Line 10: Line 10:
  
 The pattern description language is relatively easy. Any character that's not mentioned below matches itself. The pattern description language is relatively easy. Any character that's not mentioned below matches itself.
-The ''NUL'' character may not occur in a pattern. If special characters are quoted, they're matched literally without their special meaning.+The ''NUL'' character may not occur in a pattern. If special characters are quoted, they're matched literally, i.e., without their special meaning.
 + 
 +Do **not** confuse patterns with **//regular expressions//**, even though they share some symbols and do similar matching work.
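
To make the distinction from regular expressions concrete, here is a minimal sketch. The helper names ''matches_pattern'' and ''matches_regex'' are hypothetical and only wrap the ''=='' and ''=~'' operators of the conditional expression; the same text ''a*'' means something different in each language.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; they are not part of Bash.

# Succeeds if the WHOLE string matches the shell pattern.
matches_pattern() {
  local string=$1 pattern=$2
  [[ $string == $pattern ]]   # unquoted right-hand side is used as a pattern
}

# Succeeds if the regular expression matches ANYWHERE in the string.
matches_regex() {
  local string=$1 regex=$2
  [[ $string =~ $regex ]]     # unquoted right-hand side is used as an ERE
}

# As a pattern, "a*" means: an "a" followed by any string.
matches_pattern abc 'a*' && echo 'pattern a* matches abc'
matches_pattern bbb 'a*' || echo 'pattern a* does not match bbb'

# As a regex, "a*" means: zero or more "a", so it also matches "bbb".
matches_regex bbb 'a*' && echo 'regex a* matches bbb'
```

Note that a pattern always has to match the complete string, while a regex is unanchored unless you add ''^'' and ''$''.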
  
 ===== Normal pattern language ===== ===== Normal pattern language =====
Line 18: Line 20:
 |''?''​|Matches any **single character**| |''?''​|Matches any **single character**|
 |''​X''​|Matches the character ''​X''​ which can be any character that has no special meaning| |''​X''​|Matches the character ''​X''​ which can be any character that has no special meaning|
-|''​\X''​|Matches the character ''​X'',​ where the character'​s special meaning is taken away using the backslash|+|''​\X''​|Matches the character ''​X'',​ where the character'​s special meaning is stripped by the backslash|
 |''​\\''​|Matches a backslash| |''​\\''​|Matches a backslash|
 |''​[...]''​|Defines a pattern **bracket expression** (see below). Matches any of the enclosed characters at this position.| |''​[...]''​|Defines a pattern **bracket expression** (see below). Matches any of the enclosed characters at this position.|
Line 25: Line 27:
 ==== Bracket expressions ==== ==== Bracket expressions ====
  
-The bracket expression ''​[...]''​ mentioned above has some useful ​specialities:+The bracket expression ''​[...]''​ mentioned above has some useful ​applications:
 ^Bracket expression^Description^ ^Bracket expression^Description^
 |''​[XYZ]''​|The "​normal"​ bracket expression, matching either ''​X'',​ ''​Y''​ or ''​Z''​| |''​[XYZ]''​|The "​normal"​ bracket expression, matching either ''​X'',​ ''​Y''​ or ''​Z''​|
-|''[X-Z]''|A range expression: Matching all the characters from ''X'' to ''Y'' (whatever that means in your current **locale**, it depends how the characters are **sorted**!)| +|''[X-Z]''|A range expression: matches all the characters from ''X'' to ''Z'' (your current **locale** defines how the characters are **sorted**!)| 
-|''<​nowiki>​[[:​class:​]]</​nowiki>''​|Matches all the characters defined by a POSIX(r) character class: ''​alnum'',​ ''​alpha'',​ ''​ascii'',​ ''​blank'',​ ''​cntrl'',​ ''​digit'',​ ''​graph'',​ ''​lower'',​ ''​print'',​ ''​punct'',​ ''​space'',​ ''​upper'',​ ''​word''​ and ''​xdigit''​|+|''<​nowiki>​[[:​class:​]]</​nowiki>''​|Matches all the characters defined by a [[https://​pubs.opengroup.org/​onlinepubs/​009696899/​basedefs/​xbd_chap07.html#​tag_07_03_01|POSIX(r) character class]]: ''​alnum'',​ ''​alpha'',​ ''​ascii'',​ ''​blank'',​ ''​cntrl'',​ ''​digit'',​ ''​graph'',​ ''​lower'',​ ''​print'',​ ''​punct'',​ ''​space'',​ ''​upper'',​ ''​word''​ and ''​xdigit''​|
 |''​[^...]''​|A negating expression: It matches all the characters that are **not** in the bracket expression| |''​[^...]''​|A negating expression: It matches all the characters that are **not** in the bracket expression|
 |''​[!...]''​|Equivalent to ''​[^...]''​| |''​[!...]''​|Equivalent to ''​[^...]''​|
Line 58: Line 60:
     * => ''​Hello world+''​     * => ''​Hello world+''​
     * => ''​Hello world?''​     * => ''​Hello world?''​
-  * Pattern ''<​nowiki>​[[.backslash.]]Hello[[.vertical-line.]]world[[.exclamation-mark.]]</​nowiki>''​ matches (using collation ​sybols)+  * Pattern ''<​nowiki>​[[.backslash.]]Hello[[.vertical-line.]]world[[.exclamation-mark.]]</​nowiki>''​ matches (using ​[[https://​pubs.opengroup.org/​onlinepubs/​009696899/​basedefs/​xbd_chap07.html#​tag_07_03_02_04|collation ​symbols]])
     * => ''​\Hello|world!''​     * => ''​\Hello|world!''​
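
The bracket expressions above can be tried out with a small sketch like the following. The function names ''classify_char'' and ''has_only_xdigits'' are made up for this example. POSIX character classes are used instead of ranges, so the result should not depend on how the current locale sorts characters.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Print a rough category for a single character.
classify_char() {
  case $1 in
    [[:digit:]]) echo digit ;;
    [[:upper:]]) echo upper ;;
    [[:lower:]]) echo lower ;;
    [[:space:]]) echo space ;;
    [[:punct:]]) echo punct ;;
    ?)           echo other ;;                   # any other single character
    *)           echo not-a-single-character ;;  # empty or longer strings
  esac
}

# Succeeds if the string is non-empty and contains only hex digits.
# [![:xdigit:]] is a negated bracket expression: any character that is
# NOT a hexadecimal digit.
has_only_xdigits() {
  [[ -n $1 && $1 != *[![:xdigit:]]* ]]
}

classify_char 7     # prints "digit"
classify_char Q     # prints "upper"
has_only_xdigits c0ffee && echo 'c0ffee looks like hex'
```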
  
 ===== Extended pattern language ===== ===== Extended pattern language =====
  
-If you set the [[internals:​shell_options | shell option]] ''​extglob'',​ Bash understands some more powerful patterns. ​Here, a ''<​PATTERN-LIST>''​ is one or more pattern, separated by the pipe-symbol (''​|''​).+If you set the [[internals:​shell_options | shell option]] ''​extglob'',​ Bash understands some powerful patterns. ​''<​PATTERN-LIST>''​ is one or more patterns, separated by the pipe-symbol (''​PATTERN|PATTERN''​).
  
 |''?​(<​PATTERN-LIST>​)''​|Matches **zero or one** occurrence of the given patterns| |''?​(<​PATTERN-LIST>​)''​|Matches **zero or one** occurrence of the given patterns|
Line 97: Line 99:
  * All of the extglob quantifiers supported by bash were supported by ksh88. The set of extglob quantifiers supported by ksh88 is identical to those supported by Bash, mksh, ksh93, and zsh.  * All of the extglob quantifiers supported by bash were supported by ksh88. The set of extglob quantifiers supported by ksh88 is identical to those supported by Bash, mksh, ksh93, and zsh.
  
- * mksh does not support POSIX character classes. Therefore, character ranges like ''​[0-9]''​ are somewhat more portable an equivalent POSIX class like ''​[:​digit:​]''​.+ * mksh does not support POSIX character classes. Therefore, character ranges like ''​[0-9]''​ are somewhat more portable ​than an equivalent POSIX class like ''​[:​digit:​]''​.
  
- * Bash uses a custom runtime interpreter for pattern matching. (at least) ksh93 and zsh translate patterns into regexes and then use a regex compiler to emit and cache optimized pattern matching code. This means Bash may be an order of magnitude or more slower in cases that involve complex back-tracking (usually that means extglob quantifier nesting). You may wish to use Bash's regex support (the ''​=~''​ operator) if performance is a problem, because Bash will go through ​your C library regex implementation rather than using its own pattern matcher.+ * Bash uses a custom runtime interpreter for pattern matching. (at least) ksh93 and zsh translate patterns into regexes and then use a regex compiler to emit and cache optimized pattern matching code. This means Bash may be an order of magnitude or more slower in cases that involve complex back-tracking (usually that means extglob quantifier nesting). You may wish to use Bash's regex support (the ''​=~''​ operator) if performance is a problem, because Bash will use your C library regex implementation rather than its own pattern matcher.
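
As a rough illustration of the last point, the sketch below writes the same check twice: once as an extglob pattern and once as a regex for the ''=~'' operator. The function names and the "dot-separated numbers" use case are assumptions made for this example, and no timing claims are implied; measure your own case before switching.

```bash
#!/usr/bin/env bash
# Hypothetical example: accept strings such as "1", "1.2" or "1.2.3".

# Enable extglob before any line that uses extended patterns is parsed.
shopt -s extglob

# Extglob version: digits, then zero or more groups of ".digits".
is_version_glob() {
  [[ $1 == +([0-9])*(.+([0-9])) ]]
}

# Regex version: handled by the C library's regex implementation.
# Keeping the regex in a variable avoids quoting surprises.
is_version_regex() {
  local re='^[0-9]+(\.[0-9]+)*$'
  [[ $1 =~ $re ]]
}

is_version_glob 1.2.3  && echo 'glob:  1.2.3 accepted'
is_version_regex 1.2.3 && echo 'regex: 1.2.3 accepted'
```

In the pattern the dot is a literal character, while in the regex it has to be escaped, and the regex needs explicit ''^'' and ''$'' anchors to cover the whole string.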
  
 TODO: describe the pattern escape bug https://​gist.github.com/​ormaaj/​6195070 TODO: describe the pattern escape bug https://​gist.github.com/​ormaaj/​6195070
Line 107: Line 109:
 ksh93 supports some very powerful pattern matching features in addition to those described above. ksh93 supports some very powerful pattern matching features in addition to those described above.
  
- * ksh93 supports arbitrary quantifiers just like ERE using the ''​{from,​to}(pattern-list)''​ syntax. ''​{2,​4}(foo)bar''​ matches between 2-4 "​foo"'​s followed by "​bar"​. ''​{2,​}(foo)bar''​ matches 2 or more "​foo"'​s followed by "​bar"​. You can probably figure out the rest. None of the other shells ​so far support this syntax.+ * ksh93 supports arbitrary quantifiers just like ERE using the ''​{from,​to}(pattern-list)''​ syntax. ''​{2,​4}(foo)bar''​ matches between 2-4 "​foo"'​s followed by "​bar"​. ''​{2,​}(foo)bar''​ matches 2 or more "​foo"'​s followed by "​bar"​. You can probably figure out the rest. So far, none of the other shells support this syntax.
  
  * In ksh93, a ''​pattern-list''​ may be delimited by either ''&''​ or ''​|''​. ''&''​ means "all patterns must be matched"​ instead of "any pattern"​. For example, <​code>​[[ fo0bar == @(fo[0-9]&​+([[:​alnum:​]]))bar ]]</​code>​ would be true while <​code>​[[ f00bar == @(fo[0-9]&​+([[:​alnum:​]]))bar ]]</​code>​ is false, because all members of the and-list must be satisfied. No other shell supports this so far, but you can simulate some cases in other shells using double extglob negation. The aforementioned ksh93 pattern is equivalent in Bash to: <​code>​[[ fo0bar == !(!(fo[0-9])|!(+([[:​alnum:​]])))bar ]]</​code>,​ which is technically more portable, but ugly.  * In ksh93, a ''​pattern-list''​ may be delimited by either ''&''​ or ''​|''​. ''&''​ means "all patterns must be matched"​ instead of "any pattern"​. For example, <​code>​[[ fo0bar == @(fo[0-9]&​+([[:​alnum:​]]))bar ]]</​code>​ would be true while <​code>​[[ f00bar == @(fo[0-9]&​+([[:​alnum:​]]))bar ]]</​code>​ is false, because all members of the and-list must be satisfied. No other shell supports this so far, but you can simulate some cases in other shells using double extglob negation. The aforementioned ksh93 pattern is equivalent in Bash to: <​code>​[[ fo0bar == !(!(fo[0-9])|!(+([[:​alnum:​]])))bar ]]</​code>,​ which is technically more portable, but ugly.
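
Since the ''{from,to}(pattern-list)'' quantifiers are ksh93-only, one possible workaround in Bash is the ''=~'' operator with an ERE interval expression. This is only a sketch of that substitution, not an equivalent pattern feature; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: emulate two ksh93 quantified patterns with regexes.

# ksh93 pattern {2,4}(foo)bar : between 2 and 4 "foo", followed by "bar".
matches_2_to_4_foo_bar() {
  local re='^(foo){2,4}bar$'
  [[ $1 =~ $re ]]
}

# ksh93 pattern {2,}(foo)bar : 2 or more "foo", followed by "bar".
matches_2_or_more_foo_bar() {
  local re='^(foo){2,}bar$'
  [[ $1 =~ $re ]]
}

matches_2_to_4_foo_bar foofoobar && echo 'two foo: match'
matches_2_to_4_foo_bar foobar    || echo 'one foo: no match'
```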
  • syntax/pattern.txt
  • Last modified: 2021/10/21 00:52
  • by fgrose