Regular expressions are used to match patterns against strings.
Within a pattern, all characters except ., |, (, ), [, {, +, \, ^, $, *, and ? match themselves. If you want to match one of these special characters literally, precede it with a backslash.
Patterns for matching single characters:
x        Matches the character x.
\        Matches nothing, but quotes the following character.
\\       Matches the backslash character.
\0n      Matches the character with octal value 0n (0 <= n <= 7).
\0nn     Matches the character with octal value 0nn (0 <= n <= 7).
\0mnn    Matches the character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7).
\xhh     Matches the character with hexadecimal value 0xhh.
\uhhhh   Matches the character with hexadecimal value 0xhhhh.
\t       Matches the tab character ('\u0009').
\n       Matches the newline (line feed) character ('\u000A').
\r       Matches the carriage-return character ('\u000D').
\f       Matches the form-feed character ('\u000C').
\a       Matches the alert (bell) character ('\u0007').
\e       Matches the escape character ('\u001B').
\cx      Matches the control character corresponding to x.
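The escapes above can be tried out with a short sketch. It assumes the java.util.regex flavor, which the constructs in this reference suggest; the class name SingleCharEscapes and the helper matches are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

class SingleCharEscapes {
    // Returns true when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An escaped dot matches only a literal dot.
        System.out.println(matches("\\.", "."));
        System.out.println(matches("\\.", "a"));
        // Hexadecimal 41 and octal 101 both denote 'A'.
        System.out.println(matches("\\x41", "A"));
        System.out.println(matches("\\0101", "A"));
        // The tab escape and control-I denote the same character.
        System.out.println(matches("\\t", "\t"));
        System.out.println(matches("\\cI", "\t"));
        // Two backslashes in the pattern match one backslash in the input.
        System.out.println(matches("\\\\", "\\"));
    }
}
```

Each backslash of the pattern is doubled here only because the pattern is written as a Java string literal.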
Character classes match a single character from a set of characters. A character class is a set of characters between brackets. Inside the brackets, the special regular expression characters ., |, (, ), [, {, +, ^, $, *, and ? lose their special meaning. However, normal string substitution still occurs, so (for example) \b represents a backspace character and \n a newline. To include the literal characters ] and - within a character class, place them at the start.
[abc]           Matches the characters a, b, or c.
[^abc]          Matches any character except a, b, or c (negation).
[a-zA-Z]        Matches the characters a through z or A through Z, inclusive (range).
[a-d[m-p]]      Matches the characters a through d, or m through p: [a-dm-p] (union).
[a-z&&[def]]    Matches the characters d, e, or f (intersection).
[a-z&&[^bc]]    Matches the characters a through z, except for b and c: [ad-z] (subtraction).
[a-z&&[^m-p]]   Matches the characters a through z, and not m through p: [a-lq-z] (subtraction).
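As a minimal sketch of negation, union, intersection, and subtraction, the following assumes java.util.regex, where nested brackets and && are supported. CharClassDemo and matches are hypothetical names.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns true when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));          // simple class
        System.out.println(matches("[^abc]", "b"));         // negation rejects b
        System.out.println(matches("[a-d[m-p]]", "n"));     // union covers m through p
        System.out.println(matches("[a-z&&[def]]", "e"));   // intersection keeps d, e, f
        System.out.println(matches("[a-z&&[def]]", "a"));   // a is outside the intersection
        System.out.println(matches("[a-z&&[^bc]]", "b"));   // subtraction removes b and c
        System.out.println(matches("[a-z&&[^m-p]]", "q"));  // q survives the subtraction
    }
}
```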
Predefined character classes:
.    Matches any character.
\d   Matches a digit: [0-9].
\D   Matches a non-digit: [^0-9].
\s   Matches a whitespace character: [ \t\n\x0B\f\r].
\S   Matches a non-whitespace character: [^\s].
\w   Matches a word character: [a-zA-Z_0-9].
\W   Matches a non-word character: [^\w].
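The predefined classes are mostly used as shorthand inside larger patterns. The sketch below, again assuming java.util.regex, uses them for three small tasks; PredefinedClassDemo and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Removes every non-digit character from the input.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // Splits the input on runs of whitespace.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s+");
    }

    // Returns true when the input consists of word characters only.
    static boolean isWord(String input) {
        return Pattern.matches("\\w+", input);
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Order 66 shipped"));
        System.out.println(Arrays.toString(splitOnWhitespace("a b\t\tc")));
        System.out.println(isWord("snake_case1"));
        System.out.println(isWord("two words"));
    }
}
```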
POSIX character classes (US-ASCII):
\p{Lower}    Matches a lower-case alphabetic character: [a-z].
\p{Upper}    Matches an upper-case alphabetic character: [A-Z].
\p{ASCII}    Matches all ASCII characters: [\x00-\x7F].
\p{Alpha}    Matches an alphabetic character: [\p{Lower}\p{Upper}].
\p{Digit}    Matches a decimal digit: [0-9].
\p{Alnum}    Matches an alphanumeric character: [\p{Alpha}\p{Digit}].
\p{Punct}    Matches a punctuation character: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}    Matches a visible character: [\p{Alnum}\p{Punct}].
\p{Print}    Matches a printable character: [\p{Graph}].
\p{Blank}    Matches a space or a tab: [ \t].
\p{Cntrl}    Matches a control character: [\x00-\x1F\x7F].
\p{XDigit}   Matches a hexadecimal digit: [0-9a-fA-F].
\p{Space}    Matches a whitespace character: [ \t\n\x0B\f\r].
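A short sketch of the POSIX classes follows, assuming java.util.regex with its default flags, under which these classes cover US-ASCII only. PosixClassDemo and matches are hypothetical names.

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // Returns true when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // One upper-case letter followed by lower-case letters.
        System.out.println(matches("\\p{Upper}\\p{Lower}+", "Hello"));
        // Hexadecimal digits only: g is not one of them.
        System.out.println(matches("\\p{XDigit}+", "1aF"));
        System.out.println(matches("\\p{XDigit}+", "1g"));
        // Punctuation characters.
        System.out.println(matches("\\p{Punct}+", "!?"));
        // The accented letter e lies outside US-ASCII, so Alpha rejects it.
        System.out.println(matches("\\p{Alpha}", "\u00e9"));
    }
}
```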
Classes for Unicode blocks and categories:
\p{InGreek}          Matches a character in the Greek block (simple block).
\p{Lu}               Matches an uppercase letter (simple category).
\p{Sc}               Matches a currency symbol.
\P{InGreek}          Matches any character except one in the Greek block (negation).
[\p{L}&&[^\p{Lu}]]   Matches any letter except an uppercase letter (subtraction).
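The block and category classes can be exercised as sketched below, assuming java.util.regex. Non-ASCII test characters are written as Unicode escapes so the source stays ASCII; UnicodeClassDemo and matches are hypothetical names.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // Returns true when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String alpha = "\u03b1"; // Greek small letter alpha
        String euro = "\u20ac";  // euro sign
        System.out.println(matches("\\p{InGreek}", alpha));        // inside the Greek block
        System.out.println(matches("\\P{InGreek}", alpha));        // negated block rejects it
        System.out.println(matches("\\p{Lu}", "A"));               // uppercase letter
        System.out.println(matches("\\p{Sc}", euro));              // currency symbol
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "a"));  // letter, not uppercase
        System.out.println(matches("[\\p{L}&&[^\\p{Lu}]]", "A"));  // uppercase is subtracted
    }
}
```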
Character sequences are matched by stringing the characters together.
XY   Matches X followed by Y.
The following constructs make it easy to match character sequences that contain special characters.
\Q   Quotes all characters until \E.
\E   Ends quoting started by \Q.
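Quoting matters when a literal string happens to contain special characters. The sketch below assumes java.util.regex and compares an unquoted pattern with a quoted one; it also uses Pattern.quote, a java.util.regex helper that quotes a whole string. QuotingDemo and matches are hypothetical names.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // Returns true when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String literal = "1+1=2";
        // Unquoted, the plus sign is a repetition modifier, so the literal text does not match.
        System.out.println(matches(literal, literal));
        // The same pattern does match "11=2": one or more 1s, then 1=2.
        System.out.println(matches(literal, "11=2"));
        // Between \Q and \E every character is taken literally.
        System.out.println(matches("\\Q1+1=2\\E", literal));
        // Pattern.quote builds a quoted pattern from an arbitrary string.
        System.out.println(matches(Pattern.quote(literal), literal));
    }
}
```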
Repetition modifiers match multiple occurrences of a pattern.
X?       Matches X once or not at all.
X*       Matches X zero or more times.
X+       Matches X one or more times.
X{n}     Matches X exactly n times.
X{n,}    Matches X at least n times.
X{n,m}   Matches X at least n but not more than m times.
These modifiers are greedy, i.e. they match as much of the string as they can. Appending a question mark to a repetition modifier (for example X*? or X+?) makes it match as little as possible instead.
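The difference between the greedy and the minimal form is easiest to see on the first match found in a string. The sketch assumes java.util.regex; GreedyDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first substring matched by the pattern, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: runs to the last closing angle bracket.
        System.out.println(firstMatch("<.+>", html));
        // Minimal: stops at the first closing angle bracket.
        System.out.println(firstMatch("<.+?>", html));
        // A bounded repetition takes up to four digits when greedy, two when minimal.
        System.out.println(firstMatch("\\d{2,4}", "123456"));
        System.out.println(firstMatch("\\d{2,4}?", "123456"));
        // An optional character: u may appear once or not at all.
        System.out.println(firstMatch("colou?r", "color"));
        // Exactly three a's are required, so two are not enough.
        System.out.println(firstMatch("a{3}", "aa"));
    }
}
```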
An unescaped vertical bar "|" matches either the regular expression that precedes it or the regular expression that follows it.
X|Y   Matches either X or Y.
Parentheses are used to group terms within a regular expression. Everything within the group is treated as a single regular expression.
(X)   Matches X.
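Alternation and grouping interact: a vertical bar splits everything on either side of it unless parentheses limit its reach, and a repetition modifier applies to a whole group. A minimal sketch, assuming java.util.regex, with the hypothetical names GroupingDemo and matches:

```java
import java.util.regex.Pattern;

class GroupingDemo {
    // Returns true when the whole input matches the pattern.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Plain alternation between two whole expressions.
        System.out.println(matches("cat|dog", "dog"));
        // Parentheses confine the alternation to a single letter.
        System.out.println(matches("gr(a|e)y", "grey"));
        // Without parentheses the alternatives are "gra" and "ey", so "grey" fails.
        System.out.println(matches("gra|ey", "grey"));
        // The modifier repeats the whole group ab.
        System.out.println(matches("(ab)+", "ababab"));
        // Without the group only b is repeated.
        System.out.println(matches("ab+", "ababab"));
        System.out.println(matches("ab+", "abbb"));
    }
}
```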
The following boundaries can be specified.
^    Matches the beginning of a line.
$    Matches the end of a line.
\b   Matches a word boundary.
\B   Matches a non-word boundary.
\A   Matches the beginning of the string.
\G   Matches the end of the previous match.
\Z   Matches the end of the string but for the final terminator (e.g. a newline), if any.
\z   Matches the end of the string.
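The boundary constructs match positions rather than characters, which the following sketch illustrates. It assumes java.util.regex, where ^ and $ match at every line only when the MULTILINE flag is set and otherwise match only at the ends of the input. BoundaryDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Collects every match of the compiled pattern in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // \b keeps "cat" from matching inside "concat".
        System.out.println(findAll(Pattern.compile("\\bcat\\b"), "concat cat"));
        // With MULTILINE, ^ matches at the start of each line.
        System.out.println(findAll(Pattern.compile("^\\w+", Pattern.MULTILINE), "one\ntwo"));
        // Without the flag, ^ matches only at the start of the input.
        System.out.println(findAll(Pattern.compile("^\\w+"), "one\ntwo"));
        // \G chains matches: the run stops at the first character that is not a.
        System.out.println(findAll(Pattern.compile("\\Ga"), "aab aa"));
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(findAll(Pattern.compile("abc\\Z"), "abc\n"));
        System.out.println(findAll(Pattern.compile("abc\\z"), "abc\n"));
    }
}
```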