regex guide
References taken from regular-expressions.info
Special Characters
\ ^ $ . | ? * + ( ) [ {
Inside a character class such as [0-9], only these characters are special:
\ ^ - ]
Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like a{1,3}.
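A minimal sketch of what "special" means in practice, using Python's re module as one example flavor (other flavors may differ, especially in how they treat a lone brace). The sample strings are made up for illustration.

```python
import re

# Unescaped, + is a quantifier: "1+" means "one or more 1s",
# so this pattern cannot match the text 1+1=2.
unescaped = re.fullmatch(r"1+1=2", "1+1=2")

# Preceded by a backslash, + is a literal plus sign.
escaped = re.fullmatch(r"1\+1=2", "1+1=2")

# Inside a character class, + and . are ordinary characters.
in_class = re.findall(r"[+.]", "1+1=2.0")

# A brace that does not form a repetition operator is taken literally
# by Python's re; a{1,3} is still read as a quantifier.
literal_brace = re.fullmatch(r"a{", "a{")
quantifier = re.fullmatch(r"a{1,3}", "aa")

print(unescaped, escaped, in_class, literal_brace, quantifier)
```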
Some flavors also support the \Q…\E escape sequence. All the characters between the \Q and the \E are interpreted as literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*.
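Python's re module is one flavor that does not support \Q…\E. A rough equivalent, sketched here with the same literal text as above, is re.escape, which backslash-escapes the metacharacters in a string. The surrounding sample sentences are assumptions for illustration.

```python
import re

literal_text = r"*\d+*"             # the text we want to match verbatim
pattern = re.escape(literal_text)   # escapes *, \ and + for us

# The escaped pattern finds the literal text *\d+* ...
found = re.search(pattern, r"see *\d+* here")

# ... and does not treat \d+ as "one or more digits".
not_digits = re.search(pattern, "*123*")

print(found.group(), not_digits)
```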
The backslash in combination with a literal character can create a regex token with a special meaning. E.g. \d is a shorthand that matches a single digit from 0 to 9.
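As a small illustration in Python: \d matches one digit per match, and combining it with + matches runs of digits. The re.ASCII flag is passed because, for Python str patterns, \d otherwise also matches digits from other scripts; that detail is specific to Python and is not taken from the notes above.

```python
import re

text = "room 42, floor 7"

# \d matches exactly one digit each time.
digits = re.findall(r"\d", text, flags=re.ASCII)

# \d+ matches one or more digits in a row.
numbers = re.findall(r"\d+", text, flags=re.ASCII)

print(digits, numbers)
```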
Programming Languages
In your source code, you have to keep in mind which characters get special treatment inside strings by your programming language. That is because those characters are processed by the compiler, before the regex library sees the string.
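The classic case is matching a literal backslash. The sketch below uses Python, where an ordinary string literal consumes one level of backslashes before the regex library sees the pattern and a raw string literal (r"...") does not. The path value is a made-up example.

```python
import re

path = "C:\\temp"   # the 7-character string C:\temp

# The regex engine needs \\ to match one literal backslash.
# In an ordinary string literal each of those is doubled again: four in total.
ordinary = re.search("\\\\temp", path)

# A raw string passes backslashes through unchanged, so two are enough.
raw = re.search(r"\\temp", path)

# Both literals produce the same pattern text.
same_pattern = "\\\\temp" == r"\\temp"

print(ordinary.group(), raw.group(), same_pattern)
```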
Non-Printable Characters
You can use special character sequences to put non-printable characters in your regular expression. Use \t to match a tab character (ASCII 0x09), \r for carriage return (0x0D) and \n for line feed (0x0A).
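A short Python sketch of these escapes in use; the sample text is an assumption. Raw strings are used so that the regex engine itself interprets \t, \r and \n.

```python
import re

text = "name\tvalue\r\nnext line\n"

# \t finds the single tab character in the text.
tabs = re.findall(r"\t", text)

# Splitting on \t separates tab-delimited fields.
fields = re.split(r"\t", "a\tb\tc")

# \r\n or a bare \n covers both Windows and Unix line endings.
lines = re.split(r"\r\n|\n", text.strip())

print(tabs, fields, lines)
```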
Regex Syntax versus String Syntax
Many programming languages support similar escapes for non-printable characters in their syntax for literal strings in source code. The compiler translates such escapes into their actual characters before the string is passed to the regex engine. If the regex engine does not support the same escapes, a regex specified as a literal string in source code can appear to behave differently from the same regex read from a file or received from user input.
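A related pitfall, sketched in Python, arises when the language and the regex engine both know an escape but give it different meanings. In a Python string literal \b is a backspace character, while in Python's regex syntax it is a word boundary. The sample text is made up.

```python
import re

text = "a cat sat"

# Ordinary string: Python turns \b into a backspace (0x08) before the
# regex engine sees the pattern, so the pattern looks for backspaces.
ordinary = re.search("\bcat\b", text)

# Raw string: the engine receives \b and reads it as a word boundary.
raw = re.search(r"\bcat\b", text)

# For \t the result is the same either way: the engine receives either
# a real tab or the escape \t, and both match a tab character.
tab_same = re.search("\t", "a\tb").span() == re.search(r"\t", "a\tb").span()

print(ordinary, raw.group(), tab_same)
```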
Character Classes or Character Sets
A character class matches only a single character.
[ae] matches a or e.
[0-9] matches a single digit between 0 and 9.
[0-9a-fA-F] matches a single hexadecimal digit.
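The three classes above can be tried out in Python as follows; the input strings are illustrative assumptions.

```python
import re

# [ae] lets one pattern match both spellings.
spellings = re.findall(r"gr[ae]y", "gray grey griy")

# A class matches exactly one character, so "ae" as a whole does not match.
single_only = re.fullmatch(r"[ae]", "ae")

# [0-9] matches one digit.
digit = re.findall(r"[0-9]", "x7y")

# [0-9a-fA-F] matches one hexadecimal digit; x and ! are skipped.
hex_digits = re.findall(r"[0-9a-fA-F]", "0xBEef!")

print(spellings, single_only, digit, hex_digits)
```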
Negated Character Classes
Typing a caret after the opening square bracket negates the character class.
[^0-9\r\n] matches any character that is not a digit or a line break.
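A brief Python sketch of the negated class above, with made-up input. It also shows that a negated class still has to match a character: q[^u] means "q followed by a character that is not u", not "q not followed by u".

```python
import re

text = "ab12\r\ncd"

# Everything except digits and line-break characters.
kept = re.findall(r"[^0-9\r\n]", text)

# No character follows the q, so [^u] has nothing to match.
at_end = re.search(r"q[^u]", "Iraq")

# Here the space after q is matched by [^u].
followed = re.search(r"q[^u]", "Iraq is")

print(kept, at_end, followed.group())
```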