regex guide
Revision as of 02:32, 14 May 2017
References taken from regular-expressions.info
Special Characters
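Outside character classes, these characters have a special meaning and must be escaped with a backslash to match them literally:
\ ^ $ . | ? * + ( ) [ {
Inside character classes only, these characters (along with the backslash) have a special meaning:
^ - ]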
Most regular expression flavors treat the brace { as a literal character unless it is part of a repetition operator like a{1,3}, so you generally do not need to escape it with a backslash, though you can.
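A minimal sketch of escaping, assuming Python 3.7+ and its standard re module. The brace behaviour shown is Python's own; other flavors may reject an unescaped { that does not form a quantifier.

```python
import re

# Escaping a metacharacter with a backslash makes it match literally.
assert re.fullmatch(r"1\+1=2", "1+1=2")
# Unescaped, + is a quantifier: "one or more 1s, then 1=2".
assert re.fullmatch(r"1+1=2", "11=2")
assert re.fullmatch(r"1+1=2", "1+1=2") is None

# A brace that forms a repetition operator is special ...
assert re.fullmatch(r"a{1,3}", "aaa")
# ... but in Python's re a brace that cannot start a quantifier is literal.
assert re.fullmatch(r"a{b", "a{b")

# re.escape() backslash-escapes every special character in a string.
pattern = re.escape("$5.00 (approx.)")
assert re.fullmatch(pattern, "$5.00 (approx.)")
```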
Some flavors also support the \Q…\E escape sequence. All the characters between the \Q and the \E are interpreted as literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*.
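Python's re module is one of the flavors without \Q…\E. The sketch below (assuming Python 3.7+, where an unknown backslash-letter escape is an error) shows that, and uses re.escape() as the closest equivalent for quoting literal text.

```python
import re

# Python's re has no \Q...\E; "\Q" is rejected as a bad escape.
try:
    re.compile(r"\Q*\d+*\E")
    supported = True
except re.error:
    supported = False
assert not supported

# re.escape() plays the same role: the result matches the text literally.
pattern = re.escape(r"*\d+*")
assert re.fullmatch(pattern, r"*\d+*")
assert re.fullmatch(pattern, "*42*") is None
```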
The backslash in combination with a literal character can create a regex token with a special meaning. E.g. \d is a shorthand that matches a single digit from 0 to 9.
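A short illustration in Python's re (an assumption; other flavors behave similarly for ASCII input). Note that for str patterns Python's \d matches any Unicode decimal digit, and the re.ASCII flag restricts it to 0 to 9.

```python
import re

# \d: the backslash turns the literal "d" into the digit shorthand.
assert re.findall(r"\d", "a1b22") == ["1", "2", "2"]
# Without the backslash, d is just the letter d.
assert re.findall(r"d", "a1d2") == ["d"]

# Python-specific: \d also matches non-ASCII digits unless re.ASCII is set.
arabic_indic_three = "\u0663"
assert re.fullmatch(r"\d", arabic_indic_three)
assert re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is None
```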
Programming Languages
In your source code, keep in mind which characters your programming language treats specially inside string literals. Those characters are processed by the compiler before the regex library ever sees the string. E.g. in languages with C-style string escapes, you must write "\\d" in source code to pass \d to the regex engine.
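A minimal sketch of the two layers of escaping in Python, which uses C-style string escapes and also offers raw strings (r"...") that skip the string layer.

```python
import re

# In an ordinary literal, "\\" becomes one backslash, so the regex
# engine receives the two characters \d.
assert "\\d" == r"\d" and len("\\d") == 2
assert re.findall("\\d", "a1b2") == ["1", "2"]

# Matching one literal backslash needs the regex \\ , written as
# "\\\\" in an ordinary literal or r"\\" as a raw string.
path = "C:\\temp"  # the characters C:\temp
assert re.search("\\\\", path).start() == 2
assert re.search(r"\\", path).start() == 2
```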
Non-Printable Characters
You can use special character sequences to put non-printable characters in your regular expression. Use \t to match a tab character (ASCII 0x09), \r for carriage return (0x0D) and \n for line feed (0x0A).
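A small example, assuming Python's re: splitting tab-separated records that end in Windows-style line breaks (carriage return plus line feed).

```python
import re

record = "name\tage\r\nalice\t30\r\n"
# \r\n matches a carriage return (0x0D) followed by a line feed (0x0A).
lines = re.split(r"\r\n", record.rstrip("\r\n"))
assert lines == ["name\tage", "alice\t30"]
# \t matches the tab (0x09) that separates the fields.
fields = [re.split(r"\t", line) for line in lines]
assert fields == [["name", "age"], ["alice", "30"]]
assert re.fullmatch(r"\t", chr(0x09)) and re.fullmatch(r"\r", chr(0x0D))
```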
Regex Syntax versus String Syntax
Many programming languages support similar escapes for non-printable characters in their syntax for string literals. The compiler translates these escapes into the actual characters before the string is passed to the regex engine. If the regex engine does not interpret the same escape the same way, a regex written as a string literal in source code can behave differently from the same regex read from a file or received from user input.
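A sketch of such an apparent difference in Python. The "from file" string below simulates text read at run time (it holds a real backslash); \b is chosen because Python strings read it as a backspace while the regex engine reads it as a word boundary.

```python
import re

text = "the cat sat"

# Simulated regex read from a file: the characters \ b c a t \ b.
from_file = "\\bcat\\b"
assert re.search(from_file, text)

# The "same" regex typed as an ordinary literal: Python turns each "\b"
# into a backspace (0x08) before re ever sees it, so nothing matches.
literal = "\bcat\b"
assert literal[0] == "\x08"
assert re.search(literal, text) is None

# Escapes both layers agree on, such as \t, work either way.
assert re.search("\t", "a\tb") and re.search(r"\t", "a\tb")
```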
Character Classes or Character Sets
A character class, written between square brackets, matches exactly one character from the set it lists.
[ae] matches a or e
[0-9] matches a single digit between 0 and 9
[0-9a-fA-F] matches a single hexadecimal digit
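The classes above, exercised with Python's re (an assumption; the syntax is the same in most flavors). The pattern gr[ae]y is an illustrative choice.

```python
import re

assert re.findall(r"gr[ae]y", "gray grey gruy") == ["gray", "grey"]
assert re.findall(r"[0-9]", "a1b22") == ["1", "2", "2"]
assert re.fullmatch(r"[0-9a-fA-F]+", "DeadBeef01")
assert re.fullmatch(r"[0-9a-fA-F]+", "0xFF") is None  # "x" is not hex

# A class matches one character, so [ae] alone cannot match "ae".
assert re.fullmatch(r"[ae]", "ae") is None
```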
Negated Character Classes
Typing a caret after the opening square bracket negates the character class: it then matches any single character that is not in the class.
[^0-9\r\n] matches any character that is not a digit or a line break.
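A sketch in Python's re showing why \r\n appear in that class: a negated class still matches line breaks unless they are listed, and it always consumes one character.

```python
import re

# Without \r\n in the class, the newline counts as "not a digit".
assert re.findall(r"[^0-9]", "1\n2") == ["\n"]
assert re.findall(r"[^0-9\r\n]", "ab1\n2c") == ["a", "b", "c"]

# A negated class must match a character: q[^u] fails at the end of "Iraq".
assert re.search(r"q[^u]", "Iraq") is None
assert re.search(r"q[^u]", "Iraq is").group() == "q "
```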
Metacharacters Inside Character Classes
In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket ], the backslash \, the caret ^, and the hyphen -. The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
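A short check in Python's re (assumed representative of "most flavors" here). The last class escapes ] and \, puts ^ after the first position and - at the end so both are literal.

```python
import re

# + and * are ordinary characters inside a class.
assert re.findall(r"[+*]", "2+3*4") == ["+", "*"]
# Escaping them works too, it is just harder to read.
assert re.findall(r"[\+\*]", "2+3*4") == ["+", "*"]

# The class metacharacters ] \ ^ - matched literally.
assert re.findall(r"[\]\\^-]", r"a]b\c^d-e") == ["]", "\\", "^", "-"]
```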