regex guide

From thelinuxwiki
Revision as of 00:51, 14 May 2017 by Nighthawk (Talk | contribs)

Jump to: navigation, search

References taken from regular-expressions.info

special chacters

\ ^ $ . | ? * + ( ) [ {

Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator like a{1,3}

Some flavors also support the \Q…\E escape sequence. All the characters between the \Q and the \E are interpreted as literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*.

The backslash in combination with a literal character can create a regex token with a special meaning. E.g. \d is a shorthand that matches a single digit from 0 to 9.


Programming Languages

In your source code, you have to keep in mind which characters get special treatment inside strings by your programming language. That is because those characters are processed by the compiler, before the regex library sees the string.


Non-Printable Characters

You can use special character sequences to put non-printable characters in your regular expression. Use \t to match a tab character (ASCII 0x09), \r for carriage return (0x0D) and \n for line feed (0x0A).

Regex Syntax versus String Syntax

Many programming languages support similar escapes for non-printable characters in their syntax for literal strings in source code. Then such escapes are translated by the compiler into their actual characters before the string is passed to the regex engine. If the regex engine does not support the same escapes, this can cause an apparent difference in behavior when a regex is specified as a literal string in source code compared with a regex that is read from a file or received from user input.