Regular expressions

Regular expressions are a powerful way to express search patterns.

Many languages provide a way to use them in your programs.

Internally, a regex engine typically compiles a regular expression into a finite state machine (FSM).

How regular expressions work

The interpreter or compiler parses the regular expression.
It builds a finite state machine that makes the regex efficient to execute.
Applying the regex to an input executes the FSM and produces matches.
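
To make these steps concrete, here is a minimal sketch of the kind of state machine an engine might build for the pattern ab*c, applied to a whole input string. It is written by hand purely for illustration: the function name matches_ab_star_c and the state names are made up, and real engines generate and optimize such machines automatically.

```shell
# Hand-written state machine for the regex ab*c, matched against the
# whole input. Illustrative only; uses plain global shell variables.
matches_ab_star_c() {
  rest=$1
  state=start
  while [ -n "$rest" ]; do
    ch=${rest%"${rest#?}"}   # first character of the remaining input
    rest=${rest#?}           # drop that character
    case "$state:$ch" in
      start:a)  state=seen_a ;;   # the match must begin with 'a'
      seen_a:b) state=seen_a ;;   # b* loops on the same state
      seen_a:c) state=accept ;;   # 'c' completes the match
      *)        return 1 ;;       # no transition available: reject
    esac
  done
  [ "$state" = accept ]           # succeed only if we ended in 'accept'
}

matches_ab_star_c abbbc && echo "abbbc: match"
matches_ab_star_c abcx  || echo "abcx: no match"
```

Each character of the input triggers exactly one transition, which is why executing a machine like this takes time proportional to the length of the input.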

Regex syntax basics

Characters:

. # represents any character
\s # represents a whitespace character
\d # represents a digit
\n # represents a new line
\. # represents a literal . character
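
A small sketch of these character patterns with grep, using made-up sample lines. Note that \d and \s are Perl-style shorthands that POSIX grep does not define, so the example uses the POSIX classes [[:digit:]] and [[:space:]] as their equivalents.

```shell
# Sample input, one entry per line (invented for this example).
sample=$(printf '%s\n' 'a.b' 'axb' 'room 42' 'room')

# . matches any character, so both "a.b" and "axb" are counted.
any_char=$(printf '%s\n' "$sample" | grep -c 'a.b')

# \. matches only a literal dot, so just "a.b" is counted.
literal_dot=$(printf '%s\n' "$sample" | grep -c 'a\.b')

# POSIX spelling of \d: lines that contain a digit.
with_digit=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# POSIX spelling of \s: "room" followed by a whitespace character.
with_space=$(printf '%s\n' "$sample" | grep -c 'room[[:space:]]')

echo "$any_char $literal_dot $with_digit $with_space"   # prints: 2 1 1 1
```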

Quantifiers:

* # represents 0 or more repetitions
.* # 0 or more of any character
a* # 0 or more 'a' characters

+ # represents 1 or more repetitions
.+ # one or more of any character

? # represents 0 or 1 repetitions
b? # 0 or 1 b characters
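
The three repetition operators can be compared side by side. This sketch counts how many of three made-up lines each pattern matches; it uses grep -E (extended syntax), where + and ? have the meanings listed above.

```shell
# Three inputs with zero, one and two 'b' characters between 'a' and 'c'.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc')

zero_or_more=$(printf '%s\n' "$sample" | grep -E -c 'ab*c')   # ac, abc, abbc
one_or_more=$(printf '%s\n' "$sample" | grep -E -c 'ab+c')    # abc, abbc
zero_or_one=$(printf '%s\n' "$sample" | grep -E -c 'ab?c')    # ac, abc

echo "$zero_or_more $one_or_more $zero_or_one"   # prints: 3 2 2
```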

Character classes:

[abcxyz] # any of the given characters
[a-z] # any character in the given range
[^xyz] # any character other than the given ones
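
A brief sketch of the bracket forms above, run against a few made-up words. LC_ALL=C is set for the range example as a precaution, since in some locales a range such as [A-Z] can also match lowercase letters.

```shell
words=$(printf '%s\n' 'cat' 'hat' 'Cat' 'zzz')

# [ch]at: 'c' or 'h' followed by "at"           -> cat, hat
listed=$(printf '%s\n' "$words" | grep -c '[ch]at')

# [A-Z]at: any uppercase letter followed by "at" -> Cat
in_range=$(printf '%s\n' "$words" | LC_ALL=C grep -c '[A-Z]at')

# [^c]at: any character except 'c', then "at"    -> hat, Cat
negated=$(printf '%s\n' "$words" | grep -c '[^c]at')

echo "$listed $in_range $negated"   # prints: 2 1 2
```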

Boundaries:

^ # represents the start of a line
$ # represents the end of a line
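
The anchors do not consume characters; they only constrain where a match may sit on the line. A short sketch with invented log lines:

```shell
log=$(printf '%s\n' 'error: disk full' 'no error' 'error')

# ^error: the line starts with "error"      -> 'error: disk full', 'error'
at_start=$(printf '%s\n' "$log" | grep -c '^error')

# error$: the line ends with "error"        -> 'no error', 'error'
at_end=$(printf '%s\n' "$log" | grep -c 'error$')

# ^error$: the line is exactly "error"      -> 'error'
whole_line=$(printf '%s\n' "$log" | grep -c '^error$')

echo "$at_start $at_end $whole_line"   # prints: 2 2 1
```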

Putting it all together

The following regular expression selects all lines that:

start with either a, b or c
have any number of any characters
do NOT end with a digit


grep -e '^[abc].*[^0-9]$' my-long-file
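
To check the behaviour without a real file, the same selection can be tried on a few made-up lines. Inside a bracket expression POSIX grep treats a backslash literally, so [^\d] would exclude the characters \ and d rather than digits; this sketch therefore spells the class as [^0-9].

```shell
# Invented sample lines standing in for the contents of a file.
lines=$(printf '%s\n' 'apple pie' 'banana 7' 'cherry' 'date' 'c3po' 'a1')

# Keep lines that start with a, b or c and do not end with a digit.
selected=$(printf '%s\n' "$lines" | grep -e '^[abc].*[^0-9]$')

printf '%s\n' "$selected"
# prints:
# apple pie
# cherry
# c3po
```

Because the final [^0-9] has to match a character of its own, a line needs at least two characters to be selected, which is why a1 is rejected as well as banana 7.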

Useful Resources

Regex cheatsheet