Regular Expressions are a clear and concise way to search text.

A regular expression answers one question: does this group of characters match a specific pattern?
The pattern is written between two forward slashes: it starts after the first / and ends before the last /, as in /ar+/.

There are two main parts to a regular expression match:

  1. Subject string
    The text being searched for a match.
  2. Regex / Regular Expression
    A group of characters that concisely represents the rules for searching or matching strings.

The regex engine walks through the pattern and the subject character by character, trying to find matches.
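
The two parts can be seen in a minimal sketch. The notes do not name a programming language, so as an assumption this uses Python's re module, where the slash-delimited pattern /car/ is written as the raw string r"car". The subject text is made up for illustration.

```python
import re

subject = "The car is parked in the garage"  # subject string: the text being searched
pattern = r"car"                             # regex: /car/ in slash notation

# search() scans the subject from left to right and returns the first match, or None.
match = re.search(pattern, subject)

if match:
    # group() is the matched text, start() is its index in the subject.
    print(match.group(), match.start())  # prints: car 4
```

Only part of the subject has to match here, which is what the notes call a partial match.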

Common uses are for validation (numbers, email, passwords, domain names) or searching (words in a sentence, unwanted characters, extracting sections, formatting).


| is the or operator (alternation): /407|321/ matches either 407 or 321. Alternatives are tried from left to right, so the leftmost alternative that matches is taken first.
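
A short sketch of the or operator, assuming Python's re module (the notes do not name a language). The subject strings are hypothetical.

```python
import re

# /407|321/ written as a Python raw string.
area_code = re.compile(r"407|321")

first = area_code.search("call 321-555-0100")  # matches the 321 alternative
both = area_code.search("407 or 321")          # search stops at the first match in the subject
missing = area_code.search("555-0100")         # neither alternative appears

# Alternatives are tried left to right at each position,
# so "a" is taken even though "ab" would also match.
ordered = re.search(r"a|ab", "ab")

print(first.group(), both.group(), missing, ordered.group())  # prints: 321 407 None a
```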

+ is the plus operator (a quantifier), which repeats the pattern immediately before it one or more times, until it no longer matches.
/ar+/ matches an a followed by as many r characters as are present (at least one): ar, arr, arrr.
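
The behaviour of /ar+/ can be checked with a small sketch, assuming Python's re module. The subject strings are made up for illustration.

```python
import re

# /ar+/ : an "a" followed by one or more "r".
pattern = re.compile(r"ar+")

one_r = pattern.search("car").group()      # "ar"
many_r = pattern.search("arrrgh").group()  # "arrr": + keeps consuming r while it can
no_r = pattern.search("a")                 # None: at least one r is required

print(one_r, many_r, no_r)  # prints: ar arrr None
```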

A partial match is one where only part of the subject matches the pattern.

A character set is written in square brackets, such as [a-z], where a-z is a range of characters:

  • a range only works within a character set
  • a character set represents only 1 character in the subject (e.g. if written first in the pattern, the first character matched).
    To match more than one character, follow the set with the plus operator: /[a-z]+/
  • a range is case sensitive
    To allow the uppercase alphabetic range as well, add capitals to the character set, or add a modifier outside the pattern:
    /[a-zA-Z]+/ or /[a-z]+/i

The i modifier means case insensitive: the pattern will match both upper and lower case characters (the available modifiers can be language specific).
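
A sketch of character sets and the i modifier, assuming Python's re module, where the trailing /i is expressed as the re.IGNORECASE flag rather than a suffix on the pattern.

```python
import re

single = re.search(r"[a-z]", "Hello").group()        # "e": one lowercase character, "H" is skipped
run = re.search(r"[a-z]+", "Hello").group()          # "ello": + repeats the set
both_cases = re.search(r"[a-zA-Z]+", "Hello World").group()           # "Hello"
ignore_case = re.search(r"[a-z]+", "Hello", re.IGNORECASE).group()    # "Hello", like /[a-z]+/i

print(single, run, both_cases, ignore_case)  # prints: e ello Hello Hello
```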

To match a space, use either a literal space or \s (the whitespace metacharacter).
\s will match:

  • spaces, tabs, new lines and other whitespace characters
    \s can be used within a character set to match whitespace alongside other characters: /[a-z\s]+/
    The order of characters within a character set is irrelevant: [a-z\s] and [\sa-z] match the same characters.

\w is the word metacharacter. It matches any word character: [a-zA-Z0-9_] (letters, digits and the underscore).
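
The whitespace and word metacharacters can be sketched as follows, assuming Python's re module. The subject strings are made up; note that in Python \w and \s also match non-ASCII letters and whitespace by default.

```python
import re

# \s matches the space, the tab and the new line.
whitespace = re.findall(r"\s", "a b\tc\nd")

# \s inside a character set; the order inside the brackets does not matter.
set_a = re.search(r"[a-z\s]+", "hello world 42").group()
set_b = re.search(r"[\sa-z]+", "hello world 42").group()

# \w matches letters, digits and the underscore, but not "!" or spaces.
words = re.findall(r"\w+", "snake_case and 42!")

print(whitespace)        # prints: [' ', '\t', '\n']
print(repr(set_a))       # prints: 'hello world '
print(words)             # prints: ['snake_case', 'and', '42']
```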


Special characters

. is the wildcard metacharacter, which matches any character except a new line. To match a literal . the dot must be escaped: \.
+ matches the preceding pattern 1 or more times. To match a literal +: \+
? makes the preceding pattern optional. To match a literal ? it must be escaped: \?
() parentheses create groups: (com|net|edu).
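
A sketch of the difference between special characters and their escaped literals, assuming Python's re module. The subject strings, including the domain names, are hypothetical.

```python
import re

# . matches any character except a new line, so "a\nc" is not found.
wildcard = re.findall(r"a.c", "abc a-c a\nc")

# An unescaped dot is a wildcard; an escaped dot only matches a literal ".".
loose = re.search(r"example.com", "exampleXcom") is not None
strict = re.search(r"example\.com", "exampleXcom") is not None

# Escaped + and ? match the literal characters.
plus = re.search(r"1\+1", "1+1=2").group()
question = re.search(r"why\?", "why? because").group()

# A group of alternatives after a literal dot.
tld = re.search(r"\.(com|net|edu)", "example.net")

print(wildcard, loose, strict, plus, question, tld.group(), tld.group(1))
# prints: ['abc', 'a-c'] True False 1+1 why? .net net
```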


Anchors ensure there is nothing before or after the desired pattern:
^: the match must start at the beginning of the subject
$: the match must end at the end of the subject
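
The effect of the anchors can be sketched as below, assuming Python's re module. The words used as subjects are made up for illustration.

```python
import re

exact = re.search(r"^cat$", "cat") is not None         # True: nothing before or after
embedded = re.search(r"^cat$", "concat") is not None   # False: "con" comes before "cat"
starts = re.search(r"^cat", "catalog") is not None     # True: only the start is anchored
ends = re.search(r"cat$", "concat") is not None        # True: only the end is anchored

print(exact, embedded, starts, ends)  # prints: True False True True
```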



\b is the word boundary metacharacter. Like ^ and $ it matches a position rather than a character, but it anchors to the edge of a word, so /\bcat\b/ matches whole words only.
? matches the preceding character 0 or 1 times. It can also follow a group: /pirate\s(ship)?/
\d: any digit (0-9)
When placed first within a character set, ^ means not: [^\d] (any character that is not a digit).
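
A sketch of word boundaries, the optional operator, digits and a negated set, assuming Python's re module. The subject strings are made up for illustration.

```python
import re

# \b keeps "cat" from matching inside "concat" or "catalog".
whole_words = re.findall(r"\bcat\b", "cat concat catalog cat.")

# ? after a single character makes that character optional.
singular = re.search(r"pirates?", "pirate").group()
plural = re.search(r"pirates?", "pirates ahoy").group()

# ? after a group makes the whole group optional.
with_ship = re.search(r"pirate\s(ship)?", "pirate ship").group()
without_ship = re.search(r"pirate\s(ship)?", "pirate hat").group()

digits = re.findall(r"\d", "room 42b")          # each digit separately
non_digits = re.findall(r"[^\d]+", "room 42b")  # runs of characters that are not digits

print(whole_words, singular, plural)   # prints: ['cat', 'cat'] pirate pirates
print(repr(with_ship), repr(without_ship))  # prints: 'pirate ship' 'pirate '
print(digits, non_digits)              # prints: ['4', '2'] ['room ', 'b']
```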

Capitalised metacharacters match the opposite of their lowercase versions:
\D: match every character except digits (same as [^\d])
\S: match every character except whitespace (same as [^\s])
\W: match every character except word characters (same as [^\w])
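
The capitalised metacharacters can be compared side by side in a short sketch, assuming Python's re module. The subject string is hypothetical.

```python
import re

text = "Room 42, floor 3!"

non_digits = re.findall(r"\D+", text)      # runs of anything that is not a digit
non_space = re.findall(r"\S+", text)       # runs of anything that is not whitespace
non_word = re.findall(r"\W+", text)        # runs of anything that is not a word character

# \D gives the same result as the negated set [^\d].
same_as_negated_set = non_digits == re.findall(r"[^\d]+", text)

print(non_digits)  # prints: ['Room ', ', floor ', '!']
print(non_space)   # prints: ['Room', '42,', 'floor', '3!']
print(non_word)    # prints: [' ', ', ', ' ', '!']
print(same_as_negated_set)  # prints: True
```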

Interval expressions match a specific number of times:
/[a-z]{2}/: match the lowercase character set exactly twice
/[a-z]{1,3}/: match a range of repetitions: a lowercase character a minimum of 1 and a maximum of 3 times
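
A sketch of interval expressions, assuming Python's re module. re.fullmatch is used here as an assumption to require that the whole subject fits the pattern; the subject strings are made up.

```python
import re

exactly_two = re.fullmatch(r"[a-z]{2}", "ab") is not None    # True: exactly two characters
too_long = re.fullmatch(r"[a-z]{2}", "abc") is not None      # False: three characters

pairs = re.findall(r"[a-z]{2}", "abcde")        # complete pairs only, the trailing "e" is dropped
chunks = re.findall(r"[a-z]{1,3}", "abcdefg")   # takes up to 3 at a time, at least 1

print(exactly_two, too_long)  # prints: True False
print(pairs)                  # prints: ['ab', 'cd']
print(chunks)                 # prints: ['abc', 'def', 'g']
```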

Multi-line Strings

\n is the new line character.
The g (global) modifier matches every instance of the pattern, not just the first instance.
The m (multi-line) modifier allows the anchors in /^pattern$/ to anchor at the start and end of every line, not just of the entire string.
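
A sketch of the global and multi-line behaviour, assuming Python's re module. Python has no g modifier, so re.findall stands in for it here, and the m modifier is the re.MULTILINE flag. The subject string is made up.

```python
import re

text = "cat\ndog\ncat"  # three lines separated by \n

first_only = re.search(r"cat", text).start()   # 0: search returns just the first instance
every_match = re.findall(r"cat", text)         # all instances, like the g modifier

# Without the multi-line flag, ^ and $ anchor to the whole string only.
whole_string = re.findall(r"^dog$", text)

# With the multi-line flag, ^ and $ anchor at every line.
per_line = re.findall(r"^dog$", text, re.MULTILINE)
per_line_cats = re.findall(r"^cat$", text, re.MULTILINE)

print(first_only, every_match)               # prints: 0 ['cat', 'cat']
print(whole_string, per_line, per_line_cats) # prints: [] ['dog'] ['cat', 'cat']
```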

Capture Groups

Groups can be used to return (capture) parts of a match.
For example, /(learn((by)doing))/ will return the match groups learnbydoing, bydoing, and by, numbered by the order of their opening parentheses.

?: at the beginning of a group creates a non-capturing group: (?:street|lane).
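
Capturing and non-capturing groups can be compared in a short sketch, assuming Python's re module. The address string and the (\d+) group are hypothetical additions for illustration.

```python
import re

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.search(r"(learn((by)doing))", "learnbydoing")
captured = nested.groups()

# A non-capturing group still matches, but returns no group value.
plain = re.search(r"(?:street|lane)", "maple lane")

# Mixing both: only the house number is captured.
address = re.search(r"(\d+) maple (?:street|lane)", "12 maple lane")

print(captured)                        # prints: ('learnbydoing', 'bydoing', 'by')
print(plain.group(), plain.groups())   # prints: lane ()
print(address.groups())                # prints: ('12',)
```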