Regular Expressions Guide for Developers

Regular expressions (regex) are one of the most powerful and versatile tools in a developer's toolkit. They let you search, match, validate, and transform text using compact pattern definitions. Every major programming language supports them, and mastering even the basics will make you significantly more productive.

This guide covers regex fundamentals from the ground up: syntax building blocks, the most common real-world patterns, flags that change matching behavior, and performance tips to avoid costly mistakes. To test patterns as you learn, use our Regex Tester — it provides live matching, group highlighting, and explanation of your pattern.

What Is Regex and When Should You Use It?

A regular expression is a sequence of characters that defines a search pattern. When you apply that pattern to a string, the regex engine scans the text and reports matches. Regex is the right tool when you need to:

  • Validate input — check if an email, URL, phone number, or date matches an expected format
  • Search and extract — pull specific data from logs, HTML, CSV, or unstructured text
  • Find and replace — transform text patterns across files (renaming variables, reformatting dates)
  • Parse structured text — extract fields from log lines, URLs, or configuration values

Regex is not the right tool for parsing nested structures like HTML or JSON — use a proper parser for those. The classic advice: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." Use regex for flat pattern matching, not complex parsing.

Basic Syntax: Character Classes, Quantifiers, and Anchors

Character Classes

Character classes match a single character from a defined set:

Pattern Matches Example
. Any character except newline c.t matches "cat", "cot", "c9t"
\d Any digit (0-9) \d{3} matches "123", "456"
\w Word character (a-z, A-Z, 0-9, _) \w+ matches "hello", "user_1"
\s Whitespace (space, tab, newline) \s+ matches one or more spaces
[abc] Any character in the set [aeiou] matches any vowel
[^abc] Any character NOT in the set [^0-9] matches non-digits

Uppercase versions negate: \D matches non-digits, \W matches non-word characters, \S matches non-whitespace.

Quantifiers

Quantifiers specify how many times the preceding element should match:

Quantifier Meaning Example
* 0 or more ab*c matches "ac", "abc", "abbc"
+ 1 or more ab+c matches "abc", "abbc" but not "ac"
? 0 or 1 (optional) colou?r matches "color" and "colour"
{n} Exactly n times \d{4} matches exactly 4 digits
{n,m} Between n and m times \d{2,4} matches 2-4 digits
{n,} n or more times \w{8,} matches 8+ word chars

Anchors

Anchors match a position, not a character:

  • ^ — start of string (or line with m flag)
  • $ — end of string (or line with m flag)
  • \b — word boundary (between a \w and \W character)

For example, ^\d{5}$ matches a string that is exactly a 5-digit ZIP code — nothing more, nothing less. Without the anchors, \d{5} would also match inside "The ZIP is 90210 for that area."

Common Regex Patterns You Will Actually Use

Here are battle-tested patterns for the most frequently needed validations:

Email Address

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Matches: user@example.com, first.last@company.co.uk. Note: for production email validation, also send a confirmation email — regex alone cannot verify deliverability.

URL

^https?:\/\/[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}(\/[^\s]*)?$

Matches: https://example.com, http://docs.example.com/api/v2. For more permissive URL matching, consider the URL constructor in JavaScript which handles edge cases better.

Phone Number (US)

^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Matches: (555) 123-4567, 555.123.4567, +1-555-123-4567, 5551234567.

IPv4 Address

^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

Matches valid IPv4 addresses where each octet is 0-255. This is more accurate than the simpler \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} which would also match invalid addresses like 999.999.999.999.

Date (YYYY-MM-DD)

^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Matches ISO 8601 dates like 2026-03-29. Note: this validates format but not calendar validity (it will accept 2026-02-31). Use a date library for full validation.

Test any of these patterns with real data in our Regex Tester, or browse our Regex Library for more pre-built patterns.

Regex Flags: g, i, m, s

Flags modify how the regex engine interprets your pattern. In JavaScript, flags are placed after the closing slash: /pattern/flags.

  • g (global) — find all matches in the string, not just the first. Without g, the regex stops after the first match. Essential when using String.matchAll() or String.replace() to replace all occurrences.
  • i (case-insensitive)/hello/i matches "Hello", "HELLO", and "hElLo". Use when case does not matter, such as matching HTML tags or user input.
  • m (multiline) — makes ^ and $ match the start and end of each line, not the entire string. Critical when processing multi-line text like log files or configuration.
  • s (dotAll) — makes . match newline characters (\n) as well. By default, . matches any character except newlines. Use s when your pattern needs to span multiple lines.

Flags can be combined: /pattern/gim applies global, case-insensitive, and multiline matching simultaneously.

Performance Tips: Avoiding Regex Pitfalls

Regex engines use backtracking, which means poorly written patterns can be exponentially slow on certain inputs. This is called catastrophic backtracking (or ReDoS — Regular Expression Denial of Service). Here is how to avoid it:

1. Avoid Nested Quantifiers

Patterns like (a+)+ or (a*)* create exponential backtracking. The engine tries every possible way to split the input between the inner and outer quantifiers. On the input "aaaaaaaaaaaaaab", this can take millions of steps.

Fix: Flatten to a+ — there is no reason for the nested group.

2. Use Non-Capturing Groups

If you do not need to extract the matched text, use (?:...) instead of (...). Capturing groups store the matched text in memory, which is unnecessary overhead when you only need grouping for alternation or quantification.

// Slower (capturing)
/(https?):\/\/(www\.)?(.+)/

// Faster (non-capturing where not needed)
/(?:https?):\/\/(?:www\.)?(.+)/

3. Be Specific, Not Greedy

Instead of .* (match everything), use a more specific character class. For example, to match a quoted string, "[^"]*" is much faster than ".*?" because it cannot backtrack past the closing quote.

4. Anchor When Possible

Adding ^ and $ anchors tells the engine exactly where to start and stop, eliminating unnecessary scanning. If you know the pattern must match the entire string, always anchor it.

5. Set Timeouts in Production

When running user-supplied regex (search forms, filters), always set a timeout. In JavaScript, use a Web Worker with a timeout. In Python, use the regex library's timeout parameter. Never let untrusted regex run unbounded.

Test Your Regex Patterns

Our Regex Tester lets you write patterns, see matches highlighted in real time, inspect capturing groups, and get plain-English explanations of what your pattern does. It supports all JavaScript regex flags and runs entirely in your browser — your patterns and test data never leave your device.

Frequently Asked Questions

A regular expression (regex) is a sequence of characters that defines a search pattern. It is used to match, find, and manipulate text in strings. Regex is supported in virtually every programming language (JavaScript, Python, Java, C#, Go, PHP, Ruby) and many command-line tools (grep, sed, awk). Common uses include input validation, search and replace, and log parsing.
Greedy quantifiers (*, +, ?) match as much text as possible, then backtrack if the rest of the pattern fails. Lazy quantifiers (*?, +?, ??) match as little text as possible. For example, given the text "start middle end", the greedy pattern start.*end matches the entire string, while start.*?end also matches the entire string but tries the shortest match first. The distinction matters most with alternating delimiters.
A practical email regex is: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ — this matches most real-world email addresses. The full RFC 5322 email specification is extremely complex, so for production use, combine a basic regex check with a confirmation email to verify the address is real and reachable.
The g (global) flag finds all matches instead of stopping at the first. The i (case-insensitive) flag matches regardless of uppercase or lowercase. The m (multiline) flag makes ^ and $ match the start and end of each line. The s (dotAll) flag makes the dot (.) also match newline characters. In JavaScript, flags appear after the closing slash: /pattern/gi.
Catastrophic backtracking occurs when a regex engine tries exponentially many paths on certain inputs. To prevent it: avoid nested quantifiers like (a+)+, use non-capturing groups (?:...) when you don't need the match, prefer specific character classes like [^"]* over .*, anchor patterns with ^ and $, and always set a timeout when running user-supplied regex patterns.