Regex
Regular Expressions (Regex) are powerful tools for pattern matching and text processing in programming.
1. Basics of Regex Syntax
Common Regex Symbols
Symbol | Description | Example | Matches |
---|---|---|---|
`.` | Any single character except newline | `a.b` | `acb`, `a1b`, `a_b` |
`^` | Start of string | `^hello` | “hello world”, not “world hello” |
`$` | End of string | `world$` | “hello world”, not “world hello” |
`*` | 0 or more repetitions | `ab*` | `a`, `ab`, `abb`, `abbb` |
`+` | 1 or more repetitions | `ab+` | `ab`, `abb`, `abbb`, but not `a` |
`?` | 0 or 1 occurrence | `colou?r` | `color` and `colour` |
`{n}` | Exactly n repetitions | `a{3}` | `aaa` |
`{n,}` | At least n repetitions | `a{2,}` | `aa`, `aaa`, `aaaa` |
`{n,m}` | Between n and m repetitions (inclusive) | `a{2,4}` | `aa`, `aaa`, `aaaa` |
`[...]` | Character set | `[abc]` | `a`, `b`, or `c` |
`[^...]` | Negated character set | `[^abc]` | Any character except `a`, `b`, `c` |
`\d` | Digit (0-9; in Python 3 `str` patterns, any Unicode decimal digit) | `\d+` | `123`, `456` |
`\D` | Non-digit | `\D+` | `abc`, `hello` |
`\w` | Word character (a-z, A-Z, 0-9, _; in Python 3 `str` patterns, Unicode letters and digits too) | `\w+` | `hello123` |
`\W` | Non-word character | `\W+` | `@#$%^` |
`\s` | Whitespace | `\s+` | Spaces, tabs, newlines |
`\S` | Non-whitespace | `\S+` | `hello`, `world` |
`\b` | Word boundary | `\bhello\b` | “hello” in “hello world”, but not in “helloworld” |
`\B` | Non-word boundary | `\Bhello\B` | “hello” inside a word, e.g. “xhelloy” |
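The sketch below checks several rows of the table with Python's `re` module. `re.fullmatch` succeeds only when the pattern covers the entire string, while `re.search` accepts a match anywhere inside it. The sample strings are illustrative.

```python
import re

# Each entry: (function, pattern, text). fullmatch must cover the whole text;
# search may match anywhere inside it.
checks = [
    (re.fullmatch, r"a.b", "a_b"),            # "." matches "_"
    (re.fullmatch, r"a.b", "a\nb"),           # "." skips "\n" unless re.DOTALL is set
    (re.fullmatch, r"ab*", "a"),              # "*" allows zero b's
    (re.fullmatch, r"ab+", "a"),              # "+" needs at least one b
    (re.fullmatch, r"colou?r", "color"),      # the "u" is optional
    (re.fullmatch, r"a{2,4}", "aaaaa"),       # five a's exceed the upper bound
    (re.search, r"\bhello\b", "helloworld"),  # no boundary between "o" and "w"
    (re.search, r"\Bhello\B", "xhelloy"),     # "hello" sits inside a word
]

results = [bool(func(pattern, text)) for func, pattern, text in checks]
print(results)  # [True, False, True, False, True, False, False, True]
```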
2. Groups & Alternation
Using Parentheses for Grouping
- `()` groups patterns together.
- Example: `(hello|hi)` matches "hello" or "hi".
```python
import re

pattern = re.compile(r"(hello|hi) world")
print(bool(pattern.search("hi world")))  # True
```
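Grouping matters because `|` has the lowest precedence in a pattern. Without parentheses, the alternation splits the entire pattern in two. This sketch compares both forms on the same inputs:

```python
import re

# Without parentheses: "hello" OR "hi world".
ungrouped = re.compile(r"hello|hi world")
# With parentheses: the alternation stays inside the group, so "hello world" OR "hi world".
grouped = re.compile(r"(hello|hi) world")

print(bool(ungrouped.fullmatch("hello")))        # True
print(bool(grouped.fullmatch("hello")))          # False
print(bool(grouped.fullmatch("hello world")))    # True
```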
Capturing Groups
- Captures parts of a match for later use.
- Example: extract the year, month, and day from a date (`2024-02-17`).
```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Date: 2024-02-17")
if match:
    print(match.group(1))  # "2024"
    print(match.group(2))  # "02"
    print(match.group(3))  # "17"
```
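Besides numbered groups, the same `Match` object offers a few other views of the result. Group 0 is always the entire match, `groups()` returns every captured group as a tuple, and `span()` gives the start and end indices of the match in the input:

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Date: 2024-02-17")
if match:
    print(match.group(0))  # 2024-02-17  (the whole match)
    print(match.groups())  # ('2024', '02', '17')
    print(match.span())    # (6, 16)
```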
Non-Capturing Groups (?:...)
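- Groups without capturing: the group still matches, but it gets no group number and is not returned by `group()` or `groups()`.
```python
import re
re.search(r"(?:hello|hi) world", "hi world")  # Matches, but nothing is captured
```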
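To see the difference, the sketch below runs the same alternation with a capturing group and with a non-capturing group and compares `groups()`. Both patterns match the same text. Only the capturing version stores the alternative that matched.

```python
import re

capturing = re.search(r"(hello|hi) world", "hi world")
non_capturing = re.search(r"(?:hello|hi) world", "hi world")

print(capturing.groups())      # ('hi',)
print(non_capturing.groups())  # ()
print(non_capturing.group(0))  # hi world
```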
Named Groups (?P<name>...)
- Assign names to capture groups.
```python
import re
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2024-02-17")
print(match.group("year"))  # 2024
```
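Named groups also make captured values easy to reuse. `groupdict()` returns every named group at once, and a replacement string for `re.sub` can refer to a group by name with `\g<name>`. The date-reordering task and the `DATE` name are illustrative choices:

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("2024-02-17")
print(match.groupdict())  # {'year': '2024', 'month': '02', 'day': '17'}

# Reuse the named captures in the replacement to reorder the date.
reordered = DATE.sub(r"\g<day>/\g<month>/\g<year>", "Due: 2024-02-17")
print(reordered)  # Due: 17/02/2024
```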
3. Lookaheads & Lookbehinds
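Lookaheads and lookbehinds are zero-width assertions. They check whether a pattern matches at the current position without consuming characters, so the text they check is not part of the overall match.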
Positive Lookahead (?=...)
- Ensures something follows a pattern.
- Example: match `hello` only if it is followed by a space and `world`.
```python
import re
re.search(r"hello(?= world)", "hello world")  # Matches
```
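Because the lookahead consumes nothing, the match contains only `hello`. The text ` world` is checked but not returned. This sketch prints the matched text and its end index to confirm that:

```python
import re

match = re.search(r"hello(?= world)", "hello world")
print(match.group())  # hello
print(match.end())    # 5: the match stops before " world"

# If the required text does not follow, there is no match at all.
print(bool(re.search(r"hello(?= world)", "hello there")))  # False
```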
Negative Lookahead (?!...)
- Ensures something does not follow.
- Example: match `hello`, but not when it is followed by a space and `world`.
```python
import re
re.search(r"hello(?! world)", "hello everyone")  # Matches
```
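A negative lookahead filters matches by what comes next. In this sketch (the sample sentence is illustrative), `findall` skips the first `hello` because ` world` follows it, and keeps the second:

```python
import re

pattern = re.compile(r"hello(?! world)")
print(bool(pattern.search("hello everyone")))       # True
print(bool(pattern.search("hello world")))          # False
print(pattern.findall("hello world, hello there"))  # ['hello']
```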
Positive Lookbehind (?<=...)
- Ensures something precedes a pattern.
- Example: match `world` only if it is preceded by `hello` and a space.
```python
import re
re.search(r"(?<=hello )world", "hello world")  # Matches
```
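A common use of a positive lookbehind is extracting a value that comes after a marker, without including the marker itself. The sketch below pulls out numbers that directly follow `$`. The input text is made up for illustration.

```python
import re

text = "Total: $42, tax: $7, items: 3"
# Only digits directly preceded by "$" are returned; the "$" is not part of the match.
amounts = re.findall(r"(?<=\$)\d+", text)
print(amounts)  # ['42', '7']
```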
Negative Lookbehind (?<!...)
- Ensures something does not precede.
- Example: match `world` only if it is not preceded by `hello` and a space.
```python
import re
re.search(r"(?<!hello )world", "hi world")  # Matches
```
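One limitation of Python's built-in `re` module is that each lookbehind must have a fixed width. A lookbehind whose alternatives have different lengths is rejected when the pattern is compiled. This sketch shows the error and one common workaround: a separate fixed-width lookbehind for each alternative, wrapped in a non-capturing group.

```python
import re

# "hello " (6 chars) and "hi " (3 chars) differ in width, so compilation fails.
try:
    re.compile(r"(?<=hello |hi )world")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False

# Workaround: one fixed-width lookbehind per alternative.
pattern = re.compile(r"(?:(?<=hello )|(?<=hi ))world")
print(pattern.findall("hello world, hi world, hey world"))  # ['world', 'world']
```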
4. Regex in Python (`re` Module)
Finding All Matches (`findall`)
```python
import re

text = "My number is 123-456-7890 and yours is 987-654-3210."
pattern = r"\d{3}-\d{3}-\d{4}"
matches = re.findall(pattern, text)
print(matches)  # ['123-456-7890', '987-654-3210']
```
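The return value of `findall` changes when the pattern contains capturing groups. It then returns one tuple of groups per match instead of the full matched text. When you also need positions, `re.finditer` yields `Match` objects. A sketch using the same text:

```python
import re

text = "My number is 123-456-7890 and yours is 987-654-3210."

# With capturing groups, findall returns the groups, not the whole match.
parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)
print(parts)  # [('123', '456', '7890'), ('987', '654', '3210')]

# finditer yields Match objects, which carry the matched text and its position.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d{3}-\d{3}-\d{4}", text)]
print(positions)  # [('123-456-7890', 13), ('987-654-3210', 39)]
```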
Replacing Text (`sub`)
```python
import re

text = "I love Java, Java is great!"
print(re.sub(r"Java", "Python", text))  # "I love Python, Python is great!"
```
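`re.sub` also accepts a `count` argument that limits the number of replacements. The replacement can be a function that receives each `Match` object and returns the new text. The second input string is illustrative.

```python
import re

text = "I love Java, Java is great!"
# Replace only the first occurrence.
first_only = re.sub(r"Java", "Python", text, count=1)
print(first_only)  # I love Python, Java is great!

# A function replacement: double every number found.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 5 pears")
print(doubled)  # 6 apples and 10 pears
```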
Splitting Strings (`split`)
```python
import re

text = "apple,banana;orange|grape"
print(re.split(r"[,;|]", text))  # ['apple', 'banana', 'orange', 'grape']
```
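Two variations of `re.split` are often useful. A capturing group in the pattern keeps the separators in the result. Adding `\s*` on either side of the separator removes stray whitespace around it. The second input string is illustrative.

```python
import re

text = "apple,banana;orange|grape"
# A capturing group keeps each separator in the output list.
with_separators = re.split(r"([,;|])", text)
print(with_separators)  # ['apple', ',', 'banana', ';', 'orange', '|', 'grape']

# Absorb optional whitespace around each separator.
trimmed = re.split(r"\s*[,;|]\s*", "apple , banana;  orange")
print(trimmed)  # ['apple', 'banana', 'orange']
```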
Case-Insensitive Search
```python
import re
re.search(r"hello", "HELLO", re.IGNORECASE)  # Matches
```
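Flags can be combined with `|`. For example, `re.MULTILINE` makes `^` match at the start of every line rather than only at the start of the string. The same flag can also be written inline at the beginning of the pattern, as `(?i)` does for case-insensitivity. The sample text is illustrative.

```python
import re

text = "Hello there\nhello again"
line_starts = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)
print(line_starts)  # ['Hello', 'hello']

# Inline flag equivalent to re.IGNORECASE.
print(bool(re.search(r"(?i)hello", "HELLO")))  # True
```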
5. Practical Use Cases
1️⃣ Validate Email Addresses
```python
import re

email_pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
print(bool(re.match(email_pattern, "user@example.com")))  # True
```
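This pattern is a simplified check, not a full implementation of the address grammar in RFC 5322. It also has a subtle gap: in Python, `$` matches just before a trailing newline as well as at the very end, so `re.match` accepts an address that ends in `\n`. One way to close that gap is `re.fullmatch`, which must consume the whole string. The sketch below reuses the same character classes without the anchors.

```python
import re

email_pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
# "$" also matches before a trailing newline, so this input is accepted.
print(bool(re.match(email_pattern, "user@example.com\n")))  # True

# Same character classes; fullmatch must consume the entire string, "\n" included.
strict = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")
print(bool(strict.fullmatch("user@example.com")))    # True
print(bool(strict.fullmatch("user@example.com\n")))  # False
```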
2️⃣ Validate Phone Numbers
```python
import re

phone_pattern = r"^\+?[1-9]\d{1,14}$"
print(bool(re.match(phone_pattern, "+1234567890")))  # True
```
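The pattern resembles the E.164 format: an optional `+`, then up to 15 digits with no leading zero. It rejects spaces and other separators, so formatted numbers need cleaning first. The sample numbers and the pre-processing step below are hypothetical illustrations, not part of the pattern itself.

```python
import re

phone_pattern = r"^\+?[1-9]\d{1,14}$"
candidates = ["+1234567890", "0123456789", "+1 234 567 890", "1234567890123456"]
results = {number: bool(re.match(phone_pattern, number)) for number in candidates}
print(results)
# {'+1234567890': True, '0123456789': False, '+1 234 567 890': False, '1234567890123456': False}

# Hypothetical pre-processing: strip spaces, parentheses, dots, and hyphens.
normalized = re.sub(r"[\s().-]", "", "+1 (234) 567-890")
print(normalized, bool(re.match(phone_pattern, normalized)))  # +1234567890 True
```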
3️⃣ Validate Password Strength
```python
import re

password_pattern = r"^(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
print(bool(re.match(password_pattern, "StrongP@ss1")))  # True
```
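The combined pattern only answers yes or no. To tell a user which rule failed, one option is to check each condition separately. The sketch below splits the pattern into its parts; `CHECKS` and `missing_requirements` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical helper: one check per requirement of password_pattern.
CHECKS = [
    ("at least 8 characters", lambda pw: len(pw) >= 8),
    ("an uppercase letter", lambda pw: re.search(r"[A-Z]", pw) is not None),
    ("a digit", lambda pw: re.search(r"\d", pw) is not None),
    ("a special character from @$!%*?&", lambda pw: re.search(r"[@$!%*?&]", pw) is not None),
    ("only letters, digits, and @$!%*?&", lambda pw: re.fullmatch(r"[A-Za-z\d@$!%*?&]*", pw) is not None),
]

def missing_requirements(password):
    """Return the description of every check the password fails."""
    return [name for name, check in CHECKS if not check(password)]

print(missing_requirements("StrongP@ss1"))  # []
print(missing_requirements("weakpass"))
# ['an uppercase letter', 'a digit', 'a special character from @$!%*?&']
```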
6. Regex Performance Optimization
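- Use raw strings (`r"pattern"`) to avoid escaping issues. In a raw string, `r"\b"` reaches the regex engine as a word boundary; in a plain string, `"\b"` is a backspace character.
- Compile patterns you reuse. The `re` module already caches recently used patterns, so compiling mainly saves the cache lookup and keeps the pattern defined in one place:
```python
import re

pattern = re.compile(r"\d{3}-\d{3}-\d{4}")
print(pattern.findall("Call 123-456-7890 or 987-654-3210"))  # ['123-456-7890', '987-654-3210']
```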
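A typical place for a compiled pattern is a loop over many lines, where it is built once and then reused. The log format, the `LOG_LEVEL` name, and the sample lines below are hypothetical and only illustrate that reuse.

```python
import re

# Compiled once, outside the loop; flags are fixed at compile time.
LOG_LEVEL = re.compile(r"^(?P<level>ERROR|WARN)\b: (?P<msg>.*)$", re.IGNORECASE)

lines = ["INFO: started", "error: disk full", "WARN: low memory"]
found = []
for line in lines:
    m = LOG_LEVEL.match(line)
    if m:
        found.append((m.group("level"), m.group("msg")))

print(found)  # [('error', 'disk full'), ('WARN', 'low memory')]
```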
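- Avoid backtracking traps by limiting `.*` usage. A greedy `.*` first runs to the end of the input and then backtracks; a lazy `.*?` stops at the first position where the rest of the pattern matches:
```python
import re

# Bad (greedy match)
print(re.search(r"<.*>", "<div>content</div>").group())   # <div>content</div>
# Good (lazy match)
print(re.search(r"<.*?>", "<div>content</div>").group())  # <div>
```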
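A negated character class such as `[^>]*` is another option. It cannot run past `>`, so the engine never has to backtrack over the closing bracket. Nested quantifiers like `(a+)+` are the classic source of runaway backtracking and are worth avoiding for the same reason. This sketch compares all three tag patterns with `findall`:

```python
import re

html = "<div>content</div>"
greedy = re.findall(r"<.*>", html)     # one match spanning both tags
lazy = re.findall(r"<.*?>", html)      # stops at the first ">"
negated = re.findall(r"<[^>]*>", html)  # cannot cross ">" at all

print(greedy)   # ['<div>content</div>']
print(lazy)     # ['<div>', '</div>']
print(negated)  # ['<div>', '</div>']
```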
Conclusion
🚀 Regex is powerful for text search, validation, and extraction.
✔ Learn the basic symbols (`.`, `*`, `?`, `{}`)
✔ Use groups & lookaheads for flexible patterns
✔ Apply regex for emails, passwords, phone numbers, logs
✔ Optimize with compiled regex and lazy matches