RegEx (Regular Expression) in Python
Hey there! In this guide, we'll explore RegEx, or Regular Expressions, in Python. A regular expression is a special sequence of characters that defines a search pattern, which is then used to find a string or set of strings. Let's dive in!
Regular Expression
Regular expressions (regex) are a powerful tool for working with text: they let you define patterns that match specific character sequences. Regex is commonly used for tasks like:
- Searching for specific patterns within a string
- Validating inputs (e.g., email addresses, phone numbers)
- Extracting or replacing parts of text
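As a minimal sketch of these three tasks, the snippet below uses Python's built-in re module, which is introduced in the next section. The sample text and the simplified email pattern are assumptions made for illustration; real email validation needs a far more careful pattern.

```python
import re

text = "Contact: alice@example.com, phone 555-1234"

# 1. Searching: find the first run of digits in the text
found = re.search(r"\d+", text)
print(found.group())  # 555

# 2. Validating: a deliberately simplified email check (illustrative only)
email_pattern = r"[\w.]+@[\w.]+\.\w+"
is_valid = re.fullmatch(email_pattern, "alice@example.com") is not None
print(is_valid)  # True

# 3. Extracting and replacing parts of the text
emails = re.findall(email_pattern, text)
masked = re.sub(r"\d", "#", text)
print(emails)  # ['alice@example.com']
print(masked)  # Contact: alice@example.com, phone ###-####
```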
1. RegEx Module in Python
Python has a built-in module named re that is used for working with regular expressions. We can import it with the import statement.
Example:
# importing re module
import re
After importing the module, you can start using the functions it provides.
2. Metacharacters
Metacharacters are characters with a special meaning. They are the building blocks used to construct patterns. Below is a list of a few metacharacters with their descriptions:
Metacharacter | Description |
---|---|
. | Matches any character except a newline. |
^ | Matches the start of a string. |
$ | Matches the end of a string. |
* | Matches 0 or more repetitions of the preceding character or group. |
+ | Matches 1 or more repetitions of the preceding character or group. |
? | Matches 0 or 1 occurrence of the preceding character or group (makes it optional). |
{} | Specifies an exact number of repetitions {n}, a range {n,m}, or a minimum {n,}. |
[] | Matches any single character inside the brackets. |
\| | Acts as an OR operator between expressions (e.g., `a\|b` matches either a or b). |
() | Groups multiple expressions together and captures the matched text. |
\ | Escapes special characters (e.g., \. matches a literal .) or introduces special sequences like \d (covered in the next section). |
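To make the table concrete, here is a short sketch that exercises each metacharacter with re.findall and re.search. The sample strings and variable names are made up for illustration.

```python
import re

# . matches any single character except a newline
dot = re.findall(r"c.t", "cat cot c\nt")
print(dot)  # ['cat', 'cot']

# ^ and $ anchor the pattern to the start and end of the string
starts = bool(re.search(r"^Hello", "Hello World"))
ends = bool(re.search(r"World$", "Hello World"))
print(starts, ends)  # True True

# *, + and ? control how often the preceding item may repeat
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")
optional = re.findall(r"colou?r", "color colour")
print(star)      # ['a', 'ab', 'abb']
print(plus)      # ['ab', 'abb']
print(optional)  # ['color', 'colour']

# {n,m} bounds the repetition count; [] matches one character from a set
counted = re.findall(r"\d{2,3}", "1 22 333")
vowels = re.findall(r"[aeiou]", "regex")
print(counted)  # ['22', '333']
print(vowels)   # ['e', 'e']

# | is alternation; () groups and captures the matched text
either = re.findall(r"cat|dog", "cat, dog, bird")
groups = re.search(r"(\d+)-(\d+)", "range 10-20").groups()
print(either)  # ['cat', 'dog']
print(groups)  # ('10', '20')

# \ escapes a metacharacter so that it is matched literally
literal_dot = re.findall(r"\.", "3.14")
print(literal_dot)  # ['.']
```

Writing patterns as raw strings (the r"..." prefix) keeps Python from interpreting the backslashes before the regex engine sees them.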
3. Special Sequences
Special sequences in regex are shorthand notations that match specific types of characters, such as digits, whitespace, or word characters, making patterns more concise and readable. The table below lists common special sequences with their descriptions and examples.
Special Sequence | Description | Example |
---|---|---|
\d | Matches any digit (equivalent to [0-9] for ASCII text). | \d{3} matches 123 in abc123 |
\D | Matches any non-digit character. | \D+ matches abc in abc123 |
\w | Matches any alphanumeric character or underscore. | \w+ matches abc123 in abc123 |
\W | Matches any character that is not alphanumeric or an underscore. | \W+ matches !@# in abc!@# |
\s | Matches any whitespace character (space, tab, newline). | \s+ matches spaces in a b c |
\S | Matches any non-whitespace character. | \S+ matches abc123 in abc123 |
\b | Matches a word boundary. | \bword\b matches word in word! but not in sword |
\B | Matches a non-word boundary. | \Bword\B matches word in swordplay but not word alone |
\A | Matches the start of the string. | \Aabc matches abc in abc123 |
\Z | Matches the end of the string. | 123\Z matches 123 in abc123 |
\ | Escapes special characters. | \. matches a literal . in 3.14 |
\t | Matches a tab character. | a\tb matches a b (tab) |
\n | Matches a newline character. | a\nb matches a b with newline |
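The rows above can be checked directly in Python. This sketch reuses the example strings from the table where possible; the variable names and the two combined test strings for \b and \B are assumptions added for illustration.

```python
import re

digits = re.findall(r"\d{3}", "abc123")
non_digits = re.findall(r"\D+", "abc123")
word_chars = re.findall(r"\w+", "abc123")
non_word = re.findall(r"\W+", "abc!@#")
pieces = re.split(r"\s+", "a b\tc\nd")
print(digits)      # ['123']
print(non_digits)  # ['abc']
print(word_chars)  # ['abc123']
print(non_word)    # ['!@#']
print(pieces)      # ['a', 'b', 'c', 'd']

# \b needs a word boundary on both sides, so the "word" inside "sword" is skipped
bounded = re.findall(r"\bword\b", "word! sword")
print(bounded)  # ['word']

# \B needs the opposite: "word" must sit inside a longer word
embedded = re.findall(r"\Bword\B", "swordplay word")
print(embedded)  # ['word']

# \A and \Z anchor to the very start and the very end of the string
at_start = bool(re.search(r"\Aabc", "abc123"))
at_end = bool(re.search(r"123\Z", "abc123"))
print(at_start, at_end)  # True True
```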
4. RegEx Functions
Regex functions in Python, available through the re module, provide powerful tools to search, match, split, replace, and manipulate text based on specified patterns, making it easy to handle complex text processing tasks.
Function | Description | Example Usage |
---|---|---|
re.search() | Searches for the first occurrence of the pattern anywhere in the string and returns a match object if found, otherwise None. | re.search(r"\d+", "Age: 25") finds 25. |
re.match() | Checks whether the pattern matches at the beginning of the string. Returns a match object if it does, otherwise None. | re.match(r"Hello", "Hello World") matches Hello. |
re.findall() | Finds all non-overlapping matches of the pattern in the string and returns them as a list. | re.findall(r"\w+", "Hello World") returns ['Hello', 'World']. |
re.split() | Splits the string by occurrences of the pattern and returns a list of substrings. | re.split(r"\s", "Split this sentence") returns ['Split', 'this', 'sentence']. |
re.sub() | Replaces occurrences of the pattern in the string with a specified replacement and returns the new string. | re.sub(r"\d", "#", "Phone: 123-456") returns Phone: ###-###. |
re.compile() | Compiles a regex pattern into a pattern object for reuse, which then provides methods such as search() and match(). | pattern = re.compile(r"\d+"); pattern.search("ID: 789") finds 789. |
Example:
import re
pattern = r"\b[A-Za-z]+\b" # matches any word consisting of letters only
text = "Hello, Algo world!"
matches = re.findall(pattern, text)
print(matches) # Output: ['Hello', 'Algo', 'world']
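The remaining functions from the table can be tried the same way. The sketch below reuses the table's example inputs; the variable names and the final "7 apples, 12 pears" string are illustrative additions.

```python
import re

# re.search scans the whole string, while re.match only tries position 0
found = re.search(r"\d+", "Age: 25")
print(found.group(), found.span())  # 25 (5, 7)

at_start = re.match(r"\d+", "Age: 25")
print(at_start)  # None, because the string does not begin with a digit

# Both return None on failure, so check the result before calling .group()
if at_start is None:
    print("no match at the start of the string")

# re.split breaks the string at each match; re.sub replaces each match
parts = re.split(r"\s", "Split this sentence")
masked = re.sub(r"\d", "#", "Phone: 123-456")
print(parts)   # ['Split', 'this', 'sentence']
print(masked)  # Phone: ###-###

# re.compile builds a reusable pattern object with the same methods
number = re.compile(r"\d+")
first_id = number.search("ID: 789").group()
all_numbers = number.findall("7 apples, 12 pears")
print(first_id)     # 789
print(all_numbers)  # ['7', '12']
```

Compiling is mostly a readability and reuse convenience when the same pattern is applied many times.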
5. Practical Use Case: Password Validator
Here's an example of a Python program that checks the validity of a password entered by the user.
import re

def validate_password(password):
    # Each (?=...) lookahead requires one kind of character somewhere in the
    # password: a lowercase letter, an uppercase letter, a digit, and a special
    # character. The final class then requires 8 or more allowed characters.
    pattern = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
    # Check if the password matches the pattern
    if re.match(pattern, password):
        return "Password is valid"
    else:
        return "Password is invalid"

# Test the function
print(validate_password("Password123!"))  # Output: Password is valid
print(validate_password("Pass123"))  # Output: Password is invalid (too short, no special character)
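A single combined pattern only reports pass or fail. As a hedged alternative sketch, the same rules can be checked one at a time so the caller can see which requirement was missed. The names REQUIRED, ALLOWED, and password_problems are hypothetical and not part of the example above. Because this version uses fullmatch, it is slightly stricter about a trailing newline than the ^...$ pattern, but it should agree with it for ordinary single-line input.

```python
import re

# Hypothetical rule table: one small pattern per requirement
REQUIRED = [
    (r"[a-z]", "lowercase letter"),
    (r"[A-Z]", "uppercase letter"),
    (r"\d", "digit"),
    (r"[@$!%*?&]", "special character"),
]

# The whole password must be 8+ characters drawn from the allowed set
ALLOWED = re.compile(r"[A-Za-z\d@$!%*?&]{8,}")

def password_problems(password):
    """Return a list of failed rules; an empty list means the password is valid."""
    problems = [
        f"missing {name}"
        for pattern, name in REQUIRED
        if re.search(pattern, password) is None
    ]
    if ALLOWED.fullmatch(password) is None:
        problems.append("must be at least 8 characters from the allowed set")
    return problems

print(password_problems("Password123!"))  # []
print(password_problems("Pass123"))
# ['missing special character', 'must be at least 8 characters from the allowed set']
```

Splitting the rules this way trades the compactness of the lookahead pattern for clearer error messages.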