Python Regular Expressions

Python Regular Expressions, RegEx Functions/methods, Metacharacters, Special Sequences, Search & Replace, and Matching Versus Searching.

What is a Regular Expression?

A Regular Expression (RegEx or RE) in a programming language is a special text string used for describing a search pattern. It is extremely useful for extracting information from text such as code, files, logs, spreadsheets, or even documents.

Python has a built-in module called re, part of the standard library, which is used to work with Regular Expressions.

import re

We use RegEx functions/methods, Metacharacters, and Special Sequences for creating Regular Expressions.

I. RegEx Functions/Methods

The re module provides users a variety of functions to search for a pattern in a particular string.

findall – It returns a list containing all matches
search – It returns a Match object if there is a match anywhere in the string
split – It returns a list where the string has been split at each match
sub – It replaces one or many matches with a string

1. re.findall()

The re.findall() function returns a list of strings containing all matches of the specified pattern.

Example:

import re

mystr = "at what time?"
match = re.findall("at", mystr)
print(match)

Note: The above example returns a list of every occurrence of the substring "at" in the given string, here ['at', 'at'].
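As a small sketch of the behaviour described above (the variable names found and missing are illustrative, not part of the original example), re.findall() returns matches in the order they occur and returns an empty list when nothing matches:

```python
import re

mystr = "at what time?"

# Every non-overlapping occurrence of "at", scanning left to right
found = re.findall("at", mystr)
print(found)    # ['at', 'at']

# A pattern that never occurs gives an empty list, not None
missing = re.findall("xyz", mystr)
print(missing)  # []
```

Because the result is always a list, it can be tested directly in an if statement.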

2. re.search()

The re.search() function returns a Match object for the first match found anywhere in the string, or None if there is no match.

Example:

import re

string = "at what time?"
match = re.search("at", string)
if match:
    print("String is found at:", match.start())
else:
    print("String is not found!")

Note: The start() method of the Match object returns the index at which the match begins.
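The sketch below illustrates a few other Match object methods and the difference between matching and searching. It assumes the same sample string as above; the names m, at_start, and no_hit are hypothetical.

```python
import re

text = "at what time?"

# re.search() scans the whole string and stops at the first match
m = re.search("what", text)
print(m.group())           # what
print(m.start(), m.end())  # 3 7
print(m.span())            # (3, 7)

# re.match() only succeeds if the pattern matches at position 0
at_start = re.match("what", text)
print(at_start)            # None

# re.search() returns None when the pattern is absent
no_hit = re.search("xyz", text)
print(no_hit)              # None
```

Since both functions return None on failure, it is safer to check the result before calling methods such as start() or group().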

3. re.split()

The re.split() function returns a list where the string has been split at each match.

Example:

Split at each white-space character:

import re

txt = "India Country"
x = re.split(r"\s", txt)
print(x) # ['India', 'Country']
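As a hedged illustration, re.split() also accepts an optional maxsplit argument that limits how many splits are made. The sample sentence and variable names below are assumptions chosen for the example:

```python
import re

txt = "India is my country"

# Split at every white-space character
parts = re.split(r"\s", txt)
print(parts)       # ['India', 'is', 'my', 'country']

# Split only at the first white-space character
first_only = re.split(r"\s", txt, maxsplit=1)
print(first_only)  # ['India', 'is my country']
```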

4. re.sub()

The re.sub() function replaces every match of a pattern in a string with a replacement string and returns the new string.

Example:

import re

string = "at what time?"
result = re.sub(r"\s", ",", string)
print(result) # at,what,time?
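The following sketch shows the optional count argument of re.sub(), which limits the number of replacements. The variable names are illustrative only:

```python
import re

text = "at what time?"

# Replace every white-space character with a comma
all_replaced = re.sub(r"\s", ",", text)
print(all_replaced)    # at,what,time?

# Replace only the first white-space character
first_replaced = re.sub(r"\s", ",", text, count=1)
print(first_replaced)  # at,what time?

# If the pattern never matches, the string is returned unchanged
unchanged = re.sub(r"\d", "#", text)
print(unchanged)       # at what time?
```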

II. Metacharacters

Metacharacters are characters with a special meaning.

Examples:

1. [] – A set of characters

import re

txt = "I am a Python Programmer"

# Find all lower case characters alphabetically between "a" and "e":

x = re.findall("[a-e]", txt)
print(x) # ['a', 'a', 'a', 'e']
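A set can hold a range such as A-Z or an explicit list of characters. The short sketch below reuses the sample sentence; the names upper and vowels are hypothetical:

```python
import re

txt = "I am a Python Programmer"

# A range inside the set: any upper case letter from A to Z
upper = re.findall("[A-Z]", txt)
print(upper)   # ['I', 'P', 'P']

# An explicit list of characters: any lower case vowel
vowels = re.findall("[aeiou]", txt)
print(vowels)  # ['a', 'a', 'o', 'o', 'a', 'e']
```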

2. \ – Signals a special sequence

import re

txt = "Is it 20 Rs. or 30 Rs."

# Find all digit characters:

x = re.findall(r"\d", txt)
print(x) # ['2', '0', '3', '0']

3. ^ – Starts with

import re

txt = "My Country is India"

# Check if the string starts with 'My':

x = re.findall("^My", txt)
if x:
    print("Yes, the string starts with 'My'")
else:
    print("No match")

4. $ – Ends with

import re

txt = "My Country is India"

# Check if the string ends with 'India':

x = re.findall("India$", txt)
if x:
    print("Yes, the string ends with 'India'")
else:
    print("No match")
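For a simple yes/no check, one possible alternative to re.findall() is re.search() wrapped in bool(). This is a sketch of that idiom, not the only way to do it, and the variable names are assumptions:

```python
import re

txt = "My Country is India"

# ^ anchors the pattern to the start of the string
starts_with_my = bool(re.search(r"^My", txt))
print(starts_with_my)     # True

# $ anchors the pattern to the end of the string
ends_with_india = bool(re.search(r"India$", txt))
print(ends_with_india)    # True

# "India" occurs in the string, but not at the start
starts_with_india = bool(re.search(r"^India", txt))
print(starts_with_india)  # False
```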

5. | – Either or

import re

txt = "India is my country and I love my country"

# Check if the string contains either "my" or "hate":

x = re.findall("my|hate", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['my', 'my']
Yes, there is at least one match!
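When both alternatives occur, the matches come back in the order they appear in the string. The sketch below uses the same sentence with different, illustrative alternatives:

```python
import re

txt = "India is my country and I love my country"

# Both alternatives are present; results follow their order in the text
found = re.findall("India|country", txt)
print(found)    # ['India', 'country', 'country']

# Neither alternative is present, so the list is empty
nothing = re.findall("falls|stays", txt)
print(nothing)  # []
```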

III. Special Sequences

A special sequence is a backslash (\) followed by a character; together they have a special meaning.

1. \A

Returns a match if the specified characters are at the beginning of the string

import re

txt = "Python is a Lightweight Programming Language"

# Check if the string starts with "Python":

x = re.findall(r"\APython", txt)

print(x)

if x:
    print("Yes, there is a match!")
else:
    print("No match")

2. \D

It matches any character that is NOT a digit.

import re

txt = "India@123"

# Return a match at every non-digit character:

x = re.findall(r"\D", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

3. \d

It matches any digit character (numbers from 0-9).

import re

txt = "India@123"

# Check if the string contains any digits (numbers from 0-9):

x = re.findall(r"\d", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

4. \S

It matches any character that is NOT a white-space character.

import re

txt = "A B C"

# Return a match at every non-white-space character:

x = re.findall(r"\S", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

5. \s

It matches any white-space character.

import re

txt = "I am a Python Programmer"

# Return a match at every white-space character:

x = re.findall(r"\s", txt)

print(x)

if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
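The upper case and lower case forms of a special sequence are complements of each other. The sketch below runs each pair on the same input to make that visible; the variable names are hypothetical:

```python
import re

txt = "India@123"

# \d matches digits, \D matches everything else
digits = re.findall(r"\d", txt)
non_digits = re.findall(r"\D", txt)
print(digits)      # ['1', '2', '3']
print(non_digits)  # ['I', 'n', 'd', 'i', 'a', '@']

spaced = "A B C"

# \s matches white-space characters, \S matches everything else
spaces = re.findall(r"\s", spaced)
non_spaces = re.findall(r"\S", spaced)
print(spaces)      # [' ', ' ']
print(non_spaces)  # ['A', 'B', 'C']
```

Every character of the input lands in exactly one list of each pair, so the two lengths add up to the length of the string. The patterns are written as raw strings (r"...") so that Python passes the backslash through to the re module unchanged.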
