↓ Code Available Below! ↓

This video shows how to match and strip punctuation from a text string using Python's built-in re (regular expressions) module. Text data often contains punctuation that you may want to remove as a preprocessing step before splitting the text into its constituent words for analysis. The re.sub() function makes this easy: match the punctuation characters and replace them with empty strings, which strips them from the text.


If you find this video useful, like, share and subscribe to support the channel!
► Subscribe: https://www.youtube.com/c/DataDaft?sub_confirmation=1


Code used in this Python Code Clip:

import re

lines = '''
Nappa @ Vegeta: What does the scouter say about his power level?
Vegeta @ Nappa: It's over (9000!)
Nappa @ Vegeta: [What 9000?] That can't be right... Can it?'''

# Match and strip punctuation with re.sub()
# The raw string (r"...") keeps Python from treating \w and \s as string escapes
re.sub(pattern = r"[^\w\s]",
       repl = "",
       string = lines)

# Match and strip punctuation and whitespace with re.sub()
# \W matches every character that is not a letter, digit or underscore
re.sub(pattern = r"\W",
       repl = "",
       string = lines)
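
As a rough sketch of the preprocessing step described above, the stripped text can then be split into words with str.split(). The sample sentence and variable names below are made up for illustration and are not from the video. One thing to watch for: \w treats the underscore as a word character, so the "[^\w\s]" pattern leaves underscores in place.

```python
import re

# Hypothetical sample text, not from the video
text = "It's over (9000!) said snake_case..."

# Remove everything that is not a word character or whitespace
no_punct = re.sub(r"[^\w\s]", "", text)

# Split on whitespace to get the individual words
words = no_punct.split()
print(words)

# Add the underscore to the pattern if it should be stripped as well
no_underscore = re.sub(r"[^\w\s]|_", "", text)
print(no_underscore.split())
```

The second pattern is only one way to handle underscores; whether they count as punctuation depends on your data.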

* Note: you can access the common ASCII punctuation characters using:

import string
string.punctuation
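
One possible way to use string.punctuation for stripping, shown here as a sketch rather than the method from the video: build a character class from it with re.escape(), or skip regular expressions and use str.translate(). The sample text and variable names are assumptions for illustration. Since string.punctuation only covers ASCII characters, symbols such as curly quotes would be left untouched by this approach.

```python
import re
import string

# Hypothetical sample text, not from the video
text = "Wait... what?! It's over [9000]: really"

# Build a character class from the ASCII punctuation characters;
# re.escape() makes characters such as ] and \ safe inside the pattern
pattern = "[" + re.escape(string.punctuation) + "]"
stripped = re.sub(pattern, "", text)
print(stripped)

# The same result without regular expressions, using str.translate()
table = str.maketrans("", "", string.punctuation)
print(text.translate(table))
```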

** Note: YouTube does not allow greater-than or less-than symbols in the text description, so the code above may not exactly match the code shown in the video. Where needed, I use large Unicode < and > symbols in place of the standard-sized ones.

