Regex cheat sheet
From my Python class notes...
re.compile()
useful when the same regex gets reused many times, e.g. once per line inside a loop
here's the pattern:
-assign re.compile(r'REGEX') to an OBJECT
import re
patternobj = re.compile(r'REGEX')
-use OBJECT.search(VAR)
for ITERATOR in SOMEFILE.readlines():
    if patternobj.search(ITERATOR):
        print(ITERATOR)
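A minimal runnable sketch of the compile-then-loop pattern above. The sample lines and the names error_pattern and matching_lines are made up for illustration; a plain list stands in for SOMEFILE.

```python
import re

# Hypothetical sample data standing in for the lines of SOMEFILE.
lines = ["error: disk full\n", "all good\n", "ERROR: timeout\n"]

# Compile once, then reuse the pattern object on every iteration.
error_pattern = re.compile(r'error', re.I)

# Keep only the lines where the pattern is found anywhere in the line.
matching_lines = [line for line in lines if error_pattern.search(line)]

for line in matching_lines:
    print(line.rstrip())
```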
re.search()
useful for one-off matches
here's the pattern:
if re.search(r'REGEX', THING TO LOOK IN): do something
re.match()
only matches at the very start of the string (use re.search() to match anywhere)
here's the pattern:
if re.match(r'REGEX', THING TO LOOK IN): do something
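To make the search/match difference concrete, here is a small sketch. The sample text and variable names are invented for the example.

```python
import re

text = "spam and eggs"

# re.search() scans the whole string for the first match.
search_hit = re.search(r'eggs', text)

# re.match() only tries at position 0, so 'eggs' is not found here.
match_miss = re.match(r'eggs', text)

# 'spam' is at the very start, so re.match() succeeds.
match_hit = re.match(r'spam', text)

print(search_hit is not None)
print(match_miss is None)
print(match_hit.group())
```

Both functions return None when nothing matches, which is why they work directly in an if test.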
re.compile() produces a PATTERNOBJ
re.search() and re.match() produce a MATCHOBJ (or None when nothing matches)
MATCHOBJ.group()
gets the text matched by the group
here's the pattern:
string1 = "Find REGEXTHING in this sentence"
matchobj = re.search(r'(REGEX)', string1) #Note parens
id = matchobj.group(1) #Num refers to first paren group
or:
OBJECT = re.compile(r'(REGEX)') #parens needed for group(1)
matchobj = OBJECT.search(THING TO LOOK IN)
if matchobj:
    print('the matched text was "' + matchobj.group(1) + '"')
MATCHOBJ.groups()
returns all parens groups in a tuple
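A quick sketch of group() versus groups() on one match object, reusing the temperature string from these notes. The variable names are just for illustration.

```python
import re

text = "High: 33, low: 17"

# Two paren groups: the label and the number. search() stops at the first match.
matchobj = re.search(r'(\w+):\s+(\d+)', text)

if matchobj:
    whole = matchobj.group(0)   # the entire matched text
    label = matchobj.group(1)   # first paren group
    value = matchobj.group(2)   # second paren group
    both = matchobj.groups()    # every paren group, as a tuple
    print(whole)
    print(both)
```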
re.findall()
finds all non-overlapping matches and returns a list: of strings if the regex has zero or one paren group, of tuples if it has more
here's the pattern:
OBJECT = re.findall(r'REGEX', THING TO LOOK IN)
as in
text = "High: 33, low: 17"
temp_tuples = re.findall(r'(\w+):\s+(\d+)', text)
print(temp_tuples) #[('High', '33'), ('low', '17')]
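What findall() returns depends on how many paren groups the regex has. A short sketch on the same sample text (variable names invented here):

```python
import re

text = "High: 33, low: 17"

# No groups: a list of the full matches.
numbers = re.findall(r'\d+', text)

# One group: a list of just that group's text.
labels = re.findall(r'(\w+):', text)

# Two or more groups: a list of tuples, one tuple per match.
pairs = re.findall(r'(\w+):\s+(\d+)', text)

print(numbers)
print(labels)
print(pairs)
```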
PATTERNOBJ.sub()
does search and replace on a string, using the pattern object returned by re.compile
here's the pattern:
OBJECT = ("some", "tuple", "of", "strings")
OBJ2 = re.compile(r'REGEX')
for ITERATOR in OBJECT: #loop over the strings, not the pattern object
    ITERATOR = OBJ2.sub('THING TO SWAP IN', ITERATOR)
    print(ITERATOR)
options:
ITERATOR = OBJ2.sub('THING TO SWAP IN', ITERATOR, count=n)
#limits the number of substitutions made in each string to n
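A runnable sketch of sub() with and without count. The tuple of strings and the replacement text are made up for the example.

```python
import re

words = ("foo bar", "bar bar bar", "baz")
bar_pattern = re.compile(r'bar')

# Default: every match in each string is replaced.
replaced_all = [bar_pattern.sub('qux', w) for w in words]

# count=1: only the first match in each string is replaced.
replaced_first = [bar_pattern.sub('qux', w, count=1) for w in words]

print(replaced_all)
print(replaced_first)
```

Strings with no match come back unchanged, so there is no need to test for a match first.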
PATTERNOBJ.subn()
like sub(), but returns a 2-element tuple containing the subbed text
and # of substitutions made
(NEW TEXT, NUM) = PATTERNOBJ.subn('THING TO SWAP IN', THING TO LOOK IN)
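A small sketch of subn(), collapsing runs of whitespace and counting how many runs were replaced. The sample text and names are invented for illustration.

```python
import re

text = "one  two   three"
space_run = re.compile(r'\s+')

# subn() returns (new_text, number_of_substitutions).
new_text, num_subs = space_run.subn(' ', text)

print(new_text)
print(num_subs)
```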
**
Flags, classes, qualifiers, and metachars
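^ = beginning of string (or of each line with re.MULTILINE)
$ = end of string or just before a trailing newline (or end of each line with re.MULTILINE)
\A = beginning of string only, regardless of flags
\Z = end of string only, regardless of flags
(?i) = case insensitive; put at the very start of the REGEX
re.I = case insensitive alt; pass as the flags argument, after THING TO LOOK IN
re.MULTILINE = allows ^ and $ to match at the start or end of every line
re.DOTALL = lets . match newlines too (to split a string on a regex, use re.split())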
\d = digit 0-9
\w = word char: letters, nums or underscores; contains \d
\s = any whitespace char
\D = not in \d
\W = not in \w
\S = not in \s
[abcdef] = custom character class; matches any one of the listed chars
[^abcdef] = negated class; matches any one char NOT listed
string.hexdigits = string of hex digits, '0123456789abcdefABCDEF' (this and the constants below need: import string)
string.ascii_letters
string.ascii_lowercase
string.ascii_uppercase
string.digits
string.punctuation
string.uppercase (Python 2 only; use string.ascii_uppercase in Python 3)
string.whitespace
string.printable
. = wildcard; any single char except newline (unless re.DOTALL)
\ = escape; toggles the next char between its literal and special meaning
* = 0 or more
+ = 1 or more
? = 0 or 1
{3,10} = between 3 and 10 (inclusive)
{3,} = 3 or more
() = grouping
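() \1 = backreference; \1 matches the same text that group 1 matched
\b = word boundary; zero-width, matches the position between a \w char and a \W char (or string edge), not a char itself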
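A sketch exercising several of the flags and metachars above in one place. All sample strings and variable names are made up for illustration.

```python
import re
import string

text = "first line\nsecond line"

# ^ matches only at the start of the string unless re.MULTILINE is set.
starts_default = re.findall(r'^\w+', text)
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)

# \A never matches after a newline, even with re.MULTILINE.
starts_string = re.findall(r'\A\w+', text, re.MULTILINE)

# . stops at newlines unless re.DOTALL is set.
dot_default = re.search(r'first.*second', text)
dot_all = re.search(r'first.*second', text, re.DOTALL)

# (?i) at the start of the regex and re.I as a flag do the same job.
inline_flag = re.search(r'(?i)SECOND', text).group()
flag_arg = re.search(r'SECOND', text, re.I).group()

# \b is zero-width, so only standalone words match.
whole_words = re.findall(r'\bcat\b', "cat concat cat.")

# \1 must repeat exactly what group 1 matched.
doubled = re.search(r'\b(\w+) \1\b', "it was was late").group(1)

# Custom class built from string.hexdigits, with a {3,} quantifier.
hex_runs = re.findall(r'[' + string.hexdigits + r']{3,}', "ff c0ffee zz")

print(starts_default, starts_multiline, starts_string)
print(dot_default, dot_all is not None)
print(whole_words, doubled, hex_runs)
```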
**
Working with files
FILEOBJ.read()
the idea is to read the whole file into a string and then apply the regex to it
here's the pattern:
with open('FILENAME') as FILE: #closes the file when the block ends
    TEXT = FILE.read()
if re.search(r'REGEX', TEXT, re.I):
    print("I found this: " + TEXT)
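A self-contained sketch of the read-then-search pattern. The file contents, the regex, and the variable names are all hypothetical; a temp file is used only so the example runs without needing a real FILENAME.

```python
import os
import re
import tempfile

# Write some made-up contents to a temp file to stand in for FILENAME.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Status: OK\nOwner: Alice\n")
    filename = tmp.name

# Read the whole file into one string.
with open(filename) as fileobj:
    text = fileobj.read()

# Search the full text, ignoring case, and pull out the paren group.
matchobj = re.search(r'owner:\s+(\w+)', text, re.I)
if matchobj:
    found = matchobj.group(1)
    print("I found this: " + found)

# Clean up the temp file.
os.remove(filename)
```

Printing the matched group rather than the whole text keeps the output useful when the file is large.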
