Regex cheat sheet

From my Python class notes...

    re.compile()
        useful when the same regex is reused many times, e.g. searching line by line in a loop
        here's the pattern:
        -assign re.compile(r'REGEX') to an OBJECT
            patternobj = re.compile(r'REGEX')
        -use OBJECT.search(VAR)
            for ITERATOR in SOMEFILE.readlines():
                if patternobj.search(ITERATOR):
                    print(ITERATOR)
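
To make the compile-once, search-many pattern concrete, here is a minimal sketch. The `lines` list is made-up sample data standing in for SOMEFILE.readlines(), and the `^error` regex is only an illustration.

```python
import re

# Hypothetical sample data standing in for SOMEFILE.readlines()
lines = ["error: disk full\n", "ok: all clear\n", "error: timeout\n"]

# Compile once, then reuse the pattern object on every iteration
patternobj = re.compile(r'^error')

# Keep only the lines where the pattern is found
matching_lines = [line for line in lines if patternobj.search(line)]
for line in matching_lines:
    print(line.rstrip())
```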
                    
    re.search()
        useful for one-off matches
        here's the pattern:
            if re.search(r'REGEX', THING TO LOOK IN): do something
    
    re.match()
        always matches from the start of the string
        here's the pattern:
            if re.match(r'REGEX', THING TO LOOK IN): do something
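
The difference between re.search() and re.match() is easiest to see side by side. A small sketch with a made-up string:

```python
import re

text = "say hello"

# search() scans the whole string, so it finds 'hello' in the middle
found_by_search = re.search(r'hello', text)

# match() only tries at position 0, so it returns None here
found_by_match = re.match(r'hello', text)

# match() succeeds when the regex fits at the very start
found_at_start = re.match(r'say', text)

print(found_by_search is not None)
print(found_by_match is None)
print(found_at_start.group())
```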
    
    re.compile() produces a PATTERNOBJ
    re.search() and re.match() produce a MATCHOBJ, or None when nothing matches
    
    MATCHOBJ.group()
        gets the text matched by the group
        here's the pattern:
            string1 = "Find REGEXTHING in this sentence"
            matchobj = re.search(r'(REGEX)', string1) #Note parens
            id = matchobj.group(1) #Num refers to first paren group
        or:
            OBJECT = re.compile(r'(REGEX)') #Note parens, needed for group(1)
            matchobj = OBJECT.search(THING TO LOOK IN)
            if matchobj:
                print("the matched text was '" + matchobj.group(1) + "'")
    MATCHOBJ.groups()
        returns all parens groups in a tuple
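
A short sketch of group() and groups() on one match object. The sample sentence and the date regex are invented for illustration.

```python
import re

string1 = "Order 1234 shipped on 2024-01-15"
matchobj = re.search(r'(\d{4})-(\d{2})-(\d{2})', string1)

if matchobj:
    whole = matchobj.group()    # group() or group(0): the entire matched text
    year = matchobj.group(1)    # text of the first paren group
    parts = matchobj.groups()   # all paren groups as a tuple
    print(whole)
    print(year)
    print(parts)
```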
        
    re.findall()
        finds all non-overlapping matches and returns them as a list
        (a list of tuples when the regex has more than one paren group)
        here's the pattern:
            OBJECT = re.findall(r'REGEX', THING TO LOOK IN)
        as in
            text = "High: 33, low: 17"
            temp_tuples = re.findall(r'(\w+):\s+(\d+)', text)
            print(temp_tuples) #[('High', '33'), ('low', '17')]
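
What findall() returns depends on how many paren groups the regex has. A sketch of the zero-group and one-group cases, reusing the same sample text:

```python
import re

text = "High: 33, low: 17"

# No paren groups: a list of the full matched strings
numbers = re.findall(r'\d+', text)

# One paren group: a list of just that group's text
labels = re.findall(r'(\w+):', text)

print(numbers)
print(labels)
```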
    
    PATTERNOBJ.sub()
        does search and replace using a pattern compiled by re.compile()
        here's the pattern:
            OBJECT = ("some", "tuple", "of", "strings")
            OBJ2 = re.compile(r'REGEX')
            for ITERATOR in OBJECT:
                ITERATOR = OBJ2.sub('THING TO SWAP IN', ITERATOR)
                print(ITERATOR)
        options:
            ITERATOR = OBJ2.sub('THING TO SWAP IN', ITERATOR, count=n)
                #limits the number of substitutions made in each string
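
A minimal sketch of sub() with and without count, using an invented tuple of strings:

```python
import re

words = ("gray cat", "gray gray sky", "green tree")
pattern = re.compile(r'gray')

# Replace every occurrence in each string
replaced_all = [pattern.sub('grey', w) for w in words]

# count=1 replaces only the first occurrence in each string
replaced_first = [pattern.sub('grey', w, count=1) for w in words]

print(replaced_all)
print(replaced_first)
```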
            
    PATTERNOBJ.subn()
        like sub(), but returns a 2-element tuple containing the subbed text 
        and # of substitutions made
        (THING TO LOOK IN, NUM) = PATTERNOBJ.subn('THING TO SWAP IN', THING TO LOOK IN)
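
As a quick illustration of subn(), this sketch collapses runs of whitespace and reports how many runs were replaced. The input string is made up.

```python
import re

pattern = re.compile(r'\s+')

# subn() returns (new_string, number_of_substitutions)
new_text, num_subs = pattern.subn(' ', "too   many    spaces")

print(new_text)
print(num_subs)
```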
            

**
        
    Flags, classes, qualifiers, and metachars
        ^ = beginning of string (of each line with re.MULTILINE)
        $ = end of string (of each line with re.MULTILINE)
        \A = beginning of string (ignores re.MULTILINE)
        \Z = end of string (ignores re.MULTILINE)
        (?i) = case insensitive; put at the start of the REGEX
        re.I = case insensitive alt; pass as a flag after THING TO LOOK IN
        re.MULTILINE = allows ^ and $ to match at the start or end of each line
        re.DOTALL = allows . to match newlines too
        \d = digit 0-9
        \w = word char: letters, nums or underscores; contains \d
        \s = any whitespace char
        \D = not in \d
        \W = not in \w
        \S = not in \s
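        [abcdef] = custom char class: matches any one of the listed chars
        [^abcdef] = negated custom char class: matches any one char not listed
        string.hexdigits = string of hex digits (this and the ones below are constants from the string module)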
        string.ascii_letters
        string.ascii_lowercase
        string.ascii_uppercase
        string.digits
        string.punctuation
        string.uppercase (Python 2 only; use string.ascii_uppercase in Python 3)
        string.whitespace
        string.printable
        . = wildcard: any single char except newline (unless re.DOTALL is set)
        \ = toggle/escape the char's function
        * = 0 or more
        + = 1 or more
        ? = 0 or 1
        {3,10} = between 3 and 10 (inclusive)
        {3,} = 3 or more
        () = grouping
        () \1 = backreference: \1 matches the same text the first paren group matched
        \b = word boundary: zero-width position between a \w char and a \W char (or the start/end of the string)
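
A few of the entries above are easier to trust after seeing them run. This sketch uses made-up strings to compare ^ with and without re.MULTILINE, \A, . with and without re.DOTALL, the two ways of ignoring case, and a \1 backreference.

```python
import re

text = "first line\nSecond line"

# Without re.MULTILINE, ^ matches only at the start of the string
starts_default = re.findall(r'^\w+', text)
# With re.MULTILINE, ^ also matches right after each newline
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)
# \A ignores re.MULTILINE and still matches only at the start of the string
starts_anchor = re.findall(r'\A\w+', text, re.MULTILINE)

# . does not match a newline unless re.DOTALL is set
dot_default = re.search(r'first.*Second', text)
dot_dotall = re.search(r'first.*Second', text, re.DOTALL)

# (?i) inside the regex and re.I as a flag do the same job
inline_flag = re.search(r'(?i)second', text)
arg_flag = re.search(r'second', text, re.I)

# \1 must repeat whatever group 1 matched: here, a doubled word
doubled = re.search(r'\b(\w+) \1\b', "it was the the end")

print(starts_default)
print(starts_multiline)
print(doubled.group())
```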
        
**

    Working with files
    
    FILEOBJ.read()
        the idea is to read the whole file into a string and then apply the regex to it
        here's the pattern:
            FILE = open('FILENAME')
            TEXT = FILE.read()
            if re.search(r'REGEX', TEXT, re.I):
                print("I found this: " + TEXT)
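
A runnable sketch of the read-then-search idea. It writes a throwaway temp file first so the example is self-contained; the file contents and the regex are assumptions for illustration, and in practice FILENAME would be an existing file.

```python
import os
import re
import tempfile

# Create a throwaway file so the example runs on its own
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Line one\nThe ANSWER is 42\n")
    filename = tmp.name

# 'with' closes the file automatically once the block ends
with open(filename) as fileobj:
    text = fileobj.read()

# re.I makes the search case insensitive
matchobj = re.search(r'answer is (\d+)', text, re.I)
if matchobj:
    print("I found this: " + matchobj.group(1))

# Clean up the temp file
os.remove(filename)
```

Using `with open(...)` instead of a bare open() means the file is closed even if the search raises an error.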