regex - Python split string by pattern
I have strings like "aaaaabbbbbbbbbbbbbbccccccccccc". The number of characters can differ, and there can be a dash inside the string, like "aaaaa-bbbbbbbbbbbbbbccccccccccc".
Is there a smart way to either split it into "aaaaa", "bbbbbbbbbbbbbb", "ccccccccccc" and get the indices of the split, or just get the indices, without looping through every string? If the dash is between patterns, it can end up in either the left or the right one, as long as it is always handled the same way.
Any idea?
Regular expression MatchObject results include the indices of the match. What remains is to match repeating characters:
    import re
    repeat = re.compile(r'(?P<start>[a-z])(?P=start)+-?')

This matches only if a given letter character (a-z) is repeated at least once:
    >>> for match in repeat.finditer("aaaaabbbbbbbbbbbbbbccccccccccc"):
    ...     print(match.group(), match.start(), match.end())
    ...
    aaaaa 0 5
    bbbbbbbbbbbbbb 5 19
    ccccccccccc 19 30

The .start() and .end() methods on a match result give the exact positions in the input string.
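As a minimal sketch of using those positions, the spans can be collected in one pass and then used to slice the input. The names `text`, `spans` and `pieces` are illustrative and not part of the original answer; the pattern is the one described above.

```python
import re

# A lowercase letter, repeated at least once more, then an optional dash.
repeat = re.compile(r'(?P<start>[a-z])(?P=start)+-?')

text = "aaaaa-bbbbbbbbbbbbbbccccccccccc"

# match.span() returns the (start, end) pair for each match.
spans = [match.span() for match in repeat.finditer(text)]

# Slicing with those indices reproduces the matched pieces.
pieces = [text[start:end] for start, end in spans]

print(spans)
print(pieces)
```

With this pattern the dash stays attached to the piece on its left, which satisfies the "always handled the same way" requirement from the question.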
Dashes are included in the matches, but non-repeating characters are not:
    >>> for match in repeat.finditer("a-bb-cccccccc"):
    ...     print(match.group(), match.start(), match.end())
    ...
    bb- 2 5
    cccccccc 5 13

If you want the a- part to be a match as well, replace the + with a * multiplier:
    repeat = re.compile(r'(?P<start>[a-z])(?P=start)*-?')
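Putting the * variant to work, here is a hedged sketch of a helper that returns every piece together with its indices. The function name `split_runs` is hypothetical, not something from the answer, and it assumes the input contains only lowercase letters and dashes; any other character is simply skipped by finditer.

```python
import re

# The * variant: a single letter also counts as a run.
repeat = re.compile(r'(?P<start>[a-z])(?P=start)*-?')


def split_runs(text):
    """Return (piece, start, end) for each run of one repeated letter.

    A dash directly after a run is kept with that run (the left piece).
    """
    return [(m.group(), m.start(), m.end()) for m in repeat.finditer(text)]


for piece, start, end in split_runs("a-bb-cccccccc"):
    print(piece, start, end)
```

Because the regex engine does the scanning, no explicit character-by-character loop is needed in Python code; the list comprehension only walks the matches.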