regex - Python: split a string by pattern
I have strings like "aaaaabbbbbbbbbbbbbbccccccccccc". The number of characters can differ, and sometimes there can be a dash inside the string, like "aaaaa-bbbbbbbbbbbbbbccccccccccc".
Is there a smart way to either split it into "aaaaa", "bbbbbbbbbbbbbb", "ccccccccccc" and get the indices of the split, or just get the indices, without looping through every string? If the dash is between patterns, it can end up in either the left or the right one, as long as it is always handled the same way.
Any idea?
Regular expression MatchObject results include the indices of the match. What remains is to match the repeating characters:
import re
repeat = re.compile(r'(?P<start>[a-z])(?P=start)+-?')
This matches only if a given letter character (a-z) is repeated at least once:
>>> for match in repeat.finditer("aaaaabbbbbbbbbbbbbbccccccccccc"):
...     print(match.group(), match.start(), match.end())
...
aaaaa 0 5
bbbbbbbbbbbbbb 5 19
ccccccccccc 19 30
The .start() and .end() methods on a match result give the exact positions in the input string.
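To collect the pieces and their indices in one pass rather than printing them, the matches can be gathered into a list. This is only a minimal sketch; the helper name `repeated_runs` and the constant `REPEAT` are made up for illustration and are not part of the `re` module:

```python
import re

# Same pattern as above: a letter, repeated at least once, with an
# optional trailing dash that is kept with the run on its left.
REPEAT = re.compile(r'(?P<start>[a-z])(?P=start)+-?')

def repeated_runs(text):
    """Return a (substring, start, end) tuple for each repeated-letter run."""
    return [(m.group(), m.start(), m.end()) for m in REPEAT.finditer(text)]

runs = repeated_runs("aaaaa-bbbbbbbbbbbbbbccccccccccc")
for piece, start, end in runs:
    print(piece, start, end)
```

Each tuple holds the matched text plus the half-open range `[start, end)`, so `text[start:end]` gives back the same piece.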
Dashes are included in the matches, but non-repeating characters are not:
>>> for match in repeat.finditer("a-bb-cccccccc"):
...     print(match.group(), match.start(), match.end())
...
bb- 2 5
cccccccc 5 13
If you want the a- part to be a match as well, replace the + multiplier with *:
repeat = re.compile(r'(?P<start>[a-z])(?P=start)*-?')
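As a rough sketch of how the `*` variant could be used to get both the split pieces and the split indices, assuming a trailing dash should be dropped from each piece (the names `repeat_any` and `split_runs` are hypothetical, chosen only for this example):

```python
import re

# The '*' variant: a single letter such as a lone "a" now matches too.
repeat_any = re.compile(r'(?P<start>[a-z])(?P=start)*-?')

def split_runs(text):
    """Return (pieces, cuts): runs without their dash, and the cut positions."""
    matches = list(repeat_any.finditer(text))
    # Drop the optional trailing dash so only the letters remain.
    pieces = [m.group().rstrip('-') for m in matches]
    # Every match end except the last one is a position where the string is cut.
    cuts = [m.end() for m in matches[:-1]]
    return pieces, cuts

pieces, cuts = split_runs("a-bb-cccccccc")
print(pieces, cuts)
```

Because the dash is always consumed by the run on its left, the cut positions fall after the dash. Characters outside a-z, or a dash with no letter before it, are skipped by this pattern.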