regex - Python split string by pattern


I have strings like "aaaaabbbbbbbbbbbbbbccccccccccc". The number of characters can differ, and there can be a dash inside the string, like "aaaaa-bbbbbbbbbbbbbbccccccccccc".

Is there a smart way to either split it into "aaaaa","bbbbbbbbbbbbbb","ccccccccccc", or get the indices of the split, without looping through every string? If there is a dash between patterns, it can end up in either the left or the right part, as long as it is always handled the same way.

Any ideas?

Regular expression match objects include the indices of the match, so all that remains is to match the repeating characters:

    import re
    repeat = re.compile(r'(?P<start>[a-z])(?P=start)+-?')

This matches any letter (a-z) that is repeated at least once:

    >>> for match in repeat.finditer("aaaaabbbbbbbbbbbbbbccccccccccc"):
    ...     print(match.group(), match.start(), match.end())
    ...
    aaaaa 0 5
    bbbbbbbbbbbbbb 5 19
    ccccccccccc 19 30

The .start() and .end() methods on the match result give the exact positions in the input string.
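Wrapping the pattern in a small function returns both the pieces and their split indices in one pass over the string. This is a sketch: the name `split_runs` is hypothetical, and it uses the `+` version of the pattern, so single non-repeating letters are skipped.

```python
import re

# Pattern from the answer: a letter, one or more repeats of that same letter
# (backreference to the named group "start"), then an optional dash.
REPEAT = re.compile(r'(?P<start>[a-z])(?P=start)+-?')


def split_runs(s):
    """Return a list of (piece, start, end) tuples, one per run of repeated letters.

    split_runs is a hypothetical helper name used for illustration.
    """
    return [(m.group(), m.start(), m.end()) for m in REPEAT.finditer(s)]


if __name__ == "__main__":
    s = "a" * 5 + "-" + "b" * 14 + "c" * 11
    for piece, start, end in split_runs(s):
        print(piece, start, end)
```

With the dash example, the dash stays attached to the run on its left, so the first piece is `aaaaa-` spanning indices 0 to 6.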

Dashes are included in the matches, but non-repeating characters are not:

    >>> for match in repeat.finditer("a-bb-cccccccc"):
    ...     print(match.group(), match.start(), match.end())
    ...
    bb- 2 5
    cccccccc 5 13
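The question allows a dash to end up in either neighbouring part, as long as it is handled consistently. The pattern above attaches it to the run on its left; moving the optional dash to the front of the pattern attaches it to the run on its right instead. This variant is not from the original answer, and `REPEAT_DASH_RIGHT` is a name introduced here for illustration.

```python
import re

# Variant: optional dash *before* the run, so a dash between two runs
# becomes part of the run on its right.
REPEAT_DASH_RIGHT = re.compile(r'-?(?P<start>[a-z])(?P=start)+')

for m in REPEAT_DASH_RIGHT.finditer("aaaaa-bbbbcc"):
    print(m.group(), m.start(), m.end())
# aaaaa 0 5
# -bbbb 5 10
# cc 10 12
```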

If you want the a- part to match as well, replace the + with the * multiplier:

    repeat = re.compile(r'(?P<start>[a-z])(?P=start)*-?')
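With the `*` version every letter starts a match, so for input made only of lowercase runs separated by at most one dash, the matches are contiguous and cover the whole string. The quick check below is a sketch using the example string from above; the end offsets of the matches are then exactly the split indices.

```python
import re

# The '*' variant from the answer: a single letter now counts as a run.
repeat = re.compile(r'(?P<start>[a-z])(?P=start)*-?')

s = "a-bb-cccccccc"
parts = [m.group() for m in repeat.finditer(s)]
print(parts)  # ['a-', 'bb-', 'cccccccc']

# The matches are contiguous here, so joining them restores the input.
assert "".join(parts) == s

# End offsets of each match double as the split indices.
print([m.end() for m in repeat.finditer(s)])  # [2, 5, 13]
```

If the input can contain uppercase letters, digits, or consecutive dashes, those characters fall outside every match and the join check fails, so widen the character class if that matters.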
