Acronym finder: building a regex with Perl
I have N defined words (in this question 3 words, e.g.: open icebreaker umbrela) and I'm wondering whether some possible acronym of them exists as an English word. E.g. I want to run something like:

    grep -Pi '^o(p(e?))?i(c(e?))?um?$' my_long_wordlist.txt

In the above regex I decided that I can use:

- from the 1st word: o, or op, or ope (the first, or first two, or first 3 letters)
- from the 2nd word: i, or ic, or ice (the first, or first two, or first 3 letters)
- and from the last word: the first or the first 2 letters, u or um

For fun: the above regex returns me the word opium :)
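
The same check can be tried without grep. Below is a minimal sketch that applies the hand-built pattern to a handful of candidates; the `@sample` list is made up for illustration and only stands in for my_long_wordlist.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The hand-built pattern from the grep example above.
my $re = qr/^o(p(e?))?i(c(e?))?um?$/i;

# Hypothetical sample words standing in for my_long_wordlist.txt.
my @sample = qw(opium oiu opeiceum open umbrella opicu);

# Keep only the candidates the pattern accepts.
my @hits = grep { $_ =~ $re } @sample;
print "$_\n" for @hits;
```

Every accepted string is a prefix of open (up to 3 letters), followed by a prefix of icebreaker (up to 3 letters), followed by u or um.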
Constructing regexes by hand is acceptable for 1 or 2 tests, but I want to check many word combinations, so I'm looking for a way to generate regexes like the one above.

I want to build an "acronym finder regex script" with the following calling convention:

    acrobuild open:4 icebreaker:3 umbrela:3

As you can see, the args are words, and the number after the delimiter is the maximum number of letters from the beginning of the word that can be used in the acronym.

Now the question: I'm lost on how to build the regex for a given length. I need a hint, an idea or the like. Check the "need here" comment :)

Currently I have this:

    #!/usr/bin/perl
    use 5.012;
    use strict;
    use warnings;

    do_grep( make_regex(@ARGV) );
    exit;

    sub make_regex {
        my(@words) = @_;
        my $regex;
        foreach my $wordnum (@words) {
            $regex .= make_word_regex( split(/:/, $wordnum) );
        }
        $regex = '^' . $regex . '$' if $regex;
        return $regex;
    }

    sub make_word_regex {
        my($word, $num) = @_;
        return "" unless $word;
        $num = length($word) unless defined($num); # defined(), so that word:0 stays legal
        my(@chars) = split(//, substr($word, 0, $num) );

        # regex building: x or xy? or x(y(z?))? etc... :(
        my $re = "";
        foreach my $c (reverse(@chars)) { # reverse, building inside-out
            # how to build the regex here?
            # need here
        }
        return($re);
    }

    sub do_grep {
        my($re) = @_;
        say "$re"; return; #tmp
        my $recomp = qr/$re/i;
        open(my $fdict, "<", "/usr/share/dict/web2") or die("no dict file $!");
        while(<$fdict>) {
            chomp;
            say $_ if m/$recomp/;
        }
        close($fdict);
    }

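
One possible way to fill the "need here" loop is sketched below. It is only an illustration, not necessarily what the final script should look like: `nested_word_regex` is a hypothetical name, it assumes the first letter of every word is mandatory (as in the hand-built example), and it uses non-capturing groups `(?:...)` where the hand-built regex used plain parentheses.

```perl
use strict;
use warnings;

# Build x(?:y(?:z)?)? from the first $num letters of $word.
# Hypothetical helper; assumes the first letter is always required.
sub nested_word_regex {
    my ($word, $num) = @_;
    return "" unless defined $word && length $word;
    $num = length($word) if !defined($num) || $num > length($word);
    my @chars = split //, substr($word, 0, $num);
    return "" unless @chars;            # word:0 contributes nothing

    my $first = shift @chars;           # mandatory first letter
    my $re = "";
    foreach my $c (reverse @chars) {    # innermost (last) letter first
        $re = "(?:" . quotemeta($c) . $re . ")?";
    }
    return quotemeta($first) . $re;
}

# Assemble the full pattern for the example from the question.
my $regex = '^'
    . join('', map { nested_word_regex(split /:/) } qw(open:3 icebreaker:3 umbrela:2))
    . '$';
print "$regex\n";
print "opium matches\n" if "opium" =~ /$regex/i;
```

Each pass through the loop wraps what has been built so far, so the last allowed letter ends up innermost and every later letter is only reachable when the letters before it were taken.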
Rather than the nested regexp o(p(e?)?)?, make a list of alternates: (o|op|ope).
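
    sub make_regex_word {
        my ($word) = @_;
        my ($base, $count) = split(/:/, $word);
        my @chars = split(//, $base);
        # no count given, or count longer than the word: use the whole word
        $count = @chars if !defined($count) || $count > @chars;
        my @re = ();
        for (my $i = 0; $i < $count; $i++) {
            push @re, join("", @chars[0..$i]);   # prefix of length $i+1
        }
        return "(" . join("|", @re) . ")";
    }
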
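
To see the alternation idea end to end, here is a small self-contained sketch. The names `alt_word_regex` and `acronym_regex` and the in-memory `@words` list are assumptions made for illustration; a real run would read a dictionary file instead.

```perl
use strict;
use warnings;

# Turn a spec like "open:3" into "(?:o|op|ope)" (hypothetical helper).
sub alt_word_regex {
    my ($spec) = @_;
    my ($base, $count) = split /:/, $spec;
    $count = length($base) if !defined($count) || $count > length($base);
    my @alts = map { quotemeta(substr($base, 0, $_)) } 1 .. $count;
    return @alts ? "(?:" . join("|", @alts) . ")" : "";
}

# Concatenate the per-word groups and anchor the result.
sub acronym_regex {
    my @specs = @_;
    my $body = join "", map { alt_word_regex($_) } @specs;
    return qr/^$body$/i;
}

my $re    = acronym_regex(qw(open:3 icebreaker:3 umbrela:2));
my @words = qw(opium option oiu umbrella);   # stand-in for a word list
my @found = grep { $_ =~ $re } @words;
print "@found\n";
```

Listing the prefixes explicitly avoids the inside-out construction altogether; the regex engine backtracks over the alternatives, so the order of the prefixes does not change which words match.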