Acronym finder regex building with Perl

I have N defined words; for this question, 3 words, e.g.: open icebreaker umbrela.

I'm wondering whether a possible acronym of them exists as an English word, e.g. I want to run something like:

grep -Pi '^o(p(e?))?i(c(e?))?um?$' my_long_wordlist.txt

In the above regex I decided that I can use:

from the 1st word: o, op, or ope (the first, first two, or first 3 letters)
from the 2nd word: i, ic, or ice (the first, first two, or first 3 letters)
and from the last word: the first or the first 2 letters, u or um

For fun: the above regex returns me the word opium :)
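As a quick sanity check, the hand-written pattern can be tried against a few words directly in Perl. This is only a sketch: the candidate list below is made up for illustration and does not come from any real dictionary file.

```perl
use strict;
use warnings;

# Hand-written pattern from above: o|op|ope, then i|ic|ice, then u|um.
my $re = qr/^o(p(e?))?i(c(e?))?um?$/i;

# Hypothetical sample words, chosen only to exercise the pattern.
my @candidates = qw(opium Opium opeicu oiu open icebreaker opiumm);

# Keep the words that are fully covered by the pattern.
my @hits = grep { $_ =~ $re } @candidates;
print "$_\n" for @hits;
```

The anchors matter here: without `^` and `$`, longer words that merely contain an acronym would also be reported.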

Constructing regexes by hand is acceptable for 1 or 2 tests, but I want to check many word combinations, so I'm looking for a way to generate regexes like the one above.

I want to build an "acronym finder regex script" with the following calling convention:

acrobuild open:4 icebreaker:3 umbrela:3

As you can see, the args are words, and the number after the delimiter is the maximum number of letters from the beginning of the word that can be used in the acronym.

Now the question: I'm lost on how to build the regex for a given length. I need a hint, an idea or the like. Check the "need here" comment :)

Currently I have this:

#!/usr/bin/perl
use 5.012;
use strict;
use warnings;

do_grep( make_regex(@ARGV) );
exit;

sub make_regex {
    my(@words) = @_;
    my $regex;
    foreach my $wordnum (@words) {
        $regex .= make_word_regex( split(/:/, $wordnum) );
    }
    $regex = '^' . $regex . '$' if $regex;
    return $regex;
}

sub make_word_regex {
    my($word, $num) = @_;

    return "" unless $word;
    $num = length($word) unless defined($num);  # keeps word:0 legal

    my(@chars) = split(//, substr($word,0,$num) );

    # regex building: x  or  xy?  or  x(y(z?))?  etc... :(
    my $re = "";
    foreach my $c (reverse(@chars)) {   # reverse, building inside-out
        # how to build the regex here?
        # need here
    }
    return($re);
}

sub do_grep {
    my($re) = @_;
    say "$re"; return; #tmp
    my $recomp = qr/$re/i;

    open(my $fdict, "<", "/usr/share/dict/web2") or die("no dict file $!");
    while(<$fdict>) {
        chomp;
        say $_ if m/$recomp/;
    }
    close($fdict);
}
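One possible way to fill in the loop marked "need here" is to wrap the pattern built so far in an optional group each time a letter is prepended. The following is a minimal sketch, not a definitive implementation. It assumes non-capturing groups `(?:...)` are acceptable in place of plain parentheses, it clamps the count to the word length, and it escapes each letter with `quotemeta` in case a word contains regex metacharacters.

```perl
use strict;
use warnings;

# Build x, x(?:y)?, x(?:y(?:z)?)? ... from the first $num letters of $word.
sub make_word_regex {
    my ($word, $num) = @_;
    return "" unless defined($word) && length($word);
    $num = length($word) if !defined($num) || $num > length($word);
    return "" if $num < 1;

    my @chars = split //, substr($word, 0, $num);
    my $first = shift @chars;          # the first letter is mandatory

    my $re = "";
    for my $c (reverse @chars) {       # innermost (last) letter first
        $re = "(?:" . quotemeta($c) . $re . ")?";
    }
    return quotemeta($first) . $re;
}

print make_word_regex("open", 3), "\n";
```

Each pass makes the current letter plus everything after it optional as a unit, so the third letter can only appear when the second one does.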

Rather than a nested regexp like o(p(e?)?), make a list of alternatives: (o|op|ope).

sub make_regex_word {
    my ($word) = @_;
    my ($base, $count) = split(/:/, $word);
    my @chars = split(//, $base);
    my @re = ();
    # collect every prefix of length 1 .. $count
    for (my $i = 0; $i < $count; $i++) {
        push @re, join("", @chars[0..$i]);
    }
    return "(" . join("|", @re) . ")";
}
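To show how the alternation pieces could be joined into one anchored pattern, here is a small self-contained sketch. The names `word_alternation` and `acronym_regex` are hypothetical helpers introduced only for this illustration. The sketch assumes well-formed `word:count` arguments, clamps the count to the word length, and uses non-capturing groups.

```perl
use strict;
use warnings;

# Alternation for one "word:count" argument, e.g. "open:3" -> (?:o|op|ope).
sub word_alternation {
    my ($arg) = @_;
    my ($base, $count) = split /:/, $arg;
    $count = length($base) if !defined($count) || $count > length($base);
    my @prefixes = map { quotemeta(substr($base, 0, $_)) } 1 .. $count;
    return "(?:" . join("|", @prefixes) . ")";
}

# Concatenate the pieces and anchor them so the whole word must match.
sub acronym_regex {
    my @args = @_;
    my $body = join "", map { word_alternation($_) } @args;
    return qr/\A$body\z/i;
}

my $re = acronym_regex(qw(open:3 icebreaker:3 umbrela:2));
print "opium matches\n" if "opium" =~ $re;
```

Because the pattern is anchored at both ends, the order of the alternatives does not change which words match: the regex engine backtracks through the prefixes until the whole word fits or every combination fails.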
