java - Search for a specific term in a Lucene index
I'm trying to conduct a search on a Lucene index for specific words that I know are indexed, but the results are not good.
How do I perform a query for a specific term ("129202")? I've tried adding a plus sign at the beginning of the string, but that did not work.
My query:

    QueryParser q = new QueryParser(Version.LUCENE_42, "tags", new SimpleAnalyzer(Version.LUCENE_42));
    Query query = q.parse("sapatilha feminina ramarim 129202 cinza");

Below is the document (XML) that is indexed and that I want to get:
<?xml version="1.0" encoding="UTF-8"?>
<product>
  <tags>
    <tag>sapatilha pedras preto</tag>
    <tag>ramarin</tag>
    <tag>ramarin 129202</tag>
    <tag>preto</tag>
  </tags>
  <id>71</id>
  <url>http://www.dafiti.com.br/sapatilha-pedras-preto-1135428.html</url>
</product>
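SimpleAnalyzer, the analyzer you are using for the query (and, I assume, for the index), uses LetterTokenizer, which, according to the documentation:

    ...defines tokens as maximal strings of adjacent letters, as defined by java.lang.Character.isLetter()

Which is to say, not numbers. Numbers are lost entirely by this analyzer. I recommend a different one, such as StandardAnalyzer or WhitespaceAnalyzer. If the index was also built with SimpleAnalyzer, it will need to be rebuilt with the new analyzer, because "129202" was never stored as a term.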
To demonstrate:

    StringReader reader = new StringReader("ramarim 129202 cinza");
    LetterTokenizer stream = new LetterTokenizer(Version.LUCENE_42, reader);
    stream.setReader(reader);
    stream.reset();
    // Print every token the tokenizer emits, with its attributes
    while (stream.incrementToken()) {
        System.out.println(stream.reflectAsString(false));
    }
    stream.close();

Outputs (the offsets match positions in the full query string, "sapatilha feminina ramarim 129202 cinza"):

    term=ramarim,bytes=[72 61 6d 61 72 69 6d],startOffset=19,endOffset=26
    term=cinza,bytes=[63 69 6e 7a 61],startOffset=34,endOffset=39

Substituting in StandardTokenizer (which is used by StandardAnalyzer) gives you:

    term=ramarim,bytes=[72 61 6d 61 72 69 6d],startOffset=19,endOffset=26,positionIncrement=1,type=<ALPHANUM>
    term=129202,bytes=[31 32 39 32 30 32],startOffset=27,endOffset=33,positionIncrement=1,type=<NUM>
    term=cinza,bytes=[63 69 6e 7a 61],startOffset=34,endOffset=39,positionIncrement=1,type=<ALPHANUM>
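If you want to see the effect without a Lucene dependency, the rule quoted from the documentation (maximal runs of characters accepted by Character.isLetter()) can be imitated in plain Java. This is only a rough sketch: the class LetterRunDemo and its two methods are hypothetical names made up for illustration, not Lucene API, and it ignores the lowercasing and supplementary-character handling that the real analyzers do. The second method splits on whitespace only, which is roughly what a whitespace-based analyzer would keep.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only imitates the documented tokenizing rule.
final class LetterRunDemo {

    // Collects maximal runs of characters for which Character.isLetter() is true.
    // Digits, spaces and punctuation end the current run and are discarded.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Splits on whitespace only, so purely numeric tokens survive.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "sapatilha feminina ramarim 129202 cinza";
        System.out.println(letterTokens(query));     // [sapatilha, feminina, ramarim, cinza]
        System.out.println(whitespaceTokens(query)); // [sapatilha, feminina, ramarim, 129202, cinza]
    }
}
```

With the letter-only rule, "129202" never becomes a token, so neither the index nor the parsed query contains it, and a plus sign in front of it has nothing to act on.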