Postgresql 9.x: Index to optimize `xpath_exists` (XMLEXISTS) queries -
we have queries of form
select sum(acol) xpath_exists('/root/keyvalue[key="val"]/value//text()', xmlcol)
what index can built speed clause ?
a btree index created using
create index idx_01 using btree(xpath_exists('/root/keyvalue[key="val"]/value//text()', xmlcol))
does not seem used @ all.
edit
setting enable_seqscan
off
, query using xpath_exists
faster (one order of magnitude) , shows using corresponding index (the btree index built xpath_exists
).
any clue why postgresql not using index , attempt slower sequential scan ?
since not want disable sequential scanning globally, square 1 , happily welcoming suggestions.
edit 2 - explain plans
see below - cost of first plan (seqscan off) higher processing time much faster
b2box=# set enable_seqscan=off; set b2box=# explain analyze select count(*) b2head.item cluster = 'b2box' , ( ( xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()', content) ) ) offset 0 limit 1; query plan ----------------------------------------------------------------------------------------------------------------------------------------------------------------- limit (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1) -> aggregate (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1) -> bitmap heap scan on item (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1) filter: (xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()'::text, content, '{}'::text[]) , ((cluster)::text = 'b2box'::text)) -> bitmap index scan on item_counter_01 (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1) index cond: (xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()'::text, content, '{}'::text[]) = true) total runtime: 606.136 ms (7 rows)
b2box=# set enable_seqscan=on; set b2box=# explain analyze select count(*) b2head.item cluster = 'b2box' , ( ( xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()', content) ) ) offset 0 limit 1; query plan ----------------------------------------------------------------------------------------------------------------------------------------------------------------- limit (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1) -> aggregate (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1) -> seq scan on item (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1) filter: (xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()'::text, content, '{}'::text[]) , ((cluster)::text = 'b2box'::text)) rows removed filter: 108945 total runtime: 10864.242 ms (6 rows)
planner cost parameters
cost of first plan (seqscan off) higher processing time faster
this tells me random_page_cost
, seq_page_cost
wrong. you're on storage fast random i/o - either because of database cached in ram or because you're using ssd, san cache, or other storage random i/o inherently fast.
try:
set random_page_cost = 1; set seq_page_cost = 1.1;
to reduce cost param differences , re-run. if job consider changing params in postgresql.conf.
.
your row-count estimates reasonable, doesn't planner mis-estimation problem or problem bad table statistics.
incorrect query
your query incorrect. offset 0 limit 1
without order by
produce unpredictable results unless you're guaranteed have 1 match, in case offset ... limit ...
clauses unnecessary , can removed entirely.
you're better off phrasing such queries select max(...)
or select min(...)
possible; postgresql tend able use index pluck off desired value without doing expensive table scan or index scan , sort.
tips
btw, future questions postgresql wiki has information in performance category , guide asking slow query questions.
Comments
Post a Comment