Postgresql 9.x: Index to optimize `xpath_exists` (XMLEXISTS) queries -

we have queries of form

select sum(acol) xpath_exists('/root/keyvalue[key="val"]/value//text()', xmlcol)

what index can built speed clause ?

a btree index created using

create index idx_01 using btree(xpath_exists('/root/keyvalue[key="val"]/value//text()', xmlcol))

does not seem used @ all.

edit

setting enable_seqscan off, query using xpath_exists faster (one order of magnitude) , shows using corresponding index (the btree index built xpath_exists).

any clue why postgresql not using index , attempt slower sequential scan ?

since not want disable sequential scanning globally, square 1 , happily welcoming suggestions.

edit 2 - explain plans

see below - cost of first plan (seqscan off) higher processing time much faster

b2box=# set enable_seqscan=off; set b2box=# explain analyze select count(*)  b2head.item cluster = 'b2box' , (  ( xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()', content) )  )  offset 0 limit 1;                                                                            query plan -----------------------------------------------------------------------------------------------------------------------------------------------------------------  limit  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.042..606.042 rows=1 loops=1)    ->  aggregate  (cost=22766.63..22766.64 rows=1 width=0) (actual time=606.039..606.039 rows=1 loops=1)          ->  bitmap heap scan on item  (cost=1058.65..22701.38 rows=26102 width=0) (actual time=3.290..603.823 rows=4085 loops=1)                filter: (xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()'::text, content, '{}'::text[]) , ((cluster)::text = 'b2box'::text))                ->  bitmap index scan on item_counter_01  (cost=0.00..1052.13 rows=56515 width=0) (actual time=2.283..2.283 rows=4085 loops=1)                      index cond: (xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()'::text, content, '{}'::text[]) = true)  total runtime: 606.136 ms (7 rows)

plan on explain.depesz.com

b2box=# set enable_seqscan=on; set b2box=# explain analyze select count(*)  b2head.item cluster = 'b2box' , (  ( xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()', content) )  )  offset 0 limit 1;                                                                            query plan -----------------------------------------------------------------------------------------------------------------------------------------------------------------  limit  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.163..10864.163 rows=1 loops=1)    ->  aggregate  (cost=22555.71..22555.72 rows=1 width=0) (actual time=10864.160..10864.160 rows=1 loops=1)          ->  seq scan on item  (cost=0.00..22490.45 rows=26102 width=0) (actual time=33.574..10861.672 rows=4085 loops=1)                filter: (xpath_exists('/messageinfo[finalrecipient="abigbank"]//text()'::text, content, '{}'::text[]) , ((cluster)::text = 'b2box'::text))                rows removed filter: 108945  total runtime: 10864.242 ms (6 rows)

plan on explain.depesz.com

planner cost parameters

cost of first plan (seqscan off) higher processing time faster

this tells me random_page_cost , seq_page_cost wrong. you're on storage fast random i/o - either because of database cached in ram or because you're using ssd, san cache, or other storage random i/o inherently fast.

try:

set random_page_cost = 1; set seq_page_cost = 1.1;

to reduce cost param differences , re-run. if job consider changing params in postgresql.conf..

your row-count estimates reasonable, doesn't planner mis-estimation problem or problem bad table statistics.

incorrect query

your query incorrect. offset 0 limit 1 without order by produce unpredictable results unless you're guaranteed have 1 match, in case offset ... limit ... clauses unnecessary , can removed entirely.

you're better off phrasing such queries select max(...) or select min(...) possible; postgresql tend able use index pluck off desired value without doing expensive table scan or index scan , sort.

tips

btw, future questions postgresql wiki has information in performance category , guide asking slow query questions.

Search This Blog

Babette