sql - PostgreSQL partition query by date optimization -


We have a table that has approximately 1 billion records per month. Considering 18 months of history, we are talking about 18 billion records.

This table is partitioned weekly by date (so we have around 74 partitions).
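For readers who want to picture the layout, here is a minimal sketch of one weekly partition. It assumes inheritance-based partitioning, which is what the parent-plus-children scans in the plan further down suggest. The column types, the date range, and the index are assumptions for illustration only; just the table and column names come from the question.

```sql
-- Hypothetical column types; only the names appear in the question.
create table unit_position (
    unit_id  integer   not null,
    code     integer,
    obs_time timestamp not null
);

-- One child table per week. The CHECK constraint is what allows the planner
-- to skip this partition when a query's obs_time range cannot match it.
-- The date range here is illustrative, not taken from the question.
create table unit_position_week450 (
    check (obs_time >= timestamp '2013-08-19'
       and obs_time <  timestamp '2013-08-26')
) inherits (unit_position);

-- Assumed index: lets "last N rows of one unit" be read in order
-- within a single partition instead of sorting the whole partition.
create index unit_position_week450_unit_obs_idx
    on unit_position_week450 (unit_id, obs_time desc);
```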

For one of our queries we need the last 1000 records of one given unit. Something like this:

  select code, obs_time from unit_position where unit_id = 1 order by obs_time desc limit 1000;

The problem is that we get the following result in EXPLAIN:

  Limit  (cost=96181.06..96181.09 rows=10 width=12)
    ->  Sort  (cost=96181.06..102157.96 rows=2390760 width=12)
          Sort Key: unit_position.obs_time
          ->  Result  (cost=0.00..44517.60 rows=2390760 width=12)
                ->  Append  (cost=0.00..44517.60 rows=2390760 width=12)
                      ->  Seq Scan on unit_position  (cost=0.00..42336.00 rows=2273600 width=12)
                      ->  Seq Scan on unit_position_week350 unit_position  (cost=0.00..21.60 rows=1160 width=12)
                      ->  ... (all other partitions) ...
                      ->  Seq Scan on unit_position_week450 unit_position  (cost=0.00..21.60 rows=1160 width=12)

On the other hand, if we query like this (limiting the query to the first interval that can contain 1000 records), we get results more than 2x faster:

  select fake, obs_time from unit_position where unit_id = 1 and obs_time >= now() - '7 weeks'::interval order by obs_time desc limit 1000;

The question is: considering that we are ordering by obs_time, is there a way to make the query use the partitions and search only the first n partitions needed?

In most cases the results are in the most recent 4 partitions (so it would search only those 4 partitions), and only in a few cases would it have to search all partitions.

If, after going through n partitions (in order), it finds 1000 results, it should not consider the rest of the partitions (billions of records discarded). The tests/EXPLAIN show that PostgreSQL is not doing this. It is actually going through all partitions (if the query doesn't state a condition limiting it to the constraints of the partitions). Is there a way to force this? (For example, in Oracle one can give suggestions to the DB engine on how to perform queries, though I don't know if that applies to partitions.)
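One detail that may help when reading the plans: with inheritance-based partitioning, constraint exclusion happens at plan time and only works when the WHERE clause compares against constants or externally supplied parameters. A bound computed from a non-immutable function such as now() cannot be used to skip partitions at plan time. The sketch below shows how one might check this, assuming the partitions carry CHECK constraints on obs_time; the literal timestamp is a made-up value for illustration.

```sql
-- Check the current setting; 'partition' is the usual default.
show constraint_exclusion;
set constraint_exclusion = partition;

-- With a literal bound, partitions whose CHECK constraint excludes the
-- range should no longer appear under the Append node of the plan.
-- The timestamp is a hypothetical value.
explain
select code, obs_time
from unit_position
where unit_id = 1
  and obs_time >= timestamp '2013-07-01'
order by obs_time desc
limit 1000;
```

Comparing this plan with the one for the now()-based query shows whether the planner is actually excluding partitions or only filtering rows inside each of them.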

The overhead of querying each one of the partitions manually (giving intervals) gives us the worst results (and when doing this we are effectively working without partitions, so it would be better to have different tables).

any other suggestions?

This function will dynamically query one week at a time, up to the limit, taking advantage of the partitioning.

    create or replace function unit_position_limited_by(l integer)
    returns setof unit_position
    language plpgsql as $function$
    declare
        -- start of the current week; moved back one week per iteration
        week timestamp := date_trunc('week', transaction_timestamp());
        total integer := 0;      -- rows returned so far
        inserted integer;        -- rows returned by the last weekly query
        not_exists boolean;      -- true when no older rows remain
    begin
        loop
            -- dynamic SQL, so each execution is planned with concrete bounds
            return query execute $$
                select *
                from unit_position
                where
                    unit_id = 1
                    and obs_time >= $1 and obs_time < $2
                order by obs_time desc
                limit $3
            $$ using week, week + interval '1 week', l - total;
            get diagnostics inserted := row_count;
            total := total + inserted;
            exit when total = l;

            -- an empty week: stop if there is nothing older in the table
            if inserted = 0 then
                execute $$
                    select not exists (
                        select 1
                        from unit_position
                        where obs_time < $1
                        )
                $$ into not_exists using week;
                exit when not_exists;
            end if;
            week := week - interval '1 week';
        end loop;
    end; $function$;

To select from it:

  select * from unit_position_limited_by(1000);
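The function emits rows newest week first, and newest first within each week, so the result already arrives in descending obs_time order. SQL does not guarantee that order to an outer query, though, so a caller who depends on it may prefer to state it. A small sketch, reusing the columns from the question:

```sql
-- Pick only the needed columns and make the ordering explicit.
-- Sorting at most 1000 already-fetched rows is cheap.
select code, obs_time
from unit_position_limited_by(1000)
order by obs_time desc;
```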
