c++ - OpenMP share file handler
I've got a loop which I parallelize using OpenMP. In this loop, I read a triangle from a file and perform some operations on this data. These operations are independent from one triangle to another, so I thought this would be easy to parallelize, as long as I kept the actual reading of the file in a critical section.
- The order in which the triangles are read is not important.
- Some triangles are read and discarded pretty quickly; some need more algorithmic work (bbox construction, ...).
- I'm doing binary I/O.
- I'm using a C++ ifstream, *tri_data*.
- I'm testing this on an SSD.
readtriangle calls file.read() and reads 12 floats from the ifstream.
    #pragma omp parallel for shared(tri_data)
    for (int i = 0; i < ntriangles; i++) {
        vec3 v0, v1, v2, normal;
        #pragma omp critical
        {
            readtriangle(tri_data, v0, v1, v2, normal);
        }
        // (working with the triangle here)
    }

Now, the behaviour I'm observing is that with OpenMP enabled, the whole process is slower. I've added some timers to my code to track the time spent in the I/O method and the time spent in the loop itself.
Without OpenMP:

    Total IO in time : 41.836 s.
    Total algorithm time : 15.495 s.

With OpenMP:

    Total IO in time : 48.959 s.
    Total algorithm time : 44.61 s.

My guess is that, since the reading is in a critical section, the threads are just waiting for each other to finish using the file handler, resulting in a longer waiting time.
Any pointers on how to resolve this? My program would really benefit from the possibility to process the read triangles with multiple threads. I've tried toying with thread scheduling and related stuff, but that doesn't seem to help a lot in this instance.
Since I'm working on an out-of-core algorithm, introducing a buffer to hold a multitude of triangles is not an option.
So, the solution I would propose is based on a master/slave strategy, where:
- the master (thread 0) performs the I/O
- the slaves work on the retrieved data
The pseudo-code would read something like the following:
    #include <omp.h>
    #include <vector>
    using std::vector;

    vector<vec3> v0;
    vector<vec3> v1;
    vector<vec3> v2;
    vector<vec3> normal;
    vector<int> tdone;
    int nthreads;
    int triangles_read = 0;

    /* ... */

    #pragma omp parallel shared(tri_data)
    {
        int id = omp_get_thread_num();
        /*
         * Initialize the buffers (done by a single thread).
         * Notice that the size in memory is similar to your example.
         */
        #pragma omp single
        {
            nthreads = omp_get_num_threads();
            v0.resize(nthreads);
            v1.resize(nthreads);
            v2.resize(nthreads);
            normal.resize(nthreads);
            tdone.resize(nthreads, 1);
        }
        if (id == 0) { // Producer thread (assumes nthreads >= 2)
            int next = 1;
            while (triangles_read != ntriangles) {
                #pragma omp flush // Pick up the flags set by the consumers
                if (tdone[next]) { // If the next thread is free
                    readtriangle(tri_data, v0[next], v1[next], v2[next], normal[next]); // Read data and fill the correct buffer
                    triangles_read++;
                    tdone[next] = 0; // Set the flag for thread next to start working
                    #pragma omp flush // Flush (a flush list cannot name array elements)
                }
                next = next % (nthreads - 1) + 1; // Set next
            } // while
        } else { // Consumer threads
            while (true) { // Wait for work
                #pragma omp flush // Pick up the producer's updates
                if (tdone[id] == 0) {
                    /* ... do work here on v0[id], v1[id], v2[id], normal[id] ... */
                    tdone[id] = 1;
                    #pragma omp flush // Flush
                }
                if (tdone[id] == 1 && triangles_read == ntriangles) break; // Work finished
            }
        }
        #pragma omp barrier
    }

I'm not sure if this is still valuable, but it was a nice teaser anyhow!