hadoop - HBase MapReduce split scan for different mappers


I'm struggling to distribute HBase rows properly across several map tasks. The aim is to split the scan by row key and distribute a set of rows to each map job.

As far as I can tell, I am only able to define a scan that gives the mappers one row at a time. That is not what I want: I need the map input set-wise.

So is there any possibility to split up an HBase table, or rather the scan, into n sets of rows, which then become the input for n mappers?

I am not looking for a solution that starts a MapReduce job writing n files, followed by another MapReduce job reading them back as text input to get these sets.

Thanks in advance!

Mappers get one row at a time; that is the way MapReduce works. If you want to relate multiple rows on the map side, you can either do that yourself (e.g. using static variables etc.) or write your logic as a combiner, which is a map-side "reduce" step.
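The "do it yourself" option can be sketched as a small buffer that collects rows until a batch is full. The sketch below is plain Java with no Hadoop or HBase dependency; the class name RowBatcher, the batchSize parameter and the processor callback are all hypothetical. In a real job the buffer would presumably live in a field of a TableMapper subclass, with map() appending to it and cleanup() flushing whatever is left at the end of the split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java stand-in for a mapper; no Hadoop or HBase classes are used here.
// All names in this class are hypothetical and chosen for illustration only.
class RowBatcher {
    private final int batchSize;                     // assumed tuning parameter
    private final Consumer<List<String>> processor;  // assumed per-batch logic
    private List<String> buffer = new ArrayList<>();

    RowBatcher(int batchSize, Consumer<List<String>> processor) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.processor = processor;
    }

    // Plays the role of map(): called once per row, in row-key order.
    void map(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Plays the role of cleanup(): called once after the last row of the split.
    void cleanup() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    // Hands the current batch to the processor and starts a fresh buffer.
    private void flush() {
        processor.accept(buffer);
        buffer = new ArrayList<>();
    }
}
```

The last batch of a split may be smaller than batchSize, and a set of related rows can still be cut in two at a split boundary, so this only covers the map-side part of the problem.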

Note that you'd still need a reducer to handle the edge cases where related keys are handled by different mappers. Since HBase keys are ordered on disk, you'd only hit this at the end/beginning of a split. You can reduce the risk of this happening by pre-splitting.
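One way to pick pre-split points is to place region boundaries only where a new group of related rows starts. The sketch below is a plain-Java illustration under several assumptions: related rows share a fixed-length row-key prefix, the list of prefixes is known up front and sorted, and the job produces one input split per region. The class SplitKeys and its choose method are hypothetical names. The returned byte[][] has the shape HBase expects for split keys when a table is created.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: chooses split keys that fall on group-prefix boundaries.
class SplitKeys {
    // sortedPrefixes: distinct, fixed-length group prefixes in ascending order (assumed input).
    // regions: desired number of regions; at most regions - 1 split keys are returned.
    static byte[][] choose(List<String> sortedPrefixes, int regions) {
        if (regions < 1) {
            throw new IllegalArgumentException("regions must be >= 1");
        }
        List<byte[]> keys = new ArrayList<>();
        int n = sortedPrefixes.size();
        int last = 0; // index of the most recently used prefix; index 0 is never a split key
        for (int i = 1; i < regions; i++) {
            int idx = (int) ((long) i * n / regions);
            // Skip repeats, which occur when there are fewer prefixes than regions.
            if (idx > last) {
                keys.add(sortedPrefixes.get(idx).getBytes(StandardCharsets.UTF_8));
                last = idx;
            }
        }
        return keys.toArray(new byte[0][]);
    }
}
```

Because every boundary coincides with the start of a prefix, no group straddles two regions at creation time. Regions can still split later as the table grows, so the reducer for edge cases remains a sensible safety net.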

