hadoop - HBase MapReduce split scan for different mappers
I'm struggling to distribute HBase rows properly across several map tasks. The aim is to split the scan by row key and distribute a set of rows to each map job.
So far I have only been able to define a scan where the mappers get one row at a time. That is not what I want: I need the map input set-wise.
So is there any possibility to split up an HBase table, or rather the scan, into n sets of rows that are then the input for n mappers?
I am not looking for a solution that starts one MapReduce job writing n files and then another MapReduce job reading them again as text input to get these sets.
Thanks in advance!
Mappers get one row at a time; that's the way MapReduce works. If you want to relate multiple rows on the map side, you can either do that yourself (e.g. using static variables etc.) or write your logic as a combiner, which is a map-side "reduce" step.
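To make the "do it yourself" option concrete, here is a minimal sketch in plain Java, with no Hadoop or HBase classes, so it only models the idea. It relies on one mapper instance seeing all rows of its split, so ordinary instance fields can collect rows between calls. The class `RowSetBuffer`, its methods and the fixed `setSize` are hypothetical names for illustration. In a real job the same state would presumably sit in your mapper class, with the final flush done in `cleanup()`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a mapper that is called once per row but works on sets.
// RowSetBuffer and its method names are hypothetical; they only mirror the
// map()/cleanup() life cycle of a Hadoop mapper.
class RowSetBuffer {
    private final int setSize;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> emitted = new ArrayList<>();

    RowSetBuffer(int setSize) {
        if (setSize < 1) {
            throw new IllegalArgumentException("setSize must be >= 1");
        }
        this.setSize = setSize;
    }

    // Called once per row, like map(): buffer the row, flush when the set is full.
    void map(String rowKey) {
        pending.add(rowKey);
        if (pending.size() == setSize) {
            flush();
        }
    }

    // Called once after the last row of the split, like cleanup():
    // emit whatever is left as a final, possibly smaller, set.
    void cleanup() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        emitted.add(new ArrayList<>(pending)); // copy, then reuse the buffer
        pending.clear();
    }

    List<List<String>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        RowSetBuffer buffer = new RowSetBuffer(2);
        for (String key : new String[] {"row1", "row2", "row3"}) {
            buffer.map(key);
        }
        buffer.cleanup();
        System.out.println(buffer.emitted());
    }
}
```

The sets never cross a split boundary here, which is exactly why rows that belong together but land in different splits still need a reduce step.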
Note that you'd still need a reducer to handle the edge cases where related keys are handled by different mappers. Since keys in HBase are ordered on disk, you'd only get that at the end/beginning of a split. You can reduce the risk of this happening by pre-splitting.
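One possible reading of the pre-splitting advice, sketched in plain Java: pick the split keys so that they fall on group boundaries, so that no group of related rows straddles two splits. This assumes that related rows share a fixed-width ASCII key prefix and that you already have the distinct prefixes in sorted order. Both are assumptions of this sketch, not something from the question. `PrefixAlignedSplits` and `splitKeys` are hypothetical names. The resulting keys would be the ones you supply as split keys when creating the table. By default the table input format gives you roughly one map task per region.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Picks split keys that coincide with group prefixes, so every row of a group
// sorts into the same region. Assumes fixed-width ASCII prefixes given in
// row-key order (an assumption of this sketch).
class PrefixAlignedSplits {

    // Returns at most regions-1 split keys, spread evenly over the groups.
    // Fewer keys come back when there are fewer groups than regions.
    static List<String> splitKeys(List<String> sortedPrefixes, int regions) {
        if (regions < 1) {
            throw new IllegalArgumentException("regions must be >= 1");
        }
        List<String> keys = new ArrayList<>();
        int groups = sortedPrefixes.size();
        for (int i = 1; i < regions; i++) {
            int idx = (int) ((long) i * groups / regions);
            if (idx == 0) {
                continue; // the first group starts the first region anyway
            }
            String candidate = sortedPrefixes.get(idx);
            // skip duplicates that arise when regions > groups
            if (keys.isEmpty() || !keys.get(keys.size() - 1).equals(candidate)) {
                keys.add(candidate);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> prefixes =
                Arrays.asList("u01", "u02", "u03", "u04", "u05", "u06", "u07", "u08");
        System.out.println(splitKeys(prefixes, 4));
    }
}
```

A split key is the first key of the next region, so using a group prefix as the boundary keeps the whole group on one side. If the grouping is not visible in the key like this, the reducer is still the safe way to catch the edge cases.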