hadoop - Pig map reduce job to place values within proper range

I have a list of values in one data source, and a second dataset that contains ranges, each tied to a value.

dataset 1:
3
4
6
20
25
38

dataset 2:
1|3|a
4|10|b
11|20|c
21|30|d
31|31|e
32|38|f
39|40|g

result:
3,a
4,b
6,b
20,c
25,d
38,f

I'd like to create a type of "join" that ties each value in dataset 1 to the character of the range in dataset 2 that contains it.

If either of Donald Miner's suggestions works fast enough, I'd use those. But to make it faster: if dataset 2 has 250k-500k entries, you should be able to fit the entire thing in memory. Therefore you could:

Write a UDF that stores dataset 2 in memory (see getCacheFiles for how to store an HDFS file in the DistributedCache).

Write an EvalFunc that takes a single item of dataset 1, binary searches its location in dataset 2, and returns the answer you want.

answer = FOREACH dataset1 GENERATE MyBinarySearchUDF(number) AS myresult:tuple(originalnumber:int, dataset2id:chararray);
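The binary search inside such a UDF could look roughly like the sketch below. It is plain Java with no Pig dependency, so it can be run on its own. The class name RangeLookup and its methods are hypothetical names chosen for illustration, and the sketch assumes the ranges in dataset 2 do not overlap. In a real EvalFunc, the lines would presumably be read once from the cached copy of dataset 2, and exec() would call lookup() for each input number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: maps a number to the id of the range that contains it. */
public class RangeLookup {
    private final int[] starts;
    private final int[] ends;
    private final String[] ids;

    private RangeLookup(int[] starts, int[] ends, String[] ids) {
        this.starts = starts;
        this.ends = ends;
        this.ids = ids;
    }

    /** Parses "start|end|id" lines. Assumes the ranges do not overlap. */
    public static RangeLookup fromLines(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.trim().split("\\|"));
        }
        // Sort by range start so that binary search is valid.
        rows.sort(Comparator.comparingInt(r -> Integer.parseInt(r[0])));

        int n = rows.size();
        int[] starts = new int[n];
        int[] ends = new int[n];
        String[] ids = new String[n];
        for (int i = 0; i < n; i++) {
            String[] r = rows.get(i);
            starts[i] = Integer.parseInt(r[0]);
            ends[i] = Integer.parseInt(r[1]);
            ids[i] = r[2];
        }
        return new RangeLookup(starts, ends, ids);
    }

    /** Returns the id of the range containing value, or null if no range does. */
    public String lookup(int value) {
        int lo = 0;
        int hi = starts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (value < starts[mid]) {
                hi = mid - 1;          // value lies before this range
            } else if (value > ends[mid]) {
                lo = mid + 1;          // value lies after this range
            } else {
                return ids[mid];       // start <= value <= end
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RangeLookup lookup = RangeLookup.fromLines(Arrays.asList(
                "1|3|a", "4|10|b", "11|20|c", "21|30|d",
                "31|31|e", "32|38|f", "39|40|g"));
        // Prints one "value,id" pair per line for the values of dataset 1.
        for (int v : new int[] {3, 4, 6, 20, 25, 38}) {
            System.out.println(v + "," + lookup.lookup(v));
        }
    }
}
```

Each lookup costs O(log n) comparisons, so even with 500k ranges a single value needs about 19 steps. Values that fall in no range come back as null, which the UDF could pass through or filter out afterwards.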
