mapreduce - How does the JobClient in Hadoop compute InputSplits?
I am trying to get insight into the MapReduce architecture. I am consulting this article: http://answers.oreilly.com/topic/2141-how-mapreduce-works-with-hadoop/. I have some questions regarding the JobClient component of the MapReduce framework. My question is:

How does the JobClient compute the input splits on the data?

According to the material I am consulting, the JobClient computes the input splits on the data located in the input path on HDFS that is specified while running the job. The article says that the JobClient then copies the resources (jars and computed input splits) to HDFS. Here is my question: when the input data is already in HDFS, why does the JobClient copy the computed input splits to HDFS?

Let's assume the JobClient does copy the input splits to HDFS. Now, when the job is submitted to the JobTracker and the JobTracker initializes the job, why does it retrieve the input splits from HDFS?

Apologies if my question is not clear. I am a beginner. :)
No, the JobClient does not copy the input data itself to HDFS when it copies the input splits; the splits are only metadata. You have quoted the answer yourself:

"The JobClient computes the input splits on the data located in the input path on HDFS specified while running the job. The article says the JobClient copies the resources (jars and computed input splits) to HDFS."
the input relies on cluster. client computes on meta information got namenode (block size, data length, block locations). these computed
input splits carry meta information tasks, e.g. of block offset , length compute on.
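To make that concrete, here is a minimal sketch of how splits can be derived from the NameNode's metadata. This is not the actual Hadoop code (FileInputFormat.getSplits() does something similar, with extra handling for min/max split sizes, compressed files and the last partial block); the class and method names below are illustrative.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitSketch {
    // Hypothetical helper: one split per block-sized chunk of the file.
    static List<FileSplit> computeSplits(FileSystem fs, FileStatus file)
            throws IOException {
        List<FileSplit> splits = new ArrayList<>();
        long length = file.getLen();
        long blockSize = file.getBlockSize();
        BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, length);

        long offset = 0;
        while (offset < length) {
            long splitLength = Math.min(blockSize, length - offset);
            // Find the block holding this offset so the scheduler can
            // prefer a node that already stores the data locally.
            String[] hosts = blocks[(int) (offset / blockSize)].getHosts();
            // A FileSplit is just (path, offset, length, hosts) --
            // metadata, not the data itself.
            splits.add(new FileSplit(file.getPath(), offset, splitLength, hosts));
            offset += splitLength;
        }
        return splits;
    }
}

Note how small each split is: a path, two longs and a few host names. That is why copying the computed splits to HDFS is cheap, while the input data never moves.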
Have a look at org.apache.hadoop.mapreduce.lib.input.FileSplit: it contains the file path, the start offset and the length of the chunk a single task operates on as its input. The serializable class you may also want to look at is org.apache.hadoop.mapreduce.split.JobSplit.SplitMetaInfo.
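You can see this metadata from inside a running task: the split a task was assigned is available from its context. A minimal sketch (WordOffsetMapper is just an illustrative name):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WordOffsetMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context) {
        // The split handed to this task: exactly the path/offset/length
        // metadata described above.
        FileSplit split = (FileSplit) context.getInputSplit();
        System.out.println("file   = " + split.getPath());
        System.out.println("start  = " + split.getStart());
        System.out.println("length = " + split.getLength());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key is the byte offset within the file; the record reader only
        // hands this task the records belonging to its own split.
        context.write(value, key);
    }
}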
This metadata is computed for each task that will run, and is copied along with the jars to the node that will execute the task.