Since the data is replicated thrice in HDFS, does it mean that any calculation done on one node will also be replicated on the other two? | Hadoop Questions

No. Although each block is replicated on 3 nodes, a MapReduce computation runs on only one copy of the data. The master node knows exactly which nodes hold each block and schedules the task on one of them. Only if that node stops responding is it assumed to have failed, and the required calculation is then rescheduled on a node holding another replica.
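The scheduling behavior described above can be sketched as a small simulation (a hypothetical model, not Hadoop's real scheduler API): a block has three replica locations, but exactly one healthy node is picked to run the task; the other replicas are used only on failure.

```python
def schedule_map_task(block_locations, healthy_nodes):
    """Pick the first healthy node holding a replica of the block.

    The computation is NOT replicated: exactly one node is chosen.
    Other replicas are consulted only if the preferred node has failed.
    """
    for node in block_locations:          # nodes holding a replica of this block
        if node in healthy_nodes:
            return node                   # run the task here, once
    raise RuntimeError("all replicas unreachable")

# Block replicated on three DataNodes (hypothetical names); node1 has failed.
locations = ["node1", "node2", "node3"]
print(schedule_map_task(locations, healthy_nodes={"node2", "node3"}))  # node2
```

With all three nodes healthy, the task would run on `node1` alone; the failure of `node1` merely shifts the single computation to `node2`, it never triples it.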

3 comments:

  1. I think the answer is wrong. Correct me if I am wrong.

  2. Yes, it is wrong; the data is replicated on all three nodes, no matter whether you are doing the calculation on one node or another.

  3. @nitish: it is not like that; replication is for fault tolerance. The most important thing to understand here is that the HDFS file system follows the WORM (Write Once, Read Many) rule, which means that once a file is written to HDFS it cannot be edited. When you process a file in HDFS you never change the original data; you only extract meaning from it.

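The WORM rule from the last comment can be sketched with a minimal in-memory model (a hypothetical illustration, not the real HDFS client API): a path can be written exactly once, read any number of times, and a second write to the same path is rejected.

```python
class WormStore:
    """Hypothetical write-once-read-many store, modeling HDFS's WORM rule."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        if path in self._files:
            # HDFS rejects overwriting an existing file: write once only.
            raise IOError(f"{path} already exists: files are write-once")
        self._files[path] = data

    def read(self, path):
        # Reads never modify the stored data, so they can happen many times.
        return self._files[path]

store = WormStore()
store.write("/user/demo/data.txt", b"hello")
print(store.read("/user/demo/data.txt"))  # read many times, data unchanged
```

Processing a file in this model means calling `read` and computing over the result, which leaves the original bytes untouched, just as a MapReduce job leaves the original HDFS blocks untouched.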