What is distributed copy (distcp)? | Hadoop admin questions

DistCp (distributed copy) is a Hadoop utility that launches MapReduce jobs to copy data. Its primary use is copying large amounts of data within a cluster or between clusters.
One of the major challenges in a Hadoop environment is copying data across multiple clusters. DistCp splits the copy work among map tasks, so multiple datanodes can be leveraged to copy the data in parallel.
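
As a rough illustration of how such a copy is usually started, here is a minimal Python sketch that only assembles a `hadoop distcp` command line. The helper name `build_distcp_command` and the NameNode addresses (`nn1`, `nn2`, port 8020) are assumptions made up for this example. The `-m` option (maximum number of simultaneous map tasks) and the `-update` option (copy only files that are missing or differ on the target) are standard DistCp options.

```python
import shlex


def build_distcp_command(source_uri, target_uri, max_maps=None, update=False):
    """Assemble the argument list for a `hadoop distcp` invocation.

    Hypothetical helper for illustration; it builds the command but does not run it.
    """
    command = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ on the target.
        command.append("-update")
    if max_maps is not None:
        # Upper bound on the number of map tasks copying in parallel.
        command.extend(["-m", str(max_maps)])
    command.extend([source_uri, target_uri])
    return command


if __name__ == "__main__":
    # The cluster addresses below are placeholders, not real hosts.
    cmd = build_distcp_command(
        "hdfs://nn1:8020/data/logs",
        "hdfs://nn2:8020/data/logs",
        max_maps=20,
    )
    print(shlex.join(cmd))
```

On a machine with the Hadoop client installed, the resulting list could be passed to `subprocess.run(cmd, check=True)` to start the copy. Raising the `-m` value lets more map tasks copy at once, at the cost of more load on both clusters.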
