Why do we use HDFS for applications with large data sets and not when there are a lot of small files? | Hadoop Questions | Hadoop (Big Data) Interview Questions and Answers


HDFS is better suited to a large amount of data stored in a few large files than to the same amount of data spread across many small files. The NameNode keeps the metadata for every file, directory and block in memory, and that memory is a limited and expensive resource, so it is not prudent to fill it with the metadata generated by a large number of small files. When the same data is stored in a few large files, there are far fewer file and block entries to track, so the NameNode uses less memory. Hence, for optimized performance, HDFS favors large data sets held in large files over many small files.
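To make the metadata argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not an HDFS API. The figure of about 150 bytes per namespace object is a commonly quoted rule of thumb, the 128 MiB block size is an assumed setting, and the function name `namenode_bytes` is made up for this illustration.

```python
# Rough estimate of NameNode memory used by file and block metadata.
# Both constants are assumptions for illustration, not measured values.
BYTES_PER_OBJECT = 150           # assumed rule of thumb per file or block object
BLOCK_SIZE = 128 * 1024 ** 2     # assumed HDFS block size (128 MiB)


def namenode_bytes(num_files, file_size):
    """Estimate metadata bytes: one object per file plus one per block."""
    blocks_per_file = -(-file_size // BLOCK_SIZE)  # integer ceiling division
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


TOTAL = 1024 ** 4                # 1 TiB of data in both scenarios
SMALL_FILE = 1024 ** 2           # 1 MiB per small file

one_large_file = namenode_bytes(1, TOTAL)
many_small_files = namenode_bytes(TOTAL // SMALL_FILE, SMALL_FILE)

print("one 1 TiB file:  ", one_large_file, "bytes of metadata")
print("many 1 MiB files:", many_small_files, "bytes of metadata")
```

Under these assumptions, the single large file needs about 1.2 MB of NameNode memory, while the same data split into 1 MiB files needs about 315 MB, roughly 250 times more. The exact numbers depend on the Hadoop version and configuration, but the ratio shows why many small files put pressure on the NameNode.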

