Top 70 Hadoop Multiple Choice Questions and Answers for Freshers and Experienced Professionals

1. What is a SequenceFile?
2. Is there a map input format?
3. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
4. Which of the following best describes the workings of TextInputFormat?
5. Which of the following statements most accurately describes the relationship between MapReduce and Pig?
6. You need to import a portion of a relational database every day as files to HDFS, and generate Java classes to interact with your imported data. Which of the following tools should you use to accomplish this?
7. You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?
8. Workflows expressed in Oozie can contain:
9. You need a distributed, scalable data store that allows you random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?
10. Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
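Question 10 refers to Hadoop Streaming, which lets any executable or script act as the mapper and/or reducer, reading records on stdin and writing them to stdout. The logic of a streaming word-count pair can be sketched in plain Python (illustrative only; a real job would pass the two scripts to the streaming jar via its -mapper and -reducer options, and Hadoop would perform the sort/shuffle between them):

```python
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per token, as a
    streaming word-count mapper script would write to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_records):
    """Sum the counts for each word; Hadoop hands the reducer its
    records already sorted by key, which groupby relies on here."""
    keyed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Hadoop would sort/shuffle between the two scripts; sorted() stands in here.
counts = list(reducer(sorted(mapper(["the quick fox", "the fox"]))))
# counts == ["fox\t2", "quick\t1", "the\t2"]
```

Because the contract is just line-oriented stdin/stdout, the same two scripts could equally be written in Perl, Ruby, or shell, which is the point of the question.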
11. You are running a Hadoop cluster with all monitoring facilities properly configured. Which scenario will go undetected?
12. Which of the following scenarios makes HDFS unavailable?
13. Which MapReduce stage serves as a barrier, where all previous stages must be completed before it may proceed?
14. Which of the following statements most accurately describes the general approach to error recovery when using MapReduce?
15. The Combine stage, if present, must perform the same aggregation operation as Reduce.
16. What is the implementation language of the Hadoop MapReduce framework?
17. Which of the following MapReduce execution frameworks focuses on execution in shared-memory environments?
18. How can a distributed filesystem such as HDFS provide opportunities for optimization of a MapReduce operation?
19. What is the input to the Reduce function?
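The expected answer to question 19 is: a key together with an iterator over every intermediate value emitted for that key. The grouping that produces this input can be sketched conceptually (a toy model of the shuffle, not Hadoop's actual implementation):

```python
from collections import defaultdict

def shuffle(mapper_output):
    """Group intermediate (key, value) pairs so each reduce call
    receives one key and the list of all values emitted for it."""
    groups = defaultdict(list)
    for key, value in mapper_output:
        groups[key].append(value)
    # Reducers see keys in sorted order.
    return sorted(groups.items())

pairs = shuffle([("a", 1), ("b", 2), ("a", 3)])
# pairs == [("a", [1, 3]), ("b", [2])]
```

Each `(key, [values])` tuple corresponds to one invocation of the reduce function.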
20. Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution?
21. The size of block in HDFS is
22. The switch given to the “hadoop fs” command for detailed help is
23. RPC means
24. Which method of the FileSystem object is used for reading a file in HDFS
25. How many states does the Writable interface define?
26. What are the supported programming languages for MapReduce?
28. What are sequence files and why are they important?
29. What are map files and why are they important?
30. How can you use binary data in MapReduce?
31. What is a map-side join?
32. What is a reduce-side join?
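A reduce-side join (question 32) tags each record in the map phase with its source table, so that after the shuffle each reduce call sees all records sharing a key and can pair them up. A conceptual Python sketch (the table and field names are hypothetical, and a plain dict stands in for the shuffle):

```python
from collections import defaultdict

def reduce_side_join(users, orders):
    """Sketch of a reduce-side join on user id: the 'map' step tags each
    record with its source ('U' or 'O'), the 'reduce' step pairs records
    that share a key."""
    tagged = [(uid, ("U", name)) for uid, name in users]
    tagged += [(uid, ("O", item)) for uid, item in orders]
    groups = defaultdict(list)
    for key, value in tagged:               # shuffle: group by key
        groups[key].append(value)
    joined = []
    for uid, values in sorted(groups.items()):   # one "reduce" per key
        names = [v for tag, v in values if tag == "U"]
        items = [v for tag, v in values if tag == "O"]
        joined += [(uid, n, i) for n in names for i in items]
    return joined

rows = reduce_side_join(users=[(1, "ann")], orders=[(1, "book"), (1, "pen")])
# rows == [(1, "ann", "book"), (1, "ann", "pen")]
```

This also hints at the answer to question 40: the map-side join avoids this full shuffle of both datasets, which is why it is generally faster when its preconditions (sorted, equally partitioned inputs, or one small dataset) are met.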
34. What is PIG?
35. How can you disable the reduce step?
36. Why would a developer create a MapReduce job without the reduce step?
37. What is the default input format?
38. How can you override the default input format?
39. What are the common problems with map-side join?
40. Which is faster: Map-side join or Reduce-side join? Why?
41. Will settings made using the Java API override values in configuration files?
42. What is AVRO?
43. Can you run MapReduce jobs directly on Avro data?
44. What is distributed cache?
45. What is the best performance one can expect from a Hadoop cluster?
46. What is writable?
47. The Hadoop API uses basic Java types such as LongWritable, Text, and IntWritable. They have almost the same features as the default Java classes. What are these writable data types optimized for?
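The usual answer to question 47 is that Writables are optimized for compact, fast binary serialization when data is shuffled across the network and spilled to disk. The idea can be sketched in Python with the struct module standing in for Java's DataOutput (an illustration of fixed-width binary encoding, not a statement of Hadoop's exact wire format):

```python
import struct

def serialize_int(value):
    """Fixed-width big-endian encoding, conceptually like IntWritable.write():
    always 4 bytes, no field names, no type tags."""
    return struct.pack(">i", value)

def deserialize_int(data):
    """Inverse operation, conceptually like IntWritable.readFields()."""
    return struct.unpack(">i", data)[0]
```

Because every value occupies the same few bytes with no per-record metadata, Writables are far cheaper to serialize, compare, and shuffle than general-purpose Java object serialization.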
48. Can a custom data type be implemented for MapReduce processing?
49. What happens if mapper output does not match reducer input?
50. Can you provide multiple input paths to a MapReduce job?
51. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
52. Which process describes the lifecycle of a Mapper?
53. Which of the following best describes when the reduce method is first called in a MapReduce job?
54. You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:
55. To process input key-value pairs, your mapper needs to load a 512 MB data file in memory. What is the best way to accomplish this?
56. In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?
57. You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
58. You want to count the number of occurrences of each unique word in the supplied input data. You’ve decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize this by specifying a combiner. Will you be able to reuse your existing Reducer as your combiner in this case, and why or why not?
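Question 58 turns on the fact that summation is associative and commutative, so a sum-style reducer can be reused unchanged as a combiner: pre-aggregating on each mapper and then aggregating the partial sums yields the same total. A small Python sketch of why that is safe:

```python
def reduce_counts(counts):
    """Sum the literal 1s (or partial sums) for one key. The same
    function can serve as either the reducer or the combiner."""
    return sum(counts)

# Without a combiner: the reducer sums every literal 1 for one word.
total = reduce_counts([1, 1, 1, 1, 1, 1])

# With a combiner: each of two mappers pre-sums its own three 1s, and the
# reducer then sums the partial sums. Addition is associative and
# commutative, so the result is identical.
combined = reduce_counts([reduce_counts([1, 1, 1]), reduce_counts([1, 1, 1])])
assert total == combined == 6
```

A reducer that computed, say, an average over raw values could not be reused this way, since an average of averages is not the overall average.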
59. Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
60. Which project gives you a distributed, scalable data store that allows you random, real-time read/write access to hundreds of terabytes of data?
62. Given a directory of files with the following structure: line number, tab character, string:
