Big Data Hadoop Practice Test 2

Test your caliber

1. In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?

A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.
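
To see why the isSplitable option matters, here is a minimal sketch in Python of how the number of input splits (and so map tasks) changes when files are not splittable. It is a simplified model, not Hadoop's actual split logic: the function name `count_splits`, the file sizes, and the 128 MB split size are all assumptions made for illustration, and details such as split slop and zero-length files are ignored.

```python
import math

def count_splits(file_sizes_mb, split_size_mb=128, splittable=True):
    """Count input splits (one map task per split) in a simplified model.

    Hypothetical helper for illustration only; real Hadoop split
    computation has extra rules that are ignored here.
    """
    splits = 0
    for size in file_sizes_mb:
        if splittable:
            # A splittable file is cut into roughly split-sized pieces.
            splits += max(1, math.ceil(size / split_size_mb))
        else:
            # A non-splittable file always becomes exactly one split.
            splits += 1
    return splits

files = [300, 64, 1000]  # assumed file sizes in MB
print(count_splits(files))                    # splittable input
print(count_splits(files, splittable=False))  # one split per file
```

With non-splittable input, the split count equals the number of files no matter how many blocks each file spans, which is the behavior the question asks for.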

Free Hadoop Quiz Online Practice Test1

1. What is a SequenceFile?

A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous writable objects.
B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous writable objects.
C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
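
The "same key type, same value type" rule in the last option can be pictured with a toy in-memory model. This is only a sketch in Python and not the Hadoop SequenceFile API: the class name `ToySequenceWriter` and its methods are hypothetical, and nothing is actually encoded to binary.

```python
class ToySequenceWriter:
    """In-memory stand-in for a key-value file writer (not the Hadoop API)."""

    def __init__(self, key_type, value_type):
        # The key and value types are fixed once, when the writer is created.
        self.key_type = key_type
        self.value_type = value_type
        self.records = []

    def append(self, key, value):
        # Reject any record whose key or value type differs from the declared one.
        if type(key) is not self.key_type:
            raise TypeError(f"wrong key type: {type(key).__name__}")
        if type(value) is not self.value_type:
            raise TypeError(f"wrong value type: {type(value).__name__}")
        self.records.append((key, value))

writer = ToySequenceWriter(str, int)
writer.append("apple", 3)
writer.append("pear", 7)
print(len(writer.records))
```

Keys and values may be of different types from each other (here `str` and `int`), but every key shares one type and every value shares one type.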

Hadoop Online Practice Quiz Tests

Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?

A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming

Answer: D
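
As an illustration of "any executable or script as the mapper and/or the reducer", the following is a minimal word-count sketch in Python. The function names `mapper` and `reducer` are assumptions for this example. It relies on the Hadoop Streaming convention that records are passed as lines of text, with key and value separated by a tab, and that the reducer receives its input sorted by key.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum the counts per word; input lines must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Local dry run: sorted() stands in for the framework's shuffle-and-sort step.
mapped = sorted(mapper(["to be or not to be"]))
for out in reducer(mapped):
    print(out)
```

In a real job each function would sit in its own script that reads `sys.stdin` and prints its results, and the two scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options.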

You need a distributed, scalable data store that allows random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?

A. Hue
B. Pig
C. Hive
D. Oozie
E. HBase
F. Flume
G. Sqoop

Answer: E

Workflows expressed in Oozie can contain:

A. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
B. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
C. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
D. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.

Answer: D

You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?

A. Pig
B. Hue
C. Hive
D. Sqoop
E. Oozie
F. Flume
G. Hadoop Streaming

Answer: C