Hortonworks-Certified-Apache-Hadoop-2.0-Developer: Hadoop 2.0 Certification Exam for Pig and Hive Developer
Total 108 questions
Question #1 (Topic: )
Review the following data and Pig code.
M,38,95111
F,29,95060
F,45,95192
M,62,95102
F,56,95102
A = LOAD 'data' USING PigStorage(',') as (gender:chararray,
age:int, zip:chararray);
B = FOREACH A GENERATE age;
Which one of the following commands would save the results of B to a folder in HDFS named
myoutput?
A. STORE A INTO 'myoutput' USING PigStorage(',');
B. DUMP B using PigStorage('myoutput');
C. STORE B INTO 'myoutput';
D. DUMP B INTO 'myoutput';
Answer: C
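To make the content of relation B concrete, here is a minimal plain-Java analogue of the script. It is not Pig and does not touch HDFS. The class name `AgeProjectionSketch` and the method `projectAge` are hypothetical names chosen for this illustration, and the sketch assumes the comma delimiter that the sample data uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AgeProjectionSketch {
    // Mirrors loading comma-delimited records and then projecting
    // only the second field (age), as FOREACH A GENERATE age does.
    static List<Integer> projectAge(List<String> lines) {
        List<Integer> ages = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            ages.add(Integer.parseInt(fields[1].trim()));
        }
        return ages;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
            "M,38,95111", "F,29,95060", "F,45,95192", "M,62,95102", "F,56,95102");
        // One record per line, which is the shape STORE B would write out.
        for (int age : projectAge(data)) {
            System.out.println(age);
        }
    }
}
```

STORE writes a relation to a directory, while DUMP only prints it to the console and takes no INTO clause, which rules out options B and D. Option A stores A (all three fields) rather than B.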
Question #2 (Topic: )
You need to create a job that does frequency analysis on input data. You will do this by
writing a Mapper that uses TextInputFormat and splits each value (a line of text from an
input file) into individual characters. For each one of these characters, you will emit the
character as a key and an IntWritable as the value. As this will produce proportionally
more intermediate data than input data, which two resources should you expect to be
bottlenecks?
A. Processor and network I/O
B. Disk I/O and network I/O
C. Processor and RAM
D. Processor and disk I/O
Answer: B
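Why the intermediate data outgrows the input can be seen with a plain-Java sketch that mimics the map step without using the Hadoop API. The class name `CharFrequencySketch`, the methods `mapLine` and `intermediateBytes`, and the size estimate of 1 byte per key plus 4 bytes per count are illustrative assumptions, not Hadoop code or measured figures.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CharFrequencySketch {
    // Mimics the map step: one (character, 1) pair per input character.
    static List<Map.Entry<Character, Integer>> mapLine(String line) {
        List<Map.Entry<Character, Integer>> out = new ArrayList<>();
        for (char c : line.toCharArray()) {
            out.add(new AbstractMap.SimpleEntry<>(c, 1));
        }
        return out;
    }

    // Rough estimate (assumption): 1 byte per ASCII key + 4 bytes per int value.
    static long intermediateBytes(String line) {
        return mapLine(line).size() * 5L;
    }

    public static void main(String[] args) {
        String line = "hello hadoop";
        System.out.println("input bytes: " + line.length());
        System.out.println("intermediate bytes (estimate): " + intermediateBytes(line));
    }
}
```

Each input character turns into a full key/value record. Map output is written to local disk and then copied across the network to the reducers during the shuffle, so the extra volume lands on disk I/O and network I/O rather than on the processor.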
Question #3 (Topic: )
MapReduce v2 (MRv2/YARN) is designed to address which two issues?
A. Single point of failure in the NameNode.
B. Resource pressure on the JobTracker.
C. HDFS latency.
D. Ability to run frameworks other than MapReduce, such as MPI.
E. Reduce complexity of the MapReduce APIs.
F. Standardize on a single MapReduce API.
Answer: B,D
Question #4 (Topic: )
What are the TWO main components of the YARN ResourceManager process? Choose 2
answers
A. Job Tracker
B. Task Tracker
C. Scheduler
D. Applications Manager
Answer: C,D
Question #5 (Topic: )
Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you
use to complete the line: conf.setInputFormat(____.class); ?
A. SequenceFileAsTextInputFormat
B. SequenceFileInputFormat
C. KeyValueTextInputFormat
D. BDBInputFormat
Answer: C
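Option C refers to Hadoop's KeyValueTextInputFormat, which reads plain text line by line and splits each line at the first separator (a tab by default) into a key and a value. The plain-Java sketch below imitates that split without using the Hadoop API. The class name `KeyValueSplitSketch` and the method `splitAtFirstTab` are hypothetical names for this illustration.

```java
class KeyValueSplitSketch {
    // Splits a line at the first tab: the text before it is the key,
    // the text after it is the value.
    static String[] splitAtFirstTab(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            // No separator: the whole line becomes the key, the value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = splitAtFirstTab("1\tabialkjfjkaoasdfjksdlkjhqweroij");
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

With this format the Mapper receives the line number as the key and the string as the value, one record per line. The two SequenceFile formats expect binary SequenceFile input, not tab-delimited text.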