Developer Hadoop 2.0 Certification exam for Pig and Hive Developer v7.0 (Hortonworks-Certified-Apache-Hadoop-2.0-)

Page:    1 / 8   
Total 116 questions

Which two of the following are true about this trivial Pig program? (Choose two)

  • A. The contents of myfile appear on stdout
  • B. Pig assumes the contents of myfile are comma delimited
  • C. ABC has a schema associated with it
  • D. myfile is read from the user's home directory in HDFS

Answer : A,D

Which best describes what the map method accepts and emits?

  • A. It accepts a single key-value pair as input and emits a single key and list of corresponding values as output.
  • B. It accepts a single key-value pair as input and can emit only one key-value pair as output.
  • C. It accepts a list of key-value pairs as input and can emit only one key-value pair as output.
  • D. It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.

Answer : D

Explanation: public class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> extends Object
Maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks which transform input records into intermediate records.
The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
Reference: org.apache.hadoop.mapreduce
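As an illustration (not part of the exam), the zero-or-more contract of the map method can be sketched as a Hadoop Streaming-style mapper in Python; the function name `map_fn` and the word-count logic are hypothetical:

```python
def map_fn(key, value):
    """Word-count-style mapper: a single input pair may yield zero or many output pairs.

    key: the byte offset of the line (ignored here, as in word count)
    value: one line of input text
    """
    for word in value.split():
        yield (word, 1)  # any number of output pairs, including zero for a blank line

# A blank line emits zero pairs:
assert list(map_fn(0, "")) == []
# A line with repeated words emits one pair per word:
assert list(map_fn(7, "to be or not to be")) == [
    ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1),
]
```

This mirrors why answer D is correct: nothing in the contract forces exactly one output pair per input pair.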

Which one of the following Hive commands uses an HCatalog table named x?

  • A. SELECT * FROM x;
  • B. SELECT x.* FROM org.apache.hcatalog.hive.HCatLoader('x');
  • C. SELECT * FROM org.apache.hcatalog.hive.HCatLoader('x');
  • D. Hive commands cannot reference an HCatalog table

Answer : C

Given the following Hive command:
Which one of the following statements is true?

  • A. The contents of myothertable are appended to mytable
  • B. Any existing data in mytable will be overwritten
  • C. A new table named mytable is created, and the contents of myothertable are copied into mytable
  • D. The statement is not a valid Hive command

Answer : B

Assuming default settings, which best describes the order of data provided to a reducer's reduce() method?

  • A. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
  • B. Both the keys and values passed to a reducer always appear in sorted order.
  • C. Neither keys nor values are in any predictable order.
  • D. The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.

Answer : D

Explanation: Reducer has 3 primary phases:
1. Shuffle
The Reducer copies the sorted output from each Mapper using HTTP across the network.
2. Sort
The framework merge sorts Reducer inputs by keys (since different Mappers may have output the same key).
The shuffle and sort phases occur simultaneously, i.e., while outputs are being fetched they are merged.

Secondary Sort:
To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
3. Reduce
In this phase the reduce(Object, Iterable, Context) method is called for each <key,
(collection of values)> in the sorted inputs.
The output of the reduce task is typically written to a RecordWriter via
TaskInputOutputContext.write(Object, Object).
The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce, Class Reducer
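The sorted-keys/unordered-values behavior can be simulated in Python. This is a simplified sketch of the framework's merge step (the helper name `shuffle_and_sort` is made up; real Hadoop merge-sorts partitioned outputs across mappers):

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(mapper_outputs):
    """Simulate the sort phase: merge (key, value) pairs and sort by KEY only.

    Values for a given key are grouped but left in no guaranteed order,
    matching answer D above.
    """
    merged = sorted(mapper_outputs, key=itemgetter(0))  # stable sort by key; values untouched
    return [(k, [v for _, v in grp]) for k, grp in groupby(merged, key=itemgetter(0))]

pairs = [("b", 2), ("a", 9), ("b", 1), ("a", 3)]
result = shuffle_and_sort(pairs)
# Keys arrive at the reducer in sorted order...
assert [k for k, _ in result] == ["a", "b"]
# ...but the values per key are not sorted (9 precedes 3 here):
assert result == [("a", [9, 3]), ("b", [2, 1])]
```

To get sorted values, you would apply the secondary-sort pattern described above: extend the key with the value and supply a grouping comparator.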

Table metadata in Hive is:

  • A. Stored as metadata on the NameNode.
  • B. Stored along with the data in HDFS.
  • C. Stored in the Metastore.
  • D. Stored in ZooKeeper.

Answer : C

Explanation: By default, Hive uses an embedded Derby database to store metadata. The metastore is the "glue" between Hive and HDFS: it tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, and so on.
The Metastore is an application that runs on an RDBMS and uses an open-source ORM layer called DataNucleus to convert object representations into a relational schema and vice versa. This approach was chosen over storing the information in HDFS because the Metastore needs to be very low latency. The DataNucleus layer allows many different RDBMS technologies to be plugged in.
* By default, Hive stores metadata in an embedded Apache Derby database, and other client/server databases like MySQL can optionally be used.
* Features of Hive include:
Metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query execution.
Reference: Store Hive Metadata into RDBMS

You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface.
Identify which invocation correctly passes a property with a value of Example to the job.

  • A. hadoop “” MyDriver input output
  • B. hadoop MyDriver input output
  • C. hadoop MyDrive –D input output
  • D. hadoop setproperty MyDriver input output
  • E. hadoop setproperty (“”) MyDriver input output

Answer : C

Explanation: Configure the property using the -D key=value notation:
-D'My Job'
You can list the available options by calling the streaming jar with just the -info argument.
Reference: Python hadoop streaming : Setting a job name

In Hadoop 2.0, which one of the following statements is true about a standby NameNode?
The Standby NameNode:

  • A. Communicates directly with the active NameNode to maintain the state of the active NameNode.
  • B. Receives the same block reports as the active NameNode.
  • C. Runs on the same machine and shares the memory of the active NameNode.
  • D. Processes all client requests and block reports from the appropriate DataNodes.

Answer : B

Which one of the following statements describes the relationship between the
ResourceManager and the ApplicationMaster?

  • A. The ApplicationMaster requests resources from the ResourceManager
  • B. The ApplicationMaster starts a single instance of the ResourceManager
  • C. The ResourceManager monitors and restarts any failed Containers of the ApplicationMaster
  • D. The ApplicationMaster starts an instance of the ResourceManager within each Container

Answer : A

Review the following 'data' file and Pig code.

Which one of the following statements is true?

  • A. The output of the DUMP D command is (M,{(M,62,95102),(M,38,95111)})
  • B. The output of the DUMP D command is (M,{(38,95111),(62,95102)})
  • C. The code executes successfully but there is no output because the D relation is empty
  • D. The code does not execute successfully because D is not a valid relation

Answer : A

What does Pig provide to the overall Hadoop solution?

  • A. Legacy language Integration with MapReduce framework
  • B. Simple scripting language for writing MapReduce programs
  • C. Database table and storage management services
  • D. C++ interface to MapReduce and data warehouse infrastructure

Answer : B

In the reducer, the MapReduce API provides you with an iterator over Writable values.
What does calling the next() method return?

  • A. It returns a reference to a different Writable object each time.
  • B. It returns a reference to a Writable object from an object pool.
  • C. It returns a reference to the same Writable object each time, but populated with different data.
  • D. It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object.
  • E. It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.

Answer : C

Explanation: Calling next() will always return the SAME instance of
IntWritable, with the contents of that instance replaced with the next value.
Reference: Manipulating iterators in MapReduce
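The object-reuse pattern can be mimicked in Python to show why you must copy a value you want to keep after the loop. The class and helper below (`IntWritable`, `value_iterator`) are minimal stand-ins, not Hadoop's actual classes:

```python
class IntWritable:
    """Minimal stand-in for Hadoop's IntWritable: a mutable boxed int."""
    def __init__(self, value=0):
        self.value = value

    def set(self, value):
        self.value = value

def value_iterator(raw_values):
    """Like Hadoop's reduce-side iterator: yields the SAME object each time,
    repopulated with the next value."""
    reused = IntWritable()
    for v in raw_values:
        reused.set(v)
        yield reused

seen = list(value_iterator([10, 20, 30]))
# Every element is the same object, and it now holds only the last value:
assert all(w is seen[0] for w in seen)
assert [w.value for w in seen] == [30, 30, 30]
# So copy the payload while iterating if you need it later:
copied = [w.value for w in value_iterator([10, 20, 30])]
assert copied == [10, 20, 30]
```

This is exactly the trap the exam question is probing: storing the Writable references themselves silently loses all but the last value.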

Which one of the following statements is true about a Hive-managed table?

  • A. Records can only be added to the table using the Hive INSERT command.
  • B. When the table is dropped, the underlying folder in HDFS is deleted.
  • C. Hive dynamically defines the schema of the table based on the FROM clause of a SELECT query.
  • D. Hive dynamically defines the schema of the table based on the format of the underlying data.

Answer : B

Which two of the following statements are true about HDFS? (Choose two)

  • A. An HDFS file that is larger than dfs.block.size is split into blocks
  • B. Blocks are replicated to multiple DataNodes
  • C. HDFS works best when storing a large number of relatively small files
  • D. Block sizes for all files must be the same size

Answer : A,B
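The block-splitting arithmetic behind answer A can be sketched in Python. The 128 MB figure is Hadoop 2's default dfs.blocksize (configurable per file, which is why answer D is false), and each block is then replicated dfs.replication times (default 3):

```python
import math

def num_blocks(file_size, block_size=128 * 1024 * 1024):
    """A file larger than the block size is split into ceil(size / block_size) blocks;
    a non-empty file smaller than one block still occupies one (partial) block."""
    return max(1, math.ceil(file_size / block_size))

MB = 1024 * 1024
assert num_blocks(1 * MB) == 1     # small file: one partial block (this is why many
                                   # small files strain the NameNode, making C false)
assert num_blocks(300 * MB) == 3   # 300 MB at 128 MB blocks: 2 full + 1 partial
assert num_blocks(300 * MB, block_size=64 * MB) == 5  # a different per-file block size
```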

Given the following Hive commands:

Which one of the following statements is true?

  • A. The file mydata.txt is copied to a subfolder of /apps/hive/warehouse
  • B. The file mydata.txt is moved to a subfolder of /apps/hive/warehouse
  • C. The file mydata.txt is copied into Hive's underlying relational database
  • D. The file mydata.txt does not move from its current location in HDFS

Answer : A
