Cloudera CCA175 - CCA Spark and Hadoop Developer Exam
Total 96 questions
Question #6
Problem Scenario 65: You have been given the code snippet below.
val a = sc.parallelize(List("dog", "cat", "owl", "gnu", "ant"), 2)
val b = sc.parallelize(1 to a.count.toInt, 2)
val c = a.zip(b)
operation1
Write a correct code snippet for operation1 that produces the desired output shown below.
Array[(String, Int)] = Array((owl,3), (gnu,4), (dog,1), (cat,2), (ant,5))
Answer: See the explanation for Step by Step Solution and configuration.
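For Problem Scenario 65, the desired output is the zipped pairs ordered by key in descending order, so one likely operation1 is c.sortByKey(false).collect. The sketch below is a local, Spark-free Python simulation of that idea, not Spark code; zip_rdds is a hypothetical helper that mimics RDD.zip on plain lists.

```python
# Local, Spark-free simulation of a.zip(b) followed by sortByKey(false).
# zip_rdds is a hypothetical helper, not a Spark API.

def zip_rdds(left, right):
    """Pair elements by position, like RDD.zip. Spark additionally requires
    matching partition counts and matching per-partition sizes."""
    if len(left) != len(right):
        raise ValueError("zip requires inputs of equal length")
    return list(zip(left, right))

a = ["dog", "cat", "owl", "gnu", "ant"]
b = list(range(1, len(a) + 1))  # same values as 1 to a.count.toInt
c = zip_rdds(a, b)               # [("dog", 1), ("cat", 2), ...]

# sortByKey(false): order the (String, Int) pairs by key, descending.
result = sorted(c, key=lambda kv: kv[0], reverse=True)
print(result)
```

Sorting on the key alone (rather than the whole tuple) mirrors sortByKey, which ignores the values.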
Question #7
Problem Scenario 1:
You have been given a MySQL database with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.categories
JDBC URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.
1. Connect to the MySQL database and check the contents of the tables.
2. Copy the "retail_db.categories" table to HDFS without specifying a directory name.
3. Copy the "retail_db.categories" table to HDFS into a directory named "categories_target".
4. Copy the "retail_db.categories" table to HDFS using a warehouse directory named "categories_warehouse".
Answer: See the explanation for Step by Step Solution and configuration.
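For Problem Scenario 1, the four activities map onto Sqoop tools: list-tables and eval to inspect the source, and import with no target option, with --target-dir, or with --warehouse-dir. The sketch below only builds and prints the command lines in Python; actually running them assumes a Sqoop client and network access to the quickstart MySQL host. The helper sqoop_args and the LIMIT 10 preview query are illustrative choices, not part of the original problem.

```python
import shlex

# Connection details taken from the problem statement.
CONNECT = "jdbc:mysql://quickstart:3306/retail_db"
USER = "retail_dba"
PASSWORD = "cloudera"

def sqoop_args(tool, *extra):
    """Hypothetical helper: argv for a Sqoop tool plus the shared connection options."""
    return ["sqoop", tool, "--connect", CONNECT,
            "--username", USER, "--password", PASSWORD, *extra]

commands = {
    # 1. Inspect the source database: list tables, then preview categories.
    "list_tables": sqoop_args("list-tables"),
    "preview": sqoop_args("eval", "--query", "SELECT * FROM categories LIMIT 10"),
    # 2. No directory given: Sqoop writes to a directory named after the
    #    table under the user's HDFS home directory.
    "import_default": sqoop_args("import", "--table", "categories"),
    # 3. Explicit target directory.
    "import_target": sqoop_args("import", "--table", "categories",
                                "--target-dir", "categories_target"),
    # 4. Warehouse directory: data lands in categories_warehouse/categories.
    "import_warehouse": sqoop_args("import", "--table", "categories",
                                   "--warehouse-dir", "categories_warehouse"),
}

for name, argv in commands.items():
    print(f"# {name}")
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
```

The printed lines can be pasted into a shell; the mysql command-line client is an equally valid way to do step 1.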
Question #8
Problem Scenario 63: You have been given the code snippet below.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 that produces the desired output shown below.
Array[(Int, String)] = Array((4,lion), (3,dogcat), (7,panther), (5,tigereagle))
Answer: See the explanation for Step by Step Solution and configuration.
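For Problem Scenario 63, the output concatenates all names of the same length, which suggests a fold by key with the empty string as the zero value, for example b.foldByKey("")(_ + _).collect. The sketch below simulates that locally in Python. It assumes parallelize(..., 2) splits the six names into the first three and the last three, and fold_by_key is a hypothetical helper, not the Spark API.

```python
# Local, Spark-free stand-in for RDD.foldByKey, for illustration only.

def fold_by_key(partitions, zero, func):
    """Fold values per key inside each partition starting from `zero`,
    then merge the per-partition results key by key with the same `func`."""
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = func(acc.get(key, zero), value)
        partials.append(acc)
    merged = {}
    for acc in partials:  # partition results are merged in index order
        for key, value in acc.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged

a = ["dog", "tiger", "lion", "cat", "panther", "eagle"]
# Assumed split for parallelize(..., 2): first three names, then the rest.
a_partitions = [a[:3], a[3:]]
# Equivalent of a.map(x => (x.length, x)), applied partition by partition.
b_partitions = [[(len(x), x) for x in part] for part in a_partitions]

result = fold_by_key(b_partitions, "", lambda x, y: x + y)
print(result)
```

Because "dog" sits in the first partition and "cat" in the second, merging in partition order yields "dogcat"; the order of the keys in Spark's collected array depends on its hash partitioner and is not modeled here.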
Question #9
Problem Scenario 89: You have been given the patient data below in CSV format.
patientID,name,dateOfBirth,lastVisitDate
1001,Ah Teck,1991-12-31,2012-01-20
1002,Kumar,2011-10-29,2012-09-20
1003,Ali,2011-01-30,2012-10-21
Accomplish the following activities.
1. Find all patients whose lastVisitDate is between '2012-09-15' and the current time.
2. Find all patients who were born in 2011.
3. Find the age of every patient.
4. List the patients whose last visit was more than 60 days ago.
5. Select the patients who are 18 years old or younger.
Answer: See the explanation for Step by Step Solution and configuration.
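The five filters in Problem Scenario 89 are plain date arithmetic. The sketch below expresses them in Python with the standard csv and datetime modules rather than Spark SQL (where functions such as current_date, datediff and year would play the same roles). It assumes the range in activity 1 is inclusive at both ends, and it takes "today" as a parameter instead of reading the clock so the results are reproducible; load_patients, age_on and run_queries are hypothetical helpers.

```python
import csv
import io
from datetime import date, timedelta

# The patient records from the problem statement.
PATIENTS_CSV = """patientID,name,dateOfBirth,lastVisitDate
1001,Ah Teck,1991-12-31,2012-01-20
1002,Kumar,2011-10-29,2012-09-20
1003,Ali,2011-01-30,2012-10-21
"""

def load_patients(text):
    """Parse the CSV into dicts, converting both date columns to date objects."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec["dateOfBirth"] = date.fromisoformat(rec["dateOfBirth"])
        rec["lastVisitDate"] = date.fromisoformat(rec["lastVisitDate"])
        rows.append(rec)
    return rows

def age_on(dob, today):
    """Completed years: subtract one if this year's birthday has not arrived yet."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

def run_queries(patients, today):
    cutoff = date(2012, 9, 15)
    return {
        # 1. Last visit between 2012-09-15 and today, both ends inclusive.
        "visited_since_cutoff": [p["name"] for p in patients
                                 if cutoff <= p["lastVisitDate"] <= today],
        # 2. Born in 2011.
        "born_2011": [p["name"] for p in patients if p["dateOfBirth"].year == 2011],
        # 3. Age of every patient as of `today`.
        "ages": {p["name"]: age_on(p["dateOfBirth"], today) for p in patients},
        # 4. Last visit strictly more than 60 days before `today`.
        "not_seen_60_days": [p["name"] for p in patients
                             if p["lastVisitDate"] < today - timedelta(days=60)],
        # 5. Aged 18 or younger.
        "age_18_or_under": [p["name"] for p in patients
                            if age_on(p["dateOfBirth"], today) <= 18],
    }

if __name__ == "__main__":
    for key, value in run_queries(load_patients(PATIENTS_CSV), date.today()).items():
        print(key, value)
```

Passing date.today() reproduces "current time" semantics; a fixed date makes the answers checkable.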
Question #10
Problem Scenario 48: You have been given the Python code snippet below, with intermediate output.
We want to take a list of records about people, sum up their ages, and count them.
For this example, each element of the RDD is a dictionary of the form {name: NAME, age: AGE, gender: GENDER}.
The result type is a tuple of the form (Sum of Ages, Count).
people = []
people.append({'name':'Amit', 'age':45,'gender':'M'})
people.append({'name':'Ganga', 'age':43,'gender':'F'})
people.append({'name':'John', 'age':28,'gender':'M'})
people.append({'name':'Lolita', 'age':33,'gender':'F'})
people.append({'name':'Dont Know', 'age':18,'gender':'T'})
peopleRdd = sc.parallelize(people)  # Create an RDD
peopleRdd.aggregate((0, 0), seqOp, combOp)  # Output of the line above: (167, 5)
Now define the two operations seqOp and combOp such that:
seqOp: sums the ages of the people in each partition and counts them.
combOp: combines the results from all partitions.
Answer: See the explanation for Step by Step Solution and configuration.
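For Problem Scenario 48, seqOp folds one person into a partition's (sum, count) accumulator and combOp adds two accumulators component-wise. The two functions below are plain Python and could be passed as-is to peopleRdd.aggregate((0, 0), seqOp, combOp) in PySpark. To keep the sketch runnable without Spark, local_aggregate is a hypothetical helper that mimics aggregate over an assumed two-way split of the data.

```python
from functools import reduce

people = [
    {'name': 'Amit', 'age': 45, 'gender': 'M'},
    {'name': 'Ganga', 'age': 43, 'gender': 'F'},
    {'name': 'John', 'age': 28, 'gender': 'M'},
    {'name': 'Lolita', 'age': 33, 'gender': 'F'},
    {'name': 'Dont Know', 'age': 18, 'gender': 'T'},
]

def seqOp(acc, person):
    """Add one person's age to the running sum and increment the count."""
    return (acc[0] + person['age'], acc[1] + 1)

def combOp(left, right):
    """Merge two partition results by adding sums and counts."""
    return (left[0] + right[0], left[1] + right[1])

def local_aggregate(partitions, zero, seq_op, comb_op):
    """Mimic RDD.aggregate: fold each partition from `zero`, then combine the partials."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)

# Assumed split into two partitions; any split gives the same totals.
partitions = [people[:2], people[2:]]
result = local_aggregate(partitions, (0, 0), seqOp, combOp)
print(result)
```

Returning new tuples rather than mutating the zero value keeps both functions safe to reuse across partitions.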