Advanced Analytics Specialist Exam for Data Scientists v7.0 (E20-065)

Page:    1 / 5   
Total 72 questions

You are analyzing written transcripts of focus groups conducted on product X. Your approach is to use TF-IDF for your analysis.
What combination of TF-IDF scores should you examine to ensure you only report on the most important terms?

  • A. High TF score and high DF score
  • B. High TF score and high IDF score
  • C. High TF score and low IDF score
  • D. Low TF score and low DF score


Answer : B
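
A minimal sketch of TF-IDF scoring. It assumes one common variant, where TF is a term's count divided by the document length and IDF is log(N / df). Real libraries differ in smoothing and TF scaling. The snippets and the function name `tf_idf` are hypothetical. A term that appears in every transcript, such as "the", gets IDF = log(1) = 0, so its TF-IDF score is 0 no matter how often it occurs.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document.

    Assumed variant: TF = count / document length, IDF = log(N / df).
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical focus-group snippets.
docs = [
    "the battery the battery dies fast",
    "the screen is bright",
    "the price is fair",
]
scores = tf_idf(docs)
top_term = max(scores[0], key=scores[0].get)
print(top_term, round(scores[0]["battery"], 3), scores[0]["the"])
```

In the first snippet "battery" scores highest. It is frequent there (high TF) and absent from the other snippets (high IDF).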

What is a typical use of a UDF in Pig?

  • A. Creating functionality outside of what is provided by the built-in functions
  • B. Providing functional access to user-defined data in HDFS
  • C. Providing advanced analytics to Hadoop
  • D. Providing an interface from Pig to Microsoft Excel for easier data manipulation


Answer : A

Which scenario is a proper use case for multinomial logistic regression?

  • A. A marketing firm wants to estimate the personal income of a group of potential customers. Using inputs such as age, education, marital status, and credit card expenditures, a data scientist is building a model that will estimate a person's income.
  • B. A logistics distribution company wants to minimize the distance traveled by its delivery trucks. A data scientist is building a model to determine the optimal route for each of its trucks.
  • C. To improve the initial routing of a loan application, a financial institution plans to classify a loan application as Approve, Reject, or Possibly_Approve. Based on the company's historical loan application data, a data scientist is building a model to assign one of these three outcomes to each submitted application.
  • D. A manufacturer plans to determine the optimal number of workers to employ in an assembly line process. Utilizing the observed distributions of the task durations of each process step, a data scientist is building a model to mimic the interactions and dependencies between each stage in the manufacturing process.


Answer : C
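
Multinomial logistic regression fits one linear score per class. It then turns those scores into class probabilities with the softmax function. The sketch below assumes the parameters have already been fitted. The two features, the weights, and the biases are hypothetical and are not derived from any real loan data.

```python
import math

CLASSES = ["Approve", "Reject", "Possibly_Approve"]


def softmax(logits):
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features, weights, biases):
    """Score each class linearly, convert scores to probabilities with softmax,
    and return (most probable class, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(w_k, features)) + b_k
        for w_k, b_k in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(CLASSES)), key=lambda k: probs[k])
    return CLASSES[best], probs


# Hypothetical fitted parameters for two features:
# [scaled_credit_score, debt_to_income_ratio], one weight row per class.
WEIGHTS = [[2.0, -3.0], [-2.0, 3.0], [0.0, 0.0]]
BIASES = [0.0, 0.0, 0.5]

label, probs = predict([1.0, 0.1], WEIGHTS, BIASES)
print(label, [round(p, 3) for p in probs])
```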

What is an intended application of the MapReduce framework?

  • A. Processing can be broken into smaller pieces
  • B. Processing a large number of small files
  • C. Processing in real time is required
  • D. Processing a small subset of data


Answer : A
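
The map, shuffle, and reduce steps can be simulated in plain Python to show why divisible work suits the framework. Each map call sees only its own chunk, so the chunks could in principle run on different nodes. This is an in-process illustration under that assumption, not the Hadoop API. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    # Each mapper sees only its own chunk and emits (word, 1) pairs.
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(mapped_pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Each reducer combines all values that share one key.
    return key, sum(values)


def word_count(chunks):
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data big", "data pipeline"])
print(counts)
```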

Which HDFS feature protects against user errors causing accidental loss of data?

  • A. Encryption
  • B. Replication
  • C. Namenode federation
  • D. Snapshots


Answer : D

A simulation to compare two different sales models yields different results for the same set of input variables in different runs.
What is the likely cause?

  • A. bit operating system was used
  • B. The same number of trials was used.
  • C. A linear congruential generator (LCG) was used for pseudo-random number generation.
  • D. Different seeds for the random number generator were used.


Answer : D
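
An LCG is fully deterministic. The same seed always reproduces the same sequence, and only a change of seed (or of the generator's constants) changes it. The sketch assumes the constants a = 1664525, c = 1013904223, m = 2^32, which are commonly quoted from Numerical Recipes. `simulate` is a hypothetical stand-in for one simulation run.

```python
class LCG:
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_float(self):
        # Advance the state and scale it into [0, 1).
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def simulate(seed, trials=5):
    # Hypothetical simulation run: draw `trials` uniform numbers from one seed.
    rng = LCG(seed)
    return [rng.next_float() for _ in range(trials)]


print(simulate(42) == simulate(42), simulate(42) == simulate(7))
```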

What is a characteristic of stemming?

  • A. Reduces words of variant forms to their base forms based on a set of heuristics
  • B. Can be performed by calling the stemming() function on a lemma in NLTK
  • C. Can be performed by calling the stemming() function on a synset in NLTK
  • D. Reduces words of variant forms to their base forms based on a dictionary


Answer : A
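
Stemming applies suffix-stripping heuristics without consulting a dictionary, so it can produce non-words such as "studi". The rule list below is a deliberately tiny, hypothetical subset in the spirit of the Porter stemmer. It is not NLTK's implementation. In practice a library class such as NLTK's `PorterStemmer` would normally be used instead.

```python
# Hypothetical, deliberately tiny rule set: (suffix, replacement), tried in order.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def crude_stem(word, min_stem=3):
    """Apply the first rule whose suffix matches, provided at least
    `min_stem` characters would remain; no dictionary lookup is involved."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix):
            if len(w) - len(suffix) >= min_stem:
                return w[: len(w) - len(suffix)] + replacement
            # The matching stem is too short, so leave the word unchanged.
            return w
    return w


print([crude_stem(w) for w in ["connecting", "connected", "connects", "studies"]])
```

A dictionary-based lemmatizer would be expected to map "studies" to "study". The heuristic stemmer stops at "studi".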

Which metric would be most helpful in identifying a node that may cause network disruption if the node were removed?

  • A. Degree
  • B. Closeness
  • C. Betweenness
  • D. PageRank


Answer : C
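
Degree counts a node's direct neighbours. Betweenness counts how many shortest paths between other pairs of nodes pass through it. The sketch implements Brandes' algorithm for an unweighted, undirected graph and returns unnormalized scores. It runs on a hypothetical graph of two triangles joined through a single node X.

```python
from collections import deque


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "X"),
         ("X", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
print(degree["X"], bc["X"])
```

X has the highest betweenness (9.0, because all 9 shortest paths between the two triangles pass through it) but a degree of only 2, lower than C and D. Removing X disconnects the graph.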

How is the relative value of a node visualized in a sunburst?

  • A. Color
  • B. Area
  • C. Gradient
  • D. Position


Answer : B

What describes how nodes in a social network are similar to each other in characteristics?

  • A. Community clustering
  • B. Modularity
  • C. Homophily
  • D. Strongly tied network


Answer : C
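
One simple way to quantify homophily is the fraction of edges that connect nodes sharing an attribute value. Assortativity coefficients are a more standard alternative that also correct for chance. The people, their interests, and the edge list below are hypothetical.

```python
def edge_homophily(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    same = sum(1 for u, v in edges if attribute[u] == attribute[v])
    return same / len(edges)


# Hypothetical people and their main interest.
interest = {"ann": "cycling", "bob": "cycling", "eve": "cycling",
            "cat": "chess", "dan": "chess"}
friendships = [("ann", "bob"), ("cat", "dan"), ("ann", "eve"), ("bob", "cat")]
ratio = edge_homophily(friendships, interest)
print(ratio)
```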

What are the major components of the YARN architecture?

  • A. ResourceManager and NodeManager
  • B. Task Tracker and NameNode
  • C. HDFS, Tez, and Spark
  • D. Avro, ZooKeeper, and HDFS


Answer : A

In a connected, undirected graph of 5 nodes with 10 edges, how many more edges need to be added to make the clustering coefficient of every node equal to 1?

  • A. 0
  • B. 5
  • C. 10
  • D. 15


Answer : A
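
Assuming no self-loops or parallel edges, an undirected graph on n nodes has at most n(n - 1)/2 edges. For n = 5 that is 10, so the graph in the question is already complete (K5). Every pair of a node's neighbours is then linked, which gives a local clustering coefficient of 1 for every node. The check below builds K5 and computes the coefficient directly. The helper name is hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


nodes = range(5)
# Complete graph K5: every node is adjacent to every other node.
complete = {v: {u for u in nodes if u != v} for v in nodes}
edge_count = sum(len(n) for n in complete.values()) // 2
print(edge_count, [local_clustering(complete, v) for v in nodes])
```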

A data engineer is asked to process several large datasets using MapReduce. Upon initial inspection, the engineer realizes that there are complex interdependencies between the datasets.
Why is this a problem?

  • A. MapReduce works best on unstructured data
  • B. There is no problem; MapReduce accommodates all the data
  • C. MapReduce can only parse one file at a time.
  • D. MapReduce is not ideal when the processing of one dataset depends on another.


Answer : D

Consider the two sentences below.
-> I mailed my credit card application to the bank.
-> We walked along the river bank until we came to a waterwheel.
What type of NLP ambiguity might occur when interpreting the word "bank"?

  • A. Discourse
  • B. Syntactic
  • C. Semantic
  • D. Acoustic


Answer : C

Which is NOT a tenet of the Apache Pig Philosophy?

  • A. It must be easily commanded
  • B. Any type of data can be processed
  • C. Hadoop is required
  • D. Data should be processed quickly


Answer : C
