You are analyzing written transcripts of focus groups conducted on product X. You approach is to use TF-IDF for your analysis.
What combination of TF-IDF scores should you examine to ensure you only report on the most important terms?
Answer : C
What is a typical use of a UDF in Pig?
Answer : A
Which scenario is a proper use case for multinomial logistic regression?
Answer : C
What is an intended application of the MapReduce framework?
Answer : A
Which HDFS feature protects against user errors causing accidental loss of data?
Answer : B
A simul-ation to compare two different sales models yields different results for the same set of input variables in different runs.
What is the likely cause?
Answer : C
What is a characteristic of stemming?
Answer : A
Which metric would be most helpful in identifying a node that may cause network disruption if the node were removed?
Answer : A
How is the relative value of a node visualized in a sunburst?
Answer : A
What describes how nodes in a social network are similar to each other in characteristics?
Answer : C
What are the major components of the YARN architecture?
Answer : A
In a connected, undirected graph of 5 nodes with 10 edges, how many more edges need to be added to make the clustering coefficient of every node equal 1 ?
Answer : A
A data engineer is asked to process several large datasets using MapReduce. Upon initial inspection the engineer realizes that there are complex interdependencies between the datasets.
Why is this a problem?
Answer : D
Consider the two sentences below.
-> I mailed my credit card application to the bank
-> We walked along the river bank until we came to a waterwheel
What type of NLP ambiguity might occur when interpreting the word "bank"?
Answer : C