Data Science Associate Exam v7.0 (E20-007)

Page:    1 / 14   
Total 198 questions

Refer to the exhibit.


Consider the training data set shown in the exhibit. What are the classification (Y = 0 or 1) and the probability of that classification for the tuple X(0, 0, 1) using a Naive Bayesian classifier?

  • A. Classification Y = 1, Probability = 4/54
  • B. Classification Y = 0, Probability = 1/54
  • C. Classification Y = 1, Probability = 1/54
  • D. Classification Y = 0, Probability = 4/54


Answer : A
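
The exhibit is not reproduced on this page, so the answer cannot be re-derived here. As a minimal sketch of how such a score is typically computed, the snippet below applies the Naive Bayes rule, score(y) = P(y) × Π P(x_i | y), to a small training set. The rows in `TRAIN` are invented for illustration and are not the exhibit's data; the function name is likewise hypothetical. Fractions such as 4/54 in the options appear to be unnormalized scores of this form.

```python
from fractions import Fraction

# Hypothetical training rows (x1, x2, x3, y); NOT the exhibit's data.
TRAIN = [
    (0, 0, 1, 1),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (0, 0, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 1, 0),
]

def naive_bayes_scores(rows, x):
    """Return {label: P(label) * prod_i P(x_i | label)} as exact fractions."""
    scores = {}
    labels = sorted({row[-1] for row in rows})
    for label in labels:
        subset = [row for row in rows if row[-1] == label]
        score = Fraction(len(subset), len(rows))  # prior P(label)
        for i, value in enumerate(x):
            matches = sum(1 for row in subset if row[i] == value)
            score *= Fraction(matches, len(subset))  # likelihood P(x_i | label)
        scores[label] = score
    return scores

scores = naive_bayes_scores(TRAIN, (0, 0, 1))
prediction = max(scores, key=scores.get)  # label with the largest score
print(scores, prediction)
```

The predicted class is the label with the larger score; dividing each score by their sum would turn the scores into posterior probabilities.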

What is Hadoop?

  • A. Java classes for HDFS types and MapReduce job management and HDFS
  • B. Java classes for HDFS types and MapReduce job management and the MapReduce paradigm
  • C. MapReduce paradigm and HDFS
  • D. MapReduce paradigm and massive unstructured data storage on commodity hardware


Answer : A

Consider the example of an analysis for fraud detection on credit card usage. You will need to ensure higher-risk transactions that may indicate fraudulent credit card activity are retained in your data for analysis, and not dropped as outliers during pre-processing. What will be your approach for loading data into the analytical sandbox for this analysis?

  • A. ELT
  • B. ETL
  • C. EDW
  • D. OLTP


Answer : A

Your customer provided you with 2,000 unlabeled records and asked you to separate them into three groups. What is the correct analytical method to use?

  • A. K-means clustering
  • B. Linear regression
  • C. Naive Bayesian classification
  • D. Logistic regression


Answer : A
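
K-means fits here because the records carry no labels and the number of groups is fixed in advance. A minimal sketch of the algorithm (Lloyd's iteration) follows, assuming two-dimensional points and a deterministic seeding from evenly spaced input rows. Both the sample `records` and the seeding choice are assumptions made for illustration; production implementations usually seed randomly or with k-means++.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; seeds centroids with evenly spaced input points."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical unlabeled records forming three loose groups.
records = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
           (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]
centroids, labels = kmeans(records, k=3)
print(labels)
```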

You are using MADlib for Linear Regression analysis. Which value does the statement return?
SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;

  • A. Goodness of fit
  • B. Coefficients
  • C. Standard error
  • D. P-value


Answer : A
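
The `r2` field is the coefficient of determination, R² = 1 − SS_res / SS_tot, which is why it counts as a goodness-of-fit measure. The sketch below computes it for a one-variable least-squares fit with an intercept. The data and the function name `r_squared` are hypothetical, and MADlib's own computation may differ in detail.

```python
def r_squared(x, y):
    """R^2 of a one-variable least-squares fit y ~ a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation the fitted line fails to explain.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: one perfectly linear series and one with a single bump.
x = [1, 2, 3, 4, 5]
y_exact = [3, 5, 7, 9, 11]
y_noisy = [3, 5, 8, 9, 11]
print(r_squared(x, y_exact), r_squared(x, y_noisy))
```

A value near 1 means the line explains most of the variance in the dependent variable; lower values indicate a poorer fit.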

Which word or phrase completes the statement: “A theater actor is to ‘artistic and expressive’ as a data scientist is to ____________”?

  • A. Communicative and collaborative
  • B. Introverted and technical
  • C. Logical and steadfast
  • D. Independent and intelligent


Answer : A

You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing. What should you do?

  • A. Ensure that the TaskTracker is running
  • B. Ensure that the JobTracker is running
  • C. Ensure that the NameNode is running
  • D. Ensure that a DataNode is running


Answer : A

The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in their massively parallel database. Which tool should they use to export the structured data from Hadoop?

  • A. Sqoop
  • B. Pig
  • C. Chukwa
  • D. Scribe


Answer : A

Refer to the exhibit.


You are given a list of pre-defined association rules:

A) RENTER => BAD CREDIT
B) RENTER => GOOD CREDIT
C) HOME OWNER => BAD CREDIT
D) HOME OWNER => GOOD CREDIT
E) FREE HOUSING => BAD CREDIT
F) FREE HOUSING => GOOD CREDIT

For your next analysis, you must limit your dataset based on rules with confidence greater than 60%.
Which of the rules will be kept in the analysis?

  • A. Rules B and D
  • B. Rules A and F
  • C. Rules C and E
  • D. Rules D and E


Answer : A
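
The exhibit with the actual counts is not reproduced on this page. As a sketch of the calculation involved, the confidence of a rule X => Y is count(X and Y) / count(X), and a rule is kept only when that ratio is strictly greater than the threshold. The counts in `COUNTS` below are invented for illustration and are not the exhibit's figures; the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical (housing, credit) counts; NOT the exhibit's figures.
COUNTS = {
    ("RENTER", "BAD"): 30, ("RENTER", "GOOD"): 70,
    ("HOME OWNER", "BAD"): 45, ("HOME OWNER", "GOOD"): 55,
    ("FREE HOUSING", "BAD"): 30, ("FREE HOUSING", "GOOD"): 20,
}

def confidence(counts, antecedent, consequent):
    """confidence(X => Y) = count(X and Y) / count(X)."""
    total = sum(n for (housing, _), n in counts.items() if housing == antecedent)
    return Fraction(counts[(antecedent, consequent)], total)

def rules_above(counts, threshold):
    """List the rules whose confidence is strictly greater than the threshold."""
    return [
        f"{housing} => {credit} CREDIT"
        for (housing, credit) in counts
        if confidence(counts, housing, credit) > threshold
    ]

kept = rules_above(COUNTS, Fraction(60, 100))
print(kept)
```

With these made-up counts, FREE HOUSING => BAD CREDIT has a confidence of exactly 60% and is therefore dropped, since the requirement is "greater than 60%".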

Which word or phrase completes the statement: “A data warehouse is to a centralized database for reporting as an analytic sandbox is to a _______”?

  • A. Collection of data assets for modeling
  • B. Collection of low-volume databases
  • C. Centralized database of KPIs
  • D. Collection of data assets for ETL


Answer : A

What would be considered "Big Data"?

  • A. An OLAP Cube containing customer demographic information about 100,000,000 customers
  • B. Daily log files from a web server that receives 100,000 hits per minute
  • C. Aggregated statistical data stored in a relational database table
  • D. Spreadsheets containing monthly sales data for a Global 100 corporation


Answer : B

Refer to the exhibit.


You have scored your Naive Bayesian classifier model on hold-out test data for cross-validation and tabulated how the samples scored, as shown in the exhibit.
What are the False Positive Rate (FPR) and the False Negative Rate (FNR) of the model?

  • A. FPR = 15/262, FNR = 26/288
  • B. FPR = 26/288, FNR = 15/262
  • C. FPR = 262/15, FNR = 288/26
  • D. FPR = 288/26, FNR = 262/15


Answer : A
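
The two rates follow the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP), so each is a fraction no larger than 1, which already rules out options C and D. The sketch below evaluates them for a confusion matrix. The exhibit is not reproduced here: only the ratios 15/262 and 26/288 come from the answer options, and the TP and TN counts are assumptions inferred from those ratios.

```python
from fractions import Fraction

def error_rates(tp, fp, tn, fn):
    """Return (FPR, FNR) from confusion-matrix counts."""
    fpr = Fraction(fp, fp + tn)  # share of actual negatives flagged positive
    fnr = Fraction(fn, fn + tp)  # share of actual positives that were missed
    return fpr, fnr

# Assumed counts, chosen to be consistent with option A (not taken from the exhibit):
# 262 actual negatives of which 15 are false positives,
# 288 actual positives of which 26 are false negatives.
tp, fp, tn, fn = 262, 15, 247, 26
fpr, fnr = error_rates(tp, fp, tn, fn)
print(fpr, fnr)
```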

What does the R code nv <- v[v < 1000] do?

  • A. Selects the values in vector v that are less than 1000 and assigns them to the vector nv
  • B. Sets nv to TRUE or FALSE depending on whether all elements of vector v are less than 1000
  • C. Removes elements of vector v less than 1000 and assigns the elements >= 1000 to nv
  • D. Selects values of vector v less than 1000, modifies v, and makes a copy to nv


Answer : A

A data scientist is given an R data frame, empdata, with the columns Age, Salary, Occupation, Education, and Gender. The data scientist would like to examine only the Salary and Occupation columns for ages greater than 40. Which command extracts the appropriate rows and columns from the data frame?

  • A. empdata[empdata$Age > 40, c("Salary", "Occupation")]
  • B. empdata[c("Salary", "Occupation"), empdata$Age > 40]
  • C. empdata[Age > 40, ("Salary", "Occupation")]
  • D. empdata[, c("Salary", "Occupation")]$Age > 40


Answer : A

What is the reason for using LOESS?

  • A. Fits a smoothed curve to scatterplot data, providing a general idea of the data's behavior
  • B. Significance test for the correlation between two variables
  • C. Plots a continuous variable versus a discrete variable, comparing distributions across classes
  • D. Runs after a one-way ANOVA, determining which population has the highest mean value


Answer : A
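
LOESS produces its smoothed curve by fitting a small weighted regression around each x value, with nearby points weighted more heavily than distant ones. The following is a simplified sketch of that idea, assuming local straight-line fits, tricube weights, and a `span` parameter giving the fraction of points used per fit. It omits the robustness iterations found in full implementations, and R's own `loess` differs in defaults and details; the sample data are hypothetical.

```python
def loess(xs, ys, span=0.5):
    """Locally weighted linear regression with tricube weights (degree 1)."""
    n = len(xs)
    k = max(2, int(round(span * n)))  # number of neighbors in each local fit
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest neighbor.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted least squares for a local line through the neighborhood.
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical scatterplot data: a rising trend with some wobble.
xs = list(range(10))
ys = [1.0, 3.5, 4.2, 7.9, 8.1, 11.6, 12.4, 15.8, 16.3, 19.7]
smooth = loess(xs, ys, span=0.5)
print([round(v, 2) for v in smooth])
```

Plotting `smooth` against `xs` on top of the raw points gives the trend line that LOESS is used for; a larger `span` yields a smoother curve.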
