IBM Big Data Architect v7.0 (C2090-102)

Page:    1 / 8   
Total 110 questions

The CAP Theorem states that it is impossible for a distributed computer system to simultaneously guarantee all three of which of the following?

  • A. Consistency, Accuracy, and Partition tolerance
  • B. Concurrency, Availability, and Parallel updates
  • C. Concurrency, Accuracy, and Parallel updates
  • D. Consistency, Availability, and Partition tolerance


Answer : D
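The trade-off behind the theorem can be illustrated with a toy sketch (not from the exam material): during a network partition, a replica that cannot reach its peers must either refuse the request (staying consistent but unavailable) or answer from possibly stale local state (staying available but possibly inconsistent).

```python
# Toy model of the CAP trade-off during a network partition.
# A replica cut off from its peers must pick one of two behaviors:
#   "CP": refuse the read (consistent, but unavailable)
#   "AP": answer from local state (available, but possibly stale)

def read(replica_state, partitioned, mode):
    """Return the local value, or None when a CP replica refuses during a partition."""
    if partitioned and mode == "CP":
        return None           # unavailable, but never returns stale data
    return replica_state      # may be stale when partitioned in "AP" mode

# During a partition the two modes diverge:
print(read("v1-stale", partitioned=True, mode="CP"))  # None
print(read("v1-stale", partitioned=True, mode="AP"))  # v1-stale
```

With no partition, both modes return the same (current) value; only under partition is one of consistency or availability sacrificed.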

You have implemented a large Hadoop MapReduce cluster, and the applications and users are multiplying. You are now faced with requests for interactive and streaming data applications while you still need to support the original MapReduce batch jobs. Select the best option for continued support and performance.

  • A. Just add several data nodes as Hadoop clusters are designed to scale-up easily
  • B. Keep your original cluster configuration; all that is needed is re-optimizing the Oozie workflow management
  • C. Implement Yarn to decouple MapReduce and resource management
  • D. Implement Apache Cassandra to automatically optimize multi-tenancy workloads


Answer : C

For company B, 85% of their analytics queries involve only about 25% of their data; another 10% of the queries touch 35% of the remaining data, and only 5% of the queries touch the last 40%. The estimated volume is 50 TB, growing at 1 TB per year. Which of the following would provide the best value (business benefit) and lowest TCO?

  • A. Place the entire set of data in a data warehouse with proper partitioning and indexing
  • B. Place the entire set of data in a Hadoop environment, using commodity HW
  • C. Place the top 25% of data (used by 85% of the queries) in a Hadoop environment, and the rest in a data warehouse
  • D. Place the top 25% of data (used by 85% of the queries) in a data warehouse, and the rest in a Hadoop environment


Answer : D
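The TCO reasoning can be checked with back-of-the-envelope arithmetic. The per-TB prices below are illustrative assumptions only (not from the exam); the point is the relative cost of tiering hot data versus placing everything in the warehouse.

```python
# Back-of-the-envelope TCO comparison for tiered placement of 50 TB.
# Per-TB prices are illustrative assumptions only.
DW_COST_PER_TB = 20_000      # data warehouse appliance (assumed)
HADOOP_COST_PER_TB = 1_000   # commodity Hadoop (assumed)
TOTAL_TB = 50

hot_tb = 0.25 * TOTAL_TB     # the 25% of data serving 85% of queries
cold_tb = TOTAL_TB - hot_tb

# Option D: hot 25% in the warehouse, cold 75% in Hadoop
option_d = hot_tb * DW_COST_PER_TB + cold_tb * HADOOP_COST_PER_TB

# Option A: everything in the warehouse
option_a = TOTAL_TB * DW_COST_PER_TB

print(option_d, option_a)  # 287500.0 1000000
```

Under any assumption where warehouse storage costs materially more per TB than commodity Hadoop, keeping only the frequently queried slice in the warehouse dominates the all-warehouse option.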

Which of the following statements regarding Big R is TRUE?

  • A. Missing data values must be handled by ETL processes prior to analyzing data with Big R
  • B. A bigr.frame loads data in memory for optimal performance
  • C. A Big R user is responsible for parallelizing the execution of the R functions being used in the R program
  • D. Performing a mathematical operation on a Big R vector variable will automatically loop through each item in the vector


Answer : D

Explanation:
References: http://www.computerworld.com/article/2497319/business-intelligence-beginner-s-guide-to-r-syntax-quirks-you-ll-want-to-know.html
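The vectorized behavior described in option D mirrors base R semantics: one expression applies element-wise to the whole vector, with no explicit loop. The same idea can be shown with NumPy, used here only as a stand-in since this document contains no R code.

```python
import numpy as np

# Vectorized arithmetic: the operation applies element-wise to every item
# in the vector automatically -- the behavior option D describes for Big R.
v = np.array([1, 2, 3, 4])
doubled = v * 2          # no explicit loop needed

print(doubled.tolist())  # [2, 4, 6, 8]
```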

Which architecture document is used to help organize projects, manage the complexity of the solution, and ensure that all architecture requirements have been addressed?

  • A. Operational Model
  • B. Component Model
  • C. Connection Model
  • D. API Model


Answer : B

In a typical Hadoop HA cluster, two separate machines are configured as which of the following?

  • A. Data Nodes
  • B. Edge Nodes
  • C. Name Nodes
  • D. None of the Above


Answer : C

Reference:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

Which of the following statements regarding Big R is TRUE?

  • A. Unless specified otherwise, Big R automatically assumes all data to be integers
  • B. Big R’s ‘bigr.frame’ is equivalent to R’s ‘data.frames’
  • C. When you execute Big R apply function, Big R transparently extracts data out of HDFS into the Big R engine
  • D. A data analyst using Big R employs MapReduce programming principles


Answer : B

Reference:
https://developer.ibm.com/hadoop/docs/biginsights-value-add/big-r/bigr-tutorial/

A component of the IBM Industry Models forms the basis of the Logical Data Warehouse Model that spans the traditional RDBMS and Hadoop technology. It defines all of the data structures that would be expected to be defined in the Detailed System of Record. What is the name of this component?

  • A. Business data model
  • B. Atomic warehouse model
  • C. Dimensional warehouse models
  • D. Metadata management


Answer : B

Explanation:
References:
http://www.ibm.com/support/knowledgecenter/SS9NBR_9.1.0/com.ibm.ima.using/comp/bdm/intro.dita

In designing a new Hadoop system for a customer, the option of using SAN versus DAS was brought up. Which of the following would justify choosing SAN storage?

  • A. SAN storage provides better performance than DAS
  • B. SAN storage reduces and removes a lot of the HDFS complexity and management issues
  • C. SAN storage removes the Single Point of Failure for the NameNode
  • D. SAN storage supports replication, reducing the need for 3-way replication


Answer : D
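The raw-capacity arithmetic behind option D is simple: HDFS's default 3-way replication triples the raw disk needed for a given usable volume, whereas SAN-level redundancy (e.g. RAID or remote mirroring, handled below HDFS) can let the cluster run with a lower replication factor. The numbers below are illustrative.

```python
# Raw-capacity arithmetic for option D (illustrative numbers).
usable_tb = 100
hdfs_replication = 3   # HDFS default replication factor
san_replication = 1    # assumption: the SAN provides redundancy below HDFS

raw_with_hdfs = usable_tb * hdfs_replication
raw_with_san = usable_tb * san_replication

print(raw_with_hdfs, raw_with_san)  # 300 100
```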

Which of the following is the section of the Component Model that details how the solution integrates?

  • A. Component Relationship Diagram
  • B. Component Interface Diagram
  • C. Component Interaction Diagram
  • D. Component Reaction Diagram


Answer : A

You are designing storage for a new Hadoop cluster. Which of the following statements is
TRUE regarding the usage of SAN or NAS?

  • A. SAN or NAS should not be used to set up HDFS
  • B. SAN or NAS must be used, if available, to provide backup capabilities
  • C. SAN or NAS can be used to support retention policies
  • D. SAN or NAS cannot be used if your Hadoop cluster spans 2 sites


Answer : A

Explanation:
References: http://www-01.ibm.com/software/data/infosphere/hadoop/hdfs/

Company A has decided to implement a new data system to support their rapidly growing business. They have an existing 20 TB of raw data, with an expected incoming rate of 50 GB of new raw data per week. The data is mostly text-based and unstructured. A typical query can involve pulling in 10 GB of data. Historically, performance has been an issue and currently needs to be addressed. Which of the following would you suggest to support these requirements?

  • A. Set up a Hadoop system with commodity HW for scalability
  • B. Utilize de-duplication and compression technology
  • C. Use a mixture of different disk-types to provide hot/cold storage
  • D. Create range partitions for the data


Answer : A
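A quick sizing check of the stated volumes (decimal GB-to-TB conversion assumed) shows why horizontal scalability matters more than single-node tuning here:

```python
# Capacity sketch for Company A: 20 TB existing plus 50 GB/week of new raw data.
existing_tb = 20.0
weekly_gb = 50
years = 3

ingested_tb = weekly_gb * 52 * years / 1000.0  # GB -> TB (decimal, assumed)
total_tb = existing_tb + ingested_tb

print(round(total_tb, 1))  # 27.8
```

Even over three years the volume grows by only a few TB, but each query scans 10 GB of unstructured text, so a scale-out Hadoop cluster on commodity hardware addresses the performance requirement directly.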

Which of the following is NOT a valid Service Level Agreement (SLA) metric?

  • A. Mean time between failures
  • B. Mean time to repair
  • C. Identification of the responsible party
  • D. Identification of failing component


Answer : D

Explanation:
References: https://en.wikipedia.org/wiki/Service-level_agreement

A media company collects customer behavior data in real time, such as how frequently customers tune in, specific viewing habits, and peak usage, in order to improve their services. The company likes to segment its customers for advertisers by correlating viewing habits with public data, such as voter registration, in order to launch highly targeted campaigns to specific demographics. What technology should their Data Architect consider?

  • A. InfoSphere Streams, BigInsights, and Pure Data for Analytics PDA
  • B. BigInsights and Pure Data for Operational Analytics PDOA
  • C. InfoSphere Streams, Spark, and BigR
  • D. PureData for Analytics and SPSS


Answer : D

Reference:
http://www.ibm.com/software/data/puredata/analytics/nztechnology/analytics.html

A large retailer (online and brick & mortar) processes data for analyzing marketing campaigns for their loyalty club members. The current process takes weeks to process only 10% of the social data. What is the most cost-effective platform for processing and analyzing campaign results from social data on a daily basis using 100% of the dataset?

  • A. Enterprise Data Warehouse
  • B. BigInsights Open Data Platform
  • C. High Speed Mainframe Processing
  • D. In Memory Computing


Answer : B

Explanation:
References: http://www.ibm.com/developerworks/data/library/techarticle/dm-1110biginsightsintro/
