Implementing a SQL Data Warehouse v1.0 (70-767)


Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question.
You are loading data from an OLTP database to a data warehouse.
The OLTP database includes a table for sales data and a table for refund data.
The data warehouse contains a single table for all the sales and refund data.
Which component should you use to load the data to the data warehouse?

  • A. the Slowly Changing Dimension transformation
  • B. the Conditional Split transformation
  • C. the Merge transformation
  • D. the Data Conversion transformation
  • E. an Execute SQL task
  • F. the Aggregate transformation
  • G. the Lookup transformation


Answer : C

Explanation:
The Merge transformation combines two sorted datasets into a single dataset. The rows from each dataset are inserted into the output based on values in their key columns.
By including the Merge transformation in a data flow, you can merge data from two data sources, such as tables and files.
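For orientation, the result the Merge transformation produces here can be approximated in T-SQL as follows; the schema, table, and column names are hypothetical, and in the package itself the work is done by the Merge transformation fed by two sorted sources rather than by this statement.

-- Hypothetical T-SQL approximation of the merged output: rows from the
-- sales and refund sources combined into the single warehouse table.
INSERT INTO dw.FactSalesAndRefunds (TransactionKey, TransactionType, Amount, TransactionDate)
SELECT SalesKey,  'Sale',   Amount, SalesDate  FROM oltp.Sales
UNION ALL
SELECT RefundKey, 'Refund', Amount, RefundDate FROM oltp.Refunds;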
References:
https://docs.microsoft.com/en-us/sql/integration-services/data-flow/transformations/merge-transformation?view=sql-server-2017

Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question.
You are designing a data warehouse and the load process for the data warehouse.
You have a source system that contains two tables named Table1 and Table2. All the rows in each table have a corresponding row in the other table.
The primary key for Table1 is named Key1. The primary key for Table2 is named Key2.
You need to combine both tables into a single table named Table3 in the data warehouse. The solution must ensure that all the nonkey columns in Table1 and
Table2 exist in Table3.
Which component should you use to load the data to the data warehouse?

  • A. the Slowly Changing Dimension transformation
  • B. the Conditional Split transformation
  • C. the Merge transformation
  • D. the Data Conversion transformation
  • E. an Execute SQL task
  • F. the Aggregate transformation
  • G. the Lookup transformation


Answer : G

Explanation:
The Lookup transformation performs lookups by joining data in input columns with columns in a reference dataset. You use the lookup to access additional information in a related table, based on values in common columns.
You can configure the Lookup transformation in several ways, including the following:
-> Specify joins between the input and the reference dataset.
-> Add columns from the reference dataset to the Lookup transformation output.
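As a rough T-SQL analogue of this data flow (all object and column names below are assumptions), enriching each Table1 row with its corresponding Table2 row is equivalent to a key-based join:

-- Hypothetical T-SQL equivalent of the Lookup-based load: every Table1 row
-- is matched to its corresponding Table2 row, and the combined nonkey
-- columns are written to Table3 in the data warehouse.
INSERT INTO dw.Table3 (Key1, Col1A, Col1B, Col2A, Col2B)
SELECT t1.Key1, t1.Col1A, t1.Col1B, t2.Col2A, t2.Col2B
FROM src.Table1 AS t1
INNER JOIN src.Table2 AS t2
    ON t2.Key2 = t1.Key1;  -- assumes corresponding rows share the same key value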
Incorrect Answers:
F: The Aggregate transformation applies aggregate functions, such as Average, to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
References:
https://docs.microsoft.com/en-us/sql/integration-services/data-flow/transformations/lookup-transformation

Note: This question is part of a series of questions that use the same or similar answer choices. An answer choice may be correct for more than one question in the series. Each question is independent of the other questions in this series. Information and details provided in a question apply only to that question.
You are implementing the data load process for a data warehouse.
The data warehouse uses daily partitions to store data added or modified during the last 60 days. Older data is stored in monthly partitions.
You need to ensure that the ETL process can modify the partition scheme during the data load process.
Which component should you use to load the data to the data warehouse?

  • A. the Slowly Changing Dimension transformation
  • B. the Conditional Split transformation
  • C. the Merge transformation
  • D. the Data Conversion transformation
  • E. an Execute SQL task
  • F. the Aggregate transformation
  • G. the Lookup transformation


Answer : E

Incorrect Answers:
A: The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables.
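A minimal sketch of the kind of partition-maintenance T-SQL that an Execute SQL task could run before the load; the function, scheme, and boundary values are hypothetical.

-- Hypothetical partition maintenance issued from an Execute SQL task.
-- Prepare a new daily partition at the leading edge of the scheme...
ALTER PARTITION SCHEME psSalesByDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfSalesByDate() SPLIT RANGE ('2017-06-01');
-- ...and collapse a day older than 60 days into its monthly partition.
ALTER PARTITION FUNCTION pfSalesByDate() MERGE RANGE ('2017-03-02');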
References:
https://social.technet.microsoft.com/wiki/contents/articles/50781.dynamic-management-of-ssas-partitions-with-ssis.aspx

You are building a server to host a data warehouse.
The planned disk activity for the data warehouse is five percent write activity and 95 percent read activity.
You need to recommend a storage solution for the data files of the data warehouse. The solution must meet the following requirements:
-> Ensure that the data warehouse is available if two disks fail.
-> Minimize hardware costs.
Which RAID configuration should you recommend?

  • A. RAID 1
  • B. RAID 5
  • C. RAID 6
  • D. RAID 10


Answer : C

Explanation:
According to the Storage Networking Industry Association (SNIA), the definition of RAID 6 is: "Any form of RAID that can continue to execute read and write requests to all of a RAID array's virtual disks in the presence of any two concurrent disk failures."
Incorrect Answers:
A: RAID 1 can only handle a single disk failure.
B: RAID 5 can only handle a single disk failure.
D: RAID 10 is a stripe of mirrors. It requires more disks compared with RAID 6.
References:
https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_6

You have a data warehouse named DW1.
In DW1, you plan to create a table named Table1 that will be partitioned by hour. Table1 will contain the last three hours of data.
You plan to implement a sliding window process for inserting data into Table1.
You need to recommend the minimum number of partitions that must be included in Table1 to support the planned implementation. The solution must minimize the number of transaction log records created during the insert process.
How many partitions should you recommend?

  • A. 3
  • B. 5
  • C. 9
  • D. 24


Answer : B
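Note: a common reading of this answer is that three partitions hold the three hours of data and one empty partition sits at each end of the range, so new data can be switched in and old data switched out as metadata-only operations, which minimizes transaction log records. A sketch of such a layout, with illustrative boundary values:

-- Illustrative layout: 4 boundaries => 5 partitions:
-- [empty] [hour 1] [hour 2] [hour 3] [empty]
-- Loads and purges can then use ALTER TABLE ... SWITCH, a metadata-only operation.
CREATE PARTITION FUNCTION pfHourly (datetime2)
    AS RANGE RIGHT FOR VALUES ('2017-06-01T09:00', '2017-06-01T10:00',
                               '2017-06-01T11:00', '2017-06-01T12:00');
CREATE PARTITION SCHEME psHourly AS PARTITION pfHourly ALL TO ([PRIMARY]);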

DRAG DROP -
You need to recommend a storage solution for a data warehouse that minimizes load times. The solution must provide availability if a hard disk fails.
Which RAID configuration should you recommend for each type of database file? To answer, drag the appropriate RAID configurations to the correct database file types. Each RAID configuration may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:




Answer :

Explanation:

Box 1: RAID 5 -
The read performance of RAID 5 is similar to that of RAID 0, provided that the number of disks is the same. However, because reading the parity data is of no use, the read speed is only (N-1) times that of a single disk, rather than N times as in RAID 0.

Box 2: RAID 10 -
Always place log files on RAID 1+0 (or RAID 1) disks. This provides better protection from hardware failure, and better write performance.
Note: In general, RAID 1+0 provides better throughput for write-intensive applications. The amount of performance gained varies based on the hardware vendor's RAID implementation. The most common alternative to RAID 1+0 is RAID 5. Generally, RAID 1+0 provides better write performance than any other RAID level that provides data protection, including RAID 5.
Incorrect Answers:
RAID 6 is slower and more expensive than RAID 5. The RAID 6 read speed is only (N-2) times that of a single disk, because two disks in each stripe hold parity data that is of no use to read.
References:
https://technet.microsoft.com/en-us/library/cc966534.aspx

You plan to deploy several Microsoft SQL Server Integration Services (SSIS) packages to a highly available SQL Server instance. The instance is configured to use an AlwaysOn availability group that has two replicas.
You need to identify which deployment method must be used to ensure that the packages are always accessible from all the nodes in the availability group.
Which deployment method should you use for the packages?

  • A. Deploy to the msdb database on the secondary replica.
  • B. Deploy to the msdb database on the primary replica.
  • C. Deploy to a file on the hard drive of the primary replica.
  • D. Deploy to a shared folder on a file server.


Answer : A

Explanation:
Before you can configure SSIS to enable support for AlwaysOn on the newly added secondary replicas, you must connect to all of the newly added secondary replicas.


Note: To use SSIS with AlwaysOn, you'll need to add the SSIS Catalog (SSISDB) into an Availability Group. You'll need to do the following steps:
-> Make sure you meet the prerequisites for using AlwaysOn.
-> Connect to every node and create the SSISDB catalog. The catalog must be created even on secondary nodes so that the other server-level objects (cleanup jobs, keys, accounts, and so on) used by SSIS are created.
-> Delete the SSISDB databases on secondary nodes.
-> Create an availability group, specifying SSISDB as the user database
-> Specify secondary replicas.
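Once the availability group exists, adding SSISDB to it can be scripted on the primary replica; a minimal sketch, assuming an availability group named AG_SSIS:

-- Hypothetical sketch, run on the primary replica: add the SSIS catalog
-- database to an existing availability group (group name is assumed).
ALTER AVAILABILITY GROUP [AG_SSIS] ADD DATABASE [SSISDB];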
References:
https://chrislumnah.com/2017/05/09/enabling-alwayson-for-ssisdb/

You have a database named DB1 that contains millions of rows.
You plan to perform a weekly audit of the changes to the rows.
You need to ensure that you can view which rows were modified and the hour that the modification occurred.
What should you do?

  • A. Enable Policy-Based Management
  • B. Configure Stretch Database.
  • C. Configure an SSIS database.
  • D. Enable change data capture.


Answer : D

Explanation:
SQL Server 2017 provides two features that track changes to data in a database: change data capture and change tracking.
Change data capture provides historical change information for a user table by capturing both the fact that DML changes were made and the actual data that was changed. Changes are captured by using an asynchronous process that reads the transaction log and has a low impact on the system.
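A minimal sketch of enabling change data capture in DB1; the table name is an assumption. Commit times for captured changes can then be resolved to the hour through the cdc.lsn_time_mapping table.

-- Minimal sketch: enable CDC for the database, then for a table to audit.
USE DB1;
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',   -- hypothetical table name
    @role_name     = NULL;        -- no gating role in this sketch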
References:
https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/track-data-changes-sql-server

Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in the series.

Start of repeated scenario -
You have a Microsoft SQL Server data warehouse instance that supports several client applications.
The data warehouse includes the following tables: Dimension.SalesTerritory, Dimension.Customer, Dimension.Date, Fact.Ticket, and Fact.Order. The
Dimension.SalesTerritory and Dimension.Customer tables are frequently updated. The Fact.Order table is optimized for weekly reporting, but the company wants to change it to daily. The Fact.Order table is loaded by using an ETL process. Indexes have been added to the table over time, but the presence of these indexes slows data loading.
All tables are in a database named DB1. You have a second database named DB2 that contains copies of production data for a development environment. The data warehouse has grown and the cost of storage has increased. Data older than one year is accessed infrequently and is considered historical.
The following requirements must be met:
-> Implement table partitioning to improve the manageability of the data warehouse and to avoid the need to repopulate all transactional data each night. Use a partitioning strategy that is as granular as possible.
-> Partition the Fact.Order table and retain a total of seven years of data.
-> Partition the Fact.Ticket table and retain seven years of data. At the end of each month, the partition structure must apply a sliding window strategy to ensure that a new partition is available for the upcoming month, and that the oldest month of data is archived and removed.
-> Optimize data loading for the Dimension.SalesTerritory, Dimension.Customer, and Dimension.Date tables.
-> Incrementally load all tables in the database and ensure that all incremental changes are processed.
-> Maximize the performance during the data loading process for the Fact.Order partition.
-> Ensure that historical data remains online and available for querying.
-> Reduce ongoing storage costs while maintaining query performance for current data.
You are not permitted to make changes to the client applications.

End of repeated scenario -
You need to implement the data partitioning strategy.
How should you partition the Fact.Order table?

  • A. Create 17,520 partitions.
  • B. Use a granularity of one day.
  • C. Use a granularity of one month.
  • D. Create 1,460 partitions.


Answer : B

Explanation:
We create one partition for each day, which means that a granularity of one day is used.
Note: If we calculate the number of partitions needed, we get: 7 years times 365 days = 2,555. Make that 2,557 to allow for leap years.
From scenario: Partition the Fact.Order table and retain a total of seven years of data.
The Fact.Order table is optimized for weekly reporting, but the company wants to change it to daily.
Maximize the performance during the data loading process for the Fact.Order partition.
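A hedged sketch of a daily RANGE RIGHT partition function and scheme for Fact.Order; the column type and boundary dates are assumptions, and in practice the roughly 2,557 boundary values would be generated dynamically rather than listed by hand.

-- Illustrative daily partitioning for Fact.Order; only the first few
-- of the ~2,557 daily boundaries are shown.
CREATE PARTITION FUNCTION pfOrderDaily (date)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-01-02', '2011-01-03' /* ... */);

CREATE PARTITION SCHEME psOrderDaily
    AS PARTITION pfOrderDaily ALL TO ([PRIMARY]);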
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition

DRAG DROP -
You are designing an indexing strategy for a data warehouse. The data warehouse contains a table named Table1. Data is bulk inserted into Table1.
You plan to create the indexes configured as shown in the following table.


Which type of index should you use to minimize the query times of each index? To answer, drag the appropriate index types to the correct indexes. Each index type may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
Select and Place:



Answer :

Explanation:

Box 1: Clustered columnstore -
A clustered columnstore index is the physical storage for the entire table.
With a columnstore index, SQL Server processes aggregates in batch mode, delivering an order of magnitude better performance compared to rowstore.
SQL Server 2016 takes aggregate performance to the next level by pushing aggregate computations down to the SCAN node.

Box 2: Nonclustered columnstore -
A nonclustered columnstore index and a clustered columnstore index function the same. The difference is that a nonclustered index is a secondary index that's created on a rowstore table, but a clustered columnstore index is the primary storage for the entire table.
The nonclustered index contains a copy of part or all of the rows and columns in the underlying table. The index is defined as one or more columns of the table and has an optional condition that filters the rows.
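Minimal sketches of each index type; the table and column names are hypothetical, and note that a single table can have either a clustered columnstore index or a nonclustered columnstore index, not both.

-- Clustered columnstore index: becomes the physical storage for the whole table.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactOrder ON dbo.FactOrder;

-- Nonclustered (optionally filtered) columnstore index: a secondary copy of
-- selected columns on a rowstore table, here restricted by a predicate.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactTicket_Recent
    ON dbo.FactTicket (TicketDate, Quantity, Amount)
    WHERE TicketDate >= '2017-01-01';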
References:
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview

HOTSPOT -
Note: This question is part of a series of questions that use the same scenario. For your convenience, the scenario is repeated in each question. Each question presents a different goal and answer choices, but the text of the scenario is exactly the same in each question in the series.

Start of repeated scenario -
You have a Microsoft SQL Server data warehouse instance that supports several client applications.
The data warehouse includes the following tables: Dimension.SalesTerritory, Dimension.Customer, Dimension.Date, Fact.Ticket, and Fact.Order. The
Dimension.SalesTerritory and Dimension.Customer tables are frequently updated. The Fact.Order table is optimized for weekly reporting, but the company wants to change it to daily. The Fact.Order table is loaded by using an ETL process. Indexes have been added to the table over time, but the presence of these indexes slows data loading.
All data in the data warehouse is stored on a shared SAN. All tables are in a database named DB1. You have a second database named DB2 that contains copies of production data for a development environment. The data warehouse has grown and the cost of storage has increased. Data older than one year is accessed infrequently and is considered historical.
The following requirements must be met:
-> Implement table partitioning to improve the manageability of the data warehouse and to avoid the need to repopulate all transactional data each night. Use a partitioning strategy that is as granular as possible.
-> Partition the Fact.Order table and retain a total of seven years of data.
-> Partition the Fact.Ticket table and retain seven years of data. At the end of each month, the partition structure must apply a sliding window strategy to ensure that a new partition is available for the upcoming month, and that the oldest month of data is archived and removed.
-> Optimize data loading for the Dimension.SalesTerritory, Dimension.Customer, and Dimension.Date tables.
-> Incrementally load all tables in the database and ensure that all incremental changes are processed.
-> Maximize the performance during the data loading process for the Fact.Order partition.
-> Ensure that historical data remains online and available for querying.
-> Reduce ongoing storage costs while maintaining query performance for current data.
You are not permitted to make changes to the client applications.

End of repeated scenario -
You need to optimize data loading for the Dimension.SalesTerritory, Dimension.Customer, and Dimension.Date tables.
Which technology should you use for each table?
To answer, select the appropriate technologies in the answer area.
Hot Area:




Answer :

Explanation:

Box 1: Temporal table -

Box 2: Temporal table -
Compared to CDC, temporal tables are more efficient for storing historical data because they ignore insert actions.
Box 3: Change Data Capture (CDC)
By using change data capture, you can track changes that have occurred over time to your table. This kind of functionality is useful for applications, such as a data warehouse load process, that need to identify changes so they can correctly apply updates and track historical changes over time.
CDC is good for maintaining slowly changing dimensions.
Scenario: Optimize data loading for the Dimension.SalesTerritory, Dimension.Customer, and Dimension.Date tables.
The Dimension.SalesTerritory and Dimension.Customer tables are frequently updated.
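A minimal sketch of making a dimension table system-versioned (temporal); the column list is an assumption, and existing tables would be altered rather than created from scratch.

-- Hypothetical sketch: a system-versioned (temporal) dimension table whose
-- history table records every update automatically.
CREATE TABLE Dimension.SalesTerritory
(
    SalesTerritoryKey int           NOT NULL PRIMARY KEY CLUSTERED,
    TerritoryName     nvarchar(100) NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = Dimension.SalesTerritory_History));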
References:
https://www.mssqltips.com/sqlservertip/5212/sql-server-temporal-tables-vs-change-data-capture-vs-change-tracking--part-2/ https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-2017

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Each night you receive a comma separated values (CSV) file that contains different types of rows. Each row type has a different structure. Each row in the CSV file is unique. The first column in every row is named Type. This column identifies the data type.
For each data type, you need to load data from the CSV file to a target table. A separate table must contain the number of rows loaded for each data type.
Solution: You create a SQL Server Integration Services (SSIS) package as shown in the exhibit. (Click the Exhibit tab.)


Does the solution meet the goal?

  • A. Yes
  • B. No


Answer : B

Explanation:
The conditional split must be before the count.

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Each night you receive a comma separated values (CSV) file that contains different types of rows. Each row type has a different structure. Each row in the CSV file is unique. The first column in every row is named Type. This column identifies the data type.
For each data type, you need to load data from the CSV file to a target table. A separate table must contain the number of rows loaded for each data type.
Solution: You create a SQL Server Integration Services (SSIS) package as shown in the exhibit. (Click the Exhibit tab.)


Does the solution meet the goal?

  • A. Yes
  • B. No


Answer : B

Explanation:
The conditional split must be before the count.

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Each night you receive a comma separated values (CSV) file that contains different types of rows. Each row type has a different structure. Each row in the CSV file is unique. The first column in every row is named Type. This column identifies the data type.
For each data type, you need to load data from the CSV file to a target table. A separate table must contain the number of rows loaded for each data type.
Solution: You create a SQL Server Integration Services (SSIS) package as shown in the exhibit. (Click the Exhibit tab.)


Does the solution meet the goal?

  • A. Yes
  • B. No


Answer : A

Explanation:
The conditional split is correctly placed before the count.

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You configure a new matching policy in Master Data Services (MDS) as shown in the following exhibit.


You review the Matching Results of the policy and find that fewer records match than you expected.
You verify that the data contains multiple records that have similar address values, and you expect some of the records to match.
You need to increase the likelihood that the records will match when they have similar address values.
Solution: You increase the relative weights for Address Line 1 of the matching policy.
Does this meet the goal?

  • A. Yes
  • B. No


Answer : B

Explanation:
Decrease the Min. matching score.
A data matching project consists of a computer-assisted process and an interactive process. The matching project applies the matching rules in the matching policy to the data source to be assessed. This process assesses the likelihood that any two rows are matches in a matching score. Only those records with a probability of a match greater than a value set by the data steward in the matching policy will be considered a match.
References:
https://docs.microsoft.com/en-us/sql/data-quality-services/data-matching
