You are designing a serving layer for data. The design must meet the following requirements:
✑ Authenticate users by using Azure Active Directory (Azure AD).
✑ Serve as a hot path for data.
✑ Support query scale out.
✑ Support SQL queries.
What should you include in the design?
Answer : B
Explanation:
Do you need serving storage that can serve as a hot path for your data? If yes, narrow your options to those that are optimized for a speed serving layer. Among the options given in this question, that is Azure Cosmos DB.
Note: Analytical data stores that support querying of both hot-path and cold-path data are collectively referred to as the serving layer, or data serving storage.
There are several options for data serving storage in Azure, depending on your needs:
✑ Azure Synapse Analytics
✑ Azure Cosmos DB
✑ Azure Data Explorer
✑ Azure SQL Database
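As a minimal sketch of how Cosmos DB meets the Azure AD and SQL-query requirements (the endpoint, database, and container names below are placeholders), the following Python fragment authenticates with an Azure AD token via azure-identity and runs a parameterized SQL query; the query fans out across the container's partitions, which is how Cosmos DB scales out reads.

```python
# Hedged sketch: Azure AD authentication + SQL queries against Cosmos DB.
# Requires the azure-identity and azure-cosmos packages; all names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

credential = DefaultAzureCredential()  # acquires an Azure AD token
client = CosmosClient("https://contoso-serving.documents.azure.com:443/", credential=credential)
container = client.get_database_client("serving").get_container_client("hotpath")

# A SQL query; Cosmos DB distributes it across physical partitions (query scale out).
items = container.query_items(
    query="SELECT TOP 10 c.id, c.reading FROM c WHERE c.sensor = @s",
    parameters=[{"name": "@s", "value": "temp-01"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)
```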
You are designing a storage solution for streaming data that is processed by Azure Databricks. The solution must meet the following requirements:
✑ The data schema must be fluid.
✑ The source data must have a high throughput.
✑ The data must be available in multiple Azure regions as quickly as possible.
What should you include in the solution to meet the requirements?
Answer : A
Explanation:
Azure Cosmos DB is Microsoft's globally distributed, multi-model database. Azure Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure's geographic regions. It offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs).
You can read data from and write data to Azure Cosmos DB using Databricks.
Note on fluid schema:
If you are managing data whose structures are constantly changing at a high rate, particularly if transactions can come from external sources where it is difficult to enforce conformity across the database, you may want to consider a more schema-agnostic approach using a managed NoSQL database service like Azure Cosmos DB.
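For illustration, a Databricks job might write its processed stream to Cosmos DB roughly as follows, using the Cosmos DB Spark connector (a sketch that assumes the azure-cosmos-spark connector library is attached to the cluster; the endpoint, key, and names are placeholders). Because each row lands as a JSON document, the schema can evolve without migrations, and multi-region replication is handled by the Cosmos DB account itself.

```python
# Hedged sketch for a Databricks notebook (the `spark` session already exists there).
cfg = {
    "spark.cosmos.accountEndpoint": "https://contoso.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<account-key>",  # placeholder
    "spark.cosmos.database": "telemetry",
    "spark.cosmos.container": "readings",
}

# A stand-in for the DataFrame produced by the streaming job; "id" must be a string.
df = spark.createDataFrame([("1", "sensor-a", 21.5)], ["id", "device", "reading"])

# Each row is written as a schema-free JSON document ("fluid" schema requirement).
df.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save()
```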
Reference:
https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html
https://docs.microsoft.com/en-us/azure/cosmos-db/relational-nosql
You are designing a log storage solution that will use Azure Blob storage containers.
CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than 5,000 customers. Typically, the customers will query data generated on the day the data was created.
You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.
What naming convention should you recommend?
Answer : B
Reference:
https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs
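To illustrate the idea behind such a convention (the graded answer may order the segments differently), a path that narrows first by customer and then by date lets a customer's same-day query enumerate one small virtual directory instead of scanning all tenants and all history:

```python
# Hypothetical naming-convention helper; the segment order is an assumption,
# chosen to show prefix-based pruning, not the answer key's exact pattern.
from datetime import datetime, timezone

def log_blob_name(customer_id: str, ts: datetime) -> str:
    # e.g. "contoso1234/2024/05/17/0935.csv" -> one folder per customer per day
    return f"{customer_id}/" + ts.strftime("%Y/%m/%d/%H%M") + ".csv"

print(log_blob_name("contoso1234", datetime(2024, 5, 17, 9, 35, tzinfo=timezone.utc)))
```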
You are designing an Azure Cosmos DB database that will contain news articles.
The articles will have the following properties: Category, Created Datetime, Publish Datetime, Author, Headline, Body Text, and Publish Status. Multiple articles will be published in each category daily, but no two articles in a category will be published simultaneously.
Headlines may be updated over time. Publish Status will have the following values: draft, published, updated, and removed. Most articles will remain in the published or updated status. Publish Datetime will be populated only when Publish Status is set to published.
You will serve the latest articles to websites for users to consume.
You need to recommend a partition key for the database container. The solution must ensure that the articles are served to the websites as quickly as possible.
Which partition key should you recommend?
Answer : B
Explanation:
You can form a partition key by concatenating multiple property values into a single artificial partitionKey property. These keys are referred to as synthetic keys.
Incorrect Answers:
D: Publish Datetime will be populated only when Publish Status is set to published.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/synthetic-partition-keys
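A minimal sketch of a synthetic key, following the cited article (the property names are illustrative, not necessarily the answer key's exact choice): concatenating two properties yields a partition key that both spreads writes and keeps "latest articles" reads in a predictable partition.

```python
# Hypothetical synthetic partition key: category + publish date.
def synthetic_partition_key(category: str, publish_datetime_utc: str) -> str:
    # e.g. "News-2024-05-17": one logical partition per category per day,
    # so serving the latest articles in a category hits a single partition.
    return f"{category}-{publish_datetime_utc[:10]}"

article = {
    "id": "a1b2c3",
    "category": "News",
    "publishDatetime": "2024-05-17T09:35:00Z",
    "partitionKey": synthetic_partition_key("News", "2024-05-17T09:35:00Z"),
}
```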
You are designing a product catalog for a customer. The product data will be stored in Azure Cosmos DB. The product properties will be different for each product and additional properties will be added to products as needed.
Which Cosmos DB API should you use to provision the database?
Answer : A
Explanation:
Cassandra is a type of NoSQL database.
A NoSQL database (sometimes called "Not Only SQL") provides a mechanism to store and retrieve data other than through the tabular relations used in relational databases.
Incorrect Answers:
B: The Core (SQL) API is Cosmos DB's document (JSON) API rather than a relational database; the Cassandra API is selected here for its wide-column, schema-flexible model.
C: Gremlin is the graph traversal language of Apache TinkerPop. Gremlin is a functional, data-flow language that enables users to succinctly express complex traversals on (or queries of) their application's property graph.
Reference:
https://www.tutorialspoint.com/cassandra/cassandra_introduction.htm
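As a hedged sketch of why the Cassandra API suits a catalog whose properties keep growing (connection details are placeholders, TLS specifics vary, and the cassandra-driver package is required), columns can be added to the table as new product attributes appear:

```python
import ssl
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Cosmos DB's Cassandra API listens on port 10350 and requires TLS.
auth = PlainTextAuthProvider(username="contoso-cass", password="<account-key>")
cluster = Cluster(["contoso-cass.cassandra.cosmos.azure.com"], port=10350,
                  auth_provider=auth, ssl_context=ssl.create_default_context())
session = cluster.connect()

session.execute("CREATE KEYSPACE IF NOT EXISTS catalog WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': 1}")
session.execute("CREATE TABLE IF NOT EXISTS catalog.products "
                "(id uuid PRIMARY KEY, name text, price decimal)")

# New properties can be added later without redesigning the table:
session.execute("ALTER TABLE catalog.products ADD screen_size_inches float")
```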
You work for a finance company.
You need to design a business network analysis solution that meets the following requirements:
✑ Analyzes the flow of transactions between the Azure environments of the company's various partner organizations
✑ Supports Gremlin (graph) queries
What should you include in the solution?
Answer : A
Explanation:
Gremlin is one of the most popular query languages for exploring and analyzing data modeled as property graphs. There are many graph-database vendors out there that support Gremlin as their query language, in particular Azure Cosmos DB which is one of the world's first self-managed, geo-distributed, multi-master capable graph databases.
Azure Synapse Link for Azure Cosmos DB is a cloud native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data. Synapse Link creates a tight seamless integration between Azure Cosmos DB and Azure Synapse Analytics.
Reference:
https://jayanta-mondal.medium.com/analyzing-and-improving-the-performance-azure-cosmos-db-gremlin-queries-7f68bbbac2c
https://docs.microsoft.com/en-us/azure/cosmos-db/synapse-link-use-cases
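A hedged sketch of the graph model this implies (vertices for partner organizations, edges for transactions), using the gremlinpython driver against a Cosmos DB Gremlin endpoint; the account, database/graph names, and the 'pk' partition-key property are placeholders.

```python
from gremlin_python.driver import client, serializer

g = client.Client(
    "wss://contoso-graph.gremlin.cosmos.azure.com:443/", "g",
    username="/dbs/network/colls/transactions",  # database/graph path
    password="<account-key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Vertices are partner organizations (Cosmos DB requires the partition-key property):
g.submit("g.addV('org').property('id','contoso').property('pk','contoso')").all().result()
g.submit("g.addV('org').property('id','fabrikam').property('pk','fabrikam')").all().result()

# Edges are transactions between their Azure environments:
g.submit("g.V('contoso').addE('paid').to(g.V('fabrikam')).property('amount', 250)").all().result()

# Traversal: which partners received payments from contoso?
print(g.submit("g.V('contoso').out('paid').values('id')").all().result())
```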
HOTSPOT -
You are evaluating the use of an Azure Cosmos DB account for a new database.
The proposed account will be configured as shown in the following exhibit.
Answer :
Explanation:
Box 1: vertices and edges -
Gremlin API is selected.
You can use the Gremlin language to create graph entities (vertices and edges), modify properties within those entities, perform queries and traversals, and delete entities.
Box 2: US East -
West US is selected as the primary location, and geo-redundancy is enabled.
The secondary location for West US is East US.
Note: When a storage account is created, the customer chooses the primary location for their storage account. However, the secondary location for the storage account is fixed, and customers do not have the ability to change it. For the West US primary region, the paired secondary region is East US.
You are designing a streaming solution that must meet the following requirements:
✑ Accept input data from an Azure IoT hub.
✑ Write aggregated data to Azure Cosmos DB.
✑ Calculate minimum, maximum, and average sensor readings every five minutes.
✑ Define calculations by using a SQL query.
✑ Deploy to multiple environments by using Azure Resource Manager templates.
What should you include in the solution?
Answer : C
Explanation:
Cosmos DB is ideally suited for IoT solutions. Cosmos DB can ingest device telemetry data at high rates.
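These requirements map naturally onto an Azure Stream Analytics job: IoT Hub as input, Cosmos DB as output, a SQL query with a tumbling window for the aggregates, and the whole job deployable from an ARM template. A hedged sketch of such a query follows (the input and output aliases are hypothetical); in an ARM template it would appear as the transformation's query property.

```python
# The Stream Analytics query itself, held as a string for embedding in an ARM template.
asa_query = """
SELECT
    deviceId,
    MIN(reading) AS minReading,
    MAX(reading) AS maxReading,
    AVG(reading) AS avgReading,
    System.Timestamp() AS windowEnd
INTO cosmosOutput                          -- Cosmos DB output alias
FROM iotHubInput TIMESTAMP BY eventTime    -- IoT Hub input alias
GROUP BY deviceId, TumblingWindow(minute, 5)
"""
```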
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to store delimited text files in an Azure Data Lake Storage account that will be organized into department folders.
You need to configure data access so that users see only the files in their respective department folder.
Solution: From the storage account, you enable a hierarchical namespace, and you use access control lists (ACLs).
Does this meet the goal?
Answer : B
Explanation:
Azure Data Lake Storage implements an access control model that derives from HDFS, which in turn derives from the POSIX access control model.
Blob container ACLs do not support the hierarchical namespace, so the hierarchical namespace must be disabled when they are used.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues
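For context, once the hierarchical namespace is enabled, department-level access can be expressed as POSIX-style ACLs on each folder. A hedged sketch (the account, container, folder, and group object ID are placeholders; requires the azure-identity and azure-storage-file-datalake packages):

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://contosodatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("departments").get_directory_client("finance")

# Grant the Finance Azure AD group read+execute on its own folder only.
finance_group_oid = "00000000-0000-0000-0000-000000000000"  # placeholder object ID
directory.set_access_control(
    acl=f"user::rwx,group::---,other::---,group:{finance_group_oid}:r-x"
)
```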
You need to design a solution to support the storage of datasets. The solution must meet the following requirements:
✑ Send email alerts when new datasets are added.
✑ Control access to collections of datasets by using Azure Active Directory groups.
✑ Support the storage of Microsoft Excel, Comma Separated Values (CSV), and zip files.
Answer : B
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-storage
Design data processing solutions
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Overview -
You develop data engineering solutions for Graphics Design Institute, a global media company with offices in New York City, Manchester, Singapore, and Melbourne.
The New York office hosts SQL Server databases that store massive amounts of customer data. The company also stores millions of images on a physical server located in the New York office. More than 2 TB of image data is added each day. The images are transferred from customer devices to the server in New York.
Many images have been placed on this server in an unorganized manner, making it difficult for editors to search images. Images should automatically have object and color tags generated. The tags must be stored in a document database and must be queryable by using SQL.
You are hired to design a solution that can store, transform, and visualize customer data.
Requirements -
Business -
The company identifies the following business requirements:
You must transfer all images and customer data to cloud storage and remove on-premises servers.
You must develop an analytical processing solution for transforming customer data.
You must develop an image object and color tagging solution.
Capital expenditures must be minimized.
Cloud resource costs must be minimized.
Technical -
The solution has the following technical requirements:
Tagging data must be uploaded to the cloud from the New York office location.
Tagging data must be replicated to regions that are geographically close to company office locations.
Image data must be stored in a single data store at minimum cost.
Customer data must be analyzed using managed Spark clusters.
Power BI must be used to visualize transformed customer data.
All data must be backed up in case disaster recovery is required.
Security and optimization -
All cloud data must be encrypted at rest and in transit. The solution must support:
parallel processing of customer data
hyper-scale storage of images
global region data replication of processed image data
DRAG DROP -
You need to design the image processing solution to meet the optimization requirements for image tag data.
What should you configure? To answer, drag the appropriate setting to the correct drop targets.
Each source may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Answer :
Explanation:
Tagging data must be uploaded to the cloud from the New York office location.
Tagging data must be replicated to regions that are geographically close to company office locations.
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Overview -
You develop data engineering solutions for Graphics Design Institute, a global media company with offices in New York City, Manchester, Singapore, and Melbourne.
The New York office hosts SQL Server databases that store massive amounts of customer data. The company also stores millions of images on a physical server located in the New York office. More than 2 TB of image data is added each day. The images are transferred from customer devices to the server in New York.
Many images have been placed on this server in an unorganized manner, making it difficult for editors to search images. Images should automatically have object and color tags generated. The tags must be stored in a document database and must be queryable by using SQL.
You are hired to design a solution that can store, transform, and visualize customer data.
Requirements -
Business -
The company identifies the following business requirements:
You must transfer all images and customer data to cloud storage and remove on-premises servers.
You must develop an analytical processing solution for transforming customer data.
You must develop an image object and color tagging solution.
Capital expenditures must be minimized.
Cloud resource costs must be minimized.
Technical -
The solution has the following technical requirements:
Tagging data must be uploaded to the cloud from the New York office location.
Tagging data must be replicated to regions that are geographically close to company office locations.
Image data must be stored in a single data store at minimum cost.
Customer data must be analyzed using managed Spark clusters.
Power BI must be used to visualize transformed customer data.
All data must be backed up in case disaster recovery is required.
Security and optimization -
All cloud data must be encrypted at rest and in transit. The solution must support:
parallel processing of customer data
hyper-scale storage of images
global region data replication of processed image data
HOTSPOT -
You need to design the image processing and storage solutions.
What should you recommend? To answer, select the appropriate configuration in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Answer :
Explanation:
From the scenario:
The company identifies the following business requirements:
✑ You must transfer all images and customer data to cloud storage and remove on-premises servers.
✑ You must develop an image object and color tagging solution.
The solution has the following technical requirements:
✑ Image data must be stored in a single data store at minimum cost.
✑ All data must be backed up in case disaster recovery is required.
All cloud data must be encrypted at rest and in transit. The solution must support:
✑ hyper-scale storage of images
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale
Design data processing solutions
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Background -
Current environment -
The company has the following virtual machines (VMs):
Answer : B
Explanation:
CONT_SQL3 has the SQL Server role, a 100-GB database, and is a Hyper-V VM to be migrated to an Azure VM.
The storage should be optimized for database OLTP workloads.
Azure SQL Database provides three basic in-memory based capabilities (built into the underlying database engine) that can contribute in a meaningful way to performance improvements:
In-Memory Online Transactional Processing (OLTP)
Clustered columnstore indexes intended primarily for Online Analytical Processing (OLAP) workloads
Nonclustered columnstore indexes geared towards Hybrid Transactional/Analytical Processing (HTAP) workloads
Reference:
https://www.databasejournal.com/features/mssql/overview-of-in-memory-technologies-of-azure-sql-database.html
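As a hedged illustration of two of those capabilities expressed as T-SQL (issued here through pyodbc; the server, database, and table names are placeholders, and memory-optimized tables require a Premium/Business Critical tier):

```python
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=contoso-sql.database.windows.net;DATABASE=oltp;"
    "Authentication=ActiveDirectoryDefault"
)
cur = conn.cursor()

# Nonclustered columnstore index: HTAP-style analytics on a hot OLTP table.
cur.execute("CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_orders "
            "ON dbo.Orders (OrderDate, CustomerId, Amount)")

# Memory-optimized table: In-Memory OLTP for the write-heavy path.
cur.execute("CREATE TABLE dbo.SessionState ("
            "  SessionId INT NOT NULL PRIMARY KEY NONCLUSTERED,"
            "  Payload VARBINARY(MAX)"
            ") WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)")
conn.commit()
```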
Design data processing solutions
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Overview -
General Overview -
ADatum Corporation is a medical company that has 5,000 physicians located in more than 300 hospitals across the US. The company has a medical department, a sales department, a marketing department, a medical research department, and a human resources department.
You are redesigning the application environment of ADatum.
Physical Locations -
ADatum has three main offices in New York, Dallas, and Los Angeles. The offices connect to each other by using a WAN link. Each office connects directly to the Internet. The Los Angeles office also has a datacenter that hosts all the company's applications.
Existing Environment -
Health Review -
ADatum has a critical OLTP web application named Health Review that physicians use to track billing, patient care, and overall physician best practices.
Health Interface -
ADatum has a critical application named Health Interface that receives hospital messages related to patient care and status updates. The messages are sent in batches by each hospital's enterprise relationship management (ERM) system by using a VPN. The data sent from each hospital can have varying columns and formats.
Currently, a custom C# application is used to send the data to Health Interface. The application uses deprecated libraries and a new solution must be designed for this functionality.
Health Insights -
ADatum has a web-based reporting system named Health Insights that shows hospital and patient insights to physicians and business users. The data is created from the data in Health Review and Health Interface, as well as manual entries.
Database Platform -
Currently, the databases for all three applications are hosted on an out-of-date VMware cluster that has a single instance of Microsoft SQL Server 2012.
Problem Statements -
ADatum identifies the following issues in its current environment:
Over time, the data received by Health Interface from the hospitals has slowed, and the number of messages has increased.
When a new hospital joins ADatum, Health Interface requires a schema modification due to the lack of data standardization.
The speed of batch data processing is inconsistent.
Business Requirements -
Business Goals -
ADatum identifies the following business goals:
Migrate the applications to Azure whenever possible.
Minimize the development effort required to perform data movement.
Provide continuous integration and deployment for development, test, and production environments.
Provide faster access to the applications and the data and provide more consistent application performance.
Minimize the number of services required to perform data processing, development, scheduling, monitoring, and the operationalizing of pipelines.
Health Review Requirements -
ADatum identifies the following requirements for the Health Review application:
Ensure that sensitive health data is encrypted at rest and in transit.
Tag all the sensitive health data in Health Review. The data will be used for auditing.
Health Interface Requirements -
ADatum identifies the following requirements for the Health Interface application:
Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must return the most recent committed version of an item.
Reduce the amount of time it takes to add data from new hospitals to Health Interface.
Support a more scalable batch processing solution in Azure.
Reduce the amount of development effort to rewrite existing SQL queries.
Health Insights Requirements -
ADatum identifies the following requirements for the Health Insights application:
The analysis of events must be performed over time by using an organizational date dimension table.
The data from Health Interface and Health Review must be available in Health Insights within 15 minutes of being committed.
The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.
What should you recommend as a batch processing solution for Health Interface?
Answer : B
Explanation:
Scenario: ADatum identifies the following requirements for the Health Interface application:
Support a more scalable batch processing solution in Azure.
Reduce the amount of time it takes to add data from new hospitals to Health Interface.
Data Factory integrates with the Azure Cosmos DB bulk executor library to provide the best performance when you write to Azure Cosmos DB.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
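For orientation, the Cosmos DB sink of such a copy activity looks roughly like the following (shown here as Python dictionaries mirroring the ADF JSON; the dataset names are placeholders, and exact property support should be checked against the connector documentation):

```python
# Hedged sketch of an ADF copy activity writing to Cosmos DB (SQL API).
cosmos_sink = {
    "type": "CosmosDbSqlApiSink",
    "writeBehavior": "upsert",   # "insert" or "upsert"
    "writeBatchSize": 10000,     # tune to the container's RU budget
}
copy_activity = {
    "name": "LoadHospitalBatches",
    "type": "Copy",
    "inputs": [{"referenceName": "HospitalBatchFiles", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "HealthInterfaceCosmos", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": cosmos_sink,
    },
}
```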
Design data processing solutions
Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.
To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.
Overview -
You are a data engineer for Trey Research. The company is close to completing a joint project with the government to build smart highways infrastructure across North America. This involves the placement of sensors and cameras to measure traffic flow, car speed, and vehicle details.
You have been asked to design a cloud solution that will meet the business and technical requirements of the smart highway.
Solution components -
Telemetry Capture -
The telemetry capture system records each time a vehicle passes in front of a sensor. The sensors run on a custom embedded operating system and record the following telemetry data:
Time
Location in latitude and longitude
Speed in kilometers per hour (kmph)
Length of vehicle in meters
Answer :
Explanation: