Designing an Azure Data Solution v1.0 (DP-201)


You are designing a serving layer for data. The design must meet the following requirements:
✑ Authenticate users by using Azure Active Directory (Azure AD).
✑ Serve as a hot path for data.
✑ Support query scale out.
✑ Support SQL queries.
What should you include in the design?

  • A. Azure Data Lake Storage
  • B. Azure Cosmos DB
  • C. Azure Blob storage
  • D. Azure Synapse Analytics


Answer : B

Explanation:
Do you need serving storage that can serve as a hot path for your data? If yes, narrow your options to those that are optimized for a speed serving layer. Among the options given in this question, that is Azure Cosmos DB.
Note: Analytical data stores that support querying of both hot-path and cold-path data are collectively referred to as the serving layer, or data serving storage.
There are several options for data serving storage in Azure, depending on your needs:
✑ Azure Synapse Analytics
✑ Azure Cosmos DB
✑ Azure Data Explorer
✑ Azure SQL Database
✑ SQL Server in Azure VM
✑ HBase/Phoenix on HDInsight
✑ Hive LLAP on HDInsight
✑ Azure Analysis Services
Incorrect Answers:
A, C: Azure Data Lake Storage and Azure Blob storage are not data serving storage options in Azure.
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/analytical-data-stores
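
The following is a minimal sketch, not part of the official answer, of how the recommended design could satisfy the stated requirements: Azure AD authentication through the azure-identity library and a SQL query against an Azure Cosmos DB container serving as the hot path. The account, database, and container names are placeholders.

```python
# Non-authoritative sketch: Azure AD sign-in via azure-identity and a SQL query
# against a Cosmos DB container (the "hot path" serving store). Account,
# database, and container names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

credential = DefaultAzureCredential()  # Azure Active Directory (Azure AD) token credential
client = CosmosClient("https://<account>.documents.azure.com:443/", credential=credential)
container = client.get_database_client("serving").get_container_client("events")

# SQL query support; reads scale out across the partitions of the container.
items = container.query_items(
    query="SELECT TOP 10 c.id, c.reading FROM c WHERE c.deviceId = @d",
    parameters=[{"name": "@d", "value": "sensor-42"}],
    enable_cross_partition_query=True,
)
for item in items:
    print(item)
```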

You are designing a storage solution for streaming data that is processed by Azure Databricks. The solution must meet the following requirements:
✑ The data schema must be fluid.
✑ The source data must have a high throughput.
✑ The data must be available in multiple Azure regions as quickly as possible.
What should you include in the solution to meet the requirements?

  • A. Azure Cosmos DB
  • B. Azure Synapse Analytics
  • C. Azure SQL Database
  • D. Azure Data Lake Storage


Answer : A

Explanation:
Azure Cosmos DB is Microsoft's globally distributed, multi-model database. Azure Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure's geographic regions. It offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs).
You can read data from and write data to Azure Cosmos DB using Databricks.
Note on fluid schema:
If you are managing data whose structures are constantly changing at a high rate, particularly if transactions can come from external sources where it is difficult to enforce conformity across the database, you may want to consider a more schema-agnostic approach using a managed NoSQL database service like Azure
Cosmos DB.
Reference:
https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html https://docs.microsoft.com/en-us/azure/cosmos-db/relational-nosql
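
As an illustration of reading and writing Cosmos DB from Databricks, the following minimal PySpark sketch assumes the Azure Cosmos DB Spark 3 connector (format "cosmos.oltp") is installed on the cluster; the endpoint, key, database, and container values are placeholders, and `spark` is the session provided by the Databricks notebook.

```python
# Hedged sketch only; the option keys assume the azure-cosmos-spark (Spark 3) connector.
cosmos_cfg = {
    "spark.cosmos.accountEndpoint": "https://<account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<key>",
    "spark.cosmos.database": "telemetry",
    "spark.cosmos.container": "readings",
}

# Read documents; a fluid schema means new properties simply appear as new columns.
df = spark.read.format("cosmos.oltp").options(**cosmos_cfg).load()

# Write a transformed DataFrame back. Multi-region distribution and throughput
# are properties of the Cosmos DB account, not of this code.
(df.filter("reading IS NOT NULL")
   .write.format("cosmos.oltp")
   .options(**cosmos_cfg)
   .mode("append")
   .save())
```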

You are designing a log storage solution that will use Azure Blob storage containers.
CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than
5,000 customers. Typically, the customers will query data generated on the day the data was created.
You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.
What naming convention should you recommend?

  • A. {year}/{month}/{day}/{hour}/{minute}/{CustomerID}.csv
  • B. {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv
  • C. {minute}/{hour}/{day}/{month}/{year}/{CustomerID}.csv
  • D. {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv


Answer : B

Reference:
https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs
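
A small illustrative helper (hypothetical, not part of the answer) shows how option B's layout keeps one customer's logs for a given day under a single virtual directory, so a customer's same-day query only has to scan a narrow prefix.

```python
# Hypothetical helper illustrating option B: date first, then CustomerID, so a
# customer's same-day logs sit under one narrow virtual-directory prefix.
from datetime import datetime, timezone

def log_blob_name(customer_id: str, ts: datetime) -> str:
    return (f"{ts.year:04d}/{ts.month:02d}/{ts.day:02d}/"
            f"{customer_id}/{ts.hour:02d}/{ts.minute:02d}.csv")

print(log_blob_name("customer-0001", datetime(2021, 3, 15, 9, 35, tzinfo=timezone.utc)))
# -> 2021/03/15/customer-0001/09/35.csv
```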

You are designing an Azure Cosmos DB database that will contain news articles.
The articles will have the following properties: Category, Created Datetime, Publish Datetime, Author, Headline, Body Text, and Publish
Status. Multiple articles will be published in each category daily, but no two stories in a category will be published simultaneously.
Headlines may be updated over time. Publish Status will have the following values: draft, published, updated, and removed. Most articles will remain in the published or updated status. Publish Datetime will be populated only when Publish Status is set to published.
You will serve the latest articles to websites for users to consume.
You need to recommend a partition key for the database container. The solution must ensure that the articles are served to the websites as quickly as possible.
Which partition key should you recommend?

  • A. Publish Status
  • B. Category + Created Datetime
  • C. Headline
  • D. Publish Date + random suffix


Answer : B

Explanation:
You can form a partition key by concatenating multiple property values into a single artificial partitionKey property. These keys are referred to as synthetic keys.
Incorrect Answers:
D: Publish Datetime will be populated only when Publish Status is set to published.
Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/synthetic-partition-keys
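
The sketch below, a non-authoritative example using the azure-cosmos Python SDK with placeholder names, shows how Category and Created Datetime can be concatenated into a synthetic partitionKey property when an article is written.

```python
# Non-authoritative example (azure-cosmos SDK, placeholder account and names):
# Category and Created Datetime are concatenated into a synthetic partition key
# before the article is written.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("news").create_container_if_not_exists(
    id="articles",
    partition_key=PartitionKey(path="/partitionKey"),
)

article = {
    "id": "article-123",
    "category": "sports",
    "createdDatetime": "2021-03-15T09:35:00Z",
    "headline": "Local team wins",
    "publishStatus": "draft",
    "partitionKey": "sports-2021-03-15T09:35:00Z",  # synthetic key: Category + Created Datetime
}
container.upsert_item(article)
```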

You are designing a product catalog for a customer. The product data will be stored in Azure Cosmos DB. The product properties will be different for each product and additional properties will be added to products as needed.
Which Cosmos DB API should you use to provision the database?

  • A. Cassandra API
  • B. Core (SQL) API
  • C. Gremlin API


Answer : A

Explanation:
Cassandra is a type of NoSQL database.
A NoSQL database (sometimes referred to as "Not Only SQL") provides a mechanism to store and retrieve data modeled in ways other than the tabular relations used in relational databases.
Incorrect Answers:
B: Core (SQL) API is a relational database which does not fit this scenario.
C: Gremlin is the graph traversal language of Apache TinkerPop. Gremlin is a functional, data-flow language that enables users to succinctly express complex traversals on (or queries of) their application's property graph.
Reference:
https://www.tutorialspoint.com/cassandra/cassandra_introduction.htm
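
For illustration only, the following sketch uses the Python cassandra-driver against a Cosmos DB Cassandra API endpoint (placeholder account and key; the TLS settings that the Cassandra API requires are omitted for brevity) to show how a new product property can be added without redesigning existing rows.

```python
# Hedged sketch: add a new product property by altering the table; existing
# rows are unaffected. Endpoint, key, and keyspace are placeholders.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

auth = PlainTextAuthProvider(username="<account>", password="<key>")
cluster = Cluster(["<account>.cassandra.cosmos.azure.com"], port=10350, auth_provider=auth)
session = cluster.connect("catalog")  # assumes a "catalog" keyspace already exists

session.execute("""
    CREATE TABLE IF NOT EXISTS products (
        product_id text PRIMARY KEY,
        name text,
        price double
    )
""")

# A property needed only for some products: add the column, then insert.
session.execute("ALTER TABLE products ADD color text")
session.execute(
    "INSERT INTO products (product_id, name, price, color) VALUES (%s, %s, %s, %s)",
    ("sku-001", "Desk lamp", 24.99, "black"),
)
```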

You work for a finance company.
You need to design a business network analysis solution that meets the following requirements:
✑ Analyzes the flow of transactions between the Azure environments of the company's various partner organizations
✑ Supports Gremlin (graph) queries
What should you include in the solution?

  • A. Azure Cosmos DB
  • B. Azure Synapse
  • C. Azure Analysis Services
  • D. Azure Data Lake Storage Gen2


Answer : A

Explanation:
Gremlin is one of the most popular query languages for exploring and analyzing data modeled as property graphs. Many graph database vendors support Gremlin as their query language, in particular Azure Cosmos DB, which is one of the world's first self-managed, geo-distributed, multi-master capable graph databases.
Azure Synapse Link for Azure Cosmos DB is a cloud native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data. Synapse Link creates a tight seamless integration between Azure Cosmos DB and Azure Synapse Analytics.
Reference:
https://jayanta-mondal.medium.com/analyzing-and-improving-the-performance-azure-cosmos-db-gremlin-queries-7f68bbbac2c https://docs.microsoft.com/en-us/azure/cosmos-db/synapse-link-use-cases
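
The following hedged sketch (gremlin_python driver; the account, database "network", graph "transactions", and a "/pk" partition key are assumptions) models partner organizations as vertices and transactions as edges, then runs a Gremlin traversal to follow the flow of payments.

```python
# Hedged sketch using the gremlin_python driver against a Cosmos DB Gremlin endpoint.
from gremlin_python.driver import client, serializer

gremlin_client = client.Client(
    "wss://<account>.gremlin.cosmos.azure.com:443/", "g",
    username="/dbs/network/colls/transactions",
    password="<key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Partner organizations as vertices, transactions as edges.
gremlin_client.submit("g.addV('partner').property('id','org-a').property('pk','org-a')").all().result()
gremlin_client.submit("g.addV('partner').property('id','org-b').property('pk','org-b')").all().result()
gremlin_client.submit("g.V('org-a').addE('paid').to(g.V('org-b')).property('amount', 1500)").all().result()

# Analyze the flow of transactions: which partners received payments from org-a?
print(gremlin_client.submit("g.V('org-a').out('paid').values('id')").all().result())
```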

HOTSPOT -
You are evaluating the use of an Azure Cosmos DB account for a new database.
The proposed account will be configured as shown in the following exhibit.


Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:



Answer :

Explanation:

Box 1: vertices and edges -
Gremlin API is selected.
You can use the Gremlin language to create graph entities (vertices and edges), modify properties within those entities, perform queries and traversals, and delete entities.

Box 2: US East -
(US) West US is selected as the primary location, and geo-redundancy is enabled.
The secondary location for West US is East US.
Note: When a storage account is created, the customer chooses the primary location for their storage account. However, the secondary location for the storage account is fixed and customers do not have the ability to change this. The following table shows the current primary and secondary location pairings:


Reference:
https://docs.microsoft.com/en-us/azure/cosmos-db/gremlin-support https://technet2.github.io/Wiki/blogs/windowsazurestorage/windows-azure-storage-redundancy-options-and-read-access-geo-redundant-storage.html

You are designing a streaming solution that must meet the following requirements:
✑ Accept input data from an Azure IoT hub.
✑ Write aggregated data to Azure Cosmos DB.
✑ Calculate minimum, maximum, and average sensor readings every five minutes.
✑ Define calculations by using a SQL query.
✑ Deploy to multiple environments by using Azure Resource Manager templates.
What should you include in the solution?

  • A. Azure Functions
  • B. Azure HDInsight with Spark Streaming
  • C. Azure Databricks
  • D. Azure Stream Analytics


Answer : C

Explanation:
Cosmos DB is ideally suited for IoT solutions. Cosmos DB can ingest device telemetry data at high rates.

Architecture -



Data flow -
1. Events generated from IoT devices are sent to the analyze and transform layer through Azure IoT Hub as a stream of messages. Azure IoT Hub stores streams of data in partitions for a configurable amount of time.
2. Azure Databricks, running Apache Spark Streaming, picks up the messages in real time from IoT Hub, processes the data based on the business logic, and sends the data to the serving layer for storage. Spark Streaming can provide real-time analytics such as calculating moving averages and minimum and maximum values over time periods (a sketch follows the reference below).
3. Device messages are stored in Cosmos DB as JSON documents.
Reference:
https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/iot-using-cosmos-db
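
A minimal PySpark Structured Streaming sketch of step 2 above, under the stated assumptions: the stream is read from the IoT hub's Event Hubs-compatible endpoint (azure-event-hubs-spark connector), and minimum, maximum, and average readings are computed over five-minute windows. The connection string, message schema, and column names are placeholders; writing the aggregates to Cosmos DB would use the Cosmos DB Spark connector.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

schema = StructType().add("deviceId", StringType()).add("reading", DoubleType())

# `spark` is the session provided by the Databricks notebook.
raw = (spark.readStream
       .format("eventhubs")
       .option("eventhubs.connectionString", "<iot-hub-event-hubs-compatible-connection-string>")
       .load())

readings = raw.select(
    F.col("enqueuedTime").alias("event_time"),
    F.from_json(F.col("body").cast("string"), schema).alias("r"),
).select("event_time", "r.deviceId", "r.reading")

# Minimum, maximum, and average sensor readings every five minutes.
aggregates = (readings
              .withWatermark("event_time", "5 minutes")
              .groupBy(F.window("event_time", "5 minutes"), "deviceId")
              .agg(F.min("reading").alias("min_reading"),
                   F.max("reading").alias("max_reading"),
                   F.avg("reading").alias("avg_reading")))
```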

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to store delimited text files in an Azure Data Lake Storage account that will be organized into department folders.
You need to configure data access so that users see only the files in their respective department folder.
Solution: From the storage account, you enable a hierarchical namespace, and you use access control lists (ACLs).
Does this meet the goal?

  • A. Yes
  • B. No


Answer : B

Explanation:
Azure Data Lake Storage implements an access control model that derives from HDFS, which in turn derives from the POSIX access control model.
Blob container ACLs do not support the hierarchical namespace, so the hierarchical namespace must be disabled.
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-known-issues

You need to design a solution to support the storage of datasets. The solution must meet the following requirements:
✑ Send email alerts when new datasets are added.
✑ Control access to collections of datasets by using Azure Active Directory groups.
✑ Support the storage of Microsoft Excel, Comma Separated Values (CSV), and zip files.


What should you include in the solution?

  • A. Azure SQL Database
  • B. Azure Storage
  • C. Azure Cosmos DB
  • D. Azure HDInsight


Answer : B

Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-storage
Design data processing solutions

Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview -
You develop data engineering solutions for Graphics Design Institute, a global media company with offices in New York City, Manchester, Singapore, and
Melbourne.
The New York office hosts SQL Server databases that store massive amounts of customer data. The company also stores millions of images on a physical server located in the New York office. More than 2 TB of image data is added each day. The images are transferred from customer devices to the server in New York.
Many images have been placed on this server in an unorganized manner, making it difficult for editors to search images. Images should automatically have object and color tags generated. The tags must be stored in a document database, and be queried by SQL.
You are hired to design a solution that can store, transform, and visualize customer data.

Requirements -

Business -
The company identifies the following business requirements:
You must transfer all images and customer data to cloud storage and remove on-premises servers.
You must develop an analytical processing solution for transforming customer data.
You must develop an image object and color tagging solution.
Capital expenditures must be minimized.
Cloud resource costs must be minimized.

Technical -
The solution has the following technical requirements:
Tagging data must be uploaded to the cloud from the New York office location.
Tagging data must be replicated to regions that are geographically close to company office locations.
Image data must be stored in a single data store at minimum cost.
Customer data must be analyzed using managed Spark clusters.
Power BI must be used to visualize transformed customer data.
All data must be backed up in case disaster recovery is required.

Security and optimization -
All cloud data must be encrypted at rest and in transit. The solution must support:
parallel processing of customer data
hyper-scale storage of images
global region data replication of processed image data


DRAG DROP -
You need to design the image processing solution to meet the optimization requirements for image tag data.
What should you configure? To answer, drag the appropriate setting to the correct drop targets.
Each source may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:




Answer :

Explanation:
Tagging data must be uploaded to the cloud from the New York office location.
Tagging data must be replicated to regions that are geographically close to company office locations.

Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview -
You develop data engineering solutions for Graphics Design Institute, a global media company with offices in New York City, Manchester, Singapore, and
Melbourne.
The New York office hosts SQL Server databases that store massive amounts of customer data. The company also stores millions of images on a physical server located in the New York office. More than 2 TB of image data is added each day. The images are transferred from customer devices to the server in New York.
Many images have been placed on this server in an unorganized manner, making it difficult for editors to search images. Images should automatically have object and color tags generated. The tags must be stored in a document database, and be queried by SQL.
You are hired to design a solution that can store, transform, and visualize customer data.

Requirements -

Business -
The company identifies the following business requirements:
You must transfer all images and customer data to cloud storage and remove on-premises servers.
You must develop an analytical processing solution for transforming customer data.
You must develop an image object and color tagging solution.
Capital expenditures must be minimized.
Cloud resource costs must be minimized.

Technical -
The solution has the following technical requirements:
Tagging data must be uploaded to the cloud from the New York office location.
Tagging data must be replicated to regions that are geographically close to company office locations.
Image data must be stored in a single data store at minimum cost.
Customer data must be analyzed using managed Spark clusters.
Power BI must be used to visualize transformed customer data.
All data must be backed up in case disaster recovery is required.

Security and optimization -
All cloud data must be encrypted at rest and in transit. The solution must support:
parallel processing of customer data
hyper-scale storage of images
global region data replication of processed image data


HOTSPOT -
You need to design the image processing and storage solutions.
What should you recommend? To answer, select the appropriate configuration in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:




Answer :

Explanation:
From the scenario:
The company identifies the following business requirements:
✑ You must transfer all images and customer data to cloud storage and remove on-premises servers.
✑ You must develop an image object and color tagging solution.
The solution has the following technical requirements:
✑ Image data must be stored in a single data store at minimum cost.
✑ All data must be backed up in case disaster recovery is required.
All cloud data must be encrypted at rest and in transit. The solution must support:
✑ hyper-scale storage of images
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tier-hyperscale
Design data processing solutions

Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Background -

Current environment -
The company has the following virtual machines (VMs):



Requirements -

Storage and processing -
You must be able to use a file system view of data stored in a blob.
You must build an architecture that will allow Contoso to use the DBFS filesystem layer over a blob store. The architecture will need to support data files, libraries, and images. Additionally, it must provide a web-based interface to documents that contain runnable commands, visualizations, and narrative text, such as a notebook.
CONT_SQL3 requires an initial scale of 35000 IOPS.
CONT_SQL1 and CONT_SQL2 must use the vCore model and should include replicas. The solution must support 8000 IOPS.
The storage should be optimized for database OLTP workloads.

Migration -
You must be able to independently scale compute and storage resources.
You must migrate all SQL Server workloads to Azure. You must identify related machines in the on-premises environment and gather disk size and data usage information.
Data from SQL Server must include zone redundant storage.
You need to ensure that app components can reside on-premises while interacting with components that run in the Azure public cloud.
SAP data must remain on-premises.
The Azure Site Recovery (ASR) results should contain per-machine data.

Business requirements -
You must design a regional disaster recovery topology.
The database backups have regulatory purposes and must be retained for seven years.
CONT_SQL1 stores customer sales data that requires ETL operations for data analysis. A solution is required that reads data from SQL, performs ETL, and outputs to Power BI. The solution should use managed clusters to minimize costs. To optimize logistics, Contoso needs to analyze customer sales data to see if certain products are tied to specific times in the year.
The analytics solution for customer sales data must be available during a regional outage.

Security and auditing -
Contoso requires all corporate computers to enable Windows Firewall.
Azure servers should be able to ping other Contoso Azure servers.
Employee PII must be encrypted in memory, in motion, and at rest. Any data encrypted by SQL Server must support equality searches, grouping, indexing, and joining on the encrypted data.
Keys must be secured by using hardware security modules (HSMs).
CONT_SQL3 must not communicate over the default ports.

Cost -
All solutions must minimize cost and resources.
The organization does not want any unexpected charges.
The data engineers must set the SQL Data Warehouse compute resources to consume 300 DWUs.
CONT_SQL2 is not fully utilized during non-peak hours. You must minimize resource costs during non-peak hours.

You need to optimize storage for CONT_SQL3.
What should you recommend?

  • A. AlwaysOn
  • B. Transactional processing
  • C. General
  • D. Data warehousing


Answer : B

Explanation:
CONT_SQL3 has the SQL Server role, a 100 GB database, and runs as a Hyper-V VM that will be migrated to an Azure VM.
The storage should be optimized for database OLTP workloads.
Azure SQL Database provides three basic in-memory based capabilities (built into the underlying database engine) that can contribute in a meaningful way to performance improvements:
In-Memory Online Transactional Processing (OLTP)
Clustered columnstore indexes intended primarily for Online Analytical Processing (OLAP) workloads
Nonclustered columnstore indexes geared towards Hybrid Transactional/Analytical Processing (HTAP) workloads
Reference:
https://www.databasejournal.com/features/mssql/overview-of-in-memory-technologies-of-azure-sql-database.html
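
As a hedged illustration of the In-Memory OLTP capability mentioned in the explanation (the feature requires a service tier that supports it; server, credentials, and table are placeholders), the T-SQL below creates a memory-optimized table and is executed here through pyodbc.

```python
# Hedged illustration of In-Memory OLTP: a memory-optimized, durable table.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;UID=<user>;PWD=<password>",
    autocommit=True,
)

conn.execute("""
CREATE TABLE dbo.OrdersHot (
    OrderId    INT            NOT NULL PRIMARY KEY NONCLUSTERED,
    CustomerId INT            NOT NULL,
    OrderTotal DECIMAL(10, 2) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
""")
```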
Design data processing solutions

Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview -

General Overview -
ADatum Corporation is a medical company that has 5,000 physicians located in more than 300 hospitals across the US. The company has a medical department, a sales department, a marketing department, a medical research department, and a human resources department.
You are redesigning the application environment of ADatum.

Physical Locations -
ADatum has three main offices in New York, Dallas, and Los Angeles. The offices connect to each other by using a WAN link. Each office connects directly to the
Internet. The Los Angeles office also has a datacenter that hosts all the company's applications.

Existing Environment -

Health Review -
ADatum has a critical OLTP web application named Health Review that physicians use to track billing, patient care, and overall physician best practices.

Health Interface -
ADatum has a critical application named Health Interface that receives hospital messages related to patient care and status updates. The messages are sent in batches by each hospital's enterprise relationship management (ERM) system by using a VPN. The data sent from each hospital can have varying columns and formats.
Currently, a custom C# application is used to send the data to Health Interface. The application uses deprecated libraries and a new solution must be designed for this functionality.

Health Insights -
ADatum has a web-based reporting system named Health Insights that shows hospital and patient insights to physicians and business users. The data is created from the data in Health Review and Health Interface, as well as manual entries.

Database Platform -
Currently, the databases for all three applications are hosted on an out-of-date VMware cluster that has a single instance of Microsoft SQL Server 2012.

Problem Statements -
ADatum identifies the following issues in its current environment:
Over time, the data received by Health Interface from the hospitals has slowed, and the number of messages has increased.
When a new hospital joins ADatum, Health Interface requires a schema modification due to the lack of data standardization.
The speed of batch data processing is inconsistent.

Business Requirements -

Business Goals -
ADatum identifies the following business goals:
Migrate the applications to Azure whenever possible.
Minimize the development effort required to perform data movement.
Provide continuous integration and deployment for development, test, and production environments.
Provide faster access to the applications and the data and provide more consistent application performance.
Minimize the number of services required to perform data processing, development, scheduling, monitoring, and the operationalizing of pipelines.

Health Review Requirements -
ADatum identifies the following requirements for the Health Review application:
Ensure that sensitive health data is encrypted at rest and in transit.
Tag all the sensitive health data in Health Review. The data will be used for auditing.

Health Interface Requirements -
ADatum identifies the following requirements for the Health Interface application:
Upgrade to a data storage solution that will provide flexible schemas and increased throughput for writing data. Data must be regionally located close to each hospital, and reads must return the most recent committed version of an item.
Reduce the amount of time it takes to add data from new hospitals to Health Interface.
Support a more scalable batch processing solution in Azure.
Reduce the amount of development effort to rewrite existing SQL queries.

Health Insights Requirements -
ADatum identifies the following requirements for the Health Insights application:
The analysis of events must be performed over time by using an organizational date dimension table.
The data from Health Interface and Health Review must be available in Health Insights within 15 minutes of being committed.
The new Health Insights application must be built on a massively parallel processing (MPP) architecture that will support the high performance of joins on large fact tables.

What should you recommend as a batch processing solution for Health Interface?

  • A. Azure CycleCloud
  • B. Azure Stream Analytics
  • C. Azure Data Factory
  • D. Azure Databricks


Answer : B

Explanation:
Scenario: ADatum identifies the following requirements for the Health Interface application:
Support a more scalable batch processing solution in Azure.
Reduce the amount of time it takes to add data from new hospitals to Health Interface.
Data Factory integrates with the Azure Cosmos DB bulk executor library to provide the best performance when you write to Azure Cosmos DB.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
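
For illustration of the Data Factory-to-Cosmos DB path mentioned in the explanation, the fragment below sketches (as a plain Python dict, not a complete pipeline) the shape of a Copy activity whose sink is the Cosmos DB SQL API; the dataset names and property values are assumptions.

```python
# Illustrative fragment only, not a complete Data Factory pipeline definition.
copy_activity = {
    "name": "CopyHospitalBatchesToCosmosDb",
    "type": "Copy",
    "inputs": [{"referenceName": "HospitalBatchFiles", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "HealthInterfaceContainer", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "CosmosDbSqlApiSink", "writeBehavior": "upsert"},
    },
}
```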
Design data processing solutions

Case study -
This is a case study. Case studies are not timed separately. You can use as much exam time as you would like to complete each case. However, there may be additional case studies and sections on this exam. You must manage your time to ensure that you are able to complete all questions included on this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is provided in the case study. Case studies might contain exhibits and other resources that provide more information about the scenario that is described in the case study. Each question is independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your answers and to make changes before you move to the next section of the exam. After you begin a new section, you cannot return to this section.

To start the case study -
To display the first question in this case study, click the Next button. Use the buttons in the left pane to explore the content of the case study before you answer the questions. Clicking these buttons displays information such as business requirements, existing environment, and problem statements. If the case study has an All Information tab, note that the information displayed is identical to the information displayed on the subsequent tabs. When you are ready to answer a question, click the Question button to return to the question.

Overview -
You are a data engineer for Trey Research. The company is close to completing a joint project with the government to build smart highways infrastructure across
North America. This involves the placement of sensors and cameras to measure traffic flow, car speed, and vehicle details.
You have been asked to design a cloud solution that will meet the business and technical requirements of the smart highway.

Solution components -

Telemetry Capture -
The telemetry capture system records each time a vehicle passes in front of a sensor. The sensors run on a custom embedded operating system and record the following telemetry data:
Time
Location in latitude and longitude
Speed in kilometers per hour (kmph)
Length of vehicle in meters



Visual Monitoring -
The visual monitoring system is a network of approximately 1,000 cameras placed near highways that capture images of vehicle traffic every 2 seconds. The cameras record high resolution images. Each image is approximately 3 MB in size.

Requirements. Business -
The company identifies the following business requirements:
External vendors must be able to perform custom analysis of data using machine learning technologies.
You must display a dashboard on the operations status page that displays the following metrics: telemetry, volume, and processing latency.
Traffic data must be made available to the Government Planning Department for the purpose of modeling changes to the highway system. The traffic data will be used in conjunction with other data such as information about events such as sporting events, weather conditions, and population statistics. External data used during the modeling is stored in on-premises SQL Server 2016 databases and CSV files stored in an Azure Data Lake Storage Gen2 storage account.
Information about vehicles that have been detected as going over the speed limit during the last 30 minutes must be available to law enforcement officers.
Several law enforcement organizations may respond to speeding vehicles.
The solution must allow for searches of vehicle images by license plate to support law enforcement investigations. Searches must be able to be performed using a query language and must support fuzzy searches to compensate for license plate detection errors.

Requirements. Security -
The solution must meet the following security requirements:
External vendors must not have direct access to sensor data or images.
Images produced by the vehicle monitoring solution must be deleted after one month. You must minimize costs associated with deleting images from the data store.
Unauthorized usage of data must be detected in real time. Unauthorized usage is determined by looking for unusual usage patterns.
All changes to Azure resources used by the solution must be recorded and stored. Data must be provided to the security team for incident response purposes.

Requirements. Sensor data -
You must write all telemetry data to the closest Azure region. The sensors used for the telemetry capture system have a small amount of memory available and so must write data as quickly as possible to avoid losing telemetry data.


DRAG DROP -
You need to design the system for notifying law enforcement officers about speeding vehicles.
How should you design the pipeline? To answer, drag the appropriate services to the correct locations. Each service may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:



Answer :

Explanation:


Scenario:
Information about vehicles that have been detected as going over the speed limit during the last 30 minutes must be available to law enforcement officers. Several law enforcement organizations may respond to speeding vehicles.

Telemetry Capture -
The telemetry capture system records each time a vehicle passes in front of a sensor. The sensors run on a custom embedded operating system and record the following telemetry data:
✑ Time
✑ Location in latitude and longitude
✑ Speed in kilometers per hour (kmph)
✑ Length of vehicle in meters
Reference:
https://docs.microsoft.com/en-us/azure/azure-databricks/what-is-azure-databricks
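
As a non-authoritative sketch of the detection step only (the exhibit defines the actual pipeline), the PySpark fragment below filters telemetry for vehicles over an assumed speed limit and keeps a sliding 30-minute window of detections; telemetry_stream, the column names, and the limit value are placeholders.

```python
from pyspark.sql import functions as F

SPEED_LIMIT_KMPH = 100  # placeholder; the real limit would come from reference data

# telemetry_stream is assumed to be a streaming DataFrame with event_time,
# latitude, longitude, and speed_kmph columns from the telemetry capture system.
speeding_last_30_min = (
    telemetry_stream
    .withWatermark("event_time", "30 minutes")
    .filter(F.col("speed_kmph") > SPEED_LIMIT_KMPH)
    .groupBy(F.window("event_time", "30 minutes", "5 minutes"),
             "latitude", "longitude")
    .agg(F.max("speed_kmph").alias("max_speed_kmph"),
         F.count("*").alias("detections"))
)
# The result would be written to a store queried by law enforcement dashboards.
```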
Design data processing solutions
