SnowPro Advanced Data Engineer v1.0

Total 65 questions

What kind of Snowflake integration is required when defining an external function in Snowflake?

  • A. API integration
  • B. HTTP integration
  • C. Notification integration
  • D. Security integration


Answer : A
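
For context, defining an external function is a two-step process: create the API integration, then reference it from the function. A minimal sketch, assuming an AWS API Gateway endpoint (all names, the role ARN, and the URLs are placeholders):

CREATE OR REPLACE API INTEGRATION my_api_integration
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_gateway_role'
  API_ALLOWED_PREFIXES = ('https://abc123.execute-api.us-east-1.amazonaws.com/prod/')
  ENABLED = TRUE;

CREATE OR REPLACE EXTERNAL FUNCTION geocode(address VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = my_api_integration
  AS 'https://abc123.execute-api.us-east-1.amazonaws.com/prod/geocode';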

A Data Engineer is writing a Python script using the Snowflake Connector for Python. The Engineer will use the snowflake.connector.connect function to connect to Snowflake.
The requirements are:

  • Raise an exception if the specified database, schema, or warehouse does not exist
  • Improve download performance

Which parameters of the connect function should be used? (Choose two.)

  • A. authenticator
  • B. arrow_number_to_decimal
  • C. client_prefetch_threads
  • D. client_session_keep_alive
  • E. validate_default_parameters


Answer : CE
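
A minimal sketch of such a connect call (account and credential values are placeholders): validate_default_parameters makes the connector raise an exception when the named database, schema, or warehouse does not exist, and client_prefetch_threads raises the number of threads used to download result chunks (the default is 4).

import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",                 # placeholder
    user="my_user",                       # placeholder
    password="my_password",               # placeholder
    database="MY_DB",
    schema="MY_SCHEMA",
    warehouse="MY_WH",
    validate_default_parameters=True,     # fail fast on a missing database/schema/warehouse
    client_prefetch_threads=8,            # more download threads for large result sets
)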

A Data Engineer wants to centralize grant management to maximize security. A user needs OWNERSHIP on a table in a new schema. However, this user should not have the ability to make grant decisions.
What is the correct way to do this?

  • A. Grant OWNERSHIP to the user on the table.
  • B. Revoke grant decisions from the user on the table.
  • C. Revoke grant decisions from the user on the schema.
  • D. Add the WITH MANAGED ACCESS parameter on the schema.


Answer : D
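
A minimal sketch of option D (database, schema, table, and role names are hypothetical). In a managed access schema the object owner keeps all other privileges, but only the schema owner, or a role with the MANAGE GRANTS privilege, can make grant decisions:

CREATE SCHEMA sales_db.reporting WITH MANAGED ACCESS;

GRANT OWNERSHIP ON TABLE sales_db.reporting.orders TO ROLE analyst_role;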

A CSV file, around 1 TB in size, is generated daily on an on-premises server. A corresponding table, internal stage, and file format have already been created in Snowflake to facilitate the data loading process.
How can the process of bringing the CSV file into Snowflake be automated using the LEAST amount of operational overhead?

  • A. Create a task in Snowflake that executes once a day and runs a COPY INTO statement that references the internal stage. The internal stage will read the files directly from the on-premises server and copy the newest file from the on-premises server into the Snowflake table.
  • B. On the on-premises server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a task in Snowflake that executes once a day and runs a COPY INTO statement that references the internal stage. Schedule the task to start after the file lands in the internal stage.
  • C. On the on-premises server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a pipe that runs a COPY INTO statement that references the internal stage. Snowpipe auto-ingest will automatically load the file from the internal stage when a new file lands there.
  • D. On the on-premises server, schedule a Python file that uses the Snowpark Python library. The Python script will read the CSV data into a DataFrame and generate an INSERT INTO statement that loads directly into the table, bypassing the need to move the file into an internal stage.


Answer : B
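
A sketch of the two halves of option B (stage, table, file format, and schedule are hypothetical). Note that Snowpipe auto-ingest (option C) depends on cloud storage event notifications, which internal stages do not emit:

-- On the on-premises server, run via SnowSQL:
PUT file:///data/daily_export.csv @my_internal_stage;

-- In Snowflake, scheduled for after the file is expected to land:
CREATE OR REPLACE TASK load_daily_csv
  WAREHOUSE = load_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  COPY INTO sales_table
  FROM @my_internal_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

ALTER TASK load_daily_csv RESUME;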

What are characteristics of Snowpark Python packages? (Choose three.)

  • A. Third-party packages can be registered as a dependency to the Snowpark session using the session.import() method.
  • B. Python packages can access any external endpoints.
  • C. Python packages can only be loaded in a local environment.
  • D. Third-party supported Python packages are locked down to prevent hitting external endpoints.
  • E. The SQL command DESCRIBE FUNCTION will list the imported Python packages of the Python User-Defined Function (UDF).
  • F. Querying information_schema.packages will provide a list of supported Python packages and versions.


Answer : AEF
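
The checks behind options E and F are plain SQL. A sketch (the UDF name and signature are hypothetical):

-- Option F: list supported Python packages and versions
SELECT package_name, version
FROM information_schema.packages
WHERE language = 'python';

-- Option E: show the packages imported by an existing Python UDF
DESCRIBE FUNCTION my_python_udf(VARCHAR);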

While running an external function, the following error message is received:
Error: Function received the wrong number of rows
What is causing this to occur?

  • A. External functions do not support multiple rows.
  • B. Nested arrays are not supported in the JSON response.
  • C. The JSON returned by the remote service is not constructed correctly.
  • D. The return message did not produce the same number of rows that it received.


Answer : D
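
Snowflake batches rows into a JSON body with one entry per row, and the remote service must return the same row numbers and row count. A sketch of a well-formed exchange for a two-row batch (values are illustrative):

Request body:  {"data": [[0, "alpha"], [1, "beta"]]}
Response body: {"data": [[0, "ALPHA"], [1, "BETA"]]}

If the response dropped one of the two rows, or added an extra one, Snowflake would raise the error above.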

A Data Engineer enables a result cache at the session level with the following command:
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
The Engineer then runs the following SELECT query twice without delay:
SELECT *
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
SAMPLE(10) SEED (99);
The underlying table does not change between executions.
What are the results of both runs?

  • A. The first and second run returned the same results, because SAMPLE is deterministic.
  • B. The first and second run returned the same results, because the specific SEED value was provided.
  • C. The first and second run returned different results, because the query is evaluated each time it is run.
  • D. The first and second run returned different results, because the query uses * instead of an explicit column list.


Answer : B
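
With an unchanged table, a fixed SEED makes the sample deterministic, so both runs return the same rows. For contrast, dropping the SEED clause makes the sample non-deterministic, and successive runs are not guaranteed to return the same rows:

SELECT *
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
SAMPLE (10);  -- no SEED: each execution may draw a different sample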

A company built a sales reporting system with Python, connecting to Snowflake using the Python Connector. Based on the user's selections, the system generates the SQL queries needed to fetch the data for the report. First it gets the customers that meet the given query parameters (on average, 1,000 customer records per report run), then it loops through the customer records sequentially, running the generated SQL clause for the current customer to fetch that customer's detailed data from the sales data table.
When the Data Engineer tested the individual SQL clauses, they were fast enough (1 second to get the customers, 0.5 seconds to get the sales data for one customer), but the total runtime of the report is too long.
How can this situation be improved?

  • A. Increase the size of the virtual warehouse.
  • B. Increase the number of maximum clusters of the virtual warehouse.
  • C. Define a clustering key for the sales data table.
  • D. Rewrite the report to eliminate the use of the loop construct.


Answer : D
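
The fix is to replace roughly 1,000 sequential round trips with one set-based statement, for example a single join between the filtered customers and the sales data (table, column, and filter names are hypothetical):

SELECT c.customer_number, s.*
FROM customers AS c
JOIN sales_data AS s
  ON s.customer_number = c.customer_number
WHERE c.region = 'EMEA';  -- stand-in for the report's customer filters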

A company is using Snowpipe to bring millions of rows of Change Data Capture (CDC) data into a Snowflake staging table every day on a near-real-time basis. The CDC data needs to be processed, combined with other data in Snowflake, and landed in a final table as part of the full data pipeline.
How can a Data Engineer MOST efficiently process the incoming CDC on an ongoing basis?

  • A. Create a stream on the staging table and schedule a task that transforms data from the stream, only when the stream has data.
  • B. Transform the data during the data load with Snowpipe by modifying the related COPY INTO statement to include transformation steps such as CASE statements and JOINs.
  • C. Schedule a task that dynamically retrieves the last time the task was run from information_schema.task_history and use that timestamp to process the delta of the new rows since the last time the task was run.
  • D. Use a CREATE OR REPLACE TABLE AS statement that references the staging table and includes all the transformation SQL. Use a task to run the full CREATE OR REPLACE TABLE AS statement on a scheduled basis.


Answer : A
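
A sketch of that pattern (object names and the schedule are hypothetical). The WHEN clause prevents the task from using the warehouse when the stream is empty:

CREATE OR REPLACE STREAM cdc_stream ON TABLE staging_table;

CREATE OR REPLACE TASK process_cdc
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('CDC_STREAM')
AS
  INSERT INTO final_table
  SELECT s.id, s.payload, d.attribute
  FROM cdc_stream AS s
  JOIN dim_table AS d ON d.id = s.id;

ALTER TASK process_cdc RESUME;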

A Data Engineer is building a pipeline to transform a 1 TB table by joining it with supplemental tables. The Engineer is applying filters and several aggregations, leveraging Common Table Expressions (CTEs), in a single query on a Medium virtual warehouse in Snowflake.
After checking the Query Profile, what is the recommended approach to MAXIMIZE performance of this query if the Profile shows data spillage?

  • A. Enable clustering on the table.
  • B. Increase the warehouse size.
  • C. Rewrite the query to remove the CTEs.
  • D. Switch to a multi-cluster virtual warehouse.


Answer : B
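
Spillage means the operation exceeded the memory available to the warehouse, so the remedy is more memory per execution, i.e. a larger warehouse; adding clusters (option D) only helps with concurrent queries, not a single large one. A one-line sketch (the warehouse name is hypothetical):

ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'LARGE';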

Which system role is recommended for a custom role hierarchy to be ultimately assigned to?

  • A. ACCOUNTADMIN
  • B. SECURITYADMIN
  • C. SYSADMIN
  • D. USERADMIN


Answer : C
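
Attaching custom role hierarchies under SYSADMIN keeps the objects created by those roles visible to the administrators who manage warehouses and databases. A sketch (the custom role name is hypothetical):

CREATE ROLE reporting_role;
GRANT ROLE reporting_role TO ROLE SYSADMIN;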

Which callback function is required within a JavaScript User-Defined Function (UDF) for it to execute successfully?

  • A. initialize()
  • B. processRow()
  • C. handler()
  • D. finalize()


Answer : B
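
processRow() is the one callback a tabular JavaScript function must implement; initialize() and finalize() are optional. A minimal sketch of a tabular function that echoes its input (function and column names are hypothetical):

CREATE OR REPLACE FUNCTION echo_rows(v VARCHAR)
  RETURNS TABLE (out VARCHAR)
  LANGUAGE JAVASCRIPT
  AS $$
  {
    processRow: function (row, rowWriter, context) {
      rowWriter.writeRow({OUT: row.V});  // argument and column names are uppercased
    }
  }
  $$;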

Which Snowflake feature facilitates access to external API services such as geocoders, data transformation, machine learning models, and other custom code?

  • A. Security integration
  • B. External tables
  • C. External functions
  • D. Java User-Defined Functions (UDFs)


Answer : C

A Data Engineer needs to know the details regarding the micro-partition layout for a table named Invoice using a built-in function.
Which query will provide this information?

  • A. SELECT SYSTEM$CLUSTERING_INFORMATION('Invoice');
  • B. SELECT $CLUSTERING_INFORMATION('Invoice');
  • C. CALL SYSTEM$CLUSTERING_INFORMATION('Invoice');
  • D. CALL $CLUSTERING_INFORMATION('Invoice');


Answer : A

A Data Engineer would like to define a file structure for loading and unloading data.
Where can the file structure be defined? (Choose three.)

  • A. COPY command
  • B. MERGE command
  • C. FILE FORMAT object
  • D. PIPE object
  • E. STAGE object
  • F. INSERT command


Answer : ACE
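
A sketch showing the three valid locations (all object names are hypothetical). A named FILE FORMAT object can be attached to a stage and referenced, or overridden, in the COPY command:

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

CREATE OR REPLACE STAGE my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');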
