Certified Data Analyst Associate v1.0 (Certified Data Analyst Associate)

Page:    1 / 3   
Total 45 questions

Which of the following benefits of using Databricks SQL is provided by Data Explorer?

  • A. It can be used to run UPDATE queries to update any tables in a database.
  • B. It can be used to view metadata and data, as well as view/change permissions.
  • C. It can be used to produce dashboards that allow data exploration.
  • D. It can be used to make visualizations that can be shared with stakeholders.
  • E. It can be used to connect to third party BI tools.


Answer : B

The stakeholders.customers table has 15 columns and 3,000 rows of data. The following command is run:

After running SELECT * FROM stakeholders.eur_customers, 15 rows are returned. After the command executes completely, the user logs out of Databricks.
After logging back in two days later, what is the status of the stakeholders.eur_customers view?

  • A. The view remains available and SELECT * FROM stakeholders.eur_customers will execute correctly.
  • B. The view has been dropped.
  • C. The view is not available in the metastore, but the underlying data can be accessed with SELECT * FROM delta.`stakeholders.eur_customers`.
  • D. The view remains available but attempting to SELECT from it results in an empty result set because data in views are automatically deleted after logging out.
  • E. The view has been converted into a table.


Answer : B

A data analyst created and is the owner of the managed table my_table. They now want to change ownership of the table to a single other user using Data Explorer.
Which of the following approaches can the analyst use to complete the task?

  • A. Edit the Owner field in the table page by removing their own account
  • B. Edit the Owner field in the table page by selecting All Users
  • C. Edit the Owner field in the table page by selecting the new owner's account
  • D. Edit the Owner field in the table page by selecting the Admins group
  • E. Edit the Owner field in the table page by removing all access


Answer : C

A data analyst has a managed table table_name in database database_name. They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist.
Which of the following commands can the analyst use to complete the task without producing an error?

  • A. DROP DATABASE database_name;
  • B. DROP TABLE database_name.table_name;
  • C. DELETE TABLE database_name.table_name;
  • D. DELETE TABLE table_name FROM database_name;
  • E. DROP TABLE table_name FROM database_name;


Answer : B

A data analyst runs the following command:

SELECT age, country
FROM my_table
WHERE age >= 75 AND country = 'canada';
Which of the following tables represents the output of the above command?

  • A.
  • B.
  • C.
  • D.
  • E.


Answer : E

A data analyst runs the following command:
INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;
What is the result of running this command?

  • A. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.
  • B. The command fails because it is written incorrectly.
  • C. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.
  • D. The suppliers table now contains the data from the new_suppliers table, and the new_suppliers table now contains the data from the suppliers table.
  • E. The suppliers table now contains only the data from the new_suppliers table.


Answer : C
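
INSERT INTO appends rows to the target table and does not remove duplicates; in Databricks SQL, TABLE stakeholders.new_suppliers is accepted as a query and is shorthand for SELECT * FROM stakeholders.new_suppliers. The sketch below illustrates the append behavior under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on Databricks, it uses INSERT INTO ... SELECT * because SQLite has no TABLE shorthand, it drops the stakeholders schema prefix, and the column names and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Databricks SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas and rows, not taken from the exam question.
    CREATE TABLE suppliers (supplier_id INTEGER, name TEXT);
    CREATE TABLE new_suppliers (supplier_id INTEGER, name TEXT);
    INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO new_suppliers VALUES (2, 'Globex'), (3, 'Initech');
""")

# SQLite lacks the TABLE shorthand, so SELECT * is used in its place.
conn.execute("INSERT INTO suppliers SELECT * FROM new_suppliers")

supplier_rows = conn.execute(
    "SELECT supplier_id, name FROM suppliers ORDER BY supplier_id"
).fetchall()
print(supplier_rows)
```

The row (2, 'Globex') exists in both tables, so it appears twice in suppliers after the insert, and new_suppliers itself is left unchanged.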

A data engineer is working with a nested array column products in table transactions. They want to expand the table so each unique item in products for each row has its own row where the transaction_id column is duplicated as necessary.
They are using the following incomplete command:

Which of the following lines of code can they use to fill in the blank in the above code block so that it successfully completes the task?

  • A. array_distinct(products)
  • B. explode(products)
  • C. reduce(products)
  • D. array(products)
  • E. flatten(products)


Answer : B

A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication.
Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?

  • A. CREATE TABLE table_silver AS
    SELECT DISTINCT *
    FROM table_bronze;
  • B. CREATE TABLE table_silver AS
    INSERT *
    FROM table_bronze;
  • C. CREATE TABLE table_silver AS
    MERGE DEDUPLICATE *
    FROM table_bronze;
  • D. INSERT INTO TABLE table_silver
    SELECT * FROM table_bronze;
  • E. INSERT OVERWRITE TABLE table_silver
    SELECT * FROM table_bronze;


Answer : A
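
The CREATE TABLE ... AS SELECT DISTINCT * pattern from option A can be tried out locally. This is a minimal sketch that assumes SQLite (via Python's sqlite3 module) as a stand-in for Databricks SQL; the columns id and value and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Databricks SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical bronze table containing one fully duplicated row.
    CREATE TABLE table_bronze (id INTEGER, value TEXT);
    INSERT INTO table_bronze VALUES (1, 'a'), (1, 'a'), (2, 'b');

    -- Option A: write the deduplicated rows to a new table.
    CREATE TABLE table_silver AS
    SELECT DISTINCT *
    FROM table_bronze;
""")

silver_rows = conn.execute(
    "SELECT id, value FROM table_silver ORDER BY id"
).fetchall()
bronze_count = conn.execute("SELECT count(*) FROM table_bronze").fetchone()[0]
print(silver_rows, bronze_count)
```

SELECT DISTINCT * only collapses rows that match in every column, and the source table keeps its duplicates; only the new table is deduplicated.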

A business analyst has been asked to create a data entity/object called sales_by_employee. It should always stay up-to-date when new data are added to the sales table. The new entity should have the columns sales_person, which will be the name of the employee from the employees table, and sales, which will be all sales for that particular sales person. Both the sales table and the employees table have an employee_id column that is used to identify the sales person.
Which of the following code blocks will accomplish this task?

  • A.
  • B.
  • C.
  • D.
  • E.


Answer : D

A data analyst has been asked to use the below table sales_table to get the percentage rank of products within region by the sales:

The result of the query should look like this:

Which of the following queries will accomplish this task?

  • A.
  • B.
  • C.
  • D.
  • E.


Answer : B

In which of the following situations should a data analyst use higher-order functions?

  • A. When custom logic needs to be applied to simple, unnested data
  • B. When custom logic needs to be converted to Python-native code
  • C. When custom logic needs to be applied at scale to array data objects
  • D. When built-in functions are taking too long to perform tasks
  • E. When built-in functions need to run through the Catalyst Optimizer


Answer : C

Consider the following two statements:
Statement 1:

Statement 2:

Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?

  • A. The first statement will return all data from the customers table and matching data from the orders table. The second statement will return all data from the orders table and matching data from the customers table. Any missing data will be filled in with NULL.
  • B. When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.
  • C. There is no difference between the result sets for both statements.
  • D. Both statements will fail because Databricks SQL does not support those join types.
  • E. When the first statement is run, all rows from the customers table will be returned and only the customer_id from the orders table will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.


Answer : B
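
Statement 1 and Statement 2 are not reproduced in this text, so the sketch below only illustrates the two result sets described in option B: customers with at least one matching order, and customers with none. Databricks SQL can express these with LEFT SEMI JOIN and LEFT ANTI JOIN; SQLite has neither keyword, so EXISTS and NOT EXISTS are swapped in here. The name column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Databricks SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: customer 1 has two orders, customer 2 has none.
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Semi-join semantics: customers with at least one matching order.
matched = conn.execute("""
    SELECT c.customer_id
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

# Anti-join semantics: customers with no matching order.
unmatched = conn.execute("""
    SELECT c.customer_id
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()
print(matched, unmatched)
```

Customer 1 appears only once in the first result even though it has two orders, which is what separates semi-join semantics from an inner join.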

A data analyst has created a user-defined function using the following line of code:
CREATE FUNCTION price(spend DOUBLE, units DOUBLE)
RETURNS DOUBLE
RETURN spend / units;
Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?

  • A. SELECT PRICE customer_spend, customer_units AS customer_price
    FROM customer_summary
  • B. SELECT price
    FROM customer_summary
  • C. SELECT function(price(customer_spend, customer_units)) AS customer_price
    FROM customer_summary
  • D. SELECT double(price(customer_spend, customer_units)) AS customer_price
    FROM customer_summary
  • E. SELECT price(customer_spend, customer_units) AS customer_price
    FROM customer_summary


Answer : E
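
Option E calls the user-defined function like any built-in scalar function. The sketch below shows that calling pattern under stated assumptions: SQLite does not support CREATE FUNCTION, so a Python function registered through sqlite3's create_function stands in for the SQL UDF, and the sample rows are hypothetical.

```python
import sqlite3

def price(spend, units):
    """Python stand-in for the SQL UDF: RETURN spend / units."""
    return spend / units

# In-memory SQLite database standing in for Databricks SQL (assumption).
conn = sqlite3.connect(":memory:")
# Register the function under the name "price" with two arguments.
conn.create_function("price", 2, price)
conn.executescript("""
    -- Hypothetical sample rows.
    CREATE TABLE customer_summary (customer_spend REAL, customer_units REAL);
    INSERT INTO customer_summary VALUES (100.0, 4.0), (45.0, 9.0);
""")

# Same SELECT shape as option E, with an ORDER BY added for a stable result.
prices = conn.execute("""
    SELECT price(customer_spend, customer_units) AS customer_price
    FROM customer_summary
    ORDER BY customer_spend
""").fetchall()
print(prices)
```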

A data analyst has been asked to count the number of customers in each region and has written the following query:

If there is a mistake in the query, which of the following describes the mistake?

  • A. The query is using count(*), which will count all the customers in the customers table, no matter the region.
  • B. The query is missing a GROUP BY region clause.
  • C. The query is using ORDER BY, which is not allowed in an aggregation.
  • D. There are no mistakes in the query.
  • E. The query is selecting region, but region should only occur in the ORDER BY clause.


Answer : B
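
The original query is not reproduced in this text, but the answer choices indicate that it selects region and count(*) from the customers table and uses ORDER BY. A corrected version might look like the sketch below, which assumes SQLite (via Python's sqlite3 module) as a stand-in for Databricks SQL; the customer_id column, the customer_count alias, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Databricks SQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows.
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO customers VALUES (1, 'east'), (2, 'east'), (3, 'west');
""")

# GROUP BY region produces one count per region instead of one overall count.
region_counts = conn.execute("""
    SELECT region, count(*) AS customer_count
    FROM customers
    GROUP BY region
    ORDER BY region
""").fetchall()
print(region_counts)
```

One difference to keep in mind: SQLite tolerates a bare column next to an aggregate without GROUP BY, whereas Databricks SQL rejects such a query, so the stand-in cannot be used to reproduce the error itself.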

A data analyst is processing a complex aggregation on a table with zero null values and their query returns the following result:

Which of the following queries did the analyst run to obtain the above result?

  • A.
  • B.
  • C.
  • D.
  • E.


Answer : E
