Databricks Data Engineer Associate Dumps

Uploaded by

sachantaru

1. A new data engineering team has been assigned to work on a project. The team will need
access to database customers in order to see what tables already exist. The team has its own
group team. Which of the following commands can be used to grant the necessary permission
on the entire database to the new team?

A. GRANT VIEW ON CATALOG customers TO team;

B. GRANT CREATE ON DATABASE customers TO team;

C. GRANT USAGE ON CATALOG team TO customers;

D. GRANT CREATE ON DATABASE team TO customers;

E. GRANT USAGE ON DATABASE customers TO team;
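
The options above all follow the same statement shape and differ only in the privilege, the securable, and the grantee. As a rough illustration of that shape, here is a minimal Python sketch; `build_grant` is a hypothetical helper (not part of any Databricks library), and it only formats strings without running anything against a workspace.

```python
def build_grant(privilege: str, securable_type: str, securable_name: str, principal: str) -> str:
    """Format a GRANT statement: privilege, then the securable after ON, then the grantee after TO.

    Hypothetical helper for illustration only; it does not validate privilege names.
    """
    return f"GRANT {privilege} ON {securable_type} {securable_name} TO {principal};"


# In the question, the securable is the database `customers` and the grantee is the group `team`.
for privilege in ("USAGE", "CREATE", "ALL PRIVILEGES"):
    print(build_grant(privilege, "DATABASE", "customers", "team"))
```

Note that swapping the last two arguments produces a statement that grants on an object named `team` to a principal named `customers`, which is the pattern seen in options C and D.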

2. A new data engineering team, team, has been assigned to an ELT project. The new data
engineering team will need full privileges on the database customers to fully manage the project.
Which of the following commands can be used to grant full permissions on the database to the
new data engineering team?

A. GRANT USAGE ON DATABASE customers TO team;

B. GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C. GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D. GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E. GRANT ALL PRIVILEGES ON DATABASE customers TO team;

3. A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly
because the clusters take a long time to start. Which of the following actions can the data
engineer perform to improve the start up time for the clusters used for the Job?

A. They can use endpoints available in Databricks SQL

B. They can use jobs clusters instead of all-purpose clusters

C. They can configure the clusters to be single-node

D. They can use clusters that are from a cluster pool

E. They can configure the clusters to autoscale for larger data sizes

4. A single Job runs two notebooks as two separate tasks. A data engineer has noticed that
one of the notebooks is running slowly in the Job's current run. The data engineer asks a tech
lead for help in identifying why this might be the case. Which of the following approaches can
the tech lead use to identify why the notebook is running slowly as part of the Job?

A. They can navigate to the Runs tab in the Jobs UI to immediately review the processing
notebook.

B. They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the
processing notebook.

C. They can navigate to the Runs tab in the Jobs UI and click on the active run to review the
processing notebook.

D. There is no way to determine why a Job task is running slowly.

E. They can navigate to the Tasks tab in the Jobs UI to immediately review the processing
notebook.

5. A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the
input data to an ELT job. The ELT job has its Databricks SQL query that returns the number of
input records containing unexpected NULL values. The data engineer wants their entire team to
be notified via a messaging webhook whenever this value reaches 100. Which of the following
approaches can the data engineer use to notify their entire team via a messaging webhook
whenever the number of NULL values reaches 100?

A. They can set up an Alert with a custom template.

B. They can set up an Alert with a new email alert destination.

C. They can set up an Alert with a new webhook alert destination.

D. They can set up an Alert with one-time notifications.

E. They can set up an Alert without notifications.
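
The scenario in this question boils down to a threshold check followed by a message sent to a webhook. The Python sketch below illustrates only that logic; it is not the Databricks SQL Alerts feature. The function names and the payload shape are assumptions made for illustration, and no network request is sent.

```python
import json

THRESHOLD = 100  # the value named in the question


def should_notify(null_count: int, threshold: int = THRESHOLD) -> bool:
    """Return True once the monitored value reaches the threshold."""
    return null_count >= threshold


def build_payload(null_count: int, threshold: int = THRESHOLD) -> str:
    """Build a JSON body for a messaging webhook (hypothetical payload shape)."""
    message = f"Input records with unexpected NULL values: {null_count} (threshold {threshold})"
    return json.dumps({"text": message})


# Simulated query results over three refreshes.
for count in (42, 99, 100):
    if should_notify(count):
        print(build_payload(count))
```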

6. A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day,
but they only want the associated SQL endpoint to be running when it is necessary. Which of the
following approaches can the data engineer use to minimize the total running time of the SQL
endpoint used in the refresh schedule of their dashboard?

A. They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

B. They can set up the dashboard's SQL endpoint to be serverless.

C. They can turn on the Auto Stop feature for the SQL endpoint.

D. They can reduce the cluster size of the SQL endpoint.

E. They can ensure the dashboard's SQL endpoint is not one of the included query's SQL
endpoint.

7. A data analysis team has noticed that their Databricks SQL queries are running too slowly
when connected to their always-on SQL endpoint. They claim that this issue is present when
many members of the team are running small queries simultaneously. They ask the data
engineering team for help. The data engineering team notices that each of the team's queries
uses the same SQL endpoint. Which of the following approaches can the data engineering team
use to improve the latency of the team's queries?

A. They can increase the cluster size of the SQL endpoint.

B. They can increase the maximum bound of the SQL endpoint's scaling range.

C. They can turn on the Auto Stop feature for the SQL endpoint.

D. They can turn on the Serverless feature for the SQL endpoint.

E. They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance
Policy to "Reliability Optimized."

8. An engineering manager wants to monitor the performance of a recent project using a
Databricks SQL query. For the first week following the project's release, the manager wants the
query results to be updated every minute. However, the manager is concerned that the compute
resources used for the query will be left running and cost the organization a lot of money
beyond the first week of the project's release. Which of the following approaches can the
engineering team use to ensure the query does not cost the organization any money beyond the
first week of the project's release?

A. They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.

B. They can set the query's refresh schedule to end after a certain number of refreshes.

C. They cannot ensure the query does not cost the organization money beyond the first week of
the project's release.

D. They can set a limit to the number of individuals that are able to manage the query's refresh
schedule.

E. They can set the query's refresh schedule to end on a certain date in the query scheduler.

9. A data engineer has a single-task Job that runs each morning before they begin working.
After identifying an upstream data issue, they need to set up another task to run a new
notebook prior to the original task. Which of the following approaches can the data engineer use
to set up the new task?

A. They can clone the existing task in the existing Job and update it to run the new notebook.

B. They can create a new task in the existing Job and then add it as a dependency of the original
task.

C. They can create a new task in the existing Job and then add the original task as a
dependency of the new task.

D. They can create a new job from scratch and add both tasks to run concurrently.

E. They can clone the existing task to a new Job and then edit it to run the new notebook.
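
Several options above turn on which task is declared as a dependency of which. As a general illustration of dependency ordering (not the Databricks Jobs API), the following Python sketch uses the standard library's `graphlib` module; the task names `new_task` and `original_task` are hypothetical labels for the two tasks in the question.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on (the tasks that must finish first).
dependencies = {
    "original_task": {"new_task"},  # original_task lists new_task as a dependency
    "new_task": set(),              # new_task has no upstream tasks
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

A task always appears after the tasks it depends on, so reversing the mapping reverses the run order.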

10. A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured
the pipeline to drop invalid records at each table. They notice that some data is being dropped
due to quality concerns at some point in the DLT pipeline. They would like to determine at which
table in their pipeline the data is being dropped. Which of the following approaches can the data
engineer take to identify the table that is dropping the records?

A. They can set up separate expectations for each table when developing their DLT pipeline.

B. They cannot determine which table is dropping the records.

C. They can set up DLT to notify them via email when records are dropped.

D. They can navigate to the DLT pipeline page, click on each table, and view the data quality
statistics.

E. They can navigate to the DLT pipeline page, click on the "Error" button, and review the present
errors.
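
The question describes a pipeline where each table drops records that fail a quality rule. The plain Python sketch below mimics that idea by counting how many records each stage drops. It does not use the DLT API; the stage names, sample rows, and rules are all invented for illustration.

```python
def apply_expectation(rows, predicate):
    """Keep rows that satisfy the predicate and report how many were dropped."""
    kept = [row for row in rows if predicate(row)]
    return kept, len(rows) - len(kept)


# Hypothetical sample data and per-table rules.
rows = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]
stages = [
    ("bronze", lambda row: True),                   # no rule at this stage
    ("silver", lambda row: row["id"] is not None),  # drop rows with a missing id
    ("gold", lambda row: row["amount"] >= 0),       # drop rows with a negative amount
]

dropped = {}
for name, predicate in stages:
    rows, dropped[name] = apply_expectation(rows, predicate)

print(dropped)
```

Tracking a drop count per stage is what makes it possible to say where in the pipeline records are lost.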

11. Which of the following Structured Streaming queries is performing a hop from a Silver table
to a Gold table?

A.
(spark.readStream.load(rawSalesLocation)

.writeStream