

Databricks Exam

Which one of the following is a Databricks concept? * 2 points

A. Workspace

Data Management

Authentication and authorization


A table customerLocations exists with the following schema:

id STRING, date STRING, city STRING, country STRING

A senior data engineer wants to create a new table from this table using
the following command:

CREATE TABLE customersPerCountry AS
SELECT country, COUNT(*) AS customers
FROM customerLocations
GROUP BY country;

A junior data engineer asks why the schema is not being declared for the
new table. Which of the following responses explains why declaring the
schema is not necessary? * 2 points

CREATE TABLE AS SELECT statements result in tables that do not support schemas.

CREATE TABLE AS SELECT statements adopt schema details from the source table
and query.

CREATE TABLE AS SELECT statements assign all columns the type STRING.

CREATE TABLE AS SELECT statements result in tables where schemas are optional.

CREATE TABLE AS SELECT statements infer the schema by scanning the data.
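
For reference, a minimal sketch of how a CTAS statement picks up its schema (assuming a SparkSession named spark and the tables above exist):

# CTAS derives the new table's schema from the query result,
# so no explicit column definitions are required.
spark.sql("""
    CREATE TABLE customersPerCountry AS
    SELECT country, COUNT(*) AS customers
    FROM customerLocations
    GROUP BY country
""")
spark.table("customersPerCountry").printSchema()
# country: string, customers: bigint -- adopted from the source table and query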

Which of the following describes a scenario in which a data engineer will
want to use a Job cluster instead of an all-purpose cluster? * 2 points

A. An ad-hoc analytics report needs to be developed while minimizing compute costs.

B. A data team needs to collaborate on the development of a machine learning model.

C. An automated workflow needs to be run every 30 minutes.

D. A Databricks SQL query needs to be scheduled for upward reporting.

E. A data engineer needs to manually investigate a production error.


The code block below contains an error. The code block is intended to
return a DataFrame containing only the rows from DataFrame storesDF
where the value in storesDF's "sqft" column is less than or equal to
25,000. Assume storesDF is the only defined variable. Identify the
error. * 2 points

Code block:

storesDF.filter(sqft <= 25000)

The column name sqft needs to be quoted like storesDF.filter("sqft" <= 25000).

The sign in the logical condition inside filter() needs to be changed from <= to >=.

The sign in the logical condition inside filter() needs to be changed from <= to >.

The column name sqft needs to be quoted and wrapped in the col() function like
storesDF.filter(col("sqft") <= 25000).

The column name sqft needs to be wrapped in the col() function like
storesDF.filter(col(sqft) <= 25000)
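
For reference, a minimal sketch of the corrected filter (assuming storesDF is defined and PySpark is available):

from pyspark.sql.functions import col

# sqft is not a Python variable, so it must be referenced as a Column
# object; a quoted SQL expression string also works.
storesDF.filter(col("sqft") <= 25000)
storesDF.filter("sqft <= 25000")   # equivalent string-expression form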

A data engineering team needs to query a Delta table to extract rows that
all meet the same condition. However, the team has noticed that the query
is running slowly. The team has already tuned the size of the data files.
Upon investigating, the team has concluded that the rows meeting the
condition are sparsely located throughout each of the data files. Based on
the scenario, which of the following optimization techniques could speed
up the query? * 2 points

Write as a Parquet file

Tuning the file size

Data skipping

Bin-packing


Z-Ordering
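
As an illustration of the technique, a Z-Ordering pass on a hypothetical Delta table and filter column might look like this:

# OPTIMIZE with ZORDER BY co-locates rows sharing values in the chosen
# column, so data skipping can prune far more files for that predicate.
spark.sql("OPTIMIZE events ZORDER BY (event_type)")  # table/column names are hypothetical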


Which of the following DataFrame operations is always classified as a
narrow transformation? * 2 points

DataFrame.sort()

DataFrame.distinct()

DataFrame.repartition()

DataFrame.select()

DataFrame.join()
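
For context, a quick sketch of the distinction (assuming storesDF is defined):

storesDF.select("storeId")   # narrow: each output partition depends on one input partition
storesDF.sort("storeId")     # wide: ordering requires a shuffle across partitions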

Which of the following operations fails to return a DataFrame where every
row is unique? * 2 points

DataFrame.distinct()

DataFrame.drop_duplicates(subset = None)

DataFrame.drop_duplicates()

DataFrame.dropDuplicates()

DataFrame.drop_duplicates(subset = "all")
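
For reference, a sketch of the deduplication variants (assuming storesDF is defined); note that subset expects a list of column names, not the string "all":

storesDF.distinct()                    # full-row deduplication
storesDF.dropDuplicates()              # same result as distinct()
storesDF.drop_duplicates(subset=None)  # alias; subset=None means all columns
storesDF.dropDuplicates(["storeId"])   # subset dedup: rows may still repeat in other columns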


A data engineer is overwriting data in a table by deleting the table and
recreating the table. Another data engineer suggests that this is inefficient
and the table should simply be overwritten instead. Which of the following
reasons to overwrite the table instead of deleting and recreating the table
is incorrect? * 2 points

Overwriting a table is efficient because no files need to be deleted.

Overwriting a table results in a clean table history for logging and audit purposes.

Overwriting a table maintains the old version of the table for Time Travel.

Overwriting a table is an atomic operation and will not leave the table in an
unfinished state.

Overwriting a table allows for concurrent queries to be completed while in progress.
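
For context, a minimal sketch of an overwrite that preserves table history (table names are hypothetical):

# CREATE OR REPLACE rewrites the table contents in a single atomic commit;
# earlier versions stay in the Delta transaction log for Time Travel.
spark.sql("CREATE OR REPLACE TABLE events AS SELECT * FROM staged_events")
spark.sql("DESCRIBE HISTORY events").show()  # prior versions remain visible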

Which of the following locations hosts the driver and worker nodes of a
Databricks-managed cluster? * 2 points

E. Databricks web application

C. Databricks Filesystem

A. Data plane

B. Control plane

D. JDBC data source


Which of the following code blocks returns a DataFrame containing all
columns from DataFrame storesDF except for column sqft and column
customerSatisfaction? * 2 points

A sample of DataFrame storesDF is below: [sample not reproduced]

storesDF.drop("sqft", "customerSatisfaction")

storesDF.select("storeId", "open", "openDate", "division")

storesDF.select(-col(sqft), -col(customerSatisfaction))

storesDF.drop(sqft, customerSatisfaction)

storesDF.drop(col(sqft), col(customerSatisfaction))

Which of the following code blocks returns a DataFrame containing only
column storeId and column division from DataFrame storesDF? * 2 points

storesDF.select("storeId").select("division")

storesDF.select(storeId, division)

storesDF.select("storeId", "division")

storesDF.select(col("storeId", "division"))

storesDF.select(storeId).select(division)


A data engineering team has created a series of tables using Parquet
data stored in an external system. The team is noticing that after
appending new rows to the data in the external system, their queries within
Databricks are not returning the new rows. They identify the caching of the
previous data as the cause of this issue. Which of the following
approaches will ensure that the data returned by queries is always
up-to-date? * 2 points

The tables should be altered to include metadata to not cache

The tables should be refreshed in the writing cluster before the next query is run

The tables should be stored in a cloud-based external system

The tables should be converted to the Delta format

The tables should be updated before the next query is run
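
As a sketch of the suggested fix (path is hypothetical), converting the Parquet data to Delta lets every query resolve the latest state from the transaction log:

# CONVERT TO DELTA upgrades the table in place; subsequent queries read
# the transaction log instead of stale cached file listings.
spark.sql("CONVERT TO DELTA parquet.`/mnt/external/sales`")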

A data engineer has created a Delta table as part of a data pipeline.
Downstream data analysts now need SELECT permission on the Delta
table. Assuming the data engineer is the Delta table owner, which part of
the Databricks Lakehouse Platform can the data engineer use to grant the
data analysts the appropriate access? * 2 points

A. Repos

B. Jobs

C. Data Explorer

D. Databricks Filesystem

E. Dashboards
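
For context, the equivalent permission can also be expressed in SQL (table and group names are hypothetical):

spark.sql("GRANT SELECT ON TABLE pipeline_output TO `data-analysts`")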


Which of the following operations will trigger evaluation? * 2 points

DataFrame.filter()

DataFrame.distinct()

DataFrame.intersect()

DataFrame.join()

DataFrame.count()
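
For reference, a sketch of the lazy/eager distinction (assuming storesDF is defined):

from pyspark.sql.functions import col

filtered = storesDF.filter(col("sqft") <= 25000)  # transformation: builds a plan, no work yet
n = filtered.count()                              # action: triggers evaluation of the plan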

A data engineer has configured a Structured Streaming job to read from a
table, manipulate the data, and then perform a streaming write into a new
table. The code block used by the data engineer is below:

(spark.table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    ._____
    .table("new_sales")
)

If the data engineer only wants the query to execute a single micro-batch
to process all of the available data, which of the following lines of code
should the data engineer use to fill in the blank? * 2 points

trigger(once=True)

trigger(continuous="once")

processingTime("once")

trigger(processingTime="once")

processingTime(1)
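
For reference, a sketch of the completed writer (assuming col is imported and checkpointPath is defined; toTable is the PySpark method name for writing a streaming table, where the question writes .table):

from pyspark.sql.functions import col

(spark.table("sales")
    .withColumn("avg_price", col("sales") / col("units"))
    .writeStream
    .option("checkpointLocation", checkpointPath)
    .outputMode("complete")
    .trigger(once=True)        # one micro-batch over all available data, then stop
    .toTable("new_sales"))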


A data engineer has developed a code block to perform a streaming read
on a data source. The code block is below:

(spark
    .read
    .schema(schema)
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load(dataSource)
)

The code block is returning an error. Which of the following changes
should be made to the code block to configure the block to successfully
perform a streaming read? * 2 points

A new .stream line should be added after the .load(dataSource) line.

A new .stream line should be added after the spark line.

The .read line should be replaced with .readStream.

A new .stream line should be added after the .read line.

The .format("cloudFiles") line should be replaced with .format("stream").


Which of the following describes a benefit of a data lakehouse that is
unavailable in a traditional data warehouse? * 2 points

E. A data lakehouse enables both batch and streaming analytics.

D. A data lakehouse utilizes proprietary storage formats for data.

A. A data lakehouse provides a relational system of data management.

C. A data lakehouse couples storage and compute for complete control.

B. A data lakehouse captures snapshots of data for version control purposes.


A data engineer has written the following query:

SELECT * FROM json.`/path/to/json/file.json`;

The data engineer asks a colleague for help to convert this query for use in
a Delta Live Tables (DLT) pipeline. The query should create the first table in
the DLT pipeline. Which of the following describes the change the
colleague needs to make to the query? * 2 points

A. They need to add a COMMENT line at the beginning of the query.

B. They need to add a CREATE LIVE TABLE table_name AS line at the beginning of
the query.

C. They need to add a live. prefix prior to json. in the FROM line.

D. They need to add a CREATE DELTA LIVE TABLE table_name AS line at the
beginning of the query.

E. They need to add the cloud_files(...) wrapper to the JSON file path.
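
For context, a minimal sketch of the same first table using the DLT Python API instead of the SQL CREATE LIVE TABLE form (the table name is hypothetical; import dlt is only available inside a DLT pipeline):

import dlt

@dlt.table(name="raw_events")  # hypothetical table name
def raw_events():
    # Equivalent to: CREATE LIVE TABLE raw_events AS SELECT * FROM json.`...`
    return spark.read.json("/path/to/json/file.json")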

A data analyst has provided a data engineering team with the following
Spark SQL query:

SELECT district, avg(sales) FROM store_sales_20220101 GROUP BY district;

The data analyst would like the data engineering team to run this query
every day. The date at the end of the table name (20220101) should
automatically be replaced with the current date each time the query is run.
Which of the following approaches could be used by the data engineering
team to efficiently automate this process? * 2 points

They could replace the string-formatted date in the table with a timestamp-formatted
date.

They could request that the data analyst rewrites the query to be run less frequently.

They could manually replace the date within the table name with the current day’s
date.

They could pass the table into PySpark and develop a robustly tested module on the
existing query

They could wrap the query using PySpark and use Python’s string variable system to
automatically update the table name.
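
For reference, a minimal sketch of the PySpark wrapper approach (assuming the daily tables follow the store_sales_YYYYMMDD naming pattern):

from datetime import date

table_name = f"store_sales_{date.today():%Y%m%d}"  # e.g. store_sales_20220101
resultDF = spark.sql(
    f"SELECT district, avg(sales) FROM {table_name} GROUP BY district"
)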


Which of the following is the default storage level for persist() for a
non-streaming DataFrame/Dataset? * 2 points

MEMORY_AND_DISK

MEMORY_AND_DISK_SER

DISK_ONLY

MEMORY_ONLY_SER

MEMORY_ONLY
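
For reference, a minimal sketch (the default shown matches the Spark versions this exam targets):

from pyspark import StorageLevel

storesDF.persist()  # equivalent to persist(StorageLevel.MEMORY_AND_DISK)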

A junior data engineer has ingested a JSON file into a table raw_table with
the following schema:

cart_id STRING, items ARRAY<STRING>

The junior data engineer would like to unnest the items column in
raw_table to result in a new table with the following schema:

cart_id STRING, item_id STRING

Which of the following commands should the junior data engineer run to
complete this task? * 2 points

SELECT cart_id, filter(items) AS item_id FROM raw_table;

SELECT cart_id, flatten(items) AS item_id FROM raw_table;

SELECT cart_id, reduce(items) AS item_id FROM raw_table;

SELECT cart_id, explode(items) AS item_id FROM raw_table;

SELECT cart_id, slice(items) AS item_id FROM raw_table;
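
For reference, explode() emits one output row per array element, which matches the desired unnesting:

from pyspark.sql.functions import explode

spark.sql("SELECT cart_id, explode(items) AS item_id FROM raw_table")
# DataFrame API equivalent:
spark.table("raw_table").select("cart_id", explode("items").alias("item_id"))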


A data architect has determined that a table of the following format is
necessary:

[Refer to image]

Which of the following code blocks uses SQL DDL commands to create an
empty Delta table in the above format regardless of whether a table
already exists with this name? * 2 points

CREATE OR REPLACE TABLE table_name AS SELECT id STRING, birthDate DATE,
avgRating FLOAT USING DELTA

CREATE OR REPLACE TABLE table_name (id STRING, birthDate DATE, avgRating FLOAT)

CREATE TABLE IF NOT EXISTS table_name (id STRING, birthDate DATE, avgRating FLOAT)

CREATE TABLE table_name AS SELECT id STRING, birthDate DATE, avgRating FLOAT

CREATE OR REPLACE TABLE table_name WITH COLUMNS (id STRING, birthDate DATE,
avgRating FLOAT) USING DELTA
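
For reference, a sketch of the idempotent DDL (on Databricks, Delta is the default table format, so USING DELTA is implicit):

spark.sql("""
    CREATE OR REPLACE TABLE table_name (
        id STRING,
        birthDate DATE,
        avgRating FLOAT
    )
""")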

Which of the following code blocks returns a new DataFrame from
DataFrame storesDF where column numberOfManagers is the constant
integer 1? * 2 points

storesDF.withColumn("numberOfManagers", lit("1"))

storesDF.withColumn("numberOfManagers", lit(1))

storesDF.withColumn("numberOfManagers", IntegerType(1))

storesDF.withColumn("numberOfManagers", 1)

storesDF.withColumn("numberOfManagers", col(1))


The code block shown below should extract the value for column sqft
from the first row of DataFrame storesDF. Choose the response that
correctly fills in the numbered blanks within the code block to complete
this task. * 2 points

Code block:

__1__.__2__.__3__

1. storesDF 2. first 3. ["sqft"]

1. storesDF 2. first() 3. sqft

1. storesDF 2. first 3. col("sqft")

1. storesDF 2. first() 3. col("sqft")

1. storesDF 2. first 3. sqft
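
For reference, first() returns a Row, which can be indexed by column name:

sqft_value = storesDF.first()["sqft"]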

Which of the following statements describes Delta Lake? * 2 points

A. Delta Lake is an open source analytics engine used for big data workloads.

B. Delta Lake is an open format storage layer that delivers reliability, security, and
performance.

C. Delta Lake is an open source platform to help manage the complete machine
learning lifecycle.

D. Delta Lake is an open source data storage format for distributed data.

E. Delta Lake is an open format storage layer that processes data
