PDE Course Workbook
Your Professional Data Engineer Journey
Course Workbook
Certification Exam Guide Sections
1 Designing Data Processing Systems
You are migrating on-premises data to a data warehouse on Google Cloud. This data will be made available to business analysts. Local regulations require that customer information including credit card numbers, phone numbers, and email IDs be captured, but not used in analysis. You need to use a reliable, recommended solution to redact the sensitive data.

What should you do?

A. Use the Cloud Data Loss Prevention API (DLP API) to identify and redact data that matches infoTypes like credit card numbers, phone numbers, and email IDs.
B. Delete all columns with a title similar to "credit card," "phone," and "email."
C. Create a regular expression to identify and delete patterns that resemble credit card numbers, phone numbers, and email IDs.
D. Use the Cloud Data Loss Prevention API (DLP API) to perform date shifting of any entries with credit card numbers, phone numbers, and email IDs.
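For hands-on context, this is a minimal sketch of what option A could look like with the google-cloud-dlp Python client; the project ID and sample text are placeholders, not part of the question.

```python
from google.cloud import dlp_v2

def redact_sensitive_text(project_id: str, text: str) -> str:
    """Replaces credit card numbers, phone numbers, and email addresses with their infoType names."""
    client = dlp_v2.DlpServiceClient()

    inspect_config = {
        "info_types": [
            {"name": "CREDIT_CARD_NUMBER"},
            {"name": "PHONE_NUMBER"},
            {"name": "EMAIL_ADDRESS"},
        ]
    }
    # Replace each finding with its infoType name, e.g. "[EMAIL_ADDRESS]".
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {"primitive_transformation": {"replace_with_info_type_config": {}}}
            ]
        }
    }

    response = client.deidentify_content(
        request={
            "parent": f"projects/{project_id}",
            "inspect_config": inspect_config,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value

# Example call with placeholder project and text:
print(redact_sensitive_text("my-project", "Call 555-0100 or email jane@example.com"))
```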
1.1 Diagnostic Question 04
Your data and applications reside in multiple geographies on Google Cloud. Some regional laws require you to hold your own keys outside of the cloud provider environment, whereas other laws are less restrictive and allow storing keys with the same provider who stores the data. The management of these keys has increased in complexity, and you need a solution that can centrally manage all your keys.

What should you do?

A. Enable confidential computing for all your virtual machines.
B. Store keys in Cloud Key Management Service (Cloud KMS), and reduce the number of days for automatic key rotation.
C. Store your keys in Cloud Hardware Security Module (Cloud HSM), and retrieve keys from it when required.
D. Store your keys on a supported external key management partner, and use Cloud External Key Manager (Cloud EKM) to get keys when required.
1.1 Designing for security and compliance
Courses
Modernizing Data Lakes and Data Warehouses with Google Cloud
● Introduction to Data Engineering
● Building a Data Lake
● Building a Data Warehouse
Smart Analytics, Machine Learning, and AI on Google Cloud
● Prebuilt ML Model APIs for Unstructured Data
Serverless Data Processing with Dataflow: Foundations
● IAM, Quotas, and Permissions
● Security
BigQuery Fundamentals for Redshift Professionals
● BigQuery and Google Cloud IAM

Skill Badges
Implement Load Balancing on Compute Engine
Prepare Data for ML APIs on Google Cloud

Documentation
Import data from Google Cloud into a secured BigQuery data warehouse
IAM basic and predefined roles reference
Creating and managing Folders
Resource hierarchy
Sensitive Data Protection InfoType detector reference
Cloud External Key Manager
Hold your own key with Google Cloud External Key Manager
Evolving Cloud External Key Manager – What’s new with Cloud EKM | Google Cloud Blog
1.2 Diagnostic Question 05
Cymbal Retail has a team of business analysts who need to fix and enhance a set of large input data files. For example, duplicates need to be removed, erroneous rows should be deleted, and missing data should be added. These steps need to be performed on all the present set of files and any files received in the future in a repeatable, automated process. The business analysts are not adept at programming.

A. Load the data into Dataprep, explore the data, and edit the transformations as needed.
B. Create a Dataproc job to perform the data fixes you need.
C. Create a Dataflow pipeline with the data fixes you need.
D. Load the data into Google Sheets, explore the data, and fix the data as needed.
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Building a Data Warehouse
Building Batch Data Pipelines on Google Cloud
● Introduction to Building Batch Data Pipelines
● Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Building Resilient Streaming Analytics Systems on Google Cloud
● Serverless Messaging with Pub/Sub
Serverless Data Processing with Dataflow: Develop Pipelines
● Best Practices
Serverless Data Processing with Dataflow: Operations
● Monitoring
● Logging and Error Reporting
● Troubleshooting and Debug
● Testing and CI/CD
● Reliability

Skill Badges
Prepare Data for ML APIs on Google Cloud
Engineer Data for Predictive Modeling with BigQuery ML

Documentation
Dataprep Basics
Dataprep Wrangle Language
Monitoring pipeline performance using Cloud Profiler | Dataflow
1.3 Diagnostic Question 07
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Introduction to Data Engineering
● Building a Data Lake
Building Batch Data Pipelines on Google Cloud
● Introduction to Building Batch Data Pipelines
Serverless Data Processing with Dataflow: Foundations
● Beam Portability

Skill Badges
Get Started with Dataplex

Documentation
Dataproc best practices | Google Cloud Blog
HDFS vs. Cloud Storage: Pros, cons and migration tips | Google Cloud Blog
Dataplex overview
1.4 Diagnostic Question 09
Your data engineering team receives data in JSON format from external sources at the end of each day. You need to design the data pipeline.

What should you do?

A. Store the data in Cloud Storage and create an extract, transform, and load (ETL) pipeline.
B. Make your BigQuery data warehouse public and ask the external sources to insert the data.
C. Create a public API to allow external applications to add the data to your warehouse.
D. Store the data in persistent disks and create an ETL pipeline.
2.1 Diagnostic Question 02
You are processing large amounts of input data in BigQuery. You need to combine this data with a small amount of frequently changing data that is available in Cloud SQL.

What should you do?

A. Copy the data from Cloud SQL to a new BigQuery table hourly.
B. Copy the data from Cloud SQL and create a combined, normalized table hourly.
C. Use a federated query to get data from Cloud SQL.
D. Create a Dataflow pipeline to combine the BigQuery and Cloud SQL data when the Cloud SQL data changes.
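As a study aid, the sketch below shows how a federated query (option C) could be issued from the BigQuery Python client. It assumes a Cloud SQL connection resource named us.my_cloudsql_conn and illustrative table and column names.

```python
from google.cloud import bigquery

client = bigquery.Client()

# EXTERNAL_QUERY runs the inner statement in Cloud SQL and joins the live
# result with the large table already stored in BigQuery.
sql = """
SELECT o.order_id, o.total, d.discount_pct
FROM `my_dataset.orders` AS o
JOIN EXTERNAL_QUERY(
  'us.my_cloudsql_conn',
  'SELECT order_id, discount_pct FROM live_discounts;'
) AS d
USING (order_id)
"""

for row in client.query(sql).result():
    print(row.order_id, row.total, row.discount_pct)
```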
2.1 Planning the data pipelines
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Introduction to Data Engineering
● Building a Data Lake
● Building a Data Warehouse
Building Batch Data Pipelines on Google Cloud
● Executing Spark on Dataproc
● Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Building Resilient Streaming Analytics Systems on Google Cloud
● High-Throughput BigQuery and Bigtable Streaming Features
Serverless Data Processing with Dataflow: Develop Pipelines
● Beam Concepts Review
● Sources and Sinks
● Schemas

Skill Badges
Prepare Data for ML APIs on Google Cloud
Engineer Data for Predictive Modeling with BigQuery ML

Documentation
What Data Pipeline Architecture should I use? | Google Cloud Blog
Bigtable overview
Cloud SQL federated queries | BigQuery
Exploring new features in BigQuery federated queries | Google Cloud Blog
2.2 Diagnostic Question 04
You manage a PySpark batch data pipeline by using Dataproc. You want to take a hands-off approach to running the workload, and you do not want to provision and manage your own cluster.

A. Configure the job to run on Dataproc Serverless.
B. Configure the job to run with Spot VMs.
C. Rewrite the job in Spark SQL.
D. Rewrite the job in Dataflow with SQL.
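For context, this is a hedged sketch of submitting an existing PySpark job as a Dataproc Serverless batch (option A); the project, region, and Cloud Storage paths are assumptions for illustration only.

```python
from google.cloud import dataproc_v1

region = "us-central1"  # placeholder region
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Point the batch at the existing PySpark entry point; no cluster is
# provisioned or managed by you.
batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/pipeline.py",
    )
)

operation = client.create_batch(
    parent=f"projects/my-project/locations/{region}",
    batch=batch,
)
print("Batch finished in state:", operation.result().state)
```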
You need to run batch jobs, which could take many days to complete. You do not want to manage the infrastructure provisioning.

A. Use Cloud Scheduler to run the jobs.
B. Use Workflows to run the jobs.
C. Run the jobs on Batch.
D. Use Cloud Run to run the jobs.
You are creating a data pipeline for streaming data on Dataflow for Cymbal Retail's point of sales data. You want to calculate the total sales per hour on a continuous basis.

A. Hopping windows (sliding windows in Apache Beam)
B. Session windows
C. Global window
D. Tumbling windows (fixed windows in Apache Beam)
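To make the windowing vocabulary concrete, here is a minimal Apache Beam (Python SDK) sketch that sums sales per hour with fixed windows. The Pub/Sub topic and message format are assumptions, not Cymbal Retail's actual feed.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadSales" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/sales")
        | "ParseAmount" >> beam.Map(lambda msg: float(msg.decode("utf-8")))
        # Fixed (tumbling) one-hour windows: each element lands in exactly one window.
        | "HourlyWindow" >> beam.WindowInto(FixedWindows(60 * 60))
        | "SumPerHour" >> beam.CombineGlobally(sum).without_defaults()
        | "Print" >> beam.Map(print)
    )
```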
Courses
Building Batch Data Pipelines on Google Cloud
● Introduction to Building Batch Data Pipelines
● Executing Spark on Dataproc
● Serverless Data Processing with Dataflow
● Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Building Resilient Streaming Analytics Systems on Google Cloud
● Serverless Messaging with Pub/Sub
● Dataflow Streaming Features
Serverless Data Processing with Dataflow: Foundations
● Separating Compute and Storage with Dataflow
Serverless Data Processing with Dataflow: Develop Pipelines
● Windows, Watermarks, and Triggers
● States and Timers
● Dataflow SQL and DataFrames
Serverless Data Processing with Dataflow: Operations
● Performance
● Testing and CI/CD
● Flex Templates

Skill Badges
Prepare Data for ML APIs on Google Cloud

Documentation
Data Fusion overview
What is Dataproc Serverless?
Introduction to Google Batch
Get started with Batch | Google Cloud
Streaming pipelines | Cloud Dataflow
Basics of the Beam model
Streaming analytics solutions | Google Cloud
2.3 Diagnostic Question 09
Courses
Building Batch Data Pipelines on Google Cloud
● Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Serverless Data Processing with Dataflow: Operations
● Testing and CI/CD

Skill Badges
Engineer Data for Predictive Modeling with BigQuery ML

Documentation
How to use Cloud Composer for data orchestration
Cloud Composer overview
Use a CI/CD pipeline for data-processing workflows | Google Cloud
Section 3:
Storing the Data
3.1 Diagnostic Question 01
Courses
Google Cloud Big Data and Machine Learning Fundamentals
● Big Data and Machine Learning on Google Cloud
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Introduction to Data Engineering
● Building a Data Lake
● Building a Data Warehouse
Building Resilient Streaming Analytics Systems on Google Cloud
● High-Throughput BigQuery and Bigtable Streaming Features

Documentation
Cloud SQL for MySQL, PostgreSQL, and SQL Server
What is Cloud SQL?
Storage classes | Google Cloud
3.2 Diagnostic Question 03
You have several large tables in your transaction databases. You need to move all the data to BigQuery for the business analysts to explore and analyze the data.

How should you design the schema in BigQuery?

A. Retain the data on BigQuery with the same schema as the source.
B. Combine all the transactional database tables into a single table using outer joins.
C. Redesign the schema to normalize the data by removing all redundancies.
D. Redesign the schema to denormalize the data with nested and repeated data.
3.2 Diagnostic Question 04
You are ingesting data that is spread out over a wide range of dates into BigQuery at a fast rate. You need to partition the table to make queries performant.

What should you do?

A. Create an ingestion-time partitioned table with daily partitioning type.
B. Create an ingestion-time partitioned table with yearly partitioning type.
C. Create an integer-range partitioned table.
D. Create a time-unit column-partitioned table with yearly partitioning type.
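For reference, the sketch below declares a time-unit column-partitioned table with yearly granularity using the BigQuery Python client. The project, dataset, and column names are placeholders, and this illustrates the mechanics rather than serving as an answer key.

```python
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.my_dataset.sales_history",
    schema=[
        bigquery.SchemaField("order_id", "STRING"),
        bigquery.SchemaField("order_date", "DATE"),
        bigquery.SchemaField("amount", "NUMERIC"),
    ],
)
# Partition on the order_date column by year instead of by ingestion time.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.YEAR,
    field="order_date",
)
client.create_table(table)
```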
3.2 Diagnostic Question 05
Your analysts repeatedly run the same complex queries that combine and filter through a lot of data on BigQuery. The data changes frequently. You need to reduce the effort for the analysts.

What should you do?

A. Create a dataset with the data that is frequently queried.
B. Create a view of the frequently queried data.
C. Export the frequently queried data into a new table.
D. Export the frequently queried data into Cloud SQL.
3.2 Planning for using a data warehouse
You have data that is ingested daily and frequently analyzed in the first month. Thereafter, the data is retained only for audits, which happen occasionally every few years. You need to configure cost-effective storage.

What should you do?

A. Create a bucket on Cloud Storage with object versioning configured.
B. Create a bucket on Cloud Storage with Autoclass configured.
C. Configure a data retention policy on Cloud Storage.
D. Configure a lifecycle policy on Cloud Storage.
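As a hands-on aside, this is a minimal sketch of a lifecycle policy applied with the Cloud Storage Python client; the bucket name and the 30- and 365-day thresholds are assumptions chosen only to illustrate the mechanics.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("cymbal-audit-data")  # placeholder bucket name

# Move objects to colder storage classes as they age; analysis happens in the
# first month, so later reads are rare audit lookups.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()
```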
3.3 Diagnostic Question 07
You have data stored in a Cloud Storage bucket. You are using both Identity and Access Management (IAM) and Access Control Lists (ACLs) to configure access control.

Which statement describes a user's access to objects in the bucket?

A. The user has no access if IAM denies the permission.
B. The user only has access if both IAM and ACLs grant a permission.
C. The user has access if either IAM or ACLs grant a permission.
D. The user has no access if either IAM or ACLs deny a permission.
3.3 Diagnostic Question 08
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Building a Data Lake

Documentation
Cloud Storage
Object Lifecycle Management | Cloud Storage
Overview of access control | Cloud Storage
Cloud Audit Logs with Cloud Storage | Google Cloud
3.4 Diagnostic Question 09
Cymbal Retail has accumulated a large amount of data. Analysts and leadership are finding it difficult to understand the meaning of the data, such as BigQuery columns. Users of the data don't know who owns what. You need to improve the searchability of the data.

What should you do?

A. Create tags for data entries in Data Catalog.
B. Rename BigQuery columns with more descriptive names.
C. Export the data to Cloud Storage with descriptive file names.
D. Add a description column corresponding to each data column.
3.4 Diagnostic Question 10
You have large amounts of data stored on Cloud Storage and BigQuery. Some of it is processed, but some is yet unprocessed. You have a data mesh created in Dataplex. You need to make it convenient for internal users of the data to discover and use the data.

What should you do?

A. Create a lake for Cloud Storage data and a zone for BigQuery data.
B. Create a lake for BigQuery data and a zone for Cloud Storage data.
C. Create a lake for unprocessed data and assets for processed data.
D. Create a raw zone for the unprocessed data and a curated zone for the processed data.
3.4 Designing for a data mesh
You have data in PostgreSQL that was designed to reduce redundancy. You are transferring this data to BigQuery for analytics. The source data is hierarchical and frequently queried together. You need to design a BigQuery schema that is performant.

A. Use nested and repeated fields.
B. Retain the data in normalized form always.
C. Copy the primary tables and use federated queries for secondary tables.
D. Copy the normalized data into partitions.
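To illustrate what nested and repeated fields look like in practice, here is a small sketch using the BigQuery Python client. The orders/line-items hierarchy and all names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("customer", "STRING"),
    # Each order embeds its child rows instead of joining a separate table.
    bigquery.SchemaField(
        "line_items",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
            bigquery.SchemaField("unit_price", "NUMERIC"),
        ],
    ),
]

client.create_table(bigquery.Table("my-project.analytics.orders", schema=schema))
```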
Your data in BigQuery has some columns that are extremely sensitive. You need to enable only some users to see certain columns.

What should you do?

A. Create a new dataset with the column's data.
B. Create a new table with the column's data.
C. Use policy tags.
D. Use Identity and Access Management (IAM) permissions.
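For context, the sketch below attaches a policy tag to a single column with the BigQuery Python client. The taxonomy resource name is a placeholder that would first be created in Data Catalog, and the column names are hypothetical.

```python
from google.cloud import bigquery
from google.cloud.bigquery.schema import PolicyTagList

client = bigquery.Client()

# Placeholder policy tag created beforehand in a Data Catalog taxonomy.
sensitive_tag = PolicyTagList(
    names=["projects/my-project/locations/us/taxonomies/1234/policyTags/5678"]
)

schema = [
    bigquery.SchemaField("customer_id", "STRING"),
    # Only principals granted access on this policy tag can read the column.
    bigquery.SchemaField("national_id", "STRING", policy_tags=sensitive_tag),
]

client.create_table(bigquery.Table("my-project.analytics.customers", schema=schema))
```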
4.2 Diagnostic Question 07
Your business has collected industry-relevant data over many years. The processed data is useful for your partners and they are willing to pay for its usage. You need to ensure proper access control over the data.

A. Export the data to zip files and share it through Cloud Storage.
B. Host the data on Analytics Hub.
C. Export the data to persistent disks and share it through an FTP endpoint.
D. Host the data on Cloud SQL.
You built machine learning (ML) models based on your own data. In production, the ML models are not giving satisfactory results. When you examine the data, it appears that the existing data is not sufficiently representing the business goals. You need to create a more accurate machine learning model.

A. Train the model with more of similar data.
B. Perform L2 regularization.
C. Perform feature engineering, and use domain knowledge to enhance the column data.
D. Train the model with the same data, but use more epochs.
You used Dataplex to create lakes and zones for your business data. However, some files are not being discovered.

What could be the issue?

A. You have an exclude pattern that matches the files.
B. You have scheduled discovery to run every hour.
C. The files are in ORC format.
D. The files are in Parquet format.
4.3 Exploring and analyzing data
Courses
Google Cloud Big Data and Machine Learning Fundamentals
● Big Data with BigQuery
● The machine learning workflow with Vertex AI
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Introduction to Data Engineering
Building Batch Data Pipelines on Google Cloud
● Introduction to Building Batch Data Pipelines
Smart Analytics, Machine Learning, and AI on Google Cloud
● Custom model building with SQL in BigQuery ML

Skill Badges
Engineer Data for Predictive Modeling with BigQuery ML

Documentation
Use the BigQuery ML TRANSFORM clause for feature engineering | Google Cloud
Feature preprocessing overview | BigQuery | Google Cloud
Discover data | Dataplex | Google Cloud
Section 5:
Maintaining and
Automating Data Workloads
5.1 Diagnostic Question 01
You need to design a Dataproc cluster to run multiple small jobs. Many jobs (but not all) are of high priority.

What should you do?

A. Reuse the same cluster and run each job in sequence.
B. Reuse the same cluster to run all jobs in parallel.
C. Use ephemeral clusters.
D. Use cluster autoscaling.
5.1 Optimizing resources
You need to create repeatable data processing tasks by using Cloud Composer. You need to follow best practices and recommended approaches.

What should you do?

A. Write each task to be responsible for one operation.
B. Use current time with the now() function for computation.
C. Update data with INSERT statements during the task run.
D. Combine multiple functionalities in a single task execution.
5.2 Designing automation and repeatability
Courses
Building Batch Data Pipelines on Google Cloud
● Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
Serverless Data Processing with Dataflow: Develop Pipelines
● Best Practices

Skill Badges
Engineer Data for Predictive Modeling with BigQuery ML

Documentation
Write Airflow DAGs | Cloud Composer
DAGs — Airflow Documentation
DAG writing best practices in Apache Airflow | Astronomer Documentation
5.3 Diagnostic Question 03
You have a team of data analysts that run queries interactively on BigQuery during work hours. You also have thousands of report generation queries that run simultaneously. You often see an error: Exceeded rate limits: too many concurrent queries for this project_and_region.

A. Run all queries in interactive mode.
B. Create a yearly reservation of BigQuery slots.
C. Run the report generation queries in batch mode.
D. Create a view to run the queries.
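For reference, this sketch shows what option C looks like with the BigQuery Python client: report queries submitted with batch priority queue up instead of competing for interactive concurrency. The SQL and table names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# BATCH priority queries start when idle resources are available and do not
# count against the interactive concurrent-query limit.
job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)

job = client.query(
    "SELECT store_id, SUM(amount) AS total FROM `analytics.sales` GROUP BY store_id",
    job_config=job_config,
)
print(job.result().total_rows, "rows in the report")
```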
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Introduction to Data Engineering
● Building a Data Warehouse
Building Resilient Streaming Analytics Systems on Google Cloud
● Advanced BigQuery Functionality and Performance

Documentation
Scale cloud data warehouse up and down quickly
Introduction to reservations | BigQuery | Google Cloud
Introduction to BigQuery editions | Google Cloud
Run a query | BigQuery | Google Cloud
Troubleshoot quota and limit errors | BigQuery | Google Cloud
5.4 Diagnostic Question 05
A colleague at Cymbal Retail asks you about the configuration of Dataproc autoscaling for a project.

What would be the Google-recommended situation when you should enable autoscaling?

A. When you want to scale on-cluster Hadoop Distributed File System (HDFS).
B. When you want to scale out single-job clusters.
C. When you want to down-scale idle clusters to minimum size.
D. When there are different size workloads on the cluster.
5.4 Monitoring and troubleshooting processes
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Introduction to Data Engineering
Building Batch Data Pipelines on Google Cloud
● Executing Spark on Dataproc
Building Resilient Streaming Analytics Systems on Google Cloud
● Serverless Messaging with Pub/Sub
● Advanced BigQuery Functionality and Performance
Serverless Data Processing with Dataflow: Foundations
● IAM, Quotas, and Permissions
Serverless Data Processing with Dataflow: Develop Pipelines
● State and Timers
● Best Practices
Serverless Data Processing with Dataflow: Operations
● Monitoring
● Troubleshooting and Debug
● Reliability

Skill Badges
Prepare Data for ML APIs on Google Cloud

Documentation
Use Cloud Monitoring for Dataflow pipelines
Troubleshoot Dataflow errors | Google Cloud
Troubleshoot stragglers in batch jobs | Cloud Dataflow
Autoscaling clusters | Dataproc Documentation | Google Cloud
5.5 Diagnostic Question 08
You are running a Dataflow pipeline in production. The input data for this pipeline is occasionally inconsistent. Separately from processing the valid data, you want to efficiently capture the erroneous input data for analysis.

What should you do?

A. Re-read the input data and create separate outputs for valid and erroneous data.
B. Read the data once, and split it into two pipelines, one to output valid data and another to output erroneous data.
C. Check for the erroneous data in the logs.
D. Create a side output for the erroneous data.
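To make the side-output idea concrete, here is a minimal Apache Beam (Python SDK) sketch that routes unparseable records to a tagged output. The parsing rule and tag names are illustrative, not the production pipeline's.

```python
import apache_beam as beam
from apache_beam import pvalue

class ParseRecord(beam.DoFn):
    def process(self, line: str):
        try:
            yield float(line)  # valid records go to the main output
        except ValueError:
            # erroneous input goes to a separately tagged side output
            yield pvalue.TaggedOutput("errors", line)

with beam.Pipeline() as p:
    results = (
        p
        | "Read" >> beam.Create(["1.50", "oops", "2.25"])
        | "Parse" >> beam.ParDo(ParseRecord()).with_outputs("errors", main="valid")
    )
    results.valid | "SumValid" >> beam.CombineGlobally(sum) | "PrintSum" >> beam.Map(print)
    results.errors | "PrintErrors" >> beam.Map(lambda rec: print("bad record:", rec))
```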
5.5 Maintaining awareness of failures and mitigating impact
Courses
Modernizing Data Lakes and Data Warehouses on Google Cloud
● Building a Data Lake
Serverless Data Processing with Dataflow: Develop Pipelines
● State and Timers
● Best Practices
Serverless Data Processing with Dataflow: Operations
● Troubleshooting and Debug
● Reliability

Documentation
Use Dataflow snapshots | Google Cloud
About high availability | Cloud SQL for MySQL
Design Your Pipeline
When will you take the exam?
Now, consider what you’ve learned about your knowledge and skills
through the diagnostic questions in this course. You should have a
better understanding of what areas you need to focus on and what
resources are available.
Use the template that follows to plan your study goals for each week.
Consider:
● What exam guide section(s) or topic area(s) will you focus on?
● What courses (or specific modules) will help you learn more?
● What Skill Badges or labs will you work on for hands-on practice?
● What documentation links will you review?
● What additional resources will you use, such as sample questions?
● What will you do to prepare for the case studies?
You may do some or all of these study activities each week.
Example:
Courses/modules to complete: Modernizing Data Lakes and Data Warehouses with Google Cloud
● Building a Data Warehouse

Area(s) of focus:
Courses/modules to complete:
Skill Badges/labs to complete:
Documentation to review:
Additional study: