OD M4 Summary of Introduction To Data Engineering
Modernizing Data Lakes and Data Warehouses with Google Cloud
Course Summary
Let’s review some key concepts we covered in this course on data lakes and data
warehouses.
Proprietary + Confidential
Course summary
● The customers of a data engineer are all the people who make decisions with data.
● The three primary advantages of doing data engineering in the cloud are:
○ Ability to separate compute and storage
○ Serverless products
○ Not having to manage infrastructure
● We introduced data lakes and data warehouses and discussed the key
differences between the two. At a high level, a data lake is a place to store
unprocessed data, while a data warehouse is a place to store transformed
data that you ultimately want to use for analytics, machine learning, and
dashboards.
● Next, we discussed Cloud Storage as the data lake solution on Google Cloud
in some technical depth. We also presented other Google Cloud solutions for
low-latency requirements, transactional workloads, and structured data.
● We introduced BigQuery as the data warehouse solution on Google Cloud.
We discussed partitioning and clustering in BigQuery as techniques for
improving query performance.
● We also talked about EL, ELT, and ETL and how these relate to data lakes
and warehouses.
● Finally, we presented some reference architectures on Google Cloud for
streaming and batch data pipelines. The hope is that these reference
architectures serve as a starting point for your data pipeline.
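
To make the partitioning and clustering point above more concrete, here is a minimal conceptual sketch in plain Python. It is not BigQuery code and does not reflect how BigQuery is implemented; the sample rows, the column names (event_date, customer_id, amount), and the helper functions are all hypothetical. It only illustrates the idea that a query filtering on the partition column can skip every other partition, and that sorting rows by a clustering column lets a lookup go straight to the matching range.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical sample rows: (event_date, customer_id, amount).
ROWS = [
    ("2024-01-01", "c3", 10),
    ("2024-01-01", "c1", 25),
    ("2024-01-02", "c2", 40),
    ("2024-01-02", "c1", 5),
    ("2024-01-03", "c1", 15),
]


def build_table(rows):
    """Group rows by event_date (partitioning), then sort each
    partition by customer_id (a stand-in for clustering)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[1])
    return dict(partitions)


def full_scan(rows, date, customer):
    """Baseline without partitioning: every row is examined."""
    matches = [r for r in rows if r[0] == date and r[1] == customer]
    return matches, len(rows)


def pruned_scan(table, date, customer):
    """Read only the partition for `date`, then use the sort order
    on customer_id to locate the matching range by binary search."""
    part = table.get(date, [])
    # Toy simplification: a real engine would consult stored metadata
    # rather than materialize the key list on every query.
    keys = [r[1] for r in part]
    lo, hi = bisect_left(keys, customer), bisect_right(keys, customer)
    # Second value: rows in the one partition that had to be considered.
    return part[lo:hi], len(part)


if __name__ == "__main__":
    table = build_table(ROWS)
    print(full_scan(ROWS, "2024-01-02", "c1"))
    print(pruned_scan(table, "2024-01-02", "c1"))
```

Both functions return the same matching rows; the second value shows how many rows each approach had to consider, which is the quantity partitioning is meant to reduce.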
Data Engineering
1. Modernizing Data Lakes and Data Warehouses with Google Cloud
2. Building Batch Data Pipelines on Google Cloud
3. Building Resilient Streaming Analytics Systems on Google Cloud
4. Smart Analytics, Machine Learning and AI on Google Cloud
Building Batch Data Pipelines on Google Cloud is the second course of the Data
Engineering on Google Cloud course series. We hope to see you there!