OD M4 Summary of Introduction To Data Engineering

The primary role of a data engineer is to build data pipelines to enable stakeholders to use data to make decisions. The course covered data lakes and warehouses, differences between them, and Google Cloud solutions like Cloud Storage and BigQuery. It also discussed ETL, ELT and EL approaches and reference architectures.

Uploaded by

obumnwabude


Proprietary + Confidential

Modernizing Data Lakes and Data Warehouses with Google Cloud

Course Summary

Let’s review some key concepts we covered in this course on data lakes and data
warehouses.

Course summary

● Data engineers build data pipelines.

● The customers of a data engineer are all the people who make decisions with data.

● The three primary advantages of doing data engineering in the cloud are:
○ Ability to separate compute and storage
○ Serverless products
○ Not having to manage infrastructure

● The primary role of a data engineer is to build data pipelines.

● The ultimate purpose of a data pipeline is to enable stakeholders in an
organization to use data to make faster and better decisions.
● While the role of a data engineer is not new, being able to build data pipelines
entirely in the cloud is relatively new. We argue that doing data engineering in
the cloud is advantageous because you can separate compute from storage,
and you don’t have to manage infrastructure or, with serverless products, even
software. This allows you to spend more time on what matters: getting insights
from data.

Course summary

● Difference between a data lake and data warehouse.

● Google Cloud Storage as a data lake solution.

● BigQuery as a data warehouse solution.

● Differences between ETL, ELT and EL.

● Google Cloud reference architectures for ETL, ELT and EL.
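
As one way to make the ETL, ELT and EL distinction in the list above concrete, here is a minimal Python sketch. It is not taken from the course: an in-memory SQLite database stands in for the warehouse, a hard-coded list stands in for raw files in a data lake, and every table, function and column name is hypothetical. The only point is where the transform step runs in each approach.

```python
import sqlite3

# Hypothetical raw records, standing in for unprocessed files in a data lake.
RAW_ROWS = [
    ("2024-01-01", " alice ", "10.50"),
    ("2024-01-01", "BOB", "3.25"),
    ("2024-01-02", "Alice", "4.00"),
]


def clean(row):
    """Transform step: normalize the name and parse the amount."""
    day, name, amount = row
    return (day, name.strip().lower(), float(amount))


def el(conn):
    """EL: load the data as-is, with no transform step at all."""
    conn.execute("CREATE TABLE sales_el (day TEXT, name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_el VALUES (?, ?, ?)", RAW_ROWS)


def etl(conn):
    """ETL: transform in the pipeline, then load the finished rows."""
    conn.execute("CREATE TABLE sales_etl (day TEXT, name TEXT, amount REAL)")
    cleaned = [clean(row) for row in RAW_ROWS]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(conn):
    """ELT: load raw rows first, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE sales_raw (day TEXT, name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT day,
               lower(trim(name)) AS name,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
        """
    )


def totals(conn, table):
    """Total amount per name, ordered by name."""
    query = f"SELECT name, SUM(amount) FROM {table} GROUP BY name ORDER BY name"
    return conn.execute(query).fetchall()


def run_demo():
    """Run all three approaches against one in-memory database."""
    conn = sqlite3.connect(":memory:")
    el(conn)
    etl(conn)
    elt(conn)
    return {t: totals(conn, t) for t in ("sales_el", "sales_etl", "sales_elt")}


if __name__ == "__main__":
    for table, rows in run_demo().items():
        print(table, rows)
```

In this toy data, ETL and ELT end up with the same cleaned table; they differ only in whether the cleaning runs before loading or inside the warehouse. EL keeps the inconsistent names, so it is only appropriate when the source data can be used as it is.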

● We introduced data lakes and data warehouses and discussed the key
differences between the two. At a high level, a data lake is a place to store
unprocessed data, while a data warehouse is a place to store transformed
data that you ultimately want to use for analytics, machine learning, and
dashboards.
● Next, we discussed Cloud Storage as the data lake solution on Google Cloud
in some technical depth. We also presented other Google Cloud solutions for
low-latency requirements, transactional workloads, and structured data.
● We introduced BigQuery as the data warehouse solution on Google Cloud.
We discussed partitioning and clustering in BigQuery as techniques for
improving query performance.
● We also talked about EL, ELT, and ETL and how these relate to data lakes
and data warehouses.
● Finally, we presented some reference architectures on Google Cloud for
streaming and batch data pipelines. The hope is that these reference
architectures serve as a starting point for your data pipeline.
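
The note above on partitioning and clustering can be illustrated with a small pure-Python simulation. This is a sketch of the general idea only, not how BigQuery is implemented: the `PartitionedTable` class, its methods and the sample rows are all hypothetical. Partitioning is modeled as one bucket of rows per day, and clustering as keeping each bucket sorted by customer so that a lookup touches only the matching range.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical sample rows: (day, customer, amount).
ROWS = [
    ("2024-01-01", "alice", 10.0),
    ("2024-01-01", "bob", 5.0),
    ("2024-01-01", "alice", 2.5),
    ("2024-01-02", "alice", 7.0),
    ("2024-01-02", "carol", 1.0),
    ("2024-01-03", "bob", 4.0),
]


class PartitionedTable:
    """Toy table partitioned by day and clustered (sorted) by customer."""

    def __init__(self, rows):
        self.partitions = defaultdict(list)
        for day, customer, amount in rows:
            self.partitions[day].append((customer, amount))
        self.keys = {}
        for day, part in self.partitions.items():
            part.sort()  # clustering: keep each partition sorted by customer
            self.keys[day] = [customer for customer, _ in part]

    def full_scan(self, day, customer):
        """Baseline: read every row in every partition."""
        scanned, total = 0, 0.0
        for part_day, part in self.partitions.items():
            for cust, amount in part:
                scanned += 1
                if part_day == day and cust == customer:
                    total += amount
        return total, scanned

    def pruned_scan(self, day, customer):
        """Skip other partitions, then read only the matching sorted range."""
        part = self.partitions.get(day, [])
        keys = self.keys.get(day, [])
        lo = bisect_left(keys, customer)
        hi = bisect_right(keys, customer)
        total = sum((amount for _, amount in part[lo:hi]), 0.0)
        return total, hi - lo


if __name__ == "__main__":
    table = PartitionedTable(ROWS)
    print(table.full_scan("2024-01-01", "alice"))
    print(table.pruned_scan("2024-01-01", "alice"))
```

Both methods return the same total for a given day and customer; the second value in each result is the number of rows read, which is what the filter on the partition and clustering columns reduces.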

Data Engineering learning path

1. Modernizing Data Lakes and Data Warehouses with Google Cloud
2. Building Batch Data Pipelines on Google Cloud
3. Building Resilient Streaming Analytics Systems on Google Cloud
4. Smart Analytics, Machine Learning and AI on Google Cloud

Congratulations on completing Modernizing Data Lakes and Data Warehouses
with Google Cloud.

Building Batch Data Pipelines on Google Cloud is the second course of the Data
Engineering on Google Cloud course series. We hope to see you there!
