
Unit-4: Data Management using Cloud Computing (12 Marks)

1. Architecture of Modern Data Pipeline:-


i. A data pipeline is a method to accept raw data from various
sources, process this data to convert it into meaningful
information, and then push it into storage such as a data lake or data
warehouse.

 TYPES OF DATA PIPELINES:-


Different types of data pipelines are designed to serve specific data
processing and analysis requirements. Here are some common types of
data pipelines:
1. Batch data pipeline:
Batch data pipelines process data in large batches or sets at specific
intervals. They are suitable for scenarios where near real-time data
processing is unnecessary and periodic updates or data refreshes are
sufficient.
Batch pipelines typically involve extracting data from various sources,
performing transformations, and loading the processed data into a target
destination (see the sketch after this list).
2. Real-time data pipeline:
Real-time data pipelines handle data in a continuous and streaming
manner, processing data as it arrives or is generated. They are ideal
for scenarios that require immediate access to the most up-to-date
data and where real-time analytics or actions are necessary.
Real-time pipelines often involve ingesting data streams, performing
transformations or enrichments on the fly, and delivering the
processed data to downstream systems or applications in real time.
3. Event-driven data pipeline:
Event-driven pipelines are triggered by specific events or actions,
such as data updates, system events, or user interactions. They
respond to these events and initiate data processing tasks
accordingly.
Event-driven pipelines are commonly used in scenarios where data
processing needs to be triggered by specific events rather than on a
predefined schedule.
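
The following is a minimal, illustrative sketch of a batch pipeline using only the Python standard library. The file name, database name, and table schema (orders.csv, warehouse.db, an orders table with order_id, amount, and status) are assumptions made for the example, not part of the material above.

import csv
import sqlite3

def run_batch_pipeline(source_csv="orders.csv", warehouse_db="warehouse.db"):
    # Extract: read the raw batch from a flat-file source.
    with open(source_csv, newline="") as f:
        rows = list(csv.DictReader(f))

    # Transform: keep only completed orders and normalise the amount field.
    cleaned = [
        {"order_id": r["order_id"], "amount": round(float(r["amount"]), 2)}
        for r in rows
        if r.get("status") == "completed"
    ]

    # Load: append the processed batch into the target table.
    with sqlite3.connect(warehouse_db) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)"
        )
        conn.executemany(
            "INSERT INTO orders VALUES (:order_id, :amount)", cleaned
        )

# A scheduler (cron, Airflow, or similar) would call run_batch_pipeline()
# at fixed intervals; a real-time pipeline would instead consume events
# continuously as they arrive.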
 DATA PIPELINE ARCHITECTURE:-
A Data Pipeline Architecture is a blueprint or framework for moving data
from various sources to a destination. It involves a sequence of steps or
stages that process data, starting with collecting raw data from multiple
sources and then transforming and preparing it for storage and analysis.
The architecture includes components for data ingestion, transformation,
storage, and delivery. The pipeline might also have various tools and
technologies, such as data integration platforms, data warehouses, and data
lakes, for storing and processing the data.
Data pipeline architectures are crucial for efficient data management,
processing, and analysis in modern businesses and organizations.
We break down data pipeline architecture into a series of parts and
processes, including:
i. DATA INGESTION:-
Sources:
Data sources refer to any place or application from which data is collected
for analysis, processing, or storage. Examples of data sources include
databases, data warehouses, cloud storage systems, files on local drives,
APIs, social media platforms, and sensor data from IoT devices.
Data can be structured, semi-structured, or unstructured, depending on the
source. The selection of the source fully depends on the intended use and the
requirements of the data pipeline or analytics application.
Joins
The data flows in from multiple sources. Joins are the logic implemented
to define how the data is combined. When performing joins between
different data sources, the process can be more complex than traditional
database joins due to differences in data structure, format, and storage.
Extraction
Data extraction is the process of extracting or retrieving specific data
from a larger dataset or source. This can involve parsing through
unstructured data to find relevant information or querying a database to
retrieve specific records or information.
Data extraction is an important part of data analysis, as it allows analysts
to focus on specific subsets of data and extract insights and findings from
that data.
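
As a rough illustration of joins and extraction during ingestion, the short Python sketch below combines records from two hypothetical sources (a CRM export and an orders feed) on a shared key and then extracts only the fields needed downstream; all names and values are invented for the example.

# Two hypothetical sources sharing the key customer_id.
crm_records = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
order_records = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 2, "amount": 80.0},
]

# Join: index one source by the join key, then enrich the other source.
crm_by_id = {r["customer_id"]: r for r in crm_records}
joined = [
    {**order, "customer_name": crm_by_id[order["customer_id"]]["name"]}
    for order in order_records
    if order["customer_id"] in crm_by_id
]

# Extraction: pull only the fields the downstream analysis needs.
extracted = [{"order_id": r["order_id"], "amount": r["amount"]} for r in joined]
print(extracted)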
ii. DATA TRANSFORMATION:-
Standardization
Data standardization, also known as data normalization, is the process
of transforming and organizing data into a consistent format that
adheres to predefined standards.
It involves applying a set of rules or procedures to ensure that data from
different sources or systems are structured and formatted uniformly,
making it easier to compare, analyze, and integrate.
Data standardization typically involves the following steps:
 Data cleansing
 Data formatting
 Data categorization
 Data validation
 Data integration
Correction
Data correction, also known as data cleansing or data scrubbing, refers to
the process of identifying and rectifying errors, inconsistencies,
inaccuracies, or discrepancies within a dataset.
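
The sketch below illustrates standardization and correction on a single hypothetical record; the field names, the source date format, and the validation rule are assumptions made for the example only.

from datetime import datetime

def standardize(record):
    """Clean, format, and validate one raw record; return None if invalid."""
    # Data cleansing: trim stray whitespace from string fields.
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    # Data formatting: force one date format and upper-case country codes.
    cleaned["signup_date"] = datetime.strptime(
        cleaned["signup_date"], "%d/%m/%Y").strftime("%Y-%m-%d")
    cleaned["country"] = cleaned["country"].upper()
    # Data validation / correction: reject records with an impossible age.
    if not 0 < int(cleaned["age"]) < 120:
        return None
    return cleaned

print(standardize({"signup_date": "05/08/2024", "country": " in ", "age": "34"}))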
iii. DATA STORAGE:-
Load
In data engineering, data loading refers to the process of ingesting or
importing data from various sources into a target destination, such as a
database, data warehouse, or data lake.
It involves moving the data from its source format to a storage or
processing environment where it can be accessed, managed, and analyzed
effectively.
Automation
Data pipeline automation refers to the practice of automating the process
of creating, managing, and executing data pipelines.
A data pipeline is a series of interconnected steps that involve extracting,
transforming, and loading (ETL) data from various sources to a target
destination for analysis, reporting, or other purposes.
Automating this process helps streamline data workflows, improve
efficiency, and reduce manual intervention.
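
A very small automation sketch is given below: the three ETL steps are chained, wrapped with a simple retry, and exposed as one callable that a scheduler (cron, Airflow, or similar) could invoke at each batch interval. The step bodies and the retry count are placeholders, not a recommended design.

import time

def extract():
    return [{"id": 1, "value": " 42 "}]          # stand-in for a real source

def transform(rows):
    return [{"id": r["id"], "value": int(r["value"].strip())} for r in rows]

def load(rows):
    print(f"loaded {len(rows)} rows")            # stand-in for a warehouse write

def run_once(max_retries=3):
    for attempt in range(1, max_retries + 1):
        try:
            load(transform(extract()))
            return
        except Exception as exc:                 # retry transient failures
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(2 ** attempt)

if __name__ == "__main__":
    run_once()   # a scheduler would invoke this at every batch interval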
2. Data Pipeline Characteristics:-
Only robust end-to-end data pipelines can properly equip you to source,
collect, manage, analyze, and effectively use data so you can generate new
market opportunities and deliver cost-saving business processes. Modern
data pipelines make extracting information from the data you collect fast
and efficient.
Characteristics to look for when considering a data pipeline include:
1. Continuous and extensible data processing
2. The elasticity and agility of the cloud
3. Isolated and independent resources for data processing
4. Democratized data access and self-service management
5. High availability and disaster recovery

3. Collecting and Ingesting Data:-

Data Collection: Definition: Data collection is the process of gathering raw data
from various sources and compiling it into a central location for analysis. It is
typically the first step in the data analysis process.

Diagram:

+--------------+
| Data Sources |
+--------------+
       |
       v
+--------------+
| Data Storage |
+--------------+

Data Ingestion: Definition: Data ingestion is the process of taking data from
various sources and preparing it for analysis. This can involve transforming the
data, cleaning it, and structuring it so that it can be easily analyzed.
Diagram:

+----------------+
|  Data Sources  |
+----------------+
        |
        v
+----------------+
| Data Ingestion |
+----------------+
        |
        v
+----------------+
|  Data Storage  |
+----------------+

Key Differences (see the sketch after this list):
1. Data collection involves gathering raw data from various sources,
while data ingestion involves processing and preparing data for
analysis.
2. Data collection is typically a one-time process, while data ingestion can
be an ongoing process.
3. Data collection can involve manual entry of data, while data ingestion
is typically an automated process.
4. Data collection can be a time-consuming and resource-intensive process,
while data ingestion can be faster and more efficient.
5. Data collection is often done in a decentralized manner, while data
ingestion is typically centralized.
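
To make the distinction concrete, the sketch below contrasts a one-off collection step with a continuous, automated ingestion step; the record shape and the cleaning rule are invented for the example.

import json

def collect_once(paths):
    """Collection: gather raw records from files into one batch."""
    batch = []
    for p in paths:
        with open(p) as f:
            batch.append(json.load(f))
    return batch

def ingest_stream(stream, sink):
    """Ingestion: continuously clean each arriving record and store it."""
    for record in stream:            # stream could be a queue, topic, or socket
        record["name"] = record["name"].strip().title()
        sink.append(record)

sink = []
ingest_stream(iter([{"name": " asha "}, {"name": "RAVI"}]), sink)
print(sink)                          # [{'name': 'Asha'}, {'name': 'Ravi'}]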

4. Transforming Data:-
Data transformation is the process of converting and cleaning raw data
from one data source to meet the requirements of its new location. Also
called data wrangling, transforming data is essential to ingestion
workflows that feed data warehouses and modern data lakes. Analytics
projects may also use data transformation to prepare warehoused data for
analysis.
Why is data transformation important?
Data transformation remains indispensable because enterprise data
ecosystems remain stubbornly heterogeneous despite decades of
centralization and standardization initiatives. Each application and storage
system takes a slightly different approach to formatting and structuring
data. Variations in format, structure, and quality occur as
business domains and regional operations develop their own data systems.
Without data transformation, data analysts would have to fix these
inconsistencies each time they tried to combine two data sources. This
project-by-project approach consumes resources, risks variations
between analyses, and makes decision-making less effective.
The process of transforming data from multiple sources to meet a single
standard improves the efficiency of a company’s data analysis operations
by delivering the following benefits (a short sketch follows the list):
i. Data quality improvement
ii. Data Consistency
iii. Data Integration
iv. Data Analysis
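
As a short illustration of these benefits, the sketch below maps two hypothetical sources with different field names and date formats onto one standard schema before they are combined; all names and formats are assumptions for the example.

from datetime import datetime

source_a = [{"CustID": "1", "SignupDate": "2024-08-05"}]
source_b = [{"customer_id": 2, "signup": "09/08/2024"}]

def from_a(r):
    # Source A already uses ISO dates; only the field names change.
    return {"customer_id": int(r["CustID"]), "signup_date": r["SignupDate"]}

def from_b(r):
    # Source B uses day/month/year, so the date is reformatted as well.
    return {"customer_id": r["customer_id"],
            "signup_date": datetime.strptime(r["signup"], "%d/%m/%Y")
                                   .strftime("%Y-%m-%d")}

unified = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
print(unified)   # both rows now share one schema and one date format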

5. Designing a Pipeline:-

Step 1: Determine the goal in building data pipelines


Your first step when building data pipelines is to identify the outcome or value it
will offer your company or product. At this point, you’d ask questions like:
i. What are our objectives for this data pipeline?
ii. How do we measure the success of the data pipeline?
iii. What use cases will the data pipeline serve (reporting, analytics, machine
learning)?
iv. Who are the end-users of the data that this pipeline will produce? How will
that data help them meet their goals?
Step 2: Choose the data sources
In the next step, consider the possible data sources that will feed the data pipeline. Ask
questions such as:
i. What are all the potential sources of data?
ii. In what format will the data come in (flat files, JSON, XML)?
iii. How will we connect to the data sources?
Step 3: Determine the data ingestion strategy
Now that you understand your pipeline goals and have defined data sources, it’s
time to ask questions about how the pipeline will collect the data. Ask questions
including:
i. Should we build our own data ingestion pipelines in-house with Python, Airflow,
and other scripting tools?
ii. Would we be utilizing third-party integration tools to ingest the data?
iii. Are we going to be using intermediate data stores to store data as it flows to
the destination?
iv. Are we collecting data from the origin in predefined batches or in real time?
Step 4: Design the data processing plan
Once data is ingested, it must be processed and transformed for it to be valuable to
downstream systems. At this stage, you’ll ask questions like:
i. What data processing strategies are we utilizing on the data (ETL, ELT,
cleaning, formatting)?
ii. Are we going to be enriching the data with specific attributes?
iii. Are we using all the data or just a subset?
iv. How do we remove redundant data?
Step 5: Set up storage for the output of the pipeline
Once the data gets processed, we must determine the final storage destination for our
data to serve various business use cases. Ask questions including:
i. Are we going to be using big data stores like data warehouses or data lakes?
ii. Would the data be stored in the cloud or on-premises?
iii. Which of the data stores will serve our top use cases?
iv. In what format will the final data be stored?
Step 6: Plan the data workflow
Now, it’s time to design the sequencing of processes in the data pipeline. At this
stage, we ask questions such as (a minimal workflow sketch follows this list):
i. What downstream jobs are dependent on the completion of an upstream job?
ii. Are there jobs that can run in parallel?
iii. How do we handle failed jobs?
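
Since Step 3 mentions Airflow as one option, here is a hedged sketch of how this workflow plan might be expressed as an Airflow-style DAG. The DAG name, schedule, retry policy, and task bodies are all illustrative assumptions rather than a prescribed design.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():   print("extract from sources")
def transform(): print("clean and transform")
def load():      print("load into the warehouse")
def report():    print("refresh the reporting tables")

with DAG(
    dag_id="daily_sales_pipeline",        # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2},          # how failed jobs are retried
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_report = PythonOperator(task_id="report", python_callable=report)

    # Downstream jobs wait for their upstream job; load and report both
    # depend on transform and can run in parallel with each other.
    t_extract >> t_transform >> [t_load, t_report]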
Step 7: Implement a data monitoring and governance framework
You’ve almost built an entire data pipeline! Our second-to-last step is
establishing a data monitoring and governance framework, which helps us observe
the data pipeline to ensure a healthy and efficient channel that’s reliable, secure, and
performs as required. In this step, we determine:
i. What needs to be monitored? Dropped records? Failed pipeline runs? Node
outages?
ii. How do we ensure data is secure and no sensitive data is exposed?
iii. How do we secure the machines running the data pipelines?
iv. Is the data pipeline meeting the delivery SLOs?
v. Who is in charge of data monitoring?
Step 8: Plan the data consumption layer
This final step determines the various services that’ll consume the processed data
from our data pipeline. At the data consumption layer, we ask questions such as:
i. What’s the best way to harness and utilize our data?
ii. Do we have all the data we need for our intended use case?
iii. How do our consumption tools connect to our data stores?

6. Evolving from ETL to ELT:-


What’s ETL
To simplify, ETL or Extract Transform Load is a data integration process that
involves extracting data from various sources, transforming it into a suitable
format (arranging it), and loading it into a target data warehouse or data hub. As
the name suggests, it involves:
Extract:
This phase involves retrieving data from disparate sources such as databases, flat
files, or APIs.
Transform:
Data is cleaned, standardized, aggregated, and manipulated to meet business
requirements. This includes data cleansing, formatting, calculations, and data
enrichment.
Load:
The transformed data is transferred into the target system, often a data warehouse,
for analysis and reporting.
ETL processes are critical for building data warehouses and enabling business
intelligence and advanced analytics capabilities.

Defining ELT
ELT is a data integration process where raw data is extracted from various sources
and loaded into a data lake or data warehouse without immediate transformation
(that’s done later). The data is transformed only when needed for specific analysis
or reporting. As the name suggests, it involves:
Extract:
Data is pulled from disparate sources.
Load:
Raw data is stored in a data lake or data warehouse in its original format.
Transform:
Data is transformed and processed as needed for specific queries or reports. This
approach uses cloud computing and big data technologies to handle large volumes
of data efficiently and at the right time.
ELT is often associated with cloud-based data warehousing and big data analytics
platforms.
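
The difference can be sketched in a few lines of Python, using SQLite purely as a stand-in for a warehouse; the table names and the temperature data are invented for the example.

import sqlite3

raw = [{"city": "  Pune ", "temp_f": 95.0}, {"city": "Nagpur", "temp_f": 102.2}]

conn = sqlite3.connect(":memory:")

# ETL: transform in the pipeline code first, then load the finished rows.
etl_rows = [(r["city"].strip(), round((r["temp_f"] - 32) * 5 / 9, 1)) for r in raw]
conn.execute("CREATE TABLE weather_etl (city TEXT, temp_c REAL)")
conn.executemany("INSERT INTO weather_etl VALUES (?, ?)", etl_rows)

# ELT: load the raw rows as-is, transform later inside the warehouse with SQL.
conn.execute("CREATE TABLE weather_raw (city TEXT, temp_f REAL)")
conn.executemany("INSERT INTO weather_raw VALUES (?, ?)",
                 [(r["city"], r["temp_f"]) for r in raw])
conn.execute("""
    CREATE TABLE weather_elt AS
    SELECT TRIM(city) AS city, ROUND((temp_f - 32) * 5 / 9, 1) AS temp_c
    FROM weather_raw
""")

print(conn.execute("SELECT * FROM weather_elt").fetchall())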

The Shift from ETL to ELT: Evolving Data Integration:-


The shift from ETL to ELT represents more than just a change in process—it’s a
fundamental shift in how businesses handle their data. Data analytics
companies understand that the future is digital, and staying a step ahead requires
not just adapting to new technologies, but leading the way. Our mission is to help
businesses like yours use the power of data, ensuring that every data point
contributes to your business sustainability.
For decades, ETL has been the standard approach to data integration. As explained above,
the process involves extracting data from various sources, transforming it into a
suitable format, and then loading it into a data warehouse or other system for
analysis. While ETL has served us well, it comes with significant limitations.
 Data Latency: Traditional ETL processes often result in delays, meaning that by
the time data is ready for analysis, it may already be outdated.
 Complexity: ETL can be complex and time-consuming, requiring substantial
resources to manage the entire data transformation process.
 Cost: The infrastructure needed to support ETL can be expensive, particularly as
data volumes grow, which also limits scalability.
ELT flips the traditional model on its head. Instead of transforming data before
it’s loaded, ELT first loads the raw data into a data warehouse or data lake and
then performs transformations as needed. This shift offers many advantages:
 Better Agility: By loading data first, businesses can start working with their data
much sooner, allowing for more agile and responsive decision-making.
 Scalability: ELT is better suited for the massive datasets that are becoming the
norm today. It scales more easily and efficiently than traditional ETL processes.
 Cost-Efficiency: With ELT, businesses can utilize cloud-based data storage and
processing solutions, reducing the need for expensive on-premise infrastructure.

7. Delivering and Sharing Data:-


Data delivery in a modern data pipeline is the process of moving processed and
analyzed data to a target system or application. This can include a database, data
warehouse, reporting tool, or dashboard.

Here are some things to consider about data delivery and sharing in a modern data
pipeline (a short delivery sketch follows these points):
 DataOps
Modern data pipelines are developed using the DataOps methodology, which
combines technologies and processes to automate data pipelines and shorten
development and delivery cycles.
 Data sharing
Amazon Redshift Data Sharing allows data producers to define permissions and grant
access to consumers. Consumers can access live copies of the data, which is still
owned by the producer.
 Data sources
Data pipelines can ingest, process, prepare, transform, and enrich structured,
unstructured, and semi-structured data.
 Data storage
Data pipelines can store data in a data lake or data warehouse. Data lakes are better
for organizations that need a large repository for raw data, while data warehouses
are better for organizations that need quick access to structured data.
 Data tools
Modern data pipeline tools combine drag-and-drop canvas builders with inline
code and query editors. These tools support open formats like Python or SQL for
transformation logic, and YAML for describing the pipeline topology.
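
As a closing illustration, the sketch below shows one simple delivery step: processed rows are read from the warehouse (SQLite standing in again) and written out as a CSV that a dashboard or reporting tool could consume. The database path, table, and output file name are assumptions carried over from the earlier load sketch.

import csv
import sqlite3

def deliver_report(db_path="warehouse.db", out_path="daily_sales_report.csv"):
    # Read the processed rows from the warehouse table.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT order_id, amount FROM orders ORDER BY order_id"
        ).fetchall()

    # Export them in a format a reporting tool or dashboard can pick up.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "amount"])   # header for the consumer
        writer.writerows(rows)

# deliver_report() would typically run as the last task of the pipeline,
# after the load step has finished.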
