Overview
1. Introduction
2. Training
3. Healthcare Analysis Project
4. Telecom Customer Revenue Analysis
5. Conclusion
Introduction
Data engineering involves designing and maintaining systems for handling and analyzing
large volumes of data from diverse sources, crucial for enabling data-driven decision-making
across industries.
Sigmoid excels in Data Engineering and Data Science services, delivering innovative
solutions tailored to diverse industries. Leading in ML and AI, we specialize in providing
top-tier data solutions.
Source: Medium blog by narendrababuoggu
Training
• Python
• SQL
• Data Extraction
• AWS
• Snowflake
• Scala
• Data Pipeline
• MongoDB
• Spark
Programming Languages Database Data Engineering Tools
Data Processing
OLTP (Online Transaction Processing):
• Tailored for handling a high volume of small, straightforward transactions in real-time.
• Commonly employed in transactional systems like e-commerce platforms, banking applications, and inventory
management systems.
• Geared towards supporting real-time, interactive operations necessitating high concurrency, minimal latency, and
data consistency.
• Utilizes normalized data schema, efficient indexing techniques, and adheres to ACID principles.
OLAP (Online Analytical Processing):
• Specialized in managing extensive and intricate datasets.
• Primarily utilized in analytical systems such as business intelligence platforms, data mining tools, and decision
support systems.
• Geared towards facilitating ad-hoc querying, data analysis, and generating reports.
• Operates with lower concurrency requirements but higher throughput, focusing on data aggregation and complex
analytics.
• Typically relies on data warehousing solutions, ETL processes, and dedicated OLAP servers for data processing
and analysis.
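The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration of the two query shapes, not a recommendation to use SQLite for either role; the `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)


def place_order(customer, amount):
    """OLTP-style work: one small write wrapped in its own transaction."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )


for customer, amount in [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)]:
    place_order(customer, amount)

# OLAP-style work: a single read-only query that scans and aggregates rows.
totals = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The write path touches one row at a time and cares about atomicity, while the read path touches every row and cares about aggregation.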
Monitoring, Reporting and Analytics
Data Lake:
• Serves as a comprehensive storage solution for raw, unprocessed data from diverse sources, maintaining its
original format.
• Acts as a centralized hub for scalable data ingestion, storage, and processing.
• Facilitates schema flexibility and exploratory data analysis, enabling dynamic insights and late-binding
analytics.
• Tailored for big data processing and real-time ingestion, commonly leveraging Hadoop-based technologies.
Data Warehouse:
• Functions as a centralized repository for structured and processed data, harmonized into a unified schema for
efficient querying and analysis.
• Employs ETL processes to transform and load data into a standardized schema, enabling complex SQL
queries and OLAP operations.
• Designed for historical data analysis and strategic decision-making.
• Typically utilizes traditional RDBMS solutions like Oracle, SQL Server, or PostgreSQL for data warehousing
needs.
Healthcare Analysis
Entity Relationship Diagram
Solution Architecture
Sources → ETL tools → Storage/Analysis → Visualization
ETL Pipeline for the snowflake problem statement
A report that shows for each state how many people underwent treatment for the disease “Autism”.
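A minimal sketch of how such a report could be queried, assuming a hypothetical two-table layout (`patient` with a `state` column, `treatment` with a `disease` column). The real entity relationship diagram is not reproduced in the slides, so every table and column name here is an assumption. SQLite stands in for Snowflake so the example is runnable locally; the SELECT itself is plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample rows, for illustration only.
conn.executescript("""
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE treatment (
        treatment_id INTEGER PRIMARY KEY,
        patient_id INTEGER,
        disease TEXT
    );
    INSERT INTO patient VALUES (1, 'CA'), (2, 'CA'), (3, 'TX'), (4, 'NY');
    INSERT INTO treatment VALUES
        (10, 1, 'Autism'), (11, 1, 'Autism'), (12, 2, 'Autism'),
        (13, 3, 'Autism'), (14, 4, 'Asthma');
""")

# Count each patient once per state, even if treated several times.
AUTISM_BY_STATE = """
    SELECT p.state, COUNT(DISTINCT p.patient_id) AS patients
    FROM patient AS p
    JOIN treatment AS t ON t.patient_id = p.patient_id
    WHERE t.disease = 'Autism'
    GROUP BY p.state
    ORDER BY p.state
"""
autism_by_state = conn.execute(AUTISM_BY_STATE).fetchall()
print(autism_by_state)
```

`COUNT(DISTINCT ...)` is used because the question asks for people, not treatment events.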
For each age (in years), how many patients have gone for treatment?
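One way to answer this question is to derive each patient's age from a date of birth and count patients per age. The sketch below assumes the data holds a date of birth per treated patient and that age is measured on a fixed reference date; both the sample rows and the reference date are hypothetical.

```python
from collections import Counter
from datetime import date


def age_in_years(born, on):
    """Whole years between two dates, minus one if the birthday is still ahead."""
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))


# Hypothetical (patient_id, date_of_birth) rows, one per treated patient.
treated = [
    (1, date(1990, 5, 17)),
    (2, date(1990, 11, 2)),
    (3, date(2015, 1, 30)),
    (4, date(2015, 8, 9)),
    (5, date(1990, 2, 1)),
]
as_of = date(2024, 6, 30)  # assumed reference date

patients_per_age = Counter(age_in_years(dob, as_of) for _, dob in treated)
for age, count in sorted(patients_per_age.items()):
    print(age, count)
```

The list is assumed to contain each patient once; with one row per treatment, patient IDs would need to be de-duplicated first.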
Telecom Customer Revenue Analysis Project
• This project focuses on analyzing customer behavior and revenue generation for a
telecom company.
• It involves gathering and processing data from multiple sources, including call records,
billing data, demographics, and other relevant information, which has already been
provided.
• The primary goal is to uncover patterns and trends in consumer behavior and usage,
aiming to improve both profitability and customer satisfaction.
Project Workflow
Data Extraction → Data Transformation → Data Loading → Producer System → Data Warehouse → Data Analysis → Data Visualization
Phase 1 : Extraction, Transform, Load
Analyzing Data Quality
• Use a Pandas DataFrame to identify columns with missing or null values.
Data Cleaning
• Fill missing values with an appropriate strategy such as mean, median, mode, or
forward/backward filling.
• Perform any additional data cleaning tasks, such as converting data types or removing
duplicates, as needed.
Upload data to MongoDB
• Using pymongo, establish a connection with MongoDB.
• Iterate through the JSON data and insert each row into the appropriate MongoDB collection.
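The steps of this phase can be sketched as follows. The column names (`customer_id`, `plan`, `monthly_bill`), the connection URI, and the database and collection names are hypothetical. The choice of median and mode as fill strategies is one of the options listed on the slide, not necessarily the one used in the project. pymongo is imported lazily so the cleaning part runs without a MongoDB server.

```python
import pandas as pd


def missing_report(df):
    """Return the count of missing values for each column that has any."""
    missing = df.isna().sum()
    return missing[missing > 0]


def clean(df):
    """Drop duplicate rows, fill gaps, and convert the ID column to integers."""
    df = df.drop_duplicates().reset_index(drop=True)
    # Numeric gap: fill with the column median.
    df["monthly_bill"] = df["monthly_bill"].fillna(df["monthly_bill"].median())
    # Categorical gap: fill with the most frequent value.
    df["plan"] = df["plan"].fillna(df["plan"].mode().iloc[0])
    df["customer_id"] = df["customer_id"].astype(int)
    return df


def get_collection(uri="mongodb://localhost:27017", db="telecom", name="customers"):
    """Open a MongoDB collection (all three defaults are placeholders)."""
    from pymongo import MongoClient
    return MongoClient(uri)[db][name]


def upload(df, collection):
    """Insert every row as one document and return how many were inserted."""
    records = df.to_dict(orient="records")
    if not records:
        return 0
    return len(collection.insert_many(records).inserted_ids)


raw = pd.DataFrame({
    "customer_id": ["1", "2", "2", "3"],
    "plan": ["basic", "premium", "premium", None],
    "monthly_bill": [30.0, None, None, 50.0],
})
print(missing_report(raw))
cleaned = clean(raw)
print(cleaned)
```

With a server running, `upload(cleaned, get_collection())` would send the rows. The sketch uses `insert_many` to send them in one call; inserting row by row with `insert_one`, as the slide describes, works the same way but makes one round trip per document.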
Phase 2 : Create a Producer System
• A Kafka producer application is developed using Python to interact with the Kafka cluster
• We use the argparse module to parse command-line arguments to specify the interval
between producing messages (in seconds).
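A minimal sketch of such a producer, assuming the kafka-python package. The topic name, broker address, option names other than the interval, and the sample record are placeholders, not details from the project. The Kafka import sits inside `main` so the argument parsing and send loop can be exercised without a broker.

```python
import argparse
import json
import time


def parse_args(argv=None):
    """Read the send interval (in seconds) and connection settings."""
    parser = argparse.ArgumentParser(
        description="Publish records to Kafka at a fixed interval."
    )
    parser.add_argument("--interval", type=float, default=1.0,
                        help="seconds to wait between messages")
    parser.add_argument("--topic", default="telecom-events")
    parser.add_argument("--bootstrap-servers", default="localhost:9092")
    return parser.parse_args(argv)


def produce(producer, topic, records, interval, sleep=time.sleep):
    """Send each record, pausing `interval` seconds after every message."""
    sent = 0
    for record in records:
        producer.send(topic, record)
        sent += 1
        sleep(interval)
    producer.flush()  # block until buffered messages are delivered
    return sent


def main(argv=None):
    from kafka import KafkaProducer  # kafka-python, needed only when run for real

    args = parse_args(argv)
    producer = KafkaProducer(
        bootstrap_servers=args.bootstrap_servers,
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
    )
    records = [{"customer_id": 1, "call_minutes": 12.5}]  # placeholder data
    produce(producer, args.topic, records, args.interval)
```

Calling `main(["--interval", "2"])` against a running broker would publish one message every two seconds. Passing `sleep` as a parameter keeps the loop easy to test without real delays.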
Phase 3 : Setup Data Warehouse and Load cleansed data
Define Database and Schema
• Create a database and schema to organize your data.
Define Tables
• Create tables within the schema to represent your cleansed data.
• Define appropriate column names, data types, and constraints based on data
requirements.
Load Data
• Use SnowSQL to load data into Snowflake.
• Use the COPY INTO command.
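The three steps above map onto a short sequence of Snowflake SQL statements. The sketch below only builds the statements as strings; the database, schema, table, column, and file names are hypothetical, and the file format options would need to match the real export.

```python
def snowflake_load_statements(database, schema, table, local_csv):
    """Build the SQL for: define database/schema, define table, load data."""
    return [
        # 1. Define database and schema.
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
        f"USE SCHEMA {database}.{schema}",
        # 2. Define the table (hypothetical columns).
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "customer_id INTEGER NOT NULL, "
        "plan VARCHAR, "
        "monthly_bill NUMBER(10, 2))",
        # 3. Upload the local file to the table stage, then copy it in.
        f"PUT file://{local_csv} @%{table}",
        f"COPY INTO {table} FROM @%{table} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
    ]


statements = snowflake_load_statements(
    "telecom_db", "analytics", "customers", "/tmp/customers.csv"
)
print(";\n".join(statements) + ";")
```

The printed script could be saved to a file and run through SnowSQL, for example with its `-f` option. `PUT` is a client-side command, which is why a client such as SnowSQL is needed for the upload step.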
Phase 4 : Enrich Data
• Using Apache Spark, the data stream from Kafka is consumed.
• Joins are performed with the datasets based on the unique identification numbers.
Kafka + Snowflake → Spark → Snowflake
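The join logic itself can be illustrated without a cluster. The sketch below is a plain-Python stand-in, not Spark code: each incoming message is matched against a lookup of customer rows on a shared ID, which is what a Spark join on the same key would do at scale. The field names and sample data are hypothetical.

```python
import json


def enrich(messages, customers):
    """Yield each event merged with its customer row (inner join on customer_id)."""
    for raw in messages:
        event = json.loads(raw)
        profile = customers.get(event["customer_id"])
        if profile is None:
            continue  # inner join: events without a matching customer are dropped
        yield {**profile, **event}


# Hypothetical reference data keyed by the unique identification number.
customers = {
    1: {"customer_id": 1, "plan": "basic"},
    2: {"customer_id": 2, "plan": "premium"},
}
# Hypothetical messages as they might arrive from the stream.
stream = [
    '{"customer_id": 1, "call_minutes": 12.5}',
    '{"customer_id": 9, "call_minutes": 3.0}',
    '{"customer_id": 2, "call_minutes": 40.0}',
]
enriched = list(enrich(stream, customers))
print(enriched)
```

Whether unmatched events should be dropped (inner join) or kept with empty profile fields (left join) is a design choice the slides do not state.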
Phase 5 : Data Analysis
• Snowflake is used to derive actionable insights and uncover meaningful patterns from the
enriched dataset.
• By aggregating and summarizing the data at different granularities, such as overall and
week-wise, comprehensive insights into customer behavior and revenue generation trends are
obtained.
• Snowflake’s ability to handle complex queries and process large volumes of data efficiently
enabled informed decisions regarding revenue optimization, customer retention
strategies, and service enhancements.
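The two granularities mentioned above, overall and week-wise, can be sketched as two aggregate queries. SQLite is used here so the example runs locally; the `usage_enriched` table, its columns, and the sample rows are hypothetical. In Snowflake the weekly bucket would more likely be written with `DATE_TRUNC('WEEK', usage_date)` than with `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enriched table with one row per usage event.
conn.executescript("""
    CREATE TABLE usage_enriched (
        customer_id INTEGER,
        usage_date TEXT,
        revenue REAL
    );
    INSERT INTO usage_enriched VALUES
        (1, '2024-01-01', 10.0),
        (2, '2024-01-03', 5.0),
        (1, '2024-01-08', 7.5);
""")

# Overall granularity: one row summarizing the whole dataset.
overall = conn.execute(
    "SELECT COUNT(DISTINCT customer_id), SUM(revenue) FROM usage_enriched"
).fetchone()

# Week-wise granularity: one row per calendar week.
weekly = conn.execute("""
    SELECT strftime('%Y-%W', usage_date) AS week, SUM(revenue)
    FROM usage_enriched
    GROUP BY week
    ORDER BY week
""").fetchall()

print(overall)
print(weekly)
```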
Phase 6 : Workflow Orchestration
• Using Airflow’s Directed Acyclic Graphs (DAGs), a series of tasks is defined to encompass the
entire data pipeline, from data ingestion to analysis and visualization.
• By defining dependencies between tasks, Airflow ensures that each step is executed in the
correct order and that subsequent tasks are only triggered upon successful completion of
prerequisite tasks.
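The scheduling idea behind a DAG can be shown with Python's standard library alone. The sketch below is not Airflow code: it uses `graphlib.TopologicalSorter` to order tasks by their dependencies and skips any task whose prerequisite did not succeed. The task names are hypothetical labels for the pipeline stages.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "enrich": {"load"},
    "analyze": {"enrich"},
    "visualize": {"analyze"},
}


def run_pipeline(tasks, dependencies):
    """Run callables in dependency order; skip tasks with a non-successful upstream."""
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, set())
        if any(status.get(parent) != "success" for parent in upstream):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


def fail():
    raise RuntimeError("simulated failure")


healthy = {name: (lambda: None) for name in DEPENDENCIES}
print(run_pipeline(healthy, DEPENDENCIES))

broken = dict(healthy, transform=fail)
print(run_pipeline(broken, DEPENDENCIES))
```

In the second run, everything downstream of the failed `transform` task is skipped, which mirrors the behavior described in the bullet about prerequisite tasks.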