SlideShare a Scribd company logo
Kettle – ETL Tool
Sreenivas K
Agenda

Introduction
− ETL Process
− Pentaho's Kettle

Data Integration Challenges

Prerequisites and Recent Releases

Pentaho DI Components

Spoon
− Transformations
− Jobs
Introduction – ETL Process

Major Components
− Extracting

Gathering raw data from source systems and storing it in ETL staging
environment

Data Profiling

Identifying data that changed since last load.
− Transforming- Cleaning and Conforming

Processing data to improve its quality, format it, merge from multiple
sources, enforce conformed dimensions

Data cleansing

Recording error events

Audit dimensions

Creating and maintaining conformed dimensions and facts
Introduction – ETL Process
− Loading

Loading data into data warehouse tables

Managing hierarchies in dimensions

Managing special dimensions such as date and time, junk, mini, shrunken,
small static, and user-maintained dimensions

Fact table loading

Building and maintaining bridge dimension tables

Handling late arriving data

Management of conformed dimensions

Administration of fact tables

Building aggregations

Building OLAP cubes

Transferring DW data to other environment for specific purposes
Data Transformation and
Integration Examples

Data filtering
− Is not null, greater than, less than, includes

Field manipulation
− Trimming, padding, upper and lowercase conversion

Data calculations
− + - X / , average, absolute value, arctangent, natural logarithm

Date manipulation
− First day of month, Last day of month, add months, week of year, day of year

Data type conversion
− String to number, number to string, date to number

Merging fields & splitting fields

Looking up date
− Look up in a database, in a text file, an excel sheet, …
Introduction – Pentaho Kettle

Kettle – Kettle Extraction Transformation Transportation &
Loading tool

Its open source business intelligence suite for powerful
data integration by Pentaho. Founded in 2004.

Products of Pentaho
− Mondrain – OLAP server written in Java
− Kettle – ETL tool
Data Integration - Challenges

Data is everywhere

Data is inconsistent
− Records are different in each system

Performance issues
− Running queries to summarize data for stipulated
long period takes operating system for task

Data is never all in Data Warehouse
− Excel sheet, acquisition, new application
Prerequisites Recent Releases

Java Runtime Environment
1.5 and above

Compatible with almost any
platform

Compatible with wide range
of Databases technologies.

4/25 Data Integration 3.0.3 GA

4/18 Data Integration 3.1 Milestone

2/8 Data Integration 3.0.2 GA

12/12 Data Integration 3.0.1 GA

11/15 Data Integration 3.0 GA

10/31 Data Integration 3.0 RC2

10/24 Data Integration 2.5.2 GA

10/08 Data Integration 3.0 RC1

08/24 Data Integration 2.5.1 GA
Pentaho Components

Spoon
− GUI that allows you to design transformations and jobs that can
be run with the Kettle tools — Pan and Kitchen
− Transformations and Jobs can describe themselves using an XML
file or can be put in a Kettle database repository.
− Spoon is available as executable script and batch file to make use
of tool in heterogeneous environment.

Pan
− A program to execute transformations designed by Spoon in XML or
database repository.
− Transformations are scheduled in batch mode to be run automatically at
regular intervals

Kitchen
− Execute jobs designed by Spoon in XML or database repository

Repository Connection establishment

Auto login
− By setting manually KETTLE_REPOSITORY,
KETTLE_USER and KETTLE_PASSWORD
environmental variables.

Login
− By default PDI provides login username and
password ad admin.
Pentaho etl-tool
Pentaho etl-tool
Pentaho etl-tool

Transformation
− Value: Values are part of a row
and can contain any type of data
− Row: a row exists of 0 or more
values
− Output stream: an output
stream is a stack of rows that
leaves a step.
− Input stream: an input stream is
a stack of rows that enters a
step.
− Hop: A hop is a graphical
representation of one or more
data streams between 2 steps.
− Note: A note is a piece of
information that can be added to
a transformation
Engine capable of performing a
multitude of functions such as reading,
manipulating and writing data to and
from various data sources.

Jobs
− Job Entry: A job entry is
one part of a job and
performs a certain
− Hop: A hop is a graphical
representation of one or
more data streams
between 2 steps
− Note: a note is a piece of
information that can be added to
a job
A way of calling transformations and
controlling the sequence of their
execution. Usually jobs are
scheduled in batch mode to be run
automatically at regular intervals.
Input Steps
Output Steps
Lookup Steps
Transformation
Steps
Join Steps
DW Steps
Mapping Steps
Job Steps
Pentaho etl-tool
Pentaho etl-tool
Pentaho etl-tool
Ad

More Related Content

What's hot (20)

An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
Sivakumar Ramar
 
Power BI & SAP - Integration Options and possible Pifalls
Power BI & SAP - Integration Options and possible PifallsPower BI & SAP - Integration Options and possible Pifalls
Power BI & SAP - Integration Options and possible Pifalls
JJDE
 
Presentation 1 - SSRS (1)
Presentation 1 - SSRS (1)Presentation 1 - SSRS (1)
Presentation 1 - SSRS (1)
Anurag Rana
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
DataminingTools Inc
 
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
Edureka!
 
Pentaho data integration 4.0 and my sql
Pentaho data integration 4.0 and my sqlPentaho data integration 4.0 and my sql
Pentaho data integration 4.0 and my sql
AHMED ENNAJI
 
Pentaho
PentahoPentaho
Pentaho
teza123
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
Matillion
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
Edureka!
 
Best-Practices-for-Using-Tableau-With-Snowflake.pdf
Best-Practices-for-Using-Tableau-With-Snowflake.pdfBest-Practices-for-Using-Tableau-With-Snowflake.pdf
Best-Practices-for-Using-Tableau-With-Snowflake.pdf
ssuserf8f9b2
 
SSIS Presentation
SSIS PresentationSSIS Presentation
SSIS Presentation
BarbaraBederman
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
Databricks
 
Datastage ppt
Datastage pptDatastage ppt
Datastage ppt
Newyorksys.com
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Edureka!
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know Snowflake
Knoldus Inc.
 
Power bi
Power biPower bi
Power bi
Lakshmi Prasanna Kottagorla
 
1\9.SSIS 2008R2_Training - Introduction to SSIS
1\9.SSIS 2008R2_Training - Introduction to SSIS1\9.SSIS 2008R2_Training - Introduction to SSIS
1\9.SSIS 2008R2_Training - Introduction to SSIS
Pramod Singla
 
Power BI
Power BIPower BI
Power BI
Stéphane Fréchette
 
Tableau 7.0 prsentation
Tableau 7.0 prsentationTableau 7.0 prsentation
Tableau 7.0 prsentation
inam_slides
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
Sivakumar Ramar
 
Power BI & SAP - Integration Options and possible Pifalls
Power BI & SAP - Integration Options and possible PifallsPower BI & SAP - Integration Options and possible Pifalls
Power BI & SAP - Integration Options and possible Pifalls
JJDE
 
Presentation 1 - SSRS (1)
Presentation 1 - SSRS (1)Presentation 1 - SSRS (1)
Presentation 1 - SSRS (1)
Anurag Rana
 
MS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data miningMS SQL SERVER: SSIS and data mining
MS SQL SERVER: SSIS and data mining
DataminingTools Inc
 
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
SSIS Tutorial For Beginners | SQL Server Integration Services (SSIS) | MSBI T...
Edureka!
 
Pentaho data integration 4.0 and my sql
Pentaho data integration 4.0 and my sqlPentaho data integration 4.0 and my sql
Pentaho data integration 4.0 and my sql
AHMED ENNAJI
 
Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)Azure Data Lake Intro (SQLBits 2016)
Azure Data Lake Intro (SQLBits 2016)
Michael Rys
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
Matillion
 
Pentaho-BI
Pentaho-BIPentaho-BI
Pentaho-BI
Edureka!
 
Best-Practices-for-Using-Tableau-With-Snowflake.pdf
Best-Practices-for-Using-Tableau-With-Snowflake.pdfBest-Practices-for-Using-Tableau-With-Snowflake.pdf
Best-Practices-for-Using-Tableau-With-Snowflake.pdf
ssuserf8f9b2
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
Databricks
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Edureka!
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know Snowflake
Knoldus Inc.
 
1\9.SSIS 2008R2_Training - Introduction to SSIS
1\9.SSIS 2008R2_Training - Introduction to SSIS1\9.SSIS 2008R2_Training - Introduction to SSIS
1\9.SSIS 2008R2_Training - Introduction to SSIS
Pramod Singla
 
Tableau 7.0 prsentation
Tableau 7.0 prsentationTableau 7.0 prsentation
Tableau 7.0 prsentation
inam_slides
 

Viewers also liked (14)

Pentaho PDI
Pentaho PDIPentaho PDI
Pentaho PDI
Joao Gutheil
 
Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data Sharing
Safe Software
 
vertica_tmp_4.5
vertica_tmp_4.5vertica_tmp_4.5
vertica_tmp_4.5
Hwang Andrew
 
Penatho
PenathoPenatho
Penatho
chenvi123
 
Hire Pentaho Developer | BI Tools
Hire Pentaho Developer | BI ToolsHire Pentaho Developer | BI Tools
Hire Pentaho Developer | BI Tools
eLuminous Technologies Pvt. Ltd.
 
Hybrid & Logical Data Warehouse
Hybrid & Logical Data WarehouseHybrid & Logical Data Warehouse
Hybrid & Logical Data Warehouse
Heungsoon Yang
 
Pentaho ETL ハンズオン
Pentaho ETL ハンズオンPentaho ETL ハンズオン
Pentaho ETL ハンズオン
Teruo Kawasaki
 
Open Source Reporting Tool Comparison
Open Source Reporting Tool ComparisonOpen Source Reporting Tool Comparison
Open Source Reporting Tool Comparison
Rogue Wave Software
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
Ashnikbiz
 
TeraStream for ETL
TeraStream for ETLTeraStream for ETL
TeraStream for ETL
치민 최
 
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Channy Yun
 
빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)
Channy Yun
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools Comparison
Roberto Espinosa
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
Xpand IT
 
Spatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data SharingSpatial ETL For Web Services-Based Data Sharing
Spatial ETL For Web Services-Based Data Sharing
Safe Software
 
Hybrid & Logical Data Warehouse
Hybrid & Logical Data WarehouseHybrid & Logical Data Warehouse
Hybrid & Logical Data Warehouse
Heungsoon Yang
 
Pentaho ETL ハンズオン
Pentaho ETL ハンズオンPentaho ETL ハンズオン
Pentaho ETL ハンズオン
Teruo Kawasaki
 
Open Source Reporting Tool Comparison
Open Source Reporting Tool ComparisonOpen Source Reporting Tool Comparison
Open Source Reporting Tool Comparison
Rogue Wave Software
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
Ashnikbiz
 
TeraStream for ETL
TeraStream for ETLTeraStream for ETL
TeraStream for ETL
치민 최
 
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Daum 내부 빅데이터 및 클라우드 기술 활용 사례- 윤석찬 (2012)
Channy Yun
 
빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)빅데이터 기술 현황과 시장 전망(2014)
빅데이터 기술 현황과 시장 전망(2014)
Channy Yun
 
Informatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools ComparisonInformatica Pentaho Etl Tools Comparison
Informatica Pentaho Etl Tools Comparison
Roberto Espinosa
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
Xpand IT
 
Ad

Similar to Pentaho etl-tool (20)

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
jade_22
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
Dr Anjan Krishnamurthy
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfolio
rolee23
 
Datawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterDatawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot faster
John Leonard
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
ganblues
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
ssuser98bffa1
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
SingleStore
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
Jeff McQuigg
 
Dan Querimit - BI Portfolio
Dan Querimit - BI PortfolioDan Querimit - BI Portfolio
Dan Querimit - BI Portfolio
querimit
 
ETL
ETL ETL
ETL
butest
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
camyla81
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
ssuser8ccb5a
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODI
Nagendra K
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
Ramesh Ch
 
Data migration
Data migrationData migration
Data migration
Vatsala Chauhan
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
Amit Kumar
 
Data automation 101
Data automation 101Data automation 101
Data automation 101
Yosua Michael Maranatha
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
Ramesh Ch
 
AIRflow at Scale
AIRflow at ScaleAIRflow at Scale
AIRflow at Scale
Digital Vidya
 
Informatica overview
Informatica overviewInformatica overview
Informatica overview
Swetha Naveen
 
Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
jade_22
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfolio
rolee23
 
Datawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot fasterDatawa.re: Data warehouse design, development and support just got alot faster
Datawa.re: Data warehouse design, development and support just got alot faster
John Leonard
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
ganblues
 
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data AnalyticsStrata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
Strata+Hadoop 2015 NYC End User Panel on Real-Time Data Analytics
SingleStore
 
ELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_JeffELT Publishing Tool Overview V3_Jeff
ELT Publishing Tool Overview V3_Jeff
Jeff McQuigg
 
Dan Querimit - BI Portfolio
Dan Querimit - BI PortfolioDan Querimit - BI Portfolio
Dan Querimit - BI Portfolio
querimit
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
camyla81
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
ssuser8ccb5a
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODI
Nagendra K
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
Ramesh Ch
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
Amit Kumar
 
Ramesh BODS_IS
Ramesh BODS_ISRamesh BODS_IS
Ramesh BODS_IS
Ramesh Ch
 
Informatica overview
Informatica overviewInformatica overview
Informatica overview
Swetha Naveen
 
Ad

Recently uploaded (20)

Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersAutomation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Lynda Kane
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersAutomation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Lynda Kane
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 

Pentaho etl-tool

  • 1. Kettle – ETL Tool Sreenivas K
  • 2. Agenda  Introduction − ETL Process − Pentaho's Kettle  Data Integration Challenges  Prerequisites and Recent Releases  Pentaho DI Components  Spoon − Transformations − Jobs
  • 3. Introduction – ETL Process  Major Components − Extracting  Gathering raw data from source systems and storing it in ETL staging environment  Data Profiling  Identifying data that changed since last load. − Transforming- Cleaning and Conforming  Processing data to improve its quality, format it, merge from multiple sources, enforce conformed dimensions  Data cleansing  Recording error events  Audit dimensions  Creating and maintaining conformed dimensions and facts
  • 4. Introduction – ETL Process − Loading  Loading data into data warehouse tables  Managing hierarchies in dimensions  Managing special dimensions such as date and time, junk, mini, shrunken, small static, and user-maintained dimensions  Fact table loading  Building and maintaining bridge dimension tables  Handling late arriving data  Management of conformed dimensions  Administration of fact tables  Building aggregations  Building OLAP cubes  Transferring DW data to other environment for specific purposes
  • 5. Data Transformation and Integration Examples  Data filtering − Is not null, greater than, less than, includes  Field manipulation − Trimming, padding, upper and lowercase conversion  Data calculations − + - X / , average, absolute value, arctangent, natural logarithm  Date manipulation − First day of month, Last day of month, add months, week of year, day of year  Data type conversion − String to number, number to string, date to number  Merging fields & splitting fields  Looking up date − Look up in a database, in a text file, an excel sheet, …
  • 6. Introduction – Pentaho Kettle  Kettle – Kettle Extraction Transformation Transportation & Loading tool  Its open source business intelligence suite for powerful data integration by Pentaho. Founded in 2004.  Products of Pentaho − Mondrain – OLAP server written in Java − Kettle – ETL tool
  • 7. Data Integration - Challenges  Data is everywhere  Data is inconsistent − Records are different in each system  Performance issues − Running queries to summarize data for stipulated long period takes operating system for task  Data is never all in Data Warehouse − Excel sheet, acquisition, new application
  • 8. Prerequisites Recent Releases  Java Runtime Environment 1.5 and above  Compatible with almost any platform  Compatible with wide range of Databases technologies.  4/25 Data Integration 3.0.3 GA  4/18 Data Integration 3.1 Milestone  2/8 Data Integration 3.0.2 GA  12/12 Data Integration 3.0.1 GA  11/15 Data Integration 3.0 GA  10/31 Data Integration 3.0 RC2  10/24 Data Integration 2.5.2 GA  10/08 Data Integration 3.0 RC1  08/24 Data Integration 2.5.1 GA
  • 9. Pentaho Components  Spoon − GUI that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen − Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database repository. − Spoon is available as executable script and batch file to make use of tool in heterogeneous environment.  Pan − A program to execute transformations designed by Spoon in XML or database repository. − Transformations are scheduled in batch mode to be run automatically at regular intervals  Kitchen − Execute jobs designed by Spoon in XML or database repository
  • 10.  Repository Connection establishment  Auto login − By setting manually KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD environmental variables.  Login − By default PDI provides login username and password ad admin.
  • 14.  Transformation − Value: Values are part of a row and can contain any type of data − Row: a row exists of 0 or more values − Output stream: an output stream is a stack of rows that leaves a step. − Input stream: an input stream is a stack of rows that enters a step. − Hop: A hop is a graphical representation of one or more data streams between 2 steps. − Note: A note is a piece of information that can be added to a transformation Engine capable of performing a multitude of functions such as reading, manipulating and writing data to and from various data sources.
  • 15.  Jobs − Job Entry: A job entry is one part of a job and performs a certain − Hop: A hop is a graphical representation of one or more data streams between 2 steps − Note: a note is a piece of information that can be added to a job A way of calling transformations and controlling the sequence of their execution. Usually jobs are scheduled in batch mode to be run automatically at regular intervals.
  • 16. Input Steps Output Steps Lookup Steps Transformation Steps Join Steps DW Steps Mapping Steps Job Steps