Open Source ETL using Talend Open Studio

                                    Luís Santos
                                luis@luissantos.pt



                                 February 14, 2013




Luís Santos luis@luissantos.pt   Open Source ETL   February 14, 2013   1
Overview

1    Who am I?

2    What is ETL?

3    ETL Software Suites

4    Talend Open Studio for Data Integration

5    Hands on

6    Conclusion



Warning!!!




This presentation was created using LaTeX
                  Why?
             Because I can!




Who am I?




          Software Engineer and
          Mathematics Student
          Open source addict
          PHP and Java Developer




What is ETL?


     In computing, Extract, Transform and Load (ETL) refers to a
     process in database usage and especially in data warehousing
     that involves:
             Extracting data from outside sources
             Transforming it to fit operational needs (which can include
             quality levels)
             Loading it into the end target (database, more specifically,
             operational data store, data mart or data warehouse)



        (2013, http://en.wikipedia.org/wiki/Extract,_transform,_load)
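
The three steps in the definition above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Talend implements it: the sample data, the `payments` table, and the rule of skipping rows with non-numeric amounts are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for an "outside source".
SAMPLE_CSV = """name,amount
alice, 10.50
bob,not-a-number
carol,7
"""


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert amounts, drop rows that fail the quality check."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumed quality rule: skip rows with non-numeric amounts
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order and return what ended up in the target."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()


if __name__ == "__main__":
    print(run_etl(SAMPLE_CSV, sqlite3.connect(":memory:")))
```

Keeping each step as its own function mirrors the way an ETL tool chains separate components, so a quality rule can be changed without touching the extract or load code.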




ETL Software Suites




      Pentaho Data Integration (Kettle)
      SQL Server Integration Services
      Talend Open Studio for Data Integration
      etc...




Talend Open Studio for Data Integration


Talend Open Studio is a set of tools for developing, testing, and deploying
application integration projects.
      Talend Open Studio for Big Data
      Bonita Open Solution (BPM)
      Talend Open Studio for Data Integration
      Talend Open Studio for Data Quality
      Talend ESB
      Talend Open Studio for MDM




Datasource(rer)s




Datasources (Extract and Load)




  MySQL, MSSQL, Oracle, SQLite, FirebirdSQL, XLS, CSV, XML, SOAP,
                  REST, HTTP, FTP, SSH, IMAP




Transformers




Transformers (Transform)




      Sort data
      Convert data
      Cross data between datasources
      Filter data
      Fuzzy search
      Normalize and Denormalize data
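
To make the transformation types listed above concrete, here is a rough sketch of what each one does to a handful of rows, written in plain Python. The rows and field names are hypothetical, and the fuzzy search uses the standard library's `difflib` as a stand-in; none of this is meant to reflect how Talend's own components work internally.

```python
import difflib

# Hypothetical rows; the field names are illustrative only.
ROWS = [
    {"city": "Lisbon", "population": "545000", "tags": "capital,coast"},
    {"city": "Porto", "population": "232000", "tags": "coast"},
    {"city": "Braga", "population": "193000", "tags": ""},
]


def convert(rows):
    """Convert: cast the population field from text to int."""
    return [{**r, "population": int(r["population"])} for r in rows]


def filter_rows(rows, min_population):
    """Filter: keep rows at or above a population threshold."""
    return [r for r in rows if r["population"] >= min_population]


def sort_rows(rows):
    """Sort: order rows by population, largest first."""
    return sorted(rows, key=lambda r: r["population"], reverse=True)


def fuzzy_find(rows, query, cutoff=0.6):
    """Fuzzy search: return rows whose city name is close to the query."""
    names = [r["city"] for r in rows]
    matches = difflib.get_close_matches(query, names, n=3, cutoff=cutoff)
    return [r for r in rows if r["city"] in matches]


def normalize(rows):
    """Normalize: split the comma-separated tags field into one row per tag."""
    return [
        {"city": r["city"], "tag": tag}
        for r in rows
        for tag in r["tags"].split(",")
        if tag
    ]


def denormalize(pairs):
    """Denormalize: merge one-row-per-tag back into a comma-separated field."""
    merged = {}
    for p in pairs:
        merged.setdefault(p["city"], []).append(p["tag"])
    return [{"city": city, "tags": ",".join(tags)} for city, tags in merged.items()]
```

For instance, `sort_rows(filter_rows(convert(ROWS), 200000))` keeps Lisbon and Porto in that order, and `fuzzy_find(ROWS, "Lisbom")` still finds the Lisbon row despite the typo.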




Where and how?



     Where?
             Multi-platform (Linux, macOS, *BSD, even Windows)
             You just need a JVM (Java Virtual Machine)
     How?
             Execute it from your favorite programming language using system calls
             Command line
             From your JVM-based application (Java, Groovy, JRuby)
             Web services running on top of a Java application server (Tomcat, GlassFish)
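
As a rough sketch of the first two options, the snippet below starts a job launcher as a child process and collects its exit code and output. The launcher path `./MyJob/MyJob_run.sh` and the parameter name `day` are hypothetical, and passing values as `--context_param name=value` is an assumption about the exported launcher that should be checked against the job you actually export.

```python
import subprocess
import sys


def build_command(launcher, context_params):
    """Build the argument list for a job launcher script.

    Assumption: the launcher accepts `--context_param name=value` pairs.
    """
    command = [launcher]
    for name, value in context_params.items():
        command += ["--context_param", f"{name}={value}"]
    return command


def run_job(command, timeout=600):
    """Run the command as a child process and return (exit_code, stdout)."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical launcher path and parameter, shown only to illustrate the shape.
    print(build_command("./MyJob/MyJob_run.sh", {"day": "2013-02-14"}))

    # Stand-in child process so the sketch runs without a real job installed.
    code, out = run_job([sys.executable, "-c", "print('job finished')"])
    print(code, out.strip())
```

Passing the command as a list rather than a single shell string avoids quoting problems when parameter values contain spaces, and a non-zero exit code gives the caller a simple way to detect a failed run.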




Hands on




     Querying data
     Joining data from multiple datasources
     Filtering and sorting data
     Exporting data
     Deploying your job
     Calling it from PHP
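
The first four steps of the list above can be approximated in plain Python to show what the job is expected to do. Everything here is a stand-in: the `customers` table, the country CSV, and the minimum order count are hypothetical, and the real demo builds the same flow graphically in Talend rather than in code.

```python
import csv
import io
import sqlite3

# Hypothetical second data source: a CSV mapping country codes to names.
COUNTRIES_CSV = "code,country\nPT,Portugal\nES,Spain\n"


def make_db():
    """Create an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, country_code TEXT, orders INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("Ana", "PT", 12), ("Bruno", "ES", 3), ("Carla", "PT", 7), ("Dan", "XX", 9)],
    )
    return conn


def query(conn):
    """Query: read rows from the first data source."""
    return conn.execute("SELECT name, country_code, orders FROM customers").fetchall()


def join(rows, countries_csv):
    """Join: look up each country code in the second source (inner join)."""
    reader = csv.DictReader(io.StringIO(countries_csv))
    lookup = {r["code"]: r["country"] for r in reader}
    return [(name, lookup[code], orders) for name, code, orders in rows if code in lookup]


def filter_and_sort(rows, min_orders):
    """Filter by a minimum order count, then sort by order count, largest first."""
    kept = [r for r in rows if r[2] >= min_orders]
    return sorted(kept, key=lambda r: r[2], reverse=True)


def export(rows):
    """Export: serialize the result as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "country", "orders"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(export(filter_and_sort(join(query(make_db()), COUNTRIES_CSV), 5)))
```

The join here is an inner join, so the customer with the unknown code `XX` is dropped; a real job would have to decide whether such rows should be rejected, logged, or kept with an empty country.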




Database Schema




Example




“With great power comes great responsibility.”
                                         (Voltaire)




The End
    email: luis@luissantos.pt
    twitter: @santosluis87
    linkedin: https://www.linkedin.com/in/luissantos87



