SlideShare a Scribd company logo
CDH
What is CDH?
•Popular distribution of Apache
Hadoop and related projects.
•Delivers scalable storage and
distributed computing.
•Apache-licensed open source
Big Data
•Collection of large data sets that cannot be processed using traditional
computing techniques
•Big Data challenges
•Storage
•Capturing data
•Analyze data
Hadoop
•What is Hadoop?
• The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models
•Hadoop components
•HDFS – Storage layer
•MapReduce – Processing layer
•YARN – Resource management layer
Hadoop
ecosystem
HDFS
•Stores different types of large data sets (structured, semi-structured
and unstructured)
•HDFS creates a level of abstraction over the resources, from where we
can see the whole HDFS as a single unit.
•Stores data across various resources and maintains the log file about
the stored data.(metadata)
HDFS Cont...
MapReduce
•Core component in Hadoop ecosystem for processing
•Two functions
•Map
•Reduce
Kafka
• Distributed publish-subscribe messaging system and a robust queue
that can handle a high volume of data and enables to pass messages
from one end-point to another
• Built on top of the ZooKeeper synchronization service
• Integrates well with Apache Storm and Spark for real-time streaming
data analysis
Sqoop
• Sqoop − SQL to Hadoop and Hadoop to SQL
• Tool designed to transfer data between Hadoop and relational
database servers
YARN
• Performs all processing activities by allocating resources
• YARN is an attempt to take Apache Hadoop beyond MapReduce for
data-processing
• Consists
• Resource manager
• Node manager
YARN
Hive
•Data warehouse software
•Provide data summarization, query and analysis
•Query language – Hive Query language (HQL)
•Uses metastore to store meta-data about the data
•Familiar built in user defined functions
Pig
• Write complex MapReduce transformations using a simple scripting
language called Pig Latin
• Pig translates the Pig Latin script into MapReduce
• Makes Hadoop data accessible for a variety of batch processing
workloads
• Data preparation
• ETL
• Data mining
Impala
• It is an interactive SQL like query engine that runs on top of Hadoop
Distributed File System (HDFS)
• Parallel processing SQL query engine for processing huge volume of
data
• Provide unified platform for real time queries.
• Impala is faster than Apache Hive.
• Impala is memory intensive and does not run effectively for heavy
data operations like joins.
Impala
HBase
• Wide column store database (NoSQL).
• Database built in top of the HDFS
• HBase does not support a structured query language like SQL
• Provides random, real time access to data in Hadoop
Spark
• Open-source distributed general-purpose cluster computing
framework with in-memory data processing engine
• It can run in Hadoop clusters through YARN and it can process data in
HDFS .
• Fast and general engine for large-scale data processing.
• High level API's for programming languages: Java, Python, Scarla, R
• Supports SQL queries, Streaming data, Machine learning (ML), and
Graph algorithms.
Mahout
• Use for creating scalable machine learning algorithms
• Implemented on top of Apache Hadoop and using the MapReduce
paradigm.
• Lets applications to analyze large sets of data effectively and in
quicktime
• Mahout provides the data science tools to automatically find
meaningful patterns in big data sets in HDFS.
SolR
• Open source platform for searches of data stored in HDFS
• Advanced full-text search
• Near real-time indexing
• Standards based upon interfaces like JSON, XML, HTTP
• Comprehensive HTML administration interfaces
Kudu
• Apache Kudu completes Hadoop's storage layer to enable fast
analytics on fast data.
• It runs on commodity hardware, is horizontally scalable, and supports
highly available operation.
• Integration with MapReduce, Spark and other Hadoop ecosystem
components.
• Strong performance for running sequential and random workloads
simultaneously
Sentry
• Granular, role-based authorization module for Hadoop
• Provides the ability to control and enforce precise levels of privileges
on data for authenticated users and applications on a Hadoop cluster.
• Designed to be a pluggable authorization engine for Hadoop
components.
• Allows to define authorization rules to validate a user
Thank You
Ad

More Related Content

What's hot (20)

Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
Sascha Dittmann
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
Apache hive
Apache hiveApache hive
Apache hive
pradipbajpai68
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
John Yeung
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
joelcrabb
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
Dataflair Web Services Pvt Ltd
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
tipanagiriharika
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
Owen O'Malley
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
DataWorks Summit/Hadoop Summit
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
Simplilearn
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
Databricks
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Microsoft Azure Databricks
Microsoft Azure DatabricksMicrosoft Azure Databricks
Microsoft Azure Databricks
Sascha Dittmann
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Hive Tutorial | Hive Architecture | Hive Tutorial For Beginners | Hive In Had...
Simplilearn
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
John Yeung
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
joelcrabb
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
Guido Schmutz
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
Owen O'Malley
 
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
What Is Hadoop? | What Is Big Data & Hadoop | Introduction To Hadoop | Hadoop...
Simplilearn
 
DevOps for Databricks
DevOps for DatabricksDevOps for Databricks
DevOps for Databricks
Databricks
 

Similar to Cloudera Hadoop Distribution (20)

Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
sunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
Hadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptxHadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptx
Rahul Borate
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
raghavanand36
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
Kibrom Gebrehiwot
 
BIGDATA ppts
BIGDATA pptsBIGDATA ppts
BIGDATA ppts
Krisshhna Daasaarii
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Dr. C.V. Suresh Babu
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
InSemble
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
Jakub Stransky
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
NetajiGandi1
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptx
iaeronlineexm
 
Hadoop
HadoopHadoop
Hadoop
chandinisanz
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
yaevents
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
Shravan (Sean) Pabba
 
01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx
VIJAYAPRABAP
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
Ayyappan Paramesh
 
Big Data UNIT 2 AKTU syllabus all topics covered
Big Data UNIT 2 AKTU syllabus all topics coveredBig Data UNIT 2 AKTU syllabus all topics covered
Big Data UNIT 2 AKTU syllabus all topics covered
chinky1118
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
Farzad Nozarian
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
TIB Academy
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
sunera pathan
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
sunera pathan
 
Hadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptxHadoop and their in big data analysis EcoSystem.pptx
Hadoop and their in big data analysis EcoSystem.pptx
Rahul Borate
 
hadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptxhadoop-ecosystem-ppt.pptx
hadoop-ecosystem-ppt.pptx
raghavanand36
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
InSemble
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
Jakub Stransky
 
BDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data AnalyticsBDA R20 21NM - Summary Big Data Analytics
BDA R20 21NM - Summary Big Data Analytics
NetajiGandi1
 
An Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptxAn Introduction-to-Hive and its Applications and Implementations.pptx
An Introduction-to-Hive and its Applications and Implementations.pptx
iaeronlineexm
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
yaevents
 
01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx01-Introduction-to-Hive.pptx
01-Introduction-to-Hive.pptx
VIJAYAPRABAP
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Big Data UNIT 2 AKTU syllabus all topics covered
Big Data UNIT 2 AKTU syllabus all topics coveredBig Data UNIT 2 AKTU syllabus all topics covered
Big Data UNIT 2 AKTU syllabus all topics covered
chinky1118
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
Farzad Nozarian
 
Ad

Recently uploaded (20)

How to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
How to Customize Your Financial Reports & Tax Reports With Odoo 17 AccountingHow to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
How to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
Celine George
 
To study Digestive system of insect.pptx
To study Digestive system of insect.pptxTo study Digestive system of insect.pptx
To study Digestive system of insect.pptx
Arshad Shaikh
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Operations Management (Dr. Abdulfatah Salem).pdf
Operations Management (Dr. Abdulfatah Salem).pdfOperations Management (Dr. Abdulfatah Salem).pdf
Operations Management (Dr. Abdulfatah Salem).pdf
Arab Academy for Science, Technology and Maritime Transport
 
Sinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_NameSinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_Name
keshanf79
 
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Library Association of Ireland
 
Metamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative JourneyMetamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative Journey
Arshad Shaikh
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
P-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 finalP-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 final
bs22n2s
 
K12 Tableau Tuesday - Algebra Equity and Access in Atlanta Public Schools
K12 Tableau Tuesday  - Algebra Equity and Access in Atlanta Public SchoolsK12 Tableau Tuesday  - Algebra Equity and Access in Atlanta Public Schools
K12 Tableau Tuesday - Algebra Equity and Access in Atlanta Public Schools
dogden2
 
Marie Boran Special Collections Librarian Hardiman Library, University of Gal...
Marie Boran Special Collections Librarian Hardiman Library, University of Gal...Marie Boran Special Collections Librarian Hardiman Library, University of Gal...
Marie Boran Special Collections Librarian Hardiman Library, University of Gal...
Library Association of Ireland
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
GDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptxGDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptx
azeenhodekar
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...
Sandeep Swamy
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACYUNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
DR.PRISCILLA MARY J
 
How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...
How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...
How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...
Celine George
 
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Library Association of Ireland
 
How to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
How to Customize Your Financial Reports & Tax Reports With Odoo 17 AccountingHow to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
How to Customize Your Financial Reports & Tax Reports With Odoo 17 Accounting
Celine George
 
To study Digestive system of insect.pptx
To study Digestive system of insect.pptxTo study Digestive system of insect.pptx
To study Digestive system of insect.pptx
Arshad Shaikh
 
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptxSCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
SCI BIZ TECH QUIZ (OPEN) PRELIMS XTASY 2025.pptx
Ronisha Das
 
Sinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_NameSinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_Name
keshanf79
 
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Michelle Rumley & Mairéad Mooney, Boole Library, University College Cork. Tra...
Library Association of Ireland
 
Metamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative JourneyMetamorphosis: Life's Transformative Journey
Metamorphosis: Life's Transformative Journey
Arshad Shaikh
 
Social Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy StudentsSocial Problem-Unemployment .pptx notes for Physiotherapy Students
Social Problem-Unemployment .pptx notes for Physiotherapy Students
DrNidhiAgarwal
 
P-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 finalP-glycoprotein pamphlet: iteration 4 of 4 final
P-glycoprotein pamphlet: iteration 4 of 4 final
bs22n2s
 
K12 Tableau Tuesday - Algebra Equity and Access in Atlanta Public Schools
K12 Tableau Tuesday  - Algebra Equity and Access in Atlanta Public SchoolsK12 Tableau Tuesday  - Algebra Equity and Access in Atlanta Public Schools
K12 Tableau Tuesday - Algebra Equity and Access in Atlanta Public Schools
dogden2
 
Marie Boran Special Collections Librarian Hardiman Library, University of Gal...
Marie Boran Special Collections Librarian Hardiman Library, University of Gal...Marie Boran Special Collections Librarian Hardiman Library, University of Gal...
Marie Boran Special Collections Librarian Hardiman Library, University of Gal...
Library Association of Ireland
 
2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx2541William_McCollough_DigitalDetox.docx
2541William_McCollough_DigitalDetox.docx
contactwilliamm2546
 
GDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptxGDGLSPGCOER - Git and GitHub Workshop.pptx
GDGLSPGCOER - Git and GitHub Workshop.pptx
azeenhodekar
 
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetCBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - Worksheet
Sritoma Majumder
 
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...Multi-currency in odoo accounting and Update exchange rates automatically in ...
Multi-currency in odoo accounting and Update exchange rates automatically in ...
Celine George
 
The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...The ever evoilving world of science /7th class science curiosity /samyans aca...
The ever evoilving world of science /7th class science curiosity /samyans aca...
Sandeep Swamy
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACYUNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
UNIT 3 NATIONAL HEALTH PROGRAMMEE. SOCIAL AND PREVENTIVE PHARMACY
DR.PRISCILLA MARY J
 
How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...
How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...
How to track Cost and Revenue using Analytic Accounts in odoo Accounting, App...
Celine George
 
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Phoenix – A Collaborative Renewal of Children’s and Young People’s Services C...
Library Association of Ireland
 
Ad

Cloudera Hadoop Distribution

  • 1. CDH
  • 2. What is CDH? •Popular distribution of Apache Hadoop and related projects. •Delivers scalable storage and distributed computing. •Apache-licensed open source
  • 3. Big Data •Collection of large data sets that cannot be processed using traditional computing techniques •Big Data challenges •Storage •Capturing data •Analyze data
  • 4. Hadoop •What is Hadoop? • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models •Hadoop components •HDFS – Storage layer •MapReduce – Processing layer •YARN – Resource management layer
  • 6. HDFS •Stores different types of large data sets (structured, semi-structured and unstructured) •HDFS creates a level of abstraction over the resources, from where we can see the whole HDFS as a single unit. •Stores data across various resources and maintains the log file about the stored data.(metadata)
  • 8. MapReduce •Core component in Hadoop ecosystem for processing •Two functions •Map •Reduce
  • 9. Kafka • Distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and enables to pass messages from one end-point to another • Built on top of the ZooKeeper synchronization service • Integrates well with Apache Storm and Spark for real-time streaming data analysis
  • 10. Sqoop • Sqoop − SQL to Hadoop and Hadoop to SQL • Tool designed to transfer data between Hadoop and relational database servers
  • 11. YARN • Performs all processing activities by allocating resources • YARN is an attempt to take Apache Hadoop beyond MapReduce for data-processing • Consists • Resource manager • Node manager
  • 12. YARN
  • 13. Hive •Data warehouse software •Provide data summarization, query and analysis •Query language – Hive Query language (HQL) •Uses metastore to store meta-data about the data •Familiar built in user defined functions
  • 14. Pig • Write complex MapReduce transformations using a simple scripting language called Pig Latin • Pig translates the Pig Latin script into MapReduce • Makes Hadoop data accessible for a variety of batch processing workloads • Data preparation • ETL • Data mining
  • 15. Impala • It is an interactive SQL like query engine that runs on top of Hadoop Distributed File System (HDFS) • Parallel processing SQL query engine for processing huge volume of data • Provide unified platform for real time queries. • Impala is faster than Apache Hive. • Impala is memory intensive and does not run effectively for heavy data operations like joins.
  • 17. HBase • Wide column store database (NoSQL). • Database built in top of the HDFS • HBase does not support a structured query language like SQL • Provides random, real time access to data in Hadoop
  • 18. Spark • Open-source distributed general-purpose cluster computing framework with in-memory data processing engine • It can run in Hadoop clusters through YARN and it can process data in HDFS . • Fast and general engine for large-scale data processing. • High level API's for programming languages: Java, Python, Scarla, R • Supports SQL queries, Streaming data, Machine learning (ML), and Graph algorithms.
  • 19. Mahout • Use for creating scalable machine learning algorithms • Implemented on top of Apache Hadoop and using the MapReduce paradigm. • Lets applications to analyze large sets of data effectively and in quicktime • Mahout provides the data science tools to automatically find meaningful patterns in big data sets in HDFS.
  • 20. SolR • Open source platform for searches of data stored in HDFS • Advanced full-text search • Near real-time indexing • Standards based upon interfaces like JSON, XML, HTTP • Comprehensive HTML administration interfaces
  • 21. Kudu • Apache Kudu completes Hadoop's storage layer to enable fast analytics on fast data. • It runs on commodity hardware, is horizontally scalable, and supports highly available operation. • Integration with MapReduce, Spark and other Hadoop ecosystem components. • Strong performance for running sequential and random workloads simultaneously
  • 22. Sentry • Granular, role-based authorization module for Hadoop • Provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications on a Hadoop cluster. • Designed to be a pluggable authorization engine for Hadoop components. • Allows to define authorization rules to validate a user

Editor's Notes

  • #3: only Hadoop solution to offer unified batch processing, interactive SQL and interactive search, and role-based access controls.
  • #7: fault-tolerant and self-healing distributed file-system turn a cluster of industry-standard servers into a massively scalable pool of storage.  Features Scalability Flexibility Reliability
  • #9: Accessibility Reliability Flexibility Hadoop scalable
  • #10: we have two main challenges.The first challenge is how to collect large volume of data and the second challenge is to analyze the collected data
  • #14: front end for parsing SQL statements, generating logical plans, optimizing logical plans, translating them into physical plans which are executed by MapReduce jobs.
  • #15: 1 line of pig latin = approximately 100 lines of map reduce
  • #18: Nosql database Characteristics – fault tolerance(replication) , fast(near real time lookups), usable(Data model accommodates wide range of use cases)
  • #19: Apache Spark’s Streaming and SQL programming models with MLlib and GraphX make it easier for developers and data scientists to build applications that exploit machine learning and graph analytics.
  • #20: SUPPORTS Collaborative filtering Clustering Classification Frequent itemset mining
  • #21: Solr is highly reliable, scalable and fault tolerant. Hadoop operators put documents in Apache Solr by “indexing” via XML, JSON, CSV or binary over HTTP. Then users can query those petabytes of data via HTTP GET. They can receive XML, JSON, CSV or binary results. Apache Solr is optimized for high volume web traffic.
  • #23: Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS 
  • #24: Sentry currently works out of the box with Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala, and HDFS