Hadoop – An Introduction
Shankar Radhakrishnan
HCL Technologies

Agenda
  • State of the Data
  • What is Hadoop
  • Hadoop Ecosystem
  • References

State of the data
  • Data driven businesses
  • Businesses have been collecting information all the time
  • Mine more == Collect more (and vice-versa)
  • Challenges
      • Application complexities
      • Data growth
      • Infrastructure
      • Economics
  • Need of the day

Data driven business
  Applications
    • Searches, message posts, comments, emails, blogs, photos, video clips, product listings
    • ERP, CRM, databases, internal applications, customer/consumer facing products
    • Mobile
  Context
    • Web, customers, products, business systems, processes, services
  Support Systems
    • CRM, SOA, recommendation systems/processes, data warehouses, business intelligence, BPM

Mine more
  Drivers
    • ROI
    • Customer retention
    • Product affinity
    • Market trends
    • Research analysis
    • Customer/consumer analytics
  Process
    • Clustering
    • Classification
    • Build relationships
    • Regression
  Types
    • Structured
    • Semi-structured
    • Unstructured

Challenges
  Complex Applications
    • Data integration is a good but complex problem to solve
  Data Growth
    • Growth is exponential
  Infrastructure
    • Availability
    • Unscalable hardware
  Economics
    • Managing high data volume comes at a price
    • Failures are very costly

Need of the day
  • System that can handle high volume data
  • System that can perform complex operations
  • Scalable
  • Robust
  • Highly available
  • Fault tolerant
  • Cheap

What is Hadoop
  • Top-level Apache project
  • Open source
  • Inspired by Google’s white papers on Map/Reduce (MR) and the Google File System (GFS)
  • Originally developed to support the Apache Nutch search engine
  • Software framework written in Java
  Designed
    • For sophisticated analysis
    • To deal with structured and unstructured complex data

Why Hadoop?
  • Runs on commodity hardware
  • Shared-nothing architecture
  • Scale hardware whenever you want
  • System compensates for hardware scaling and issues (if any)
  • Run large-scale, high volume data processes
  • Scales well with complex analysis jobs
  • Handles failures
  • Ideal to consolidate data from both new and legacy data sources
  • Value to the business

Hadoop in an enterprise – Example

Hadoop Ecosystem
  HDFS         Hadoop Distributed File System
  Map/Reduce   Software framework for clustered, distributed data processing
  ZooKeeper    Coordination service for distributed applications
  Avro         Data serialization
  Chukwa       Data collection system to monitor distributed systems
  HBase        Data storage for distributed large tables
  Hive         Data warehousing infrastructure
  Pig          High-level query language

HDFS – Hadoop Distributed File System
  • Master/slave architecture
  • Runs on commodity hardware
  • Fault tolerant
  • Handles large volumes of data
  • Provides high throughput
  • Streaming data access
  • Simple file coherency model
  • Portable to heterogeneous hardware and software
  Robust
    • Handles disk failures, replication (and re-replication)
    • Performs cluster rebalancing, data integrity checks

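The replication and re-replication behaviour listed above can be pictured with a small simulation. This is a minimal Python sketch, not HDFS code: the class `TinyNameNode`, its methods, and the round-robin placement are illustrative assumptions, and real HDFS placement is rack-aware and considerably more involved. The replication factor of 3 matches the usual HDFS default.

```python
from collections import defaultdict


class TinyNameNode:
    """Toy stand-in for a name node: it only tracks which data nodes hold which block."""

    def __init__(self, datanodes, replication=3):
        self.live = list(datanodes)          # data nodes currently considered healthy
        self.replication = replication       # target number of copies per block
        self.block_map = defaultdict(set)    # block id -> set of data nodes holding a copy
        self._next = 0                       # round-robin cursor (hypothetical placement policy)

    def _pick_node(self, exclude):
        """Return the next live node that does not already hold the block."""
        for _ in range(len(self.live)):
            node = self.live[self._next % len(self.live)]
            self._next += 1
            if node not in exclude:
                return node
        raise RuntimeError("not enough live data nodes for the requested replication")

    def add_block(self, block_id):
        """Place copies of a block on distinct data nodes until the target is met."""
        while len(self.block_map[block_id]) < self.replication:
            self.block_map[block_id].add(self._pick_node(self.block_map[block_id]))

    def fail_node(self, node):
        """Drop a failed node, then re-replicate every block that lost a copy."""
        self.live.remove(node)
        for block_id, holders in self.block_map.items():
            if node in holders:
                holders.discard(node)
                self.add_block(block_id)     # tops the block back up to the target


nn = TinyNameNode(["dn1", "dn2", "dn3", "dn4"])
for block in ["blk_0", "blk_1"]:
    nn.add_block(block)

nn.fail_node("dn2")
replica_counts = {b: len(h) for b, h in nn.block_map.items()}
print(replica_counts)  # {'blk_0': 3, 'blk_1': 3}
```

After `dn2` fails, both blocks are back at three copies, none of them on the failed node.
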
HDFS – Example
  Name node
    • File system operations
    • Maps blocks to data nodes
  Data node
    • Processes read/write requests
    • Handles data-blocks
    • Replication

Hadoop Map/Reduce
  • Tagged by a job
  • Splits the input data-set into separate chunks
  • Chunks are processed by map tasks, in parallel
  • Sorts the output of the maps
  • Sorted output is processed by reduce tasks, in parallel
  • Job input and output are typically stored in a file system
  Framework takes care of
    • Scheduling tasks
    • Monitoring
    • Re-executing failed tasks

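The flow above (split, map in parallel, sort, reduce) can be sketched end to end in plain Python. This is a single-process illustration rather than Hadoop's API: `run_job`, `chunk_size`, and the max-temperature input are hypothetical, a thread pool stands in for the cluster's parallel map tasks, and the reduce step runs sequentially for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def run_job(records, map_fn, reduce_fn, chunk_size=2):
    """Run a toy map/reduce job over `records` and return {key: reduced value}."""
    # 1. Split the input data set into independent chunks.
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

    # 2. Process each chunk with a map task; the pool runs the tasks concurrently.
    def map_task(chunk):
        return [pair for record in chunk for pair in map_fn(record)]

    with ThreadPoolExecutor() as pool:
        mapped = [pair for out in pool.map(map_task, chunks) for pair in out]

    # 3. Sort the map output by key so equal keys sit next to each other.
    mapped.sort(key=itemgetter(0))

    # 4. Hand each key and its values to a reduce call.
    return {
        key: reduce_fn(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    }


# Hypothetical input: "year,temperature" readings.
readings = ["1949,111", "1950,0", "1950,22", "1949,78", "1950,-11"]


def map_fn(line):
    """Emit one (year, temperature) pair per input line."""
    year, temp = line.split(",")
    yield year, int(temp)


def reduce_fn(year, temps):
    """Keep the highest temperature seen for a year."""
    return max(temps)


result = run_job(readings, map_fn, reduce_fn)
print(result)  # {'1949': 111, '1950': 22}
```
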
Example: Mapper Function
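
As an illustration of what a mapper looks like, here is a minimal word-count mapper in Python. The word-count task, the function name `map_words`, and the tokenising rule are assumptions made for this sketch; a real Hadoop job would typically implement the mapper in Java against Hadoop's own API.

```python
import re


def map_words(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in re.findall(r"[a-z0-9']+", line.lower()):
        yield word, 1


pairs = list(map_words("Hadoop stores data; Hadoop processes data."))
print(pairs)
```

The mapper does no counting itself: it only emits one pair per word, leaving aggregation to the reduce step.
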
Example: Reduce Function
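
As an illustration of what a reduce function looks like, here is a minimal Python sketch, assuming a word-count job whose map output is a list of (word, 1) pairs. The function name `reduce_counts` and the sample pairs are hypothetical; a real Hadoop job would typically implement the reducer in Java against Hadoop's own API.

```python
from itertools import groupby
from operator import itemgetter


def reduce_counts(sorted_pairs):
    """Sum the counts for each word; the input must already be sorted by word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


# Hypothetical map output for the text "Hadoop stores data; Hadoop processes data."
mapped = [("hadoop", 1), ("stores", 1), ("data", 1),
          ("hadoop", 1), ("processes", 1), ("data", 1)]

# Sorting plays the role of the framework's sort step between map and reduce.
totals = dict(reduce_counts(sorted(mapped)))
print(totals)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```
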