Hadoop Cluster Setup
A Simple Way by Cloudera Manager
Peng-Yi Lai
Co-graph confidential
Outline
▪ Cloudera Manager – Set Up Your Hadoop

▪ Flume – Data Collection Tool

Before Starting
▪ Ask yourself what you want!

Become an expert who makes Hadoop itself better

Provide services by using Hadoop

As a Hadoop Expert

Better to know Hadoop in as much detail as possible
Companies like Cloudera and MapR
Other Uses of Hadoop
1. Learn how to use Hadoop to solve problems more effectively and efficiently
2. Find the easiest way to make sure your Hadoop cluster works properly

Desired Skills
▪ Network knowledge is imperative
  ▪ Every node in a cluster communicates with the others over the network
  ▪ Even with Cloudera Manager, you still need to handle networking on your own

▪ Linux administration
  ▪ Everyone knows that!!

Requirements for Cloudera Manager (1)
▪ Prepare Your Machines
▪ Supported OS version
  ▪ Only 64-bit Linux distributions

▪ Supported Browsers
  ▪ For the admin console

▪ Supported Database
  ▪ Only if you need a custom database instead of the embedded PostgreSQL database

▪ Supported JDK version
  ▪ Cloudera Manager installs one for you if no JDK is installed

▪ Repositories
  ▪ All hosts must have access to the standard package repositories and the Cloudera Hadoop repositories

Requirements for Cloudera Manager (2)
▪ Networking and Security
▪ Configure DNS or /etc/hosts properly
  ▪ Every host should know who’s who

▪ Use the root account, or an account with password-less sudo permission, for SSH access to all cluster machines
▪ No blocking by iptables or firewalls
  ▪ Port 7180 is used to access Cloudera Manager

▪ No blocking by Security-Enhanced Linux (SELinux)
  ▪ Set SELinux to disabled

▪ There are more details on cloudera.com
▪ If there is a problem, don’t feel ashamed to google!
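
The networking requirements above can be checked before starting the installer. The following is a minimal sketch and not part of Cloudera Manager: `resolves`, `port_open`, and `preflight` are hypothetical helper names, and the SSH port is assumed to be the default 22.

```python
import socket


def resolves(hostname):
    """Return True if this machine can resolve hostname (DNS or /etc/hosts)."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(hosts, ssh_port=22):
    """Map each host to a pair: (name resolves, SSH port reachable)."""
    return {h: (resolves(h), port_open(h, ssh_port)) for h in hosts}
```

Run `preflight([...])` from the Cloudera Manager server host with your own host names. Once the server is installed, `port_open("<Server host>", 7180)` from the administrator's workstation shows whether a firewall still blocks the admin console.
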
Set Up a Hadoop Cluster
▪ After everything is ready, download cloudera-manager-installer.bin from the Cloudera Downloads page
▪ Make the file executable and run it
▪ Log in to the admin console at http://<Server host>:7180
▪ Follow the steps in Cloudera Manager
▪ Done!
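
The "change the permission" step and the console address can be sketched as follows. This is an illustration only: `make_executable` and `admin_console_url` are hypothetical helper names, the installer is assumed to sit wherever it was downloaded, and running the installer itself (which needs root privileges) is left as a comment.

```python
import os
import stat


def make_executable(path):
    """Add the owner-execute bit, the equivalent of `chmod u+x path`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


def admin_console_url(server_host, port=7180):
    """Build the admin console address; 7180 is the port named in the slides."""
    return f"http://{server_host}:{port}"


# Typical use (not executed here):
#   make_executable("cloudera-manager-installer.bin")
#   run the installer as root, for example through sudo
#   open admin_console_url("<Server host>") in a supported browser
```
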

Cloudera Manager Login

Specify Hosts

Hosts Found

Waiting for Installation

Home

Actions of Services

HDFS Service

Configuration of HDFS

Selected Services

Services to Add

All Hosts

Information of a Host

More about Cloudera Manager
▪ Easy to upgrade your CDH version

▪ Easy to add or delete a host or a cluster
▪ Easy to configure High Availability (HA)
▪ Supports Hadoop security through Kerberos
▪ Supports backup and disaster recovery

For Developers
▪ Use Hue (another topic)

Observation

Flume
A Data Collection Tool
Two Ways to Use Flume
Independent of a Hadoop cluster
• Flume can run entirely on its own
• Configure flume.conf in /etc/flume-ng/conf

On a Hadoop cluster, or on a node managed by Cloudera Manager
• Easy to keep the agent nodes under control
• Start, stop, and restart the service from the admin console
• Configure Flume from the admin console
• Convenient for checking the log files

3 Important Settings
Source
• Defines which kinds of events from an external source the agent accepts
Channel
• Defines how an event is held until a Flume sink consumes it
Sink
• Defines the destination, such as HDFS or another Flume agent, to which events held in the channel are written or forwarded
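
To make the three settings concrete, here is a minimal sketch that renders a single-agent configuration in the properties format Flume reads. The agent and component names (`a1`, `r1`, `c1`, `k1`), the NetCat source on localhost port 44444, the memory channel capacity, and the helper `render_flume_conf` are assumptions for illustration and are not taken from the slides.

```python
def render_flume_conf(agent, source, channel, sink):
    """Render a one-source, one-channel, one-sink Flume agent configuration.

    Each of source, channel and sink is a (name, properties) pair, where
    properties is a dict that includes at least the component "type".
    """
    (src, src_props), (ch, ch_props), (snk, snk_props) = source, channel, sink
    lines = [
        f"{agent}.sources = {src}",
        f"{agent}.channels = {ch}",
        f"{agent}.sinks = {snk}",
    ]
    for kind, name, props in (
        ("sources", src, src_props),
        ("channels", ch, ch_props),
        ("sinks", snk, snk_props),
    ):
        for key, value in props.items():
            lines.append(f"{agent}.{kind}.{name}.{key} = {value}")
    # Wiring: a source can feed several channels ("channels"),
    # while a sink drains exactly one ("channel").
    lines.append(f"{agent}.sources.{src}.channels = {ch}")
    lines.append(f"{agent}.sinks.{snk}.channel = {ch}")
    return "\n".join(lines) + "\n"


conf = render_flume_conf(
    "a1",
    source=("r1", {"type": "netcat", "bind": "localhost", "port": 44444}),
    channel=("c1", {"type": "memory", "capacity": 1000}),
    sink=("k1", {"type": "logger"}),
)
print(conf)
```

Saved as flume.conf, a configuration like this is typically started with `flume-ng agent --conf <conf dir> --conf-file <path to flume.conf> --name a1`, where the name must match the agent prefix used in the file.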

Type Example
▪ Source
  ▪ Avro Source
  ▪ Exec Source
  ▪ JMS Source
  ▪ NetCat Source
  ▪ Syslog TCP Source
  ▪ Syslog UDP Source
  ▪ HTTP Source
  ▪ Thrift Legacy Source
  ▪ …etc

▪ Channel
  ▪ Memory Channel
  ▪ JDBC Channel
  ▪ File Channel
  ▪ Pseudo Transaction Channel
  ▪ Custom Channel

▪ Sink
  ▪ HDFS Sink
  ▪ Logger Sink
  ▪ Avro Sink
  ▪ Thrift Sink
  ▪ IRC Sink
  ▪ File Roll Sink
  ▪ HBaseSink
  ▪ …etc
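
In a Flume configuration, most of the built-in components listed above are selected through a short alias in the component's `type` property. The lookup below is a sketch covering only a subset of the aliases documented in the Flume user guide; the dictionary names and the `type_line` helper are hypothetical. Components not listed here (for example the Thrift Legacy Source or a custom channel) are typically configured with a fully qualified class name instead.

```python
# Short "type" aliases for some built-in Flume components (a subset).
SOURCE_TYPES = {
    "Avro Source": "avro",
    "Exec Source": "exec",
    "JMS Source": "jms",
    "NetCat Source": "netcat",
    "Syslog TCP Source": "syslogtcp",
    "Syslog UDP Source": "syslogudp",
    "HTTP Source": "http",
}
CHANNEL_TYPES = {
    "Memory Channel": "memory",
    "JDBC Channel": "jdbc",
    "File Channel": "file",
}
SINK_TYPES = {
    "HDFS Sink": "hdfs",
    "Logger Sink": "logger",
    "Avro Sink": "avro",
    "Thrift Sink": "thrift",
    "IRC Sink": "irc",
    "File Roll Sink": "file_roll",
    "HBaseSink": "hbase",
}
TABLES = {"sources": SOURCE_TYPES, "channels": CHANNEL_TYPES, "sinks": SINK_TYPES}


def type_line(agent, kind, name, component):
    """Return the `type` property line for a component; KeyError if unknown."""
    alias = TABLES[kind][component]
    return f"{agent}.{kind}.{name}.type = {alias}"


print(type_line("a1", "sinks", "k1", "HDFS Sink"))
```
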
Example of Setting

Use Cloudera Manager

Abi john
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical DebtBuckeye Dreamin 2024: Assessing and Resolving Technical Debt
Buckeye Dreamin 2024: Assessing and Resolving Technical Debt
Lynda Kane
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 

Hadoop cluster setup by using cloudera manager

  • 1. Hadoop Cluster Setup A Simple Way by Cloudera Manager Peng-Yi Lai Co-graph confidential
  • 2. Outline ▪ Cloudera Manager – Set Up Your Hadoop ▪ Flume – Data Collection Tool
  • 3. Before Starting ▪ Ask yourself what you want: to become an expert who makes Hadoop itself better, or to provide a service by using Hadoop
  • 4. As a Hadoop Expert ▪ Better to know Hadoop in as much detail as possible ▪ Companies like Cloudera and MapR
  • 5. Other Usages of Hadoop 1. Learn how to use Hadoop to solve problems more effectively and efficiently 2. Find the easiest way to make sure your Hadoop works properly
  • 6. Desired Skills ▪ Network knowledge is imperative ▪ Every node in a cluster communicates with the others over the network ▪ Even with Cloudera Manager, you still need to handle networking on your own ▪ Linux administration ▪ Everyone knows that!!
  • 7. Requirements for Cloudera Manager (1) ▪ Prepare Your Machines ▪ Supported OS version ▪ Only 64-bit Linux-based ▪ Supported browsers ▪ For the admin console ▪ Supported databases ▪ Relevant only if you use a custom database instead of the embedded PostgreSQL database ▪ Supported JDK version ▪ Cloudera Manager installs a JDK for you if none is installed ▪ Repositories ▪ All hosts must be able to access the standard package repositories and the Cloudera Hadoop repositories
  • 8. Requirements for Cloudera Manager (2) ▪ Networking and Security ▪ Configure DNS or /etc/hosts properly ▪ Every host should know who’s who ▪ Root account or password-less sudo permission, with SSH access to all cluster machines ▪ No blocking by iptables or firewalls ▪ Port 7180 is used to access Cloudera Manager ▪ No blocking by Security-Enhanced Linux (SELinux) ▪ Disabled ▪ More details are on cloudera.com ▪ If there is a problem, don’t feel ashamed to google!
  • 9. Set Up a Hadoop Cluster ▪ After everything is done, download cloudera-manager-installer.bin from the Cloudera Downloads page ▪ Change the file permission and run the installer ▪ Log in to the admin console at http://<Server host>:7180 ▪ Follow the steps in Cloudera Manager ▪ Done!
  • 19. Services to Add
  • 21. Information of a Host
  • 22. More about Cloudera Manager ▪ Easy to upgrade your CDH version ▪ Easy to add or delete a host or a cluster ▪ Easy to configure High Availability (HA) ▪ Supports Hadoop security by using Kerberos ▪ Supports backup and disaster recovery
  • 23. For Developers ▪ Use Hue (another topic)
  • 25. Flume: A Data Collection Tool
  • 26. Two Ways to Use Flume ▪ Independent of a Hadoop cluster • Flume can run entirely by itself • Configure flume.conf in /etc/flume-ng/conf ▪ On a Hadoop cluster, or on a node managed by Cloudera Manager • Easy to keep the agent nodes under control • Start, stop, and restart the service from the admin console • Configure Flume from the admin console • Convenient to check log files
  • 27. 3 Important Settings ▪ Source • Defines which kinds of events sent by external sources to accept ▪ Channel • Defines how events are kept until a Flume sink consumes them ▪ Sink • Defines the destination, such as HDFS or another Flume agent, to which events kept in the channel are written or forwarded
  • 28. Type Example ▪ Source ▪ Avro Source ▪ Exec Source ▪ JMS Source ▪ NetCat Source ▪ Syslog TCP Source ▪ Syslog UDP Source ▪ HTTP Source ▪ Thrift Legacy Source ▪ …etc ▪ Channel ▪ Memory Channel ▪ JDBC Channel ▪ File Channel ▪ Pseudo Transaction Channel ▪ Custom Channel ▪ Sink ▪ HDFS Sink ▪ Logger Sink ▪ Avro Sink ▪ Thrift Sink ▪ IRC Sink ▪ File Roll Sink ▪ HBaseSink ▪ …etc
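
The slides name the three settings of a Flume agent (source, channel, sink) and say they go into flume.conf, but they show no concrete file. Below is a minimal sketch, not the author's configuration, of how those three pieces might be wired together. It is written as a small Python helper that renders the properties text. The agent name `agent1`, the component names `r1`, `c1`, `k1`, the port, and the capacity are all hypothetical values chosen for illustration; the component types (NetCat source, memory channel, logger sink) come from the type list above.

```python
def render_flume_conf(agent, sources, channels, sinks):
    """Render a Flume agent definition as flume.conf properties text.

    Each of sources/channels/sinks maps a component name to its properties.
    """
    # Declare which components belong to this agent.
    lines = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = {' '.join(channels)}",
        f"{agent}.sinks = {' '.join(sinks)}",
    ]
    # Emit one "<agent>.<kind>.<name>.<key> = <value>" line per property.
    for kind, components in (("sources", sources),
                             ("channels", channels),
                             ("sinks", sinks)):
        for name, props in components.items():
            for key, value in props.items():
                lines.append(f"{agent}.{kind}.{name}.{key} = {value}")
    return "\n".join(lines) + "\n"


# Hypothetical example values (assumptions, not from the slides).
sources = {"r1": {"type": "netcat", "bind": "localhost",
                  "port": 44444, "channels": "c1"}}   # what events to accept
channels = {"c1": {"type": "memory", "capacity": 1000}}  # where events wait
sinks = {"k1": {"type": "logger", "channel": "c1"}}      # where events end up

conf = render_flume_conf("agent1", sources, channels, sinks)
print(conf)
```

Note that a source lists its channels under the plural key `channels`, while a sink reads from exactly one channel under the singular key `channel`. To write to HDFS instead of the log, the sink type would be `hdfs` with an `hdfs.path` property pointing at a directory of your own cluster.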