How-to: Install CDH on Mac OSX 10.9 Mavericks | Cloudera Engineering Blog
This overview will cover the basic tarball setup for your Mac.
If you're an engineer building applications on CDH and becoming familiar with all the rich features for designing the
next big solution, it becomes essential to have a native Mac OSX install. Sure, you may argue that your MBP with its
four-core, hyper-threaded i7, SSD, and 16GB of DDR3 memory is sufficient for spinning up a VM, and in most instances,
such as using a VM for a quick demo, you're right. However, when experimenting with a slightly heavier workload
that is a bit more resource intensive, you'll want to explore a native install.
In this post, I will cover setup of a few basic dependencies and the necessities to run HDFS, MapReduce with YARN,
Apache ZooKeeper, and Apache HBase. Use it as a guideline for getting your local CDH box set up so that you can
build and run applications on the Apache Hadoop stack.
Note: This process is not supported, so you should be comfortable as a self-supporting sysadmin. With that in
mind, the configurations throughout this guideline are suggested for your default bash shell environment and can be
set in your ~/.profile.
Dependencies
Install the Java version that is supported for the CDH version you are installing. In my case, for CDH 5.1, I've installed
JDK 1.7 u67. Historically the JDK for Mac OSX was only available from Apple, but since JDK 1.7, it's available directly
through Oracle's Java downloads. Download the .dmg (in the example below, jdk-7u67-macosx-x64.dmg) and
install it.
http://blog.cloudera.com/blog/2014/09/how-to-install-cdh-on-mac-osx-10-9-mavericks/
2/19/2015
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home"
Note: You'll notice that after installing the Oracle JDK, the original path used to manage versioning,
/System/Library/Frameworks/JavaVM.framework/Versions, will not be updated; you now have the
control to manage your versions independently.
Enable ssh on your Mac by turning on Remote Login. You can find this option under your toolbar's Apple icon > System
Preferences > Sharing.
Homebrew
Another toolkit I admire is Homebrew, a package manager for OSX. While Xcode developer command-line tools are
great, the savvy naming conventions and ease of use of Homebrew get the job done in a fun way.
I haven't needed Homebrew for much other than installing the dependencies required to build native Snappy
libraries for Mac OSX and for an easy install of MySQL for Hive. Snappy is commonly used within HBase, HDFS, and
MapReduce for compression and decompression.
CDH
Finally, the easy part: The CDH tarballs are very nicely packaged and easily downloadable from Cloudera's repository.
I've downloaded tarballs for CDH 5.1.0.
Download and explode the tarballs in a lib directory where you can manage the latest versions with a simple symlink, as
in the following. Although Mac OSX's Make Alias feature is bi-directional, do not use it; instead use the command-line ln -s command, such as ln -s source_file target_file.
/Users/jordanh/cloudera/
cdh5.1/
nn/
pids
tmp/
zk/
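The layout above can be recreated with a few commands. This is a minimal sketch, not the author's script: `DEMO_ROOT` is a hypothetical scratch location standing in for /Users/jordanh/cloudera, the versioned directory name `hadoop-2.3.0-cdh5.1.0` is an assumption about what the tarball unpacks to, and the `dn` and `logs` directories are inferred from paths used elsewhere in this post.

```shell
# Scratch root for the demo; point DEMO_ROOT at your real directory
# (for example $HOME/cloudera) once the layout looks right.
DEMO_ROOT="${DEMO_ROOT:-$(mktemp -d "${TMPDIR:-/tmp}/cdh-demo.XXXXXX")}"

# Per-release directory plus the ops directories used by the configs.
mkdir -p "$DEMO_ROOT/cdh5.1"
for d in dn logs nn pids tmp zk; do
  mkdir -p "$DEMO_ROOT/ops/$d"
done

# Stand-in for an exploded tarball (assumed directory name).
mkdir -p "$DEMO_ROOT/cdh5.1/hadoop-2.3.0-cdh5.1.0"

# Version-independent name -> versioned directory. The link target is
# relative, so the tree keeps working if DEMO_ROOT is moved.
( cd "$DEMO_ROOT/cdh5.1" && ln -sfn hadoop-2.3.0-cdh5.1.0 hadoop )

ls -l "$DEMO_ROOT/cdh5.1"
```

Upgrading then comes down to unpacking the new tarball next to the old one and re-pointing the `hadoop` link; paths such as HADOOP_HOME that go through the link do not change.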
You'll notice above that you've created a handful of directories under a folder named ops. You'll use them later to
customize the configuration of the essential components for running Hadoop. Set your environment properties
according to the paths where you've exploded your tarballs.
~/.profile
CDH="cdh5.1"
export HADOOP_HOME="/Users/jordanh/cloudera/${CDH}/hadoop"
export HBASE_HOME="/Users/jordanh/cloudera/${CDH}/hbase"
export HIVE_HOME="/Users/jordanh/cloudera/${CDH}/hive"
export HCAT_HOME="/Users/jordanh/cloudera/${CDH}/hive/hcatalog"
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME
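After sourcing ~/.profile, it can help to confirm that each variable resolves to a real directory before any daemon is started. A minimal sketch: `check_home` is a hypothetical helper, not part of CDH, and the list of variables is limited to those defined in this post.

```shell
# check_home NAME PATH: report whether PATH is an existing directory.
check_home() {
  if [ -d "$2" ]; then
    echo "ok       $1=$2"
  else
    echo "MISSING  $1=$2"
    return 1
  fi
}

# Unset variables expand to an empty string and are reported as MISSING.
missing=0
for name in JAVA_HOME HADOOP_HOME HBASE_HOME HIVE_HOME; do
  eval "value=\${$name:-}"
  check_home "$name" "$value" || missing=$((missing + 1))
done
echo "$missing variable(s) need attention"
```

A MISSING line usually means a typo in the path or a symlink that points at a directory that was never unpacked.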
Update your main Hadoop configuration files, as shown in the sample files below. You can also download all files
referenced in this post directly from here.
$HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.ha
<description>A comma-separated list of the compression codec classes that can
be used for compression/decompression. In addition to any classes specified
with this property (which take precedence), codec classes on the classpath
are discovered using a Java ServiceLoader.</description>
</property>
</configuration>
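Once a site file has been edited, a quick lookup confirms what it actually sets. The following is a sketch that assumes the layout used in these samples, where `<value>` appears on a line after `<name>`; `get_prop` is a hypothetical helper rather than a Hadoop command, and the sample file is written to a scratch directory instead of $HADOOP_HOME.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name>.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Scratch copy of two properties from the sample core-site.xml.
SAMPLE_DIR=$(mktemp -d "${TMPDIR:-/tmp}/conf-demo.XXXXXX")
cat > "$SAMPLE_DIR/core-site.xml" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hadoop-${user.name}</value>
</property>
</configuration>
EOF

get_prop "$SAMPLE_DIR/core-site.xml" fs.defaultFS   # prints hdfs://localhost:8020
```

With a working install, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves, which is the more reliable check once the binaries are on your PATH.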
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/jordanh/cloudera/ops/nn</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table (fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/jordanh/cloudera/ops/dn/</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>localhost:50075</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
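The sample paths are specific to the author's machine. One way to avoid hand-editing them is to generate the file from a base directory. This is a sketch under assumptions: `OPS_DIR` and `CONF_OUT` are hypothetical variables, the output goes to a scratch directory so nothing under $HADOOP_HOME is overwritten, and the descriptions are omitted for brevity.

```shell
OPS_DIR="${OPS_DIR:-$HOME/cloudera/ops}"
CONF_OUT="${CONF_OUT:-$(mktemp -d "${TMPDIR:-/tmp}/hdfs-conf.XXXXXX")}"

# Unquoted heredoc delimiter, so ${OPS_DIR} is expanded by the shell.
cat > "$CONF_OUT/hdfs-site.xml" <<EOF
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>${OPS_DIR}/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>${OPS_DIR}/dn</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>localhost:50075</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
EOF

echo "wrote $CONF_OUT/hdfs-site.xml"
```

Review the generated file, then copy it to $HADOOP_HOME/etc/hadoop/ if it matches what you want.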
I adapted the YARN and MRv2 configuration and setup from the CDH 5 installation docs. I will not digress into the
specifications of each property or the details of how YARN and MRv2 operate, but there's some
great information that my colleague Sandy has already shared for developers and admins.
Be sure to make the necessary adjustments for your system's memory and CPU constraints. Per the image below, it
is easy to see how these parameters will affect your machine's performance when you execute jobs.
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>The maximum allocation for every container request at the RM,
in MBs. Memory requests higher than this won't take effect,
and will get capped to this value.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests lower than this won't take effect,
and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests higher than this won't take effect,
and will get capped to this value.</description>
</property>
</configuration>
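The sample values in this post follow a simple pattern: each map or reduce container asks for 1024 MB, the JVM heap inside it is set to 768 MB, and the scheduler will not grant more than 2048 MB to a single request. The arithmetic can be sketched as below when scaling the numbers for a machine with less memory. The 75% heap ratio is read off the sample values rather than stated by the author, and the variable names are hypothetical.

```shell
MAX_ALLOC_MB=2048      # yarn.scheduler.maximum-allocation-mb
CONTAINER_MB=1024      # mapreduce.map.memory.mb / mapreduce.reduce.memory.mb

# Leave a quarter of the container for non-heap JVM memory.
HEAP_MB=$(( CONTAINER_MB * 3 / 4 ))

# A request above the scheduler maximum would be capped, so flag it early.
if [ "$CONTAINER_MB" -gt "$MAX_ALLOC_MB" ]; then
  echo "container request exceeds scheduler maximum" >&2
fi

echo "mapreduce.map.java.opts=-Xmx${HEAP_MB}m"   # prints mapreduce.map.java.opts=-Xmx768m
```

Halving CONTAINER_MB to 512 gives a 384 MB heap by the same ratio, which may suit a laptop with 8GB of memory.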
$HADOOP_HOME/etc/hadoop/mapred-site.xml
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
<description>
The number of virtual cores required for each reduce task.
</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
# The directory where pid files are stored when processes run as daemons. /tmp by default.
export HADOOP_PID_DIR="/Users/jordanh/cloudera/ops/pids"
export YARN_PID_DIR=${HADOOP_PID_DIR}
You can configure HBase to run without separately downloading Apache ZooKeeper. HBase has a bundled package
that you can easily run as a separate instance or in standalone mode in a single JVM. I recommend using either
distributed or standalone mode instead of a separately downloaded ZooKeeper tarball on your machine for ease of
use, configuration, and management.
The primary configuration difference between running HBase in distributed or standalone mode is the
hbase.cluster.distributed property in hbase-site.xml. Set the property to false to launch HBase in
standalone mode, or to true to spin up separate instances for services such as HBase's ZooKeeper and RegionServer.
Update the following configurations for HBase as specified to run it per this type of configuration.
Note regarding hbase-site.xml: The hbase.cluster.distributed property is set to false by default, which launches HBase
in standalone mode. Also, hbase.zookeeper.quorum is set to localhost by default and does not need to be
overridden in our scenario.
$HBASE_HOME/conf/hbase-site.xml
false, startup will run all HBase and ZooKeeper daemons together
in the one JVM.
</description>
</property>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hbase-${user.name}</value>
<description>Temporary directory on the local filesystem.
Change this setting to point to a location more permanent
than '/tmp' (The '/tmp' directory is often cleared on
machine restart).
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/jordanh/cloudera/ops/zk</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
</configuration>
# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
Service HBase
HBase Master/RegionServer/ZooKeeper
start: start-hbase.sh
stop: stop-hbase.sh
logs: /Users/jordanh/cloudera/ops/logs/hbase/
url: http://localhost:60010/master-status
Test
hbase shell
create 'URL_HITS', {NAME=>'HOURLY'},{NAME=>'DAILY'},{NAME=>'YEARLY'}
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090110', '10'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090111', '5'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090112', '30'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090113', '80'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090114', '7'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'DAILY:20140901', '10012'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'YEARLY:2014', '93310101'
scan 'URL_HITS'
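For repeatable tests, the same kind of commands can be written to a file and passed to `hbase shell` as a script instead of being typed interactively. A minimal sketch: the file name and the counter values produced by the loop are illustrative, not the data above.

```shell
SCRIPT="${TMPDIR:-/tmp}/url_hits_demo.txt"
ROW='com.cloudera.blog.osx.localinstall'

{
  echo "create 'URL_HITS', {NAME=>'HOURLY'},{NAME=>'DAILY'},{NAME=>'YEARLY'}"
  # One HOURLY column per hour from 10:00 to 14:00 on 2014-09-01.
  for hour in 10 11 12 13 14; do
    echo "put 'URL_HITS', '$ROW', 'HOURLY:20140901${hour}', '${hour}'"
  done
  echo "scan 'URL_HITS'"
  echo "exit"
} > "$SCRIPT"

cat "$SCRIPT"
# With HBase running: hbase shell "$SCRIPT"
```

If the table already exists from the interactive session, the `create` line will fail; disable and drop the table first or remove that line from the script.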
in java.library.path
Resolution: Snappy libraries are not compiled for Mac OSX out of the box. A Snappy Java port was introduced in
CDH 5 and will likely need to be recompiled on your machine.
git clone https://github.com/xerial/snappy-java.git
cd snappy-java
make
cp target/snappy-java-1.1.1.3.jar $HADOOP_HOME/share/hadoop/common/lib/asnappy-java-1.1.1.3.jar
Landing Page
Creating a landing page will help consolidate all the HTTP addresses of the services that you're running. Please note
that localhost can be replaced with your local hostname (such as jakuza-mbp.local).
Service Apache HTTPD
start: sudo -s launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist
stop: sudo -s launchctl unload -w /System/Library/LaunchDaemons/org.apache.httpd.plist
logs: /var/log/apache2/
url: http://localhost/index.html
Create index.html (edit /Library/WebServer/Documents/index.html, which you can download here).
It will look something like this:
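A minimal page of this kind can also be generated from the shell. This is a sketch, not the author's downloadable file: the HBase Master and DataNode addresses come from this post, while the NameNode, ResourceManager, and JobHistory ports are assumed to be the stock Hadoop 2 defaults, and `OUT` is a hypothetical scratch path.

```shell
OUT="${OUT:-${TMPDIR:-/tmp}/index.html}"

# link LABEL URL: emit one list item.
link() {
  printf '  <li><a href="%s">%s</a></li>\n' "$2" "$1"
}

{
  echo '<html><body><h1>Local CDH services</h1><ul>'
  link 'HBase Master'         'http://localhost:60010/master-status'
  link 'HDFS DataNode'        'http://localhost:50075/'
  link 'HDFS NameNode'        'http://localhost:50070/'   # assumed default port
  link 'YARN ResourceManager' 'http://localhost:8088/'    # assumed default port
  link 'MapReduce JobHistory' 'http://localhost:19888/'   # assumed default port
  echo '</ul></body></html>'
} > "$OUT"

echo "wrote $OUT"
```

Copying the result into place needs root, for example `sudo cp "$OUT" /Library/WebServer/Documents/index.html`.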
Conclusion
With this guide, you should have a locally running Hadoop cluster with HDFS, MapReduce, and HBase. These are the
core components for Hadoop, and are a good initial foundation for building and prototyping your applications locally.
I hope this will be a good starting point on your dev box to try out more ways to build your products, whether they are
data pipelines, analytics, machine learning, search and exploration, or more, on the Hadoop stack.
Jordan Hambleton is a Solutions Architect at Cloudera.
11 Responses
KRIS / SEPTEMBER 16, 2014 / 11:41 PM
Hi Justin,
Might be valuable for readers to point them to two blog posts describing a local install and configuring it to run in local and
pseudo-distributed mode. The second blog post even describes how it's automated with the help of Ansible.
Thanks in advance
http://blog.godatadriven.com/local-and-pseudo-distributed-cdh5-hadoop-on-your-laptop.html
http://blog.godatadriven.com/automated-cdh5-hadoop-on-your-laptop-with-ansible.html
Kris
CHEN, JIANZHONG / SEPTEMBER 17, 2014 / 6:31 AM
Hi, please add instructions for Hive with MySQL as the metastore. Hive is an essential ingredient of a Hadoop ecosystem,
more so than HBase. In any case, thanks for putting together what is there so far (including HBase). Thanks.
JORDAN HAMBLETON / SEPTEMBER 25, 2014 / 11:37 AM
Hi Stephen,
Appreciate the comment. If you've followed the steps above, Hive will work out of the box using its embedded metastore
with Derby. Be sure to use the same local directory for all of the instances you use hive shell in, so that they use the same
metastore.
In addition to the local metastore, you can install mysql via brew. Follow the config & setup from the link below. I've
listed a few tips below.
https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/5.0/CDH5-InstallationGuide/cdh5ig_hive_metastore_configure.html
1. Install Mysql & Setup (mac conversions)
1.1. brew install mysql
1.2. follow instructions for mysql config from above CDH5 install link.
1.3. get mysql connector (ie. mysql-connector-java-5.1.16-bin.jar) and copy it to $HIVE_HOME/lib/
quick tips
* start mysql: mysql.server start
* stop mysql: mysql.server stop
Lastly, update hive-site.xml as specified on CDH5 install link above or find a copy on my github:
https://github.com/joropolis/misc-data/blob/master/blog-2014-09-01/hive-site.xml
Launch hive shell & if you have any issues, try running in debug mode.
hive -hiveconf hive.root.logger=DEBUG,console
If you see an error like the following, ensure the mysql connector jar is in $HIVE_HOME/lib/.
* The specified datastore driver (com.mysql.jdbc.Driver) was not found in the CLASSPATH
DEBASISH / OCTOBER 04, 2014 / 11:21 AM
Thanks for the note Debasish. Yes, this is working without additional configuration. Did you check your yarn logs for the
DistributedShell command you executed (see snip below)? In my example, it will print out the top cpu hogs on your mac!
Also, note that only mapreduce job logs are viewable through the mapred historyserver web. Use the cmd line to view
your yarn logs based on the application id per example below.
$ yarn logs -applicationId application_1412661426311_0001
Container: container_1412661426311_0001_01_000002 on jakuza-mbp_54669
=======================================================================
LogType: stderr
LogLength: 0
Log Contents:
LogType: stdout
LogLength: 6724
Log Contents:
PID STAT %CPU TIME COMMAND
2703 S+ 25.0 0:01.76 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/
159 Ss 14.4 1:42.13 /Library/StartupItems/SymAutoProtect/
122 Ss 5.1 6:30.38 /System/Library/Frameworks/ApplicationServices
2516 S+ 4.1 0:07.47 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/
server).
MIKE B / NOVEMBER 29, 2014 / 11:01 AM
@Somnath:
I had the same problem, and I took a look in Makefile.vars, and it is looking for the python libs in /usr/include/python2.7
I ran the following, and it seems to work:
sudo mkdir /usr/include
sudo ln -s /usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/include/python2.7 /usr/include/python2.7
(You might have to adjust these slightly if you're using a different version of Python.)
Hope this helps.
IS IT POSSIBLE TO INSTALL CLOUDERA MANAGER AGENT ON MAC / DECEMBER 11, 2014 / 12:06 PM
Awesome post. I successfully configured it on my Mac. One question is whether it's possible to install the Cloudera Manager
agent on a Mac. I have some old Linux machines and I've configured them through Cloudera Manager. I want to add my
Mac to the cluster managed by CM. Thanks a lot.
SOON / DECEMBER 12, 2014 / 4:30 PM
Thanks, this was really helpful. I was able to install it successfully on my mac with no problems. Is it possible to run
HiveServer2?
JORDAN HAMBLETON / DECEMBER 29, 2014 / 1:13 PM
Soon, HiveServer2 requires configuring your hive client config property hive.metastore.uris in $HIVE_HOME/conf/hive-site.xml as below (a copy can be found from mentioned links).
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
To start the metastore & hiveserver2, use the following commands: