Hive Installation on Windows 10
Hive Introduction
My previous posts covered an outline of Hadoop and HBase and their installation on Windows.
We saw that Hadoop performs only batch processing and accesses data sequentially, which
means the entire data set has to be scanned even for the simplest job. Processing a huge data
set in this way often produces another huge data set, which must also be processed
sequentially. What is needed is a way to reach any point in the data in a single unit of time
(random access). HBase fills this gap: it can store massive amounts of data, from terabytes to
petabytes, and allows fast random reads and writes that Hadoop alone cannot handle.
HBase is an open-source, non-relational (NoSQL), distributed, column-oriented database that
runs on top of HDFS and provides real-time read/write access to those large data sets. It is
modeled after Google's Bigtable, is written primarily in Java, and is designed to provide quick
random access to huge amounts of data.
Next in this series, we will walk through Apache Hive. Hive is a data warehouse infrastructure
that works on top of the Hadoop Distributed File System and MapReduce to encapsulate Big Data
and make querying and analysis easier. It also serves as an ETL tool for the Hadoop ecosystem
and lets developers write Hive Query Language (HQL) statements that are very similar to SQL
statements.
Hive Installation
In brief, Hive is a data warehouse software project built on top of Hadoop that facilitates
reading, writing, and managing large datasets residing in distributed storage using SQL.
Hadoop must be installed before going any further. I am assuming Hadoop is already installed;
if not, see my previous post on how to install Hadoop on Windows.
I installed Hive (2.1.0) on top of a Derby metastore (10.12.1.1), though you can use any
stable version.
Download hive-site.xml
https://mindtreeonline-my.sharepoint.com/:u:/g/personal/m1045767_mindtree_com1/EbsE-U5qCIhIpo8AmwuOzwUBstJ9odc6_QA733OId5qWOg?e=2X9cfX
https://drive.google.com/file/d/1qqAo7RQfr5Q6O-GTom6Rji3TdufP81zd/view?usp=sharing
Extract the file apache-hive-2.1.0-bin.tar.gz and place it under "D:\Hive"; you can use any
preferred location.
Similarly, extract the file db-derby-10.12.1.1-bin.tar.gz and place it under "D:\Derby"; again,
you can use any preferred location.
STEP - 3: Moving hive-site.xml file
Set the following environment variables (User Variables) on Windows 10:
HIVE_HOME - D:\Hive\apache-hive-2.1.0-bin
HIVE_BIN - D:\Hive\apache-hive-2.1.0-bin\bin
HIVE_LIB - D:\Hive\apache-hive-2.1.0-bin\lib
DERBY_HOME - D:\Derby\db-derby-10.12.1.1-bin
HADOOP_USER_CLASSPATH_FIRST - true
This PC -> Right Click -> Properties -> Advanced System Settings -> Advanced ->
Environment Variables
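Once the variables are saved, a quick check from a newly opened command prompt can confirm they are visible. The following is only a minimal sketch in Python, not part of the original setup: the function name check_env is hypothetical, and the expected values are simply the ones listed above, so adjust them if you extracted Hive or Derby elsewhere.

```python
import os

# Values copied from the list above; change them to match your own install paths.
EXPECTED = {
    "HIVE_HOME": r"D:\Hive\apache-hive-2.1.0-bin",
    "HIVE_BIN": r"D:\Hive\apache-hive-2.1.0-bin\bin",
    "HIVE_LIB": r"D:\Hive\apache-hive-2.1.0-bin\lib",
    "DERBY_HOME": r"D:\Derby\db-derby-10.12.1.1-bin",
    "HADOOP_USER_CLASSPATH_FIRST": "true",
}


def _normalise(value):
    # Windows paths are case-insensitive; ignore case and trailing slashes.
    return value.strip().rstrip("\\/").lower()


def check_env(expected, environ=None):
    """Return (name, problem) pairs for variables that are unset or differ."""
    environ = os.environ if environ is None else environ
    problems = []
    for name, want in expected.items():
        got = environ.get(name)
        if got is None:
            problems.append((name, "not set"))
        elif _normalise(got) != _normalise(want):
            problems.append((name, f"is {got!r}, expected {want!r}"))
    return problems


if __name__ == "__main__":
    issues = check_env(EXPECTED)
    for name, problem in issues:
        print(f"{name}: {problem}")
    if not issues:
        print("All variables match the expected values.")
```

An empty result means every variable is set as listed. Remember that a command prompt opened before the change will not see the new values.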
STEP - 6: Configure System variables
Next, set the System variables, including the Hive bin directory path:
HADOOP_USER_CLASSPATH_FIRST - true
Variable: Path
Value:
1. D:\Hive\apache-hive-2.1.0-bin\bin
2. D:\Derby\db-derby-10.12.1.1-bin\bin
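The Path entries can be verified in the same spirit. This is a small illustrative sketch, assuming the two directories listed above; missing_from_path is a hypothetical helper name, and the separator defaults to ";" because that is what Windows uses in Path.

```python
import os

# The two directories added to Path above; adjust if your layout differs.
REQUIRED_PATH_ENTRIES = [
    r"D:\Hive\apache-hive-2.1.0-bin\bin",
    r"D:\Derby\db-derby-10.12.1.1-bin\bin",
]


def missing_from_path(path_value, required, sep=";"):
    """Return the required directories that do not appear in path_value."""
    entries = {
        entry.strip().rstrip("\\").lower()
        for entry in path_value.split(sep)
        if entry.strip()
    }
    return [d for d in required if d.rstrip("\\").lower() not in entries]


if __name__ == "__main__":
    missing = missing_from_path(os.environ.get("PATH", ""), REQUIRED_PATH_ENTRIES)
    for directory in missing:
        print(f"Not on Path: {directory}")
```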
Now cross-check the Derby details in the Hive configuration file:
hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <description>Enable user impersonation for HiveServer2</description>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>
      Client authentication types.
      NONE: no authentication check
      LDAP: LDAP/AD based authentication
      KERBEROS: Kerberos/GSSAPI authentication
      CUSTOM: Custom authentication provider
      (Use with property hive.server2.custom.authentication.class)
    </description>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>True</value>
  </property>
</configuration>
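To make the cross-check less error-prone, the file can be read programmatically. The sketch below is illustrative only: it uses Python's standard xml.etree.ElementTree module, the helper names read_hive_properties and parse_derby_url are hypothetical, and the embedded SAMPLE is just a two-property excerpt of the file above. The fallback port 1527 is Derby's default network server port.

```python
import xml.etree.ElementTree as ET

# Two-property excerpt of the hive-site.xml shown above, for demonstration.
SAMPLE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
</configuration>
"""


def read_hive_properties(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def parse_derby_url(url):
    """Split jdbc:derby://host:port/database;attributes into its parts."""
    prefix = "jdbc:derby://"
    if not url.startswith(prefix):
        raise ValueError(f"not a Derby network client URL: {url!r}")
    location, _, attributes = url[len(prefix):].partition(";")
    hostport, _, database = location.partition("/")
    host, _, port = hostport.partition(":")
    return {
        "host": host,
        "port": int(port) if port else 1527,  # Derby's default port
        "database": database,
        "attributes": attributes,
    }


if __name__ == "__main__":
    props = read_hive_properties(SAMPLE)
    print(props["javax.jdo.option.ConnectionDriverName"])
    print(parse_derby_url(props["javax.jdo.option.ConnectionURL"]))
```

To check a real installation, pass the text of your own hive-site.xml instead of SAMPLE. The host and port reported here are where Hive expects the Derby network server to be listening.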
Open a command prompt, change directory to "D:\Hadoop\hadoop-2.8.0\sbin", and type
"start-all.cmd" to start Hadoop.
Since the "start-all.cmd" command has been deprecated, you can instead run the commands
below, in this order:
"start-dfs.cmd" and
"start-yarn.cmd"
With the Derby server started and ready to accept connections, open a new command prompt
with administrator privileges and change to the Hive bin directory,
"D:\Hive\apache-hive-2.1.0-bin\bin".
[5] Usage of LOAD Command for Inserting Data Into Hive Tables
Create a sample text file using "|" as the delimiter:
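Any text editor will do for this, but a short script makes the file reproducible. The following is a minimal sketch under stated assumptions: the original sample data is not shown here, so the rows and the column layout (id, name, city) are hypothetical and must be matched to the columns of your own table, and write_pipe_delimited is an illustrative helper name.

```python
import csv

# Hypothetical sample rows; the (id, name, city) layout is an assumption and
# should mirror the column order of the Hive table you load into.
ROWS = [
    (1, "Asha", "Pune"),
    (2, "Ravi", "Delhi"),
    (3, "Meera", "Chennai"),
]


def write_pipe_delimited(path, rows):
    """Write rows to path, one record per line, fields separated by '|'."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="|", lineterminator="\n")
        writer.writerows(rows)
```

Calling write_pipe_delimited("students.txt", ROWS) produces lines such as 1|Asha|Pune. Keep the "|" character out of the field values themselves: Hive's default delimited text format splits on every delimiter and does not interpret CSV-style quoting.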
[6] Hive Select Data from Table -
SELECT * FROM STUDENTS;