
CSE Department

Hadoop Ecosystem

Unit 4

Dr Rakesh Ranjan Kumar


Assistant Professor
PIG: A Big Data Processor

Apache Pig: Introduction
 Pig is an open-source technology (introduced by Yahoo!) that is part of the Hadoop ecosystem for processing high volumes of data (structured, semi-structured, unstructured).

 It provides an abstraction over MapReduce.

 It is used to analyze large sets of data and to represent them as data flows.

 Pig is not an acronym; it was named after the domestic animal. Just as a pig eats anything, Pig can work on any kind of data.
Apache Pig: Contd…
 It has a high-level scripting language known as Pig Latin that helps programmers develop their own functions for reading, writing, and processing data.

 A component known as the Pig Engine is present inside Apache Pig; it takes Pig Latin scripts as input and converts them into MapReduce jobs.

Fun Fact:
10 lines of Pig Latin ≈ 200 lines of MapReduce Java code
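To make that comparison concrete, here is a minimal Pig Latin sketch of the classic word count job (the input/output paths are invented for illustration, not from the slides):

-- Load raw text from HDFS (path is a hypothetical example)
lines = LOAD '/data/input.txt' AS (line:chararray);
-- Split each line into words, one word per record
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together and count each group
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;
-- Write the result back to HDFS
STORE counts INTO '/data/wordcount_output';

The equivalent hand-written MapReduce job in Java needs a mapper class, a reducer class, and driver boilerplate, which is where the roughly 20x difference comes from.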
Why go for Pig when MR is there?

Why Apache Pig?
Apache Pig: Features
• Rich set of operators: It provides many operators to perform operations like join, sort, filter, etc. (a few are illustrated in the sketch after this list).
• Ease of programming: Pig Latin is similar to SQL, and it is easy to write a Pig script if you are good at SQL.
• Optimization opportunities: Tasks in Apache Pig optimize their execution automatically, so programmers need to focus only on the semantics of the language.
• Extensibility: Using the existing operators, users can develop their own functions to read, process, and write data.
• UDFs: Pig provides the facility to create User-Defined Functions in other programming languages such as Java and to invoke or embed them in Pig scripts.
• Handles all kinds of data: Apache Pig analyzes all kinds of data, both structured and unstructured. It stores the results in HDFS.
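As a hedged sketch of a few of these operators together with a Java UDF call (the file names, schemas, and the myudfs.UPPER class are invented examples):

-- Register a jar containing a hypothetical Java UDF
REGISTER myudfs.jar;
users = LOAD 'users.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);
orders = LOAD 'orders.csv' USING PigStorage(',') AS (uid:int, amount:double);
adults = FILTER users BY age >= 18;                    -- filter operator
joined = JOIN adults BY id, orders BY uid;             -- join operator
sorted = ORDER joined BY amount DESC;                  -- sort operator
shouted = FOREACH sorted GENERATE myudfs.UPPER(adults::name), amount;  -- UDF call
STORE shouted INTO 'output';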
Apache Pig – Components

Pig Architecture
Apache Pig – Architectural Components
• Parser: Initially, Pig scripts are handled by the Parser. It checks the syntax of the script, does type checking, and performs other miscellaneous checks. The output of the parser is a DAG (directed acyclic graph) that represents the Pig Latin statements and logical operators.
• Optimizer: The logical plan (DAG) is passed to the logical optimizer, which carries out logical optimizations such as projection pushdown.
• Compiler: The compiler compiles the optimized logical plan into a series of MapReduce jobs.
• Execution engine: Finally, the MapReduce jobs are submitted to Hadoop in sorted order and executed there, producing the desired results.
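The plans these stages produce can be inspected from the Grunt shell with Pig's EXPLAIN operator; a minimal hedged sketch (relation and file names are invented):

grunt> A = LOAD 'data.txt' AS (f1:int, f2:int);
grunt> B = FILTER A BY f1 > 10;
grunt> EXPLAIN B;    -- prints the logical, physical, and MapReduce plans for B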

Apache Pig – Execution Modes

Apache Pig – Interaction Modes
Apache Pig: Job Execution Flow

 The programmer creates a Pig Latin script, which lives in the local file system.

 Once the Pig script is submitted, it is handed to the compiler, which generates a series of MapReduce jobs.

 The Pig compiler reads the raw data from HDFS and performs the operations on it.

 The result files are placed back in the Hadoop Distributed File System (HDFS) after the MapReduce jobs complete.
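As a sketch of how a finished script is launched (the script name is a hypothetical example), Pig can be run in local mode against the local file system or in MapReduce mode against HDFS:

$ pig -x local myscript.pig        # local mode: input/output on the local file system
$ pig -x mapreduce myscript.pig    # MapReduce mode (the default): input/output on HDFS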
How Apache Pig Works
Apache Pig – Data Models

Apache Pig Data Model - Tuple and Bag

Apache Pig Data Model - Map and Atom

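Since the data model slides are figures, here is a hedged textual sketch of the four elements (all values are invented): an Atom is a single value, a Tuple is an ordered set of fields, a Bag is a collection of tuples, and a Map is a set of key-value pairs.

-- Atom:  'alice' or 25
-- Tuple: ('alice', 25)
-- Bag:   {('alice', 25), ('bob', 30)}
-- Map:   ['city'#'delhi', 'pin'#110001]
people = LOAD 'people.txt' AS (name:chararray, age:int,
                               friends:bag{t:tuple(fname:chararray)},
                               props:map[]);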
Apache Pig - Commands

Pig Case Study – Twitter

Pig vs SQL
Apache Pig – Applications
• Processes large volumes of data
• Supports quick prototyping and ad-hoc queries across large datasets
• Performs data processing in search platforms
• Processes time-sensitive data loads
• Used by telecom companies to de-identify user call data
• How Yahoo! uses Pig:
• In Pipelines – To bring together logs from its web servers, where these logs undergo a cleaning step to remove bots, company internal views, and clicks.
• In Research – To quickly write a script to test a theory. Pig integration makes it easy for researchers to take a Perl or Python script and run it against a huge dataset (see the sketch below).
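The Perl/Python integration mentioned above is Pig's STREAM operator, which pipes each record through an external script; a minimal hedged sketch (file and script names are invented):

raw = LOAD 'queries.txt' AS (query:chararray);
-- Pipe every record through an external Python script
scored = STREAM raw THROUGH `python score.py` AS (query:chararray, score:double);
STORE scored INTO 'scored_queries';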

Apache Hive: Data Warehousing & Analytics on Hadoop

Hive History
Hive Introduction
• Hive is a data warehouse infrastructure tool used to process structured data stored in an HDFS cluster.
• It resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing easy.
• Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive.
• It is used by many companies, for example Amazon, Facebook, and Netflix.
Need of Hive

Apache Hive: Features
 Open-source: Apache Hive is an open-source tool.

 Query large datasets: Hive can query and manage huge datasets stored in the Hadoop Distributed File System.

 Multiple users: Multiple users can query the data simultaneously using Hive Query Language (HQL).

 File formats: Hive supports various file formats such as TextFile, ORC, Avro, SequenceFile, Parquet, and RCFile, as well as LZO compression.

 Built-in functions: Hive provides various built-in functions, for example abs(), round(), and isnull() (a short sketch follows this list).

 User-Defined Functions: It also supports User-Defined Functions for tasks like data cleansing and filtering.
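As a hedged illustration of the built-in functions named above (the table and column names are invented):

SELECT name,
       round(salary, 2)   AS salary_rounded,   -- round to 2 decimal places
       abs(balance)       AS abs_balance,      -- absolute value
       isnull(manager_id) AS has_no_manager    -- true when the column is NULL
FROM employees;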
Apache Hive: Contd…
 Fast: Hive is a fast, scalable, extensible tool and uses familiar concepts.

 Table structure: Table structure in Hive is similar to table structure in an RDBMS.

 ETL support: Hive supports ETL operations.

 Storage: Hive allows us to access files stored in HDFS and in similar data storage systems such as HBase.

 Ad-hoc queries: Hive allows us to run ad-hoc queries: loosely typed commands or queries whose value depends on some variable, used for data analysis.
 Ad-hoc query example: var adSQL = "SELECT * FROM table WHERE id = " + myId
 This builds a different query each time the line executes, depending on the value of myId.

 Data visualization: Hive can be used for data visualization; integrating Hive with Apache Tez provides near-real-time processing capabilities.
Limitations of Hive
• Subqueries: Subqueries are not supported.
• Latency: The latency of Apache Hive queries is very high.
• Only non-real-time (cold) data: Hive is not used for real-time data querying, since it takes a while to produce a result.
• No transaction processing: HQL does not support transaction processing.

Architecture of Hive

Apache Hive: Contd…
Hive chiefly consists of three core parts:

 Hive Clients: Hive offers a variety of drivers designed for communication with different applications. For example, Hive provides Thrift clients for Thrift-based applications. These clients and drivers then communicate with the Hive server, which falls under Hive Services.

 Hive Services: Hive Services handle client interactions with Hive. For example, if a client wants to perform a query, it must talk to Hive Services.

 Hive Storage and Computing: Hive services such as the file system, job client, and metastore communicate with Hive storage, which holds things like table metadata and query results.
Apache Hive: Hive Clients
 Hive allows writing applications in various languages, including Java, Python, and C++. It supports different types of clients, such as:

 Thrift Server - a cross-language service provider platform that serves requests from all programming languages that support Thrift.

 JDBC Driver - used to establish a connection between Hive and Java applications (a sketch follows this list).

 ODBC Driver - allows applications that support the ODBC protocol to connect to Hive.
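A minimal hedged sketch of the JDBC route in Java (host, port, credentials, and table are assumptions; HiveServer2 conventionally listens on port 10000):

import java.sql.*;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver and connect to HiveServer2 (URL is an example)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "user", "");
        Statement stmt = con.createStatement();
        // Run a simple HiveQL query (table name is hypothetical)
        ResultSet rs = stmt.executeQuery("SELECT name, age FROM employees LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
        }
        con.close();
    }
}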
Apache Hive: Hive Services
 Hive CLI - The Hive CLI (Command Line Interface) is a shell used to execute Hive queries and commands.
 Hive Web UI - The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
 Hive Server - Also referred to as the Apache Thrift Server, it accepts requests from different clients and passes them to the Hive Driver.
 Hive Driver - It receives queries from different sources such as the Web UI, CLI, Thrift, and JDBC/ODBC drivers, and transfers the queries to the compiler.
 Hive Metastore - A central repository that stores all the structural information of the various tables and partitions in the warehouse: column metadata and type information, the serializers and deserializers used to read and write data, and the corresponding HDFS files where the data is stored.
 Hive Compiler - The compiler parses the query and performs semantic analysis on the different query blocks and expressions. It converts HiveQL statements into MapReduce jobs.
 Hive Execution Engine - The optimizer generates the logical plan in the form of a DAG of MapReduce tasks and HDFS tasks; the execution engine then runs these tasks.
Apache Hive: Hive Driver
Data Flow in Hive

Apache Hive: Contd…

 Execute a query, which goes to the driver.
 The driver asks the compiler for a query execution plan.
 The compiler requests the metadata from the metastore.
 The metastore responds with the metadata.
 The compiler gathers this information and sends the plan back to the driver.
 The driver sends the execution plan to the execution engine.
 The execution engine acts as a bridge between Hive and Hadoop to process the query.
 The execution engine also communicates bidirectionally with the metastore to perform various operations, such as creating and dropping tables.
 Finally, bidirectional communication fetches and sends results back to the client.
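The plan the compiler hands to the execution engine can be inspected with HiveQL's EXPLAIN command; a minimal hedged sketch (table name is invented):

EXPLAIN
SELECT dept, count(*) AS cnt
FROM employees
GROUP BY dept;
-- Prints the stage dependencies and the plan for each stage (e.g., the MapReduce stages)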
Hive Data Modelling

Apache Hive: Data Types
Different Modes of Hive

Hive vs RDBMS
Hive Commands
 Commands in Hive:
 Hive DDL (Create, View, Drop, Alter, Use)
 Hive DML (Load, Insert, Update, Delete)
 Data retrieval queries (Select, Where, Group By, Limit)
 Joins in Hive (Inner, Outer, Full Join)
 Built-in functions in Hive
A combined sketch of these command families follows below.
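A hedged end-to-end sketch (database objects and file paths are invented):

-- DDL: create a table over comma-separated text files
CREATE TABLE employees (id INT, name STRING, dept STRING, salary DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- DML: load data from a local file into the table
LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees;

-- Data retrieval: Select, Where, Group By, Limit
SELECT dept, avg(salary) AS avg_salary
FROM employees
WHERE salary > 10000
GROUP BY dept
LIMIT 10;

-- Join: inner join with a second, hypothetical table
SELECT e.name, d.location
FROM employees e JOIN departments d ON (e.dept = d.dept_name);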


Hive vs Pig

HiveQL vs Pig Latin

Hive vs Pig vs SQL

HBase: Large-Scale Data Management

HBase History
HBase: Why?
HBase: Introduction
 HBase is an open-source, non-relational, distributed database written in Java. It runs on top of HDFS.

 HBase is a database management system designed in 2007 by Powerset (a company later acquired by Microsoft).

 HBase is a column-oriented database and enables real-time analysis of data.

 It can store huge amounts of data in tabular format (rows and columns) for extremely fast reads and writes (a shell sketch follows this list).

 HBase is mostly used in scenarios that require regular, consistent inserting and overwriting of data.
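As a hedged sketch of these fast tabular reads and writes from the HBase shell (table, column family, and values are invented):

hbase> create 'users', 'info'                     # table with one column family
hbase> put 'users', 'row1', 'info:name', 'Alice'  # write one cell
hbase> put 'users', 'row1', 'info:age', '25'
hbase> get 'users', 'row1'                        # read a single row
hbase> scan 'users'                               # scan the whole table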
NoSQL Types

HBase Use Case
Features of HBase
• Linear and modular scalability: It is highly scalable, which means we can add more machines to its cluster.
• Easy-to-use Java API for client access: HBase has been developed with robust Java API support (client/server) that is simple to use (a hedged sketch follows this list).
• Thrift gateway and RESTful Web services: To support front ends beyond the Java programming language, it supports Thrift and a REST API.
• Atomic read and write: HBase provides atomic reads and writes at the row level: during one read or write process on a row, all other processes are prevented from reading or writing that row.
• Consistent reads and writes: HBase provides consistent reads and writes as a consequence of the feature above.
• Automatic and configurable sharding of tables: HBase tables are split into regions that are distributed across the cluster; regions split and are redistributed automatically as the data grows.
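A minimal hedged sketch of the Java client API mentioned above (table and column names are invented; the calls are from the standard org.apache.hadoop.hbase.client package):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Write one cell: row key "row1", column family "info", qualifier "name"
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);  // writes are atomic at the row level
            // Read the row back
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}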
Applications of HBase
• Medical: The medical industry uses HBase to store patient data such as diseases, age, gender, etc., and to run MapReduce on it.
• Sports: The sports industry uses HBase to store information related to matches. This information helps perform analytics and predict the outcomes of future matches.
• Web: The web uses HBase to store customers' search histories. This search information helps companies target customers directly with the products or services they have searched for.
• Oil and petroleum: HBase is used to store exploration data, which helps in analysing and predicting areas where oil can be found.
• E-commerce: E-commerce uses HBase to record customer logs and the products they search for. It enables organizations to target customers with ads that induce them to buy their products or services.
• Other fields: HBase is employed in other fields where data is the most important factor and petabytes of data must be stored for analysis.
Companies Using HBase
• Mozilla
• Mozilla uses HBase to store all of its crash data.
• Facebook
• Facebook uses HBase storage to store real-time messages.
• Twitter
• Twitter also runs HBase across its entire Hadoop cluster. For Twitter, HBase offers a distributed, read/write backup of all the MySQL tables in its production backend.
• Yahoo!
• Yahoo! also uses HBase, where it helps store document fingerprints in order to detect near-duplicates.
HBase: Column-Oriented Storage
HBase: Architecture
 HBase has two types of nodes: Master and RegionServer.
HBase Architectural Components

Data Storage in HBase
HBase vs Hive
• Hive and HBase are two different Hadoop-based technologies:

• Hive is an SQL-like engine that runs MapReduce jobs, and

• HBase is a NoSQL key/value database on Hadoop.

• Just as Google can be used for search and Facebook for social networking, Hive can be used for analytical queries while HBase is used for real-time querying.
HBase vs RDBMS
Thank You

Rakesh Ranjan Kumar

CSE Department

[email protected]

7070254486

www.cgu-odisha.edu.in