0% found this document useful (0 votes)
54 views

Introduction To HBase

HBase is a NoSQL database that is built on top of Hadoop and HDFS. It allows for the storage and fast retrieval of large amounts of unstructured and semi-structured data in the form of key-value pairs. HBase tables are split into rows and columns and can handle millions of rows. It provides high performance for both reads and writes through horizontal scaling.

Uploaded by

vishnu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views

Introduction To HBase

HBase is a NoSQL database that is built on top of Hadoop and HDFS. It allows for the storage and fast retrieval of large amounts of unstructured and semi-structured data in the form of key-value pairs. HBase tables are split into rows and columns and can handle millions of rows. It provides high performance for both reads and writes through horizontal scaling.

Uploaded by

vishnu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Introduction to HBase

For online Hadoop training, send mail to [email protected]


Agenda
Disadvantage of SQL
What is NoSQL ?
Advantages of NoSQL
What is HBase
HBase Vs Hadoop.
Conceptual view
Column family
Column family Vs Qualifier
Installation modes
Pros of HBase
Cons of HBase
Disadvantage of SQL Database

Can’t process large amount of data.

Not easily scalable.

No partial failure.

No fault tolerant.

Need a well defined schema .

Not an open source.

October 5, 2012 3
What is NoSQL ?
Stands for Not only SQL (NoSQL).

NoSQL can handle structured/unstructured


data.

NoSQL can use Hadoop for its storage.

NoSQL does not require a fixed table


schema.

HBase is an example of NoSQL.


Advantages of NoSQL
NoSQL can handle any size of data.

NoSQL is schema less.

Partial failure of system.

NoSQL uses HDFS for storage.

Fault tolerant is taken care by Hadoop.

Easy to implement (open source).


What is HBase
HBase is a subproject of Hadoop & developed by
Apache.

It is NoSQL database which can handle millions of


rows in a table.

Data is logically organized into tables, rows and


columns

HBase table contains data in key : value format.

HBase can handle unstructured data.


HBase Vs Hadoop
HBase uses HDFS for storing its tables data.

Big tables are splitted into smaller parts and stored


on HDFS.

HBase can also run in standalone mode, in which it


doesn't use Hadoop for storage.

HMaster is the master process of HBase which


controls all HBase activities.

HRegionServer is the slave process of HBase.


Conceptual View
A row has a sortable row key and an arbitrary number
of columns.

HBase can maintain versioned data.

You can configure how many versions you want.

HBase stores the current timestamp while inserting


the data.

HBase has concept of column family.

<column family>:<qualifier> makes a column


Column Family
Instead of columns, HBase has column families.

Column families are part of table schema.

Qualifiers are not part of table schema.

We can create as many qualifiers as required at


runtime.

Combination of column family & qualifier makes a


column.

Different rows can have different no. of columns.


Column Family Vs Qualifier
Installation modes

Standalone
Does not use Hadoop for storage.

Pseudo distributed
Uses single node Hadoop cluster for storing tables.

Fully distributed
Uses fully distributed Hadoop cluster for storing tables.
Pros of HBase

Distributed
Built on Hadoop HDFS
Handles Big Data
High performance for write and read
Scalable (auto-sharding)
Fault tolerant, no data loss
Cons of HBase

Does not support table join.


Does not support group by.
Does not support Indexes.
Columns are not the part of table schema.
Can lose it's data if Hadoop crashes.

Not suitable for transactions


…Thanks…

For online Hadoop training, send mail to [email protected]

You might also like