Interview Questions - Hive and Querying
Both DISTRIBUTE BY and CLUSTER BY control how query results are divided among
reducers on the basis of one or more columns. CLUSTER BY is a shortcut for both
DISTRIBUTE BY and SORT BY on the same columns. Hive uses the columns in
DISTRIBUTE BY to distribute the rows among the reducers: all rows with the same
DISTRIBUTE BY column values go to the same reducer. However, DISTRIBUTE BY does
not guarantee clustering or sorting properties on the distributed keys; CLUSTER BY
additionally sorts the rows within each reducer by those columns.
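The following is a minimal Python sketch of these semantics, not Hive's actual implementation. The function names distribute_by and cluster_by are hypothetical, and zlib.crc32 stands in for whatever hash Hive really uses to pick a reducer. It shows that rows sharing a key always land in the same partition, and that the CLUSTER BY variant additionally sorts each partition by that key.

```python
import zlib


def _key(row, cols):
    # Build the distribution key from the chosen columns.
    return tuple(row[c] for c in cols)


def distribute_by(rows, cols, num_reducers):
    """Toy model of DISTRIBUTE BY: hash the key columns to choose a reducer.

    Rows with equal keys always go to the same reducer, but no order is
    imposed within a reducer.
    """
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # crc32 is a deterministic stand-in for Hive's hash function (assumption).
        h = zlib.crc32(repr(_key(row, cols)).encode("utf-8"))
        reducers[h % num_reducers].append(row)
    return reducers


def cluster_by(rows, cols, num_reducers):
    """Toy model of CLUSTER BY: DISTRIBUTE BY plus SORT BY on the same columns."""
    parts = distribute_by(rows, cols, num_reducers)
    # SORT BY orders rows only within each reducer, not globally.
    return [sorted(p, key=lambda r: _key(r, cols)) for p in parts]


if __name__ == "__main__":
    rows = [
        {"dept": "a", "sal": 3},
        {"dept": "b", "sal": 1},
        {"dept": "a", "sal": 1},
        {"dept": "c", "sal": 2},
    ]
    for i, part in enumerate(cluster_by(rows, ["dept"], 2)):
        print(i, part)
```

The sort happens per partition, which mirrors the point above: CLUSTER BY output is ordered within each reducer, not across the whole result.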
Internal Table: An internal table in Hive is similar to a table in an RDBMS.
The data and the table schema are tightly coupled in internal tables: if you drop
an internal table in Hive, both its schema and the data stored in it are deleted.
Hive vs HBase:
Data model: Hive has a relational DBMS data model, whereas HBase has a columnar
data model.
Latency: Apache Hive has high latency compared with HBase. Hence, it is not
preferred for looking up individual records. HBase provides random and fast
lookup on top of HDFS, which allows a user to query individual records.
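To illustrate the access-pattern difference behind this comparison, here is a deliberately simplified Python sketch. ToyRowKeyStore and full_scan_lookup are hypothetical names. The store keeps rows sorted by row key and finds one with a binary search, loosely echoing how HBase orders rows by key. The scan function checks every row, loosely echoing a query that has to read the whole table. This is an assumption-laden toy: it does not model HBase's real storage files or caches, nor other sources of Hive latency such as launching batch jobs.

```python
import bisect


class ToyRowKeyStore:
    """Rows kept sorted by row key, so a single row can be found quickly."""

    def __init__(self, rows):
        # rows: mapping of row_key -> value; stored as sorted (key, value) pairs.
        self._rows = sorted(rows.items())
        self._keys = [k for k, _ in self._rows]

    def get(self, row_key):
        """Point lookup by binary search: O(log n) key comparisons."""
        i = bisect.bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._rows[i][1]
        return None


def full_scan_lookup(pairs, row_key):
    """Examine every (key, value) pair until a match is found: O(n)."""
    for k, v in pairs:
        if k == row_key:
            return v
    return None


if __name__ == "__main__":
    data = {"row3": "c", "row1": "a", "row2": "b"}
    store = ToyRowKeyStore(data)
    print(store.get("row2"), full_scan_lookup(data.items(), "row2"))
```

Both functions return the same answer; the difference is how much data each must touch to find one record.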
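Answer. Hive is not designed for row-level inserts and updates, which makes it
unsuitable for OLTP systems. (Row-level INSERT ... VALUES, UPDATE and DELETE were
only added in Hive 0.14, and only for transactional tables.) Note: OLTP is an
online transaction-processing system that involves frequent INSERT, UPDATE and
DELETE operations.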