Interview Questions - Hive and Querying

Uploaded by

Junaid Sheikh

1. Explain ORDER BY, CLUSTER BY, SORT BY and DISTRIBUTE BY in brief.

Answer. Both ORDER BY and SORT BY sort query results in ascending or descending
order, but they differ in scope. ORDER BY guarantees a total order by passing
all rows through a single reducer, which can be slow for large results. SORT BY
sorts rows only within each reducer, so when more than one reducer is used the
overall output is not guaranteed to be sorted and the key ranges of different
reducers may overlap.

Both DISTRIBUTE BY and CLUSTER BY control how rows are divided among reducers
on the basis of one or more columns. Hive uses the columns in DISTRIBUTE BY to
distribute the rows among the reducers, so all rows with the same DISTRIBUTE BY
column values go to the same reducer. However, DISTRIBUTE BY does not guarantee
any clustering or sorting of rows within a reducer. CLUSTER BY is a shortcut
for using DISTRIBUTE BY and SORT BY on the same columns.
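
The behaviour of the four clauses can be illustrated with a small Python
simulation. This is only a sketch of the semantics described above, not of how
Hive is implemented: the sample rows, the reducer count and the helper names
(reducer_for, order_by, sort_by, distribute_by, cluster_by) are all
hypothetical, and the round-robin split in sort_by merely stands in for an
arbitrary distribution of rows.

```python
# Hypothetical sample rows (dept, salary); not taken from the original handout.
ROWS = [("hr", 50), ("it", 90), ("hr", 70), ("ops", 60), ("it", 40), ("ops", 80)]
NUM_REDUCERS = 2  # assumed reducer count for the illustration


def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Deterministic stand-in for hash partitioning of a string key."""
    return sum(key.encode()) % num_reducers


def order_by(rows, col):
    """ORDER BY: all rows pass through one reducer, giving a total order."""
    return sorted(rows, key=lambda r: r[col])


def sort_by(rows, col, num_reducers=NUM_REDUCERS):
    """SORT BY: rows are spread over reducers and sorted within each one."""
    buckets = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(b, key=lambda r: r[col]) for b in buckets]


def distribute_by(rows, col, num_reducers=NUM_REDUCERS):
    """DISTRIBUTE BY: equal keys share a reducer; no sorting is applied."""
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[reducer_for(row[col], num_reducers)].append(row)
    return buckets


def cluster_by(rows, col, num_reducers=NUM_REDUCERS):
    """CLUSTER BY: DISTRIBUTE BY plus SORT BY on the same column."""
    return [sorted(b, key=lambda r: r[col])
            for b in distribute_by(rows, col, num_reducers)]


if __name__ == "__main__":
    print("ORDER BY salary     :", order_by(ROWS, 1))
    print("SORT BY salary      :", sort_by(ROWS, 1))
    print("DISTRIBUTE BY dept  :", distribute_by(ROWS, 0))
    print("CLUSTER BY dept     :", cluster_by(ROWS, 0))
```

In the SORT BY output each inner list is sorted, but reading the lists one
after another does not give a globally sorted result, which is the difference
from ORDER BY.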

2. Explain the difference between External and Internal tables.

Answer. External Table: Unlike RDBMSes, where data and tables are tightly coupled,
the data in an external table is only loosely coupled to the table definition.
Hive records the schema and the location, while the data itself resides in
HDFS. Even if you drop an external table in Hive, the data mapped to it remains
intact inside HDFS.

Internal Table: An internal (managed) table in Hive is similar to a table in an
RDBMS. The data and the table schema are tightly coupled. If you drop an
internal table in Hive, the data stored in it is deleted as well.
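
The difference in DROP behaviour can be mimicked with a toy model in Python.
This is a hedged sketch only: ToyMetastore, its methods, the table names and
the directory layout are hypothetical, and a local temporary directory stands
in for HDFS.

```python
import shutil
import tempfile
from pathlib import Path


class ToyMetastore:
    """Toy model: maps a table name to (data directory, is_external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external):
        location.mkdir(parents=True, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        # The table definition is always removed from the metastore.
        location, external = self.tables.pop(name)
        # Only an internal (managed) table takes its data with it.
        if not external:
            shutil.rmtree(location)


# A temporary local directory stands in for HDFS (assumption).
root = Path(tempfile.mkdtemp())
metastore = ToyMetastore()
metastore.create_table("sales_managed", root / "warehouse" / "sales_managed", external=False)
metastore.create_table("sales_external", root / "landing" / "sales", external=True)

# Put one data file under each table's location.
for location, _ in metastore.tables.values():
    (location / "part-00000").write_text("1,widget\n")

managed_dir, _ = metastore.tables["sales_managed"]
external_dir, _ = metastore.tables["sales_external"]

metastore.drop_table("sales_managed")
metastore.drop_table("sales_external")

print("managed data still present :", managed_dir.exists())
print("external data still present:", external_dir.exists())
```

After both drops the metastore is empty, but only the external table's data
file is still on disk.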

3. In which scenarios do you use external tables?

Answer. We use external tables in the following scenarios:
1. When the data needs to be stored in a custom location
2. When the data must continue to reside in HDFS even after the table is
dropped, unlike with internal tables
3. When the data should not be owned by Hive
4. Is Hive a database or a data warehouse? What are the key differences
between Hive and RDBMSes?

Answer. Hive is a data warehouse. The key difference between Hive and RDBMSes is
that an RDBMS is a traditional database that stores a comparatively limited
amount of data, whereas Hive is a data warehouse in which you can store data
in bulk and also perform data analysis on it.

© Copyright 2020. upGrad Education Pvt. Ltd. All rights reserved

5. What type of Read and Write operations take place in Hive?

Answer. Hive follows a write-once, read-many pattern: data is written once and
then read many times.

6. What are the instances where you can use Indexing?

Answer. The key instances where you can use indexing are as follows:
1. When the data set is large
2. When faster query execution is needed
3. For columns that are queried more frequently than others
4. For read-heavy applications, where data is read much more often than it
is written
Note: index support was removed in Hive 3.0, so this applies to earlier
versions.
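
Why an index helps in these cases can be shown with a generic Python sketch.
It illustrates the general idea of an index (a lookup structure from column
value to row positions) and is not Hive's index implementation; the sample
rows and the helper names full_scan, build_index and indexed_lookup are
hypothetical.

```python
from collections import defaultdict

# Hypothetical rows (order_id, customer, amount); not from the original handout.
ORDERS = [
    (1, "asha", 250),
    (2, "ben", 120),
    (3, "asha", 75),
    (4, "chen", 310),
    (5, "ben", 40),
    (6, "dev", 95),
]


def full_scan(rows, customer):
    """Without an index every row must be examined."""
    matches = [row for row in rows if row[1] == customer]
    return matches, len(rows)


def build_index(rows, col):
    """Build the index once: column value -> positions of matching rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[col]].append(position)
    return index


def indexed_lookup(rows, index, value):
    """With an index only the matching rows are touched."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


customer_index = build_index(ORDERS, 1)
scan_rows, scan_examined = full_scan(ORDERS, "asha")
index_rows, index_examined = indexed_lookup(ORDERS, customer_index, "asha")

print("full scan examined:", scan_examined, "rows")
print("index examined    :", index_examined, "rows")
```

Building the index costs one pass over the data, which is why it pays off
mainly for large, read-heavy data sets and frequently queried columns.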

7. Differentiate between Hive and HBase.

Answer.

Hive                                     | HBase
-----------------------------------------+-----------------------------------------
Hive is ‘data warehouse software’ that   | HBase is a distributed data store built
enables you to query and manipulate data | on top of HDFS, and it can leverage all
using an SQL-like language known as      | the benefits provided by Hadoop or HDFS.
HiveQL.                                  |
-----------------------------------------+-----------------------------------------
Hive abstracts the programming           | HBase does not have a native
complexity of MapReduce and provides a   | data-processing engine and relies on
simple SQL-like language known as HiveQL | MapReduce and Spark APIs for data
for querying data sets.                  | processing.
-----------------------------------------+-----------------------------------------
Hive has a relational DBMS data model.   | HBase has a column-oriented
                                         | (column-family) data model.
-----------------------------------------+-----------------------------------------
Apache Hive has high latency compared    | HBase provides random, fast lookups on
with HBase. Hence, it is not preferred   | top of HDFS, which allows a user to
for looking up individual records.       | query for individual records.

8. Can Hive be used as an OLTP system like MySQL?

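Answer. No. Hive is not designed for row-level insert and update operations
(newer versions offer them only on ACID transactional tables), and its queries
have high latency, which makes it unsuitable for OLTP systems. Note: OLTP is
online transaction processing, which involves frequent INSERT, UPDATE and
DELETE operations on individual rows.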

9. What are the limitations of Hive?

Answer. Some limitations of Hive are as follows:

● Hive is not designed for row-level insert and update operations, which
makes it unsuitable for OLTP systems.
● Hive does not support real-time processing.
● Hive queries have high latency due to the start-up overhead of the
MapReduce job.

10. How does Hive improve performance with tables in ORC format?

Answer. Using the ORC format reduces the size of the stored data, as this file
format has high compression ratios. As the data size is reduced, the time to
read and write the data is also reduced. The ORC format also improves query
performance through the way it stores data in a file. Data is stored in a
columnar format, so columns that are not needed in a query can be skipped,
which leads to better performance.
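
The effect of skipping unneeded columns can be sketched in Python. This is a
simplified illustration of row-oriented versus column-oriented layouts in
general, not of the actual ORC file format; the sample rows, the column names
and the helpers to_columnar, sum_row_layout and sum_columnar are hypothetical.

```python
# Hypothetical table; column names and rows are made up for the illustration.
COLUMNS = ("id", "name", "country", "amount")
ROWS = [
    (1, "asha", "IN", 120),
    (2, "ben", "US", 80),
    (3, "chen", "CN", 42),
    (4, "dev", "IN", 10),
]


def to_columnar(rows, columns):
    """Store each column's values together, as a columnar layout does."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def sum_row_layout(rows, columns, col):
    """Row layout: every field of every row is read to reach one column."""
    idx = columns.index(col)
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)
        total += row[idx]
    return total, values_read


def sum_columnar(store, col):
    """Columnar layout: only the requested column is read."""
    values = store[col]
    return sum(values), len(values)


store = to_columnar(ROWS, COLUMNS)
row_total, row_values_read = sum_row_layout(ROWS, COLUMNS, "amount")
col_total, col_values_read = sum_columnar(store, "amount")

print("row layout read     :", row_values_read, "values")
print("columnar layout read:", col_values_read, "values")
```

Both layouts return the same total for the amount column, but the columnar
layout reads only that column's values and skips the other three.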
