Big Data Course Agenda
Hadoop
Hive
Apache Hive is open-source data warehouse software for reading, writing,
and managing large data sets stored either directly in the Apache
Hadoop Distributed File System (HDFS) or in other data storage systems such
as Apache HBase. Hive lets SQL developers write Hive Query
Language (HQL) statements, which are similar to standard SQL statements, for
data query and analysis. It is designed to make MapReduce programming
easier because you don’t have to write lengthy Java code. Instead,
you write queries in HQL, and Hive generates the map
and reduce functions for you.
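To make the idea of "map and reduce functions" concrete, here is a minimal Python sketch of the kind of work a grouped count involves. It assumes a hypothetical table page_views(user_id, page) and a hypothetical query such as SELECT page, COUNT(*) FROM page_views GROUP BY page. The function names and sample rows are illustrative only. This is not the plan Hive actually generates, just a simplified picture of the map, shuffle, and reduce steps.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table page_views(user_id, page).
rows = [
    ("u1", "home"),
    ("u2", "home"),
    ("u1", "cart"),
    ("u3", "home"),
]

def map_phase(row):
    """Emit a (key, 1) pair for the grouping column, as a mapper would."""
    _user_id, page = row
    yield page, 1

def shuffle(pairs):
    """Group values by key, mimicking the shuffle step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values for one key, as a reducer would for COUNT(*)."""
    return key, sum(values)

def run_query(rows):
    """Run map, shuffle, and reduce in sequence over the input rows."""
    mapped = (pair for row in rows for pair in map_phase(row))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

result = run_query(rows)
print(result)  # {'home': 3, 'cart': 1}
```

In HQL the same result would come from the single GROUP BY statement, which is the convenience the paragraph above describes.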
Apache Spark
Spark SQL brings native support for SQL to Spark and streamlines the process of
querying data stored both in RDDs (Spark’s distributed datasets) and in external
sources. Spark SQL conveniently blurs the lines between RDDs and relational
tables. Unifying these powerful abstractions makes it easy for developers to
intermix SQL commands querying external data with complex analytics, all within
a single application. Concretely, Spark SQL will allow developers to: