Big Data Course Agenda

Hadoop is an open source framework for distributed processing and storage of big data across clusters of servers. It manages structured and unstructured data and supports analytics like predictive modeling. Hadoop uses HDFS for rapid data access and fault tolerance. Sqoop transfers data between Hadoop and relational databases. Hive provides SQL queries on Hadoop data. Spark is a distributed processing system for fast queries on large data using in-memory caching. Spark SQL unifies SQL queries and complex analytics. Scala is a functional language that runs on the JVM and is used for Spark development.

BIG DATA

Hadoop

Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications in scalable clusters of computer servers. It's at the center of an ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning. Hadoop systems can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing, analyzing and managing data than relational databases and data warehouses provide.

Hadoop & Big Data

Hadoop runs on commodity servers and can scale up to support thousands of hardware nodes. The Hadoop Distributed File System (HDFS) is designed to provide rapid data access across the nodes in a cluster, plus fault-tolerant capabilities so applications can continue to run if individual nodes fail. Those features helped Hadoop become a foundational data management platform for big data analytics uses after it emerged in the mid-2000s.
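As a concrete illustration, here is a minimal sketch of reading a file from HDFS through Hadoop's FileSystem API, called from Scala. The NameNode URI and file path are placeholders, not values from this document.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.io.Source

    object HdfsReadExample {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Placeholder NameNode address; a real cluster's URI goes here.
        conf.set("fs.defaultFS", "hdfs://namenode:8020")

        val fs = FileSystem.get(conf)
        val path = new Path("/data/events/part-00000") // hypothetical file

        // HDFS replicates blocks across nodes, so reads keep working
        // even if an individual DataNode fails.
        val in = fs.open(path)
        try Source.fromInputStream(in).getLines().take(5).foreach(println)
        finally in.close()
      }
    }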
Sqoop

Enterprises that use Hadoop often find it necessary to transfer some of their data from traditional relational database management systems (RDBMSs) to the Hadoop ecosystem.

Sqoop, an integral part of Hadoop, can perform this transfer in an automated fashion. Moreover, the data imported into Hadoop can be transformed with MapReduce before being exported back to the RDBMS. Sqoop can also generate Java classes for programmatically interacting with imported data.

Sqoop uses a connector-based architecture that allows it to use plugins for connecting with external databases.
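Sqoop is usually driven from the command line, but Sqoop 1 also exposes a Java entry point (org.apache.sqoop.Sqoop.runTool) that can be called from Scala. A minimal sketch, assuming a hypothetical MySQL database and orders table; all connection details are illustrative only:

    import org.apache.sqoop.Sqoop

    object SqoopImportExample {
      def main(args: Array[String]): Unit = {
        // Equivalent to: sqoop import --connect jdbc:mysql://dbhost/sales
        //   --table orders --username etl --target-dir /data/orders
        val sqoopArgs = Array(
          "import",
          "--connect", "jdbc:mysql://dbhost/sales", // hypothetical RDBMS
          "--table", "orders",
          "--username", "etl",
          "--target-dir", "/data/orders"            // HDFS destination
        )
        val exitCode = Sqoop.runTool(sqoopArgs)
        println(s"Sqoop import finished with exit code $exitCode")
      }
    }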

Hive

Apache Hive is open source data warehouse software for reading, writing and managing large data sets stored directly in the Apache Hadoop Distributed File System (HDFS) or in other data storage systems such as Apache HBase. Hive enables SQL developers to write Hive Query Language (HQL) statements that are similar to standard SQL statements for data query and analysis. It is designed to make MapReduce programming easier because you don't have to know and write lengthy Java code. Instead, you can write queries more simply in HQL, and Hive can then create the map and reduce functions.
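One common way to run HQL from an application is through the HiveServer2 JDBC driver. A minimal sketch from Scala, assuming the hive-jdbc driver is on the classpath and a hypothetical orders table exists; host, port and credentials are placeholders:

    import java.sql.DriverManager

    object HiveQueryExample {
      def main(args: Array[String]): Unit = {
        // Placeholder HiveServer2 URL: jdbc:hive2://<host>:<port>/<database>
        val conn = DriverManager.getConnection(
          "jdbc:hive2://hiveserver:10000/default", "hive", "")
        try {
          val stmt = conn.createStatement()
          // HQL reads like standard SQL; Hive compiles it into the
          // underlying map and reduce jobs for you.
          val rs = stmt.executeQuery(
            "SELECT product, SUM(amount) FROM orders GROUP BY product")
          while (rs.next())
            println(s"${rs.getString(1)}\t${rs.getDouble(2)}")
        } finally conn.close()
      }
    }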
Apache Spark

Apache Spark is an open-source, distributed processing system used for big data workloads. It utilizes in-memory caching and optimized query execution for fast analytic queries against data of any size. It provides development APIs in Java, Scala, Python and R, and supports code reuse across multiple workloads: batch processing, interactive queries, real-time analytics, machine learning, and graph processing. You'll find it used by organizations across many industries, including FINRA, Yelp, Zillow, DataXu, Urban Institute, and CrowdStrike. Apache Spark has become one of the most popular big data distributed processing frameworks, with 365,000 meetup members in 2017.
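To make the in-memory caching point concrete, here is a minimal Scala sketch; the input path is hypothetical, and local[*] is used only so the example runs on a single machine:

    import org.apache.spark.sql.SparkSession

    object SparkCacheExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cache-demo")
          .master("local[*]") // a cluster URL would go here in production
          .getOrCreate()

        val events = spark.read.textFile("/data/events.log") // hypothetical path

        // cache() keeps the dataset in executor memory, so the second
        // action below reads from RAM instead of rescanning the file.
        events.cache()

        val total  = events.count()
        val errors = events.filter(_.contains("ERROR")).count()
        println(s"$errors of $total lines contain ERROR")

        spark.stop()
      }
    }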

Spark SQL / Scala

Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs (Spark's distributed datasets) and in external sources. Spark SQL conveniently blurs the lines between RDDs and relational tables. Unifying these powerful abstractions makes it easy for developers to intermix SQL commands querying external data with complex analytics, all within a single application. Concretely, Spark SQL allows developers to:

- Import relational data from Parquet files and Hive tables
- Run SQL queries over imported data and existing RDDs
- Easily write RDDs out to Hive tables or Parquet files
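A minimal sketch of those three steps in Scala, using the DataFrame API that Spark SQL layers over RDDs; the Parquet paths and the orders schema are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.desc

    object SparkSqlExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-sql-demo")
          .master("local[*]")
          .getOrCreate()

        // Import relational data from a (hypothetical) Parquet file.
        val orders = spark.read.parquet("/data/orders.parquet")
        orders.createOrReplaceTempView("orders")

        // Run a SQL query, then intermix it with further analytics
        // expressed through the DataFrame API.
        val top = spark
          .sql("SELECT product, SUM(amount) AS total FROM orders GROUP BY product")
          .orderBy(desc("total"))
          .limit(10)

        // Write the result back out as Parquet.
        top.write.mode("overwrite").parquet("/data/top_products.parquet")

        spark.stop()
      }
    }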

Scala is a hybrid functional and object-oriented programming language that runs on the JVM (Java Virtual Machine). The name is an acronym for Scalable Language. It is designed for concurrency, expressiveness, and scalability.
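A small sketch of the language's flavor: an immutable case class for the object-oriented side, higher-order functions for the functional side, and a Future for concurrency.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration._
    import scala.concurrent.ExecutionContext.Implicits.global

    object ScalaFlavor {
      // Object-oriented: an immutable case class models a record.
      case class Order(product: String, amount: Double)

      def main(args: Array[String]): Unit = {
        val orders = List(Order("book", 12.5), Order("pen", 1.2), Order("book", 7.5))

        // Functional: higher-order functions transform the collection.
        val totals = orders.groupBy(_.product)
          .map { case (p, os) => p -> os.map(_.amount).sum }
        println(totals) // Map(book -> 20.0, pen -> 1.2)

        // Concurrency: a Future runs work on a thread pool.
        val grandTotal = Future(orders.map(_.amount).sum)
        println(Await.result(grandTotal, 5.seconds))
      }
    }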
