B07: Apache Spark in Big Data Analytics Tools

Apache Spark is a very useful tool in big data analytics.


║JAI SRI GURUDEV║

Sri Adichunchanagiri Shikshana Trust (R)

SJB INSTITUTE OF TECHNOLOGY
BGS Health & Education City, Kengeri, Bangalore – 60

DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ASSIGNMENT – 02
APACHE SPARK

Presented by:
Anjali N [1JB22CS400]
Deepika C [1JB22CS403]
Deeya Darshini SD [1JB22CS404]

Under the guidance of:
Mrs. Vijayalakshmi B
INTRODUCTION

Apache Spark is an open-source, distributed data processing framework designed for big data analytics and machine learning. It provides a fast, general-purpose engine for large-scale data processing, with capabilities for batch, real-time streaming, machine learning, and graph processing.
KEY FEATURES

 Speed: in-memory computation makes Spark much faster than disk-based MapReduce
 Ease of Use: high-level APIs in Python, Scala, Java, and R
 Fault Tolerance: lost partitions are recomputed automatically from lineage information
 Scalability: scales from a single machine to thousands of cluster nodes
 Rich Ecosystem: Spark SQL, Spark Streaming, MLlib, and GraphX
ARCHITECTURE

Apache Spark uses a master-slave architecture consisting of the following components:

 Driver Program → Cluster Manager → Executors (Worker Nodes)

• Driver Program: Defines tasks and sends them to executors.
• Cluster Manager: Allocates resources for tasks.
• Executors: Perform parallel computations on the worker nodes.
QUERIES IN APACHE SPARK

Using Spark SQL: Spark SQL allows you to run SQL-like queries on structured data. You can load data into a temporary table or view and execute SQL queries.

Example:

from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("SparkSQLExample").getOrCreate()

# Load data into a DataFrame
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Create a temporary view
df.createOrReplaceTempView("data_table")

# Run an SQL query
result = spark.sql("SELECT name, age FROM data_table WHERE age > 30")

# Show the results
result.show()
Use Cases

 Real-Time Data Processing
 Machine Learning
 Graph Analytics
 Data Science and Exploratory Data Analysis (EDA)
 Real-Time Recommendations
 Media and Entertainment

Conclusion

Apache Spark is a powerful and versatile big data processing framework that has revolutionized the way data is processed and analyzed. Its key features, such as in-memory computing, scalability, fault tolerance, and real-time data processing, make it an indispensable tool in the world of big data.
