Loading and Saving Data
The document discusses loading and saving data from JSON and CSV files using PySpark. It shows how to read CSV and JSON files into DataFrames, manipulate the DataFrames by selecting columns, filtering rows, adding new columns, and sorting data. It also demonstrates how to write the modified DataFrames back to CSV and JSON files.
Reading, manipulating and writing a modified DataFrame to another CSV

from pyspark.sql import SparkSession
import findspark

findspark.init()

spark = SparkSession.builder.appName("CSVDataManipulation").getOrCreate()
csv_file_path = "bollywood.csv"

# Read the CSV file into a DataFrame
df = spark.read.csv(csv_file_path, header=True, inferSchema=True)

# Show the first few rows of the DataFrame
print("Initial DataFrame:")
df.show()

# Selecting specific columns
selected_columns = df.select("Release Date", "MovieName")
print("Selected Columns:")
selected_columns.show()

# Filtering data based on a condition
filtered_data = df.filter(df.Budget > 10)
print("Filtered Data:")
filtered_data.show()

# Sorting data
sorted_data = df.orderBy("Budget", ascending=False)
print("Sorted Data:")
sorted_data.show()

# Adding a new column
df_with_new_column = df.withColumn("BudgetPlusTen", df.Budget + 10)
print("DataFrame with a New Column:")
df_with_new_column.show()

# Save the final DataFrame into another CSV file
output_csv_file_path = "new1.csv"
df_with_new_column.toPandas().to_csv(output_csv_file_path, header=True, index=False)

# Stop the SparkSession
spark.stop()

JSON read()

from pyspark.sql import SparkSession
import findspark

findspark.init()

spark = SparkSession.builder.appName("JSONToDataFrame").getOrCreate()

# Specify the path to the JSON file
json_file_path = "inp.json"

# Read the JSON file into a DataFrame
df = spark.read.json(json_file_path)

# Show the DataFrame
df.show()

# Stop the SparkSession
spark.stop()

JSON read, modify and write

from pyspark.sql import SparkSession
import findspark
import json

findspark.init()

spark = SparkSession.builder.appName("JSONToDataFrame").getOrCreate()

json_file_path = "inp.json"
df = spark.read.json(json_file_path)
df.show()

# Adding a new column
df_with_new_column = df.withColumn("age modified", df.age + 10)
print("DataFrame with a New Column:")
df_with_new_column.show()

# Convert each row of the DataFrame to a JSON string
json_data = df_with_new_column.toJSON().collect()

# Write the collected JSON strings as a list to a file
with open("new22.json", "w") as json_file:
    json.dump(json_data, json_file, indent=4)

# Stop the SparkSession
spark.stop()
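The examples above save results by converting to pandas (toPandas().to_csv) or by collecting JSON strings and dumping them with the json module. For comparison, here is a minimal sketch, not part of the original slides, of the same save step using Spark's built-in DataFrameWriter. The output paths output_csv_dir and output_json_dir are placeholder names, and Spark writes each as a directory of part files rather than a single file.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("NativeWriteExample").getOrCreate()

# Same read and transform as in the CSV example above
df = spark.read.csv("bollywood.csv", header=True, inferSchema=True)
df_with_new_column = df.withColumn("BudgetPlusTen", df.Budget + 10)

# Write with Spark's DataFrameWriter instead of converting to pandas.
# Note: each call produces a directory of part files, not a single CSV/JSON file.
df_with_new_column.write.csv("output_csv_dir", header=True, mode="overwrite")
df_with_new_column.write.json("output_json_dir", mode="overwrite")

spark.stop()

This keeps the write distributed, which matters once the DataFrame is too large to collect on the driver; toPandas() and collect() pull all rows into driver memory first.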