
PySpark Practice Day 12 for Spark

Uploaded by

sameergoswami86

PySpark Coding Practice
Day 12

Seekho Bigdata Institute


Kal ki Soch, Aaj ki Shiksha (Tomorrow's Thinking, Today's Education)
Data is the New Oil

https://www.seekhobigdata.com/
9988454737
PySpark

Question:
How can you create a new column derived from existing columns in a
PySpark DataFrame? Show several different ways.

Sample Data:


Coding Question 1: Creating a New Column Using Arithmetic Operations
Create a new column Bonus which is 10% of the Salary.

Output:


Coding Question 2: Creating a New Column Using Conditional Statements
Create a new column Category that categorizes people based on their age:
If age is less than 30, the category is Young.
If age is between 30 and 40, the category is Mid-age.
If age is greater than 40, the category is Senior.

Output:


Coding Question 3: Creating a New Column by Combining Two Columns
Create a new column Full Info that combines Name and Age into a single string.

Output:


Coding Question 4: Creating a New Column Using a UDF (User-Defined Function)
Define a UDF to classify the salary range:
Low if salary is less than 4000.
Medium if salary is between 4000 and 6000.
High if salary is greater than 6000.

Output:


Coding Question 5: Creating a New Column Using SQL Expressions
Use Spark SQL to create a new column Net Salary, where Net Salary = Salary + Bonus.

Output:
