Window Functions in Spark
Window functions operate on a group of rows, called a window, and return a value for every row. Unlike standard
aggregate functions, they retain the individual rows while still producing values computed across the window.
This makes them useful for calculations over a specified range of data, such as rankings, running totals, and moving averages.
Creating a DataFrame:
Let's start by creating a DataFrame with a sample dataset spanning two years, 2019 and 2020.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, desc
from pyspark.sql.window import Window

# Create a SparkSession (reuses the active session if one already exists)
spark = SparkSession.builder.getOrCreate()

data = [
("2019", "Hamilton", 413),
("2019", "Bottas", 326),
("2019", "Verstappen", 278),
("2019", "Vettel", 240),
("2020", "Hamilton", 347),
("2020", "Bottas", 223),
("2020", "Verstappen", 214),
("2020", "Vettel", 33),
]
# Creating DataFrame
columns = ["RaceYear", "DriverName", "TotalPoints"]
df = spark.createDataFrame(data, columns)
df.show()
```
Ranking with rank():
```python
from pyspark.sql.functions import rank
```
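rank() assigns a rank to each row within its window partition; tied values share a rank and leave a gap after them. As a minimal sketch using the DataFrame created above, the following ranks drivers by total points within each season:
```python
# Rank drivers within each season by total points, highest first
windowSpec = Window.partitionBy("RaceYear").orderBy(desc("TotalPoints"))
df.withColumn("Rank", rank().over(windowSpec)).show()
```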
Cumulative sums with sum():
```python
from pyspark.sql.functions import sum
```
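Applying sum() over an ordered window produces a running total (note that this import shadows Python's built-in sum). As a sketch on the same DataFrame, the cumulative points within each season, accumulated from the highest scorer down:
```python
# Running total of points within each season
windowSpec = (
    Window.partitionBy("RaceYear")
    .orderBy(desc("TotalPoints"))
    .rowsBetween(Window.unboundedPreceding, Window.currentRow)
)
df.withColumn("CumulativePoints", sum("TotalPoints").over(windowSpec)).show()
```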
Accessing neighbouring rows with lag() and lead():
```python
from pyspark.sql.functions import lag, lead
```
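lag() and lead() return a value from a preceding or following row in the window, which is useful for row-to-row comparisons. A sketch that, for each driver, pulls in the points of the drivers placed immediately above and below them in the same season (the new column names are illustrative):
```python
# Points of the neighbouring drivers in the season standings
windowSpec = Window.partitionBy("RaceYear").orderBy(desc("TotalPoints"))
df.withColumn("PointsAbove", lag("TotalPoints").over(windowSpec)) \
  .withColumn("PointsBelow", lead("TotalPoints").over(windowSpec)) \
  .show()
```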
Relative ranking with percent_rank():
```python
from pyspark.sql.functions import percent_rank
```
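percent_rank() expresses each row's rank as a value between 0 and 1 relative to its partition. A sketch on the same DataFrame:
```python
# Relative standing of each driver within the season (0.0 = top of the partition)
windowSpec = Window.partitionBy("RaceYear").orderBy(desc("TotalPoints"))
df.withColumn("PercentRank", percent_rank().over(windowSpec)).show()
```
Window frames also make moving averages straightforward. The next example assumes a different, hypothetical DataFrame with StockSymbol, Date, and StockPrice columns; rowsBetween(-4, 0) limits the frame to the current row plus the four preceding rows, giving a five-row moving average per stock: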
```python
from pyspark.sql.functions import avg
# Frame = current row and the 4 rows before it, per stock symbol
windowSpec = Window.partitionBy("StockSymbol").orderBy("Date").rowsBetween(-4, 0)
df.withColumn("MovingAvg", avg("StockPrice").over(windowSpec)).show()
```
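dense_rank() behaves like rank() but leaves no gaps after ties. The example below assumes a sales DataFrame with Category and TotalSales columns, ranking rows within each category by sales: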
```python
from pyspark.sql.functions import dense_rank
# Rank within each category by total sales, highest first; ties share a rank with no gaps
windowSpec = Window.partitionBy("Category").orderBy(desc("TotalSales"))
df.withColumn("Rank", dense_rank().over(windowSpec)).show()
```
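The same frame idea works with any aggregate. Assuming a call-log DataFrame with CustomerID, CallDate, and CallDuration columns, this computes a rolling average call duration over a five-call window per customer: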
```python
# Average duration over a sliding window of 5 calls per customer
windowSpec = Window.partitionBy("CustomerID").orderBy(desc("CallDate")).rowsBetween(-4, 0)
df.withColumn("AvgCallDuration", avg("CallDuration").over(windowSpec)).show()
```
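row_number() assigns a unique, consecutive number to every row in the partition, even when values tie. Assuming an employee DataFrame with Department and PerformanceScore columns: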
```python
from pyspark.sql.functions import row_number
# Number employees within each department from highest to lowest score
windowSpec = Window.partitionBy("Department").orderBy(desc("PerformanceScore"))
df.withColumn("Rank", row_number().over(windowSpec)).show()
```
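lag() is also handy for period-over-period calculations. Assuming a sales DataFrame with ProductID, SalesDate, and SalesAmount columns, this derives a growth rate by comparing each row with the previous period's sales: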
```python
from pyspark.sql.functions import lag, col
windowSpec = Window.partitionBy("ProductID").orderBy("SalesDate")
# Bring the previous period's sales amount onto each row
df = df.withColumn("PreviousSales", lag("SalesAmount").over(windowSpec))
# Growth relative to the previous period (null for each product's first row)
df = df.withColumn("GrowthRate", (col("SalesAmount") - col("PreviousSales")) / col("PreviousSales"))
df.show()
```
### Conclusion
Window functions in Apache Spark let you combine row-level detail with calculations over a defined window of rows. By partitioning
and ordering data, you can compute rankings, running totals, moving averages, and row-to-row comparisons such as growth rates,
all of which are common requirements in data processing tasks.