EGU2020 13133 Presentation

Pytroll/Satpy is a collection of open-source Python modules for reading, processing, and writing Earth observation satellite data. Satpy provides high-level processing capabilities for both geostationary and low Earth orbit satellites. It supports many input and output formats and resampling data to different projections. To handle large datasets from new satellites, Satpy uses optimized tools like Dask for parallel and out-of-memory computations, improving performance. The Pytroll community develops these tools in an agile and collaborative manner.


Distributed EO satellite data processing with Pytroll/Satpy
Salomon Eliasson, Martin Raspaud, Adam Dybbroe
SMHI

[email protected]
What is Pytroll/Satpy?
- Pytroll is a collection of free and open-source Python modules
- For reading, processing and writing EO satellite data
- Satpy is an easy-to-use front-end module, e.g. to generate imagery

Sentinel 2A, MSI


Pytroll/Satpy in action
from glob import glob
from satpy.scene import Scene

# Load data by filenames
files = glob("/data/himawari-8/*")
scn = Scene(reader="ahi_hrit", filenames=files)

Pytroll/Satpy in action
# Automatically load composites and their dependencies
scn.load(["true_color"])

# Resample multi-band data to a uniform grid
rs_scn = scn.resample("japan")

# Save RGB GeoTIFF
rs_scn.save_dataset("true_color")

Himawari 8, AHI
Credits: Simon R. Proud
Pytroll/Satpy in action
# Load single channels
scn.load(["B10", 0.6])

# Show a channel
scn.show("B10")

# Channel arithmetic
array = scn["B10"] + scn[0.6]

Himawari-8, AHI

SatPy
- High-level processing for satellite data
- Both GEO and LEO
- Indexing of channels by name or wavelength
- Many built-in composites
- Read many input formats
- Write many output formats
- Resample data to any PROJ.4 projection
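
Indexing by name or wavelength can be illustrated with a small plain-Python sketch. This is not Satpy's implementation: the channel table, its wavelength ranges and the `find_channel` helper are hypothetical placeholders, chosen only to show how a numeric key might resolve to the channel whose range contains it.

```python
# Hypothetical channel table: name -> (min, central, max) wavelength in µm.
# The names and ranges are illustrative, not real instrument definitions.
CHANNELS = {
    "vis_a": (0.55, 0.60, 0.65),
    "nir_b": (0.80, 0.85, 0.90),
    "ir_c": (10.0, 10.5, 11.0),
}


def find_channel(key):
    """Resolve a channel name (str) or a wavelength in µm (number) to a name."""
    if isinstance(key, str):
        if key not in CHANNELS:
            raise KeyError(key)
        return key
    # Numeric key: keep channels whose range contains the wavelength,
    # then pick the one whose central wavelength is closest.
    matches = [
        (abs(central - key), name)
        for name, (low, central, high) in CHANNELS.items()
        if low <= key <= high
    ]
    if not matches:
        raise KeyError(key)
    return min(matches)[1]


print(find_channel("ir_c"))
print(find_channel(0.6))
```

With a lookup of this kind, a call such as `scn[0.6]` and a call by channel name can address the same dataset.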

Himawari-8, AHI, fire temperature product

The challenge

New Missions, Much more data
- GOES 16 and 17 ABI (3x, 4x, 5x)
- Himawari 8 and 9 AHI
- Sentinel 1, 2, 3
  on the order of 10 000 x 10 000 pixels per segment
- EPS-SG
- MTG FCI

GOES-16, ABI
Data-size problem
- Too much data to fit in the memory of regular computers
- Processing times too long due to single-threading
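
A rough estimate shows why such data may not fit in memory. The sketch below takes the 10 000 x 10 000 pixel segment size mentioned earlier; the band count (16) and the sample sizes (float32, float64) are assumptions made for illustration, not figures from the presentation.

```python
# Back-of-the-envelope memory footprint of one uncompressed segment.
# Assumptions (not from the slides): 16 bands, 4- or 8-byte samples.

GIB = 1024 ** 3


def segment_bytes(rows, cols, bands, bytes_per_sample):
    """Size in bytes of a rows x cols x bands array held fully in memory."""
    return rows * cols * bands * bytes_per_sample


one_band_f32 = segment_bytes(10_000, 10_000, 1, 4)
all_bands_f64 = segment_bytes(10_000, 10_000, 16, 8)

print(f"1 band, float32:   {one_band_f32 / GIB:.2f} GiB")
print(f"16 bands, float64: {all_bands_f64 / GIB:.2f} GiB")
```

Intermediate arrays created during calibration, compositing and resampling would add to these figures.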

High Res!

Himawari 8, AHI
Credits: Simon R. Proud
Mitigation:
Optimized data processing

Efficient tools increase performance
- Python scientific stack: NumPy, SciPy
- Resampling: Pyresample & Pykdtree
- Tiepoint interpolation: Python-geotiepoints
- Spectral-domain computations: Pyspectral
- Orbital and space computations: Pyorbital

Pyresample resampling performance vs SciPy and libANN
Mitigation:
Out of memory computations

Using Dask for parallel computations
- Lazy processing
- Out-of-memory/Chunked processing
- Implements the NumPy array interface
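
The lazy, chunked pattern can be sketched with the standard library alone. This is not Dask and not how Dask is implemented: `iter_chunks`, `load_chunk`, `partial_stats` and `chunked_mean` are hypothetical helpers, and the pixel values are synthetic. The point is only that each chunk is created on demand, reduced to a small partial result and then discarded, so the full grid never has to sit in memory at once.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_chunks(n_rows, chunk_rows):
    """Lazily yield (start, stop) row ranges; no pixel data exists yet."""
    for start in range(0, n_rows, chunk_rows):
        yield start, min(start + chunk_rows, n_rows)


def load_chunk(bounds, n_cols):
    """Hypothetical loader: synthesize the pixel rows for one chunk only."""
    start, stop = bounds
    return [[(r * n_cols + c) % 256 for c in range(n_cols)]
            for r in range(start, stop)]


def partial_stats(bounds, n_cols):
    """Reduce one chunk to a small (sum, count) pair, then drop the chunk."""
    chunk = load_chunk(bounds, n_cols)
    return sum(map(sum, chunk)), sum(map(len, chunk))


def chunked_mean(n_rows, n_cols, chunk_rows, workers=4):
    """Mean over the whole grid without materializing the whole grid."""
    total = count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda b: partial_stats(b, n_cols),
                           iter_chunks(n_rows, chunk_rows))
        for chunk_total, chunk_count in results:
            total += chunk_total
            count += chunk_count
    return total / count


print(chunked_mean(400, 400, chunk_rows=64))
```

The thread pool here only illustrates the scheduling pattern; pure-Python arithmetic does not speed up under the GIL, whereas array libraries that release it can benefit from worker threads.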

Example of chunked processing over time


Some performance results
● SatPy 0.8.4 - single core NumPy
○ First execution crashed at 30m, just before saving to GeoTIFF
○ Total Time: ~23m (disk cache)
○ Peak Memory Usage: ~103GB
○ Time spent on I/O: 6m14s

● SatPy 0.9.0a1 - 8 Worker Threads (Dask):
○ Total Time: 5m38s
○ Peak Memory Usage: ~12GB
○ Time spent on I/O: 3m1s

GOES-16, ABI
Credits: David J. Hoese
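
The ratios implied by these figures can be worked out directly. The sketch below uses only the numbers quoted above; the `to_seconds` helper is introduced here for illustration, and the "~" values are approximate, so the ratios are too.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds


# Single-core NumPy run (SatPy 0.8.4) vs 8 Dask worker threads (0.9.0a1).
speedup = to_seconds(23) / to_seconds(5, 38)          # total time
io_ratio = to_seconds(6, 14) / to_seconds(3, 1)       # time spent on I/O
memory_ratio = 103 / 12                                # peak memory, GB

print(f"Total time:  {speedup:.1f}x faster")
print(f"I/O time:    {io_ratio:.1f}x faster")
print(f"Peak memory: {memory_ratio:.1f}x lower")
```

On these numbers the Dask run is roughly 4x faster overall and uses roughly 9x less peak memory, with 8 worker threads.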
Mitigation:
Distributed processing

Dask distributed

- Client/Server architecture
- Works automatically on regular Dask code
- Works on clusters

Sentinel 2A, MSI
The Pytroll Philosophy

Pytroll
- FOSS ((L)GPL, GitHub)
- Agile development (CI, code reviews)
- Active community (> 100 contributors, hackathons)

Source: OpenHub
Sentinel 1B, SAR-C
www.pytroll.org
Pytroll@Slack

Pytroll@GitHub


PytrollOrg@Twitter

Thanks!
Sentinel 2B, MSI