Disaster Risk Monitoring Using Satellite Imagery
01 - Disaster Risk Monitoring Systems and Data Pre-Processing
In this notebook, you will learn the motivation behind disaster risk monitoring and
how to use hardware-accelerated tools to process large image data.
Table of Contents
This notebook covers the below sections:
1. Disaster Risk Monitoring
    Flood Detection
    Satellite Imagery
    Computer Vision
    Deep Learning-Based Disaster Risk Monitoring Systems
2. Deep Learning Model Training Workflow
    Deep Learning Challenges
3. Introducing the Dataset
    Sentinel-1 Data Public Access
    Data Annotation
    Exploratory Data Analysis
    Exercise #1 - Count Input Data
    Exercise #2 - Explore Tiles
4. Data Pre-processing With DALI
    DALI Pipeline
    Data Augmentation
    Exercise #3 - Data Augmentation on Batch
    Random Rotation
Flood Detection
Flooding occurs when there is an overflow of water that submerges land that is
usually dry. Floods can occur under several conditions:
Overflow of water from water bodies, in which the water overtops or breaks
levees (natural or man-made), allowing some of that water to escape its usual
boundaries
Accumulation of rainwater on saturated ground in an areal flood
Flow rate exceeding the capacity of the river channel
Unfortunately, flooding events are on the rise due to climate change and sea level
rise. Because of their increasing frequency and intensity, flooding has garnered
international attention in recent years. In fact, organizations such as the United
Nations have incorporated effective flood response and proactive risk assessment into
their Sustainable Development Goals. Research into flood events and their
evolution is an interdisciplinary effort that requires data from a variety of sources
such as:
Live Earth observation data via satellites and surface reflectance
Precipitation, runoff, soil moisture, snow cover, and snow water equivalent
Topography and meteorology
The ability to detect floods and measure the extent of a disaster can help decision
makers develop tactical responses and help scientists study flood behavior over time.
Ultimately, we want to enable long-term mitigation strategies, informed by
science, that help us achieve sustainability.
Satellite Imagery
In this lab, we demonstrate the ability to create a flood detection segmentation model
using satellite imagery. Using satellites to study floods is advantageous, since physical
access to flooded areas is limited and deploying instruments in potential flood zones
can be dangerous. Furthermore, satellite remote sensing is much more efficient than
manual or human-in-the-loop solutions.
There are thousands of man-made satellites currently active in space. Once
launched, a satellite is often placed in one of several orbits around Earth, depending
on what the satellite is designed to achieve. Some satellites, such as those discussed
in this lab, are used for Earth observation to help scientists learn about our planet
while others could be used for communication or navigation purposes.
Earth observation satellites have different capabilities that are suited to their unique
purposes. To obtain detailed and valuable information for flood monitoring, satellite
missions such as Copernicus Sentinel-1 provide C-band Synthetic Aperture Radar
(SAR) data. Satellites that use SAR, as opposed to optical satellites that use visible or
near-infrared bands, can operate day and night as well as under cloud cover. This
form of radar is used to create two-dimensional images or three-dimensional
reconstructions of objects, such as landscapes. The two polar-orbiting Sentinel-1
satellites (Sentinel-1A and Sentinel-1B) maintain a repeat cycle of just six days in
Low Earth Orbit (LEO). Satellites that orbit close to Earth in LEO enjoy the
benefits of faster orbital speed and data transfer. These features make the Sentinel-1
mission very useful for monitoring flood risk over time. Thus, real-time AI-based
remote flood level estimation via Sentinel-1 data could prove game-changing.
More information about the Sentinel-1 mission can be found here.
Computer Vision
At the heart of this type of disaster risk monitoring system is one or more machine
learning models to generate insights from input data. These are generally deep
learning neural network models that have been trained for a specific task. There are
numerous approaches for drawing insight from images using machine learning such
as:
Classification is used for identifying the object contained in an image. It is the
task of labeling the given frame with one of the classes that the model has been
trained with.
Object detection, which includes image localization, can specify the location of
multiple objects in a frame.
Localization uses regression to return the coordinates of the potential
object within the frame.
Segmentation provides pixel-level accuracy by creating a fine-grained
segmentation mask around the detected object. Applications for segmentation
include an AI-powered green screen that blurs or changes the background of the
frame, autonomous driving where you want to segment the road from the
background, and manufacturing inspection to identify microscopic-level defects.
Semantic segmentation associates every pixel of an image with a class
label such as flood and not-flood . It treats multiple objects of the
same class as a single entity.
In contrast, instance segmentation treats multiple objects of the same
class as distinct individual instances.
For the purposes of detecting flood events, we will develop a semantic segmentation
model trained with labelled images that are generated from Sentinel-1 data.
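To make the output format concrete, a semantic segmentation model over the classes flood and not-flood simply assigns one of two labels to every pixel. A minimal sketch of such a mask (the values here are illustrative, not taken from the dataset):

import numpy as np

# a 4 x 4 segmentation mask: 1 = flood, 0 = not-flood (illustrative values)
mask = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [0, 0, 1, 0],
                 [0, 0, 0, 0]])
# the flooded extent is just the fraction of pixels labeled as flood
print(f'flooded fraction: {mask.mean():.2f}')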
Data Annotation
Acquiring the data necessary for deep learning is a costly process. To build a
segmentation model, we need labelled masks at the pixel level for each of the
images. Our data[1] has been generously provided by Cloud to Street. More
information about the dataset can be found here.
[1] Bonafilia, D., Tellman, B., Anderson, T., Issenberg, E. 2020. Sen1Floods11: a georeferenced data set to train and test deep learning flood algorithms for Sentinel-1. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 210-211.
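As a quick illustration of what such a pixel-level annotation looks like on disk (the file name below is hypothetical, and the assumption that flood pixels are stored as nonzero values should be checked against the dataset documentation):

import numpy as np
from PIL import Image

# load one annotation mask (hypothetical file name)
mask = np.array(Image.open('all_masks/example_mask.png'))
# fraction of pixels annotated as flood (assuming nonzero = flood)
print(f'flood pixels: {(mask > 0).mean():.1%}')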
import os

image_dir=os.path.join(os.getenv('LOCAL_DATA_DIR'), 'images')
mask_dir=os.path.join(os.getenv('LOCAL_DATA_DIR'), 'masks')
# count images per region (assumption: file names are prefixed with the region
# name, e.g. <region>_<id>.png; the counting loop was truncated in this excerpt)
images_count={}
for file_name in os.listdir(os.path.join(image_dir, 'all_images')):
    region=file_name.split('_')[0]
    images_count[region]=images_count.get(region, 0) + 1
# display counts
print(f'-----number of images: {sum(images_count.values())}-----')
display(sorted(images_count.items(), key=lambda x: x[1]))
import json

# catalog directory (assumed layout under LOCAL_DATA_DIR)
catalog_dir=os.path.join(os.getenv('LOCAL_DATA_DIR'), 'catalog')

# (function name reconstructed; the original def line was truncated)
def get_coordinates(catalog_dir):
    catalog_list=os.listdir(catalog_dir)
    all_coordinates=[]
    for catalog in catalog_list:
        # check if it's a directory based on whether the name has an extension
        if len(catalog.split('.'))==1:
            catalog_path=f'{catalog_dir}/{catalog}/{catalog}.json'
            # read catalog
            with open(catalog_path) as f:
                catalog_json=json.load(f)
            # parse out coordinates
            coordinates_list=catalog_json['geometry']['coordinates'][0]
            lon=[coordinates[0] for coordinates in coordinates_list]
            all_coordinates.append(lon)
            lat=[coordinates[1] for coordinates in coordinates_list]
            all_coordinates.append(lat)
    return all_coordinates
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

# gather the tile boundary coordinates parsed above
all_coordinates=get_coordinates(catalog_dir)
# create figure
plt.figure(figsize=(15, 10))
# create a Basemap (the constructor call was truncated; urcrnrlon=180 assumed)
m=Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180)
m.drawcoastlines()
# plot each tile's longitude/latitude outline on the map
for lon, lat in zip(all_coordinates[::2], all_coordinates[1::2]):
    m.plot(*m(lon, lat), color='red')
plt.title('Data Distribution')
plt.show()
import numpy as np

def get_extent(file_path):
    """
    This function returns the extent as [left, right, bottom, top] for a given tile.
    """
    # read catalog for image
    with open(file_path) as f:
        catalog_json=json.load(f)
    coordinates=catalog_json['geometry']['coordinates'][0]
    coordinates=np.array(coordinates)
    # get boundaries
    left=np.min(coordinates[:, 0])
    right=np.max(coordinates[:, 0])
    bottom=np.min(coordinates[:, 1])
    top=np.max(coordinates[:, 1])
    return left, right, bottom, top
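For example (the tile name below is hypothetical), the returned extent can be passed straight to matplotlib's imshow so a tile is drawn at its geographic position:

# draw a single tile at its geographic extent (hypothetical file names)
extent=get_extent(f'{catalog_dir}/example_tile/example_tile.json')
tile=plt.imread(os.path.join(image_dir, 'all_images', 'example_tile.png'))
plt.imshow(tile, extent=extent)
plt.show()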
def tiles_by_region(region_name, plot_type='images'):
    # choose the directory and colormap based on the plot type
    if plot_type=='images':
        dir=os.path.join(image_dir, 'all_images')
        cmap='viridis'
    elif plot_type=='masks':
        dir=os.path.join(mask_dir, 'all_masks')
        cmap='gray'
    else:
        raise Exception('Bad Plot Type')
    fig=plt.figure(figsize=(15, 15))
    ax=plt.subplot(111)
    # (the remainder of this cell, which draws each of the region's tiles at its
    #  geographic extent via get_extent and ax.imshow, is truncated in this excerpt)

tiles_by_region(region_name='Spain', plot_type='images')
Modify the <<<<FIXME>>>>s below to display tiles from a region South of the
Equator. Feel free to refer to the above chart for Longitude and Latitude values.
In [ ]: tiles_by_region(region_name='<<<<FIXME>>>>', plot_type='<<<<FIXME>>>>')
Data Pre-processing With DALI
Data processing pipelines implemented using DALI are portable because they can
easily be retargeted to TensorFlow, PyTorch, MXNet and PaddlePaddle.
Often, the pre-processing routines used for inference are like the ones
used for training, so implementing both using the same tools can save you
some boilerplate and code repetition.
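As an aside on that portability, a built DALI pipeline can be handed to a framework-specific iterator. A hedged sketch for PyTorch, assuming a pipeline object pipe that has already been built (as in the cells further below):

from nvidia.dali.plugin.pytorch import DALIGenericIterator

# wrap a built DALI pipeline so it yields PyTorch tensors
train_iter=DALIGenericIterator([pipe], ['images', 'labels'])
for batch in train_iter:
    images=batch[0]['images']   # a torch.Tensor, ready for training
    labels=batch[0]['labels']
    break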
DALI Pipeline
At the core of data processing with DALI lies the concept of a data processing
pipeline . It is composed of multiple operations connected in a directed graph and
contained in an object of class nvidia.dali.Pipeline . This class provides the
ability to define, build, and run data processing pipelines. Each operator in the
pipeline typically gets one or more inputs, applies some kind of data processing, and
produces one or more outputs. There are also special kinds of operators that take no
inputs and only produce outputs; such operators act as data sources, and readers,
random number generators, and external_source fall into this category.
DALI offers CPU and GPU implementations for a wide range of processing operators.
The availability of a CPU or GPU implementation depends on the nature of the
operator. Make sure to check the documentation for an up-to-date list of supported
operations, as it is expanded with every release. The easiest way to define a DALI
pipeline is using the pipeline_def Python decorator. To create a pipeline, we
define a function where we instantiate and connect the desired operators and return
the relevant outputs. Then just decorate it with pipeline_def .
Let's start by defining a very simple pipeline with two operators. The
first operator is a file reader that discovers and loads files contained in a directory.
The reader outputs both the contents of the files (in this case, PNGs) and the labels,
which are inferred from the directory structure. The second operator is an image
decoder. Lastly, we return the image and label pairs. In the simple_pipeline
function below, we define the operations to be performed and the flow of the
computation between them. For more information about pipeline_def , see the
documentation.
In [ ]: # DO NOT CHANGE THIS CELL
# import dependencies
from nvidia.dali.pipeline import Pipeline
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
import warnings
warnings.filterwarnings("ignore")
@pipeline_def
def simple_pipeline():
    # use fn.readers.file to read encoded images and labels from the hard drive
    pngs, labels=fn.readers.file(file_root=image_dir)
    # use the fn.decoders.image operation to decode images from PNG to RGB
    images=fn.decoders.image(pngs, device='cpu')
    # specify which of the intermediate variables should be returned as the outputs
    return images, labels
In order to use the pipeline defined with simple_pipeline , we need to create and
build it. This is achieved by calling simple_pipeline() , which creates an
instance of the pipeline. Then we call build() on this newly created instance:
In [ ]: # DO NOT CHANGE THIS CELL
# create and build pipeline (batch_size is defined in an earlier cell of the notebook)
pipe=simple_pipeline(batch_size=batch_size, num_threads=4, device_id=0)
pipe.build()
In [ ]: # run the pipeline to obtain a batch of outputs
simple_pipe_output=pipe.run()
images, labels=simple_pipe_output
print("Images is_dense_tensor: " + str(images.is_dense_tensor()))
print("Labels is_dense_tensor: " + str(labels.is_dense_tensor()))
In order to see the images, we will need to loop over all tensors contained in the
TensorList , accessed with its at method.
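The show_images helper called below is defined in a cell not shown in this excerpt; a minimal sketch of what it might look like, assuming a CPU TensorList input and the batch_size defined above:

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

def show_images(image_batch):
    # lay the batch out on a grid, reading each sample with TensorList.at
    columns=4
    rows=(batch_size + columns - 1)//columns
    fig=plt.figure(figsize=(15, (15//columns)*rows))
    gs=gridspec.GridSpec(rows, columns)
    for i in range(batch_size):
        plt.subplot(gs[i])
        plt.axis('off')
        plt.imshow(image_batch.at(i))
    plt.tight_layout()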
show_images(images)
Data Augmentation
Deep learning models require training with vast amounts of data to achieve accurate
results. DALI can not only read images from disk and batch them into tensors, but it
can also perform various augmentations on those images to improve deep learning
training results. Data augmentation artificially increases the size of a dataset by
introducing random disturbances to the data, such as geometric deformations, color
transforms, noise addition, and so on. These disturbances help produce models that
are more robust in their predictions, avoid overfitting, and deliver better accuracy. We
will use DALI to demonstrate the augmentations that we will introduce for model
training, such as cropping, resizing, and flipping.
In [ ]: # DO NOT CHANGE THIS CELL
import random

@pipeline_def
def augmentation_pipeline():
    # use fn.readers.file to read encoded images and masks from the hard drive
    image_pngs, _=fn.readers.file(file_root=image_dir)
    mask_pngs, _=fn.readers.file(file_root=mask_dir)
    # use the fn.decoders.image operation to decode images from PNG to RGB
    images=fn.decoders.image(image_pngs, device='cpu')
    masks=fn.decoders.image(mask_pngs, device='cpu')
    image_size=512
    roi_size=image_size*.5
    roi_start_x=image_size*random.uniform(0, 0.5)
    roi_start_y=image_size*random.uniform(0, 0.5)
    # hedged reconstruction from here on (the original cell is truncated): resize
    # a random crop back to full size, and flip horizontally, for images and masks
    crop_x, crop_y=roi_start_x/image_size, roi_start_y/image_size
    resized_images=fn.resize(fn.crop(images, crop=(roi_size, roi_size), crop_pos_x=crop_x, crop_pos_y=crop_y), size=[image_size, image_size])
    resized_masks=fn.resize(fn.crop(masks, crop=(roi_size, roi_size), crop_pos_x=crop_x, crop_pos_y=crop_y), size=[image_size, image_size])
    flipped_images=fn.flip(images, horizontal=1)
    flipped_masks=fn.flip(masks, horizontal=1)
    return images, resized_images, flipped_images, masks, resized_masks, flipped_masks
import matplotlib.gridspec as gridspec

# run the augmentation pipeline and unpack the six output batches
# (this setup was truncated in the excerpt; the names below are assumptions)
pipe=augmentation_pipeline(batch_size=batch_size, num_threads=4, device_id=0)
pipe.build()
image_batch, resized_image_batch, flipped_image_batch, \
    mask_batch, resized_mask_batch, flipped_mask_batch=pipe.run()
augmentation=['original', 'resized', 'flipped']
rows, columns=batch_size, 6
# create plot
fig=plt.figure(figsize=(15, (15 // columns) * rows))
gs=gridspec.GridSpec(rows, columns)
grid_data=[image_batch, resized_image_batch, flipped_image_batch,
           mask_batch, resized_mask_batch, flipped_mask_batch]
grid=0
for row_idx in range(rows):
    for col_idx in range(columns):
        plt.subplot(gs[grid])
        plt.axis('off')
        plt.title(augmentation[col_idx%3])
        plt.imshow(grid_data[col_idx].at(row_idx))
        grid+=1
plt.tight_layout()
Random Rotation
Now let us perform additional data augmentation by rotating each image by a
random angle. We can generate a random angle with fn.random.uniform and use
fn.rotate for the rotation. We create another pipeline that uses the GPU to perform
the augmentations. DALI makes this transition very easy: the only thing that changes
is the definition of the rotate operator. We only need to set the device argument
to gpu and make sure that its input is transferred to the GPU by calling .gpu() .
angle=fn.random.uniform(range=(-30.0, 30.0))
rotated_images=fn.rotate(images.gpu(), angle=angle, fill_value=0, keep_size=True)
rotated_masks=fn.rotate(masks.gpu(), angle=angle, fill_value=0, keep_size=True)
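For context, a hedged sketch of the full pipeline these lines belong to (the reader and decoder setup mirrors the earlier cells; reading images and masks in matching order so they stay aligned is an assumption about the original cell):

@pipeline_def
def rotate_pipeline():
    # read and decode images and masks as in the earlier cells
    image_pngs, _=fn.readers.file(file_root=image_dir)
    images=fn.decoders.image(image_pngs, device='cpu')
    mask_pngs, _=fn.readers.file(file_root=mask_dir)
    masks=fn.decoders.image(mask_pngs, device='cpu')
    # rotate image and mask by the same random angle on the GPU
    angle=fn.random.uniform(range=(-30.0, 30.0))
    rotated_images=fn.rotate(images.gpu(), angle=angle, fill_value=0, keep_size=True)
    rotated_masks=fn.rotate(masks.gpu(), angle=angle, fill_value=0, keep_size=True)
    return rotated_images, rotated_masks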
The rotate_pipeline now performs the rotations on the GPU. Keep in mind that
the resulting images are also allocated in the GPU memory, which is typically what we
want, since the model likely requires the data to be in GPU memory for training. In
any case, copying back the data to CPU memory after running the pipeline can be
easily achieved by calling as_cpu() on the objects returned by
Pipeline.run() .
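A short usage sketch of that copy-back step (the pipeline construction mirrors the earlier cells):

# build and run the GPU pipeline, then copy one batch back to host memory
pipe=rotate_pipeline(batch_size=batch_size, num_threads=4, device_id=0)
pipe.build()
rotated_images, rotated_masks=pipe.run()
host_images=rotated_images.as_cpu()    # TensorListGPU -> TensorListCPU
print(host_images.at(0).shape)         # inspect the first rotated sample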