Characteristics of Environmental Data
Course Title: Environmental Data Analytics.
Degree Program: Master’s in data science.
Instructor: Mohammad Mahdi Rajabi
• Post-Wildfire Surveys: Sampling soil and water to assess the impact of wildfires
on the environment.
Crowdsourced Data
Beyond using specialized equipment and trained professionals, in-situ environmental data can
also be collected by ordinary citizens, often referred to as citizen scientists. These individuals
use pure observation and personal devices such as smartphones to gather data. This method
needs to be guided in a well-organized, objective-oriented way and is often facilitated by web
or mobile applications.
Such apps play a crucial role in structuring the data collection process, ensuring data quality,
and providing platforms for data submission. They often include guidelines, tutorials, and tools
to help citizen scientists accurately record their observations. For example, the "CrowdWater"
app enables users to contribute data on water levels and stream conditions, while other apps
might focus on tracking wildlife sightings, air quality, or weather conditions. These applications
harness the power of community participation to enhance environmental monitoring and
research.
Such crowdsourced data can be very valuable because it allows for the collection of large
quantities of data over extensive geographic areas and diverse environmental conditions. This
breadth and depth of data collection can be challenging and costly to achieve through
traditional methods. Additionally, crowdsourced data can provide real-time updates and
localized insights, which are crucial for monitoring rapidly changing environmental conditions
and for early detection of environmental issues.
Despite its value, crowdsourced data collection faces several challenges. Ensuring the accuracy
and reliability of data collected by non-experts can be difficult, as variations in observation
methods and potential errors in data recording can affect data quality. Another challenge is the
need for consistent and sustained participation from volunteers, which can fluctuate over time.
Data privacy and security concerns also arise, as participants may be reluctant to share their
location or other personal information. Lastly, the integration of crowdsourced data with data
from traditional sources requires careful validation and harmonization to ensure compatibility
and coherence in the overall dataset.
Scaling up in situ observations to cover larger areas can be challenging compared to,
e.g., remote sensing data.
2) Remote Sensing Data
Remote sensing (RS) data refers to the acquisition of information about the Earth's
surface and atmosphere from a distance, typically using platforms such as satellites,
aircraft, drones (Unmanned Aerial Vehicles, UAVs), or balloons.
Satellite-based RS data is generally low-frequency and periodic because satellites
follow fixed orbits and collect data when passing over specific areas at regular intervals.
This makes them less flexible for real-time or need-based monitoring.
In contrast, data from aircraft and UAVs (drones) is typically event-based or
need-based, as they can be deployed specifically for targeted monitoring tasks or events.
Balloons can fall in between, being used either periodically or for specific events
depending on the deployment.
Overall, RS data is rarely high-frequency, as most systems are constrained by their
operational design and data collection methods.
Remote sensing data primarily comes in the form of 2D images, though not necessarily
the typical optical images captured by conventional cameras. In certain cases, it can
also come in other formats, such as point clouds, especially when using technologies
like LiDAR or radar that capture 3D spatial information. It is common for remote sensing
(RS) data to be calibrated using in-situ observation data to improve accuracy and
ensure that the remotely sensed measurements align with real-world conditions.
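As a minimal sketch of such a calibration, assume that co-located in-situ measurements are available and that a simple linear relationship between the remotely sensed value and the ground measurement is adequate. The function names and sample numbers below are hypothetical and serve only as an illustration; real calibrations are often more involved.

```python
from statistics import mean


def fit_linear_calibration(rs_values, in_situ_values):
    """Least-squares fit of in_situ ~ slope * rs + intercept."""
    if len(rs_values) != len(in_situ_values) or len(rs_values) < 2:
        raise ValueError("need at least two paired observations")
    rs_mean = mean(rs_values)
    obs_mean = mean(in_situ_values)
    sxx = sum((x - rs_mean) ** 2 for x in rs_values)
    if sxx == 0:
        raise ValueError("remote sensing values must not all be equal")
    sxy = sum(
        (x - rs_mean) * (y - obs_mean)
        for x, y in zip(rs_values, in_situ_values)
    )
    slope = sxy / sxx
    intercept = obs_mean - slope * rs_mean
    return slope, intercept


def apply_calibration(rs_values, slope, intercept):
    """Map raw remotely sensed values onto the in-situ measurement scale."""
    return [slope * x + intercept for x in rs_values]


# Hypothetical paired samples: remotely sensed vs. ground-measured values.
rs = [10.0, 12.0, 14.0, 16.0]
ground = [21.0, 25.0, 29.0, 33.0]

slope, intercept = fit_linear_calibration(rs, ground)
calibrated = apply_calibration([11.0, 15.0], slope, intercept)
print(slope, intercept, calibrated)
```

The fitted coefficients are then applied to pixels where no ground measurement exists, which is the sense in which in-situ data "anchors" the remotely sensed product.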
3) Model-generated Data
Model-generated data in Environmental Data Analytics refers to data produced by
computational models that simulate environmental processes, such as climate models,
hydrological models, or air quality models. This data is an important source of information
because it allows us to fill gaps where observational data is unavailable, or simulate scenarios
that cannot be directly measured. Model-generated data can be tailored to specific needs,
providing either high-frequency data or low-frequency data for long-term trends. Additionally,
it can be event- or need-based, meaning the model can be run in response to specific events
or requirements, making it highly adaptable for environmental analysis and decision-making.
It is common for computational models that simulate environmental processes to be calibrated
using actual observations, particularly in-situ observations, to ensure their accuracy before
being used to generate data.
In-situ Observation: Time series data, scalar values, and sometimes text or images
(such as images from trap cameras).
techniques in environmental data analytics, enabling the study of how environmental variables
change and interact over various time scales.
3) Seasonality and Cyclic Patterns:
Environmental data often exhibits seasonal trends (e.g., rainfall, temperature) that repeat on a
yearly or other cyclic basis.
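One common way to expose a seasonal cycle is to compute a monthly climatology (the long-term mean for each calendar month) and subtract it from the observations, leaving anomalies. The sketch below assumes records are given as hypothetical (year, month, value) tuples; the function names and numbers are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records):
    """Return {month: mean value over all years} for (year, month, value) records."""
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def deseasonalize(records, climatology):
    """Subtract each month's climatological mean, leaving anomalies."""
    return [
        (year, month, value - climatology[month])
        for year, month, value in records
    ]


# Hypothetical temperatures for January and July in two years.
records = [
    (2020, 1, 2.0), (2020, 7, 20.0),
    (2021, 1, 4.0), (2021, 7, 24.0),
]

climatology = monthly_climatology(records)
anomalies = deseasonalize(records, climatology)
print(climatology)  # {1: 3.0, 7: 22.0}
print(anomalies)
```

After the seasonal cycle is removed, what remains (here, 2021 being warmer than 2020 in both months) is easier to inspect for trends or unusual events.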
4) Spatial and Temporal Correlation:
Environmental data usually has spatial and temporal dependencies, meaning nearby locations
or time points tend to be correlated.
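Temporal dependence can be checked with the sample autocorrelation at a given lag. The following is a minimal sketch with made-up series; the function name is hypothetical. The same idea extends to space by pairing values from nearby sites instead of consecutive time steps.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    denominator = sum((x - mu) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(
        (series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag)
    )
    return numerator / denominator


# A smoothly varying series: neighbouring time points are similar.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# An alternating series: neighbouring time points are dissimilar.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(autocorrelation(smooth, 1))       # 0.625
print(autocorrelation(alternating, 1))  # negative
```

A clearly positive lag-1 value indicates that consecutive observations are not independent, which matters for methods that assume independent samples.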
5) Non-linearity:
Many environmental processes, such as climate dynamics or pollution dispersion, exhibit non-
linear relationships between variables. This means that small changes in one variable can lead
to disproportionate effects in others, complicating the task of predicting or modeling these
systems accurately.
6) Scale Dependency:
Environmental data is influenced by the spatial and temporal scales at which it is collected or
analyzed. Patterns or relationships that are evident at one scale (e.g., local, regional, or global)
may not be visible or may behave differently at another scale. For example, a trend in
temperature change observed over a city might not reflect broader regional climate patterns.
Choosing the appropriate scale is critical, as environmental phenomena can behave differently
across scales, impacting the accuracy and relevance of any analysis or model. When
environmental data is reported, it should be accompanied by context such as the scale and
resolution at which it was collected to ensure proper interpretation and application in analyses.
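The effect of temporal scale can be illustrated by averaging the same series over coarser windows. This is a small sketch with invented numbers and a hypothetical helper, not a recommended aggregation procedure.

```python
from statistics import mean, pstdev


def block_average(values, window):
    """Average consecutive non-overlapping blocks of `window` values.

    Trailing values that do not fill a complete block are dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    usable = len(values) - len(values) % window
    return [mean(values[i:i + window]) for i in range(0, usable, window)]


# Hypothetical fine-resolution readings with a strong short-period oscillation.
fine = [10.0, 14.0, 10.0, 14.0, 11.0, 15.0, 11.0, 15.0]
coarse = block_average(fine, 2)

print(coarse)                        # [12.0, 12.0, 13.0, 13.0]
print(pstdev(fine), pstdev(coarse))  # variability shrinks at the coarser scale
```

At the fine scale the oscillation dominates; at the coarse scale it disappears and only the slow upward shift is visible, which is why the resolution should be reported alongside the data.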
7) Multi-modality:
Environmental data is often multi-modal, meaning it is collected in different forms or modalities,
such as images (satellite or aerial), time series (sensor data), scalar values (e.g., temperature or
humidity readings), and text (reports or observations). Integrating and analyzing these different
modalities together can provide richer insights but also presents challenges in harmonizing
diverse data types.
8) Uncertainty:
Environmental data is often prone to uncertainty.
Data uncertainty refers to the discrepancies between the quantities that describe a system
(such as measurements or model predictions) and the actual state of the system, which cannot
be fully quantified or precisely known. If these discrepancies can be quantified in a relatively
straightforward way, they are considered errors rather than uncertainty.
In environmental data, uncertainty arises from various sources, including measurement
uncertainty, which results from limitations or issues with instruments, representativity
uncertainty, occurring when data collected at a specific location or time does not fully capture
the broader area or time period, and interpolation uncertainty, which arises when
observational data is used to estimate values at unmeasured locations, leading to potential
inaccuracies. Quantifying these uncertainties and understanding their impact on analytics and
machine learning tasks is a critical aspect of environmental data analytics (EDA).
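As one illustration of quantifying interpolation uncertainty, a leave-one-out check withholds each measured point, estimates it from its neighbours, and compares the estimate with the measurement. The sketch below assumes stations along a one-dimensional transect and plain linear interpolation; the function names and values are hypothetical.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Linearly interpolate between (x0, y0) and (x1, y1) at position x."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def leave_one_out_errors(xs, ys):
    """Residuals (observed - predicted) for each interior point.

    Each interior point is predicted from its two neighbours only.
    Assumes xs is strictly increasing.
    """
    errors = []
    for i in range(1, len(xs) - 1):
        predicted = linear_interpolate(
            xs[i - 1], ys[i - 1], xs[i + 1], ys[i + 1], xs[i]
        )
        errors.append(ys[i] - predicted)
    return errors


def rmse(errors):
    """Root-mean-square of a list of residuals."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


# Hypothetical stations along a transect: position (km) and measured value.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [5.0, 6.0, 9.0, 8.0, 9.0]

errors = leave_one_out_errors(positions, values)
print(errors)        # [-1.0, 2.0, -1.0]
print(rmse(errors))  # about 1.41
```

The resulting RMSE is only a rough, empirical indication of how far interpolated values may lie from the truth at unmeasured locations; it does not cover measurement or representativity uncertainty.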
9) Heterogeneity:
Environmental data is often highly heterogeneous, as it may originate from various collection
methods, instruments, or even the same method and instrument used at different times by
different people. As a result, the scale, resolution, and uncertainty of different subsets of the
data can vary significantly. This diversity creates challenges for data integration and analysis,
requiring careful handling to ensure consistency and comparability across datasets.
10) Large Volume:
Environmental datasets, especially those derived from remote sensing technologies or large-
scale simulations, tend to be massive. High spatial and temporal resolution means that vast
amounts of data are generated, which can strain storage and processing capabilities, requiring
sophisticated data management and analysis tools.
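When a dataset is too large to hold in memory, summary statistics can be computed in a single pass over a stream of values. The sketch below uses Welford's online algorithm for the mean and variance; the function name and the sample stream are hypothetical, and production pipelines typically rely on dedicated tools rather than hand-written loops.

```python
def running_mean_variance(stream):
    """One-pass mean and population variance (Welford's algorithm).

    Only three numbers are kept in memory, regardless of stream length.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("stream is empty")
    return count, mean, m2 / count


# A generator stands in for values read chunk by chunk from disk.
def sample_stream():
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


count, mean_value, variance = running_mean_variance(sample_stream())
print(count, mean_value, variance)
```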
11) Non-Normality:
Environmental data often does not follow a normal distribution, which is a key assumption in
many statistical methods. Applying techniques that rely on normality without verifying the
data’s distribution can lead to misleading results. Therefore, it is essential to assess the data
distribution before using parametric statistics and, if necessary, consider alternative non-
parametric methods that do not assume normality.
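A quick screening step before applying parametric statistics is to compare the mean with the median and to compute the sample skewness. The sketch below is a heuristic only, with invented concentration values and a hypothetical function name; it does not replace a formal normality test such as Shapiro-Wilk.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("values are constant; skewness is undefined")
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Hypothetical pollutant concentrations with one extreme reading.
concentrations = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 40.0]

print(mean(concentrations))    # 7.0, pulled upward by the extreme value
print(median(concentrations))  # 2.5, robust to the extreme value
print(skewness(concentrations))  # strongly positive
```

A skewness near zero and a mean close to the median are consistent with symmetry; a large gap, as here, suggests that rank-based or other non-parametric methods may be safer.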
12) Missing Data:
Environmental data often has missing values due to sensor failures, data collection issues, or
other limitations. Handling missing data through techniques like interpolation, imputation, or
specialized models is essential for accurate analysis.
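As a minimal sketch of one such technique, the function below fills interior gaps in a regularly sampled series by linear interpolation between the nearest valid neighbours. Missing values are represented as None, and the function name and readings are hypothetical. Leading and trailing gaps are deliberately left unfilled, since filling them would require extrapolation.

```python
def fill_gaps_linear(values):
    """Fill interior None gaps by linear interpolation.

    Assumes equally spaced samples. Gaps at the start or end of the
    series are left as None because they have only one valid neighbour.
    """
    filled = list(values)
    valid = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(valid, valid[1:]):
        gap = right - left
        for i in range(left + 1, right):
            # Weighted average of the two bounding valid readings.
            filled[i] = (
                values[left] * (right - i) + values[right] * (i - left)
            ) / gap
    return filled


# Hypothetical sensor readings with missing samples.
readings = [None, 10.0, None, None, 16.0, 17.0, None]
print(fill_gaps_linear(readings))
# [None, 10.0, 12.0, 14.0, 16.0, 17.0, None]
```

Linear interpolation is reasonable for short gaps in smoothly varying variables; for long gaps or strongly seasonal data, the filled values carry considerable interpolation uncertainty.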