CalCOFI Machine Learning Model

This research develops a machine learning model to predict chlorophyll concentrations in marine ecosystems using CalCOFI bottle data, focusing on environmental parameters like temperature and salinity. The model integrates data preprocessing, feature engineering, and hyperparameter tuning, achieving an R² value of 0.7889, which enhances real-time monitoring and supports sustainable fisheries management. The study highlights the potential of machine learning in improving traditional chlorophyll detection methods and advancing marine ecosystem conservation efforts.

Uploaded by

Sajid Ahmed Khan


Developing a Chlorophyll Level Detection Model Using CalCOFI Bottle Data: Implications for Marine Ecosystem Monitoring
Nazim-E-Alam
Dept. of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
[email protected]

Nazmul Islam Rahat
Dept. of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
[email protected]

Rakib Hassan
Dept. of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
[email protected]

Mohammad Saef Ullah Miah*
Dept. of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh
[email protected]

Abstract—Monitoring chlorophyll concentrations is essential for understanding the health of marine ecosystems and the changes taking place in them. This research aimed to develop a machine learning model that predicts chlorophyll concentrations in oceanic waters using data from the CalCOFI bottle surveys. The model estimates chlorophyll concentrations from environmental parameters such as temperature, salinity, and oxygen levels. Data cleaning, feature engineering, and hyperparameter tuning were combined with regression methods to train the model, which reached an R² value of 0.7889. The approach has considerable potential as a tool for real-time chlorophyll detection in marine ecosystem monitoring. Better monitoring of phytoplankton health can help identify ecological imbalances promptly, support sustainable fisheries management, and strengthen conservation initiatives. The approach can be applied to other locations and datasets, which makes it a useful asset for marine monitoring programs worldwide. The results illustrate how data-driven models can advance marine science and environmental conservation.

Index Terms—chlorophyll detection, marine ecosystem monitoring, machine learning, CalCOFI bottle data, oceanographic measures

I. INTRODUCTION

Marine environments are essential for sustaining ecological equilibrium and global biodiversity. Chlorophyll level is a crucial indicator of phytoplankton biomass and primary productivity, and therefore of marine health. Precise monitoring of chlorophyll concentrations is needed to identify biological changes such as harmful algal blooms, which can damage marine ecosystems and fisheries [1]. Chlorophyll data also help evaluate the effects of climate change on ocean productivity, supporting informed decisions for conservation initiatives [2].

Conventional techniques for assessing chlorophyll concentrations, including satellite remote sensing and water sampling, have proven effective but are often constrained by limited spatial and temporal resolution or by practical limitations [3]. Recent advances in machine learning have shown that it can improve on traditional methods by exploiting large datasets and recognizing intricate patterns among environmental variables [4]. Machine learning algorithms handle non-linear interactions among physical and chemical parameters well, which makes them suitable for forecasting chlorophyll concentrations [5].

This research aims to create a machine learning model for predicting chlorophyll concentrations with data from the California Cooperative Oceanic Fisheries Investigations (CalCOFI) bottle surveys. The CalCOFI dataset, which includes variables such as temperature, salinity, and nutrient concentrations, is a valuable resource for analyzing the drivers of chlorophyll dynamics. We aim to achieve high prediction accuracy and reliability by combining data preprocessing, cross-validation, feature engineering, and hyperparameter tuning. The resulting model is intended to support real-time monitoring of marine ecosystems, sustainable fisheries management, and conservation initiatives.

II. LITERATURE REVIEW

Chlorophyll-a (Chl-a) is a key pigment found in phytoplankton and plays a crucial role in photosynthesis. Monitoring Chl-a concentration is essential for understanding water quality, assessing marine ecosystem health, and detecting harmful algal blooms. Traditional methods for Chl-a estimation, such as direct sampling and satellite-based remote sensing, often face limitations due to environmental factors like cloud cover, atmospheric interference, and the inability to capture data
during polar nights. In recent years, machine learning (ML) techniques have gained significant attention because they can overcome these challenges by leveraging large datasets and improving the accuracy and efficiency of Chl-a predictions across various water bodies.

Machine learning models have shown remarkable potential in predicting Chl-a concentrations by integrating diverse datasets and identifying complex patterns that traditional empirical methods struggle to capture. Madani et al. (2024) introduced a machine learning-based approach to generate a continuous solar-induced chlorophyll fluorescence (SIF) dataset for the Arctic Ocean [6]. They used Random Forest models trained on environmental parameters such as Chl-a concentration, sea surface temperature (SST), and sea surface salinity (SSS), which extended the SIF dataset back to 2004. This study provided valuable insights into phytoplankton activity and its response to changing climatic conditions. Similarly, Chusnah et al. (2023) used multi-satellite imagery to develop high-resolution models for Chl-a concentration estimation in inland water bodies [7]. Their approach combined Sentinel-3 OLCI data with Sentinel-2 MSI imagery using Random Forest algorithms to enhance spatial resolution, and it achieved high prediction accuracy, with R² values of 0.873 and 0.822. These findings highlight the potential of integrating multiple data sources for more precise and reliable Chl-a estimation.

Deep learning has emerged as a powerful tool for Chl-a estimation, capable of capturing intricate spatial and temporal patterns. Yao et al. (2023) explored deep learning models such as ConvLSTM, CNN-LSTM, and Self-Attention ConvLSTM (SA-ConvLSTM) to forecast Chl-a levels in the Yellow Sea and Bohai Sea [8]. The SA-ConvLSTM model achieved the highest accuracy, with a Pearson correlation coefficient of 0.887, demonstrating its ability to account for dynamic oceanographic changes. Zeng et al. (2023) proposed a hybrid model that combines a 1D CNN for feature extraction with Support Vector Regression (SVR) for prediction, yielding an R² value of 0.892 [9]. These studies underscore the value of deep learning models in improving the precision of Chl-a predictions.

Combining different machine learning approaches has improved prediction accuracy further. A study of the Venice Lagoon integrated Random Forest and Multi-Layer Perceptron (MLP) models with the SHYFEM-BFM biogeochemical framework, offering insights into how Chl-a concentrations might evolve under different climate change scenarios [10]. The hybrid approach improved long-term forecasting by combining data-driven insights with process-based models. In addition, optimized ensemble models such as Random Forest, Gradient Boosting, and Extra Trees have been used to predict phytoplankton absorption coefficients across different wavelengths. The Extra Trees model in particular reached an R² value of 0.9033 at 510 nm, showcasing the effectiveness of ensemble learning techniques in capturing complex ecological interactions [11].

The integration of satellite data with machine learning models has significantly improved Chl-a estimation by combining multiple data sources to enhance spatial and temporal resolution. Researchers studying the Barents Sea combined Sentinel-2 MSI data with in situ measurements in a neural network model called Ocean Color Network (OCN). This approach reduced errors by 51.7% compared to traditional empirical methods, demonstrating the potential of data fusion techniques for more accurate and robust monitoring [12]. Another notable study used MODIS/Aqua data and several ML models, including Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The Differential Evolution-based SVM (DE-SVM) model outperformed conventional methods, achieving an R² value of 0.926 [13]. These advances underscore the importance of combining satellite observations with ML algorithms to make Chl-a monitoring more reliable across diverse aquatic environments.

While machine learning has significantly advanced Chl-a estimation, several challenges remain. One of the primary issues is the scarcity of high-quality in situ data, which is crucial for training and validating ML models. Additionally, generalizing models across geographic regions and varying environmental conditions remains a significant hurdle. The limited explainability and interpretability of complex ML models also raises concerns about their widespread adoption in environmental management.

Future research should focus on developing more interpretable ML models, integrating additional environmental variables such as nutrient levels and ocean currents, and expanding datasets to make predictions more robust. Advanced techniques such as transfer learning and domain adaptation could further improve the adaptability of models to new regions and changing environmental conditions. To our knowledge, no existing machine learning model uses CalCOFI bottle data to predict chlorophyll levels. We therefore built a regression model that predicts chlorophyll levels accurately and can support real-time monitoring of marine ecosystems. Machine learning has transformed Chl-a estimation by providing accurate, scalable, and cost-effective solutions for monitoring marine ecosystems. The integration of satellite data, deep learning techniques, and hybrid modeling approaches has significantly improved prediction accuracy and resolution. As technology continues to evolve, machine learning will play a crucial role in advancing environmental monitoring and informing policy decisions aimed at preserving aquatic ecosystems.

III. METHODOLOGY

This section gives a detailed overview of the steps taken to build the models, including data collection, data preprocessing, exploratory data analysis, model training, feature engineering, cross-validation, ensemble modeling, and hyperparameter tuning. Several tools were used during the model-building process. Figure 1 shows the proposed methodology of the study. Data manipulation and analysis were done using Pandas [14], while Seaborn [15] and Matplotlib [16] helped
to visualize the data. Scikit-learn [17] was used to develop the machine learning models, and Optuna [18] was applied for hyperparameter optimization. Computations were run on Google Colab [19] and Kaggle [20].

Fig. 1. Proposed Methodology of the Study

A. Data Collection
For this study, the data was collected from the CalCOFI Bottle Database provided by the official CalCOFI website [21]. This dataset includes oceanographic measurements such as water temperature, salinity, nutrient levels (nitrate, phosphate, and silicate), and chlorophyll concentration, recorded at different depths in the Pacific Ocean. The data used in this research covers the period from 1949 to the present and includes samples taken from various stations along the California coast. The dataset is publicly available on the website, which keeps this marine ecosystem research transparent and open to further analysis. The Bottle dataset contains more than 900,000 instances with more than 20 oceanographic attributes.

B. Data Preprocessing
The core focus of this study was to develop a chlorophyll detection model using physical and chemical components. Initially, all instances with missing values in ChlorA were discarded, reducing the dataset from 900,000 instances to 247,000. Subsequently, physical attributes such as temperature, salinity, and depth, along with chemical components such as O2ml_L, STheta, O2Sat, PO4uM, SiO3uM, and NO3uM, and Sta_ID (to track location), were selected for analysis.

During preprocessing, several attributes were found to contain missing values. Attributes with 5–10% or 10–20% missing data were handled using techniques such as K-Nearest Neighbors (KNN) imputation, mean imputation, and interpolation. However, PO4uM, SiO3uM, and NO3uM had more than 50% missing values and were removed, because imputing such a large proportion of missing data would not yield reliable results. After these preprocessing steps, the final set of selected attributes is ChlorA, Depthm, STheta, O2Sat, O2ml_L, T_degC, Salnty, and Sta_ID.

Outliers were addressed using the Interquartile Range (IQR) method, which was appropriate given the large, complex, and diverse nature of the dataset. Figure 2 shows that filtering produced a narrower and smoother distribution for each variable, suggesting that removing noise and outliers substantially improved data quality.

Fig. 2. Density plot of the variables before and after removing outliers

C. Exploratory Data Analysis (EDA)
The histograms in Figure 3 illustrate the distribution of various oceanographic variables. Chlorophyll-a (ChlorA) exhibits a right-skewed distribution, with most values low and a few very high values, potentially indicating regions of elevated phytoplankton biomass. Depth (Depthm) shows a right-skewed pattern, suggesting that most measurements were taken in shallower waters. Temperature (T_degC) and salinity (Salnty) display nearly normal distributions, implying relatively consistent values across the dataset. Dissolved oxygen (O2ml_L) is left-skewed: higher oxygen levels are prevalent, with some instances of lower oxygen concentrations. Potential density (STheta) and oxygen saturation (O2Sat) exhibit right-skewed distributions, in which lower values are more frequent and a few high values appear as outliers. These patterns provide insights into the variability of phytoplankton biomass, oxygen levels, and the potential presence of hypoxic zones within the study area.
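The skew directions described above can be quantified instead of read off the histograms. The sketch below is illustrative only. It uses synthetic series as hypothetical stand-ins for ChlorA, O2ml_L, and T_degC rather than the CalCOFI file, relies on pandas' `Series.skew()`, and applies an assumed ±0.5 threshold. A positive skewness means most values are low with a long tail of high values (right-skewed). A negative skewness means the opposite.

```python
import numpy as np
import pandas as pd


def describe_skew(series: pd.Series, threshold: float = 0.5) -> str:
    """Label a distribution by the sign and size of its sample skewness.

    The +/-0.5 cut-off is a common rule of thumb, not a value from the study.
    """
    s = series.skew()  # pandas' bias-adjusted sample skewness
    if s > threshold:
        return "right-skewed"  # long tail toward high values; most values are low
    if s < -threshold:
        return "left-skewed"  # long tail toward low values; most values are high
    return "approximately symmetric"


rng = np.random.default_rng(0)
# Hypothetical stand-ins, not CalCOFI data:
chl_like = pd.Series(rng.lognormal(mean=-1.5, sigma=1.0, size=5_000))   # mostly low, a few very high
oxy_like = pd.Series(8.0 - rng.gamma(shape=2.0, scale=0.7, size=5_000))  # mostly high, tail toward low
temp_like = pd.Series(rng.normal(loc=12.0, scale=3.0, size=5_000))      # roughly normal

for name, col in [("chl_like", chl_like), ("oxy_like", oxy_like), ("temp_like", temp_like)]:
    print(f"{name:<10} skew={col.skew():+.2f} -> {describe_skew(col)}")
```

On the real table, the same check could be run column by column on the cleaned DataFrame, both before and after outlier filtering.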
Figure 4 displays the boxplot of Chlorophyll-a concentrations, revealing a right-skewed distribution. The median value is approximately 0.2, with a notable number of high-value outliers. This suggests significant variation in phytoplankton biomass across the dataset.
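The high-value outliers in the boxplot are the kind of values that the IQR filter in the preprocessing step removes. The following is a minimal sketch of that preprocessing under stated assumptions, not the authors' code. It drops rows with no ChlorA value and drops columns with more than 50% missing values. It then fills the remaining gaps with scikit-learn's `KNNImputer`; the study also mentions mean imputation and interpolation, which are not shown, and `n_neighbors=5` is an assumption. Finally, it removes rows outside the conventional 1.5 × IQR fences. That multiplier is also an assumption, because the text does not state it. Column names follow the CalCOFI bottle table, and the demo data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

TARGET = "ChlorA"  # CalCOFI bottle-table column for chlorophyll-a


def preprocess(df: pd.DataFrame, max_missing: float = 0.5, k: float = 1.5) -> pd.DataFrame:
    """Sketch of the cleaning steps; k and n_neighbors are assumed values."""
    # 1. Keep only rows where the target was actually measured.
    df = df.dropna(subset=[TARGET])
    # 2. Keep numeric columns with at most `max_missing` of their values absent.
    #    Non-numeric columns such as Sta_ID are left out of this sketch.
    numeric = df.select_dtypes("number")
    keep = [c for c in numeric.columns if numeric[c].isna().mean() <= max_missing]
    numeric = numeric[keep]
    # 3. Fill the remaining gaps with K-nearest-neighbour imputation.
    imputed = pd.DataFrame(
        KNNImputer(n_neighbors=5).fit_transform(numeric),
        columns=keep,
        index=numeric.index,
    )
    # 4. IQR filter: drop a row if any value lies outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = imputed.quantile(0.25), imputed.quantile(0.75)
    iqr = q3 - q1
    inside = ((imputed >= q1 - k * iqr) & (imputed <= q3 + k * iqr)).all(axis=1)
    return imputed[inside]


# Synthetic demo table using CalCOFI-style column names (values are made up).
rng = np.random.default_rng(42)
n = 400
demo = pd.DataFrame({
    "ChlorA": rng.lognormal(-1.5, 0.6, n),
    "Depthm": rng.uniform(0, 200, n),
    "T_degC": rng.normal(12, 3, n),
    "Salnty": rng.normal(33.6, 0.3, n),
    "O2ml_L": rng.normal(5.5, 0.8, n),
    "STheta": rng.normal(25.5, 0.8, n),
    "O2Sat": rng.normal(95, 10, n),
    "PO4uM": rng.normal(1.0, 0.4, n),
})
demo.loc[demo.sample(frac=0.3, random_state=0).index, "ChlorA"] = np.nan  # target not measured
demo.loc[demo.sample(frac=0.7, random_state=1).index, "PO4uM"] = np.nan   # mostly missing
demo.loc[demo.sample(frac=0.1, random_state=2).index, "O2ml_L"] = np.nan  # small gaps

clean = preprocess(demo)
print(demo.shape, "->", clean.shape)
```

Discarding a row when any variable falls outside its fences keeps the training table rectangular. This is one reasonable choice, and the text does not say which variant was used.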

Fig. 3. Distribution of various oceanographic variables.

Fig. 4. Boxplot of Chlorophyll-a concentrations (ChlorA).

The correlation matrix in Figure 5 highlights key relationships among the variables. Chlorophyll-a levels correlate positively with temperature, dissolved oxygen (O2ml_L), and oxygen saturation (O2Sat), and negatively with depth, salinity, and potential density (STheta). These findings provide a deeper understanding of the environmental factors influencing phytoplankton biomass.

Fig. 5. Correlation matrix showing relationships among key variables.

D. Model Training
Five regression models were trained: Decision Tree, K-NN, SVR, MLP, and Random Forest. They were selected for their different approaches, ranging from simple, easy-to-understand models such as Decision Tree and K-NN to more complex and powerful models such as SVR, MLP, and Random Forest. This variety makes it possible to test thoroughly how well different models capture complex patterns and relationships in the data. The data was split into two parts, 80% for training and 20% for testing.

E. Feature Engineering
In this step, we trained the model on different combinations of attributes to find the best-performing feature set. The 6 attributes yield 2^6 − 1 = 63 non-empty combinations, and a Random Forest model was trained on each of them.

F. Ensemble Model Training
We used an ensemble stacking model combining Random Forest, SVR, and Gradient Boosting Regressor in an attempt to improve performance.

G. Cross-Validation
We applied 5-fold cross-validation to check how consistently the model performs across different training and test splits of the data.

H. Hyperparameter Tuning
Optuna was used to fine-tune the hyperparameters of a Random Forest Regressor. The dataset was divided into training and test sets, and an objective function was defined to train the model and assess its performance using the R² metric. Optuna's MedianPruner was used to stop trials that showed low potential, and the optimal hyperparameters were used to retrain the final model, which was then evaluated on the test data.

IV. RESULT

The performance of the machine learning models was evaluated using three metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and R². These metrics provide insights into the accuracy, error magnitude, and explanatory power of each model, allowing for a thorough comparison.

TABLE I
PERFORMANCE METRICS FOR VARIOUS MODELS

Model            MAE      MSE      R²
Decision Tree    0.0704   0.0175   0.5843
SVR              0.0918   0.0158   0.6261
KNN              0.0684   0.0126   0.6999
MLP              0.0832   0.0149   0.6471
Random Forest    0.0561   0.0090   0.7852

A. Random Forest
The Random Forest model demonstrated the best overall performance, with the lowest MAE (0.0561) and MSE (0.0090) and the highest R² (0.7852). This indicates a strong ability to predict chlorophyll levels accurately while capturing complex non-linear relationships in the dataset. Its ensemble structure, which averages the predictions of many decision trees, reduces overfitting and improves generalization, making it well suited to the diverse and noisy CalCOFI dataset.
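The comparison behind Table I (five regressors, an 80/20 train/test split, and MAE, MSE, and R² on the held-out 20%) can be sketched with scikit-learn as follows. This is an illustrative sketch, not the study's code. It uses scikit-learn's synthetic `make_friedman1` data in place of the cleaned CalCOFI table. The hyperparameters, random seeds, and the feature scaling placed in front of K-NN, SVR, and MLP are assumptions.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear data stands in for the cleaned CalCOFI table (6 features).
X, y = make_friedman1(n_samples=1_000, n_features=6, noise=0.5, random_state=0)
# 80% training / 20% testing, as described in the Model Training subsection.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling in front of the distance- and gradient-based models is an assumption.
models = {
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
    ),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is never smaller than MAE
        "R2": r2_score(y_test, pred),
    }

# Print the models from best to worst R² on the held-out 20%.
for name, m in sorted(results.items(), key=lambda kv: kv[1]["R2"], reverse=True):
    print(f"{name:<14} MAE={m['MAE']:.4f} MSE={m['MSE']:.4f} "
          f"RMSE={m['RMSE']:.4f} R2={m['R2']:.4f}")
```

On synthetic data the ranking need not match Table I. The sketch only illustrates the evaluation protocol. Because RMSE can never be smaller than MAE, error values below the corresponding MAE, such as those in Table I, must be MSE rather than RMSE.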
B. K-Nearest Neighbors (K-NN)
K-NN achieved an R² of 0.6999, the second-highest among the models, showing that it can reasonably explain the variance in chlorophyll levels. However, its MAE (0.0684) and MSE (0.0126) were higher than those of Random Forest, indicating somewhat lower accuracy. K-NN's reliance on local patterns in the data may have limited its performance, particularly in areas with sparse or highly variable observations.
Fig. 6. Comparison of Scores across Different Models
C. Multilayer Perceptron (MLP)
The MLP model showed moderate performance, with an R² of 0.6471, an MAE of 0.0832, and an MSE of 0.0149. Although MLP can model complex relationships, it is sensitive to hyperparameter tuning and data preprocessing. Its performance may have been affected by limited optimization of its architecture and by overfitting due to the dataset's variability.

D. Support Vector Regressor (SVR)
SVR produced an R² of 0.6261, along with an MAE of 0.0918 and an MSE of 0.0158. Its performance was similar to MLP but slightly less accurate. SVR is effective for datasets with clear patterns, but its ability to generalize may have been constrained by the large and noisy CalCOFI dataset.

E. Decision Tree
The Decision Tree model had the lowest R² (0.5843) and relatively high error metrics (MAE of 0.0704 and MSE of 0.0175). This suggests that it struggled to capture the complex interactions in the data, likely because it tends to overfit the training data and fails to generalize effectively.

Table I and Figure 6 show that, among the five models, Random Forest performed substantially better, while the others could not match it. Random Forest had the highest R² (0.7852) and the lowest MAE and MSE (0.0561 and 0.0090, respectively). Random Forest likely performed better than the other models on the CalCOFI dataset because it combines multiple decision trees, which helps reduce overfitting and improve accuracy. It effectively captures complex relationships between the physical and chemical variables related to chlorophyll levels. The model also handles large datasets well and focuses on the most important features, which made it more reliable and accurate when predicting chlorophyll levels.

The stacking ensemble of Random Forest, SVR, and Gradient Boosting Regressor reached an R² of 0.71 and did not surpass Random Forest alone. This performance gap is attributed to the similarity among the base models, which reduced the diversity of predictions, and to Random Forest's ability to capture complex interactions in the CalCOFI dataset on its own. The results indicate that the ensemble's added complexity provided limited value, owing to overlapping insights among the models and the dataset's compatibility with Random Forest's strengths.

The Random Forest feature importances in Figure 7 provide key insight into the relationship between chlorophyll and the other attributes. Depth has the strongest influence on chlorophyll level, with a feature importance of more than 0.4, while potential density is second with a feature importance of almost 1.15. All combinations of the attributes were used to train the model, and the model performed best with all attributes combined. The 5-fold cross-validation results show moderate model performance, with a mean R² of 0.6198, indicating that the model explains around 62% of the variance on average. The Mean Absolute Error (MAE) is 0.0803, suggesting relatively small prediction errors. The Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are 0.0156 and 0.1243, respectively, which also reflect low prediction errors. However, performance varies across the folds, suggesting potential for further improvement.

Optuna was used for hyperparameter tuning. In Trial 96,
the model achieved an R² of 0.7876 with the following hyperparameters: n_estimators of 490, max_depth of 44, min_samples_split of 2, min_samples_leaf of 1, and max_features set to None. However, the best-performing trial was Trial 45, which achieved a higher R² of 0.7889, so the hyperparameters from Trial 45 provided the best model performance. The optimization process explored a range of hyperparameter configurations to improve predictive accuracy, and this tuning improved the model's R² from 0.7852 to 0.7889.

Fig. 7. Feature importance from Random Forest

Among the five models tested, Random Forest outperformed the others with the highest R² (0.7852) and the lowest MAE (0.0561) and MSE (0.0090). This demonstrates its ability to handle complex relationships between physical and chemical variables and to reduce overfitting by combining multiple decision trees. An ensemble stacking model combining Random Forest, SVR, and Gradient Boosting Regressor achieved an R² of 0.71 but did not surpass Random Forest. Feature importance analysis showed that depth had the strongest influence on chlorophyll levels, followed by potential density, with feature importance values of 0.4 and 1.15, respectively. Hyperparameter tuning with Optuna led to a slight improvement in model performance. Trial 45 reached an R² of 0.7889, compared with 0.7876 in Trial 96, so Trial 45 represents the optimal configuration for the Random Forest model.

V. FUTURE SCOPE

Adding more sophisticated components such as an attention mechanism could improve the model's performance further. Attention is commonly used in disciplines such as computer vision and natural language processing. It could enable the model to concentrate on the most critical data points or capture how the data evolves over time [22]. Improving how the data is represented could also help the model learn better, producing higher accuracy and simpler interpretation.

Moreover, deep learning methods such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs) can help the model identify temporal and spatial patterns and associations in the data. Given their versatility, a bigger and more complex dataset would probably enable these models to perform even better [23].

These methods could make the model more dependable overall and increase its forecasting capacity, creating new opportunities for its use in different fields.

VI. CONCLUSION

This paper aimed to build a machine learning model for chlorophyll level detection using CalCOFI data to monitor marine ecology. The model successfully predicted chlorophyll levels from environmental parameters such as temperature, salinity, dissolved oxygen, oxygen saturation, and potential density. The results show that model performance improved through careful data processing, feature engineering, and hyperparameter optimization. In particular, the Random Forest Regressor proved highly effective in capturing relationships within the data and predicting chlorophyll concentration. Despite the complexity of marine ecosystems and the variability of environmental factors, the model achieved a high level of accuracy, highlighting the potential of data-driven approaches in environmental monitoring. However, because the dataset is diverse, complex, and large, deep learning could further improve model performance. In the future, more advanced deep learning models and attention mechanisms could make the model more accurate. In conclusion, the findings of this study contribute to the growing body of research on machine learning applications in marine science and environmental monitoring, providing a robust framework for future investigations into the dynamics of marine ecosystems.

REFERENCES

[1] A. W. Griffith and C. J. Gobler, “Harmful algal blooms: A climate change co-stressor in marine and freshwater ecosystems,” Harmful Algae, vol. 91, pp. 1–15, Mar. 2019. [Online]. Available: https://doi.org/10.1016/j.hal.2019.03.008
[2] V. G. Dvoretsky, V. V. Vodopianova, and A. S. Bulavina, “Effects of Climate Change on Chlorophyll a in the Barents Sea: A Long-Term Assessment,” Biology, vol. 12, no. 1, p. 119, Jan. 2023. [Online]. Available: https://doi.org/10.3390/biology12010119
[3] E. T. Harvey, S. Kratzer, and P. Philipson, “Satellite-based water quality monitoring for improved spatial and temporal retrieval of chlorophyll-a in coastal waters,” Remote Sensing of Environment, vol. 158, Mar. 2015. [Online]. Available: https://doi.org/10.1016/j.rse.2014.11.017
[4] D. B. Olawade, O. Z. Wada, A. O. Ige, B. I. Egbewole, A. Olojo, and B. I. Oladapo, “Artificial intelligence in environmental monitoring: Advancements, challenges, and future directions,” Hygiene and Environmental Health Advances, vol. 12, p. 100114, Dec. 2024. [Online]. Available: https://doi.org/10.1016/j.heha.2024.100114
[5] J. W. Han, T. Kim, S. Lee, T. Kang, and J. K. Im, “Machine learning and
explainable AI for chlorophyll-a prediction in Namhan River Watershed,
South Korea,” Ecological Indicators, vol. 2024, p. 112361. [Online].
Available: https://doi.org/10.1016/j.ecolind.2024.112361
[6] N. Madani, N. C. Parazoo, M. Manizza, and A. Chatterjee, “A ma-
chine learning approach to produce a continuous solar-induced chloro-
phyll fluorescence dataset for understanding ocean productivity,” March
2024. [Online]. Available: https://doi.org/10.22541/essoar.171164956.
61516407/v1
[7] W. N. Chusnah, H. J. Chu, Tatas et al., “Machine-learning-estimation of
high-spatiotemporal-resolution chlorophyll-a concentration using multi-
satellite imagery,” Sustain. Environ. Res., vol. 33, no. 11, 2023. [Online].
Available: https://doi.org/10.1186/s42834-023-00170-1.
[8] L. Yao, X. Wang, J. Zhang, and X. Yu, “Prediction of sea surface
chlorophyll-a concentrations based on deep learning and time-series
remote sensing data,” Remote Sensing, vol. 15, no. 18, p. 4486, Sep.
2023. [Online]. Available: https://doi.org/10.3390/rs15184486.
[9] D. Kim, K. J. Lee, S. M. Jeong, M. S. Song, B. J. Kim, J. Park, and
T. Y. Heo, “Real-time chlorophyll-a forecasting using machine learning
framework with dimension reduction and hyperspectral data,” Environ.
Res., 2024, Art. no. 119823. [Online]. Available: https://doi.org/10.1016/
j.envres.2024.119823
[10] F. Zennaro, E. Furlan, D. Canu, L. Aveytua Alcazar, G. Rosati, C.
Solidoro, S. Aslan, and A. Critto, “Venice lagoon chlorophyll-a evalu-
ation under climate change conditions: A hybrid water quality machine
learning and biogeochemical-based framework,” Environmental Science
and Pollution Research, vol. 2024, pp. 1-12, 2024. [Online]. Available:
https://doi.org/10.1016/j.envres.2024.119823
[11] M. S. Alam, S. P. Tiwari, and S. M. Rahman, “Optimized ensemble
machine learning models for predicting phytoplankton absorption co-
efficients,” IEEE Access, vol. 12, pp. 5760-5769, 2024. doi: 10.1109/
ACCESS.2024.3350328.
[12] M. Asim, C. Brekke, A. Mahmood, T. Eltoft, and M. Reigstad, “Improv-
ing Chlorophyll-A Estimation From Sentinel-2 (MSI) in the Barents Sea
Using Machine Learning,” IEEE Journal of Selected Topics in Applied
Earth Observations and Remote Sensing, vol. 14, pp. 5529-5549, 2021,
doi: 10.1109/JSTARS.2021.3074975.
[13] K. Chen, J. Zhang, Y. Zheng, and X. Xie, “A Study on Global Oceanic
Chlorophyll-a Concentration Inversion Model for MODIS Using Ma-
chine Learning Algorithms,” IEEE Access, vol. 12, pp. 128843-128859,
2024, doi: 10.1109/ACCESS.2024.3456481.
[14] W. McKinney, “Pandas: A fast, powerful, flexible, and easy-to-use
open-source data analysis and manipulation library,” 2010. [Online].
Available: https://pandas.pydata.org/
[15] M. Waskom, “Seaborn: statistical data visualization,” Journal of Open
Source Software, vol. 6, no. 60, p. 3021, 2021. [Online]. Available:
https://seaborn.pydata.org/
[16] J. D. Hunter, “Matplotlib: A 2D graphics environment,” Computing
in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007. [Online].
Available: https://matplotlib.org/
[17] F. J. Pedregosa et al., “Scikit-learn: Machine learning in Python,”
Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[Online]. Available: https://scikit-learn.org/
[18] T. Akiba et al., “Optuna: A Next-generation Hyperparameter Optimiza-
tion Framework,” Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, 2019. [Online].
Available: https://optuna.org/
[19] Google, “Google Colaboratory: A free Jupyter notebook environment that requires no setup and runs entirely in the cloud,” [Online]. Available: https://colab.research.google.com/
[20] Kaggle, “Kaggle: Your Home for Data Science,” [Online]. Available: https://www.kaggle.com/
[21] California Cooperative Oceanic Fisheries Investigations (CalCOFI), “CalCOFI Bottle Database,” [Online]. Available: https://calcofi.org/data/oceanographic-data/bottle-database/. [Accessed: 25-Jan-2025].
[22] D. Hu, “An introductory survey on attention mechanisms in NLP problems,” in Intelligent Systems and Applications, Advances in Intelligent Systems and Computing, vol. 295, pp. 432–448, Jan. 2020, doi: 10.1007/978-3-030-29513-4_31.
[23] A. Lamba, P. Cassey, R. R. Segaran, and L. P. Koh, “Deep learning for environmental conservation,” Curr. Biol., vol. 29, no. 20, pp. R1156–R1164, Oct. 2019, doi: 10.1016/j.cub.2019.08.016.


Fig. 1. Proposed Methodology of the Study

Fig. 2. Density plot before and after removing outliers.
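
The caption of Fig. 2 does not state which outlier rule was applied. One common choice is Tukey's fence, which keeps values inside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. The sketch below assumes that rule; the function name remove_iqr_outliers, the multiplier k = 1.5, and the toy values are illustrative assumptions, and the column names T_degC, Salnty and ChlorA are assumed to follow the CalCOFI bottle export. It is not presented as the authors' exact procedure.

```python
from __future__ import annotations

import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any of `columns` lies outside [Q1 - k*IQR, Q3 + k*IQR].

    Tukey's fence is an assumed rule here, not necessarily the paper's criterion.
    """
    q1 = df[columns].quantile(0.25)
    q3 = df[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparing a DataFrame with a Series aligns the Series index to the columns,
    # so each column is checked against its own fence; a row survives only if
    # every selected column is inside its fence.
    inside = ((df[columns] >= lower) & (df[columns] <= upper)).all(axis=1)
    return df.loc[inside]


# Hypothetical toy samples; the last ChlorA value is an obvious outlier.
bottle = pd.DataFrame({
    "T_degC": [10.5, 11.0, 10.8, 11.2, 10.9, 11.1],
    "Salnty": [33.4, 33.5, 33.6, 33.5, 33.4, 33.5],
    "ChlorA": [0.20, 0.25, 0.22, 0.30, 0.27, 9.00],
})
cleaned = remove_iqr_outliers(bottle, ["T_degC", "Salnty", "ChlorA"])
print(len(bottle), "->", len(cleaned))  # 6 -> 5
```

A before/after comparison like Fig. 2 could then be drawn by plotting a kernel density estimate of each column for `bottle` and for `cleaned`, for example with seaborn's `kdeplot`.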

Fig. 5. Correlation matrix showing relationships among key variables.
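
A correlation matrix like the one in Fig. 5 is typically a table of pairwise Pearson coefficients between numeric columns. The minimal sketch below assumes Pearson correlation, the CalCOFI-style column names T_degC, Salnty, O2ml_L and ChlorA, and invented toy values chosen so that ChlorA is exactly linear in temperature. It then ranks candidate predictors by the absolute strength of their linear relationship with chlorophyll.

```python
import pandas as pd

# Hypothetical toy values; column names are assumed to follow the CalCOFI bottle export.
bottle = pd.DataFrame({
    "T_degC": [8.0, 9.0, 10.0, 11.0, 12.0],
    "Salnty": [33.9, 33.7, 33.6, 33.4, 33.3],
    "O2ml_L": [3.1, 4.0, 4.8, 5.5, 5.9],
    "ChlorA": [0.10, 0.20, 0.30, 0.40, 0.50],
})

# Pairwise Pearson correlation between every pair of numeric columns.
corr = bottle.corr(method="pearson")

# Rank predictors by |r| with the chlorophyll target, excluding ChlorA itself.
ranking = corr["ChlorA"].drop("ChlorA").abs().sort_values(ascending=False)

print(corr.round(3))
print(ranking)
```

Pearson r only captures linear association, so a weak coefficient does not rule out a nonlinear relationship that tree-based models could still exploit. The matrix itself can be rendered as a heatmap, for example with `seaborn.heatmap(corr, annot=True)`.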
