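Machine Learning Model for Predicting Monthly Temperatures: Using Machine Learning Models to Predict Changes in Weather Temperature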

A Practical Machine Learning Workflow Example

Problem Introduction

The problem we will tackle is predicting the average global land and ocean temperature using over 100 years of past weather data. We are going to act as if we don't have access to any weather forecasts. What we do have is a century's worth of historical global temperature averages, including global maximum temperatures, global minimum temperatures, and global land and ocean temperatures. Given all of this, we know that this is a supervised regression machine learning problem.

It's supervised because we have both the features and the target that we want to predict. During training, we will give multiple regression models both the features and the targets, and each model must learn how to map the data to a prediction. It's a regression task because the target value is continuous (as opposed to the discrete classes in classification).
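
To make the features/target split concrete, here is a minimal sketch that fits an ordinary least-squares linear model with NumPy. The column roles mirror this dataset (max and min land temperature as features, land-and-ocean average as the target), but the numbers are made up purely for illustration, and least squares is just one simple choice of regression model, not necessarily one the author uses.

```python
import numpy as np

# Features: one row per month, columns = [LandMaxTemperature, LandMinTemperature].
# These values are invented for illustration, not taken from the dataset.
X = np.array([
    [8.0, -3.0],
    [10.5, -1.0],
    [14.0, 2.5],
    [18.0, 6.0],
    [20.5, 8.5],
])

# Target: a continuous value (standing in for LandAndOceanAverageTemperature),
# which is what makes this regression rather than classification.
# It is constructed as 0.5*max + 0.5*min + 12 so the fit can be checked.
y = 0.5 * X[:, 0] + 0.5 * X[:, 1] + 12.0

# Prepend a column of ones so the model can learn an intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Supervised learning: features and targets are both given, and the model
# learns the mapping between them (here by solving a least-squares problem).
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, w_max, w_min = coef


def predict(features: np.ndarray) -> np.ndarray:
    """Apply the learned linear mapping to new feature rows."""
    return intercept + features @ np.array([w_max, w_min])


print(np.round(coef, 3))
```

Because the toy target is an exact linear function of the features, the fitted coefficients recover the intercept and weights used to build it; real temperature data will not fit this cleanly.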

That’s pretty much all the background we need, so let’s start!

ML Workflow

Before we jump right into programming, we should outline exactly what we want to do. Now that we have our problem and model in mind, the following steps form the basis of my machine learning workflow:

  1. State the question and determine the required data (completed)
  2. Acquire the data
  3. Identify and correct missing data points/anomalies
  4. Prepare the data for the machine learning model by cleaning/wrangling
  5. Establish a baseline model
  6. Train the model on the training data
  7. Make predictions on the test data
  8. Compare predictions to the known test set targets and calculate performance metrics
  9. If performance is not satisfactory, adjust the model, acquire more data, or try a different modeling technique
  10. Interpret model and report results visually and numerically
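
Steps 5, 7, and 8 can be sketched in a few lines. The example assumes a chronological train/test split (holding out the most recent months, since the data is time-ordered) and uses the training-set mean of the target as the baseline; both are illustrative assumptions, not choices the workflow prescribes. The target values are made up. Any trained model should beat this baseline's mean absolute error (MAE) on the test set to be worth keeping.

```python
import numpy as np


def chronological_split(values, test_fraction=0.25):
    """Split a time-ordered array into train/test, keeping the latest rows for testing."""
    values = np.asarray(values, dtype=float)
    split_at = int(len(values) * (1 - test_fraction))
    return values[:split_at], values[split_at:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between known targets and predictions."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Hypothetical target series (made-up monthly averages, oldest first).
target = np.array([14.0, 14.2, 14.1, 14.4, 14.3, 14.6, 14.8, 15.0])

train_y, test_y = chronological_split(target, test_fraction=0.25)

# Steps 5 and 7: the baseline predicts the training mean for every test month.
baseline_pred = np.full_like(test_y, train_y.mean())

# Step 8: compare the predictions with the known test targets.
baseline_mae = mean_absolute_error(test_y, baseline_pred)
print(f"Baseline MAE: {baseline_mae:.3f} degrees C")
```

A chronological split avoids training on months that come after the ones being predicted; a random split would be simpler but can leak future information into training.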

Data Acquisition

First, we need some data. To use a realistic example, I retrieved temperature data from the Berkeley Earth "Climate Change: Earth Surface Temperature Data" dataset on Kaggle.com. Since this dataset comes from one of the most prestigious research universities in the world, we will assume the data in it is accurate.

Dataset link: https://www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data

After importing the necessary libraries and modules, we load the CSV data and store it in a variable for later use.
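
A minimal pandas sketch of this loading step follows. The file name GlobalTemperatures.csv is an assumption about which file of the Kaggle download is used, and the short inline sample (invented values, a subset of the columns) only stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Assumed file name from the Kaggle download; adjust to your local path.
CSV_PATH = "GlobalTemperatures.csv"


def load_temperatures(source) -> pd.DataFrame:
    """Read the temperature CSV and parse the 'dt' column as dates."""
    return pd.read_csv(source, parse_dates=["dt"])


# Tiny stand-in for the real file (invented values), so the example runs
# without downloading anything. Empty fields become NaN, like the gaps
# in the real data before 1850.
sample_csv = io.StringIO(
    "dt,LandAverageTemperature,LandMaxTemperature,LandMinTemperature,"
    "LandAndOceanAverageTemperature\n"
    "1750-01-01,3.0,,,\n"
    "1850-01-01,1.0,8.0,-3.0,13.0\n"
    "1850-02-01,3.0,10.0,-2.0,13.5\n"
)

temperatures = load_temperatures(sample_csv)
# With the real data: temperatures = load_temperatures(CSV_PATH)
print(temperatures.head())
```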

Following are explanations of each column:

dt: the date of each monthly record; records start in 1750 for average land temperature and in 1850 for max and min land temperatures and for global land and ocean temperatures

LandAverageTemperature: global average land temperature in Celsius

LandAverageTemperatureUncertainty: the 95% confidence interval around the average

LandMaxTemperature: global average maximum land temperature in Celsius

LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature

LandMinTemperature: global average minimum land temperature in Celsius

LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature

LandAndOceanAverageTemperature: global average land and ocean temperature in Celsius

LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
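
Because the max, min, and land-and-ocean columns only begin in 1850, earlier rows are missing most of the features. One straightforward option, sketched here as an assumption rather than the author's exact preparation step, is to keep only rows from 1850 onward so the columns cover the same period. The DataFrame below uses invented values.

```python
import pandas as pd


def from_1850(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows dated 1850-01-01 or later, when all temperature series start."""
    return df[df["dt"] >= pd.Timestamp("1850-01-01")].reset_index(drop=True)


# Invented example rows; the real data has one row per month.
df = pd.DataFrame({
    "dt": pd.to_datetime(["1750-01-01", "1849-12-01", "1850-01-01", "1850-02-01"]),
    "LandAverageTemperature": [3.0, 1.0, 1.0, 3.0],
    "LandMaxTemperature": [None, None, 8.0, 10.0],
    "LandAndOceanAverageTemperature": [None, None, 13.0, 13.5],
})

aligned = from_1850(df)
print(aligned)
```

The trade-off is discarding a century of average land temperature readings in exchange for rows where every feature is present; imputing the early values would be the alternative.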

Identify Anomalies/Missing Data

Looking through the Berkeley Earth data, I noticed several missing data points, which is a good reminder that data collected in the real world is never perfect. Missing data can have a huge impact on an analysis, and so can incorrect data or outliers.

To identify anomalies, we can quickly find missing values using the info() method on our DataFrame, which reports the number of non-null entries in each column.
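
A minimal sketch of this check on a small invented DataFrame: info() prints each column's non-null count and dtype, so any column whose count is below the number of rows has gaps. The same counts are available programmatically through count().

```python
import pandas as pd

# Invented values standing in for the real temperature data.
df = pd.DataFrame({
    "LandAverageTemperature": [3.0, 1.0, None, 2.0],
    "LandMaxTemperature": [None, None, 8.0, 10.0],
    "LandMinTemperature": [-3.0, -2.0, -1.0, 0.0],
})

# info() writes a summary (row count, non-null count per column, dtypes) to stdout.
df.info()

# count() returns the non-null count per column as a Series.
non_null = df.count()
rows = len(df)
columns_with_gaps = [col for col in df.columns if non_null[col] < rows]
print(columns_with_gaps)
```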

We can also chain the isnull() and sum() methods directly on our DataFrame to count the missing values in each column.
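
Sketched on another small invented DataFrame: isnull() marks each missing cell as True, sum() adds those flags up per column, and summing the result once more gives the total for the whole table.

```python
import pandas as pd

# Invented values standing in for the real temperature data.
df = pd.DataFrame({
    "LandAverageTemperature": [3.0, None, 2.0, 4.0],
    "LandMaxTemperature": [None, None, 8.0, 10.0],
    "LandMinTemperature": [-3.0, -2.0, -1.0, 0.0],
})

# isnull() returns a same-shaped DataFrame of booleans (True where a value is missing);
# sum() then counts the True values column by column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Total number of missing cells across the whole DataFrame.
total_missing = int(missing_per_column.sum())
print(total_missing)
```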

Data Preparation
