[AI from Scratch] Lecture 23: How Neural Networks Work - Forward Propagation and Backpropagation

Article link: https://blog.csdn.net/wiyi9891/article/details/149110652

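What you will learn in this lesson

How a neural network works at a basic level
How forward propagation is computed
The mathematics behind backpropagation
How to build a simple handwritten-digit recognition network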

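Before you start

Requirements

Python 3.8+
Packages to install:
- numpy==1.21.0
- matplotlib==3.4.0
- tensorflow==2.8.0

Prerequisites

Basic Python programming (Lectures 1-8)
Matrix operation basics (Lecture 4, NumPy)
Basic machine learning concepts (Lecture 9)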

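Core concepts

What is a neural network?

Imagine you are a newborn baby learning to recognize apples:

First sight of an apple: your mother tells you it is an "apple" (input data + label)
Repeated observation: you notice that apples are round and red (feature extraction)
Mistakes and correction: you call a tomato an apple and are corrected (error feedback)
Mastery: you can reliably recognize all kinds of apples (model convergence)

A neural network learns the same way: it is a system that improves through many rounds of "observation" and "correction".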

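The three building blocks of a neural network

Neuron: the basic unit of computation
- Like a biological neuron, it receives inputs and produces an output
- In math: output = activation(weights · inputs + bias)
Layer: a group of neurons
- Input layer: receives the raw data
- Hidden layers: transform features (there are usually several)
- Output layer: produces the final prediction
Connection weights: determine how strongly a signal is passed on
- Training is the process of repeatedly adjusting these weights (together with the biases)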
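
The neuron formula above can be sketched in a few lines of NumPy. The input values, weights, and bias are made-up numbers for illustration only, and sigmoid is just one possible choice of activation.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values chosen only for illustration
x = np.array([0.5, 0.8])     # two input features
w = np.array([0.4, -0.2])    # one weight per input
b = 0.1                      # bias

z = np.dot(w, x) + b         # weighted sum: 0.4*0.5 + (-0.2)*0.8 + 0.1
a = sigmoid(z)               # neuron output after the activation

print("weighted sum:", z)
print("neuron output:", a)
```

The weighted sum comes out at about 0.14, and the sigmoid maps it to a value slightly above 0.5.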

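Forward Propagation

Think of a factory production line:

Raw material (the input data) enters the first station (the input layer)
It passes through several processing steps (hidden-layer transformations)
A finished product comes out at the end (the output prediction)

The math:

layer 1 output = activation(weights 1 · input + bias 1)
layer 2 output = activation(weights 2 · layer 1 output + bias 2)
...
final output = activation(weights N · layer N-1 output + bias N)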
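
The layer-by-layer recurrence above amounts to a simple loop. The sketch below is one way to write it, assuming randomly initialized weights, ReLU on every layer, and made-up layer sizes. The helper `forward` and the `layers` list are illustrative names, not part of any library.

```python
import numpy as np

def relu(z):
    """ReLU activation: keep positive values, replace negatives with 0."""
    return np.maximum(0.0, z)

def forward(x, layers):
    """Apply each (weights, bias, activation) triple in order."""
    a = x
    for W, b, activation in layers:
        a = activation(a @ W + b)   # output of this layer feeds the next one
    return a

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable

# Hypothetical layer sizes: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs
sizes = [4, 5, 3, 2]
layers = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out), relu)
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

x = rng.normal(size=4)              # one input sample with 4 features
y = forward(x, layers)
print("output shape:", y.shape)
```

Weights are stored with shape (inputs, outputs) so that `a @ W` works for a single sample or a whole batch, which is the same convention the manual implementation later in this lesson uses.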

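Backpropagation

When the product fails inspection:

Measure the gap between the finished product and the standard (compute the loss function)
Trace responsibility back through each station (compute the gradients of each layer)
Adjust the settings of each station (update the weights and biases)

Key points:

Gradients are computed with the chain rule
The error is propagated layer by layer from the output layer back toward the input layer
Parameters are updated by gradient descent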
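
As a worked example of the chain rule, consider the smallest possible case: a single sigmoid neuron without a bias. The squared-error loss and the symbols below (target y, prediction ŷ, learning rate η) are assumptions made for this sketch, not something the lesson prescribes.

```latex
\begin{aligned}
z &= \sum_i w_i x_i, \qquad \hat{y} = \sigma(z), \qquad L = \tfrac{1}{2}(y - \hat{y})^2 \\
\frac{\partial L}{\partial w_i}
  &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_i}
   = -(y - \hat{y}) \, \sigma(z)\bigl(1 - \sigma(z)\bigr) \, x_i \\
w_i &\leftarrow w_i - \eta \, \frac{\partial L}{\partial w_i}
   = w_i + \eta \, (y - \hat{y}) \, \sigma(z)\bigl(1 - \sigma(z)\bigr) \, x_i
\end{aligned}
```

Each factor in the product corresponds to one link in the chain: how the loss reacts to the prediction, how the prediction reacts to the weighted sum, and how the weighted sum reacts to one weight. In a deeper network the same product simply grows by one factor per layer.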

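What activation functions do

Why do we need activation functions, and what happens without them?

Without an activation function: a multi-layer network is equivalent to a single linear (affine) transformation
Common activation functions:
- Sigmoid: squashes the output into (0, 1)
- ReLU: simple and cheap to compute; mitigates the vanishing-gradient problem
- Softmax: turns the outputs into a probability distribution over classes (multi-class problems)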
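
The claim that stacked layers without an activation collapse into one linear layer can be checked numerically. This is a minimal sketch with random weights and made-up shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=5)   # layer 1: 4 -> 5
W2, b2 = rng.normal(size=(5, 3)), rng.normal(size=3)   # layer 2: 5 -> 3
x = rng.normal(size=(10, 4))                           # 10 samples, 4 features

# Two stacked layers with no activation in between
two_layers = (x @ W1 + b1) @ W2 + b2

# A single layer with merged weights and bias gives the same result
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b
print("no activation, same as one layer:", np.allclose(two_layers, one_layer))

# With ReLU in between, the merged layer no longer reproduces the output
with_relu = np.maximum(0.0, x @ W1 + b1) @ W2 + b2
print("with ReLU, same as one layer:", np.allclose(with_relu, one_layer))
```

The first check should report a match and the second should not, which is why a nonlinearity between layers is what gives depth its extra expressive power.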

Hands-on code

1. Prepare the data - the MNIST handwritten digit dataset

import tensorflow as tf
import matplotlib.pyplot as plt

# Load the data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Inspect the data shapes
print("Training set shape:", x_train.shape)  # (60000, 28, 28)
print("Label shape:", y_train.shape)         # (60000,)

# Preprocess: scale pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# Visualize one sample
plt.figure(figsize=(5,5))
plt.imshow(x_train[0], cmap='gray')
plt.title(f"Label: {y_train[0]}")
plt.axis('off')
plt.show()

2. Build the neural network model

model = tf.keras.Sequential([
    # Flatten each 28x28 image into a 784-dimensional vector
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    
    # Fully connected layer (128 neurons, ReLU activation)
    tf.keras.layers.Dense(128, activation='relu'),
    
    # Dropout layer (helps prevent overfitting)
    tf.keras.layers.Dropout(0.2),
    
    # Output layer (10 neurons for the digits 0-9, Softmax activation)
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Print the model structure
model.summary()
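
The parameter counts that `model.summary()` reports for this architecture can be reproduced by hand. The helper `dense_params` is a hypothetical name used only in this sketch.

```python
def dense_params(n_inputs, n_units):
    """A Dense layer has one weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units

flatten_out = 28 * 28                     # Flatten only reshapes: no trainable parameters
hidden = dense_params(flatten_out, 128)   # 784*128 weights + 128 biases
output = dense_params(128, 10)            # 128*10 weights + 10 biases
total = hidden + output                   # Dropout adds no parameters either

print("hidden layer:", hidden)
print("output layer:", output)
print("total:", total)
```

Almost all of the roughly one hundred thousand parameters sit in the first Dense layer, because it connects every one of the 784 pixels to every hidden neuron.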

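3. Train the model (forward + backward propagation)

# Train the model (every training step runs a forward pass and a backward pass)
# For simplicity, the test set doubles as the validation set here
history = model.fit(x_train, y_train, 
                    epochs=5,  # number of passes over the training set
                    validation_data=(x_test, y_test))

# Plot the training curves
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()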
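
Each training step ends its forward pass by computing a loss, which backpropagation then differentiates. For the model above that loss is softmax followed by sparse categorical cross-entropy. The sketch below computes it by hand for one sample with made-up scores. Keras additionally averages over the batch and clips probabilities for numerical safety, which is omitted here.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Negative log of the probability assigned to the true class index."""
    return -math.log(probs[label])

# Hypothetical raw scores for a 3-class problem; the true class is index 0
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, label=0)

print("probabilities:", [round(p, 3) for p in probs])
print("loss:", round(loss, 3))
```

"Sparse" refers only to the label format: the target is the integer class index (as in `y_train`) rather than a one-hot vector. The loss shrinks toward 0 as the probability of the true class approaches 1.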

4. Implement forward propagation by hand (to understand the principle)

import numpy as np

# A minimal neural network: 2 inputs -> 3 hidden neurons -> 1 output
def simple_network(inputs):
    # First-layer weights (2 inputs, 3 neurons)
    weights1 = np.array([[0.2, 0.8, -0.5],
                        [0.1, -0.3, 0.9]])
    bias1 = np.array([0.1, 0.2, 0.3])
    
    # Second-layer weights (3 inputs, 1 neuron)
    weights2 = np.array([[0.5], [-1.2], [0.7]])
    bias2 = np.array([0.4])
    
    # Forward pass
    layer1 = np.maximum(0, np.dot(inputs, weights1) + bias1)  # ReLU activation
    output = 1 / (1 + np.exp(-(np.dot(layer1, weights2) + bias2)))  # Sigmoid activation
    
    return output

# Test input
test_input = np.array([0.5, 0.8])
print("Network output:", simple_network(test_input))
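
To see what happens inside the network for this test input, the same computation can be unrolled with every intermediate value kept. The weights and input are the ones from the code above; the short names `W1`, `z1`, `a1`, and so on are introduced here for illustration.

```python
import numpy as np

x = np.array([0.5, 0.8])
W1 = np.array([[0.2, 0.8, -0.5],
               [0.1, -0.3, 0.9]])
b1 = np.array([0.1, 0.2, 0.3])
W2 = np.array([[0.5], [-1.2], [0.7]])
b2 = np.array([0.4])

z1 = x @ W1 + b1                  # hidden-layer weighted sums
a1 = np.maximum(0, z1)            # ReLU leaves them unchanged here: all are positive
z2 = a1 @ W2 + b2                 # output-layer weighted sum
out = 1 / (1 + np.exp(-z2))       # sigmoid squashes it into (0, 1)

print("z1 :", z1)
print("a1 :", a1)
print("z2 :", z2)
print("out:", out)
```

The hidden sums come out as roughly 0.28, 0.36, and 0.77, the output sum as roughly 0.647, and the final sigmoid output as roughly 0.656.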

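5. Implement backpropagation by hand (to understand the principle)

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# One training step for a single sigmoid neuron
def train_one_step(inputs, target, weights, learning_rate=0.1):
    # Forward pass
    net = np.dot(inputs, weights)
    output = sigmoid(net)
    
    # Error (target minus output)
    error = target - output
    
    # Backward pass (chain rule): error times the slope of the sigmoid
    delta = error * sigmoid_derivative(net)
    
    # Update the weights: each weight moves in proportion to its own input.
    # This is gradient descent on the squared error 0.5 * error**2.
    weights = weights + learning_rate * delta * inputs
    
    return weights, output

# Initial weights
weights = np.array([0.5, -0.3])

# Training data
inputs = np.array([0.8, 0.2])
target = 1.0

# Print the training progress
print("Initial weights:", weights)
for i in range(5):
    weights, output = train_one_step(inputs, target, weights)
    print(f"Step {i+1} - output: {output:.4f}, weights: {weights}")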
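
A common way to gain confidence in a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does this for the single sigmoid neuron, assuming the squared-error loss 0.5 * (target - output)^2. The function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(weights, inputs, target):
    """Squared error 0.5 * (target - output)^2 for a single sigmoid neuron."""
    return 0.5 * (target - sigmoid(np.dot(inputs, weights))) ** 2

def analytic_grad(weights, inputs, target):
    """Chain-rule gradient: -(target - output) * sigmoid'(net) * inputs."""
    out = sigmoid(np.dot(inputs, weights))
    return -(target - out) * out * (1 - out) * inputs

def numeric_grad(weights, inputs, target, eps=1e-6):
    """Central finite differences, nudging one weight at a time."""
    grad = np.zeros_like(weights)
    for i in range(len(weights)):
        step = np.zeros_like(weights)
        step[i] = eps
        grad[i] = (loss(weights + step, inputs, target)
                   - loss(weights - step, inputs, target)) / (2 * eps)
    return grad

weights = np.array([0.5, -0.3])
inputs = np.array([0.8, 0.2])
target = 1.0

g_a = analytic_grad(weights, inputs, target)
g_n = numeric_grad(weights, inputs, target)
print("analytic:", g_a)
print("numeric :", g_n)
print("match   :", np.allclose(g_a, g_n, atol=1e-7))
```

If the two gradients disagree, the chain-rule derivation or its implementation is the first place to look. The same check scales to multi-layer networks, although it becomes slow because it needs two loss evaluations per parameter.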

Complete project

Project structure

lesson_23_neural_networks/
├── README.md
├── requirements.txt
├── neural_network.py       # main program
├── manual_implementation.py # manual implementation code
├── data/                   # data files
└── output/                 # output results
    ├── training_curve.png
    └── sample_prediction.png

requirements.txt

numpy==1.21.0
matplotlib==3.4.0
tensorflow==2.8.0

neural_network.py

import os

import tensorflow as tf
import matplotlib.pyplot as plt

# Make sure the output directory exists before saving anything into it
os.makedirs('output', exist_ok=True)

# Load the data and scale pixel values to [0, 1]
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, 
                    epochs=5, 
                    validation_data=(x_test, y_test))

# Save the model
model.save('output/mnist_model.h5')

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"\nTest accuracy: {test_acc*100:.2f}%")

# Visualize the training process
plt.plot(history.history['accuracy'], label='Training accuracy')
plt.plot(history.history['val_accuracy'], label='Validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.savefig('output/training_curve.png')
plt.show()

# Example predictions on the first five test images
predictions = model.predict(x_test)
plt.figure(figsize=(10,5))
for i in range(5):
    plt.subplot(1,5,i+1)
    plt.imshow(x_test[i], cmap='gray')
    pred_label = tf.argmax(predictions[i]).numpy()
    plt.title(f"Predicted: {pred_label}")
    plt.axis('off')
plt.savefig('output/sample_prediction.png')
plt.show()

Results

Console output

Epoch 1/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2960 - accuracy: 0.9150 - val_loss: 0.1421 - val_accuracy: 0.9573
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1437 - accuracy: 0.9573 - val_loss: 0.0989 - val_accuracy: 0.9699
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1076 - accuracy: 0.9674 - val_loss: 0.0850 - val_accuracy: 0.9735
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0876 - accuracy: 0.9733 - val_loss: 0.0773 - val_accuracy: 0.9758
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0750 - accuracy: 0.9768 - val_loss: 0.0721 - val_accuracy: 0.9776

Test accuracy: 97.76%
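
The 1875 at the start of each progress line is the number of mini-batches per epoch. Each mini-batch triggers one forward pass, one backward pass, and one weight update. The arithmetic below reproduces that number, assuming the Keras default batch size of 32, which applies because `model.fit` was called without a `batch_size` argument.

```python
import math

num_train = 60000      # MNIST training images, as shown by x_train.shape
batch_size = 32        # Keras default when batch_size is not passed to model.fit
epochs = 5

steps_per_epoch = math.ceil(num_train / batch_size)   # mini-batches per epoch
total_updates = steps_per_epoch * epochs              # weight updates over the whole run

print("steps per epoch:", steps_per_epoch)
print("total weight updates:", total_updates)
```

So the five epochs in the log correspond to several thousand rounds of forward and backward propagation, not five.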