"Advanced Python in Practice" No. 44: Concurrency Frameworks: Using concurrent.futures (Part 2)

带娃的IT创业者

Licensed under CC 4.0 BY-SA

Article link: https://blog.csdn.net/yweng18/article/details/147104425

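Abstract

concurrent.futures is the Python standard library's high-level interface for concurrent programming. It schedules tasks on a thread pool (ThreadPoolExecutor) or a process pool (ProcessPoolExecutor). Part 1 covered Future objects and callbacks, and used two worked cases, concurrent API requests and parallel matrix operations, to show how a context manager cleans up executor resources. This part consolidates that material: it uses submit() and map() to parallelize an I/O-bound task (file downloads) and a CPU-bound task (image augmentation), then looks at where these tools fit in large AI model workflows.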

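Core Concepts

1. Thread pools and process pools

Thread pool (ThreadPoolExecutor): suited to I/O-bound tasks such as file reads and writes or network requests. While one thread waits on I/O, the others keep running, so the waits overlap.
Process pool (ProcessPoolExecutor): suited to CPU-bound tasks such as image processing or numerical computation. Each worker is a separate process with its own interpreter, so the work runs on multiple cores in parallel and is not serialized by the GIL.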
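
To make the thread-pool claim concrete, here is a minimal sketch that times the same simulated I/O waits serially and on a thread pool. The `fake_io` function and the delay values are assumptions chosen for illustration; `time.sleep` stands in for a real blocking call such as a network request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    """Hypothetical stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds

def run_serial(delays):
    # One wait after another: total time is roughly the sum of the delays
    return [fake_io(d) for d in delays]

def run_threaded(delays):
    # One thread per task: the waits overlap
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        return list(executor.map(fake_io, delays))

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

delays = [0.2, 0.2, 0.2, 0.2]
serial_result, serial_time = timed(run_serial, delays)
threaded_result, threaded_time = timed(run_threaded, delays)
print(f"serial:   {serial_time:.2f}s")
print(f"threaded: {threaded_time:.2f}s")
```

The same experiment with a pure-Python CPU-bound function would show little or no gain from threads on a standard CPython build, which is the case where a process pool is the better fit.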

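2. Submitting tasks and collecting results

submit(): submits a single task and returns a Future, whose result can be retrieved later with result().
map(): submits a batch of tasks and yields results in input order, which suits data-parallel workloads.
as_completed(): a module-level function that yields futures in the order they finish, useful when each result should be handled as soon as it is ready.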
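
The three calls can be compared side by side. This is an illustrative sketch: `slow_square` and the delay values are hypothetical, chosen so that the first input takes longest and the difference between input order and completion order becomes visible.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n, delay):
    """Hypothetical task: wait for `delay` seconds, then return n squared."""
    time.sleep(delay)
    return n * n

inputs = [1, 2, 3]
delays = [0.3, 0.2, 0.1]  # the first input is the slowest

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit(): one task in, one Future out; result() blocks until it is done
    future = executor.submit(slow_square, 4, 0.0)
    single_result = future.result()

    # map(): results come back in input order, regardless of which finishes first
    mapped = list(executor.map(slow_square, inputs, delays))

    # as_completed(): futures are yielded as they finish, so with these delays
    # the fastest task usually arrives first
    futures = [executor.submit(slow_square, n, d) for n, d in zip(inputs, delays)]
    completed = [f.result() for f in as_completed(futures)]

print("submit:      ", single_result)
print("map:         ", mapped)
print("as_completed:", completed)
```

The `map` list always matches the input order. The `as_completed` list depends on timing, so code that consumes it should not rely on any particular order.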

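3. Error handling and timeouts

Catch task exceptions with try-except: an exception raised inside a task is re-raised when result() is called on its Future.
Pass timeout to result(), map(), or as_completed() to bound how long the caller waits. A timeout stops the wait; it does not cancel a task that is already running.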
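
A short sketch of both points, with hypothetical task functions `risky` and `slow`. It shows an exception surfacing at `result()`, and a timeout that ends the wait while the task itself keeps running to completion.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError

def risky(n):
    """Hypothetical task that fails on negative input."""
    if n < 0:
        raise ValueError(f"negative input: {n}")
    return n * 2

def slow():
    """Hypothetical task that takes longer than the caller is willing to wait."""
    time.sleep(0.5)
    return "done"

outcomes = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(risky, 21)
    bad = executor.submit(risky, -1)
    late = executor.submit(slow)

    outcomes["ok"] = ok.result()

    try:
        bad.result()  # the ValueError raised in the worker is re-raised here
    except ValueError as exc:
        outcomes["bad"] = f"failed: {exc}"

    try:
        late.result(timeout=0.1)  # give up waiting after 0.1 seconds
    except FutureTimeoutError:
        outcomes["late"] = "timed out"
    # Leaving the with block waits for the still-running task to finish

# The timed-out task was never cancelled, so its result is available now
outcomes["late_final"] = late.result()
print(outcomes)
```

If a task really must be abandoned, `Future.cancel()` only works before the task has started; a running task needs its own cooperative stop mechanism.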

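Relevance to large AI models

Parallel hyperparameter search: during model training, run the hyperparameter combinations of a grid search in a process pool to shorten tuning.
Faster feature engineering: before distributed training, use a thread pool to run feature extraction and augmentation over a large dataset in parallel. This pays off mainly when those steps are I/O-bound or release the GIL; otherwise a process pool is the better choice.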
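
As a rough sketch of the hyperparameter-search idea, the code below evaluates a small grid in a process pool. The `evaluate` function is a hypothetical stand-in for a real train-and-validate step, and the grid values and scoring formula are invented for illustration only.

```python
import concurrent.futures
import itertools

def evaluate(params):
    """Hypothetical stand-in for training and validating one configuration.

    Returns (params, score); the score peaks at lr=0.1, depth=4 by construction.
    """
    lr, depth = params
    score = -((lr - 0.1) ** 2) - 0.01 * (depth - 4) ** 2
    return params, score

def grid_search(executor_cls=concurrent.futures.ProcessPoolExecutor, max_workers=2):
    """Score every grid point in parallel and return the best (params, score)."""
    grid = list(itertools.product([0.01, 0.1, 1.0], [2, 4, 8]))
    with executor_cls(max_workers=max_workers) as executor:
        results = list(executor.map(evaluate, grid))
    return max(results, key=lambda item: item[1])

if __name__ == "__main__":
    # The guard keeps worker processes from re-running the search on import
    best_params, best_score = grid_search()
    print("best params:", best_params)
```

The executor class is a parameter so the same function can run on a `ThreadPoolExecutor` for quick checks. In real use, `evaluate` must be defined at module level so it can be pickled and sent to the worker processes.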

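Worked Examples

Case 1: downloading model weight files in parallel with a thread pool

Scenario: download model weight files from several URLs, using multiple threads to overlap the I/O.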

import concurrent.futures
import requests

def download_file(url, filename):
    """Download a file and save it locally."""
    response = requests.get(url, timeout=30)  # avoid hanging forever on a stalled server
    response.raise_for_status()  # treat HTTP error responses as failures
    with open(filename, "wb") as f:
        f.write(response.content)
    return f"Downloaded {filename}"

urls = [
    ("https://example.com/model1.pth", "model1.pth"),
    ("https://example.com/model2.pth", "model2.pth"),
]

# Download in parallel with a thread pool
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(download_file, url, filename) for url, filename in urls]
    for future in concurrent.futures.as_completed(futures):
        try:
            print(future.result())
        except Exception as e:
            print(f"Error: {e}")

Output (the order can vary, because as_completed() yields futures as they finish):

Downloaded model1.pth
Downloaded model2.pth

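Explanation: as_completed() hands back each download as soon as it finishes, which suits independent, slow I/O tasks.

Case 2: augmenting images in parallel with a process pool

Scenario: apply augmentations such as rotation and flipping to a batch of images, using multiple processes to speed up the CPU work.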

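from PIL import Image
import concurrent.futures
import os

def augment_image(image_path):
    """Rotate and flip an image, then save the result."""
    img = Image.open(image_path)
    # Simulated augmentation steps
    img = img.rotate(45)
    img = img.transpose(Image.FLIP_LEFT_RIGHT)
    output_path = f"augmented_{os.path.basename(image_path)}"
    img.save(output_path)
    return output_path

image_paths = ["img1.jpg", "img2.jpg", "img3.jpg"]

# The __main__ guard is needed wherever worker processes are spawned
# (the default on Windows and macOS), because each worker re-imports this module.
if __name__ == "__main__":
    # Process in parallel with a process pool
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(augment_image, image_paths)
        for result in results:
            print(f"Processed: {result}")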

Output:

Processed: augmented_img1.jpg
Processed: augmented_img2.jpg
Processed: augmented_img3.jpg

Explanation: map() returns results in input order, which suits batch processing of CPU-bound tasks.
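
When a batch contains many small tasks, the `chunksize` argument of `map()` lets a process pool send items to workers in groups, which reduces inter-process overhead. The sketch below assumes a hypothetical CPU-bound function, `count_primes_below`, in place of the image work; the limits and the chunk size are arbitrary.

```python
import concurrent.futures

def count_primes_below(limit):
    """Pure-Python CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_batch(limits, executor_cls=concurrent.futures.ProcessPoolExecutor, chunksize=4):
    """Map count_primes_below over `limits`, keeping results in input order."""
    with executor_cls() as executor:
        # chunksize groups items per worker round trip in a process pool;
        # a thread pool accepts the argument but ignores it
        return list(executor.map(count_primes_below, limits, chunksize=chunksize))

if __name__ == "__main__":
    limits = [1000 * k for k in range(1, 17)]
    print(run_batch(limits))
```

A larger `chunksize` lowers overhead but makes load balancing coarser, so the best value depends on how uneven the task durations are.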

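Summary and Further Thoughts

1. Choosing between a thread pool and a process pool

Thread pool: I/O-bound tasks (such as network requests and file reads and writes).
Process pool: CPU-bound tasks (such as image processing and scientific computing).

2. Scaling up with `joblib`

For larger workloads, joblib's Parallel and delayed interface offers a worker-pool model similar to concurrent.futures. By default it runs on a single machine; spreading work across machines requires a distributed backend such as Dask. The example below uses the threading backend: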

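from joblib import Parallel, delayed

def process_data(data):
    """Placeholder for the real per-item work (hypothetical)."""
    return data * 2

dataset = range(10)  # placeholder input (hypothetical)

# Run process_data over the dataset on 4 worker threads
results = Parallel(n_jobs=4, backend="threading")(
    delayed(process_data)(data) for data in dataset
)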