[Algorithm Engineering] Building a Concurrent Asynchronous Task Processing System with FastAPI

源泉的小广场

Last modified: 2025-02-12 14:44:56

License: CC 4.0 BY-SA

Category: Algorithm Business and Data Structures. Tags: fastapi, concurrent asynchronous process pool, asyncio, ProcessPool

First published: 2025-01-16 13:04:49

Article link: https://blog.csdn.net/weixin_65514978/article/details/145179874

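1. Background

I recently ran into a concurrency problem in a FastAPI application. The service used asyncio.create_task together with asyncio.to_thread to make blocking work asynchronous, but its throughput under concurrent load was poor, and whenever the underlying implementation was not thread-safe, concurrency became essentially unusable. A different approach was needed.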

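2. Approach

We will build an asynchronous task processing system on FastAPI and ProcessPoolExecutor to run compute-intensive tasks.

Parallel processing with a process pool: concurrent.futures.ProcessPoolExecutor starts a pool of worker processes that handle the compute-intensive tasks. A process pool is used so that each task runs in a separate Python process with its own memory, rather than in a thread that shares state with other tasks. This removes the thread-safety problem with shared parameters, and it also sidesteps Python's GIL so the work can use multiple CPU cores. Threads share memory, so concurrent reads and writes of shared data can be unsafe: if several threads modify a shared parameter at the same time, a data race can occur and the result becomes unpredictable.

Asynchronous endpoint and background tasks: FastAPI handlers defined with async def process HTTP requests without blocking. The key piece of the endpoint is BackgroundTasks, which hands the time-consuming work to the background so that it does not block the HTTP response. When the platform submits a computation, the endpoint replies immediately, so the client does not wait for a long time, and the actual work proceeds in the background.

Job state management: a global dictionary, jobs, stores the status and result of every job. Each job is identified by a unique ID (UUID). The front end can query a job's status by its ID, and the same UUID can be used to find the PID of the job's process and kill a computation that is no longer needed.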
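The lost-update race described above can be reproduced deterministically. The sketch below is illustrative only and is not taken from the original service: the names counter, unsafe_increment, and run_lost_update are hypothetical, and a threading.Barrier is used to force the unlucky interleaving that normally shows up only occasionally.

```python
import threading


def run_lost_update() -> int:
    """Two threads each add 1 to a shared counter; one update is lost."""
    counter = {"value": 0}           # shared state, visible to both threads
    barrier = threading.Barrier(2)   # makes both threads read before either writes

    def unsafe_increment() -> None:
        current = counter["value"]       # step 1: read the shared value
        barrier.wait()                   # both threads now hold the same stale value
        counter["value"] = current + 1   # step 2: write back, overwriting the other update

    threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_lost_update())  # prints 1 although two increments ran
```

With separate processes, each worker has its own copy of such state, so tasks cannot interfere with each other in this way. The price is that inputs and results have to be passed between processes explicitly.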
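A minimal sketch of such a job registry could look as follows. The helper names create_job and get_job are my own assumptions for illustration; the original service only shows the jobs dictionary itself.

```python
from typing import Any, Optional
from uuid import UUID, uuid4

# Global job registry: job ID -> {"status": ..., "result": ...}
jobs: dict[UUID, dict[str, Any]] = {}


def create_job() -> UUID:
    """Register a new job in the in_progress state and return its ID."""
    uid = uuid4()
    jobs[uid] = {"status": "in_progress", "result": None}
    return uid


def get_job(raw_id: str) -> Optional[dict[str, Any]]:
    """Look up a job by the string form of its UUID; None if unknown or malformed."""
    try:
        uid = UUID(raw_id)
    except ValueError:
        return None
    return jobs.get(uid)


if __name__ == "__main__":
    uid = create_job()
    print(get_job(str(uid)))      # {'status': 'in_progress', 'result': None}
    print(get_job("not-a-uuid"))  # None
```

Because the dictionary lives in the memory of the server process, its contents are lost on restart and are not shared between multiple server worker processes. That is acceptable for a single-process deployment, but worth keeping in mind.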

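3. Core Features and Implementation

3.1 Process pool initialization and shutdown

The lifecycle of the process pool is tied to the FastAPI application:

On application startup, the pool is created in an @app.on_event("startup") handler.
On application shutdown, the pool is shut down in an @app.on_event("shutdown") handler to release its resources.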

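import logging
from concurrent.futures import ProcessPoolExecutor
from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger(__name__)  # assumed; the original logger setup is not shown

# Recent FastAPI releases deprecate on_event in favor of the lifespan parameter.
@app.on_event("startup")
async def startup_event():
    app.state.executor = ProcessPoolExecutor()  # one shared pool for the whole app
    logger.info("ProcessPoolExecutor initialized.")

@app.on_event("shutdown")
async def on_shutdown():
    app.state.executor.shutdown()  # waits for running tasks, then frees the workers
    logger.info("ProcessPoolExecutor shutdown.")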

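3.2 Running tasks asynchronously in the process pool

Asynchronous execution is implemented with the event loop's run_in_executor method:

The core is the run_in_process function, which packages the given compute function (compute_something) together with its arguments and submits them to the process pool.
functools.partial binds the function and its arguments into a single picklable callable that can be sent to the pool. It is also needed because run_in_executor forwards positional arguments only, not keyword arguments.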

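import asyncio
from functools import partial

async def run_in_process(fn, *args, **kwargs):
    loop = asyncio.get_running_loop()
    # Bind positional and keyword arguments into one callable for the pool.
    return await loop.run_in_executor(app.state.executor,
                                      partial(fn, *args, **kwargs))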
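Everything submitted to a process pool is pickled before it is sent to a worker, which is why the callable has to be serializable. The check below is a small sketch of that requirement; is_picklable is a hypothetical helper written for this illustration, and operator.mul merely stands in for a real compute function.

```python
import operator
import pickle
from functools import partial


def is_picklable(obj) -> bool:
    """Return True if obj survives pickling, which a process pool requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A partial over an importable function pickles as a reference plus its bound arguments.
bound = partial(operator.mul, 6, 7)
restored = pickle.loads(pickle.dumps(bound))

if __name__ == "__main__":
    print(is_picklable(bound))       # True
    print(restored())                # 42
    print(is_picklable(lambda: 42))  # False: a lambda has no importable name
```

In practice this means the compute function should be defined at module level, and its arguments and return value should be plain picklable data.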

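After running this for a while, a problem became apparent: a task that exceeds its timeout cannot be killed directly, because a call that is already running in a pool worker cannot be cancelled. The implementation was therefore changed to start a dedicated child process per task, as shown below: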

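import asyncio
import multiprocessing
from functools import partial
from queue import Empty  # raised by Queue.get when its timeout expires

async def run_in_process(fn, *args, **kwargs):
    """Run fn in a dedicated child process and terminate it if it exceeds the timeout."""
    timeout = int(settings.SERVICE["time_out"])
    loop = asyncio.get_running_loop()

    # process_wrapper (not shown in the original) runs fn and puts its result on the queue.
    result_queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=process_wrapper, args=(fn, result_queue, *args), kwargs=kwargs)
    process.start()

    try:
        # Queue.get blocks, so it runs in the default thread pool. The timeout is passed to
        # get() itself so the helper thread is released instead of blocking forever.
        return await loop.run_in_executor(None, partial(result_queue.get, True, timeout))
    except Empty:
        logger.error(f"Task execution exceeded timeout of {timeout} seconds. Terminating process...")
        raise TimeoutError(f"Task execution exceeded timeout of {timeout} seconds.")
    finally:
        # Runs on success, timeout, and cancellation: force-kill a child that is still alive.
        if process.is_alive():
            process.terminate()
            process.join()  # reap the child so its resources are released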
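The process_wrapper function used as the child-process target is not shown in this article. One possible shape is sketched below as an assumption, not as the original implementation. Note that this version puts a (status, payload) tuple on the queue, so run_in_process would have to unpack the item, for example with the hypothetical unwrap helper, instead of returning it directly.

```python
import operator
import queue


def process_wrapper(fn, result_queue, *args, **kwargs):
    """Hypothetical child-process entry point: run fn and report the outcome via the queue."""
    try:
        result_queue.put(("ok", fn(*args, **kwargs)))
    except Exception as ex:
        # Send a string rather than the exception object so the payload always pickles.
        result_queue.put(("error", f"{type(ex).__name__}: {ex}"))


def unwrap(item):
    """Parent-side helper: return the result or re-raise the reported failure."""
    status, payload = item
    if status == "error":
        raise RuntimeError(payload)
    return payload


if __name__ == "__main__":
    # queue.Queue stands in for multiprocessing.Queue; both expose put() and get().
    q = queue.Queue()
    process_wrapper(operator.add, q, 1, 2)
    print(unwrap(q.get()))  # 3
```

Reporting failures through the queue matters here: if the child simply died on an exception without putting anything, the parent would wait for the full timeout before noticing.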

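3.3 Background task handling

Background tasks use BackgroundTasks, which FastAPI provides:

The compute_something_async endpoint starts the job with BackgroundTasks.add_task and passes start_compute_something_task as the function to run in the background.
Each job runs independently in the background and does not block the HTTP response.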

@app.post("/compute_something/async", tags=["compute"], summary="Asynchronous endpoint")
async def compute_something_async(
        background_tasks: BackgroundTasks,
        example_parameter: str = Form(description="Example parameter",
                                      default="xxx")):
    uid = uuid4()
    jobs[uid] = {"status": "in_progress", "result": None}

    # Runs after the response is sent; keyword arguments are forwarded to the task function.
    background_tasks.add_task(
        start_compute_something_task,
        uid,
        example_parameter=example_parameter)
    # Note: uid is not part of this response as written; the client needs it to query the job later.
    return Response(code=200, status="success", message="job created")

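3.4 Task execution and status updates

The execution logic of the background task is implemented in start_compute_something_task:

When the job starts, its status is marked "in_progress".
run_in_process is called to run the computation in a separate process.
When the job finishes, its status is updated to "complete" and the result is saved; if it fails, the status is updated to "failed" and the error message is recorded.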

async def start_compute_something_task(uid: UUID, **kwargs):
    try:
        # Awaiting here keeps the event loop free while the other process computes.
        result = await run_in_process(example_api.real_compute_something, **kwargs)
        jobs[uid]["result"] = result
        jobs[uid]["status"] = "complete"
    except Exception as ex:
        # Covers both errors and timeouts raised by run_in_process.
        jobs[uid]["status"] = "failed"
        jobs[uid]["result"] = {"error": str(ex)}
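The status transitions can be tried out without FastAPI or a process pool. In the following sketch, asyncio.to_thread stands in for run_in_process to keep the example small, and run_job and submit are hypothetical names; only the shape of the jobs entries follows the article.

```python
import asyncio
from uuid import uuid4

jobs = {}


async def run_job(uid, compute, *args):
    """Await compute(*args) off the event loop and record the outcome in jobs."""
    try:
        # Stand-in for run_in_process: any awaitable that returns the result works here.
        result = await asyncio.to_thread(compute, *args)
        jobs[uid].update(status="complete", result=result)
    except Exception as ex:
        jobs[uid].update(status="failed", result={"error": str(ex)})


async def submit(compute, *args):
    """Register a job, run it to completion, and return its final entry."""
    uid = uuid4()
    jobs[uid] = {"status": "in_progress", "result": None}
    await run_job(uid, compute, *args)
    return jobs[uid]


if __name__ == "__main__":
    print(asyncio.run(submit(sum, [1, 2, 3])))       # status "complete", result 6
    print(asyncio.run(submit(int, "not a number")))  # status "failed" with an error message
```

Catching the exception inside the background task is what keeps a failed computation from disappearing silently: the client sees "failed" together with the error text on its next status query.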

3.5 Maximum job count and timeout

The maximum number of concurrent jobs can also be configured:

@app.on_event("startup")
async def startup_event():
    app.state.executor = ProcessPoolExecutor(max_workers=MAX_JOB_NUMBER)

A timeout can be set as well:

async def run_in_process(fn, *args, timeout: int = 600, **kwargs):
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(app.state.executor, partial(fn, *args, **kwargs))
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError:  # same class as the built-in TimeoutError on Python 3.11+
        # Only the wait is abandoned here; the pool worker keeps running the task (see 3.2).
        logger.error(f"Task timed out after {timeout} seconds.")
        raise Exception(f"Task execution exceeded timeout of {timeout} seconds.")
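The limitation of this pool-based timeout, which motivated the child-process version in 3.2, can be observed directly. The sketch below is an illustration under one stated assumption: a ThreadPoolExecutor is used as a lightweight stand-in, since a call that is already running cannot be cancelled in either kind of executor. The names slow_task and demo are hypothetical.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(done: threading.Event, seconds: float) -> str:
    """Simulated long computation that records when it actually finishes."""
    time.sleep(seconds)
    done.set()
    return "finished"


async def demo() -> tuple[bool, bool]:
    """Return (timed_out, task_still_completed)."""
    done = threading.Event()
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = loop.run_in_executor(pool, slow_task, done, 0.3)
        try:
            await asyncio.wait_for(future, timeout=0.05)
            timed_out = False
        except asyncio.TimeoutError:
            timed_out = True
        # Leaving the with block waits for the worker, which was never interrupted.
    return timed_out, done.is_set()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (True, True): the caller timed out, the work still ran
```

The caller gets its timeout error, but the worker stays busy until the task finishes on its own, so a pool of max_workers slots can fill up with abandoned tasks.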