Flink / Scala 实战 - 20.keyBy 后 window 数据倾斜实战

本文深入探讨了Flink中keyBy操作后数据倾斜的实战问题,通过模拟数据倾斜任务,分析倾斜原因,并提出负载均衡的解决方案,包括调整数据源的分布、增加key的范围、添加随机值等策略,以实现窗口数据的均匀分配。文章还提供了本地任务与WebUI监控的实现方法,帮助开发者更好地理解和处理数据倾斜问题。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

目录

一.引言

二.Flink 本地任务 + 本地 WebUI 监控实现

1.本地 WebUI 监控实现

2.本地模拟数据倾斜任务实现

2.1 数据源 Source

2.2 keyedStream + Window

2.3 WebUI 启动

三.数据倾斜实战

1.构建倾斜数据

2.keyBy 倾斜分析

3.负载均衡思路

4.负载均衡实践

5.更常见的倾斜处理情况

6.思路拓展

四.总结


一.引言

上一篇文章介绍了 DataStream keyBy 生成 KeyedStream 的原理,这篇文章趁热打铁解决一下实战场景下经常遇到的数据倾斜示例,对于原始数据源就不均匀的情况,通过 reblacne 即可解决,本文主要解决 keyBy 后使用 window 的数据倾斜场景并给出更加通用的方法判断自己数据的倾斜程度。

二.Flink 本地任务 + 本地 WebUI 监控实现

为了方便任务观察 window 下 subtask 的处理数据量,我们采用 Flink 本地提交 + URL 的模式,如果在实战中想要查看官方的统计监控,也可以参考 Flink/Scala Metrics 使用与详解,这样具体的数据在 Flink 监控 UI 中就可以看到。

networks:  net:    external: trueservices:  jobmanager1:    restart: always    image: apache/flink:1.16.3    container_name: jobmanager1    hostname: jobmanager1    ports:     - '8081:8081'    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-jobmanager1.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net  jobmanager2:    restart: always    image: apache/flink:1.16.3    container_name: jobmanager2    hostname: jobmanager2    ports:     - '8082:8081'    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-jobmanager2.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1  taskmanager1:    restart: always    image: apache/flink:1.16.3    container_name: taskmanager1    hostname: taskmanager1    command: taskmanager    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-taskmanager1.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1      - jobmanager2  taskmanager2:    restart: always    image: apache/flink:1.16.3    container_name: taskmanager2    hostname: taskmanager2    command: taskmanager    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-taskmanager2.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1      - jobmanager2  taskmanager3:    restart: always    image: apache/flink:1.16.3    container_name: taskmanager3    hostname: taskmanager3    command: taskmanager    volumes:      - /etc/localtime:/etc/localtime      - /home/sumengnan/apache/flink/timezone:/etc/timezone      - /home/sumengnan/apache/flink/conf/flink-conf-taskmanager3.yaml:/opt/flink/conf/flink-conf.yaml      - /home/sumengnan/apache/flink/conf/log4j.properties:/opt/flink/conf/log4j.properties      - /home/sumengnan/apache/flink/conf/logback.xml:/opt/flink/conf/logback.xml      - /home/sumengnan/apache/flink/conf/log4j-console.properties:/opt/flink/conf/log4j-console.properties      - /home/sumengnan/apache/flink/conf/logback-console.xml:/opt/flink/conf/logback-console.xml      - /home/sumengnan/apache/flink/data:/opt/flink/data      - /home/sumengnan/apache/flink/log:/opt/flink/log      - /home/sumengnan/apache/flink/tmp:/opt/flink/tmp    networks:      - net    depenes_on:      - jobmanager1      - jobmanager2
最新发布
04-04
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

BIT_666

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值