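1. How many join implementations are there? Have you studied the source code, and how are they implemented under the hood?
There are seven join types: ① inner join, ② Cartesian (cross) join, ③ left join, ④ right join, ⑤ full join, ⑥ left semi join (IN), ⑦ left anti join (NOT IN).
The ANSI SQL standard defines five join types: Inner, FullOuter, LeftOuter, RightOuter, and Cross.
There are five join implementations: ① broadcast hash, ② shuffle hash, ③ shuffle sort merge, ④ Cartesian product, ⑤ broadcast nested loop.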
Join strategies
During physical planning, Spark's JoinSelection strategy picks the final join strategy based on the join hints, the sizes of the joined tables, whether the join is an equi-join or a non-equi-join, and whether the join keys are sortable. Spark then runs the computation with the chosen strategy. The current version (Apache Spark 3.0) supports five join strategies:
- Broadcast hash join (BHJ)
- Shuffle hash join(SHJ)
- Shuffle sort merge join (SMJ)
- Shuffle-and-replicate nested loop join, also called Cartesian product join
- Broadcast nested loop join (BNLJ)
1. Broadcast Hash Join (BHJ)
BHJ, also called a map-side-only join, suits joining a small table with a large one: the small table is broadcast to every executor and joined locally there, which avoids a shuffle.
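This strategy can only be used when all of the following hold:
- The small table must be small enough. The limit is set by spark.sql.autoBroadcastJoinThreshold, which defaults to 10MB; if executors have plenty of memory, the threshold can be raised. Setting spark.sql.autoBroadcastJoinThreshold to -1 disables automatic BHJ selection;
- Only equi-joins are supported, and the join keys do not need to be sortable;
- All join types except full outer joins are supported, although which side may be broadcast depends on the join type (for example, a left outer join can only broadcast its right side).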
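The build/probe idea behind BHJ can be sketched with plain Scala collections. This is a minimal in-memory sketch, not Spark's BroadcastHashJoinExec: the object name BroadcastHashJoinSketch is hypothetical, a Seq of Seqs stands in for the large table's partitions, and "broadcasting" is modeled as every partition reading the same immutable Map.

```scala
// Hypothetical sketch of a broadcast hash join (inner join only).
// Plain collections stand in for Spark partitions; no Spark API is used.
object BroadcastHashJoinSketch {
  // Build phase: hash the small (build) side once, grouping values by join key.
  def buildHashMap[K, V](small: Seq[(K, V)]): Map[K, Seq[V]] =
    small.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }

  // Probe phase: each partition of the large (stream) side looks up its keys
  // in the shared map locally, so the large side is never shuffled.
  def innerJoin[K, A, B](
      largePartitions: Seq[Seq[(K, A)]],
      small: Seq[(K, B)]): Seq[(K, A, B)] = {
    val broadcastMap = buildHashMap(small) // built once, read by every partition
    largePartitions.flatMap { partition =>
      partition.flatMap { case (k, a) =>
        broadcastMap.getOrElse(k, Seq.empty[B]).map(b => (k, a, b))
      }
    }
  }
}
```

Only the small side is hashed and the large side stays where it is, which is the point of avoiding the shuffle; it is also why the small side must fit in each executor's memory.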
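2. Shuffle hash join (SHJ)
When a table is larger than spark.sql.autoBroadcastJoinThreshold and therefore unsuitable for broadcasting, Shuffle hash join becomes a candidate.
Like BHJ, Shuffle hash join is a strategy for joining a large table with a smaller one. Both tables are partitioned on the join keys with the same partitioning function and the same number of partitions, so rows with the same hash value land in the same partition. Corresponding partitions of the two tables can then be hash-joined locally within one executor. Before the join, a hash map is built from each partition of the small table. Shuffle hash join applies divide and conquer: one large join is broken into many small ones.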
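A minimal sketch of the shuffle-then-hash-join idea, again with plain Scala collections. The partitioner here (non-negative hashCode modulo the partition count) is a simplified stand-in for Spark's own hash partitioning, and all names are hypothetical.

```scala
// Hypothetical sketch of a shuffle hash join (inner join only).
object ShuffleHashJoinSketch {
  // Same partitioning function for both sides: non-negative hash modulo n.
  def partitionOf[K](key: K, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // "Shuffle": route every row to the partition chosen by its join key.
  def shuffle[K, V](rows: Seq[(K, V)], numPartitions: Int): Vector[Seq[(K, V)]] = {
    val byPartition = rows.groupBy { case (k, _) => partitionOf(k, numPartitions) }
    Vector.tabulate(numPartitions)(p => byPartition.getOrElse(p, Seq.empty[(K, V)]))
  }

  def innerJoin[K, A, B](big: Seq[(K, A)], small: Seq[(K, B)], numPartitions: Int): Seq[(K, A, B)] = {
    val bigParts = shuffle(big, numPartitions)
    val smallParts = shuffle(small, numPartitions)
    // Matching keys now share a partition index, so each pair is joined independently.
    (0 until numPartitions).flatMap { p =>
      // Build a hash map from the small side's partition only.
      val hashMap: Map[K, Seq[B]] =
        smallParts(p).groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
      // Probe it with the big side's rows from the same partition.
      bigParts(p).flatMap { case (k, a) =>
        hashMap.getOrElse(k, Seq.empty[B]).map(b => (k, a, b))
      }
    }
  }
}
```

Because both sides use the same partitionOf function, a key can only match rows in the same partition index, so each partition pair is an independent, smaller hash join whose hash map only has to hold one partition of the small table.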
Shuffle Hash Join is only chosen when all of the following hold:
- Only equi-joins are supported, and the join keys do not need to be sortable;
- All join types except full outer join are supported;
- spark.sql.join.preferSortMergeJoin must be set to false. This parameter was introduced in Spark 2.0.0 and defaults to true, so Sort Merge Join is chosen by default;
- The small table's size (plan.stats.sizeInBytes) must be less than spark.sql.autoBroadcastJoinThreshold * spark.sql.shuffle.partitions, and three times the small table's size must not exceed the large table's size, i.e. a.stats.sizeInBytes * 3 <= b.stats.sizeInBytes.
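The two size checks above can be restated as plain arithmetic. The sketch below assumes the default 10 MB spark.sql.autoBroadcastJoinThreshold and 200 spark.sql.shuffle.partitions and works on raw byte counts rather than Spark's plan statistics; the object and method names only mirror the conditions described above and are not Spark's API.

```scala
// Hypothetical restatement of the SHJ size conditions on raw byte counts.
object ShuffleHashJoinEligibility {
  // Assumed defaults: 10 MB broadcast threshold, 200 shuffle partitions.
  val autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024
  val numShufflePartitions: Int = 200

  // Each of the N partitions should, on average, stay under the broadcast
  // threshold so that its local hash map fits in memory.
  def canBuildLocalHashMap(sizeInBytes: Long): Boolean =
    sizeInBytes < autoBroadcastJoinThreshold * numShufflePartitions

  // The build side must be at least 3x smaller than the other side.
  def muchSmaller(a: Long, b: Long): Boolean = a * 3 <= b

  def eligible(small: Long, large: Long, preferSortMergeJoin: Boolean): Boolean =
    !preferSortMergeJoin && canBuildLocalHashMap(small) && muchSmaller(small, large)
}
```

With these defaults the local-hash-map limit is 10 MB × 200 = 2000 MB. A 500 MB build side therefore qualifies against a 5000 MB stream side, but not against a 1000 MB one (500 × 3 > 1000), and a 3000 MB build side never qualifies.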
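3. Shuffle sort merge join (SMJ)
Both of the previous strategies impose size limits on the tables. When both tables are large, Shuffle Sort Merge Join is the strategy to use.
How Shuffle Sort Merge Join works: both tables are again partitioned on their join keys with the same partitioning function and number of partitions, so that equal keys end up in the same partition. Each partition is then sorted by the join keys. Finally, the reduce side reads the same partition of both tables and performs a merge join: rows whose keys are equal are joined.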
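The sort-then-merge step for one partition pair can be sketched as follows. It assumes both inputs are already co-partitioned (the shuffle itself is omitted) and uses a Scala Ordering for the keys, which is exactly why SMJ requires sortable join keys. The object name is hypothetical and this is not Spark's SortMergeJoinExec.

```scala
// Hypothetical sketch of a sort-merge inner join over one co-partitioned pair.
object SortMergeJoinSketch {
  def innerJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)])(implicit ord: Ordering[K]): Seq[(K, A, B)] = {
    // Sort phase: both sides ordered by join key (needs an Ordering on K).
    val l = left.sortBy(_._1).toVector
    val r = right.sortBy(_._1).toVector
    val out = Vector.newBuilder[(K, A, B)]
    var i = 0
    var j = 0
    // Merge phase: advance two cursors in key order.
    while (i < l.length && j < r.length) {
      val c = ord.compare(l(i)._1, r(j)._1)
      if (c < 0) i += 1      // left key is smaller: advance left
      else if (c > 0) j += 1 // right key is smaller: advance right
      else {
        // Equal keys: find the run of this key on each side and emit their cross product.
        val key = l(i)._1
        val iEnd = l.indexWhere(row => !ord.equiv(row._1, key), i) match { case -1 => l.length; case n => n }
        val jEnd = r.indexWhere(row => !ord.equiv(row._1, key), j) match { case -1 => r.length; case n => n }
        for (x <- i until iEnd; y <- j until jEnd) out += ((key, l(x)._2, r(y)._2))
        i = iEnd
        j = jEnd
      }
    }
    out.result()
  }
}
```

Runs of equal keys on both sides produce their cross product, which is how duplicate keys are handled; unlike the hash-based strategies, no hash map is built.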
Shuffle Sort Merge Join is not always applicable either; it requires:
- Only equi-joins are supported, and the join keys must be sortable;
- All join types are supported.
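4. Cartesian product join
If two tables are joined without a join condition (no ON or WHERE predicate linking them), a Cartesian product join is produced; its result has as many rows as the product of the two tables' row counts.
Spark implements it as a shuffle-and-replicate nested loop join: every partition of the left side is paired with every partition of the right side, and each pair is joined with a nested loop that evaluates the join condition, if any.
Because the output before filtering is the full product of the two tables, this strategy is extremely expensive in both computation and memory.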
Applicable conditions:
- Only inner (inner-like) joins;
- Both equi-joins and non-equi-joins are supported.
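A sketch of the shuffle-and-replicate pattern with a caller-supplied condition, which can be an equi or a non-equi predicate. It assumes in-memory partitions; the names are hypothetical and Spark's CartesianProductExec is considerably more involved.

```scala
// Hypothetical sketch of a Cartesian product join over partitions.
object CartesianProductSketch {
  // Pair every left partition with every right partition (m x n pairs),
  // then run a nested loop inside each pair, keeping rows that pass `condition`.
  def join[A, B](
      leftPartitions: Seq[Seq[A]],
      rightPartitions: Seq[Seq[B]],
      condition: (A, B) => Boolean): Seq[(A, B)] =
    for {
      lp <- leftPartitions
      rp <- rightPartitions // each right partition is "replicated" to every left partition
      a <- lp
      b <- rp
      if condition(a, b)
    } yield (a, b)
}
```

With m left partitions and n right partitions there are m × n partition pairs, and without a condition the output contains every combination of rows.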
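5. Broadcast nested loop join (BNLJ)
This is the plainest join implementation, with no special optimizations: one dataset is broadcast to every executor, and for each row of the other side, all broadcast rows are scanned in a nested loop to evaluate the join condition. It is very slow, and because it is the last-resort strategy, a side is broadcast regardless of its size, so memory demands can be very high.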
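Applicable conditions:
- Both equi-joins and non-equi-joins are supported;
- All join types are supported, but the implementation is optimized for: ① broadcasting the left side in a right outer join; ② broadcasting the right side in a left outer, left semi, left anti or existence join; ③ broadcasting either side in an inner join. Other cases require scanning the data multiple times, which is very slow.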
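A sketch of one of the optimized BNLJ cases listed above: a left outer join that broadcasts the right side. It assumes in-memory collections, models SQL NULL padding with Option, and uses a non-equi (range) condition to show that no join keys are needed; the names are hypothetical.

```scala
// Hypothetical sketch of a broadcast nested loop left outer join (right side broadcast).
object BroadcastNestedLoopJoinSketch {
  def leftOuterJoin[A, B](
      leftPartitions: Seq[Seq[A]],
      broadcastRight: Seq[B],
      condition: (A, B) => Boolean): Seq[(A, Option[B])] =
    leftPartitions.flatMap { partition =>
      partition.flatMap { a =>
        // Nested loop: every streamed row scans the whole broadcast side.
        val matches = broadcastRight.filter(b => condition(a, b))
        // Unmatched left rows are kept and padded with None (SQL NULL).
        if (matches.isEmpty) Seq((a, Option.empty[B]))
        else matches.map(b => (a, Some(b): Option[B]))
      }
    }
}
```

The cost is proportional to the product of the two row counts, which is why Spark only falls back to this strategy when nothing else applies.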
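Selection logic
For an equi-join, Spark first looks at the join hints, in this order:
- broadcast hint: picks BHJ if the join type is supported; if both sides carry the hint, the smaller side (based on statistics) is broadcast;
- sort merge hint: picks SMJ if the join keys are sortable;
- shuffle hash hint: picks SHJ if the join type is supported; if both sides carry the hint, the smaller side (based on statistics) is used as the build side;
- shuffle replicate NL hint: picks Cartesian product join, which only applies to inner joins.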
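If there is no hint, or the hints are not applicable, the following rules are tried one by one:
- Broadcast hash join: chosen if one side is small enough to broadcast and the join type is supported; if both sides are small enough, the smaller one is broadcast;
- Shuffle hash join: chosen if one side is small enough to build a local hash map and much smaller than the other side, and spark.sql.join.preferSortMergeJoin is false; a local hash map is then built from the smaller side;
- Shuffle sort merge join: chosen if the join keys are sortable;
- Cartesian product join: chosen if the join type is inner;
- Broadcast nested loop join is the final fallback. It may OOM, but there is no other choice.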
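The no-hint cascade for equi-joins can be modeled with Option.orElse, the same chaining style used by createJoinWithoutHint in the source quoted below. This is a simplified sketch under stated assumptions: only Inner, LeftOuter, RightOuter and FullOuter are modeled (no cross, semi, anti or existence joins), sizes are plain byte counts instead of plan statistics, join hints are ignored, and the Conf class with its defaults is hypothetical.

```scala
// Hypothetical model of the equi-join selection order when no hint applies.
object JoinSelectionSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  // Hypothetical settings; Spark reads the real values from SQLConf.
  final case class Conf(
      autoBroadcastJoinThreshold: Long = 10L * 1024 * 1024,
      numShufflePartitions: Int = 200,
      preferSortMergeJoin: Boolean = true)

  // Which side may be the build/broadcast side for a given join type.
  def canBuildLeft(t: JoinType): Boolean = t == Inner || t == RightOuter
  def canBuildRight(t: JoinType): Boolean = t == Inner || t == LeftOuter

  // If both sides qualify, pick the smaller one (right wins ties).
  private def buildSide(wantLeft: Boolean, wantRight: Boolean, l: Long, r: Long): Option[BuildSide] =
    if (wantLeft && wantRight) Some(if (r <= l) BuildRight else BuildLeft)
    else if (wantLeft) Some(BuildLeft)
    else if (wantRight) Some(BuildRight)
    else None

  // Try each strategy in order; the first one that applies wins.
  def select(t: JoinType, leftSize: Long, rightSize: Long, keysSortable: Boolean, conf: Conf): String = {
    def canBroadcast(s: Long) = s >= 0 && s <= conf.autoBroadcastJoinThreshold
    def canBuildLocalHashMap(s: Long) = s < conf.autoBroadcastJoinThreshold * conf.numShufflePartitions
    def muchSmaller(a: Long, b: Long) = a * 3 <= b

    buildSide(canBuildLeft(t) && canBroadcast(leftSize), canBuildRight(t) && canBroadcast(rightSize), leftSize, rightSize)
      .map(side => s"BroadcastHashJoin($side)")
      .orElse {
        if (conf.preferSortMergeJoin) None
        else buildSide(
          canBuildLeft(t) && canBuildLocalHashMap(leftSize) && muchSmaller(leftSize, rightSize),
          canBuildRight(t) && canBuildLocalHashMap(rightSize) && muchSmaller(rightSize, leftSize),
          leftSize, rightSize).map(side => s"ShuffledHashJoin($side)")
      }
      .orElse(if (keysSortable) Some("SortMergeJoin") else None)
      .orElse(if (t == Inner) Some("CartesianProduct") else None)
      .getOrElse("BroadcastNestedLoopJoin") // last resort, may OOM
  }
}
```

For example, with a 5 MB left table and a 1000 MB right table, an inner join picks BroadcastHashJoin(BuildLeft), but a left outer join falls through to SortMergeJoin, because a left outer join can only broadcast its right side.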
Spark 3.0.0 source code (JoinSelection in SparkStrategies.scala):
/**
* Select the proper physical plan for join based on join strategy hints, the availability of
* equi-join keys and the sizes of joining relations. Below are the existing join strategies,
* their characteristics and their limitations.
*
* - Broadcast hash join (BHJ):
* Only supported for equi-joins, while the join keys do not need to be sortable.
* Supported for all join types except full outer joins.
* BHJ usually performs faster than the other join algorithms when the broadcast side is
* small. However, broadcasting tables is a network-intensive operation and it could cause
* OOM or perform badly in some cases, especially when the build/broadcast side is big.
*
* - Shuffle hash join:
* Only supported for equi-joins, while the join keys do not need to be sortable.
* Supported for all join types except full outer joins.
*
* - Shuffle sort merge join (SMJ):
* Only supported for equi-joins and the join keys have to be sortable.
* Supported for all join types.
*
* - Broadcast nested loop join (BNLJ):
* Supports both equi-joins and non-equi-joins.
* Supports all the join types, but the implementation is optimized for:
* 1) broadcasting the left side in a right outer join;
* 2) broadcasting the right side in a left outer, left semi, left anti or existence join;
* 3) broadcasting either side in an inner-like join.
* For other cases, we need to scan the data multiple times, which can be rather slow.
*
* - Shuffle-and-replicate nested loop join (a.k.a. cartesian product join):
* Supports both equi-joins and non-equi-joins.
* Supports only inner like joins.
*/
object JoinSelection extends Strategy with PredicateHelper {
def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
// If it is an equi-join, we first look at the join hints w.r.t. the following order:
// 1. broadcast hint: pick broadcast hash join if the join type is supported. If both sides
// have the broadcast hints, choose the smaller side (based on stats) to broadcast.
// 2. sort merge hint: pick sort merge join if join keys are sortable.
// 3. shuffle hash hint: We pick shuffle hash join if the join type is supported. If both
// sides have the shuffle hash hints, choose the smaller side (based on stats) as the
// build side.
// 4. shuffle replicate NL hint: pick cartesian product if join type is inner like.
//
// If there is no hint or the hints are not applicable, we follow these rules one by one:
// 1. Pick broadcast hash join if one side is small enough to broadcast, and the join type
// is supported. If both sides are small, choose the smaller side (based on stats)
// to broadcast.
// 2. Pick shuffle hash join if one side is small enough to build local hash map, and is
// much smaller than the other side, and `spark.sql.join.preferSortMergeJoin` is false.
// 3. Pick sort merge join if the join keys are sortable.
// 4. Pick cartesian product if join type is inner like.
// 5. Pick broadcast nested loop join as the final solution. It may OOM but we don't have
// other choice.
case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right, hint) =>
def createBroadcastHashJoin(buildLeft: Boolean, buildRight: Boolean) = {
val wantToBuildLeft = canBuildLeft(joinType) && buildLeft
val wantToBuildRight = canBuildRight(joinType) && buildRight
getBuildSide(wantToBuildLeft, wantToBuildRight, left, right).map { buildSide =>
Seq(joins.BroadcastHashJoinExec(
leftKeys,
rightKeys,
joinType,
buildSide,
condition,
planLater(left),
planLater(right)))
}
}
def createShuffleHashJoin(buildLeft: Boolean, buildRight: Boolean) = {
val wantToBuildLeft = canBuildLeft(joinType) && buildLeft
val wantToBuildRight = canBuildRight(joinType) && buildRight
getBuildSide(wantToBuildLeft, wantToBuildRight, left, right).map { buildSide =>
Seq(joins.ShuffledHashJoinExec(
leftKeys,
rightKeys,
joinType,
buildSide,
condition,
planLater(left),
planLater(right)))
}
}
def createSortMergeJoin() = {
if (RowOrdering.isOrderable(leftKeys)) {
Some(Seq(joins.SortMergeJoinExec(
leftKeys, rightKeys, joinType, condition, planLater(left), planLater(right))))
} else {
None
}
}
def createCartesianProduct() = {
if (joinType.isInstanceOf[InnerLike]) {
Some(Seq(joins.CartesianProductExec(planLater(left), planLater(right), condition)))
} else {
None
}
}
def createJoinWithoutHint() = {
createBroadcastHashJoin(
canBroadcast(left) && !hint.leftHint.exists(_.strategy.contains(NO_BROADCAST_HASH)),
canBroadcast(right) && !hint.rightHint.exists(_.strategy.contains(NO_BROADCAST_HASH)))
.orElse {
if (!conf.preferSortMergeJoin) {
createShuffleHashJoin(
canBuildLocalHashMap(left) && muchSmaller(left, right),
canBuildLocalHashMap(right) && muchSmaller(right, left))
} else {
None
}
}
.orElse(createSortMergeJoin())
.orElse(createCartesianProduct())
.getOrElse {
// This join could be very slow or OOM
val buildSide = getSmallerSide(left, right)
Seq(joins.BroadcastNestedLoopJoinExec(
planLater(left), planLater(right), buildSide, joinType, condition))
}
}
createBroadcastHashJoin(hintToBroadcastLeft(hint), hintToBroadcastRight(hint))
.orElse { if (hintToSortMergeJoin(hint)) createSortMergeJoin() else None }
.orElse(createShuffleHashJoin(hintToShuffleHashLeft(hint), hintToShuffleHashRight(hint)))
.orElse { if (hintToShuffleReplicateNL(hint)) createCartesianProduct() else None }
.getOrElse(createJoinWithoutHint())
// If it is not an equi-join, we first look at the join hints w.r.t. the following order:
// 1. broadcast hint: pick broadcast nested loop join. If both sides have the broadcast
// hints, choose the smaller side (based on stats) to broadcast for inner and full joins,
// choose the left side for right join, and choose right side for left join.
// 2. shuffle replicate NL hint: pick cartesian product if join type is inner like.
//
// If there is no hint or the hints are not applicable, we follow these rules one by one:
// 1. Pick broadcast nested loop join if one side is small enough to broadcast. If only left
// side is broadcast-able and it's left join, or only right side is broadcast-able and
// it's right join, we skip this rule. If both sides are small, broadcasts the smaller
// side for inner and full joins, broadcasts the left side for right join, and broadcasts
// right side for left join.
// 2. Pick cartesian product if join type is inner like.
// 3. Pick broadcast nested loop join as the final solution. It may OOM but we don't have
// other choice. It broadcasts the smaller side for inner and full joins, broadcasts the
// left side for right join, and broadcasts right side for left join.
case logical.Join(left, right, joinType, condition, hint) =>
val desiredBuildSide = if (joinType.isInstanceOf[InnerLike] || joinType == FullOuter) {
getSmallerSide(left, right)
} else {
// For perf reasons, `BroadcastNestedLoopJoinExec` prefers to broadcast left side if
// it's a right join, and broadcast right side if it's a left join.
// TODO: revisit it. If left side is much smaller than the right side, it may be better
// to broadcast the left side even if it's a left join.
if (canBuildLeft(joinType)) BuildLeft else BuildRight
}
def createBroadcastNLJoin(buildLeft: Boolean, buildRight: Boolean) =