详细解释 Calcite RBO 规则优化引擎
最编程
2024-04-03 09:21:50
...
【开源中国 APP 全新上线】“动弹” 回归、集成大模型对话、畅读技术报告”
SQL优化通常包括两个部分RBO(Rule based Optimizer)和CBO(Cost based Optimizer),Calcite 引擎支持以上两种模式的优化,本节讲述Calcite是如何基于RBO进行优化的。
RelOptRule 规则基类
RelOptRule是Calcite规则引擎中的规则基类,它可以将原有的SQL表达式转换成优化后的SQL表达式,优化器能够识别这些规则并调用。
package org.apache.calcite.plan;
public abstract class RelOptRule {
/**
* 规则描述,默认为类的全路径
*/
protected final String description;
/**
* 树表达式的根.
*/
private final RelOptRuleOperand operand;
/**
* 通过工厂构建响应的规则表达式
*/
public final RelBuilderFactory relBuilderFactory;
/**
* 规则表达式列表
*/
public final List<RelOptRuleOperand> operands;
/**
* 构建每个规则的处理顺序,从当前节点开始一直到根节点
*/
private void assignSolveOrder() {
for (RelOptRuleOperand operand : operands) {
operand.solveOrder = new int[operands.size()];
int m = 0;
for (RelOptRuleOperand o = operand; o != null; o = o.getParent()) {
operand.solveOrder[m++] = o.ordinalInRule;
}
for (int k = 0; k < operands.size(); k++) {
boolean exists = false;
for (int n = 0; n < m; n++) {
if (operand.solveOrder[n] == k) {
exists = true;
}
}
if (!exists) {
operand.solveOrder[m++] = k;
}
}
// Assert: operand appears once in the sort-order.
assert m == operands.size();
}
}
/**
* 判断规则是否匹配操作
*/
public boolean matches(RelOptRuleCall call) {
return true;
}
/**
* 接收规则匹配后的通知,通过RelOptRuleCall返回
*/
public abstract void onMatch(RelOptRuleCall call);
...
}
定义一个规则类
org/apache/calcite/rel/rules包下定义了Calcite使用的规则类,只需继承上面的RelOptRule类即可,下面的代码是JOIN关系的优化规则。
public class JoinAssociateRule extends RelOptRule {
//~ Static fields/initializers ---------------------------------------------
/** The singleton. */
public static final JoinAssociateRule INSTANCE =
new JoinAssociateRule(RelFactories.LOGICAL_BUILDER);
//~ Constructors -----------------------------------------------------------
/**
* Creates a JoinAssociateRule.
*/
public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {
super(
operand(Join.class,
operand(Join.class, any()),
operand(RelSubset.class, any())),
relBuilderFactory, null);
}
//~ Methods ----------------------------------------------------------------
public void onMatch(final RelOptRuleCall call) {
final Join topJoin = call.rel(0);
final Join bottomJoin = call.rel(1);
final RelNode relA = bottomJoin.getLeft();
final RelNode relB = bottomJoin.getRight();
final RelSubset relC = call.rel(2);
final RelOptCluster cluster = topJoin.getCluster();
final RexBuilder rexBuilder = cluster.getRexBuilder();
if (relC.getConvention() != relA.getConvention()) {
// relC could have any trait-set. But if we're matching say
// EnumerableConvention, we're only interested in enumerable subsets.
return;
}
// topJoin
// / \
// bottomJoin C
// / \
// A B
final int aCount = relA.getRowType().getFieldCount();
final int bCount = relB.getRowType().getFieldCount();
final int cCount = relC.getRowType().getFieldCount();
final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
final ImmutableBitSet bBitSet =
ImmutableBitSet.range(aCount, aCount + bCount);
if (!topJoin.getSystemFieldList().isEmpty()) {
// FIXME Enable this rule for joins with system fields
return;
}
// If either join is not inner, we cannot proceed.
// (Is this too strict?)
if (topJoin.getJoinType() != JoinRelType.INNER
|| bottomJoin.getJoinType() != JoinRelType.INNER) {
return;
}
// Goal is to transform to
//
// newTopJoin
// / \
// A newBottomJoin
// / \
// B C
// Split the condition of topJoin and bottomJoin into a conjunctions. A
// condition can be pushed down if it does not use columns from A.
final List<RexNode> top = new ArrayList<>();
final List<RexNode> bottom = new ArrayList<>();
JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
bottom);
// Mapping for moving conditions from topJoin or bottomJoin to
// newBottomJoin.
// target: | B | C |
// source: | A | B | C |
final Mappings.TargetMapping bottomMapping =
Mappings.createShiftMapping(
aCount + bCount + cCount,
0, aCount, bCount,
bCount, aCount + bCount, cCount);
final List<RexNode> newBottomList = new ArrayList<>();
new RexPermuteInputsShuttle(bottomMapping, relB, relC)
.visitList(bottom, newBottomList);
RexNode newBottomCondition =
RexUtil.composeConjunction(rexBuilder, newBottomList);
final Join newBottomJoin =
bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
relC, JoinRelType.INNER, false);
// Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
// Field ordinals do not need to be changed.
RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
@SuppressWarnings("SuspiciousNameCombination")
final Join newTopJoin =
topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
newBottomJoin, JoinRelType.INNER, false);
call.transformTo(newTopJoin);
}
}
RBO 执行过程
HepPlanner 是RBO规则的执行器,其核心代码是findBestExp() 方法,具体内容见代码。
public class HepPlanner {
// 查找最优表达式
public RelNode findBestExp() {
assert this.root != null;
this.executeProgram(this.mainProgram);
this.collectGarbage();
return this.buildFinalPlan(this.root);
}
// 遍历所有的规则
private void executeProgram(HepProgram program) {
HepProgram savedProgram = this.currentProgram;
this.currentProgram = program;
this.currentProgram.initialize(program == this.mainProgram);
UnmodifiableIterator var3 = this.currentProgram.instructions.iterator();
while(var3.hasNext()) {
HepInstruction instruction = (HepInstruction)var3.next();
instruction.execute(this);
int delta = this.nTransformations - this.nTransformationsLastGC;
if (delta > this.graphSizeLastGC) {
this.collectGarbage();
}
}
this.currentProgram = savedProgram;
}
// 构建最优SQL表达式
private RelNode buildFinalPlan(HepRelVertex vertex) {
RelNode rel = vertex.getCurrentRel();
this.notifyChosen(rel);
List<RelNode> inputs = rel.getInputs();
for(int i = 0; i < inputs.size(); ++i) {
RelNode child = (RelNode)inputs.get(i);
if (child instanceof HepRelVertex) {
child = this.buildFinalPlan((HepRelVertex)child);
rel.replaceInput(i, child);
rel.recomputeDigest();
}
}
return rel;
}
}
Calcite 默认的优化规则
Calcite 包括并不限于以下规则,具体详见rules包下的规则类。
- AggregateXXXRule
- FilterXXXRule
- CalcXXXRule
- DateRangeRule
- JoinXXXRule
- ProjectXXXRule
总结
Calcite 基于规则引擎的优化,可以将一些粗鄙的SQL转换成优化后的SQL,但是其并不跟根据原始数据集的定义来进一步优化SQL,比如我们在返回结果级时优先根据主键索引或时间索引排序,或者每次仅返回500条数据就已经能够满足用户的需求。所以,在这些场景下,我们还需要根据业务去定义自己的SQL规则,这样更好的发挥Calcite的优势。