详细解释 Calcite RBO 规则优化引擎

最编程 2024-04-03 09:21:50

...

【开源中国 APP 全新上线】“动弹” 回归、集成大模型对话、畅读技术报告”

SQL优化通常包括两个部分RBO（Rule based Optimizer）和CBO（Cost based Optimizer），Calcite 引擎支持以上两种模式的优化，本节讲述Calcite是如何基于RBO进行优化的。

RelOptRule 规则基类

RelOptRule是Calcite规则引擎中的规则基类，它可以将原有的SQL表达式转换成优化后的SQL表达式，优化器能够识别这些规则并调用。

package org.apache.calcite.plan;

public abstract class RelOptRule {

    /**
     * 规则描述，默认为类的全路径
     */
    protected final String description;

    /**
     * 树表达式的根.
     */
    private final RelOptRuleOperand operand;

    /**
     *  通过工厂构建响应的规则表达式
     */
    public final RelBuilderFactory relBuilderFactory;

    /**
     * 规则表达式列表
     */
    public final List<RelOptRuleOperand> operands;


    /**
     * 构建每个规则的处理顺序，从当前节点开始一直到根节点
     */
    private void assignSolveOrder() {
        for (RelOptRuleOperand operand : operands) {
            operand.solveOrder = new int[operands.size()];
            int m = 0;
            for (RelOptRuleOperand o = operand; o != null; o = o.getParent()) {
                operand.solveOrder[m++] = o.ordinalInRule;
            }
            for (int k = 0; k < operands.size(); k++) {
                boolean exists = false;
                for (int n = 0; n < m; n++) {
                    if (operand.solveOrder[n] == k) {
                        exists = true;
                    }
                }
                if (!exists) {
                    operand.solveOrder[m++] = k;
                }
            }

            // Assert: operand appears once in the sort-order.
            assert m == operands.size();
        }
    }

    /**
     * 判断规则是否匹配操作
     */
    public boolean matches(RelOptRuleCall call) {
        return true;
    }

    /**
     * 接收规则匹配后的通知，通过RelOptRuleCall返回
     */
    public abstract void onMatch(RelOptRuleCall call);

   ...
}

定义一个规则类

org/apache/calcite/rel/rules包下定义了Calcite使用的规则类，只需继承上面的RelOptRule类即可，下面的代码是JOIN关系的优化规则。

public class JoinAssociateRule extends RelOptRule {
  //~ Static fields/initializers ---------------------------------------------

  /** The singleton. */
  public static final JoinAssociateRule INSTANCE =
      new JoinAssociateRule(RelFactories.LOGICAL_BUILDER);

  //~ Constructors -----------------------------------------------------------

  /**
   * Creates a JoinAssociateRule.
   */
  public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {
    super(
        operand(Join.class,
            operand(Join.class, any()),
            operand(RelSubset.class, any())),
        relBuilderFactory, null);
  }

  //~ Methods ----------------------------------------------------------------

  public void onMatch(final RelOptRuleCall call) {
    final Join topJoin = call.rel(0);
    final Join bottomJoin = call.rel(1);
    final RelNode relA = bottomJoin.getLeft();
    final RelNode relB = bottomJoin.getRight();
    final RelSubset relC = call.rel(2);
    final RelOptCluster cluster = topJoin.getCluster();
    final RexBuilder rexBuilder = cluster.getRexBuilder();

    if (relC.getConvention() != relA.getConvention()) {
      // relC could have any trait-set. But if we're matching say
      // EnumerableConvention, we're only interested in enumerable subsets.
      return;
    }

    //        topJoin
    //        /     \
    //   bottomJoin  C
    //    /    \
    //   A      B

    final int aCount = relA.getRowType().getFieldCount();
    final int bCount = relB.getRowType().getFieldCount();
    final int cCount = relC.getRowType().getFieldCount();
    final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
    final ImmutableBitSet bBitSet =
        ImmutableBitSet.range(aCount, aCount + bCount);

    if (!topJoin.getSystemFieldList().isEmpty()) {
      // FIXME Enable this rule for joins with system fields
      return;
    }

    // If either join is not inner, we cannot proceed.
    // (Is this too strict?)
    if (topJoin.getJoinType() != JoinRelType.INNER
        || bottomJoin.getJoinType() != JoinRelType.INNER) {
      return;
    }

    // Goal is to transform to
    //
    //       newTopJoin
    //        /     \
    //       A   newBottomJoin
    //               /    \
    //              B      C

    // Split the condition of topJoin and bottomJoin into a conjunctions. A
    // condition can be pushed down if it does not use columns from A.
    final List<RexNode> top = new ArrayList<>();
    final List<RexNode> bottom = new ArrayList<>();
    JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
    JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
        bottom);

    // Mapping for moving conditions from topJoin or bottomJoin to
    // newBottomJoin.
    // target: | B | C      |
    // source: | A       | B | C      |
    final Mappings.TargetMapping bottomMapping =
        Mappings.createShiftMapping(
            aCount + bCount + cCount,
            0, aCount, bCount,
            bCount, aCount + bCount, cCount);
    final List<RexNode> newBottomList = new ArrayList<>();
    new RexPermuteInputsShuttle(bottomMapping, relB, relC)
        .visitList(bottom, newBottomList);
    RexNode newBottomCondition =
        RexUtil.composeConjunction(rexBuilder, newBottomList);

    final Join newBottomJoin =
        bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
            relC, JoinRelType.INNER, false);

    // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
    // Field ordinals do not need to be changed.
    RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
    @SuppressWarnings("SuspiciousNameCombination")
    final Join newTopJoin =
        topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
            newBottomJoin, JoinRelType.INNER, false);

    call.transformTo(newTopJoin);
  }
}

RBO 执行过程

HepPlanner 是RBO规则的执行器，其核心代码是findBestExp() 方法，具体内容见代码。

public class HepPlanner {
    // 查找最优表达式
    public RelNode findBestExp() {
        assert this.root != null;

        this.executeProgram(this.mainProgram);
        this.collectGarbage();
        return this.buildFinalPlan(this.root);
    }

    // 遍历所有的规则
    private void executeProgram(HepProgram program) {
        HepProgram savedProgram = this.currentProgram;
        this.currentProgram = program;
        this.currentProgram.initialize(program == this.mainProgram);
        UnmodifiableIterator var3 = this.currentProgram.instructions.iterator();

        while(var3.hasNext()) {
            HepInstruction instruction = (HepInstruction)var3.next();
            instruction.execute(this);
            int delta = this.nTransformations - this.nTransformationsLastGC;
            if (delta > this.graphSizeLastGC) {
                this.collectGarbage();
            }
        }

        this.currentProgram = savedProgram;
    }

    // 构建最优SQL表达式
    private RelNode buildFinalPlan(HepRelVertex vertex) {
        RelNode rel = vertex.getCurrentRel();
        this.notifyChosen(rel);
        List<RelNode> inputs = rel.getInputs();

        for(int i = 0; i < inputs.size(); ++i) {
            RelNode child = (RelNode)inputs.get(i);
            if (child instanceof HepRelVertex) {
                child = this.buildFinalPlan((HepRelVertex)child);
                rel.replaceInput(i, child);
                rel.recomputeDigest();
            }
        }

        return rel;
    }
}

Calcite 默认的优化规则

Calcite 包括并不限于以下规则，具体详见rules包下的规则类。

AggregateXXXRule
FilterXXXRule
CalcXXXRule
DateRangeRule
JoinXXXRule
ProjectXXXRule

总结

Calcite 基于规则引擎的优化，可以将一些粗鄙的SQL转换成优化后的SQL，但是其并不跟根据原始数据集的定义来进一步优化SQL，比如我们在返回结果级时优先根据主键索引或时间索引排序，或者每次仅返回500条数据就已经能够满足用户的需求。所以，在这些场景下，我们还需要根据业务去定义自己的SQL规则，这样更好的发挥Calcite的优势。

上一篇： Netty 核心原理分析和 RPC 实践 6-10

下一篇： Inno Setup 高级窗口初始化 (I)