欢迎您访问 最编程 本站为您分享编程语言代码,编程技术文章!
您现在的位置是: 首页

连续系统的 LQR 推导

最编程 2024-06-15 18:48:01
...

连续时域上的DP(Dynamic Programming)

首先考虑如下形式的优化问题:

\begin{aligned} &&\min J = h(x(t_f),t_f) + \int_{t_0}^{t_f}g(x(t), u(t), t)dt \\ &\text{subject to} \\ &&\dot{x} &= a(x, u, t) \\ &&x(t_0) &= x_0 \\ &&m(x(t_f), t_f) &= 0 \\ &&u(t) &\in \mathscr{U} \end{aligned} \tag{1}

其中t_f是终止时间,t_0是起始时间,m(x(t_f),t_f)=0是终止条件(可能不唯一,因为m的值域是一个向量),\mathscr{U}表示对于u(t)的约束。

这个问题的解决的最终形式是一个非线性偏微分方程(Nonlinear Partial Differential Equation),被称作Hamilton-Jacobi-Bellman方程(HJB),下面进行推导。

现在我们设[t_0, t_f]区间内的任意一个时间点t,我们考虑[t,t_f]这个区间内的代价函数,其中\tau \in [t, t_f],那么有如下关系:

J(x(t), t, u(\tau)) = h(x(t_f), t_f) + \int_{t}^{t_f}g(x(\tau), u(\tau), \tau)d\tau \tag{2}

显然我们把区间[t,t_f]分成两个区间来考虑:[t,t+\Delta t][t+\Delta t,t_f]。如下:

\begin{aligned} \hat{J}(x(t), t) &= \underset{u(\tau)\in\mathscr{U},\tau\in[t, t_f]}{\min}J(x(t),t,u(\tau)) \\ &=\underset{u(\tau)\in\mathscr{U},\tau\in[t, t_f]}{\min}\left\{h(x(t_f), t_f)+\int_{t}^{t_f}g(x(\tau),u(\tau), \tau)d\tau\right\}\\ &=\underset{u(\tau)\in\mathscr{U},\tau\in[t, t_f]}{\min}\left\{h(x(t_f),t_f)+\int_{t}^{t+\Delta{t}}g(x(\tau), u(\tau),\tau)d\tau+\int_{t+\Delta{t}}^{t_f}g(x(\tau), u(\tau),\tau)d\tau\right\} \end{aligned} \tag{4}

我们定义[t+\Delta t,t_f]范围内的最优代价函数:

\begin{aligned} \hat{J}(x(t+\Delta{t}), t+\Delta{t}) &=\underset{u(\tau)\in\mathscr{U},\tau\in[t+\Delta{t}, t_f]}{\min}\left\{h(x(t_f),t_f)+\int_{t+\Delta{t}}^{t_f}g(x(\tau), u(\tau),\tau)d\tau\right\} \end{aligned} \tag{5}

于是有:

\begin{aligned} \hat{J}(x(t), t) &=\underset{u(\tau)\in\mathscr{U},\tau\in[t, t+\Delta{t}]}{\min}\left\{\int_{t}^{t+\Delta{t}}g(x(\tau), u(\tau),\tau)d\tau+\hat{J}(x(t+\Delta{t}), t+\Delta{t}) \right\} \end{aligned} \tag{6}

我们假设\Delta t很小,那么可以对上式进行泰勒展开,有:

\hat{J}(x(t+\Delta{t}), t+\Delta{t}) \approx\hat{J}(x(t), t)+\left[\frac{\partial\hat{J}}{\partial{t}}(x(t),t)\right]\Delta{t}+\left[\frac{\partial\hat{J}}{\partial{x}}(x(t),t)\right][x(t+\Delta{t})-x(t)] \tag{7}

下面定义其中两个偏微分的别名:

\begin{aligned} \hat{J}_t(x(t),t)&=\frac{\partial\hat{J}}{\partial{t}}(x(t),t)\\ \hat{J}_x(x(t),t)&=\frac{\partial\hat{J}}{\partial{x}}(x(t),t) \end{aligned} \tag{8}

代入(7)中可以得到:

\begin{aligned} \hat{J}(x(t+\Delta{t}), t+\Delta{t})&\approx\hat{J}(x(t), t)+\hat{J}_t(x(t),t)\Delta{t}+\hat{J}_x(x(t),t)[x(t+\Delta{t})-x(t)]\\ &\approx\hat{J}(x(t), t)+\hat{J}_t(x(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\\ \end{aligned} \tag{9}

将公式(10)和公式(9)代入到公式(6),得到:

\begin{aligned} \hat{J}(x(t), t)&=\underset{u(t)\in\mathscr{U}}{\min}\{g(x(t),u(t),t)\Delta{t}+\hat{J}(x(t),t)+\hat{J}_t(x(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\}\\ &=\hat{J}(x(t),t)+\hat{J}_t(x(t),t)\Delta{t}+\underset{u(t)\in\mathscr{U}}{\min}\{g(x(t),u(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\}\\ 0&=\hat{J}_t(x(t),t)\Delta{t}+\underset{u(t)\in\mathscr{U}}{\min}\{g(x(t),u(t),t)\Delta{t}+\hat{J}_x(x(t),t)a(x(t),u(t),t)\Delta{t}\} \end{aligned} \tag{11}

上式中提取出和u(t)无关的项目,得到最终的结果。这是一个\hat{J}(x(t),t)的偏微分方程,根据最终状态\hat{J}(x(t_f),t_f)反向推导前边的状态,其中有:

\hat{J}(x(t_f),t_f)=h(x(t_f)) \tag{12}

HJB(Hamiltonian-Jacobi-Bellman)等式

下面定义哈密顿量(Hamiltonian):

\mathscr{H}(x(t),u(t),\hat{J}_x(x(t),t),t)=g(x(t),u(t),t)+\hat{J}_x(x(t),t)a(x(t),u(t),t) \tag{13}

根据公式(11),代入公式(13),并且约掉\Delta t,得到:

\begin{aligned} -\hat{J}_t(x(t),t)&=\underset{u(t)\in\mathscr{U}}{\min}\{\mathscr{H}(x(t),u(t),\hat{J}_x(x(t),t),t)\} \end{aligned} \tag{14}

这即是HJB等式

连续时域LQR(Continuous LQR)

下面考虑如下线性系统模型(Linear System Model)和它的二次代价函数(Quadratic Cost Function):

\dot{x}(t)=A(t)x(t)+B(t)u(t) \tag{15}

J=\frac{1}{2}x(t_f)^THx(t_f)+\frac{1}{2}\int_{t_0}^{t_f}\{x(t)^TR_{xx}(t)x(t)+u(t)^TR_{uu}(t)u(t)\}dt \tag{16}

假设t_f是固定的,u(t)没有约束;假设H,R_{xx}\geq 0(半正定),R_{uu}(t)>0正定),根据公式(16)和公式(13)的定义,可以得到哈密顿量:

\mathscr{H}(x(t),u(t),\hat{J}_x(x(t),t),t)=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+u(t)^TR_{uu}(t)u(t)]+\hat{J}_x(x(t),t)[A(t)x(t)+B(t)u(t)] \tag{17}

根据公式(14),需要找到一个最优的u(t),即\hat u(t) ,使得\mathscr{H}最小,那么在没有约束的情况下,需要满足以下必要条件:

\frac{\partial\mathscr{H}}{\partial{u}}=u(t)R_{uu}(t)+\hat{J}_x(x(t),t)B(t)=0 \tag{18}

于是,根据上式可以得到一个最优控制率(Optimal Control Law):

\hat{u}(t)=-R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T \tag{19}

如果需要使上述最优条件充分且必要,还需要满足:

\frac{\partial^2\mathscr{H}}{\partial{u}^2}=R_{uu}(t)>0 \tag{20}

显然是成立的,因此该最优值是全局最小值。

下面把(19)的最优控制率代入到(17)中的哈密顿量中,得到:

\begin{aligned} \mathscr{H}(x(t),\hat{u}(t),\hat{J}_x(x(t),t),t)&=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+\hat{u}(t)^TR_{uu}(t)\hat{u}(t)]+\hat{J}_x(x(t),t)[A(t)x(t)+B(t)\hat{u}(t)]\\ &=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+[-R_{uu}(t)^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]^TR_{uu}(t)[-R_{uu}(t)^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]]+\hat{J}_x(x(t),t)[A(t)x(t)+B(t)[-R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]]\\ &=\frac{1}{2}[x(t)^TR_{xx}(t)x(t)+\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)R_{uu}(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T]+\hat{J}_x(x(t),t)A(t)x(t)-\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T\\ &=\frac{1}{2}x(t)^TR_{xx}(t)x(t)+\hat{J}_x(x(t),t)A(t)x(t)-\frac{1}{2}\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T \end{aligned} \tag{21}

于是根据(14)式,我们可以得到如下关系:

\begin{aligned} -\hat{J}_t(x(t),t)=\frac{1}{2}x(t)^TR_{xx}(t)x(t)+\hat{J}_x(x(t),t)A(t)x(t)-\frac{1}{2}\hat{J}_x(x(t),t)B(t)R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T \end{aligned} \tag{22}

这是一个关于\hat J的偏微分方程,根据(16)可以很容易的知道\hat J的边界条件:

\hat{J}(x(t_f),t_f)=\frac{1}{2}x^T(t_f)Hx(t_f) \tag{23}

根据上述公式,我们可以大胆假设\hat J在所有时间t上均是x(t)的二次型(Quadratic Form),因此我们假设:

\hat{J}(x(t),t)=\frac{1}{2}x^T(t)P(t)x(t), \quad P(t)=P^T(t) \tag{24}

根据假设很容易得到:

\begin{aligned} &\hat{J}_x(x(t),t)=\frac{\partial\hat{J}}{\partial{x}}=x^T(t)P(t)\\ &\hat{J}_t(x(t),t)=\frac{\partial\hat{J}}{\partial{t}}=\frac{1}{2}x^T(t)\dot{P}(t)x(t)\\ \end{aligned} \tag{25}

将其代入到(22)中可以得到如下关系:

\begin{aligned} -\frac{1}{2}x^T(t)\dot{P}(t)x(t)&=\frac{1}{2} x^T(t)R_{xx}(t)x(t)+x^T(t)P(t)A(t)x(t)-\frac{1}{2}x^T(t)P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)x(t)\\ &=\frac{1}{2} x^T(t)R_{xx}(t)x(t)+\frac{1}{2}x^T(t)\{P(t)A(t)+A^T(t)P(t)\}x(t)-\frac{1}{2}x^T(t)P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)x(t)\\ &=\frac{1}{2}x^T(t)\{R_{xx}(t)+P(t)A(t)+A^T(t)P(t)-P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)\}x(t) \end{aligned} \tag{26}

综合(23)和(26)中的结果,P(t)满足如下关系:

\begin{aligned} -\dot{P}(t)&=R_{xx}(t)+P(t)A(t)+A^T(t)P(t)-P(t)B(t)R_{uu}^{-1}(t)B^T(t)P(t)\\ P(t_f)&=H \end{aligned} \tag{27}

上式被称为黎卡提微分方程(Differential Riccati Equation)。通过求解这个方程,可以得到P(t),进而可以根据(25)得到\hat{J}_x(x(t),t),最后根据(19)得到最优控制输入\hat u(t),如下:

\begin{aligned} \hat{u}(t)&=-R_{uu}^{-1}(t)B(t)^T\hat{J}_x(x(t),t)^T\\ &=-R_{uu}^{-1}(t)B(t)^TP(t)x(t) \end{aligned} \tag{28}

综上,可以得到最优反馈控制增益F(t)

F(t)=R_{uu}^{-1}(t)B^T(t)P(t)\Rightarrow u(t)=-F(t)x(t) \tag{29}

线性时不变(LTI)系统

如果系统是线性时不变系统,则(27)中的系统参数矩阵均不再是时间的函数,因此有:

\begin{aligned} -\dot{P}(t)&=R_{xx}+P(t)A+A^TP(t)-P(t)BR_{uu}^{-1}B^TP(t)\end{aligned} \tag{30}

如果考虑当t_f\rightarrow\infty时,我们期望P(t)趋近一个定值,因此有:

\dot{P}(t) = 0 \tag{31}

\begin{aligned} R_{xx}+PA+A^TP-PBR_{uu}^{-1}B^TP=0\end{aligned} \tag{32}

其中P=\lim_{t \rightarrow \infty}{P(t)},上式被称为连续时间代数黎卡提方程(Continuous time Algebraic Riccati Equation - CARE)。对于线性时不变系统的LQR控制器,最优反馈控制增益是:

F=R_{uu}^{-1}B^TP\Rightarrow u(t)=-Fx(t) \tag{33}

推导完毕。

未经授权,禁止转载