A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 13 Issue 5
May 2026

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 19.2, Top 1 (SCI Q1)
    CiteScore: 28.2, Top 1% (Q1)
    Google Scholar h5-index: 95, TOP 5
Citation: Y. Su, D. Wang, M. Zhao, D. Xiong, Y. Huang, and W. Han, “Intelligent safe optimal control towards Koopman operator-driven nonlinear systems with asymmetric state and input constraints,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 5, pp. 1135–1150, May 2026. doi: 10.1109/JAS.2025.125945

Intelligent Safe Optimal Control Towards Koopman Operator-Driven Nonlinear Systems With Asymmetric State and Input Constraints

doi: 10.1109/JAS.2025.125945
  • For unknown nonlinear systems subject to asymmetric state and input constraints simultaneously, this article establishes a safe value iteration paradigm that learns an optimal control policy from data. First, the Koopman operator, rather than a black-box neural network, is used to extract the inherent dynamics of the controlled system from measured data, which allows the prediction error to be analyzed explicitly. To handle the state and input constraints, a purpose-built control barrier function is incorporated into the canonical utility function while retaining positive definiteness in the asymmetric case. The value iteration algorithm is then applied to the augmented utility function to obtain a safe optimal controller, with actor and critic networks approximating the control input and the associated value function, respectively. The monotonicity, safety, and stability of the proposed algorithm are rigorously verified. Three experiments, on a linear system, a nonlinear system, and a manipulator plant, yield comparative results that substantiate the superiority and efficacy of the developed approach in achieving optimal performance with safety guarantees.
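As a rough illustration of the Koopman identification step described in the abstract, the sketch below fits a lifted linear predictor z_{k+1} ≈ A z_k + B u_k by least squares, in the style of extended dynamic mode decomposition. It is not the article's implementation: the toy plant `step`, the dictionary `lift`, the helper `fit_koopman`, and all numeric values are assumptions chosen so that the lifted dynamics are exactly linear and the fit can be checked by hand.

```python
import numpy as np

def lift(x):
    """Hypothetical lifting dictionary: the state plus one monomial, z = [x1, x2, x1^2]."""
    x = np.atleast_2d(x)
    return np.column_stack([x[:, 0], x[:, 1], x[:, 0] ** 2])

def step(x, u, rho=0.9, mu=0.5):
    """Toy nonlinear plant (an assumption), used only to generate data."""
    x1 = rho * x[:, 0]
    x2 = mu * x[:, 1] + (rho**2 - mu) * x[:, 0] ** 2 + u[:, 0]
    return np.column_stack([x1, x2])

def fit_koopman(X, U, X_next):
    """Least-squares fit of A, B such that lift(X_next) ≈ lift(X) A^T + U B^T."""
    Z, Z_next = lift(X), lift(X_next)
    G = np.hstack([Z, U])                            # regressors, shape (N, n_z + n_u)
    K, *_ = np.linalg.lstsq(G, Z_next, rcond=None)   # solves G @ K ≈ Z_next
    n_z = Z.shape[1]
    return K[:n_z].T, K[n_z:].T                      # A is (n_z, n_z), B is (n_z, n_u)

# Collect random one-step transitions from the toy plant.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
U = rng.uniform(-1.0, 1.0, size=(200, 1))
X_next = step(X, U)

A, B = fit_koopman(X, U, X_next)

# One-step prediction error on a fresh state-input pair; the first two
# lifted coordinates are the original state.
x0 = np.array([[0.3, -0.2]])
u0 = np.array([[0.1]])
z_pred = lift(x0) @ A.T + u0 @ B.T
pred_error = np.abs(z_pred[:, :2] - step(x0, u0)).max()
```

Because this toy dictionary spans a Koopman-invariant subspace, the prediction error here is at the level of floating-point round-off. For a general nonlinear system the dictionary is only approximate and the residual is nonzero, which is the prediction error the abstract says can be analyzed explicitly.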

     


    Figures (14) / Tables (1)
