Intelligent Safe Optimal Control Towards Koopman Operator-Driven Nonlinear Systems With Asymmetric State and Input Constraints

Yalu Su; Ding Wang; Mingming Zhao; Dan Xiong; Yiyong Huang; Wei Han

doi:10.1109/JAS.2025.125945

Volume 13 Issue 5

May 2026

IEEE/CAA Journal of Automatica Sinica



Y. Su, D. Wang, M. Zhao, D. Xiong, Y. Huang, and W. Han, “Intelligent safe optimal control towards Koopman operator-driven nonlinear systems with asymmetric state and input constraints,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 5, pp. 1135–1150, May 2026. doi: 10.1109/JAS.2025.125945


Author Bio:
Yalu Su received the B.E. degree in detection, guidance and control from the North University of China in 2018, and the M.E. degree in navigation, guidance and control from the Northwestern Polytechnical University in 2021. He is currently working toward the Ph.D. degree in aerospace engineering at Peking University. His research interests include adaptive dynamic programming, nonlinear control, and imitation learning.

Ding Wang (Senior Member, IEEE) received the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences in 2012. He is currently a Full Professor with the Faculty of Information Technology, Beijing University of Technology. He has authored or co-authored over 150 journal and conference papers and five monographs. His current research interests include adaptive critic control with industrial applications, reinforcement learning, and intelligent systems. Dr. Wang currently serves as an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics: Systems, Neural Networks, Engineering Applications of Artificial Intelligence, International Journal of Robust and Nonlinear Control, and Acta Automatica Sinica.

Mingming Zhao received the B.E. degree in automation from Henan Polytechnic University in 2019, and the M.E. degree in control engineering in 2022 from the Beijing University of Technology, where he is currently working toward the Ph.D. degree in control science and engineering. His research interests include adaptive dynamic programming, reinforcement learning with industrial applications, and intelligent systems.

Dan Xiong received the Ph.D. degree from the National University of Defense Technology in 2017. He is an Assistant Researcher with the Defense Innovation Institute, Chinese Academy of Military Sciences. His research interests include robotic vision and control.

Yiyong Huang received the Ph.D. degree in aerospace engineering from the National University of Defense Technology in 1999. He is currently a Professor with the Defense Innovation Institute, Chinese Academy of Military Sciences. His research interest is agile and intelligent control of robotic manipulators.

Wei Han received the Ph.D. degree from the National University of Defense Technology in 2016. He is an Associate Researcher with the Defense Innovation Institute, Chinese Academy of Military Sciences. His research interests include flight vehicle design and robotic control.
Corresponding author: Dan Xiong, e-mail: xiongdan@alumni.nudt.edu.cn
Received Date: 2025-06-17
Accepted Date: 2025-10-01

Abstract

For unknown nonlinear systems subject to simultaneous asymmetric state and input constraints, this article establishes a safe value iteration paradigm that learns an optimal control policy from data. First, the Koopman operator, rather than a black-box neural network, is applied to extract the inherent dynamics of the controlled system from measured data, which allows explicit analysis of the prediction error. To handle the state and input constraints, a purpose-built control barrier function is incorporated into the canonical utility function while retaining positive definiteness in the asymmetric case. The value iteration algorithm is then applied with the augmented utility function to obtain a safe optimal controller, where actor and critic networks approximate the control input and the associated value function, respectively. The monotonicity, safety, and stability of the proposed algorithm are rigorously established. Three experiments, on a linear system, a nonlinear system, and a manipulator plant, provide comparative results that substantiate the efficacy and superiority of the developed approach in achieving optimal performance with safety guarantees.
- Adaptive optimal control,
- constrained value iteration,
- control barrier function,
- discrete-time nonlinear systems,
- reinforcement learning,
- Koopman operator
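
As a rough illustration of the data-based Koopman step described in the abstract, the sketch below fits a linear predictor in lifted coordinates by least squares, in the style of extended dynamic mode decomposition with control. It is not the authors' implementation: the toy plant `step`, the dictionary `lift`, and the helper `fit_koopman` are hypothetical names chosen for this example, and the plant is picked deliberately so that its lifted dynamics are exactly linear.

```python
import numpy as np


def lift(x):
    """Hypothetical dictionary of observables: the state plus x1 squared."""
    return np.array([x[0], x[1], x[0] ** 2])


def step(x, u, a=0.9, b=0.5, c=1.0):
    """Toy nonlinear plant (an assumption for this sketch, not from the article)."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2 + u])


def fit_koopman(X, U, X_next):
    """Least-squares fit of z_next ~= A z + B u, where z = lift(x)."""
    Z = np.array([lift(x) for x in X])            # shape (N, n_z)
    Z_next = np.array([lift(x) for x in X_next])  # shape (N, n_z)
    W = np.hstack([Z, U.reshape(len(X), -1)])     # shape (N, n_z + m)
    # Solve W @ G ~= Z_next; the rows of G stack A^T on top of B^T.
    G, *_ = np.linalg.lstsq(W, Z_next, rcond=None)
    n_z = Z.shape[1]
    return G[:n_z].T, G[n_z:].T


# Collect one-step transition data from random states and inputs.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
U = rng.uniform(-1.0, 1.0, size=200)
X_next = np.array([step(x, u) for x, u in zip(X, U)])

A, B = fit_koopman(X, U, X_next)

# One-step prediction error of the lifted linear model at an unseen point.
x_test, u_test = np.array([0.3, -0.4]), 0.2
z_pred = A @ lift(x_test) + B @ np.atleast_1d(u_test)
err = np.max(np.abs(z_pred - lift(step(x_test, u_test))))
print("max one-step prediction error:", err)
```

Because the chosen observables span an invariant subspace for this toy plant, the fitted model reproduces the lifted dynamics to numerical precision. For a general nonlinear system a finite dictionary leaves a nonzero residual, which is the kind of prediction error the abstract says can be analysed explicitly.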

