I obtained my Ph.D. in the area of Autonomy and Control from the School of Aeronautics and Astronautics at Purdue University in July 2021. Prior to Purdue, I worked as a research assistant at the Technical University of Munich, Germany. I obtained my Master's and Bachelor's degrees in Control Science and Engineering from Harbin Institute of Technology, China.
My research lies at the intersection of control, machine learning, and optimization, with motivations stemming from fundamental and pressing challenges in robot and human-robot autonomy.
- Differentiable control and learning, Safe learning and control
- (Inverse) optimal control, (Inverse) reinforcement learning
- Differential games, Robust control, Adversarial learning
- Robot learning with human-on-the-loop, Human-robot collaboration
- Learning from demonstrations, Contact-rich robot manipulation
- Motion and task planning, Computation of cognition & motor control
My long-term goal is to bring together the complementary benefits of these three areas (control, machine learning, and optimization) to develop new theories, methods, and systems that provide efficiency, safety, robustness, and long-duration adaptability, and that can be readily deployed on real-world robots and human-robot systems.
Selected Publications & Submissions
Abstract: We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic safe differentiable framework for solving a broad class of safety-critical learning and control tasks, i.e., problems that require both immediate and long-term constraint satisfaction to be guaranteed at every stage of the learning and control process. In the spirit of interior-point methods, Safe PDP handles different types of state and input constraints by incorporating them into the cost and loss through barrier functions. We prove three fundamental features of Safe PDP: first, both the constrained solution and its gradient in the backward pass can be approximated by solving a more efficient unconstrained counterpart; second, the approximation of both the solution and its gradient can be made arbitrarily accurate by tuning a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect all constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP in solving various safe learning and control tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on different challenging control systems such as a 6-DoF maneuvering quadrotor and 6-DoF rocket-powered landing.
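The barrier idea can be illustrated on a one-dimensional toy problem. The sketch below is not Safe PDP itself; it assumes a hypothetical scalar problem, minimizing (x - theta)^2 subject to x <= 1 with theta > 1, whose constrained solution is x = 1 with dx/dtheta = 0. Replacing the constraint by the log barrier -eps * log(1 - x) gives an unconstrained problem whose minimizer is strictly feasible for every eps > 0, and whose solution and implicit-function-theorem gradient approach the constrained ones as the barrier parameter eps shrinks.

```python
import math

def barrier_solution(theta: float, eps: float) -> float:
    """Minimizer of (x - theta)**2 - eps * log(1 - x) over the interior x < 1.

    Stationarity 2(x - theta) + eps / (1 - x) = 0 with u = 1 - x gives
    2u^2 + 2(theta - 1)u - eps = 0; we take its positive root u.
    """
    a = theta - 1.0
    s = math.sqrt(a * a + 2.0 * eps)
    # Two algebraically equal forms of the positive root; pick the one
    # that avoids floating-point cancellation.
    u = eps / (a + s) if a >= 0.0 else (s - a) / 2.0
    return 1.0 - u

def barrier_gradient(theta: float, eps: float) -> float:
    """dx*/dtheta via implicit differentiation of the stationarity condition."""
    x = barrier_solution(theta, eps)
    return 2.0 / (2.0 + eps / (1.0 - x) ** 2)

# Constrained answer for theta = 2 is x* = 1 with dx*/dtheta = 0.
for eps in (1.0, 0.1, 0.01, 0.001):
    x = barrier_solution(2.0, eps)
    print(f"eps={eps:<6} x={x:.6f} dx/dtheta={barrier_gradient(2.0, eps):.6f}")
```

Every printed iterate satisfies x < 1, which mirrors (in miniature) the claim that intermediate results never violate the constraint.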
Abstract: This paper proposes a technique that enables a robot to learn a control objective function incrementally from a human user's corrections. The human's corrections can be as simple as directional corrections, that is, corrections that indicate the direction of a control change without indicating its magnitude, applied at some time instants during the robot's motion. We only assume that each of the human's corrections, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an implicit objective function. The proposed method uses the direction of a correction to update the estimate of the objective function based on a cutting-plane technique. We establish theoretical results showing that this process of incremental correction and update guarantees convergence of the learned objective function to the implicit one. The method is validated in two human-robot games, where human players teach a 2-link robot arm and a 6-DoF quadrotor system to plan motions in environments with obstacles, and on a real quadrotor system in a user study.
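A generic cutting-plane loop driven by directional feedback can be sketched as follows. This is an illustrative assumption, not the paper's algorithm: the unknown weight vector is assumed to lie in the unit box, the remaining feasible set is approximated by a Monte Carlo point cloud, and the next estimate is the cloud's centroid. The hypothetical `coordinate_oracle` stands in for the human: it reports only the sign of an improving direction along one coordinate, so each correction carries direction but no magnitude, and every cut keeps the half-space that contains the true weights.

```python
import random

def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def cutting_plane_learning(oracle, dim=2, n_samples=20000, n_rounds=10, seed=0):
    """Centroid cutting-plane loop driven by directional feedback only.

    oracle(estimate, k) must return g with g . (w_true - estimate) > 0; each cut
    keeps the half-space {w : g . (w - estimate) >= 0}, which contains w_true.
    """
    rng = random.Random(seed)
    # Monte Carlo approximation of the initial feasible set [0, 1]^dim.
    cloud = [tuple(rng.random() for _ in range(dim)) for _ in range(n_samples)]
    estimate = centroid(cloud)
    for k in range(n_rounds):
        g = oracle(estimate, k)
        kept = [p for p in cloud
                if sum(gi * (pi - ei) for gi, pi, ei in zip(g, p, estimate)) >= 0.0]
        if not kept:  # sample cloud exhausted; stop refining
            break
        cloud = kept
        estimate = centroid(cloud)
    return estimate

W_TRUE = (0.3, 0.7)  # hypothetical implicit weights, seen only by the simulated human

def coordinate_oracle(estimate, k):
    """Simulated correction: an improving sign along one coordinate, no magnitude."""
    i = k % len(estimate)
    g = [0.0] * len(estimate)
    g[i] = 1.0 if W_TRUE[i] > estimate[i] else -1.0
    return g

w_hat = cutting_plane_learning(coordinate_oracle)
print(w_hat)
```

Each cut roughly halves the remaining set, so the estimate contracts toward `W_TRUE` even though no correction ever states how far to move.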
Abstract: This paper proposes an approach that enables a robot to learn an objective function from sparse demonstrations of an expert. The demonstrations are given by a small number of sparse waypoints; the waypoints are desired outputs of the robot's trajectory at certain time instants, sparsely located within a demonstration time horizon. The duration of the expert's demonstration may differ from the actual duration of the robot's execution. The proposed method jointly learns an objective function and a time-warping function such that the robot's reproduced trajectory has minimal distance to the sparse demonstration waypoints. Unlike existing inverse reinforcement learning techniques, the proposed approach uses the differential Pontryagin's maximum principle, which allows direct minimization of the distance between the robot's trajectory and the sparse demonstration waypoints and enables simultaneous learning of an objective function and a time-warping function. We demonstrate the effectiveness of the proposed approach in various simulated scenarios, applying the method to learn motion planning/control of a 6-DoF maneuvering unmanned aerial vehicle (UAV) and a robot arm in environments with obstacles. The results show that the robot is able to learn a valid objective function for avoiding obstacles from only a few demonstrated waypoints.
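To make the time-warping idea concrete, the sketch below assumes a linear time warp t = beta * tau, a hypothetical robot output y(t) = t^2, and waypoints from a demonstration that lasts half as long as the robot's execution (so the true beta is 2). It only fits beta by grid search on the sparse-waypoint loss; the paper learns the objective function and the time warp jointly through the differential PMP, which this sketch does not attempt.

```python
def robot_output(t: float) -> float:
    """Hypothetical robot trajectory y(t) = t^2 (stands in for a learned motion)."""
    return t * t

def waypoint_loss(beta: float, taus, ys) -> float:
    """Sum of squared distances between y(beta * tau_i) and demo waypoints y_i."""
    return sum((robot_output(beta * tau) - y) ** 2 for tau, y in zip(taus, ys))

def fit_time_warp(taus, ys, lo=0.5, hi=3.0, steps=250) -> float:
    """Grid search for the warp rate beta on [lo, hi]."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda b: waypoint_loss(b, taus, ys))

taus = [0.5, 1.0, 1.5]                   # sparse demonstration time instants
ys = [(2.0 * tau) ** 2 for tau in taus]  # waypoints consistent with beta = 2
beta = fit_time_warp(taus, ys)
print(beta, waypoint_loss(beta, taus, ys))
```

Only three waypoints are needed here because the trajectory family is fixed; in the paper's setting the trajectory itself also depends on the learned objective.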
Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
Wanxin Jin, Zhaoran Wang, Zhuoran Yang, and Shaoshuai Mou
Advances in Neural Information Processing Systems (NeurIPS), 2020
Abstract: This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks. PDP differs from existing methods in two novel techniques: first, we differentiate through Pontryagin's Maximum Principle, which allows us to obtain the analytical derivative of a trajectory with respect to tunable parameters within an optimal control system, enabling end-to-end learning of dynamics, policies, and/or control objective functions; and second, we propose an auxiliary control system in the backward pass of the PDP framework, whose output is the analytical derivative of the original system's trajectory with respect to the parameters and which can be solved iteratively using standard control tools. We investigate three learning modes of PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of PDP in each learning mode on different high-dimensional systems, including a multi-link robot arm, a 6-DoF maneuvering quadrotor, and 6-DoF rocket-powered landing.
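The central step, differentiating an optimal trajectory with respect to a tunable parameter, can be illustrated on a hypothetical scalar linear-quadratic problem. The sketch below assumes dynamics x_{t+1} = a x_t + u_t and cost sum_t (x_t - theta)^2 + r u_t^2, stacks the horizon into one batch least-squares problem, and obtains du*/dtheta by differentiating its optimality condition (implicit function theorem). This is a swap-in for illustration only: PDP obtains the analogous derivative from the PMP conditions via an auxiliary control system, which also covers nonlinear problems, whereas here the problem is quadratic and the derivative is a single linear solve.

```python
import numpy as np

def rollout_matrices(a: float, T: int):
    """Stack x_{t+1} = a x_t + u_t for t = 0..T-1 as x_{1:T} = F x0 + G u."""
    F = np.array([a ** t for t in range(1, T + 1)])
    G = np.zeros((T, T))
    for t in range(1, T + 1):
        for s in range(t):
            G[t - 1, s] = a ** (t - 1 - s)
    return F, G

def optimal_controls(theta, x0=0.0, a=0.9, r=0.1, T=10):
    """u* minimizing sum_t (x_t - theta)^2 + r u_t^2, via the normal equations."""
    F, G = rollout_matrices(a, T)
    H = G.T @ G + r * np.eye(T)
    return np.linalg.solve(H, G.T @ (theta * np.ones(T) - F * x0))

def controls_gradient(a=0.9, r=0.1, T=10):
    """du*/dtheta from differentiating H u* = G^T (theta * 1 - F x0) in theta."""
    _, G = rollout_matrices(a, T)
    H = G.T @ G + r * np.eye(T)
    return np.linalg.solve(H, G.T @ np.ones(T))

du_dtheta = controls_gradient()
dx_dtheta = rollout_matrices(0.9, 10)[1] @ du_dtheta  # trajectory derivative
print(dx_dtheta)
```

A central finite difference of `optimal_controls` around any theta should agree with `controls_gradient()` to rounding error, which is a cheap sanity check for any analytical trajectory derivative.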
Inverse Optimal Control from Incomplete Trajectory Observations
Wanxin Jin, Dana Kulic, Shaoshuai Mou, and Sandra Hirche
The International Journal of Robotics Research (IJRR), 40(6-7):848–865, 2021
Abstract: This article develops a methodology that enables learning an objective function of an optimal control system from incomplete trajectory observations. The objective function is assumed to be a weighted sum of features (or basis functions) with unknown weights, and the observed data is a segment of a trajectory of system states and inputs. The proposed technique introduces the concept of the recovery matrix to establish the relationship between any available segment of the trajectory and the weights of given candidate features. The rank of the recovery matrix indicates whether a subset of relevant features can be found among the candidate features and whether the corresponding weights can be learned from the segment data. The recovery matrix can be obtained iteratively, and its rank is non-decreasing, which shows that additional observations may contribute to the objective learning. Based on the recovery matrix, a method for using incomplete trajectory observations to learn the weights of selected features is established, and an incremental inverse optimal control algorithm is developed that automatically finds the minimal required observation. The effectiveness of the proposed method is demonstrated on a linear quadratic regulator system and a simulated robot manipulator.
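The rank-based identifiability idea can be illustrated with a simpler, static analogue. The sketch below does not construct the paper's recovery matrix; it assumes a hypothetical scalar decision u = argmin_u sum_i w_i phi_i(u, c) with three features, where each observed (u, c) pair contributes one row of the stationarity condition sum_i w_i dphi_i/du = 0. Rows are appended one observation at a time, so the rank can only stay the same or grow, and the loop stops at the first observation count for which the rank reaches n - 1; the weights are then identified up to scale from the null space.

```python
import numpy as np

W_TRUE = np.array([1.0, 2.0, 1.0])  # hypothetical weights, used only to simulate data

def observe(c, w=W_TRUE):
    """Optimal u for cost w1 (u - c)^2 + w2 u^2 + w3 (u + 1)^2 in context c."""
    return (w[0] * c - w[2]) / w.sum()

def stationarity_row(u, c):
    """Feature gradients w.r.t. u; the true weights are orthogonal to this row."""
    return np.array([2.0 * (u - c), 2.0 * u, 2.0 * (u + 1.0)])

def incremental_ioc(contexts, n_features=3, tol=1e-9):
    """Add one observation at a time until rank = n_features - 1, then read off w."""
    rows = []
    for k, c in enumerate(contexts, start=1):
        rows.append(stationarity_row(observe(c), c))
        M = np.vstack(rows)
        if np.linalg.matrix_rank(M, tol=tol) == n_features - 1:
            w = np.linalg.svd(M)[2][-1]  # right singular vector spanning the null space
            return w / w.sum(), k        # normalize weights to sum to 1
    return None, len(contexts)           # not enough informative observations

w_hat, n_used = incremental_ioc([0.5, 1.0, 2.0])
print(w_hat, n_used)
```

With one observation the rank is 1 and the weights are ambiguous; the second observation raises the rank to 2, so the loop stops there, a toy version of "finding the minimal required observation".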
Abstract: This paper develops a distributed approach for inverse optimal control (IOC) in multi-agent systems. Here each agent can only communicate with certain nearby neighbors and can only access segments of the system's trajectory, which are not sufficient for the agent to solve the IOC problem alone. By introducing the concept of data effectiveness and connecting each segment to its contribution to solving IOC, we formulate the IOC problem as one of achieving least-squares solutions via a distributed algorithm. Simulations are provided to validate the proposed distributed IOC approach.
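The least-squares formulation can be illustrated with a generic distributed solver. The sketch below uses gradient tracking, a standard consensus-based method chosen here for illustration; it is not necessarily the algorithm used in the paper, and the three-agent line graph, mixing matrix `W`, step size, and data `(A_i, b_i)` are all hypothetical. Each agent holds a single equation, too little to determine x on its own, mixes estimates only with its graph neighbors through `W`, and all agents' estimates converge to the centralized least-squares solution.

```python
import numpy as np

def distributed_least_squares(As, bs, W, alpha=0.05, iters=4000):
    """Gradient tracking for min_x sum_i 0.5 * ||A_i x - b_i||^2.

    Row i of X is agent i's estimate; row i of Y tracks the network-average
    gradient. W must be doubly stochastic with W[i, j] = 0 for non-neighbors.
    """
    n, d = len(As), As[0].shape[1]

    def local_grads(X):
        return np.array([As[i].T @ (As[i] @ X[i] - bs[i]) for i in range(n)])

    X = np.zeros((n, d))
    G = local_grads(X)
    Y = G.copy()
    for _ in range(iters):
        X_new = W @ X - alpha * Y     # consensus step plus descent along tracked gradient
        G_new = local_grads(X_new)
        Y = W @ Y + G_new - G         # update the gradient tracker
        X, G = X_new, G_new
    return X

# Hypothetical data: each agent alone sees a rank-deficient 1 x 2 system.
As = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
bs = [np.array([1.0]), np.array([2.0]), np.array([3.5])]
# Metropolis weights for the line graph 1 - 2 - 3 (symmetric, doubly stochastic).
W = np.array([[2 / 3, 1 / 3, 0.0],
              [1 / 3, 1 / 3, 1 / 3],
              [0.0, 1 / 3, 2 / 3]])
X = distributed_least_squares(As, bs, W)
print(X)
```

The double stochasticity of `W` is what lets `Y` track the average gradient; a step size that is too large relative to the local curvature can make the iteration diverge.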
Abstract: In this paper, we consider a dynamical system whose trajectory results from minimizing a multiphase cost function. The multiphase cost function is assumed to be a weighted sum of specified features (or basis functions) with phase-dependent weights that switch at some unknown phase transition points. A new inverse optimal control approach for recovering the cost weights of each phase and estimating the phase transition points is proposed. The key idea is to use a length-adapted window moving along the observed trajectory, where the window length is determined by finding the minimal observation length that suffices for successful cost weight recovery. The effectiveness of the proposed method is first evaluated on a simulated robot arm and then demonstrated on a dataset of human participants performing a series of squatting tasks. The results show that the proposed method reliably retrieves the cost function of each phase and segments each phase of motion from the trajectory with a segmentation accuracy above 90%.
Academic Honors & Awards
- Best Student Paper Finalist at IEEE 40th Digital Avionics Systems Conference (DASC) — 09.2021
- ICON Outstanding Research Awards, Purdue University — 04.2021
- Magoon Award for Excellence in Teaching, Purdue University — 09.2020
- Ross Fellowship, Purdue University — 2017-2018
- First prize winner of Provincial Science and Technology Award, Heilongjiang, China – 06.2017