Wanxin Jin | videos

Task-Driven Hybrid Model Reduction for Dexterous Manipulation
Learning from Human Directional Corrections
Learning from Sparse Demonstrations
Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
Safe Pontryagin Differentiable Programming

Task-Driven Hybrid Model Reduction for Dexterous Manipulation

Arxiv: https://arxiv.org/abs/2211.16657
Code (Python): https://github.com/wanxinjin/Task-Driven-Hybrid-Reduction
Webpage: https://wanxinjin.github.io/td_hybridreduction/

Abstract

In contact-rich tasks, like dexterous manipulation, the hybrid nature of making and breaking contact creates challenges for model representation and control. For example, choosing and sequencing contact locations for in-hand manipulation, where there are thousands of potential hybrid modes, is not generally tractable. In this paper, we are inspired by the observation that far fewer modes are actually necessary to accomplish many tasks. Building on our prior work learning hybrid models, represented as linear complementarity systems, we find a reduced-order hybrid model requiring only a limited number of task-relevant modes. This simplified representation, in combination with model predictive control, enables real-time control yet is sufficient for achieving high performance. We demonstrate the proposed method first on synthetic hybrid systems, reducing the mode count by multiple orders of magnitude while achieving task performance loss of less than 5%. We also apply the proposed method to a three-fingered robotic hand manipulating a previously unknown object. With no prior knowledge, we achieve state-of-the-art closed-loop performance in less than five minutes of online learning.
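To make the phrase "hybrid models, represented as linear complementarity systems" concrete, the following is a minimal scalar sketch, not the paper's model. It assumes a one-dimensional gap `x` between a fingertip and a surface, a commanded displacement `u`, and a contact force `lam` that must satisfy 0 <= lam, 0 <= q + f*lam, and lam*(q + f*lam) = 0. The function name `lcs_step` and every coefficient are hypothetical and chosen only for illustration.

```python
def lcs_step(x, u, a=1.0, b=1.0, c=1.0, d=1.0, e=1.0, f=1.0, g=0.0):
    """One step of a scalar linear complementarity system (illustrative only).

    Dynamics:         x_next = a*x + b*u + c*lam
    Complementarity:  0 <= lam  and  0 <= d*x + e*u + f*lam + g,
                      with lam * (d*x + e*u + f*lam + g) == 0.
    The closed-form solve below assumes f > 0.
    """
    q = d * x + e * u + g
    # Scalar LCP: if q >= 0 the contact is inactive (lam = 0); otherwise the
    # force is just large enough to bring the constraint back to zero.
    lam = max(0.0, -q / f)
    x_next = a * x + b * u + c * lam
    return x_next, lam


# Mode 1 (no contact): the gap stays positive, so the force is zero.
free_next, free_force = lcs_step(1.0, -0.3)

# Mode 2 (contact): the command would penetrate the surface, so a force appears.
contact_next, contact_force = lcs_step(0.2, -0.5)
```

With the default coefficients the update reduces to `x_next = max(0, x + u)`, so the two calls exercise the two hybrid modes of this toy system. With many contacts the number of such modes grows combinatorially, which is the difficulty the abstract describes.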


Learning from Human Directional Corrections

Arxiv: https://arxiv.org/abs/2011.15014
Code (Python): https://github.com/wanxinjin/Learning-from-Directional-Corrections

Abstract

This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections -- corrections that only indicate the direction of an input change without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to magnitude corrections, which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We have established theoretical results to show the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that the method is significantly more effective (higher success rate), more efficient and effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than state-of-the-art robot learning frameworks.
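The cutting-plane idea can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the paper's algorithm: the unknown objective is assumed to be J(u, theta) = ||u - theta||^2, so the robot's best motion for an estimate is u = theta; the feasible set for theta is kept as an axis-aligned box; and the simulated human only answers with one of the four axis directions. All function names are hypothetical. A correction `a` that improves the motion satisfies <a, theta_true - u> > 0, so each correction cuts away the half of the box on the wrong side of u.

```python
def directional_correction(u, theta_true):
    """Simulated human: an axis-aligned unit direction that improves u.

    Only the direction is returned; no magnitude information is given.
    """
    errors = [t - x for t, x in zip(theta_true, u)]
    axis = max(range(len(u)), key=lambda i: abs(errors[i]))
    if errors[axis] == 0.0:
        return None  # the motion is already optimal, nothing to correct
    a = [0.0] * len(u)
    a[axis] = 1.0 if errors[axis] > 0 else -1.0
    return a


def cut_box(lo, hi, u, a):
    """Keep the part of the box where <a, theta - u> >= 0."""
    lo, hi = list(lo), list(hi)
    for i, a_i in enumerate(a):
        if a_i > 0:
            lo[i] = u[i]
        elif a_i < 0:
            hi[i] = u[i]
    return lo, hi


def learn_from_corrections(theta_true, lo, hi, num_corrections):
    """Shrink the feasible box with one cut per directional correction."""
    for _ in range(num_corrections):
        u = [(l + h) / 2.0 for l, h in zip(lo, hi)]  # plan with the box center
        a = directional_correction(u, theta_true)
        if a is None:
            break
        lo, hi = cut_box(lo, hi, u, a)
    return lo, hi


theta_true = [0.3, -0.6]
lo, hi = learn_from_corrections(theta_true, [-1.0, -1.0], [1.0, 1.0], 20)
area = (hi[0] - lo[0]) * (hi[1] - lo[1])
```

Because every cut passes through the box center, each correction halves the area of the feasible set while the true parameter always remains inside it. The box representation is a design shortcut that only works because the simulated corrections are axis-aligned; general directions would require a polytope and a different choice of center.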


Learning from Sparse Demonstrations

Arxiv: https://arxiv.org/abs/2008.02159
Code (Python): https://github.com/wanxinjin/Learning-from-Sparse-Demonstrations

Abstract

This paper develops the method of Continuous Pontryagin Differentiable Programming (Continuous PDP), which enables a robot to learn an objective function from a few sparsely demonstrated keyframes. The keyframes, labeled with time stamps, are the desired task-space outputs, which a robot is expected to follow sequentially. The time stamps of the keyframes can differ from the timing of the robot's actual execution. The method jointly finds an objective function and a time-warping function such that the robot's resulting trajectory sequentially follows the keyframes with minimal discrepancy loss. Continuous PDP minimizes the discrepancy loss using projected gradient descent, efficiently computing the gradient of the robot trajectory with respect to the unknown parameters. The method is first evaluated on a simulated robot arm and then applied to a 6-DoF quadrotor to learn an objective function for motion planning in unmodeled environments. The results show the efficiency of the method, its ability to handle time misalignment between keyframes and robot execution, and the generalization of objective learning to unseen motion conditions.
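The joint search over an objective parameter and a time warp can be illustrated with a small projected-gradient sketch. Everything here is an assumption made for illustration rather than the paper's formulation: the robot trajectory is replaced by a hypothetical closed-form map y(t) = theta * (1 - exp(-t)), where theta stands in for the objective parameter (a goal position); the time warp is a single scale factor, t = beta * tau; and the projection simply clips beta to a positive interval so the warp stays monotone.

```python
import math


def trajectory(theta, beta, tau):
    """Robot output at keyframe time tau under the time warp t = beta * tau."""
    return theta * (1.0 - math.exp(-beta * tau))


def loss(theta, beta, keyframes):
    """Sum of squared discrepancies between the trajectory and the keyframes."""
    return sum((trajectory(theta, beta, tau) - y) ** 2 for tau, y in keyframes)


def loss_gradient(theta, beta, keyframes):
    """Analytical gradient of the loss with respect to (theta, beta)."""
    g_theta = g_beta = 0.0
    for tau, y in keyframes:
        decay = math.exp(-beta * tau)
        residual = theta * (1.0 - decay) - y
        g_theta += 2.0 * residual * (1.0 - decay)
        g_beta += 2.0 * residual * theta * tau * decay
    return g_theta, g_beta


def project(beta, lo=0.1, hi=5.0):
    """Projection onto the allowed warp scales (hypothetical bounds)."""
    return min(max(beta, lo), hi)


def fit(keyframes, theta=0.5, beta=1.0, iters=200):
    """Projected gradient descent with step halving until the loss decreases."""
    for _ in range(iters):
        g_theta, g_beta = loss_gradient(theta, beta, keyframes)
        current = loss(theta, beta, keyframes)
        step = 1.0
        for _ in range(30):
            theta_new = theta - step * g_theta
            beta_new = project(beta - step * g_beta)
            if loss(theta_new, beta_new, keyframes) < current:
                theta, beta = theta_new, beta_new
                break
            step *= 0.5
    return theta, beta


# Three sparse keyframes generated from theta = 2.0 and beta = 1.5.
keyframes = [(tau, trajectory(2.0, 1.5, tau)) for tau in (0.5, 1.0, 2.0)]
initial_loss = loss(0.5, 1.0, keyframes)
theta_hat, beta_hat = fit(keyframes)
final_loss = loss(theta_hat, beta_hat, keyframes)
```

The step-halving rule only accepts updates that reduce the discrepancy loss, so the loss never increases and the warp scale always stays inside its bounds. The sketch makes no claim about how quickly the estimates approach the generating values.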


Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework

NeurIPS 2020 Presentation

Examples of using PDP to solve robotic tasks

PDP for optimal control.

PDP for inverse reinforcement learning.

Arxiv: https://arxiv.org/pdf/1912.12970
Code (Python): https://github.com/wanxinjin/Pontryagin-Differentiable-Programming

Abstract

This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks. PDP is distinguished from existing methods by two novel techniques: first, we differentiate through Pontryagin’s Maximum Principle, which allows us to obtain the analytical derivative of a trajectory with respect to tunable parameters within an optimal control system, enabling end-to-end learning of dynamics, policies, and/or control objective functions; and second, we propose an auxiliary control system in the backward pass of the PDP framework, whose output is the analytical derivative of the original system’s trajectory with respect to the parameters and which can be iteratively solved using standard control tools. We investigate three learning modes of PDP: inverse reinforcement learning, system identification, and control/planning. We demonstrate the capability of PDP in each learning mode on different high-dimensional systems, including a multi-link robot arm, a 6-DoF maneuvering quadrotor, and 6-DoF rocket powered landing.
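The idea of differentiating through optimality conditions can be shown on the smallest possible case. The sketch below is a single-step, scalar toy and not the PDP implementation: it assumes dynamics x1 = x0 + u, a control cost 0.5*u^2, and a terminal cost 0.5*theta*x1^2 with tunable weight theta. For this problem the Pontryagin conditions reduce to a costate lam = theta*x1 and a stationarity condition u + lam = 0, and the derivative of the optimal control with respect to theta follows from implicit differentiation. The function names are hypothetical.

```python
def optimal_control(theta, x0):
    """Minimizer of 0.5*u**2 + 0.5*theta*(x0 + u)**2 over u."""
    return -theta * x0 / (1.0 + theta)


def control_gradient(theta, x0):
    """d(u*)/d(theta) obtained by differentiating the optimality condition.

    Stationarity: g(u, theta) = u + theta * (x0 + u) = 0.
    Implicit function theorem: du/dtheta = -(dg/dtheta) / (dg/du).
    """
    u = optimal_control(theta, x0)
    x1 = x0 + u
    dg_du = 1.0 + theta
    dg_dtheta = x1
    return -dg_dtheta / dg_du


def finite_difference(theta, x0, h=1e-5):
    """Central-difference check of the analytical derivative."""
    return (optimal_control(theta + h, x0) - optimal_control(theta - h, x0)) / (2.0 * h)


analytic = control_gradient(2.0, 3.0)
numeric = finite_difference(2.0, 3.0)
```

The analytical derivative needs no repeated solves of the control problem, only quantities already available at the optimum. For multi-step problems the same differentiation produces a linear system along the whole trajectory, which is the role the abstract assigns to the auxiliary control system.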


Safe Pontryagin Differentiable Programming

NeurIPS 2021 Presentation

Examples of using Safe-PDP to solve safety-critical robotic tasks

Safe PDP for safe neural policy optimization.

Safe PDP for safe motion planning.

Safe PDP for learning MPCs (i.e., jointly learning dynamics, constraints, and control cost) from demonstrations.

Arxiv: https://arxiv.org/abs/2105.14937
Code (Python): https://github.com/wanxinjin/Safe-PDP

Abstract

We propose a Safe Pontryagin Differentiable Programming (Safe PDP) methodology, which establishes a theoretical and algorithmic framework to solve a broad class of safety-critical learning and control tasks -- problems that require the guarantee of safety constraint satisfaction at any stage of the learning and control process. In the spirit of interior-point methods, Safe PDP handles different types of system constraints on states and inputs by incorporating them into the cost or loss through barrier functions. We prove three fundamentals of the proposed Safe PDP: first, both the solution and its gradient in the backward pass can be approximated by solving their more efficient unconstrained counterparts; second, the approximation for both the solution and its gradient can be controlled for arbitrary accuracy by a barrier parameter; and third, importantly, all intermediate results throughout the approximation and optimization strictly respect the constraints, thus guaranteeing safety throughout the entire learning and control process. We demonstrate the capabilities of Safe PDP in solving various safety-critical tasks, including safe policy optimization, safe motion planning, and learning MPCs from demonstrations, on different challenging systems such as a 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.
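The barrier-function mechanism can be illustrated with a scalar example. This is a generic interior-point sketch under assumed problem data, not the Safe PDP algorithm: the task is to minimize (x - 2)^2 subject to x <= 1, and the constraint is folded into the cost as -mu * log(1 - x). The function names and the schedule of barrier parameters are hypothetical.

```python
import math


def barrier_cost(x, mu):
    """Original cost plus a log barrier for the constraint x <= 1."""
    return (x - 2.0) ** 2 - mu * math.log(1.0 - x)


def solve_barrier_problem(x, mu, iters=50):
    """Damped Newton iterations on the unconstrained barrier problem.

    A step is accepted only if it stays strictly feasible (x < 1) and does
    not increase the barrier cost, so every iterate respects the constraint.
    """
    trace = []
    for _ in range(iters):
        grad = 2.0 * (x - 2.0) + mu / (1.0 - x)
        hess = 2.0 + mu / (1.0 - x) ** 2
        step = -grad / hess
        for _ in range(60):
            x_new = x + step
            if x_new < 1.0 and barrier_cost(x_new, mu) <= barrier_cost(x, mu):
                x = x_new
                break
            step *= 0.5  # shrink the step until it is feasible and non-increasing
        trace.append(x)
    return x, trace


x = 0.0  # strictly feasible starting point
all_iterates = []
solutions = {}
for mu in (1.0, 1e-1, 1e-2, 1e-3, 1e-4):
    x, trace = solve_barrier_problem(x, mu)  # warm start from the previous solution
    all_iterates.extend(trace)
    solutions[mu] = x

# Closed-form minimizer of the barrier problem for the smallest mu,
# obtained from 2*(x - 2)*(1 - x) + mu = 0 with x < 1.
reference = (6.0 - math.sqrt(4.0 + 8.0 * 1e-4)) / 4.0
```

The constrained optimum of this toy problem is x = 1. As mu shrinks, the barrier solution approaches it from the inside, and no iterate ever crosses the constraint, which mirrors the second and third properties listed in the abstract.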
