Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation


The fundamental challenge of planning for multi-step manipulation is to find effective and plausible action sequences that lead to the task goal. We present Cascaded Variational Inference (CAVIN) Planner, a model-based method that hierarchically generates plans by sampling from latent spaces. To facilitate planning over long time horizons, our method learns latent representations that decouple the prediction of high-level effects from the generation of low-level motions through cascaded variational inference. This enables us to model dynamics at two different levels of temporal resolution for hierarchical planning. We evaluate our approach in three multi-step robotic manipulation tasks in cluttered tabletop environments given high-dimensional observations. Empirical results demonstrate that the proposed method outperforms state-of-the-art model-based methods by strategically interacting with multiple objects.
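The two-level sampling idea described above can be illustrated with a toy sketch. This is not the released CAVIN implementation: the functions `meta_dynamics`, `action_generator`, and `dynamics` are hypothetical hand-written stand-ins for learned models, the names "effect code" and "motion code" are illustrative labels for the high-level and low-level latents, and the 2-D point state, the horizon `T`, and the sample counts are all assumptions chosen for brevity.

```python
import numpy as np

T = 5  # assumed number of low-level steps per high-level step


def meta_dynamics(state, effect):
    """Toy high-level model: predict the state reached after T steps."""
    return state + effect


def action_generator(state, effect, motion):
    """Toy decoder: T actions that realize the effect, varied by the motion code."""
    base = effect / T
    return np.tile(base, (T, 1)) + 0.1 * motion


def dynamics(state, action):
    """Toy low-level model: predict the next state after a single action."""
    return state + action


def plan(state, goal, rng, num_effects=64, num_motions=64):
    # High level: sample effect codes, predict subgoals, keep the one nearest the goal.
    effects = rng.normal(size=(num_effects, 2))
    subgoals = np.array([meta_dynamics(state, c) for c in effects])
    best = np.argmin(np.linalg.norm(subgoals - goal, axis=1))
    effect, subgoal = effects[best], subgoals[best]

    # Low level: sample motion codes, decode action sequences, roll them out,
    # and keep the sequence whose final predicted state is nearest the subgoal.
    best_actions, best_err = None, np.inf
    for _ in range(num_motions):
        motion = rng.normal(size=(T, 2))
        actions = action_generator(state, effect, motion)
        s = state
        for a in actions:
            s = dynamics(s, a)
        err = np.linalg.norm(s - subgoal)
        if err < best_err:
            best_actions, best_err = actions, err
    return subgoal, best_actions


rng = np.random.default_rng(0)
subgoal, actions = plan(np.zeros(2), np.array([1.0, 0.5]), rng)
```

The point of the split is that the high-level search reasons only about where the state should end up after `T` steps, while the low-level search only has to find motions that reach that subgoal, so neither search covers the full space of long action sequences.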
Kuan Fang, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei
Conference on Robot Learning (CoRL), 2019



We observe that the robot comes up with diverse strategies in different task scenarios.

Open Path: When the target object is surrounded by obstacle objects, the robot opens a path for the target object towards the goal without entering the restricted area (red tiles).

Get Around: In the presence of a pile of obstacle objects between the target and the goal, the robot pushes the target around them.

Squeeze Through: When there is a small gap between a bunch of objects, the robot squeezes the target object through the gap.

Move Away Obstacles: When pushing the target object across the bridge (grey tiles), the robot clears obstacle objects one by one along the way.

Push Target Through Obstacles: When the robot cannot directly reach the target object, it squeezes the target object by pushing obstacle objects.

Clean Up a Workspace: The robot moves objects out of a designated workspace (blue tiles).


Various layouts and objects for each task in simulation and the real world.


We’ve released our codebase and the task environments in simulation and the real world.