Action Chunking Transformer (ACT) with a VAE latent over action chunks, trained on expert demonstrations; outperforms behavior cloning baselines by roughly 30% on constrained peg placement tasks.
This project implements Action Chunking Transformers (ACT) for learning peg placement manipulation from expert demonstrations. Rather than predicting one action per step, ACT predicts chunks of future actions conditioned on a latent variable from a Variational Autoencoder (VAE), yielding smoother and more consistent policies than step-wise behavior cloning.
Expert Demonstration Collection: Demonstrations were collected via scripted oracle policies in simulation, capturing diverse approach trajectories and insertion attempts for the peg-in-hole task.
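A scripted oracle for peg-in-hole can be sketched as a two-phase controller: align above the hole, then descend and insert, with action noise producing diverse trajectories. The environment interface, observation layout, and function names below are illustrative assumptions, not the project's actual API:

```python
import numpy as np

def scripted_oracle(obs, hole_pos, rng, noise_scale=0.01):
    """Hypothetical oracle policy for peg-in-hole.

    Assumes obs[:3] is the end-effector xyz position; Gaussian action
    noise yields the diverse approach trajectories mentioned above."""
    ee_pos = obs[:3]
    above_hole = hole_pos + np.array([0.0, 0.0, 0.05])
    if np.linalg.norm(ee_pos[:2] - hole_pos[:2]) > 0.01:
        target = above_hole          # phase 1: align above the hole
    else:
        target = hole_pos            # phase 2: descend and insert
    action = np.clip(target - ee_pos, -0.05, 0.05)  # bounded delta-position action
    return action + rng.normal(0.0, noise_scale, size=3)

def collect_demo(env, hole_pos, rng, max_steps=200):
    """Roll out the oracle in a gym-like env (assumed interface) and record (obs, action) pairs."""
    obs_list, act_list = [], []
    obs = env.reset()
    for _ in range(max_steps):
        act = scripted_oracle(obs, hole_pos, rng)
        obs_list.append(obs)
        act_list.append(act)
        obs, done = env.step(act)
        if done:
            break
    return np.stack(obs_list), np.stack(act_list)
```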
Action Chunking Transformer: The ACT architecture encodes sequences of actions into a latent space using a VAE. At inference time, the policy predicts chunks of future actions jointly, reducing compounding errors that plague step-by-step cloning.
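At deployment, overlapping chunks can be combined by temporal ensembling: every chunk that covers the current timestep contributes its prediction, weighted exponentially by prediction age, which is a key ingredient in how ACT smooths its open-loop chunks. A minimal sketch, assuming a buffer of `(start_time, chunk)` pairs (the buffer format and parameter `m` are illustrative):

```python
import numpy as np

def temporal_ensemble(chunk_buffer, t, m=0.1):
    """Blend all chunk predictions that cover timestep t.

    chunk_buffer: list of (start_time, chunk) pairs, chunk of shape (k, action_dim).
    Weight exp(-m * i) with i = 0 for the oldest prediction, following the
    exponential scheme used in ACT-style temporal ensembling."""
    preds, weights = [], []
    for i, (start, chunk) in enumerate(chunk_buffer):
        offset = t - start
        if 0 <= offset < len(chunk):          # does this chunk cover timestep t?
            preds.append(chunk[offset])
            weights.append(np.exp(-m * i))
    w = np.array(weights) / np.sum(weights)   # normalize to a convex combination
    return (np.stack(preds) * w[:, None]).sum(axis=0)
```

Because each executed action averages several independently predicted chunks, a single bad prediction is damped rather than compounding into the errors that plague step-by-step cloning.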
Training Improvements: Latent-space smoothing was applied to handle multi-modal demonstration distributions. A teacher-forcing curriculum gradually reduced ground-truth conditioning during training to improve closed-loop performance.
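A teacher-forcing curriculum of this kind is usually just a decaying probability of conditioning on ground truth. The linear schedule and function names below are a hypothetical sketch of the idea, not the project's exact schedule:

```python
import numpy as np

def teacher_forcing_prob(epoch, total_epochs, p_start=1.0, p_end=0.0):
    """Linearly decay the probability of conditioning on ground-truth actions
    from p_start at epoch 0 to p_end at the final epoch (assumed schedule)."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return p_start + (p_end - p_start) * frac

def maybe_teacher_force(gt_action, pred_action, p, rng):
    """With probability p feed back the ground-truth action; otherwise the
    model's own prediction, so training gradually matches closed-loop use."""
    return gt_action if rng.random() < p else pred_action
```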
ACT outperformed the behavior cloning baseline by approximately 30% in success rate on constrained peg placement. Latent-space smoothing also enabled transfer to unseen initial configurations, indicating that the learned representations generalized beyond the training distribution.