DDPG and A3C implemented from scratch for pick-and-place manipulation on a simulated UR10 arm, plus an imitation learning pipeline for peg-in-hole tasks.
This project explores reinforcement learning and imitation learning approaches for robotic manipulation on a simulated UR10 6-DOF arm. The focus is on training policies for pick-and-place and peg-in-hole tasks from scratch — without hand-coded controllers.
DDPG Implementation: Deep Deterministic Policy Gradient was implemented from scratch in PyTorch with an actor-critic architecture, an experience replay buffer, and soft target-network updates. Careful reward shaping and action normalization were key to stable training.
A3C Implementation: Asynchronous Advantage Actor-Critic was also implemented from scratch, using Python multiprocessing for parallel environment rollouts. A3C converged faster on the pick-and-place task thanks to the more diverse experience gathered by parallel workers.
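The structure of such an A3C worker can be sketched as below, assuming a discrete-action head for simplicity. All names, dimensions, and hyperparameters are illustrative; the rollout is a random placeholder, a plain Adam optimizer stands in for the shared-statistics optimizer A3C normally uses, and for a self-contained demo the worker is called in-process rather than via `mp.Process`.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp  # used when launching real async workers

torch.manual_seed(0)

# Shared actor-critic network: common trunk, policy head, value head.
class ActorCritic(nn.Module):
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.pi = nn.Linear(64, n_actions)  # action logits
        self.v = nn.Linear(64, 1)           # state-value estimate

    def forward(self, s):
        h = self.trunk(s)
        return self.pi(h), self.v(h)

def n_step_loss(model, states, actions, rewards, gamma=0.99):
    """Advantage actor-critic loss over one n-step rollout segment."""
    logits, values = model(states)
    # Discounted returns computed backward from a zero (terminal) bootstrap.
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)))
    adv = returns - values.squeeze(-1)
    logp = torch.log_softmax(logits, -1).gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(logp * adv.detach()).mean()  # advantage-weighted log-prob
    value_loss = adv.pow(2).mean()               # value regression
    return policy_loss + 0.5 * value_loss

def worker(shared_model, opt, steps=5):
    """One worker: local rollouts, gradients pushed to the shared model."""
    local = ActorCritic()
    for _ in range(steps):
        local.load_state_dict(shared_model.state_dict())  # sync from shared
        # Placeholder rollout standing in for real environment interaction.
        s = torch.randn(5, 8)
        a = torch.randint(0, 4, (5,))
        r = [float(x) for x in torch.randn(5)]
        local.zero_grad()
        n_step_loss(local, s, a, r).backward()
        # Copy local gradients onto the shared parameters, then step.
        for sp, lp in zip(shared_model.parameters(), local.parameters()):
            sp.grad = lp.grad.clone()
        opt.step()

shared = ActorCritic()
shared.share_memory()  # lets child processes see the same parameter tensors
opt = torch.optim.Adam(shared.parameters(), lr=1e-3)
# Real pipeline: [mp.Process(target=worker, args=(shared, opt)).start() ...]
worker(shared, opt)    # run one worker synchronously for this demo
```

Gradients are applied lock-free in Hogwild style; with several processes the occasional overwritten update is tolerated in exchange for throughput.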
Imitation Learning Pipeline: Expert demonstrations were recorded via scripted policies and used to pre-train a behavior cloning (BC) baseline. An imitation learning pipeline with data augmentation then improved on this baseline for the peg-in-hole task, whose narrow success region is difficult for pure RL to discover.
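A behavior cloning baseline of this kind reduces to supervised regression from states to expert actions. The sketch below uses random stand-in demonstrations and Gaussian state jitter as one illustrative form of augmentation; the project's actual dataset, network, and augmentation scheme may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical demo dataset: (state, expert_action) pairs. A random linear
# map through tanh stands in for actions logged from scripted policies.
demos_s = torch.randn(256, 8)
demos_a = torch.tanh(demos_s @ torch.randn(8, 4))

# Policy network with bounded outputs matching normalized actions.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                       nn.Linear(64, 4), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

with torch.no_grad():
    init_loss = nn.functional.mse_loss(policy(demos_s), demos_a).item()

for epoch in range(50):
    # Illustrative augmentation: jitter states with small Gaussian noise so
    # the policy also fits states near, not just on, the expert trajectories.
    s = demos_s + 0.01 * torch.randn_like(demos_s)
    loss = nn.functional.mse_loss(policy(s), demos_a)  # clone expert actions
    opt.zero_grad()
    loss.backward()
    opt.step()

final_loss = loss.item()
```

The resulting policy can serve as a warm start that RL fine-tuning then improves, which is where BC helps most on narrow-success-region tasks like peg-in-hole.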
DDPG and A3C both achieved approximately 85% success on pick-and-place tasks after convergence. The imitation learning baseline achieved ~55% on peg-in-hole and improved with additional demonstration data and augmentation. The learned feature representations allowed grasping to generalize across multiple object geometries.