DDPG and A3C implemented from scratch for pick-and-place manipulation on a simulated UR10 arm, plus an imitation learning pipeline for peg-in-hole tasks.
This project explores reinforcement learning and imitation learning approaches for robotic manipulation on a simulated UR10 6-DOF arm. The focus is on training policies for pick-and-place and peg-in-hole tasks from scratch — without hand-coded controllers.
DDPG Implementation: Deep Deterministic Policy Gradient was implemented from scratch in PyTorch with an actor-critic architecture, an experience replay buffer, and soft target-network updates. Careful reward shaping and action normalization were key to stable training.
A3C Implementation: Asynchronous Advantage Actor-Critic was also implemented from scratch, using Python multiprocessing for parallel environment rollouts. A3C converged faster on the pick-and-place task thanks to the more diverse experience gathered by parallel workers.
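The structure of such an A3C worker can be sketched as below, assuming a discrete-action head for simplicity. All names, dimensions, and hyperparameters are illustrative; the rollout is a random placeholder, a plain Adam optimizer stands in for the shared-statistics optimizer A3C normally uses, and for a self-contained demo the worker is called in-process rather than via `mp.Process`.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp  # used when launching real async workers

torch.manual_seed(0)

# Shared actor-critic network: common trunk, policy head, value head.
class ActorCritic(nn.Module):
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.pi = nn.Linear(64, n_actions)  # action logits
        self.v = nn.Linear(64, 1)           # state-value estimate

    def forward(self, s):
        h = self.trunk(s)
        return self.pi(h), self.v(h)

def n_step_loss(model, states, actions, rewards, gamma=0.99):
    """Advantage actor-critic loss over one n-step rollout segment."""
    logits, values = model(states)
    # Discounted returns computed backward from a zero (terminal) bootstrap.
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)))
    adv = returns - values.squeeze(-1)
    logp = torch.log_softmax(logits, -1).gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(logp * adv.detach()).mean()  # advantage-weighted log-prob
    value_loss = adv.pow(2).mean()               # value regression
    return policy_loss + 0.5 * value_loss

def worker(shared_model, opt, steps=5):
    """One worker: local rollouts, gradients pushed to the shared model."""
    local = ActorCritic()
    for _ in range(steps):
        local.load_state_dict(shared_model.state_dict())  # sync from shared
        # Placeholder rollout standing in for real environment interaction.
        s = torch.randn(5, 8)
        a = torch.randint(0, 4, (5,))
        r = [float(x) for x in torch.randn(5)]
        local.zero_grad()
        n_step_loss(local, s, a, r).backward()
        # Copy local gradients onto the shared parameters, then step.
        for sp, lp in zip(shared_model.parameters(), local.parameters()):
            sp.grad = lp.grad.clone()
        opt.step()

shared = ActorCritic()
shared.share_memory()  # lets child processes see the same parameter tensors
opt = torch.optim.Adam(shared.parameters(), lr=1e-3)
# Real pipeline: [mp.Process(target=worker, args=(shared, opt)).start() ...]
worker(shared, opt)    # run one worker synchronously for this demo
```

Gradients are applied lock-free in Hogwild style; with several processes the occasional overwritten update is tolerated in exchange for throughput.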
Imitation Learning Pipeline: Expert demonstrations were recorded via scripted policies and used to pre-train a behavior cloning (BC) baseline. An imitation learning pipeline with data augmentation then improved on this baseline for the peg-in-hole task, whose narrow success region is difficult for pure RL to discover.
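A behavior cloning baseline of this kind reduces to supervised regression from states to expert actions. The sketch below uses random stand-in demonstrations and Gaussian state jitter as one illustrative form of augmentation; the project's actual dataset, network, and augmentation scheme may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical demo dataset: (state, expert_action) pairs. A random linear
# map through tanh stands in for actions logged from scripted policies.
demos_s = torch.randn(256, 8)
demos_a = torch.tanh(demos_s @ torch.randn(8, 4))

# Policy network with bounded outputs matching normalized actions.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                       nn.Linear(64, 4), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

with torch.no_grad():
    init_loss = nn.functional.mse_loss(policy(demos_s), demos_a).item()

for epoch in range(50):
    # Illustrative augmentation: jitter states with small Gaussian noise so
    # the policy also fits states near, not just on, the expert trajectories.
    s = demos_s + 0.01 * torch.randn_like(demos_s)
    loss = nn.functional.mse_loss(policy(s), demos_a)  # clone expert actions
    opt.zero_grad()
    loss.backward()
    opt.step()

final_loss = loss.item()
```

The resulting policy can serve as a warm start that RL fine-tuning then improves, which is where BC helps most on narrow-success-region tasks like peg-in-hole.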
DDPG and A3C both achieved approximately 85% success on pick-and-place tasks after convergence. The imitation learning baseline achieved ~55% on peg-in-hole and improved with additional demonstration data and augmentation. The learned feature representations allowed grasping to generalize across multiple object geometries.