Feb – Apr 2025

Reinforcement Learning Control for UR10 Robot

DDPG and A3C implemented from scratch for pick-and-place manipulation on a simulated UR10 arm, plus an imitation learning pipeline for peg-in-hole tasks.

Tags: PyTorch, DDPG, A3C, Gym, Imitation Learning, Python, Manipulation

Overview

This project explores reinforcement learning and imitation learning approaches for robotic manipulation on a simulated UR10 6-DOF arm. The focus is on training policies for pick-and-place and peg-in-hole tasks from scratch — without hand-coded controllers.

Approach

DDPG Implementation: Deep Deterministic Policy Gradient was implemented from scratch in PyTorch with an actor-critic architecture, experience replay buffer, and target network soft updates. Careful reward shaping and action normalization were key to stable training.
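The core DDPG machinery described above can be sketched as follows. The network sizes, observation/action dimensions, and `tau` are illustrative assumptions, not the project's actual values, and the replay buffer is omitted for brevity:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for a UR10 task; the project's real values may differ.
OBS_DIM, ACT_DIM = 12, 6

def make_mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor = make_mlp(OBS_DIM, ACT_DIM)
critic = make_mlp(OBS_DIM + ACT_DIM, 1)
target_actor = make_mlp(OBS_DIM, ACT_DIM)
target_critic = make_mlp(OBS_DIM + ACT_DIM, 1)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())

def soft_update(target, source, tau=0.005):
    # Polyak averaging: target <- tau * source + (1 - tau) * target
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1 - tau).add_(tau * s)

# One critic update on a sampled minibatch (random stand-in for replay data).
obs = torch.randn(32, OBS_DIM)
act = torch.randn(32, ACT_DIM)
rew = torch.randn(32, 1)
next_obs = torch.randn(32, OBS_DIM)
gamma = 0.99

with torch.no_grad():
    # tanh keeps the target action in a normalized [-1, 1] range.
    next_act = torch.tanh(target_actor(next_obs))
    target_q = rew + gamma * target_critic(torch.cat([next_obs, next_act], dim=1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([obs, act], dim=1)), target_q)
critic_loss.backward()
soft_update(target_critic, critic)
```

The soft update is what stabilizes bootstrapped Q-targets; the `tanh` squashing reflects the action normalization the text mentions as key to stable training.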

A3C Implementation: Asynchronous Advantage Actor-Critic was also implemented from scratch using Python multiprocessing for parallel environment rollouts. A3C converged faster on the pick-and-place task due to more diverse experience collection.
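A single worker's A3C update can be sketched as below. The multiprocessing layer (shared model, asynchronous gradient application) is omitted, and a discrete action head is shown for brevity; a Gaussian head would fit the continuous UR10 action space. All sizes and coefficients are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=12, n_actions=4):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.policy = nn.Linear(64, n_actions)   # action logits
        self.value = nn.Linear(64, 1)            # state-value estimate

def n_step_returns(rewards, bootstrap, gamma=0.99):
    # Discounted returns computed backwards from the bootstrap value.
    returns, R = [], bootstrap
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))

def a3c_loss(model, obs, actions, rewards, bootstrap_value,
             gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    h = model.body(obs)
    dist = torch.distributions.Categorical(logits=model.policy(h))
    values = model.value(h).squeeze(-1)
    returns = torch.tensor(n_step_returns(rewards, bootstrap_value, gamma))
    advantage = returns - values
    # Policy gradient weighted by the (detached) advantage, plus a value
    # regression term and an entropy bonus for exploration.
    policy_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In the full algorithm, each worker computes this loss on its own rollout and pushes gradients to a shared model, which is what yields the diverse experience noted above.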

Imitation Learning Pipeline: Expert demonstrations were recorded via scripted policies and used to pre-train a behavioral cloning (BC) baseline. An IL pipeline with data augmentation then improved on this baseline for the peg-in-hole task, whose narrow success region is difficult for pure RL exploration to discover.
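A minimal BC pre-training sketch with one simple augmentation scheme is shown below. The demo format, the Gaussian-noise augmentation, and all hyperparameters are assumptions for illustration; the project's actual pipeline may differ:

```python
import torch
import torch.nn as nn

def augment(states, actions, noise_std=0.01, copies=4):
    # Replicate each demo state with small perturbations while keeping the
    # expert action as the label, widening coverage around the demonstrations.
    n, d = states.shape
    noisy = states.repeat(copies, 1) + noise_std * torch.randn(copies * n, d)
    return torch.cat([states, noisy]), torch.cat([actions, actions.repeat(copies, 1)])

def train_bc(policy, states, actions, epochs=50, lr=1e-3):
    # Supervised regression of expert actions from states (behavioral cloning).
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(states), actions)
        loss.backward()
        opt.step()
    return loss.item()
```

The cloned policy can then serve as the starting point that the IL pipeline refines, rather than forcing RL to find the narrow peg-in-hole success region from random exploration.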

Results

- ~85% — DDPG/A3C success rate
- ~55% — IL baseline success
- Multi-object — grasp generalization

DDPG and A3C both achieved approximately 85% success on pick-and-place tasks after convergence. The imitation learning baseline achieved ~55% on peg-in-hole and improved with additional demonstration data and augmentation. The learned feature representations allowed grasping to generalize across multiple object geometries.

Media

🎥 Demo video and project images coming soon.