A marker-less, vision-only 6-DoF pose estimator for bottles, syringes, and medical trays — eliminating physical markers and enabling repeatable setup in clinical environments.
This project develops a marker-less 6-DoF pose estimation system for common medical objects — bottles, syringes, and IV trays — designed to run onboard the WPI nursing assistance robot. By eliminating the need for physical fiducial markers, the system makes deployment in real clinical spaces significantly more practical.
The estimator relies on a multi-view fusion pipeline fed by Intel RealSense D435 depth cameras, outputting 6-DoF pose (position + orientation) for each detected object in the robot's workspace.
Vision-Only Pipeline: The system processes synchronized RGB-D frames from multiple calibrated cameras. Feature extraction and depth back-projection are used to initialize pose hypotheses, which are then refined through iterative alignment.
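The depth back-projection step can be sketched with the standard pinhole model: each pixel with a valid depth reading is lifted into a 3-D point in the camera frame using the camera intrinsics. The intrinsic values below are hypothetical placeholders; in practice they come from the RealSense calibration.

```python
import numpy as np

def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth into a 3-D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Hypothetical D435-like intrinsics; real values come from calibration.
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0
point = backproject(400, 300, 0.5, fx, fy, cx, cy)  # 0.5 m away from the camera
```

A cloud of such points, one per detected-object pixel, is what the iterative alignment stage would refine against an object model.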
Multi-View Fusion: Estimates from individual viewpoints are fused probabilistically to reduce ambiguity caused by occlusions or specular surfaces common in medical environments (plastic syringes, reflective trays).
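One common way to realize this kind of probabilistic fusion, shown here as a hedged sketch rather than the project's exact method, is inverse-covariance (information-form) weighting of per-view position estimates: views with tighter covariance, e.g. an unoccluded camera, dominate the fused result. Orientation fusion is more involved (quaternion averaging) and is omitted here.

```python
import numpy as np

def fuse_positions(means, covs):
    """Fuse independent Gaussian position estimates via inverse-covariance weighting."""
    infos = [np.linalg.inv(c) for c in covs]          # information matrices
    info_sum = sum(infos)
    weighted = sum(info @ m for info, m in zip(infos, means))
    fused_cov = np.linalg.inv(info_sum)
    return fused_cov @ weighted, fused_cov

# Two views of the same bottle; the second view is noisier (e.g. partly occluded).
m1, c1 = np.array([0.10, 0.20, 0.50]), np.eye(3) * 1e-4
m2, c2 = np.array([0.12, 0.20, 0.52]), np.eye(3) * 4e-4
fused, cov = fuse_positions([m1, m2], [c1, c2])  # pulled toward the cleaner view
```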
Embedded Deployment: The model runs on an NVIDIA Jetson Nano. A model quantization and pruning roadmap is in progress to push from the current ~2.0 s/frame research prototype toward real-time operation.
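As one illustration of the pruning half of that roadmap, unstructured magnitude pruning zeroes the smallest-magnitude fraction of a weight tensor; the resulting sparsity can then be exploited by the runtime. This is a generic numpy sketch, not the project's deployment code.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of a weight tensor."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.9, -0.01], [0.05, -0.7]])
pruned = magnitude_prune(w, 0.5)  # removes the two smallest-magnitude weights
```

Quantization (e.g. float32 to int8) is complementary and typically yields the larger speedup on Jetson-class hardware.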
Eliminating physical markers reduced experimental setup time by approximately 30%. Pose estimation error remained under 5% across all tested object categories in controlled hospital testbed conditions.