Master of Science in Robotics — Thesis

Autonomous Perception for Mobile Manipulator Nursing Robot

An end-to-end autonomous perception framework for IONA — a bi-manual mobile nursing robot — integrating hybrid 6D pose estimation, visual SLAM navigation, and human-aware interaction for real-world assistive tasks.

Institution: Worcester Polytechnic Institute
Degree: M.S. Robotics Engineering
Advisor: Prof. Jane Li
Year: May 2026
Lab: HiRO Lab

Abstract

Nursing shortages and the cognitive demands of clinical work motivate autonomous robotic assistance in healthcare. Existing approaches rely either on continuous teleoperation or on data-intensive learning methods that are impractical for real-world deployment. This thesis presents an autonomous perception framework for a mobile manipulator nursing robot, IONA (Intelligent rObotics Nursing Assistant). The framework combines a YOLO-based object detector with a hybrid 6D pose estimation pipeline, made up of PoseMate, a high-precision mesh-based estimator, and LazyPose, a real-time approximator running at 4 Hz, alongside visual SLAM-based navigation and human-aware perception. A validation-based filtering mechanism rejects inconsistent pose estimates before aggregation. Evaluated on a physical platform across representative nursing tasks, the system achieved a 90% pick-and-place success rate with planar positional error below 0.5 cm and rotational error below 5°, and an 80% navigation success rate. These results demonstrate that practical assistive robotics can be achieved through structured perception and modular execution, without reliance on continuous human control or data-intensive learning.

Key Results

  • 90%: pick-and-place task success rate
  • < 0.5 cm: positional error in planar axes (PoseMate)
  • < 5°: rotational error across all axes
  • 80%: navigation success rate (indoor)
  • 4 Hz: LazyPose real-time inference rate
  • 18: object classes (food and medical) in the custom dataset

System Overview

IONA is a bi-manual mobile humanoid robot built in-house at WPI's Human-Inspired Robotics (HiRO) Lab. It features a Fetch Freight 100 mobile base, dual Kinova Gen3 7-DoF arms with Robotiq parallel grippers, and a multi-camera RGB-D sensing suite. The software stack is split between a ROS Noetic system on the robot and a ROS2 Humble external compute node, which communicate over a custom TCP/UDP bridge.
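The overview does not describe the bridge's wire format. Because TCP delivers a byte stream rather than discrete messages, any such bridge needs some framing. The sketch below assumes a hypothetical scheme (a 4-byte big-endian length prefix followed by a UTF-8 JSON payload) with hypothetical helpers `send_msg` and `recv_msg`; it covers only the TCP side and is not IONA's actual protocol.

```python
import json
import socket
import struct

# Hypothetical framing: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def send_msg(sock: socket.socket, msg: dict) -> None:
    """Serialize msg as JSON and send it with a length prefix."""
    payload = json.dumps(msg).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may split or merge writes arbitrarily."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


if __name__ == "__main__":
    # socketpair() returns two connected sockets, standing in for the
    # robot-side (Noetic) and compute-side (Humble) endpoints.
    robot, compute = socket.socketpair()
    send_msg(compute, {"type": "pose", "object": "bottle", "xyz": [0.42, -0.10, 0.85]})
    print(recv_msg(robot))
    robot.close()
    compute.close()
```

A length prefix keeps the reader simple and copes with messages that TCP splits or coalesces; a UDP channel would not need it, since each datagram arrives whole or not at all.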

This thesis integrates a modular perception subsystem into IONA's existing architecture, transitioning it from a predominantly teleoperated platform toward structured autonomy for representative nursing tasks — pick and place, shelf organization, and device manipulation.

📷 Scene Observation → 🔍 Object Detection → 📐 Pose Estimation → 🗺️ Motion Planning → 🤖 Task Execution → 👁️ Human Monitoring
Tools: ROS2 Humble, ROS Noetic, PyTorch, Intel RealSense D435, Kinova Gen3, MoveIt + OMPL, RTAB-Map, Detectron2, FoundationPose, MMPose, Python, OpenCV

Hybrid Pose Estimation

A central contribution of the thesis is a dual-mode pose estimation framework that selects between accuracy and speed based on the task and object properties.

Method | Speed (latency or rate) | Accuracy | Planar error | Depth error | Use case
PoseMate | 10–20 s | High | < 0.5 cm | ~1.5 cm | Complex objects
LazyPose | ~4 Hz | Moderate | < 1 cm | < 3 cm | Simple, predictable objects
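The thesis states that the mode is chosen from object characteristics but does not list the exact rules. A minimal sketch of such a dispatcher, assuming hypothetical per-class flags (`has_mesh`, `is_complex`, `needs_streaming`) and a rule set that simply mirrors the table above:

```python
from dataclasses import dataclass
from enum import Enum


class PoseMode(Enum):
    POSEMATE = "posemate"  # mesh-based full 6D pose, slow but precise
    LAZYPOSE = "lazypose"  # PCA-based 4D pose (xyz + yaw), ~4 Hz


@dataclass(frozen=True)
class ObjectProfile:
    name: str
    has_mesh: bool         # hypothetical: a pre-scanned mesh exists for the class
    is_complex: bool       # hypothetical: irregular shape, orientation matters beyond yaw
    needs_streaming: bool  # hypothetical: task needs continuous pose updates


def select_pose_mode(obj: ObjectProfile) -> PoseMode:
    """Hypothetical rules: speed for simple objects, accuracy for complex ones."""
    if obj.needs_streaming and not obj.is_complex:
        return PoseMode.LAZYPOSE
    if obj.is_complex and obj.has_mesh:
        return PoseMode.POSEMATE
    # PoseMate cannot run without a mesh, so fall back to the approximation.
    return PoseMode.LAZYPOSE
```

Keeping the decision in one small function makes it easy to extend when new object classes and meshes are added.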

PoseMate is built on the FoundationPose framework and uses RGB-D data, instance segmentation masks, and pre-scanned object meshes to compute the full 6D pose. A validation pipeline checks each per-frame estimate (depth error, mask reprojection IoU), discards inconsistent ones, and aggregates the remaining estimates across five captured frames before accepting a final result.
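The thesis names the checks and the five-frame window but not the thresholds or the aggregation rule. The sketch below assumes the mesh has already been rendered at each estimated pose into a mask and a depth image; the thresholds, the per-axis median for translation, and the eigenvector-based quaternion mean are illustrative choices, not necessarily the thesis's.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FrameEstimate:
    t: np.ndarray               # (3,) translation in metres
    q: np.ndarray               # (4,) unit quaternion (w, x, y, z)
    reproj_mask: np.ndarray     # (H, W) bool, mesh rendered at the estimated pose
    observed_mask: np.ndarray   # (H, W) bool, instance segmentation mask
    rendered_depth: np.ndarray  # (H, W) metres, depth of the rendered mesh
    observed_depth: np.ndarray  # (H, W) metres, sensor depth (0 = invalid)


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def depth_error(est: FrameEstimate) -> float:
    """Median |rendered - observed| depth over pixels valid in both."""
    valid = est.reproj_mask & est.observed_mask & (est.observed_depth > 0)
    if not valid.any():
        return float("inf")
    return float(np.median(np.abs(est.rendered_depth[valid] - est.observed_depth[valid])))


def average_quaternions(qs: np.ndarray) -> np.ndarray:
    """Eigenvector of sum(q q^T) with the largest eigenvalue; ignores q/-q sign flips."""
    eigvals, eigvecs = np.linalg.eigh(qs.T @ qs)
    q = eigvecs[:, np.argmax(eigvals)]
    return q if q[0] >= 0 else -q


def validate_and_aggregate(frames, iou_min=0.7, depth_max=0.01, min_accepted=3):
    """Reject inconsistent frames, then fuse the survivors (hypothetical thresholds)."""
    accepted = [f for f in frames
                if mask_iou(f.reproj_mask, f.observed_mask) >= iou_min
                and depth_error(f) <= depth_max]
    if len(accepted) < min_accepted:
        return None  # too few consistent frames: re-capture instead of guessing
    t = np.median(np.stack([f.t for f in accepted]), axis=0)
    q = average_quaternions(np.stack([f.q for f in accepted]))
    return t, q
```

Filtering before aggregation means a single bad frame (for example, a mask that slipped onto a neighbouring object) cannot drag the fused pose.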

LazyPose provides a lightweight, real-time 4D pose (3D position plus planar yaw) by applying PCA to the segmentation mask and reading depth at the median pixel of the mask. This enables continuous streaming at 4 Hz for predictable objects.
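A minimal sketch of the LazyPose idea, assuming an aligned depth image in metres, pinhole intrinsics `fx, fy, cx, cy`, and a camera looking roughly down at the work surface so that the principal-axis angle in the image approximates planar yaw. The function name and the depth fallback are hypothetical, and transforming the result into the robot frame is left out.

```python
import numpy as np


def lazy_pose(mask, depth, fx, fy, cx, cy):
    """Approximate (x, y, z, yaw) in the camera frame from a mask and depth image."""
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("mask too small for PCA")
    # Median pixel of the mask (per coordinate), truncated to a pixel index.
    u = int(np.median(xs))
    v = int(np.median(ys))
    z = float(depth[v, u])
    if z <= 0:
        # Assumption: fall back to the median valid depth inside the mask
        # when the sensor has no reading at the median pixel.
        vals = depth[ys, xs]
        vals = vals[vals > 0]
        if vals.size == 0:
            raise ValueError("no valid depth inside mask")
        z = float(np.median(vals))
    # Pinhole back-projection of the median pixel.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # PCA: the eigenvector of the 2x2 pixel covariance with the largest
    # eigenvalue is the object's long axis in the image.
    eigvals, eigvecs = np.linalg.eigh(np.cov(np.stack([xs, ys]).astype(float)))
    ax, ay = eigvecs[:, np.argmax(eigvals)]
    yaw = np.arctan2(ay, ax)
    # An axis has no direction, so wrap yaw into [-pi/2, pi/2).
    yaw = (yaw + np.pi / 2) % np.pi - np.pi / 2
    return x, y, z, float(yaw)
```

Everything here is a few array reductions and one 2×2 eigen-decomposition, which is why this kind of estimate can run every frame, at the cost of ignoring the object's full 3D orientation.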

Task-Level Performance

  • Pick & Place: 90% success, ~30 s per object on average
  • Navigation: 80% success (RTAB-Map visual SLAM)
  • Shelf organization: fiducial-guided placement
  • Device manipulation: pump and device control

Key Contributions

  • Integrated Perception Module into IONA: Transitioned IONA from a teleoperated platform to a structured autonomous system by incorporating object detection, pose estimation, active camera selection, and human-aware monitoring.
  • Hybrid Pose Estimation Framework: A dual-mode system combining fast approximate estimation (LazyPose) with accurate mesh-based estimation (PoseMate), selected dynamically based on object characteristics.
  • Custom Dataset & Mesh Pipeline: Collected and annotated ~1,200 images across 18 medical and food object classes using Roboflow; acquired 3D meshes with a handheld Creality scanner followed by Blender post-processing.
  • Active Viewpoint Selection: Task-driven single-camera selection strategy as a practical alternative to computationally expensive multi-camera fusion.
  • Extensible Perception Architecture: Modular design enabling new object classes, meshes, and task routines to be added with minimal changes to the pipeline.
  • End-to-End Task Pipelines: Functional integration for pick-and-place, shelf organization, and device manipulation demonstrating perception-driven autonomous execution on a physical robot.
  • Human-Aware Execution: Velocity-threshold-based motion monitoring with automatic pause and retraction, enabling safe co-existence in shared spaces.
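The thesis specifies velocity-threshold monitoring with automatic pause and retraction but not the thresholds or the escalation rule. The sketch below assumes 3D human keypoints from a pose tracker such as MMPose (listed in the stack), uses the fastest keypoint as the motion signal, and escalates from pause to retraction after sustained motion; `HumanMotionMonitor` and all numeric values are hypothetical.

```python
from enum import Enum, auto

import numpy as np


class ArmState(Enum):
    RUNNING = auto()
    PAUSED = auto()
    RETRACTING = auto()


class HumanMotionMonitor:
    """Hypothetical velocity-threshold monitor over tracked human keypoints."""

    def __init__(self, speed_thresh=0.5, retract_after=2.0, resume_after=1.0):
        self.speed_thresh = speed_thresh    # m/s, assumed value
        self.retract_after = retract_after  # s of sustained motion before retracting
        self.resume_after = resume_after    # s of stillness before resuming
        self.state = ArmState.RUNNING
        self._prev = None                   # (timestamp, keypoints) of last frame
        self._moving_since = None
        self._still_since = None

    def update(self, t: float, keypoints: np.ndarray) -> ArmState:
        """keypoints: (K, 3) positions in metres observed at time t (seconds)."""
        if self._prev is not None:
            t0, k0 = self._prev
            dt = t - t0
            if dt > 0:
                # Speed of the fastest keypoint between consecutive frames.
                speed = float(np.max(np.linalg.norm(keypoints - k0, axis=1)) / dt)
                self._step(t, speed > self.speed_thresh)
        self._prev = (t, keypoints)
        return self.state

    def _step(self, t: float, moving: bool) -> None:
        if moving:
            self._still_since = None
            if self._moving_since is None:
                self._moving_since = t
            if self.state is ArmState.RUNNING:
                self.state = ArmState.PAUSED
            elif (self.state is ArmState.PAUSED
                  and t - self._moving_since >= self.retract_after):
                self.state = ArmState.RETRACTING
        else:
            self._moving_since = None
            if self._still_since is None:
                self._still_since = t
            if (self.state is not ArmState.RUNNING
                    and t - self._still_since >= self.resume_after):
                self.state = ArmState.RUNNING
```

Requiring a period of stillness before resuming avoids the arm restarting between two quick human movements.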

Hardware Platform

IONA integrates the following hardware into a ~1700 mm tall bi-manual mobile humanoid:

  • Mobile Base: Fetch Freight 100 — indoor navigation with dedicated RGB-D camera for SLAM.
  • Arms: 2× Kinova Gen3 7-DoF with Robotiq 2-finger parallel grippers (85 mm).
  • Cameras: Chest Intel RealSense D435 (primary detection), Neck D435 (scene/human tracking), Left Wrist Kinova camera (close-range), Base D435i with IMU (navigation/SLAM).
  • Compute: Onboard Fetch computer (ROS Noetic) plus an external workstation with an Intel i7 CPU and an RTX 3080 GPU (ROS2 Humble; perception and planning).
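Active viewpoint selection can be sketched as a priority lookup over the camera roles above, picking one camera per task phase instead of fusing several streams. The camera identifiers, task phases, and priority orders below are hypothetical; the thesis describes the strategy as task-driven but does not publish such a table.

```python
from typing import Optional

# Hypothetical task phase -> cameras in priority order, following the roles above:
# chest D435 (primary detection), neck D435 (scene/human tracking),
# left wrist Kinova camera (close range), base D435i (navigation/SLAM).
PHASE_PREFERENCES = {
    "detect": ["chest_d435", "neck_d435"],
    "grasp_refine": ["left_wrist", "chest_d435"],
    "human_monitor": ["neck_d435", "chest_d435"],
    "navigate": ["base_d435i"],
}


def select_camera(phase: str, usable: set[str]) -> Optional[str]:
    """Return the highest-priority camera for this phase that is usable now
    (for example, streaming and currently seeing the target)."""
    for cam in PHASE_PREFERENCES.get(phase, []):
        if cam in usable:
            return cam
    return None
```

Because only one stream is processed per phase, the perception cost stays that of a single camera, which is the practical advantage over multi-camera fusion.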

Navigation & Spatial Awareness

Navigation is implemented as an independent ROS Noetic module that uses RTAB-Map for visual RGB-D SLAM (mapping and localization) and the move_base framework with the DWA local planner for goal-directed motion planning. A single RealSense D435i provides all navigation sensing (RGB, depth, IMU), keeping the pipeline simple and consistent.

Maps are built offline via teleoperation and reloaded for autonomous deployment. Loop closure detection corrects accumulated drift during localization. Navigation and manipulation are sequenced, not concurrent, reducing system complexity.
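Because navigation and manipulation are sequenced rather than concurrent, the top-level task logic can stay a plain sequence. In this hypothetical sketch, `navigate` and `manipulate` stand in for calls into the move_base and MoveIt layers, and the navigation retry count is an assumed parameter.

```python
from typing import Callable


def run_task(navigate: Callable[[str], bool],
             manipulate: Callable[[str], bool],
             goal: str, target: str, max_nav_attempts: int = 2) -> str:
    """Run navigation, then manipulation, never both at once."""
    for _ in range(max_nav_attempts):
        if navigate(goal):
            break
    else:
        # Every attempt failed: do not start manipulation from an unknown pose.
        return "nav_failed"
    # The base is stationary from here on; only the arms move.
    return "done" if manipulate(target) else "manip_failed"
```

Sequencing means the arm planner can treat the base as fixed, which removes a whole class of coordination problems.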

Media & Demos


Demo videos, hardware photos, and experiment recordings coming soon.
Check the GitHub repository for the latest updates.