Abstract
Nursing shortages and the cognitive demands of clinical work motivate autonomous robotic assistance in healthcare. Existing approaches rely either on continuous teleoperation or on data-intensive learning methods that are impractical for real-world deployment. This thesis presents an autonomous perception framework for IONA (Intelligent rObotics Nursing Assistant), a mobile manipulator nursing robot. The framework combines a YOLO-based object detector with a hybrid 6D pose estimation pipeline, alongside visual SLAM-based navigation and human-aware perception. The pose pipeline pairs PoseMate, a high-precision mesh-based estimator, with LazyPose, a real-time approximate estimator running at 4 Hz. A validation-based filtering mechanism rejects inconsistent pose estimates before aggregation. Evaluated on a physical platform across representative nursing tasks, the system achieved a 90% pick-and-place success rate, with positional error below 0.5 cm and rotational error below 5°, and an 80% navigation success rate. These results demonstrate that practical assistive robotics can be achieved through structured perception and modular execution, without reliance on continuous human control or data-intensive learning.
Key Results
System Overview
IONA is a bi-manual mobile humanoid robot built in-house at WPI's Human-Inspired Robotics (HiRO) Lab. It features a Fetch Freight 100 mobile base, dual Kinova Gen3 7-DoF arms with Robotiq parallel grippers, and a multi-camera RGB-D sensing suite. The software stack is split between a robot-side ROS Noetic system and an external ROS2 Humble compute node, which communicate over a custom TCP/UDP bridge.
This thesis integrates a modular perception subsystem into IONA's existing architecture. This moves the robot from a predominantly teleoperated platform toward structured autonomy for representative nursing tasks: pick-and-place, shelf organization, and device manipulation.
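TCP delivers a byte stream with no message boundaries, so a TCP/UDP bridge like this one needs some framing on its TCP side. One common choice is length-prefixed frames. The following is a minimal sketch that assumes a hypothetical wire format: a 4-byte big-endian length followed by a JSON object holding a topic name and a payload. The function names and topic strings are illustrative, and IONA's actual bridge format may differ.

```python
import json
import socket
import struct

# Hypothetical wire format: 4-byte big-endian payload length, then UTF-8 JSON.
HEADER = struct.Struct("!I")


def encode_frame(topic: str, msg: dict) -> bytes:
    """Serialize one bridged message as a length-prefixed JSON frame."""
    payload = json.dumps({"topic": topic, "msg": msg}).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer on TCP."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("bridge peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> tuple[str, dict]:
    """Block until one complete frame has arrived, then decode it."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    data = json.loads(_recv_exact(sock, length).decode("utf-8"))
    return data["topic"], data["msg"]
```

For a local test, `socket.socketpair()` gives two connected sockets. Frames written back-to-back on one end are read back one at a time on the other, because the length header marks where each message ends.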
Pipeline: Observation → Detection → Estimation → Planning → Execution → Monitoring
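The six pipeline stages can be read as a chain. Each stage consumes the previous stage's output, and any stage can abort the task. The sketch below assumes a plain dict as the shared context. The stage bodies are placeholders, and the object label and pose values are hypothetical. Reducing Monitoring to a post-execution check is a simplification: the human-aware monitoring listed under Key Contributions runs during execution.

```python
from typing import Callable, Optional

Context = dict
Stage = Callable[[Context], Optional[Context]]


def run_pipeline(stages: list[tuple[str, Stage]],
                 ctx: Context) -> tuple[bool, str, Context]:
    """Run named stages in order; a stage returning None aborts the task.

    Returns (succeeded, name of the aborting stage or "done", last context).
    """
    for name, stage in stages:
        result = stage(ctx)
        if result is None:
            return False, name, ctx
        ctx = result
    return True, "done", ctx


# Placeholder stages; real ones would call the detector, PoseMate/LazyPose,
# the motion planner, and the arm controller.
PIPELINE: list[tuple[str, Stage]] = [
    ("Observation", lambda c: {**c, "frame": "rgbd"}),
    ("Detection", lambda c: {**c, "label": "bottle"}
        if c.get("object_visible", True) else None),
    ("Estimation", lambda c: {**c, "pose": (0.45, 0.0, 0.80, 0.0)}),
    ("Planning", lambda c: {**c, "plan": ["pregrasp", "grasp", "lift"]}),
    ("Execution", lambda c: {**c, "executed": True}),
    ("Monitoring", lambda c: c if c.get("executed") else None),
]
```

Because each stage can return `None`, a failure such as "object not detected" stops the chain at a named stage. No later stage ever acts on missing data.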
Hybrid Pose Estimation
A central contribution of the thesis is a dual-mode pose estimation framework. It trades accuracy against speed by selecting an estimator according to the task and the object's properties.
| Method | Latency / Rate | Accuracy | Planar Error | Depth Error | Use Case |
|---|---|---|---|---|---|
| PoseMate | 10–20 s per estimate | High | < 0.5 cm | ~1.5 cm | Complex objects |
| LazyPose | ~4 Hz | Moderate | < 1 cm | < 3 cm | Simple / predictable objects |
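The table implies a simple dispatch rule between the two estimators. The sketch below is one possible encoding of that choice. The `ObjectProfile` fields, the rule order, and the example object names are hypothetical and are not taken from the thesis code.

```python
from dataclasses import dataclass
from enum import Enum


class Estimator(Enum):
    POSEMATE = "PoseMate"  # mesh-based full 6D, roughly 10-20 s per estimate
    LAZYPOSE = "LazyPose"  # PCA-based position + yaw, roughly 4 Hz


@dataclass
class ObjectProfile:
    # Hypothetical per-class attributes, not fields from the thesis code.
    name: str
    has_mesh: bool         # a pre-scanned mesh exists for this class
    simple_shape: bool     # predictable geometry where position + yaw suffices
    needs_streaming: bool  # task needs continuous pose updates


def select_estimator(obj: ObjectProfile) -> Estimator:
    # PoseMate needs a mesh, so classes without one fall back to LazyPose.
    if not obj.has_mesh:
        return Estimator.LAZYPOSE
    # Simple objects, or tasks that need a live pose stream, use the fast path.
    if obj.simple_shape or obj.needs_streaming:
        return Estimator.LAZYPOSE
    # Everything else gets the slower, high-precision estimate.
    return Estimator.POSEMATE
```

Putting the mesh check first keeps the rule total. New object classes can be added before a mesh is scanned and still get a usable, if approximate, pose.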
PoseMate is built on the FoundationPose framework. It uses RGB-D data, instance segmentation masks, and pre-scanned object meshes to compute the full 6D pose. A validation pipeline checks each per-frame estimate against two criteria, depth error and mask reprojection IoU, and rejects inconsistent estimates. It aggregates the remaining estimates across five captured frames before accepting a final pose.
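This section names the validation signals and the five-frame window but not the exact thresholds or aggregation rule. The sketch below fills those in with assumptions:

- The thresholds are hypothetical.
- Translation is aggregated with a per-axis median.
- Rotation is aggregated with a medoid, meaning the accepted rotation with the smallest total geodesic distance to the others.

These are stand-ins chosen for robustness, not necessarily PoseMate's actual rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class PoseCandidate:
    t: np.ndarray        # (3,) translation in metres, camera frame
    R: np.ndarray        # (3, 3) rotation matrix
    depth_error: float   # hypothetical: mean |rendered - observed| depth, metres
    mask_iou: float      # IoU of reprojected mesh silhouette vs. detector mask


def rotation_angle(R1: np.ndarray, R2: np.ndarray) -> float:
    """Geodesic distance between two rotation matrices, in radians."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def aggregate(cands: List[PoseCandidate], max_depth_err: float = 0.02,
              min_iou: float = 0.7, min_accepted: int = 3
              ) -> Optional[Tuple[np.ndarray, np.ndarray]]:
    # 1) Validation: keep only frames consistent with both depth and mask.
    ok = [c for c in cands
          if c.depth_error <= max_depth_err and c.mask_iou >= min_iou]
    if len(ok) < min_accepted:
        return None  # too few consistent frames: reject and re-capture
    # 2) Translation: per-axis median tolerates a leftover outlier.
    t = np.median(np.stack([c.t for c in ok]), axis=0)
    # 3) Rotation: medoid, the accepted rotation closest to all the others.
    costs = [sum(rotation_angle(a.R, b.R) for b in ok) for a in ok]
    return t, ok[int(np.argmin(costs))].R
```

A medoid is used here instead of averaging rotation matrices. The element-wise mean of rotation matrices is generally not a valid rotation, whereas the medoid always returns one of the observed, already-validated poses.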
LazyPose provides a lightweight, real-time 4D pose (3D position plus planar yaw). It applies PCA to the segmentation mask and reads the depth at the mask's median pixel. This enables continuous streaming at 4 Hz for predictable objects.
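The following is a minimal NumPy sketch of the LazyPose idea. It assumes a boolean mask, a metric depth image aligned to that mask, and pinhole intrinsics `fx, fy, cx, cy`. The function name `lazy_pose` and the yaw convention are assumptions: yaw here is the angle of the mask's principal axis in the image plane, folded into [-π/2, π/2). Turning that angle into a yaw in the robot frame would also require the camera extrinsics.

```python
import numpy as np


def lazy_pose(mask, depth, fx, fy, cx, cy):
    """Approximate (x, y, z, yaw) from one segmentation mask and depth image.

    mask: (H, W) bool; depth: (H, W) float in metres; intrinsics in pixels.
    Returns None if the mask is too small or the median-pixel depth is invalid.
    """
    v, u = np.nonzero(mask)              # rows (v) and columns (u) of the mask
    if u.size < 2:
        return None                      # PCA needs at least two pixels
    # Median pixel of the mask, rounded so it can index the depth image.
    u_med = int(round(float(np.median(u))))
    v_med = int(round(float(np.median(v))))
    z = float(depth[v_med, u_med])
    if not np.isfinite(z) or z <= 0.0:
        return None                      # RealSense reports 0 for missing depth
    # Pinhole back-projection of the median pixel into the camera frame.
    x = (u_med - cx) * z / fx
    y = (v_med - cy) * z / fy
    # PCA: the eigenvector with the largest eigenvalue is the mask's long axis.
    pts = np.stack([u, v], axis=1).astype(float)
    cov = np.cov(pts - pts.mean(axis=0), rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)     # eigenvalues come back in ascending order
    major = eigvecs[:, -1]
    yaw = float(np.arctan2(major[1], major[0]))
    # A principal axis has no sign, so fold the angle into [-pi/2, pi/2).
    yaw = (yaw + np.pi / 2) % np.pi - np.pi / 2
    return x, y, z, yaw
```

For a strongly non-convex mask, such as an L-shaped object, the median pixel can fall outside the object itself. This is one reason the shortcut suits simple, predictable objects.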
Task-Level Performance
Key Contributions
- Integrated Perception Module into IONA: Transitioned IONA from a teleoperated platform to a structured autonomous system by incorporating object detection, pose estimation, active camera selection, and human-aware monitoring.
- Hybrid Pose Estimation Framework: A dual-mode system combining fast approximate estimation (LazyPose) with accurate mesh-based estimation (PoseMate), selected dynamically based on object characteristics.
- Custom Dataset & Mesh Pipeline: Collected and annotated ~1,200 images across 18 medical and food object classes using Roboflow; acquired 3D meshes with a handheld Creality scanner and post-processed them in Blender.
- Active Viewpoint Selection: Task-driven single-camera selection strategy as a practical alternative to computationally expensive multi-camera fusion.
- Extensible Perception Architecture: Modular design enabling new object classes, meshes, and task routines to be added with minimal changes to the pipeline.
- End-to-End Task Pipelines: Functional integration for pick-and-place, shelf organization, and device manipulation demonstrating perception-driven autonomous execution on a physical robot.
- Human-Aware Execution: Velocity-threshold-based motion monitoring with automatic pause and retraction, enabling safe coexistence in shared spaces.
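The velocity-threshold idea behind human-aware execution can be sketched as a small state machine fed with a tracked human position, for example from the neck camera. Several choices here are hypothetical, not values or behavior from the thesis:

- the two speed thresholds;
- the number of calm samples required before resuming;
- latching RETRACTING until a higher-level routine restarts the task.

```python
import math
from enum import Enum, auto


class ArmState(Enum):
    RUNNING = auto()
    PAUSED = auto()
    RETRACTING = auto()


class HumanMotionMonitor:
    """Velocity-threshold monitor; thresholds are hypothetical, in m/s."""

    def __init__(self, v_pause=0.3, v_retract=1.0, resume_after=8):
        self.v_pause = v_pause
        self.v_retract = v_retract
        self.resume_after = resume_after  # calm samples needed before resuming
        self.state = ArmState.RUNNING
        self._prev = None                 # (t, (x, y, z)) of the last sample
        self._calm = 0

    def update(self, t, pos):
        """Feed one tracked human position (metres) at time t (seconds)."""
        if self._prev is None:
            self._prev = (t, pos)
            return self.state
        t0, p0 = self._prev
        self._prev = (t, pos)
        dt = t - t0
        if dt <= 0 or self.state is ArmState.RETRACTING:
            return self.state             # bad timestamp, or retraction latched
        speed = math.dist(p0, pos) / dt
        if speed > self.v_retract:
            self.state = ArmState.RETRACTING
        elif speed > self.v_pause:
            self.state, self._calm = ArmState.PAUSED, 0
        elif self.state is ArmState.PAUSED:
            self._calm += 1               # count consecutive calm samples
            if self._calm >= self.resume_after:
                self.state = ArmState.RUNNING
        return self.state
```

Requiring several consecutive calm samples before resuming adds hysteresis. Without it, a person hovering near the pause threshold would make the arm stutter between moving and stopping.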
Hardware Platform
IONA integrates the following hardware into a ~1700 mm tall bi-manual mobile humanoid:
- Mobile Base: Fetch Freight 100 — indoor navigation with dedicated RGB-D camera for SLAM.
- Arms: 2× Kinova Gen3 7-DoF with Robotiq 2-finger parallel grippers (85 mm stroke).
- Cameras: Chest Intel RealSense D435 (primary detection), Neck D435 (scene/human tracking), Left Wrist Kinova camera (close-range), Base D435i with IMU (navigation/SLAM).
- Compute: Onboard Fetch computer (ROS Noetic) plus an external workstation with an Intel i7 CPU and an RTX 3080 GPU (ROS2 Humble; perception and planning).
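Each camera above has a distinct role. Given that, a task-driven single-camera policy (the active viewpoint selection listed under Key Contributions) can be as simple as an ordered preference list per task phase, with a visibility fallback. The task names, the preference orders, and the `sees_target` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical preference orders, following the camera roles listed above.
PREFERENCES: Dict[str, List[str]] = {
    "detect_on_table": ["chest", "neck"],
    "close_range_grasp": ["left_wrist", "chest"],
    "human_monitoring": ["neck"],
    "navigation": ["base"],
}


def select_camera(task: str,
                  sees_target: Callable[[str], bool]) -> Optional[str]:
    """Return the first preferred camera that currently sees the target.

    Only one camera is used at a time, so no multi-camera fusion is needed.
    """
    for cam in PREFERENCES.get(task, []):
        if sees_target(cam):
            return cam
    return None
```

The fallback keeps the policy cheap: at most one visibility check per candidate camera, with no cross-camera registration.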
Navigation & Spatial Awareness
Navigation is implemented as an independent ROS Noetic module. It uses RTAB-Map for visual RGB-D SLAM (mapping and localization) and the move_base framework with the DWA local planner for goal-directed motion planning. A single RealSense D435i provides all navigation sensing (RGB, depth, and IMU), keeping the pipeline simple and consistent.
Maps are built offline during teleoperated runs and reloaded for autonomous deployment. Loop closure detection corrects accumulated drift during localization. Navigation and manipulation run sequentially rather than concurrently, which reduces system complexity.
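move_base delegates local motion to the DWA planner. The planner works in four steps:

1. Sample velocity commands reachable within one control period.
2. Forward-simulate each command.
3. Discard commands whose trajectories collide.
4. Score the survivors and pick the best.

The sketch below is a simplified textbook Dynamic Window Approach using point obstacles and a unicycle model. It is not the `dwa_local_planner` cost function, which scores trajectories against a costmap and the global path. All limits and weights are hypothetical.

```python
import math


def simulate(x, y, th, v, w, horizon=1.5, dt=0.1):
    """Forward-simulate a constant (v, w) command with a unicycle model."""
    traj = []
    for _ in range(int(round(horizon / dt))):
        th += w * dt
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        traj.append((x, y))
    return traj


def dwa_step(state, goal, obstacles, v_max=0.5, w_max=1.0, acc_v=0.5,
             acc_w=1.5, dt_ctrl=0.2, radius=0.3, samples=11,
             w_goal=1.0, w_obs=0.2, w_vel=0.1):
    """Return the lowest-cost (v, w) in the dynamic window, or (0, 0) to stop."""
    x, y, th, v0, w0 = state
    # Dynamic window: velocities reachable within one control period.
    v_lo, v_hi = max(0.0, v0 - acc_v * dt_ctrl), min(v_max, v0 + acc_v * dt_ctrl)
    w_lo, w_hi = max(-w_max, w0 - acc_w * dt_ctrl), min(w_max, w0 + acc_w * dt_ctrl)
    best, best_cost = (0.0, 0.0), math.inf
    for i in range(samples):
        v = v_lo + (v_hi - v_lo) * i / (samples - 1)
        for j in range(samples):
            w = w_lo + (w_hi - w_lo) * j / (samples - 1)
            traj = simulate(x, y, th, v, w)
            clearance = min((math.dist(p, o) for p in traj for o in obstacles),
                            default=math.inf)
            if clearance <= radius:
                continue                  # trajectory would collide: discard
            cost = (w_goal * math.dist(traj[-1], goal)  # end close to goal
                    + w_obs / clearance   # stay away from obstacles
                    - w_vel * v)          # prefer faster progress
            if cost < best_cost:
                best, best_cost = (v, w), cost
    return best
```

If every sampled trajectory collides, the sketch returns (0, 0) and the robot stops. In move_base, that situation would instead hand control to its recovery behaviors.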
Media & Demos
Demo videos, hardware photos, and experiment recordings coming soon.
Check the GitHub repository for the latest updates.