A ROS2+Gazebo simulation stack enabling non-verbal gesture control of a robot, with a customizable gesture-mapping interface and 90%+ detection accuracy.
This project builds an accessibility-focused human-robot interface that lets non-verbal users control a simulated robot with hand gestures. The system is designed to be configurable — operators can define custom gesture-to-command mappings without modifying code.
Gesture Detection: MediaPipe Hands provides real-time 21-keypoint hand landmark detection from a standard webcam. Gesture classifiers trained on these landmarks map hand configurations to robot commands with robust performance across lighting variations.
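As a minimal sketch of how landmark-based classification can work (the function and rule below are illustrative, not the project's trained classifiers): MediaPipe Hands indexes the wrist as landmark 0 and each fingertip/PIP joint at fixed indices, so a coarse heuristic can count a finger as "extended" when its tip lies farther from the wrist than its PIP joint.

```python
# Hypothetical rule-based gesture classifier over MediaPipe's 21 hand
# landmarks, given as (x, y) pairs. The real system uses trained
# classifiers; this only illustrates the landmark geometry.

WRIST = 0
# (tip, pip) landmark indices for thumb..pinky in MediaPipe's ordering.
FINGERS = [(4, 2), (8, 6), (12, 10), (16, 14), (20, 18)]

def dist2(a, b):
    """Squared 2-D distance between two landmarks."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def classify_gesture(landmarks):
    """Map 21 hand landmarks to a coarse gesture label."""
    wrist = landmarks[WRIST]
    extended = sum(
        1 for tip, pip in FINGERS
        if dist2(landmarks[tip], wrist) > dist2(landmarks[pip], wrist)
    )
    if extended >= 4:
        return "open_palm"   # e.g. mapped to a "stop" command
    if extended == 0:
        return "fist"        # e.g. mapped to a "forward" command
    return "unknown"
```

A trained model replaces the hand-written rule but consumes the same 21-keypoint input, which is what makes the detection robust to lighting: landmarks, not raw pixels, are the features.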
ROS2 Integration: Recognized gestures are published to a ROS2 topic and consumed by a gesture-to-velocity mapping node that translates discrete gesture classes into continuous robot velocity commands for the Gazebo simulation.
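The core of such a mapping node can be sketched as a lookup from gesture class to velocity pair (gesture names and speeds here are hypothetical, not the project's actual mapping); in the real node this table would come from the YAML config and the result would be published as a `geometry_msgs/msg/Twist`:

```python
# Illustrative gesture -> (linear.x, angular.z) mapping. In the ROS2 node
# this dict is loaded from configuration and the tuple is copied into a
# geometry_msgs/msg/Twist before publishing on e.g. /cmd_vel.

GESTURE_TO_VEL = {
    "open_palm":   (0.0, 0.0),   # stop
    "fist":        (0.3, 0.0),   # drive forward at 0.3 m/s
    "point_left":  (0.0, 0.5),   # rotate left at 0.5 rad/s
    "point_right": (0.0, -0.5),  # rotate right at 0.5 rad/s
}

def gesture_to_twist(gesture):
    """Translate a discrete gesture class into (linear.x, angular.z)."""
    # Unrecognized gestures fall back to a safe stop rather than
    # repeating the previous command.
    return GESTURE_TO_VEL.get(gesture, (0.0, 0.0))
```

Defaulting unknown classes to zero velocity is a safety choice: a misdetection halts the robot instead of driving it.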
Configurable Interface: A YAML-based gesture configuration system allows operators to define custom gesture-command mappings and threshold sensitivities without editing source code. Automated setup scripts reduced per-robot-variant configuration from ~3 hours to ~1 hour.
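A configuration of this kind might look like the following (keys and values are illustrative; the project's actual schema may differ):

```yaml
# Hypothetical gesture-mapping config: one entry per gesture,
# plus global detection sensitivity.
gestures:
  open_palm:
    command: stop
  fist:
    command: forward
    linear_x: 0.3        # m/s
  point_left:
    command: rotate
    angular_z: 0.5       # rad/s
detection:
  confidence_threshold: 0.8   # minimum classifier confidence to act
```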
The system achieved over 90% gesture detection accuracy in simulation trials, and the configurable mapping interface reduced adaptation time for new gestures by approximately 40%.