A 10,000+ frame synthetic Vision+IMU dataset built in Blender, paired with MSCKF and deep fusion implementations achieving under 5% trajectory recovery error.
This project addresses a fundamental bottleneck in VIO research: the scarcity of labeled, synchronized Vision+IMU datasets with known ground truth. A synthetic dataset pipeline was built in Blender, and both classical (MSCKF) and learning-based fusion methods were implemented and benchmarked on it.
Synthetic Dataset Generation: Using Blender's scripting API and OysterSim, a pipeline was created to render photorealistic camera sequences with synchronized IMU trajectories. Over 10,000 frames were generated across diverse lighting conditions, motion profiles, and scene types.
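One way to obtain IMU ground truth synchronized with rendered frames is to differentiate the scripted camera trajectory itself. A minimal sketch of that idea, assuming a position-only trajectory with identity orientation (the hypothetical `imu_from_trajectory` helper is illustrative, not the project's actual pipeline code):

```python
import numpy as np

def imu_from_trajectory(positions, dt, gravity=np.array([0.0, 0.0, -9.81])):
    """Synthesize ideal accelerometer readings from a position trajectory
    by double finite differencing (world frame, identity orientation assumed)."""
    vel = np.gradient(positions, dt, axis=0)   # m/s
    acc = np.gradient(vel, dt, axis=0)         # m/s^2
    # An ideal accelerometer measures specific force: acceleration minus gravity.
    return acc - gravity

# Constant-velocity trajectory at a 200 Hz IMU rate:
# the synthetic accelerometer should read exactly -gravity.
t = np.arange(0.0, 1.0, 0.005)
traj = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
meas = imu_from_trajectory(traj, dt=0.005)
```

A real pipeline would also rotate the specific force into the body frame from the camera's orientation and add bias and noise models; this sketch shows only the synchronization and differentiation step.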
MSCKF Implementation: The Multi-State Constraint Kalman Filter was implemented as the classical VIO baseline. Feature tracks from optical flow are used to impose geometric constraints, bounding trajectory drift without loop closure.
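The key MSCKF step that turns feature tracks into state constraints is marginalizing each feature's 3D position out of the measurement model by projecting onto the left null space of its Jacobian. A minimal numerical sketch of that projection (dimensions and the `nullspace_project` name are illustrative):

```python
import numpy as np

def nullspace_project(r, H_x, H_f):
    """MSCKF-style feature marginalization: project the residual r and the
    state Jacobian H_x onto the left null space of the feature-position
    Jacobian H_f, so the update no longer depends on the feature's 3D point."""
    U, s, _ = np.linalg.svd(H_f, full_matrices=True)
    rank = int(np.sum(s > 1e-10))
    A = U[:, rank:]                 # basis of the left null space of H_f
    return A.T @ r, A.T @ H_x       # reduced residual and Jacobian

# Toy example: 4 stacked 2D reprojection rows, a 3D feature,
# and a 6-dimensional slice of the camera-pose state.
rng = np.random.default_rng(0)
H_f = rng.standard_normal((8, 3))
H_x = rng.standard_normal((8, 6))
r = H_f @ rng.standard_normal(3)    # residual lying entirely in range(H_f)
r0, H0 = nullspace_project(r, H_x, H_f)
```

Because the toy residual lies entirely in the range of `H_f`, the projected residual is numerically zero, confirming that the feature's contribution has been eliminated; the reduced pair `(r0, H0)` is what feeds the standard Kalman update over the pose window.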
Deep Fusion Network: A learning-based approach fuses CNN-extracted visual features with IMU integration windows through a recurrent architecture, learning to weight the two modalities based on motion and lighting conditions.
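The modality-weighting idea can be sketched as a learned scalar gate that blends the two embeddings before the recurrent pose regressor. This is a minimal NumPy illustration of the mechanism only, with hypothetical weights (`w_gate`, `b_gate`), not the project's trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(vis_feat, imu_feat, w_gate, b_gate):
    """Soft modality weighting: a gate g in (0, 1), computed from both
    embeddings, blends visual and inertial features. During training the
    gate learns to down-weight vision in textureless or low-light frames."""
    g = sigmoid(w_gate @ np.concatenate([vis_feat, imu_feat]) + b_gate)
    return g * vis_feat + (1.0 - g) * imu_feat

d = 4
vis = np.ones(d)
imu = np.zeros(d)
# Zero weights are a stand-in; real values come from backprop.
w = np.zeros(2 * d)
fused = gated_fuse(vis, imu, w, 0.0)   # gate = 0.5 -> equal blend
```

In the actual network the same gating would sit inside the recurrent cell and operate per time step, so the weighting can react to motion blur or lighting changes frame by frame.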
Both classical MSCKF and the deep fusion network achieved under 5% trajectory recovery error on synthetic benchmarks. The fusion approach improved pose reliability by approximately 20% over vision-only baselines, particularly in high-dynamic-range and textureless scenes where visual features degrade.
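One common way to express "trajectory recovery error" as a percentage is RMS absolute trajectory error normalized by ground-truth path length. A minimal sketch of that metric, assuming aligned estimated and ground-truth position arrays (the `traj_error_pct` helper is illustrative; the project's exact evaluation protocol may differ):

```python
import numpy as np

def traj_error_pct(est, gt):
    """RMS absolute trajectory error as a percentage of the total
    ground-truth path length."""
    ate = np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1)))                 # meters
    path_len = np.sum(np.linalg.norm(np.diff(gt, axis=0), axis=1))          # meters
    return 100.0 * ate / path_len

# Straight 10 m ground-truth path; estimate offset 0.3 m laterally.
gt = np.stack([np.linspace(0, 10, 101), np.zeros(101), np.zeros(101)], axis=1)
est = gt + np.array([0.0, 0.3, 0.0])
err = traj_error_pct(est, gt)   # 100 * 0.3 / 10 = 3.0 %
```

Under this metric, "under 5% error" on a 10 m trajectory corresponds to less than 0.5 m of RMS position deviation.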