Cross-Modal Attention for Accurate Pedestrian Trajectory Prediction

Mayssa Zaier (IMT Nord Europe)*, Hazem Wannous (University of Lille), Hassen Drira (University of Strasbourg), Jacques Boonaert (IMT Lille Douai)
The 34th British Machine Vision Conference


Accurately predicting human behavior is essential for a variety of applications, including self-driving cars, surveillance systems, and social robots. However, predicting human movement is challenging due to the complexity of physical environments and social interactions. Most studies focus on static environmental information and ignore the dynamic visual information available in the scene. To address this issue, we propose Cross-Modal Attention Trajectory Prediction (CMATP), a novel approach that predicts human paths from the observed trajectory and the dynamic scene context. CMATP uses a bimodal transformer network to capture complex spatio-temporal interactions, incorporating both pedestrian trajectory data and contextual information. It achieves state-of-the-art performance on three real-world pedestrian prediction datasets, making it a promising solution for improving the safety and reliability of pedestrian detection and tracking systems.
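
To make the idea of attending across two modalities concrete, the following is a minimal NumPy sketch of a single cross-attention step in which trajectory tokens act as queries over scene-context tokens. It is an illustration under stated assumptions, not the CMATP architecture: the function name `cross_modal_attention`, the single-head design, the random projection matrices, and all tensor shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(traj_tokens, scene_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    traj_tokens:  (T, d_traj)  one token per observed time step
    scene_tokens: (S, d_scene) one token per scene-context feature
    Returns the scene-informed trajectory features (T, d) and the
    attention weights (T, S).
    """
    q = traj_tokens @ w_q    # queries come from the trajectory modality
    k = scene_tokens @ w_k   # keys come from the scene modality
    v = scene_tokens @ w_v   # values come from the scene modality
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Assumed toy dimensions: 8 observed steps of (x, y) positions and
# 8 scene feature vectors of size 16, projected to a shared width of 4.
rng = np.random.default_rng(0)
T, S, d_traj, d_scene, d = 8, 8, 2, 16, 4
traj = rng.normal(size=(T, d_traj))
scene = rng.normal(size=(S, d_scene))
w_q = rng.normal(size=(d_traj, d))
w_k = rng.normal(size=(d_scene, d))
w_v = rng.normal(size=(d_scene, d))

fused, weights = cross_modal_attention(traj, scene, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the scene tokens, so every trajectory step receives a convex combination of the projected scene features. A full bimodal transformer would typically add multiple heads, learned projections, residual connections, and attention in the opposite direction as well.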



