Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention


Burak Satar (Nanyang Technological University),* Hongyuan Zhu (Institute for Infocomm, Research Agency for Science, Technology and Research (A*STAR) Singapore), Hanwang Zhang (Nanyang Technological University), Joo-Hwee Lim (Institute for Infocomm Research)
The 34th British Machine Vision Conference

Abstract

Many studies focus on improving pretraining or developing new backbones in text-video retrieval. However, existing methods may suffer from the learning and inference bias issue, as recent research suggests in other text-video-related tasks. For instance, spatial appearance features on action recognition or temporal object co-occurrences on video scene graph generation could induce spurious correlations. In this work, we present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips, which is the first such attempt for a text-video retrieval task, to the best of our knowledge. We first hypothesise and verify the bias on how it would affect the model illustrated with a baseline study. Then, we propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100, YouCook2, and MSR-VTT datasets. Our model overpasses the baseline and SOTA on nDCG, a semantic-relevancy-focused evaluation metric which proves the bias is mitigated, as well as on the other conventional metrics.

Video



Citation

@inproceedings{Satar_2023_BMVC,
author    = {Burak Satar and Hongyuan Zhu and Hanwang Zhang and Joo-Hwee Lim},
title     = {Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {https://papers.bmvc2023.org/0650.pdf}
}


Copyright © 2023 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).

Imprint | Data Protection