Domain-Adaptive Semantic Segmentation with Memory-Efficient Cross-Domain Transformers

Ruben Mascaro (ETH Zurich)*, Lucas Teixeira (ETH Zurich), Margarita Chli (ETH Zurich)
The 34th British Machine Vision Conference


Unsupervised Domain Adaptation (UDA), the process of adapting a model trained on a well-annotated source dataset to an unlabeled target dataset, has emerged as a promising solution for deploying semantic segmentation models in scenarios where annotating large amounts of data is cost-prohibitive. Although recent UDA strategies exploiting Transformer-based architectures represent a major advance in the field, current approaches struggle to learn context dependencies in the target domain effectively, leading to suboptimal semantic label predictions. To address this issue, we introduce a generic three-branch Transformer block that combines self- and cross-attention mechanisms for better alignment of source and target features. We then show how the proposed architecture can be seamlessly incorporated into state-of-the-art self-training UDA schemes for semantic segmentation, enhancing adaptation capabilities without increasing the GPU memory footprint during training. The resulting framework significantly outperforms its baseline on benchmark datasets for synthetic-to-real (+1.4 mIoU on GTA→Cityscapes and +1.1 mIoU on SYNTHIA→Cityscapes) and clear-to-adverse-weather (+3.4 mIoU on Cityscapes→ACDC) UDA. In addition, it achieves greater robustness than existing cross-domain Transformer architectures, which require substantially more GPU memory for training.
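The abstract does not specify how the three branches are wired. Purely as an illustration of the general idea of mixing self- and cross-attention across domains, the sketch below assumes a layout similar to earlier cross-domain Transformers: one branch applies self-attention to source tokens, one applies self-attention to target tokens, and a third applies cross-attention with queries from the source and keys/values from the target. The function names (`attention`, `three_branch_block`), the single-head formulation, the shared projection weights, and the residual connections are hypothetical choices, not the authors' implementation, and the sketch says nothing about how the paper keeps GPU memory usage flat.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q_in: np.ndarray, kv_in: np.ndarray,
              w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention.

    Queries are projected from q_in; keys and values from kv_in.
    Self-attention is the special case where q_in and kv_in are the same tokens.
    """
    q = q_in @ w_q
    k = kv_in @ w_k
    v = kv_in @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N_q, N_kv)
    return softmax(scores, axis=-1) @ v      # (N_q, d)


def three_branch_block(x_src: np.ndarray, x_tgt: np.ndarray,
                       w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Hypothetical three-branch block (assumed layout, not the paper's exact design).

    All branches share the same projection weights (an assumption).
    """
    # Branch 1: source self-attention with a residual connection.
    out_src = x_src + attention(x_src, x_src, w_q, w_k, w_v)
    # Branch 2: target self-attention with a residual connection.
    out_tgt = x_tgt + attention(x_tgt, x_tgt, w_q, w_k, w_v)
    # Branch 3: cross-attention, source queries attending to target keys/values.
    out_mix = x_src + attention(x_src, x_tgt, w_q, w_k, w_v)
    return out_src, out_tgt, out_mix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    x_src = rng.standard_normal((16, d))  # 16 source-domain tokens
    x_tgt = rng.standard_normal((12, d))  # 12 target-domain tokens
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    s, t, m = three_branch_block(x_src, x_tgt, w_q, w_k, w_v)
    print(s.shape, t.shape, m.shape)  # (16, 8) (12, 8) (16, 8)
```

Because the cross-attention branch takes its queries from the source, its output has one row per source token even when the two domains contribute different numbers of tokens. Sharing the projection weights across branches is one way to avoid adding parameters, but whether the paper does this is not stated in the abstract.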



author    = {Ruben Mascaro and Lucas Teixeira and Margarita Chli},
title     = {Domain-Adaptive Semantic Segmentation with Memory-Efficient Cross-Domain Transformers},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},

Copyright © 2023 The British Machine Vision Association and Society for Pattern Recognition
