BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation

Uyoung Jeong (Ulsan National Institute of Science and Technology),* Seungryul Baek (UNIST), Hyung Jin Chang (University of Birmingham), Kwang In Kim (POSTECH)
The 34th British Machine Vision Conference


Single-stage multi-person human pose estimation (MPPE) methods have improved substantially, but existing methods fail to disentangle the features of individual instances in crowded scenes. In this paper, we propose BoIR, a bounding box-level instance representation learning method that simultaneously addresses instance detection, instance disentanglement, and instance-keypoint association. Our new instance embedding loss provides a learning signal over the entire image area using bounding box annotations, yielding globally consistent and disentangled instance representations. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression, and contrastive instance embedding learning, without additional computational cost during inference. We demonstrate that BoIR outperforms state-of-the-art methods by 0.5 AP on COCO, 4.9 AP on CrowdPose, and 3.5 AP on OCHuman.
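To make the idea of a box-supervised instance embedding loss more concrete, the following is a minimal NumPy sketch, not the formulation used in the paper. It assumes a prototype-based contrastive loss: every pixel is labelled by the single bounding box that covers it, pixels outside all boxes form a background class, and pixels covered by more than one box are ignored. The function names `box_label_map` and `box_contrastive_loss`, the `temperature` parameter, and the handling of overlaps are all illustrative assumptions.

```python
import numpy as np


def box_label_map(height, width, boxes):
    """Label each pixel from box annotations (illustrative assumption).

    0  = background (covered by no box)
    k  = covered only by box k-1
    -1 = ambiguous (covered by two or more boxes), ignored by the loss
    Boxes are (x0, y0, x1, y1) with exclusive upper bounds.
    """
    labels = np.zeros((height, width), dtype=np.int64)
    cover = np.zeros((height, width), dtype=np.int64)
    for idx, (x0, y0, x1, y1) in enumerate(boxes):
        labels[y0:y1, x0:x1] = idx + 1
        cover[y0:y1, x0:x1] += 1
    labels[cover > 1] = -1
    return labels


def box_contrastive_loss(embeddings, labels, temperature=0.1):
    """Prototype-based contrastive loss over all unambiguous pixels.

    embeddings: (H, W, D) per-pixel embedding map.
    labels:     (H, W) output of box_label_map.
    Each pixel is pulled toward the mean embedding (prototype) of its own
    class and pushed away from the prototypes of all other classes.
    """
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    feats = embeddings / (norms + 1e-8)

    valid = labels >= 0
    pix = feats[valid]                # (N, D)
    pix_labels = labels[valid]        # (N,)

    ids = np.unique(pix_labels)       # classes that actually have pixels
    if len(ids) < 2:
        return 0.0                    # nothing to contrast against

    protos = np.stack([pix[pix_labels == k].mean(axis=0) for k in ids])
    targets = np.searchsorted(ids, pix_labels)

    logits = pix @ protos.T / temperature            # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(targets)), targets].mean())
```

Under these assumptions the loss is close to zero when embeddings of different boxes and the background are well separated, and it equals the log of the number of classes when all pixels share the same embedding. Because such a loss is only needed for training, a branch like this can be dropped at inference time, which is consistent with the abstract's claim of no additional inference cost.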



author    = {Uyoung Jeong and Seungryul Baek and Hyung Jin Chang and Kwang In Kim},
title     = {BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {}

Copyright © 2023 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
