Text-to-Motion Synthesis using Discrete Diffusion Model

Ankur Chemburkar* (USC Institute for Creative Technologies), Shuhong Lu (USC Institute for Creative Technologies), Andrew Feng (USC Institute for Creative Technologies)

The 34th British Machine Vision Conference

Abstract

We present the motion discrete diffusion model (MoDDM), which synthesizes human motion from text descriptions and addresses challenges in cross-modal mapping and motion diversity. Previous methods that use a variational autoencoder (VAE) to learn latent distributions for text-to-motion synthesis tend to produce motions with lower diversity and fidelity. Diffusion models show promising results and generate high-quality motions, but they incur higher computational costs and may produce motions that are less well aligned with the input text. The proposed method combines a discrete latent space with diffusion models to learn an expressive conditional probabilistic mapping for motion synthesis. It uses a vector-quantized variational autoencoder (VQ-VAE) to learn discrete motion tokens and then applies discrete denoising diffusion probabilistic models (D3PM) to learn the conditional probability distribution over those tokens. Discrete classifier-free guidance, with a suitable guidance scale, is further used during training to align the motions with their text descriptions. Because the denoising model is learned in the discrete latent space, the method produces high-quality motions while greatly reducing computational cost compared with training diffusion models on raw motion sequences. Evaluation results show that the proposed approach outperforms previous methods in both motion quality and text-to-motion matching accuracy.
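Two ingredients named in the abstract can be illustrated with a minimal sketch: mapping continuous motion latents to discrete tokens by nearest-codebook lookup, as in a VQ-VAE, and combining conditional and unconditional token logits in the usual classifier-free guidance form. This is not the authors' implementation. The function names `quantize` and `guided_log_probs`, the toy codebook, and the guidance formula used here (a common formulation, which may differ from the one in the paper) are all assumptions made for illustration.

```python
import numpy as np


def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry.

    latents:  array of shape (T, D), one continuous latent per motion frame/chunk
    codebook: array of shape (K, D), the learned discrete vocabulary
    (Hypothetical helper; shapes and names are assumptions.)
    """
    # Squared Euclidean distance between every latent and every code: shape (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def guided_log_probs(cond_logits, uncond_logits, scale):
    """Combine text-conditioned and unconditioned token logits, then normalize.

    Uses the common classifier-free guidance form
        uncond + scale * (cond - uncond)
    followed by a log-softmax over the token dimension.
    scale = 1 recovers the conditional distribution.
    """
    mixed = uncond_logits + scale * (cond_logits - uncond_logits)
    # Subtract the max for numerical stability before exponentiating
    mixed = mixed - mixed.max(axis=-1, keepdims=True)
    return mixed - np.log(np.exp(mixed).sum(axis=-1, keepdims=True))


# Toy usage with made-up numbers (not data from the paper)
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2], [3.5, 4.2]])
tokens = quantize(latents, codebook)

cond = np.array([[2.0, 0.0, -1.0]])
uncond = np.array([[0.0, 0.0, 0.0]])
log_p = guided_log_probs(cond, uncond, scale=3.0)
```

In this sketch a guidance scale above 1 pushes probability mass toward tokens that the text-conditioned prediction favors over the unconditioned one, which is the intuition behind using guidance to improve text-motion alignment.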

Citation

@inproceedings{Chemburkar_2023_BMVC,
author    = {Ankur Chemburkar and Shuhong Lu and Andrew Feng},
title     = {Text-to-Motion Synthesis using Discrete Diffusion Model},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year      = {2023},
url       = {https://papers.bmvc2023.org/0624.pdf}
}

Copyright © 2023 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
