Andrew Gilbert
  • Home
  • Semantic 3D Pose
  • NAS-DIP
  • Visual & IMU 3D Pose
  • Inpainting MVV

Semantic Estimation of 3D Body Shape and Pose using Minimal Cameras

Andrew Gilbert[1], Mat Trumble[1], Adrian Hilton[1], John Collomosse[1,2]
[1] Centre for Vision Speech and Signal Processing, University of Surrey
[2] Creative Intelligence Lab, Adobe Research
31st British Machine Vision Conference 2020

Abstract

We aim to simultaneously estimate the 3D articulated pose and high-fidelity volumetric occupancy of human performance from multiple viewpoint video (MVV) with as few as two views. We use a multi-channel symmetric 3D convolutional encoder-decoder with a dual loss to enforce the learning of a latent embedding that enables inference of both skeletal joint positions and a volumetric reconstruction of the performance. The inference is regularised via a prior learned over a dataset of view-ablated multi-view video footage of a wide range of subjects and actions, and we show that this generalises well to unseen subjects and actions. We demonstrate improved reconstruction accuracy and lower pose estimation error relative to prior work on two MVV performance capture datasets: Human 3.6M and TotalCapture.
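As a rough illustration of the dual loss mentioned in the abstract, the sketch below combines a skeletal-joint error term with a volumetric-occupancy error term. The abstract does not give the exact formulation, so the use of mean squared error for both terms, the weighting parameter `lam`, and the function names are all assumptions made for illustration only.

```python
def mean_squared_error(pred, target):
    """Mean squared error between two equal-length flat sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_loss(pred_joints, true_joints, pred_occupancy, true_occupancy, lam=1.0):
    """Weighted sum of a joint-position term and an occupancy term.

    pred_joints / true_joints: flattened joint coordinates (x, y, z, x, y, z, ...).
    pred_occupancy / true_occupancy: flattened voxel occupancy values in [0, 1].
    lam: hypothetical weight balancing the two terms (not taken from the paper).
    """
    joint_term = mean_squared_error(pred_joints, true_joints)
    volume_term = mean_squared_error(pred_occupancy, true_occupancy)
    return joint_term + lam * volume_term
```

In a training setup of this kind, both terms would be computed from the outputs of a single shared encoder, so that minimising the sum pushes the latent embedding to carry information useful for both pose and shape.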

Citation

@INPROCEEDINGS{Gilbert20,
title = {Semantic Estimation of 3D Body Shape and Pose using Minimal Cameras},
year = {2020},
booktitle = {British Machine Vision Conference},
author = {Gilbert, A. and Trumble, M. and Hilton, A. and Collomosse, J.}
}