Andrew Gilbert
  • Home
  • Semantic 3D Pose
  • NAS-DIP
  • Visual & IMU 3D Pose
  • Inpainting MVV

Fusing Visual and Inertial Sensors with Semantics for 3D Human Pose Estimation

Andrew Gilbert[1], Mat Trumble[1], Charles Malleson[1], Adrian Hilton[1], John Collomosse[1,2]
[1] Centre for Vision Speech and Signal Processing, University of Surrey
[2] Creative Intelligence Lab, Adobe Research
International Journal of Computer Vision 127 (4), 381-397, 2019

Abstract

We propose an approach to accurately estimate 3D human pose by fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data, without optical markers, a complex hardware setup or a full body model. Uniquely, we use a multi-channel 3D convolutional neural network to learn a pose embedding from visual occupancy and semantic 2D pose estimates from the MVV in a discretised volumetric probabilistic visual hull (PVH). The learnt pose stream is processed concurrently with a forward kinematic solve of the IMU data, and a temporal model (LSTM) exploits the rich spatial and temporal long-range dependencies among the solved joints. The two streams are then fused in a final fully connected layer. Because the two data sources are complementary, ambiguities within each sensor modality can be resolved, yielding improved accuracy over prior methods. Extensive evaluation is performed, with state-of-the-art performance reported on the popular Human3.6M dataset [26], the newly released TotalCapture dataset and a challenging set of outdoor videos, TotalCaptureOutdoor. We release the new hybrid MVV dataset (TotalCapture), comprising multi-viewpoint video, IMU data and accurate 3D skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at http://cvssp.org/data/totalcapture/.
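The forward kinematic solve of the IMU stream can be illustrated with a minimal sketch. It assumes, hypothetically, that each IMU reports the world-frame orientation of its body segment as a unit quaternion (w, x, y, z), that the rest-pose bone offsets are known, and that joints are listed parent-before-child. The names `quat_rotate` and `forward_kinematics` are illustrative only and do not come from the paper, whose solve also calibrates sensor-to-bone offsets, a step omitted here.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit norm


def quat_rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate vector v by unit quaternion q (equivalent to q * v * q^-1)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def forward_kinematics(
    parents: List[Optional[int]],
    offsets: List[Vec3],
    bone_orientations: List[Quat],
    root_position: Vec3 = (0.0, 0.0, 0.0),
) -> List[Vec3]:
    """Chain bone offsets, rotated by per-segment IMU orientations, into joint positions.

    Hypothetical convention: bone_orientations[j] is the world-frame orientation
    of the segment ending at joint j; the root's entry is ignored.
    """
    positions: List[Vec3] = []
    for j, parent in enumerate(parents):
        if parent is None:
            positions.append(root_position)
            continue
        if parent >= j:
            raise ValueError("joints must be listed parent-before-child")
        px, py, pz = positions[parent]
        dx, dy, dz = quat_rotate(bone_orientations[j], offsets[j])
        positions.append((px + dx, py + dy, pz + dz))
    return positions


# Usage: a toy shoulder -> elbow -> wrist chain with the upper arm rotated 90 degrees about z.
identity: Quat = (1.0, 0.0, 0.0, 0.0)
rot_90_z: Quat = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
joints = forward_kinematics(
    parents=[None, 0, 1],
    offsets=[(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.25, 0.0, 0.0)],
    bone_orientations=[identity, rot_90_z, identity],
)
print(joints)
```

In this toy chain the elbow lands at roughly (0, 0.3, 0) and the wrist at roughly (0.25, 0.3, 0). Joint positions solved this way from the IMU orientations alone are what a temporal model such as the LSTM described above would then consume.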

Paper


Citation

@ARTICLE{Gilbert19,
title = {Fusing visual and inertial sensors with semantics for {3D} human pose estimation},
author = {Gilbert, A. and Trumble, M. and Malleson, C. and Hilton, A. and Collomosse, J.},
journal = {International Journal of Computer Vision},
volume = {127},
number = {4},
pages = {381--397},
year = {2019}
}