ANDREW GILBERT

Fusing visual and inertial sensors with semantics for 3D human pose estimation

Andrew Gilbert [1], Matthew Trumble [1], Charles Malleson [1], Adrian Hilton [1], John Collomosse [1,2]
[1] University of Surrey,  [2] Adobe Research
In International Journal of Computer Vision, 127 (4), 381-397, 2019
Our two-stream network fuses IMU data with volumetric (PVH) data derived from multiple viewpoint video (MVV) to learn an embedding for 3D joint locations (human pose).
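As a rough illustration of how volumetric PVH data can be derived from several views, the sketch below assigns each voxel the product of the foreground probabilities at the pixels it projects to. This product rule is one common formulation and is assumed here, not taken from this page. The toy orthographic cameras, the `project` helper and the 2x2 masks are hypothetical stand-ins for calibrated cameras and real foreground mattes.

```python
def project(voxel, dropped_axis):
    """Toy orthographic camera (assumption): drop one coordinate to get pixel indices."""
    return tuple(c for i, c in enumerate(voxel) if i != dropped_axis)


def probabilistic_visual_hull(masks, size):
    """Return {voxel: occupancy probability} for a size^3 grid.

    masks maps the axis a view looks along (0, 1 or 2) to a 2D grid of
    per-pixel foreground probabilities for that view.
    """
    hull = {}
    for x in range(size):
        for y in range(size):
            for z in range(size):
                p = 1.0
                # A voxel is occupied only if every view sees foreground there.
                for dropped_axis, mask in masks.items():
                    u, v = project((x, y, z), dropped_axis)
                    p *= mask[u][v]
                hull[(x, y, z)] = p
    return hull


# Hypothetical foreground probabilities for two views.
MASKS = {
    0: [[1.0, 0.5], [0.0, 1.0]],  # view along x, indexed [y][z]
    2: [[0.5, 1.0], [1.0, 1.0]],  # view along z, indexed [x][y]
}
HULL = probabilistic_visual_hull(MASKS, size=2)
```

A voxel that any single view marks as background (probability 0) drops out of the hull, which is why adding views tightens the volume.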

Abstract

We propose an approach to accurately estimate 3D human pose by fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data, without optical markers, a complex hardware setup or a full body model. Uniquely, we use a multi-channel 3D convolutional neural network to learn a pose embedding from visual occupancy and semantic 2D pose estimates from the MVV in a discretised volumetric probabilistic visual hull (PVH). The learnt pose stream is processed concurrently with a forward kinematic solve of the IMU data, and a temporal model (LSTM) exploits the rich spatial and temporal long-range dependencies among the solved joints. The two streams are then fused in a final fully connected layer. The two complementary data sources allow ambiguities within each sensor modality to be resolved, yielding improved accuracy over prior methods. Extensive evaluation is performed, with state-of-the-art performance reported on the popular Human 3.6M dataset, the newly released TotalCapture dataset and a challenging set of outdoor videos, TotalCaptureOutdoor. We release the new hybrid MVV dataset (TotalCapture), comprising multi-viewpoint video, IMU data and accurate 3D skeletal joint ground truth derived from a commercial motion capture system. The dataset is available online at http://cvssp.org/data/totalcapture/.
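The forward kinematic solve mentioned in the abstract can be pictured with a minimal sketch: given a global orientation for each bone (as an IMU might provide after calibration) and fixed rest-pose bone offsets, joint positions are accumulated from the root outwards. The three-joint arm, the function names and the rotation values below are hypothetical, and sensor calibration is omitted entirely.

```python
import math


def rot_z(degrees):
    """Return a 3x3 rotation about the z axis as nested lists."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def forward_kinematics(parents, offsets, bone_rotations, root=(0.0, 0.0, 0.0)):
    """Compute joint positions from per-bone global orientations.

    parents[i]        : index of joint i's parent (-1 for the root);
                        parents must be listed before their children
    offsets[i]        : rest-pose offset of joint i from its parent
    bone_rotations[i] : global rotation of the bone from parent to joint i
    """
    positions = []
    for i, parent in enumerate(parents):
        if parent < 0:
            positions.append(list(root))
            continue
        rotated = mat_vec(bone_rotations[i], offsets[i])
        positions.append([p + r for p, r in zip(positions[parent], rotated)])
    return positions


# Hypothetical arm: shoulder -> elbow -> wrist, unit-length bones along x.
PARENTS = [-1, 0, 1]
OFFSETS = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
# Upper arm rotated 90 degrees about z; forearm kept in its rest orientation.
ROTATIONS = [rot_z(0), rot_z(90), rot_z(0)]
JOINTS = forward_kinematics(PARENTS, OFFSETS, ROTATIONS)
```

With these inputs the elbow lands at approximately (0, 1, 0) and the wrist at approximately (1, 1, 0), up to floating-point rounding.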
Network architecture comprising two streams: a 3D ConvNet for MVV pose embedding, and a kinematic solve from IMUs. Both streams pass through an LSTM before the concatenated estimates are fused in a further FC layer.
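The final fusion step can be sketched as a concatenation of the two per-stream pose estimates followed by one fully connected layer. This is only an illustration: the three-value pose vectors are hypothetical, the LSTM stages are left out, and the placeholder weights simply average the two streams, whereas a trained network would learn them.

```python
def fully_connected(x, weights, bias):
    """One dense layer without activation: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def averaging_weights(dim):
    """Placeholder weights [0.5*I | 0.5*I] (assumption, not learned values)."""
    return [[0.5 if col % dim == row else 0.0 for col in range(2 * dim)]
            for row in range(dim)]


def fuse(vision_pose, imu_pose, weights, bias):
    """Concatenate the two per-stream estimates and apply the fusion layer."""
    return fully_connected(list(vision_pose) + list(imu_pose), weights, bias)


DIM = 3  # hypothetical pose vector length
FUSED = fuse([1.0, 2.0, 3.0], [3.0, 4.0, 5.0], averaging_weights(DIM), [0.0] * DIM)
# FUSED == [2.0, 3.0, 4.0]
```

Because the layer sees both estimates side by side, learned weights can favour whichever stream is more reliable for a given output coordinate instead of averaging uniformly.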

Paper

Fusing visual and inertial sensors with semantics for 3D human pose estimation. A Gilbert, M Trumble, C Malleson, A Hilton, J Collomosse. International Journal of Computer Vision 127 (4), 381-397, 2019
Paper
TotalCapture Dataset

Videos

Citation

@article{Gilbert:IJCV:2019,
AUTHOR = "Gilbert, Andrew and Trumble, Matthew and Malleson, Charles and Hilton, Adrian and Collomosse, John",
TITLE = "Fusing visual and inertial sensors with semantics for {3D} human pose estimation",
JOURNAL = "International Journal of Computer Vision",
VOLUME = "127",
NUMBER = "4",
PAGES = "381--397",
YEAR = "2019",
}