Monocular Real-Time Volumetric Performance Capture
Ruilong Li*¹², Yuliang Xiu*¹², Shunsuke Saito¹², Zeng Huang¹², Kyle Olszewski¹², Hao Li¹²³
1University of Southern California, 2USC Institute for Creative Technologies, 3Pinscreen
ECCV 2020




SIGGRAPH 2020 Real-Time Live

Best in Show Award

Description: Existing volumetric capture systems require many cameras and lengthy post-processing. We introduce the first system that captures a completely clothed human body (including the back) from a single RGB webcam in real time. Our deep-learning-based approach enables new possibilities for low-cost, consumer-accessible immersive teleportation.
Main Contributions
  • Octree-based Robust Surface Localization
  • We propose a novel hierarchical surface localization algorithm and a direct rendering method that progressively query 3D locations in a coarse-to-fine manner, extracting the surface from implicit occupancy fields with a minimal number of point evaluations. By culling unnecessary regions from evaluation, we accelerate reconstruction by nearly 200 times without compromising quality.
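The coarse-to-fine idea above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: `sphere_occupancy` is a stand-in for the learned implicit network, and the simple "occupied cell next to an empty cell" boundary test and 8-way subdivision are assumptions made for the sketch.

```python
import itertools
import numpy as np

def sphere_occupancy(pts):
    # Stand-in for the learned implicit function (an assumption for this
    # sketch): occupied inside a sphere of radius 0.4 centered in the cube.
    return np.linalg.norm(pts - 0.5, axis=-1) < 0.4

def hierarchical_query(occ_fn, base_res=8, levels=3):
    """Coarse-to-fine occupancy evaluation over the unit cube.

    Evaluates a dense coarse grid, then subdivides only cells that sit on
    the inside/outside boundary, so most of the volume is never queried at
    fine resolution. Returns (total queries, final grid resolution).
    """
    n_queries = 0
    res = base_res
    # Level 0: every coarse cell is active.
    active = np.stack(
        np.meshgrid(*[np.arange(res)] * 3, indexing="ij"), -1).reshape(-1, 3)
    for level in range(levels):
        centers = (active + 0.5) / res  # cell centers in [0, 1]^3
        occ = occ_fn(centers)
        n_queries += len(centers)
        if level == levels - 1:
            break
        # Mark surface-crossing cells: an evaluated cell whose face-neighbor
        # was also evaluated but received the opposite occupancy label.
        grid = np.zeros((res,) * 3, dtype=bool)
        known = np.zeros((res,) * 3, dtype=bool)
        grid[tuple(active.T)] = occ
        known[tuple(active.T)] = True
        boundary = []
        for cell, o in zip(active, occ):
            hit = False
            for d in np.eye(3, dtype=int):
                for nb in (cell + d, cell - d):
                    if ((nb < 0) | (nb >= res)).any():
                        continue
                    t = tuple(nb)
                    if known[t] and grid[t] != o:
                        hit = True
            if hit:
                boundary.append(cell)
        boundary = np.array(boundary).reshape(-1, 3)
        # Subdivide each boundary cell into 8 children at double resolution;
        # all other cells are culled from further evaluation.
        offsets = np.array(list(itertools.product([0, 1], repeat=3)))
        active = (boundary[:, None, :] * 2 + offsets[None]).reshape(-1, 3)
        res *= 2
    return n_queries, res
```

Even at this toy scale the savings are visible: a dense 32³ grid needs 32,768 queries, while the hierarchical pass evaluates the 8³ coarse grid plus only the refined boundary shell. At the resolutions used for real-time capture the gap is far larger, which is where the reported ~200x speedup comes from.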

  • Online Hard Example Mining for Surface Sampling
  • We introduce an Online Hard Example Mining (OHEM) technique that suppresses failure modes caused by the rare occurrence of challenging examples. We adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which effectively alleviates reconstruction artifacts.
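The adaptive sampling described above can be sketched as a small bookkeeping class: examples with higher tracked reconstruction error are drawn with higher probability. This is an illustrative sketch, not the paper's exact scheme; the exponential-moving-average update, the uniform initialization, and the class name are assumptions.

```python
import numpy as np

class OHEMSampler:
    """Online hard example mining via adaptive sampling probabilities.

    Tracks a per-example reconstruction error and samples training examples
    proportionally to it, so rare, hard examples are revisited more often.
    """

    def __init__(self, n_examples, momentum=0.9):
        # Start with equal (nonzero) error so every example can be drawn.
        self.errors = np.ones(n_examples)
        self.momentum = momentum

    def sample(self, batch_size, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Sampling probability is proportional to current tracked error.
        p = self.errors / self.errors.sum()
        return rng.choice(len(self.errors), size=batch_size, p=p)

    def update(self, indices, batch_errors):
        # Exponential moving average of per-example reconstruction error,
        # fed by whatever loss the training step just measured.
        self.errors[indices] = (self.momentum * self.errors[indices]
                                + (1 - self.momentum) * np.asarray(batch_errors))
```

In a training loop, `sample` would pick the next batch, and `update` would be called with that batch's per-example losses, so persistently hard examples keep a high draw probability until the network learns them.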

    Qualitative Results (Geometry Only)
  • We qualitatively evaluate the robustness of our approach by demonstrating consistent reconstruction across different lighting conditions, viewpoints, and surface topologies
  • Qualitative results on self-captured performances
  • Qualitative results on Internet Images
  • More Video Results (Geometry + Texture)
  • Additional live capture results
  • Reconstruction from legacy footage
  • Real-time VR PhD Defense (Dr. Zeng Huang)
    Human Digitization with Implicit Representation
  • PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization (ICCV 2019)
  •  Shunsuke Saito*, Zeng Huang*, Ryota Natsume*, Shigeo Morishima, Angjoo Kanazawa, Hao Li
     The original work of PIFu for geometry and texture reconstruction, unifying single-view and multi-view methods.
  • PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization (CVPR 2020)
  •  Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo
     High-Resolution and Multi-Level PIFu!
  • Deep Volumetric Video from Very Sparse Multi-view Performance Capture (ECCV 2018)
  •  Zeng Huang, Tianye Li, Weikai Chen, Yajie Zhao, Jun Xing, Chloe LeGendre, Linjie Luo, Chongyang Ma, Hao Li
     Implicit surface learning for sparse-view human performance capture!
  • ARCH: Animatable Reconstruction of Clothed Humans (CVPR 2020)
  •  Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, Tony Tung
     Learning PIFu in canonical space for animatable avatar generation!