Monocular Real-Time Volumetric Performance Capture
Ruilong Li*¹,², Yuliang Xiu*¹,², Shunsuke Saito¹,², Zeng Huang¹,², Kyle Olszewski¹,², Hao Li¹,²,³
¹University of Southern California  ²USC Institute for Creative Technologies  ³Pinscreen
ECCV 2020


Paper | Code

SIGGRAPH 2020 Real-Time Live: Best in Show Award

Description: Existing volumetric capture systems require many cameras and lengthy post-processing. We introduce the first system that can capture a completely clothed human body (including the back) in real time using a single RGB webcam. Our deep-learning-based approach enables new possibilities for low-cost, consumer-accessible immersive teleportation.
Overview
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Functions (PIFu). While PIFu achieves high-resolution reconstruction in a memory-efficient manner, its computationally expensive inference prevents deploying such a system in real-time applications. To this end, we propose a novel hierarchical surface localization algorithm and a direct rendering method that avoid explicitly extracting surface meshes. By culling unnecessary regions from evaluation in a coarse-to-fine manner, we accelerate the reconstruction by two orders of magnitude over the baseline without compromising quality. Furthermore, we introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes caused by the rare occurrence of challenging examples: we adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which alleviates reconstruction artifacts. Our experiments and evaluations demonstrate the robustness of our system to various challenging viewing angles, illuminations, poses, and clothing styles. We also show that our approach compares favorably with state-of-the-art monocular performance capture methods. Our proposed approach removes the need for multi-view studio settings and enables a consumer-accessible solution for volumetric capture.
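To make the data flow concrete, the sketch below walks through one capture step with hypothetical stand-ins: a unit sphere plays the role of the trained occupancy network, and a dense grid replaces the hierarchical evaluation described under Main Contributions. It illustrates the shape of the pipeline, not our implementation.

import numpy as np

def encode_features(frame: np.ndarray) -> np.ndarray:
    # Stand-in for the image encoder: the real system runs a CNN once per
    # frame and reuses the resulting feature map for every 3D query.
    return frame.astype(np.float32) / 255.0

def query_occupancy(features: np.ndarray, points: np.ndarray) -> np.ndarray:
    # Stand-in for the pixel-aligned MLP: inside/outside test of a sphere.
    # The real function conditions each query on features sampled at the
    # point's 2D projection.
    return (np.linalg.norm(points - 0.5, axis=-1) < 0.35).astype(np.uint8)

def reconstruct_frame(frame: np.ndarray, res: int = 64) -> np.ndarray:
    # One capture step: encode once, query occupancy over the volume, and
    # keep a thin shell of surface points that can be shaded and rendered
    # directly, without mesh extraction.
    features = encode_features(frame)
    axis = (np.arange(res) + 0.5) / res
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    occ = query_occupancy(features, grid.reshape(-1, 3)).reshape(res, res, res)
    shell = np.zeros_like(occ, dtype=bool)
    shell[:-1] |= occ[:-1] != occ[1:]   # occupancy flips along one axis
    shell[1:]  |= occ[1:]  != occ[:-1]  # => the surface crosses this voxel
    return grid[shell]

points = reconstruct_frame(np.zeros((256, 256, 3), dtype=np.uint8))
print(points.shape)  # (N_surface, 3) candidate surface points

The dense grid here is exactly the cost our hierarchical evaluation removes: at resolution r it requires r³ network queries per frame, although only on the order of r² of them lie near the surface.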
Main Contributions
  • Octree-based Robust Surface Localization
  • We propose a novel hierarchical surface localization algorithm and a direct rendering method that progressively query 3D locations in a coarse-to-fine manner, extracting the surface from the implicit occupancy field with a minimal number of point evaluations. By culling unnecessary regions from evaluation, we accelerate the reconstruction by nearly 200× without compromising quality (see the first sketch following this list).

  • Online Hard Example Mining for Surface Sampling
  • We introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes caused by the rare occurrence of challenging examples. We adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which alleviates reconstruction artifacts (see the second sketch following this list).
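
The first sketch shows the core subdivision loop under a simplifying assumption: the occupancy field is any callable mapping (N, 3) points in [0, 1]³ to {0, 1}, with a sphere standing in for the network. Our full algorithm adds safeguards against culling thin structures at coarse levels; none of that is shown here.

import numpy as np

def occupancy(points: np.ndarray) -> np.ndarray:
    # Stand-in for the PIFu network: inside/outside of a sphere.
    return (np.linalg.norm(points - 0.5, axis=-1) < 0.35).astype(np.uint8)

def boundary_mask(occ_fn, centers: np.ndarray, size: float) -> np.ndarray:
    # A cell straddles the surface iff its 8 corners are not all inside
    # or all outside the occupancy field.
    h = size / 2.0
    corner = np.stack(np.meshgrid(*([[-h, h]] * 3), indexing="ij"), -1).reshape(-1, 3)
    occ = occ_fn(np.clip(centers[:, None, :] + corner, 0.0, 1.0).reshape(-1, 3))
    occ = occ.reshape(-1, 8)
    return occ.min(axis=1) != occ.max(axis=1)

def surface_cells(occ_fn, coarse_res: int = 16, levels: int = 3):
    # Dense evaluation happens only at the coarse level; afterwards, only
    # cells that the surface actually crosses are subdivided, so the number
    # of network queries grows with surface area rather than volume.
    res = coarse_res
    axis = (np.arange(res) + 0.5) / res
    centers = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), -1).reshape(-1, 3)
    for _ in range(levels):
        centers = centers[boundary_mask(occ_fn, centers, 1.0 / res)]
        res *= 2             # each surviving cell -> 8 children
        q = 1.0 / (2 * res)  # quarter of the parent cell size
        child = np.stack(np.meshgrid(*([[-q, q]] * 3), indexing="ij"), -1).reshape(-1, 3)
        centers = (centers[:, None, :] + child).reshape(-1, 3)
    return centers, res

centers, res = surface_cells(occupancy)
print(f"{len(centers)} surface candidates at {res}^3 resolution")

Because only boundary cells are ever refined, the culled interior and exterior are never queried again at finer levels, which is where the near-200× speed-up comes from.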

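The second sketch is a toy version of the adaptive sampling update. The per-example weight here is an exponential moving average of the training loss, which is an illustrative choice rather than the exact update rule from the paper; the point is only that sampling probability tracks current reconstruction error, so rare hard examples are revisited more often.

import numpy as np

rng = np.random.default_rng(0)

class OhemSampler:
    def __init__(self, num_examples: int, momentum: float = 0.9):
        self.err = np.ones(num_examples)  # running per-example error estimate
        self.momentum = momentum

    def sample(self, batch_size: int) -> np.ndarray:
        # Sampling probability proportional to the running error.
        p = self.err / self.err.sum()
        return rng.choice(len(self.err), size=batch_size, replace=False, p=p)

    def update(self, indices: np.ndarray, losses: np.ndarray) -> None:
        # Exponential moving average keeps the estimate stable while the
        # network improves; hard examples decay slowly, easy ones quickly.
        m = self.momentum
        self.err[indices] = m * self.err[indices] + (1 - m) * losses

# Usage with a fake training signal: example 7 is persistently "hard".
sampler = OhemSampler(num_examples=100)
for step in range(200):
    idx = sampler.sample(batch_size=8)
    losses = rng.uniform(0.0, 0.1, size=8) + (idx == 7) * 1.0
    sampler.update(idx, losses)
print("P(sample #7) =", (sampler.err / sampler.err.sum())[7])

After a few hundred steps the hard example's sampling probability rises well above the uniform 1/100, so the network sees it far more often than a uniformly shuffled loader would allow.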
Qualitative Results (Geometry Only)
  • We qualitatively evaluate the robustness of our approach by demonstrating the consistency of reconstruction under different lighting conditions, viewpoints, and surface topologies.
  • Qualitative results on self-captured performances
  • Qualitative results on Internet Images
More Video Results (Geometry + Texture)
  • Additional live capture results
  • Reconstruction from legacy footage
  • Real-time VR PhD Defense (Dr. Zeng Huang)
Human Digitization with Implicit Representation
  • PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization (ICCV 2019)
  •  Shunsuke Saito*, Zeng Huang*, Ryota Natsume*, Shigeo Morishima, Angjoo Kanazawa, Hao Li
     The original PIFu work for geometry and texture reconstruction, unifying single-view and multi-view methods.
  • PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization (CVPR 2020)
  •  Shunsuke Saito, Tomas Simon, Jason Saragih, Hanbyul Joo
     High-Resolution and Multi-Level PIFu!
  • Deep Volumetric Video from Very Sparse Multi-view Performance Capture (ECCV 2018)
  •  Zeng Huang, Tianye Li, Weikai Chen, Yajie Zhao, Jun Xing, Chloe LeGendre, Linjie Luo, Chongyang Ma, Hao Li
     Implicit surface learning for sparse-view human performance capture!
  • ARCH: Animatable Reconstruction of Clothed Humans (CVPR 2020)
  •  Zeng Huang, Yuanlu Xu, Christoph Lassner, Hao Li, Tony Tung
     Learning PIFu in canonical space for animatable avatar generation!
Bibtex

@inproceedings{li2020monoport,
  title={Monocular Real-Time Volumetric Performance Capture},
  author={Li, Ruilong and Xiu, Yuliang and Saito, Shunsuke and Huang, Zeng and Olszewski, Kyle and Li, Hao},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}