Monocular Real-Time Volumetric Performance Capture
Ruilong Li*1,2, Yuliang Xiu*1,2, Shunsuke Saito1,2, Zeng Huang1,2, Kyle Olszewski1,2, Hao Li1,2,3
1University of Southern California, 2USC Institute for Creative Technologies, 3Pinscreen
ECCV 2020
SIGGRAPH 2020 Real-Time Live!

Tuesday, 25 August 2020 (Pacific Time Zone)

Description: Existing volumetric capture systems require many cameras and lengthy post-processing. We introduce the first system that can capture a completely clothed human body (including the back) in real time using a single RGB webcam. Our deep-learning-based approach enables new possibilities for low-cost and consumer-accessible immersive teleportation.
Overview
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). While PIFu achieves high-resolution reconstruction in a memory-efficient manner, its computationally expensive inference prevents us from deploying such a system for real-time applications. To this end, we propose a novel hierarchical surface localization algorithm and a direct rendering method without explicitly extracting surface meshes. By culling unnecessary regions for evaluation in a coarse-to-fine manner, we successfully accelerate the reconstruction by two orders of magnitude from the baseline without compromising the quality. Furthermore, we introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples. We adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which effectively alleviates reconstruction artifacts. Our experiments and evaluations demonstrate the robustness of our system to various challenging angles, illuminations, poses, and clothing styles. We also show that our approach compares favorably with the state-of-the-art monocular performance capture.
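For concreteness, the core PIFu query can be sketched as follows. This is a minimal, hypothetical PyTorch rendition of the data flow only: the feature map, the MLP, and the calibration convention stand in for the paper's learned components and are not its exact implementation.

    import torch
    import torch.nn.functional as F

    def query_occupancy(feat_map, points, calib, mlp):
        # feat_map: (1, C, H, W) image features from a hypothetical encoder.
        # points:   (1, N, 3) 3D query points; calib: (1, 4, 4) camera matrix
        # assumed to map world coordinates to normalized [-1, 1] image coords.
        homo = F.pad(points, (0, 1), value=1.0)            # homogeneous coords
        proj = torch.einsum('bij,bnj->bni', calib, homo)   # project to image
        xy = proj[..., :2]                                 # in [-1, 1] for sampling
        z = proj[..., 2:3]                                 # depth along the ray
        # Bilinearly sample a pixel-aligned feature at each projected point.
        feat = F.grid_sample(feat_map, xy.unsqueeze(2),
                             align_corners=True)           # (1, C, N, 1)
        feat = feat.squeeze(-1).permute(0, 2, 1)           # (1, N, C)
        # The MLP maps (pixel-aligned feature, depth) to occupancy in [0, 1].
        return mlp(torch.cat([feat, z], dim=-1))           # (1, N, 1)

Reconstruction amounts to evaluating this function over a dense 3D grid, which is exactly the cost the surface localization algorithm below is designed to cut.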
Main Contributions
  • Octree-based Robust Surface Localization
  • We propose a novel hierarchical surface localization algorithm and a direct rendering method that progressively query 3D locations in a coarse-to-fine manner, extracting the surface from implicit occupancy fields while evaluating as few points as possible. By culling unnecessary regions from evaluation, we accelerate the reconstruction by nearly 200 times without compromising quality (a sketch follows this list).

  • Online Hard Example Mining for Surface Sampling
  • We introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples. We adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which effectively alleviates reconstruction artifacts (a second sketch follows this list).

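To make the culling idea concrete, here is a minimal two-level sketch: evaluate occupancy at coarse cell centers, then re-query only the cells where occupancy flips relative to a neighbor. The `query_fn` oracle, the resolutions, and the two-level simplification are assumptions for illustration; the paper's algorithm refines hierarchically through more octree levels.

    import numpy as np

    def two_level_occupancy(query_fn, res_coarse=32, factor=8, thresh=0.5):
        # query_fn: hypothetical batched oracle mapping (N, 3) points in
        # [0, 1]^3 to (N,) occupancy values, e.g. a PIFu query.
        # --- coarse pass over cell centers ---
        lin = (np.arange(res_coarse) + 0.5) / res_coarse
        pts = np.stack(np.meshgrid(lin, lin, lin, indexing='ij'), -1)
        coarse = query_fn(pts.reshape(-1, 3)).reshape((res_coarse,) * 3)
        # --- mark boundary cells: inside/outside flips along any axis ---
        inside = coarse > thresh
        boundary = np.zeros_like(inside)
        for ax in range(3):
            flip = np.diff(inside.astype(np.int8), axis=ax) != 0
            lo = [slice(None)] * 3; lo[ax] = slice(0, -1)
            hi = [slice(None)] * 3; hi[ax] = slice(1, None)
            boundary[tuple(lo)] |= flip
            boundary[tuple(hi)] |= flip
        # --- fine pass, restricted to boundary cells ---
        fine = coarse.repeat(factor, 0).repeat(factor, 1).repeat(factor, 2)
        cells = np.argwhere(boundary)
        if len(cells):
            off = (np.arange(factor) + 0.5) / (res_coarse * factor)
            sub = np.stack(np.meshgrid(off, off, off, indexing='ij'),
                           -1).reshape(-1, 3)
            pts = (cells[:, None, :] / res_coarse + sub[None]).reshape(-1, 3)
            vals = query_fn(pts).reshape(-1, factor, factor, factor)
            for (i, j, k), v in zip(cells * factor, vals):
                fine[i:i+factor, j:j+factor, k:k+factor] = v
        return fine  # dense (res_coarse*factor)^3 grid, ready for marching cubes

Because the refined cells form a thin shell around the surface, the number of network evaluations grows roughly with the surface area rather than the volume, which is where the speed-up over dense evaluation comes from.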
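The adaptive sampling can likewise be sketched with a small sampler that keeps a running error estimate per training example and draws hard examples more often. The class name, the exponential-moving-average update, and the probability floor are illustrative choices, not the paper's exact scheme.

    import numpy as np

    class OnlineHardExampleSampler:
        def __init__(self, num_examples, momentum=0.9, floor=1e-3):
            self.err = np.ones(num_examples)   # start uniform: all examples equal
            self.momentum = momentum
            self.floor = floor                 # keep every example reachable

        def sample(self, batch_size, rng=np.random):
            # Draw examples with probability proportional to running error,
            # so currently hard examples are revisited more often.
            p = np.maximum(self.err, self.floor)
            return rng.choice(len(self.err), size=batch_size,
                              replace=False, p=p / p.sum())

        def update(self, indices, losses):
            # Exponential moving average of each drawn example's loss.
            self.err[indices] = (self.momentum * self.err[indices]
                                 + (1 - self.momentum) * np.asarray(losses))

Each training step then samples a batch, computes per-example reconstruction losses, and feeds them back via update, so rare poses and clothing styles that the network currently reconstructs poorly are not drowned out by easy examples.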
Qualitative Results (Only Geometry)
  • We qualitatively evaluate the robustness of our approach by demonstrating the consistency of reconstruction across different lighting conditions, viewpoints, and surface topologies
  • Qualitative results on self-captured performances
  • Qualitative results on Internet Images
More Video Results (Geometry + Texture)
  • Additional live capture results
  • Reconstruction from legacy footage
Bibtex

    @article{li2020monocular,
        title={Monocular Real-Time Volumetric Performance Capture},
        author={Li, Ruilong and Xiu, Yuliang and Saito, Shunsuke and Huang, Zeng and Olszewski, Kyle and Li, Hao},
        journal={arXiv preprint arXiv:2007.13988},
        year={2020}
    }