Robust monocular 3D face reconstruction under challenging viewing conditions

Mohaghegh, Hoda and Rahmani, Hossein and Bennamoun, Mohammed (2023) Robust monocular 3D face reconstruction under challenging viewing conditions. Neurocomputing, 520. pp. 82-93. ISSN 0925-2312

Elsevier_s_CAS_Revision_Final.pdf - Accepted Version
Restricted to Repository staff only until 29 November 2023.
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.

Abstract

Despite extensive research, 3D face reconstruction from a single image remains an open problem because of the high variability in pose, occlusion and lighting conditions. Although deep learning-based methods have achieved great success, they are usually limited to near-frontal, unoccluded images. In addition, the scarcity of diverse training data with 3D annotations considerably limits their performance. As a result, existing methods fail to recover facial details with high fidelity, especially for images captured under extreme conditions. To address this issue, we propose an unsupervised coarse-to-fine framework for reconstructing 3D faces with detailed textures. Our core idea is that multiple images of the same person captured under different viewing conditions should yield the same 3D face. We therefore use a self-augmentation learning technique to train a model that is robust to such variations. In addition, instead of using image pixels directly, we feed the refinement module a set of discriminative features describing the identity and attributes of the face, which makes the model invariant to viewing conditions. Combining self-augmentation learning with these rich face-related features allows plausible facial details to be reconstructed even under challenging viewing conditions. We train the model end-to-end in a self-supervised manner, without any 3D annotations, landmarks or identity labels, using a combination of an image-level photometric loss and a perception-level loss that is identity- and attribute-aware. We evaluate the proposed approach on the CelebA and AFLW2000 datasets and demonstrate its robustness to appearance variations despite learning from unlabeled images.
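The training objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss forms, so the masked L1 photometric term, the cosine-based identity term, the mean-squared consistency term between shape codes predicted from an image and from its augmented copy, the loss weights and all function names are hypothetical. The attribute-aware part of the perception-level loss is not modelled. Images, embeddings and shape codes are flat lists of floats so the sketch has no dependencies.

```python
import math
from typing import Sequence


def photometric_loss(image: Sequence[float], rendered: Sequence[float],
                     mask: Sequence[float]) -> float:
    """Mean absolute pixel error inside the face mask (assumed L1 form)."""
    total = sum(m * abs(a - b) for a, b, m in zip(image, rendered, mask))
    count = sum(mask)
    return total / count if count > 0 else 0.0


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def identity_loss(emb_input: Sequence[float], emb_rendered: Sequence[float]) -> float:
    """Hypothetical identity term: 1 - cosine similarity of face embeddings
    extracted from the input photo and from the rendered reconstruction."""
    return 1.0 - cosine_similarity(emb_input, emb_rendered)


def consistency_loss(params_orig: Sequence[float], params_aug: Sequence[float]) -> float:
    """Hypothetical self-augmentation term: an image and its augmented copy
    (occluded, rotated, noisy) should yield the same 3D face, so penalise the
    mean squared difference between their predicted shape codes."""
    if not params_orig:
        return 0.0
    return sum((a - b) ** 2 for a, b in zip(params_orig, params_aug)) / len(params_orig)


def total_loss(image, rendered, mask, emb_in, emb_out, p_orig, p_aug,
               w_photo: float = 1.0, w_id: float = 0.2, w_cons: float = 1.0) -> float:
    # The weights are illustrative placeholders, not values from the paper.
    return (w_photo * photometric_loss(image, rendered, mask)
            + w_id * identity_loss(emb_in, emb_out)
            + w_cons * consistency_loss(p_orig, p_aug))
```

Because the consistency term only compares two predictions of the same network, it needs no 3D labels, which matches the self-supervised setting described in the abstract.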
Qualitative comparisons indicate that our method produces detailed 3D faces even under extreme occlusions, out-of-plane rotations and noise perturbations, where existing state-of-the-art methods often fail. Quantitatively, our method outperforms the state of the art by more than 30.14%, 9.87% and 11.3% in terms of PSNR, SSIM and identity similarity, respectively.
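PSNR, one of the metrics quoted above, is defined as 10 log10(MAX^2 / MSE) and can be computed as in the minimal sketch below. It assumes images flattened to lists of intensities in [0, max_value] and compares whole images; whether the reported scores are computed over a face mask or the full image is not stated here, and the function name is illustrative. SSIM and identity similarity require windowed image statistics and a face-recognition network respectively, so they are not reproduced.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images,
    each given as a flat list of intensities in [0, max_value]."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the rendered face is closer, pixel for pixel, to the input photograph; for example, a uniform error of 10% of the intensity range gives 20 dB.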

Item Type:
Journal Article
Journal or Publication Title:
Neurocomputing
Additional Information:
This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 520, 2022. DOI: 10.1016/j.neucom.2022.11.048
Uncontrolled Keywords:
Computer Science Applications (ASJC 1706)
ID Code:
180539
Deposited On:
01 Dec 2022 13:25
Refereed?:
Yes
Published?:
Published
Last Modified:
01 Feb 2023 04:05