TY - GEN
T1 - Self-supervised Learning with Multi-view Rendering for 3D Point Cloud Analysis
AU - Tran, Bach
AU - Hua, Binh Son
AU - Tran, Anh Tuan
AU - Hoai, Minh
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Recently, great progress has been made in 3D deep learning with the emergence of deep neural networks specifically designed for 3D point clouds. These networks are often trained from scratch or from pre-trained models learned purely from point cloud data. Inspired by the success of deep learning in the image domain, we devise a novel pre-training technique for better model initialization by utilizing the multi-view rendering of the 3D data. Our pre-training is self-supervised by a local pixel/point level correspondence loss computed from perspective projection and a global image/point cloud level loss based on knowledge distillation, thus effectively improving upon popular point cloud networks, including PointNet, DGCNN and SR-UNet. These improved models outperform existing state-of-the-art methods on various datasets and downstream tasks. We also analyze the benefits of synthetic and real data for pre-training, and observe that pre-training on synthetic data is also useful for high-level downstream tasks. Code and pre-trained models are available at https://github.com/VinAIResearch/selfsup_pcd.git.
KW - 3D deep learning
KW - Multiple-view rendering
KW - Point cloud analysis
KW - Self-supervised learning
UR - https://www.scopus.com/pages/publications/85151060641
U2 - 10.1007/978-3-031-26319-4_25
DO - 10.1007/978-3-031-26319-4_25
M3 - Conference contribution
SN - 9783031263187
T3 - Lecture Notes in Computer Science
SP - 413
EP - 431
BT - Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings
A2 - Wang, Lei
A2 - Gall, Juergen
A2 - Chin, Tat-Jun
A2 - Sato, Imari
A2 - Chellappa, Rama
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th Asian Conference on Computer Vision, ACCV 2022
Y2 - 4 December 2022 through 8 December 2022
ER -