Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 12, pp. 505-518, Dec. 2023
https://doi.org/10.3745/KTSDE.2023.12.12.505
Keywords: 3D Semantic Segmentation, Point cloud, Multi-View RGB-D Images, 2D-3D Feature Fusion
Abstract

3D point cloud semantic segmentation is a computer vision task that divides a point cloud into distinct objects and regions by predicting the class label of each point. Existing 3D semantic segmentation models are limited in their ability to sufficiently fuse multi-modal features while preserving the characteristics of both the 2D visual features extracted from RGB images and the 3D geometric features extracted from the point cloud. Therefore, in this paper, we propose MMCA-Net, a novel 3D semantic segmentation model using 2D-3D multi-modal features. The proposed model effectively fuses the two heterogeneous feature types, 2D visual and 3D geometric, by using an intermediate fusion strategy and a multi-modal cross attention-based fusion operation. The proposed model also extracts context-rich 3D geometric features from an input point cloud of irregularly distributed points by adopting PTv2 as its 3D geometric encoder. We conducted both quantitative and qualitative experiments on the ScanNetv2 benchmark dataset to analyze the performance of the proposed model. In terms of mIoU, the proposed model showed a 9.2% performance improvement over the PTv2 model, which uses only 3D geometric features, and a 12.12% improvement over the MVPNet model, which uses 2D-3D multi-modal features. These results demonstrate the effectiveness and usefulness of the proposed model.
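The core fusion operation described above, using 3D point features to attend over 2D visual features, can be sketched in a minimal form. This is an illustrative NumPy sketch of generic scaled dot-product cross-attention, not the paper's actual MMCA-Net implementation; all function names, projection matrices, and shapes here are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(feat_3d, feat_2d, w_q, w_k, w_v):
    """Hypothetical sketch: fuse per-point 3D geometric features (queries)
    with multi-view 2D visual features (keys/values) via cross-attention.

    feat_3d: (N, d) features for N points from a 3D encoder (e.g. PTv2)
    feat_2d: (M, d) features for M projected image locations
    """
    q = feat_3d @ w_q                                   # (N, d) queries
    k = feat_2d @ w_k                                   # (M, d) keys
    v = feat_2d @ w_v                                   # (M, d) values
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))      # (N, M) attention weights
    fused = attn @ v                                    # (N, d) attended 2D context
    # Concatenate geometric and attended visual features per point.
    return np.concatenate([feat_3d, fused], axis=-1)    # (N, 2d)

# Toy usage: 5 points and 6 image locations, 8-dim features.
rng = np.random.default_rng(0)
n, m, d = 5, 6, 8
out = cross_attention_fusion(
    rng.normal(size=(n, d)), rng.normal(size=(m, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)),
)
print(out.shape)  # (5, 16)
```

In an intermediate-fusion design like the one the abstract describes, such an operation would sit between encoder stages rather than at the input or output, letting each point gather relevant 2D context before further 3D processing.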




Cite this article
[IEEE Style]
H. Bae and I. Kim, "Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images," KIPS Transactions on Software and Data Engineering, vol. 12, no. 12, pp. 505-518, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.12.505.

[ACM Style]
Hye-Lim Bae and Incheol Kim. 2023. Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images. KIPS Transactions on Software and Data Engineering, 12, 12, (2023), 505-518. DOI: https://doi.org/10.3745/KTSDE.2023.12.12.505.