Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 9, pp. 407-418, Sep. 2023
https://doi.org/10.3745/KTSDE.2023.12.9.407
Keywords: Multi-Object Goal Visual Navigation, Deep Reinforcement Learning, Multimodal Context Fusion, Global Mapping
Abstract

Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in a given order in an unknown indoor environment. Existing models for the MultiOn task are limited in that they cannot exploit an integrated view of multimodal context, because they use only a unimodal context map. To overcome this limitation, we propose MCFMO, a novel deep neural network-based agent model for the MultiOn task. MCFMO uses a multimodal context map containing visual appearance features, semantic features of environmental objects, and goal object features. The model effectively fuses these three heterogeneous features into a global multimodal context map using a point-wise convolutional neural network module. Finally, it adopts an auxiliary task learning module that predicts the observation status, goal direction, and goal distance, which guides the agent to learn its navigation policy efficiently. Through various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
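The abstract says the three heterogeneous feature maps are fused by a point-wise convolutional module but gives no layer sizes. As a rough illustration only, the Python sketch below shows what a point-wise (1 x 1) convolution does: the maps are stacked along the channel axis, and each output channel at a grid cell is a weighted sum of that same cell's input channels, so features are mixed across modalities without mixing neighboring cells. The channel counts, the ReLU activation, the shared H x W grid, and the function names (concat_channels, pointwise_conv, fuse_context_maps) are hypothetical choices for illustration; this is not the MCFMO implementation.

```python
from typing import List

# A feature map is indexed as fmap[channel][row][col] (assumed layout).
FeatureMap = List[List[List[float]]]


def concat_channels(*maps: FeatureMap) -> FeatureMap:
    """Stack the channels of several maps that share one H x W grid."""
    h, w = len(maps[0][0]), len(maps[0][0][0])
    for m in maps:
        for ch in m:
            if len(ch) != h or any(len(row) != w for row in ch):
                raise ValueError("all maps must share the same H x W grid")
    return [ch for m in maps for ch in m]


def pointwise_conv(x: FeatureMap, weight: List[List[float]],
                   bias: List[float]) -> FeatureMap:
    """1 x 1 convolution followed by ReLU.

    Output channel k at cell (i, j) = ReLU(bias[k] + sum_c weight[k][c] * x[c][i][j]),
    i.e. only the input channels at that same cell contribute.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if len(bias) != len(weight):
        raise ValueError("need one bias per output channel")
    out = []
    for k, w_row in enumerate(weight):
        if len(w_row) != c_in:
            raise ValueError("weight row length must equal the input channel count")
        plane = [[max(0.0, bias[k] + sum(w_row[c] * x[c][i][j] for c in range(c_in)))
                  for j in range(w)]
                 for i in range(h)]
        out.append(plane)
    return out


def fuse_context_maps(visual: FeatureMap, semantic: FeatureMap, goal: FeatureMap,
                      weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical fusion step: concatenate the three maps, then apply a 1 x 1 conv."""
    return pointwise_conv(concat_channels(visual, semantic, goal), weight, bias)


# Example: a 2 x 2 grid with 2 visual channels, 1 semantic channel, 1 goal channel.
visual = [[[1.0, 2.0], [0.0, 1.0]], [[0.5, 0.0], [1.0, 0.0]]]
semantic = [[[0.0, 1.0], [1.0, 0.0]]]
goal = [[[0.0, 0.0], [0.0, 1.0]]]       # goal object marked at cell (1, 1)
weight = [[1.0, 0.0, 0.0, 0.0],         # output 0: pass through visual channel 0
          [0.0, 0.0, 1.0, -1.0]]        # output 1: semantic minus goal, clipped by ReLU
bias = [0.0, 0.0]

fused = fuse_context_maps(visual, semantic, goal, weight, bias)
print(fused)  # [[[1.0, 2.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
```

In a trained network the weights and biases would be learned; they are hand-picked here so the fused output can be checked by hand.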




Cite this article
[IEEE Style]
J. H. Choi and I. C. Kim, "Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion," KIPS Transactions on Software and Data Engineering, vol. 12, no. 9, pp. 407-418, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.9.407.

[ACM Style]
Jeong Hyun Choi and In Cheol Kim. 2023. Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion. KIPS Transactions on Software and Data Engineering, 12, 9, (2023), 407-418. DOI: https://doi.org/10.3745/KTSDE.2023.12.9.407.