LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation

Riku Uemura, Kanji Tanaka, Kenta Tsukahara, Daiki Iwata

Keywords

Abstract

Point goal navigation (PGN) is a mapless navigation approach that trains robots to visually navigate to goal points without relying on pre-built maps. Despite significant progress in handling complex environments using deep reinforcement learning, current PGN methods are designed for single-robot systems, limiting their generalizability to multi-robot scenarios with diverse platforms. This paper addresses this limitation by proposing a knowledge transfer framework for PGN, allowing a teacher robot to transfer its learned navigation model to student robots, including those with unknown or black-box platforms. We introduce a novel knowledge distillation (KD) framework that transfers first-person-view (FPV) representations (view images, turning/forward actions) to universally applicable third-person-view (TPV) representations (local maps, subgoals). The state is redefined as reconstructed local maps using SLAM, while actions are mapped to subgoals on a predefined grid. To enhance training efficiency, we propose a sampling-efficient KD approach that aligns training episodes via a noise-robust local map descriptor (LMD). Although validated on 2D wheeled robots, this method can be extended to 3D action spaces, such as drones. Experiments conducted in Habitat-Sim demonstrate the feasibility of the proposed framework, requiring minimal implementation effort. This study highlights the potential for scalable and cross-platform PGN solutions, expanding the applicability of embodied AI systems in multi-robot scenarios.
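For concreteness, here is a minimal sketch (not the authors' code) of the action-to-subgoal relabeling described in the abstract: a teacher's FPV action sequence (turn/forward) is replayed into a relative pose offset, which is then snapped to a cell of a predefined local grid to serve as the student's TPV subgoal label. The step length, turn angle, grid size, and cell resolution below are illustrative assumptions, not values reported in the paper.

import math
import numpy as np

# Assumed motion and grid parameters (illustrative only).
FORWARD_STEP = 0.25               # forward step length [m]
TURN_ANGLE = math.radians(10.0)   # turn increment [rad]
GRID_SIZE = 11                    # local grid of GRID_SIZE x GRID_SIZE cells
CELL_SIZE = 0.5                   # cell resolution [m]

def replay_actions(actions):
    """Integrate a teacher FPV action sequence into a relative (x, y) offset."""
    x, y, theta = 0.0, 0.0, 0.0
    for a in actions:
        if a == "move_forward":
            x += FORWARD_STEP * math.cos(theta)
            y += FORWARD_STEP * math.sin(theta)
        elif a == "turn_left":
            theta += TURN_ANGLE
        elif a == "turn_right":
            theta -= TURN_ANGLE
    return x, y

def offset_to_subgoal_cell(x, y):
    """Snap a metric offset to the nearest cell of the predefined local grid."""
    half = GRID_SIZE // 2
    col = int(np.clip(round(x / CELL_SIZE) + half, 0, GRID_SIZE - 1))
    row = int(np.clip(round(y / CELL_SIZE) + half, 0, GRID_SIZE - 1))
    return row, col

# Example: one short teacher segment becomes one TPV subgoal label.
segment = ["turn_left"] * 3 + ["move_forward"] * 4
print(offset_to_subgoal_cell(*replay_actions(segment)))  # -> (6, 7) with these assumed constants

In a distillation setup, such (local map, subgoal) pairs would serve as the student's platform-independent training targets in place of the teacher's platform-specific (image, action) pairs.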

Related documents

BibTeX

@article{DBLP:journals/corr/abs-2412-17282,
  author     = {Riku Uemura and Kanji Tanaka and Kenta Tsukahara and Daiki Iwata},
  title      = {{LMD-PGN:} Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View {BEV} Maps for Universal Point Goal Navigation},
  journal    = {CoRR},
  volume     = {abs/2412.17282},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2412.17282},
  doi        = {10.48550/ARXIV.2412.17282},
  eprinttype = {arXiv},
  eprint     = {2412.17282},
  timestamp  = {Fri, 24 Jan 2025 21:54:26 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2412-17282.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}

Figures, Tables, and Photos

Fig1

Fig. 1. Example of a robot’s movement trajectory. The red dot represents the starting point, the blue curve represents the movement path, and the green circle represents the goal area.

 
Fig2

Fig. 2. We adopted a BEV (bird’s-eye-view) omnidirectional local map, as shown in the figure, prioritizing independence from any specific platform.
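The abstract also mentions aligning training episodes via a noise-robust local map descriptor (LMD) computed over BEV local maps like the one in Fig. 2. As a hedged illustration only, the sketch below pairs episodes by nearest-neighbor search over a simple stand-in descriptor (average-pooled, L2-normalized occupancy); it is not the paper's LMD, and the pooling factor and map size are assumptions.

import numpy as np

def local_map_descriptor(bev_map, pool=8):
    """Average-pool a binary occupancy BEV map and L2-normalize the result."""
    h, w = bev_map.shape
    hp, wp = (h // pool) * pool, (w // pool) * pool
    pooled = bev_map[:hp, :wp].reshape(hp // pool, pool, wp // pool, pool).mean(axis=(1, 3))
    vec = pooled.ravel().astype(np.float32)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def align_episodes(teacher_maps, student_maps):
    """Return, for each student local map, the index of the most similar teacher local map."""
    t_desc = np.stack([local_map_descriptor(m) for m in teacher_maps])
    s_desc = np.stack([local_map_descriptor(m) for m in student_maps])
    # Pairwise Euclidean distances between descriptors; smaller means more similar.
    dists = np.linalg.norm(s_desc[:, None, :] - t_desc[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Example with random 64x64 occupancy maps.
rng = np.random.default_rng(0)
teacher = [rng.integers(0, 2, (64, 64)).astype(float) for _ in range(5)]
student = [rng.integers(0, 2, (64, 64)).astype(float) for _ in range(3)]
print(align_episodes(teacher, student))  # indices of the matched teacher maps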

Table 1. PERFORMANCE RESULTS (ACHIEVEMENT RATE [%])
