Exploring Multi-View Perspectives on Deep Reinforcement Learning Agents for Embodied Object Navigation in Virtual Home Environments


Embodied reinforcement learning (RL) agents have recently been explored in a variety of domains. One advantage of artificial agents is that they can obtain visual inputs from multiple input devices simultaneously. This work explores multi-view reinforcement learning for object navigation tasks in 3D-rendered virtual home environments using AI2-THOR. We trained CNN-based deep Q-learning embodied agents with egocentric, allocentric, and combined egocentric-allocentric perspectives to locate an object in an unknown environment. We compared the three RL agents and evaluated them by both reward improvement rate and reward obtained. We demonstrate that the egocentric perspective allows for faster reward accumulation in the earlier episodes, whereas the allocentric agents obtained better long-term rewards. The agent with the combined allocentric and egocentric perspective achieved the best overall results by harnessing the benefits of each perspective. The results show that while single-perspective embodied agents each have their own advantages, combining both inputs yields the best overall reward. Our findings provide a foundation and benchmark for building embodied RL agents with multi-view perspectives.
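The two evaluation criteria named above, reward improvement rate and reward obtained, can be made concrete with a minimal sketch. The abstract does not define either metric precisely, so the definitions here are assumptions: improvement rate is taken as the least-squares slope of per-episode reward against episode index, and reward obtained as the sum over episodes. The function names `reward_improvement_rate` and `total_reward` are hypothetical, and the sample rewards are toy values, not results from this work.

```python
from statistics import mean


def reward_improvement_rate(episode_rewards):
    """Assumed definition: least-squares slope of reward vs. episode index."""
    n = len(episode_rewards)
    if n < 2:
        raise ValueError("need at least two episodes to estimate a slope")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(episode_rewards)
    # Covariance of (episode index, reward) divided by variance of the index.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, episode_rewards))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def total_reward(episode_rewards):
    """Assumed definition: cumulative reward over all episodes."""
    return sum(episode_rewards)


# Toy per-episode rewards for illustration only (not data from the paper).
toy_rewards = [1.0, 2.0, 3.0, 4.0]
rate = reward_improvement_rate(toy_rewards)
total = total_reward(toy_rewards)
```

Under these assumed definitions, an agent that learns quickly early on would show a steeper slope over the first episodes, while an agent with better long-term behavior would show a larger total over the full run.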