Overview. (a) Our visual place recognition system takes an omnidirectional visual input from a robot and then retrieves the closest place exemplar on the map using our Omnidirectional Convolutional Neural Network (O-CNN). (b) O-CNN is further used to help navigate the robot to the closest place.
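As a rough illustration of the retrieval step in (a), the sketch below picks the nearest map exemplar by Euclidean distance between embeddings. It assumes that O-CNN maps each omnidirectional image to a fixed-length embedding and that distance in that embedding space serves as the relative-distance estimate. The function `retrieve_closest` and the toy embeddings are hypothetical and are not taken from the released code.

```python
import numpy as np

def retrieve_closest(query_emb, exemplar_embs):
    """Return (index, distance) of the map exemplar nearest to the query.

    Both arguments are hypothetical O-CNN output embeddings: query_emb has
    shape (D,), exemplar_embs has shape (N, D). Euclidean distance in the
    embedding space stands in for the estimated relative distance.
    """
    dists = np.linalg.norm(exemplar_embs - query_emb[None, :], axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Toy example with 2-D embeddings for three map exemplars.
exemplars = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
query = np.array([0.9, 1.2])
idx, dist = retrieve_closest(query, exemplars)
print(idx, round(dist, 3))  # 2 0.224
```

A brute-force scan like this is linear in the number of exemplars; because the setting assumes only a few place exemplars per map, that cost is small.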


Visual place recognition is challenging, especially when only a few place exemplars are given. To mitigate this challenge, we consider place recognition using omnidirectional cameras and propose a novel Omnidirectional Convolutional Neural Network (O-CNN) that handles severe camera pose variation. Given a visual input, the task of O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and that closest place. Building on this ability to estimate relative distance, we propose a heuristic policy that navigates a robot to the retrieved closest place. The network is designed to exploit the omnidirectional view by incorporating circular padding and rotation invariance. To train a powerful O-CNN, we build a virtual world for large-scale training. We also propose a continuous lifted structured feature embedding loss to learn the concept of distance efficiently. Finally, our experimental results confirm that our method achieves state-of-the-art accuracy and speed on both virtual-world and real-world datasets.
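The continuous lifted structured loss mentioned above presumably extends the lifted structured feature embedding loss of Oh Song et al. (CVPR 2016). Its continuous form is not spelled out on this page, so the sketch below implements only the standard discrete-label version as a reference point, not the O-CNN loss. For each positive pair (i, j), J_ij = log(sum over negatives k of i of exp(alpha - D_ik) + sum over negatives l of j of exp(alpha - D_jl)) + D_ij, and the loss is the sum of max(0, J_ij)^2 over positive pairs divided by 2|P|. The place labels, `margin` (alpha), and function names are illustrative assumptions.

```python
import numpy as np

def pairwise_distances(emb):
    """Euclidean distance matrix between the rows of emb (N x D)."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives from round-off

def lifted_structured_loss(emb, labels, margin=1.0):
    """Standard (discrete-label) lifted structured loss, Oh Song et al. (2016).

    NOT the continuous variant proposed for O-CNN: labels here are
    hypothetical place IDs and `margin` is the alpha hyperparameter.
    """
    labels = np.asarray(labels)
    d = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    # For each anchor i: sum of exp(margin - D_ik) over its negatives k.
    neg_sum = np.where(~same, np.exp(margin - d), 0.0).sum(axis=1)
    total, num_pos = 0.0, 0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if not same[i, j]:
                continue
            num_pos += 1
            s = neg_sum[i] + neg_sum[j]
            if s > 0:  # with no negatives the log term is -inf and J_ij clips to 0
                j_ij = np.log(s) + d[i, j]
                total += max(0.0, j_ij) ** 2
    return total / (2 * num_pos) if num_pos else 0.0

emb = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 0.0], [10.0, 0.1]])
print(lifted_structured_loss(emb, [0, 0, 1, 1]))        # 0.0: places well separated
print(lifted_structured_loss(emb, [0, 1, 0, 1]) > 1.0)  # True: positives far, negatives close
```

The log-sum-exp over all negatives of both anchors is what "lifts" the pairwise loss to use every pair in the batch; a continuous version would have to replace the binary same/different labels with some notion of graded distance, which this sketch does not attempt.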

Video Overview

In this video, our approach is illustrated through animation, and experimental results, both qualitative and quantitative, are shown. The video also presents the virtual environments where we collected data, along with samples from our dataset.


Network Architecture. (a) In our O-CNN, we add circular padding to each convolution operation in GoogLeNet. After feature extraction, we further perform roll branching to make the architecture robust to pure rotation of the viewpoint, and compute the lifted structured embedding loss for training. (b) Illustration of the circular convolution operation. We use the omnidirectional image as an example, but in practice the operation is performed on the feature maps after every convolution layer. (c) Illustration of roll branching: after roll branching, we obtain 20 shifted feature maps.
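The circular convolution in (b) and the roll branching in (c) can be sketched in NumPy on a single-channel map. This is a minimal sketch under stated assumptions, not the O-CNN implementation: zero padding on the vertical axis, stride 1, evenly spaced shifts, and max-pooling across branches are all our assumptions, and the function names are hypothetical.

```python
import numpy as np

def circular_pad(x, pad):
    """Pad an (H, W) map: wrap around along width (azimuth), zeros along height.

    Zero padding on the height axis is an assumption of this sketch.
    """
    x = np.pad(x, ((0, 0), (pad, pad)), mode="wrap")         # left/right borders meet
    return np.pad(x, ((pad, pad), (0, 0)), mode="constant")  # top/bottom zero-padded

def circular_conv(x, kernel):
    """Stride-1, same-size convolution (cross-correlation) with circular padding."""
    kh, kw = kernel.shape
    assert kh == kw and kh % 2 == 1, "sketch assumes a square, odd-sized kernel"
    xp = circular_pad(x, kh // 2)
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def roll_branches(fmap, num_branches=20):
    """Stack num_branches copies of fmap, each circularly shifted along width."""
    w = fmap.shape[-1]
    shifts = [(k * w) // num_branches for k in range(num_branches)]  # evenly spaced (assumption)
    return np.stack([np.roll(fmap, s, axis=-1) for s in shifts])

rng = np.random.default_rng(0)
img = rng.standard_normal((6, 40))
kernel = rng.standard_normal((3, 3))
feat = circular_conv(img, kernel)
feat_rot = circular_conv(np.roll(img, 6, axis=1), kernel)  # camera yaw = horizontal shift
print(np.allclose(feat_rot, np.roll(feat, 6, axis=1)))     # True: shift-equivariant
pooled = roll_branches(feat).max(axis=0)          # hypothetical pooling over branches
pooled_rot = roll_branches(feat_rot).max(axis=0)
print(np.allclose(pooled, pooled_rot))            # True: 6 is a multiple of the 2-column step
```

With stride 1, circular padding makes each convolution equivariant to horizontal shifts of the panorama, and pooling over evenly spaced shifts then gives invariance to yaw rotations that are multiples of the shift step. In a network with strides and pooling, such as GoogLeNet, the property is expected to hold only approximately.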


Virtual-world Examples:

Real-world Examples:

Dataset (TBU)


Bibliographic information for this work:

T.H. Wang*, H.J. Huang*, J.T. Lin, C.W. Hu, K.H. Zeng, and M. Sun. "Omnidirectional CNN for Visual Place Recognition and Navigation." IEEE International Conference on Robotics and Automation (ICRA), 2018. [arXiv Preprint] [Code]

@article{wang2018omnidirectional,
    title={Omnidirectional CNN for Visual Place Recognition and Navigation},
    author={Wang, Tsun-Hsuan and Huang, Hung-Jui and Lin, Juan-Ting and Hu, Chan-Wei and Zeng, Kuo-Hao and Sun, Min},
    journal={arXiv preprint arXiv:1803.04228},
    year={2018}
}