MV-FractalDB: Formula-driven Supervised Learning for Multi-view Image Recognition

International Conference on Intelligent Robots and Systems (IROS 2021)

Ryosuke Yamada¹,²

Ryo Takahashi³

Ryota Suzuki²

Akio Nakamura¹

Yusuke Yoshiyasu²

Ryusuke Sagawa²

Hirokatsu Kataoka²

1: Tokyo Denki University 2: National Institute of Advanced Industrial Science and Technology (AIST) 3: Keio University

Abstract

Multi-view image recognition is one way to avoid weak viewpoints in robotics applications such as object manipulation, mobile robot services, and navigation. For example, a mobile robot in a home must judge an object's category and pose from a given image to perform household chores. This paper proposes a method for automatic multi-view dataset construction based on formula-driven supervised learning (FDSL). Although data collection and human annotation of 3D objects are definitely labor-intensive, the proposed multi-view dataset generates 3D models, multi-view images, and their training labels simultaneously and automatically. To create a large-scale multi-view dataset, we employ fractal geometry, which is considered the background information of many objects in the real world. This background knowledge of the real world is expected to allow convolutional neural networks (CNNs) to acquire a better representation for any-view image recognition. We render 3D fractal models and project them from viewpoints arranged in a circle to construct the Multi-view Fractal DataBase (MV-FractalDB), which is then used to pre-train a CNN model for multi-view image recognition. Since the dataset construction is automatic, using MV-FractalDB requires no 3D model definition or additional manual annotation in the pre-training phase. According to the experimental results, the MV-FractalDB pre-trained model surpasses the accuracy of self-supervised methods (e.g., SimCLR and MoCo) and is close to that of supervised methods (e.g., an ImageNet pre-trained model) on multi-view image datasets. It was also confirmed that the MV-FractalDB pre-trained model converges faster than the ImageNet pre-trained model on the ModelNet40 dataset. Moreover, we demonstrate the potential of FDSL for multi-view image recognition.
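The pipeline described in the abstract (generate a 3D fractal, view it from a circle of viewpoints, and take the label from the generating formula) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the fractal is produced by a 3D iterated function system (IFS) of random affine maps sampled with the chaos game, that each view is an orthographic projection of the point cloud, and that the category label is simply the index of the IFS parameter set. All function names and parameter values are hypothetical.

```python
import math
import random


def random_ifs(num_maps=4, scale=0.3, seed=0):
    """Draw hypothetical 3D affine maps x -> A x + b.

    Entries of A lie in [-scale, scale]; with scale=0.3 every row sums to
    at most 0.9 in absolute value, so each map is a contraction.
    """
    rng = random.Random(seed)
    maps = []
    for _ in range(num_maps):
        a = [[rng.uniform(-scale, scale) for _ in range(3)] for _ in range(3)]
        b = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        maps.append((a, b))
    return maps


def chaos_game(maps, num_points=2000, burn_in=20, seed=0):
    """Sample a 3D point cloud by repeatedly applying a randomly chosen map."""
    rng = random.Random(seed)
    p = [0.0, 0.0, 0.0]
    points = []
    for i in range(num_points + burn_in):
        a, b = rng.choice(maps)
        p = [sum(a[r][c] * p[c] for c in range(3)) + b[r] for r in range(3)]
        if i >= burn_in:  # skip the first iterations before the orbit settles
            points.append(tuple(p))
    return points


def project_view(points, angle):
    """Orthographic projection for a camera rotated by `angle` about the y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y) for x, y, z in points]


def multi_view(points, num_views=12):
    """Project the same 3D model from viewpoints evenly spaced on a circle."""
    return [project_view(points, 2.0 * math.pi * k / num_views)
            for k in range(num_views)]


def build_dataset(num_categories=3, num_views=12):
    """Return (view, label) pairs; the label is the index of the IFS parameters."""
    samples = []
    for label in range(num_categories):
        points = chaos_game(random_ifs(seed=label), seed=label)
        for view in multi_view(points, num_views):
            samples.append((view, label))
    return samples
```

Calling build_dataset(num_categories=2, num_views=4) yields eight labeled 2D point sets, four per category, with no manual annotation. A real pipeline would rasterize each projected view into an image before pre-training a CNN.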

Multi-view Fractal DataBase

Paper and Code

R. Yamada, R. Takahashi, R. Suzuki, A. Nakamura, Y. Yoshiyasu, R. Sagawa, H. Kataoka

MV-FractalDB: Formula-driven Supervised Learning for Multi-view Image Recognition

IROS 2021

[Paper] [Bibtex] [GitHub] [Dataset]

Results


Results on ModelNet and MIRO dataset.

Results on ModelNet dataset.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP19H01134. Computational resources of the AI Bridging Cloud Infrastructure (ABCI), provided by the National Institute of Advanced Industrial Science and Technology (AIST), were used.