Abstract: In deep learning, large-scale image datasets have driven breakthroughs in object recognition and retrieval. Industrial goods, as embodiments of innovation, are now far more diverse, and their incomplete-multiview, multimodal, and multilabel characteristics distinguish them from traditional datasets. In this paper, we introduce an industrial goods dataset, PatentNet, with numerous highly diverse, accurate, and detailed annotations of industrial goods images and corresponding texts. In PatentNet, the images and texts are sourced from design patents. With over 6 million images and corresponding texts of industrial goods, manually labeled and checked by professionals, PatentNet is the first ongoing industrial goods image database, covering a wider variety of goods than the industrial goods datasets previously used for benchmarking. PatentNet organizes millions of images into 32 classes and 219 subclasses based on the Locarno Classification Agreement. Through extensive experiments on image classification, image retrieval, and incomplete multiview clustering, we demonstrate that PatentNet is more diverse, complex, and challenging than existing industrial image datasets and offers greater potential. Furthermore, the incomplete-multiview, multimodal, and multilabel characteristics of PatentNet offer unparalleled opportunities for the artificial intelligence community and beyond.
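The incomplete-multiview, multimodal, and multilabel characteristics described above can be illustrated with a minimal sketch of how a single sample might be represented; the field names, number of views, and label format below are illustrative assumptions, not the dataset's released format.

```python
# Hypothetical representation of one PatentNet-style sample (assumed structure, for illustration only).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DesignPatentSample:
    patent_id: str
    # Incomplete multiview: several drawing views of one product; missing views stay None.
    views: List[Optional[str]] = field(default_factory=lambda: [None] * 7)  # paths to view images
    # Multimodal: the textual description that accompanies the drawings.
    description: str = ""
    # Multilabel: Locarno class (one of 32) and subclass (one of 219); a sample may carry several.
    locarno_labels: List[Tuple[int, str]] = field(default_factory=list)

sample = DesignPatentSample(
    patent_id="D0000001",  # hypothetical identifier
    views=["front.png", "back.png", None, "left.png", None, None, "top.png"],
    description="Ornamental design for a handheld electronic device.",
    locarno_labels=[(14, "14-03")],
)
```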
Abstract: Visual storytelling is a creative and challenging task that aims to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework that separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM takes the BERT sentence vector representations as input to learn the dependencies between the sentences corresponding to the images, and the top LSTM generates the corresponding word vector representations, taking its input from the bottom LSTM. Experimental results demonstrate that our model outperforms the most closely related baselines on the automatic evaluation metrics BLEU and CIDEr, and human evaluation further confirms the effectiveness of our method.
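The two-level decoder described in this abstract can be sketched as follows. This is a minimal illustration assuming pre-computed BERT sentence embeddings and teacher forcing; the dimensions, vocabulary size, and the exact way the two LSTMs are coupled are assumptions for illustration, not the paper's reported configuration.

```python
# Sketch of a hierarchical sentence-level / word-level decoder (assumed configuration).
import torch
import torch.nn as nn

class HierarchicalStoryDecoder(nn.Module):
    def __init__(self, sent_dim=768, hidden_dim=512, vocab_size=10000, word_dim=300):
        super().__init__()
        # Bottom LSTM: models dependencies across the BERT sentence embeddings of the image sequence.
        self.sentence_lstm = nn.LSTM(sent_dim, hidden_dim, batch_first=True)
        # Top LSTM: generates the words of each sentence, conditioned on the bottom LSTM's state.
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.word_lstm = nn.LSTM(word_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, sent_embeds, word_ids):
        # sent_embeds: (B, S, sent_dim) BERT sentence vectors, one per image
        # word_ids:    (B, S, W) ground-truth tokens for teacher forcing
        sent_states, _ = self.sentence_lstm(sent_embeds)            # (B, S, H)
        logits = []
        for s in range(sent_states.size(1)):
            ctx = sent_states[:, s:s + 1, :].expand(-1, word_ids.size(2), -1)
            words = self.word_embed(word_ids[:, s, :])              # (B, W, word_dim)
            dec_out, _ = self.word_lstm(torch.cat([words, ctx], dim=-1))
            logits.append(self.out(dec_out))                        # (B, W, vocab)
        return torch.stack(logits, dim=1)                           # (B, S, W, vocab)
```

At inference time the word-level loop would feed back its own predictions instead of ground-truth tokens; that step is omitted here for brevity.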
Abstract: In this paper, we propose an end-to-end CNN-LSTM model with a local-object attention mechanism for generating descriptions of sequential images. To generate coherent descriptions, we capture global semantic context using a multi-layer perceptron, which learns the dependencies between sequential images. A parallel LSTM network is used to decode the sequence of descriptions. Experimental results show that our model outperforms the baseline across three different evaluation metrics on the datasets published by Microsoft.
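The pipeline described in this abstract can be sketched roughly as below; it assumes pre-extracted CNN region features per image, and the attention form, the MLP that builds the global context, and all dimensions are illustrative assumptions rather than the authors' exact design.

```python
# Sketch of a CNN-LSTM sequential captioner with local-object attention and an MLP
# global context (assumed configuration, for illustration only).
import torch
import torch.nn as nn

class LocalObjectAttention(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)

    def forward(self, regions, h):
        # regions: (B, R, feat_dim) object-region features; h: (B, hidden_dim) decoder state
        h_exp = h.unsqueeze(1).expand(-1, regions.size(1), -1)
        alpha = torch.softmax(self.score(torch.cat([regions, h_exp], dim=-1)), dim=1)
        return (alpha * regions).sum(dim=1)                          # attended feature (B, feat_dim)

class SequentialCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=10000,
                 word_dim=300, num_images=5):
        super().__init__()
        # MLP fuses the pooled features of all images into a global semantic context.
        self.global_mlp = nn.Sequential(
            nn.Linear(feat_dim * num_images, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim))
        self.attend = LocalObjectAttention(feat_dim, hidden_dim)
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        # One LSTM cell shared by the per-image decoders that run in parallel branches.
        self.cell = nn.LSTMCell(word_dim + feat_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_feats, word_ids):
        # region_feats: (B, N, R, feat_dim) CNN features; word_ids: (B, N, T) teacher-forcing tokens
        B, N, R, D = region_feats.shape
        global_ctx = self.global_mlp(region_feats.mean(dim=2).reshape(B, -1))  # (B, H)
        logits = []
        for n in range(N):                                           # one decoder branch per image
            h, c = global_ctx.clone(), torch.zeros_like(global_ctx)
            step_logits = []
            for t in range(word_ids.size(2)):
                attended = self.attend(region_feats[:, n], h)
                x = torch.cat([self.word_embed(word_ids[:, n, t]), attended, global_ctx], dim=-1)
                h, c = self.cell(x, (h, c))
                step_logits.append(self.out(h))
            logits.append(torch.stack(step_logits, dim=1))           # (B, T, vocab)
        return torch.stack(logits, dim=1)                            # (B, N, T, vocab)
```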