INTERNATIONAL FOCUS ON IMAGE DESCRIPTION
Abstract
In recent years, the task of automatically generating image descriptions has attracted considerable attention in the field of artificial intelligence. Benefiting from the development of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), many approaches based on the CNN-RNN framework have been proposed for this task and have achieved remarkable progress. However, two problems remain, because most existing methods use only an image-level representation. The first is object missing: important objects may be omitted from the generated description. The second is misprediction: an object may be recognized as the wrong category. In this paper, to address these two problems, we propose a new method called global-local attention (GLA) for generating image descriptions. The proposed GLA model uses an attention mechanism to integrate object-level features with the image-level feature. In this manner, our model can selectively attend to objects and context information concurrently. As a result, our GLA method generates more relevant description sentences and achieves state-of-the-art performance on the well-known Microsoft COCO caption dataset under several popular evaluation metrics: CIDEr, METEOR, ROUGE-L, and BLEU-1,2,3,4.
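The abstract describes GLA only at a high level. As a rough illustration of how an attention module might fuse object-level and image-level features at each decoding step, below is a minimal PyTorch sketch; the class name, the additive (Bahdanau-style) scoring, and all dimensions are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Hypothetical sketch: at each decoding step, the decoder hidden
    state attends jointly over N object-level features and one global
    image-level feature, producing a fused context vector."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # project each visual feature
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar attention score

    def forward(self, object_feats: torch.Tensor, image_feat: torch.Tensor,
                hidden: torch.Tensor):
        # object_feats: (B, N, feat_dim); image_feat: (B, feat_dim); hidden: (B, hidden_dim)
        # Treat the global image feature as one more "region" to attend over,
        # so a single softmax weighs objects and scene context together.
        feats = torch.cat([object_feats, image_feat.unsqueeze(1)], dim=1)  # (B, N+1, feat_dim)
        scores = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                          # (B, N+1) unnormalized scores
        alpha = torch.softmax(scores, dim=1)    # attention weights over objects + global
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)  # (B, feat_dim) fused context
        return context, alpha

# Example usage with hypothetical sizes: 36 detected objects,
# 2048-d CNN features, 512-d LSTM hidden state, batch of 4.
gla = GlobalLocalAttention(feat_dim=2048, hidden_dim=512, attn_dim=512)
ctx, alpha = gla(torch.randn(4, 36, 2048), torch.randn(4, 2048), torch.randn(4, 512))
```

The fused context vector would then be fed to the RNN decoder at each step, letting the caption generator shift between specific objects and the overall scene as it emits each word.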
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.