Automatic Image Captioning Using Convolutional Neural Network and Long Short-Term Memory Techniques
Abstract
Image captioning is the task of automatically generating natural language descriptions of image content, with applications in social media, e-commerce, and content creation. Recent approaches rely on reinforcement learning and multimodal transformers, but these demand large datasets and substantial computational resources and struggle with complex scenes. To address these challenges, this work adopts a hybrid approach that employs CNNs for visual feature extraction and LSTMs for sequential caption generation. More specifically, rich image features were extracted using a pre-trained Inception V3 model, while an LSTM was used to synthesize image captions. Captions were generated using both greedy and beam search strategies. Finally, BLEU scores and visualizations demonstrate that this blend of computer vision and NLP capabilities produces accurate and coherent image captions.
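The greedy and beam search decoding strategies mentioned above can be illustrated with a minimal sketch. The `step_logprobs` function here is a hypothetical stand-in for the trained LSTM decoder (which in the actual system would condition on Inception V3 image features); the toy four-token vocabulary and probabilities are invented for illustration only.

```python
import math

# Hypothetical stand-in for the LSTM decoder: maps a partial token
# sequence to log-probabilities over the next token.
# Toy vocabulary: 0=<start>, 1="a", 2="dog", 3=<end>.
def step_logprobs(seq):
    table = {
        (0,):      {1: math.log(0.6), 2: math.log(0.3), 3: math.log(0.1)},
        (0, 1):    {2: math.log(0.9), 3: math.log(0.1)},
        (0, 1, 2): {3: math.log(1.0)},
        (0, 2):    {3: math.log(1.0)},
    }
    return table.get(tuple(seq), {3: 0.0})

def greedy_decode(max_len=5):
    """Pick the single most probable token at each step."""
    seq = [0]
    for _ in range(max_len):
        probs = step_logprobs(seq)
        nxt = max(probs, key=probs.get)
        seq.append(nxt)
        if nxt == 3:  # <end> token terminates the caption
            break
    return seq

def beam_decode(beam_width=2, max_len=5):
    """Keep the `beam_width` highest-scoring partial captions at each step."""
    beams = [([0], 0.0)]          # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            if seq[-1] == 3:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
            if len(beams) >= beam_width:
                break
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]
```

On this toy distribution both strategies agree, but beam search can recover higher-probability captions that greedy decoding misses when an early low-probability token leads to a better overall sequence.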
Copyright (c) 2026 Dhamayandhi K, T Jayamalar, N. Krishnaveni, Lavanya C

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

