Convolutional BiLSTM Variational Sequence-To-Sequence Based Video Captioning for Capturing Intricate Temporal Dependencies

doi:10.1007/s42235-025-00743-3

Journal of Bionic Engineering, 2025, Vol. 22, Issue 5: 2700-2716. doi: 10.1007/s42235-025-00743-3

M. Gowri Shankar1; D. Surendran2

1 Department of Computer Science and Engineering, Government College of Technology, Coimbatore, Tamilnadu 641013, India
2 Department of Information Technology, Karpagam College of Engineering, Coimbatore, Tamilnadu 641032, India

Online: 2025-10-15  Published: 2025-11-19
Contact: M. Gowri Shankar; E-mail: gowrimshankar@gmail.com

Abstract

In the realm of video understanding, the demand for accurate and contextually rich video captioning has surged with the increasing volume and complexity of multimedia content. This research introduces a video captioning approach built on a Convolutional Bidirectional Long Short-Term Memory (BiLSTM) based Variational Sequence-to-Sequence (CBVSS) model. The proposed framework captures intricate temporal dependencies within video sequences, enabling more nuanced and contextually relevant descriptions of dynamic scenes. However, optimizing its parameters for improved performance remains a crucial challenge. In response, Golden Eagle Optimization (GEO), a metaheuristic optimization technique, is used to fine-tune the parameters of the CBVSS model. The application of GEO aims to enhance the ability of CBVSS to produce more accurate and contextually rich video captions. The proposed method attains a higher overall Recall of 59.75% and Precision of 63.78% across both datasets. Additionally, the proposed CBVSS method demonstrates superior performance on both datasets, achieving the highest METEOR (25.67) and CIDEr (39.87) scores on the ActivityNet dataset and outperforming all compared models on the YouCook2 dataset with METEOR (28.67) and CIDEr (43.02), highlighting its effectiveness in generating semantically rich and contextually accurate video captions.

Key words: Video captioning, Convolutional BiLSTM, Variational sequence-to-sequence model, Golden eagle optimization, Intricate temporal dependencies

M. Gowri Shankar, D. Surendran. Convolutional BiLSTM Variational Sequence-To-Sequence Based Video Captioning for Capturing Intricate Temporal Dependencies[J]. Journal of Bionic Engineering, 2025, 22(5): 2700-2716.
