
Journal of Bionic Engineering, 2025, Vol. 22, Issue (5): 2700-2716. doi: 10.1007/s42235-025-00743-3

Convolutional BiLSTM Variational Sequence-To-Sequence Based Video Captioning for Capturing Intricate Temporal Dependencies

M. Gowri Shankar1; D. Surendran2   

  1. Department of Computer Science and Engineering, Government College of Technology, Coimbatore, Tamilnadu 641013, India
  2. Department of Information Technology, Karpagam College of Engineering, Coimbatore, Tamilnadu 641032, India
  • Online: 2025-10-15 Published: 2025-11-19
  • Contact: M. Gowri Shankar1 E-mail: gowrimshankar@gmail.com
  • About author: M. Gowri Shankar1; D. Surendran2

Abstract: In video understanding, the demand for accurate and contextually rich video captioning has surged with the increasing volume and complexity of multimedia content. This research introduces a video captioning approach built on a Convolutional Bidirectional Long Short-Term Memory (BiLSTM) based Variational Sequence-to-Sequence (CBVSS) model. The proposed framework captures intricate temporal dependencies within video sequences, enabling more nuanced and contextually relevant descriptions of dynamic scenes. However, optimizing its parameters for improved performance remains a crucial challenge. In response, Golden Eagle Optimization (GEO), a metaheuristic optimization technique, is used to fine-tune the parameters of the Convolutional BiLSTM variational sequence-to-sequence model. Applying GEO aims to enhance the ability of CBVSS to produce more exact and contextually rich video captions. The proposed method attains a higher overall Recall of 59.75% and Precision of 63.78% across both datasets. Additionally, the proposed CBVSS method demonstrated superior performance on both datasets, achieving the highest METEOR (25.67) and CIDEr (39.87) scores on the ActivityNet dataset and outperforming all compared models on the YouCook2 dataset with METEOR (28.67) and CIDEr (43.02), highlighting its effectiveness in generating semantically rich and contextually accurate video captions.

Key words: Video captioning, Convolutional BiLSTM, Variational sequence-to-sequence model, Golden eagle optimization, Intricate temporal dependencies
