

Journal of Bionic Engineering, 2025, Vol. 22, Issue (5): 2700-2716. doi: 10.1007/s42235-025-00743-3


Convolutional BiLSTM Variational Sequence-To-Sequence Based Video Captioning for Capturing Intricate Temporal Dependencies

M. Gowri Shankar¹; D. Surendran²

  1. Department of Computer Science and Engineering, Government College of Technology, Coimbatore, Tamilnadu 641013, India
  2. Department of Information Technology, Karpagam College of Engineering, Coimbatore, Tamilnadu 641032, India
  • Online: 2025-10-15   Published: 2025-11-19
  • Contact: M. Gowri Shankar, E-mail: gowrimshankar@gmail.com

Abstract: In the realm of video understanding, the demand for accurate and contextually rich video captioning has surged with the increasing volume and complexity of multimedia content. This research introduces a video captioning approach that builds a Variational Sequence-to-Sequence model on a Convolutional Bidirectional Long Short-Term Memory (BiLSTM) network, termed Convolutional BiLSTM Variational Sequence-to-Sequence (CBVSS). The framework is designed to capture intricate temporal dependencies within video sequences, enabling more nuanced and contextually relevant descriptions of dynamic scenes. However, optimizing its parameters for improved performance remains a crucial challenge. In response, this research uses Golden Eagle Optimization (GEO), a metaheuristic optimization technique, to fine-tune the parameters of the CBVSS model, with the aim of enhancing its ability to produce more accurate and contextually rich video captions. The proposed method attains higher overall Recall (59.75%) and Precision (63.78%) across both datasets. It also achieves the highest METEOR (25.67) and CIDEr (39.87) scores on the ActivityNet dataset and outperforms all compared models on the YouCook2 dataset with METEOR (28.67) and CIDEr (43.02), highlighting its effectiveness in generating semantically rich and contextually accurate video captions.
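The abstract names GEO as the tuner for the CBVSS parameters but gives no details, so the following is only a minimal, dependency-free Python sketch of the general Golden Eagle Optimizer loop as commonly described: each eagle picks a remembered "prey" from the flock, moves by an attack term toward it plus a perpendicular cruise term, the attack weight rises and the cruise weight falls over iterations, and each eagle keeps a greedy memory of its best position. It is not the authors' implementation. The function name `golden_eagle_optimize`, the default propensity schedules, the projection-based cruise direction (a simplification of the original hyperplane construction), and the toy objective standing in for a CBVSS validation loss are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def _cruise_direction(attack: Sequence[float], rng: random.Random) -> Vector:
    """Random vector orthogonal to the (non-zero) attack vector.

    Assumption: a projection-based simplification of GEO's cruise vector,
    not the original hyperplane-point construction.
    """
    r = [rng.uniform(-1.0, 1.0) for _ in attack]
    aa = sum(a * a for a in attack)
    proj = sum(x * a for x, a in zip(r, attack)) / aa
    return [x - proj * a for x, a in zip(r, attack)]


def golden_eagle_optimize(
    objective: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    n_eagles: int = 20,
    n_iter: int = 100,
    pa: Tuple[float, float] = (0.5, 2.0),  # attack propensity (start, end), assumed defaults
    pc: Tuple[float, float] = (1.0, 0.5),  # cruise propensity (start, end), assumed defaults
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize `objective` inside box `bounds`; returns (best_x, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_eagles)]
    memory = [p[:] for p in pos]  # best position each eagle has seen so far
    mem_fit = [objective(p) for p in memory]

    for t in range(n_iter):
        # Linear schedules: shift from exploration (cruise) to exploitation (attack).
        frac = t / max(n_iter - 1, 1)
        p_attack = pa[0] + frac * (pa[1] - pa[0])
        p_cruise = pc[0] + frac * (pc[1] - pc[0])
        for i in range(n_eagles):
            prey = memory[rng.randrange(n_eagles)]  # prey drawn from flock memory
            attack = [m - x for m, x in zip(prey, pos[i])]
            a_norm = _norm(attack)
            if a_norm == 0.0:
                continue  # eagle already sits on its chosen prey
            cruise = _cruise_direction(attack, rng)
            c_norm = _norm(cruise) or 1.0  # guard against a zero cruise vector (e.g. 1-D)
            # Step = random-weighted unit attack direction + unit cruise direction.
            step = [
                rng.random() * p_attack * a / a_norm + rng.random() * p_cruise * c / c_norm
                for a, c in zip(attack, cruise)
            ]
            # Move the eagle and clip it back into the search box.
            new = [min(max(x + s, lo), hi) for x, s, (lo, hi) in zip(pos[i], step, bounds)]
            pos[i] = new
            value = objective(new)
            if value < mem_fit[i]:  # greedy memory update
                memory[i], mem_fit[i] = new[:], value

    best = min(range(n_eagles), key=mem_fit.__getitem__)
    return memory[best], mem_fit[best]


if __name__ == "__main__":
    # Hypothetical stand-in for a validation loss over two hyperparameters
    # (e.g. log10 learning rate and dropout); the real CBVSS objective is not shown.
    def toy_val_loss(p: Vector) -> float:
        log_lr, dropout = p
        return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2

    best_x, best_loss = golden_eagle_optimize(toy_val_loss, [(-5.0, -1.0), (0.0, 0.6)])
    print(best_x, best_loss)
```

In a real tuning run the objective would train or evaluate the captioning model for each candidate parameter vector, so each call is expensive; the population size and iteration count then trade search quality against compute.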

Key words: Video captioning, Convolutional BiLSTM, Variational sequence-to-sequence model, Golden eagle optimization, Intricate temporal dependencies