Hybrid Convolutional-Recurrent Neural Networks (CNN-RNN) Model with Temporal Attention and Particle Swarm Optimization for Deepfake Video Detection

Jeremias Esperanza; Jean Fidelio Marquez; Ron Anthony Sy

doi:10.65141/ject.v2i1.n8

Authors

Jeremias Esperanza, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines
Jean Fidelio Marquez, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines
Ron Anthony Sy, College of Informatics and Computing Studies, New Era University, Quezon City, 1107, Philippines

DOI:

https://doi.org/10.65141/ject.v2i1.n8

Keywords:

Deepfake video detection, hybrid machine learning, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), temporal attention, Particle Swarm Optimization (PSO)

Abstract

The rapid advancement of deepfake technology poses a growing threat to information integrity and online security. To address this, the study proposes an efficient deepfake video detection framework that combines Convolutional Neural Networks (CNNs) for spatial feature extraction, Recurrent Neural Networks (RNNs) with a temporal attention mechanism for modeling sequential dependencies, and Particle Swarm Optimization (PSO) for hyperparameter tuning. The pipeline performs frame extraction and face alignment, extracts features with a pre-trained CNN, and passes them to an RNN whose attention mechanism emphasizes the frames that carry critical temporal artifacts. PSO then tunes key hyperparameters such as the learning rate and hidden dimensions. The model was evaluated in a comparative analysis against existing deepfake detection methods, including XceptionNet, an LSTM with frame-level features, and a CNN-GRU without attention. The proposed CNN-RNN model with temporal attention and PSO outperformed these baselines and showed improved generalization and reliability, particularly in reducing false negatives, making it a robust solution for real-world media forensics and platform integrity.
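
The two less familiar components named in the abstract, temporal attention over per-frame features and PSO-based hyperparameter search, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the dot-product scoring in temporal_attention_pool, the parameter score_weights, the PSO coefficients (inertia, c1, c2), the search bounds, and toy_validation_loss (a stand-in for "train the model and return validation loss") are all assumptions made for illustration.

```python
import math
import random


def temporal_attention_pool(frame_feats, score_weights):
    """Softmax attention over time; returns (context vector, per-frame weights).

    frame_feats: list of T feature vectors (e.g. RNN hidden states), each length D.
    score_weights: length-D scoring vector (hypothetical learned parameter).
    """
    scores = [sum(f * w for f, w in zip(feat, score_weights)) for feat in frame_feats]
    top = max(scores)  # subtract the max to keep exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_feats[0])
    # Context is the attention-weighted sum of the frame features.
    context = [sum(w * feat[d] for w, feat in zip(weights, frame_feats))
               for d in range(dim)]
    return context, weights


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over a box-constrained search space."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's personal best
    best_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g_idx][:], best_val[g_idx]   # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_loss(params):
    """Hypothetical stand-in for a real training run (NOT data from the paper)."""
    log10_lr, hidden_dim = params
    return (log10_lr + 3.0) ** 2 + ((hidden_dim - 128.0) / 64.0) ** 2
```

In this sketch, pso_minimize(toy_validation_loss, [(-5.0, -1.0), (32.0, 512.0)]) searches over a log10 learning rate and a hidden dimension. In a real pipeline the objective would train the CNN-RNN model with the candidate hyperparameters and return a validation metric, which makes each evaluation far more expensive than this toy function.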

