GESTURE RECOGNITION FOR CLASSICAL DANCE FORMS USING COMPUTER VISION

Authors

  • Dr. Praveen Sen Department of Computer Science and Business Systems, St. Vincent Pallotti College of Engineering and Technology, Nagpur, Maharashtra, India
  • Dr. Abhishek Pathak Department of Computer Science and Engineering (Cyber Security), St. Vincent Pallotti College of Engineering and Technology, Nagpur, Maharashtra, India
  • Dr. Manish Gudadhe Department of Computer Science and Engineering (Data Science), St. Vincent Pallotti College of Engineering and Technology, Nagpur, Maharashtra, India
  • Mrudula Gudadhe Department of Information Technology, Priyadarshini College of Engineering, Nagpur, India
  • Vikas Singh Department of Computer Science and Engineering (Cyber Security), St. Vincent Pallotti College of Engineering and Technology, Nagpur, Maharashtra, India

DOI:

https://doi.org/10.29121/shodhkosh.v7.i2s.2026.7221

Keywords:

Classical Dance Gesture Recognition, Computer Vision, Pose Estimation, Joint-Angle Modeling, CNN-LSTM, Spatial–Temporal Learning, Mudra Classification, Cultural Heritage Digitization, Deep Learning, Human Action Recognition

Abstract [English]

Symbolic communication in classical dance rests on codified hand gestures (mudras), postures, and rhythmic movement sequences. Systematic computational recognition of fine-grained dance gestures nevertheless remains under-studied, owing to subtle variations in articulation, occlusion by costumes, and the scarcity of annotated data. This paper proposes a spatial-temporal deep learning framework for gesture recognition in classical dance using computer vision. The approach combines skeleton extraction via pose estimation, joint-angle feature computation for rotational invariance, convolutional neural network (CNN)-based spatial embedding, and Long Short-Term Memory (LSTM)-based temporal modeling of how gestures unfold over time. The hybrid representation fuses skeletal precision with visual context, enabling the model to distinguish visually similar mudras. Experimental evaluation on a curated dataset of classical dance gestures shows that the proposed model outperforms baseline CNN-only and skeleton-only models in accuracy, precision, recall, and F1-score. Training and validation analysis indicates stable convergence and strong generalization across performers. The findings confirm that angular skeletal modeling combined with temporal deep learning is effective for fine-grained gesture recognition. Beyond classification performance, the framework can support digital preservation of cultural heritage, intelligent dance-tutoring systems, and AI-based performance analytics. The study provides a solid foundation for applying computer vision and deep learning to the systematic study of the performing arts.
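The joint-angle features mentioned in the abstract can be illustrated with a minimal sketch. The idea is that the angle at a joint, computed from the two adjacent limb segments, is unchanged when the whole skeleton is rotated, which is what gives the representation its rotational invariance. The function and keypoint names below are illustrative, not taken from the paper, and assume 2D keypoints such as those produced by a pose estimator:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (radians) formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos_t)

def rotate(p, theta):
    """Rotate a 2D point about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

# Hypothetical shoulder/elbow/wrist keypoints forming a right angle.
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
angle = joint_angle(shoulder, elbow, wrist)            # ~ pi / 2
# Rotating the whole skeleton leaves the joint angle unchanged.
rotated = [rotate(p, 0.7) for p in (shoulder, elbow, wrist)]
angle_rot = joint_angle(*rotated)
```

In the full pipeline such angles, computed per joint and per frame, would form the skeletal feature sequence that the LSTM consumes alongside the CNN's visual embedding.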

References

Feichtenhofer, C., Fan, H., Malik, J., and He, K. (2019). SlowFast Networks for Video Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (6202–6211). https://doi.org/10.1109/ICCV.2019.00630

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (770–778). https://doi.org/10.1109/CVPR.2016.90

Kang, Y. (2023). GeoAI Application Areas and Research Trends. Journal of the Korean Geographical Society, 58, 395–418.

Li, H., Guo, H., and Huang, H. (2022). Analytical Model of Action Fusion in Sports Tennis Teaching by Convolutional Neural Networks. Computational Intelligence and Neuroscience, 2022. https://doi.org/10.1155/2022/7835241

Li, R., Yang, S., Ross, D. A., and Kanazawa, A. (2021). AI Choreographer: Music-Conditioned 3D Dance Generation with AIST++. In Proceedings of the IEEE/CVF International Conference on Computer Vision (13401–13412). https://doi.org/10.1109/ICCV48922.2021.01315

Lin, C.-B., Dong, Z., Kuan, W.-K., and Huang, Y.-F. (2021). A Framework for Fall Detection Based on OpenPose Skeleton and LSTM/GRU Models. Applied Sciences, 11(1), Article 329. https://doi.org/10.3390/app11010329

Lugaresi, C., et al. (2019). MediaPipe: A Framework for Building Perception Pipelines. arXiv preprint arXiv:1906.08172.

Sayyad, G. G., Salaskar, A., Vishwajeet, B.-P., Ghadage, B., and Khadatare, G. (2025). Dynamic Gesture-Based Mathematical Interfaces and Problem Solvers: A Survey of Emerging Trends, Innovations, and Future Opportunities. International Journal of Recent Advances in Engineering and Technology, 13(2), 37–43.

Shrestha, L., Dubey, S., Olimov, F., Rafique, M. A., and Jeon, M. (2022). 3D Convolutional with Attention for Action Recognition. arXiv.

Simonyan, K., and Zisserman, A. (2014). Two-Stream Convolutional Networks for Action Recognition in Videos. In Advances in Neural Information Processing Systems (Vol. 27, 568–576).

Yang, C., et al. (2020). Gated Convolutional Networks with Hybrid Connectivity for Image Classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, 12581–12588). https://doi.org/10.1609/aaai.v34i07.6948

Yuan, X., and Pan, P. (2022). Research on the Evaluation Model of Dance Movement Recognition and Automatic Generation Based on Long Short-Term Memory. Mathematical Problems in Engineering, 2022, Article 6405903. https://doi.org/10.1155/2022/6405903

Zhang, B., Wang, L., Wang, Z., Qiao, Q. Y., and Wang, H. (2021). Real-Time Action Recognition with Two-Stream Neural Networks. Journal of Neural Networks, 34, 220–230.

Zhang, F., Bazarevsky, V., and Vakunov, A. (2020). BlazePose: Real-Time 3D Pose Estimation. Google Research.

Zhang, Y., and Yang, Q. (2022). A Survey on Multi-Task Learning. IEEE Transactions on Knowledge and Data Engineering, 34(12), 5586–5609. https://doi.org/10.1109/TKDE.2021.3070203

Published

2026-03-28

How to Cite

Sen, P., Pathak, A., Gudadhe, M., Gudadhe, M., & Singh, V. (2026). GESTURE RECOGNITION FOR CLASSICAL DANCE FORMS USING COMPUTER VISION. ShodhKosh: Journal of Visual and Performing Arts, 7(2s), 278–288. https://doi.org/10.29121/shodhkosh.v7.i2s.2026.7221