NEURAL NETWORKS IN SOUND CLASSIFICATION FOR ART STUDENTS
DOI: https://doi.org/10.29121/shodhkosh.v6.i3.2025.6668

Keywords: Neural Networks, Sound Emotion Recognition, CNN–LSTM Architecture, Transformer Attention, Valence–Arousal Mapping

Abstract [English]
Sound classification has become an important element of contemporary creative practice, spanning digital art, interactive installation, performance design, and multimedia storytelling. For art students, an understanding of how neural networks interpret sound offers not only a technological foundation but also a creative toolkit for designing novel expressive modalities. This study presents a hybrid neural network for sound emotion mapping that combines a CNN-based spectral feature extractor, LSTM temporal modeling, and Transformer attention. Trained on the RAVDESS, EMO-DB, and IEMOCAP datasets, the model achieves high accuracy in categorical emotion recognition and close alignment in continuous valence-arousal prediction. The attention mechanism improves interpretability by focusing on emotionally salient regions of the time-frequency representations. Results indicate that combining spatial, temporal, and contextual representations yields robust and generalizable emotion mapping, providing a reliable framework for affect-aware audio applications. The proposed approach advances the understanding of how neural networks interpret expressive sound and informs future work in creative computing and human-centered AI.
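To make the architecture described in the abstract concrete, the sketch below shows one plausible way to combine a CNN spectral extractor, LSTM temporal modeling, Transformer attention, and dual categorical and valence-arousal heads in PyTorch. It is an illustrative sketch only, not the authors' implementation: the layer sizes, the mel-spectrogram input shape, and names such as HybridEmotionNet are assumptions made for the example.

```python
# Minimal sketch of a CNN-LSTM-Transformer emotion model (illustrative only;
# layer sizes, input shape, and head design are assumptions, not the paper's code).
import torch
import torch.nn as nn

class HybridEmotionNet(nn.Module):
    def __init__(self, n_mels=64, n_emotions=8):
        super().__init__()
        # CNN spectral extractor: input shape (batch, 1, n_mels, time)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        feat_dim = 64 * (n_mels // 4)           # channels * pooled mel bins
        # LSTM temporal modeling over the (downsampled) time axis
        self.lstm = nn.LSTM(feat_dim, 128, batch_first=True, bidirectional=True)
        # Transformer attention over the LSTM output sequence
        encoder_layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
        self.attention = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Two heads: categorical emotions and continuous valence-arousal
        self.emotion_head = nn.Linear(256, n_emotions)
        self.va_head = nn.Linear(256, 2)

    def forward(self, spec):                    # spec: (batch, 1, n_mels, time)
        x = self.cnn(spec)                      # (batch, 64, n_mels/4, time/4)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)   # (batch, time/4, feat_dim)
        x, _ = self.lstm(x)                     # (batch, time/4, 256)
        x = self.attention(x)                   # attended sequence
        x = x.mean(dim=1)                       # pool over time
        return self.emotion_head(x), self.va_head(x)

model = HybridEmotionNet()
logits, valence_arousal = model(torch.randn(2, 1, 64, 200))
print(logits.shape, valence_arousal.shape)      # torch.Size([2, 8]) torch.Size([2, 2])
```

In a training setup of this kind, the categorical head would typically be paired with a cross-entropy loss and the valence-arousal head with a regression objective (e.g., mean squared error or a concordance-based loss), combined as a weighted sum; the specific objectives used in the study are not detailed in the abstract.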
License
Copyright (c) 2025 Ayush Gandhi, Dr. A.C Santha Sheela, Lakshya Swarup, Ms. Ipsita Dash, Swati Srivastava, Dr. Varsha Kiran Bhosale

This work is licensed under a Creative Commons Attribution 4.0 International License.