GANS FOR MUSICAL STYLE TRANSFER AND LEARNING
DOI: https://doi.org/10.29121/shodhkosh.v6.i4s.2025.6875

Keywords: Generative Adversarial Networks (GANs), Musical Style Transfer, Audio Synthesis, Deep Learning in Music, AI Composition Systems

Abstract [English]
Generative Adversarial Networks (GANs) have emerged as disruptive models for computational creativity, particularly in musical style transfer and learning. This study examines how GAN architectures can translate pieces of music between stylistic domains without compromising their temporal and harmonic integrity. Conventional approaches, including autoencoders, RNNs, and Variational Autoencoders (VAEs), have had limited success at capturing fine-grained musical representations, which has motivated the adoption of GANs for their superior generative realism. The proposed model uses Conditional GANs and CycleGANs, enabling both supervised and unpaired learning across diverse musical data. Data are normalized and preprocessed using feature-extraction methods, namely Mel-frequency cepstral coefficients (MFCCs), chroma features, and spectral contrast. The architecture emphasizes balanced loss optimization between the generator and the discriminator, ensuring stable convergence and high audio fidelity. Experimental analysis shows significant improvements in melody preservation, timbre adaptation, and rhythmic consistency across genres. The paper further describes applications in AI-assisted composition, intelligent sound design, and interactive music-education systems. These results highlight the value of GANs both as creative tools and as educational instruments, enabling real-time style modification and music synthesized specifically for the user. With its methodology for learning musical style using GANs and cross-domain adaptation, the study contributes to the rapidly evolving intersection of machine learning, music cognition, and digital creativity.
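As a concrete illustration of the preprocessing stage summarized above, the following Python sketch extracts the three named feature sets (MFCCs, chroma features, spectral contrast) and normalizes them for model input. It is a minimal example rather than the authors' exact pipeline: it assumes the librosa library, and the file path, frame parameters, and per-feature z-score normalization are illustrative assumptions.

# Minimal feature-extraction sketch; the pipeline details are assumptions,
# not the paper's reported configuration.
import librosa
import numpy as np

def extract_features(path, sr=22050, n_mfcc=20):
    """Load one audio clip and stack MFCC, chroma, and spectral-contrast frames."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape (n_mfcc, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # shape (12, T)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # shape (7, T)
    features = np.vstack([mfcc, chroma, contrast])            # shape (39, T)
    # Per-feature z-score normalization across time frames (our assumption).
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True) + 1e-8
    return (features - mean) / std

# Example usage with a hypothetical file:
# feats = extract_features("jazz_clip.wav")   # -> (39, T) array

A second sketch illustrates the balanced generator/discriminator objective the abstract refers to, in the standard CycleGAN formulation: least-squares adversarial terms for both transfer directions plus an L1 cycle-consistency term, whose weight controls how strongly melody and rhythm are preserved. The network interfaces and the weight lambda_cyc = 10.0 are assumptions for illustration, not values reported in the paper.

# Hedged sketch of a balanced CycleGAN-style generator objective
# (network definitions and weighting are assumed, see note above).
import torch
import torch.nn.functional as F

def generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B, lambda_cyc=10.0):
    """Adversarial + cycle-consistency loss over the two style-transfer directions."""
    fake_B = G_AB(real_A)  # style A -> style B
    fake_A = G_BA(real_B)  # style B -> style A
    # Least-squares adversarial terms: generators push discriminator outputs toward 1.
    pred_B = D_B(fake_B)
    pred_A = D_A(fake_A)
    adv = F.mse_loss(pred_B, torch.ones_like(pred_B)) + \
          F.mse_loss(pred_A, torch.ones_like(pred_A))
    # Cycle consistency: A -> B -> A should reconstruct the input features;
    # this term is what preserves melody and rhythm across the transfer.
    cyc = F.l1_loss(G_BA(fake_B), real_A) + F.l1_loss(G_AB(fake_A), real_B)
    return adv + lambda_cyc * cyc

Raising lambda_cyc favors faithful reconstruction of the source material at the cost of weaker stylization, which is one practical way to tune the melody-preservation versus timbre-adaptation trade-off the abstract describes.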
Copyright (c) 2025 Syed Fahar Ali, Dr. Keerti Rai, Dr. Swapnil M. Parikh, Abhinav Rathour, Manivannan Karunakaran, Nishant Kulkarni

This work is licensed under a Creative Commons Attribution 4.0 International License.