MACHINE LEARNING FOR ART CRITIQUE GENERATION

Authors

  • R. Viswanathan, Associate Professor, Department of Computer Science and Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Mission’s Research Foundation (Deemed to be University), Tamil Nadu, India
  • Pooja Yadav, Assistant Professor, School of Business Management, Noida International University, India
  • M. S. Pavithra, Department of Master of Computer Applications, ATME College of Engineering, Mysuru 570028, Karnataka, India
  • Ankit Sachdeva, Centre of Research Impact and Outcome, Chitkara University, Rajpura 140417, Punjab, India
  • Sourav Panda, Assistant Professor, Department of Film, Parul Institute of Design, Parul University, Vadodara, Gujarat, India
  • Srushti Deshmukh, Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune 411037, Maharashtra, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i5s.2025.6921

Keywords:

Machine Learning, Art Critique Generation, Vision Transformers, Multimodal Fusion, Natural Language Generation, Computational Aesthetics

Abstract [English]

Advances in artificial intelligence have created new opportunities for generating art critique that is coherent, responsive to context, and able to approximate the depth of human analysis. This paper presents an end-to-end machine learning system that generates structured, interpretive, and stylistically rich art reviews by combining state-of-the-art visual understanding with natural language generation. The proposed system combines convolutional neural networks (CNNs) and Vision Transformers (ViTs) to extract fine-grained visual evidence, covering composition, texture, color harmony, and stylistic cues, and fuses this evidence with contextual metadata such as artist background, historical period, and other indicative cues. A multimodal fusion module aligns these heterogeneous representations and passes them to a transformer-based critique generator capable of producing descriptive, interpretive, comparative, and evaluative text. To validate the framework, we assemble a heterogeneous dataset comprising high-resolution photographs of artworks paired with professional curatorial texts from museums, scholarly publications, and art reviews. Culturally sensitive, expert-in-the-loop annotations capture the subtle aesthetic judgment, interpretive reference, and lexical richness that purely technical labels miss. Preprocessing methods such as augmentation, normalization, and de-biasing improve model robustness and reduce stylistic skew. Experiments indicate that multimodal conditioning substantially improves critique specificity and conceptual grounding compared with vision-only and text-only baselines.
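
To make the described pipeline concrete, the sketch below shows one way such an architecture could be wired together in PyTorch: CNN and ViT encoders supply visual evidence, a projected metadata vector supplies context, and a transformer decoder generates critique tokens conditioned on the fused memory. All module names, dimensions, and the stacked-token fusion strategy are illustrative assumptions based on the abstract, not the authors' implementation.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50, vit_b_16

    class CritiqueGenerator(nn.Module):
        def __init__(self, vocab_size=30000, d_model=512, meta_dim=64):
            super().__init__()
            # CNN branch: global composition/texture features (2048-d ResNet-50 pool).
            backbone = resnet50(weights=None)
            self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
            # ViT branch: stylistic cues from the 768-d class-token embedding.
            self.vit = vit_b_16(weights=None)
            self.vit.heads = nn.Identity()
            # Projections map each modality into a shared d_model space.
            self.cnn_proj = nn.Linear(2048, d_model)
            self.vit_proj = nn.Linear(768, d_model)
            self.meta_proj = nn.Linear(meta_dim, d_model)  # pre-encoded metadata vector
            # Transformer decoder attends over the fused memory while generating text.
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=4)
            self.lm_head = nn.Linear(d_model, vocab_size)

        def forward(self, image, metadata, tokens):
            cnn_feat = self.cnn_proj(self.cnn(image).flatten(1))  # (B, d_model)
            vit_feat = self.vit_proj(self.vit(image))             # (B, d_model)
            meta_feat = self.meta_proj(metadata)                  # (B, d_model)
            # Simple fusion: stack the three modality embeddings as memory tokens.
            memory = torch.stack([cnn_feat, vit_feat, meta_feat], dim=1)
            t = tokens.size(1)
            causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
            out = self.decoder(self.embed(tokens), memory, tgt_mask=causal)
            return self.lm_head(out)  # next-token logits over the critique vocabulary

    # Smoke test on random inputs (2 artworks, 16-token critique prefixes).
    model = CritiqueGenerator()
    logits = model(torch.randn(2, 3, 224, 224),       # artwork images
                   torch.randn(2, 64),                # encoded contextual metadata
                   torch.randint(0, 30000, (2, 16)))  # critique token ids
    print(logits.shape)  # torch.Size([2, 16, 30000])

In practice the metadata vector would come from an encoder over artist and period fields, and the fusion module could use cross-attention rather than simple token stacking; the sketch only fixes the data flow the abstract describes.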

Published

2025-12-28

How to Cite

Viswanathan, R., Yadav, P., Pavithra, M. S., Sachdeva, A., Panda, S., & Deshmukh, S. (2025). MACHINE LEARNING FOR ART CRITIQUE GENERATION. ShodhKosh: Journal of Visual and Performing Arts, 6(5s), 373–383. https://doi.org/10.29121/shodhkosh.v6.i5s.2025.6921