DEEP LEARNING APPROACHES TO EMOTION RECOGNITION IN PHOTOGRAPHIC IMAGES
DOI: https://doi.org/10.29121/shodhkosh.v6.i5s.2025.6974

Keywords: Photo Emotion Recognition, Affective Computing, CNN, Vision Transformer, Feature Fusion, Macro-F1, Calibration, Explainable AI

Abstract [English]
Photo Emotion Recognition (PER) aims to infer the emotion an image expresses or evokes from visual cues such as color harmony, composition, object-scene semantics, and, where present, human expressions. Unlike face-centric affect analysis, PER must handle emotions that often arise from situational semantics and aesthetics rather than explicit facial expression, which increases ambiguity, label subjectivity, and overlap between classes. PER benchmarks are further characterized by class imbalance and noisy annotations caused by differing human perceptions. This paper presents a comprehensive analytical study of PER together with a proposed hybrid deep learning model that combines convolutional and transformer representations to capture low-level aesthetic cues and global semantic context simultaneously. The architecture pairs a CNN branch, attuned to local texture and color stimuli, with a transformer branch for long-range relational reasoning, followed by gated feature fusion and a balanced classification head. A robust training pipeline combines class-balanced focal loss, label smoothing, and emotion-preserving augmentation that avoids distortions likely to alter affective meaning. Evaluation covers macro-F1, per-class sensitivity, confusion behavior among neighbouring emotions, calibration, and cross-domain robustness. Extensive ablation experiments show that the fusion and robust-loss choices consistently improve macro-F1 and reduce common confusions (e.g., fear vs. surprise, sadness vs. contentment/neutral). An explainability analysis based on gradient-based localization then checks whether predictions align with emotionally salient regions. The paper closes with deployment guidance (latency, model size, and quantization) and the ethical implications of modelling subjective affect.
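The abstract describes the gated fusion of the two branches only at a high level, so the following PyTorch sketch is one plausible reading rather than the authors' implementation: pooled embeddings from the CNN and transformer branches (the names cnn_feat and vit_feat and all dimensions are illustrative assumptions) are mixed through a learned sigmoid gate before the classification head.

```python
import torch
import torch.nn as nn

class GatedFusionHead(nn.Module):
    """Mixes a CNN embedding (local texture/color) with a transformer
    embedding (global context) through a learned per-dimension gate."""

    def __init__(self, cnn_dim=2048, vit_dim=768, fused_dim=512, num_classes=8):
        super().__init__()
        self.proj_cnn = nn.Linear(cnn_dim, fused_dim)
        self.proj_vit = nn.Linear(vit_dim, fused_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * fused_dim, fused_dim), nn.Sigmoid()
        )
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, cnn_feat, vit_feat):
        c = self.proj_cnn(cnn_feat)              # (B, fused_dim)
        v = self.proj_vit(vit_feat)              # (B, fused_dim)
        g = self.gate(torch.cat([c, v], dim=1))  # gate values in (0, 1)
        fused = g * c + (1.0 - g) * v            # gated convex combination
        return self.classifier(fused)            # (B, num_classes) logits

# Smoke test with random branch embeddings.
head = GatedFusionHead()
logits = head(torch.randn(4, 2048), torch.randn(4, 768))
```

A convex gate of this form lets the model lean on the CNN branch for texture-driven emotions and on the transformer branch when scene context dominates, which is one way to realize the branch roles the abstract describes.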
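The exact objective is likewise not specified beyond "class-balanced focal loss" and "label smoothing". A minimal sketch, assuming one common formulation in which classes are weighted by their "effective number" of training samples and targets are smoothed one-hot vectors (beta, gamma, and smoothing are assumed hyperparameters, not values from the paper):

```python
import torch
import torch.nn.functional as F

def class_balanced_focal_smooth_loss(logits, targets, samples_per_class,
                                     beta=0.9999, gamma=2.0, smoothing=0.1):
    """Focal loss with effective-number class weights and label smoothing.

    logits: (B, C) raw scores; targets: (B,) integer labels;
    samples_per_class: (C,) training-set counts per emotion class.
    """
    num_classes = logits.size(1)

    # Effective-number weighting: rarer classes receive larger weights.
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * num_classes

    # Smoothed one-hot targets.
    with torch.no_grad():
        true_dist = torch.full_like(logits, smoothing / (num_classes - 1))
        true_dist.scatter_(1, targets.unsqueeze(1), 1.0 - smoothing)

    log_probs = F.log_softmax(logits, dim=1)
    focal = (1.0 - log_probs.exp()) ** gamma  # down-weight easy predictions
    loss = -(weights * focal * true_dist * log_probs).sum(dim=1)
    return loss.mean()
```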
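The headline metrics, macro-F1 and calibration, can be computed as below; this is a generic sketch using scikit-learn and a standard expected calibration error (ECE) over equal-width confidence bins, not necessarily the paper's exact protocol:

```python
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_and_ece(probs, labels, n_bins=15):
    """probs: (N, C) softmax outputs; labels: (N,) integer ground truth."""
    preds = probs.argmax(axis=1)
    macro_f1 = f1_score(labels, preds, average="macro")

    # ECE: weighted average of |accuracy - confidence| per confidence bin.
    conf = probs.max(axis=1)
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return macro_f1, ece
```

Macro-F1 averages F1 over classes with equal weight, which is why it is the appropriate headline metric for the imbalanced benchmarks the abstract describes.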
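Finally, "gradient-based localization" most plausibly refers to a Grad-CAM-style method. A compact, generic Grad-CAM over a chosen convolutional layer (the paper does not name the layer, so target_layer is an assumption) can be sketched as:

```python
import torch

class GradCAM:
    """Generic Grad-CAM: weights each activation channel by the spatially
    averaged gradient of the target class score, then ReLUs the sum."""

    def __init__(self, model, target_layer):
        self.model = model
        self.activations = None
        self.gradients = None
        target_layer.register_forward_hook(self._save_act)
        target_layer.register_full_backward_hook(self._save_grad)

    def _save_act(self, module, inputs, output):
        self.activations = output.detach()

    def _save_grad(self, module, grad_input, grad_output):
        self.gradients = grad_output[0].detach()

    def __call__(self, x, class_idx):
        logits = self.model(x)
        self.model.zero_grad()
        logits[:, class_idx].sum().backward()
        weights = self.gradients.mean(dim=(2, 3), keepdim=True)  # GAP over H, W
        cam = torch.relu((weights * self.activations).sum(dim=1))
        cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
        return cam  # (B, H, W), normalized to [0, 1]
```

Overlaying the upsampled map on the input image is how one checks the abstract's claim that predictions align with emotionally salient regions.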