OBJECT DETECTION IN PHOTOGRAPHY USING DEEP LEARNING

Saniya Khurana; Akash Kumar Bhagat; Rajesh Uttam Kanthe; Dipali Kapil Mundada; Dr. Tanmoy Parida; S. Prayla Shyry; Kumar Ambar Pandey

doi:10.29121/shodhkosh.v6.i4s.2025.6835

Authors

Saniya Khurana, Centre of Research Impact and Outcome, Chitkara University, Rajpura 140417, Punjab, India
Mr. Akash Kumar Bhagat, Assistant Professor, Department of Computer Science and IT, Arka Jain University, Jamshedpur, Jharkhand, India
Dr. Rajesh Uttam Kanthe, Director, Bharati Vidyapeeth (Deemed to be University) Institute of Management, Kolhapur 416003, India
Dipali Kapil Mundada, Department of Engineering, Science and Humanities, Vishwakarma Institute of Technology, Pune, Maharashtra 411037, India
Dr. Tanmoy Parida, Associate Professor, Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
Dr. S. Prayla Shyry, Professor, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
Kumar Ambar Pandey, Assistant Professor, School of Journalism and Mass Communication, Noida International University, 203201, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i4s.2025.6835

Keywords:

Object Detection, Deep Learning Photography, YOLO/Faster R-CNN, Image Annotation, Detection Architecture

Abstract [English]

Object detection in photography has advanced rapidly with deep learning, changing how visual content is captured, organized, and interpreted. This paper examines current detection systems in detail and how they apply to the photographic workflow. Beginning with classical approaches such as HOG features, Haar cascades, and SVM-based classifiers, the paper contrasts their limitations with the advances brought by CNN-based frameworks. It traces the evolution from R-CNN to Faster R-CNN, which improved both the efficiency of region proposal and the richness of the learned representations. It then investigates the single-shot detectors YOLO, SSD, and RetinaNet, whose high-speed inference makes them suitable for real-time and mobile photography. The study also examines datasets relevant to photography, such as COCO, Open Images, and expert-curated collections, together with annotation formats and augmentation strategies that account for the artistic variability, lighting, and composition issues common to both professional and amateur photography. A new architecture is proposed that combines modern backbones (ResNet, EfficientNet, and Swin Transformer) with flexible detection heads. Its training objective combines loss terms for robust localization, classification refinement, and IoU-based variants to optimize performance across diverse photographic scenes. In applications, the approach proves highly effective for automated tagging and image organization, real-time detection on both DSLR and mobile systems, and intelligent assistance in artistic creation, where subject awareness helps improve composition.
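The abstract mentions annotation formats and IoU-based loss variants without specifying them. As a minimal sketch, assuming boxes are stored in COCO's (x, y, width, height) annotation format and using Generalized IoU (GIoU) as one well-known IoU variant (not necessarily the one the authors use), a localization loss term could be computed as follows. The function names are hypothetical and chosen for illustration only.

```python
"""Sketch: IoU and a GIoU-based localization loss for axis-aligned boxes.

Assumptions (not taken from the paper): boxes arrive in COCO annotation
format (x, y, width, height) and are converted to corner format
(x1, y1, x2, y2). GIoU is used as one example of an IoU variant.
"""
from typing import Tuple

Box = Tuple[float, float, float, float]


def coco_to_xyxy(box: Box) -> Box:
    """Convert a COCO-style (x, y, w, h) box to corner (x1, y1, x2, y2) form."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def area(box: Box) -> float:
    """Area of a corner-format box; empty or inverted boxes have zero area."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU) for two corner-format boxes."""
    # Intersection rectangle; area() clamps it to zero if the boxes are disjoint.
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest axis-aligned box enclosing both inputs.
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    # GIoU subtracts the fraction of the hull not covered by the union.
    giou = iou - (hull - union) / hull if hull > 0 else iou
    return iou, giou


def giou_loss(pred: Box, target: Box) -> float:
    """Localization loss 1 - GIoU, in [0, 2]; 0 means a perfect match."""
    return 1.0 - iou_and_giou(pred, target)[1]


if __name__ == "__main__":
    pred = coco_to_xyxy((10, 10, 20, 20))    # -> (10, 10, 30, 30)
    target = coco_to_xyxy((20, 20, 20, 20))  # -> (20, 20, 40, 40)
    print(iou_and_giou(pred, target), giou_loss(pred, target))
```

For boxes that do not overlap, plain IoU is 0 no matter how far apart they are, whereas the GIoU term keeps growing with the gap between them; this is the usual motivation for preferring such variants as a localization loss.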
