NLP-BASED FOLK STORY DOCUMENTATION SYSTEMS

Authors

  • Richa Srivastava, Assistant Professor, School of Business Management, Noida International University, India
  • Ramu K, Department of Computer Science and Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Mission’s Research Foundation (DU), Tamil Nadu, India
  • Ayush Gandhi, Centre of Research Impact and Outcome, Chitkara University, Rajpura 140417, Punjab, India
  • Shubhangi S. Shambharkar, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, India
  • Chandrashekhar Ramesh Ramtirthkar, Department of Mechanical Engineering, Vishwakarma Institute of Technology, Pune 411037, Maharashtra, India
  • Dr. L. Lakshmanan, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i5s.2025.6879

Keywords:

Cultural Heritage Preservation, Semantic Annotation, Transformer Models, Multilingual Corpus, Knowledge Graph, BERTopic, Ontology Alignment, Digital Humanities

Abstract [English]

Folk tales are invaluable sources of cultural wisdom, linguistic diversity, and collective memory. However, much of this intangible heritage is endangered by declining oral traditions and inadequate documentation mechanisms. This paper introduces an NLP-based framework for documenting and preserving folk stories that combines state-of-the-art natural language processing, semantic modeling, and ontology-based cultural representation. The proposed pipeline processes multilingual and dialect-rich texts through a sequence of computational steps: text preprocessing, linguistic analysis, motif discovery, semantic annotation, and construction of cultural knowledge graphs. The transformer-based models mBERT, IndicBERT, and RoBERTa were fine-tuned on language-specific tasks, while BERTopic and TransE embeddings were used for thematic clustering and for ontology alignment with the CIDOC-CRM schema. Evaluation showed substantial improvements over traditional baselines in linguistic accuracy (F1 = 0.91), motif classification (F1 = 0.83), and topic coherence (CV = 0.74). Expert validation by folklorists and linguists yielded a Cultural Authenticity Index (CAI) of 0.87, supporting the interpretive reliability of the system. The resulting knowledge base supports semantic querying, comparative motif analysis, and cultural pattern recognition, turning typically static folklore collections into intelligent, interactive archives. Applications include digital heritage management, education, creative storytelling, and cross-cultural analytics. Overall, the framework contributes a scalable and ethically grounded approach to AI-assisted cultural preservation, helping the linguistic richness and narrative depth of folk traditions survive in the digital age.
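
As an illustration of the TransE embeddings named in the abstract, the following minimal Python sketch shows the standard TransE scoring idea: a triple (head, relation, tail) is considered plausible when the head vector plus the relation vector lies close to the tail vector. The abstract does not describe how the authors configured or trained TransE, so everything here is an assumption made for illustration: the entity and relation names, the three-dimensional vectors, and the helper functions `transe_score` and `rank_tails` are all hypothetical and are not taken from the paper or from the CIDOC-CRM schema.

```python
import math

def transe_score(head, relation, tail):
    """Return the TransE score -||h + r - t||_2; a higher score means a more plausible triple."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical, hand-picked embeddings (not learned, not from the paper).
entities = {
    "tale:clever_fox": [0.9, 0.1, 0.0],
    "motif:trickster": [1.0, 0.6, 0.1],
    "motif:magic_object": [-0.5, 0.2, 0.9],
}
relations = {"has_motif": [0.1, 0.5, 0.1]}

def rank_tails(head_name, relation_name, candidates):
    """Rank candidate tail entities for (head, relation, ?) from most to least plausible."""
    h = entities[head_name]
    r = relations[relation_name]
    scored = [(name, transe_score(h, r, entities[name])) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = rank_tails(
    "tale:clever_fox",
    "has_motif",
    ["motif:trickster", "motif:magic_object"],
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a trained model the vectors would be learned from the knowledge graph's triples rather than set by hand; the ranking step stays the same and could support the kind of comparative motif queries the abstract mentions.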

Published

2025-12-28

How to Cite

Srivastava, R., Ramu K, Gandhi, A., Shambharkar, S. S., Ramtirthkar, C. R., & Lakshmanan, L. (2025). NLP-BASED FOLK STORY DOCUMENTATION SYSTEMS. ShodhKosh: Journal of Visual and Performing Arts, 6(5s), 197–207. https://doi.org/10.29121/shodhkosh.v6.i5s.2025.6879