Hybrid Approach to Searching and Processing of Complex Structured Big Data for Building an Integrated Algorithm for Ukraine’s Cultural Heritage Analyzing

Authors

  • N. O. Shibaeva, National University “Odesa Polytechnic”
  • D. S. Shibaev, Private professional educational institution “Odessa College of Computer Technologies and Design ‘Server’”
  • S. I. Grishin, National University “Odesa Polytechnic”
  • M. D. Rudnichenko, National University “Odesa Polytechnic”
  • V. V. Vychuzhanin, National University “Odesa Polytechnic”

Keywords:

cultural heritage, data processing, data analysis, knowledge graph, NER, Big Data, digital archives

Abstract

The preservation and analysis of Ukraine’s cultural heritage require advanced intelligent tools capable of processing complex, multimodal, and heterogeneous data. Traditional methods of information retrieval and analysis often fail to account for the multilingual nature of archives, the presence of handwritten and poorly digitized documents, historical variations in terminology, and the need for fact verification, which significantly reduces the effectiveness of integrating data from diverse sources. To address these challenges, this study proposes a hybrid approach that combines multilevel web parsing, optical and handwritten text recognition (OCR/HTR), natural language processing (NLP) techniques, mechanisms for detecting duplicates and unreliable facts, and the construction of a knowledge graph analyzed with clustering algorithms, PageRank, Apriori, and ARIMA. A distinctive feature of the proposed system is an adaptive search module that automates the extraction, structuring, and verification of data, together with an interactive map that provides geospatial visualization of cultural heritage figures, implemented with the Leaflet library and OpenStreetMap. The system architecture supports multilayer data processing, ranging from normalization, lemmatization, and named entity recognition (NER) to semantic analysis, associative search, and predictive modeling of cultural and historical dynamics. Computational experiments confirmed the efficiency and scalability of the approach and showed stable system performance under real-time conditions. The results highlight the potential of the developed model as the foundation for a unified national information retrieval system for Ukraine’s cultural heritage.
The practical value of this hybrid framework extends to museum studies, archival science, education, and digital humanities research: it ensures standardized access to cultural data, enhances analytical reliability, and fosters the integration of Ukrainian heritage into the global digital ecosystem. Further development of the system may incorporate multimodal data sources such as 3D models and audio archives, as well as blockchain-based provenance verification to strengthen data authenticity and long-term digital preservation.
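The abstract names duplicate detection and PageRank over the knowledge graph but does not say how they are implemented. The Python sketch that follows shows one plausible way to chain these two steps; it is not the authors' implementation. Assumptions: records have already passed through OCR/HTR and NER and arrive as dicts with hypothetical `text` and `entities` fields; near-duplicates are detected by Jaccard similarity over character 3-grams with an assumed threshold of 0.8; the knowledge graph is reduced to an undirected entity co-occurrence graph; and PageRank is computed by plain power iteration with the common damping factor 0.85. All function names and the sample records are illustrative.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase, unify apostrophe variants, strip punctuation, collapse spaces.

    A simplified stand-in for the normalization layer; no lemmatization is done.
    """
    text = text.lower().replace("\u2019", "'").replace("\u02bc", "'")
    text = re.sub(r"[^\w\s']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def shingles(text: str, k: int = 3) -> set:
    """Set of character k-grams of the normalized text."""
    t = normalize(text)
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Intersection over union of two shingle sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(records, threshold: float = 0.8):
    """Keep a record unless it is a near-duplicate of an already kept one."""
    kept, kept_shingles = [], []
    for rec in records:
        sh = shingles(rec["text"])
        if all(jaccard(sh, other) < threshold for other in kept_shingles):
            kept.append(rec)
            kept_shingles.append(sh)
    return kept


def cooccurrence_graph(records):
    """Undirected graph linking entities mentioned in the same record."""
    graph = defaultdict(set)
    for rec in records:
        for a, b in combinations(sorted(set(rec["entities"])), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def pagerank(graph, damping: float = 0.85, iterations: int = 50):
    """PageRank by power iteration; every node here has at least one edge."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(graph[v])
            for u in graph[v]:
                new_rank[u] += share
        rank = new_rank
    return rank


# Toy records (hypothetical), shaped as they might look after OCR/HTR and NER.
records = [
    {"text": "Taras Shevchenko, poet, born in Moryntsi.",
     "entities": ["Taras Shevchenko", "Moryntsi"]},
    {"text": "Taras Shevchenko , poet, born in Moryntsi",
     "entities": ["Taras Shevchenko", "Moryntsi"]},
    {"text": "Shevchenko studied at the Imperial Academy of Arts under Karl Bryullov.",
     "entities": ["Taras Shevchenko", "Imperial Academy of Arts", "Karl Bryullov"]},
    {"text": "Lesya Ukrainka corresponded with Ivan Franko.",
     "entities": ["Lesya Ukrainka", "Ivan Franko"]},
]

unique = deduplicate(records)
ranks = pagerank(cooccurrence_graph(unique))
for entity, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {entity}")
```

With these toy records, the second record normalizes to the same string as the first and is dropped, and "Taras Shevchenko", the best-connected entity, receives the highest score. A production system would more likely use MinHash/LSH for deduplication at scale and a graph library for ranking; plain Python is used here only to keep the sketch self-contained.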

Author Biographies

N. O. Shibaeva, National University “Odesa Polytechnic”

Cand. Sc. (Eng.), Associate Professor, Associate Professor of the Department of Information Technologies

D. S. Shibaev, Private professional educational institution “Odessa College of Computer Technologies and Design ‘Server’”

Lecturer

S. I. Grishin, National University “Odesa Polytechnic”

Cand. Sc. (Eng.), Associate Professor, Associate Professor of the Chair of Information Technologies

M. D. Rudnichenko, National University “Odesa Polytechnic”

Cand. Sc. (Eng.), Associate Professor, Associate Professor of the Chair of Information Technologies

V. V. Vychuzhanin, National University “Odesa Polytechnic”

Dr. Sc. (Eng.), Professor, Head of the Chair of Information Technologies



Published

2026-02-07

How to Cite

[1]
N. O. Shibaeva, D. S. Shibaev, S. I. Grishin, M. D. Rudnichenko, and V. V. Vychuzhanin, “Hybrid Approach to Searching and Processing of Complex Structured Big Data for Building an Integrated Algorithm for Ukraine’s Cultural Heritage Analyzing”, Вісник ВПІ, no. 6, pp. 127–138, Feb. 2026.

Section

Information technologies and computer sciences
