Automatic Knowledge Extraction from Environmental Reports with Reference to Time and Spatial Coordinates of Water Bodies

Authors

  • K. O. Bondalietov, Vinnytsia National Technical University
  • V. B. Mokin, Vinnytsia National Technical University
  • I. M. Shtelmakh, Vinnytsia National Technical University
  • O. V. Slobodianiuk, Vinnytsia National Technical University

DOI:

https://doi.org/10.31649/1997-9266-2025-180-3-101-110

Keywords:

knowledge mining, SPO-triplets, artificial intelligence, data georeferencing, water array, large language models, Retrieval-Augmented Generation

Abstract

The paper presents a new method for automatically extracting environmental knowledge from reports and news texts that contain facts about the state of river waters or their pollution. The extracted facts are bound to the spatial coordinates of specific water bodies and to time intervals. The work is relevant because such environmental data are widely available in the news, on institutional websites, and on social media, and they need to be processed quickly and accurately. The proposed method combines the detection of facts about the state of waters or their pollution, the recognition of geographical names in the text and headlines, and the determination of time features by analyzing the hierarchical structure of the document. The method optimizes a contextual-semantic criterion that maximizes the completeness and probability of detecting all existing connections between the key phrases of facts, time periods, and water bodies in the text while minimizing the number of false-positive connections between them. To do so, it formalizes the connections as “subject–predicate–object” (SPO) triplets and uses the Jaccard measure to quantify the similarity between the lists of key phrases that characterize the facts and the water bodies. Knowledge extraction relies on identifying and using the hierarchical structure of the document, on large language models, and on Retrieval-Augmented Generation (RAG), which keeps the knowledge base regularly updated and binds new information to time intervals and spatial coordinates. The result is a structured knowledge base in the form of “fact – water body – time interval” triplets, which can be used to analyze the dynamics of water status, identify trends, and make management decisions to improve the state of surface waters.

The method is applied to the 2019 annual report on the activities of the Southern Booh River Basin Water Resources Management, and the results illustrate its efficiency.
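
As an illustration of the Jaccard-based binding described in the abstract, the following minimal Python sketch links one fact to the water body whose key-phrase list is most similar to the fact's own. It is not the authors' implementation: the plain (unweighted) Jaccard measure, the names `jaccard`, `Triplet`, and `link_fact`, the `threshold` value, and the sample key phrases are all assumptions made for this example, and the paper's contextual-semantic criterion, document-hierarchy analysis, and LLM/RAG steps are not reproduced.

```python
from dataclasses import dataclass


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two key-phrase lists."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty lists
    return len(set_a & set_b) / len(set_a | set_b)


@dataclass(frozen=True)
class Triplet:
    """A "fact - water body - time interval" record (hypothetical structure)."""
    fact: str
    water_body: str
    time_interval: str


def link_fact(fact, fact_phrases, time_interval, water_bodies, threshold=0.2):
    """Bind a fact to the most similar water body, or return None.

    water_bodies maps a water-body name to its list of key phrases.
    The threshold is an illustrative guard against false-positive links.
    """
    best_name, best_score = None, 0.0
    for name, phrases in water_bodies.items():
        score = jaccard(fact_phrases, phrases)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return Triplet(fact, best_name, time_interval)


# Hypothetical key phrases, invented for this example only.
water_bodies = {
    "Southern Booh": ["southern booh", "vinnytsia", "river"],
    "Riv": ["riv", "bar", "river"],
}
fact_phrases = ["southern booh", "vinnytsia", "nitrates"]

triplet = link_fact("elevated nitrate level", fact_phrases, "2019", water_bodies)
print(triplet)
```

In this sketch the fact shares two of four distinct phrases with the first water body (similarity 0.5) and none with the second, so it is bound to the first. A fact whose best similarity falls below the threshold is left unbound, which is one simple way to trade completeness against false-positive connections.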

Author Biographies

K. O. Bondalietov, Vinnytsia National Technical University

Post-Graduate Student of the Chair of System Analysis and Information Technologies

V. B. Mokin, Vinnytsia National Technical University

Dr. Sc. (Eng.), Professor, Head of the Chair of System Analysis and Information Technologies

I. M. Shtelmakh, Vinnytsia National Technical University

Cand. Sc. (Eng.), Assistant Professor of the Chair of System Analysis and Information Technologies

O. V. Slobodianiuk, Vinnytsia National Technical University

Cand. Sc. (Ped.), Associate Professor, Associate Professor of the Chair of Strength of Materials, Theoretical Mechanics and Engineering Graphics

Published

2025-06-27

How to Cite

[1]
K. O. Bondalietov, V. B. Mokin, I. M. Shtelmakh, and O. V. Slobodianiuk, “Automatic Knowledge Extraction from Environmental Reports with Reference to Time and Spatial Coordinates of Water Bodies”, Вісник ВПІ, no. 3, pp. 101–110, Jun. 2025.

Section

Information technologies and computer sciences
