Automated Approach for Dating English Text Using Transformer Neural Networks

Authors

  • M. O. Lytvyn National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
  • L. M. Oleshchenko National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

DOI:

https://doi.org/10.31649/1997-9266-2025-180-3-133-139

Keywords:

software, natural language processing (NLP), machine learning, transformer neural networks (TNN), transfer learning, BERT, text dating, stylometry, historical text analysis

Abstract

The paper examines existing methods of text dating using neural networks, highlighting their advantages and limitations. Text dating is a crucial task in fields such as history, archival studies, linguistics, and forensic science, as accurately determining the creation time of a document can help verify its authenticity, establish authorship, and detect forgeries. However, traditional methods based on stylometric or statistical approaches often lack accuracy, especially when dealing with large volumes of text data. This study proposes an approach for dating English-language texts using transformer neural networks. The model achieves an accuracy of 85 % within a 30-year range for texts written between the 15th and 20th centuries, outperforming existing models applied to English text. The core idea of the proposed automated approach is to use transfer learning to fine-tune a pre-trained transformer neural network, optimizing it for the classification of text fragments by decade. One key advantage of this approach is the transformer architecture, which, through the self-attention mechanism, effectively captures complex relationships within a text. Another significant benefit is the application of transfer learning, which reduces training time and computational resources compared to training a model from scratch. The approach was implemented in Python using the transformers library for training and testing the neural network, the datasets library for working with the dataset, and numpy for numerical calculations. Experimental results demonstrated high accuracy: 86 % within a 30-year range and 73 % within a 20-year range on the test dataset. For the 19th and 20th centuries, the model achieved an accuracy of 89 % and 90 %, respectively, while accuracy for earlier centuries was lower, averaging around 30 %. The research also examines the possibility of identifying features that indicate a text's association with a specific period by extracting the words with the highest attention scores. Future research will focus on improving accuracy for underrepresented historical periods by expanding and refining the dataset. Further enhancements may be achieved by optimizing model hyperparameters and experimenting with alternative neural network architectures. Another direction for future research is to explore methods for identifying linguistic or stylistic features that mark texts as belonging to a certain historical period, making the neural network's results more interpretable for the user. The proposed approach has potential applications in historical research, document authentication, plagiarism detection, literary studies, and forensic analysis.
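The abstract describes fine-tuning a pre-trained transformer for decade-level classification of text fragments using the transformers and datasets libraries. The following is a minimal sketch of that setup, assuming a generic BERT checkpoint and illustrative placeholder data; the checkpoint name, number of decade classes, sample fragments, hyperparameters, and metric definition are assumptions for illustration, not details taken from the paper.

```python
# Minimal illustrative sketch: fine-tune BERT to classify text fragments by decade.
# All specifics (bert-base-uncased, NUM_DECADES, sample data, hyperparameters) are assumed.
import numpy as np
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_DECADES = 50  # e.g. decades spanning the 15th-20th centuries (assumption)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_DECADES)

# Hypothetical training data: text fragments with decade indices in 0..NUM_DECADES-1.
fragments = ["Thou art as wise as thou art fair.",
             "The motor car sped along the new road."]
decade_ids = [12, 44]
train_ds = Dataset.from_dict({"text": fragments, "label": decade_ids})

def tokenize(batch):
    # Truncate/pad each fragment to a fixed length so batches can be collated.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

train_ds = train_ds.map(tokenize, batched=True)

def accuracy_within(eval_pred, tolerance_decades=3):
    # One possible reading of "accuracy within a 30-year range":
    # the predicted decade differs from the true decade by at most 3 decades.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"acc_30y": float(np.mean(np.abs(preds - labels) <= tolerance_decades))}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="text-dating-bert",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
    compute_metrics=accuracy_within,
)
trainer.train()
```

For the attention-based analysis mentioned in the abstract, one option is a forward pass over the fine-tuned model with output_attentions=True and then ranking tokens by their attention scores; the paper's exact extraction procedure is not reproduced here.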

Author Biographies

M. O. Lytvyn, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Student of the Department of Applied Mathematics

L. M. Oleshchenko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Cand. Sc. (Eng.), Associate Professor of the Department of Computer Systems Software

References

Y. Assael, T. Sommerschield, et al., “Restoring and attributing ancient texts using deep neural networks,” Nature, vol. 603, pp. 280-283, 2022. https://doi.org/10.1038/s41586-022-04448-z .

S. Vashishth, S. S. Dasgupta, S. N. Ray, and P. Talukdar, “Dating Documents using Graph Convolution Networks,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1 (Long Papers), Melbourne, Australia: Association for Computational Linguistics, 2018, pp. 1605-1615. https://doi.org/10.18653/v1/P18-1149 .

F. Wahlberg, T. Wilkinson, and A. Brun, “Historical Manuscript Production Date Estimation Using Deep Convolutional Neural Networks,” in International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016. https://doi.org/10.1109/ICFHR.2016.0048 .

O. Hellwig, “Dating Sanskrit texts using linguistic features and neural networks,” 2019. [Electronic resource]. Available: https://www.academia.edu/53885816/Dating_Sanskrit_texts_using_linguistic_features_and_neural_networks.3073703.

A. Vaswani, et al., “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), 2017, pp. 6000-6010. [Electronic resource]. Available: https://dl.acm.org/doi/10.5555/3295222.3295349 .

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” 2018. https://doi.org/10.48550/arXiv.1810.04805 .

Project Gutenberg — English Language eBooks. [Electronic resource]. Available: https://huggingface.co/datasets/sedthh/gutenberg_english .


Published

2025-06-27

How to Cite

[1]
M. O. Lytvyn and L. M. Oleshchenko, “Automated Approach for Dating English Text Using Transformer Neural Networks”, Вісник ВПІ, no. 3, pp. 133–139, Jun. 2025.

Issue

Section

Information technologies and computer sciences
