Rozanova Anna Vyacheslavovna, Student, Department of Big Data Analytics and Video Analysis Methods, Ural Federal University named after the first President of Russia B.N. Yeltsin
Predein Nikita Sergeevich, Student, Department of Big Data Analytics and Video Analysis Methods, Ural Federal University named after the first President of Russia B.N. Yeltsin
Ivliev Trofim Alekseevich, Student, Department of Big Data Analytics and Video Analysis Methods, Ural Federal University named after the first President of Russia B.N. Yeltsin
Tolstov Avdey Tarasovich, Postgraduate Student of the Institute of Radio Electronics and Information Technology – RTF, Ural Federal University named after the first President of Russia B.N. Yeltsin
Saif Mujahed Abdullah Hayel, Senior Lecturer, Department of Big Data Analytics and Video Analysis Methods, Ural Federal University named after the first President of Russia B.N. Yeltsin
Abstract
The purpose of the study is to implement a tool that automatically detects and then normalizes duplicate records of regulatory and reference information in accounting systems. The tool uses a two-level architecture: the HNSW algorithm for k-nearest-neighbor search performs the initial selection of candidates, and the open large language model Qwen3 performs semantic analysis and produces recommendations for normalization. The developed concept will significantly increase the quality of data stored in ERP/CRM systems, automating up to 80% of manual operations and reducing operating costs. The research contributes to the field of intelligent data processing by offering a targeted solution to one of the most pressing problems of digital transformation.
KEYWORDS: digital transformation of production processes, intelligent approach to automation, normative reference information, normalization of information, semantic analysis, machine learning, large language models, HNSW.
A TOOL FOR NORMALIZING REGULATORY INFORMATION DATA USING A LARGE LANGUAGE MODEL
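The two-level architecture named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the first stage uses exhaustive cosine search over character trigram counts as a stand-in for an HNSW index over learned embeddings, and the second stage replaces the Qwen3 call with a simple rule-based placeholder. All function names, the sample records, and the parameters `k` and `threshold` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Toy embedding (assumption): counts of character n-grams of the lowercased text."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(max(len(s) - n + 1, 1)))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def candidate_pairs(records, k=2, threshold=0.5):
    """Stage 1: for each record keep its k most similar records above the threshold.

    Exhaustive search is used here for clarity; an HNSW index would serve the
    same role with approximate, much faster lookups on large catalogs.
    """
    vecs = [char_ngrams(r) for r in records]
    pairs = set()
    for i, v in enumerate(vecs):
        scored = sorted(((cosine(v, w), j) for j, w in enumerate(vecs) if j != i), reverse=True)
        for score, j in scored[:k]:
            if score >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def judge_duplicate(a: str, b: str) -> bool:
    """Stage 2 placeholder (assumption): stands in for the language model's verdict.

    Two records count as duplicates if they have the same tokens once case
    and punctuation are ignored.
    """
    def tokens(s: str):
        return sorted(re.sub(r"[^\w\s]", "", s.lower()).split())
    return tokens(a) == tokens(b)


# Hypothetical catalog entries, invented for this example.
records = [
    "Bolt M8x40 GOST 7798-70",
    "bolt m8x40, gost 7798-70",
    "Bolt M8x45 GOST 7798-70",
    "Washer 8 GOST 11371-78",
]

candidates = candidate_pairs(records)
confirmed = [(i, j) for i, j in candidates if judge_duplicate(records[i], records[j])]
print(confirmed)
```

The split reflects the design described in the abstract: a cheap similarity search narrows the catalog to a short list of candidate pairs, so the more expensive semantic check only runs on those pairs rather than on every combination of records.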



