DETECTION OF DISCREPANCIES IN BILINGUAL CLINICAL TRIAL REGISTRIES USING RULE-BASED MAPPING AND NATURAL LANGUAGE PROCESSING
Abstract
Accurate and consistent clinical trial registries are an important element of clinical research. However, errors and inconsistencies in trial documentation are common, particularly when a trial spans multiple regions and local regulatory authorities maintain parallel registries in different languages and under different data standards. This paper presents an application designed to detect discrepancies between the Ukrainian and English versions of clinical trial records. The system combines structured field mapping and rule-based algorithms with modern natural language processing techniques for comparing unstructured text. This combination makes it possible to assess semantic similarity in fields that cannot be compared using standard methods. The proposed approach was evaluated on a set of records of clinical trials in Ukraine. The system identified numerous discrepancies, including inaccuracies in patient eligibility criteria. This outcome demonstrates that combining deterministic algorithms with large language models (LLMs) can significantly improve the quality of clinical trial documentation. Ultimately, the proposed approach contributes to the quality and integrity of medical research documentation and provides a better foundation for further LLM training and development in this field.
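The two-stage comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names in `FIELD_MAP`, the `Discrepancy` type, the 0.8 threshold, and the sample records are all hypothetical. The scorer `lexical_similarity` is a purely lexical stand-in built on the standard library; a real system would pass in a multilingual semantic or LLM-based scorer through the `similarity` parameter. For the stand-in to be meaningful, the example assumes the Ukrainian free text has already been translated into English.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical mapping from Ukrainian registry field names to English ones.
FIELD_MAP = {
    "фаза": "phase",
    "кількість_учасників": "enrollment",
    "критерії_включення": "inclusion_criteria",
}

# Fields compared by deterministic rules; all others are treated as free text.
STRUCTURED_FIELDS = {"phase", "enrollment"}


@dataclass
class Discrepancy:
    field: str
    uk_value: str
    en_value: str
    reason: str


def normalize(value: str) -> str:
    """Rule-based normalization: trim, lowercase, collapse whitespace."""
    return " ".join(value.strip().lower().split())


def lexical_similarity(a: str, b: str) -> float:
    """Stand-in scorer (character-level, NOT semantic), in the range 0..1."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_discrepancies(
    uk_record: dict[str, str],
    en_record: dict[str, str],
    similarity: Callable[[str, str], float] = lexical_similarity,
    threshold: float = 0.8,  # assumed cut-off, would need tuning on real data
) -> list[Discrepancy]:
    issues = []
    for uk_field, en_field in FIELD_MAP.items():
        uk_value = uk_record.get(uk_field, "")
        en_value = en_record.get(en_field, "")
        if en_field in STRUCTURED_FIELDS:
            # Structured fields: exact match after normalization.
            if normalize(uk_value) != normalize(en_value):
                issues.append(
                    Discrepancy(en_field, uk_value, en_value, "structured mismatch")
                )
        elif similarity(uk_value, en_value) < threshold:
            # Free-text fields: flag pairs whose similarity score is too low.
            issues.append(
                Discrepancy(en_field, uk_value, en_value, "low text similarity")
            )
    return issues


# Invented sample records; the Ukrainian free text is assumed pre-translated.
uk_record = {
    "фаза": "III",
    "кількість_учасників": "120",
    "критерії_включення": "Adults aged 18 to 65 with type 2 diabetes",
}
en_record = {
    "phase": "III",
    "enrollment": "150",
    "inclusion_criteria": "Adults aged 18 to 65 with type 2 diabetes",
}

issues = find_discrepancies(uk_record, en_record)
for issue in issues:
    print(issue.field, issue.reason)
```

Keeping the scorer as a parameter separates the deterministic checks from the language-model component, so the latter can be replaced without changing the field mapping or the rules.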
Full text:
PDF (English)
DOI: http://dx.doi.org/10.30970/vam.2025.35.13752
