DEVELOPMENT OF A DEEP LEARNING MODEL FOR FRAUD DETECTION

Serhiy Sveleba, Ivan Katerynchuk, Ivan Kunyo, Ihor Polovynko, Yaroslav Shmyhelskyy, Ostap Sumyliak

Abstract


Background. The rapid growth of electronic payments has intensified fraudulent activity, requiring adaptive anomaly detection methods. Traditional rule-based approaches lack flexibility and fail to generalize to previously unseen attacks. In contrast, unsupervised deep learning models, particularly autoencoders, can learn intrinsic data representations and detect anomalies without labeled attack samples. This study evaluates three unsupervised architectures – Autoencoder with Gaussian Mixture Model (AEGMM), Variational Autoencoder with Gaussian Mixture Model (VAEGMM), and a Deep Autoencoder – for network anomaly detection.
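The three architectures differ mainly in how they score a record. Below is a minimal pure-Python sketch of the two scoring rules, assuming the DAGMM-style formulation for the GMM-based models. The Deep Autoencoder flags a record by its reconstruction error. AEGMM and VAEGMM instead evaluate the sample energy E(z) = -log sum_k phi_k N(z; mu_k, Sigma_k) of the latent representation z under a Gaussian mixture fitted to normal traffic. The function names are hypothetical. The covariances are assumed diagonal for brevity, whereas DAGMM uses full covariance matrices. This is not the Alibi Detect implementation.

```python
import math

def reconstruction_error(x, x_hat):
    """Deep Autoencoder score: mean squared error between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def log_gaussian_diag(z, mean, var):
    """Log-density of a Gaussian with diagonal covariance, evaluated at z."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v
        for zi, m, v in zip(z, mean, var)
    )

def sample_energy(z, weights, means, variances):
    """AEGMM/VAEGMM-style score: E(z) = -log sum_k phi_k N(z; mu_k, Sigma_k).

    Higher energy means z falls in a low-density region of the mixture fitted
    to normal traffic, i.e. the record looks more anomalous.
    """
    log_terms = [
        math.log(phi) + log_gaussian_diag(z, mu, var)
        for phi, mu, var in zip(weights, means, variances)
    ]
    peak = max(log_terms)  # log-sum-exp trick for numerical stability
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))
```

Take a two-component mixture with weights [0.5, 0.5] and unit variances. A latent point near one of the component means receives a much lower energy than a point far from both, and so it is less likely to be flagged.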

Materials and Methods. Experiments were conducted using the KDD’99 (10%) benchmark dataset. Categorical features were transformed using one-hot encoding, while numerical features were standardized. All models were trained exclusively on normal traffic samples following a one-class learning paradigm. The experimental pipeline included preprocessing, model implementation in Python using TensorFlow and the Alibi Detect framework, percentile-based threshold calibration, and evaluation using accuracy, precision, recall, F1-score, and confusion matrices.
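The preprocessing step can be sketched as follows. The sketch assumes records are held as Python dicts keyed by KDD'99 feature names such as protocol_type and src_bytes. Consistent with the one-class paradigm, both the one-hot vocabularies and the standardization parameters are fitted on normal training records only. The helper names and the handling of unseen categories (an all-zero block) are illustrative assumptions, not details taken from the study.

```python
import statistics

def fit_preprocessor(rows, categorical, numerical):
    """Learn one-hot vocabularies and z-score parameters from training rows.

    rows: list of dicts, one per connection record; in the one-class setting
    these are normal-traffic records only.
    """
    vocab = {c: sorted({r[c] for r in rows}) for c in categorical}
    stats = {}
    for c in numerical:
        values = [float(r[c]) for r in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        stats[c] = (mean, std if std > 0 else 1.0)  # guard against constant columns
    return vocab, stats

def transform(row, vocab, stats):
    """Map one record to a flat numeric vector: one-hot blocks, then z-scores."""
    vec = []
    for c, categories in vocab.items():
        # Categories unseen during fitting map to an all-zero block.
        vec.extend(1.0 if row[c] == cat else 0.0 for cat in categories)
    for c, (mean, std) in stats.items():
        vec.append((float(row[c]) - mean) / std)
    return vec
```

Fitting on normal records only means that attack records with extreme feature values land far from zero after standardization. This is part of what makes them stand out to a model trained on normal traffic.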

Results. AEGMM achieved the highest performance, with an F1-score of 0.9936 and an accuracy of 0.9908, demonstrating near-perfect separation between normal and malicious samples. VAEGMM reached an F1-score of 0.9751 and converged stably, but its accuracy was slightly lower owing to the stochastic latent space. The Deep Autoencoder achieved an accuracy of approximately 97.5%, confirming that reconstruction-based methods are effective even without probabilistic density estimation. The optimal anomaly threshold, set at the 99th percentile of the reconstruction or density scores, ensured reliable discrimination between normal and attack traffic.
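The sketch below illustrates threshold calibration and evaluation. It assumes the 99th percentile is computed over the anomaly scores of normal samples, so that by construction roughly 1% of normal records exceed the threshold. It also assumes a score strictly above the threshold marks a record as an attack (label 1). The helper names are hypothetical. The percentile uses linear interpolation between order statistics, which matches NumPy's default method.

```python
import math

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def evaluate(scores, labels, threshold):
    """Flag score > threshold as an attack (1) and compute the usual metrics."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > threshold and y == 1)
    fp = sum(1 for s, y in pairs if s > threshold and y == 0)
    fn = sum(1 for s, y in pairs if s <= threshold and y == 1)
    tn = sum(1 for s, y in pairs if s <= threshold and y == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows: true 0/1, columns: predicted 0/1
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Attacks make up the majority of the KDD'99 10% subset, so accuracy alone can be misleading there. The confusion matrix and the F1-score are worth reading together.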

Conclusion. Autoencoder-based unsupervised models are effective for anomaly detection in large, imbalanced tabular datasets. AEGMM outperformed alternative architectures due to its stable latent representation and deterministic optimization. The proposed approach is suitable for financial fraud detection, cybersecurity monitoring, and industrial anomaly detection. Future work will explore transformer-based models and Explainable AI to improve robustness and interpretability.

Keywords: artificial intelligence; deep learning; autoencoder; Gaussian mixture model; financial fraud; cybersecurity.



DOI: http://dx.doi.org/10.30970/eli.32.7
