Performance Comparison of Unsupervised Machine Learning Algorithms for Anomaly Detection in ETL Processes
DOI:
https://doi.org/10.47111/jointecoms.v5i3.22586
Keywords:
anomaly detection, ETL, unsupervised machine learning, data quality, isolation forest
Abstract
This study presents a comprehensive comparison of three unsupervised machine learning algorithms for anomaly detection in the Extract, Transform, Load (ETL) process. The algorithms compared are Isolation Forest, Local Outlier Factor (LOF), and One-Class Support Vector Machine (OC-SVM). The study uses a dataset with a nested-array structure of the kind commonly found in web-based and Internet of Things (IoT) applications. The results show that Isolation Forest performs best, with an F1-score of 0.723, accuracy of 0.935, precision of 0.567, and recall of 1.00. Local Outlier Factor performs worst, with an F1-score of 0.221, and One-Class SVM shows moderate performance, with an F1-score of 0.488. Visualization using Principal Component Analysis (PCA) reinforces these findings by showing how the normal and anomalous data separate. The study contributes to the selection of a suitable anomaly detection algorithm for maintaining data quality after the ETL process.
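The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the synthetic data stands in for flattened ETL records, and the contamination rate of 0.05, the feature count, and all hyperparameters are assumptions chosen only so the example runs. The scores it produces are not expected to match the figures reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)

# Synthetic stand-in for flattened ETL records (assumption, NOT the paper's dataset):
# 950 "normal" rows clustered around zero and 50 "anomalous" rows spread widely.
normal = rng.normal(loc=0.0, scale=1.0, size=(950, 4))
anomalies = rng.uniform(low=-8.0, high=8.0, size=(50, 4))
X = StandardScaler().fit_transform(np.vstack([normal, anomalies]))
y_true = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])  # 1 = anomaly

CONTAMINATION = 0.05  # assumed share of anomalies; the paper's setting is not stated here

# The three detectors named in the abstract, with illustrative hyperparameters.
detectors = {
    "Isolation Forest": IsolationForest(contamination=CONTAMINATION, random_state=42),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=CONTAMINATION),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=CONTAMINATION),
}

results = {}
for name, model in detectors.items():
    # fit_predict returns +1 for inliers and -1 for outliers; map to 0/1 labels.
    y_pred = (model.fit_predict(X) == -1).astype(int)
    results[name] = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }

# Two-dimensional PCA projection, usable for a scatter plot of normal vs. anomalous rows.
X_2d = PCA(n_components=2).fit_transform(X)

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

All three detectors are fitted without labels; `y_true` is used only afterwards to score the predictions, which is how accuracy, precision, recall, and F1 can be reported for unsupervised methods when a labelled evaluation set is available.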
License
Copyright (c) 2025 Journal of Information Technology and Computer Science

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.