Performance Comparison of Unsupervised Machine Learning Algorithms for Anomaly Detection in ETL Processes

Authors

  • Muhammad Faisal Ashshidiq Universitas Logistik dan Bisnis Internasional
  • Mohammad Nurkamal Fauzan Universitas Logistik dan Bisnis Internasional
  • RN Nuraini Universitas Logistik dan Bisnis Internasional

DOI:

https://doi.org/10.47111/jointecoms.v5i3.22586

Keywords:

anomaly detection, ETL, unsupervised machine learning, data quality, isolation forest

Abstract

This study presents a comprehensive comparison of three unsupervised machine learning algorithms for anomaly detection in the Extract, Transform, Load (ETL) process. The algorithms compared are Isolation Forest, Local Outlier Factor (LOF), and One-Class Support Vector Machine (OC-SVM). The study uses a dataset with a nested array structure, which is common in web-based and Internet of Things (IoT) applications. The results show that Isolation Forest performs best, with an F1-score of 0.723, accuracy of 0.935, precision of 0.567, and recall of 1.00. Local Outlier Factor performs worst, with an F1-score of 0.221, while One-Class SVM shows moderate performance, with an F1-score of 0.488. Visualization using Principal Component Analysis (PCA) supports these findings by showing the separation between normal and anomalous data. This study contributes to the selection of an appropriate anomaly detection algorithm for maintaining data quality after the ETL process.
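The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the synthetic data, the 5% contamination rate, and all hyperparameters are assumptions chosen for the example, and the scores it prints will not match the figures reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for flattened ETL output (assumption: NOT the paper's dataset).
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(950, 4))
anomalies = rng.uniform(low=6.0, high=9.0, size=(50, 4))
X = StandardScaler().fit_transform(np.vstack([normal, anomalies]))
y_true = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]  # 1 = anomaly

# Assumed anomaly share; the labels are used only for scoring, never for fitting.
contamination = 0.05

detectors = {
    "Isolation Forest": IsolationForest(contamination=contamination, random_state=42),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    "One-Class SVM": OneClassSVM(kernel="rbf", gamma="scale", nu=contamination),
}


def evaluate(X, y_true, detectors):
    """Fit each detector without labels and score its predictions against y_true."""
    results = {}
    for name, model in detectors.items():
        raw = model.fit_predict(X)          # scikit-learn convention: +1 inlier, -1 outlier
        y_pred = (raw == -1).astype(int)    # map to 1 = anomaly, 0 = normal
        results[name] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        }
    return results


results = evaluate(X, y_true, detectors)
for name, metrics in results.items():
    print(f"{name:22s} " + "  ".join(f"{k}={v:.3f}" for k, v in metrics.items()))

# 2-D PCA coordinates, suitable for a scatter plot of normal vs. anomalous points.
X_2d = PCA(n_components=2).fit_transform(X)
```

Because all three detectors are fitted without labels, the ground-truth vector is needed only to compute accuracy, precision, recall, and F1 afterwards. On real ETL output the nested arrays would first have to be flattened into a numeric feature matrix, a step this sketch skips.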


Published

2025-09-30

How to Cite

Ashshidiq, M. F., Fauzan, M. N., & Nuraini, R. (2025). Perbandingan Kinerja Algoritma Unsupervised Machine Learning untuk Deteksi Anomali dalam Proses ETL. Journal of Information Technology and Computer Science, 5(3), 241–252. https://doi.org/10.47111/jointecoms.v5i3.22586
