Vol. 2 No. 1 (2026)

Published: June 1, 2026

Pages: 10-16

Original Article

Improving Big Data Accuracy Through Datamining and Classification in Human Resources with Nearest Neighbor Algorithm

Abstract

In the era of digital transformation, oil companies must manage huge volumes of staff data from every part of the company, from management, maintenance, engineering and geology to the drilling teams and front-line workers. This research presents a complete study of big data accuracy and classification improvement in K-means Clustering Learning (KCL) management for 20,000 employees in an oil company. The data were auto-generated according to global standards and technical specifications. The data tables for formwork of human resources bars were 90% prepared by file laziness. The test kernel used in this research is also based on this data. The study focuses on important practical problems such as raising data quality and classifying employees according to various factors, including practical experience, education level, technical expertise, competence achieved in performance evaluations (which may change over time) and safety training hours. Our methodology incorporates advanced preprocessing techniques, feature engineering and hyperparameter optimization in order to achieve better classification accuracy. The experimental results show that the optimized KNN algorithm achieves 94.2 percent accuracy for employee classification, a significant improvement over the traditional method. This research offers practical lessons for oil companies employing machine learning techniques in human resources management and seeking improved operational efficiency.
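
As an illustration only, the kind of pipeline the abstract describes (feature scaling as preprocessing, KNN classification, and a search over the hyperparameter k) might be sketched as follows. This is not the authors' implementation and does not reproduce the reported accuracy. The records are synthetic, and every name here (`make_synthetic_employees`, the four features, the "senior"/"junior" labels, the candidate values of k) is a hypothetical choice made for the example.

```python
import math
import random
from collections import Counter


def fit_min_max(rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lows, highs):
    """Rescale features with the learned bounds so no feature dominates the distance."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to `query` (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def select_k(train_x, train_y, val_x, val_y, candidates=(1, 3, 5, 7, 9)):
    """Simple grid search: keep the candidate k with the best validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train_x, train_y, val_x, val_y, k))


def make_synthetic_employees(n, seed=0):
    """Hypothetical records: [years_experience, education_level, safety_hours, review_score]."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        senior = rng.random() < 0.5
        years = rng.uniform(8, 30) if senior else rng.uniform(0, 10)
        education = rng.choice([2, 3, 4]) if senior else rng.choice([1, 2, 3])
        safety = rng.uniform(40, 120) if senior else rng.uniform(0, 60)
        review = rng.uniform(3, 5) if senior else rng.uniform(1, 4)
        rows.append([years, education, safety, review])
        labels.append("senior" if senior else "junior")
    return rows, labels


# Split into training, validation (for choosing k) and held-out test sets.
rows, labels = make_synthetic_employees(300)
lows, highs = fit_min_max(rows[:200])
scaled = apply_min_max(rows, lows, highs)
train_x, train_y = scaled[:200], labels[:200]
val_x, val_y = scaled[200:250], labels[200:250]
test_x, test_y = scaled[250:], labels[250:]

best_k = select_k(train_x, train_y, val_x, val_y)
test_accuracy = accuracy(train_x, train_y, test_x, test_y, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.3f}")
```

The scaling bounds are learned from the training rows alone and then applied to the validation and test rows, so the held-out accuracy is not influenced by the data it is measured on. A brute-force neighbor search like this one is quadratic in the number of records; at the scale of 20,000 employees an indexed or parallel search would likely be preferable.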
