Enhancing Machine Learning Performance on Imbalanced Resume Data Using SMOTE for Job Candidate Classification

Authors

  • Warissara Vasuarayasak, Student, Program in Computer Science and Artificial Intelligence, Faculty of Science, Chandrakasem Rajabhat University, Bangkok 10900, Thailand
  • Natdanai Mangklang, Student, Program in Computer Science and Artificial Intelligence, Faculty of Science, Chandrakasem Rajabhat University, Bangkok 10900, Thailand
  • Jirapat Somseang, Student, Program in Computer Science and Artificial Intelligence, Faculty of Science, Chandrakasem Rajabhat University, Bangkok 10900, Thailand
  • Chaisiri Sanitphonklang, Assistant Professor, Dr., Program in Computer Science and Artificial Intelligence, Faculty of Science, Chandrakasem Rajabhat University, Bangkok 10900, Thailand

DOI:

https://doi.org/10.14456/jcct.2025.4

Keywords:

Class Imbalance, SMOTE, Machine Learning, Resume Classification

Abstract

Data imbalance is a significant limitation on machine learning model performance, particularly when the minority class has considerably fewer samples than the majority class. The imbalance biases the model toward the majority class and reduces classification accuracy. A widely adopted remedy is the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic minority-class samples by interpolating between existing minority samples and their nearest neighbors, identified by distance. This research applies SMOTE to improve the performance of three machine learning models, Decision Trees, Random Forests, and K-Nearest Neighbors (K-NN), on an imbalanced resume dataset. After applying SMOTE, the accuracy of the models improved from 0.83, 0.86, and 0.80 to 0.84, 0.88, and 0.82, respectively, and the improvement was statistically significant. The findings confirm that SMOTE enhances the classification of minority-class data, yielding models that are more accurate and better suited to applications where data imbalance is a concern, and that can help recruitment departments reduce the time spent on personnel selection.
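
The interpolation idea behind SMOTE can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration, and it uses only the Python standard library. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built from a list of minority-class vectors.

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Interpolate at a random position on the segment base -> neighbor
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Assumed toy data: two numeric features per resume, 4 minority rows vs 10 majority rows
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10

new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))  # prints 10
```

In a typical evaluation setup, oversampling of this kind is applied only to the training split, so that the test set contains no synthetic samples; whether the study followed that protocol is not stated in the abstract.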

Downloads

Published

20-04-2025

How to Cite

Vasuarayasak, W., Mangklang, N., Somseang, J., & Sanitphonklang, C. (2025). Enhancing Machine Learning Performance on Imbalanced Resume Data Using SMOTE for Job Candidate Classification. Journal of Computer and Creative Technology, 3(1), 38–48. https://doi.org/10.14456/jcct.2025.4