Vol 10 No 1 (2025): June (In Progress)
Computer Science

Developing a Prediction Model to Identify Blood Types Most Susceptible to Viral Hepatitis Based on the CRISP-DM Methodology

Esraa Hameed Kamel
Wasit University, College of Fine Arts, Wasit, Iraq *

(*) Corresponding Author
Published March 6, 2025
Keywords
  • Random Forest algorithm
  • Hepatitis B and C
  • KNN algorithm
  • Blood Groups
  • Decision Tree algorithm
  • Support vector machine algorithm
  • XG-Boost algorithm
  • Neural network algorithm
How to Cite
Kamel, E. H. (2025). Developing a Prediction Model to Identify Blood Types Most Susceptible to Viral Hepatitis Based on the CRISP-DM Methodology. Academia Open, 10(1), 10.21070/acopen.10.2025.10740. https://doi.org/10.21070/acopen.10.2025.10740

Abstract

General Background: Viral hepatitis is prevalent worldwide, and hepatitis B and C pose significant public health challenges. While most cases resolve naturally, chronic infections lead to severe complications. Specific Background: Genetic predisposition, including blood type, has been hypothesized as a risk factor for viral hepatitis, but its role remains unclear. Knowledge Gap: Few studies have analyzed the association between ABO blood groups and susceptibility to hepatitis B and C using machine learning techniques. Aims: This study aims to identify the blood groups most susceptible to hepatitis B and C by applying advanced machine learning models. Results: Using a dataset of 500 patients and the CRISP-DM methodology, the study finds that blood type B shows the highest susceptibility (38% infection rate), while type O shows the lowest (15%). A chi-square test (p < 0.01) indicates a statistically significant association between blood group B and hepatitis infection. The XG-Boost model achieved the highest predictive accuracy (91%) and ranked blood type B as the second most influential risk factor after age. Novelty: This study provides empirical evidence linking genetic factors, particularly blood type B, with hepatitis susceptibility using data-driven approaches. Implications: The findings highlight the importance of blood type screening in high-risk populations and the need for targeted prevention strategies.

Highlights:

  1. Blood type may influence susceptibility to hepatitis B and C.
  2. Blood type B shows highest risk; XG-Boost model achieves 91% accuracy.
  3. Blood type screening aids early detection and targeted prevention strategies.
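
The chi-square test mentioned in the abstract can be illustrated with a minimal sketch. The counts below are hypothetical and are not the study's data; the function name `chi_square_independence` is likewise an illustrative choice. The sketch runs an overall test of independence between blood group and infection status across all four groups, which may differ from the exact comparison the author performed. It compares the statistic with the standard critical value for 3 degrees of freedom at the 0.01 level (about 11.345).

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# HYPOTHETICAL counts, invented for illustration only (not the study's data).
# Each row is [infected, not infected] for one blood group.
counts = {
    "A": [30, 70],
    "B": [38, 62],
    "AB": [25, 75],
    "O": [15, 85],
}

statistic, dof = chi_square_independence(list(counts.values()))

# Chi-square critical value for 3 degrees of freedom at alpha = 0.01.
CRITICAL_VALUE_DF3_ALPHA_001 = 11.345
significant = statistic > CRITICAL_VALUE_DF3_ALPHA_001

print(f"chi-square = {statistic:.3f}, dof = {dof}, significant at 0.01: {significant}")
```

With real data, a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would also return the p-value directly.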

