An explainable hybrid deep learning model for prediabetes prediction in men aged 30 and above
1Department of Digital Anti-Aging Healthcare (BK21), Inje University, 50834 Gimhae, Republic of Korea
2Department of AI-Software, Inje University, 50834 Gimhae, Republic of Korea
DOI: 10.22514/jomh.2024.166 Vol. 20, Issue 10, October 2024, pp. 52-72
Submitted: 30 May 2024 Accepted: 16 August 2024
Published: 30 October 2024
*Corresponding Author(s): Younsung Choi E-mail: cys2020@inje.ac.kr
*Corresponding Author(s): Haewon Byeon E-mail: bhwpuma@naver.com
† These authors contributed equally.
Men diagnosed with type 2 diabetes at age 30 have a significantly reduced life expectancy compared with their non-diabetic peers. Prediabetes, like diabetes, heightens the risk of microvascular complications and significantly increases the likelihood of cardiovascular disease and overall mortality. Early detection and intervention in prediabetes can prevent these complications and delay or avoid progression to diabetes. However, predicting prediabetes remains challenging because many existing machine learning methods suffer from biased accuracy and a lack of explainability. To address these issues, this study develops a hybrid model, HyperTab-LIME, which combines HyperTab, a hypernetwork-based deep learning approach designed for small tabular datasets, with the Local Interpretable Model-Agnostic Explanations (LIME) framework to predict prediabetes in men aged 30 and above. We used data from 1527 male participants aged 30 and above from the 2022 Korea National Health and Nutrition Examination Survey to train and test the model, and evaluated HyperTab-LIME against several baseline models. HyperTab-LIME achieved Area Under the Receiver Operating Characteristic Curve (AUC) scores of 0.8429 on the validation set and 0.8300 on the hold-out set, outperforming the baseline models included in this study. To provide a dependable and interpretable diagnostic tool for healthcare professionals, the optimized HyperTab model was integrated with the LIME framework. This approach provides detailed explanations of individual predictions, helping clinicians understand and trust the model’s decisions and thereby facilitating earlier intervention in prediabetes.
Explainable AI; HyperTab; LIME; Prediabetes
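The AUC values reported in the abstract can be read as the probability that a randomly chosen participant with prediabetes receives a higher predicted score than a randomly chosen participant without it. The sketch below illustrates that reading with the rank-sum (Mann-Whitney U) identity. It is a minimal illustration, not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical and do not come from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity, using average ranks for ties."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sort indices by score, then give each group of tied scores its average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts the (positive, negative) pairs in which the positive is ranked higher.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data (not from the study): 1 = prediabetes, 0 = normal.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(f"AUC = {roc_auc(y_true, y_score):.4f}")  # 8 of 9 pairs ordered correctly
```

In practice the same function would be applied separately to the validation and hold-out predictions, so that the two figures can be compared as a check on overfitting.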
Hung Viet Nguyen, Younsung Choi, Haewon Byeon. An explainable hybrid deep learning model for prediabetes prediction in men aged 30 and above. Journal of Men's Health. 2024; 20(10): 52-72.