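Research output: Differentiation of intestinal tuberculosis and Crohn's disease through an explainable machine learning method
Authors: Futian Weng (翁福添), Yu Meng (孟钰), Fanggen Lu (卢放根), Yuying Wang (王玉莹), Weiwei Wang (王玮玮), Long Xu (徐龙), Dongsheng Cheng (程东升), Jianping Zhu (朱建平)*
Journal: Scientific Reports (JCR Q1), 2022
Summary: This paper addresses the problem of differentiating Crohn's disease from intestinal tuberculosis in gastroenterology. It proposes an explainable machine learning framework, which matters both for distinguishing the two diseases effectively and for understanding how the machine learning model arrives at its predictions.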
Abstract
Background: Differentiation between Crohn’s disease (CD) and intestinal tuberculosis (ITB) is difficult but crucial for medical decisions. This study aims to develop an effective framework to distinguish these two diseases through an explainable machine learning (ML) model. Methods: After feature selection, a total of nine variables are extracted, including intestinal surgery, abdominal, bloody stool, PPD, knot, ESAT-6, CFP-10, intestinal dilatation and comb sign. In addition, we compare the predictive performance of the ML methods with that of traditional statistical methods. This work also provides insights into the ML model’s outcome through the SHAP method for the first time. Results: A cohort consisting of 200 patients' data (CD = 160, ITB = 40) is used in training and validating the models. The results show that the XGBoost algorithm outperforms the other classifiers in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, precision and Matthews correlation coefficient (MCC), yielding values of 0.891, 0.813, 0.969, 0.867 and 0.801, respectively. More importantly, the prediction outcomes of XGBoost can be effectively explained through the SHAP method. Conclusions: The proposed framework demonstrates the effectiveness of distinguishing CD from ITB through interpretable machine learning, which provides not only a global explanation but also an explanation for individual patients.
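For readers less familiar with the threshold-based metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, precision and MCC are derived from a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the paper's data or code, and which disease is treated as the positive class is left open here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)   # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)     # positive predictive value
    # Matthews correlation coefficient: ranges from -1 to 1,
    # and stays informative when the classes are imbalanced.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (NOT taken from the paper).
metrics = binary_metrics(tp=8, fp=2, tn=38, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside AUC for imbalanced cohorts such as the 160-versus-40 split described above.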
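About the author:
Jianping Zhu (朱建平) received his Ph.D. in science from Nankai University in 2003. In 2013 he spent one semester as a visiting collaborator in the Department of Biostatistics at the Yale School of Public Health. He is currently a professor and doctoral supervisor, Vice Dean of the National Institute for Data Science in Health and Medicine at Xiamen University, Director of the Data Mining Research Center at Xiamen University, chief expert of a Major Project of the National Social Science Fund of China, chief expert at the Modern Business Research Center of Zhejiang Gongshang University, a New Century Excellent Talent of the Ministry of Education, and a Leading Talent in Philosophy and Social Sciences of Fujian Province. His main research interests are mathematical statistics, data mining, data science and business intelligence, and econometrics. He served as vice president of the 8th and 9th councils of the Chinese Statistical Society (中国统计学会). The Xiamen University Data Mining Research Innovation Team, which he leads, received the 5th "China Overseas Chinese Contribution Award" and the collective award of the 12th Fujian Province "May Fourth Youth Medal". He has led more than 20 government-funded research projects, including a Major Project of the National Social Science Fund, National Social Science Fund projects, Ministry of Education humanities and social sciences projects, projects under the Commission of Science, Technology and Industry for National Defense, and key projects of the National Bureau of Statistics, and has led and completed more than 40 projects commissioned by companies and government bodies. He has published more than 110 academic papers; single-authored 6 monographs, including 《世纪之交中国统计学科的回顾与思考》 (A Review of and Reflections on the Discipline of Statistics in China at the Turn of the Century) and 《数据挖掘的统计方法及实践》 (Statistical Methods and Practice of Data Mining); served as chief editor of more than 10 textbooks, including 《应用多元统计分析》 (Applied Multivariate Statistical Analysis); and served as associate editor or contributor for 6 textbooks, including 《统计学》 (Statistics). Several of his works have received awards at the provincial or ministerial level or above.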