3 Azar 1403
Khodakaram Salimifard

Academic rank: Associate Professor
Address: Faculty of Business and Economics - Department of Industrial Management
Education: PhD / Operations Research
Phone: 07731222118
Faculty: Faculty of Business and Economics

Research details

Title: Machine learning algorithms to uncover risk factors of breast cancer: insights from a large case-control study
Research type: Journal article
Keywords
breast cancer, machine learning, risk factor, random forest, neural networks, bootstrap aggregating classification and regression tree, extreme gradient boosting
Journal: Frontiers in Oncology
DOI: https://doi.org/10.3389/fonc.2023.1276232
Researchers: Mostafa Dianati-Nasab (first author), Khodakaram Salimifard (second author), Reza Mohammadi (third author), Sara Saadatmand (fourth author), Mohammad Fararouei (fifth author), Behshid Jiavid-Sharifi (sixth or later author), Kosar Hosseini (sixth or later author), Thierry Chaussalet (sixth or later author), Samira Dehdar (sixth or later author)

Abstract

Introduction: This large case-control study explored the application of machine learning models to identify risk factors for primary invasive incident breast cancer (BC) in the Iranian population. By identifying both modifiable and non-modifiable risk factors, the study aims to support improved BC prevention, early detection, and management.

Methods: The dataset includes 1,009 cases and 1,009 controls, with comprehensive data on lifestyle, health-behavior, reproductive, and sociodemographic factors. Four machine learning models were used to analyze the data: Random Forest (RF), Neural Networks (NN), Bootstrap Aggregating Classification and Regression Trees (Bagged CART), and Extreme Gradient Boosting Tree (XGBoost).

Results: The findings highlight the significance of chest X-ray history, deliberate weight loss, abortion history, and post-menopausal status as predictors. Second-hand smoking, lower education, menarche age (>14 years), occupation (employed), age at first delivery (18-23 years), and breastfeeding duration (>42 months) were also identified as important predictors in multiple models. Based on the Receiver Operating Characteristic (ROC) curve, the RF model had the highest Area Under the Curve (AUC), 0.90, followed closely by the Bagged CART model (AUC 0.89). The XGBoost model achieved a lower AUC of 0.78, and the NN model had the lowest AUC, 0.74. The RF model also achieved an accuracy of 83.9% and a Kappa coefficient of 0.678, whereas XGBoost achieved a lower accuracy of 82.5% and a lower Kappa coefficient of 0.6.

Conclusion: These findings could support targeted preventive measures for high-risk women, based on the main risk factors for BC.
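The abstract compares models using discrimination (ROC AUC) and agreement (accuracy and Cohen's Kappa) metrics. As a minimal sketch, not the authors' code, the Python below computes all three for binary labels, assuming 1 = case and 0 = control. AUC is computed here as the probability that a randomly chosen case receives a higher score than a randomly chosen control (ties count as one half), which equals the area under the ROC curve. Kappa is a fraction between -1 and 1, so a Kappa of 67.8% corresponds to 0.678. The function names, the 0.5 decision threshold, and the toy data are hypothetical and do not come from the study.

```python
from typing import Sequence


def accuracy_and_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for binary labels coded 0/1."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # 2x2 confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = n - tp - tn - fp
    p_o = (tp + tn) / n  # observed agreement, i.e. accuracy
    # Agreement expected by chance: P(true=1)P(pred=1) + P(true=0)P(pred=0)
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")
    return p_o, kappa


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data (not from the study): 1 = case, 0 = control
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    acc, kappa = accuracy_and_kappa(y_true, y_pred)
    print(round(acc, 4), round(kappa, 4), round(roc_auc(y_true, scores), 4))  # 0.6667 0.3333 0.8889
```

The example also shows why the two kinds of metric can rank models differently: AUC uses the continuous scores across all thresholds, whereas accuracy and Kappa depend on one chosen threshold.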