Comparison of machine learning classification algorithms and using the bootstrap method in disease prediction

Kalkan, Seda Bağdatlı
Kaba, Gamze
2024-10-10
2024-10-10
2022
https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=kIrIdtdJ31bRgjb6fHvMUe7lIaK6DetDKWChCV69gXb1K3cb5hypAhOmfpNWlEsA
https://hdl.handle.net/11467/7735
Institute of Science, Department of Statistics (Fen Bilimleri Enstitüsü, İstatistik Ana Bilim Dalı)

Recording health data over many years produces very large data sets. Machine learning methods can classify these data and make them easier to interpret, and they also make it possible to predict the diagnosis of many diseases. This study evaluates various risk factors for the early diagnosis of cardiovascular disease, currently the leading cause of death globally. Early diagnosis matters in health care because it speeds up the treatment process. The data set used in the study was obtained through the Kaggle platform and combines 5 different data sets from the "UCI Machine Learning Repository" database under 11 common features. Five machine learning classification methods were used (Naive Bayes, Logistic Regression, Random Forest, K-Nearest Neighbors, and Support Vector Machines), and the performance of the resulting models was compared. The aim was to determine which supervised machine learning model best predicts heart disease. Possible risk factors that may affect an individual's probability of having heart disease were examined. One of the main goals of the study is to increase the reliability and predictive accuracy of the classification methods; for this purpose, the Bootstrap resampling method was applied to the data set. The success of each classification method was compared on the raw data and on the bootstrap samples using model performance measures. The Random Forest algorithm produced the most successful model.

tr
info:eu-repo/semantics/openAccess
İstatistik
Statistics
Hastalık tahmininde makine öğrenmesi sınıflandırma algoritmalarının karşılaştırılması ve bootstrap metodu kullanımı
Comparison of machine learning classification algorithms and using the bootstrap method in disease prediction
Master Thesis
199771612
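
The abstract names Bootstrap resampling as the way the classifiers' reliability was assessed, but the record does not give the procedure. The sketch below only illustrates the general idea under stated assumptions: rows are drawn with replacement, a model is fit on each resample, and accuracy is measured on the rows left out of that resample (out-of-bag evaluation). The nearest-centroid classifier, the synthetic two-class data, and all function names are hypothetical stand-ins chosen to keep the example dependency-free; they are not the thesis's models, data, or code.

```python
import random
from statistics import mean


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; also return the rows never drawn."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def fit_centroids(X, y):
    """Toy stand-in classifier (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def bootstrap_accuracy(X, y, n_rounds=200, seed=0):
    """Fit on each bootstrap resample and score on its out-of-bag rows."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        in_bag, oob = bootstrap_indices(len(X), rng)
        if not oob:  # extremely unlikely, but avoids division by zero
            continue
        model = fit_centroids([X[i] for i in in_bag], [y[i] for i in in_bag])
        hits = sum(predict(model, X[i]) == y[i] for i in oob)
        scores.append(hits / len(oob))
    return scores


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic two-feature, two-class data (not the thesis data set)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = sorted(bootstrap_accuracy(X, y))
low = scores[int(0.025 * len(scores))]
high = scores[int(0.975 * len(scores)) - 1]
print(f"mean accuracy {mean(scores):.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

Running the same loop once per classifier would give each method a distribution of scores rather than a single number, which is one way to compare models on resampled data as well as on the raw data.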