Cancer Classification Using ML

  • Michelle Preetham Department of Biomedical Engineering, Karunya Institute of Technology and Science, Coimbatore, Tamil Nadu, India
  • Tabitha Kusum Arun Department of Biomedical Engineering, Karunya Institute of Technology and Science, Coimbatore, Tamil Nadu, India
  • G. R. Ashisha Department of Biomedical Engineering, Karunya Institute of Technology and Science, Coimbatore, Tamil Nadu, India
Keywords: Somatic Mutation Classification, Cancer Genomic Profiling, Clinical Text Analytics, Supervised Machine Learning, Gradient Boosting (XGBoost), TCGA Dataset, Cancer Classification, Machine Learning, XGBoost, Genomic Data Analysis, Precision Oncology

Abstract

Personalized therapy based on the genetic features of tumors has replaced the one-size-fits-all approach to cancer treatment. This study presents a machine learning-based method for categorizing genetic mutations in cancer. The model classifies tumors into nine types based on their genetic mutations, with the aim of supporting clinical decision-making in individualized cancer treatment. Using a dataset obtained from The Cancer Genome Atlas (TCGA), we assessed four traditional machine learning algorithms (Chen, Yao, & Wang, 2015): Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost. Term Frequency-Inverse Document Frequency (TF-IDF) vectorization was used to preprocess the textual clinical data, transforming unstructured language into numerical features suitable for classification. Comparing training and testing results revealed a large discrepancy: training accuracy was 87.98% while test accuracy was 45.60%, a gap of roughly 42 percentage points. A gap of this size indicates overfitting, particularly for Random Forest and XGBoost, which exhibited the largest train-test divergence. According to our experimental findings, XGBoost outperformed all other examined classifiers in terms of accuracy, precision, recall, and F1-score while maintaining the lowest log loss. Such a classifier can help match each type of cancer with the treatment best suited to it. Because this categorization is currently done manually, automated classification could significantly decrease the human workload while increasing accuracy. In the future, additional classifiers could be evaluated to increase accuracy and support more reliable and effective treatment.
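
As a rough illustration of two quantities named in the abstract, the sketch below computes TF-IDF weights for a few short texts and the multiclass log loss for a set of predicted probabilities. It is a minimal pure-Python sketch, not the authors' pipeline: the example sentences, the whitespace tokenizer, the smoothed IDF formula, and the function names `tfidf` and `multiclass_log_loss` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights.

    Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1, one common variant.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total -= math.log(min(max(p[label], eps), 1.0))
    return total / len(y_true)


# Hypothetical clinical-text snippets, invented for this example.
docs = [
    "BRCA1 truncating mutation loss of function",
    "EGFR activating mutation kinase domain",
    "BRCA1 missense variant uncertain significance",
]
vocab, matrix = tfidf(docs)

# A classifier that spreads probability evenly over nine classes
# has a log loss of ln(9), a useful baseline for a nine-class task.
uniform = [[1.0 / 9.0] * 9 for _ in range(2)]
baseline = multiclass_log_loss([0, 3], uniform)
```

Terms that occur in fewer documents receive higher weights, so in the second snippet "egfr" outweighs "mutation". A model whose log loss is not clearly below the uniform baseline of ln(9) is adding little beyond chance on a nine-class problem.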

Published
2026-04-01
Section
Articles