Better Breast Cancer Care with AI

The Project

In this project, a novel deep learning (DL) model was built to predict whether a breast tumor is benign (non-cancerous) or malignant (cancerous). The dataset used was the Breast Cancer Wisconsin Diagnostic Dataset, which is publicly available and contains de-identified data. It has 569 breast tumor cases, 357 benign & 212 malignant. Each case contains 30 features computed from digitised images of cell nuclei from a breast biopsy. The model was written in Python using JupyterLab. The data was rescaled so that all feature values fall within a range of 0-1. The data was split into training & testing groups using four train/test split percentages (90/10, 75/25, 60/40 & 50/50). The multilayer perceptron model used in this project has an input layer, 3 hidden layers with 30 nodes each, & an output layer. The ReLU activation function was used. During training, the output errors were calculated & the weight values in the hidden layers were modified to reduce them. The process was repeated until the errors were minimised. The model was run 30 times for each train/test split to make the accuracy estimates reliable. The mean test accuracy & standard deviation for each train/test split over 30 trials were calculated. The mean test accuracies for train/test (%/%) groups 90/10, 75/25, 60/40 & 50/50 were 95.4%, 96.7%, 96.5% & 96.3% respectively. The standard deviations were 0.038, 0.016, 0.012 & 0.012 respectively. Pairwise t-tests between mean test accuracies showed no significant difference (p>0.05) among the 4 different train/test splits. Therefore, all train/test splits performed comparably. Based on these results, we can conclude that a DL model can be a reliable, efficient, convenient and inexpensive method for breast tumor prediction. These advantages, compared to traditional diagnosis by pathologists, may lead to quicker onset of treatment and a better outcome.
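
The evaluation steps described above (rescaling to 0-1, random train/test splits, 30 repeated runs, mean & standard deviation) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code. All function names are hypothetical, and `fit_and_score` is an assumed placeholder for whatever trains the model on the training group and returns its accuracy on the testing group.

```python
import random
import statistics


def min_max_scale(rows):
    """Rescale every feature column so its values fall within 0-1."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def split_indices(n, test_fraction, rng):
    """Shuffle the indices 0..n-1 and return (train, test) index lists."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def evaluate_split(X, y, test_fraction, fit_and_score, n_runs=30, seed=0):
    """Repeat a random train/test split n_runs times.

    fit_and_score(X_train, y_train, X_test, y_test) is a hypothetical
    callback that trains a model and returns its test accuracy.
    Returns the mean and sample standard deviation of those accuracies.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = split_indices(len(X), test_fraction, rng)
        scores.append(fit_and_score(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test],
        ))
    return statistics.mean(scores), statistics.stdev(scores)
```

Calling `evaluate_split` once for each test fraction (0.10, 0.25, 0.40 & 0.50) would give one mean and one standard deviation per split, matching the shape of the results reported above. The project does not name the library used for the network itself. One possible choice, stated here only as an assumption, is scikit-learn's `MLPClassifier` with `hidden_layer_sizes=(30, 30, 30)` and `activation="relu"`.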

Advanced
Identity
Health
Community

Team Comments

I chose to make this project because...

Breast cancer is the most common cancer & the second leading cause of cancer death in women. I have friends & family with breast cancer. Early detection is key to successful treatment. But diagnosis by pathologists is expensive & takes weeks, causing a delay in treatment. It is also prone to misdiagnosis.

What I found difficult and how I worked it out

Choosing the right machine learning model for breast tumor prediction was the hardest step. I had to learn about various models & their pros & cons. It was time-consuming & a steep learning curve. Initially, I tried many models in my project. Finally, I chose a deep learning model for its accuracy.

Next time, I would...

The dataset used here is small compared to what a DL model can handle. I would like to use a larger dataset to see how this DL model would perform. Also: 1. Use data from different geographical areas and socio-economic & ethnic backgrounds 2. Compare the DL model with other AI models 3. Try other algorithms

About the team

  • United States
  • Code Club

Team members

  • Anika