Breast Cancer / AI KNN

amyCC
Site Admin
Posts: 90
Joined: Wed Apr 01, 2020 4:02 pm
Occupation: Moderator
Project Question: *
Project Due Date: *
Project Status: Not applicable

Breast Cancer / AI KNN

Post by amyCC »

[Posting on behalf of parent/student]

My son, a 7th grader, is interested in doing the "Can AI diagnose breast cancer" project. https://www.sciencebuddies.org/science- ... ast-cancer

We are following the instructions as per the video listed.
In the video, at 2:21, Tracy says that we should normalize or scale the features that are specified in the project instructions. The objective states: "you will create a K-Nearest Neighbors (KNN) machine learning model that can predict whether a patient has a benign tumor or malignant breast cancer based on the characteristics of the tumor cell nucleus, such as its radius, perimeter, area, and smoothness."

In the video's section on preprocessing the data, all of the characteristics are listed for normalization. Are we supposed to type all of the characteristics or just the four mentioned above? When I entered only the four characteristics and ran the code, the output showed all of the characteristics, without normalization. When I entered all of the characteristics as in the video, the code ran and I could see the normalized values.

Could you please help me with the following questions:
1. Should we change the objective to cover all of the characteristics of the tumor cells? Or, if we keep the objective as written on the website, how do we run the code using only the four characteristics?

2. If we use the four characteristics to see whether the KNN model can help predict whether a patient has benign or malignant cancer, where can I see this data? The science fair where my son will be presenting asks students to show their data and results.

Re: Breast Cancer / AI KNN

Post by amyCC »

[Reply from Science Buddies staff scientist]

1. If you follow the original project procedure, you should enter all of the characteristics listed in that step so that the code in the later steps works. However, if you want to use just the four characteristics, you will have to drop the other features from the dataset using data.drop, which appears in the first code block under the "Preprocessing the data" section. You can simply add, in quotes, all of the other characteristics you don't plan to use.

2. I'm not sure what you mean by "see the data." You can view the data anytime by running print(data). The section "Visualize the Model" helps with seeing how accurate the model is in classifying whether a patient has benign or malignant breast cancer.
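
For readers following along, here is a minimal sketch of what dropping the unused characteristics with data.drop might look like. The small table and its column names (radius_mean, texture_mean, and so on) are made up for illustration and are assumptions; the project notebook loads its own data and may name the columns differently. One common reason the extra columns still appear afterward is that drop returns a new table rather than changing the original, so the result has to be assigned back.

```python
import pandas as pd

# Hypothetical stand-in for the project's `data` table. The values are
# invented and the column names are assumptions, not the notebook's own.
data = pd.DataFrame({
    "radius_mean": [14.0, 20.0, 11.0, 13.0],
    "perimeter_mean": [90.0, 130.0, 75.0, 85.0],
    "area_mean": [600.0, 1300.0, 400.0, 550.0],
    "smoothness_mean": [0.10, 0.08, 0.14, 0.09],
    "texture_mean": [10.0, 18.0, 20.0, 14.0],
    "concavity_mean": [0.30, 0.09, 0.24, 0.07],
    "diagnosis": ["M", "M", "B", "B"],
})

# The four characteristics named in the project objective, plus the label.
keep = ["radius_mean", "perimeter_mean", "area_mean", "smoothness_mean"]
label = "diagnosis"

# Everything that is neither a kept feature nor the label gets dropped.
to_drop = [col for col in data.columns if col not in keep + [label]]

# drop() returns a new DataFrame, so assign the result back to `data`.
# Without the assignment, `data` would still contain every column.
data = data.drop(columns=to_drop)

print(data.columns.tolist())
```

If the later steps of the notebook list feature names explicitly (for example in the normalization step), those lists would need to be shortened to the same four names.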
VIPawar7
Posts: 1
Joined: Thu Jan 11, 2024 3:14 pm
Occupation: Parent

Re: Breast Cancer / AI KNN

Post by VIPawar7 »

Thank you so much for your response.

We tried to run the code using only four characteristics, but it still shows all the characteristics after running. We will keep all the characteristics in the project that my son will present.

By "see the data," I meant viewing the data. I was able to view it when I chose to view the data fullscreen. That section shows 5 rows of data, but the training section has 568 rows and 10 columns. That is why I thought I was not able to view all of the other standardized rows.

Since we will be using the same graph that shows the 2D image of the data using PCA, does this graph have data from the 568 biopsy samples?

We appreciate your prompt response, and we thank the Science Buddies team for giving us the opportunity to do this wonderful science project!

Re: Breast Cancer / AI KNN

Post by amyCC »

Hi - I am passing on some additional information from our team in response to your post.

1. The output should not show the other characteristics after you drop them. To drop characteristics, replace the code in the first block of the "Preprocessing the Data" section. Here's the code for dropping columns:
[Image: dropcolumns.png]
2. Are you viewing the data by using print(data)? It should look like this (shown below). Notice that there is a backslash at the end, meaning that the data keeps going. You will have to scroll down within the cell to view more.
[Images: image (14).png, image (15).png]
You could also download and view the dataset as a CSV file here: https://www.kaggle.com/datasets/nancyal ... er-dataset

3. The graph does have data from all 568 samples.
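
As a rough sketch of points 2 and 3, the code below shows one way to print every row of a standardized table and to confirm that a 2D PCA projection has one point per row. It uses a small random table in place of the real dataset; the column names, the 20-row size, and the variable names are assumptions for illustration and are not taken from the project notebook.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 20 random rows and 4 made-up feature columns.
rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.normal(size=(20, 4)),
    columns=["radius_mean", "perimeter_mean", "area_mean", "smoothness_mean"],
)

# Standardize each column to mean 0 and standard deviation 1.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(features),
    columns=features.columns,
)

# print(scaled) may show only a shortened preview of a large table;
# to_string() renders every row, so nothing is hidden.
full_text = scaled.to_string()
print(full_text)

# Project the standardized rows onto two principal components.
points = PCA(n_components=2).fit_transform(scaled)

# The projection has exactly one 2D point per input row.
print(scaled.shape, points.shape)
```

Comparing the two printed shapes is a quick check that every row passed to PCA ends up as a point on the graph.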

I hope this helps. Let us know how the project goes!

Amy
Science Buddies