Research Overview

This project focuses on improving the early detection of lung cancer using AI and machine learning techniques.

Lung cancer is a leading cause of cancer-related deaths worldwide. Early detection is crucial for improving survival rates. Traditional diagnostic methods, such as CT scans, often struggle with accuracy, leading to misdiagnosis or delayed treatment. This research explores the use of ensemble modeling (EM) techniques in Convolutional Neural Networks (CNNs) to enhance the classification of lung cancer subtypes from CT scan images.

Research Question

Can an ensemble of Convolutional Neural Networks that use different activation functions (which determine how a model learns) classify lung cancer subtypes from CT scan images more accurately than individual models?
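To make the idea of an activation function concrete, here is a minimal sketch in plain Python. The three functions shown (ReLU, sigmoid, and tanh) are common choices and are an assumption for illustration only; this overview does not state which activation functions the project's models used.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: keeps positive inputs, maps negatives to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes any input into the range (-1, 1)."""
    return math.tanh(x)


# Illustrative lookup table; the names are hypothetical, not from the project.
ACTIVATIONS = {"relu": relu, "sigmoid": sigmoid, "tanh": tanh}


def apply_activation(name, values):
    """Apply the named activation to each pre-activation value."""
    fn = ACTIVATIONS[name]
    return [fn(v) for v in values]


# The same neuron inputs produce different outputs under each activation,
# which is why models that differ only in activation can learn differently.
pre_activations = [-2.0, 0.0, 2.0]
for name in ACTIVATIONS:
    outputs = apply_activation(name, pre_activations)
    print(name, [round(v, 3) for v in outputs])
```

Because each function transforms the same inputs differently, networks that are otherwise identical can make different mistakes, which is what an ensemble tries to exploit.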

Hypothesis

An ensemble of CNN models with different activation functions will classify lung cancer subtypes from CT scan images with greater accuracy and reliability than individual models.

Data Collection

The experiment compares multiple CNN architectures, both as standalone models and with ensemble modeling, to evaluate their effectiveness. Each model was assessed using standard classification metrics (accuracy, precision, and others), with testing conducted across two platforms to ensure reliability.
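As a rough illustration of the two metrics named above, the sketch below computes accuracy and per-class precision in plain Python. The label names and the tiny example predictions are made up for illustration, and the project's actual metric code is not shown in this overview.

```python
from collections import Counter

# Hypothetical label names for the four categories; not taken from the project code.
CLASSES = ["normal", "adenocarcinoma", "large_cell", "squamous_cell"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_precision(y_true, y_pred, classes):
    """For each class: correct predictions of that class / all predictions of it."""
    true_positives = Counter()
    predicted = Counter()
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            true_positives[p] += 1
    # A class that was never predicted gets precision 0.0 by convention here.
    return {
        c: (true_positives[c] / predicted[c] if predicted[c] else 0.0)
        for c in classes
    }


# Made-up example: four scans, one misclassified.
y_true = ["normal", "adenocarcinoma", "adenocarcinoma", "squamous_cell"]
y_pred = ["normal", "adenocarcinoma", "squamous_cell", "squamous_cell"]

print("accuracy:", accuracy(y_true, y_pred))
print("precision:", per_class_precision(y_true, y_pred, CLASSES))
```

Accuracy summarizes overall correctness, while per-class precision shows how trustworthy a prediction of a specific subtype is.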

Modeling

The following models were trained and tested with and without ensemble modeling: a basic CNN that follows no mainstream architecture, AlexNet, VGG16, and VGG19 (the last three being mainstream architectures widely used in practice).
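One common way to combine several models is soft voting, where each model's class probabilities are averaged and the highest average wins. The sketch below assumes that approach purely for illustration; this overview does not say how the project combined its models, and the probability values are invented.

```python
def soft_vote(model_probs):
    """Average the class probabilities predicted by several models."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [
        sum(probs[c] for probs in model_probs) / n_models
        for c in range(n_classes)
    ]


def predict_class(probabilities):
    """Return the index of the class with the highest probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Hypothetical softmax outputs for one CT image from three CNNs that differ
# only in activation function. Class order (assumed): no cancer,
# adenocarcinoma, large cell carcinoma, squamous cell carcinoma.
probs_relu = [0.10, 0.60, 0.20, 0.10]
probs_tanh = [0.05, 0.30, 0.55, 0.10]
probs_sigmoid = [0.10, 0.50, 0.30, 0.10]

ensemble = soft_vote([probs_relu, probs_tanh, probs_sigmoid])

print("tanh model alone predicts class", predict_class(probs_tanh))
print("ensemble predicts class", predict_class(ensemble))
```

In this invented example one model disagrees with the other two, and averaging lets the majority view prevail, which is the intuition behind expecting an ensemble to be more reliable than any single model.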

The models were trained on the Kaggle dataset "Chest CT-Scan images Dataset", which contains 1,000 chest CT scan images labeled by type (no cancer, adenocarcinoma, large cell carcinoma, and squamous cell carcinoma).

Sample image of a chest CT scan showing adenocarcinoma lung cancer.

Execution

Python served as the primary tool for model development, and the relevant performance metrics were recorded automatically during testing.

To minimize bias and data error, each model was trained and evaluated twice, once on the researcher's computer and once on Google Colab (a free online service), to account for computational variations.

Results

To view more of the results, read the full article!

The graph above highlights the differences in accuracy between CNN architectures, both with and without ensemble modeling. The most striking result was the 55.6% relative improvement in accuracy for VGG16 when ensemble modeling was applied (from 41.11% to 63.97%), visible as the difference in height between the light-green bars. This suggests that VGG16, which performed poorly as a standalone model, benefited greatly from ensemble modeling. More broadly, it indicates that ensemble modeling techniques have great potential to improve existing AI-powered lung cancer detection solutions.
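The 55.6% figure is a relative improvement, not a gain in percentage points. The short check below reproduces it from the two VGG16 accuracies reported above; the function name is illustrative.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100.0


standalone_accuracy = 41.11  # VGG16 without ensemble modeling (%)
ensemble_accuracy = 63.97    # VGG16 with ensemble modeling (%)

absolute_gain = ensemble_accuracy - standalone_accuracy
relative_gain = relative_improvement(standalone_accuracy, ensemble_accuracy)

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The absolute gain is 22.86 percentage points, which is about 55.6% of the standalone accuracy.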