This case study highlights the research conducted for my Master's thesis addressing the critical issue of fairness in machine learning (ML) for predicting hospital readmissions. The project sought to leverage ML's potential while ensuring equitable predictions for all patients, particularly in the context of rising diabetes prevalence.

Algorithmic bias in healthcare can worsen existing inequalities, potentially leading to adverse outcomes for certain patient groups.
ML models learn from historical data, which often contains biases. If not addressed, these biases are replicated and even amplified in the models' predictions.

The method of unawareness (dropping sensitive attributes, such as race, during model training) is insufficient: it can mask critical disparities that need to be addressed.
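As a minimal sketch of what "unawareness" means in practice, the snippet below drops a sensitive attribute before training. The column names and toy values are illustrative, not taken from the study's actual feature set; the point is that the model never sees `race`, yet proxy features correlated with it can still carry the bias, and without the attribute those disparities can no longer even be measured.

```python
import pandas as pd

# Toy data with hypothetical feature names (not the study's real schema)
data = pd.DataFrame({
    "num_medications": [12, 5, 20, 8],
    "time_in_hospital": [3, 1, 7, 2],
    "race": ["Caucasian", "AfricanAmerican", "Caucasian", "Asian"],
    "readmitted": [1, 0, 1, 0],
})

SENSITIVE = ["race"]
TARGET = "readmitted"

# "Unawareness": the sensitive attribute is simply removed from the features
X = data.drop(columns=SENSITIVE + [TARGET])
y = data[TARGET]

# The model never sees race -- but correlated proxy features remain,
# and per-group fairness metrics can no longer be computed from X alone.
print(list(X.columns))
```

Note the trade-off this exposes: removing the attribute does not remove the signal, it only removes the ability to audit it.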
This research specifically examined the impact of race on predictions for diabetic patient readmissions.
The aim was to develop models that are accurate in identifying patients needing readmission while ensuring fairness across racial groups.
The project focused on understanding how data pre-processing and the use of sensitive attributes could impact both model accuracy and fairness.
The study used the "Diabetes 130-Hospitals" dataset from the UCI Machine Learning Repository, encompassing ten years of data from 130 US hospitals and representing nearly 100,000 patient encounters.

The dataset presented a significant challenge due to a highly imbalanced racial distribution, with a predominance of Caucasian patients.
This imbalance raised concerns about the models' ability to generalise well to minority groups.
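A first step in quantifying that concern is simply auditing the group distribution. The sketch below uses illustrative counts chosen only to mimic a heavy skew, not the dataset's exact figures; with the real data, the same `value_counts` call on the `race` column would reveal how thin the minority-group samples are.

```python
import pandas as pd

# Illustrative counts mimicking a heavily skewed racial distribution
# (placeholder numbers, NOT the dataset's actual proportions)
race = pd.Series(
    ["Caucasian"] * 76
    + ["AfricanAmerican"] * 19
    + ["Hispanic"] * 2
    + ["Other"] * 2
    + ["Asian"] * 1,
    name="race",
)

# Normalised shares per group: the basis for deciding whether
# per-group evaluation (rather than aggregate accuracy) is needed
shares = race.value_counts(normalize=True)
print(shares)
```

When one group dominates like this, an aggregate accuracy score is driven almost entirely by the majority group, which is precisely why per-group metrics were needed.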

Two novel data engineering approaches were developed to address the challenges posed by the imbalanced dataset: