Mitigating Dataset Bias in Machine Learning: A Comparative Study of Reweighting, Data Augmentation, and Adversarial Debiasing Techniques

Authors

  • Mission Franklin

Abstract

Bias in machine learning datasets poses a critical challenge to the development of fair and reliable AI systems. Imbalanced or unrepresentative data can lead models to perpetuate or even amplify existing societal disparities, particularly in sensitive domains such as criminal justice, healthcare, and recruitment. This study systematically investigates and compares three prominent techniques for mitigating the discriminatory impact of biased training data: reweighting, data augmentation, and adversarial debiasing. Using publicly available datasets, including the UCI Adult and COMPAS datasets, each technique was implemented within a controlled experimental framework. Evaluation employed fairness metrics, including statistical parity difference, equal opportunity difference, and disparate impact, alongside conventional performance measures such as accuracy and F1-score. Results indicate that reweighting provides a straightforward approach with moderate fairness improvements, whereas adversarial debiasing consistently achieves a superior balance between fairness and predictive performance. Data augmentation yielded variable results depending on dataset characteristics and complexity. This comparative analysis underscores the inherent trade-offs in fairness-oriented learning and suggests that adversarial debiasing offers a more robust solution for practical applications. The findings contribute to ongoing discussions on ethical AI design and offer actionable guidance for developers and policymakers seeking to promote equitable algorithmic decision-making.
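The three fairness metrics named in the abstract have standard definitions. A minimal Python sketch of those definitions follows; the function names and the 0 = unprivileged / 1 = privileged group coding are illustrative assumptions, not the paper's implementation:

```python
def _selection_rate(y_pred, group, g):
    """Fraction of positive predictions within protected group g."""
    preds = [p for p, a in zip(y_pred, group) if a == g]
    return sum(preds) / len(preds)

def statistical_parity_difference(y_pred, group, unpriv=0, priv=1):
    # P(yhat=1 | A=unpriv) - P(yhat=1 | A=priv); 0 means equal selection rates.
    return (_selection_rate(y_pred, group, unpriv)
            - _selection_rate(y_pred, group, priv))

def disparate_impact(y_pred, group, unpriv=0, priv=1):
    # Ratio of the two selection rates; the "80% rule" flags values below 0.8.
    return (_selection_rate(y_pred, group, unpriv)
            / _selection_rate(y_pred, group, priv))

def _true_positive_rate(y_true, y_pred, group, g):
    """True-positive rate (recall) within protected group g."""
    hits = [p for t, p, a in zip(y_true, y_pred, group) if a == g and t == 1]
    return sum(hits) / len(hits)

def equal_opportunity_difference(y_true, y_pred, group, unpriv=0, priv=1):
    # Gap in true-positive rates between groups; 0 means equal opportunity.
    return (_true_positive_rate(y_true, y_pred, group, unpriv)
            - _true_positive_rate(y_true, y_pred, group, priv))
```

For example, with labels `y_true = [1, 1, 0, 0, 1, 1, 1, 0]`, predictions `y_pred = [1, 0, 0, 0, 1, 1, 0, 1]`, and groups `group = [0, 0, 0, 0, 1, 1, 1, 1]`, the unprivileged selection rate is 0.25 against 0.75 for the privileged group, giving a statistical parity difference of -0.5 and a disparate impact of 1/3, well below the 0.8 threshold.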

Published

2025-10-08