Addressing Bias in AI: How Educators Can Train Students to Detect Flaws in Datasets
Leo Anthony Celi, a senior research scientist at MIT’s Institute for Medical Engineering and Science, a physician at Beth Israel Deaconess Medical Center, and an associate professor at Harvard Medical School, highlights a critical oversight in AI education: students are rarely trained to detect bias in the datasets they work with. Celi’s research, published in a new paper, underscores the importance of closing this gap.

Bias in AI datasets often originates in flawed or insufficient data collection, particularly in medicine, where clinical trials have primarily enrolled white males and neglected the diversity of the patient population. One prominent example is the pulse oximeter, which tends to overestimate oxygen levels in people with darker skin tones because people of color were underrepresented in the clinical studies used to calibrate the device. More broadly, Celi points out that medical devices and equipment are generally optimized for healthy young males and are not rigorously tested on older or sicker patients, which can lead to significant inaccuracies when they are applied to the wider population.

Electronic health records (EHRs), a crucial source of data for AI models, pose their own challenges. Designed for administrative purposes rather than data analysis, they cannot be treated as reliable inputs without thorough examination. EHRs are often incomplete or skewed by social determinants of health and by implicit biases among healthcare providers, which further compounds the problem of biased datasets. To mitigate these issues, Celi suggests developing transformer models that better capture the relationships between data points, reducing the impact of missing or inaccurate values.

Celi and his team at MIT began their AI course in 2016 and quickly realized that students were focusing on building models rather than evaluating the quality of the data behind them. A review of 11 online courses revealed that only five mentioned data bias at all, and only two covered it in any depth. This is a significant gap in AI education, and one that must be addressed to prevent the creation of biased and potentially harmful models.

To tackle the issue, course developers should prioritize teaching students how to critically assess the data they use, guided by a comprehensive checklist of questions: Where did the data come from? What are the demographics of the patients involved? How accurate and consistent were the measurement devices? Courses should also emphasize the institutional and social context of data collection, recognizing that certain patient groups may be underrepresented or excluded altogether. (Sketches of what such checks might look like in code follow below.)

Celi advocates a hands-on approach through datathons, collaborative events where healthcare professionals and data scientists come together to analyze and improve datasets. These gatherings foster a diverse, intergenerational environment that naturally promotes critical thinking and problem-solving, and working with local datasets helps participants build models that are more relevant and accurate for their own communities. Celi notes that initial resistance often stems from a fear of discovering how inadequate the data is, but he emphasizes that acknowledging these issues is the first step toward fixing them.
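In practice, the first items on such a checklist, who is in the data and where it is missing, can be automated. The following is a minimal sketch, not code from Celi’s paper; it assumes the dataset is a pandas DataFrame, and demographic column names such as `race` or `sex` are placeholders to adapt to the schema at hand:

```python
import pandas as pd

def audit_cohort(df: pd.DataFrame, demographic_cols: list[str]) -> None:
    """Quick representation and missingness audit for a clinical dataset."""
    n = len(df)
    print(f"Total records: {n}")

    # Who is in the data? Subgroups with very low counts are a warning
    # sign for any model that will be deployed on the full population.
    for col in demographic_cols:
        shares = 100 * df[col].value_counts(dropna=False) / n
        print(f"\nRepresentation by {col} (% of cohort):")
        print(shares.round(1))

    # Where is the data missing? Missingness that concentrates in one
    # subgroup often reflects access to care rather than biology.
    print("\nMissing values by column (%):")
    print((100 * df.isna().mean()).round(1).sort_values(ascending=False))
```

Running an audit like this before any modeling makes underrepresentation and skewed missingness visible at a glance, before they are baked into a model.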
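The pulse-oximeter example suggests a further check: comparing device readings against a gold standard across subgroups. The sketch below is illustrative only; the SpO2, SaO2, and skin-tone column names are hypothetical stand-ins for whatever paired measurements a dataset actually contains:

```python
def device_error_by_group(df: pd.DataFrame, reading: str,
                          reference: str, group: str) -> pd.DataFrame:
    """Mean device error (reading minus gold standard) per subgroup.

    For a pulse oximeter, `reading` would be the SpO2 column, `reference`
    the arterial-blood-gas SaO2, and `group` a skin-tone column; all three
    names are placeholders. A consistently positive mean error in one
    group suggests the device was calibrated on an unrepresentative
    population.
    """
    error = df[reading] - df[reference]
    return error.groupby(df[group]).agg(["mean", "std", "count"])
```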
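Celi’s transformer suggestion could take many architectural forms, and his paper’s exact design is not reproduced here. As one illustrative possibility in PyTorch, each lab value can be treated as a token, with missing entries replaced by a learned mask embedding that self-attention then reconstructs from the observed features:

```python
import torch
import torch.nn as nn

class EHRImputer(nn.Module):
    """Sketch of attention-based imputation for tabular EHR data.

    Each of the n_features lab values becomes one token. Missing values
    are replaced by a learned mask embedding, and self-attention lets the
    observed features drive the reconstruction of the absent ones.
    Illustrative only; this is not the architecture from Celi's paper.
    """

    def __init__(self, n_features: int, d_model: int = 64):
        super().__init__()
        self.value_proj = nn.Linear(1, d_model)  # embed each scalar value
        # one identity embedding per feature ("which lab is this?")
        self.feature_emb = nn.Parameter(torch.randn(n_features, d_model) * 0.02)
        self.mask_emb = nn.Parameter(torch.randn(d_model) * 0.02)  # stands in for missing
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # reconstruct each value

    def forward(self, values: torch.Tensor, missing: torch.Tensor) -> torch.Tensor:
        # values:  (batch, n_features), with missing entries zero-filled
        # missing: (batch, n_features) boolean, True where a value is absent
        tok = self.value_proj(values.unsqueeze(-1))      # (B, F, d_model)
        tok = torch.where(missing.unsqueeze(-1), self.mask_emb.expand_as(tok), tok)
        tok = tok + self.feature_emb                     # broadcast over batch
        return self.head(self.encoder(tok)).squeeze(-1)  # (B, F) imputed values
```

Training such a model would minimize reconstruction error only on entries that were actually observed, so it never treats its own imputations as ground truth.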
MIMIC, a medical database built at Beth Israel Deaconess Medical Center, serves as a case study: it took a decade of continuous feedback on its early flaws to arrive at a reliable schema. Celi finds inspiration in blog posts from datathon attendees, who describe both a newfound excitement about the field’s potential and a new awareness of the risks of poor data practices.

Experts like Celi agree that teaching students to critically evaluate their datasets is essential for the responsible and effective deployment of AI in healthcare. This approach not only enhances the reliability and fairness of AI models but also prepares students for the real-world complexities of data science, and incorporating such training into AI courses can significantly reduce the likelihood of producing biased algorithms that harm marginalized populations. MIT and its partners have been leading the charge in this direction, aiming to set a new standard in AI education and practice.