Health care AI systems are biased

Thanks to the progress of artificial intelligence (AI) and machine learning, computer systems can now do many impressive things, such as:

  • Diagnose skin cancer – like a dermatologist would.
  • Pick out a stroke on a CT scan – like a radiologist.
  • Detect potential cancers on a colonoscopy – like a gastroenterologist.

These digital diagnosticians offer faster, cheaper, and more efficient diagnosis and care. However, there is a concern that these technologies could also perpetuate biases in medicine.

As the country continues to confront systemic bias in fundamental societal institutions, we need technology to minimize health disparities rather than aggravate them. It has been widely recognized that AI algorithms trained on data that fail to represent the entire population tend to perform poorly for underrepresented groups. For instance, algorithms trained on gender-imbalanced data struggle to read chest x-rays for the underrepresented gender. Likewise, there are concerns that skin cancer detection algorithms, which are trained primarily on individuals with lighter skin, may struggle to detect skin cancer in people with darker skin.

To avoid the serious consequences of incorrect decisions, medical AI algorithms must be trained using data sets that represent diverse populations. Yet, such diverse training is not taking place. A study published in JAMA examined over 70 publications that compared the diagnostic accuracy of doctors to that of digital systems across various areas of clinical medicine. The study revealed that most of the data used to train these AI algorithms was sourced from only three states: California, New York, and Massachusetts.

Medical AI suffers from a lack of data diversity, whether in terms of race, gender, or geography. The underlying problem is that researchers find it difficult to obtain large, varied medical data sets, and algorithms trained on narrow data can turn out biased.

Data availability remains a challenge in medicine. One of our patients, a veteran, expressed his frustration with obtaining his prior medical records by asking, “Doc, why is it that we can see a specific car in a moving convoy on the other side of the world, but we can’t see my CT scan from the hospital across the street?” Sharing medical data, even for a single patient, requires a significant amount of effort. Collecting the hundreds or thousands of cases needed to train machine learning algorithms is even harder. Medical data are often locked in silos, which makes them difficult to access for patient care or for developing AI tools.

Sharing medical data should be more common, but privacy laws and the importance of protecting sensitive data make it challenging. Economic factors can also contribute to data sequestration, with hospitals being reluctant to share data for fear of losing patients to competitors. Interoperability issues between different medical records systems can also be a technical barrier to sharing data. Additionally, there is public concern about the use of personal data by big tech, which has made people skeptical of any attempts to aggregate personal data, even for beneficial purposes.

A lack of diversity in medical data is not a new issue. Women and minority groups have historically been underrepresented as study participants, and evidence later showed that these groups experienced fewer benefits and more side effects from approved medications. Addressing this problem has required a joint effort from the NIH, FDA, researchers, and industry, as well as an act of Congress in 1993. Despite the progress made, it remains an ongoing issue. In fact, one of the companies working towards a COVID vaccine recently announced a delay to recruit more diverse participants, highlighting the importance of diversity in medical research.

AI is being increasingly used as an expert in various high-stakes domains beyond medicine. For example, AI tools can:

  • Assist judges in making sentencing decisions.
  • Direct law enforcement efforts.
  • Provide recommendations to bank officers about loan applications.

Nonetheless, before algorithms become an integral part of such high-stakes decisions, it is crucial to identify and mitigate any inherent biases.

The issue of bias in AI is multifaceted, and diversifying training data alone may not suffice to eliminate it. Other concerns include:

  • Lack of diversity among developers and funders.
  • Framing problems from majority group perspectives.
  • Biased assumptions about data.
  • The possibility of using AI outputs to perpetuate biases.

Due to the difficulty of obtaining high-quality data, researchers are developing algorithms that can achieve more with less. This innovation may lead to new ways of reducing AI’s dependence on large data sets. However, ensuring diversity in the data used to train algorithms remains crucial to understanding and addressing AI biases.

To create powerful and fair algorithms, it is essential to build the technical, regulatory, economic, and privacy infrastructure needed to deliver the large and diverse data sets required to train them. It is no longer acceptable to build and deploy tools with whatever data is available, ignoring the consequences and hoping for progress. We must anticipate those consequences and work to prevent them.

