Hello and welcome to Eye on AI. In today’s edition…An international initiative aims to tackle bias in medical AI algorithms; Europe’s privacy regulators say training on internet data might pass muster with GDPR—but the hurdles for doing so are high; Geopolitical tensions impede the flow of AI talent from China to the U.S.; Character.ai comes under fire for disturbing content (again); and AI startups hog all the fundraising.
As I’ve written about in this newsletter many times, AI is sweeping the healthcare industry—from drug discovery to AI-enhanced mammograms to transcription of clinical medical documents.
Long before hallucinations and the other risks brought to the forefront by the generative AI boom, there was already widespread evidence of bias in AI algorithms, which are often less accurate for certain groups, such as women and people of color. Now, as AI companies and healthcare providers increasingly integrate AI into patient care, ways to evaluate and address those biases are needed more than ever.
Yesterday, an international initiative called "STANDING Together (STANdards for data Diversity, INclusivity and Generalizability)" released recommendations to address bias in medical AI technologies, hoping to “drive further progress towards AI health technologies that are not just safe on average, but safe for all.” Published in The Lancet Digital Health and NEJM AI—along with a commentary by the initiative’s patient representatives published in Nature Medicine—the recommendations follow a research study involving more than 30 institutions and 350 experts from 58 countries.
The recommendations largely deal with transparency, training data, and how AI medical technologies should be tested for bias, targeting both those who curate datasets and those who use the datasets to create AI systems.
The problem
Before getting to recommendations, let’s review the problem.
Overall, algorithms created to detect illness and injury tend to underperform on underrepresented groups like women and people of color. For example, technologies that use algorithms to detect skin cancer have been found to be less accurate for people with darker skin, while a liver disease detection algorithm was found to underperform for women. One bombshell study revealed that a clinical algorithm used widely by hospitals required Black patients to be much sicker before it recommended they receive the same care it recommended for white patients who were not as ill. Similar biases have been uncovered in algorithms used to determine resource allocation, such as how much assistance people with disabilities receive. These are just a handful of many examples.
The cause of these problems most often lies in the data used to train AI algorithms. That data is itself frequently incomplete or distorted: women and people of color have historically been underrepresented in medical studies. In other cases, algorithms fail because they are trained on data meant to serve as a proxy for some other piece of information, and the proxy turns out not to capture what the AI system is actually supposed to measure. The hospital algorithm that denied Black patients the same level of care as white patients failed because it was trained to predict healthcare costs as a proxy for patients' health needs. Because hospital systems have historically spent less on healthcare for Black patients at every level of care, the model systematically underestimated how sick Black patients actually were.
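To make the proxy problem concrete, here's a minimal synthetic sketch in Python. It is not the study's actual model or data, and the numbers are invented, but it shows how ranking patients by predicted spending can require one group to be sicker before it gets flagged for extra care, if that group has historically had less spent on it at the same level of illness.

```python
# Illustrative only: synthetic data, not the real study's model or dataset.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True illness severity is identically distributed across both groups.
severity = rng.gamma(shape=2.0, scale=1.0, size=n)
group = rng.integers(0, 2, size=n)  # 0 = group A, 1 = group B

# Historically, less has been spent on group B at the same severity level.
spending_factor = np.where(group == 0, 1.0, 0.7)
cost = severity * spending_factor + rng.normal(0, 0.1, size=n)

# Use cost as a stand-in for a well-fit model's predicted spending, and flag
# the top 10% of predicted spenders for extra care management.
flagged = cost >= np.quantile(cost, 0.9)

for g, label in [(0, "group A"), (1, "group B")]:
    mean_severity = severity[(group == g) & flagged].mean()
    print(f"{label}: mean true severity among flagged patients = {mean_severity:.2f}")

# Group B's flagged patients come out sicker on average, i.e. they had to be
# sicker than group A patients to qualify for the same level of care.
```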
Suggested solutions
The collective behind the study issued 29 recommendations — 18 aimed at dataset curators and 11 aimed at data users.
For dataset curators, the paper recommends that dataset documentation include a plain-language summary of the dataset, indicate which groups are present, address any missing data, identify known or expected sources of bias or error, make clear who created and funded the dataset, and flag any purposes for which the dataset should not be used, among other steps to increase transparency and provide context.
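For a rough sense of what that could look like in practice, here's a hypothetical, machine-readable "datasheet" sketch. The field names and example values are mine, not the paper's, and the recommendations don't prescribe any particular format.

```python
# Hypothetical sketch of a machine-readable dataset summary reflecting the
# transparency items above. Field names and values are illustrative only.
from dataclasses import dataclass

@dataclass
class DatasetDatasheet:
    plain_language_summary: str
    groups_represented: dict      # group label -> number of records
    known_missing_data: list      # variables or groups with gaps
    known_bias_sources: list      # known or expected sources of bias or error
    created_by: str
    funded_by: str
    unsuitable_uses: list         # purposes for which use should be avoided

sheet = DatasetDatasheet(
    plain_language_summary="Chest X-rays from two urban hospitals, 2015-2020.",
    groups_represented={"female": 4200, "male": 5800, "age 65+": 3100},
    known_missing_data=["self-reported ethnicity missing for ~20% of records"],
    known_bias_sources=["single geographic region", "few portable-scanner images"],
    created_by="Example Hospital Research Group",
    funded_by="Example Foundation",
    unsuitable_uses=["pediatric screening", "deployment outside hospital settings"],
)

print(sheet.plain_language_summary)
```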
For data users, the recommendations state that they should identify and transparently report areas of under-representation, evaluate performance for contextualised groups, acknowledge known biases and limitations (and their implications), and manage uncertainties and risks throughout the lifecycle of AI health technologies, including documentation at every step.
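Evaluating performance for contextualised groups can be as simple as computing the same metric separately for each subgroup, so that gaps show up rather than being averaged away. The sketch below assumes scikit-learn and synthetic data; neither comes from the paper.

```python
# Illustrative sketch of per-group performance reporting (synthetic data;
# scikit-learn is an assumption, not something the recommendations mandate).
import numpy as np
from sklearn.metrics import roc_auc_score

def report_by_group(y_true, y_score, groups):
    """Report AUROC separately for each subgroup so weak spots aren't averaged away."""
    for g in np.unique(groups):
        mask = groups == g
        auc = roc_auc_score(y_true[mask], y_score[mask])
        print(f"{g}: n={mask.sum()}, AUROC={auc:.3f}")

# Synthetic example.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=2000)
groups = rng.choice(["group_a", "group_b"], size=2000)
# Make the score deliberately noisier for group_b to show the gap.
noise = np.where(groups == "group_b", 0.45, 0.15)
y_score = y_true * 0.5 + rng.normal(0, 1, size=2000) * noise

report_by_group(y_true, y_score, groups)
```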
Overarching themes include a call to inquire proactively and be transparent, and a reminder to stay sensitive to context and complexity. “If bias encoding cannot be avoided at the algorithm stage, its identification enables a range of stakeholders relevant to the AI health technology's use (developers, regulators, health policy makers, and end users) to acknowledge and mitigate the translation of bias into harm,” the paper reads.
Will guidelines translate into action?
As with every emerging use of AI, there's a delicate balance to strike between potential benefits, known risks, and responsible implementation. The stakes are high, and that's especially true when it comes to medical care.
This paper is not the first to try to tackle bias in AI health technologies, but it is among the most comprehensive and arrives at a critical time. The authors write that the recommendations are not intended to be a checklist, but rather to prompt proactive inquiry. But let’s be real: the only way to be certain these lessons will be applied is through regulation.
And with that, here’s more AI news.
Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com