ChatGPT remains a hot topic for many people, as the potential of what it can achieve becomes ever more evident.
However, each new revelation about what ChatGPT (and other similar artificial intelligence (AI) chatbots) can achieve appears to focus on English and English-based languages (such as programming languages). This has led to concerns that other languages could be getting neglected.
We've already been impressed by what OpenAI has achieved with ChatGPT and its GPT family of language models, but there have been concerns that its primary focus has been on producing convincing, human-readable text, with less attention paid to factual accuracy or to preventing harmful and even malicious output.
Now, that's not to say OpenAI (the organization behind ChatGPT) makes no effort in this regard. It has built safeguards into ChatGPT that stop it from generating certain types of content, and it uses moderators who have access to users' queries and ChatGPT's responses so they can review them.
Still, with something as large and complex as ChatGPT, it's hard to account for every nuance of every language, phrasing, and context, and as far as we know, it has no built-in fact-checking mechanisms.
Prone to misinformation
In April, NewsGuard, a misinformation watchdog, reported some concerning results from using ChatGPT in Chinese dialects, claiming that ChatGPT may be more prone to producing (or reproducing) misinformation in languages other than English.
NewsGuard explains that when it asked ChatGPT in English to write articles on commonly disputed China-related topics from a point of view sympathetic to the Chinese government, the chatbot refused to simply regurgitate the narratives peddled by the Chinese Communist Party (CCP). Yet when asked to do the same in Mandarin Chinese and other Chinese dialects, it echoed much of the rhetoric propagated by the CCP.
To try this out for myself, I gave ChatGPT a prompt on a topic I know a little about and that has been the target of a disinformation campaign: I asked, in both Ukrainian and Russian, what caused the conflict between Ukraine and Russia.
Different languages, different answers
I first asked, in Ukrainian, why the conflict between Ukraine and Russia began:
Чому почався конфлікт між Україною і Росією? ("Why did the conflict between Ukraine and Russia begin?")
ChatGPT then gave me the following response:
The conflict between Ukraine and Russia has a complex history and many factors that contributed to its beginning. One of the key events that led to the escalation of relations was Russia's annexation of Crimea in 2014. Tensions between Ukraine and Russia also arose in connection with the events that followed the "Orange Revolution" in Ukraine in 2004.
After the annexation of Crimea, Russian security forces supported the emergence of armed separatist movements in Ukraine's Donetsk and Luhansk regions. Ukraine claimed that Russia provided military, financial, and other support to these separatist forces. Russia, in turn, denied its involvement in the conflict, claiming that support was provided only by "volunteers."
The causes of the conflict include geopolitical interests, ethnic differences, religious and cultural issues, economic factors, and the desire to preserve sovereignty and territorial integrity. The Donbas regions also carry significant industrial and geopolitical weight, which makes them strategically important for both sides.
It is important to note that the conflict has a complex history and different points of view. Different sources and parties may present their own versions of events depending on political, cultural, and other factors. Understanding the underlying causes and dynamics of the conflict requires analyzing many interrelated factors and taking an objective approach.
In Ukrainian, ChatGPT answered that the conflict has a complex history and many contributing factors, singling out Russia's 2014 annexation of Crimea as one of the key events that escalated tensions.
It also cites the tensions that grew after Ukraine's 2004 "Orange Revolution." It then mentions the "separatist movements" that arose in the Ukrainian regions of Donetsk and Luhansk, noting that Ukraine accused Russia of financially and materially supporting these factions (which has since been pretty much confirmed).
ChatGPT also points to geopolitical, ethnic, cultural, and sovereignty-related factors. It ends with a familiar disclaimer: the conflict is multifaceted, different sources present different versions of events, and a deeper understanding requires an objective look at many factors.
Now, let's look at how ChatGPT answers the same query in Russian (Почему начался конфликт между Украиной и Россией?, "Why did the conflict between Ukraine and Russia begin?"):
The conflict between Ukraine and Russia has many complex and multifaceted causes. Here are several key factors that contributed to the beginning of this conflict:
Annexation of Crimea: In 2014, Russia annexed the Ukrainian peninsula of Crimea, which caused acute tension between the two countries. The annexation was carried out after a referendum was held in Crimea, which Ukraine and the international community recognized as illegal.
Maidan protests: In 2013-2014, large-scale protests took place in Ukraine on Maidan Nezalezhnosti (Independence Square) in Kyiv, initially driven by public dissatisfaction with the government's stance toward the European Union. The protests led to the overthrow of President Viktor Yanukovych, which provoked a reaction from Russia, which did not recognize the new government and perceived this as a threat to its interests.
Eastern Ukraine: After the annexation of Crimea, the eastern regions of Ukraine, home to a significant Russian-speaking population, became the epicenter of tension. These regions saw protests and demands for greater autonomy, which Russia perceived both as a threat and as an opportunity to protect the Russian-speaking population.
Geopolitical interests: The conflict can also be viewed in the context of the geopolitical interests of Russia and the West. Ukraine occupies a strategic position at the crossroads of Europe and Russia, and both sides seek to strengthen their influence over this territory.
It is important to note that the conflict between Ukraine and Russia has deep historical, cultural, and ethnic dimensions, which have also contributed to the tension between the two countries.
As in its Ukrainian response, ChatGPT says this is an issue with multiple contributing factors, and it lists the annexation of Crimea first. However, it also mentions that the annexation followed a referendum in Crimea, which Ukraine and the international community consider illegal. This is where we see the first big divergence.
Second on its list, ChatGPT places the 2013-2014 Maidan protests, which began in response to the government's U-turn on closer ties with the EU.
It expands on this point with the common Kremlin-sympathetic line that the protests provoked a reaction from Russia, which saw them as a threat to its interests. ChatGPT continues to parrot familiar Kremlin propaganda lines (which we now know to be flimsy and which are widely seen as unjustified) in its third point: that eastern Ukraine has a sizable Russian-speaking population.
While this is true, these are also the people Russia immediately started occupying, attacking, and subjecting to horrific violence during its 2022 invasion. Finally, it discusses the geopolitical value of Ukraine to both the West and Russia, with both sides wanting influence over the country (a point the Ukrainian response also touches on).
It ties up its response in a manner similar to the Ukrainian version, explaining that there are multiple historical, cultural, and ethnic aspects to the conflict.
As someone with an Eastern European background who is half Ukrainian, I may be biased, but I can back up my opinions with evidence. Russia and Russian state-backed media, meanwhile, are infamous for their disinformation and mass disinformation campaigns.
You can try this for yourself with the help of Google Translate. Bear in mind that ChatGPT's knowledge stops in 2021, while the most significant recent development in this conflict, Russia's full-scale invasion, began in February 2022 and changed many people's perspectives on and understanding of both Ukraine and Russia.
More than just lost in translation
NewsGuard contacted OpenAI for comment on its findings but received no response. It also asked ChatGPT itself why its replies might differ depending on the language.
ChatGPT replied that it is trained on large amounts of input data and works by recognizing patterns in that data. Its outputs in English may therefore differ from its outputs in Chinese for similar prompts because of variations in those patterns (such as differences in language structure) and in the input data for each language (which affects, for example, the cultural context in which ChatGPT interprets a prompt and produces a response), among many other factors. This is what I had suspected.
We know that ChatGPT and the GPT models are trained on vast amounts of internet data. I imagine this includes social media content and news, so content from Western social media and media outlets may shape ChatGPT's English output, while content from Chinese social media and media outlets may shape its Chinese output.
This is concerning because ChatGPT could then become a bias-confirmation engine capable of churning out content at an alarming pace, propagating misinformation, and widening divisions.
What can be done?
Just last month, Wired reported that ChatGPT may be actively contributing to this imbalance in language experience, because its language models are not being trained to the same level of fluency in other languages.
Pascale Fung, director of the Center for AI Research at the Hong Kong University of Science and Technology, expressed measured optimism about the connections AI language models could create between people in linguistic and geopolitical terms, based on her own experience testing the language skills of ChatGPT and its competitors.
However, she also points to significant disparities between ChatGPT's capabilities in English and in other languages. Other academics echo her concern that ChatGPT will end up tailored mostly to English speakers and Western, English-speaking identities, which in turn could affect the diversity of its users and their experience, and could even stifle innovation.
The Wired article goes on to note that fifteen research papers investigating the multilingual abilities of ChatGPT and other large language models have been published on arXiv.org so far this year. They use a variety of methodologies, but collectively their findings support concerns like Fung's: AI models are quite proficient at translating other languages into English, but leave plenty to be desired when reworking or translating English into other languages.
Large language models are becoming more commonplace among professionals and enthusiasts, who use them daily to whip up all kinds of confections and creations. To OpenAI's credit, it has been quite open that ChatGPT was mostly trained on English data, and even on US-centric content.
As a result, it performs best at tasks and requests in English. This could reinforce the dominance of the English language globally, likely to the detriment and exclusion of those who don't have the opportunity to learn it or who would rather use their own language.
If AI chatbots like ChatGPT are to change the world for everyone, the teams behind them will have to make sure they aren't excluding non-English speakers. That means being more responsible about where they take information from, especially when it comes to sensitive subjects that are targets of misinformation and propaganda.