Multilingualism in the Era of AI with Viorica Marian and Morten Christiansen
This season on the podcast, we are exploring the intersections between emerging technology, global affairs and the United Nations Sustainable Development Goals. In this episode, Annelise Riles dives into a topic she is personally passionate about: the power of multilingual engagement. Multilingualism has been identified as key to achieving the 17 United Nations Sustainable Development Goals. Guests Morten Christiansen and Viorica Marian join Riles to discuss multilingualism research and shed light on how the use of large language models, such as ChatGPT, might impact the way we think, speak and interact with people around the world.
Background reading
- More on Marian’s book The Power of Language and Christiansen's book The Language Game with Nick Chater.
- Read Marian’s op-ed on AI published in The Washington Post.
Read the transcript of this show
[00:00:00] Annelise Riles: Welcome to the Breaking Boundaries podcast. I'm Annelise Riles, Executive Director of Northwestern University's Roberta Buffett Institute for Global Affairs. The Northwestern Buffett Institute is dedicated to breaking through traditional silos of expertise, geography, culture, and language to surface novel solutions to pressing global challenges. This season on the podcast, we're exploring the intersections between emerging technologies and global affairs and how developments in technology can help us to achieve the United Nations Sustainable Development Goals. Today, we're diving into a topic I'm really personally passionate about, which is the power of multilingual engagement. Multilingualism has been identified as key to achieving the 17 United Nations Sustainable Development Goals. In this episode, we are going to be discussing how new research on multilingualism sheds light on how the use of large language models such as ChatGPT might impact the way we think, speak, and interact with people around the world. So, joining me to discuss this are two really renowned scholars in this field. We're just thrilled to have you both with us, Viorica Marian and Morten Christiansen. Viorica is a faculty member here at Northwestern University, where she's the Ralph and Jean Sundin Endowed Chair in the Department of Communication Sciences and Disorders. And she's also a professor in the Department of Psychology and Director of the Bilingualism and Psycholinguistics Research Lab. Morten is a cognitive scientist and the William Kenan Professor of Psychology at Cornell University. He's also a professor in Cognitive Science of Language at the School of Communication and Culture, as well as the Interacting Minds Centre at Aarhus University in Denmark. Now both of my guests today have new and very widely read books on language just out. Viorica is the author of The Power of Language and Morten is the co-author of The Language Game with Nick Chater.
Welcome to both of you. Thank you so much for being here.
[00:02:10] Viorica Marian: Thank you for having us. It's a pleasure to be here.
[00:02:12] Morten Christiansen: Thanks for having us.
[00:02:13] Annelise Riles: Alright, so I want to start by asking you each what languages you speak. Viorica, what languages do you speak?
[00:02:20] Viorica Marian: I grew up with Romanian and Russian, and then I later learned English and a smattering of other languages, but not to fluency.
[00:02:27] Annelise Riles: What about you, Morten?
[00:02:28] Morten Christiansen: I grew up speaking Danish and then in school I had English, German, Latin, French. And then of course, being Scandinavian, we also had Norwegian and Swedish.
[00:02:39] Annelise Riles: All right, that was at least seven in there. Seven or eight. Incredible. So, maybe that is the answer to our question here, but Viorica, what got you interested in this topic of language?
[00:02:50] Viorica Marian: That's partly the reason. I did grow up surrounded by different languages and being exposed to multiple languages. And from an early age, I was noticing patterns that may differ across languages and wondering how those change the way we think about the world. Later on, I became more interested in philosophy and found the psychology of language a really good place for the intersection of the two: philosophy of mind and the study of language.
[00:03:19] Annelise Riles: What about you, Morten? How did your interest in the subject develop?
[00:03:22] Morten Christiansen: I guess I've always been interested in language from fairly early on. So I took Latin instead of taking typewriting, which has caused me to write quite slowly on computers up to now, so that's not perhaps the greatest choice I could have made. But I think my real, sort of more academic interest came when I took a course in psycholinguistics when I was doing a master's degree at the University of Warwick in the UK. And I just fell in love with it. I thought, well, this is just the coolest subject ever. And I sort of never looked back.
[00:03:51] Annelise Riles: Morten, I want to start with a big question for you first because you've written in your book that our species-defining characteristic is language. Can you explain a little bit what that means? Because I think many people think, oh, well, don't primates or some other animals have something like language? And what about machines? Do they have a language? So what do you mean by that? What is unique to human beings?
[00:04:14] Morten Christiansen: Well, first of all, when we think about language, it might be useful to think about what would happen if you lost language. So, for example, it might happen to us if we go to a foreign city and we don't understand the language, and we sort of realize just how important language is for a lot of what we do. But more generally, from an evolutionary perspective, the human language ability essentially allows us to transmit knowledge across generations without new generations having to redo everything that the previous generation had to sort of work out. We can tell individuals how to do this, how to do that. And it also allows us to create all sorts of agreements, laws, regulations, norms, and so on that can regulate how we live together in various kinds of societies and cultures. And so without language, we wouldn't be able to build up this very complex system of rules and norms that govern the way we live together in modern societies. And it doesn't really have to be an industrial society. It can be any kind of society. Without language to allow us to negotiate, to sort of help each other problem solve, we wouldn't be where we are today, for better or worse. Now, of course, we can also use our language technologies to perhaps even destroy the world, as it sometimes seems we want to do.
[00:05:24] Annelise Riles: And maybe this gets to what your definition of language is, but why is it that you wouldn't say that, say, what primates do, or what dolphins do, or what computers do, is language?
[00:05:36] Morten Christiansen: Well, I think that other animals certainly have very complex and sophisticated communication systems, but they don't really allow for the kind of cultural transmission of huge amounts of knowledge across generations that we see in the human case. Now we do see some aspects of culture in some primates and in some animals, but it's at a completely different scale than what we see in human societies, so that's why. Another aspect of it is that if we look across species, we see all sorts of amazing ways in which animals are communicating using, you know, chemistry, using visual signals or scents or all sorts of things. But when you look within a given species, you actually see very little variation in the communication system for that species. Whereas when we look to human language, we have more than 7,000 different languages across the world and they differ in all sorts of amazing ways. And we don't really see that in other communication systems. That is probably part of what gives language the power that we can use to sort of develop societies and negotiate and solve problems.
[00:06:36] Annelise Riles: Well, that's a perfect segue to your work, Viorica, because you are really a specialist in that linguistic diversity and in multilingualism. So, tell us a little bit about what the trend is here. Is the diversity of languages increasing, decreasing? What's happening there?
[00:06:52] Viorica Marian: So it is estimated a little over 7,000 languages are being spoken in the world today. There were a lot more just a few years or decades ago. And with each year, we lose a few more languages. The rate varies, but it is estimated about nine become extinct each year. It is possible that the rate will speed up over time. There are multiple reasons why languages are dying and becoming extinct, but overall we are still continuing to be a linguistically diverse society. Humans speak multiple languages. Many people speak more than one language, so there is still a lot of linguistic diversity in the world.
[00:07:32] Annelise Riles: So you were just mentioning that languages become extinct. Are there ever any new languages being created?
[00:07:38] Viorica Marian: Yeah, so there are a number of reasons why languages are becoming extinct. Some of them are globalization, industrialization, migration, some cultural replacement and dominance, colonialism. All of these are historical reasons why some languages fade out. New languages do emerge. For example, new sign languages sometimes appear spontaneously and are developed by societies. Artificial languages, including languages like Esperanto, for example, computer languages, or languages that appear in fictional universes, are created all the time. But natural languages as a whole, their number is on the decline.
[00:08:17] Annelise Riles: So, let me ask you, Viorica, about the trends around this declining multilingualism, because it seems like a lot of the business world now is using English. I just heard a story on NPR about the fact that the number of American students studying Chinese is decreasing dramatically. And the argument was, well, most employers don't need a Chinese speaker anymore because they can just use a computer. Or they can hire a native Chinese speaker because we have plenty of those. So this idea that we're going to farm out the task of multilingualism to the English learners, rather than the learners of Chinese and other languages from the native English community. Why do you think this is happening? And is it a good or a bad thing for a society when their own language, like English, becomes dominant such that they don't have to be the ones to learn a second language?
[00:09:06] Viorica Marian: Right, so cultural dominance and economic dominance are a big factor there. English is indeed one of the main dominant languages in the world, as is Mandarin, for that matter. This is also part of the reason why many of the small languages are disappearing, and the major languages are becoming even more dominant. But at the same time, we have two things going on here: the linguistic diversity in the world, and also the linguistic diversity within individuals. There are in fact a lot of people who speak two or more languages. They are the majority of the world population and they are also on the rise in the United States. So yes, English is a dominant language. Yes, many people choose English as the primary language of communication, but the number of people who speak more than one language in the United States is also increasing.
[00:09:54] Annelise Riles: And one of the things that I loved learning about in your book, Viorica, is all the advantages to a person, to an individual, cognitively, when they know more than one language. Could you share just a few of those advantages?
[00:10:08] Viorica Marian: Yes. So language is so powerful in how it shapes our minds and how it shapes our brain and how it shapes societies. So its influence can be found from the neural level to the entire societal level. For example, one of the things I talk about in the book is that speaking multiple languages is related to cognitive health, especially during aging, when the incidence of cognitive decline and dementia increases as individuals get older. At the individual level, there is some evidence that people who develop dementia, if they speak two or more languages, are diagnosed or show the clinical symptoms four to six years later than monolinguals. Then at the national epidemiological level, in nations that speak two or more languages, with each additional language there is a decreased incidence of dementia. So the more languages spoken in the country, the lower the incidence of dementia. Now, of course, correlation does not imply causation, but these are some interesting preliminary indications that speaking multiple languages gives our brain a kind of workout, by juggling multiple languages, by controlling competition from multiple languages, by focusing on the language that's being used and inhibiting the language that's not being used. That promotes agility, cognitive agility, and in a way perhaps is similar to the way our body gets a workout in a gym: our brain gets a workout from speaking multiple languages.
[00:11:33] Annelise Riles: Morten, let me ask you then about our main topic for today, which is the impact of new AI technologies on language. So, in the conclusion to your book, you address this issue directly, about what parts of translation could be taken over by computers and what parts will never be taken over by computers.
[00:11:55] Morten Christiansen: Well, not only when it comes to translation, but when it comes to sort of language use in general, these new AI systems are incredibly proficient in many ways. And so there is one thing that they do somewhat well, but what they're not so good at is the kind of back and forth that we normally do when we engage in conversation with one another. So these systems are trained by going through hundreds of billions of words of written text. So they're trained on text rather than on interaction. Whereas kids grow up learning language from interacting with other people, other kids, other adults, that's not how these systems learn. Nonetheless, they do pick up a surprising amount of language. And certainly like two years ago, I would not have predicted that we would have these systems today that can use language as well as they can. But they're not using it for communication as such. They're using it because you ask them a question or you ask them to interact about this or that. They're not really engaging in the back and forth that we normally do. And they don't seem to be using some of the kinds of mechanisms that we normally use when we are interacting with one another. So for example, typically when you're in conversation with somebody, you might nod your head to indicate that you're sort of on the same page, or you might say, "Mm-hmm," "Huh," and so on. We may also, if we're sort of beginning to think that, "Oh, I'm not really understanding what it is you're saying," ask for what's called a repair. So I might ask you, "Oh, what did you say? Did you mean this or that?" But you don't really see this with these systems, and so they're not really picking up on this kind of give and take that's part of a conversation. That is really an Achilles heel of these systems, that they can't really engage in that. But nonetheless, they can do a lot of interesting things, but they are limited.
[00:13:41] Annelise Riles: So do you think, Morten, that this improvement that you've seen is a curve that will eventually sort of flatten out? That there will come a point of diminishing returns where these technologies will not be able to get closer and closer to human language use, or do you think that over time, we're going to see them break through those ceilings that you just described as well?
[00:14:01] Morten Christiansen: To be honest, I don't know.
[00:14:03] Annelise Riles: Fair enough.
[00:14:04] Morten Christiansen: And I think again, in just a few years, they have just taken off in a variety of ways. Now, they do that primarily within a particular language, that is English. Now, I have been surprised at how well something like ChatGPT actually works for Danish, which it hasn't really received that much training on, but nonetheless, it actually does surprisingly well. And I think one of the things that's sort of underestimated with these systems is that they might sometimes say things that are factually wrong, or they might say things that semantically, or sort of in terms of their meaning, don't quite make sense. But what they say is almost always grammatically correct. It's rare to see them produce text that is not what we would think of as grammatically correct. So in that regard, they can be quite useful and I think it actually shows something quite deep about human language learning. So for a very long time, there's been a huge debate about human language learning, regarding whether we need something built into our brains to be able to learn language. So people like Noam Chomsky and Steven Pinker have long argued that. But what these models show is that you can generate grammatical language purely from just going through billions and billions of words. And that's sort of an existence proof that you don't necessarily need these built-in, sort of super rules, or whatever you want to think of it as, the sort of grammatical knowledge that we've sometimes heard about.
[00:15:22] Annelise Riles: So that's a hopeful note for us. And Viorica, I want to ask you if you could share some of the notes of caution that you've raised, because in a recent op-ed in The Washington Post, you suggested that perhaps there might be some ways in which these new technologies could negatively impact linguistic diversity, right?
[00:15:42] Viorica Marian: Yes, and I think Morten is absolutely right that we don't know, nobody really knows right now what will happen. We can make our best informed guesses based on what we know, but really it's difficult to predict what will happen. In terms of linguistic diversity, if we take a long-term approach and look at this, over time large language models are likely to favor the larger dominant languages even more so, and to disfavor the smaller languages. And that's because, as Morten alluded to, the data sets that they're being trained on vary across languages. Highly spoken languages or languages that have a lot of text online are trained to a higher degree of precision and just become better and stronger. And then languages that have fewer online resources are not as well trained, which results in roughly 20 languages, more or less, being the most dominant languages, with the large language models trained to be just better in them than in the thousands of others available. Over time, this will further privilege some languages and sort of drown out other languages. And this is just one of the possible outcomes when it comes to linguistic diversity, but of course there are a lot of other influences that large language models are likely to have on society and how people interact with machines. I do have a question for Morten. Do you think that the way machines learn language is fundamentally different from the way the human brain learns language? What do you think are the differences and the similarities?
[00:17:16] Morten Christiansen: Well, I think the similarities are in that the way an AI system like ChatGPT learns language is by picking up on various kinds of statistical patterns in the service of trying to figure out what comes next. And I think we do that to a certain degree. And I think one of the things I'm taking away from these systems is just how far you can get if you do that. Now, that doesn't mean that that's all there is to language. And we sort of touched upon that earlier as well. And I do think there are many other things going into language. That's in part what we are talking about in the book, The Language Game, as well, namely that there's this crucial interaction between people and that's sort of more fundamental than many other aspects. And that's really not captured by these systems at all and so I think that's certainly missing. Of course, these systems are not embodied either. So humans have bodies and that affects how we use language and how we conceptualize things and these systems don't have bodies. Now they're beginning to try to equip them with the ability to deal with visual information as well and maybe there'll be more down the line. But there are a lot of multimodal aspects of language that these systems are not catching. And they're not likely to be catching them anytime soon, I think. So there are some important differences, but I think we can also learn from that in terms of the science of language. And I think that's the approach I've been taking to these systems. What do you think?
[00:18:36] Viorica Marian: I think these are such big questions that they are keeping me awake at night. What is language? What is mind? What is consciousness? Where are the boundaries? It's becoming increasingly clear to me that machine learning can accomplish a lot. More than maybe we thought when we were thinking that language is extremely special. Now it may still be extremely special and uniquely human, but the machines are surprising us now.
[00:18:59] Annelise Riles: We're releasing this episode on International Translation Day, and I would love to hear each of your thoughts about what you're most hopeful about and what gives you the greatest amount of worry as you think about the ways in which these technologies are going to shape our multilingual future.
[00:19:17] Morten Christiansen: I don't know whether these AI systems will take over for all aspects of translation. They can certainly take over in sort of more trivial day-to-day kinds of translation. So we're already using them: sometimes you can use your phone and you can take a picture of a sign and it'll tell you what it's saying, and that's quite useful. But there are other sorts of cases where, say, you have a negotiation for a company to establish some sort of trade deal or something like that, or between politicians on the world stage. And there, getting precise translation is incredibly important. So even small errors can be incredibly dangerous. So there, I think, perhaps AI systems should not be introduced, but there are many other aspects where many times people are sort of running into problems if they're on vacation somewhere where they don't know the language. Well, they can use that and that's going to be useful. And so I'm hopeful that these systems can be used in that regard. But I also think that these have limitations in many ways. So in our book, The Language Game, we compare systems like ChatGPT and human language users with a car and a horse. So essentially, ChatGPT is to human language users a little bit like a car is to a horse. And so horses for a long time were sort of the primary mode of transport and doing all sorts of things. And then the car was invented and that took over, but it doesn't mean that we don't have horses anymore. And cars do really just one thing, and they do it really well, namely transporting people and goods, but they're not able to metabolize, they don't eat grass, they don't reproduce or care for their young or anything like that. But a similar thing with ChatGPT and so on is that they don't learn language in the same way as people do, as we talked about earlier, so from interacting with other people and so on. They just do it from vacuuming up billions of words of written text and they're doing certain things very well, just like a car does.
But there are many other things, when it comes to interaction, to the kind of improvised way we interact with one another, that it's not really going to capture very well. There's another thing, and this is my real worry, I suppose, and that is that it's not that I think that ChatGPT, or any kind of system coming after it, will sort of take over the world as such. What I'm more concerned about is the way some people might use these systems in nefarious ways to get at, you know, profits or for suppression of other people and so on. And that's where I think the real danger is. Not in some sort of Terminator future where we are suppressed by the machines and so on, but really in the way that other people might use these systems to keep other people under their thumb, as it were.
Viorica Marian: So it is true that ChatGPT can now accomplish many translations quite well and I think people can use them easily in their day-to-day life. ChatGPT is not yet at the point where we want to rely on it for matters of life and death, like medical settings or political negotiations, but it's not impossible that it will eventually become able to do those tasks with the sensitivity that a human can. It may never get there because language can be so nuanced, but it may also get there. This actually brings us to the broader question of trying to understand to what extent artificial languages that machines use and human languages that humans use are similar, and to what extent they're learned in a similar way. The feats that large language models can already accomplish now have surpassed many expectations that people have had in the past. And so this is one of the reasons for concern. As AI becomes stronger and as it continues to evolve, it will become increasingly difficult to regulate.
So as a first step, I think it's important to be able to know when we are interacting with AI versus when we are interacting with human intelligence, and to provide some guardrails for the immediate future as experts assemble and figure out how human intelligence can coexist with artificial intelligence.
[00:23:11] Annelise Riles: I just want to thank you for the insight you've given us today. Thank you so much.
[00:23:15] Viorica Marian: Thank you for having us. Thank you, Annelise. Thank you, Morten. It's been a pleasure to be here with both of you today.
[00:23:20] Morten Christiansen: And thank you, it's been wonderful being on this show. I really appreciate it.
[00:23:23] Annelise Riles: For more information on this episode and on the Northwestern Buffett Institute for Global Affairs, visit us at buffett.northwestern.edu.