[Overseas DS] At Last, We May Be Able to Talk with Animals Using AI

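Now that the data and technology are ready, this is a golden age for animal language translation. Nonprofits are leading active open-source development. The limitations are clear, but what matters is the endless effort toward empathy.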

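[Overseas DS] presents the views of industry experts as reported by leading overseas data science publications. Our Data Science Management Research Institute (GIAI R&D Korea) runs this content partnership on the condition that the original English text is published alongside the article.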


Project CETI (Cetacean Translation Initiative) is using machine learning to understand the vocalizations of sperm whales. Photo: Scientific American

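As sensors have become cheaper and technologies such as biologging, drones and underwater acoustic sensors have advanced, the amount of animal data has exploded. With large language models advancing at the same time, we are now in a golden age of animal language translation.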

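In 2017 two research groups discovered a way to translate between human languages without a Rosetta stone. The key shift was turning the semantic relations between words into geometric ones. Machine-learning models can now align the shapes of languages, using, for example, the frequency with which words such as ‘mother’ and ‘daughter’ appear near each other to accurately predict what comes next. By exposing the hidden structure that languages share, they can translate languages that could not previously be decoded.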

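The field reached another milestone in 2020, when natural-language processing became able to treat many forms of information as language. Multimodal systems such as DALL-E 2 map the representations of text onto those of images, which lets them generate realistic images from verbal descriptions, and this is exactly the kind of analysis that translating animal communication is likely to need. Animals, like people, convey many messages through body language. With pet dogs in particular, we often focus so much on their barks that we miss their multimodal signals. An AI translator that takes into account a dog’s facial expressions and posture as well as its barks could enable a new kind of conversation between humans and dogs.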

Applications and Potential of AI

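Not only pet dogs but also farmed animals could benefit from this kind of deeper understanding. Elodie F. Briefer, an associate professor of animal behavior at the University of Copenhagen, has shown that animals’ emotional states can be assessed from their vocalizations. She recently developed an algorithm trained on thousands of pig sounds that predicts whether the animals are experiencing a positive or negative emotion. Briefer stresses that a better grasp of how animals experience feelings could spur efforts to improve their welfare. Meanwhile, Merlin, a free app from the Cornell Lab of Ornithology, already offers AI-powered analysis that identifies bird species. To identify a bird by sound, Merlin takes a user’s recording and converts it into a spectrogram, a visualization of the volume, pitch and length of the bird’s call. The model, trained on Cornell’s audio library, compares the user’s recording against that library to predict the species, then checks the prediction against eBird, Cornell’s database of observations, to verify that the species is one expected at the user’s location. Merlin can identify calls from more than 1,000 bird species with remarkable accuracy.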

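Organizations that put AI to serious research use have also emerged. The nonprofit Earth Species Project (ESP), made up of artificial-intelligence scientists, biologists and conservation experts, is collecting a wide range of data from many species and building machine-learning models to analyze it. ESP has built two models that have been a great help to researchers. In 2021 it developed a neural network that can separate animal sounds into individual tracks and filter out background noise such as car horns, and it released the code as open source. More recently it created a so-called foundational model that can automatically detect and classify patterns in datasets. Project CETI (Cetacean Translation Initiative) focuses on understanding a particular species, the sperm whale. The team plans to attach an underwater microphone to a buoy in the waters off Dominica to collect the whales’ coda signals and to work out the underlying linguistic structure of those signals.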

The Limits of Animal Language Translation and Why the Effort to Overcome Them Matters

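AI translation tools will not have only positive effects. Commercial fishers could use them to find schools of fish and catch far more in far less time, and poachers targeting endangered species could use them to locate animals and imitate their calls to lure them closer, with serious harm to ecosystems. And because AI tools can imitate animals’ calls before we have accurately interpreted their meaning, they could sow confusion with inaccurate signals. This is why there are growing calls to draw up guidelines for the use of translation models as soon as possible, before for-profit organizations move into the field in earnest.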

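Meanwhile, whether what animal utterances convey can be compared to human language remains hotly debated. Skeptics worry that treating animal communication as language, or attempting to translate it, may distort its meaning. Aza Raskin, one of ESP’s founders, dismisses these concerns. He doubts that animals are saying “pass me the banana,” but he thinks we will find some basis for communication in common experiences. It would not surprise him, he says, if expressions for ‘grief,’ ‘mother’ and ‘hungry’ turned up across species. His reasoning is that the fossil record shows creatures such as whales have been vocalizing for tens of millions of years, and for something to survive that long, it has to encode something very deep and very true.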

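Ultimately, real translation may require not just new tools but the ability to see past our own biases and expectations. Only when we step away from human-centered thinking and look at life from the animals’ point of view may we begin to understand their voices a little better.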


Artificial Intelligence Could Finally Let Us Talk with Animals

AI is poised to revolutionize our understanding of animal communication

Underneath the thick forest canopy on a remote island in the South Pacific, a New Caledonian Crow peers from its perch, dark eyes glittering. The bird carefully removes a branch, strips off unwanted leaves with its bill and fashions a hook from the wood. The crow is a perfectionist: if it makes an error, it will scrap the whole thing and start over. When it’s satisfied, the bird pokes the finished utensil into a crevice in the tree and fishes out a wriggling grub.

The New Caledonian Crow is one of the only birds known to manufacture tools, a skill once thought to be unique to humans. Christian Rutz, a behavioral ecologist at the University of St Andrews in Scotland, has spent much of his career studying the crow’s capabilities. The remarkable ingenuity Rutz observed changed his understanding of what birds can do. He started wondering if there might be other overlooked animal capacities. The crows live in complex social groups and may pass toolmaking techniques on to their offspring. Experiments have also shown that different crow groups around the island have distinct vocalizations. Rutz wanted to know whether these dialects could help explain cultural differences in toolmaking among the groups.

New technology powered by artificial intelligence is poised to provide exactly these kinds of insights. Whether animals communicate with one another in terms we might be able to understand is a question of enduring fascination. Although people in many Indigenous cultures have long believed that animals can intentionally communicate, Western scientists traditionally have shied away from research that blurs the lines between humans and other animals for fear of being accused of anthropomorphism. But with recent breakthroughs in AI, “people realize that we are on the brink of fairly major advances in regard to understanding animals’ communicative behavior,” Rutz says.

Beyond creating chatbots that woo people and producing art that wins fine-arts competitions, machine learning may soon make it possible to decipher things like crow calls, says Aza Raskin, one of the founders of the nonprofit Earth Species Project. Its team of artificial-intelligence scientists, biologists and conservation experts is collecting a wide range of data from a variety of species and building machine-learning models to analyze them. Other groups such as the Project Cetacean Translation Initiative (CETI) are focusing on trying to understand a particular species, in this case the sperm whale.

Decoding animal vocalizations could aid conservation and welfare efforts. It could also have a startling impact on us. Raskin compares the coming revolution to the invention of the telescope. “We looked out at the universe and discovered that Earth was not the center,” he says. The power of AI to reshape our understanding of animals, he thinks, will have a similar effect. “These tools are going to change the way that we see ourselves in relation to everything.”

When Shane Gero got off his research vessel in Dominica after a recent day of fieldwork, he was excited. The sperm whales that he studies have complex social groups, and on this day one familiar young male had returned to his family, providing Gero and his colleagues with an opportunity to record the group’s vocalizations as they reunited.

For nearly 20 years Gero, a scientist in residence at Carleton University in Ottawa, kept detailed records of two clans of sperm whales in the turquoise waters of the Caribbean, capturing their clicking vocalizations and what the animals were doing when they made them. He found that the whales seemed to use specific patterns of sound, called codas, to identify one another. They learn these codas much the way toddlers learn words and names, by repeating sounds the adults around them make.

Having decoded a few of these codas manually, Gero and his colleagues began to wonder whether they could use AI to speed up the translation. As a proof of concept, the team fed some of Gero’s recordings to a neural network, an algorithm that learns skills by analyzing data. It was able to correctly identify a small subset of individual whales from the codas 99 percent of the time. Next the team set an ambitious new goal: listen to large swathes of the ocean in the hopes of training a computer to learn to speak whale. Project CETI, for which Gero serves as lead biologist, plans to deploy an underwater microphone attached to a buoy to record the vocalizations of Dominica’s resident whales around the clock.

As sensors have gotten cheaper and technologies such as hydrophones, biologgers and drones have improved, the amount of animal data has exploded. There’s suddenly far too much for biologists to sift through efficiently by hand. AI thrives on vast quantities of information, though. Large language models such as ChatGPT must ingest massive amounts of text to learn how to respond to prompts: GPT-3 was trained on around 45 terabytes of text data, a good chunk of the entire Library of Congress. Early models required humans to classify much of those data with labels. In other words, people had to teach the machines what was important. But the next generation of models learned how to “self-supervise,” automatically learning what’s essential and working out on their own how to predict what words come next in a sequence.
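The self-supervised idea can be illustrated with a deliberately tiny example. This is a sketch under simplifying assumptions, not how ChatGPT or any animal-communication model is built: it uses word-pair (bigram) counts instead of a neural network, and the toy corpus and the function names `train_bigrams` and `predict_next` are hypothetical. The point is that no human labels are needed, because each word in raw text serves as the training target for the word before it.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in raw, unlabeled text.

    Self-supervision: each next word acts as the training label for the
    word before it, so no human annotation is needed.
    """
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus.
corpus = "the whale clicked a coda . the whale clicked a coda . the whale sang a song ."
model = train_bigrams(corpus)
print(predict_next(model, "whale"))
```

Real language models replace the counts with a neural network that conditions on long stretches of preceding context, but the source of the training signal, the text itself, is the same.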

In 2017 two research groups discovered a way to translate between human languages without the need for a Rosetta stone. The discovery hinged on turning the semantic relations between words into geometric ones. Machine-learning models are now able to translate between unknown human languages by aligning their shapes—using the frequency with which words such as “mother” and “daughter” appear near each other, for example, to accurately predict what comes next. “There’s this hidden underlying structure that seems to unite us all,” Raskin says. “The door has been opened to using machine learning to decode languages that we don’t already know how to decode.”
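The geometric intuition can be sketched in a few lines. The article does not say which methods the two 2017 groups used, so the example below is an illustration under stated assumptions: it aligns two synthetic embedding spaces with orthogonal Procrustes, a standard technique for rotating one set of word vectors onto another, and it assumes a small seed list of known word pairs, which fully unsupervised methods manage without. The vectors are random stand-ins, not real word embeddings.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y|| (Frobenius norm)."""
    # The SVD of the cross-covariance matrix X^T Y yields the optimal rotation.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def normalize(M):
    """Scale each row to unit length so dot products become cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical "language A": 50 word vectors in 5 dimensions (random stand-ins).
A = rng.normal(size=(50, 5))
# Hypothetical "language B": the same geometry, rotated and slightly perturbed.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
B = A @ Q + 0.01 * rng.normal(size=(50, 5))

# Assumed seed dictionary: the first 10 rows of A are known to pair with those of B.
W = procrustes(A[:10], B[:10])

# "Translate" every A word to its nearest B word by cosine similarity.
sims = normalize(A @ W) @ normalize(B).T
pred = sims.argmax(axis=1)
accuracy = float((pred == np.arange(len(A))).mean())
print(accuracy)
```

Because the second "language" here is an exact rotation of the first plus a little noise, nearest-neighbour lookup after alignment recovers almost every pairing; real languages share their shape only approximately, so accuracy on real word vectors would be lower.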

The field hit another milestone in 2020, when natural-language processing began to be able to “treat everything as a language,” Raskin explains. Take, for example, DALL-E 2, one of the AI systems that can generate realistic images based on verbal descriptions. It maps the shapes that represent text to the shapes that represent images with remarkable accuracy—exactly the kind of “multimodal” analysis the translation of animal communication will probably require.

Many animals use different modes of communication simultaneously, just as humans use body language and gestures while talking. Any actions made immediately before, during, or after uttering sounds could provide important context for understanding what an animal is trying to convey. Traditionally, researchers have cataloged these behaviors in a list known as an ethogram. With the right training, machine-learning models could help parse these behaviors and perhaps discover novel patterns in the data. Scientists writing in the journal Nature Communications last year, for example, reported that a model found previously unrecognized differences in Zebra Finch songs that females pay attention to when choosing mates. Females prefer partners that sing like the birds the females grew up with.

You can already use one kind of AI-powered analysis with Merlin, a free app from the Cornell Lab of Ornithology that identifies bird species. To identify a bird by sound, Merlin takes a user’s recording and converts it into a spectrogram—a visualization of the volume, pitch and length of the bird’s call. The model is trained on Cornell’s audio library, against which it compares the user’s recording to predict the species identification. It then compares this guess to eBird, Cornell’s global database of observations, to make sure it’s a species that one would expect to find in the user’s location. Merlin can identify calls from more than 1,000 bird species with remarkable accuracy.
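The spectrogram step can be sketched with a short-time Fourier transform: the recording is cut into overlapping, windowed frames, and each frame's frequency content becomes one column of the image. This is a minimal NumPy illustration, not Merlin's actual pipeline; the sample rate, frame length, hop size and the synthetic rising "call" are assumptions chosen for the example.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=512, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns (freqs, times, mags) where mags has one row per frequency bin
    and one column per time frame.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each with a Hann window.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # FFT magnitude of each frame; transpose so rows are frequencies.
    mags = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return freqs, times, mags

sr = 22050                          # samples per second (assumed)
duration = 0.5
t = np.arange(int(duration * sr)) / sr
# Hypothetical "call": a tone sweeping linearly from 2 kHz up to 4 kHz.
f0, f1 = 2000.0, 4000.0
call = np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * duration) * t**2))

freqs, times, S = spectrogram(call, sr)
peak = freqs[S.argmax(axis=0)]      # loudest frequency in each time frame
print(S.shape, round(peak[0]), round(peak[-1]))
```

Each column of `S` is one moment in time and each row one frequency band, so the loudest frequency climbing from about 2 kHz toward 4 kHz traces the sweep of the synthetic call. A model that works from spectrograms, as the article says Merlin does, receives arrays like this as its input.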

But the world is loud, and singling out the tune of one bird or whale from the cacophony is difficult. The challenge of isolating and recognizing individual speakers, known as the cocktail party problem, has long plagued efforts to process animal vocalizations. In 2021 the Earth Species Project built a neural network that can separate overlapping animal sounds into individual tracks and filter background noise, such as car honks—and it released the open-source code for free. It works by creating a visual representation of the sound, which the neural network uses to determine which pixel is produced by which speaker. In addition, the Earth Species Project recently developed a so-called foundational model that can automatically detect and classify patterns in datasets.
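The idea of deciding "which pixel is produced by which speaker" corresponds to time-frequency masking. The sketch below is not the Earth Species Project's released code: it computes an ideal binary mask from two known synthetic sources (a hypothetical high-pitched call and a low hum standing in for a car horn) and applies it to their mixture. A trained separation network has to predict a mask like this from the mixture alone; here the clean sources are used directly, which is only useful for showing what that target looks like.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    win = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 16000
t = np.arange(sr) / sr                              # one second of audio
bird = np.sin(2 * np.pi * 3000 * t) * (t > 0.5)     # hypothetical call, second half only
honk = 0.8 * np.sin(2 * np.pi * 400 * t)            # hypothetical low background noise
mix = bird + honk

S_bird, S_honk, S_mix = stft_mag(bird), stft_mag(honk), stft_mag(mix)
# Ideal binary mask: each time-frequency "pixel" goes to whichever source is louder.
mask = S_bird > S_honk
bird_estimate = S_mix * mask
error = np.linalg.norm(bird_estimate - S_bird) / np.linalg.norm(S_bird)
print(round(float(error), 3))
```

The relative error is small here because the two synthetic sources occupy different frequency bands; real animal calls that overlap in both time and frequency are much harder to pull apart, which is why the cocktail party problem has persisted.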

Not only are these tools transforming research, but they also have practical value. If scientists can translate animal sounds, they may be able to help imperiled species. The Hawaiian Crow, known locally as the ‘Alalā, went extinct in the wild in the early 2000s. The last birds were brought into captivity to start a conservation breeding program. Expanding on his work with the New Caledonian Crow, Rutz is now collaborating with the Earth Species Project to study the Hawaiian Crow’s vocabulary. “This species has been removed from its natural environment for a very long time,” he says. He is developing an inventory of all the calls the captive birds currently use. He’ll compare that to historical recordings of the last wild Hawaiian Crows to determine whether their repertoire has changed in captivity. He wants to know whether they may have lost important calls, such as those pertaining to predators or courtship, which could help explain why reintroducing the crow to the wild has proved so difficult.

Machine-learning models could someday help us figure out our pets, too. For a long time animal behaviorists didn’t pay much attention to domestic pets, says Con Slobodchikoff, author of Chasing Doctor Dolittle: Learning the Language of Animals. When he began his career studying prairie dogs, he quickly gained an appreciation for their sophisticated calls, which can describe the size and shape of predators. That experience helped to inform his later work as a behavioral consultant for misbehaving dogs. He found that many of his clients completely misunderstood what their dog was trying to convey. When our pets try to communicate with us, they often use multimodal signals, such as a bark combined with a body posture. Yet “we are so fixated on sound being the only valid element of communication, that we miss many of the other cues,” he says.

Now Slobodchikoff is developing an AI model aimed at translating a dog’s facial expressions and barks for its owner. He has no doubt that as researchers expand their studies to domestic animals, machine-learning advances will reveal surprising capabilities in pets. “Animals have thoughts, hopes, maybe dreams of their own,” he says.

Farmed animals could also benefit from such depth of understanding. Elodie F. Briefer, an associate professor in animal behavior at the University of Copenhagen, has shown that it’s possible to assess animals’ emotional states based on their vocalizations. She recently created an algorithm trained on thousands of pig sounds that uses machine learning to predict whether the animals were experiencing a positive or negative emotion. Briefer says a better grasp of how animals experience feelings could spur efforts to improve their welfare.

But as good as language models are at finding patterns, they aren’t actually deciphering meaning—and they definitely aren’t always right. Even AI experts often don’t understand how algorithms arrive at their conclusions, making them harder to validate. Benjamin Hoffman, who helped to develop the Merlin app before joining the Earth Species Project, says that one of the biggest challenges scientists now face is figuring out how to learn from what these models discover.

“The choices made on the machine-learning side affect what kinds of scientific questions we can ask,” Hoffman says. Merlin Sound ID, he explains, can help detect which birds are present, which is useful for ecological research. It can’t, however, help answer questions about behavior, such as what types of calls an individual bird makes when it interacts with a potential mate. In trying to interpret different kinds of animal communication, Hoffman says researchers must also “understand what the computer is doing when it’s learning how to do that.”

Daniela Rus, director of the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory, leans back in an armchair in her office, surrounded by books and stacks of papers. She is eager to explore the new possibilities for studying animal communication that machine learning has opened up. Rus previously designed remote-controlled robots to collect data for whale-behavior research in collaboration with biologist Roger Payne, whose recordings of humpback whale songs in the 1970s helped to popularize the Save the Whales movement. Now Rus is bringing her programming experience to Project CETI. Sensors for underwater monitoring have rapidly advanced, providing the equipment necessary to capture animal sounds and behavior. And AI models capable of analyzing those data have improved dramatically. But until recently, the two disciplines hadn’t been joined.

At Project CETI, Rus’s first task was to isolate sperm whale clicks from the background noise of the ocean realm. Sperm whales’ vocalizations have long been compared to binary code in the way they represent information. But they are more sophisticated than that. After she developed accurate acoustic measurements, Rus used machine learning to analyze how these clicks combine into codas, looking for patterns and sequences. “Once you have this basic ability,” she says, “then we can start studying what are some of the foundational components of the language.” The team will tackle that question directly, Rus says, “analyzing whether the [sperm whale] lexicon has the properties of language or not.”
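One simple way to represent how clicks combine into a coda is by its rhythm: the gaps between successive clicks, scaled by the coda's total duration so that fast and slow renditions look alike. The sketch below assumes that representation and a nearest-template rule; the template names and values are invented for illustration and are neither Project CETI's method nor real coda definitions.

```python
import numpy as np

def rhythm(click_times):
    """Inter-click intervals divided by their sum, so tempo does not matter."""
    icis = np.diff(np.asarray(click_times, dtype=float))
    return icis / icis.sum()

def classify(click_times, templates):
    """Assign a coda to the nearest template with the same number of intervals."""
    r = rhythm(click_times)
    best, best_dist = None, np.inf
    for name, template in templates.items():
        if len(template) != len(r):
            continue  # different click counts cannot match
        dist = np.linalg.norm(r - np.asarray(template))
        if dist < best_dist:
            best, best_dist = name, dist
    return best

# Hypothetical 5-click templates (4 normalized intervals each).
templates = {
    "evenly_spaced": [0.25, 0.25, 0.25, 0.25],
    "slow_then_fast": [0.40, 0.40, 0.10, 0.10],
}
coda = [0.0, 0.36, 0.72, 0.81, 0.90]  # hypothetical click times in seconds
print(classify(coda, templates))
```

Discovering which rhythm types exist in the first place, rather than writing templates by hand, is the kind of pattern finding the article describes machine learning being used for.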

But grasping the structure of a language is not a prerequisite to speaking it—not anymore, anyway. It’s now possible for AI to take three seconds of human speech and then hold forth at length with its same patterns and intonations in an exact mimicry. In the next year or two, Raskin predicts, “we’ll be able to build this for animal communication.” The Earth Species Project is already developing AI models that emulate a variety of species, with the aim of having “conversations” with animals. He says two-way communication will make it that much easier for researchers to infer the meaning of animal vocalizations.

In collaboration with outside biologists, the Earth Species Project plans to test playback experiments, playing an artificially generated call to Zebra Finches in a laboratory setting and then observing how the birds respond. Soon “we’ll be able to pass the finch, crow or whale Turing test,” Raskin asserts, referring to the point at which the animals won’t be able to tell they are conversing with a machine rather than one of their own. “The plot twist is that we will be able to communicate before we understand.”

The prospect of this achievement raises ethical concerns. Karen Bakker, a digital innovations researcher and author of The Sounds of Life: How Digital Technology Is Bringing Us Closer to the Worlds of Animals and Plants, explains that there may be unintended ramifications. Commercial industries could use AI for precision fishing by listening for schools of target species or their predators; poachers could deploy these techniques to locate endangered animals and impersonate their calls to lure them closer. For animals such as humpback whales, whose mysterious songs can spread across oceans with remarkable speed, the creation of a synthetic song could, Bakker says, “inject a viral meme into the world’s population” with unknown social consequences.

So far the organizations at the leading edge of this animal-communication work are nonprofits like the Earth Species Project that are committed to open-source sharing of data and models and staffed by enthusiastic scientists driven by their passion for the animals they study. But the field might not stay that way—profit-driven players could misuse this technology. In a recent article in Science, Rutz and his co-authors noted that “best-practice guidelines and appropriate legislative frameworks” are urgently needed. “It’s not enough to make the technology,” Raskin warns. “Every time you invent a technology, you also invent a responsibility.”

Designing a “whale chatbot,” as Project CETI aspires to do, isn’t as simple as figuring out how to replicate sperm whales’ clicks and whistles; it also demands that we imagine an animal’s experience. Despite major physical differences, humans actually share many basic forms of communication with other animals. Consider the interactions between parents and offspring. The cries of mammalian infants, for example, can be incredibly similar, to the point that white-tailed deer will respond to whimpers whether they’re made by marmots, humans or seals. Vocal expression in different species can develop similarly, too. Like human babies, harbor seal pups learn to change their pitch to target a parent’s eardrums. And both baby songbirds and human toddlers engage in babbling—a “complex sequence of syllables learned from a tutor,” explains Johnathan Fritz, a research scientist at the University of Maryland’s Brain and Behavior Initiative.

Whether animal utterances are comparable to human language in terms of what they convey remains a matter of profound disagreement, however. “Some would assert that language is essentially defined in terms that make humans the only animal capable of language,” Bakker says, with rules for grammar and syntax. Skeptics worry that treating animal communication as language, or attempting to translate it, may distort its meaning.

Raskin shrugs off these concerns. He doubts animals are saying “pass me the banana,” but he suspects we will discover some basis for communication in common experiences. “It wouldn’t surprise me if we discovered [expressions for] ‘grief’ or ‘mother’ or ‘hungry’ across species,” he says. After all, the fossil record shows that creatures such as whales have been vocalizing for tens of millions of years. “For something to survive a long time, it has to encode something very deep and very true.”

Ultimately real translation may require not just new tools but the ability to see past our own biases and expectations. Last year, as the crusts of snow retreated behind my house, a pair of Sandhill Cranes began to stalk the brambles. A courtship progressed, the male solicitous and preening. Soon every morning one bird flapped off alone to forage while the other stayed behind to tend their eggs. We fell into a routine, the birds and I: as the sun crested the hill, I kept one eye toward the windows, counting the days as I imagined cells dividing, new wings forming in the warm, amniotic dark.

Then one morning it ended. Somewhere behind the house the birds began to wail, twining their voices into a piercing cry until suddenly I saw them both running down the hill into the stutter start of flight. They circled once and then disappeared. I waited for days, but I never saw them again.

Wondering if they were mourning a failed nest or whether I was reading too much into their behavior, I reached out to George Happ and Christy Yuncker, retired scientists who for two decades shared their pond in Alaska with a pair of wild Sandhill Cranes they nicknamed Millie and Roy. They assured me that they, too, had seen the birds react to death. After one of Millie and Roy’s colts died, Roy began picking up blades of grass and dropping them near his offspring’s body. That evening, as the sun slipped toward the horizon, the family began to dance. The surviving colt joined its parents as they wheeled and jumped, throwing their long necks back to the sky.

Happ knows critics might disapprove of their explaining the birds’ behaviors as grief, considering that “we cannot precisely specify the underlying physiological correlates.” But based on the researchers’ close observations of the crane couple over a decade, he writes, interpreting these striking reactions as devoid of emotion “flies in the face of the evidence.”

Everyone can eventually relate to the pain of losing a loved one. It’s a moment ripe for translation.

Perhaps the true value of any language is that it helps us relate to others and in so doing frees us from the confines of our own minds. Every spring, as the light swept back over Yuncker and Happ’s home, they waited for Millie and Roy to return. In 2017 they waited in vain. Other cranes vied for the territory. The two scientists missed watching the colts hatch and grow. But last summer a new crane pair built a nest. Before long, their colts peeped through the tall grass, begging for food and learning to dance. Life began a new cycle. “We’re always looking at nature,” Yuncker says, “when really, we’re part of it.”